VMware Real-Time Issues & Solutions Handbook
- sathyahraj

- Oct 21
Updated: Oct 27

1. vCenter & ESXi Issues
No. | Issue | Root Cause | Fix
1 | ESXi host not entering Maintenance Mode | NSX-T VIFs still attached to VMkernel ports | esxcli network vm list → identify pinned VMs → detach VIFs using nsxcli or API |
2 | vCenter login fails post-upgrade | vmdir or STS token expiration | Restart the vmdir and STS services → service-control --restart vmdird vmware-stsd
3 | Host disconnected in vCenter | Management network unreachable / hostd service hung | Restart management agents: /etc/init.d/hostd restart and /etc/init.d/vpxa restart |
4 | VM cannot migrate (DRS failure) | Host DRS affinity or pinned CPU | Remove affinity rule → `Get-DrsRule |
5 | vCenter Appliance health degraded | Disk space full (vmdk or /storage/log) | SSH → df -h → cleanup /storage/log, /var/log/audit |
6 | Failed vMotion | Network mismatch or MTU mismatch | Verify vmk ping with jumbo frames: vmkping -d -s 8972 <target IP> |
7 | ESXi upgrade fails via Lifecycle Manager | Component or driver conflict | Review /var/log/esxupdate.log → remove conflicting VIB via esxcli software vib remove -n <vib_name> |
8 | Host shows purple screen (PSOD) | Hardware driver issue / memory corruption | Capture screenshot, validate HCL driver versions |
9 | SSL thumbprint mismatch | Host re-added or certificate mismatch | Renew certificate from vCenter → vpxd_servicecfg certificate refresh |
10 | ESXi not visible in SDDC Manager | vCenter inventory not synced | Refresh inventory in SDDC Manager → vcf api /v1/sddcs/syncInventory |
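The 8972-byte payload in row 6 follows from the MTU: vmkping's -s flag sets the ICMP data size, and the 20-byte IPv4 header plus the 8-byte ICMP header also have to fit in the frame. Here is a minimal Python sketch of that arithmetic. The function names are illustrative and not part of any VMware tool, and the target address is a placeholder.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo header


def vmkping_payload(mtu: int) -> int:
    """Largest ICMP payload that fits in one unfragmented IPv4 packet."""
    payload = mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES
    if payload <= 0:
        raise ValueError(f"MTU {mtu} is too small for an ICMP echo")
    return payload


def vmkping_command(mtu: int, target_ip: str) -> str:
    """Build the don't-fragment test command used in row 6."""
    return f"vmkping -d -s {vmkping_payload(mtu)} {target_ip}"


if __name__ == "__main__":
    print(vmkping_command(9000, "192.0.2.10"))  # jumbo frames
    print(vmkping_command(1500, "192.0.2.10"))  # standard MTU
```

If the jumbo-frame command fails but the standard-MTU one succeeds, something on the path (vmk, vSwitch, or physical switch port) is likely not configured for MTU 9000.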
2. NSX-T Troubleshooting
No. | Issue | Root Cause | Fix
1 | Transport Node fails to enter Maintenance Mode | VIFs attached to logical ports | Identify via get logical-ports → detach VIFs manually |
2 | NSX Edge deployment stuck | DNS or IP conflict | Verify DNS resolution and IP reachability (ping), then review /var/log/proton/nsxapi.log
3 | Tier-0 uplink not reachable | Incorrect VLAN or MTU mismatch | Check VLAN trunk config & get interface ethX
4 | NSX Manager shows “Cluster Degraded” | Node sync failure / Cassandra partition | get cluster status → restart management plane: restart service manager |
5 | NSX Upgrade fails at 47% | Service dependency mismatch | Reboot Edge → re-run upgrade via API /api/v1/upgrade/retry |
6 | Firewall rules not applying | DFW section policy not published | Force-publish via POST /policy/api/v1/infra/ |
7 | Overlay TEP IP conflict | DHCP overlap / duplicate TEP pool | Check IP pool assignment in Manager GUI |
8 | NSX CLI inaccessible | SSH disabled | Enable via NSX GUI → System → SSH Settings |
9 | Logical segment missing in vCenter | Transport Zone mismatch | Attach correct TZ via POST /policy/api/v1/infra/segments |
10 | Edge cluster showing partial sync | One Edge node stuck in config update | Reboot Edge → verify via get edge-cluster status |
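For the TEP IP conflict in row 7, it can help to check pool ranges for overlap offline before clicking through the GUI. The sketch below is one way to do that in Python, assuming you have already copied the start and end address of each pool into a dictionary. The pool names and addresses are hypothetical.

```python
from ipaddress import ip_address
from itertools import combinations


def overlapping_pools(pools):
    """Return pairs of pool names whose inclusive IP ranges intersect.

    pools maps a pool name to a (start_ip, end_ip) tuple of strings.
    """
    ranges = {
        name: (int(ip_address(start)), int(ip_address(end)))
        for name, (start, end) in pools.items()
    }
    conflicts = []
    for (a, (a_lo, a_hi)), (b, (b_lo, b_hi)) in combinations(ranges.items(), 2):
        # Two inclusive ranges intersect when each starts before the other ends.
        if a_lo <= b_hi and b_lo <= a_hi:
            conflicts.append((a, b))
    return conflicts


if __name__ == "__main__":
    # Hypothetical pools, not taken from a real environment.
    tep_pools = {
        "host-tep": ("172.16.10.10", "172.16.10.100"),
        "edge-tep": ("172.16.10.90", "172.16.10.150"),
        "dhcp-scope": ("172.16.20.1", "172.16.20.254"),
    }
    print(overlapping_pools(tep_pools))
```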
3. vSAN Issues
No. | Issue | Root Cause | Fix
1 | Disk group degraded | SSD failed or unclaimed | vdq -q → replace disk, recreate disk group |
2 | vSAN resync never completes | I/O congestion or large rebuild | Check vsan.resync_dashboard and increase Object Repair Timer |
3 | Capacity imbalance | Object placement skewed | Run vsan.proactive_rebalance via RVC
4 | Object inaccessible | Host failure / witness unresponsive | Verify component state → cmmds-tool find |
5 | Cluster health shows “Metadata Health Failed” | Stale UUIDs | Run cmmds-tool cleanup |
6 | vSAN upgrade failed | Disk format mismatch | Run esxcli vsan storage upgrade manually |
7 | Cluster partitioned | MTU or multicast config | Validate jumbo frames end to end on the vSAN VMkernel interface: vmkping -I <vSAN vmk> -d -s 8972 <target IP>
8 | vSAN object repair queue full | Excessive failures | Pause rebuild temporarily: vsan.resync_throttle |
9 | vSAN encryption key lost | KMS unreachable | Re-establish KMS trust relationship |
10 | VM deployment fails on vSAN | Thin provisioning limits reached | Check storage policy compliance |
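To judge whether the capacity imbalance in row 3 is worth a rebalance, one rough approach is to compare the fullest and emptiest disk. The sketch below does that in Python on numbers you supply yourself. The disk names are made up, and the 30-point default threshold is an assumption, so confirm the value that applies to your vSAN version.

```python
def capacity_variance(used_pct_by_disk):
    """Spread between the fullest and emptiest disk, in percentage points."""
    values = list(used_pct_by_disk.values())
    if not values:
        raise ValueError("no disks supplied")
    return max(values) - min(values)


def needs_rebalance(used_pct_by_disk, threshold_pct=30.0):
    """True when the spread exceeds threshold_pct (assumed default, see above)."""
    return capacity_variance(used_pct_by_disk) > threshold_pct


if __name__ == "__main__":
    # Hypothetical per-disk utilisation in percent.
    disks = {"naa.disk-a": 82.0, "naa.disk-b": 45.0, "naa.disk-c": 60.0}
    print(capacity_variance(disks), needs_rebalance(disks))
```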
4. SDDC Manager & Lifecycle (VCF)
No. | Issue | Root Cause | Fix
1 | Upgrade stuck at “Applying Solution” | ESXi MM issue | Place host manually in MM and retry |
2 | VCF inventory out of sync | API error or vCenter reconnect issue | /opt/vmware/vcf/lcm/lcm-cli refresh inventory |
3 | VCF password rotation fails | Expired credentials in locker | Update manually via vcfcli |
4 | NSX bundle download failure | Proxy or DNS issue | Configure proxy in /opt/vmware/vcf/lcm/lcm.conf |
5 | Domain deletion fails | Dependent components still attached | Detach workload domains first via REST API |
6 | LCM bundle validation error | Version mismatch | Clear cache: /opt/vmware/vcf/lcm/lcm-cli clear-cache |
7 | vCenter registration fails | FQDN mismatch | Edit /etc/hosts on SDDC Manager |
8 | API job stuck in “IN_PROGRESS” | Failed DB sync | Restart LCM service → systemctl restart lcm |
9 | VCF backup fails | Insufficient disk space | Cleanup /var/log/vmware/vcf/ |
10 | Lifecycle rollback incomplete | Missing rollback snapshot | Restore via previous backup or recreate domain |
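Before restarting a service for a job stuck in "IN_PROGRESS" (row 8), it is worth confirming that the job really is stuck rather than just slow. A minimal Python sketch of a polling loop with a deadline follows. The get_status callable is a placeholder for however you fetch the task state, and every status string other than IN_PROGRESS is hypothetical.

```python
import time


def wait_for_task(get_status, timeout_s=1800.0, interval_s=30.0,
                  clock=time.monotonic, sleep=time.sleep):
    """Poll get_status() until it leaves IN_PROGRESS or the timeout expires.

    Returns the final status string, or "STUCK" if the task is still
    IN_PROGRESS once timeout_s seconds have elapsed.
    """
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status != "IN_PROGRESS":
            return status
        if clock() >= deadline:
            return "STUCK"
        sleep(interval_s)
```

The clock and sleep parameters exist only so the loop can be exercised without real waiting. In normal use you would pass just get_status and, if needed, a different timeout.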
5. Automation & Scripting
No. | Issue | Root Cause | Fix
1 | PowerCLI connection fails | Invalid SSL trust | Set-PowerCLIConfiguration -InvalidCertificateAction Ignore |
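2 | PowerCLI script timeout | Session idle | Connect-VIServer has no session-timeout parameter; reconnect before long-running steps, or raise the web operation timeout with Set-PowerCLIConfiguration -WebOperationTimeoutSeconds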
3 | Terraform NSX provider fails | Token expired | Refresh with terraform refresh |
4 | Ansible playbook error 401 | API auth failure | Update token or use service principal |
5 | PowerCLI “object not found” | VM renamed in inventory | Query by MoRef ID instead |
6 | Terraform plan mismatch | State drift | terraform state pull + terraform refresh |
7 | NSX-T API task stuck | Transaction lock | Cancel task via DELETE /api/v1/task/<id> |
8 | SDDC automation rollback failed | Missing JSON schema | Validate payload with /api/schema |
9 | PowerCLI module missing | PS module path invalid | Install-Module VMware.PowerCLI |
10 | Ansible NSX collection missing | Collection not installed | ansible-galaxy collection install vmware.ansible_for_nsxt
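The "update token" fix for the 401 in row 4 can be automated with a refresh-and-retry wrapper. Below is a minimal Python sketch of that pattern. AuthError, request, and get_token are placeholders for your own client code, and it assumes get_token returns a fresh token each time it is called.

```python
class AuthError(Exception):
    """Raised by the caller's request function on an HTTP 401."""


def call_with_token_refresh(request, get_token):
    """Run request(token); on AuthError fetch a new token and retry once."""
    token = get_token()
    try:
        return request(token)
    except AuthError:
        # The first token was rejected: fetch another and retry a single time.
        # A second AuthError propagates to the caller instead of looping.
        return request(get_token())
```

Retrying only once keeps a genuinely bad credential from turning into an endless loop of failed logins.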
6. Backup & Appliance Exclusions
No. | Issue | Root Cause | Fix
1 | Cohesity skips NSX/VC appliances | Appliance exclusion policy | Add exceptions in Cohesity job settings |
2 | Backup fails due to snapshot | Appliance locked by system | Exclude via tags or VM folder |
3 | Backup size abnormal | Log growth | Clean up old logs under /storage/log
4 | API timeout in backup | High concurrency | Increase backup timeout window |
5 | Appliance in “quiesced state” | VMware Tools snapshot timeout | Disable quiescing in backup policy |
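Log growth shows up both in row 3 here and in the df -h check from the vCenter table. If you save the df -h output to a file or variable, a few lines of Python can flag the partitions that need cleanup. This is a rough sketch: the 90% threshold and the sample output are assumptions for illustration, not captured from a real appliance.

```python
def full_filesystems(df_output, threshold_pct=90):
    """Return (mount_point, use_pct) for filesystems at or above the threshold."""
    flagged = []
    for line in df_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6 or not fields[4].endswith("%"):
            continue  # ignore wrapped or malformed lines
        use_pct = int(fields[4].rstrip("%"))
        if use_pct >= threshold_pct:
            flagged.append((fields[5], use_pct))
    return flagged


# Hypothetical df -h output for illustration only.
SAMPLE = """\
Filesystem                Size  Used Avail Use% Mounted on
/dev/sda3                  11G  4.1G  6.1G  41% /
/dev/mapper/log_vg-log    9.8G  9.6G     0 100% /storage/log
/dev/mapper/db_vg-db      9.8G  1.2G  8.1G  13% /storage/db
"""

if __name__ == "__main__":
    print(full_filesystems(SAMPLE))
```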
Appendix
🔍 Log Locations
ESXi: /var/log/vmkernel.log, /var/log/hostd.log
vCenter: /var/log/vmware/vpxd/
NSX Manager: /var/log/proton/nsxapi.log, /var/log/controller.log
vSAN: /var/log/vsanhealth.log
VCF: /var/log/vmware/vcf/lcm/lcm.log
📘 Useful Commands
esxcli network ip interface list
vsan.health.cluster.get
get managers, get cluster status (NSX CLI)
Get-VM | Get-NetworkAdapter | Select Parent, NetworkName (PowerCLI)




