VMware Real-Time Issues & Solutions Handbook

sathyahraj
Oct 21
4 min read

Updated: Oct 27

1. vCenter & ESXi Issues

Sl No	Issue	Root Cause	Fix
1	ESXi host not entering Maintenance Mode	NSX-T VIFs still attached to VMkernel ports	esxcli network vm list → identify pinned VMs → detach VIFs using nsxcli or API
2	vCenter login fails post-upgrade	vmdir or STS token expiration	Restart the STS and vmdir services → service-control --restart vmware-stsd vmdird
3	Host disconnected in vCenter	Management network unreachable / hostd service hung	Restart management agents: /etc/init.d/hostd restart and /etc/init.d/vpxa restart
4	VM cannot migrate (DRS failure)	Host DRS affinity or pinned CPU	Remove affinity rule → `Get-DrsRule
5	vCenter Appliance health degraded	Disk space full (vmdk or /storage/log)	SSH → df -h → cleanup /storage/log, /var/log/audit
6	Failed vMotion	Network mismatch or MTU mismatch	Verify vmk ping with jumbo frames: vmkping -d -s 8972 <target IP>
7	ESXi upgrade fails via Lifecycle Manager	Component or driver conflict	Review /var/log/esxupdate.log → remove conflicting VIB via esxcli software vib remove -n <vib_name>
8	Host shows purple screen (PSOD)	Hardware driver issue / memory corruption	Capture screenshot, validate HCL driver versions
9	SSL thumbprint mismatch	Host re-added or certificate mismatch	Renew certificate from vCenter → vpxd_servicecfg certificate refresh
10	ESXi not visible in SDDC Manager	vCenter inventory not synced	Refresh inventory in SDDC Manager → vcf api /v1/sddcs/syncInventory
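
The 8972 in row 6 is not arbitrary: it is the 9000-byte jumbo MTU minus the 20-byte IPv4 header and the 8-byte ICMP header. A minimal sketch of that arithmetic follows, assuming IPv4 without header options; the function names are illustrative and not part of any VMware tool.

```python
IPV4_HEADER_BYTES = 20  # minimum IPv4 header, no options assumed
ICMP_HEADER_BYTES = 8   # ICMP echo request header


def max_icmp_payload(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    payload = mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES
    if payload < 0:
        raise ValueError(f"MTU {mtu} is too small for an IPv4 ICMP echo")
    return payload


def vmkping_command(mtu: int, target: str) -> str:
    """Build the vmkping line from row 6: -d forbids fragmentation, -s sets payload size."""
    return f"vmkping -d -s {max_icmp_payload(mtu)} {target}"


if __name__ == "__main__":
    print(vmkping_command(9000, "<target IP>"))  # jumbo frames
    print(vmkping_command(1500, "<target IP>"))  # standard MTU
```

If the 8972-byte ping fails while the 1472-byte one succeeds, some hop between the two VMkernel ports is most likely not passing jumbo frames.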

2. NSX-T Troubleshooting

Sl No	Issue	Root Cause	Fix
1	Transport Node fails to enter Maintenance Mode	VIFs attached to logical ports	Identify via get logical-ports → detach VIFs manually
2	NSX Edge deployment stuck	DNS or IP conflict	Validate ping and /var/log/nsxapi.log
3	Tier-0 uplink not reachable	Incorrect VLAN/MTU mismatch	Check VLAN trunk config & get interface ethX
4	NSX Manager shows “Cluster Degraded”	Node sync failure / Cassandra partition	get cluster status → restart management plane: restart service manager
5	NSX Upgrade fails at 47%	Service dependency mismatch	Reboot Edge → re-run upgrade via API /api/v1/upgrade/retry
6	Firewall rules not applying	DFW section policy not published	Force-publish via POST /policy/api/v1/infra/
7	Overlay TEP IP conflict	DHCP overlap / duplicate TEP pool	Check IP pool assignment in Manager GUI
8	NSX CLI inaccessible	SSH disabled	Enable via NSX GUI → System → SSH Settings
9	Logical segment missing in vCenter	Transport Zone mismatch	Attach correct TZ via POST /policy/api/v1/infra/segments
10	Edge cluster showing partial sync	One Edge node stuck in config update	Reboot Edge → verify via get edge-cluster status
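
For row 7, the GUI check can be backed by a quick offline comparison once the TEP assignments and pool ranges have been exported. The sketch below assumes you have already collected them into plain Python structures; the node names, addresses and function names are hypothetical and nothing here calls the NSX API.

```python
import ipaddress
from collections import defaultdict


def find_duplicate_teps(assignments):
    """Return {ip: [nodes]} for every TEP IP claimed by more than one node.

    assignments: {node_name: [tep_ip, ...]} gathered by hand or by script.
    """
    owners = defaultdict(set)
    for node, ips in assignments.items():
        for ip in ips:
            # Parse the address so formatting differences do not hide a clash.
            owners[ipaddress.ip_address(ip.strip())].add(node)
    return {str(ip): sorted(nodes) for ip, nodes in owners.items() if len(nodes) > 1}


def ranges_overlap(a_start, a_end, b_start, b_end):
    """True if two inclusive IP ranges (pool vs pool, or pool vs DHCP scope) intersect."""
    a0, a1 = ipaddress.ip_address(a_start), ipaddress.ip_address(a_end)
    b0, b1 = ipaddress.ip_address(b_start), ipaddress.ip_address(b_end)
    return a0 <= b1 and b0 <= a1


if __name__ == "__main__":
    # Hypothetical inventory for illustration only.
    teps = {
        "esx-01": ["172.16.10.11"],
        "esx-02": ["172.16.10.12", "172.16.10.11"],
        "edge-01": ["172.16.10.20"],
    }
    print(find_duplicate_teps(teps))
    print(ranges_overlap("172.16.10.10", "172.16.10.50", "172.16.10.40", "172.16.10.90"))
```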

3. vSAN Issues

Sl No	Issue	Root Cause	Fix
1	Disk group degraded	SSD failed or unclaimed	vdq -q → replace disk, recreate disk group
2	vSAN resync never completes	I/O congestion or large rebuild	Check vsan.resync_dashboard and increase Object Repair Timer
3	Capacity imbalance	Object placement skewed	Run vsan.rebalance via RVC
4	Object inaccessible	Host failure / witness unresponsive	Verify component state → cmmds-tool find
5	Cluster health shows “Metadata Health Failed”	Stale UUIDs	Run cmmds-tool cleanup
6	vSAN upgrade failed	Disk format mismatch	Run esxcli vsan storage upgrade manually
7	Cluster partitioned	MTU or multicast config	Validate jumbo frames end to end: vmkping -I <vSAN vmk> -d -s 8972 <target IP>
8	vSAN object repair queue full	Excessive failures	Pause rebuild temporarily: vsan.resync_throttle
9	vSAN encryption key lost	KMS unreachable	Re-establish KMS trust relationship
10	VM deployment fails on vSAN	Thin provisioning limits reached	Check storage policy compliance
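
Before running a rebalance for row 3, it helps to quantify how skewed the disks actually are. The sketch below computes the gap between the fullest and emptiest disk from figures you supply. The 30-point default threshold is an assumption chosen for illustration, not a value read from vSAN, and the disk names are hypothetical.

```python
def utilisation_spread(disks):
    """Return (spread_pct, fullest, emptiest) for {name: (used_gb, capacity_gb)}."""
    if not disks:
        raise ValueError("no disks supplied")
    pct = {name: 100.0 * used / cap for name, (used, cap) in disks.items()}
    fullest = max(pct, key=pct.get)
    emptiest = min(pct, key=pct.get)
    return pct[fullest] - pct[emptiest], fullest, emptiest


def needs_rebalance(disks, threshold_pct=30.0):
    """True when the fullest and emptiest disks differ by more than threshold_pct points.

    threshold_pct is an illustrative assumption; tune it to your own policy.
    """
    spread, _, _ = utilisation_spread(disks)
    return spread > threshold_pct


if __name__ == "__main__":
    # Hypothetical capacity figures in GB.
    sample = {"disk-a": (800, 1000), "disk-b": (400, 1000), "disk-c": (500, 1000)}
    print(utilisation_spread(sample), needs_rebalance(sample))
```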

4. SDDC Manager & Lifecycle (VCF)

Sl No	Issue	Root Cause	Fix
1	Upgrade stuck at “Applying Solution”	ESXi MM issue	Place host manually in MM and retry
2	VCF inventory out of sync	API error or vCenter reconnect issue	/opt/vmware/vcf/lcm/lcm-cli refresh inventory
3	VCF password rotation fails	Expired credentials in locker	Update manually via vcfcli
4	NSX bundle download failure	Proxy or DNS issue	Configure proxy in /opt/vmware/vcf/lcm/lcm.conf
5	Domain deletion fails	Dependent components still attached	Detach workload domains first via REST API
6	LCM bundle validation error	Version mismatch	Clear cache: /opt/vmware/vcf/lcm/lcm-cli clear-cache
7	vCenter registration fails	FQDN mismatch	Edit /etc/hosts on SDDC Manager
8	API job stuck in “IN_PROGRESS”	Failed DB sync	Restart LCM service → systemctl restart lcm
9	VCF backup fails	Insufficient disk space	Cleanup /var/log/vmware/vcf/
10	Lifecycle rollback incomplete	Missing rollback snapshot	Restore via previous backup or recreate domain
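
For row 8, a job that sits in "IN_PROGRESS" is easier to catch if the automation polls with a hard deadline rather than waiting forever. This is a minimal sketch of that pattern: `fetch_status` is a hypothetical callable you would wire to your own status lookup, and any status name other than "IN_PROGRESS" (such as "SUCCESSFUL" in the example) is an assumption.

```python
import time


def wait_for_task(fetch_status, timeout_s=600.0, interval_s=10.0,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until it stops returning "IN_PROGRESS" or the deadline passes.

    Returns the final status string; raises TimeoutError if the task never leaves
    IN_PROGRESS, which is the cue to investigate or restart the service.
    """
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if status != "IN_PROGRESS":
            return status
        if clock() >= deadline:
            raise TimeoutError(f"task still IN_PROGRESS after {timeout_s} seconds")
        sleep(interval_s)


if __name__ == "__main__":
    # Fake status source for illustration; replace with a real lookup.
    fake = iter(["IN_PROGRESS", "IN_PROGRESS", "SUCCESSFUL"])
    print(wait_for_task(lambda: next(fake), interval_s=0))
```

Passing `sleep` and `clock` as parameters keeps the loop testable without real delays.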

5. Automation & Scripting

Sl No	Issue	Root Cause	Fix
1	PowerCLI connection fails	Invalid SSL trust	Set-PowerCLIConfiguration -InvalidCertificateAction Ignore
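2	PowerCLI script timeout	Session idle	Reconnect with Connect-VIServer before long-running steps; for slow calls, raise Set-PowerCLIConfiguration -WebOperationTimeoutSeconds <seconds>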
3	Terraform NSX provider fails	Token expired	Refresh with terraform refresh
4	Ansible playbook error 401	API auth failure	Update token or use service principal
5	PowerCLI “object not found”	VM renamed in inventory	Query by MoRef ID instead
6	Terraform plan mismatch	State drift	terraform state pull + terraform refresh
7	NSX-T API task stuck	Transaction lock	Cancel task via DELETE /api/v1/task/<id>
8	SDDC automation rollback failed	Missing JSON schema	Validate payload with /api/schema
9	PowerCLI module missing	PS module path invalid	Install-Module VMware.PowerCLI
10	Ansible NSX collection missing	Missing ansible-galaxy role	ansible-galaxy collection install vmware.nsxt
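
Rows 3 and 4 share a cause: a token that expired partway through a run. One way to handle it in custom tooling is to log in again once when a call is rejected and then retry. The sketch below shows that pattern under stated assumptions: `Unauthorized`, `request` and `login` are hypothetical stand-ins, not part of Terraform, Ansible or any VMware SDK.

```python
class Unauthorized(Exception):
    """Stand-in for an HTTP 401 response from the API."""


def call_with_reauth(request, login, max_logins=2):
    """Run request(token); on Unauthorized, fetch a fresh token and retry.

    request: callable taking a token and returning a result or raising Unauthorized.
    login:   callable returning a new token.
    Gives up and re-raises once max_logins tokens have been tried.
    """
    token = login()
    logins = 1
    while True:
        try:
            return request(token)
        except Unauthorized:
            if logins >= max_logins:
                raise
            token = login()
            logins += 1


if __name__ == "__main__":
    # Fake API for illustration: the first token is treated as expired.
    tokens = iter(["stale", "fresh"])

    def fake_request(tok):
        if tok == "stale":
            raise Unauthorized()
        return f"ok with {tok}"

    print(call_with_reauth(fake_request, lambda: next(tokens)))
```

Capping the number of logins matters: a 401 caused by wrong credentials, rather than expiry, should fail fast instead of looping.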

6. Backup & Appliance Exclusions

Sl No	Issue	Root Cause	Fix
1	Cohesity skips NSX/VC appliances	Appliance exclusion policy	Add exceptions in Cohesity job settings
2	Backup fails due to snapshot	Appliance locked by system	Exclude via tags or VM folder
3	Backup size abnormal	Log growth	Truncate logs: /storage/log cleanup
4	API timeout in backup	High concurrency	Increase backup timeout window
5	Appliance in “quiesced state”	VMware Tools snapshot timeout	Disable quiescing in backup policy
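
Log growth (row 3) usually shows up first as a nearly full log partition, so it is worth checking `df -h` output before a backup window. The sketch below flags mount points at or above a usage threshold. The sample output is invented for illustration, and the 90% default is an assumption rather than a VMware recommendation.

```python
# Hypothetical df -h output; real appliances will list different devices and sizes.
SAMPLE_DF = """\
Filesystem              Size  Used Avail Use% Mounted on
/dev/sda3                11G  4.1G  6.2G  40% /
/dev/mapper/log_vg-log  9.8G  9.3G     0 100% /storage/log
/dev/mapper/db_vg-db    9.8G  1.2G  8.1G  13% /storage/db
"""


def full_filesystems(df_output, threshold_pct=90):
    """Return [(mount_point, use_pct)] for filesystems at or above threshold_pct."""
    hits = []
    for line in df_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6:
            continue  # wrapped or malformed line; df -P avoids wrapping
        use = fields[4].rstrip("%")
        if not use.isdigit():
            continue
        if int(use) >= threshold_pct:
            hits.append((" ".join(fields[5:]), int(use)))
    return hits


if __name__ == "__main__":
    for mount, pct in full_filesystems(SAMPLE_DF):
        print(f"{mount} is {pct}% full")
```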

Appendix

🔍 Log Locations
- ESXi: /var/log/vmkernel.log, /var/log/hostd.log
- vCenter: /var/log/vmware/vpxd/
- NSX Manager: /var/log/nsxapi.log, /var/log/controller.log
- vSAN: /var/log/vsanhealth.log
- VCF: /var/log/vmware/vcf/lcm/lcm.log

📘 Useful Commands
- esxcli network ip interface list
- vsan.health.cluster.get
- get managers, get cluster status (NSX CLI)
- Get-VM | Get-NetworkAdapter | Select VM, NetworkName (PowerCLI)
