VMware Real-Time Issues & Solutions Handbook

Updated: Oct 27

1. vCenter & ESXi Issues

| Sl. No. | Issue | Root Cause | Fix |
|---|---|---|---|
| 1 | ESXi host not entering Maintenance Mode | NSX-T VIFs still attached to VMkernel ports | Run `esxcli network vm list` → identify pinned VMs → detach VIFs using nsxcli or the API |
| 2 | vCenter login fails post-upgrade | vmdir or STS token expiration | Restart the vmdir and STS services: `service-control --restart vmware-stsd vmdird` |
| 3 | Host disconnected in vCenter | Management network unreachable or hostd service hung | Restart management agents: `/etc/init.d/hostd restart` and `/etc/init.d/vpxa restart` |
| 4 | VM cannot migrate (DRS failure) | Host DRS affinity rule or pinned CPU | Remove the affinity rule; list the cluster's rules first with `Get-DrsRule` |
| 5 | vCenter Appliance health degraded | Disk space full (vmdk or /storage/log) | SSH → `df -h` → clean up /storage/log and /var/log/audit |
| 6 | Failed vMotion | Network mismatch or MTU mismatch | Verify vmk ping with jumbo frames: `vmkping -d -s 8972 <target IP>` |
| 7 | ESXi upgrade fails via Lifecycle Manager | Component or driver conflict | Review /var/log/esxupdate.log → remove the conflicting VIB via `esxcli software vib remove -n <vib_name>` |
| 8 | Host shows purple screen (PSOD) | Hardware driver issue or memory corruption | Capture a screenshot, validate driver versions against the HCL |
| 9 | SSL thumbprint mismatch | Host re-added or certificate mismatch | Renew certificate from vCenter → `vpxd_servicecfg certificate refresh` |
| 10 | ESXi not visible in SDDC Manager | vCenter inventory not synced | Refresh inventory in SDDC Manager → `vcf api /v1/sddcs/syncInventory` |
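The 8972 in the vMotion check (row 6) is not arbitrary: it is a 9000-byte jumbo MTU minus the 20-byte IPv4 header and the 8-byte ICMP header. The sketch below derives that payload for any MTU, so the same test can be run on links with a different frame size. The function names are illustrative, not part of any VMware tool.

```python
# Header sizes are fixed by the IP and ICMP specifications.
IPV4_HEADER_BYTES = 20   # IPv4 header without options
IPV6_HEADER_BYTES = 40   # fixed IPv6 header
ICMP_HEADER_BYTES = 8    # ICMP / ICMPv6 echo header


def icmp_payload_size(mtu: int, ipv6: bool = False) -> int:
    """Return the largest ping payload that fits in one unfragmented packet."""
    ip_header = IPV6_HEADER_BYTES if ipv6 else IPV4_HEADER_BYTES
    payload = mtu - ip_header - ICMP_HEADER_BYTES
    if payload <= 0:
        raise ValueError(f"MTU {mtu} is too small to carry an ICMP echo")
    return payload


def vmkping_command(target: str, mtu: int = 9000) -> str:
    """Build the vmkping line from row 6 (hypothetical helper) for a given MTU."""
    # -d sets "don't fragment", so an undersized hop fails instead of fragmenting.
    return f"vmkping -d -s {icmp_payload_size(mtu)} {target}"
```

With a standard 1500-byte MTU the same calculation gives 1472, which is a useful control: if 1472 passes and 8972 fails, some hop in the path is not configured for jumbo frames.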

2. NSX-T Troubleshooting

| Sl. No. | Issue | Root Cause | Fix |
|---|---|---|---|
| 1 | Transport Node fails to enter Maintenance Mode | VIFs attached to logical ports | Identify via `get logical-ports` → detach VIFs manually |
| 2 | NSX Edge deployment stuck | DNS or IP conflict | Validate with ping and review /var/log/proton/nsxapi.log |
| 3 | Tier-0 uplink not reachable | Incorrect VLAN or MTU mismatch | Check VLAN trunk config and `get interface ethX` |
| 4 | NSX Manager shows “Cluster Degraded” | Node sync failure / Cassandra partition | `get cluster status` → restart management plane: `restart service manager` |
| 5 | NSX Upgrade fails at 47% | Service dependency mismatch | Reboot Edge → re-run upgrade via API /api/v1/upgrade/retry |
| 6 | Firewall rules not applying | DFW section policy not published | Force-publish via POST /policy/api/v1/infra/ |
| 7 | Overlay TEP IP conflict | DHCP overlap / duplicate TEP pool | Check IP pool assignment in Manager GUI |
| 8 | NSX CLI inaccessible | SSH disabled | Enable via NSX GUI → System → SSH Settings |
| 9 | Logical segment missing in vCenter | Transport Zone mismatch | Attach correct TZ via POST /policy/api/v1/infra/segments |
| 10 | Edge cluster showing partial sync | One Edge node stuck in config update | Reboot Edge → verify via `get edge-cluster status` |
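For the TEP IP conflict (row 7), checking pool assignments by eye in the GUI gets error-prone once there are several pools or a DHCP scope in the mix. A minimal sketch of an offline check follows, assuming the ranges have been copied out by hand as `start-end` strings. The pool names and addresses are made up for illustration, and nothing here talks to NSX.

```python
import ipaddress
from itertools import combinations


def parse_range(text: str) -> tuple[int, int]:
    """Turn 'start-end' into a pair of integer addresses."""
    start, end = (ipaddress.ip_address(part.strip()) for part in text.split("-"))
    if int(end) < int(start):
        raise ValueError(f"range end precedes start: {text}")
    return int(start), int(end)


def overlapping_pools(pools: dict[str, str]) -> list[tuple[str, str]]:
    """Return the name pairs whose address ranges share at least one IP."""
    parsed = {name: parse_range(rng) for name, rng in pools.items()}
    clashes = []
    for (name_a, (a_lo, a_hi)), (name_b, (b_lo, b_hi)) in combinations(parsed.items(), 2):
        # Two closed intervals overlap when each one starts before the other ends.
        if a_lo <= b_hi and b_lo <= a_hi:
            clashes.append((name_a, name_b))
    return clashes


if __name__ == "__main__":
    example = {
        "tep-pool-a": "172.16.10.10-172.16.10.50",   # illustrative values
        "dhcp-scope": "172.16.10.40-172.16.10.80",
    }
    print(overlapping_pools(example))
```

Any pair the function returns is a candidate for the duplicate-TEP symptom and is worth confirming in the Manager GUI before changing a pool.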

3. vSAN Issues

| Sl. No. | Issue | Root Cause | Fix |
|---|---|---|---|
| 1 | Disk group degraded | SSD failed or unclaimed | `vdq -q` → replace disk, recreate disk group |
| 2 | vSAN resync never completes | I/O congestion or large rebuild | Check `vsan.resync_dashboard` and increase Object Repair Timer |
| 3 | Capacity imbalance | Object placement skewed | Run `vsan.proactive_rebalance` via RVC |
| 4 | Object inaccessible | Host failure / witness unresponsive | Verify component state → `cmmds-tool find` |
| 5 | Cluster health shows “Metadata Health Failed” | Stale UUIDs | Run `cmmds-tool cleanup` |
| 6 | vSAN upgrade failed | Disk format mismatch | Run `esxcli vsan storage upgrade` manually |
| 7 | Cluster partitioned | MTU or multicast config | Validate jumbo frames and `ping --size 8972 --netstack=vsan` |
| 8 | vSAN object repair queue full | Excessive failures | Pause rebuild temporarily: `vsan.resync_throttle` |
| 9 | vSAN encryption key lost | KMS unreachable | Re-establish KMS trust relationship |
| 10 | VM deployment fails on vSAN | Thin provisioning limits reached | Check storage policy compliance |
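Before triggering a rebalance for the capacity imbalance in row 3, it helps to quantify how skewed the disks actually are. The sketch below computes the gap between the fullest and emptiest capacity disk from percentages entered by hand. The 30-point threshold is an assumption used here as a default, so confirm the value that applies to your vSAN version. The disk names are placeholders.

```python
def usage_spread(used_pct_by_disk: dict[str, float]) -> float:
    """Return the gap, in percentage points, between the fullest and emptiest disk."""
    if not used_pct_by_disk:
        raise ValueError("no disks supplied")
    values = used_pct_by_disk.values()
    return max(values) - min(values)


def needs_rebalance(used_pct_by_disk: dict[str, float],
                    threshold_pct: float = 30.0) -> bool:
    """True when the spread exceeds the threshold (assumed default: 30 points)."""
    return usage_spread(used_pct_by_disk) > threshold_pct


if __name__ == "__main__":
    disks = {"naa.disk1": 82.0, "naa.disk2": 47.5, "naa.disk3": 51.0}  # sample data
    print(usage_spread(disks), needs_rebalance(disks))
```

A spread just under the threshold on a nearly full cluster can still cause deployment failures, so treat the boolean as a prompt to look closer, not as a verdict.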

4. SDDC Manager & Lifecycle (VCF)

| Sl. No. | Issue | Root Cause | Fix |
|---|---|---|---|
| 1 | Upgrade stuck at “Applying Solution” | ESXi MM issue | Place host manually in MM and retry |
| 2 | VCF inventory out of sync | API error or vCenter reconnect issue | `/opt/vmware/vcf/lcm/lcm-cli refresh inventory` |
| 3 | VCF password rotation fails | Expired credentials in locker | Update manually via vcfcli |
| 4 | NSX bundle download failure | Proxy or DNS issue | Configure proxy in /opt/vmware/vcf/lcm/lcm.conf |
| 5 | Domain deletion fails | Dependent components still attached | Detach workload domains first via REST API |
| 6 | LCM bundle validation error | Version mismatch | Clear cache: `/opt/vmware/vcf/lcm/lcm-cli clear-cache` |
| 7 | vCenter registration fails | FQDN mismatch | Edit /etc/hosts on SDDC Manager |
| 8 | API job stuck in “IN_PROGRESS” | Failed DB sync | Restart LCM service → `systemctl restart lcm` |
| 9 | VCF backup fails | Insufficient disk space | Clean up /var/log/vmware/vcf/ |
| 10 | Lifecycle rollback incomplete | Missing rollback snapshot | Restore via previous backup or recreate domain |
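A job that sits in “IN_PROGRESS” (row 8) is easier to act on when automation gives up after a fixed time instead of polling forever. The sketch below is one way to bound the wait. `get_status` is a hypothetical callable you would supply to fetch the job state, and any status string other than `IN_PROGRESS` is an assumption about what your API returns.

```python
import time
from typing import Callable


def wait_for_task(get_status: Callable[[], str],
                  timeout_s: float = 1800.0,
                  interval_s: float = 30.0,
                  sleep: Callable[[float], None] = time.sleep,
                  clock: Callable[[], float] = time.monotonic) -> str:
    """Poll until the status leaves IN_PROGRESS or the timeout expires."""
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status != "IN_PROGRESS":
            return status            # terminal state, whatever the API calls it
        if clock() >= deadline:
            raise TimeoutError(f"task still IN_PROGRESS after {timeout_s} s")
        sleep(interval_s)
```

Catching `TimeoutError` is the natural place to fall back to the manual fix from the table, such as restarting the LCM service. The `sleep` and `clock` parameters exist only so the loop can be exercised in tests without real delays.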

5. Automation & Scripting

| Sl. No. | Issue | Root Cause | Fix |
|---|---|---|---|
| 1 | PowerCLI connection fails | Invalid SSL trust | `Set-PowerCLIConfiguration -InvalidCertificateAction Ignore` |
| 2 | PowerCLI script timeout | Session idle | Raise the timeout with `Set-PowerCLIConfiguration -WebOperationTimeoutSeconds <seconds>`, or reconnect with `Connect-VIServer` |
| 3 | Terraform NSX provider fails | Token expired | Refresh with `terraform refresh` |
| 4 | Ansible playbook error 401 | API auth failure | Update token or use service principal |
| 5 | PowerCLI “object not found” | VM renamed in inventory | Query by MoRef ID instead |
| 6 | Terraform plan mismatch | State drift | `terraform state pull` + `terraform refresh` (`terraform apply -refresh-only` on current Terraform releases) |
| 7 | NSX-T API task stuck | Transaction lock | Cancel task via DELETE /api/v1/task/<id> |
| 8 | SDDC automation rollback failed | Missing JSON schema | Validate payload with /api/schema |
| 9 | PowerCLI module missing | PS module path invalid | `Install-Module VMware.PowerCLI` |
| 10 | Ansible NSX collection missing | Collection not installed from ansible-galaxy | `ansible-galaxy collection install vmware.nsxt` |
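The expired-token and 401 rows (3 and 4) share one pattern: fetch a fresh token once and repeat the call, instead of failing the whole run. A minimal sketch of that pattern follows. `send` and `get_token` are hypothetical callables standing in for your HTTP client and your auth flow, and nothing here is specific to Terraform, Ansible, or NSX.

```python
from typing import Callable, Tuple

Response = Tuple[int, str]   # (HTTP status code, response body)


def call_with_reauth(send: Callable[[str], Response],
                     get_token: Callable[[], str],
                     max_reauth: int = 1) -> Response:
    """Call the API, re-authenticating up to max_reauth times on HTTP 401."""
    token = get_token()
    for attempt in range(max_reauth + 1):
        status, body = send(token)
        if status != 401:
            return status, body
        if attempt < max_reauth:
            token = get_token()      # the old token was rejected; fetch a new one
    return status, body              # still 401 after all retries
```

Capping the retries matters: a 401 caused by wrong credentials, as opposed to an expired token, will never succeed, and looping on it can lock the account.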

6. Backup & Appliance Exclusions

| Sl. No. | Issue | Root Cause | Fix |
|---|---|---|---|
| 1 | Cohesity skips NSX/VC appliances | Appliance exclusion policy | Add exceptions in Cohesity job settings |
| 2 | Backup fails due to snapshot | Appliance locked by system | Exclude via tags or VM folder |
| 3 | Backup size abnormal | Log growth | Truncate logs: /storage/log cleanup |
| 4 | API timeout in backup | High concurrency | Increase backup timeout window |
| 5 | Appliance in “quiesced state” | VMware Tools snapshot timeout | Disable quiescing in backup policy |
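Log growth (row 3) is easier to catch before it inflates a backup if the `df -h` output from the appliance is checked against a threshold. The sketch below parses pasted `df` text and lists the mount points that are too full. The 80% default and the sample output are illustrative assumptions, and mount points containing spaces are not handled.

```python
def full_filesystems(df_output: str, threshold_pct: int = 80) -> list[tuple[str, int]]:
    """Return (mount point, use%) for filesystems at or above the threshold."""
    results = []
    for line in df_output.strip().splitlines()[1:]:   # skip the header row
        fields = line.split()
        # Expect: filesystem, size, used, avail, use%, mount point
        if len(fields) < 6 or not fields[4].endswith("%"):
            continue
        used = int(fields[4].rstrip("%"))
        if used >= threshold_pct:
            results.append((fields[5], used))
    return results


if __name__ == "__main__":
    sample = """Filesystem                Size  Used Avail Use% Mounted on
/dev/sda3                  11G  4.1G  6.0G  41% /
/dev/mapper/log_vg-log    9.8G  9.3G     0 100% /storage/log
"""
    print(full_filesystems(sample))
```

Run against real output, anything reported under /storage/log points back to the cleanup step in the table.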

Appendix

  • 🔍 Log Locations

    • ESXi: /var/log/vmkernel.log, /var/log/hostd.log

    • vCenter: /var/log/vmware/vpxd/

    • NSX Manager: /var/log/proton/nsxapi.log, /var/log/controller.log

    • vSAN: /var/log/vsanhealth.log

    • VCF: /var/log/vmware/vcf/lcm/lcm.log

  • 📘 Useful Commands

    • esxcli network ip interface list

    • vsan.health.cluster.get

    • get managers, get cluster status (NSX CLI)

    • Get-VM | Get-NetworkAdapter | Select Parent, NetworkName (PowerCLI)

 

 
 
 
