VMware Level 3 Interview Q&A
- sathyahraj

- Oct 21
- 6 min read

1. vSphere / ESXi / vCenter
Q1. What happens internally during a vMotion?
Answer:
- vCenter coordinates source and destination hosts.
- Memory pre-copy begins while the VM continues running.
- Dirty pages are tracked and copied in iterative passes.
- VM is stunned briefly for final sync, then resumed on the destination host.
- Inventory metadata and networking details are updated.
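To make the iterative pre-copy idea concrete, here is a toy model in Python. It is only a sketch: the function name, the page counts, and the rates are illustrative assumptions, not values or logic taken from ESXi. Each pass copies whatever is still outstanding, and the pages dirtied during that pass become the work for the next one, until the remainder is small enough for the brief stun.

```python
def precopy_passes(total_pages, dirty_rate, copy_rate, stun_threshold, max_passes=20):
    """Toy model of iterative pre-copy (not the real vMotion algorithm).

    total_pages    -- pages to send in the first pass
    dirty_rate     -- pages the running VM dirties per second (assumed constant)
    copy_rate      -- pages copied per second over the migration network
    stun_threshold -- remainder small enough to copy during the final stun
    Returns (pages sent in each pass, pages left for the final sync).
    """
    remaining = total_pages
    sent = []
    for _ in range(max_passes):
        if remaining <= stun_threshold:
            break
        sent.append(remaining)
        # Pages dirtied while this pass was copying: dirty_rate * (remaining / copy_rate).
        remaining = remaining * dirty_rate // copy_rate
    return sent, remaining


passes, final_sync = precopy_passes(1_000_000, 10_000, 100_000, 500)
print(passes, final_sync)
```

In this model the passes shrink only while the copy rate exceeds the dirty rate. If the VM dirties memory faster than the network can copy it, the loop never converges and stops at `max_passes`.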
Q2. What is EVC and how does it work?
EVC (Enhanced vMotion Compatibility) masks CPU features to expose a consistent baseline instruction set across cluster hosts by intercepting CPUID calls.
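The "consistent baseline" idea can be pictured as a set intersection. The sketch below is a simplification under stated assumptions: the host names and feature flags are hypothetical, and real EVC applies a predefined baseline selected by the administrator rather than computing an intersection on the fly.

```python
def common_baseline(host_features):
    """Return the CPU features that every host in the cluster advertises."""
    feature_sets = [set(features) for features in host_features.values()]
    if not feature_sets:
        return set()
    return set.intersection(*feature_sets)


def masked_features(host_features):
    """Return, per host, the features that would be hidden from VMs."""
    baseline = common_baseline(host_features)
    return {host: set(features) - baseline for host, features in host_features.items()}


# Hypothetical cluster: newer hosts expose more instruction-set extensions.
cluster = {
    "esx01": {"sse4.2", "avx", "avx2", "avx512f"},
    "esx02": {"sse4.2", "avx", "avx2"},
    "esx03": {"sse4.2", "avx"},
}

print(sorted(common_baseline(cluster)))
print(sorted(masked_features(cluster)["esx01"]))
```

The oldest host sets the ceiling: anything it lacks is masked on every other host, which is why a VM can move freely between them.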
Command:
vim-cmd hostsvc/evc/info
Q3. Why does an ESXi host go into ‘Not Responding’ state?
Root Causes:
- Hostd or vpxa daemon crash
- Network failure on management vmk
- vCenter trust certificate expired
Fix:
/etc/init.d/hostd restart
/etc/init.d/vpxa restart
/etc/init.d/netlogond restart
Q4. Explain vCenter HA architecture.
- Active, Passive, and Witness nodes.
- Passive node syncs database and config.
- Failover <30 seconds.
Q5. How do you troubleshoot vCenter login failure after upgrade?
- Verify STS token validity: /var/log/vmware/sso/
- Restart identity management:
service-control --restart vmware-stsd vmware-vmdird
Q6. What happens when a vCenter database is full?
- Services fail to start (vpxd, inventory, or performance metrics).
Resolution: Extend /storage/db partition and vacuum Postgres DB.
Q7. DRS vs vMotion difference?
- vMotion = one-time migration.
- DRS = automated decision engine that uses vMotion for balancing.
2. NSX-T Data Center
Q8. NSX Manager vs Control Plane?
- Manager: Configuration, REST API, and UI layer.
- Control Plane: Routing, BGP, topology calculation.
Q9. Troubleshoot NSX-T Edge tunnel failure.
get tunnel-interfaces
get logical-routers
ping <TEP-IP> size 8972 df-bit enable
Root cause often MTU mismatch or VLAN trunking error.
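The 8972 in the ping test is not arbitrary: it is a 9000-byte MTU minus the 20-byte IPv4 header and the 8-byte ICMP header. A small sketch of that arithmetic follows; the function names are illustrative, and it assumes an IPv4 header without options.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo header


def max_ping_payload(mtu):
    """Largest ICMP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


def fits_without_fragmentation(payload, path_mtu):
    """True if a ping with this payload passes a link with the given MTU."""
    return payload + IPV4_HEADER_BYTES + ICMP_HEADER_BYTES <= path_mtu


print(max_ping_payload(9000))
print(fits_without_fragmentation(8972, 1500))
```

With the don't-fragment bit set, a payload of 8972 only gets through if every hop on the TEP path carries a 9000-byte MTU, so a failed ping at that size points at a smaller MTU somewhere along the way.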
Q10. NSX-T upgrade stuck at 47%.
- Collect logs from /var/log/nsxapi.log and /var/log/upgrade-coordinator.log.
- Restart service:
systemctl restart upgrade-coordinator
Q11. Explain T0-T1 communication.
- Tier-1 DR performs local routing within the hypervisor kernel.
- Traffic that requires NAT or is North-South is forwarded to the Tier-0 SR.
Q12. Logical port missing from one host.
nsxcli -c get logical-ports
nsxcli -c get interface
Root cause: Transport node profile not applied.
Q13. Edge node sync degraded.
get edge-cluster status
Reboot affected Edge node.
Q14. DFW (Distributed Firewall) rules not applying.
Force-publish via API:
POST /policy/api/v1/infra/
Q15. NSX Manager cluster degraded.
get cluster status
Resolution: Restart management service and ensure Cassandra partition healed.
3. vSAN
Q16. How is data replicated in vSAN?
Each object is split into components (based on FTT). Example: FTT=1 => 2 data + 1 witness.
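The FTT=1 example follows the usual mirroring arithmetic: tolerating n failures takes n+1 replicas and at least 2n+1 hosts. The sketch below covers RAID-1 mirroring only. The function name is illustrative, and the exact number of witness components vSAN places can vary with object layout, so it is not modelled here.

```python
def raid1_layout(ftt):
    """Replica count and minimum host count for RAID-1 mirroring at a given FTT."""
    if ftt < 0:
        raise ValueError("FTT cannot be negative")
    return {
        "replicas": ftt + 1,       # full copies of the data
        "min_hosts": 2 * ftt + 1,  # hosts needed so a majority survives ftt failures
    }


for ftt in (0, 1, 2):
    print(ftt, raid1_layout(ftt))
```

For FTT=1 this gives two replicas on three hosts, which matches the "2 data + 1 witness" layout: the third host holds the witness that breaks ties.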
Q17. Object inaccessible errors?
Check via:
cmmds-tool find -t DOM_OBJECT
esxcli vsan debug object list
Cause: disk group failure or host isolation.
Q18. vSAN resync slow.
Monitor: vsan.resync_dashboard
Throttle rebuilds:
vsan.resync_throttle --iops-limit 1000
Q19. Cluster partitioned?
esxcli vsan cluster get
esxcli vsan cluster unicastagent list
MTU mismatch is common.
Q20. Disk group degraded?
vdq -q
Replace failed SSD or disk and recreate disk group.
4. VMware Cloud Foundation (VCF)
Q21. LCM (Lifecycle Manager) role?
Responsible for orchestrating upgrades of NSX, ESXi, vCenter, and vSAN.
API: /v1/lcm/upgrade-bundles
Q22. VCF upgrade failure during ESXi patch?
- Host fails to enter maintenance mode (VIF attached).
- Detach via NSX CLI and retry.
Q23. Password rotation failure.
Expired credentials in locker.
Update manually using:
vcfcli password update --component nsx
Q24. Inventory out of sync.
/opt/vmware/vcf/lcm/lcm-cli refresh inventory
Q25. Domain deletion stuck.
Detach workloads first:
DELETE /v1/domains/<id>
5. Automation (PowerCLI / Terraform / Ansible)
Q26. PowerCLI connection fails (SSL).
Set-PowerCLIConfiguration -InvalidCertificateAction Ignore
Q27. Terraform state drift issue.
terraform refresh
terraform plan
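For scripted drift checks, `terraform plan` accepts `-detailed-exitcode`, which returns 0 for no changes, 1 for an error, and 2 when the plan contains changes. The wrapper below is a minimal sketch: the function names and the `workdir` default are assumptions, and it expects the `terraform` binary on the PATH in an already initialised directory.

```python
import subprocess

# Exit codes documented for `terraform plan -detailed-exitcode`.
PLAN_EXIT_MEANING = {
    0: "no changes",
    1: "error",
    2: "changes present (possible drift)",
}


def interpret_plan_exit(code):
    """Translate a plan exit code into a short human-readable status."""
    return PLAN_EXIT_MEANING.get(code, "unknown exit code")


def check_drift(workdir="."):
    """Run a plan in workdir (hypothetical path) and report its status."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return interpret_plan_exit(result.returncode)
```

Calling `check_drift()` from a scheduled job gives a simple yes/no drift signal without parsing the plan output.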
Q28. Ansible NSX authentication fails.
Refresh token or update credentials in ansible.cfg.
Q29. PowerCLI to find VMs with old snapshots:
Get-VM | Get-Snapshot | Where-Object {$_.Created -lt (Get-Date).AddDays(-7)}
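The filter in that pipeline is just a date cutoff. For readers who think more easily in Python, here is the same comparison as a sketch. The snapshot records are made-up sample data, not output from vCenter, and the function name is illustrative.

```python
from datetime import datetime, timedelta


def old_snapshots(snapshots, now, max_age_days=7):
    """Return snapshots created before (now - max_age_days)."""
    cutoff = now - timedelta(days=max_age_days)
    return [snap for snap in snapshots if snap["created"] < cutoff]


# Hypothetical inventory, with a fixed "now" so the result is repeatable.
now = datetime(2024, 1, 15)
inventory = [
    {"vm": "app01", "name": "pre-patch", "created": datetime(2024, 1, 1)},
    {"vm": "db01", "name": "nightly", "created": datetime(2024, 1, 14)},
]

stale = old_snapshots(inventory, now)
print([snap["name"] for snap in stale])
```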
Q30. NSX-T API task stuck.
Cancel via:
DELETE /api/v1/task/<id>
6. Scenario-Based Questions
Q31. VM not reachable after migration.
- Check VLAN trunking and dvSwitch config.
- Validate connectivity with vmkping.
Q32. vSAN stretched cluster resync not completing.
Use Observer:
vsan.observer --run-webserver --force
Q33. NSX Edge rebooted and lost routing.
Reapply T0 configuration using policy API.
Q34. vCenter services failing intermittently.
Review /var/log/vmware/vmon/vmon-svc.log.
Q35. VM performance issue on vSAN.
Check congestion via:
vsan.vm_perf_stats
7. Log Locations and Commands
Key Commands
· esxcli network ip interface list
· nsxcli -c get managers
· vsan.health.cluster.get
· Get-VMHost | Get-VM | Measure-Object
Section 1: vSphere / ESXi / vCenter
1. What happens internally when you vMotion a VM?
Answer:
· vCenter coordinates source and destination hosts.
· Memory pages are copied using iterative pre-copy.
· Dirty pages are tracked via shadow page tables.
· VM is briefly stunned, final pages are copied, then resumed on destination.
· vCenter updates inventory and DRS/HA metadata.
2. Explain how vCenter HA works.
Answer:
· Consists of three nodes: Active, Passive, and Witness.
· Passive node is kept current through synchronous DB replication.
· Witness ensures quorum (prevents split-brain).
· vCenter HA heartbeat traffic uses a dedicated network.
· Failover usually occurs within ~10–30 seconds.
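The witness exists so that a surviving pair can still form a majority. A simple majority check shows the idea. This is a sketch of the general quorum principle only: the node names are labels and the function is illustrative, not the election logic vCenter HA actually runs.

```python
NODES = ("active", "passive", "witness")


def has_quorum(reachable):
    """True if the reachable nodes form a strict majority of the cluster."""
    known = set(reachable) & set(NODES)
    return len(known) > len(NODES) // 2


print(has_quorum({"passive", "witness"}))  # Active lost, the other two still agree
print(has_quorum({"passive"}))             # isolated node must not take over
```

Because an isolated node can never reach a majority on its own, two partitions cannot both promote themselves, which is what prevents split-brain.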
3. Why would an ESXi host go into “Not Responding” state?
Root Causes:
· Hostd or vpxa daemon crash
· Management network failure
· vCenter trust or certificate issue
· CPU/Memory exhaustion causing watchdog trigger
Fix: SSH → restart management agents:
/etc/init.d/hostd restart
/etc/init.d/vpxa restart
/etc/init.d/netlogond restart
4. What’s the difference between DRS and vMotion?
Answer:
· vMotion = Live migration of a single VM.
· DRS = Intelligent cluster-level placement (uses vMotion).
· DRS evaluates CPU/memory utilization every 5 minutes by default (every minute since the vSphere 7.0 DRS redesign).
· Uses cost-benefit analysis before triggering migration.
5. How does EVC mode work internally?
Answer:
EVC masks CPU feature sets at hypervisor level by intercepting CPUID instructions. It ensures all hosts in a cluster expose a common baseline CPU instruction set.
Section 2: NSX-T Data Center
6. Difference between NSX Manager and NSX Controller in NSX-T?
Answer:
· Manager Plane: Policy/config management, UI, API.
· Control Plane: Routing, MAC learning, topology computation.
· NSX-T integrates both in a cluster (unlike NSX-V which had separate controllers).
7. What happens when an NSX-T Edge node loses connectivity to TEP network?
Answer:
· Overlay tunnels (GENEVE) fail.
· East-West traffic drops; North-South may continue if VLAN-based uplink.
· Verify via:
get tunnel-interfaces
get logical-routers
· Root cause: MTU mismatch, VLAN config, or TEP pool conflict.
8. How do you troubleshoot NSX-T upgrade failure at 47%?
Answer:
· Usually stuck during Edge service reconfiguration.
· Collect logs: /var/log/nsxapi.log, /var/log/upgrade-coordinator.log
· Restart the upgrade service:
systemctl restart upgrade-coordinator
· Retry via API: POST /api/v1/upgrade/retry.
9. Explain how NSX-T handles routing between Tier-0 and Tier-1 gateways.
Answer:
· Tier-1 → Tier-0 communication uses SR (Service Router) and DR (Distributed Router).
· DR performs local routing in kernel module (hypervisor).
· SR handles centralized services like NAT, DHCP, LB, etc.
· Traffic uses an internal Geneve tunnel between DR and SR.
10. How to check logical switch connectivity at ESXi level?
nsxcli -c get logical-ports
nsxcli -c get interface
esxcli network vswitch dvs vmware vxlan list
Section 3: vSAN
11. Explain vSAN object components.
Answer:
Each vSAN object = multiple components (based on failures to tolerate).
· Example: FTT=1 → 2 replicas + witness = 3 components.
· Stored across disk groups for redundancy.
12. Why does “vSAN object inaccessible” occur?
Root Causes:
· Host partition
· Disk group failure
· Stale CMMDS entry
Fix:
cmmds-tool find -t DOM_OBJECT
esxcli vsan debug object list
13. How to handle vSAN resync taking long time?
Answer:
· Check resync queue: vsan.resync_dashboard
· Verify congestion levels
· Optionally throttle resync IOPS via:
vsan.resync_throttle --iops-limit 1000
14. How to troubleshoot vSAN cluster partition?
Answer:
· Check cluster UUID and member view:
esxcli vsan cluster get
esxcli vsan cluster unicastagent list
· An MTU mismatch or physical network issue is often the root cause.
Section 4: VMware Cloud Foundation (SDDC Manager)
15. What is the role of the LCM (Lifecycle Manager) service in VCF?
Answer:
LCM handles version orchestration of vCenter, NSX, ESXi, and vSAN.
· It validates bundles, dependencies, and applies updates domain by domain.
· API path: /v1/lcm/upgrade-bundles.
16. Why does VCF upgrade fail during “Apply Solution” step?
Answer:
ESXi host cannot enter MM → VIF or VM pinned.
Check:
nsxcli -c get logical-ports | grep <hostname>
Manually place host into maintenance mode and resume upgrade.
Section 5: Automation (PowerCLI, Terraform, Ansible)
17. PowerCLI script to find orphaned VMs:
Get-VM | Where-Object {$_.Folder -eq $null} | Select Name
18. How to force Terraform to recreate an NSX object?
terraform taint nsxt_logical_switch.edge_switch
terraform apply
19. Ansible NSX playbook fails with 401 error: cause?
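Answer:
Expired API token or role missing in NSX Manager. Update the credentials in ansible.cfg and re-authenticate. If the collection itself is missing or outdated, reinstall it via ansible-galaxy collection install vmware.nsxt; note that this command only installs the collection and does not refresh credentials.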
20. PowerCLI to list all VM snapshots older than 7 days:
Get-VM | Get-Snapshot | Where-Object {$_.Created -lt (Get-Date).AddDays(-7)} | Select VM, Name, Created
Section 6: Scenario-based Questions
21. During VCF upgrade, one NSX Edge fails to come up post-reboot. Steps?
1. Check console via DCUI or vSphere Client
2. Verify /var/log/nsxapi.log
3. If boot corruption → redeploy Edge using API /api/v1/edge-nodes.
22. VM is not pingable after migrating to another host.
Answer:
· Likely missing VLAN trunk on uplink.
· Check dvPortgroup VLAN tag.
· Validate physical switch port config.
23. How to restore vCenter when the appliance is corrupted?
Answer:
· Use VCSA file-based backup.
· Restore via installer → “Restore from Backup.”
· Alternatively, restore DB from /storage/db/vpostgres.
24. NSX-T logical segment missing on one host only.
Answer:
Transport node profile not applied.
Reapply via API:
POST /api/v1/transport-node-collections/<id>/apply-profile
25. vSAN stretched cluster resync not completing.
Answer:
Check site latency & witness connectivity, then run vsan.observer --run-webserver for analysis.




