Troubleshooting an ESXi host using esxtop

THIS POST IS NOT COMPLETED YET

The esxtop utility is probably the most useful CLI tool for troubleshooting high load on an ESXi host. There are eight views:

  • c (default): CPU, sorted by CPU USED by default.
  • d: disk adapter
  • i: interrupt
  • m: memory, sorted by MEMSZ by default.
  • n: network
  • p: power mgmt
  • u: disk device
  • v: disk VM

Other commands are available:

  • f: add or remove fields (see table below);
  • s: set the refresh delay (default is 5 seconds);
  • #: set the number of processes to show;
  • k: kill a World (use the LWID);
  • e: expand a group to show the Worlds inside it (use the GID);
  • V: show VMs only;
  • L: change the size of the NAME column (0 to reset);
  • l: show a single group only (use the GID, 0 to reset);
  • h: show help.
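Besides the interactive keys above, esxtop can also run non-interactively in batch mode, dumping all the counters to a CSV file for later analysis (for example with Windows perfmon or esxplot). A minimal sketch; the 5-second delay, the 12 iterations and the output path are arbitrary values chosen for illustration:

# esxtop -b -a -d 5 -n 12 > /tmp/esxtop-capture.csv   # batch mode, all counters, 12 samples taken every 5 seconds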

From a very useful post:

Esxtop uses worlds and groups as the entities to show CPU usage. A world is an ESX Server VMkernel schedulable entity, similar to a process or thread in other operating systems. A group contains multiple worlds. [...] Let's use a VM as an example. A powered-on VM has a corresponding group, which contains multiple worlds. [...] There are other groups besides VM groups. Let's go through a few examples:

  • The "idle" group is the container for the idle worlds, each of which corresponds to one PCPU.
  • The "system" group contains the VMKernel system worlds.
  • The "helper" group contains the helper worlds that assist VMKernel operations.

Obtaining GID or LWID

GID is the Group ID and is used to filter the esxtop output. Each running VM has a group of Worlds (processes): the GID is the ID of that group. Each VM uses:

  • one World dedicated to the vmx;
  • one World dedicated to vmast;
  • two Worlds dedicated to vmx-vthread;
  • one World dedicated to vmx-mks (Mouse/Keyboard/Screen);
  • one World dedicated to vmx-svga (Video card);
  • one World for each vmx-vcpu (vCPU) configured.

LWID is the Leader World ID, also called World Group ID or VMX Cartel ID. It is used by the "ps", "vscsiStats", "kill" and other commands.

# ps | egrep "WID|dc1"
WID  CID  World Name            Command
35073      vmm0:dc1
35101 35065 vmx-vthread-4:dc1    /bin/vmx
35108 35065 vmx-vthread-5:dc1    /bin/vmx
35112 35065 vmx-mks:dc1          /bin/vmx
35143 35065 vmx-svga:dc1         /bin/vmx
35158 35065 vmx-vcpu-0:dc1       /bin/vmx

35065 is the LWID, not the GID.

# ps -cC | egrep "WID|dc1" | grep -v egrep
WID  CID  World Name            Command
35065 35065 vmx                  /bin/vmx -s sched.group=host/user -# product=2;name=VMware ESX;version=5.5.0;buildnumber=1623387;licensename=VMware ESX Server;licenseversion=5.0; -@ duplex=3;msgs=ui /vmfs/volumes/539eb65f-8e1b457a-bfae-001b78b813fc/dc1/dc1.vmx

The above command shows the full command line instead of the single Worlds (processes). The following command shows another way to identify the LWID given the VM name:

# vmdumper -l | grep dc1
wid=35073       pid=-1  cfgFile="/vmfs/volumes/539eb65f-8e1b457a-bfae-001b78b813fc/dc1/dc1.vmx" uuid="56 4d 71 d7 ed bf c4 e3-17 c6 2a 60 d1 48 62 02"  displayName="dc1"       vmxCartelID=35065
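The same information can also be retrieved with esxcli (a sketch, assuming ESXi 5.x or later, where the vm namespace is available; the exact output layout may vary):

# esxcli vm process list   # for each running VM prints, among other fields, the World ID and the VMX Cartel ID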

LWID is also known as vmxCartelID. The GID can be obtained from an LWID (and vice versa) using sched-stats:

# /usr/bin/sched-stats -t groups | egrep "pgid|35065"
vmgid name                pgid pname              size vsmps       usedsec   amin   amax minLimit    units ashares  resvMHz availMHz   bmin   bmax bshares   emin demand       DERatio            vtime          vtlimit           vtaged
 4352 vm.35065               4 user                  9     7     18457.574      0     -1       -1      mhz    1000        0     7574      0  41792    4238   1060     53             5 5376584720397314                0 5318787234770882
 4358 vm-vmx.35065        4352 vm.35065              0     0         0.000      0     -1        0      pct       0        0        0      0      0       0      0      0    4294967295 5376583438769766                0 5376583438769766
 4366 vm-vmm.35065        4352 vm.35065              0     0         0.000      0     -1        0      pct       0        0        0      0      0       0      0      0    4294967295 5376583438769766                0 5376583438769766

GID or vmgid is the first value: 4352.
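Since vscsiStats expects the LWID (world group ID) rather than the GID, a typical storage-latency capture for the dc1 VM looks like the following sketch, reusing the LWID 35065 obtained above:

# vscsiStats -l                    # list running VMs with their worldGroupID and virtual disk handles
# vscsiStats -s -w 35065           # start collecting histograms for the VM with LWID 35065
# vscsiStats -p latency -w 35065   # print the collected I/O latency histograms
# vscsiStats -x                    # stop the collection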

CPU (c)

By default esxtop opens the CPU view, with data sorted by %USED:

 2:02:40pm up 790 days 21:33, 402 worlds; CPU load average: 0.31, 0.29, 0.28
PCPU USED(%): 9.5 2.3 3.2 3.6 4.0  34 5.6 2.8  26  21 3.7  75  30  18  23  38 AVG:  18
PCPU UTIL(%):  11 6.1 5.2 8.1 5.6  33 4.8 5.6  33  18 6.9 100  34  19  25  39 AVG:  22
CORE UTIL(%):  15     9.6      36     8.1      47     100      50      55     AVG:  40

      ID      GID NAME             NWLD   %USED    %RUN    %SYS   %WAIT %VMWAIT    %RDY   %IDLE  %OVRLP   %CSTP  %MLMTD  %SWPWT
 9751870  9751870 vSphere Data Pr    10   46.94   47.78    0.11  934.59    0.00    6.86  341.91    0.15    0.14    0.00    0.00
    4352     4352 dc1                 7   15.15   15.19    0.03  676.17    0.15    1.54   82.53    0.07    0.00    0.00    0.00
   35658     5456 vmx                 1    0.20    0.06    0.14   98.72       -    0.15    0.00    0.00    0.00    0.00    0.00
   35664     5456 vmast.35663         1    0.05    0.05    0.00   98.88       -    0.00    0.00    0.00    0.00    0.00    0.00
   35666     5456 vmx-vthread-5:v     1    0.00    0.00    0.00   98.93       -    0.00    0.00    0.00    0.00    0.00    0.00
   35669     5456 vmx-vthread-6:v     1    0.00    0.00    0.00   98.93       -    0.00    0.00    0.00    0.00    0.00    0.00
   35670     5456 vmx-mks:vcenter     1    0.00    0.00    0.00   98.93       -    0.00    0.00    0.00    0.00    0.00    0.00
   35671     5456 vmx-svga:vcente     1    0.00    0.00    0.00   98.93       -    0.00    0.00    0.00    0.00    0.00    0.00
   35672     5456 vmx-vcpu-0:vcen     1    8.91    8.89    0.00   85.47    1.38    4.56   84.09    0.05    0.02    0.00    0.00
   35673     5456 vmx-vcpu-1:vcen     1    2.13    2.09    0.00   87.07    0.09    9.78   86.98    0.02    0.00    0.00    0.00

The first lines of the CPU view show:

  • current time (UTC) and uptime;
  • number of Worlds, VMs and vCPUs configured and running inside the host;
  • CPU load average in 1/5/15 minutes;
  • PCPU USED(%): the amount of "effective work" done by each PCPU;
  • PCPU UTIL(%): how much time each PCPU was busy during the last interval. If Intel hyper-threading is enabled, it refers to a single lane (logical CPU) of the physical core;
  • CORE UTIL(%): available only if hyper-threading is enabled; it refers to the utilization of a physical core, each made of two lanes (PCPUs). A worked reading of the capture above follows this list.
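For example, take the last core in the capture above (assuming, as is usual, that adjacent PCPUs are the two lanes of the same core): its lanes show PCPU UTIL of 25% and 39%, and the corresponding CORE UTIL is 55%. A core counts as utilized whenever at least one of its lanes is busy, so CORE UTIL always falls between the busiest lane and the sum of the two lanes: 39 ≤ 55 ≤ 25 + 39 = 64.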

The above output has been captured from a physical host with an Intel processor with HT enabled: 8 cores, each with two lanes (so 16 PCPUs). An OS running on an HT-enabled core sees two CPUs. The two lanes are parallel, but the core can still execute only one instruction at a time: having two lanes simply optimizes the scheduling process. PCPU USED and PCPU UTIL differ because of HT and of power management, which can lower the CPU frequency. If PCPU UTIL is near 100% for at least one PCPU, one or more VMs are affected; the Ready Time should be investigated to find which VM is suffering. If PCPU UTIL is heavily unbalanced and the DRS cluster is active, there are probably too many heterogeneous VMs: for example, on the same host many 1-vCPU, a few 2-vCPU and many 4-vCPU VMs are running, and the scheduler is unable to fill all CPU slots every time. Moving all the 4-vCPU VMs to a dedicated cluster can help a lot. The following columns can be displayed:

Field | Key to toggle | Key to sort | Shown by default | Description
ID A Process ID. If ID=GID, the individual processes owned by the GID are hidden; use the "e" key to show each single process ID.
GID B N Group ID, used to filter the output.
LWID C Leader World ID, World Group ID or VMX Cartel ID. It is used by the "ps", "vscsiStats", "kill" and other commands.
NAME D Name of the process/VM. If ID=GID the name of the VM is shown.
NWLD E Number of Worlds used by the GID.
%USED F U (default) Percentage of physical CPU used by the World. Can be higher than 100% if multiple vCPUs are configured (it is the sum over the single PCPUs). %USED = %RUN + %SYS - %OVRLP (see the worked check after the table)
%RUN (!) F Percentage of pure PCPU scheduled time for the World. Can be higher than 100% if multiple vCPUs are configured (it is the sum over the single PCPUs). A high value does not necessarily mean a problem; further investigation should be made at the guest OS level. %USED = %RUN + %SYS - %OVRLP %WAIT + %RDY + %CSTP + %RUN = 100%
%SYS (!) F Percentage of non-World scheduled time (IO, system Worlds, interrupts...). Should be as low as possible (up to 20 can be acceptable, depending on the application responsiveness). A high value usually means heavy IO. %USED = %RUN + %SYS - %OVRLP
%WAIT F Percentage of time spent in wait state (including IO, idle...). Can be very high if the VM is almost idle. %IOWAIT ≈ %WAIT - %IDLE (estimate) %WAIT + %RDY + %CSTP + %RUN = 100%
%VMWAIT (!) F Percentage of time spent in wait state waiting for events. Includes %SWPWT and the time the VM is blocked because a device is unavailable. Should be as low as possible.
%RDY (!) F R Percentage of time the World was ready to run, waiting in a queue to be scheduled. Should be as low as possible (up to 10 can be acceptable, depending on the application responsiveness). Oversubscription of physical CPUs can lead to high values. %WAIT + %RDY + %CSTP + %RUN = 100%
%IDLE F Percentage of time spent in the idle loop. Makes sense for vmx-vcpu Worlds only.
%OVRLP (!) F Percentage of time (system service time) spent by the system on behalf of the World (overlap). It is accounted in %RUN and not in %USED. A high value usually means heavy IO. %USED = %RUN + %SYS - %OVRLP
%CSTP (!) F Percentage of time the SMP VM was ready to execute but not enough physical CPUs were available: a VM with multiple vCPUs can execute an instruction only if all its vCPUs are ready (SMP architecture limit). Should be as low as possible (up to 3 can be acceptable, depending on the application responsiveness). Oversubscription of physical CPUs can lead to high values. %WAIT + %RDY + %CSTP + %RUN = 100%
%MLMTD (!) F Percentage of time the vCPU was ready to run but a CPU limit prevented it from being scheduled. Should be 0, unless a valid reason exists.
%SWPWT (!) F Percentage of time spent waiting for swapped pages to be read from disk. Should be 0, or the VM (and probably the entire host) will soon become unresponsive. High memory oversubscription can cause high values.
SWTCH/s G Number of world switches (out of run state).
MIG/s (!) G Total number of migrations per second: they can be intra socket (across the cores of the same socket) or inter socket (across cores of different sockets). On NUMA servers, frequent inter-socket migrations can increase memory latency, especially with a bad hardware architecture.
QEXP/s G Number of quantum expirations.
WAKE/s G Number of wakeups (from wait state)
AMIN H Minimum allocation, should be 0, unless a CPU reservation has been configured.
AMAX (!) H Maximum allocation, should be -1, unless a CPU limit has been configured.
ASHRS H Allocated shares. Should be -3, unless per-VM shares have been configured (-4 High, -3 Normal, -2 Low, other values for Custom).
AMLMT H Minimum limited allocation.
AUNITS H Allocated unit (mhz for VM).
%LAT_C I Percentage of CPU latency.
%LAT_M I Percentage of Memory latency.
%DMD I Percentage of vCPU Demand.
EMIN I Effective Min (MHz).
TIMER/s I The timer rate the World is currently requesting for.
AFFINITY_BIT_MASK I Bit mask showing the current scheduling affinity for the World (only on a per World view, use 'e' to expand).
CPU I The physical or logical processor the World was found to be running on (only on a per World view, use 'e' to expand).
POWER J Not sure about the meaning (only on a per GID view, do not use 'e' to expand), should be 0.
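As a quick check of the %USED = %RUN + %SYS - %OVRLP relation, take the dc1 row in the capture above:

%RUN + %SYS - %OVRLP = 15.19 + 0.03 - 0.07 = 15.15

which matches the reported %USED of 15.15.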

Memory (m)

By default the memory view is sorted by MEMSZ:

10:29:36am up 23 days 21:33, 452 worlds, 3 VMs, 7 vCPUs; MEM overcommit avg: 0.00, 0.00, 0.00
PMEM  /MB: 32764   total:  1581     vmk,  9143 other,  22039 free
VMKMEM/MB: 32603 managed:   940 minfree,  4655 rsvd,  27947 ursvd,  high state
NUMA  /MB: 16379 (12883), 16383 ( 8996)
PSHARE/MB:  3815  shared,   172  common:  3643 saving
SWAP  /MB:     0    curr,     0 rclmtgt:                 0.00 r/s,   0.00 w/s
ZIP   /MB:     0  zipped,     0   saved
MEMCTL/MB:     0    curr,     0  target,  7817 max

     GID NAME               MEMSZ    GRANT    SZTGT     TCHD   TCHD_W    SWCUR    SWTGT   SWR/s   SWW/s  LLSWR/s  LLSWW/s   OVHDUW     OVHD  OVHDMAX
    4352 dc1              4096.00  4051.98  1269.70    81.92    40.96     0.00     0.00    0.00    0.00     0.00     0.00    10.28    94.89    93.90
10584791 vSphere Data Pr  4096.00  4096.00  3973.49  1433.60  1187.84     0.00     0.00    0.00    0.00     0.00     0.00     7.39   168.34   192.31
    5456 vcenter1         4096.00  4095.37  3823.17   942.08   655.36     0.00     0.00    0.00    0.00     0.00     0.00    10.61   152.62   158.42
    2196 hostd.33955        71.34    44.04    48.44    10.99    10.99     0.00     0.00    0.00    0.00     0.00     0.00     0.00    44.04    71.34
    3379 vpxa.34563         25.04    16.84    18.53     3.75     3.75     0.00     0.00    0.00    0.00     0.00     0.00     0.00    16.84    25.04
    4533 sfcb-ProviderMa    18.61     6.79     7.46     2.79     2.79     0.00     0.00    0.00    0.00     0.00     0.00     0.00     6.79    18.61
    4530 sfcb-ProviderMa    16.20    13.52    14.87     3.01     3.01     0.00     0.00    0.00    0.00     0.00     0.00     0.00    13.52    16.20
     962 vobd.33214         13.20     1.57     1.72     0.12     0.12     0.00     0.00    0.00    0.00     0.00     0.00     0.00     1.57    13.20
[...]

The first lines of the memory view show:

  • current time (UTC) and uptime;
  • number of Worlds, VMs and vCPUs configured and running inside the host;
  • MEM overcommit avg: average memory overcommitment over 1/5/15 minutes (can be more than 0, but check for swapping);
  • PMEM/MB: total MB of RAM available on the host and how it is used (vmk = VMKernel, other = VMs and other non-VMKernel processes, free = unused);
  • VMKMEM/MB: the total amount of machine memory managed by the VMKernel (minfree = minimum RAM that should always be kept free, rsvd = reserved RAM, which also includes minfree, ursvd = unreserved RAM, state = memory state, related to host performance);
  • NUMA/MB: total and free (in parentheses) RAM for each NUMA node (usually a socket);
  • PSHARE/MB: RAM shared between VMs (common = RAM in common between VMs, saving = RAM saved by the host, shared = common + saving); see the cross-check after this list;
  • SWAP/MB: current swap usage (rclmtgt = how much the host expects to swap, r/s = MB read from swap per second, w/s = MB written to swap per second);
  • ZIP/MB: RAM zipped by compressing memory pages (saved = RAM saved by compression);
  • MEMCTL/MB: physical memory reclaimed by the balloon driver (curr = currently reclaimed, target = total ballooned memory expected, max = maximum amount of physical memory reclaimable).
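These header lines can be cross-checked against each other. From the capture above (values in MB, small differences are due to rounding):

PMEM:   vmk + other + free = 1581 + 9143 + 22039 = 32763 ≈ 32764 total
PSHARE: common + saving = 172 + 3643 = 3815 shared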

The following columns can be displayed:

Field | Key to toggle | Key to sort | Shown by default | Description
ID A Process ID. In memory view ID=GID, process IDs cannot be expanded.
GID B N Group ID, used to filter the output.
LWID C Leader World ID, World Group ID or VMX Cartel ID. It is used by the "ps", "vscsiStats", "kill" and other commands.
NAME D Name of the process/VM.
NWLD E Number of Worlds used by the GID.
AMIN F Minimum allocation, should be 0, unless a Memory reservation has been configured.
AMAX (!) F Maximum allocation, should be -1, unless a Memory limit has been configured.
ASHRS F Allocated shares. Should be -3, unless per-VM shares have been configured (-4 High, -3 Normal, -2 Low, other values for Custom).
AMLMT F Minimum limited allocation.
AUNITS F Allocated unit (kb for VM).
NHN G
NMIG G
NRMEM G
NLMEM G
N%L G
GST_ND0 G
OVD_ND0 G
GST_ND1 G
OVD_ND1 G
GST_ND2 G
OVD_ND2 G
GST_ND3 G
OVD_ND3 G
MEMSZ H M (default) Amount of configured guest memory. MEMSZ = GRANT + MCTLSZ + SWCUR + never used
GRANT H Amount of guest memory granted by the physical host. GRANT can be lower than MEMSZ if the guest has never used all of its configured memory or if memory has been reclaimed by the balloon driver and never used again. GRANT refers to used memory, not to merely mapped (malloc) memory. GRANT includes SHRD and does not include OVHD. MEMSZ = GRANT + MCTLSZ + SWCUR + never used
SZTGT H Amount of physical memory to be allocated (target) by the VMKernel. Includes the overhead memory for a VM.
TCHD H Amount of physical memory recently (5-7 mins) used by the VM (estimated by VMKernel).
TCHD_W H Amount of physical memory recently (5-7 mins) written by the VM (estimated by VMKernel).
%ACTV I
%ACTVS I
%ACTVF I
%ACTVN I
MCTL? J
MCTLSZ (!) J B Amount of guest memory reclaimed by the balloon driver. Should be 0, or the VM (and probably the entire host) will soon become unresponsive. High memory oversubscription can cause high values. MEMSZ = GRANT + MCTLSZ + SWCUR + never used
MCTLTGT J
MCTLMAX J
SWCUR (!) K Amount of guest memory swapped out to disk. Note that this is VMKernel swapping, not guest OS swapping. Should be 0, or the VM (and probably the entire host) will soon become unresponsive. High memory oversubscription can cause high values. MEMSZ = GRANT + MCTLSZ + SWCUR + never used
SWTGT (!) K Amount of swap to be used (target). Should not be higher than SWCUR, or the VMKernel will start to swap.
SWR/s (!) K MB read from swap per second (swap in). Should be 0, or the VM (and probably the entire host) will soon become unresponsive. High memory oversubscription can cause high values.
SWW/s (!) K MB written to swap per second (swap out). Should be 0, or the VM (and probably the entire host) will soon become unresponsive. High memory oversubscription can cause high values.
LLSWR/s L MB read per second from an SSD when the "host cache" feature is enabled.
LLSWW/s L MB written per second to an SSD when the "host cache" feature is enabled.
CPTRD M
CPTTGT M
ZERO N
SHRD N
SHRDSVD N
COWH N
OVHDUW O Amount of overhead memory reserved for the vmx. It depends on vCPUs and vMEM.
OVHD O Amount of overhead memory currently consumed by a VM. It depends on vCPUs and vMEM.
OVHDMAX O Amount of overhead memory reserved for the entire VM. OVHD should normally stay below OVHDMAX, but I experienced OVHD > OVHDMAX with properly working VMs.
MCMTTGT P
CMTTGT P
CMTCHRG P
CMTPPS P
CACHESZ Q
CACHEUSD Q
ZIP/s Q
UNZIP/s Q
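As a quick check of the MEMSZ = GRANT + MCTLSZ + SWCUR + never used relation, take the dc1 row in the capture above: ballooning and swapping are both zero, so the difference between MEMSZ and GRANT is simply memory the guest has never touched:

4096.00 (MEMSZ) ≈ 4051.98 (GRANT) + 0 (MCTLSZ) + 0 (SWCUR) + 44.02 (never used)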

Network (n)

Disk adapter (d)

Disk device (u)

Virtual Disk (v)

Interrupt (i)

Power (p)

References

  • https://communities.vmware.com/docs/DOC-9279
  • http://www.yellow-bricks.com/esxtop/
  • http://www.virtuallyghetto.com/2010/11/how-to-obtain-gid-and-lwid-from-esxtop.html
  • http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1017926
  • http://pubs.vmware.com/vsphere-50/index.jsp?topic=%2Fcom.vmware.vsphere.monitoring.doc_50%2FGUID-4B6BD1C0-AA99-47F1-93EF-4921D56AE175.html
  • http://pubs.vmware.com/vsphere-51/index.jsp?topic=%2Fcom.vmware.vsphere.resmgmt.doc%2FGUID-B42C72C1-F8D5-40DC-93D1-FB31849B1114.html
Posted on 11 Sep 2014 by Andrea.