VMware : Troubleshooting VM Performance

VMware Troubleshooting VM Performance in ESX/vSphere

First, we’re going to be using esxtop, which can be run from the local console or an SSH session on ESXi, alongside the performance data in vCenter.

General guidelines:

  1. Physical : vCPU ratio should be around 1:5
  2. Avoid memory oversubscription in production environments
  3. Reserve memory for virtualised SQL servers using Lock Pages in Memory
  4. Reserve memory for virtualised RDS/Citrix Servers
  5. Always create datastores using the vSphere client; this ensures that the VMFS partition is aligned, which reduces IO and potentially increases performance.
  6. Align guest data disks (for Windows, any version earlier than 2008 must be manually aligned); this also reduces IO and potentially increases performance.
  7. Look to keep the number of VMs per datastore/LUN to around 10-15; this will help to reduce SCSI reservation contention.
  8. vCPUs: less is more. From my own testing I’ve found that Citrix servers perform better with 2 vCPUs than with 4. This is not only better for my users but also for the ESXi hosts, as there is less co-scheduling.

Let’s begin...

1) Investigate CPU contention/exhaustion using esxtop (press ‘c’ from esxtop, and shift-v for per-VM stats only):

  1. Check host PCPU usage using esxtop
  2. Look at %RDY; if this is equal to or greater than 10% there is a performance issue, which can indicate CPU contention.
  3. Look at %MLMTD; if this is high it indicates that a CPU limit is being imposed on the VM. %RDY minus %MLMTD gives a true indication of CPU contention.
  4. If %RDY is truly above 10%, the first step is to lower the number of active vCPUs configured on the ESX/vSphere server; after that, look at reducing the number of VMs on the server.
  5. Investigate co-scheduling/SMP related issues: are VMs using all presented vCPUs? From esxtop press ‘c’ then ‘e’, then take a look at %CSTP. If these values are high this could indicate an issue, as %CSTP represents the overhead in co-scheduling CPUs from a co-stopped to a co-started state.

For example, if you have 16 cores, the total number of vCPUs defined across all active VMs should not exceed 80 (16 x 5).
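
The arithmetic behind these CPU checks can be sketched roughly as follows. The function names are my own, chosen for illustration, and the readings passed in are made-up numbers rather than real esxtop output. The sketch works out the vCPU ceiling from the 1:5 guideline and subtracts %MLMTD from %RDY before comparing against the 10% threshold.

```python
def max_vcpus(physical_cores, vcpus_per_core=5):
    """Upper bound on vCPUs across all active VMs for a 1:5 physical:vCPU ratio."""
    return physical_cores * vcpus_per_core


def true_ready(rdy_pct, mlmtd_pct):
    """%RDY with the portion caused by a CPU limit (%MLMTD) removed."""
    return rdy_pct - mlmtd_pct


def cpu_contended(rdy_pct, mlmtd_pct, threshold_pct=10.0):
    """True when ready time not explained by a CPU limit reaches the threshold."""
    return true_ready(rdy_pct, mlmtd_pct) >= threshold_pct


print(max_vcpus(16))             # 80
print(cpu_contended(14.0, 9.0))  # False: most of the ready time comes from a limit
print(cpu_contended(14.0, 1.0))  # True: 13% of ready time is real contention
```

The threshold and ratio are left as parameters because both are rules of thumb rather than hard limits.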

2) Investigate memory usage via esxtop and vCenter (press ‘m’ from esxtop, and ‘shift-v’ for per-VM stats only):

  1. Check host memory utilisation using esxtop
  2. For full-fat ESX only: the service console may be low on RAM; you can adjust this by following these instructions: http://www.vmware.com/pdf/esx_performance_tips_tricks.pdf
  3. Watch out for memory ballooning; this can have a significant impact on VM performance. You can track memory ballooning in vCenter and esxtop: MCTLTGT is the VMkernel’s desired memory balloon size, MCTLSZ is the actual size. If the target is greater than the actual size the balloon is inflated; if it is smaller the balloon is deflated. VM memory limits can also trigger ballooning.
  4. Transparent Page Sharing (TPS) allows the host to share identical memory pages between VMs on the host; where large pages are in use it mainly comes into play when memory resources are low/overcommitted.
  5. Check esxtop for SWCUR (currently used swap) and SWTGT (the target swap size). If SWTGT is greater than SWCUR, more memory will be swapped out. Swapping is slow so should be avoided at all costs. If swapping is unavoidable use SSDs: there’s a 12% degradation with local SSD versus 69% for Fibre Channel and 83% for local SATA storage (more information here).
  6. SWPWT represents the amount of time a Virtual Machine is waiting for memory to be swapped in and should always be below 5%.
  7. SWR/SWW represent swap reads (from disk into memory) and swap writes (from memory out to disk).
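
As a small illustration of how the balloon and swap-wait counters above can be read, here is a minimal sketch. The function names are hypothetical and the values are invented examples; it assumes you have already copied MCTLTGT, MCTLSZ and SWPWT out of esxtop by hand.

```python
def balloon_direction(mctltgt_mb, mctlsz_mb):
    """Compare the balloon target (MCTLTGT) with its actual size (MCTLSZ)."""
    if mctltgt_mb > mctlsz_mb:
        return "inflating"
    if mctltgt_mb < mctlsz_mb:
        return "deflating"
    return "steady"


def swap_wait_ok(swpwt_pct, limit_pct=5.0):
    """SWPWT should stay below the 5% guideline."""
    return swpwt_pct < limit_pct


print(balloon_direction(512, 256))  # inflating
print(balloon_direction(0, 128))    # deflating
print(swap_wait_ok(7.5))            # False
```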

3) Using esxtop investigate storage (press ‘u’ for per-device/LUN or ‘d’ for per-HBA stats):

  1. Investigate DAVG: this represents the round-trip time between the HBA and storage, and should ideally be less than 30ms.
  2. Investigate KAVG: this represents the latency added by the VMkernel.
  3. Investigate GAVG: this represents the round-trip time for IO requests as seen by the guest (DAVG plus KAVG); again lower is better, ideally less than 30ms.
  4. Check CONS/s: this indicates SCSI reservation conflicts generated by metadata updates on the same LUN at a given time.
  5. vscsiStats (more info here) will report statistics per VMDK/RDM.
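
The latency checks above can be sketched as below. This assumes GAVG is simply DAVG plus KAVG, which is how esxtop relates the three counters; the function names and sample values are my own and purely illustrative.

```python
def guest_latency_ms(davg_ms, kavg_ms):
    """GAVG, taken here as device latency (DAVG) plus VMkernel latency (KAVG)."""
    return davg_ms + kavg_ms


def storage_findings(davg_ms, kavg_ms, limit_ms=30.0):
    """List which latency counters reach the 30ms guideline."""
    findings = []
    if davg_ms >= limit_ms:
        findings.append("DAVG at or above limit: look at the HBA, fabric or array")
    if guest_latency_ms(davg_ms, kavg_ms) >= limit_ms:
        findings.append("GAVG at or above limit: the guest is seeing slow IO")
    return findings


print(guest_latency_ms(12.0, 0.5))   # 12.5
print(storage_findings(12.0, 0.5))   # []
print(storage_findings(35.0, 1.0))   # both findings reported
```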

4) Finally, consider the network subsystem:

  1. Check bandwidth availability
  2. Using esxtop check %DRPTX and %DRPRX; if the latter is high consider increasing the Rx buffer from Device Manager on the VM (yes, that’s Windows only… Linux has its own configuration for this)

If all else fails, check the advisories for your hardware platform. I’ve run into issues in the past that have been device firmware specific, so don’t rule out the simplest of things.

UPDATE 22/02/2010 : Check out the new esxtop article here for further performance troubleshooting tips.

VMware VCB : Improving Performance of VCB

VCB Backup Essentials

Having recently introduced VCB backups into Dataprotector 6.0, I thought I would share a few useful tips for ensuring that backup speeds are as fast as possible.

1) Ensure that all VMs have a scheduled task to zero out free space prior to VCB running. When you delete a file, Windows does not zero out the disk space (populate the data blocks with zeros). So if you had a 20GB drive that contained 15GB of data and you then deleted 10GB of it, unless you zero out this space the backup will still be 15GB.

I use the free ‘SDELETE’ tool from Sysinternals (now Microsoft) to do this, and simply execute it as a scheduled task before the backup is due to run. SDELETE can be found here: http://technet.microsoft.com/en-us/sysinternals/bb897443.aspx
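
To make the idea of zeroing free space concrete, here is a hypothetical sketch of the principle. It is not how SDELETE is implemented and is not a replacement for it: it writes a fixed number of zero bytes to a temporary file, forces them to disk, and deletes the file again. The function name, the path and the size are all assumptions for illustration. A real tool fills the whole of the free space, and whether the zeros land on the blocks you previously freed depends on the filesystem.

```python
import os


def zero_fill(path, total_bytes, chunk_size=1024 * 1024):
    """Write total_bytes of zeros to path, force them to disk, then delete the file."""
    chunk = bytes(chunk_size)  # a buffer made up entirely of zero bytes
    written = 0
    with open(path, "wb") as handle:
        while written < total_bytes:
            size = min(chunk_size, total_bytes - written)
            handle.write(chunk[:size])
            written += size
        handle.flush()
        os.fsync(handle.fileno())  # make sure the zeros reach the disk
    os.remove(path)
    return written
```

Calling `zero_fill("zero.tmp", 64 * 1024 * 1024)` would overwrite 64MB worth of free blocks with zeros and return the number of bytes written.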

2) When running VCB, check the disk queue performance counter on the VCB Proxy server; the storage to which the VCB snapshot is taken can be a serious bottleneck for VCB performance. Initially I was running VCB over fibre, to a fibre-attached SAN disk. I found that after 1.97GB the backup would grind to a halt at 200Kb/sec! By changing the VCB snapshot drive to local RAID0 storage this increased to over 2.2GB/min, or 37.5MB/sec. Your hardware may be capable of significantly faster speeds.
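
Backup tools tend to report GB/min while storage is usually quoted in MB/sec, so a quick conversion helps when comparing the two. A minimal sketch, assuming 1GB = 1024MB (the helper names are my own):

```python
def gb_per_min_to_mb_per_sec(gb_per_min):
    """Convert a backup rate in GB/min to MB/sec (1GB = 1024MB)."""
    return gb_per_min * 1024 / 60


def mb_per_sec_to_gb_per_min(mb_per_sec):
    """Convert a storage rate in MB/sec to GB/min (1GB = 1024MB)."""
    return mb_per_sec * 60 / 1024


print(round(gb_per_min_to_mb_per_sec(2.2), 1))  # 37.5
print(round(mb_per_sec_to_gb_per_min(100), 1))  # 5.9
```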

3) Disable additional disk paths on the VCB Proxy Server: VCB does not like MPIO/multiple paths to LUNs. This step is probably the biggest potential speed gain you’ll get. Disable the additional disk objects in Windows Device Manager and test your backups once complete; if they don’t work, re-enable the path you disabled and disable a different one. This can give speed improvements of 100MB/sec.

4) Run multiple VCB snapshots at the same time. Your SAN containing the VMs will, more than likely, support more than 35MB/sec. Just ensure you change the snapshot directory for each job, otherwise your backup application may back up multiple snapshots at once!

Windows NTFS Compression : Decompress Entire Volume

Windows NTFS Filesystem Compression: Uncompressing An Entire Volume

I recently came across a performance issue on an old x86 WinTel server. After regular diagnosis showed no obvious cause, the issue appeared to be that the root drive had been compressed in order to increase available disk space.

The one problem with NTFS compression is this: 

‘When you open a compressed file, Windows automatically decompresses it for you, and when you close the file, Windows compresses it again. This process may decrease your computer performance’ (http://support.microsoft.com/kb/307987)

Using the COMPACT command line tool it is possible to identify all compressed files within a folder and its subfolders. This can be achieved using the command:

compact /I /S

To uncompress all compressed files within the current folder and all subdirectories (assuming you have enough free disk space to do so), you can use the following command:

compact /U /I /S
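
Since uncompressing needs spare disk space, it can be worth checking the free space first. The sketch below uses Python’s standard `shutil.disk_usage`; the function name, the safety margin and the growth estimate are my own assumptions, and you would have to supply the estimate of how much the data will grow yourself.

```python
import shutil


def enough_room(path, extra_bytes_needed, safety_margin=0.10):
    """True if the volume holding path can absorb extra_bytes_needed
    while still keeping safety_margin (a fraction of total size) free."""
    usage = shutil.disk_usage(path)
    reserve = int(usage.total * safety_margin)
    return usage.free - reserve >= extra_bytes_needed


# Example: could the current volume take another 5GB of uncompressed data?
print(enough_room(".", 5 * 1024 ** 3))
```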

After disabling file system compression on the root drive the server is now performing as expected.