ESX 4.X : Increased RAM use on Intel EPT Enabled Processors

 

Why does memory utilisation appear higher on ESX 4.X when using Intel EPT Enabled Processors?

 

There is a simple reason why memory usage on your ESX 4.X boxes looks higher than on 3.5: ESX 3.5 doesn’t support Intel EPT for hardware MMU virtualisation (HW MMU), which is why you’ve not seen this behaviour on ESX 3.5. Had you been running AMD RVI capable processors you’d have encountered it when upgrading from 3.0 to 3.5. ESX 4 was the first ESX platform to support Intel EPT: http://www.vmware.com/files/pdf/software_hardware_tech_x86_virt.pdf

 

Interpretation of esxtop counters is key here, counters we’re interested in:

·         “GRANT” (the amount of physical memory granted to a VM/pool)

·         “SHRD” (the shared portion of  “GRANT”.)

·         “SHRDSVD” (estimated saving due to TPS)

·         “COWH” (indication of memory which can be reclaimed by TPS)

 

When using Intel EPT (http://www.vmware.com/pdf/Perf_ESX_Intel-EPT-eval.pdf) or AMD RVI, and where a VM supports HWMMU, the ESX kernel will use 2MB pages instead of 4KB pages. On these VMs you will see low values for SHRD/SHRDSVD.
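
To put the two page sizes in perspective, the arithmetic can be sketched in shell. The variable names are purely illustrative and nothing here talks to ESX.

```shell
# Compare the two page sizes mentioned above (values in KB).
small_kb=4
large_kb=$((2 * 1024))

# Number of 4KB small pages that cover the same span as one 2MB large page.
pages_per_large_page=$((large_kb / small_kb))
echo "$pages_per_large_page"   # 512
```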

 

HWMMU is supported on all versions of Windows from 2003 onwards; it is also the default setting for VMs running these operating systems on hardware that supports Intel EPT: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1020524

 

 

How can we interpret memory usage on ESX 4.X when using HWMMU?

 

Using esxtop you will find that Linux based VMs have high values for SHRD/SHRDSVD. This is because they do NOT use Intel EPT and therefore do not use HWMMU. As the MMU is virtualised in software, small pages are used, and these play nicely with TPS. Windows 2003+ VMs will use HWMMU and will therefore use large pages.

 

You can check the HWMMU status of a VM by opening its vmware.log and looking for “HV Settings”, then checking the value of “virtual mmu”. If it is ‘software’ the VM is not using HWMMU (Intel EPT); if it is ‘hardware’ it is.
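
The log check can be scripted. The sketch below is a minimal helper, assuming the “virtual mmu” text appears in lower case followed by the quoted value; the function name and the sample log line in the usage note are illustrative assumptions, not taken from a real host.

```shell
# hwmmu_status: print the "virtual mmu" value (hardware or software)
# found in a VM log file.
#   $1 = path to the VM's vmware.log
hwmmu_status() {
    # Keep the last line that mentions "virtual mmu", then extract the
    # first lower-case word that follows it.
    grep 'virtual mmu' "$1" | tail -n 1 |
        sed -n 's/.*virtual mmu[^a-z]*\([a-z]*\).*/\1/p'
}
```

For a log containing a line such as HV Settings: virtual exec = 'hardware'; virtual mmu = 'software', calling hwmmu_status on that file prints software.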

 

Large page support can be disabled, which forces TPS to work regardless of contention, but there is a CPU performance impact (not massive, to be fair, but your mileage may vary). In any case, TPS will kick in and claim back memory at 94% used, often in significant amounts: http://vmnomad.blogspot.com/2011/08/shared-memory-effectiveness-of.html

  

I hope this makes sense and sheds some light on why the ESX 4 boxes appear to be using more RAM.

vSphere : Converter Windows 2000 and vApp Issues

vSphere : P2V Windows 2000 and vApp Issues

To perform a Windows 2000 P2V you cannot use the most recent version of VMware Converter. The last version to support Windows 2000 is 4.0.1 build-161434.

A big caveat with this version is that it does not support vApps as defined on vCenter. If you try to import to vCenter the converter will crash, and the following event will be logged in the event log:

The VMware vCenter Converter Server service terminated unexpectedly.  It has done this 1 time(s).

On loading the converter application again you will receive an error:

VMware vCenter Converter Server is installed but not running. When Converter Server is not running you will not be able to connect to local server.
Do you want to start it now? 

The workaround for this issue is to connect directly to an ESX host instead of vCenter.

The physical machine must be able to connect to the ESX host on ports 443 and 903 in order for the conversion to work.

HP BL465c G7 : ESXi 4.1 Issues

Having just built a new ESXi 4.1 cluster made up of BL465c G7s, we’ve run into some interesting issues which I thought I would share.

Six BL465c G7 servers have been installed in 2 new C7000 G2 enclosures, each with 2x Flex-10 Virtual Connect modules and 2x 20-port 8Gb Fibre Channel Virtual Connect modules. Servers are split 3 per chassis.

In brief the issues are as follows:

  1. PSOD on reboot and random crashes with PSOD
  2. Corrected Memory Error threshold exceeded
  3. Hang on reboot
  4. iLO Duplicate SSL Certificate Serial Number


VMWare : Increasing the HBA / Device Queue Depth

ESX 3.5 ships with a default HBA / LUN queue depth of 32. For QLogic HBAs a setting of 64 may improve storage performance. You can identify a storage IO bottleneck using esxtop from the ESX command line. When running esxtop, view LUN queue statistics by pressing ‘u’ and monitor the QUED, ACTV and LOAD stats. If LOAD is constantly above 1.0, and QUED is therefore greater than 0, increasing the queue depth above 32 may improve performance. As always, apply the ‘if it’s not broke then don’t try to fix it’ philosophy!

There are two settings that must be changed to increase the queue depth.

The steps below apply to QLogic HBAs only.
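
The rule of thumb above can be written as a small check. This is only a sketch: the function name is made up, the two values are typed in by hand from the esxtop LUN view, and a single reading is not enough since the point is that LOAD stays above 1.0 over time.

```shell
# qdepth_hint: apply the LOAD/QUED rule of thumb to one esxtop reading.
#   $1 = LOAD value, $2 = QUED value
qdepth_hint() {
    awk -v load="$1" -v qued="$2" 'BEGIN {
        if (load > 1.0 && qued > 0)
            print "consider raising queue depth"
        else
            print "leave queue depth alone"
    }'
}

qdepth_hint 1.4 12   # consider raising queue depth
qdepth_hint 0.3 0    # leave queue depth alone
```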

Adapter Queue Depth

Find the current module name:
   esxcfg-module -l | grep -i ql

Check the current module queue depth setting:
   cat /proc/scsi/qla2300/? | grep -i "queue depth"

This will return the value: Device queue depth = 0x20 (0x20 is hexadecimal; this is 32 in decimal)

Change the queue depth using the following command, note there is no output from the command:
   esxcfg-module -s ql2xmaxqdepth=64 qla2300_707

Verify the change has been written to the esx.conf file:
   cat /etc/vmware/esx.conf

Reboot the ESX server, then check the module configuration:
   cat /proc/scsi/qla2300/? | grep -i "queue depth"

This should return the value: Device queue depth = 0x40
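
Since the value is reported in hex, a small filter can convert it to decimal. This sketch assumes the line has exactly the “Device queue depth = 0x..” shape quoted above; the function name is illustrative.

```shell
# qdepth_decimal: read "Device queue depth = 0x.." lines on stdin and
# print each value in decimal.
qdepth_decimal() {
    sed -n 's/.*[Qq]ueue depth = \(0x[0-9A-Fa-f]*\).*/\1/p' |
    while read -r hex; do
        # printf accepts C-style hex constants for %d
        printf '%d\n' "$hex"
    done
}

echo 'Device queue depth = 0x20' | qdepth_decimal   # 32
echo 'Device queue depth = 0x40' | qdepth_decimal   # 64
```

On a host, the output of the cat | grep command shown earlier can be piped straight into the function.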

Now, if you stop here you’ll find that the DQLEN will change from 32 to 64 and back again when viewing the LUN statistics in esxtop. It will keep changing randomly unless you perform the step below.

Disk.SchedNumReqOutstanding

Now we must increase the Disk.SchedNumReqOutstanding to 64, otherwise the setting above will have no effect:
   esxcfg-advcfg -s 64 /Disk/SchedNumReqOutstanding

This setting does NOT require a restart of the ESX environment.

Additional Considerations

Consider disabling device resets (which can cause backup interruptions and flooding of NSRs) and enabling LUN resets instead, so that a reset is limited to an individual LUN and does not affect the entire SAN. ESX 3.X by default enables device resets and disables LUN resets. The following commands can be executed from the ESX console:

   esxcfg-advcfg -s 1 /Disk/UseLunReset
   esxcfg-advcfg -s 0 /Disk/UseDeviceReset

VMWare : Capturing Performance Statistics

The following process will allow you to capture Windows Performance counter compatible CSV files from any ESX server using the ‘esxtop’ utility, which is an integral part of VMware ESX.

First we must create a couple of script files, the first being ‘ftp.sh’. I have created the scripts on a datastore which houses NO virtual servers. Be careful where you place this data, as filling up a datastore that holds VMs will stop those VMs working. You will need to modify the paths in the script (/vmfs/volumes/LOCAL_ATTACHED/esxtop in this example) to ensure it works in your environment. They are simply the path where the script files are located and the path where the CSV files will be generated.

This script takes the previous day’s CSV file and ‘trims’ it down to the stats we require; by default esxtop will generate an insanely large CSV file. Once ‘trimmed’, it will upload the CSV file to an FTP server of your choice and finally gzip/archive the file for future reference.

#!/bin/bash
# ftp.sh - every 24 hours FTP yesterday's stats
#
echo $(date +%R)

# Working directory and file names (change the path to suit your environment).
# The file name prefix must match the one used by the capture script.
dir=/vmfs/volumes/LOCAL_ATTACHED/esxtop
dm1=$(date --date='1 day ago' +%Y-%m-%d)
csv=$dir/${HOSTNAME}_$dm1.csv
trim=$dir/trim_${HOSTNAME}_$dm1.csv

# Perform streamlining of CSV file: look up the column numbers of the counters
# we want in the header row, then keep column 1 plus those columns.
cols=$(head -1 "$csv" | tr ',' '\n' | egrep -n '\\Memory\\Free MBytes|Physical Disk\(vmhba1\)\\Reads/sec|Physical Disk\(vmhba1\)\\Writes/sec|Physical Disk\(vmhba1\)\\MBytes Written|Physical Disk\(vmhba1\)\\MBytes Read|\\Physical Disk\(vmhba2\)\\Reads/sec|Physical Disk\(vmhba2\)\\Writes/sec|Physical Disk\(vmhba2\)\\MBytes Written|Physical Disk\(vmhba2\)\\MBytes Read|Physical Disk\(vmhba2\)\\Commands/sec|Physical Disk\(vmhba1\)\\Commands/sec|Physical Cpu\(_Total\)' | cut -d ':' -f 1 | tr '\n' ',')
cut -d ',' -f "1,${cols%,}" "$csv" > "$trim"

# Delete the second line of the trimmed file, then replace the original with it
sed -i.bak '2d' "$trim"
rm -f "$trim.bak"
rm -f "$csv"
mv "$trim" "$csv"

HOST='ftp.domain.local'
USER='username'
PASS='password'

# Connect to FTP Server
ftp -inv $HOST << EOF
user $USER $PASS
lcd $dir
put ${HOSTNAME}_$dm1.csv
bye
EOF

# GZIP and archive stats
#
gzip "$csv"
mv "$csv.gz" "$dir/archive/"
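
The trimming step is the least obvious part of the script, so here is the same column-selection idea in isolation. It is a sketch only: the function name, the sample file and its header names are made up and are much simpler than real esxtop headers.

```shell
# select_columns: keep column 1 plus every column whose header matches
# an extended regular expression.
#   $1 = CSV file, $2 = regex matched against the header names
select_columns() {
    # One header name per line, numbered by grep -n, reduced to "2,4,"
    cols=$(head -1 "$1" | tr ',' '\n' | grep -E -n "$2" |
           cut -d ':' -f 1 | tr '\n' ',')
    fields=1
    # Append the matches (minus the trailing comma) when there are any
    [ -n "$cols" ] && fields="1,${cols%,}"
    cut -d ',' -f "$fields" "$1"
}

sample=$(mktemp)
printf '%s\n' 'Time,Free MBytes,Reads/sec,Writes/sec' '10:00,512,20,30' > "$sample"
select_columns "$sample" 'Free MBytes|Writes/sec'
# Time,Free MBytes,Writes/sec
# 10:00,512,30
rm -f "$sample"
```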


Secondly, create the ‘capturestats.sh’ script, which will launch esxtop and capture the statistics you require. Again, modify the path to suit your environment. This script will capture stats every 60 seconds, 1439 times. There are 1440 minutes in a day and we want the script to start again at midnight, so this script will run from 00:00 to 23:59.

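#!/bin/bash
# capturestats.sh
#
today=$(date +%Y-%m-%d)
#
# There are 1440 minutes in a day, we want to capture 00:00 > 23:59 so we'll specify 1439 captures at 60 second intervals.
# -b runs esxtop in batch mode, which is what produces CSV output.
# Replace EUVMTST1 with your own host name; ftp.sh expects the prefix to match $HOSTNAME.
#
esxtop -b -d 60 -n 1439 -c /root/.esxhoststats >> /vmfs/volumes/LOCAL_ATTACHED/esxtop/EUVMTST1_$today.csv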
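
As a quick sanity check on the interval and sample count, the length of the capture window can be worked out in shell. The function name is illustrative.

```shell
# capture_window: total capture time for a sampling interval and count.
#   $1 = interval in seconds, $2 = number of samples
capture_window() {
    total=$(( $1 * $2 ))
    echo "$(( total / 3600 ))h $(( total % 3600 / 60 ))m"
}

capture_window 60 1439   # 23h 59m
```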

Next, create the esxtop config file at /root/.esxhoststats. This will ensure that we capture only what we need: CPU stats, memory usage and disk I/O stats. You can modify your own config file to meet your own requirements.

abcdefgh
abcdefghijklmno
AbcdefghIjklm
abcdefghijk
abcdefghijk
ABcDEFGHIJKLm
5u

Finally, under the root account context (accessed via sudo su -) execute the ‘crontab -e’ command. Add the following lines to the file:

#!/bin/bash
00 00 * * * /vmfs/volumes/LOCAL_ATTACHED/esxtop/capturestats.sh >/dev/null
00 01 * * * /vmfs/volumes/LOCAL_ATTACHED/esxtop/ftp.sh >/dev/null

This will cause the capturestats.sh script to run at midnight every day and the ftp.sh script to run at 01:00 every day.

VMWare : Troubleshooting VM Performance

VMWare Troubleshooting VM Performance in ESX/vSphere

We’re going to be using esxtop, which can be run from the local console or an SSH session on the host, as well as vCenter.

General guidelines;

  1. Physical : vCPU ratio should be around 1:5
  2. Avoid memory oversubscription in production environments
  3. Reserve memory for virtualised SQL servers using Lock Pages in Memory
  4. Reserve memory for virtualised RDS/Citrix Servers
  5. Always create datastores using the vSphere client; this will ensure that the VMFS partition is aligned, which will reduce IO and potentially increase performance.
  6. Align guest data disks (for Windows, any version earlier than 2008 must be manually aligned) – this will reduce IO and potentially increase performance.
  7. Look to keep the number of VMs per datastore/LUN to around 10-15; this will help to reduce SCSI reservation contention.
  8. vCPUs: less is more. From my own testing I’ve found that Citrix servers perform better with 2 vCPUs than with 4 vCPUs. This is not only better for my users but also for the ESXi hosts, as there is less co-scheduling.

Let’s begin...

1) Investigate CPU contention/exhaustion using esxtop (press ‘c’ from esxtop, and shift-v for per-VM stats only):

  1. Check host PCPU usage using esxtop
  2. Look at %RDY; if this is equal to or greater than 10% there is a performance issue, which can indicate CPU contention.
  3. Look at %MLMTD; if this is high it indicates that a CPU limit is being imposed on the VM. %RDY minus %MLMTD gives a true indication of CPU contention.
  4. If %RDY is truly above 10%, the first step is to lower the number of active vCPUs configured on the ESX/vSphere server; after that you’re looking at reducing the number of VMs on the server.
  5. Investigate co-scheduling/SMP related issues: are VMs using all presented vCPUs? From esxtop press ‘c’ then ‘e’, then take a look at %CSTP. If these values are high this could indicate an issue, as %CSTP represents the overhead in co-scheduling CPUs from a co-stopped to a co-started state.
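
The %RDY and %MLMTD steps boil down to one subtraction and a threshold. A minimal sketch, with a made-up function name and values typed in by hand from esxtop:

```shell
# cpu_contention: subtract %MLMTD from %RDY and compare with the 10% guideline.
#   $1 = %RDY, $2 = %MLMTD
cpu_contention() {
    awk -v rdy="$1" -v mlmtd="$2" 'BEGIN {
        net = rdy - mlmtd
        printf "%.1f %s\n", net, (net >= 10 ? "contended" : "ok")
    }'
}

cpu_contention 14.0 9.5   # 4.5 ok (most of the ready time is due to a limit)
cpu_contention 12.0 0.0   # 12.0 contended
```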

For example, if you have 16 cores, the maximum vCPUs that should be defined across all active VMs should not exceed 80.
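
The same sum for any host, as a one-line sketch using the 1:5 guideline from the list at the top of this post (the function name is illustrative):

```shell
# max_vcpus: vCPU ceiling for a host at a given physical:vCPU ratio.
#   $1 = physical cores, $2 = vCPUs per core (defaults to 5)
max_vcpus() {
    echo $(( $1 * ${2:-5} ))
}

max_vcpus 16   # 80
```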

2) Investigate memory usage via esxtop and vCenter (press ‘m’ from esxtop, and ‘shift-v’ for per-VM stats only):

  1. Check host memory utilisation using esxtop
  2. For full-fat ESX only – the service console may be low on RAM, you can adjust this by following these instructions: http://www.vmware.com/pdf/esx_performance_tips_tricks.pdf
  3. Watch out for memory ballooning; this can have a significant impact on VM performance. You can track memory ballooning in vCenter and esxtop: MCTLTGT is the VMkernel’s desired memory balloon size, MCTLSZ is the actual size. If the target is greater than the size the balloon is inflated; if it is smaller the balloon is deflated. VM memory limits can also trigger ballooning.
  4. Transparent Page Sharing (TPS) allows VMs on the host to share identical memory pages – only used when memory resources are low/overcommitted.
  5. Check esxtop for SWCUR (swap currently in use) and SWTGT (the swap target). If SWTGT is greater than SWCUR the host will swap more memory out. Swapping is slow so should be avoided at all costs. If swapping is unavoidable use SSDs: there’s a -12% degradation with local SSD versus -69% for Fibre Channel and -83% for local SATA storage. (more information here)
  6. SWPWT represents the amount of time a Virtual Machine is waiting for memory to be swapped in and should always be below 5%
  7. SWR/SWW represent Swap Reads/Writes from disk to memory and vice versa.
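
The balloon target/size comparison can be expressed as a tiny helper. It is a sketch with a hypothetical function name; the two figures are read off the esxtop memory view by hand.

```shell
# balloon_state: compare the balloon target with its current size.
#   $1 = MCTLTGT, $2 = MCTLSZ (same units, as shown by esxtop)
balloon_state() {
    awk -v tgt="$1" -v sz="$2" 'BEGIN {
        if (tgt > sz)      print "inflating"
        else if (tgt < sz) print "deflating"
        else               print "steady"
    }'
}

balloon_state 512 128   # inflating
balloon_state 0 256     # deflating
```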

3) Using esxtop investigate storage (press ‘u’ for per-device (LUN) or ‘d’ for per-HBA stats):

  1. Investigate DAVG – represents the round-trip time between HBA and storage, should be less than 30ms ideally
  2. Investigate KAVG – represents actual latency due to the VMkernel
  3. Investigate GAVG – represents the round-trip time for IO requests sent from the host to storage; again lower is better, ideally less than 30ms.
  4. Check the CONS/s – this indicates SCSI reservation conflicts generated by metadata updates on the same LUN at a given time.
  5. vscsiStats (more info here)  will report per-VMDK/RDM
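
The 30ms guideline for DAVG and GAVG is easy to turn into a flag. Again this is only a sketch, with an illustrative function name and hand-entered values:

```shell
# latency_flag: flag a latency reading against the 30ms guideline.
#   $1 = counter name (DAVG or GAVG), $2 = latency in ms
latency_flag() {
    awk -v name="$1" -v ms="$2" 'BEGIN {
        print name, (ms < 30 ? "ok" : "high")
    }'
}

latency_flag DAVG 12.5   # DAVG ok
latency_flag GAVG 45     # GAVG high
```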

4) Finally, consider the network subsystem:

  1. Check bandwidth availability
  2. Using esxtop check %DRPTX and %DRPRX; if the latter is high consider increasing the Rx buffer from Device Manager on the VM (yes, Windows only; the Linux equivalent isn’t covered here)

If all else fails check advisories on your hardware platform. I’ve run into issues in the past that have been device firmware specific, so don’t rule out the simplest of things.

UPDATE 22/02/2010 : Check out the new esxtop article here for further performance troubleshooting tips.