Logical Block Addressing is a common addressing scheme for disks on PCs. However, under this scheme the master boot record causes partitions to start at a block that isn’t a power of 2. This isn’t a huge deal for individual disks, but for shared storage where a LUN is actually striped across many different disks a single read or write by the guest OS causes twice as much I/O on the storage array. The misaligned partition has blocks that straddle the stripes on the array, and instead of reading a single stripe the array has to read from, or write to, two stripes. This isn’t a big problem on one or two VMs, but when hundreds of VMs have misaligned I/O the effect is crippling.
Source: Multiple, including EMC & NetApp product documentation, some blog posts. NetApp also supplies tool, mbrscan & mbralign, to help you identify and fix these situations.
This should be obvious, but unfortunately a lot of sites don’t install the VMware Tools. These tools add drivers for the paravirtualized network and SCSI adapters, add graphics drivers, and also enable the graceful shutdown & reboot functionality from vCenter. Seriously, just install them.
VMware includes specialized hardware for network and SCSI adapters that can drive down CPU utilization and improve performance of individual VMs (and whole environments if they are widely used). It’s easy to use the VMXNET3 adapter, for instance, but there are a few caveats to using the VMware Paravirtual SCSI adapter, like not being able to use it for boot volumes. Regardless, if you can use it you should, because these settings improve performance.
Note that this isn’t the same as using VMI Paravirtualization. There are a lot of caveats to using that, and on any modern hardware with a Memory Management Unit that is virtualization-friendly you should leave VMI Paravirtualization off.
Again, seems obvious, but Linux distributions tend to install a lot of stuff that you’ll never use. By not installing it you save lots of disk space, at least. Furthermore, a lot of Linux distributions assume that if X Windows (X.org, KDE, GNOME, etc.) is installed you are using the machine as a desktop, and may install screensavers and other things that will sap your performance. As a general rule, if you don’t have a physical screen you don’t need a screensaver, so disable it or set it to blank & lock.
If you are comfortable running with a command line interface you can save a bunch of RAM and CPU cycles by booting into runlevel 3. Change the line in /etc/inittab
that reads:
id:5:initdefault
to
id:3:initdefault
and on boot you will not get X Windows. You can always start it later by running ’startx’ from the command line (but you’ll probably find that you can do everything you’d like from the command line).
Linux distributions often have some common system maintenance tasks scheduled automatically, like log rotations, locate database updates, etc. These can be quite I/O intensive. Likewise, things like system monitoring tasks often are scheduled to run at the same time on all hosts (at 0, 15, 30, and 45 minutes after the hour, for example). If you can introduce a delay in these tasks that would help spread the load out.
One super simple trick I use in /etc/cron.daily/logrotate
and /etc/cron.daily/mlocate.cron
is to sleep for a random amount of time. The bash shell has $RANDOM
, which generates a random number between 0 and 32767. For example, if you can wait up to 5 minutes to do these things try adding:
/bin/sleep $((RANDOM/109))
to the scripts (109 = 32767/300 seconds). I do this with log rotations, mlocate database updates, monitoring system scripts, backup jobs, and anything that runs at a common time, changing the divisor to meet whatever timeframe I need.
First, use NTP and not the VMware Tools time synchronization to keep the system clock up to date. Second, use the recommended kernel parameters for the kernel and distribution you’re running. Third, run the newest kernel you can. For example, every update that comes out for Red Hat Enterprise Linux 5 has new virtualization optimizations in it. Getting timekeeping right is important, not just for accurate system time for logs and scheduling but because cryptographic operations rely on system time, too.
Check VMware KB article 1006427 for information on the kernel parameters and some suggestions for setting up NTP.
The Linux kernel has different ways to schedule disk I/O, using schedulers like deadline, cfq, and noop. The ‘noop
’ — No Op — scheduler does nothing to optimize disk I/O. So why is this a good thing? Because ESX is also doing I/O optimization and queuing! It’s better for a guest OS to just hand over all the I/O requests to the hypervisor to sort out than to try optimizing them itself and potentially defeating the more global optimizations.
You can change the kernel’s disk scheduler at boot time by appending:
elevator=noop
to the kernel parameters in /etc/grub.conf
. If you need to do this to multiple VMs you might investigate the ‘grubby’ utility, which can programmatically alter /etc/grub.conf.
This is along the same lines as “don’t run anything you don’t need” but it deserves mention separately, because a lot of people are doing it. The Performance Best Practices document from VMware puts it well:
Timing numbers measured from within virtual machines can be inaccurate, especially when the processor is overcommitted… Measuring performance from with virtual machines can fail to take into account resources used by ESX for tasks it has offloaded from the guest operating system, as well as resources consumed by virtualization overhead.
If you can avoid polling the guest OS for performance data you gain performance by not having to do that work, plus you get more accurate data.
A lot of software vendors specify that their products need 16 GB of RAM, or 4 CPUs, or some other generic amount of resources that may be a complete waste in your environment. Because you can see the actual RAM and CPU utilization easily within vCenter you can opt to undersize your VMs, and only allocate more resources when it’s shown that they need them. This improves performance, as it’s easier for ESX to schedule VMs with fewer CPUs, saves swap file disk space, RAM, and time during VMotions. Memory overcommit can help cope with overallocation, but it is better if you just don’t overallocate to start with.
You can enable the hot-add memory and CPU features in vCenter for your VMs, if you are running recent operating system releases. Need more memory? Just add it. Need another CPU? Just add it. VMware KB 1015501 has information on configuring the CPU hot-add features in Linux so the new CPU is automatically activated.
File systems keep track of when files are created, modified, and accessed. The operations to update the last accessed times become extra writes, which are expensive in terms of I/O. As such, if you don’t need to do them, don’t. This probably won’t gain you a lot of performance on an individual VM (IBM says 0 to 10% depending on the workload), but in aggregate across hundreds of VMs you will likely see improvements.
To disable access time updates add ‘noatime
’ to the mount options in /etc/fstab
. For example, this line:
/dev/Volume00/LogVol00 / ext3 defaults 1 1
becomes
/dev/Volume00/LogVol00 / ext3 defaults,noatime 1 1
It requires a remount of the file system to take effect, such as a reboot.