Sunday, October 5, 2014

Understanding the vCPU: How too many vCPUs can negatively affect performance

I believe there is still a lot of confusion about the differences between the physical and the virtual world.
I hope that after reading the explanation and best practices below, you will understand and recognize my daily struggle, as the private-cloud gatekeeper, to balance the needs of specific applications against overall site resources and performance:

What is ESX?
VMware ESX and VMware ESXi are bare-metal hypervisors: guest virtual servers run directly on the host server hardware, without requiring an additional underlying operating system.

A hypervisor, also called a virtual machine monitor (VMM), is a hardware virtualization layer that allows multiple operating systems, termed guests (VMs), to run concurrently on a host computer. The hypervisor presents a virtual hardware platform to the guest operating systems, so multiple instances of a variety of operating systems can share the virtualized hardware resources.
What is vCPU?
vCPU stands for virtual CPU, the virtual counterpart of a physical CPU (pCPU) in the physical world.

A vCPU is a virtual processor. You can assign multiple vCPUs to a virtual machine, but you should never exceed the number of physical sockets you have; for example, on a 2-CPU server you should assign at most 2 vCPUs to a VM.
The number of virtual CPUs you run per core depends on the workload of the VMs and the amount of resources you expect to use on your ESX host. It all comes down to doing your math beforehand and working out what you can safely configure on each ESX host. [4-8 VMs per core is the norm; better to stick closer to 4 if you are looking for performance.]
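As a rough illustration of that sizing math, here is a sketch of the capacity estimate (the 4:1 and 8:1 ratios come from the rule of thumb above; they are illustrative, not a VMware limit):

```python
# Rough capacity estimate for an ESX host, using the rule-of-thumb
# vCPU:core ratios from the text. Illustrative only, not a hard limit.

def max_vms(physical_cores, vms_per_core=4):
    """Estimate how many single-vCPU VMs fit at a given VMs-per-core ratio."""
    return physical_cores * vms_per_core

# A hypothetical 2-socket, 8-cores-per-socket host:
print(max_vms(2 * 8))      # 64 VMs at the conservative 4:1 ratio
print(max_vms(2 * 8, 8))   # 128 VMs at the aggressive 8:1 ratio
```

Doing this arithmetic up front, before anyone asks for "just one more VM", is exactly the planning the text recommends.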

Why too many vCPUs can negatively affect performance?
In VMware ESX (and ESXi), the VMkernel handles the scheduling of CPU resources for the virtual machines. With multi-processor virtual machines there is a catch: if you have a dual-vCPU virtual machine, the scheduler must have two processors available at the same time, or the virtual machine waits until enough resources are free. This wait time is called %READY.

To elaborate, CPU Ready is a metric that measures how long a VM that is ready to run has to wait for an available physical core, i.e., how long a vCPU queues before it can execute the work it has pending. So even when reported CPU utilization is not high, if the CPU Ready metric is high then your performance problem is most likely CPU related. Usually this was caused by assigning four, and in some cases eight, vCPUs to each virtual machine. So why was this happening?
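To turn the raw CPU Ready counter into a percentage, the usual conversion is ready time in milliseconds divided by the sample interval (the 20-second interval below matches vSphere's real-time performance charts; if you read the counter from a chart with a different interval, adjust accordingly):

```python
# Convert a CPU Ready summation value (milliseconds accumulated over one
# sample interval) into a %READY figure. The 20 s default matches
# vSphere's real-time charts; pass a different interval for other charts.

def cpu_ready_percent(ready_ms, interval_seconds=20):
    return ready_ms / (interval_seconds * 1000.0) * 100.0

# A vCPU that spent 1000 ms of a 20 s sample waiting for a core:
print(cpu_ready_percent(1000))   # 5.0 -> 5% ready, worth investigating
```

As a loose guideline, a sustained %READY in the mid single digits per vCPU is usually the point at which users start to feel it.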
Well, firstly, the hardware and its physical CPU resources are still shared. On top of that, the ESX server itself also requires CPU time to process storage requests, network traffic, and so on. Then add the fact that, sadly, most organizations still suffer from 'silo syndrome', so there still isn't a clear dialogue between the system admin and the application owner. The consequence is that while multiple vCPUs are great for workloads that support parallelization, this is not the case for applications that don't have built-in multi-threaded structures. A VM with 4 vCPUs requires the ESX server to wait for 4 pCPUs to become available at once; on a particularly busy ESX server with other VMs, this can take significantly longer than if the VM in question had only a single vCPU.

To explain this further, let's take the example of a four-pCPU host that has four VMs: three with 1 vCPU and one with 4 vCPUs. At best, only the three single-vCPU VMs can be scheduled concurrently; the 4-vCPU VM has to wait for all four physical CPUs to be idle. In this example the excess vCPUs actually impose scheduling constraints and consequently degrade the VM's overall performance, typically indicated by low CPU utilization but a high CPU Ready figure. With the ESX server scheduling and prioritizing workloads according to what it deems most efficient to run, the consequence is that smaller VMs tend to land on the pCPUs more frequently than the larger, overprovisioned ones. So in this instance overprovisioning was in fact detrimental to performance rather than beneficial. The VMkernel still has to manage every vCPU, a complete waste if the VM's application doesn't use them!
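That scheduling constraint can be sketched with a toy simulation. This is a deliberately simplified model of strict co-scheduling with a rotating pick order; modern ESX versions use relaxed co-scheduling and are smarter than this, so the numbers exaggerate the effect, but the trend is the same:

```python
# Toy model of strict co-scheduling: each time slot, VMs are considered
# in a rotating order, and a VM runs only if ALL of its vCPUs can get a
# free physical core at once. A slot where a VM cannot run counts as
# "ready" (waiting) time for that VM.

def simulate(cores, vcpu_counts, slots=20):
    runs = [0] * len(vcpu_counts)
    for slot in range(slots):
        free = cores
        for i in range(len(vcpu_counts)):
            vm = (slot + i) % len(vcpu_counts)   # rotate who is considered first
            if vcpu_counts[vm] <= free:          # every vCPU needs a core at once
                free -= vcpu_counts[vm]
                runs[vm] += 1
    return runs

# Four pCPUs; three 1-vCPU VMs and one 4-vCPU VM, over 20 time slots:
print(simulate(4, [1, 1, 1, 4]))   # [15, 15, 15, 5]
```

The wide VM runs only in the slots where the whole host happens to be free for it: 5 of 20 slots, versus 15 of 20 for each small VM. That is exactly the "low utilization, high CPU Ready" signature described above.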

Even if the guest operating system doesn’t use some of its vCPUs, configuring virtual machines with those vCPUs still imposes resource requirements on ESXi that translate to real CPU consumption on the host. For example:

* Unused vCPUs still consume timer interrupts in the guest operating system (VM).
* Maintaining a consistent memory view among multiple vCPUs consumes additional resources, both in the guest operating system and in ESXi.

* Most guest operating systems execute an idle loop during periods of inactivity. Within this loop, most of these guest operating systems halt by executing the HLT or MWAIT instructions. Some guest operating systems, however, use busy-waiting within their idle loops. This results in the consumption of resources that might otherwise be available for other uses (other virtual machines, the VMkernel, and so on).
* ESXi automatically detects these loops and de-schedules the idle vCPU. Though this reduces the CPU overhead, it can also reduce the performance of some I/O-heavy workloads.

* The guest operating system’s scheduler might migrate a single-threaded workload amongst multiple vCPUs, thereby losing cache locality.

Conclusion:
"Best practice" when deploying a new VM in your environment: less is more. The default should always be a single vCPU.

But if you are going to install multiprocessor-aware applications like SQL Server, and you know they will see real use, then give the VM 2 or more vCPUs. But don't over-provision just for the sake of it; that will reduce the number of VMs you can put on a single host/blade/ESX.
Only allocate two or more virtual CPUs to a virtual machine if the operating system and the application can truly take advantage of all of them. Otherwise, physical processor resources may be consumed with no application performance benefit and, as a result, other virtual machines on the same physical machine will be penalized!

------------------------------------------------------------------------------------
So, now that you are more familiar with virtualization's shared-resources model, I am sure you will help keep our virtual environment efficient and make better use of our limited physical resources.

…..Size isn’t always everything…… it’s what you do with your CPU that counts
