Sunday, October 5, 2014

CPU Ready

One of the key performance counters in a vSphere environment is CPU ready (%RDY in esxtop).
CPU ready is the time a virtual CPU is ready to run but is not being scheduled on a physical CPU. Under normal circumstances this indicates that there are not enough physical CPU resources on the ESX/ESXi host. It is the first counter to check when your users complain about bad performance.
It is normal for VMs to accumulate small amounts of CPU Ready time even if the hypervisor is not oversubscribed or under heavy activity; that is just the nature of shared scheduling in virtualization. For SMP VMs, ready time will generally be higher than for VMs with fewer vCPUs, since it takes more resources to schedule/co-schedule the VM when necessary and each vCPU accumulates ready time separately.
At what point does CPU Ready Time start to affect performance?
VMware's recommendation was that for an SMP VM, anything over 5% per vCPU is typically a warning level and anything over 10% per vCPU is critical. The reason this specifically says per vCPU is that each vCPU contributes 100% to the VM’s scheduling total, so a 4 vCPU VM has a scheduling total of 400%. A 10% CPU Ready value on a 4 vCPU VM therefore equates to only 2.5% per vCPU.
Beware that if the VM has a CPU limit placed on it, it will accumulate CPU Ready time whenever it exceeds that limit, while it waits to be allowed to execute again.
Using the formula from the KB article to convert a summation value to percent:
Real time (20 sec samples): xxx ms x 100 / 20,000 ms = ready-time %
Day (5 min samples): xxx ms x 100 / 300,000 ms = ready-time %
Week (30 min samples): xxx ms x 100 / 1,800,000 ms = ready-time %
Month (2 hour samples): xxx ms x 100 / 7,200,000 ms = ready-time %
 
Take the result and divide it by the number of vCPUs:
Lower than 5% is good
Higher than 10% is problematic.
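The conversion above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not a VMware tool: the function names and the "warning" label for the 5-10% band are my own, and the sample lengths are the default chart intervals listed above.

```python
# Sample lengths (seconds) for the default vSphere chart intervals listed above.
INTERVAL_SECONDS = {
    "realtime": 20,
    "day": 300,
    "week": 1800,
    "month": 7200,
}

def ready_percent_per_vcpu(summation_ms, interval, vcpus):
    """Convert a CPU Ready summation value (ms) to ready-time % per vCPU."""
    interval_ms = INTERVAL_SECONDS[interval] * 1000
    total_percent = summation_ms * 100 / interval_ms
    return total_percent / vcpus

def classify(percent_per_vcpu):
    """Apply the 5% / 10% per-vCPU thresholds from the text."""
    if percent_per_vcpu < 5:
        return "good"
    if percent_per_vcpu > 10:
        return "problematic"
    return "warning"  # hypothetical label for the band in between

# A 4 vCPU VM reporting 2000 ms of ready time in one real-time sample:
# 10% for the whole VM, 2.5% per vCPU.
pct = ready_percent_per_vcpu(2000, "realtime", 4)
print(pct, classify(pct))
```

The example reproduces the 4 vCPU case mentioned earlier: 10% for the VM as a whole works out to 2.5% per vCPU, which is well inside the "good" range.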
 
As a shortcut, you can use the following formulas for the default chart update intervals to get the CPU ready %:
•Realtime: CPU summation value / 200
•Past Day: CPU summation value / 3000
•Past Week: CPU summation value / 18000
•Past Month: CPU summation value / 72000
•Past Year: CPU summation value / 864000
Example: A realtime CPU summation value of 1000 is divided by 200 to give a CPU ready % of 5.
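If you wonder where those divisors come from, they are just the sample length in milliseconds divided by 100. A small Python sketch makes that visible; the names are made up for illustration, and the one-day sample length for the yearly chart is an assumption inferred from the 864000 divisor.

```python
# Default chart sample lengths in seconds. The "year" entry assumes one-day
# samples, which is what the 864000 divisor implies.
SAMPLE_SECONDS = {
    "realtime": 20,
    "day": 300,
    "week": 1800,
    "month": 7200,
    "year": 86400,
}

def shortcut_divisor(interval):
    """Divisor that turns a summation value in ms into a percentage."""
    # summation_ms * 100 / (seconds * 1000) == summation_ms / (seconds * 10)
    return SAMPLE_SECONDS[interval] * 10

for name in SAMPLE_SECONDS:
    print(name, shortcut_divisor(name))

# The example from the text: 1000 ms in a real-time sample gives 5%.
print(1000 / shortcut_divisor("realtime"))
```

Remember that this gives the value for the whole VM; it still has to be divided by the number of vCPUs before comparing it with the 5% and 10% levels.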

BTW, for the real-time graph you can probably make life easier on yourself by using the "Latency" CPU counter in vSphere, which reports CPU contention directly as a percentage, so no conversion is needed.
What causes high CPU Ready times?
The most common reason tends to be host oversubscription, where too many vCPUs have been allocated relative to the number of physical CPUs. While ESX 5 supports a maximum of 25 vCPUs per physical CPU, this is definitely a case where just because you can doesn’t mean you should. Typically, problems start when a host is in the range of 2-2.5X oversubscribed for server workloads.
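To check where a host stands, add up the vCPUs of all VMs on it and divide by the physical CPU count. Here is a rough sketch; the host and VM sizes are invented, the function name is hypothetical, and the 2.0 cut-off is simply the lower end of the 2-2.5X range mentioned above.

```python
def oversubscription_ratio(vm_vcpus, physical_cpus):
    """Total allocated vCPUs divided by the host's physical CPU count."""
    if physical_cpus <= 0:
        raise ValueError("physical_cpus must be positive")
    return sum(vm_vcpus) / physical_cpus

# Hypothetical host: 16 physical CPUs running ten VMs of various sizes.
vcpus_per_vm = [8, 4, 4, 4, 4, 2, 2, 2, 1, 1]
ratio = oversubscription_ratio(vcpus_per_vm, 16)

# Assumption: treat 2.0 and above as the zone where ready time deserves a look.
worth_checking = ratio >= 2.0
print(f"{ratio:.2f}:1", "check CPU Ready" if worth_checking else "ok")
```

The ratio alone does not prove there is a problem; it only tells you which hosts to look at first when ready time is high.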

The second common scenario where CPU Ready times are high is when a larger SMP VM, for example one with 4-8 vCPUs, is running on a host that has a lot of smaller VMs with 1-2 vCPUs for application servers. The larger resource allocation for the SMP VM results in it having to wait longer for the hypervisor to supply the necessary physical CPUs to schedule/co-schedule the workload. Often when this occurs, after asking some questions I find that the number of vCPUs for the server was increased from 4 to 8 because of performance problems with the VM. Unfortunately, if CPU Ready time was the original problem, increasing the vCPUs doesn’t improve performance; it generally makes things worse.
What do I do if this is actually a problem?
When CPU Ready is a problem for your VMs, there are a couple of different things that can be done; the correct one depends on your virtual infrastructure. If the problem is purely host oversubscription in terms of the vCPU to pCPU ratio, start by evaluating whether the VMs really need their configured number of vCPUs, and whether any can be reduced to lower the ratio. If this can’t be done, the only real answer is to add hosts so the load can be balanced better and the oversubscription rate reduced. Also evaluate whether you can consolidate the larger VMs onto one or more hosts and move the smaller VMs to the other hosts, separating the VMs by size.
