Thursday, August 30, 2012

Deep Dive: VMware ESX/ESXi CPU Scheduler

Reviewing the concepts covered in Larry Loucks' book "Critical VMware Mistakes You Should Avoid", we will first look at the CPU scheduler. Larry's book does a great job of giving an overview of the CPU scheduler and the core concepts of how it works. I hope this post builds on top of those core concepts and feels more like a deep dive than a general overview, so soak it up, you nautical beast.

The VMware CPU scheduler is no hidden secret, and many people, probably including you, already know what it is and how it works. That aside, I love learning about these kinds of topics. As I have said in previous posts, these topics are what make IT interesting to me. This first post will be a relatively high-level overview of how ESX (referring to both ESX and ESXi) schedules CPU, and in later posts we will dive a little deeper. Let's get to it!

Both the CPU scheduler and the hypervisor (viewing them temporarily as separate entities) play a big part in CPU scheduling. The scheduling itself is obviously done by the hypervisor. In layman's terms, the CPU scheduler's role is to assign world (think VM) executions to the physical processors, prioritizing processor requests while keeping the VMs within acceptable performance. While the CPU scheduler determines which world to schedule, the hypervisor's job is to present the physical CPUs to each VM as if that VM owned them outright. That is a bit of a duh moment when thinking about hypervisors, but it is worth saying. Let's figure out how the CPU scheduler does its job of prioritizing processor requests.

If you are at all familiar with vSphere, you most likely understand the concepts of shares, reservations, and limits as they apply to physical resources. These are the user-set configuration values for each VM within a vSphere datacenter. For review's sake, let's take a very brief look at them:
  • Shares: Shares are simply used to prioritize resources for VMs when there is active resource contention. If you give one VM 1000 CPU shares and another 500, then under contention the VM with 1000 shares will be granted access to twice as much CPU as the VM with 500 shares.
  • Reservations: This is a guaranteed amount of resources RESERVED for a particular VM. When a VM is powered on, the reserved amount of resources is given to that VM and "eliminated" from the pool of open resources that other VMs can use. A VM cannot (or at least should not) power on if its host does not have enough resources to meet that VM's reservations. (DRS rocks)
  • Limits: A limit is as easy to explain as a reservation. It caps the amount of resources a VM can use. I have never completely grasped the point of this, because if I personally wanted a VM to have a "limit" of 10 GB of RAM, I would only allocate it that much. Plus, if you use limits when resources are abundant, you could be wasting available resources.
These configuration settings determine each VM's entitlement, and entitlements are what the CPU scheduler uses to determine priority when scheduling worlds (think VMs) onto the physical processors. If you don't know where to edit these settings, right-click any VM in the vSphere Client, choose Edit Settings, and go to the "Resources" tab. Peep the screenshot if you need a reference! BAM!
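
To make the interplay of shares, reservations, and limits concrete, here is a minimal Python sketch of proportional-share allocation under contention. It is a toy model, not ESX's actual algorithm. It treats CPU as a single pool of MHz and gives each VM an amount proportional to its shares, clamped between its reservation and its limit. It then searches, by bisection, for the MHz-per-share rate that uses up the host's capacity. The `VM` class, the `allocate()` function, and all MHz figures are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    shares: int
    reservation_mhz: float = 0.0
    limit_mhz: float = math.inf  # inf stands for "unlimited" in this toy model


def allocate(capacity_mhz: float, vms: list[VM]) -> dict[str, float]:
    """Toy proportional-share split of contended CPU (hypothetical model)."""

    def grant(vm: VM, rate: float) -> float:
        # Share-proportional amount, clamped to [reservation, limit].
        return min(max(vm.shares * rate, vm.reservation_mhz), vm.limit_mhz)

    def total(rate: float) -> float:
        return sum(grant(vm, rate) for vm in vms)

    if total(0.0) > capacity_mhz:
        # Roughly what admission control guards against at power-on.
        raise ValueError("reservations exceed host capacity")

    # Bisection on the MHz-per-share rate until the grants fill the capacity.
    lo, hi = 0.0, capacity_mhz / min(vm.shares for vm in vms)
    for _ in range(100):
        mid = (lo + hi) / 2
        if total(mid) > capacity_mhz:
            hi = mid
        else:
            lo = mid
    return {vm.name: grant(vm, lo) for vm in vms}


print(allocate(3000, [VM("A", 1000), VM("B", 500)]))
print(allocate(3000, [VM("A", 1000), VM("B", 500, limit_mhz=500)]))
```

With 3000 MHz of contended capacity, A gets about 2000 MHz and B about 1000 MHz, which is the 2:1 split from the Shares bullet. Putting a 500 MHz limit on B caps it at 500 and hands the remaining 2500 to A.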


In basic terms, the CPU scheduler compares a world's entitlement with the CPU resources it has actually used, then assigns the request a numeric value. The lower the number, the higher the priority. If a world requesting processor time has consumed little relative to its entitlement, its numeric value will be lower than those of the other worlds requesting processor time, and it will be given priority. One of the core concepts is that when a world requests access to the physical CPUs, it needs access to as many cores as it has vCPUs. This rule is becoming more "relaxed", but the basic ideas still apply. I highly recommend reading my co-worker Jim Hannan's post about the CPU Co-Scheduler and its evolution from the strict Co-Scheduler of 2003 to today's [extremely] relaxed Co-Scheduler. Here's the link to his post:

http://www.houseofbrick.com/blog/vsphere-5-advantages-for-vbca-part-1.html?blogger=solutionsarchitects
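
Here is a sketch of the "lower number wins" rule described above. The VMware vSphere 4.1 CPU scheduler paper describes priority in terms of the CPU a world has consumed relative to its entitlement. The plain `used / entitled` ratio, the `World` class, and the sample numbers below are simplifying assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class World:
    name: str
    entitled_mhz: float  # hypothetical: what shares/reservation/limit work out to
    used_mhz: float      # hypothetical: recent CPU consumption


def priority_value(w: World) -> float:
    """Lower value = higher priority: consumption as a fraction of entitlement."""
    return w.used_mhz / w.entitled_mhz


def pick_next(ready: list[World]) -> World:
    # The world furthest below its entitlement is scheduled first.
    return min(ready, key=priority_value)


ready = [
    World("VM5", entitled_mhz=1000, used_mhz=900),  # value 0.90
    World("VM6", entitled_mhz=2000, used_mhz=500),  # value 0.25
    World("VM9", entitled_mhz=1000, used_mhz=400),  # value 0.40
]
print(pick_next(ready).name)  # VM6
```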

To further illustrate how the CPU scheduler works and its fundamentals (although they are always being adjusted), let's take a visual looksie!
Example 1:


Basically, what is shown here is an ESXi host with 2 CPU sockets, each dual core. Currently 3 of the 4 cores are scheduled, which means the only other VM that can be scheduled will be one of VM9, VM5, or VM6. Entitlements determine which VM actually gets the spot: whichever has the lowest numeric value takes the last single core. VM1 and VM3 are both "disqualified" from being scheduled at the moment because each has more vCPUs than there are available cores. The time a VM spends waiting like this is known as CPU Ready time. The VM is ready and needs CPU, but there are not enough resources to accommodate it, so it waits... and bad stuff happens if CPU Ready time gets out of hand. My next blog post will be about CPU Ready and a real-world example I just experienced at work this week. Let's take a look at another example.
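
Example 1 can be modeled as a strict co-scheduling check. This is a hedged sketch. The vCPU counts come from the post (VM1 has 4, VM3 has 2, VM5/VM6/VM9 have 1 each), but the priority values are made up for illustration. Real ESX scheduling, with per-core run queues, migrations, and relaxed co-scheduling, is far more involved. `ReadyVM`, `strict_candidates()`, and `schedule_one()` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReadyVM:
    name: str
    vcpus: int
    priority: float  # lower = higher priority (hypothetical values)


def strict_candidates(free_cores: int, queue: list[ReadyVM]) -> list[ReadyVM]:
    # Strict co-scheduling: all vCPUs must start together, so a VM only
    # qualifies if there are at least as many idle cores as it has vCPUs.
    return [vm for vm in queue if vm.vcpus <= free_cores]


def schedule_one(free_cores: int, queue: list[ReadyVM]) -> Optional[ReadyVM]:
    fits = strict_candidates(free_cores, queue)
    return min(fits, key=lambda vm: vm.priority) if fits else None


queue = [
    ReadyVM("VM1", 4, 0.10),
    ReadyVM("VM3", 2, 0.20),
    ReadyVM("VM5", 1, 0.60),
    ReadyVM("VM6", 1, 0.35),
    ReadyVM("VM9", 1, 0.50),
]
print(schedule_one(1, queue).name)  # VM6
```

With one free core, VM6 wins because it has the lowest value among the VMs that fit. VM1 and VM3 are skipped despite having even lower values, and that waiting is what shows up as CPU Ready time.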
Example 2:


In Example 2, only 1 of the system's 4 cores is in use at the moment, by a single scheduled vCPU request. With 3 cores available, VM9, VM5, VM3, and VM6 are all potential candidates for scheduling. Once again, it comes down to entitlements. Let's assume VM3 has been given more entitlement. One factor that hints at this is that it has been assigned 2 vCPUs instead of 1, since by default a VM's CPU shares scale with its vCPU count. So our buddy VM3 gets scheduled on 2 cores, which leaves one last core for whichever of our single-vCPU VMs has the higher entitlement.
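
Example 2 extends the same idea to filling several free cores. This is a sketch that assumes a greedy policy: repeatedly take the highest-priority VM whose vCPUs still fit in the idle cores. It also reuses the same hypothetical priority values. The policy and `fill_cores()` are illustrative assumptions, not ESX internals.

```python
from dataclasses import dataclass


@dataclass
class ReadyVM:
    name: str
    vcpus: int
    priority: float  # lower = higher priority (hypothetical values)


def fill_cores(free_cores: int, queue: list[ReadyVM]):
    """Greedy strict co-scheduling: keep placing the best-priority VM that fits."""
    scheduled = []
    waiting = sorted(queue, key=lambda vm: vm.priority)
    while True:
        fit = next((vm for vm in waiting if vm.vcpus <= free_cores), None)
        if fit is None:
            break  # nothing left fits in the remaining idle cores
        scheduled.append(fit)
        waiting.remove(fit)
        free_cores -= fit.vcpus
    return scheduled, waiting


queue = [
    ReadyVM("VM1", 4, 0.10),
    ReadyVM("VM3", 2, 0.20),
    ReadyVM("VM5", 1, 0.60),
    ReadyVM("VM6", 1, 0.35),
    ReadyVM("VM9", 1, 0.50),
]
scheduled, left_waiting = fill_cores(3, queue)
print([vm.name for vm in scheduled])     # ['VM3', 'VM6']
print([vm.name for vm in left_waiting])  # ['VM1', 'VM9', 'VM5']
```

With 3 free cores, this schedules VM3 on 2 cores and VM6 on the last one. VM1 (4 vCPUs) is left out in the cold, along with VM9 and VM5.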

There are obviously cases where entitlements match and there is no real "winner". For more details on that, I would refer you to the white paper mentioned at the bottom of this post. SOOOO, we got all cores scheduled. Good job, ESX CPU scheduler! One problem: our poor 4 vCPU VM, VM1, is still left out in the cold, cranking up its CPU Ready time. If CPU Ready time gets too high, a VM could be waiting secondS for its CPU requests to be met. (No, I didn't fat-finger the "S". I'm just emphasizing that there are multiple.)

Here is a design hint: keep VMs with similar vCPU allocations on the same hosts to keep CPU Ready time lower. If a host runs mostly 2 vCPU VMs, they will get scheduled quicker, because whenever one of those VMs comes off the cores, it frees up exactly the 2 cores the next one needs. Also, try to limit the vCPUs assigned to each VM to the number it truly needs. Granted, this is probably a "DUH" moment, but people sometimes get a little carried away with vCPUs, and this is a classic case of how over-allocation can hurt ya in the VMware world. Check VM performance in the vSphere Client. My favorite approach at the moment is to open an advanced chart under Performance, select CPU usage in percent, and also view CPU Ready time. On a real-time chart, CPU Ready time is safe at around 100 ms to 150 ms, and the danger zone (welcome) starts at around 500 ms. Look for my next post for more details on this.
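
The millisecond thresholds above are easier to compare across charts once converted to a percentage. The vSphere Client's real-time chart reports CPU Ready as a summation in milliseconds per 20-second sample, so the percentage is simply ready time divided by the sample length. The `ready_percent()` helper below is a hypothetical sketch. Dividing by the vCPU count to get a per-vCPU figure is a common convention, because the VM-level value adds up ready time across all of the VM's vCPUs.

```python
def ready_percent(ready_ms: float, interval_s: float = 20, vcpus: int = 1) -> float:
    """Convert a CPU Ready summation (ms per sample) to percent of the sample.

    interval_s=20 matches the real-time chart. The past-day chart uses
    5-minute (300 s) samples. Passing vcpus > 1 gives a per-vCPU average.
    """
    return ready_ms / (interval_s * 1000 * vcpus) * 100


for ms in (100, 150, 500):
    print(f"{ms} ms -> {ready_percent(ms):.2f}% of a 20 s sample")
```

By this math, 100 to 150 ms on the real-time chart is 0.5% to 0.75% of the interval, and 500 ms is 2.5%.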

I am a firm believer that you should know and understand these ESX/ESXi CPU scheduler concepts so you can better monitor and size your VMs. That being said, these concepts change and evolve with each vSphere release. Once again, I will refer you to Jim Hannan's post linked above, which discusses the evolution and current version of the CPU scheduler in detail. To make a point about that evolution: in both of the visual examples above, the relaxed Co-Scheduler will now allow some multi-vCPU machines to be scheduled even when there are not enough free cores for all of their vCPUs. Take a look at Jim's post, DANGIT!
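
The relaxed Co-Scheduler can be illustrated with a toy skew model. The vSphere 4.1 paper describes relaxed co-scheduling in terms of skew, the difference in progress between a VM's vCPUs. vCPUs that get too far ahead are co-stopped until their siblings catch up. The sketch below is a heavy simplification built on two assumptions. First, progress advances by a fixed quantum only while a vCPU is running. Second, the 3 ms threshold is a hypothetical value, not ESX's actual setting. `step()` and `steps_until_costop()` are illustrative names.

```python
from typing import Optional


def step(progress, running, quantum_ms, skew_threshold_ms):
    """Advance running vCPUs by one quantum, then flag any that lead too far."""
    progress = [p + quantum_ms if r else p for p, r in zip(progress, running)]
    slowest = min(progress)
    # Skew here = how far each vCPU has advanced past its slowest sibling.
    costopped = [p - slowest > skew_threshold_ms for p in progress]
    return progress, costopped


def steps_until_costop(running: list[bool], skew_threshold_ms: float,
                       quantum_ms: float = 1.0, max_steps: int = 1000
                       ) -> Optional[tuple[int, list[bool]]]:
    """Return (steps, co-stop flags) once skew trips, or None if it never does."""
    progress = [0.0] * len(running)
    for steps in range(1, max_steps + 1):
        progress, costopped = step(progress, running, quantum_ms, skew_threshold_ms)
        if any(costopped):
            return steps, costopped
    return None


# A 4 vCPU VM that only found 2 free cores: vCPUs 0 and 1 run, 2 and 3 wait.
print(steps_until_costop([True, True, False, False], skew_threshold_ms=3.0))
```

In this model, the 4 vCPU VM runs for 4 quanta before its two leading vCPUs are co-stopped. Under strict co-scheduling, it would not have started at all.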

I leave you with this final screenshot. Check it out, be stunned, and pick your jaw up off the ground. Compare the CPU Ready safe and danger zones (in milliseconds) I just shared with you to what you see in this graph. I will talk about this in my next post. CPU Ready time for the win!


The link below is to a white paper I used as a companion while reading "Critical VMware Mistakes You Should Avoid", and it's a REALLY good read. I mention many of the paper's initial concepts in this post, but it goes way, way ... way deeper than my post does. You will notice it's for vSphere 4.1. I feel like VMware is no longer writing these kinds of beneficial deep dives, so referencing the older docs is often what you have to resort to.
http://www.vmware.com/files/pdf/techpaper/VMW_vSphere41_cpu_schedule_ESX.pdf
