Thursday, August 30, 2012

Deep Dive: VMware ESX/ESXi CPU Scheduler

Reviewing the concepts covered in Larry Loucks' book "Critical VMware Mistakes You Should Avoid", we will first look at the CPU scheduler. Larry's book does a great job of giving an overview of the CPU scheduler and the core concepts of how it works. I hope this post builds on those core concepts and feels more like a deep dive than a general overview. Soak it up, you nautical beast.

The VMware CPU scheduler is no secret, and many people, including you reading this post, probably know what it is and how it works. That aside, I love learning about these kinds of topics, and as I have said in previous posts, these topics are what make IT interesting to me. This first post will be a relatively high-level overview of how ESX (referring to both ESX and ESXi) schedules CPU; in later posts we will dive a little deeper. Let's get to it!

Both the CPU scheduler and the hypervisor (viewing them temporarily as separate entities) play a big part in CPU scheduling. The CPU scheduling is obviously done by the hypervisor, but in layman's terms the CPU scheduler's role is to assign world (think VM) executions to the physical processors, keeping within the constraints of acceptable performance for the VMs while prioritizing processor requests. While the CPU scheduler determines which world to schedule, the hypervisor holds the task of presenting the physical CPUs to each VM as if that VM owned them (a "duh" moment when thinking about hypervisors, but just to say it). Let's figure out how the CPU scheduler does its job of prioritizing processor requests.

If you are at all familiar with vSphere, you most likely understand the concepts of shares, reservations and limits with regard to physical resources. These are the user-set configuration values for each VM within a vSphere datacenter. For review's sake, let's take a look at them very briefly:
  • Shares: Shares are simply used to prioritize resources for use by VMs when there is active resource contention. If you give one VM 1000 shares of CPU and another 500, then when resource contention arises, the VM with 1000 shares will be granted access to twice as much CPU as the VM with 500 shares.
  • Reservations: This is a guaranteed amount of resources RESERVED for a particular VM. When a VM is powered on, the reserved amount of resources is given to that VM and is "eliminated" from the pool of open resources that other VMs can use. A VM cannot (and should not) power on if the host it resides on does not have enough resources to meet the reservations of that VM. (DRS rocks)
  • Limits: A limit is as easy to explain as a reservation: it is a cap on the amount of resources that a VM can use. I can never grasp this concept completely, because personally, if I wanted a VM to have a "limit" of 10 GB of RAM, I would only allocate it that much. Plus, if you use limits while resources are abundant, you could potentially be wasting available resources.
These configuration settings determine what are known as entitlements, which are what the CPU scheduler uses to determine priority when scheduling worlds (think VMs) onto the physical processors. If you don't know where to edit these settings, just right click any VM in the vSphere client, edit its settings, and go to the "Resources" tab. Peep the screen shot if you need a reference! BAM!
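To put numbers on the shares example above, here is a tiny sketch of how contended CPU could be split in proportion to shares. It is only an illustration: the function name, the VM names and the 3000 MHz figure are made up, and it ignores reservations and limits entirely.

```python
def split_by_shares(capacity_mhz, shares):
    """Divide contended CPU capacity in proportion to each VM's shares.

    Simplified illustration only: reservations and limits are ignored.
    """
    total_shares = sum(shares.values())
    return {vm: capacity_mhz * s / total_shares for vm, s in shares.items()}


# Hypothetical host with 3000 MHz under contention and two VMs.
allocation = split_by_shares(3000, {"VM-A": 1000, "VM-B": 500})
print(allocation)
```

With 1000 versus 500 shares, VM-A ends up with twice the CPU of VM-B, which is all that shares promise: a ratio during contention, not an absolute amount.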


In basic terms, the CPU scheduler compares a world's entitlement to the CPU resources it has already used, and then affixes a numeric value to that world's request. The lower the number, the higher the priority. If a world requests processor time and its entitlement is higher and its CPU usage lower than those of the other requesting worlds, its assigned numeric value will be lower and it will be given priority. One of the core concepts is that when a world requests access to the physical CPUs, it needs access to as many cores as it has vCPUs. Although this method is becoming more "relaxed", the basic ideas still apply. I highly recommend you go read my co-worker Jim Hannan's post about the CPU Co-Scheduler and its evolution from the strict Co-Scheduler in 2003 to the now [extremely] relaxed Co-Scheduler. Here's the link to his post:

http://www.houseofbrick.com/blog/vsphere-5-advantages-for-vbca-part-1.html?blogger=solutionsarchitects
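To make the "lower number wins" idea concrete, here is a rough sketch that ranks worlds by used CPU divided by entitled CPU. Treat the exact formula as my simplifying assumption rather than the real ESX algorithm, and the MHz figures as invented.

```python
def priority_value(used_mhz, entitled_mhz):
    """Lower value means higher priority (simplified assumption)."""
    return used_mhz / entitled_mhz


# Hypothetical (used, entitled) figures in MHz for three ready worlds.
worlds = {
    "VM5": (400, 1000),
    "VM6": (900, 1000),
    "VM9": (300, 500),
}

# Sort so the world that has used the least of its entitlement comes first.
ranked = sorted(worlds, key=lambda name: priority_value(*worlds[name]))
print(ranked)
```

A world that has used little of what it is entitled to floats to the front, and a world that has already eaten most of its entitlement drops to the back.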

To further illustrate how the CPU scheduler works and its fundamentals (although they are always being adjusted), let's take a visual looksie!
Example 1 (click for largification):


Basically, what is shown here is an ESXi host with 2 CPU sockets, each dual core. Currently the host has 3 of its 4 cores scheduled, which means that the only VMs that could be scheduled next are VM9, VM5, or VM6. Which VM actually gets the spot will be determined by the entitlements algorithm: whichever VM has the lower numeric rating will take the last single-core slot. Both VM1 and VM3 are "disqualified" from getting scheduled at the moment because they both have more vCPUs than there are cores available. The time they spend waiting is known as CPU Ready time, which by definition means that the VM is ready and needs CPU, but there are not enough resources to accommodate the VM, so it waits....bad stuff happens if the CPU Ready time gets out of hand. My next blog post will be about CPU Ready and a real world example I just experienced at work this week. Let's take a look at another example.
Example 2 (click for largification):


In example 2 we can see that of the 4 cores in the system, only one is being accessed at the moment, with 1 scheduled vCPU request. With a total of 3 cores available, VM9, VM5, VM3, and VM6 are all potential candidates for being scheduled. It will come down to entitlements once again. Let's assume that VM3 has the higher entitlement; one factor that hints at this is that it has been assigned 2 vCPUs instead of 1. So our buddy VM3 gets scheduled on 2 cores, which leaves one last core for whichever of our single vCPU VMs is entitled higher.
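Both examples can be walked through with a small sketch of strict co-scheduling: a VM only gets on the processors if all of its vCPUs fit in the free cores. The vCPU counts match the examples, but the priority order is something I made up for illustration, and the real scheduler is far more involved than this.

```python
def schedule(ready_vms, free_cores, priority_order):
    """Strict co-scheduling sketch: a VM runs only if every vCPU gets a core."""
    scheduled = []
    for vm in priority_order:
        vcpus = ready_vms[vm]
        if vcpus <= free_cores:
            scheduled.append(vm)
            free_cores -= vcpus
    return scheduled, free_cores


# vCPU counts from the examples; the priority order is hypothetical.
ready = {"VM1": 4, "VM3": 2, "VM5": 1, "VM6": 1, "VM9": 1}
order = ["VM3", "VM1", "VM5", "VM9", "VM6"]

example_1 = schedule(ready, free_cores=1, priority_order=order)
example_2 = schedule(ready, free_cores=3, priority_order=order)
print(example_1)
print(example_2)
```

With one free core only a single vCPU VM can run, and with three free cores VM3 takes two of them and one single vCPU VM takes the last. VM1, with its 4 vCPUs, never fits in either case.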

There are obviously cases where entitlements match and there is no real "winner"; I would refer you to the white paper mentioned at the bottom of this post for more details on that matter. SOOOO, we got all cores scheduled. Good job, ESX CPU Scheduler! One problem: our poor 4 vCPU VM, VM1, is still left out in the cold, cranking up his CPU Ready time. If CPU Ready time gets too high, a VM could potentially be waiting secondS (no, I didn't fat finger the "S", just emphasizing the point that there are multiple) for its CPU requests/needs to be met.

Here is a design hint: keep VMs with like vCPU allocations on the same hosts; this will keep CPU Ready time lower. If you have a host with mostly 2 vCPU VMs, they will get scheduled quicker, because cores tend to free up two at a time as other 2 vCPU VMs come off the processors. Also, try to limit the vCPUs assigned to VMs to only the number they truly need. Granted, this is probably a "DUH" moment, but people sometimes get a little vCPU happy, and this is a classic case of how over-allocation can hurt ya in the VMware world. Check the VM performance in the vSphere client; my favorite at the moment is to choose an advanced chart under Performance and select CPU usage by percentage along with CPU Ready time. CPU Ready time is safe at around 100ms to 150ms (real-time); the danger zone (welcome) for a real-time graph would be 500ms. Look for my next post to get more details on this.
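If milliseconds feel abstract, the ready values can be turned into a percentage of the chart's sample interval. This sketch assumes the real-time chart samples every 20 seconds; the function name is mine, and the result is for the summed value shown on the chart, not per vCPU.

```python
def ready_percent(ready_ms, interval_s=20):
    """Convert a CPU Ready summation value (ms) into a percent of the interval.

    Assumes a 20-second sample interval, as on the real-time chart.
    """
    return ready_ms * 100 / (interval_s * 1000)


# The safe and danger zone figures mentioned above.
safe_low = ready_percent(100)
safe_high = ready_percent(150)
danger = ready_percent(500)
print(safe_low, safe_high, danger)
```

Under that assumption, the 100ms to 150ms safe range is roughly half to three quarters of a percent of the interval, and the 500ms danger zone is 2.5 percent.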

I am a firm believer that these concepts about the CPU scheduler in ESX/ESXi should be known and understood in order to better grasp how to monitor and size your VMs. That being said, these concepts are changing and evolving with each vSphere release. Once again I will refer you to Jim Hannan's post that I linked above, where he discusses in detail the evolution and current version of the CPU scheduler. To make a point about that evolution: in both of the visual examples above, the relaxed Co-Scheduler will now allow some multi-vCPU machines to be scheduled even if not all of the cores they need are available. Take a look at Jim's post DANGIT!

I leave you with this final screen shot. Check it out, be stunned, pick your jaw up off the ground. Compare the safe and danger zones in milliseconds I just shared with you for CPU Ready time to what you see in this graph. I will talk about this in my next post. CPU Ready time for the win! (click for largification)


The link below is a white paper that I used as a companion while reading "Critical VMware Mistakes You Should Avoid"; it's a REALLY good read. I mention many of the paper's initial concepts in this post, but it goes way way ... way deeper than my post. You will notice it's for vSphere 4.1. I feel like VMware is no longer writing these types of beneficial deep dives, so referencing the old docs is what you often have to resort to.
http://www.vmware.com/files/pdf/techpaper/VMW_vSphere41_cpu_schedule_ESX.pdf

Tuesday, August 28, 2012

PowerShell: Get the Services Running and Their Service Accounts on a List of Remote Machines


Time for me to play PowerShell Ranger and throw a quick script at you! BAM. A co-worker asked me today to whip up a quick script that could retrieve the services and their corresponding service accounts (the user account the system uses to run the service) from a list or array of remote hosts. I first thought, "Hey no Biggie....Smalls!, I will use the Get-Service cmdlet". The issue with that, as seen in the screen shot below, is that the Get-Service cmdlet, even when expanding out the full results for one service, does not include a property which details the service account.

So instead of using the Get-Service cmdlet, we will turn to the Get-WmiObject cmdlet and query the Win32_Service class to get the values we need. As seen in the screenshot below, this method of retrieving a system's service details returns much more data, and it even includes the property "StartName", which contains the service account name.


I recommend trying both methods yourself to see the difference:
    Get-Service -Name WinRM | Select *
    Get-WmiObject win32_service | Where {$_.name -eq "WinRM"} | Select *

Now to the Script:

  • Let's review what's going on in this script. First off we create a variable, $ServerArray; this is where you add the names of the servers you wish to run this script against. Follow the instructions in the comment below to add the server names correctly and create an array. If you have a list of servers in a text file, you can have PowerShell grab the content of that file and create an array of server names that way. To do that, just remove everything after the "=" sign following $ServerArray and enter the following code: Get-Content -Path <path to text file>. That's all! Remember, for simplicity's sake, one server name per line in the text file.
  • Next we let the user running the script either define the location to save the output file that this script will make, or let the script use the default directory, which is "C:\Temp\SvcsandSvcAccounts\".
  • Next we have the script verify that the path exists, and if it doesn't, the script creates the path to the save location using the New-Item cmdlet. Give a path to the New-Item cmdlet and it will create the directory structure it needs to get to the final location.
  • After we define the path to the save location and create the directory structure for it, we begin to loop through each server in $ServerArray. We tell the user with the Write-Host cmdlet which host we are currently grabbing the information for, and then go through and grab 4 values for each service.
  • The Get-WmiObject cmdlet uses the -ComputerName parameter to connect to remote servers. In our case we pass the server names to -ComputerName using the $Server variable in the foreach loop. You can also use the Get-WmiObject alias if you want: gwmi.
  • In this script and many others of mine, I use the calculated property capabilities of PowerShell only to rename the headings on some of the returned value columns, but this is only scratching the surface of what calculated properties can do. I hope to discuss calculated properties in a later blog post. A calculated property in the script is any line that looks like this: @{N="Name";E={$_.Expression}}. You can define your own names for values in the "N" field and do calculations in the "E" field. In my example I decided to just regurgitate the $_ variable with an associated property instead of doing anything more complicated with it.
  • In the final steps we do two very simple things: sort the output using the Sort-Object cmdlet, and output the entire table to a text file called $Server-Services.txt, or in our case, localhost-Services.txt.

# List multiple systems in the $ServerArray variable: quote each system name and
# separate with a comma; this creates an array. Ex: "system1","system2","system3"
$ServerArray = "localhost"

# Leave this empty to use the default save location.
$DefineSaveLocation = ""
if ($DefineSaveLocation -eq "")
    {$DefineSaveLocation = "C:\Temp\SvcsandSvcAccounts\"}

# Create the save location if it does not exist yet.
$SaveLocaPath = Test-Path $DefineSaveLocation
if ($SaveLocaPath -eq $False)
    {New-Item -ItemType Directory -Path $DefineSaveLocation | Out-Null}
cd $DefineSaveLocation

foreach ($Server in $ServerArray)
{
    Write-Host "Retrieving services for $Server"
    # StartName holds the account each service runs as.
    Get-WmiObject win32_service -ComputerName $Server | select Name,
        @{N="Startup Type";E={$_.StartMode}},
        @{N="Service Account";E={$_.StartName}},
        @{N="System Name";E={$_.SystemName}} | Sort-Object "Name" > ".\$Server-Services.txt"
}

The generated output from the script looks like this, but you could do CSV, HTML, Excel...whatever you like:

This was a super simple PowerShell script, but with PowerShell the possibilities are endless. More to come!

Monday, August 27, 2012

Shout Out: David Klee "Virtualizing Business-Critical SQL Servers – Part 2: Understanding the Physical Workload"

Another Shout Out coming full speed, course correcting to ram right into you. This time it's a post from David Klee, a Solutions Architect at House of Brick Technologies. In this Shout Out, bring on the benchmarking, performance charting, workload tracking pie and give me the whole thing. David's post is the second part of a multipart blog series on Virtualizing Business-Critical SQL Servers. His post gives you a pretty detailed map of the areas to focus on so that you can learn what you need to know about your physical workload. Following his guidelines can help you create baselines from scheduled periodic benchmarks and from there effectively tune your SQL Server to get the best performance. His blog post includes the following subjects:
  • Benchmarks
  • Disk Performance
  • SQL Server and Windows Metrics
  • Query Performance
  • Baselines
This was a really great post and there were a lot of things I got out of it, here is a snippet of one of my favorite parts:

"Now, what good is a benchmark if you do not have a running average metric to compare it against?
A baseline is a rolling average of your repeatable benchmarks. You should routinely (and not just once a year either) benchmark your systems and compare against a rolling baseline to see how things are performing. Once completed, update your benchmarks accordingly with the updated data.
[...]
The bottom line with benchmarks is that not only do you have an objective measure of the average performance of your systems, but, in the event of a problem, you have an objective means of defending your system. It can even help point out performance problems in other systems. If a performance problem does show up in your system, you have the ability to quickly determine the area that requires more focus, and the means to prove when it is resolved."
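To picture what a rolling baseline looks like in practice, here is a minimal sketch that averages the most recent benchmark runs and reports how far a new run strays from that average. The IOPS numbers, the window of four runs and the function names are all invented for illustration; David's post does not prescribe any of them.

```python
from statistics import mean


def rolling_baseline(benchmarks, window=4):
    """Average of the most recent `window` benchmark results."""
    return mean(benchmarks[-window:])


def deviation_percent(new_value, baseline):
    """How far a new benchmark is from the baseline, in percent."""
    return (new_value - baseline) * 100 / baseline


# Hypothetical quarterly disk benchmark results (IOPS).
history = [5000, 5200, 4800, 5000]
baseline = rolling_baseline(history)
drift = deviation_percent(4500, baseline)
print(baseline, drift)
```

Once the new run has been compared, it gets appended to the history so the baseline keeps rolling forward.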


Baselines are too often overlooked because of a lack of benchmarking. As a consulting company we are commonly asked to perform health checks on customer systems; how much more beneficial those health checks would be if we had baselines to contrast against the new numbers and figures....sigh....

There was one customer who had baselines back 5 years, done quarterly, in easy to read and understand documents with graphing and metadata...oh wait....no......I guess it was all a dream, right Biggie? I used to read Word Up magazine.

Here's the link:
http://www.houseofbrick.com/blog/virtualizing-business-critical-sql-servers-a-part-2-understanding-the-physical-workload.html?blogger=solutionsarchitects

Also check out David's personal blog @:
http://www.davidklee.net/

Or follow him on Twitter:
@kleegeek

Shout Out: Jim Hannan "vSphere 5 Advantages for VBCA -- Part 4 vMotion"

Call me silly, ....never mind....don't. But I get amazed at features within vSphere that nowadays are often taken for granted. I can't seem to ever get enough of the nitty gritty details of how all the components like vMotion, HA, DRS, NUMA, etc. work and orchestrate together. Good thing for me, I work with a bunch of people who share that same desire to know and understand how the vSphere components work. Check out this blog post from Jim Hannan, a co-worker of mine and a Solutions Architect at House of Brick Technologies. His post is about virtualizing business critical applications/systems and the benefits gained by doing so with vSphere 5.0.
Here's a snippet from his post that I particularly enjoyed:

"vMotion, originally introduced in ESX 2.0 with vCenter 1.0, offered a feature unlike any in the industry. For many organizations, it was the quintessential reason to virtualize their workloads.Care to guess what interactive application was used by VMware to demo the first migrations?Answer: Pinball."

I wasn't in the industry at that time, but I would have loved to see that moment in person:

~~~~*Cue the harp music; Movie transition to a memory in the past that didn't happen*~~~~

Me: "Did you see that?"
Guy Next to Me: "Yes, I am sitting right here.....*pause*......watching the same thing you are?"
Me: "Did you see that.....S E R I O U S L Y?"
Guy Next to Me: "Yes....."
Me: "He saved the ball right when it was about to go in!...holy crap."
Guy Next to Me: "You're missing the poin........nevermind"
Guy Next to Me: *Maybe if I don't move....or breathe...he won't know I am here, Jurassic Park Style*
Me: *Guy totally missed that awesome save, VMware rocks*

Here's the link. Spend 5 minutes and read it over; it's an awesome quick post:
http://www.houseofbrick.com/blog/vsphere-5-advantages-for-vbca-part-4-vmotion.html?blogger=solutionsarchitects

Book Review: "Critical VMware Mistakes You Should Avoid" by Larry Loucks


A while back I was doing some shopping on Amazon for a few technical books that I could read in my free time; co-workers of mine had recommended a handful of books that they thought I would like. My day job provides a book allowance for employees and I had planned to take advantage of the whole shebang at once. I had funneled down to the last bit of allotted money and had stayed true to the list compiled from co-worker recommendations. As I poked around some VMware related books, Larry Loucks' "Critical VMware Mistakes You Should Avoid" popped up as a recommendation: "Buy this book and get this one with it for a crazy deal!". I decided to check it out, and while glancing at the details page I wondered what a 115 page book could really cover in comparison to 700 page books detailing all things that had anything to do with VMware. I decided to take the plunge and grab the book with my last remaining bit of allotment. This turned out to be an awesome idea. The easiest way for me to tell you what I think about Larry's book is to throw a list at ya, so here it is:


  1. If asked to sum up what I think about the book in one sentence I would probably say something along these lines: "This book is a must read for IT professionals dealing with VMware who would like to understand some crucial deep concepts explained in a fashion that all degrees of understanding can benefit from."
  2. The book starts slow. At first I found myself thinking, "OK I get what you are saying, let's move on", but after a few pages, I could not put the book down. I was traveling at the time I was reading the book, and if I wasn't working I was pretty much trying to spend my time reading this book. Basically you start a section and the section starts relatively slow by explaining the basics and in what ways people do things incorrectly, and then he goes in for the deep dive, explains why you do it the way he suggests, and ties it all together. This is the prime reason this book turned out to be so valuable to me: seeing how the simple creates the foundation for the complicated, and then getting the explanation of how the complicated works.
  3. One of my favorite things about the book is the real world stories he shares from his consulting experiences. Larry Loucks has been in IT for over 24 years, and for the past seven has specifically focused on virtualization and worked for VMware. And as it says in the first paragraphs of the book, hindsight is 20/20. The idea of his book is to present you with that "hindsight" that he has gained to help you now.
  4. This book expects that you already have a VMware foundation. While at times he does cover relatively simple basic things, you will need that foundation for when he hits you with the awesome stuff like broadcast storms and proper vLANing, transparent page sharing, CPU scheduling, details of performance metrics, etc... 
  5. Lets say for instance you are an IT professional with the VCDX under the belt or a certification similar to that and above, you may not get a heck of a lot out of this book. My guess is that if you are that high in the food chain the advanced topics in the book, when they come, may not teach you anything you don't already know. Having said that, I work with some of the smartest guys in the industry for VMware performance and design. While talking with a couple of them about the content in the book, they were interested in reading it, and were excited to hear some of the principles we share with our customers which we hold as best practices reiterated in this book.
  6. Pictures and Graphs....gimmie gimmie. I originally started my dive into computers and tech in the graphics and web design realm. I am a right brained person who's slowly let the left side in, but I still learn best with visuals and I believe I always will. Although the graphics in this book aren't fine art or detailed beyond belief, they made a world of difference while reading through this book. Now when I think about certain deep dive topics that were discussed in this book, the visuals used instantly come to mind.
  7. Don't be surprised when the authorship of the book isn't up to par with some of the more professional, or should I say, more "Official...eehhh heemmm....Boring, but more polished" authors out there. Lucky for me, Larry's writing style is how I personally enjoy reading books. His wording and structure are easy to read, you don't need to spend time wondering what he just said every 5 seconds, and I like when the author just seems relaxed and has a good time writing, which he pulls off well. Maybe I'm crazy, but I think that when an author is relaxed, can make jokes in his writing, writes in simple terms, and can still relay an excellent message, he knows the stuff better than most others do.
  8. Typos, their r qwiete aye feuw of tthem. This is my biggest complaint about the book. I am a reader who, when I am in the zone reading and trying to absorb, gets totally thrown off when I hit a typo. Maybe that's just me as a reader....but I don't think I am alone. But when my biggest complaint about a book is typos, I think it has accomplished its mission.
In my next few blog posts, I will take a few key topics from "Critical VMware Mistakes You Should Avoid", explain them in detail, describe why they are important to know, and hopefully provide some visuals for those of you who are like me. Topics will include: Transparent Page Sharing (*sigh* vRAM licensing......YAY it's gone!), CPU Scheduling, and proper vLANing to avoid broadcast storms.

I give this book:





9 out of 10 monsterrrrrrs: "Highly Recommended"