Sunday, September 30, 2012

VCP5: Resources and References for Study

The VCP5 exam is a well-written and well-structured test. It was also a surprisingly difficult exam, and it caught me off guard. Based on the mock and practice exams I was taking as one part of my preparation, I should have blown the test right out of the water. Instead, I felt like nothing I had done had properly prepared me for the exam, and I was a little mad when I failed it on my first try. So, in an effort to provide a better reference and study companion for anyone interested in taking the VCP5 exam, I plan to write a series of blog posts covering key topics and areas of focus that will help people understand what they are studying and why they need to know it to pass the exam.
I took the VMware 5.0 Fast Track course in Chicago about 6 months ago (if you don't know, attending a VMware certified course is a requirement for becoming certified). I must warn everyone: taking the class will not provide you with 90% of the knowledge you need to pass the exam, as my teacher so commonly claimed. I consider the class to give a person about 20% of what they need to know to pass. If the exam asked questions like "How do I power on a VM?", then sure, the class would give you sufficient knowledge to pass. But I will tell you, you won't find little baby questions like that on the exam :)
What you WILL find in the VCP exam is that topics are covered in a way that makes you say to yourself, "Well, that was a twist on what I thought the question might be." For instance, someone might study how to set the Maximum Transmission Unit (MTU) to allow for Jumbo Frames. They might know from start to finish how to set the MTU, which objects it can be set on, and that 9000 is the value to configure for Jumbo Frames. But when they sit down for the test, it may ask what the default MTU is (1500), and that person may have spent so much time focused on the Jumbo Frame MTU size that they totally spaced on the default. To me, this is a perfect example of how VMware successfully "twists" the exam questions to determine whether the examinee has a deeper understanding of the material.
The main purpose of this post is to provide a list of references that I found helpful while studying for the exam. Sooooo, BOOM!, here's a bulleted list of those references:

  • My favorite reference for studying became the vSphere Documentation Center, found here: http://pubs.vmware.com/vsphere-50/index.jsp. It became my favorite for a few reasons: it is extensive and covers nearly every topic, it is easy to understand, and, most important, it gives you steps to follow when troubleshooting common issues a feature may have. The exam covers a lot of troubleshooting.
  • The VMware Knowledge Base, found here: http://kb.vmware.com. The VMware KB is such an awesome library. My method of study was to pick a topic and then physically go through the steps in a lab I have access to at work. When I ran into an error or couldn't quite get something working, the KB was invaluable.
  • VMware vSphere Clustering Technical Deepdive by Duncan Epping and Frank Denneman was an awesome source for learning the internals of Clustering in vSphere.
  • The VCP5 Exam Blueprint, found here: http://communities.vmware.com/servlet/JiveServlet/previewBody/16726-102-7-23055/VCP510%20Exam%20Blueprint%20Guide%201.4.pdf. I used this to know what I NEEDED to know, and then I would go do it in the lab and document what I did with screenshots.
  • The Official VCP5 Certification Guide is a newly released book from VMware. If I had to summarize this book, I would say it is a more helpful version of the class. I like how the author writes and enjoy his simplicity. The book comes with testing software, which is nice but very misleading because of how easy it is. The ebook version comes with even more testing software that is, unfortunately, just as misleadingly easy.

I recommend having access to a vSphere lab of some kind. I won't directly recommend using a production environment, but if that is the only type of environment you have access to, create yourself a read-only account so you can safely poke around without worrying about accidentally clicking something you didn't mean to! You do not necessarily need to be able to vMotion and enable FT, as long as you have done them before, but knowing where to find these features and where to configure them, coupled with the references above, should be enough to be successful on the exam. Look for future posts whose titles start with VCP5; these will contain gold for studying. Good luck!


Wednesday, September 26, 2012

VMware Certified Professional 5

It has been a while since I composed a new blog post, and there was a really good reason for it...I swear! I had been studying for quite a while to take the VCP5 exam. If you have never attempted this test, be forewarned: it is extremely hard. I had studied super hard; I had even bought the new Official VCP5 Certification Guide published by VMware, blew through the book in a week, and felt very comfortable with my knowledge of the subjects in it. The book comes with testing software, and if you buy the "Special" edition eBook, it comes with more advanced testing software than the book alone. I was blowing the mock exams in that testing software out of the water, scoring 480 out of 500 and taking only 30 minutes to do it. I thought I was set! I signed up, took the exam, and was metaphorically pants'd. My first attempt was a 275 out of 500, with 300 needed to pass. I was surprised at the difficulty of the test; they truly want you to have a deep knowledge of the material and of troubleshooting methods. I felt pretty frustrated because I had put in a lot of hard work, but I knew I needed to change my preparation methods. The key for me was doing more hands-on work rather than just reading more. I signed up for the soonest exam date I could (you have to wait 7 days), and through a lot of hard work and help from my wife and co-workers, I passed the exam on the 20th. I plan to share my study techniques and recommend areas of study in detail in my next few posts while it is fresh in my mind. Thanks to everyone who helped me reach this awesome milestone in my career, and thanks to all of my co-workers at House of Brick Technologies who helped me out, answered questions when I had them, and were supportive of me taking on this exam; you know who you are! More to come on prepping for the exam! WOO HOO

Tuesday, September 4, 2012

Deep Dive: CPU Ready Time

In my last post we tackled the topic "VMware ESX/ESXi CPU Scheduler" with extreme prejudice; this week's post is an extension of that one. Let's talk about CPU Ready Time, how 'bout it? (if you said "No", Click Here...). To put it simply: CPU Ready Time is your enemy. To give a more formal definition: CPU Ready Time is the time a virtual machine spends in a state where it is ready to run but cannot be scheduled onto a physical processor. In other words, the vCPU is "Ready" for processor time but has to wait for it. I mentioned in my last post that CPU Ready Time can be a performance killer and that over-allocation of vCPUs is most likely at the heart of the problem. To throw some data and thoughts at you really quick (I love me a good bulleted list, and I hope you do too):
  • When possible, keep VMs with similar vCPU allocations on the same hosts or clusters. This will help keep CPU Ready Time at a safe level. VMware suggests that 5% CPU Ready Time is within safe limits for a VM to live with; we will discuss safe zones a little later in the post. Regardless of recommendations, everyone should establish their own safe and red zone numbers. Even ours don't coincide with VMware's recommendation.
  • An ESXi host will try its best to keep a VM's requests for processor time running on the same physical cores, for caching purposes. If moving the VM's processor requests to new cores would be more efficient than waiting for the CPU scheduler to grant processor time (CPU Ready Time), the ESXi host will move the existing and new processor requests to different cores. The VM loses the benefit of its warm processor cache, so this move takes time and its effects can be felt in performance, but the host has determined it to be a better choice than letting the VM accumulate any more CPU Ready Time.
  • Many times, CPU over-allocation occurs when P2V'ing a physical machine into your virtual environment. Although your super serious amazing awesome database server needed 12 CPUs in the physical world, monitor it over time and determine how many of those you can knock out of the picture. Maybe it could be taken down to 8 or even 6 vCPUs.
  • In many cases, I would bet that 60-75% of workloads fit within the 1-2 vCPU range, and I would also put money on 75-90% of workloads needing no more than 4 vCPUs. As a side note: I was reading an article on Ars Technica comparing hypervisor host stats for Hyper-V against VMware, and some of those numbers are pretty crazy and, honestly, unnecessary. By the time I need hosts with 320 CPUs (yay Hyper-V), I would hope I had been smart enough to separate that load onto multiple hosts and maybe create a new cluster for high availability instead. I don't doubt there are some massive companies with insane virtual infrastructures that could benefit from hosts that size, but most companies and workloads don't come close. END OF SIDE RANT.
  • I recommend keeping a close eye on CPU utilization and CPU Ready Time, however you go about doing that in your environment, and always looking for opportunities to reduce vCPU assignments or to group VMs with similar vCPU allocations.
  • If you have hosts or clusters with a huge amount of resources that are barely being used by the VMs in your environment, you may be able to get away with not caring too much about mixing VMs with different vCPU allocations, because the effects of CPU Ready Time are masked by the abundance of physical resources. Wait until your workload gets up to a medium amount of usage, though, and your performance will start to ache because of that hidden CPU Ready Time.
With my awesome bullet-ed (the spell checker wanted a hyphen, lame!) list aside, let's take a look at what in the world was going on in the "cliff hanger" picture I left at the bottom of my last post. Here it is once more:
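
[Screenshot: vSphere performance chart showing CPU Usage and CPU Ready for the View Connection Server, with the drop marked "BOOM!"]
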
So what are we looking at here? This is obviously a vSphere performance chart showing CPU Usage and CPU Ready Time for a production VMware View Connection Server. To make a long story as short as I can: I was looking at the currently assigned memory for this machine when I noticed the VM had been assigned 4 vCPUs.

[Screenshot: VM settings showing 4 vCPUs assigned]

I thought to myself, "Why in the world does this VM need 4 vCPUs?" It turns out, after reading some additional View documentation, that 4 vCPUs is the default recommended setting. Anyway, just for fun, I thought I would take a look at the CPU usage to see how much this baby was actually using. Here is what I initially saw once I enabled the CPU Ready chart metric:

[Screenshot: "Real-time" performance chart with the CPU Ready metric enabled]

This VM was consistently getting CPU Ready times of 2,500 to 3,700 milliseconds on the "Real-time" chart. It's one thing to get some bad CPU Ready Time in a blip or moment, but this was consistent and "reliable" CPU Ready Time. This VM's performance was basically getting destroyed. To put these numbers into perspective, let's look at how to interpret them. I recommend learning what good and bad millisecond values are for CPU Ready Time, since that is how the vSphere performance charts present this metric. Many people want percentage values, which are not bad, but because vSphere charts report milliseconds, I find it more useful to read the millisecond value directly and know what it means.

If you know anything about how VMware saves statistical data, this won't be news to you. When you look at any performance chart other than the "Real-time" view (the one that refreshes every 20 seconds), you will see what look like much "higher" numbers. That is because the stats are rolled up: for CPU Ready, each data point on those charts is the ready time summed across the longer interval rather than a single 20-second sample.

To further describe the roll-up effect, here are some basics to work from:
  • The "Real-time" performance chart is updated every 20 seconds. As a good baseline, 120-175 ms is acceptable for CPU Ready Time on the "Real-time" chart; 500 ms is our red zone. For those of you who want percentages, here is the equation:
    • (<currentValueInMilliseconds> / <intervalInMilliseconds>) * 100
    • For example, (170 / 20,000) * 100 gives us about 0.85% CPU Ready Time.
    • Doing the math so you don't have to, our 500 ms red zone is 2.5%.
  • The "Day" performance chart shows the rolled-up statistics for a day in 5-minute intervals. Since our Real-time interval is 20 seconds and 500 ms is the red zone there, our 5-minute red zone is 7,500 ms.
    • (5 * 60 / 20) * 500 = 7,500. Or, to simplify: from here on out, our red zone is 1,500 ms for every minute of interval.
  • The "Week" performance chart shows the rolled-up stats for a week in 30-minute intervals, so 30 * 1,500 = 45,000 ms is the red zone for the Week chart.
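The percentage conversion from the bullets above can be sketched as a small Python function. This is a minimal sketch under stated assumptions: the name `ready_percent` is hypothetical, and it assumes the chart value is the CPU Ready time in milliseconds accumulated over one sample interval of the chart you are reading.

```python
def ready_percent(ready_ms: float, interval_seconds: int) -> float:
    """Convert a CPU Ready value to a percentage of one chart sample interval.

    ready_ms: the millisecond value read off the vSphere performance chart.
    interval_seconds: length of one sample on that chart (20 for "Real-time").
    """
    interval_ms = interval_seconds * 1000
    return ready_ms / interval_ms * 100


if __name__ == "__main__":
    print(f"{ready_percent(170, 20):.2f}%")    # acceptable value from the post
    print(f"{ready_percent(500, 20):.2f}%")    # Real-time red zone
    print(f"{ready_percent(7500, 300):.2f}%")  # Day-chart red zone (5-minute samples)
```

Running it prints 0.85%, 2.50%, and 2.50%. The 500 ms Real-time red zone and the 7,500 ms Day-chart red zone are the same 2.5%, just measured over different sample lengths.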
Although I could do the math, because I am wicked smart, I will let you all do the rest for the Month and Year charts and for any custom charts you make. Make sure you check the statistics intervals on your vCenter(s), as the defaults are not always used. You can access the interval settings from the "vCenter Server Settings" button on the Home page inside vCenter. Here's a screenshot for assistance:

[Screenshot: vCenter Server Settings, statistics interval configuration]

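For the Month and Year charts, the same 1,500 ms-per-minute rule can be scaled with a short Python sketch. The chart intervals below are assumptions based on vCenter's default statistics intervals (5 minutes, 30 minutes, 2 hours, and 1 day); if your vCenter uses different intervals, swap in your own values. The names `RED_ZONE_MS_PER_MINUTE`, `CHART_INTERVAL_MINUTES`, and `red_zone_ms` are hypothetical.

```python
# Red zone from the post: 500 ms of CPU Ready per 20-second Real-time sample,
# which works out to 1,500 ms per minute of sample interval.
RED_ZONE_MS_PER_MINUTE = 500 * (60 // 20)

# Assumed sample interval (in minutes) for each chart. These match vCenter's
# default statistics intervals, but your vCenter may be configured differently.
CHART_INTERVAL_MINUTES = {
    "Day": 5,
    "Week": 30,
    "Month": 120,
    "Year": 1440,
}


def red_zone_ms(interval_minutes: int) -> int:
    """Scale the per-minute red zone to a chart whose samples span interval_minutes."""
    return RED_ZONE_MS_PER_MINUTE * interval_minutes


if __name__ == "__main__":
    for chart, minutes in CHART_INTERVAL_MINUTES.items():
        print(f"{chart:>5}: {red_zone_ms(minutes):,} ms")
```

With the default intervals this works out to 7,500 ms for Day, 45,000 ms for Week, 180,000 ms for Month, and 2,160,000 ms for Year.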
So let me finish the story from earlier about the View Connection Server I was monitoring. To sum things up quickly, the 2,500 to 3,700 ms in the "Real-time" chart screenshot I shared earlier means this VM spent roughly 12.5% to 18.5% (call it 16%) of each 20-second interval unable to do anything; it was just sitting there waiting! When would numbers like that ever be acceptable?
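The 16% figure is just the Real-time conversion applied to the observed values. Here is a quick Python sketch of that arithmetic; the constant and function names are hypothetical, and it assumes 20-second Real-time samples. One caveat worth checking in your own charts: if the VM-level object is selected rather than an individual vCPU, vSphere sums ready time across all of the VM's vCPUs, so the sketch also prints a rough per-vCPU share for the 4-vCPU VM in this story.

```python
REALTIME_INTERVAL_MS = 20 * 1000  # "Real-time" chart: one sample every 20 seconds
VCPUS = 4  # vCPUs on the View Connection Server before the change


def ready_percent(ready_ms: float, interval_ms: int = REALTIME_INTERVAL_MS) -> float:
    """Share of one sample interval spent waiting in CPU Ready, as a percentage."""
    return ready_ms / interval_ms * 100


if __name__ == "__main__":
    low, high = 2500, 3700  # range observed on the Real-time chart
    midpoint = (low + high) / 2
    for label, value in (("low", low), ("midpoint", midpoint), ("high", high)):
        pct = ready_percent(value)
        # Only meaningful if the chart showed the VM-level aggregate, which sums
        # ready time across all vCPUs; dividing by VCPUS approximates one vCPU.
        per_vcpu = pct / VCPUS
        print(f"{label:>8}: {pct:.1f}% of the interval (~{per_vcpu:.1f}% per vCPU)")
```

The midpoint of the observed range (3,100 ms) comes out to 15.5%, which rounds to the 16% used in the story.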

INITIATE SCIENTIFIC STUDY BENCHMARKS

Advanced Scientific Study #1:
IT Manager: Why is this VM so dang slow?
IT Employee: Uhhhh, well it works 84% of the time...
Outcome: NOT ACCEPTABLE

Advanced Scientific Study #2:
PotentialCustomer: What kind of uptime guarantee can we expect if we decide to go with your facility?
PotentialDatacenterFacility: Uhhhh, we are up about 84% of the time around here...
Outcome: NOT ACCEPTABLE

Advanced Scientific Study #3:
Wife: How much do you love me?
Husband: Uhhhh, about 84%...
Outcome: DIVORCED

To finish up here (this post is SOOOO LONG): I asked the customer for a downtime window, went in, and removed 2 vCPUs. The end result of removing those 2 vCPUs is shown in the initial screenshot, highlighted by the "BOOM!". The CPU Ready Time drops off and is almost nonexistent. One last picture: here is a screenshot of the "Real-time" view with the CPU Ready metric active, taken about 30 minutes after I changed the vCPU assignment. Check out the dive CPU Ready Time took. AWESOME.

[Screenshot: "Real-time" chart with CPU Ready, about 30 minutes after removing 2 vCPUs]

I hope this post has helped you understand CPU Ready Time more fully, and that the real-world scenario described here ties it all together. Be a Monsterrrrrr at work: destroy CPU Ready Time.