If it wasn’t for the cost, I would…
Cost is one of the major barriers to doing almost anything. With enough money and resources, a person can do just about anything, but because money is not unlimited, a lot of things become unfeasible.
When we tried to create a solution to mobilize Windows applications for 500 users, cost was a major concern. How can we create this solution while keeping costs in check?
Let’s use local storage!
Of course, anytime you bring up local storage, you hear plenty of reasons why it won't work.
When it comes to storage, the fear is that you won’t have enough performance to keep up with user demands. This is understandable, especially as servers get faster and traditional disk spindles remain the same, spinning at 15,000 RPMs. However, XenDesktop App Edition (XenApp) is different. It is different because you don’t have a single OS for each user; you have a single OS for many users. And because of this one important point, storage performance is not what you would expect.
But this is mostly theory. Theory is good, but I like to see theory put to the test. As I’ve said before, we wanted to validate that this solution is in fact viable, which is why we had the Citrix Solutions Lab help us with the testing.
First, we need to understand our read/write ratio. For Machine Creation Services, we have typically said that Windows 7 desktops have about a 40/60 ratio between reads and writes; Provisioning Services is 10/90. What about a Windows Server 2012 workload? As we've seen in previous versions, Provisioning Services has a similar R/W ratio regardless of the operating system. What about Machine Creation Services? This is the first release where Machine Creation Services supports Windows Server 2012. Will it resemble the Windows 7 desktop R/W ratio?
Not even close
I will be completely honest with you, this result completely shocked me. It surprised me so much that we ran the test 3 different times and got very similar results. I was still skeptical and had them re-run the test a 4th time roughly 3 weeks later. Same results (all using Windows Server 2012 with Hyper-V, by the way).
So the R/W ratios are very different between Windows 7 and Windows Server 2012. What about steady state IOPS per user? When trying to determine steady state IOPS, I prefer to look at the 95th percentile instead of an average; that way we make sure we don't under-allocate storage. If we look at the Windows Server 2012 test using Machine Creation Services, you get the following results:
Regardless of which of the 4 tests I looked at, the numbers and graphs were almost identical. This is the highest of the 4 tests resulting in 6 IOPS per user at the 95th percentile (average is roughly 5 IOPS).
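To make the 95th-percentile point concrete, here is a minimal sketch of how you would derive both numbers from raw per-user measurements. The sample values are made up purely to mirror the behavior described (an average around 5 IOPS with a 95th percentile of 6); they are not the lab's actual data.

```python
# Illustrative only: these per-user IOPS samples are invented to mirror
# the described result (~5 IOPS average, 6 IOPS at the 95th percentile).
import statistics

def percentile(samples, pct):
    """Nearest-rank percentile: the value at or below which pct% of samples fall."""
    ordered = sorted(samples)
    rank = max(1, round(pct / 100 * len(ordered)))  # 1-based nearest rank
    return ordered[rank - 1]

# Hypothetical steady-state IOPS measurements across a group of users
samples = [4, 4, 5, 5, 5, 5, 5, 5, 6, 6, 5, 4, 5, 6, 5, 5, 5, 6, 5, 6]

avg = statistics.mean(samples)
p95 = percentile(samples, 95)
print(f"average: {avg:.1f} IOPS, 95th percentile: {p95} IOPS")
```

Sizing to the 95th percentile rather than the average is what keeps the occasional busy interval from becoming a storage bottleneck.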
So what does this mean? It means that local storage, as we configured it within the Mobilizing Windows Applications design guide, is a viable, low-cost option.
Daniel – Lead Architect
Two years ago, I wrote a blog called “Lessons Learned with vCPU allocation“. This was still fairly early in the world of virtual desktops. But with numerous successful projects, we were able to start generating sizing estimates for virtual desktops. We talked about how many vCPUs we should allocate, how many users we expect to get per physical core, how much RAM we need and how many IOPS will we generate. I wanted to go back and see if some of the best practices I offered years ago still stand up to scrutiny. If not, I want to know why.
First, let’s look at how CPU recommendations have changed (BTW, I’m only looking at Windows 7).
| User Group | 2011 Estimate (Users/Core) | 2013 Estimate (Pooled VDI) | 2013 Estimate (Pooled VDI with PVD) |
|---|---|---|---|
| Light | 8-10 | Dual Socket: 13 / Quad Socket: 11 | Dual Socket: 11 / Quad Socket: 9 |
| Normal | 6-8 | Dual Socket: 10 / Quad Socket: 8 | Dual Socket: 8 / Quad Socket: 6 |
| Heavy | 2-4 | Dual Socket: 5 / Quad Socket: 4 | Dual Socket: 4 / Quad Socket: 3 |
A few things to notice: a quad socket server does not scale linearly from a dual socket server, and the number of users we expect to get from a core has increased, which means we can pack more users onto a server. What changed? Why are we higher now than 2 years ago? Processors have gotten faster, and many of the software-based hypervisor-related instructions have been incorporated into the chip. Plus, we've learned to properly optimize our desktops, which helps increase user density.
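As a quick sanity check on what non-linear scaling means for density, here is a back-of-envelope calculation. The per-socket core counts are hypothetical example servers (not a recommendation); the users-per-core figures are the 2013 pooled-VDI estimates for Normal users from the table above.

```python
# 2013 pooled-VDI estimates for Normal users (from the table above)
users_per_core = {"dual": 10, "quad": 8}

dual_socket_cores = 2 * 8   # hypothetical: two 8-core processors
quad_socket_cores = 4 * 8   # hypothetical: four 8-core processors

dual_capacity = dual_socket_cores * users_per_core["dual"]
quad_capacity = quad_socket_cores * users_per_core["quad"]

print(f"dual socket: {dual_capacity} users")  # 160
print(f"quad socket: {quad_capacity} users")  # 256
```

Note the quad socket box has twice the cores but delivers only 1.6x the users, which is exactly the non-linear scaling the table shows.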
Do you think anything changed with RAM recommendations?
| User Group | 2011 Estimate | 2013 Estimate (Pooled VDI) | 2013 Estimate (Pooled VDI with PVD) |
|---|---|---|---|
| Light | 1-1.5 GB | 1 GB | 1 GB |
| Normal | 1.5-2 GB | 2 GB | 2 GB |
| Heavy | 4 GB | 4 GB | 4 GB |
Looks the same, which is not unexpected. We are using the same operating system, the same applications, etc.
[Table: steady state IOPS estimates by user group, 2011 vs. 2013 (Pooled VDI and Pooled VDI with PVD); values not recovered]
The original IOPS numbers for 2011 were based on Provisioning Services. Looking at the 2013 estimates, we are still in line with our recommendations from 2 years ago for Provisioning Services. Again, same OS and applications generating the same load.
All in all, with slight modifications to user density based on CPU, our original recommendations are still valid. You can find these latest recommendations, along with many other best practices, in the Citrix Virtual Desktop Handbook.
I also just got back from BriForum 2011 in Chicago, where I attended two sessions that reinforced my belief that blanketing antivirus across all of my virtual desktops probably isn't the best thing. First, Jim Moyle's session was a deep dive into Windows IOPS, showing how different actions impact IOPS requirements in a virtual desktop. Let's just say the graphs for certain antivirus and security products were absolutely crazy. Basically, if you run antivirus in a virtual desktop, you might as well double your IOPS requirements (this is not news to me or many people in the crowd, but the graph was that telling). Michael Thomason, who presented on how to mitigate IOPS requirements, also said their antivirus killed their storage and that they had to take drastic measures by limiting what was being scanned. Then I remembered Citrix's recommendations for antivirus in a virtual desktop: only scan writes to local files where the data changes, while excluding a long list of other folders. In other words, scan as little as possible.
Three different areas and I get the same result: Antivirus has a noticeable disk impact.
So what we have is a situation where we double storage requirements for something everyone believes is a requirement, then take drastic steps to limit how much and how often it runs in order to reduce those storage requirements. Does anyone see the problem here? People think they need it, but take steps to limit it. Many believe that what was once good for the desktop is still good for the virtual desktop. Fortunately, things have changed, and we have to question our old beliefs. Unfortunately, changing old beliefs, especially anything to do with the security of IT systems in an enterprise, is a very big uphill battle. How many of you want to walk into a financial company and say, "Remove the antivirus software from your desktops"? They would laugh at you while security threw you out the front door.
However, with the traditional desktop, the costs of using antivirus were minor. We just did it because it provided a sense of security. We never cared about storage optimization and performance on a traditional endpoint (at least I never did). With virtualization, things changed. We do care about storage performance; I know more about IOPS now than I ever cared to know, and the Project VRC Phase III tests show how to reduce and optimize IOPS. So why is no one asking whether one of the biggest IOPS consumers is really a requirement? Because it is almost a forbidden topic.
Now let me make this clear: I do not have a virus scanner on my laptop, I do not have a virus scanner on my home desktop, I have never had a virus scanner on any of these devices, and somehow, I have never had a virus. Now the smart ones reading this are asking, "But if you don't have a virus scanner, how do you know you don't have a virus?" Because every so often (maybe every six months to a year), I run a free scanner that doesn't require an install, just to see if everything is still clean (it always is).
How can I go so long without getting a virus? Is it because I don't go online? Is it because I'm completely disconnected from the network? No. I work like anyone else. Staying virus free used to be pretty hard, but it has gotten much easier over the past few years because there are systems in place protecting me from doing very stupid things. As I see it, there are only a few places where I could pick up a virus, and each one has some protection:
- Corporate email: The Citrix IT team runs virus protection on the Exchange email servers, so I can feel pretty confident that I am safe with corporate email.
- Personal email: Google, Yahoo and Microsoft run virus protection on their email systems. When I receive an attachment and try to open it, each one scans the file first (although they are probably just reading my email and realizing I lead a pretty boring life).
- Internet: I usually stay on well-known, safe sites (especially on my work computer), but sometimes I accidentally hit a pop-up and the next thing you know, I'm somewhere I don't want to be. Luckily, browsers are much smarter than they used to be:
  - Some will tell me if the site I'm going to isn't safe
  - Some will ask before downloading anything
  - Some will scan downloaded files for viruses
  - Even Windows 7 doesn't install anything unless I tell it that it is ok
  - Most run with user privileges and not administrative privileges
- Sharing USB drives: I don't. If someone wants a file from me, I usually just ask for an email address. This doesn't happen very often, though, as I have nothing of value on my laptop. :) And if you are running a virtual desktop, you can simply disable this functionality.
- Network: If someone else gets a virus, there is a chance it will worm its way across the network and infect other desktops. With my firewall enabled, I have some level of protection.
Do these protect me completely? No, and I'm not so naive as to believe they do, but antivirus solutions don't completely protect me either. My point is that these other solutions provide enough protection for the level of risk I can tolerate. Does this mean you should dump antivirus from all of your virtual desktops? No. But I do encourage you to look at whether you need it on every desktop. Maybe you would be better off:
- Splitting your XenDesktop sites into security levels where only the most secure desktops have antivirus because they are dealing with your company’s secret recipe.
- Setting up your environment in such a way that you have blocks of desktops where the one block cannot infect other blocks. That way, in case a virus does get through, the area to attack is much smaller and easier to contain.
- Hosting mission-critical applications as XenApp resources with antivirus enabled, delivered to virtual desktops that do not run antivirus. That way you keep the warm fuzzy feeling of having an antivirus solution, without nearly as large a resource hit as putting it on every desktop.
Whatever you do, think about the decision, the ramifications and your tolerance for risk. Citrix says one size doesn’t fit all for virtual desktops, and I say the same statement can be made for Antivirus.
Deciding between PVS and MCS is tough for many organizations. Although MCS is limited in that it can only deliver virtual machines, it does appear to be easier to set up than PVS. In fact, MCS just works, while PVS requires additional servers and configuration of bootstrap processes like TFTP/PXE. So it sounds like we should be using MCS for everything, right? Not so fast. We need to look at the resource requirements beyond servers, as these might negate the benefit of easier setup and configuration.
First, we know that using PVS requires at least two additional servers (remember, I'm including fault tolerance). MCS doesn't require any extra hardware beyond the hypervisor. Now let's look at storage requirements, and with storage I'm talking about IOPS, our favorite topic.
If you look at PVS, all reads come from one location and all writes go to another. Because of this, we can optimize each system: on the PVS server, we allocate enough RAM so that reads are served from RAM instead of disk, greatly reducing read IOPS.
MCS is different. The reads and writes happen on the same storage. This is a big deal. Look at the graph below.
We know that during desktop startup, we have a huge amount of read IOPS, during logon, it is evenly split, and during the steady state (working), the ratio moves towards writes. Most people are concerned with the boot and logon storms. Because these are more read intensive, you would think that PVS would be the better option for large boot/logon storms as we cache the vDisk in PVS RAM. This line of thinking is correct.
Now, before people say, "Hey, my SAN can cache": you are correct. There are SAN caching solutions, but they cost money, whereas with PVS the caching is just part of the Windows operating system. Because of this, we can see an MCS implementation generating more IOPS than a PVS implementation. How much more? I've seen as much as 1.5x more IOPS. For deployments with 50 desktops, this might not be a big deal, but what if you are talking about 20,000 desktops?
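The back-of-envelope math looks like this. The 1.5x multiplier is the worst case I've observed, and the 6 IOPS per user figure is just a placeholder steady-state value, so treat the output as illustrative rather than a sizing recommendation.

```python
# Rough comparison of shared-storage IOPS for PVS vs. MCS.
# Assumptions (placeholders, not measured values for your environment):
#   - 6 steady-state IOPS per user
#   - MCS generates up to ~1.5x the IOPS of PVS on shared storage
def shared_storage_iops(desktops, iops_per_user=6, mcs_multiplier=1.5):
    pvs = desktops * iops_per_user
    mcs = pvs * mcs_multiplier
    return pvs, mcs

for desktops in (50, 20_000):
    pvs, mcs = shared_storage_iops(desktops)
    print(f"{desktops:>6} desktops: PVS ~{pvs:,.0f} IOPS, MCS ~{mcs:,.0f} IOPS")
```

At 50 desktops the gap is noise; at 20,000 desktops it is tens of thousands of IOPS the storage array has to absorb.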
Some might be thinking about using XenServer IntelliCache to bring these more inline. This has a potential to help lower MCS IOPS requirements, but the products aren’t integrated yet, so I’ve got no data points to share.
But regardless, you need to take the resource requirements into consideration before making your PVS/MCS decision.
I have seen a lot of scalability reports lately around desktop virtualization. This is good in that we can start to see how the different things we do provide better capacity. However, one thing that troubles me is when I see tests allocating only 512 MB or 768 MB of RAM to a Windows 7 VM. Sure, it works, and yes, it successfully completes the scalability test, but remember what the scalability test is testing. It is not telling you how many users YOU will get. It is telling you how well the infrastructure can scale and what bottlenecks we might experience when the hardware is stressed. Unfortunately, because of these tests, too many people believe that they too can roll out a virtual Windows 7 desktop on 512 MB of RAM. I wish that were the case. In fact, I bet Microsoft wishes it were the case as well. But sadly, it is not.
I wanted to provide you with what we (myself, Nicholas Rintalan, Doug Demskis and Dan Allen) figure is a reasonable estimation for resource allocation for Windows 7 and Windows XP desktops when delivered in the hosted VM-Based virtual desktop model (or VDI for short).
| User Group | Operating System | vCPU Allocation | Memory Allocation | Avg IOPS (Steady State) | Estimated Users/Core |
|---|---|---|---|---|---|
| Light | Windows XP | 1 | 768 MB-1 GB | 3-5 | 10-12 |
| Light | Windows 7 | 1 | 1-1.5 GB | 4-6 | 8-10 |
| Normal | Windows XP | 1 | 1-1.5 GB | 6-10 | 8-10 |
| Normal | Windows 7 | 1 | 1.5-2 GB | 8-12 | 6-8 |
| Power | Windows XP | 1 | 1.5-2 GB | 12-16 | 6-8 |
| Power | Windows 7 | 1-2 | 2-3 GB | 15-25 | 4-6 |
| Heavy | Windows XP | 1 | 2 GB | 20-40 | 4-6 |
| Heavy | Windows 7 | 2 | 4 GB | 25-50 | 2-4 |
See anything shocking? How about 1.5 GB of RAM for light Windows 7 users? Remember, we are talking about the typical implementation we have seen. That means the desktop image includes antivirus agents, malware agents, monitoring agents and line-of-business applications. These agents and applications add up (especially line-of-business apps). A light user may only run one or two applications, but those applications are more than Microsoft Word; they are the main line-of-business applications. So even though light users don't hit the CPU hard, they still consume a lot of RAM (of course, these implementations could just put the line-of-business app on XenApp and not worry about providing a true Windows 7 desktop for light users).
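If you want to turn the table into a rough capacity estimate, a simple sketch looks like this. It uses midpoints of the Windows 7 ranges from the table, and the 100-seat user mix is entirely hypothetical; plug in your own assessed numbers.

```python
# Rough sizing from the Windows 7 rows of the table above, using range
# midpoints. The 100-seat user mix is hypothetical, for illustration only.
profiles = {
    "light":  {"ram_gb": 1.25, "iops": 5.0,  "users_per_core": 9},
    "normal": {"ram_gb": 1.75, "iops": 10.0, "users_per_core": 7},
    "heavy":  {"ram_gb": 4.0,  "iops": 37.5, "users_per_core": 3},
}
user_mix = {"light": 50, "normal": 40, "heavy": 10}

total_ram = sum(profiles[g]["ram_gb"] * n for g, n in user_mix.items())
total_iops = sum(profiles[g]["iops"] * n for g, n in user_mix.items())
cores = sum(n / profiles[g]["users_per_core"] for g, n in user_mix.items())

print(f"RAM: {total_ram:.1f} GB, steady-state IOPS: {total_iops:.0f}, "
      f"cores: {cores:.1f}")
```

Even this crude estimate shows why a handful of heavy users can dominate the IOPS budget while barely moving the RAM total.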
Are you wondering what defines the four groups of users? Here is how we define them:
| User Group | Definition |
|---|---|
| Light | One or two applications with no browser-based activity |
| Normal | Multiple applications with limited browser-based activity |
| Power | Many simultaneous applications with extensive browser-based activity and Internet-based applications |
| Heavy | Few applications, but with heavy system resource requirements; data processing, compiling, or graphics manipulation are common tasks |
I’m hopeful that as you start planning your XenDesktop environment, you use realistic approximations on your virtual desktop specifications.
If you want to know more about resource allocation as well as many other areas for planning a XenDesktop environment, then sign up for the XenDesktop Design Handbook. This helps guarantee that you have the latest and greatest design information available.
Daniel – Lead Architect