As you saw in a previous blog, XenDesktop 7.5 is able to achieve an average IOPS value of less than 1/10th of an IOPS per user. Of course, when you put out unbelievable results like this, you hear a lot of comments trying to find holes in the results or test procedures. This is as it should be; it is part of any good scientific method.
In order to show a more complete picture of the value of the new Provisioning Services RAM Cache with Disk Overflow, we gathered additional details from the Citrix Solutions Lab’s tests. This set of data covers the duration of the entire test, including logons for roughly 100 users. (Sorry, it doesn’t include boot. However, booting is mostly a read operation that PVS can handle with server-side cache.)
The results are still just as stunning as the steady state:
For a physical host, we accumulated IOPS numbers for each of the virtual desktop sessions and then combined them into a single graph. As you can see, during the logon portion of the test we had a peak (and I mean the maximum IOPS value) of 12 IOPS.
What if we don’t break it down by user? What would the host’s total IOPS graph look like?
The absolute peak is 155 IOPS on a host that is running 100 VDI VMs.
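To put that host-level peak next to the per-session numbers, here is a minimal sketch of the arithmetic. It only uses the figures quoted above (155 IOPS, 100 VMs, a 12 IOPS per-session maximum); the function and variable names are my own for illustration.

```python
# Figures quoted in the post
HOST_PEAK_IOPS = 155      # absolute peak for the whole host
VM_COUNT = 100            # VDI VMs running on that host
SESSION_PEAK_IOPS = 12    # highest value seen for any single session

def per_vm_share(host_iops: float, vms: int) -> float:
    """Average IOPS each VM contributes when the host sees host_iops in total."""
    return host_iops / vms

peak_share = per_vm_share(HOST_PEAK_IOPS, VM_COUNT)

# Worst case if every session hit its own peak at the same instant
simultaneous_worst_case = SESSION_PEAK_IOPS * VM_COUNT

print(f"Average per-VM IOPS at the host peak: {peak_share:.2f}")
print(f"Host IOPS if all sessions peaked together: {simultaneous_worst_case}")
```

The gap between the two numbers suggests the individual session peaks did not line up in time, which is what you would expect from staggered logons.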
Test details were as follows:
- LoginVSI 4.0 medium workload
- Hypervisor: Hyper-V 2012R2 and vSphere 5.5
- Virtual Machine: Windows 7, 2 vCPU and 2.5 GB of RAM (512 MB as defined for the RAM Cache)
Based on results like this, I’m left to wonder how many users I can support on my mid-1990s college PC (Pentium 486 with a 420MB hard drive) :)
From the virtual mind of Virtual Feller
It is amazing when you’ve been focused on a technology for so long that you start to see major improvements. In 2010, I provided my original guidance on XenDesktop IOPS. Four years later, have we seen any major improvement? See for yourself.
As you might be aware, I’ve been working with the Citrix Solutions Lab on validating standardized designs. These validated designs are published as Citrix Design Guides. Part of this latest round of testing with XenDesktop 7.5 and XenApp 7.5 was focused on the new Provisioning Services write cache option “RAM Cache with Overflow to Disk”. When looking at XenApp 7.5, we observed some astounding results, as detailed in the following blogs:
In addition to these, Dan Allen also released a blog on “Turbo-Charging Your IOPS – Part 2” that showed additional impressive results.
But back to the Solutions Lab testing… I’ve finally started going through the results from the XenDesktop 7.5 portion of the test. And what we see is even more amazing than the XenApp tests (which were impressive).
Using the medium workload from LoginVSI 4.0, we observed the following steady state IOPS:
- MCS: 10 IOPS
- PVS with Disk Cache: 9.2 IOPS
- PVS (RAM Cache with Overflow to Disk): 0.09 IOPS
And before you ask, we saw very similar results with vSphere 5.5.
We used Windows 7 virtual machines with 2 vCPU and 2.5 GB of RAM (512 MB was defined for the RAM cache).
With this one feature within Provisioning Services, we got steady state IOPS to be less than 1/10th of an IOPS without any special configurations. Impressive.
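If you want to see how big the gap is, a quick sketch of the ratios follows. The three steady state values are the ones listed above; the dictionary and function names are just illustrative.

```python
# Steady state IOPS from the LoginVSI 4.0 medium workload results above
steady_state_iops = {
    "MCS": 10.0,
    "PVS with Disk Cache": 9.2,
    "PVS (RAM Cache with Overflow to Disk)": 0.09,
}

RAM_CACHE = "PVS (RAM Cache with Overflow to Disk)"

def reduction_factor(other_iops: float, ram_cache_iops: float) -> float:
    """How many times more IOPS the other option needs than the RAM cache option."""
    return other_iops / ram_cache_iops

factors = {
    name: reduction_factor(iops, steady_state_iops[RAM_CACHE])
    for name, iops in steady_state_iops.items()
    if name != RAM_CACHE
}

for name, factor in factors.items():
    print(f"{name}: about {factor:.0f}x the IOPS of the RAM cache option")
```

In other words, both of the other options need on the order of a hundred times the steady state IOPS.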
The main challenge I see is how the Provisioning Services team will improve upon this. All I can say is “Good luck!”
As technology changes, so too does a recommendation.
For years when you deployed XenApp servers with Provisioning Services, the storage Read:Write ratio would be 10:90. This is still the case in most scenarios. But in analyzing the latest data from the Citrix Solutions Lab, who were testing the “RAM Cache with Overflow to Disk” option, we encountered some results that will make us revisit some of our old recommendations.
- IOPS: For a medium workload on XenApp 7.5 on Hyper-V 2012R2, the average IOPS per user is 1, as explained in the previous blog.
- R:W Ratio: When using the new write cache option on Hyper-V 2012R2, the read:write ratio changes to 40:60. (Note: These numbers are taken at the physical host layer and not the VM layer)
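One way to read the ratio shift in the list above is to ask how many writes hit the disk for every read. The sketch below does that arithmetic under one loud assumption of mine: that read activity stays the same and only writes change. The post does not state this, so treat the result as a rough illustration rather than a measured number.

```python
from fractions import Fraction

def writes_per_read(read_part: int, write_part: int) -> Fraction:
    """Number of write I/Os per read I/O for a given read:write ratio."""
    return Fraction(write_part, read_part)

old_ratio = writes_per_read(10, 90)   # traditional PVS guidance, 10:90
new_ratio = writes_per_read(40, 60)   # RAM Cache with Overflow to Disk, 40:60

# ASSUMPTION: reads are unchanged, so the shift comes entirely from fewer writes
write_reduction = 1 - new_ratio / old_ratio

print(f"Writes per read at 10:90: {float(old_ratio):.1f}")
print(f"Writes per read at 40:60: {float(new_ratio):.1f}")
print(f"Implied drop in write I/Os: {float(write_reduction):.0%}")
```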
Why is this? Why the change?
Think about what the RAM Cache with Disk Overflow does… It uses a section of allocated VM RAM to cache disk activity. As this cache fills up, it will start to move portions to disk. If you allocate enough RAM, you significantly reduce the number of IOPS (especially write IOPS). Look at the differences between PVS Disk and RAM Cache options.
We’ve significantly reduced write activity because writes go to RAM. And whatever writes do make it to disk from the RAM Cache are bigger block sizes, thus also helping to reduce IOPS.
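To make the idea concrete, here is a toy model of a write cache that absorbs writes in RAM and spills the oldest blocks to disk in one larger I/O when it fills up. This is not how Provisioning Services is implemented; the class, the eviction rule, the batch size, and the workloads are all assumptions of mine, chosen only to show why the disk sees fewer, bigger writes.

```python
class ToyRamCacheWithOverflow:
    """Toy model only: NOT the actual PVS algorithm."""

    def __init__(self, ram_blocks: int, flush_batch: int):
        self.ram_blocks = ram_blocks      # how many blocks fit in the RAM cache
        self.flush_batch = flush_batch    # blocks moved to disk per overflow
        self.ram = {}                     # block address -> True, oldest first
        self.disk_write_ops = 0           # write I/Os that actually reach disk

    def write(self, block: int) -> None:
        if block in self.ram:
            return                        # rewrite of a cached block stays in RAM
        if len(self.ram) >= self.ram_blocks:
            self._overflow()
        self.ram[block] = True

    def _overflow(self) -> None:
        # Move the oldest blocks out as ONE larger write instead of many small ones
        for block in list(self.ram)[: self.flush_batch]:
            del self.ram[block]
        self.disk_write_ops += 1

def run(workload, ram_blocks=64, flush_batch=16) -> int:
    cache = ToyRamCacheWithOverflow(ram_blocks, flush_batch)
    for block in workload:
        cache.write(block)
    return cache.disk_write_ops

# Hypothetical workloads: 1,000 writes each
fits_in_ram = [i % 50 for i in range(1000)]   # only 50 distinct blocks
overflows = list(range(1000))                 # 1,000 distinct blocks

fits_ops = run(fits_in_ram)
overflow_ops = run(overflows)
direct_ops = len(overflows)   # assumes a plain disk cache does one I/O per write

print(f"Working set fits in RAM: {fits_ops} disk writes")
print(f"Working set overflows:   {overflow_ops} disk writes (vs {direct_ops} direct)")
```

In this toy setup, the workload that fits in RAM never touches the disk, and the one that overflows reaches disk in 59 batched writes instead of 1,000 individual ones.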
And finally, if you look at the disk idle time on the physical host, you can clearly see that the disks have a higher idle percentage when using the new RAM Cache with Disk Overflow option within PVS because we have less data going to the disk.
So far, the RAM Cache with Disk Overflow option is looking very promising. Soon, I’ll show you what it can do for Windows 7 workloads.
For this setup, we used:
- LoginVSI 4.0 with a medium workload
- Hyper-V 2012R2
- XenApp 7.5 running on Windows Server 2012R2
- 6 vCPU, 16 GB RAM, 2 GB RAM Cache per VM
- 7 VMs per physical host
As we all know, IOPS are the bane of any application and virtualization project. If you don’t have enough, users will suffer. If you have too many, you probably spent too much money and your business will suffer. So we are always trying to find ways to more accurately estimate IOPS requirements as well as finding ways to reduce the overall number.
About 5 months ago, I blogged about IOPS for Application Delivery in XenDesktop 7.0. In the blog, I explained that for the XenApp workload, Machine Creation Services, when used in conjunction with Windows Server 2012 Hyper-V, required significantly fewer IOPS than Provisioning Services. With the release of the 7.5 edition of XenApp and XenDesktop, I wanted to see what the latest numbers were on Windows Server 2012R2 Hyper-V while using the same LoginVSI medium workload. In addition, after reading Miguel Contreras’s blog on “Turbo Charging IOPS”, I wanted to see what impact the Provisioning Services “RAM Cache with Overflow to Disk” option would have on the results.
If you aren’t familiar with this caching option, it is fairly new and recently improved, and I suggest you read Miguel’s blog “Turbo Charging IOPS” to learn more. But essentially, we use RAM for the PVS write cache, and it moves portions to disk as RAM gets consumed, thus overcoming the stability issues you would have with a RAM-only cache option as the cache filled up. For this test, we allocated 2GB per XenApp VM. We only have 7 XenApp 7.5 VMs on the host, requiring 14GB total.
The results are impressive (at least to me they are).
On average, each XenApp 7.5 user requires 1 IOPS. If you want to be safe and go with the 95th Percentile, you have roughly 2 IOPS per user. We were able to achieve this without any complex configurations. We were able to achieve this without adding any new hardware to our servers. We were able to achieve this by literally flipping a switch within Provisioning Services. This is done with a little RAM and spinning disks only, no SSDs.
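As a rough sizing sketch, the numbers above turn into a couple of multiplications. The per-user IOPS, the 2GB cache, and the 7 VMs per host come from this test; the user count of 500 is a hypothetical value I picked purely to have something to plug in.

```python
# Figures from the test described above
AVG_IOPS_PER_USER = 1
P95_IOPS_PER_USER = 2
RAM_CACHE_GB_PER_VM = 2
VMS_PER_HOST = 7

def host_ram_cache_gb(vms: int, cache_gb_per_vm: int) -> int:
    """RAM set aside for the PVS write cache on one host."""
    return vms * cache_gb_per_vm

def required_iops(users: int, iops_per_user: int) -> int:
    """Storage IOPS to plan for at a given per-user figure."""
    return users * iops_per_user

users = 500  # HYPOTHETICAL user count, not from the test

cache_gb = host_ram_cache_gb(VMS_PER_HOST, RAM_CACHE_GB_PER_VM)
avg_iops = required_iops(users, AVG_IOPS_PER_USER)
safe_iops = required_iops(users, P95_IOPS_PER_USER)

print(f"RAM cache per host: {cache_gb} GB")
print(f"{users} users: {avg_iops} IOPS on average, {safe_iops} IOPS at the 95th percentile")
```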
Some of you might also be wondering why MCS is lower than PVS (Disk Cache). This is one of the intrinsic benefits of MCS when deploying a Windows 2012R2 XenApp server on Windows Server 2012R2 Hyper-V. MCS is able to take advantage of the larger block sizes within the new .VHDX files, thus reducing IOPS requirements.
I keep hearing people say that Citrix needs to show some love to Provisioning Services as it is such a great product. I think the RAM Cache with Overflow to Disk helps.
Coming soon… VDI workloads as well as other hypervisors.
Virtual Feller’s virtual thoughts
If it wasn’t for the cost, I would…
Cost is one of the major barriers to doing almost anything. With enough money and resources, a person can do anything, but this makes a lot of things unfeasible because we don’t have an unlimited supply of money.
When we tried to create a solution to mobilize Windows applications for 500 users, cost was a major concern. How can we create this solution while keeping costs in check?
Let’s use local storage!
Of course anytime you talk about local storage, you get tons of negative reasons why it won’t work.
When it comes to storage, the fear is that you won’t have enough performance to keep up with user demands. This is understandable, especially as servers get faster and traditional disk spindles remain the same, spinning at 15,000 RPM. However, XenDesktop App Edition (XenApp) is different. It is different because you don’t have a single OS for each user; you have a single OS for many users. And because of this one important point, storage performance is not what you would expect.
But this is mostly theory. Theory is good, but I like to see theory put to the test. As I’ve said before, we wanted to validate that this solution is in fact viable, which is why we had the Citrix Solutions Lab help us with the testing.
First, we need to understand our Read/Write ratio. For Machine Creation Services, we have typically said that for Windows 7 desktops, we have about a 40/60 ratio between reads/writes; Provisioning Services is 10/90. What about a Windows Server 2012 workload? As we’ve seen in previous versions, Provisioning Services has a similar R/W ratio regardless of the operating system. What about Machine Creation Services? This is the first release where Machine Creation Services can do Windows Server 2012. Will it resemble a Windows 7 desktop R/W ratio?
Not even close.
I will be completely honest with you, this result completely shocked me. It surprised me so much that we ran the test 3 different times and got very similar results. I was still skeptical and had them re-run the test a 4th time roughly 3 weeks later. Same results (all using Windows Server 2012 with Hyper-V, by the way).
So the R/W ratios are very different between Windows 7 and Windows Server 2012. What about steady state IOPS per user? Just so you know, when trying to determine steady state IOPS, I prefer to look at the 95th percentile instead of an average. That way we make sure we don’t under-allocate storage. If we look at the Windows Server 2012 test using Machine Creation Services, you get the following results:
Regardless of which of the 4 tests I looked at, the numbers and graphs were almost identical. This is the highest of the 4 tests, resulting in 6 IOPS per user at the 95th percentile (average is roughly 5 IOPS).
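For anyone who wants to reproduce the "95th percentile versus average" comparison on their own data, here is a minimal sketch using the nearest-rank method. The sample values are made up for illustration (they are not the lab's measurements), and nearest-rank is my choice of method; the post does not say how the lab computed its percentile. The last two lines use the post's own figures: 500 users at 6 and 5 IOPS.

```python
def percentile_nearest_rank(samples, pct: int):
    """Nearest-rank percentile: smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = (pct * len(ordered) + 99) // 100   # integer ceiling of pct/100 * n
    return ordered[max(rank, 1) - 1]

# HYPOTHETICAL per-user steady state IOPS samples, not real test data
samples = [3, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 9]

mean_iops = sum(samples) / len(samples)
p95_iops = percentile_nearest_rank(samples, 95)

print(f"Average: {mean_iops:.1f} IOPS, 95th percentile: {p95_iops} IOPS")

# Sizing with the figures quoted in the post: 500 users, 6 IOPS (p95) vs 5 IOPS (average)
USERS = 500
iops_at_p95 = USERS * 6
iops_at_avg = USERS * 5
print(f"Plan for {iops_at_p95} IOPS at the 95th percentile rather than {iops_at_avg}")
```

Sizing on the average would leave you short during the busier parts of the test, which is the reason for preferring the 95th percentile.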
So what does this mean? It means that local storage, as we configured it within the Mobilizing Windows Applications design guide, is a viable, low-cost option.
Daniel – Lead Architect