With desktop virtualization, we hear more and more about how important IOPS are to being able to support the virtual desktop. I’ve had a few blogs about it and plan to have a few more. What I wanted to talk about was an interesting discussion I recently had with 3 Senior Architects within Citrix Consulting (Doug Demskis, Dan Allen and Nick Rintalan). There are 3 smart guys who I talk to fairly regularly and the discussions get quite interesting.
This particular discussion was no different. We were talking about the importance of IOPS, RAID configs, spindle speeds with regards to an enterprise’s SAN infrastructure. (Deciding if you are going to use a SAN for your virtual desktops is a completely different discussion that I’ve had before and Brian Madden had more recently). But for the sake of this article, let’s say you’ve decided “Yes, I will use my SAN.” If your organization already has an enterprise SAN solution, chances are that the solution has controllers with plenty of cache. Does this make the IOPS discussion a moot point? If we simply use an IOPS calculator (at least the ones I’ve seen) and do not take into account the caching capabilities of the SAN controllers, won’t we over-provision our virtual desktop environment and end up wasting more money/resources?
Many of us who are familiar with XenDesktop knows that changes made to the golden disk image, when delivered via Provisioning services, is stored in a PVS Write Cache. From numerous tests and implementations, we know that 80-90% of the IO activity from a virtual desktop will be writes. If we configure the SAN Controllers to be 75% write (assuming we have battery-backed write cache controllers), we allow the controllers to allocate more cache for write operations, thus helping to offload the write IO to the disk, which raises the number of effective IOPS the storage infrastructure can support. Think of the controller’s caching capabilities as a large buffer for our disks. If our disks can only support so many write operations, the controller cache stores the writes until the disk is able to write it to the platter. This cache allows the infrastructure to keep moving forward with new operations even though the previous operations were not written to the disk yet. They are all buffered. Just remember, we aren’t reducing the total number of IO operations, we are just buffering them with the controller cache.
Think about it another way. If we encounter a storm where each user will require 10MB of write operations and the storage controller has a 4GB cache, that one controller can support 400+ simultaneous users for this particular storm, and we haven’t even talked about the disk IOPS yet!!! With this scenario, wouldn’t a single disk spindle be able to support this particular storm because the controller is buffering everything? And what’s also interesting is those write operations are being flushed to disk continuously so the number of users the controller will be able to support would be much, much higher.
So if we have cache on our controllers, which most SAN controllers I’ve seen lately have, are we over designing the storage infrastructure by only focusing on IOPS? (this is assuming you are using SAN and not local disks on your hypervisor which I talk about a lot as well). Just remember that those write operations must eventually get written to disk. So if we know what our controller cache is capable of, and we know the amount of storage required for a particular storm (logon, boot, logoff, etc), can’t we support more users (and I mean a lot more users) on the SAN?
What do you think?
Daniel – Lead Architect