Tag Archives: IOPS

Machine Creation Services RAM Cache and XenServer IntelliCache


As I was discussing the storage optimization capabilities in the Machine Creation Services vs Provisioning Services debate, I mentioned the use of a XenServer RAM-based read cache. This can be misunderstood as XenServer IntelliCache (a mistake I’m sad to say I’ve made in the past).

XenServer IntelliCache (released with XenServer 5.6 SP1) and XenServer RAM Cache (released with XenServer 6.5) are two different capabilities of XenServer, both of which try to reduce the IO impact on shared storage.

Let’s walk through different deployment scenarios with Machine Creation Services in XenApp and XenDesktop 7.9.

Scenario 1: Shared Storage on any Hypervisor

When defining a host connection, the default storage management option is to use shared storage.

This configuration results in the following architecture:

Default Storage

The virtual machines read (denoted by the black lines) from the master OS disk on shared storage. Writes (denoted with the red lines) from the virtual machine are captured in the differencing disk, located on the same shared storage as the master OS disk.

It is also important to note that the virtual machine reads from its differencing disk as well as from the master OS disk.

Scenario 2: Shared Storage with Optimized Temp Data on any Hypervisor

With XenApp and XenDesktop 7.9, admins, when creating their host connection configuration, can select the “Optimize temporary data on available local storage” option.

Selecting this option results in the following changes to the architecture:

Opt Storage

For non-persistent desktops, instead of the temporary writes going into the shared storage differencing disk, the writes are now captured within the write cache disk on the local hypervisor storage.

But for persistent desktops, the optimize temporary data setting is not used, as all data is permanent. The writes are captured on shared storage within the differencing disk.

The value is that we don’t waste shared storage performance with data we don’t care about. We instead shift the storage IO to local, inexpensive disks.
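The write placement in Scenarios 1 and 2 boils down to a small decision. Here is a minimal sketch of it in Python. The function name and the string labels are made up for illustration and are not part of any Citrix API.

```python
def write_target(persistent: bool, optimize_temp_data: bool) -> str:
    """Return where a VM's disk writes land, per Scenarios 1 and 2.

    Hypothetical helper for illustration; the labels are not Citrix identifiers.
    """
    # Persistent desktops keep all of their writes, so they always use
    # the differencing disk on shared storage.
    if persistent:
        return "shared storage differencing disk"
    # Non-persistent desktops can redirect temporary writes to local disks
    # when the host connection has the optimization enabled.
    if optimize_temp_data:
        return "local write cache disk"
    return "shared storage differencing disk"
```

Only the non-persistent desktop with the option enabled ends up on local storage; every other combination stays on the shared storage differencing disk.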

Scenario 3: Shared Storage with Optimized Temp Data and RAM Caching on any Hypervisor

With XenApp and XenDesktop 7.9, a portion of the virtual machine’s RAM can be used to cache disk writes in order to reduce the impact on local/shared storage. During the creation of a new machine catalog, an admin defines the size of the RAM and disk cache.

The RAM cache operation adjusts the architecture as follows:

For non-persistent desktops with a RAM-based write cache, the writes first go into the non-paged pool portion of RAM within the VM. As the RAM cache is consumed, the oldest data is written to the temporary disk-based write cache.

However, this option is not applicable for persistent desktops due to the risk of data loss. If disk writes are cached in volatile RAM and the host fails, those writes will be lost, potentially resulting in lost data and corruption.

For non-persistent desktops, when used in combination with optimizing temporary data, we not only shift our write performance to low-cost local disks, but we also reduce the amount of write IO activity going to those disks.  This should further help reduce the costs by not requiring the use of local SSDs.
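The "RAM first, oldest data spills to disk" behavior described above can be modeled with a toy cache. This is only a sketch under simple assumptions (fixed-size blocks, oldest-written-first eviction). The class and attribute names are hypothetical, and it does not claim to reflect how the feature is implemented internally.

```python
from collections import OrderedDict


class RamCacheWithOverflow:
    """Toy model of a RAM write cache that spills its oldest blocks to disk.

    All names are hypothetical; this is not the product's implementation.
    """

    def __init__(self, ram_blocks):
        self.ram_blocks = ram_blocks  # how many blocks fit in the RAM cache
        self.ram = OrderedDict()      # oldest entry first
        self.disk = {}                # stands in for the disk-based write cache

    def write(self, block, data):
        # A rewrite replaces any older copy and becomes the newest entry.
        self.ram.pop(block, None)
        self.disk.pop(block, None)
        self.ram[block] = data
        # Once RAM is full, move the oldest blocks to the disk cache.
        while len(self.ram) > self.ram_blocks:
            old_block, old_data = self.ram.popitem(last=False)
            self.disk[old_block] = old_data

    def read(self, block):
        if block in self.ram:
            return self.ram[block]
        # None means "not in the write cache, read the master image instead".
        return self.disk.get(block)
```

With `ram_blocks=2`, writing blocks 1, 2 and 3 leaves blocks 2 and 3 in RAM and pushes block 1 to the disk cache, which is the overflow pattern the scenario describes.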

Scenario 4: Shared Storage with Optimized Temp Data and RAM Caching on Citrix XenServer

If the environment is deployed on Citrix XenServer, the architecture automatically includes a RAM-based read cache on each host.

Non-persistent on XenServer

For both non-persistent and persistent desktops, portions of the master OS image are cached within the XenServer’s Dom0 RAM, so subsequent requests are retrieved from local RAM instead of generating read IOPS on shared storage.

This is valuable because we significantly reduce the master image reads from our shared storage. If you have 50 XenServer hosts, with each running 100 Windows 10 virtual machines, each virtual machine will read the same data from the same master image. This will add significant amounts of read IO activity on shared storage. By caching the reads in local RAM for each XenServer host, we can drastically reduce our impact on shared storage.
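To put rough numbers on that example, the sketch below counts how many times shared storage has to serve a single block of the master image. It assumes an idealized cache where each host fetches a block once and then serves every later read from Dom0 RAM. A real cache has a limited size and evicts data, so treat this as a best case. The function name is made up for illustration.

```python
def shared_storage_reads(hosts: int, vms_per_host: int, host_cache: bool) -> int:
    """Reads of one master image block that reach shared storage.

    Idealized model: with a host cache, only the first VM on each host
    misses; without it, every VM reads from shared storage.
    """
    if host_cache:
        return hosts
    return hosts * vms_per_host


without_cache = shared_storage_reads(50, 100, host_cache=False)
with_cache = shared_storage_reads(50, 100, host_cache=True)
print(without_cache, with_cache)  # prints: 5000 50
```

Under that assumption, the 5,000 reads of a given block drop to 50, one per host.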

We also have a RAM-based read cache in Provisioning Services.  This capability increased boot performance by 4X.  I would expect to see similar results with this XenServer feature.

Scenario 5: Shared Storage with Optimized Temp Data and RAM Caching on Citrix XenServer with XenServer IntelliCache

When the admin defines the host connection properties, Studio includes the IntelliCache option if the host connection is XenServer.

For non-persistent and persistent desktops, a local, disk-based cache of the master OS image is captured on each XenServer host, reducing read IOPS from shared storage. As items are accessed, they are placed within XenServer’s RAM-based cache.

The write operations are different based on whether the desktop is non-persistent or persistent.

For non-persistent desktops, disk writes are first captured in the VM’s RAM cache. When the RAM cache is consumed, the oldest content is written to the local write cache.

For persistent desktops, disk writes are simultaneously captured in the local IntelliCache disk (.VHDCache file in /var/run/sr-mount) and in the shared storage differencing disk. When the VM reads data from disk, it first checks the local IntelliCache disk and then the shared storage differencing disk.

The value for this configuration is two-fold:

  1. Host-based IntelliCache Disk: Using IntelliCache with the Read Cache (RAM) provides us with two levels of caching on XenServer.  This could help reduce reads from shared storage in situations where our Read Cache (RAM) is not large enough.  Imagine if we have multiple images being delivered to each XenServer host. Our read cache (RAM) will not be large enough, resulting in increased read IO activity on shared storage. By combining the two, we should be able to keep shared storage read IO activity to a minimum.
  2. VM-Based IntelliCache Disk: For persistent desktops, even though each write is performed twice (local IntelliCache disk and differencing disk on shared storage), the reads will come from the local IntelliCache disk, thus helping to reduce the load to shared storage. How much will this help the user experience and cost?  That is still to be determined.
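The persistent-desktop path described above (every write goes to both locations, reads try the local disk first) can be modeled in a few lines. This is a toy sketch with hypothetical names. Plain dictionaries stand in for the local IntelliCache disk and the shared storage differencing disk.

```python
class PersistentIntelliCacheModel:
    """Toy model of persistent-desktop IO with IntelliCache.

    Hypothetical names; dictionaries stand in for the two disks.
    """

    def __init__(self):
        self.local_cache = {}       # local IntelliCache disk
        self.shared_diff_disk = {}  # differencing disk on shared storage
        self.shared_reads = 0       # reads that had to go to shared storage

    def write(self, block, data):
        # Each write lands in both places.
        self.local_cache[block] = data
        self.shared_diff_disk[block] = data

    def read(self, block):
        # Check the local cache first, then the shared differencing disk.
        if block in self.local_cache:
            return self.local_cache[block]
        self.shared_reads += 1
        return self.shared_diff_disk.get(block)
```

In this model, anything the VM has written is read back locally, so `shared_reads` only grows when the local copy is missing.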

Daniel (Follow on Twitter @djfeller)
XenApp Advanced Concept Guide
XenApp Best Practices
XenApp Videos

Provisioning Services Read Cache


As you can see, I’ve spoken numerous times about the Provisioning Services RAM Cache with Disk Overflow capability.

  1. Windows 10 IOPS
  2. Video Proof
  3. Reducing IOPS to 1
  4. Read/Write Ratios
  5. XenDesktop 7.5 IOPS
  6. Digging deeper into IOPS
  7. ESG Spotlight on IOPS

So yes, I like talking about this topic.  But now, I’m going to talk about something very slightly different… Cache 🙂

While I was working on capturing some images for my Citrix Synergy 2016 Tech Update session, I saw something interesting.

I started my lab, started my Provisioning Services server and launched a Windows 10 virtual desktop.  According to the Provisioning Services agent on my virtual desktop, the desktop took almost 60 seconds to boot (Just so you know, I’m working on 7200RPM spinning disks in my meager home lab, so 60 seconds is expected).

First time boot

I then started a second Windows 10 VM, using the same Provisioning Services image.  Now look at the Provisioning Services agent.

Cached boot

Instead of an almost 60 second boot time on the first VM, the second VM booted in 14 seconds! WHAT?

Look even closer at the two images.  Look at the disk throughput.  4,400KB/sec vs 18,000KB/sec.

Sorry, but my cheap disks are not that fast. So what gives?

When you boot a Provisioning Services-based VM, the VM requests the disk image from the Provisioning Services server.  The Provisioning Services server reads portions of the disk and streams it across the network.  As the Provisioning Services server reads portions of the disk image, Windows automatically stores this information in RAM (system cache), if enough RAM is available.

So when we boot subsequent target devices that use the same disk image, we get a massive boost in performance as Provisioning Services uses the information in RAM instead of reaching out to slower storage.
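As a quick sanity check on the numbers from the two screenshots, here is the arithmetic. The first boot was "almost 60 seconds", so the 60 below is an approximation.

```python
# Figures taken from the two screenshots; 60 s is approximate ("almost 60").
first_boot_s, second_boot_s = 60, 14
first_kb_s, second_kb_s = 4_400, 18_000

boot_speedup = first_boot_s / second_boot_s
throughput_gain = second_kb_s / first_kb_s

print(f"boot speedup: {boot_speedup:.1f}x")        # boot speedup: 4.3x
print(f"throughput gain: {throughput_gain:.1f}x")  # throughput gain: 4.1x
```

Both ratios land at roughly 4x, which is what you would expect if the second boot was served mostly from the server's RAM instead of the spinning disks.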

As I said before, Cache is Good!


Windows 10 IOPS


We live in a multi-dimensional world, but our analysis of Windows 10, to date, has been focused on a single aspect… single server scalability.

I think it is time for us to look at another aspect: storage.

As you recall from looking at the results of the Windows 10 vs Windows 7 Single Server Scalability, we continuously increased server density by optimizing Citrix HDX and the underlying operating system.  But what impact will these different optimizations have on storage IOPS?

First, let’s look at IOPS (average and 95th Percentile) for Windows 7 and Windows 10 without any disk optimization

Disk cache

As expected, Windows 10 has a higher IOPS impact than Windows 7.  When looking at our 95th percentile numbers, Windows 10 is 30% higher than Windows 7 from a storage IOPS perspective. This means upgrading from Windows 7 to Windows 10 will require us to either confirm that our storage infrastructure can accommodate the new workloads or find ways to reduce the overall IOPS activity.

Those of you who have read my blogs over the years know I love to talk about the Provisioning Services RAM Caching capabilities.  When we enable this feature for Windows 7 and Windows 10, we see something dramatic:

RAM Cache

Our IOPS drop by 90-95%!  These results were achieved by simply allocating only 256MB of RAM for our Provisioning Services RAM Cache per Windows VM.

So if you are thinking about migrating to Windows 10, think about how to deal with your storage performance.

And remember, even though we only focused on IOPS, we have demonstrated that optimizing storage performance directly impacts the user experience.


You don’t need to be a rocket scientist to see the value in RAM Cache


A few years ago, we replaced all of the windows in our home (I’m talking about the panes of glass you look through, not the operating system). We, of course, talked with a few different companies who stopped by, went through their product portfolio and brought along samples. One demonstration stuck with me. The salesperson placed his sample window flat on the ground and stood on it, demonstrating the strength of the window. I immediately started thinking, “That was totally wicked” and “I wonder if it has ever shattered before”.

As the practical part of my brain kicked in, I began to wonder when I am ever going to need to walk on my windows. Is Batman stopping by and going to climb up my house? Is this unique to this particular window? Not knowing much about windows, I wondered if my old windows were just as strong.

Demos are meant to impress us, but we need to ask ourselves if the demo really demonstrates everyday life.

And this was the goal I set out to achieve when I was trying to see how much of a benefit the new RAM Cache with Disk Overflow feature would provide to the user experience. I wanted a demonstration that showed a very typical user.

A typical office user, like myself, uses a Windows desktop with the following

  1. Outlook
  2. Internet Explorer
  3. Microsoft Word

Even with the apps defined, you can still have quite a difference in the workload depending on the websites you visit or the type of document you create. Instead of visiting a website going overboard with multimedia, Citrix.com was used as it resembles a simple, common site.

Instead of creating a large document with multiple pictures, different aspect ratios and 3D rendering, the demo creates a small document with a single paragraph and a simple chart.

With this simple workload, would we see any noticeable difference in the user experience? And by noticeable, I’m not talking about an application taking half a second longer to load. I’m talking about a “WOW, anyone who sees this will definitely be able to notice the improvement”.

In this very simple demonstration, with a minimal workload, I saw 2 major things

  1. A drastic drop in disk activity
  2. Very noticeable change in the user experience

Try it for yourself. Flip the switch


From the virtual mind of Virtual Feller

ESG Lab Spotlight Report: Up to 80% Reduction in Storage Cost for VDI and RDS


You’ve heard the news, you’ve seen the videos, and now the storage savings have been verified! According to an ESG report, the new RAM Cache with Disk Overflow feature, included in the XenApp and XenDesktop 7.6 release, has the potential to reduce storage costs by 80% or more. Now before you stop reading thinking this is too good to be true, think about the storage cost problem for a moment.

Storage costs associated with RDS/VDI solutions are driven by throughput, not space. We need to have enough throughput or IOPS so the user experience doesn’t suffer. And believe me, it can suffer drastically, as you can easily see in this simple demonstration (pay particular attention from the 3 to 4 minute mark 🙂).

To visualize how this works, take the following diagrams into perspective


IO is destined for the disk. Disks are slow when compared to RAM. So the Cache on RAM with Overflow feature substitutes RAM for disk. And because RAM is not infinite, we will overflow portions of the RAM to disk as needed. But even this overflow is more efficient. The overflow is sequenced and consolidated into large, sequential blocks of data instead of small, random blocks.
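To make the "small, random blocks become large, sequential blocks" idea concrete, here is one generic way to coalesce buffered writes before flushing them. This is a sketch of the general technique only. I am not claiming this is how Provisioning Services sequences its overflow, and the function name is made up.

```python
def coalesce_writes(writes):
    """Merge small writes into sorted, contiguous extents.

    `writes` is a list of (offset, length) pairs in arrival order. Returns
    (offset, length) extents where touching or overlapping writes have been
    merged. Hypothetical helper, not the Provisioning Services implementation.
    """
    extents = []
    for offset, length in sorted(writes):
        if extents and offset <= extents[-1][0] + extents[-1][1]:
            # Touches or overlaps the previous extent: extend it.
            prev_offset, prev_length = extents[-1]
            end = max(prev_offset + prev_length, offset + length)
            extents[-1] = (prev_offset, end - prev_offset)
        else:
            extents.append((offset, length))
    return extents


random_writes = [(8192, 4096), (0, 4096), (4096, 4096), (20480, 4096)]
print(coalesce_writes(random_writes))  # [(0, 12288), (20480, 4096)]
```

Four small writes that arrived out of order turn into two extents, the first of which is a single 12 KB sequential write.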

Many implementations required massive SANs or expensive SSDs. People were spending large amounts of money on storage, not for space, but instead to achieve the throughput required by RDS/VDI. With the Cache on RAM with Overflow feature, we can drastically reduce the number of disks. We don’t need hundreds of disks to give us our throughput. We don’t need to implement SSDs. We can drastically reduce our disk count and focus more on storage space, which is by far, easier and cheaper to implement.

According to the ESG report on Provisioning Services, when you focus on disk throughput

  • A XenDesktop implementation requiring 26 disks can be reduced to 3
  • A XenApp implementation requiring 74 disks can be reduced to just 4
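Expressed as percentages, those two disk counts work out as follows. Note this is the reduction in the number of disks, which is not the same thing as the storage cost figure in the report's title.

```python
def reduction_pct(before: int, after: int) -> float:
    """Percentage drop from `before` to `after`."""
    return (before - after) / before * 100


xendesktop = reduction_pct(26, 3)
xenapp = reduction_pct(74, 4)
print(f"XenDesktop: {xendesktop:.1f}% fewer disks")  # XenDesktop: 88.5% fewer disks
print(f"XenApp: {xenapp:.1f}% fewer disks")          # XenApp: 94.6% fewer disks
```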

And because of the way this feature works, it provides value to multiple hypervisors.


PROOF [Video] – New XenDesktop and XenApp Storage Optimizations Do Improve the User Experience


I’ve written and seen numerous blogs/tweets about how great the new storage optimization feature is for XenApp and XenDesktop. I’ve read how this feature can reduce IOPS from an average of 15 IOPS per Windows 7 user down to 0.1 IOPS. I’ve read how this feature functions by creating a small RAM buffer within each VM. I’ve seen tweets showing crazy IOPS numbers when using standard, spinning disks.

In fact, I’ve done some of this analysis and was completely blown away by the results.

But who cares? Who cares if my IOPS are reduced by 99%?

Unfortunately, unless you are responsible for storage, you probably don’t care.  But what if this drastic reduction in IOPS had a direct impact on the user experience?  And as someone who uses VDI remotely 100% of the time, the user experience is what I really care about.

Let’s see what the new RAM Cache with Disk Overflow feature can do for the user experience…

What impresses me the most is that the workload used isn’t some crazy operation that a typical user wouldn’t really do.  You can easily see the improvement to the user experience with something as simple as browsing a few web pages.

And all of this is done

  • Without complex configurations
  • Without expensive SANs
  • Without SSDs
  • Without additional hardware
  • Without additional licenses
  • Without a learning curve


Diving deeper into the latest XenDesktop 7.5 IOPS results


As you saw in a previous blog, XenDesktop 7.5 is able to achieve an average of less than one-tenth of an IOPS per user. Of course, when you put out unbelievable results like this, you hear a lot of comments trying to find holes in the results or test procedures. This is as it should be, as it is part of any good scientific method.

In order to show a more complete picture of the value of the new Provisioning Services RAM Cache with Disk Overflow, we gathered additional details from the Citrix Solutions Lab’s tests. This set of data includes details for the duration of the entire test, which included logons for roughly 100 users (sorry, but it didn’t include boot; however, booting is mostly a read operation that PVS can handle with server-side cache).

The results are still just as stunning as the steady state:

For a physical host, we collected the IOPS numbers for each of the virtual desktop sessions and then combined them into a single graph. As you can see, during the logon portion of the test we had a peak, and I mean maximum, value of 12 IOPS.

What if we don’t break it down by user, what would the host’s total IOPS graph look like?

Peak IOPS

The absolute peak is 155 IOPS on a host that is running 100 VDI VMs.

Test details were as follows:

  • LoginVSI 4.0 medium workload
  • Hypervisor: Hyper-V 2012R2 and vSphere 5.5
  • Virtual Machine: Windows 7, 2 vCPU and 2.5 GB of RAM (512 MB as defined for the RAM Cache)
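Two quick derived numbers from those figures: the peak IOPS per VM, and the total RAM set aside for caching on the host. This assumes all 100 VMs on the host use the same 512 MB RAM cache setting listed above.

```python
vms_per_host = 100
peak_host_iops = 155
ram_cache_mb_per_vm = 512  # assumed identical for every VM on the host

peak_iops_per_vm = peak_host_iops / vms_per_host
total_ram_cache_gb = vms_per_host * ram_cache_mb_per_vm / 1024

print(peak_iops_per_vm)    # 1.55
print(total_ram_cache_gb)  # 50.0
```

So even at the absolute peak, the host averaged about 1.55 IOPS per VM, at the price of roughly 50 GB of host RAM dedicated to the cache.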

Based on results like this, I’m left to wonder how many users I can support on my mid-1990s college PC (Pentium 486 with a 420MB hard drive) 🙂
