Desktop Virtualization… The Right Way (Storms)

One of the first things you must understand when designing a desktop virtualization solution is your users’ work patterns.  These patterns have a direct impact on XenDesktop farm design and scalability with respect to boot up storms and logon storms. Let’s take two different examples so you can get a better idea of what I’m talking about:

Scenario 1: 9-5:
In this scenario, all users log on in the morning and log off in the evening.  There might be some sporadic users working after hours, but for the most part users stay within these working hours.  This is a fairly easy scenario, which is why I’ve started with it.

To design your environment, you need to make sure that the boot up storm doesn’t overwhelm your environment.  You will be starting a large number of hosted virtual desktops and that has a direct impact on your hypervisor of choice, your storage solution and your network infrastructure.  You can easily overcome any challenges with a boot up storm in this scenario by using the XenDesktop idle desktops configuration to pre-boot desktops X minutes before the main rush begins (X is based on how many desktops you need up and running before users start connecting).  By the time users come online, the system should have calmed down from the boot up storm.

Each hypervisor limits the number of simultaneous bootups (XenServer being 3).  Although this helps limit the number of virtual desktops powering on at once, the power-on step itself takes only a short amount of time because it does not include the actual OS loading.  If you have 1,000 desktops (across 10-20 hypervisors) that must be ready by 9AM, and you assume each desktop takes 30 seconds to fully boot, you want to start your bootup sequence no later than 8:30.
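To see where a start time like 8:30 comes from, here is a rough back-of-the-envelope sketch.  It assumes a simple wave model in which each hypervisor fully boots only 3 desktops at a time, which is a simplification since the real limit only covers the power-on step.  The function names are my own and are not part of any XenDesktop or XenServer tooling.

```python
import math
from datetime import datetime, timedelta

def boot_window_seconds(desktops, hosts, concurrent_per_host, boot_seconds):
    """Estimate total boot time, assuming desktops boot in fixed-size waves."""
    per_wave = hosts * concurrent_per_host      # desktops booting at once
    waves = math.ceil(desktops / per_wave)      # waves needed to boot them all
    return waves * boot_seconds

def latest_start(ready_by, window_seconds, settle_seconds=0):
    """Latest time to begin booting, with optional time for the storm to calm down."""
    return ready_by - timedelta(seconds=window_seconds + settle_seconds)

if __name__ == "__main__":
    ready_by = datetime(2024, 1, 1, 9, 0)       # the date is arbitrary; only 9AM matters
    for hosts in (10, 20):
        window = boot_window_seconds(1000, hosts, 3, 30)
        start = latest_start(ready_by, window)
        print(f"{hosts} hosts: {window / 60:.1f} min of booting, start by {start:%H:%M:%S}")
```

Under these assumptions the boot sequence takes roughly 9 to 17 minutes, so starting at 8:30 leaves a comfortable buffer for the environment to settle before users arrive.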

The second aspect is the logon storm.  There is little we can do to the environment to spread the storm over a greater amount of time as it is based solely on the user pattern.  The logon storm is going to have a direct impact on your farm design.  You need to look at the following:
1.    Number of user connections per minute
2.    The IOPS required during the logon period
3.    The logon times you require
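These three numbers are related, and a quick sketch can show how.  The model below is my own simplification: it assumes users arrive evenly across the logon window and that each in-progress logon generates a fixed number of IOPS.  The inputs in the example (1,200 users, a 30 minute window, 30 second logons, 50 IOPS per logon) are made-up values for illustration, not measurements.

```python
def logon_storm(users, window_minutes, logon_seconds, iops_per_logon):
    """Estimate logon storm load, assuming evenly spread arrivals."""
    per_minute = users / window_minutes
    # Logons in flight at once: arrival rate times how long each logon takes.
    concurrent = per_minute * logon_seconds / 60
    return {
        "logons_per_minute": per_minute,
        "concurrent_logons": concurrent,
        "logon_iops": concurrent * iops_per_logon,
    }

if __name__ == "__main__":
    # Hypothetical inputs; replace them with values measured in your own environment.
    for name, value in logon_storm(1200, 30, 30, 50).items():
        print(f"{name}: {value:.0f}")
```

Real logon storms are rarely this even, so treat the result as a floor rather than a peak: a tighter logon window or a slower logon pushes every number up.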

As you add more users to the environment, you need to optimize your architecture and allocate additional resources in order to accommodate the storm.  This might require you to dedicate XenDesktop controllers as XML Brokers and Farm Masters.  By giving the controllers specific roles, you optimize those systems to be able to support greater numbers of simultaneously connecting users.

Scenario 2: 24/7 (3 shifts)
This scenario brings a few more challenges in that users are always online.  The organization is running 100% of the time, and as users are connecting, other users are logging off.  The cycle continues over and over again.  The right architecture really depends on the environment in question. Even though the organization might be 24/7, those shifts might be located around the world, connecting to different data centers (follow-the-sun model).  But if we have a unique scenario where all shifts connect to a single data center, this type of environment makes us change our design.  It is safe to assume that the shifts are different sizes. In fact, many 24/7 models located in one site have one large shift and two significantly smaller shifts.

In the 9-5 scenario, a boot storm wouldn’t impact other users as no users were online before the start of the workday.  In the 24/7 scenario, we always have active users.  If we sized our environment based on max concurrency for a single shift, we have little extra capacity to pre-boot desktops.  That changes the design as follows:

  • First, we start all available workstations ahead of time to build up our idle pool (without disrupting working users).
  • Second, we disable the reboot after logoff option for the shift immediately before the largest shift starts. This will allow the desktops to be ready to go even faster.  This can be done by creating a workstation group per shift. This does bring the risk of the users not receiving a clean desktop, but this is mitigated by the desktops being rebooted (cleaned) after the other 2 shifts end.
  • Third, when the logon storm begins, we can also expect a logoff storm to begin as well because one shift begins as a different one ends.  Disabling the reboot for one shift change will help overcome the boot storm impact. To accommodate the logon/logoffs, we need to optimize our environment, just like we did in the 9-5 operational model, dedicating controllers for XML brokering and farm master.  This type of configuration allows us to support the largest possible number of users within one farm, although at a certain point we will require a new farm.
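The trade-off between pre-booting and skipping the reboot can be sketched with some simple arithmetic.  This is only an illustration under my own assumptions: the environment is sized for the largest shift, the outgoing shift is still fully online when pre-booting starts, and the function name and numbers are hypothetical.

```python
def shift_change_plan(capacity, outgoing_active, incoming_size):
    """Split the incoming shift between pre-booted idle desktops and reused ones.

    Assumes capacity is at least as large as the incoming shift.
    """
    # Desktops we can start ahead of time without touching working users.
    idle_capacity = max(capacity - outgoing_active, 0)
    preboot = min(idle_capacity, incoming_size)
    # The rest must come from the outgoing shift's desktops as they free up,
    # which is quickest when those desktops do not reboot after logoff.
    reuse = incoming_size - preboot
    return {"preboot": preboot, "reuse_without_reboot": reuse}

if __name__ == "__main__":
    # Hypothetical: capacity for 1,000 desktops, a 300-user shift ending,
    # and the large 1,000-user shift starting.
    print(shift_change_plan(1000, 300, 1000))
```

In this made-up example, 700 desktops can sit in the idle pool ahead of time, and the remaining 300 users depend on desktops handed over from the outgoing shift, which is exactly where disabling the reboot for that one shift change pays off.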

Those are two different user pattern scenarios to think about during a desktop virtualization design. A few things to keep in mind:

  • Does it require an understanding of the user environment? Yes
  • Will it impact scalability of the underlying infrastructure? Yes
  • Can the environment be designed in such a way to support these usage patterns? Yes

