Vertical or Horizontal Load Balancing

Three years ago, I wrote the following best practice:

Virtual Apps and Desktops Best Practice #4: To manage costs, focus on the right user experience instead of the best user experience.

Typically, the better something is, the more it will cost. So you need to find that happy place where the user experience aligns with the cost. And with all of the focus on cloud, this is even more important. In the world of Virtual Apps and Windows Virtual Desktop (WVD), we have two choices for load balancing users across server instances: horizontal or vertical. Let’s see how they work and what impact they might have.

Horizontal Load Balancing

For years, those of us deploying virtual apps within our data centers would determine how many servers we needed to support a defined user load. Traditionally, we designed the environment for maximum concurrency.

We created a load-managed group of servers that hosted the same set of applications. We set up our load balancing policies to look at CPU and memory utilization to find the least loaded server for the next user request. As users logged on, the server load would increase.
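To make the "least loaded server" idea concrete, here is a minimal sketch. It assumes a simplified load index (the higher of CPU and memory utilization); the `Server` class and the `load_index` and `pick_least_loaded` functions are hypothetical names for illustration, not how any particular product calculates load.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Server:
    name: str
    cpu: float     # CPU utilization, 0.0 to 1.0
    memory: float  # memory utilization, 0.0 to 1.0


def load_index(server: Server) -> float:
    # Assumption: the most constrained resource decides the server's load.
    return max(server.cpu, server.memory)


def pick_least_loaded(servers: List[Server]) -> Server:
    # Horizontal load balancing: the next user goes to the least loaded server.
    return min(servers, key=load_index)


servers = [
    Server("srv1", cpu=0.40, memory=0.55),
    Server("srv2", cpu=0.20, memory=0.30),
    Server("srv3", cpu=0.35, memory=0.25),
]
chosen = pick_least_loaded(servers)
print(chosen.name)  # srv2
```

Because every new session lands on whichever server is least busy, the load stays roughly even across the whole group.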

All of this was in an effort to give the user the best experience possible.  We wanted to spread the load.

In our on-prem world, we had already paid for the hardware, so we might as well use it even though it wasn’t actually needed at that moment.

Vertical Load Balancing

With the release of Windows Virtual Desktop, everyone wants to talk about cloud. The cloud is great. The cloud is wonderful. The cloud keeps me busy because the cloud changes all of our design thinking.

With the cloud, I don’t purchase anything up front. I pay for usage. This sounds great, and it is if you know what you are doing (but initially, none of us knows what we are doing, so we make mistakes).

If you simply use the same on-prem load balancing policies in the cloud, you will say “uff da” when that first bill arrives because you went with horizontal load balancing. You spread the load across all defined instances. You sized for maximum concurrency, which means maximum cost.

In a cloud deployment, a horizontal load balancing approach, like we do in our on-prem world, will give you the best user experience, but it comes at a huge cost.
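A quick back-of-the-envelope calculation shows why. The sketch below compares instance-hours for one day when every defined instance stays powered on versus when only as many instances run as the load requires. All of the numbers (the hourly session counts, 20 sessions per instance, 5 defined instances) are made up for illustration, and `instance_hours` is a hypothetical helper.

```python
import math
from typing import List, Optional


def instance_hours(hourly_sessions: List[int],
                   sessions_per_instance: int,
                   fixed_instances: Optional[int] = None) -> int:
    """Total instance-hours consumed over the given hours."""
    total = 0
    for sessions in hourly_sessions:
        if fixed_instances is not None:
            # Load is spread across every defined instance, so all stay on.
            total += fixed_instances
        else:
            # Only as many instances as the current load requires.
            total += math.ceil(sessions / sessions_per_instance)
    return total


# Hypothetical day: 8 idle hours, 8 peak hours, 8 light hours.
hourly_sessions = [0] * 8 + [100] * 8 + [20] * 8

always_on = instance_hours(hourly_sessions, 20, fixed_instances=5)
on_demand = instance_hours(hourly_sessions, 20)
print(always_on, on_demand)  # 120 48
```

With these assumed numbers, sizing for maximum concurrency all day burns 120 instance-hours, while running only what the load needs burns 48.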

With a cloud-based deployment, like Windows Virtual Desktop, we need to think about a different approach. Hello vertical load balancing.

Vertical load balancing works the opposite way. Instead of spreading the load across all servers in our pool, we load up a single instance before moving on to the next one. When the latest instance is full, a new instance starts.
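Here is a minimal sketch of that fill-first behavior. It assumes a fixed session limit per instance and that starting a new instance is instant; `place_session_vertical` and `max_sessions` are hypothetical names, and a real deployment would also have to wait for the new instance to boot.

```python
from typing import List


def place_session_vertical(session_counts: List[int], max_sessions: int) -> int:
    """Place one session and return the index of the instance that got it."""
    for i, count in enumerate(session_counts):
        if count < max_sessions:
            # Fill the first instance that still has room.
            session_counts[i] += 1
            return i
    # Every running instance is full, so start a new one (assumed instant).
    session_counts.append(1)
    return len(session_counts) - 1


instances = [0]  # one running instance with no sessions
for _ in range(25):
    place_session_vertical(instances, max_sessions=10)
print(instances)  # [10, 10, 5]
```

Twenty-five users end up on three instances, with the first two completely full before the third is even started.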

A Storm Is Coming (Logon Storms)

Vertical load balancing helps keep cloud costs down, but there are some concerns.

A long time ago, during large-scale, on-premises deployments, we ran into issues with logon storms (many users logging on at the same time, like 8AM). Before the controller could update the load on a server, additional users requested a session. All of these requests were sent to the same server. Very quickly, we overwhelmed the server.

We have to remember that the logon operation is CPU intensive. For those 30-60 seconds of a user’s logon, the system is hit pretty hard. That doesn’t change with Windows RDSH or Windows Virtual Desktop. When we send many users to the same host at the same time, bad things can happen quickly.

But we don’t talk about this anymore.  Why?  To combat this issue, there is a load balancing rule called “Concurrent Logon Tolerance” with a default value of two.

This rule allows only 2 users to log on to the same server simultaneously, even if that server has the lowest load. If no other servers are available, this setting gets ignored, because users need a session.
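The rule as described can be sketched in a few lines. This is my own simplified model, not the product's actual implementation: `Host`, `pick_host`, and `active_logons` are hypothetical names, and the sketch never decrements the logon counter because it only models a single burst of requests.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Host:
    name: str
    load: float             # current load, 0.0 to 1.0
    active_logons: int = 0  # logons currently in progress


def pick_host(hosts: List[Host], logon_tolerance: int = 2) -> Host:
    # Skip hosts that already have too many logons in flight.
    eligible = [h for h in hosts if h.active_logons < logon_tolerance]
    # If no host qualifies, ignore the rule: users still need a session.
    candidates = eligible or hosts
    chosen = min(candidates, key=lambda h: h.load)
    chosen.active_logons += 1
    return chosen


hosts = [Host("a", load=0.1), Host("b", load=0.5)]
picks = [pick_host(hosts).name for _ in range(5)]
print(picks)  # ['a', 'a', 'b', 'b', 'a']
```

The first two requests go to the least loaded host, the next two are pushed to the second host, and the fifth falls back to the least loaded host because every host has reached the tolerance.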

Think about vertical load balancing. I only have a single server able to take sessions. During peak logon activity, we run the risk of issues if we have logon storms. So by trying to save money with vertical load balancing, we are negatively impacting the user experience. Again, I’m only talking about logon storms.  Servers can easily handle a couple of simultaneous logons.  In a logon storm, we are talking hundreds or more simultaneous logons.  

Hybrid Load Balancing

This brings us to a hybrid load balancing model. (Do you notice that when it comes to anything related to cloud, the term hybrid always seems like a better approach? The same can be said of load balancing.)

A hybrid load balancing model helps us reduce cloud costs while still preserving the user experience. At a high-level, the plan is to spread the load but narrow the spread.

This means we:

  1. Continue to use horizontal load balancing.
  2. Use Auto Scale to reduce the number of available server instances.
  3. Let Auto Scale start a new instance as the running instances begin to fill up.

If we need 6 servers, we can set Auto Scale up to only have 3 available. When users log on, the logons are spread across those 3 servers.  As the load of those 3 servers increases, Auto Scale starts powering on additional server instances (server 4). If we configure Auto Scale to start powering on new server instances BEFORE our current instances are fully loaded, then the Concurrent Logon Tolerance policy setting will still help protect us from sending every new logon request to the newly powered on server instance.
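The scaling decision in that example can be sketched as a small calculation. This assumes a simple percentage buffer on top of the current session count; `instances_needed`, `buffer_percent`, and the figure of 20 sessions per instance are hypothetical and are not the settings of any specific Auto Scale product.

```python
def instances_needed(active_sessions: int,
                     sessions_per_instance: int,
                     buffer_percent: int,
                     min_instances: int,
                     max_instances: int) -> int:
    """How many instances should be powered on right now."""
    # Target capacity is the current load plus a buffer, so a new instance
    # starts BEFORE the running instances are completely full.
    target = active_sessions * (100 + buffer_percent)
    per_instance = sessions_per_instance * 100
    needed = -(-target // per_instance)  # integer ceiling division
    return max(min_instances, min(max_instances, needed))


# 6 servers at most, 3 always available, 20 sessions each, 25% buffer.
for sessions in (0, 40, 50, 120):
    print(sessions, instances_needed(sessions, 20, 25, 3, 6))
# 0 3
# 40 3
# 50 4
# 120 6
```

With these assumed values, server 4 powers on at 50 sessions, while the first 3 servers still have room for 10 more. New logons keep spreading across the partially loaded servers instead of all landing on the fresh one.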

Daniel (Follow on Twitter @djfeller)
