A safe approach to dynamic memory and memory overcommit

I’ve had another discussion with people passionate about memory overcommit for virtual desktops. My stance is that it can be dangerous if you take it too far. Unfortunately, many of the reports I see touting the value of memory overcommit do take it too far. So where is just far enough? Let’s walk through an example (I’ve generalized it because I don’t want to talk about each hypervisor separately)…

Let’s say you need 100 servers (192GB RAM each) to host virtual desktops for 7,500 users. Those 100 servers reach maximum capacity (CPU and memory) when each hosts 75 virtual desktop VMs (2.5GB RAM each). For fault tolerance, you go with an N+10% formula, meaning you deploy 10% more servers than you need. That means you really have 110 servers.
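The N+10% sizing can be sketched as a small helper (a hypothetical function name; integer math with ceiling rounding so a fractional server always rounds up to a whole one):

```python
def servers_with_headroom(base_servers: int, headroom_pct: int = 10) -> int:
    """N + headroom%: add a fault-tolerance buffer on top of the base count."""
    extra = -(-base_servers * headroom_pct // 100)  # ceiling division, no floats
    return base_servers + extra

print(servers_with_headroom(100))  # → 110, matching the example
```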

As I spread the load across 110 servers, my per-server concurrency drops from 75 users to 68. That also lowers my per-server RAM usage from roughly 187GB to 170GB. I paid for the RAM, so I want to use the RAM.
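Running the numbers from the example (the variable names are mine, not part of any product):

```python
# Spreading 7,500 users across the fault-tolerant pool of 110 servers.
USERS = 7500
SERVERS = 110
VM_RAM_GB = 2.5  # baseline allocation per desktop VM

users_per_server = USERS / SERVERS                 # ~68.2, i.e. 68 in round numbers
ram_per_server_gb = users_per_server * VM_RAM_GB   # ~170GB, down from ~187GB at 75 VMs

print(round(users_per_server), round(ram_per_server_gb))
```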

In this example, being conservative, I will configure the upper memory threshold for the desktops at 2.8GB RAM each and the lower at 2.5GB (which is what I determined these users must have).

Based on the example, during normal production my desktops are not overcommitting RAM. During an outage (planned or unplanned), however, my servers must host additional desktop VMs. Because no RAM is free at that point, the desktop VMs do overcommit; but since the extra load is spread across the entire environment, the impact is small and likely to go unnoticed. And the overcommit only occurs during an outage, so day-to-day operations continue to run smoothly and provide a good user experience (at least from a RAM perspective).
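A quick sanity check of that claim, assuming the worst case where every desktop climbs to its 2.8GB upper threshold (a sketch with made-up names, not hypervisor tooling):

```python
HOST_RAM_GB = 192
UPPER_GB = 2.8     # upper dynamic-memory threshold per desktop
NORMAL_VMS = 68    # per-server load with all 110 servers healthy
FAILOVER_VMS = 75  # per-server load after absorbing a failed server's desktops

def overcommit_gb(vms: int) -> float:
    """RAM demand above physical capacity if every VM hits its upper threshold."""
    return round(max(0.0, vms * UPPER_GB - HOST_RAM_GB), 1)

print(overcommit_gb(NORMAL_VMS))    # → 0.0: no overcommit day-to-day
print(overcommit_gb(FAILOVER_VMS))  # → 18.0: ~9% overcommit, only during an outage
```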

What have you used on your deployments?

