First up, Spectre\Meltdown. I did a presentation at the Pittsburgh VMUG earlier this year in February. I promised to upload the presentation and here it is. Ignore the fact that it is months late, and let's just celebrate the fact that it made it onto my blog at all.
Download PowerPoint Presentation on Github.
The vBrownbag video of my presentation.
I also want to add some context to this presentation and explain why I wanted to present on it at all.
My company tends to live on the bleeding edge of technology. We are not a large enterprise, but we need to be up to date and nimble. Recently we've put a lot of effort into securing our infrastructure by patching, and by discovering vulnerabilities and removing them. Our security team was really pushing the patching around the same time that Intel disclosed the speculative execution side-channel vulnerabilities.
It got a lot of attention very quickly. I mean, have you seen the cute and scary mascots? I had to explain our patching plan to the CIO and Director of IT Security, so I had to figure it out quickly. It didn't take long to discover that it was not as simple as normal patching. It was going to take some time to do it properly. I had to wade through all the scary discussions and discover the exact process to make it work.
I was told by outside IT comrades that very few VMware\Windows admins actually put as much effort into understanding and explaining the procedures, and that my knowledge would be helpful. Often they would patch Windows and\or the ESXi hosts but not perform the VM hardware piece, which is essential to tie it all together. Hence the presentation.
Between early 2018 and the time of this presentation in February 2019, we saw a regular release of patches for CPU-related vulnerabilities. They all have impressive names and various risk ratings. Each comes with different procedures to patch. But with any CPU-related patch, there are always multiple levels.
- OS - Windows\Linux patch. With Windows, Microsoft had just switched to an all-in-one cumulative patch. At the time they didn't anticipate a need to install a patch without activating it. But these CPU patches remove CPU abilities in order to secure the system, which slows the system down.
- Windows Registry - So Microsoft had to add a way to turn the mitigation on or off, and they used a registry key to do it. Desktop systems automatically activate the patch. Server systems do not. If you don't add the registry key on a server, your system is not mitigated.
- vCenter - The ESXi patches require changes to microcode and passing this microcode to the VMs. In order to pull this off, you need to patch vCenter to be able to control this function.
- ESXi - Of course there is a patch for ESXi. Sometimes it contains the necessary CPU microcode.
- BIOS\CPU Microcode - The CPU needs to be patched too. This changes the CPU instructions.
- VM hardware - Finally, this new CPU code needs to be passed to the VMs. If you are running a cluster with EVC mode enabled (you should), you will need to patch all of the hosts in the cluster before completing these steps. Once they are all patched, you need to perform a cold power cycle of each VM (with VM hardware version 9 at least) to pass on the new CPU instructions.
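To make the layering above a little more concrete, here is a rough Python sketch of how you might track which levels a given VM is still missing. This is only an illustration: the `VmPatchState` record, its field names, and the `missing_layers` function are all made up for this example, and nothing here talks to vCenter, ESXi, or Windows. You would have to fill in the values from your own inventory tooling.

```python
from dataclasses import dataclass
from typing import List

# From the list above: the VM needs hardware version 9 at least.
MIN_VM_HARDWARE_VERSION = 9


@dataclass
class VmPatchState:
    """Hypothetical record of where one VM stands; every field name is illustrative."""
    name: str
    is_server_os: bool                     # Windows Server needs the registry key
    os_patched: bool
    registry_mitigation_enabled: bool
    vcenter_patched: bool
    esxi_host_patched: bool
    microcode_updated: bool                # delivered via BIOS or the ESXi patch
    all_evc_cluster_hosts_patched: bool    # set True if the cluster does not use EVC
    hardware_version: int
    cold_cycled_since_hosts_patched: bool  # a guest reboot does not count


def missing_layers(vm: VmPatchState) -> List[str]:
    """Return the levels that still need work, in the order listed above."""
    missing = []
    if not vm.os_patched:
        missing.append("OS patch")
    # Desktop systems activate the mitigation automatically; servers do not.
    if vm.is_server_os and not vm.registry_mitigation_enabled:
        missing.append("registry key")
    if not vm.vcenter_patched:
        missing.append("vCenter patch")
    if not vm.esxi_host_patched:
        missing.append("ESXi patch")
    if not vm.microcode_updated:
        missing.append("BIOS/CPU microcode")
    if not vm.all_evc_cluster_hosts_patched:
        missing.append("remaining hosts in EVC cluster")
    if vm.hardware_version < MIN_VM_HARDWARE_VERSION:
        missing.append("VM hardware version upgrade")
    if not vm.cold_cycled_since_hosts_patched:
        missing.append("cold power cycle")
    return missing


if __name__ == "__main__":
    # Made-up example: Windows and the hosts are patched, but the registry key
    # and the VM hardware piece were skipped.
    vm = VmPatchState(
        name="app01", is_server_os=True, os_patched=True,
        registry_mitigation_enabled=False, vcenter_patched=True,
        esxi_host_patched=True, microcode_updated=True,
        all_evc_cluster_hosts_patched=True, hardware_version=11,
        cold_cycled_since_hosts_patched=False,
    )
    for layer in missing_layers(vm):
        print(f"{vm.name}: still needs {layer}")
```

The point of returning a list instead of a single yes/no is that a VM can look "patched" at the OS level while still being unmitigated because a later level was skipped.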
The Reality... This can be done over time. But what I have found is that it is really difficult pulling this off in a production data center with hundreds of hosts and thousands of VMs. All of them have different change windows and expectations. I've found that by the time I develop a plan to patch for one vulnerability, the next one has come out. The real trick is to keep the bad actors out of your environment.
My team is currently working through ways of automating some of these functions and patching. I will reserve that for another blog post.