L1 terminal fault (L1TF) vulnerability (aka "Foreshadow"): What we're doing about it


#1

Summary

An Intel CPU vulnerability called “L1 terminal fault”, also known as “L1TF” or “Foreshadow”, was disclosed on August 14th 2018. The vulnerability enables attacks against host memory from inside a guest virtual machine (e.g. a Cloud server). Malicious guests could infer values of data from the host machine or from other guest machines.

Following Intel’s public disclosure yesterday, we’ve been investigating thoroughly and identifying the best way forward.

We’ll post an update once we’ve decided which mitigations we’re going to apply and when we’re going to apply them, along with, if possible, an estimate of how these mitigations may affect customers.


Brief technical explanation

A page-table entry (PTE) is a structure that translates between virtual and physical memory addresses. Intel CPUs, through an optimization technique called “speculative execution”, are treating invalidated PTEs as valid; this may allow malicious access to system memory that should otherwise be inaccessible to an attacker.
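To make the explanation above more concrete, here is a toy model of a 64-bit x86 PTE for a 4 KiB page. The function names and the sample entry are made up for illustration, and the model is deliberately simplified: a real attack also depends on the targeted data already sitting in the L1 data cache.

```python
# Toy model of a 64-bit x86 page-table entry (4 KiB pages).
PRESENT_BIT = 1 << 0                  # bit 0: the entry is valid ("present")
PFN_MASK = ((1 << 52) - 1) & ~0xFFF   # bits 12..51: physical frame address


def architectural_lookup(pte: int):
    """What the architecture promises: a non-present entry yields a fault."""
    if not pte & PRESENT_BIT:
        return None  # page fault, no physical address is produced
    return pte & PFN_MASK


def speculative_lookup(pte: int) -> int:
    """Simplified L1TF behaviour: the present bit is ignored while speculating."""
    return pte & PFN_MASK


stale_pte = 0x0000_0001_2345_6000  # hypothetical entry with the present bit cleared
print(architectural_lookup(stale_pte))     # None: the access faults
print(hex(speculative_lookup(stale_pte)))  # the leftover address is still used
```

The gap between the two functions is the whole problem: the leftover address bits in an invalidated entry were never meant to be followed.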

Available mitigations

There are a few different mitigations available (references in links at the bottom):

  1. A Linux kernel update (with minimal performance impact) will be available this week that ensures that non-present (invalidated) PTEs point to a non-existent region of memory. This provides adequate protection only if guest kernels are also patched, which a Cloud provider cannot guarantee because customers are usually free to choose their own kernel.

  2. An Intel microcode update is available that arranges for the L1 cache to be flushed before returning to a guest virtual machine, therefore preventing malicious access. The performance impact varies depending on workload.

  3. Clearing the L1 cache is only a partial solution if the CPU is running with hyperthreading enabled, as sibling threads on the same core share the L1 cache. One can disable hyperthreading, but that has a significant impact on performance.

  4. Another approach might be to disable the extended page-table (EPT) feature entirely. At first glance this seems to provide the highest guarantee of protection, but also the highest performance cost.

  5. “Intel has developed a method to detect L1TF-based exploits during system operation, applying mitigation only when necessary. Intel has provided pre-release microcode with this capability to some of our partners for evaluation, and hope to expand this offering over time.” Source
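The first mitigation in the list (making invalidated PTEs point at non-existent memory) can be sketched roughly as follows. This is a toy illustration of the idea, not the kernel's actual code: the helper names and the 64 GiB host size are assumptions, and the trick relies on the host's RAM ending well below the top of the physical address space.

```python
PRESENT_BIT = 1 << 0
PFN_MASK = ((1 << 52) - 1) & ~0xFFF   # bits 12..51 of a 64-bit x86 PTE


def invalidate(pte: int) -> int:
    """Clear the present bit and invert the address bits."""
    return (pte & ~PRESENT_BIT) ^ PFN_MASK


def restore(pte: int) -> int:
    """Undo the inversion and mark the entry present again."""
    return (pte ^ PFN_MASK) | PRESENT_BIT


ASSUMED_RAM_BYTES = 64 * 2**30  # hypothetical host with 64 GiB of RAM

live = 0x0000_0001_2345_6000 | PRESENT_BIT  # hypothetical valid entry
dead = invalidate(live)

# The invalidated entry now refers to an address far above installed RAM,
# so a speculative lookup finds nothing cached there to leak.
print(hex(dead & PFN_MASK), (dead & PFN_MASK) >= ASSUMED_RAM_BYTES)
print(restore(dead) == live)
```

Because the inversion is reversible, the original address can still be recovered when the entry is made valid again.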

We’re still in the process of making technical decisions (many of which revolve around trade-offs between performance and security), but we’ll keep you posted. Rest assured we’ll be trying to plot the path forward that works best for our customers.

CVEs (and links to Debian security tracker)

Further reading


#2

Hi Jamie,

“Following Intel’s public disclosure yesterday, we’ve been investigating thoroughly and identifying the best way forward.”

This implies you had no forewarning, even if the detail wasn’t known? I’m surprised given I hear of planned reboots, etc., by other providers that can’t divulge why in advance of the CVE release.

Cheers, Ralph.


#3

Mitigations chosen

  1. We will roll out a new Linux kernel to our Cloud infrastructure (specifically the Heads, which hold the CPUs used by Cloud Servers) and enable “L1D flushing on VMENTER”.
  2. We will disable SMT (aka hyperthreading) on our Cloud infrastructure.

These mitigations combined together will provide protection against the L1TF vulnerability. We are rolling both of these mitigations out over the next week to our entire Cloud platform.
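For anyone curious how both mitigations can be verified on a Linux KVM host, here is a minimal sketch that reads the status files recent kernels expose under sysfs. It is illustrative rather than a description of our tooling: the parsing assumes the one-line wording used by mainline kernels, which may differ between versions, and inside a guest these files describe the guest kernel, not the host.

```python
from pathlib import Path

L1TF_STATUS = Path("/sys/devices/system/cpu/vulnerabilities/l1tf")
SMT_CONTROL = Path("/sys/devices/system/cpu/smt/control")


def parse_l1tf_status(text: str) -> dict:
    """Split the kernel's one-line L1TF report into rough yes/no answers."""
    status = text.strip()
    return {
        "raw": status,
        "not_affected": status == "Not affected",
        # Matches both "cache flushes" and "conditional cache flushes".
        "l1d_flush": "cache flushes" in status,
        "smt_disabled": "SMT disabled" in status,
    }


def read_first_line(path: Path):
    """Return the stripped file contents, or None if the file is unavailable."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    raw = read_first_line(L1TF_STATUS)
    print("l1tf:", parse_l1tf_status(raw) if raw is not None else "unavailable")
    print("smt control:", read_first_line(SMT_CONTROL) or "unavailable")
```

On a host with both mitigations active, one would expect the report to mention cache flushes and disabled SMT, and the SMT control file to read `off` or `forceoff`.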

We will post an update to this thread once this work has been completed.

Impact on customers

Since our platform was built to support live migration of Cloud Servers between underlying hardware, we are able to apply these mitigations without any downtime for customers.

There will be a degree of performance impact, as there would be for any other Cloud hosting provider. Fortunately we have sufficient capacity on our platform, which means that we (and our customers) have breathing room to expand into. That means the mitigations shouldn’t significantly impact the majority of our Cloud customers.

Future plans

  1. We will continue attempts to make it onto Intel’s co-ordinated disclosure list.
  2. We will bring forward a rollout of newer versions of QEMU (one of the key technologies we use for virtualizing Cloud Servers). This will include fixes for some of the variants of the Spectre vulnerabilities that Intel disclosed at the start of the year. We’ll post more about this in due course.

#4

You’re quite correct that we had no forewarning. Often in these situations there’s private discussion between privileged parties as part of a co-ordinated disclosure, often involving NDAs. We’ll continue to try and get onto Intel’s co-ordinated disclosure list so that we can get pre-notification of this kind of vulnerability and make sure all of you are protected as quickly as possible.

Having said that, we’ve got hands on deck testing fixes and rolling them out to our platform as we speak.


#5

By this evening, all Cloud Servers should be protected

We’ve pushed hard to roll out the two mitigations and protect you from L1TF as fast as possible. Almost all Cloud Servers on our platform are now protected, and no customers have suffered downtime.

We only have a handful of Heads left to reboot – this should be finished by later this evening and provide protection for 100% of our Cloud Servers.

Have a lovely weekend!


#6

Are you also making sure you have the latest intel microcode (the August version)?


#7

Good question!

We’re running the microcode from July 3rd which implements L1TF mitigation for various server CPU models, including the CPU models we’re using in our Cloud infrastructure.

The August version does provide some coverage for further models of Intel CPU (mostly desktop), but fortunately we didn’t need to update again on a rushed schedule.
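If you want to see which microcode revision a Linux machine is actually running, the kernel reports it per logical CPU in `/proc/cpuinfo`. A minimal sketch is below; the function name is made up for illustration, and mapping a revision number to a release date still requires Intel's release notes for your CPU model.

```python
from pathlib import Path


def microcode_revisions(cpuinfo_text: str) -> set:
    """Collect the distinct 'microcode' values found in /proc/cpuinfo text."""
    revisions = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "microcode":
            revisions.add(int(value.strip(), 16))  # reported as hex, e.g. 0x1f
    return revisions


if __name__ == "__main__":
    try:
        text = Path("/proc/cpuinfo").read_text()
    except OSError:
        text = ""  # not Linux, or /proc is unavailable
    print(sorted(hex(r) for r in microcode_revisions(text)))
```

More than one value in the output would suggest that not every CPU picked up the same update.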

Sources: