When the news of the Intel L1TF security vulnerability hit, STH covered it in Foreshadow Flaw Targets Intel SGX and Virtual Machines. Foreshadow uses a level 1 cache terminal fault (hence L1TF) as a side-channel attack vector. There are a few major implications of Foreshadow. One is that an attacker could get information out of Intel’s SGX secure enclave. A second is that, since the L1 cache is shared between the hyper-threads of a core, data can leak from one virtual machine to another if untrusted virtual machines run on sibling hyper-threads of the same core. Intel has put out performance data on the impact of the mitigations on some industry standard benchmarks.
For those still looking for answers on L1TF, see our earlier piece or this video from Intel.
Intel L1TF Foreshadow Performance Impact Data
Most teams we speak to say that bare metal mitigations have a relatively low performance impact. Intel’s data generally shows no more than a 1% performance impact, if any at all. What is more interesting is what happens when you cannot mitigate simply through bare metal patches. In environments such as cloud providers, VPS providers, and enterprise clouds, one can have two VMs running on the same hyper-threaded core. Both Microsoft and Google said their cloud schedulers do not schedule different VMs across hyper-threads on the same core. For small VPS providers, sharing a core between VMs is still common practice. The mitigation in those cases is turning off hyper-threading.
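For readers who want to see where a given Linux host stands, here is a minimal sketch that reads the kernel's own reporting. It assumes a kernel new enough to expose the `vulnerabilities/l1tf` and `smt/control` sysfs files; on older kernels or other operating systems the files will simply be missing. The helper names `read_sysfs` and `smt_is_off` are ours, purely for illustration.

```python
from pathlib import Path

# Linux sysfs files; assumed present only on kernels carrying the L1TF patches.
L1TF_STATUS = Path("/sys/devices/system/cpu/vulnerabilities/l1tf")
SMT_CONTROL = Path("/sys/devices/system/cpu/smt/control")


def read_sysfs(path):
    """Return the stripped contents of a sysfs file, or None if it cannot be read."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def smt_is_off(control_value):
    """Interpret smt/control: any value other than 'on' means no active sibling threads."""
    return control_value in {"off", "forceoff", "notsupported", "notimplemented"}


if __name__ == "__main__":
    status = read_sysfs(L1TF_STATUS)
    print("L1TF status:", status if status is not None else "unknown (file not present)")

    control = read_sysfs(SMT_CONTROL)
    if control is None:
        print("SMT control: unknown (file not present)")
    else:
        state = "SMT off" if smt_is_off(control) else "SMT on"
        print(f"SMT control: {control} ({state})")
```

This only reports what the host kernel believes. It says nothing about how a hypervisor schedules guests onto sibling threads, which is the part that matters for multi-tenant providers.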
Intel published some startling numbers, in line with our expectations, about the impacts of turning hyper-threading off.
As some context, we would not expect SPECfp2017, STREAM Triad, and Linpack to be impacted by turning hyper-threading off. When vendors ask us to run Linpack, we are often asked to disable SMT.
For those Hyper-V shops, here are the Microsoft numbers.
Overall a low impact, but the Web Server workload is starting to get to one server per rack level of impact.
For those who want to cross-reference, here were the numbers when Intel offered Enterprise Meltdown and Spectre Fix benchmarks.
The Other Cost
We covered Red Hat’s response to L1TF. At Hot Chips 30, Jon Masters said on stage that L1TF has already cost Red Hat over 10,000 hours of engineering time.
That is an enormous industry cost. Meltdown and Spectre variants have already used over 10,000 hours of engineering time according to the talk.
For companies like Google and Microsoft with the ability to get custom chips, and with custom schedulers that can ensure that VMs do not cross hyper-threading boundaries, this is something that can be relatively easily mitigated. For enterprise virtualization clouds, this may increase utilization of underutilized servers, and cause more server purchases in the future.
The more interesting impact is on the average VPS provider. Very few users are running Linpack-like workloads on VPS providers. For the average VPS provider without access to custom hardware and software, mitigation will mean turning off hyper-threading. These companies often run at a lower margin, where seeing a 10-30% performance impact will kill their economic model. That means they will have to choose between turning off hyper-threading to mitigate L1TF / Foreshadow and keeping hyper-threading on to preserve their economic model. That is a dangerous wire to walk.
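To put the 10-30% range in hardware terms, here is a rough back-of-the-envelope sketch. It assumes a provider's sellable capacity scales linearly with per-server throughput, which is a simplification, and the function name `servers_needed` and the 100-server fleet are hypothetical.

```python
def servers_needed(current_servers, loss_percent):
    """Servers required to keep total throughput constant after each server
    loses loss_percent of its throughput (whole-number percent, 0-99)."""
    if not 0 <= loss_percent < 100:
        raise ValueError("loss_percent must be between 0 and 99")
    # Ceiling division in integer math: current / (1 - loss), rounded up.
    return -(-current_servers * 100 // (100 - loss_percent))


# Hypothetical 100-server fleet at both ends of the 10-30% range.
for loss in (10, 30):
    print(f"{loss}% loss: {servers_needed(100, loss)} servers")
```

Under that assumption, a 100-server fleet needs 112 servers after a 10% hit and 143 after a 30% hit. The extra hardware required grows faster than the performance loss itself, which is why thin-margin providers feel it so sharply.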