Friday, August 11th 2023
"Downfall" Intel CPU Vulnerability Can Impact Performance By 50%
Intel has recently disclosed a security vulnerability named Downfall (CVE-2022-40982) that impacts multiple generations of Intel processors. The flaw is rooted in a memory optimization feature and is exploited through the Gather instruction, which accelerates fetching data from scattered memory locations. During transient execution, Gather can inadvertently expose internal hardware registers, allowing malicious software to access data held by other programs. The flaw affects Intel mainstream and server processors from the Skylake through Rocket Lake microarchitectures. The entire list of affected CPUs is here. Intel has responded by releasing updated microcode to fix the flaw. However, there is concern over the performance impact of the fix, which could slow AVX2 and AVX-512 workloads that use the Gather instruction by up to 50%.
Phoronix tested the Downfall mitigations and reported varying performance decreases across processors. For instance, two Xeon Platinum 8380 processors were around 6% slower in certain tests, while the Core i7-1165G7 saw degradation ranging from 11% to 39% in specific benchmarks. While these reductions fall short of Intel's forecast of up to 50% overhead, they remain significant, especially in High-Performance Computing (HPC) workloads. The ramifications of Downfall are not restricted to specialized tasks like AI or HPC but may extend to more common applications such as video encoding. Though the microcode update is not mandatory and Intel provides an opt-out mechanism, users are left with a difficult trade-off between security and performance. Executing a Downfall attack may be complex, but the choice between applying the mitigation and retaining performance will ultimately depend on individual needs and risk assessments.
Source:
Phoronix
162 Comments on "Downfall" Intel CPU Vulnerability Can Impact Performance By 50%
As an added bonus: makes all those system hardening guidelines a little easier to maintain... To be fair, the vulnerability affects only one set of instructions in AVX2+. Explicit vectorization could forgo the op in favor of alternatives (and afaik, this was the better choice back in the early days).
Could pose problems for auto-vectorization, though. But people don't typically expect that much performance out of it...
I'm not an expert, but I wonder if this feared performance loss could itself be mitigated by rewriting code to do manual loads instead of relying on the faulty op.
Sounds like it's not that big a deal if it only affects older hardware and new chips aren't vulnerable to it.
Everything using AES-NI instructions is potentially affected, and those accelerate the AES cipher used in many protocols, including HTTPS. BitLocker also uses it for disk encryption, so theoretically a malicious program could extract its encryption keys. Whether that can be achieved in practice, especially in a consumer setting, is another issue. On servers it's another can of worms, particularly in cloud or VM environments.
Optimizing for AVX2/-512 brings very measurable increases in performance, so abandoning it is not really a solution. For example, on my CPU AES-NI in VeraCrypt achieves 16 GB/s, but with it disabled the throughput tanks to 2.5 GB/s. That decrease exceeds even the most pessimistic "up to 50%" impact of the Downfall mitigations. Yes, both "small server" Xeon-E and workstation Xeon-W, but no big chips. Intel is still supporting Skylake Xeons launched in 2017 until the end of this year. There's a lot of hardware still in use from then, and most likely running even earlier architectures.
The performance impact really depends on the workload and the actual use case: if you're using a rack of servers to convert video files in an isolated network segment, then you can disable the mitigations. If you are a cloud/server hosting facility, then you have to enable them in order to avoid potential liability.
No one is arguing the benefits of vectorization in general, but there are more ways to skin this cat.
I did contest the benefits of compiler auto-vectorization in typical scenarios. Applications that really benefit from SIMD tend to have explicit implementations, no?
Another issue is that most programs use libraries for high-performance code, often without even knowing whether those libraries use vectorized code internally. You're not rolling your own AES code, or at least you shouldn't ;). You use OpenSSL, or even rely on the operating system in the case of Windows/macOS. Those, in turn, are often vectorized or use hardware instructions like AES-NI in order to increase performance and save power.
However, browsers quickly mitigated it by reducing the resolution of the timers that Spectre relied on for its side channel, making data extraction impractical.
All the info you need about this is in the Wikipedia article.
My advice would be to trust your OS vendor unless it really is just a gaming and local code execution rig. In that case, feel free to turn them off.
Come on people, you're all smart enough to know how things work. See sense. My guess is no. If it is possible from a remote vector, the difficulty will be high, if not extreme.
I am sure some concessions were made throughout product development on the entire Core line, from the first to the 14th.
What exactly they were, only someone on the inside could say... But if a mitigation takes a certain product line back a gen or two...
Also, humans are influenceable, manipulable, corrupt and ideology-driven. Just ask yourself: would you pass on a truckload of cash from a GOV agency just for slipping in an exploitable bug?
So if I seemed a bit harsh earlier, I'm sorry. The problem is that some are implying clumsy, careless or nefarious actions, when ALL of these exotic and crazy vulnerabilities over the last 7 years came about from experimentation with the hardware in ways that no one designing it ever planned for, imagined or could have predicted.
We can't lay the blame at their feet and scream "Why did you do this?!?!". It just doesn't work that way.
Also, it's not surprising that tech security flaws stay undetected for so long. There are not many people on the planet who actually have an understanding of the tech, and those who do work either for the tech companies, the GOV, or bad actors. And none of them are interested in making security flaws public; two of them even abuse them. That's why most security flaws are reported by private researchers.