Intel LazyFP vulnerability: Exploiting lazy FPU state switching
Posted on June 6, 2018 by Thomas Prescher, Julian Stecklina, Jacek Galowicz
After Meltdown (see also our article about Meltdown) and Spectre, which were publicly disclosed in January, the Spectre V3a and V4 vulnerabilities followed in May (see also our article about Spectre V4).
According to the German IT news publisher Heise, the latter might be part of 8 new vulnerabilities in total that are going to be disclosed in the course of the year.
Earlier this year, Julian Stecklina (Amazon) and Thomas Prescher (Cyberus Technology) jointly discovered and responsibly disclosed another vulnerability that might be one of these. We call it LazyFP. LazyFP (CVE-2018-3665) is an attack targeting operating systems that use lazy FPU switching. This article describes what this attack means, how it can be mitigated, and how it actually works.
For further details, see the current draft of the LazyFP paper: link withheld by request from Intel.
Please check back regularly; we are going to update this post in coordination with Intel.
Summary and Implications
Public disclosure of this vulnerability was initially held back until August under a typical responsible disclosure embargo, but early rumors led to that date being dropped.
The register state of the floating point unit (FPU), which includes the AVX, MMX and SSE register sets, can be leaked across protection domain boundaries. This includes leaking across process and virtual machine boundaries.
The FPU state may contain sensitive information such as cryptographic keys. As an example, the Intel AES instruction set (AES-NI) uses FPU registers to store round keys. The vulnerability can only be exploited when the underlying operating system or hypervisor uses lazy FPU switching.
Users are affected when they run a combination of an affected processor and an affected operating system.
- Affected operating systems:
- Currently withheld by request from Intel
- Affected CPUs when affected operating system or hypervisor is used:
- Currently withheld by request from Intel
Mitigation requires system software to use eager FPU switching instead of lazy FPU switching.
External References
Technical Background
This vulnerability is similar to Meltdown (CVE-2017-5754). While Meltdown allowed a user space program to read protected memory contents, this new attack allows reading certain register contents across protection domain boundaries.
Further explanation withheld by request from Intel.
To have a better understanding of how this attack actually works, it is necessary to dive deeper into the inner workings of the x86 FPU, and how it is used by operating systems.
The Floating Point Unit (FPU)
In the early days of x86, the FPU (also called math coprocessor) was an external co-processor that could be added to Intel’s now widely adopted x86 processor architecture. The Intel 8087 was the first floating point math co-processor of this kind. The purpose of this extension was to accelerate mathematical operations on floating point numbers, such as division and multiplication. With the Intel 486DX CPU model (released in 1989), the FPU was integrated into the microchip itself, so an additional co-processor was no longer needed.
Over the years, the processor was extended to support Single Instruction, Multiple Data (SIMD) instruction sets, i.e. MMX/SSE/AVX. SIMD instructions perform the same mathematical operation on multiple pairs of operands at the same time in order to improve performance. Each of these instruction set extensions introduced new register sets that continue to be managed as part of the FPU register state. On recent Intel processors, the FPU register state can contain more than 2 kB of data (AVX-512 offers 32 registers of 512 bits each, which amounts to 2 kB of register state). Because these instructions are so broadly useful, this register set may contain not only floating point values but also other data, e.g. integer values.
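The size figure can be checked with a few lines of arithmetic. The following is a minimal sketch; the register counts and widths are the architectural values for 64-bit mode, while the helper name `register_file_bytes` and the table `SIMD_REGISTER_FILES` are made up for this illustration.

```python
def register_file_bytes(count: int, width_bits: int) -> int:
    """Size in bytes of a register file with `count` registers of `width_bits` bits."""
    return count * width_bits // 8


# (number of registers in 64-bit mode, width of each register in bits)
SIMD_REGISTER_FILES = {
    "SSE (XMM)": (16, 128),
    "AVX (YMM)": (16, 256),
    "AVX-512 (ZMM)": (32, 512),
}

SIZES = {
    name: register_file_bytes(count, width)
    for name, (count, width) in SIMD_REGISTER_FILES.items()
}

for name, size in SIZES.items():
    print(f"{name}: {size} bytes")
```

Note that the sizes are not additive: the XMM registers are the low halves of the YMM registers, which in turn are the low halves of the first 16 ZMM registers.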
Loading and Storing the FPU state
To enable multi-tasking, operating systems periodically interrupt processes to give other processes a chance to run. Otherwise a single process that loops forever could grind the system to a halt.
When the operating system switches from one process to another, it needs to save the state of the previous process and restore the state of the process that is about to be run. This state consists of the values stored in general purpose registers and FPU registers.
The x86 instruction set architecture provides several instructions to load/store the register state of the FPU from/to memory. Given the large size of the FPU state, one does not want to read and write that much data on every context switch unless it is actually necessary, because not all processes use the FPU.
Eager and Lazy FPU Switching
Eager FPU switching is comparable to saving the general purpose register state on a context switch. For each process, the operating system reserves an area in memory to save the FPU state. When switching from one process to another, it executes an FPU store instruction to transfer the current FPU content to the state save area. It then loads the new FPU state from the state save area of the process that is about to be scheduled.
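The eager scheme can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the FPU is modelled as a Python dictionary, and the names `Process`, `EagerCpu` and `demo_eager` are hypothetical and do not correspond to any kernel's actual data structures.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    name: str
    # Per-process FPU state save area (toy stand-in for the real register file).
    fpu_save_area: dict = field(default_factory=dict)


class EagerCpu:
    """Toy CPU whose context switch always saves and restores FPU state."""

    def __init__(self, first: Process):
        self.current = first
        self.fpu = dict(first.fpu_save_area)  # live FPU registers
        self.saves = 0
        self.restores = 0

    def context_switch(self, nxt: Process) -> None:
        # Store the outgoing process's registers to its save area ...
        self.current.fpu_save_area = dict(self.fpu)
        self.saves += 1
        # ... and unconditionally load the incoming process's state.
        self.fpu = dict(nxt.fpu_save_area)
        self.restores += 1
        self.current = nxt


def demo_eager():
    a = Process("A", {"xmm0": 0xAAAA})
    b = Process("B")  # never touches the FPU
    cpu = EagerCpu(a)
    cpu.context_switch(b)
    in_registers_while_b_runs = dict(cpu.fpu)
    cpu.context_switch(a)
    return cpu.saves, cpu.restores, in_registers_while_b_runs, dict(cpu.fpu)
```

In this model every switch costs one store and one load, even though process B never uses the FPU, and the registers only ever hold the state of the process that is currently running.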
Lazy FPU switching optimizes this procedure for the case where not every process uses the FPU all the time.
After a context switch, the FPU is disabled until it is first used. Only then is the old FPU state saved and the correct one restored from memory. Until that point, the FPU keeps the register state of the process or VM that used it last.
To implement this optimization, the operating system kernel temporarily disables the FPU by setting a certain control register bit (CR0.TS). If this bit is set, any attempt to access the FPU will cause an exception (#NM, Device Not Available, No Math Coprocessor). When the exception occurs, the kernel can save the old state to the respective state area and restore the state of the current process. Processes that simply do not use the FPU will never trigger this exception and hence never trigger FPU state stores and loads.
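The lazy scheme can be modelled in the same toy fashion. This is a sketch under simplifying assumptions, not a description of any particular kernel: the attribute `ts` stands in for CR0.TS, `_handle_nm` for the #NM exception handler, and all class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    name: str
    fpu_save_area: dict = field(default_factory=dict)


class LazyCpu:
    """Toy CPU that defers FPU save/restore until the FPU is actually used."""

    def __init__(self, first: Process):
        self.current = first
        self.fpu_owner = first  # process whose state is live in the registers
        self.fpu = dict(first.fpu_save_area)
        self.ts = False  # models CR0.TS
        self.nm_faults = 0
        self.saves = 0
        self.restores = 0

    def context_switch(self, nxt: Process) -> None:
        # Only arm the trap; the FPU registers themselves are left untouched.
        self.current = nxt
        self.ts = nxt is not self.fpu_owner

    def _handle_nm(self) -> None:
        # Models the #NM handler: save the old owner's state, load the new one.
        self.nm_faults += 1
        self.fpu_owner.fpu_save_area = dict(self.fpu)
        self.saves += 1
        self.fpu = dict(self.current.fpu_save_area)
        self.restores += 1
        self.fpu_owner = self.current
        self.ts = False

    def fpu_write(self, reg: str, value: int) -> None:
        if self.ts:  # FPU access while TS is set raises #NM
            self._handle_nm()
        self.fpu[reg] = value


def demo_lazy():
    a, b, c = Process("A"), Process("B"), Process("C")
    cpu = LazyCpu(a)
    cpu.fpu_write("xmm0", 0xAAAA)  # A owns the FPU, no fault
    cpu.context_switch(b)          # B never uses the FPU
    in_registers_while_b_runs = dict(cpu.fpu)
    cpu.context_switch(c)
    cpu.fpu_write("xmm0", 0xCCCC)  # first FPU use by C triggers #NM
    return cpu.nm_faults, cpu.saves, cpu.restores, in_registers_while_b_runs, a.fpu_save_area
```

In this model, two context switches cost only a single store and load, which is the saving lazy switching was designed for. While B runs, the registers still hold A's values, guarded only by the TS bit.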
Starting with the introduction of the x86-64 architecture, the presence of at least some SIMD instruction set extensions is mandated and their use has become more widespread. As such, the underlying assumption of lazy FPU switching is no longer valid. The performance gain from lazy FPU switching has become negligible, and some kernels have already removed it in favor of eager switching.
The Attack
This section is currently withheld by request from Intel.
Mitigation
On GNU/Linux with kernel versions >= 3.7, this attack can be mitigated by adding “eagerfpu=on” to the kernel’s boot parameters. We are not aware of a workaround for older versions. For all other operating systems: install the official update(s) from the vendor.
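To verify that a running Linux system was actually booted with this parameter, the kernel command line can be inspected. The following is a minimal sketch; it assumes a Linux system that exposes its boot parameters in /proc/cmdline, and the function names are made up for this example. It only checks that the parameter was passed, not which FPU switching mode the kernel ended up using.

```python
def has_eagerfpu_on(cmdline: str) -> bool:
    """Return True if the boot parameter eagerfpu=on appears in `cmdline`."""
    return "eagerfpu=on" in cmdline.split()


def read_kernel_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the boot parameters of the running kernel (Linux only)."""
    with open(path, encoding="ascii", errors="replace") as f:
        return f.read()
```

On a Linux machine, `has_eagerfpu_on(read_kernel_cmdline())` then reports whether the parameter was passed at boot.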
We are not aware of a performance penalty caused by using eager FPU switching. The commit message of the Linux kernel patch that made eager FPU switching the default on Linux systems even points out that the assumptions which justified lazy FPU switching in the past do not hold any longer on the majority of modern systems.