Thursday, March 16th 2017
AMD Ryzen Machine Crashes to a Sequence of FMA3 Instructions
An AMD Ryzen 7 1800X-powered machine was found to crash upon executing a very specific sequence of FMA3 instructions in Flops version 2, a simple open-source CPU benchmark by Alexander "Mystical" Yee. An important point to note here is that this little-known benchmark has been tailored by its developer to be highly specific to the CPU micro-architecture, with separate binaries for each major x64 architecture (e.g. Bulldozer, Sandy Bridge, Haswell, Skylake), and the GitHub repository does not have a "Zen"-specific binary.
Members of the HWBot forums found that Ryzen-powered machines crash when running the Haswell-specific binary, at the "Single-Precision - 128-bit FMA3 - Fused Multiply Add" test. The Haswell-specific binary (along with, we imagine, the Skylake one) adds support for the FMA3 instruction set, which Ryzen also supports, and that lends some importance to the discovery of this bug. It is also important because a simple application running with user privileges (i.e. without special super-user/admin privileges) can crash the machine. Such code could even be executed inside virtual machines, which makes this a security issue, with implications for AMD's upcoming "Naples" enterprise processor launch.
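For readers unfamiliar with what a fused multiply-add computes, here is a minimal sketch of its arithmetic. It is not taken from the Flops benchmark and it does not reproduce the crash. It assumes Python's double-precision floats (rather than the single-precision case named in the failing test) and emulates the "fused" behavior with exact rational arithmetic; the function names are hypothetical.

```python
from fractions import Fraction


def unfused_multiply_add(a, b, c):
    """Separate multiply and add: the product is rounded, then the sum is rounded."""
    return a * b + c


def fused_multiply_add(a, b, c):
    """Emulated FMA: compute a*b + c exactly, then round once to a double.

    Illustration only. Real FMA3 hardware does this in a single instruction;
    here the exact value comes from rational arithmetic (finite inputs assumed).
    """
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


# Inputs chosen so that the exact product, 1 - 2**-60, is not representable
# as a double and rounds to 1.0 when the multiply is performed on its own.
a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

print(unfused_multiply_add(a, b, c))  # the tiny term is lost: 0.0
print(fused_multiply_add(a, b, c))    # the tiny term survives: equals -(2.0 ** -60)
```

The single rounding step is what distinguishes a fused multiply-add from a multiply followed by an add, and it is why benchmarks and numerical code target FMA3 explicitly.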
62 Comments on AMD Ryzen Machine Crashes to a Sequence of FMA3 Instructions
It's not about compatibility or how rarely the problematic instruction is used in software.
It's about the fact that this architecture can be crashed with a single line of code, which should not happen, ever. If a CPU can't execute some code, it should handle this exception in a safe way. Ryzen simply dies.
This is a big stability risk and, as far as the enterprise segment goes, a threat that would make Ryzen unacceptable in commercial applications.
Moreover, while it is rumored that AMD knows how to fix this and a microcode update is being developed, AMD has given no official statement or deadline. It's already been a few days since the issue was revealed.
I do understand why it is news and I do understand why it is important. But, 3 pages later, it is getting stretched pretty thin.
Future you says: Oh, I guess they fixed it. It wasn't such a big deal after all. Time to move on.
Sorry, but an argument that something happened years ago (Coppermine in 2001?) is by no means helping AMD.
Seriously, we became so spoiled by CPUs that just work: having close to no compatibility conflicts, setting themselves up, overclocking automatically etc.
AMD gave us a CPU which once again makes you spend weeks reading about issues, finding rare RAM that works etc. We're once again waiting for patches to fix crucial issues...
I totally understand they were committed to maximize performance and this CPU is really squeezed to the limits, but haven't they gone too far?
Quite a few people have reported that this FMA3 issue can be fixed (or greatly limited) by upping voltage. Oh come on... do we deserve to be treated like that? :/ It's a huge deal and will not be forgotten by reviewers and enthusiasts. I would compare it to Samsung's recent battery failure. What saves AMD is that, apart from some gamers and geeks, no one really cares (generally speaking, not that many people know what AMD is).
This problem wasn't like a game crashing to the desktop, it wasn't even like a BSOD where you had to reboot. Depending on your system, you may or may not have even been able to turn off your computer by holding the on/off button! This happened to me once, guess how.
As I said earlier, this is similar to the Cyrix coma bug or the Pentium F00F bug.
VFMADD132PDx %a, %b, %c
The great thing is that the benchmark used to reveal this bug is open source. Everyone willing to hang their (or - for that matter - someone else's) Ryzen can check how the code forces FMA3 usage. :) Basically, you can force that while compiling (even when coding in a high-level language).
And so what if it takes more than one line of code in your language of choice? Sorry, but you're being rather pedantic about this.