I've been doing some behind-the-scenes research into AMD's so-called Linux "Performance Marginality." When I began, I had big plans to write an independent test program to prove the crash can also happen in Windows. Unfortunately, I never quite got there, and it appears I may even have been off on my expected results. The crash is triggered by ASLR, and Windows doesn't use this, generally. JavaScript might, but find me any webpage that spawns a 16-thread JavaScript process that isn't mining coins malware-style and I'll be genuinely shocked.
What did come of this is a document detailing my results with the RMA. If nothing else, there is heavy evidence that there is no new stepping, just improved binning to mitigate the issue for those who complain. It's circumstantial at this point, but given that AMD has repeatedly declined to comment when asked how they fix this, I strongly suspect they are simply putting Threadripper-grade dies into Ryzen CPUs on request, and that standard Ryzen-grade CPUs simply don't have fully functional ASLR under load (at least, at the binning level they chose).
I'm putting the document I typed up below, including evidence, in hopes you guys can do more research and maybe find enough to make this case a bit more than circumstantial. As it is, I'm out of time and energy to pursue this further, but it certainly seems suspect.
BEGIN PM (Originally sent to W1zzard and company, advised to share with community):
As a user of Gentoo Linux, I have been hit hard by the so-called Ryzen “Performance Marginality.” It manifests as several concurrently running build jobs crashing a random process on the system, usually (but not necessarily) one of the build jobs themselves. The problem is well documented, and AMD is offering RMAs to affected users. The thing is, that makes it sound like not everyone is affected. Truth be told, after a lot of online research, it is my opinion that anyone with a processor older than build week 25 is affected. Since anything newer than build week 20 has not made it into retail yet (at least, if user reports can be believed), this means nearly all Ryzen processors on the market at present are affected by this issue.
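For anyone who wants to try reproducing the "several jobs at once, something random dies" pattern, here is a minimal sketch of a stress loop. It is not the well-known test scripts from the AMD thread, just my illustration: `stress`, `run_once`, and the tally keys are names I made up, and the workload command is whatever you pass in.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_once(cmd):
    """Run one job and return its exit status.

    On POSIX, a negative status means the process was killed by a signal
    (for example -11 for SIGSEGV).
    """
    result = subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return result.returncode


def stress(cmd, jobs=16, rounds=10):
    """Run `jobs` copies of `cmd` concurrently, `rounds` times, and tally outcomes."""
    tally = {"ok": 0, "failed": 0, "signaled": 0}
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        for _ in range(rounds):
            for code in pool.map(run_once, [cmd] * jobs):
                if code == 0:
                    tally["ok"] += 1
                elif code < 0:
                    tally["signaled"] += 1
                else:
                    # A compiler driver usually reports a crashed child as a
                    # plain nonzero exit, so these need a look too.
                    tally["failed"] += 1
    return tally
```

You would call it with a real build command as a list, something like `stress(["gcc", "-c", "test.c"], jobs=16, rounds=100)` (the file name is a placeholder), and treat any nonzero "failed" or "signaled" count on a workload that should always succeed as worth investigating.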
This is a big deal, and not just on Linux. Why?
For nearly all users, the issue vanishes in Linux when they turn off Kernel ASLR (Address Space Layout Randomization). This is a critical security feature that is not presently used much in Windows (and frankly, may never be) but is already being used inside web browsers in VMs such as JavaScript and similar. I’d be very interested in how a loaded Ryzen VM performs with JavaScript long term, for example. I’m sure this issue can manifest itself elsewhere if ASLR is truly being corrupted under load.
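If you want to check what your own box is currently set to, here is a small sketch. It assumes the setting in question is the standard Linux `kernel.randomize_va_space` sysctl exposed at `/proc/sys/kernel/randomize_va_space` (the PM doesn't name the exact knob); the function names and the wording of the labels are mine.

```python
from pathlib import Path

# Standard Linux sysctl file; assumed here to be the relevant ASLR switch.
ASLR_SYSCTL = Path("/proc/sys/kernel/randomize_va_space")

# Meanings per the kernel sysctl documentation; the label wording is mine.
MODES = {
    0: "disabled",
    1: "conservative randomization",
    2: "full randomization",
}


def describe_aslr(raw):
    """Turn the raw contents of the sysctl file into a readable label."""
    value = int(raw.strip())
    return MODES.get(value, f"unknown value {value}")


def current_aslr(path=ASLR_SYSCTL):
    """Read the sysctl file (Linux only) and describe the active mode."""
    return describe_aslr(path.read_text())
```

On a Linux machine, `print(current_aslr())` reports the active mode. Recording that alongside each crash report would make it easier to compare results between users who have randomization on and those who turned it off.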
What else is newsworthy here? Well, the issue does not appear to be fixed. By that I mean, there is no new stepping. It appears by all accounts that the most likely “fix” AMD is employing is to simply bin the processor better (that means picking better-performing dies). This also explains why Threadripper and EPYC are “unaffected.” They are ALREADY binned higher.
To test this theory, I submitted my processor for an RMA. All users are reportedly getting “fresh from the presses” Ryzens manufactured not too long ago. Personally, my theory is that they are being pulled straight from the assembly-line binning process and used for RMAs. The fact that my CPU took nearly 2 weeks to “prepare” but got to me almost overnight only supports this theory. Anyhow, my new CPU was made in Week 33. You can see it compared with my old Week 9 Ryzen below:
Note, in the images above, the older CPU container has a plastic shield that is much more “shiny” for some reason. It obscures the laser markings a bit but they should still be legible. I think it is just a packaging difference.
The new CPU has been opened on the bottom (no sticker), as prior reports indicated. It was also shipped rather pathetically. Unfortunately, I forgot to photograph this fact in my excitement, but I can certify there was no bottom “security” sticker and online reports support this. Have a look at the poor packaging anyways for kicks:
The CPU, as predicted, is much higher binned or otherwise a “golden” chip. It does 4.1 GHz on all cores at 1.425 V, where my old Ryzen needed 1.475 V to reach 4.0 GHz on all cores. It also lets the IMC fly up to 3600 MHz where before, 3200 MHz was a struggle. Here are some relevant comparison shots.
A basic overview of my old Ryzen. Lacking memory/voltage tabs, but this is all I could ever push out of it, and my “daily driver” clocks were lower. IMC was at 3200 MHz with four single-rank Samsung B-die DIMMs. Clock was 4 GHz at 1.475 V.
My new Ryzen. Clocks higher, with less volts. Obviously better binned or otherwise golden. IMC goes outrageously high at 3600 MHz. Same memory/DIMMs as above.
Oh, and yes, the issue is fixed.
What does this all mean?
I think AMD is binning run-of-the-mill Ryzen CPUs so low that ASLR is effectively broken as soon as things get "hot" under load. I don't have direct confirmation of this yet, but there is a lot of circumstantial evidence, mostly from my own testing and this thread:
https://community.amd.com/thread/215773
It's a long read, but the evidence is there, if you look. I'd recommend the later posts (from within the last two months), as they cover the RMA process and reports of binning/testing going on prior to chip arrival.