Tuesday, August 8th 2017
AMD Confirms Ryzen Marginality Performance Issue Under Linux, TR and EPYC Clear
AMD has confirmed an issue affecting Ryzen processors under certain Linux workloads: very heavy, continuous loads on the Ryzen silicon (parallel compilation in particular) cause segmentation faults. Stress runs such as the one in the Phoronix Test Suite quickly bring Ryzen processors to their knees with multiple segmentation faults. While the problem is easy to trigger under very heavy workloads, it is virtually absent under normal Linux desktop workloads and benchmarking.
AMD also confirmed that this issue is not present in EPYC or Threadripper processors, but is isolated to early Ryzen samples under Linux (AMD's testing under Windows has found no such behavior). AMD's analysis has also found that these Ryzen segmentation faults aren't isolated to a particular motherboard vendor, but are problems with the processors themselves. AMD encourages Ryzen customers who believe they are affected by the problem to contact AMD Customer Care. Some of those who have contacted Customer Care about the segmentation faults have in turn been affected by thermal, power, or other problems, but AMD says it is committed to working with those encountering this performance marginality issue under Linux. AMD will also be stepping up its Linux testing/QA for future consumer products.
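For readers who want to see whether a heavy, parallel workload trips segmentation faults on their own machine, the idea can be approximated with a short script. This is only a rough sketch, not AMD's or Phoronix's test method: the function names `run_once` and `count_segfaults` are made up for illustration, and treating the text "segmentation fault" in a command's output as a hit is an assumption about how build tools report a crashed child process.

```python
import signal
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_once(cmd):
    """Run cmd once; return True if the run looks like a segmentation fault."""
    proc = subprocess.run(cmd, capture_output=True, text=True, errors="replace")
    # On POSIX, a negative return code means the process was killed by that signal.
    killed_by_segv = proc.returncode == -signal.SIGSEGV
    # Assumption: drivers such as gcc or make usually report a crashed child in
    # their output instead of dying themselves, so also scan for the message.
    reported = "segmentation fault" in (proc.stdout + proc.stderr).lower()
    return killed_by_segv or reported


def count_segfaults(cmd, jobs=4, rounds=8):
    """Run cmd `rounds` times, `jobs` at a time; return how many runs segfaulted."""
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        results = list(pool.map(lambda _: run_once(cmd), range(rounds)))
    return sum(results)
```

Calling `count_segfaults(cmd, jobs=16, rounds=100)` with `cmd` set to a compile command of your own keeps sixteen copies running at once. A non-zero result is only a hint, since thermal, power, or memory problems can produce the same symptom.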
Sources:
Phoronix, AMD Confirms Ryzen Issue - Phoronix
45 Comments on AMD Confirms Ryzen Marginality Performance Issue Under Linux, TR and EPYC Clear
There's a reason why I don't shop at Home Depot: they tried to cover up the credit card hack. Only after it was exposed did they come out and say "We're sorry". Well, "sorry" ain't good enough!
It's a business. They are in it for the money regardless of who it is or what they say.
As for why this issue occurred on Linux and not on Windows, it could be that Linux (being that Linux tends to be more on the enthusiast front) was using some kind of instruction set in a weird way whereas Windows tends to be more conservative in terms of using newer processor instruction sets since Microsoft wants to make sure that Windows runs on just about anything including some old-ass Pentium 4 machine.
The errors in the uOP cache are clearly corruption happening inside the CPU core. Micro-operations are generated in the front-end/prefetcher, and since the hardware detects these there, it's clearly a hardware bug.
I was terribly plagued by this and found that a final 1.2v SOC completely eliminated the bug in all forms and test suites for me.
Weird.
And define "completely eliminated". This is not how electronics work. This problem has already manifested, so it is there - that's the only sure thing.
So you can't say that a problem has been eliminated just with stability tests. Lowering voltage might only make this less probable.
Now we need an explanation and a proof that it won't happen...
All chips seem to have the potential, but silicon quality seems to play a factor in how likely it is to occur. As you know, bumping the voltage does lower the rise/fall time of the transistors, but it's still not enough to guarantee synchronicity, and would not eliminate all disturbances towards the end of a cycle. A proper fix would require a realignment of the circuits in this region of the CPU.
threadripper is not affected