Wednesday, January 23rd 2019
Bulldozer Core-Count Debate Comes Back to Haunt AMD
In October 2011, AMD launched the FX-8150, the "world's first 8-core desktop processor," or so it says on the literal tin. AMD reached its core count of 8 with an unconventional CPU core design. The 8 cores are arranged in four sets of two cores each, called "modules." Each core has its own independent integer unit and L1 data cache, while the two cores in a module share a majority of their components: the front-end, a branch predictor, a 64 KB L1 code cache, a 2 MB L2 cache, and, most importantly, an FPU. There was much debate across tech forums on what constitutes a CPU core.
Multiprocessor-aware operating systems had to be tweaked to address a "Bulldozer" processor properly. Their schedulers initially treated "Bulldozer" cores as fully independent (as conventional logic would dictate), until AMD noticed performance bottlenecks in multi-threaded applications. Eventually, Windows and various *nix kernels received scheduler updates that treat each module as a core, and each core as an SMT unit (a logical processor). The FX-8350 is a 4-core/8-thread processor in the eyes of Windows 10, for example. These updates improved the processors' performance, but not before consumers started noticing that their operating systems weren't reporting the advertised core count. In 2015, a class-action lawsuit was filed against AMD for false marketing of FX-series processors. The wheels of that lawsuit are finally moving, with a 12-member jury being set up to examine what constitutes a CPU core, and whether an AMD FX-8000 or FX-9000 series processor can qualify as an 8-core chip.
US District Judge Haywood Gilliam of the District Court for the Northern District of California rejected AMD's claim that "a significant majority of" consumers understood what constitutes a CPU core, and that they had a fair idea of what they were buying when they bought AMD FX processors. AMD has two main options before it. The company can reach a settlement with the plaintiffs that could cost it millions of dollars in compensation, or fight it out in a jury trial by trying to prove to 12 members of the public (not necessarily from an IT background) what constitutes a CPU core and why "Bulldozer" qualifies as 8-core silicon.
The plaintiffs and the defendant each have a key technical argument. The plaintiffs could point to operating systems treating 8-core "Bulldozer" parts as 4-core/8-thread (i.e. each module as a core and each "core" as a logical processor), while AMD could run multi-threaded floating-point benchmarks to show that a module cannot be reduced to the definition of a core. AMD's 2017 release of the "Zen" architecture marks a return to the conventional definition of a core, with each "Zen" core being as independent as an Intel "Skylake" core. We will keep an eye on this case.
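The two ways of counting can be illustrated with a minimal sketch. The function name and the CPU-to-core mappings below are hypothetical, made up for illustration; this is not how Windows or any particular kernel actually enumerates topology, only the grouping idea behind "4 cores/8 threads" versus "8 cores."

```python
from collections import defaultdict

def count_cores_and_threads(cpu_to_core):
    """Return (cores, logical_processors) for a mapping of logical CPU -> core ID.

    Logical CPUs that share a core ID are counted as SMT siblings of one core.
    """
    siblings = defaultdict(list)
    for cpu, core_id in cpu_to_core.items():
        siblings[core_id].append(cpu)
    return len(siblings), len(cpu_to_core)

# Hypothetical layouts for an FX-8350: eight integer cores, numbered 0-7.
module_as_core = {cpu: cpu // 2 for cpu in range(8)}  # both cores of a module share an ID
core_as_core = {cpu: cpu for cpu in range(8)}         # every integer core gets its own ID

print(count_cores_and_threads(module_as_core))  # (4, 8)
print(count_cores_and_threads(core_as_core))    # (8, 8)
```

The silicon is the same in both cases; only the grouping rule changes, which is essentially what the jury is being asked to weigh.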
Source:
The Register
369 Comments on Bulldozer Core-Count Debate Comes Back to Haunt AMD
If only AMD had called them what they truly were, there would have been zero basis for this case: Conjoined Cores
To answer my question (which I've asked twice without clear answers) of whether a Bulldozer module can start work on two different instructions in the same clock cycle, the answer is never:
If you take Kumar's paper as a whole, it warns that the performance of "conjoined cores" is inferior to that of independent cores. That's what the lawsuit alleges: AMD knew it when they decided to go with the design, and they tried to pass it off as a better product than it is by omitting key details in marketing. I'd argue that if AMD had even put "8 conjoined cores" on the box, this lawsuit wouldn't have happened, but they didn't.
What mouacyk quoted may actually explain why 7zip has poor performance on Bulldozer. Most of the instructions 7zip requires are simple array operations. If they're hitting the fetcher faster than the fetcher can cycle between threads when each thread only requires one, maybe two cycles to complete, that's how you can end up with such a massive performance hit. 7zip benefits hugely from hyperthreading because the dual thread fetcher is queuing up two threads for the same compute resources which would otherwise be underutilized. It stands to reason that in tests where Bulldozer does poorly, Hyperthreading will do exceptionally well.
I provided a plethora of examples of what consumers understand as a core from Intel, MIPS, AMD, and IBM. Either Bulldozer is wrong or the rest of the industry is wrong. My money is on Bulldozer.
If the arguments will boil down to "Here, look what everyone else does" or "This product had worse performance than competing product X", well, let's just say they are going to have an exceptionally weak case.
I was shocked Seagate lost the lawsuit in regards to the definition of GB. Seagate had been labeling their products correctly going back decades. When Seagate said there were 80 billion bytes (80 GB), there were 80 billion bytes, usually with some surplus. The court ruled that 80 GB actually means 80 GiB (insert Jackie Chan WTF here), so Seagate defrauded customers of the difference (5,899,345,920 bytes). The technical argument was completely on Seagate's side. It was Microsoft that misled people by using GiB math with GB labels, not Seagate, which used GB math and GB labels. The court didn't care and summary judgment was given to the plaintiff. Because Microsoft wasn't the one on trial, Seagate paid up.
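For anyone who wants to check the arithmetic, here is a quick sketch. The function names are made up for illustration; the only facts assumed are that 1 GB is 10^9 bytes and 1 GiB is 2^30 bytes.

```python
GB = 10**9    # decimal gigabyte (SI prefix)
GiB = 2**30   # binary gibibyte (IEC prefix), 1,073,741,824 bytes

def shortfall_bytes(labeled_gb):
    """Bytes 'missing' when a drive labeled in GB is read as if the label meant GiB."""
    return labeled_gb * GiB - labeled_gb * GB

def reported_size(labeled_gb):
    """Figure a GiB-based tool would show for a drive holding labeled_gb * 10**9 bytes."""
    return labeled_gb * GB / GiB

print(f"{shortfall_bytes(80):,}")   # 5,899,345,920
print(f"{reported_size(80):.2f}")   # 74.51
```

So an "80 GB" drive shows up as roughly 74.5 when the tool divides by 2^30 but still prints "GB" next to the number.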
The technical argument is entirely in the plaintiffs' favor in this case. Benchmarks, conceptual papers, instruction papers, schematics, diagrams, and competing products all evidence that; however, just because the technical details are in favor of one side doesn't mean the court will see it that way. It certainly didn't in Seagate's case. How this plays out is going to depend on whose lawyers are more convincing.
I'm still miffed that Microsoft still doesn't show appropriate units to match the math they are using. I'm also miffed Seagate had to pony up for something they were blameless in (all they can be accused of is not being proactive in labelling). It is what it is.
Depending on how the lawsuit goes here, we might see future processors contain a brief description of what a "core" is as defined by the court. Clarity is good in a market, even if meanings were twisted from intent.
1 GB in legalese is 1,073,741,824 bytes. What's on the package now? Language clearly defining 1 GB as 1 billion bytes to correct the legalese definition. The court is under no obligation to enforce an international standard.
Most likely the court is going to rule in favor of the poor, defrauded public like how the Seagate case went. That means AMD loses and "core" means "independent processor." Anything that doesn't meet that test needs clarification on the packaging which isn't a bad thing.
Seriously, I hope y'all are learning something from this. I know I learned quite a bit about Intel Tera-Scale.
And how do you prove the performance wasn't what people thought it would be? Weren't there reviews available? Did AMD go out of their way to hide anything about their product? Unless there are going to be clear-cut answers to that, this really will be a matter of creative thinking.
Not that I actually necessarily believe AMD will win. They are 100% in the right but US courts have proven to be a wild west numerous times.
...this thread has entered the looping phase.
This is why we've entered a loop: no one can put their finger on what exactly is wrong.
1. Shared fetcher.
2. Depending on iteration, shared decoder.
3. Depending on iteration, shared dispatcher.
4. Shared floating point units.
5. Shared Core Interface Unit.
...we also got into why these are a problem:
1. The fetcher is incapable of saturating the ALUs in a lot of cases where it has to service both integer clusters. Thuban was able to in the same scenarios.
2. + 3. AMD chose to split the decoder and dispatcher for reasons revolving around power efficiency and performance.
4. AMD was really fixated on the idea that GPUs would take over floating-point work, so, per thread, Bulldozer really offers no improvement over Thuban. Because collisions can happen, in practice it can be slower.
5. All communication with the rest of the system flows through this unit. Windows 10 sees the Core Interface Unit and believes it is looking at a core. It looks at the fetcher offering to take two threads and interprets that as two logical processors.
Overall, a Bulldozer module is a lot of transistors short of a dual-core. That was their intent, after all.
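To put a number on what a "collision" costs, here is a deliberately crude toy model. It is not a simulation of Bulldozer: the one-operation-per-cycle unit, the round-robin arbitration, and the function name are all assumptions made for illustration, and real hardware is wider and smarter than this. It only shows the general shape of the problem when two threads queue for one unit instead of each owning one.

```python
def cycles_to_finish(work_per_thread, shared):
    """Count cycles until every thread has issued all of its operations.

    work_per_thread: operations each thread must push through the unit.
    shared: if True, a single unit serves the threads round-robin, one
            operation per cycle; if False, every thread has its own unit.
    """
    remaining = list(work_per_thread)
    cycles = 0
    turn = 0
    while any(remaining):
        cycles += 1
        if shared:
            # Skip threads that are already done, then serve exactly one.
            while remaining[turn % len(remaining)] == 0:
                turn += 1
            remaining[turn % len(remaining)] -= 1
            turn += 1
        else:
            # Dedicated units: every unfinished thread advances this cycle.
            remaining = [max(r - 1, 0) for r in remaining]
    return cycles

print(cycles_to_finish([100, 100], shared=False))  # 100
print(cycles_to_finish([100, 100], shared=True))   # 200
print(cycles_to_finish([100, 0], shared=True))     # 100
```

The last line is the flip side: when only one thread needs the unit, sharing costs nothing in this model, which fits the pattern of Bulldozer looking fine in some tests and poor in others.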
This is also the first I have heard of CPUs being "wrong".
Then you have the group of people that think an Intel i7 chip has 7 cores, the i5 has 5 cores and so on. They might know a little more about computers than your grandma, but they don't really know all that much... and when they find out that the 8 "core" Bulldozer isn't quite 8 full, complete cores, likely after buying one, they might be a little bit mad, because they thought their 8 core FX-8350 was gonna blow their buddy's much more expensive 6 core 3930K out of the water... but then, they probably didn't research their product all that well, or read very many reviews, or maybe don't even understand computer hardware all that well in general, so that's where I agree with the whiny entitled snowflakes bit. My eyes begin to glaze over a little bit when I read one of JonnyGuru's PSU reviews (or, more relevant, when looking at these block diagrams and reading everyone's argument on what makes a CPU a CPU), but at least I understand enough of it (maybe not the in-depth technical details) to know whether or not I'm buying a turd.
The fact that whiny snowflakes (backed by greedy lawyers) are doing the bemoaning does not make an argument automatically invalid.