Wednesday, January 23rd 2019
Bulldozer Core-Count Debate Comes Back to Haunt AMD
AMD launched the FX-8150 in 2011 as the "world's first 8-core desktop processor," or so it says on the literal tin. AMD achieved its core-count of 8 with an unconventional CPU core design. Its 8 cores are arranged in four sets of two cores each, called "modules." Each core has its own independent integer unit and L1 data cache, while the two cores share a majority of their components - the front-end, a branch-predictor, a 64 KB L1 code cache, a 2 MB L2 cache, and most importantly, an FPU. There was much debate across tech forums on what constitutes a CPU core.
Multiprocessor-aware operating systems had to be tweaked to properly address a "Bulldozer" processor. Their schedulers would initially treat "Bulldozer" cores as fully independent (as conventional logic would dictate), until AMD noticed multi-threaded application performance bottlenecks. Eventually, Windows and various *nix kernels received updates to their schedulers to treat each module as a core, and each core as an SMT unit (a logical processor). The FX-8350 is a 4-core/8-thread processor in the eyes of Windows 10, for example. These updates improved the processors' performance, but not before consumers started noticing that their operating systems weren't reporting the advertised core-count. In 2015, a class-action lawsuit was filed against AMD for false marketing of FX-series processors. The wheels of that lawsuit are finally moving, with a 12-member jury set to examine what constitutes a CPU core, and whether an AMD FX-8000 or FX-9000 series processor can qualify as an 8-core chip.
US District Judge Haywood Gilliam of the District Court for the Northern District of California rejected AMD's claim that "a significant majority of" consumers understood what constitutes a CPU core, and that they had a fair idea of what they were buying when they bought AMD FX processors. AMD has two main options before it. The company can reach an agreement with the plaintiffs that could cost it millions of dollars in compensation, or fight it out in the jury trial by trying to prove to 12 members of the public (not necessarily from an IT background) what constitutes a CPU core and why "Bulldozer" qualifies as 8-core silicon.
The plaintiffs and the defendant each have a key technical argument. The plaintiffs could point out that operating systems treat 8-core "Bulldozer" parts as 4-core/8-thread (i.e. each module as a core and each "core" as a logical processor), while AMD could run multi-threaded floating-point benchmark tests to prove that a module cannot be simplified to the definition of a core. AMD's 2017 release of the "Zen" architecture sees a return to the conventional definition of a core, with each "Zen" core being as independent as an Intel "Skylake" core. We will keep an eye on this case.
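To make the two ways of counting concrete, here is a minimal sketch. The `Module` class and both view functions are hypothetical names made up for illustration; the only figures taken from the article are four modules with two integer cores each. It models the counting argument only and does not query real hardware.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Module:
    """One "Bulldozer" module as the article describes it (illustrative model)."""
    integer_cores: int = 2  # each has its own integer unit and L1 data cache
    shared_fpus: int = 1    # front-end, L2 cache and FPU are shared per module

def marketing_view(modules):
    """Count every integer core as a full core (the "8-core" reading)."""
    cores = sum(m.integer_cores for m in modules)
    return {"cores": cores, "threads": cores}

def scheduler_view(modules):
    """Count each module as a core and each integer core as a logical processor."""
    threads = sum(m.integer_cores for m in modules)
    return {"cores": len(modules), "threads": threads}

fx_8150 = [Module() for _ in range(4)]
print(marketing_view(fx_8150))  # {'cores': 8, 'threads': 8}
print(scheduler_view(fx_8150))  # {'cores': 4, 'threads': 8}
```

Both views describe the same silicon; the dispute is over which label a buyer is entitled to expect.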
Source:
The Register
369 Comments on Bulldozer Core-Count Debate Comes Back to Haunt AMD
Two integer cores in one core. Not rocket science. Blue box aligns with the Thuban picture I posted.
Being unable to separate highly integrated chips into individually working components is NOT the definition of a core.
Independent execution is.
If you wanted your core to work by itself you would have to add all those other components that have been optimized out.
Same way if you wanted to break a Bulldozer module apart. LOL 0/10, would get sued and lose.
You missed the 2nd FPU unit.
That only has 1 128bit FPU, bulldozer has 2.
Bulldozer was AMD's first AVX supporting chip and they did something different with it.
It required 2 fpu units sharing resources that could also operate independently.
AMD's claim is 8 real cores and we did something different in the name of saving space to give you 8 cores.
It has 8 independently operating cores...... and it does multi-threaded performance Better than 4c/8t.
It is exactly as advertised.
I have tried to explain CPU architectures but you all clearly desire an argument more than the truth, Peace.
Seriously, this is irrelevant. Look at AMD's "Zen" Core slide again. They literally put it next to "Excavator" and it's devoid of any other uses of the word "core." It's plain as day to see AMD acknowledges Excavator "modules" were in fact Excavator "cores," or they wouldn't compare it to Zen as they did.
Memory controllers and caches are not part of the core, no more than FPUs. In consumer space Athlon64 was the CPU moving memory controller into the CPU, it used to be in northbridge. Same with cache, there were (even x86) CPUs without cache and it was outside the CPU at first.
But please, go actually try and learn what makes a cpu a cpu. And study the evolution of multiprocessor design and how the cores share resources.... it would be enlightening.
must be the same right.
2x Lo 80-bit
2x Hi 64-bit
For FMACs Pipe0 and Pipe1. Lo0+Hi0 = P0 and Lo1+Hi1 = P1
2x Mid 128-bit
For MMXs Pipe2 and Pipe3.
To Steamroller/Excavator;
2x Lo 80-bit
2x Hi 64-bit
For FMACs Pipe0/Pipe1.
1x Mid 128-bit
For MMX Pipe2.
The units themselves are each a FPU. While, all of them are part of the whole floating point core. The FP core however can also be called the Floating Point Unit.
The cores in Bulldozer can also be called Integer clusters. The module can also be called a core. However, most of these distinctions are marketing.
Bulldozer via Industrial+Educational standards has
2x AMD64 cores
1x AMD64 floating-point core.
The cores don't execute x86-64, they execute an internal ISA. The cores are thus separate from the dispatch of those decoded instructions. The core begins at the instruction bus which is the retire queue and ends at the load/store which is the load/store buffers.
Even if AMD went from 2x LSU to 1x LSU there will still be two cores.
If you look at how the word "core" is used in the context of processors, it is the lowest common denominator across all architectures. It describes the discrete hardware that takes an instruction with operands and turns it into a result: cache to cache. The fetcher is a critical component of that going back to at least the 80386:
Excavator "core" is therefore a singular core with two discrete ALUs handling two concurrent threads.
Jump ahead to Pentium 3 there's a fetcher per core:
If the CPU can't fetch an instruction to decode, it literally remains forever idle.
Core 0 can't fetch Core 1's instructions.
Core 1 can't fetch Core 0's instructions.
Core 0 fetches 16B every cycle.
Core 1 fetches 16B every cycle.
If you seriously think the fetcher isn't shared then you're telling me AMD doesn't know their own product (:laugh:). Refresher:
Let me get my van Gogh on again...
This is what a dual-core module would look like (mimics UltraSPARC T1):
You could eliminate the Fetcher/Decoder/FPU entirely from this schematic and it will still qualify as a dual core module because there are two discrete processors there. Likewise, you could clone the FPU, remove the fetcher/decoder for it, and place it under the control of each core's fetcher/decoder and you'd end up with a design very similar to Core 2 Duo.
If there was no FPU under the fetcher, the fetcher wouldn't fetch floating point operations by design. It has to be shared in order to load balance the shared FPU. If one thread is hammering FPU instructions, the processor is better off sending another FPU instruction heavy thread to an entirely different module.
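The load-balancing point can be sketched in a few lines. This is a hypothetical greedy placement, not how any real OS scheduler is implemented; `place_threads` and its parameters are made-up names, and the only assumption taken from the thread is that two cores in a module share one FPU.

```python
def place_threads(threads, num_modules=4, cores_per_module=2):
    """Greedy placement that spreads FPU-heavy threads across modules.

    threads: list of (name, is_fpu_heavy) tuples.
    Returns one list of thread names per module.
    """
    if len(threads) > num_modules * cores_per_module:
        raise ValueError("more threads than integer cores")
    modules = [[] for _ in range(num_modules)]
    fpu_load = [0] * num_modules  # FPU-heavy threads already on each module
    # Place FPU-heavy threads first; sorted() is stable, so input order is kept.
    for name, fpu_heavy in sorted(threads, key=lambda t: not t[1]):
        free = [i for i in range(num_modules) if len(modules[i]) < cores_per_module]
        # Prefer the module with the least FPU contention, then the emptiest one.
        target = min(free, key=lambda i: (fpu_load[i], len(modules[i])))
        modules[target].append(name)
        fpu_load[target] += int(fpu_heavy)
    return modules

print(place_threads([("a", True), ("b", True), ("c", False), ("d", False)]))
# [['a'], ['b'], ['c'], ['d']]
```

With up to four FPU-heavy threads, each lands on its own module, so none of them has to share an FPU.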
www.extremetech.com/computing/284335-the-garbage-class-action-lawsuit-against-amds-bulldozer-is-headed-to-trial Someone posted that the way AVX-256 is processed by Bulldozer justifies this lawsuit. However, that reasoning calls into question whether any processor that lacks AVX-256 support has even one "actual core", which is clearly absurd.
Beyond how Bulldozer didn't measure up to Intel's design decisions in various ways, it exceeded Intel's performance in certain other ways — such as the number of in flight instructions the processor could handle. Does this mean Intel's processors didn't have true cores in them? After all, they didn't tell consumers that Bulldozer can handle more in flight instructions.
Even earlier processors didn't have FPUs at all. Some supported external FPU chips. Some didn't support even those. Some chips have L4 cache. Old CPUs had no cache at all. Is something that's on the die part of the CPU core, from the point of view of the consumer, like the L4 cache in Broadwell-C? If not, what is the consumer to make of it — that it doesn't exist simply because it's not part of the main chip on the die or part of what CPU architects consider a core? For something that doesn't exist, Broadwell-C's L4 did improve performance tangibly in workloads that are important to consumers — making the obsessing over what's inside cores even more suspect.
There is also the issue of in-order vs. out-of-order design. In-order, which is slower, was dropped back in 1995 with the Pentium Pro. Yet, Intel decided, many years later, to sell Atom to the masses, an in-order design. With the notion that consumers should consider it fraud when a company sells them slow cores — the Atom seems to be a great target for frivolous lawsuits. Not only was it a radical return to in-order design, it was paired with a power-inefficient supporting cast that cast very dramatic doubt on the entire point of Atom's marketing pitch: its performance-per-watt, a performance-per-watt level reached by subjecting consumers to the anemic performance of in-order processing, processing slowness not justified by the savings in power due to the horribly inefficient supporting chipset/GPU. To make matters worse in terms of consumer confusion, Atom was later changed to be out-of-order. There was a ton of pro-Atom netbook hype for quite some time. Then, a large swath of reviewers began writing as if the entire thing was the fault of silly consumers, even though so many of them hyped netbooks while it was trendy to do so.
regmedia.co.uk/2019/01/22/amd-core-class-action.pdf Dickey purchased an FX-9590 and Parmer purchased an FX-8350, both advertised as a "native 8-core desktop processor." AMD tried to throw the case out but the judge said: How does the court answer that question? The definition of "core" must be held by "a reasonable consumer standard." The population ("class") cannot be divided up between experts and amateurs because that's not what false advertising is about.
At this point, the class action lawsuit is very confined in scale: I think that's a mistake seeing how the alleged misrepresentation appears everywhere (retailers, on the retail packaging, on advertising material associated with machines containing the processors, etc.). They're not reaching for the stairs like they could be.
"Plaintiffs allege that the Bulldozer CPUs, advertised as having eight cores, actually contain eight “sub-processors” which share resources..." this statement is absolutely true and where AMD is in trouble trying to redefine what a "core" is. No doubt AMD is going to explain to the jury that sharing L2 cache is not out of the ordinary across many architectures so the plaintiffs' case is kind of weak there but sharing FPUs is something extraordinary in the consumer space.
Keep in mind that the Plaintiffs aren't "tech experts." They'll bring in an expert to argue their case for the jury.
Hruska's article was published in response to the judge's opinion, so the order of things is to rebut Hruska's rebuttal rather than to go back in time to the point in time where the opinion was released and the rebuttal didn't exist.
Hruska was prattling on about technical jargon which is irrelevant to the case. There's really only two very basic questions being asked here:
1) Is the definition of a "core" an "independent processor?" [this is going to be a resounding "yes"]
2) Does Bulldozer sharing resources conflict with the definition of an independent processor? [this can go either way depending on the strength of the arguments presented by the lawyers and witnesses]
A jury of 12 will be answering those questions, not you nor I, and their decision will define the word in California and likely beyond.
The Bulldozer CPU was a hybrid architecture. It was neither 8 true cores NOR 4 true cores. Because of its design it was somewhere in between. The performance bears that out. For the cost, the performance was a good value. People whining about whether or not they got 8 actual cores need some cheese. Technically, it did have 8 instruction executing cores with an FPU unit for each pair of cores in a module. Because of that logic, which is based in factual functionality of how the CPU works, AMD should win this. For its time it performed very well for its price point. Your conclusion is flawed.
At the same time, look at how bulldozer was improved over time. Separate decoder was added in Steamroller and it was suspected and partially shown that with decoder being removed as a limitation, fetch became one. So, in the Fetch-Decode-Execute, Execute had more resources since the beginning, Decode had to be doubled afterwards and to extract the possible performance Fetch would have to be doubled as well. Now if they went through with that the result would have been two independent cores.
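The "widen one stage and the next one becomes the limit" argument can be illustrated with a toy model in which sustained throughput is capped by the narrowest stage. The widths below are made-up illustration values, not measured Bulldozer or Steamroller figures, and `bottleneck` is a hypothetical helper.

```python
def bottleneck(stage_widths):
    """Return (stage, ops per cycle) for the narrowest pipeline stage."""
    stage = min(stage_widths, key=stage_widths.get)
    return stage, stage_widths[stage]

# Hypothetical per-module widths in ops per cycle with two active threads.
original = {"fetch": 6, "decode": 4, "execute": 8}
wider_decode = dict(original, decode=8)      # a second decoder is added
wider_fetch = dict(wider_decode, fetch=12)   # fetch is doubled as well

for label, widths in [("original", original),
                      ("wider decode", wider_decode),
                      ("wider fetch", wider_fetch)]:
    print(label, bottleneck(widths))
```

In this toy model the limit moves from decode to fetch once decode is doubled, which mirrors the progression described above.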
FPU claims are fairly irrelevant. In the same way, so are L2 caches. Bringing these up in the court is kind of stupid.
Memory controllers and caches are not integral or required part of the core, no more than FPUs. In consumer space Athlon64 was the CPU moving memory controller into the CPU, it used to be in northbridge. Same with cache, there were (even x86) CPUs without cache and it was outside the CPU at first. Those 8 are pipes, not cores.
Zen has six plus pretty much the same FPU, 10 pipes total in both units in execution stage. Skylake has 8 pipes in the execution unit.
With all this, we are talking about execution units.
Bulldozer: en.wikipedia.org/wiki/File:AMD_Bulldozer_block_diagram_(CPU_core_bloack).PNG
Zen: en.wikichip.org/wiki/amd/microarchitectures/zen#Individual_Core
Skylake: en.wikichip.org/wiki/intel/microarchitectures/skylake_(client)#Individual_Core
Independently executing means a core (similarly to CPU) should be capable of executing the instruction set, not specific micro-operations. At least when we are talking about x86.
Context: www.tomshardware.com/reviews/processors-cpu-apu-features-upgrade,3569-15.html
Image (Core Interface Unit is abbreviated as "Core IF"):
There's only four and they're responsible for communication among:
-input via L1I
-output via L1D from each integer unit
-input/output via L2
-other modules
Three major components are shared across all Bulldozer iterations:
1) Fetcher (manages high level instructions)
2) Core Interface Unit (effectively a high level cache and communications controller)
3) Floating Point Unit (it's wider in an attempt to match Thuban FPU performance per thread but AVX2 will effectively shut down access to the FPU by one thread in Excavator)
AMD officially calls them "integer cores" judging by AMD slides. Pictures above call them integer clusters. Lawsuit calls them "subprocessors." One can't deny that AMD has done a poor job of messaging here.
It was supposed to go up against Sandy Bridge, but AMD were then forced to reduce the price because performance was so rubbish.
On top of that, it's not really a true 8-core (hence my use of "dodgy" in my statement) and can't therefore be claimed as such, no matter how one spins it, hence this lawsuit. I hope the lawsuit wins and no one ever tries this again.
But if we are talking about false advertising, where's the lawsuit about Intel's CPU generation advertising?
After all "7th" gen was nothing but carbon copy of 6th gen with just clock speed tweaks and really had no business of being anything else than new xx50 designation CPU models.
Even "9th" gen is more of same old Skylake with only some bug tweaks. Though at least extra cores would give justification for calling it as seventh gen.
Not to forget artificial CPU socket roulette to force people to buy new motherboards:
www.techpowerup.com/250109/core-i9-9900k-achieves-5-50-ghz-overclock-on-a-z170-chipset-motherboard
And then there are those compilers provided by Intel two decades ago claiming compatibility with AMD as well...
While they actually disabled multimedia extensions supported by the CPU if the program was run on an AMD CPU, to give Intel CPUs an artificial advantage. OK, so when does Intel get judged for their advertising and sleazy tactics?
Or are we going to be picky about who gets penalized and who is given a get-out-of-jail-free card?
Intel literally practised extortion 15 years ago.
Intel isn't any white knight on white horse, not even grey knight...
jolt.law.harvard.edu/digest/intel-and-the-x86-architecture-a-legal-perspective
Performance was bad for a couple of reasons, but a very big reason was that the integer cores were gimped compared to previous generations, and these block diagrams everyone is showing describe that: fewer ALUs and AGUs means fewer uOPs per clock, fewer uOPs per clock results in lower IPC numbers, and as a result, poor performance per core. You know what it doesn't result in? Worse per-core scaling, and I think @cdawall already did an excellent job of illustrating that. If this doesn't inoculate us from this misconception, considering it's comparing apples to apples, then I don't know what will.
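A small worked example of the distinction between per-core performance and per-core scaling, under a deliberately simple linear model. All numbers (IPC, clock, the 0.9 efficiency factor) are hypothetical and the function names are made up; this is not benchmark data.

```python
def throughput(cores, ipc, clock_ghz, efficiency=1.0):
    """Aggregate instructions per nanosecond under a simple linear model."""
    return cores * ipc * clock_ghz * efficiency

def scaling(single, multi, cores):
    """Fraction of ideal linear scaling achieved going from 1 to `cores` threads."""
    return multi / (cores * single)

# Same clock and same hypothetical multi-thread efficiency, two different IPCs.
for ipc in (2.0, 1.0):
    one = throughput(1, ipc, 4.0)
    eight = throughput(8, ipc, 4.0, efficiency=0.9)
    print(f"IPC {ipc}: 1 thread = {one}, 8 threads = {eight}, "
          f"scaling = {scaling(one, eight, 8):.2f}")
```

Halving IPC halves both the single-thread and the eight-thread figure, so the scaling ratio is unchanged: weaker cores lower absolute performance without by themselves changing how well the chip scales.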
I'm sorry people, but an 8 core CPU doesn't have a requirement for those cores to not be crap. 8 crappy cores are still 8 cores and a core is still a core without the FPU. People are grasping at any straws to find substantial arguments at this point. The reality is that if you try to do that in court, they'll see what you're doing, because it means that your argument doesn't have a very strong foundation because you've changed it so many times to support a particular narrative.
- The part that is NOT shared is one individual block. Everything else is shared.
- Core is a CPU by definition.