Wednesday, January 23rd 2019

Bulldozer Core-Count Debate Comes Back to Haunt AMD

In 2011, AMD launched the FX-8150, the "world's first 8-core desktop processor," or so it says on the literal tin. AMD achieved its count of 8 cores with an unconventional CPU core design. The 8 cores are arranged in four sets of two, called "modules." Each core has its own independent integer unit and L1 data cache, while the two cores in a module share most of their other components: the front-end, the branch predictor, a 64 KB L1 instruction cache, a 2 MB L2 cache and, most importantly, the FPU. There was much debate across tech forums about what constitutes a CPU core.
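To make the "what counts as a core" question concrete, here is a minimal Python sketch that models a Bulldozer-style module as data and counts "cores" under two competing definitions. The `Module` and `Chip` classes and the `count_cores` helper are hypothetical; the resource lists only mirror the per-core and shared components named above and are not an AMD specification.

```python
from dataclasses import dataclass, field

# Per-core vs. shared resources as listed in the article. Illustrative only.
PER_CORE = ("integer unit", "L1 data cache")
SHARED = ("front-end", "branch predictor", "64 KB L1 instruction cache",
          "2 MB L2 cache", "FPU")


@dataclass
class Module:
    """A hypothetical Bulldozer-style module: integer cores plus shared resources."""
    integer_cores: int = 2
    shared: tuple = SHARED
    per_core: tuple = PER_CORE


@dataclass
class Chip:
    name: str
    modules: list = field(default_factory=list)


def count_cores(chip: Chip, definition: str) -> int:
    """Count 'cores' under one of two competing definitions.

    'integer'  -> every integer core counts (the count on the box).
    'complete' -> a unit counts only if it has its own front-end, so a module
                  with a shared front-end counts once.
    """
    if definition == "integer":
        return sum(m.integer_cores for m in chip.modules)
    if definition == "complete":
        return sum(1 if "front-end" in m.shared else m.integer_cores
                   for m in chip.modules)
    raise ValueError(f"unknown definition: {definition!r}")


fx8150 = Chip("FX-8150", [Module() for _ in range(4)])
print(count_cores(fx8150, "integer"), count_cores(fx8150, "complete"))  # 8 4
```

The same silicon yields 8 or 4 depending solely on which definition is plugged in, which is the crux of the dispute.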

Multiprocessor-aware operating systems had to be tweaked to properly handle a "Bulldozer" processor. Their schedulers initially treated "Bulldozer" cores as fully independent (as conventional logic would dictate), until AMD noticed performance bottlenecks in multi-threaded applications. Eventually, Windows and various *nix kernels received scheduler updates that treat each module as a core, and each core within it as an SMT unit (a logical processor). Windows 10, for example, sees the FX-8350 as a 4-core/8-thread processor. These updates improved the processors' performance, but by then consumers had started noticing that their operating systems weren't reporting the advertised core count. In 2015, a class-action lawsuit was filed against AMD for false marketing of FX-series processors. The wheels of that lawsuit are finally moving, now that a 12-member jury is to be set up to examine what constitutes a CPU core, and whether an AMD FX-8000 or FX-9000 series processor qualifies as an 8-core chip.
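Why the scheduler change helps can be shown with a toy placement policy. This is a minimal sketch, not the actual Windows or Linux scheduler: it assumes, hypothetically, that logical CPUs 2k and 2k+1 are the two integer cores of module k (real enumeration order varies by OS and firmware), and `place_threads` and `naive_place` are illustrative names. Treating each module as a core with two SMT siblings means one core per module is filled before any module is doubled up, so light multi-threaded loads do not contend for a shared front-end and FPU.

```python
def place_threads(n_threads: int, n_modules: int) -> list[int]:
    """Return logical CPU ids for n_threads using a spread-first policy.

    Assumes (hypothetically) that module k owns logical CPUs 2k and 2k + 1.
    First pass: one thread per module, on its first integer core.
    Second pass: use the sibling cores only once every module is busy.
    """
    if n_threads > 2 * n_modules:
        raise ValueError("more threads than logical CPUs")
    first_pass = [2 * k for k in range(n_modules)]
    second_pass = [2 * k + 1 for k in range(n_modules)]
    return (first_pass + second_pass)[:n_threads]


def naive_place(n_threads: int) -> list[int]:
    """A topology-unaware policy that simply takes CPUs in numeric order."""
    return list(range(n_threads))


# Four threads on a four-module, FX-8150-style chip:
print(place_threads(4, 4))  # [0, 2, 4, 6] -> one thread per module
print(naive_place(4))       # [0, 1, 2, 3] -> two modules doubled up, two idle
```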
US District Judge Haywood Gilliam of the US District Court for the Northern District of California rejected AMD's claim that "a significant majority of" consumers understood what constitutes a CPU core and had a fair idea of what they were buying when they bought AMD FX processors. AMD now has two main options. It can reach a settlement with the plaintiffs that could cost it millions of dollars in compensation, or it can fight it out in the jury trial by trying to prove to 12 members of the public (not necessarily from an IT background) what constitutes a CPU core and why "Bulldozer" qualifies as 8-core silicon.

The plaintiffs and the defendant each have a key technical argument. The plaintiffs could point to operating systems treating 8-core "Bulldozer" parts as 4-core/8-thread (i.e. each module as a core and each "core" as a logical processor), while AMD could run multi-threaded floating-point benchmarks to show that a module cannot be reduced to the definition of a single core. AMD's 2017 "Zen" architecture marks a return to the conventional definition of a core, with each "Zen" core as independent as an Intel "Skylake" core. We will keep an eye on this case.
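The plaintiffs' argument leans on what the operating system reports. On Linux, one place to see the OS's view is the sysfs file `thread_siblings_list` under `/sys/devices/system/cpu/cpuN/topology/`, which lists the logical CPUs that share a core in the kernel's "cpulist" format (for example `0-1` or `0,4`). The sketch below parses that format and counts distinct sibling groups; the helper names are hypothetical, and what a given kernel actually reports for a Bulldozer part depends on the kernel version and is not claimed here.

```python
import glob
import os


def parse_cpulist(text: str) -> frozenset[int]:
    """Parse a Linux cpulist string such as '0-1', '0,4' or '0-3,8-11'."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return frozenset(cpus)


def count_os_cores(siblings: dict[int, str]) -> tuple[int, int]:
    """Return (cores, logical CPUs) from a map of cpu id -> thread_siblings_list."""
    groups = {parse_cpulist(text) for text in siblings.values()}
    return len(groups), len(siblings)


def read_sysfs_siblings(root: str = "/sys/devices/system/cpu") -> dict[int, str]:
    """Read thread_siblings_list for every cpuN directory (Linux only)."""
    result = {}
    pattern = os.path.join(root, "cpu[0-9]*", "topology", "thread_siblings_list")
    for path in glob.glob(pattern):
        cpu_dir = os.path.basename(os.path.dirname(os.path.dirname(path)))
        with open(path) as f:
            result[int(cpu_dir[3:])] = f.read()
    return result


if __name__ == "__main__":
    siblings = read_sysfs_siblings()
    if siblings:
        cores, threads = count_os_cores(siblings)
        print(f"{cores} cores / {threads} logical CPUs")
```

A module-as-core topology would show pairs such as `0-1` and `2-3`; a topology that treats every integer core as a full core would show single ids.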
Source: The Register

369 Comments on Bulldozer Core-Count Debate Comes Back to Haunt AMD

#176
Vya Domus
You simply don't know what is power gated and what isn't; give it up.

But the fact that there is still scaling to be gained, even in floating-point throughput, by enabling both cores within the modules suggests there is more to it than just disabling the ALU. Clearly the fetch/decode/FPU are affected as well when one core per module is disabled.
#177
londiste
You are derailing the topic with power gating. Stop putting words in my mouth. I have not said ALUs are the only thing that can be power gated; I expect everything can be power gated, probably at the unit level as well as at a more granular level in many cases.

Increased throughput in this case (including floating-point throughput in some cases) does not have much to do with cores. It is related to the available compute pipelines and to how the work is managed.
Fetch/decode/FPU are affected positively when one integer cluster is disabled: they have less work to do, especially since the OS then no longer pushes two threads' worth of work into the module :)
#178
Particle
Midland Dog wrote: "8 ALUs from memory. For integer ops it's an 8-core, for floating point it's a 4-core, simples. I personally would have called it an 8-thread CPU, not an 8-core. At the same time, it's up to the customer to do some research. As a quad-core it's decent perf, but for an 8-core it's kind of pathetic."
That is an incorrect assessment. The FPU works as two independent units unless either core needs to execute a 256-bit FP op. The units were designed to fuse together for (relatively rare) 256-bit operations.

Calling a module a single core because of how the FP unit works would be akin to having two 3" paint brushes and calling them a single 6" brush because you *can* hold them together to paint a thicker line.
#179
londiste
The FPU consists of 4x64-bit pipes that can be fused into 2x128-bit or 1x256-bit as needed. The 4x64-bit and 2x128-bit configurations can be shared between two separate threads just fine.
The FPU really is not a reason why a module would be a single core. An FPU has not always been part of a CPU, and that includes fairly recent designs, for example ARMv6 or v7.
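The fused-pipe behaviour described in the last two posts can be sketched as a per-cycle bit budget. This is a deliberately simplified model under stated assumptions, not a description of AMD's actual FPU: the shared FPU is treated as 256 bits of issue width per cycle (the 4x64-bit figure quoted above), each thread issues at most one FP op per cycle in order, every op takes one cycle, and the function name `cycles_to_drain` is hypothetical. It shows why two threads of 128-bit ops overlap while two threads of 256-bit ops take turns.

```python
from collections import deque

FPU_WIDTH_BITS = 256  # assumed shared budget: 4 x 64-bit pipes per module


def cycles_to_drain(thread_a: list[int], thread_b: list[int]) -> int:
    """Cycles needed for two in-order streams of FP op widths (64, 128 or 256 bits)
    sharing one module's FPU.

    Simplifications: each thread issues at most one op per cycle, every op
    completes in one cycle, and the thread that goes first alternates each
    cycle so neither thread starves.
    """
    for width in (*thread_a, *thread_b):
        if width not in (64, 128, 256):
            raise ValueError(f"unsupported op width: {width}")
    queues = [deque(thread_a), deque(thread_b)]
    first = 0
    cycles = 0
    while queues[0] or queues[1]:
        budget = FPU_WIDTH_BITS
        for i in (first, 1 - first):
            if queues[i] and queues[i][0] <= budget:
                budget -= queues[i].popleft()
        first = 1 - first
        cycles += 1
    return cycles


print(cycles_to_drain([128] * 4, [128] * 4))  # 4: both threads issue every cycle
print(cycles_to_drain([256] * 4, [256] * 4))  # 8: one 256-bit op per cycle, threads alternate
```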
#180
Vayra86
FordGT90Concept wrote: "Performance doesn't really matter. It's the definition of words in the eyes of the public, and whether or not AMD used words to describe their product in a way that was misleading to the public.

The public understands 'core' as independent processors. That's not the case with AMD modules."
Interesting take on it, and yes, plausible too.
#181
FordGT90Concept
"I go fast!1!11!1!"
Vya Domus wrote: "You are making a lot of assumptions here, it seems, without checking first; cores can be turned off. You do understand what independent means, I presume, so I ask you again: how can you do this without breaking the functionality of the module?"

One reason AMD split the dispatcher in later iterations of the design is that they could shut down more transistors when only one thread was directed at the module. This is not an issue with independent cores at all.

That does exactly what I said it does: it soft-powers off one integer core per module by denying a thread to each module. The settings below are per module, which is the equivalent of powering down an independent core. It affirms what I said about the design: you can't stop just core 0 or core 6 the way you can with independent cores. So many components being shared limits its ability to do so.
AusWolf wrote: "'12 members of the public (not necessarily from an IT background)'

It is stupid to have non-IT people in the jury in a case like this."
It's the public that was damaged, not necessarily IT people that have a deeper understanding of what they're buying.

Also, juries are always randomly selected and screened for conflicts of interest. They're always supposed to be neutral and representative of the district's population.
londiste wrote: "Look at the Bulldozer block diagram @Vya Domus posted a few posts up. It has been said in this thread repeatedly that AMD did provide fairly nice details about the Bulldozer architecture."
That block diagram is a single module. The moment you shut down any part that is not duplicated, it will no longer work. The duplicated parts are the two integer clusters.
The fact that AMD never produced a diagram of a "single core" of Bulldozer is evidence that the closest equivalent of a "single core" in other designs is, in fact, what AMD calls a module. The module is the smallest complete processor (which is what a core is) that Bulldozer has.
#182
mouacyk
Vya Domus wrote: "It simply doesn't matter if it's a singular hardware block or not. Look how many entries the decode stage has.

You have some serious reading issues."
Thanks for posting this diagram. If I have 2 independent integer instructions to execute, is it possible for cluster 1 and cluster 2 to execute at the exact same clock cycle without delaying any of them?
#183
FordGT90Concept
"I go fast!1!11!1!"
Yes, assuming the dispatcher doesn't run into any blocking scenarios (e.g. only a 256-bit FPU instruction to execute).
#184
Shambles1980
AusWolf wrote: "'12 members of the public (not necessarily from an IT background)'

It is stupid to have non-IT people in the jury in a case like this."
I don't see how that is true.
The case is:
"Normal members of the public were sold 'cores' that weren't the same as a traditional core, but AMD says that normal members of the public would have known the difference."

Who better to have than normal members of the public, if the argument is that the public should know the difference?
#185
Vya Domus
mouacyk wrote: "Thanks for posting this diagram. If I have 2 independent integer instructions to execute, is it possible for cluster 1 and cluster 2 to execute at the exact same clock cycle without delaying any of them?"
Each cluster can execute 4 instructions concurrently in total: two arithmetic and two memory operations.
FordGT90Concept wrote: "Yes, assuming the dispatcher doesn't run into any blocking scenarios (e.g. only a 256-bit FPU instruction to execute)."
Floating-point instructions do not block integer arithmetic ones, as far as I know. The only limitation appears when a 256-bit operation is issued, which blocks any other floating-point instruction for the respective clock cycles.
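Taking the per-cluster figure above at face value (two arithmetic and two memory pipes per integer cluster), a lower bound on issue cycles for a batch of integer ops follows from whichever pipe type is the bottleneck. This is a minimal sketch under stated assumptions: the ops are fully independent, dependencies and latencies are ignored, and so is the shared front-end's decode rate, which in the real design also limits how fast the two clusters are fed. The function names are hypothetical. Because each cluster has its own pipes, two threads on one module each get this bound at the same time.

```python
import math

ALU_PIPES = 2  # arithmetic pipes per integer cluster (figure quoted above)
MEM_PIPES = 2  # memory-operation pipes per integer cluster


def min_issue_cycles(alu_ops: int, mem_ops: int) -> int:
    """Lower bound on cycles to issue independent ops on one integer cluster."""
    return max(math.ceil(alu_ops / ALU_PIPES), math.ceil(mem_ops / MEM_PIPES))


def module_issue_cycles(thread0: tuple[int, int], thread1: tuple[int, int]) -> int:
    """Two threads on the two clusters of one module run side by side,
    so the module finishes when the slower cluster does."""
    return max(min_issue_cycles(*thread0), min_issue_cycles(*thread1))


print(min_issue_cycles(4, 2))                # 2: ALU-bound
print(module_issue_cycles((4, 2), (3, 6)))  # 3: second cluster is memory-bound
```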
#186
FordGT90Concept
"I go fast!1!11!1!"
If you have two threads which have a series of 256-bit instructions queued, only one thread will proceed per clock.
#187
Vya Domus
In the case of two threads that each consist of a sequence of 256-bit ops, blocking integer ops isn't even a consideration anyway.
#188
FordGT90Concept
"I go fast!1!11!1!"
FX-8350 can only execute 4 256-bit FPU operations at a time. Ryzen 2700X can execute 8 256-bit FPU operations at a time. The duties of a core aren't exclusively integer in nature.
#189
Vya Domus
And why is this relevant to the discussion?
#190
FordGT90Concept
"I go fast!1!11!1!"
There are situations where a Bulldozer module behaves like a single core with SMT.
#191
Patriot
FordGT90Concept wrote: "FX-8350 can only execute 4 256-bit FPU operations at a time. Ryzen 2700X can execute 8 256-bit FPU operations at a time. The duties of a core aren't exclusively integer in nature."
And Phenom II can do zero 256-bit FPU operations. AVX support does not a core make or break. For that matter, the FPU does not a core make or break.
That said... the FX-8150 and its line can do 8x 128-bit FPU operations, or combine them to do 4x 256-bit.
A poorly architected design does not change the core count, no matter how much you want to argue over it.
Being able to turn cores off in pairs or singly does not change the core count... there was a time when there was no power gating, all on or all off; were they no longer cores then?

Plain and simple: a frivolous lawsuit. The performance sucked and they overhyped it; the only argument they have IS that the performance of those 8 cores was not better than 4 quick cores, and that is a bogus argument.
#192
Vya Domus
FordGT90Concept wrote: "There are situations where a Bulldozer module behaves like a single core with SMT."
Are you seriously going to use AVX as an argument for all this?
#193
Shambles1980
What you, I or AMD want to classify as a core is not relevant. The only thing relevant to the case is what people thought a core was, by the convention of the time, and whether AMD gave people what they expected when it said 8 cores.
AMD says people should have understood it wasn't 8 fully independent cores, even though it didn't label it as such.
The lawsuit says AMD should have labeled it properly, and that not doing so was deliberate.

----------------

P.S.

I don't think you could possibly arrive where we are today if they did have 8 real cores.
And you may want to bring up Infinity Fabric and how things are glued together as a scandal, but that just isn't anything. The thing with that is, it genuinely isn't an issue and will never end up in a thread like this 4-5 years down the road.
#194
FordGT90Concept
"I go fast!1!11!1!"
Shambles1980 wrote: "The lawsuit says AMD should have labeled it properly, and that not doing so was deliberate."
Exactly.
core == processor
integer core != core (rather integer core is a component of a core)
Federal Trade Commission: Under the law, claims in advertisements must be truthful, cannot be deceptive or unfair, and must be evidence-based.
AMD used two phrases in describing what Bulldozer was that were not deceptive: module and integer core. They elected not to use these phrases; instead, they went with the deceptive, untruthful one: core. Outside of the context of Bulldozer, references to a "core" as an "integer core" are virtually nonexistent. Since the debut of the multicore CPU, cores meant individual processors sharing the same socket.

Evidence is found in how Bulldozer was designed differently. A core in any other CPU is effectively an independent processor. By the same logic, AMD's module has more similarities with a core than AMD's integer cores do. Evidence, therefore, strongly suggests untruthfulness here too.
#195
wiyosaya
qubit wrote: "Hopefully this lawsuit will discourage AMD from using such a kludgy, low-performance, compromised design in the future."
IMO, that went out the door with clueless Rory Read.
#196
qubit
Overclocked quantum bit
wiyosaya wrote: "IMO, that went out the door with clueless Rory Read."
Let’s hope it stays that way.
#197
Patriot
FordGT90Concept wrote: "Exactly.
core == processor
integer core != core (rather integer core is a component of a core)

AMD used two phrases in describing what Bulldozer was that were not deceptive: module and integer core. They elected not to use these phrases; instead, they went with the deceptive, untruthful one: core. Outside of the context of Bulldozer, references to a 'core' as an 'integer core' are virtually nonexistent. Since the debut of the multicore CPU, cores meant individual processors sharing the same socket.

Evidence is found in how Bulldozer was designed differently. A core in any other CPU is effectively an independent processor. By the same logic, AMD's module has more similarities with a core than AMD's integer cores do. Evidence, therefore, strongly suggests untruthfulness here too."
You are basing your core definition on the existence of AVX support. In that case, 20 years of CPUs had no cores.
There are 8 integer and 8 128-bit FPU compute units. Those are the facts. How the CPU is composed modularly does not change the fact that there are, in fact, 8 cores.
Zen is composed of 4-core modules. That doesn't mean a 2700X has 2 cores; it has 8.
#198
Shambles1980
It's not a matter of what we call a core, or what AMD calls a core.
It's a matter of what the generally accepted definition of a core was at the time.
AMD, even in the evidence provided by other users here, admits that Bulldozer did not use a conventional core.

The lawsuit basically says:
People bought Bulldozers thinking they were getting 8 conventional cores.
AMD deliberately neglected to state that they were not the cores you were probably expecting.
People should be compensated.

You are welcome to try to argue that AMD adequately explained that the cores in Bulldozer were not the same as those in other multi-core processors of the time.
But then you have to remember you are talking to other people who also knew from the start what they were.

The people who were "allegedly" ripped off are the people who just saw a box saying "8 cores" and bought it, knowing nothing about it other than more cores = more better.

Now, I can argue back and forth about why I don't call them cores, but that does not matter, just as you can argue back and forth about why you think they are cores.
The only thing that matters is that people expected product A and received product B, and product B was not adequately advertised as being different from product A.
#199
FordGT90Concept
"I go fast!1!11!1!"
Patriot wrote: "The lack of reading comprehension you have shown is exceptional. You are basing your core definition on the existence of AVX support. In that case, 20 years of CPUs had no cores."
I'm basing it on this (Excavator versus Zen):

And this (Thuban):

Thuban -> core
Excavator -> core but let's call it a module and slap it next to a...
Zen -> core ...because they're basically one and the same, right?

This slide should be exhibit A for the plaintiff (look at that title even :laugh:):

2 integer cores != 2 cores

If you put Excavator cores or Zen cores in the Thuban diagram, you end up with six of them either way and both can handle 12 threads. The *only* difference is that Excavator has extra ALUs and a decoder to accelerate the second thread.
#200
Patriot
What people expect is irrelevant; all that matters is that the product matches the advertisement, which it does.
AVX support is irrelevant; even current CPUs have unequal AVX support, even throughout Intel's server lineup.

Even assuming FPU support is now expected as part of a CPU core, a Bulldozer 8-core CPU can do 8 integer or 8 independent FPU calculations at the same time.
AMD said: we tried something different, we tried to make a more efficient CPU to give you 8 real cores instead of 4 cores and 4 hyperthreads.
They in fact did what they said.

Lets look at this again since you all clearly missed it.



Look at that Sandy Bridge with Hyper-Threading for a grand 4x speedup: 4 cores, 8 threads for 4x performance.
990X: 6 cores, 12 threads for 6x performance (also $1k).
AMD: 8 cores, 8 threads for a little over 6x improvement.

Is their scaling bad? IPC worse? Absolutely.
But did they deliver what they promised? Absolutely.

BTW, this is an FPU benchmark. That is quite clearly not 4 FPU cores; it is 8 poorly scaling ones.


Also, for those clearly not understanding CPU core architecture history and WHY an FPU does not a core make...
Here is Thuban... what's that, only one 128-bit FPU? So is it also not a core?
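The scaling comparison in this post reduces to parallel efficiency: speedup divided by the number of cores. The sketch below is plain arithmetic on the poster's approximate figures (4x on 4 cores, 6x on 6 cores, and "a little over 6x" on 8 cores, entered as 6.0 for a conservative estimate); it is not new benchmark data, and `parallel_efficiency` is an illustrative name.

```python
def parallel_efficiency(speedup: float, cores: int) -> float:
    """Fraction of ideal linear scaling achieved: speedup / cores."""
    return speedup / cores


# (chip, multi-threaded speedup over one thread, advertised physical cores)
quoted = [
    ("Sandy Bridge, 4C/8T", 4.0, 4),
    ("990X, 6C/12T", 6.0, 6),
    ("AMD FX, 8C/8T", 6.0, 8),  # "a little over 6x", rounded down
]

for name, speedup, cores in quoted:
    print(f"{name}: {parallel_efficiency(speedup, cores):.0%}")
# Sandy Bridge, 4C/8T: 100%
# 990X, 6C/12T: 100%
# AMD FX, 8C/8T: 75%
```

Entering the FX chip as 4 cores instead gives 150%, i.e. better than linear scaling, which is the arithmetic behind reading the result as eight poorly scaling cores rather than four.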
