Wednesday, January 23rd 2019

Bulldozer Core-Count Debate Comes Back to Haunt AMD

AMD in 2011 launched the FX-8150, the "world's first 8-core desktop processor," or so it says on the literal tin. AMD achieved its core count of 8 with an unconventional CPU core design. Its 8 cores are arranged in four sets of two cores each, called "modules." Each core has its own independent integer unit and L1 data cache, while the two cores in a module share a majority of their components - the front-end, a branch predictor, a 64 KB L1 code cache, a 2 MB L2 cache, and most importantly, an FPU. This sparked much debate across tech forums on what constitutes a CPU core.

Multiprocessor-aware operating systems had to be tweaked to properly address a "Bulldozer" processor. Their schedulers would initially treat "Bulldozer" cores as fully independent (as conventional logic would dictate), until AMD noticed multi-threaded application performance bottlenecks. Eventually, Windows and various *nix kernels received updates to their schedulers to treat each module as a core, and each core as an SMT unit (a logical processor). The FX-8350 is a 4-core/8-thread processor in the eyes of Windows 10, for example. These updates improved the processors' performance, but not before consumers started noticing that their operating systems weren't reporting the advertised core count. In 2015, a class-action lawsuit was filed against AMD for false marketing of FX-series processors. The wheels of that lawsuit are finally moving, now that a 12-member jury is being set up to examine what constitutes a CPU core, and whether an AMD FX-8000 or FX-9000 series processor can qualify as an 8-core chip.
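
The two scheduler views described above can be illustrated with a minimal Python sketch. The logical-CPU-to-core mappings here are hypothetical, written by hand rather than read from real hardware, and `summarize_topology` is an illustrative helper, not an operating system API.

```python
from collections import defaultdict


def summarize_topology(core_id_by_cpu):
    """Group logical CPUs by the core ID a scheduler assigns to each one."""
    siblings = defaultdict(list)
    for cpu, core_id in sorted(core_id_by_cpu.items()):
        siblings[core_id].append(cpu)
    return {
        "cores": len(siblings),
        "logical_processors": len(core_id_by_cpu),
        "siblings": dict(siblings),
    }


# Hypothetical module-aware view of an FX-8350: logical CPUs 0-7,
# two per module, with each module reported as one core.
fx_8350_module_aware = {cpu: cpu // 2 for cpu in range(8)}

# Hypothetical original view: every integer core treated as independent.
fx_8350_independent = {cpu: cpu for cpu in range(8)}

if __name__ == "__main__":
    for label, mapping in [
        ("module-aware", fx_8350_module_aware),
        ("independent", fx_8350_independent),
    ]:
        summary = summarize_topology(mapping)
        print(label, summary["cores"], "cores,",
              summary["logical_processors"], "logical processors")
```

Under the module-aware mapping the same eight logical processors collapse into four cores with two siblings each, which is the 4-core/8-thread report that consumers noticed.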

US District Judge Haywood Gilliam of the District Court for the Northern District of California rejected AMD's claim that "a significant majority of" consumers understood what constitutes a CPU core, and that they had a fair idea of what they were buying when they bought AMD FX processors. AMD has two main options before it. The company can reach an agreement with the plaintiffs that could cost the company millions of dollars in compensation; or fight it out in the jury trial, by trying to prove to 12 members of the public (not necessarily from an IT background) what constitutes a CPU core and why "Bulldozer" qualifies as an 8-core chip.

The plaintiffs and defendants each have a key technical argument. The plaintiffs could point out that operating systems treat 8-core "Bulldozer" parts as 4-core/8-thread (i.e. each module as a core and each "core" as a logical processor), while AMD could run multi-threaded floating-point benchmark tests to prove that a module cannot be simplified to the definition of a core. AMD's 2017 release of the "Zen" architecture saw a return to the conventional definition of a core, with each "Zen" core being as independent as an Intel "Skylake" core. We will keep an eye on this case.
Source: The Register

369 Comments on Bulldozer Core-Count Debate Comes Back to Haunt AMD

#327
seronx
Orochi die = Orochi being an eight headed/eight tailed snake.


Right A/B and left G/H = L1i/BP/Fetch/Decode
Below A/B and G/H = Cache unit and L2
Left A/B and right G/H = Floating point unit
Inside A/B/G/H = Scheduler/Integer Units/Load-store/registers/etc <== Actual core

Eight actual cores.
Posted on Reply
#328
FordGT90Concept
"I go fast!1!11!1!"
Funny, you had to provide your own picture with boxes pre-drawn on it. Thanks for affirming my point that they aren't obvious. :roll:

Let's throw something not x86 into the mix. Here's an Exynos Octa (big-LITTLE design):

Like Sandy Bridge-EP, the cores are obvious.
Posted on Reply
#329
seronx
FordGT90Concept: Funny, you had to provide your own picture with boxes pre-drawn on it.
I would say it is better than what is shown in "Design of the Two-Core x86-64 AMD “Bulldozer” Module in 32 nm SOI CMOS" - 25 October 2011



The design from Core(Chip) Multiprocessing(CMP) to Cluster-based(Chip) Multi-threading(CMT) needs to be displayed.
A CMP module will always have a single core.
A CMT module within the same space as a CMP module will always have 1+x cores.
Posted on Reply
#331
lexluthermiester
FordGT90Concept: TL;DR: the jury would have to take AMD's word for it.
Just like they'd have to take any other CPU maker's word for it. Your point was pointless.
Posted on Reply
#332
Midland Dog
Particle: That is an incorrect assessment. The FPU works as two independent units unless either core needs to execute a 256-bit FP op. The units were designed to fuse together for (relatively rare) 256-bit operations.

Calling a module a single core because of how the FP unit works would be akin to having two 3" paint brushes and calling them a single 6" brush because you *can* hold them together to paint a thicker line.
This just leads back to the original argument: the definition of a core. The best I have heard was that a core is able to do independent logic and calculations without being tied to any other silicon. I don't know how accurate that is, but we need to find an actual definition. Would I not be able to call a 4-core chip with Hyper-Threading an "8-core" because it is able to execute instructions on 8 threads?
Posted on Reply
#333
FordGT90Concept
"I go fast!1!11!1!"
I wouldn't be against IEEE establishing standards not unlike SAE and horsepower/torque measurement. Consumers have the right to know that the cores they are buying are independent, conjoined, or multi-threaded. Intel did a good job with the latter via their Hyper-Threading trademark; conversely, AMD does a poor job at informing consumers about Zen's multi-threaded capabilities. Conjoined is always going to have inferior performance (but lower cost) compared to independent which is something consumers should know about.
Posted on Reply
#334
londiste
FordGT90Concept: Intel did a good job with the latter via their Hyper-Threading trademark; conversely, AMD does a poor job at informing consumers about Zen's multi-threaded capabilities.
AMD is doing a perfectly good job with Zen's multi-threaded capabilities. SMT is an industry-standard term and fits the situation perfectly. Intel's Hyper-Threading is SMT with a trademarked name slapped on it.
Midland Dog: this just leads back to the original argument, the definition of a core, the best I have heard was that a core is able to do independent logic and calculations without being tied to any other silicon
For the definition to be effective, it should not say just any logic and calculations, but specifically the ability to execute instructions of the given instruction set.
Posted on Reply
#335
lexluthermiester
Midland Dog: the best I have heard was that a core is able to do independent logic and calculations without being tied to any other silicon
And that about sums it up. Actual specific components don't matter.
Posted on Reply
#336
Shambles1980
lexluthermiester: And that about sums it up. Actual specific components don't matter.
Well, they do matter, or you can argue that 4c/8t CPUs are 8-core.
Posted on Reply
#337
lexluthermiester
Shambles1980: well it does or you can argue 4c8t cpus are 8cores
No you can't.
Posted on Reply
#338
FordGT90Concept
"I go fast!1!11!1!"
The only difference is one wide integer cluster versus two narrow ones. The frontside and the backside are otherwise the same.
Posted on Reply
#339
lexluthermiester
FordGT90Concept: The only difference is one wide integer cluster versus two narrow ones. The frontside and the backside are otherwise the same.
Exactly. The difference is that with SMT, a single core is toggling between tasks/threads so quickly that to us it seems to be handling them at the same time, when in actuality it isn't.
Posted on Reply
#340
FordGT90Concept
"I go fast!1!11!1!"
Guess what the fetcher, core interface unit, and floating point cluster do in Bulldozer.

Look at it this way:
Architecture | Transistors (billions) | Organization | Source
Vishera | 1.2 | 4m/8t | AnandTech
Sandy Bridge | 1.16 | 4c/8t | AnandTech
Ivy Bridge | 1.4 | 4c/8t | AnandTech
Sandy Bridge-EP | 2.27 | 8c/16t | Overclock

One of these things is not like the other...

You can see performance numbers on the AnandTech link. Hint: the i7-3770K at 3.5 GHz almost always wins against the FX-8350 at 4.0 GHz, and often by a long mile. Why is that? Because Intel beats the hell out of their dual-threaded cores where AMD divided and conquered in their dual-threaded cores. When Intel is faced with only a single thread, it pulls out all of the stops to get it done. AMD can't. Even when you async like a boss, AMD's shared nature comes back to haunt it, often doing 10-50% worse than it should. Can't win single, can't win dual, can't win in terms of transistor count either (whole reason why AMD pursued it). Zen and Bulldozer prove "conjoined cores" were a bad idea. Isolating hardware resources from threads that could use them makes little sense.
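
The transistor-count comparison can be made concrete with a rough Python sketch that divides the whole-die figures from the table above by core and thread counts. The figures are taken as listed (converted to millions); the choice to count Vishera as 8 advertised cores and the helper names are assumptions for illustration. Whole-die counts also include cache and, on some parts, integrated graphics, so this is only a coarse indicator.

```python
# Whole-die transistor counts from the table above, converted to millions.
# Tuple layout: (transistors_millions, advertised_cores, hardware_threads).
CHIPS = {
    "Vishera": (1200, 8, 8),  # advertised as 8 cores; the table lists 4m/8t
    "Sandy Bridge": (1160, 4, 8),
    "Ivy Bridge": (1400, 4, 8),
    "Sandy Bridge-EP": (2270, 8, 16),
}


def transistors_per_unit(transistors_millions, units):
    """Return millions of transistors per core or per thread."""
    return transistors_millions / units


def budget_table(chips):
    """Map each chip to (millions per advertised core, millions per thread)."""
    return {
        name: (
            transistors_per_unit(total, cores),
            transistors_per_unit(total, threads),
        )
        for name, (total, cores, threads) in chips.items()
    }


if __name__ == "__main__":
    for name, (per_core, per_thread) in budget_table(CHIPS).items():
        print(f"{name:16s} {per_core:7.2f} M/core {per_thread:7.2f} M/thread")
```

Counted per advertised core, Vishera gets about half the transistor budget of the Intel parts, while counted per thread the four chips land in a similar range.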
Posted on Reply
#341
mouacyk
FordGT90Concept: Zen and Bulldozer prove "conjoined cores" were a bad idea. Isolating hardware resources from threads that could use them makes little sense.
It served the purpose of a budget design, reducing transistor requirements, reducing die size, increasing yields, and presenting cheaper alternatives. I would reserve the term innovation for designs that actually improve performance, while minimizing cost, unlike how many fans are describing Bulldozer.
Posted on Reply
#342
FordGT90Concept
"I go fast!1!11!1!"
Look at the transistor counts though: there's virtually no savings. Die size is a combination of transistor count and process: no savings. Yields are mostly the result of die size: no savings. Cheaper? Only if you want an inefficient processor that's really good at integer math on many threads. Judging by the benchmarks, that's not the bottleneck for most software.
Posted on Reply
#343
mouacyk
Based on the white paper, AMD was supposed to accomplish the listed things. Did they know during design that they wouldn't be able to realize those reductions but went for it anyways, to avoid completely busting?
Posted on Reply
#344
FordGT90Concept
"I go fast!1!11!1!"
It costs billions to design and prototype a new architecture. Once they got so many billions in, they were committed to bringing it to the market to get some revenue off it.
Posted on Reply
#346
londiste
Zyll Goliath
He is taking the same route of blaming single FPU. And ignoring everything else on the same picture in the module that is part of a CPU core and shared :)
Posted on Reply
#347
FordGT90Concept
"I go fast!1!11!1!"
londiste: He is taking the same route of blaming single FPU. And ignoring everything else on the same picture in the module that is part of a CPU core and shared :)
Because that's the layman's way of looking at it (stands out in diagrams). As I pointed out before, the transistor count is a dead giveaway that Bulldozer isn't remotely close to being an 8-core CPU. It's a 4-core, 8-thread CPU with dedicated resources for each thread (which is actually really stupid from a performance point of view because this enforces underutilization of hardware resources).

Cores are independent processors. They don't share anything--they communicate via memory subsystems. Each processor pulls the data it needs, executes it, and pushes it back. At no point does one core interfere with another (unless there's some kind of intentional memory lock to prohibit thread cross references). 7zip compression proves Bulldozer "cores" are not independent.
Posted on Reply
#348
lexluthermiester
FordGT90Concept: 7zip compression proves Bulldozer "cores" are not independent.
Out of curiosity, how is that conclusion reached?
Posted on Reply
#349
mouacyk
lexluthermiester: Out of curiosity, how is that conclusion reached?
Might be referring to this test done at AnandTech, where the conclusion is flawed. If you normalize the 7zip scores to the same clock speed, they are identical. The relevance to this thread is that the FX-8150 is claiming 8 cores but the 2600K is only claiming 4 cores:

Posted on Reply
#350
lexluthermiester
mouacyk: Might be referring to this test done at AnandTech, where the conclusion is flawed. If you normalize the 7zip scores to the same clock speed, they are identical. The relevance to this thread is that the FX-8150 is claiming 8 cores but the 2600K is only claiming 4 cores:

Just because a quad-core CPU came close does not settle the question of how many actual cores are at play. 7Zip is heavily floating point dependent. Not all programs are. In fact, at the time these CPUs were being made/released, most CPU instructions were still being done on the integer side of things, thus the design logic. It was a gamble that didn't pay off. That doesn't mean that the integer cores are not individual cores.
Posted on Reply