Wednesday, January 23rd 2019

Bulldozer Core-Count Debate Comes Back to Haunt AMD

AMD launched the FX-8150 in October 2011 as the "world's first 8-core desktop processor," or so it says on the literal tin. AMD achieved its core-count of 8 with an unconventional CPU core design. Its 8 cores are arranged in four sets of two cores each, called "modules." Each core has its own independent integer unit and L1 data cache, while the two cores in a module share a majority of their components - the front-end, a branch predictor, a 64 KB L1 instruction cache, a 2 MB L2 cache and, most importantly, an FPU. There was much debate across tech forums on what constitutes a CPU core.

Multiprocessor-aware operating systems had to be tweaked to properly address a "Bulldozer" processor. Their schedulers would initially treat "Bulldozer" cores as fully independent (as conventional logic would dictate), until AMD noticed multi-threaded application performance bottlenecks. Eventually, Windows and various *nix kernels received scheduler updates to treat each module as a core, and each core as an SMT unit (a logical processor). The FX-8350 is a 4-core/8-thread processor in the eyes of Windows 10, for example. These updates improved the processors' performance, but not before consumers started noticing that their operating systems weren't reporting the advertised core-count. In 2015, a class-action lawsuit was filed against AMD for false marketing of FX-series processors. The wheels of that lawsuit are finally moving, with a 12-member jury to be set up to examine what constitutes a CPU core, and whether an AMD FX-8000 or FX-9000 series processor can qualify as an 8-core chip.

US District Judge Haywood Gilliam of the District Court for the Northern District of California rejected AMD's claim that "a significant majority of" consumers understood what constitutes a CPU core, and that they had a fair idea of what they were buying when they bought AMD FX processors. AMD has two main options before it. The company can reach an agreement with the plaintiffs that could cost it millions of dollars in compensation, or fight it out in a jury trial by trying to prove to 12 members of the public (not necessarily from an IT background) what constitutes a CPU core and why "Bulldozer" qualifies as 8-core silicon.

The plaintiffs and the defendant each have a key technical argument. The plaintiffs could point to operating systems treating 8-core "Bulldozer" parts as 4-core/8-thread (i.e. each module as a core and each "core" as a logical processor), while AMD could run multi-threaded floating-point benchmark tests to prove that a module cannot be simplified to the definition of a core. AMD's 2017 release of the "Zen" architecture sees a return to the conventional definition of a core, with each "Zen" core being as independent as an Intel "Skylake" core. We will keep an eye on this case.
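
To make the "4-core/8-thread" versus "8-core" distinction concrete, here is a minimal Python sketch of how a core count can be derived from per-CPU sibling lists. On Linux such lists are exposed in files like /sys/devices/system/cpu/cpuN/topology/thread_siblings_list. The two dictionaries below are hypothetical inputs made up for illustration, not readings from a real FX processor, and the function names are assumptions of this sketch.

```python
def parse_cpu_list(text):
    """Expand a Linux-style CPU list such as "0-1" or "0,4" into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        elif part:
            cpus.add(int(part))
    return cpus


def count_cores_and_threads(siblings_by_cpu):
    """Return (cores, logical_processors) from a cpu -> sibling-list mapping.

    Logical CPUs that report the same sibling set are counted as one core.
    """
    cores = {frozenset(parse_cpu_list(s)) for s in siblings_by_cpu.values()}
    return len(cores), len(siblings_by_cpu)


# Hypothetical view 1: each module is reported as one core with two threads.
fx_module_view = {cpu: f"{cpu - cpu % 2}-{cpu - cpu % 2 + 1}" for cpu in range(8)}

# Hypothetical view 2: each integer core is reported as its own core.
fx_core_view = {cpu: str(cpu) for cpu in range(8)}

module_count = count_cores_and_threads(fx_module_view)
core_count = count_cores_and_threads(fx_core_view)
```

With these inputs the first view counts 4 cores and 8 logical processors, the second 8 and 8. The silicon is identical in both cases; only the reported grouping differs, which is the crux of the dispute.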
Source: The Register

369 Comments on Bulldozer Core-Count Debate Comes Back to Haunt AMD

#226
Aquinus
Resident Wat-man
londiste: Core is a CPU by definition.
Modern cores aren't that independent. It's not like we're still rolling with Pentium Ds... but please, keep grasping at straws until you find one that's long enough.
Posted on Reply
#227
londiste
Aquinus: Modern cores aren't that independent. It's not like we're still rolling with Pentium Ds... but please, keep grasping at straws until you find one that's long enough.
Could you please elaborate? What do you mean they are not that independent?
A core does not have to be a separate die; a core needs to be functionally independent.
Posted on Reply
#228
Aquinus
Resident Wat-man
londiste: Could you please elaborate? What do you mean they are not that independent?
Modern CPUs have a lot of shared components for handling the control of cores within a CPU. There is nothing inside a CPU that's fully independent from the rest of the CPU most of the time. That's just how modern super-scalar CPUs with multiple cores work. It's not like every single part of what would traditionally be considered a CPU is duplicated within a single core, unless you're doing what Intel did with the Pentium D, which is a huge waste of wafer space... it was also one of the earliest attempts to go the multi-core route. The reality though is that the Pentium D was much more like a multi-CPU system than a multi-core one. An argument I might make would be that a Bulldozer module is more like a CPU than it is like a core because it's duplicating a lot of control logic for the cores themselves.
Posted on Reply
#229
londiste
Aquinus: Modern CPUs have a lot of shared components for handling the control of cores within a CPU. There is nothing inside a CPU that's fully independent from the rest of the CPU most of the time. That's just how modern super-scalar CPUs work. It's not like every single part of what would traditionally be considered a CPU is duplicated within a single core, unless you're doing what Intel did with the Pentium D, which is a huge waste of wafer space... it was also one of the earliest attempts to go the multi-core route. The reality though is that the Pentium D was much more like a multi-CPU system than a multi-core one.
The CPU as we popularly know it, the slab of silicon under an IHS, is quite far from the actual definition of a CPU. Supporting functionality has been more and more integrated into the die as time and production technology progressed, but all of that is still supporting functionality. A CPU needs instructions and data to be sent in and a place to put the data, neither of which is an inherent part of a CPU. Lately other logic has been added - Memory Controllers, Storage Controllers, Bus Controllers etc.

What a CPU does is still to process instructions. Fetch-Decode-Execute. A core in a multicore processor is defined the same way as a CPU.
And every single part of what would traditionally be considered a CPU IS duplicated within a single core in a modern CPU/core.
Aquinus: An argument I might make would be that a bulldozer module is more like a CPU than it is like a core because it's duplicating a lot of control logic for the cores themselves.
Core is CPU. A Bulldozer module is definitely a CPU, there is absolutely no doubt about that.
The point is that the module really is not two independent CPUs. CPU control logic is the Fetch and Decode units. These are shared in a Bulldozer module. There are separate Decode units in Steamroller and Excavator but Fetch remains shared.
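
For readers who want the Fetch-Decode-Execute cycle spelled out, the following is a toy Python sketch of a one-register machine. The instruction format ("LOAD", "ADD", "MUL" with an integer operand) is invented for this illustration and does not correspond to any real instruction set or to Bulldozer's pipeline.

```python
def run(program):
    """Run a toy program on a one-register machine and return the accumulator."""
    acc = 0  # single accumulator register
    pc = 0   # program counter
    while pc < len(program):
        word = program[pc]              # fetch: read the instruction at pc
        pc += 1
        opcode, operand = word.split()  # decode: split into opcode and operand
        value = int(operand)
        if opcode == "LOAD":            # execute: act on the decoded instruction
            acc = value
        elif opcode == "ADD":
            acc += value
        elif opcode == "MUL":
            acc *= value
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return acc


result = run(["LOAD 2", "ADD 3", "MUL 4"])  # computes (2 + 3) * 4
```

In this picture the argument above amounts to asking which of the three steps each "core" owns. In a Bulldozer module, as described in the post, the fetch step is shared between two execute stages.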
Posted on Reply
#230
seronx
Intel made the same mistake as AMD in a couple of patents.

patents.google.com/patent/US10140129B2/en
which cites
patents.google.com/patent/US20120166777A1/en

Intel's doesn't make sense. Nor does AMD's (not related to the above patent, but the previous 2007 one; the 2005 one, however, is accurate).

"Processing core having shared front end unit"
which comes from
"The processor 100 may include a plurality of processor cores 102 and a front end 104 shared by the processor cores 102."

-> Processing cores having shared front end unit
vs
-> Processing core having shared front end unit.

//It should be noted that FIG. 1 is provided as an example, not as a limitation, and even though it is depicted that the processor 100 includes two processor cores, the embodiments disclosed herein are applicable to a processor with any number of cores or a system with multiple processors with single or multiple cores.
londiste: CPU control logic is Fetch and Decode units.
Nope, it is the scheduler. Of which there are two in a Bulldozer module.

computersciencewiki.org/index.php/Control_unit_(CU)
- The control unit obtains data / instructions from memory
- Interprets / decodes the instructions into commands / signals
- Controls transfer of instructions and data in the CPU
- Coordinates the parts of the CPU

All of the above is handled by the core's scheduler.

Scheduler fetches macro-ops which are decoded into micro-ops.

The core's front-end is a co-processor for implementing various performance-enhancing features. It can be swapped out for any other front-end design, whether slower or faster, smaller or bigger.
Posted on Reply
#231
londiste
seronx: Nope, it is the scheduler. Of which there are two in a Bulldozer module.
This is incorrect.
https://en.wikipedia.org/wiki/Central_processing_unit#Operation
The fundamental operation of most CPUs, regardless of the physical form they take, is to execute a sequence of stored instructions that is called a program. The instructions to be executed are kept in some kind of computer memory. Nearly all CPUs follow the fetch, decode and execute steps in their operation, which are collectively known as the instruction cycle.
Scheduler is usually a part of Execution Unit.
The problem with counting the schedulers is that scheduler placement is an implementation detail and this has been implemented in multiple ways. Not accounting for the size or exact purpose, Bulldozer module has three schedulers, Zen has 7-8, Skylake has one.

Edit:
I missed this part of your post:
seronx: computersciencewiki.org/index.php/Control_unit_(CU)
- (1) The control unit obtains data / instructions from memory
- (2) Interprets / decodes the instructions into commands / signals
- (3) Controls transfer of instructions and data in the CPU
- (4) Coordinates the parts of the CPU
All of the above is handled by the core's scheduler.
Scheduler fetches macro-ops which are decoded into micro-ops.
It is not handled by the scheduler.
1 - Fetch
2 - Decoder
The article does not elaborate, but it seems 3 and 4 go a bit beyond the traditional CPU definition and cover multi-core and other controllers.
3 - probably a combination of dispatch and schedulers
4 - This is likely at PSP/ME level.

I mean, I get why the entire frontend can be considered a scheduler; as a broad term, that is what a frontend does. The fact that the operating system has its own scheduler that deals with work distribution to cores/threads does not help. But as a technical term in a CPU, the scheduler is a specific function in the Execution Unit.
Posted on Reply
#232
seronx
londiste: Scheduler is usually a part of Execution Unit.
*Execution core.
londiste: Not accounting for the size or exact purpose, Bulldozer module has three schedulers, Zen has 7-8, Skylake has one.
Bulldozer has three, Zen has two, Skylake has one.

Zen only has a single scheduler for integer, the reservation stations of which are statically divided between the execution units. These create the scheduler queues, which are only a part of the scheduler.
Posted on Reply
#235
londiste
Slightly varying definitions of Execution Unit. 1995 was a long time ago. The one in the link is a basic execution unit, ALU plus some registers. Control logic was not necessary for something as simple as that. Now that an Execution Unit includes a number of pipes, control is necessary. This would primarily mean a scheduler.

The link looks like part of a CPU architecture course of some kind, meaning a simplified example.
Posted on Reply
#236
Aquinus
Resident Wat-man
londiste: Slightly varying definitions of Execution Unit. 1995 was a long time ago. The one in the link is a basic execution unit, ALU plus some registers. Control logic was not necessary for something as simple as that. Now that an Execution Unit includes a number of pipes, control is necessary. This would primarily mean a scheduler.
Actually, it was. Those lines have to be managed by some sort of control unit, otherwise it will literally do nothing. This is true for CPUs as early as the 8080.
Posted on Reply
#237
londiste
Aquinus: Actually, it was. Those lines have to be managed by some sort of control unit, otherwise it will literally do nothing. This is true for CPUs as early as the 8080.
Something this simple can easily enough be fed right from dispatch. I meant no control logic in the execution unit.
Posted on Reply
#238
Vya Domus
londiste: Control logic was not necessary for something as simple as that.
Control logic is always necessary.
Posted on Reply
#239
qubit
Overclocked quantum bit
EsaT: OK, so when does Intel get judged for their advertising and sleazy tactics?
Or are we going to be picky about who gets penalized and who is given a get-out-of-jail-free card?
Intel practised literally extortion 15 years ago.

Intel isn't any white knight on white horse, not even grey knight...
jolt.law.harvard.edu/digest/intel-and-the-x86-architecture-a-legal-perspective
That's a very nice strawman argument there. Well done! :)

The fact is, that whatever Intel has done does not justify AMD committing offences, too. Especially so when the issues aren't even related.

Therefore your argument is invalid and you're coming off as an AMD apologist.
Posted on Reply
#240
neatfeatguy
Maybe it's just me, but this whole thread has kind of burned out.


At first it was entertaining, now it's just sad to read through the same things getting said over and over and over again by different people.

Side 1 (for AMD): My dad can beat up your dad!
Side 2 (against AMD): Na-ah! My dad can beat up your dad because your dad is slower!
Side 1: Your dad is gimped so my dad will beat up your dad!
Side 2: Your dad has glasses so he can't see well enough to defend himself. My dad will beat up your dad!
Side 1: Na-uh!
Side 2: Ya-hu!
Side 1: Na-uh!
Side 2: Ya-hu!
Side 1: Na-uh!
Side 2: Ya-hu!
Side 1: Na-uh!
Side 2: Ya-hu!
....

Maybe
Posted on Reply
#241
Shambles1980
I said ages ago it would happen lol.
It always does.
Posted on Reply
#242
cdawall
where the hell are my stars
neatfeatguy: Maybe it's just me, but this whole thread has kind of burned out.


At first it was entertaining, now it's just sad to read through the same things getting said over and over and over again by different people.

Side 1 (for AMD): My dad can beat up your dad!
Side 2 (against AMD): Na-ah! My dad can beat up your dad because your dad is slower!
Side 1: Your dad is gimped so my dad will beat up your dad!
Side 2: Your dad has glasses so he can't see well enough to defend himself. My dad will beat up your dad!
Side 1: Na-uh!
Side 2: Ya-hu!
Side 1: Na-uh!
Side 2: Ya-hu!
Side 1: Na-uh!
Side 2: Ya-hu!
Side 1: Na-uh!
Side 2: Ya-hu!
....

Maybe
More of

Side 1: here are facts signed off by people that are considered leaders and experts in the industry.

Side 2: nope.

Side 1: additional information further proving what was backed up by subject matter experts.

Side 2: see previous lack of argument.
Posted on Reply
#243
mouacyk
Aquinus: Modern cores aren't that independent. It's not like we're still rolling with Pentium Ds... but please, keep grasping at straws until you find one that's long enough.
That's exactly why AMD is getting the slap on the hand -- redefining market leading terminology with digressing performance (scaling). Wouldn't have been a huge deal if they managed closer to 8x scaling on purely integer loads. When the performance scaling falls short, we try to attribute the causes to the differences in design -- a major one of which is shared instruction fetching.
Posted on Reply
#244
RichF
For some reason, the quote from Hruska's article isn't showing up correctly on my iPhone. Part of it is and some of it isn't. The first part says:

What Dickey and Parmer are actually arguing is that Bulldozer/Piledriver (the FX-9590 said:

Then it skips a lot and goes to the paragraph "A Bulldozer CPU core was different..." et cetera and the paragraph after that shows up fine. The problem occurs both in the AdBlock browser I use and in Safari. So, here is the missing quoted text, for those who couldn't see it due to this bug:

Hruska:

What Dickey and Parmer are actually arguing is that Bulldozer/Piledriver (the FX-9590, specifically) did not deliver the performance they expected from an eight-core CPU relative to Intel CPUs. They argue that the shared resources in the Bulldozer core prevented the chip from “simultaneously multi-tasking” and that because resources were shared between the CPU cores, that Bulldozer “functionally only have four cores.” Both of these claims are factually wrong.

Bulldozer did not support SMT, which allows a CPU to execute more than one thread simultaneously. The fact that performance scales upwards in integer and FPU workloads on a BD/PD processor when moving from four threads to eight is proof that the CPU is not limited to a functional four-core arrangement. As these results from OpenBenchmarking.org show, BD performance improves above the four-thread mark, even in FPU workloads. Integer workloads also show improvements in scaling from four threads to eight. While the absolute degree of scaling may be less, Bulldozer is not a functional quad-core CPU as a matter of defined core count. The fact that its overall performance may have been equivalent to an Intel quad-core has nothing to do with whether the CPU factually had the advertised number of cores.
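
The scaling argument in the quoted passage boils down to simple arithmetic, sketched below in Python. The timings are hypothetical numbers chosen for illustration, not results from OpenBenchmarking.org or any real FX processor, and the function names are assumptions of this sketch.

```python
def speedup(t_base, t_n):
    """Speedup of a run that takes t_n seconds over one that takes t_base."""
    return t_base / t_n


def scaling_efficiency(t_base, base_threads, t_n, n_threads):
    """Fraction of the ideal (linear) speedup actually achieved."""
    ideal = n_threads / base_threads
    return speedup(t_base, t_n) / ideal


# Hypothetical wall-clock times in seconds for the same workload.
t_4_threads = 100.0
t_8_threads = 62.5

s = speedup(t_4_threads, t_8_threads)                       # 1.6x faster
e = scaling_efficiency(t_4_threads, 4, t_8_threads, 8)      # 0.8 of ideal
```

Under these made-up numbers, going from four to eight threads gives a 1.6x speedup, or 80% of the ideal 2x. Any value above 1.0x is the kind of evidence the passage cites against a "functional quad-core" reading, while the shortfall from 2x is what the other side of the thread attributes to shared resources.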
Posted on Reply
#246
FordGT90Concept
"I go fast!1!11!1!"
Aquinus: You know, all of these block diagrams are cute and everything, but the fact of the matter is that 99% of consumers don't care about the internal parts of the CPU. You don't market block diagrams, you market simple information. Mind you, this entire argument is predicated on the idea that the FPU is essential to the operation of a CPU... it is not.
NVIDIA GTX 970 4GB* * 0.5 GB runs at a fraction of the performance of the other 3.5 GB.
AMD FX-8350 8-core* * Fetcher, Core Interface Unit, and FPU are shared so performance will be less than advertised when a blocking scenario is encountered.

FPU is essential to all consumer products. Without the FPU, performance would suffer so much that simple tasks such as web browsing (JPEG images especially rely on FPU for rendering) would be impossible without lengthy delays. No consumer processor made in the last two decades lacked an integrated FPU. You either have to go all the way back to when the FPU was *new* or leave the consumer space to look at what are effectively ASIC processors like UltraSPARC which are designed specifically to handle database processing. Neither case is relevant to the Dickey and Parmer lawsuit.
seronx: patents.google.com/patent/US10140129B2/en
Appears to describe Hyperthreading to me. The fetcher accepts two threads per core.
mouacyk: This is what baseline core scaling efficiency looks like with the data from openbenchmarking.org/result/1110227-AR-AMDSCAL0184:



Even the 2384 does well, because it's got 4 fully independent cores.
Exactly why Dickey and Parmer filed suit.
Posted on Reply
#247
Vya Domus
FordGT90Concept: AMD FX-8350 8-core* * Fetcher, Core Interface Unit, and FPU are shared so performance will be less than advertised when a blocking scenario is encountered.
* Not actually shared
Posted on Reply
#248
FordGT90Concept
"I go fast!1!11!1!"
Vya Domus: * Not actually shared
:rolleyes:

Thread agnostic means that the execution units aren't assigned to one thread or the other, they can flip based on demand. That's also why the FPU has a "Frontend" (manages the SMT).
Posted on Reply
#249
Vya Domus
FordGT90Concept: [facepalm.jpg]

Thread agnostic means that the execution units aren't assigned to one thread or the other, they can flip based on demand. That's also why the FPU has a "Frontend" (manages the SMT).
FPU = Floating Point Unit, singular, one. One unit.

Have we really gotten to a point where we just don't care anymore? Plural/singular matters, and if the plaintiffs have any hope of winning this they had better keep that in mind. We can't just keep mangling these terms forever until we get where we want. It doesn't work like that.
Posted on Reply
#250
FordGT90Concept
"I go fast!1!11!1!"
Thread A can end up on FMAC 1 and/or 2
Thread B can end up on FMAC 1 and/or 2

They are not explicitly assigned; they are thread agnostic. FP scheduler/frontend determines what thread ends up where.
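
A heavily simplified Python sketch of what "thread agnostic" means here: a shared scheduler hands each queued floating-point op to whichever pipe frees up first, regardless of which thread issued it. This is a toy model under stated assumptions (two pipes standing in for FMAC 1 and FMAC 2, invented latencies, in-order issue) and does not describe the actual Bulldozer FP scheduler.

```python
def dispatch(ops, num_pipes=2):
    """Assign each op to the pipe that becomes free earliest.

    ops is a list of (thread, latency_in_cycles) tuples in issue order.
    Returns a list of (thread, pipe_index) in the same order.
    """
    free_at = [0] * num_pipes  # cycle at which each pipe is next free
    placement = []
    for thread, latency in ops:
        # Pick the earliest-free pipe; ties go to the lowest index.
        pipe = min(range(num_pipes), key=lambda p: free_at[p])
        free_at[pipe] += latency
        placement.append((thread, pipe))
    return placement


# Hypothetical op stream from two threads sharing one FPU.
ops = [("A", 1), ("B", 2), ("B", 2), ("A", 1)]
placement = dispatch(ops)

# Which pipes did each thread end up using?
pipes_by_thread = {}
for thread, pipe in placement:
    pipes_by_thread.setdefault(thread, set()).add(pipe)
```

In this run both threads end up on both pipes (index 0 and 1), because placement depends only on which pipe is free, not on the thread that issued the op.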
Posted on Reply