Wednesday, January 23rd 2019
Bulldozer Core-Count Debate Comes Back to Haunt AMD
AMD in 2011 launched the FX-8150, the "world's first 8-core desktop processor," or so it says on the literal tin. AMD achieved its core-count of 8 with an unconventional CPU core design. Its 8 cores are arranged in four sets of two cores each, called "modules." Each core has its own independent integer unit and L1 data cache, while the two cores share a majority of their components - the core's front-end, a branch-predictor, a 64 KB L1 code cache, a 2 MB L2 cache, but most importantly, an FPU. There was much debate across tech forums on what constitutes a CPU core.
Multiprocessor-aware operating systems had to be tweaked to properly address a "Bulldozer" processor. Their schedulers would initially treat "Bulldozer" cores as fully independent (as conventional logic would dictate), until AMD noticed multi-threaded application performance bottlenecks. Eventually, Windows and various *nix kernels received updates to their schedulers to treat each module as a core, and each core as an SMT unit (a logical processor). The FX-8350 is a 4-core/8-thread processor in the eyes of Windows 10, for example. These updates improved the processors' performance, but not before consumers started noticing that their operating systems weren't reporting the advertised core-count. In 2015, a class-action lawsuit was filed against AMD for false marketing of FX-series processors. The wheels of that lawsuit are finally moving, now that a 12-member jury is being set up to examine what constitutes a CPU core, and whether an AMD FX-8000 or FX-9000 series processor can qualify as an 8-core chip.
US District Judge Haywood Gilliam of the District Court for the Northern District of California rejected AMD's claim that "a significant majority of" consumers understood what constitutes a CPU core, and that they had a fair idea of what they were buying when they bought AMD FX processors. AMD has two main options before it. The company can reach an agreement with the plaintiffs that could cost it millions of dollars in compensation, or fight it out in the jury trial by trying to prove to 12 members of the public (not necessarily from an IT background) what constitutes a CPU core and why "Bulldozer" qualifies as 8-core silicon.
The plaintiffs and defendants each have a key technical argument. The plaintiffs could point to operating systems treating 8-core "Bulldozer" parts as 4-core/8-thread (i.e. each module as a core and each "core" as a logical processor), while AMD could run multi-threaded floating-point benchmark tests to prove that a module cannot be simplified to the definition of a core. AMD's 2017 release of the "Zen" architecture sees a return to the conventional definition of a core, with each "Zen" core being as independent as an Intel "Skylake" core. We will keep an eye on this case.
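For readers who want to see how an operating system ends up with a "4-core/8-thread" figure, here is a minimal Python sketch. It assumes a Linux-style sysfs layout in which every logical CPU exposes a `thread_siblings_list` file; the helper names (`parse_cpu_list`, `count_topology`, `read_sysfs`) and the `SAMPLE` data are hypothetical illustrations, not output captured from a real FX processor.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Expand a kernel CPU list such as "0-1" or "0,4" into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def count_topology(sibling_lists):
    """Return (core_count, logical_count).

    Logical CPUs that report the same sibling set are counted as one core,
    which is how a module of two "cores" collapses into one core with two
    threads once the kernel marks them as siblings.
    """
    groups = {frozenset(parse_cpu_list(s)) for s in sibling_lists.values()}
    return len(groups), len(sibling_lists)


def read_sysfs(root="/sys/devices/system/cpu"):
    """Collect sibling lists from sysfs (Linux only; path is an assumption)."""
    found = {}
    for path in Path(root).glob("cpu[0-9]*/topology/thread_siblings_list"):
        cpu = int(path.parent.parent.name[3:])  # "cpu3" -> 3
        found[cpu] = path.read_text()
    return found


# Hypothetical sample: 8 logical CPUs paired up as 0-1, 2-3, 4-5, 6-7.
SAMPLE = {i: f"{i - i % 2}-{i - i % 2 + 1}" for i in range(8)}
```

Feeding `SAMPLE` to `count_topology` yields 4 cores and 8 logical processors; on a Linux machine, `count_topology(read_sysfs())` would report whatever pairing that kernel has chosen.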
Source:
The Register
369 Comments on Bulldozer Core-Count Debate Comes Back to Haunt AMD
A core does not have to be a separate die; a core needs to be functionally independent.
What a CPU does is still to process instructions. Fetch-Decode-Execute. A core in a multicore processor is defined the same as a CPU.
And every single part of what would traditionally be considered a CPU IS duplicated within a single core in a modern CPU/core. Core is CPU. Bulldozer module is definitely a CPU, there is absolutely no doubt about that.
The point is that the module really is not two independent CPUs. CPU control logic is Fetch and Decode units. These are shared in a Bulldozer module. There are separate Decode units in Steamroller and Excavator but Fetch remains shared.
patents.google.com/patent/US10140129B2/en
which cites
patents.google.com/patent/US20120166777A1/en
Intel's doesn't make sense. Nor does AMD's (not related to the above patent, but the previous 2007 one; the 2005 one, however, is accurate).
"Processing core having shared front end unit"
which comes from
"The processor 100 may include a plurality of processor cores 102 and a front end 104 shared by the processor cores 102."
-> Processing cores having shared front end unit
vs
-> Processing core having shared front end unit.
//It should be noted that FIG. 1 is provided as an example, not as a limitation, and even though it is depicted that the processor 100 includes two processor cores, the embodiments disclosed herein are applicable to a processor with any number of cores or a system with multiple processors with single or multiple cores.
Nope, it is the scheduler. Of which there are two in a Bulldozer module.
computersciencewiki.org/index.php/Control_unit_(CU)
- The control unit obtains data / instructions from memory
- Interprets / decodes the instructions into commands / signals
- Controls transfer of instructions and data in the CPU
- Coordinates the parts of the CPU
All of the above is handled by the core's scheduler.
Scheduler fetches macro-ops which are decoded into micro-ops.
The core's front-end is a co-processor for implementing various performance-enhancing features. It can be swapped out for any other front-end design, whether slower or faster, smaller or bigger.
The problem with counting the schedulers is that scheduler placement is an implementation detail, and it has been implemented in multiple ways. Leaving aside size or exact purpose, a Bulldozer module has three schedulers, Zen has 7-8, Skylake has one.
Edit:
I missed this part of your post: It is not handled by the scheduler.
1 - Fetch
2 - Decoder
The article does not elaborate, but it seems 3 and 4 go a bit beyond the traditional CPU definition and into multi-core and other controllers.
3 - probably a combination of dispatch and schedulers
4 - This is likely at PSP/ME level.
I mean, I get why the entire frontend can be considered a scheduler; in the broad sense, scheduling is what the frontend does. The fact that the operating system has its own scheduler that deals with work distribution to cores and threads does not help. But as a technical term in a CPU, a scheduler is a specific function in the execution unit.
Zen only has a single scheduler for integer, whose reservation stations are statically divided between the execution units. These create the scheduler queues, which are only a part of the scheduler.
en.wikipedia.org/wiki/Execution_unit
web.archive.org/web/20131231145405/http://people.cs.umass.edu/~weems/CmpSci535/Discussion10.html
The link looks like part of a CPU architecture course of some kind, meaning a simplified example.
The fact is, that whatever Intel has done does not justify AMD committing offences, too. Especially so when the issues aren't even related.
Therefore your argument is invalid and you're coming off as an AMD apologist.
At first it was entertaining, now it's just sad to read through the same things getting said over and over and over again by different people.
Side 1 (for AMD): My dad can beat up your dad!
Side 2 (against AMD): Na-ah! My dad can beat up your dad because your dad is slower!
Side 1: Your dad is gimped so my dad will beat up your dad!
Side 2: Your dad has glasses so he can't see well enough to defend himself. My dad will beat up your dad!
Side 1: Na-uh!
Side 2: Ya-hu!
Side 1: Na-uh!
Side 2: Ya-hu!
Side 1: Na-uh!
Side 2: Ya-hu!
Side 1: Na-uh!
Side 2: Ya-hu!
....
Maybe
It always does.
Side 1 here are facts signed off by people that are considered leaders and experts in the industry.
Side 2 nope.
Side 1 additional information further proving what was backed up by subject matter experts
Side 2 see previous lack of argument.
What Dickey and Parmer are actually arguing is that Bulldozer/Piledriver (the FX-9590 said:
Then it skips a lot and goes to the paragraph "A Bulldozer CPU core was different..." et cetera and the paragraph after that shows up fine. The problem occurs both in the AdBlock browser I use and in Safari. So, here is the missing quoted text, for those who couldn't see it due to this bug:
Hruska:
What Dickey and Parmer are actually arguing is that Bulldozer/Piledriver (the FX-9590, specifically) did not deliver the performance they expected from an eight-core CPU relative to Intel CPUs. They argue that the shared resources in the Bulldozer core prevented the chip from “simultaneously multi-tasking” and that because resources were shared between the CPU cores, that Bulldozer “functionally only have four cores.” Both of these claims are factually wrong.
Bulldozer did not support SMT, which allows a CPU to execute more than one thread simultaneously. The fact that performance scales upwards in integer and FPU workloads on a BD/PD processor when moving from four threads to eight is proof that the CPU is not limited to a functional four-core arrangement. As these results from OpenBenchmarking.org show, BD performance improves above the four-thread mark, even in FPU workloads. Integer workloads also show improvements in scaling from four threads to eight. While the absolute degree of scaling may be less, Bulldozer is not a functional quad-core CPU as a matter of defined core count. The fact that its overall performance may have been equivalent to an Intel quad-core has nothing to do with whether the CPU factually had the advertised number of cores.
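The scaling argument above can be checked by anyone with the hardware. Below is a rough Python sketch of such a test, not the OpenBenchmarking.org methodology: `fp_kernel` is a made-up floating-point busy loop, `throughput` runs one copy per worker process, and `scaling` is just the ratio of two throughput figures. All names and the default loop size are assumptions, and the sketch does not control which cores the operating system places the workers on.

```python
import time
from concurrent.futures import ProcessPoolExecutor


def fp_kernel(n):
    """Floating-point busy loop; a stand-in for an FPU-heavy workload."""
    x = 0.0
    for i in range(1, n + 1):
        x += (i * 1.000001) / (i + 0.5)
    return x


def throughput(workers, n=2_000_000):
    """Run `workers` copies of the kernel in parallel.

    Returns total loop iterations completed per second across all workers.
    """
    start = time.perf_counter()
    with ProcessPoolExecutor(max_workers=workers) as pool:
        list(pool.map(fp_kernel, [n] * workers))
    elapsed = time.perf_counter() - start
    return workers * n / elapsed


def scaling(base, more):
    """Ratio of throughput at the higher worker count to the lower one."""
    return more / base
```

Calling `scaling(throughput(4), throughput(8))` from inside an `if __name__ == "__main__":` guard gives the four-to-eight-thread ratio. A result clearly above 1.0 is the kind of evidence Hruska is pointing to; a result near 1.0 would instead suggest the extra threads add little.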
Even the 2384 does well, because it's got 4 fully independent cores.
AMD FX-8350 8-core*
* Fetcher, Core Interface Unit, and FPU are shared so performance will be less than advertised when a blocking scenario is encountered.
FPU is essential to all consumer products. Without the FPU, performance would suffer so much that simple tasks such as web browsing (JPEG images especially rely on FPU for rendering) would be impossible without lengthy delays. No consumer processor made in the last two decades lacked an integrated FPU. You either have to go all the way back to when the FPU was *new* or leave the consumer space to look at what are effectively ASIC processors like UltraSPARC, which are designed specifically to handle database processing. Neither case is relevant to the Dickey and Parmer lawsuit.
Appears to describe Hyperthreading to me. The fetcher accepts two threads per core. Exactly why Dickey and Parmer filed suit.
Thread agnostic means that the execution units aren't assigned to one thread or the other, they can flip based on demand. That's also why the FPU has a "Frontend" (manages the SMT).
Have we really gotten to a point where we just don't care anymore? Plural/singular matters, and if the plaintiffs have any hope of winning this they had better keep that in mind; we can't just keep mangling these terms forever until we get where we want. It doesn't work like that.
Thread B can end up on FMAC 1 and/or 2
They are not explicitly assigned; they are thread agnostic. FP scheduler/frontend determines what thread ends up where.
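To make "thread agnostic" concrete, here is a toy Python model of two FMAC pipes fed from two threads. It is only a sketch: the round-robin pick between threads is an assumption made for illustration, not AMD's documented arbitration policy, and `dispatch` is a hypothetical name.

```python
from collections import deque


def dispatch(ops_a, ops_b, pipes=2):
    """Toy FP scheduler: every cycle, fill each pipe from any thread with work.

    No pipe belongs to a thread. Threads are tried in round-robin order
    (an assumption), so a lone busy thread can occupy both pipes.
    Returns a list of cycles; each cycle lists (thread, op) per pipe.
    """
    queues = {"A": deque(ops_a), "B": deque(ops_b)}
    order = ["A", "B"]
    schedule = []
    while any(queues.values()):
        cycle = []
        for _ in range(pipes):
            # Try each thread once, starting with the least recently tried.
            for _ in range(len(order)):
                tid = order.pop(0)
                order.append(tid)
                if queues[tid]:
                    cycle.append((tid, queues[tid].popleft()))
                    break
        schedule.append(cycle)
    return schedule


if __name__ == "__main__":
    for n, cycle in enumerate(dispatch(["a0", "a1"], ["b0", "b1", "b2", "b3"])):
        print(n, cycle)
```

With two ops from Thread A and four from Thread B, the first two cycles pair one op from each thread, and in the third cycle Thread B takes both pipes because Thread A has run dry.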