Wednesday, January 23rd 2019
Bulldozer Core-Count Debate Comes Back to Haunt AMD
AMD in 2011 launched the FX-8150, the "world's first 8-core desktop processor," or so it says on the literal tin. AMD achieved its core count of 8 with an unconventional CPU core design. Its 8 cores are arranged in four sets of two cores each, called "modules." Each core has its own independent integer unit and L1 data cache, while the two cores share a majority of their components: the front-end, a branch predictor, a 64 KB L1 code cache, a 2 MB L2 cache and, most importantly, an FPU. There was much debate across tech forums over what constitutes a CPU core.
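To make the two counting positions concrete, here is a minimal Python sketch that models a module the way the article describes it: two integer cores, each with a private integer unit and L1 data cache, sharing the front-end, branch predictor, 64 KB L1 code cache, 2 MB L2 cache and FPU. The class and function names are hypothetical, and the model ignores every structure the article does not list.

```python
from dataclasses import dataclass, field

# Resources the article lists as shared by both cores of a module.
SHARED_PER_MODULE = (
    "front-end",
    "branch predictor",
    "64 KB L1 code cache",
    "2 MB L2 cache",
    "FPU",
)


@dataclass
class IntegerCore:
    # Private to each core: its own integer unit and L1 data cache.
    private: tuple[str, ...] = ("integer unit", "L1 data cache")


@dataclass
class Module:
    # Two integer cores that share everything in SHARED_PER_MODULE.
    cores: list[IntegerCore] = field(
        default_factory=lambda: [IntegerCore(), IntegerCore()]
    )
    shared: tuple[str, ...] = SHARED_PER_MODULE


def count_integer_cores(modules: list[Module]) -> int:
    """AMD's count: every integer core is a core."""
    return sum(len(m.cores) for m in modules)


def count_front_ends(modules: list[Module]) -> int:
    """Stricter count: a core needs its own front-end and FPU, so each module counts once."""
    return sum(1 for m in modules if "front-end" in m.shared and "FPU" in m.shared)


# FX-8150: four modules.
FX_8150 = [Module() for _ in range(4)]

print(count_integer_cores(FX_8150))  # 8
print(count_front_ends(FX_8150))     # 4
```

Both numbers describe the same silicon; the lawsuit is about which of them deserves the word "core."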
Multiprocessor-aware operating systems had to be tweaked to properly address a "Bulldozer" processor. Their schedulers initially treated "Bulldozer" cores as fully independent (as conventional logic would dictate), until AMD noticed bottlenecks in multi-threaded application performance. Eventually, Windows and various *nix kernels received scheduler updates that treat each module as a core, and each core as an SMT unit (a logical processor). The FX-8350, for example, is a 4-core/8-thread processor in the eyes of Windows 10. These updates improved the processors' performance, but not before consumers noticed that their operating systems weren't reporting the advertised core count.
In 2015, a class-action lawsuit was filed against AMD for false marketing of FX-series processors. The wheels of that lawsuit are finally moving, now that a 12-member jury is set to examine what constitutes a CPU core, and whether an AMD FX-8000 or FX-9000 series processor qualifies as an 8-core chip. US District Judge Haywood Gilliam of the District Court for the Northern District of California rejected AMD's claim that "a significant majority of" consumers understood what constitutes a CPU core, and that they had a fair idea of what they were buying when they bought AMD FX processors.
AMD has two main options before it. The company can reach a settlement with the plaintiffs that could cost it millions of dollars in compensation, or fight it out in a jury trial, trying to prove to 12 members of the public (not necessarily from an IT background) what constitutes a CPU core and why "Bulldozer" qualifies as 8-core silicon.
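The scheduler change described above, exposing each module as one core with two SMT siblings, can be sketched as a spread-first placement policy. This is a simplified illustration, not actual Windows or Linux scheduler code: the logical-CPU numbering (CPUs 2m and 2m+1 belong to module m) and the function names are assumptions, and the "naive" order stands in for the worst case of a topology-unaware scheduler.

```python
def module_of(cpu: int) -> int:
    # Hypothetical numbering: logical CPUs 2m and 2m+1 are the two cores of module m.
    return cpu // 2


def place_threads(n_threads: int, n_modules: int) -> list[int]:
    """Spread-first placement: one thread per module, then fill the sibling cores."""
    if not 0 <= n_threads <= 2 * n_modules:
        raise ValueError("thread count exceeds available logical CPUs")
    first_pass = [2 * m for m in range(n_modules)]       # one core per module
    second_pass = [2 * m + 1 for m in range(n_modules)]  # the siblings sharing front-end/FPU
    return (first_pass + second_pass)[:n_threads]


def naive_placement(n_threads: int) -> list[int]:
    """Treats all cores as independent and simply fills CPUs 0, 1, 2, ..."""
    return list(range(n_threads))


# Four threads on a four-module FX part.
spread = place_threads(4, n_modules=4)
naive = naive_placement(4)
print(spread, len({module_of(c) for c in spread}))  # [0, 2, 4, 6] 4
print(naive, len({module_of(c) for c in naive}))    # [0, 1, 2, 3] 2
```

With four threads, spread-first gives each thread a whole module, while the naive order packs them into two modules that then contend for their shared front-ends and FPUs.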
The plaintiffs and the defense each have a key technical argument. The plaintiffs could point to operating systems treating 8-core "Bulldozer" parts as 4-core/8-thread (i.e. each module as a core and each "core" as a logical processor), while AMD could run multi-threaded floating-point benchmarks to show that a module cannot be reduced to the definition of a single core. AMD's 2017 "Zen" architecture marks a return to the conventional definition of a core, with each "Zen" core as independent as an Intel "Skylake" core. We will keep an eye on this case.
Source:
The Register
369 Comments on Bulldozer Core-Count Debate Comes Back to Haunt AMD
Here is another quote, since you didn't bother to read the first article linked, which was written by no fewer than 8 PhD-holding people from various companies including AMD, HP, HAL, etc. I want you to read that out loud to yourself. This is merely from the abstract. I would link to the actual article sections, but seeing how the conversation is going, I could link a McDonald's menu and you would argue about it not being real fast food or some nonsense.
@cdawall read that out loud, and it says "integer cores" and "compared to fully replicated CPU cores."
Like I said, they can call them "integer," "imaginary," "lite," whatever they want, as long as they define it.
But lo and behold, it didn't say "8 integer cores, not traditional" on the box.
You'd think they'd be yelling that from the rooftops if they were as good or better. Or that they wouldn't bother if they were trying to trick consumers.
Bulldozers were slow because each module did not act like 2 cores in Windows.
MS had to change the scheduler to alleviate the issue.
AMD calls them integer cores and categorically defines them as not actual cores.
AMD later abandoned the module design because it's just worse than using real cores.
BUT AMD advertised Bulldozer as having 8 real cores.
People were upset, and so a lawsuit started.
All the evidence you have presented shows categorically that AMD didn't think they were traditional cores: they didn't call them traditional cores, defined them as different from traditional cores, and cut parts out to reduce power.
And yet they advertised them as cores.
Heck, Atom was launched as an in-order execution CPU, something we hadn’t seen from Intel since before the Pentium Pro. For best performance on a pre-Silvermont Atom, you needed to target an x86 architecture from before 1995.
www.techspot.com/review/1730-intel-core-i9-9900k-core-i7-9700k/
The 9700K is an 8-core/8-thread CPU. It scores 214 points in the single-threaded Cinebench (CB) test and 1513 points in the multithreaded test. That is a 7.07x speed-up. In 2012 AMD was able to pull off a 6.7x speed-up in that same benchmark, and you are going to sit there and tell me it only had 4 cores?
www.techpowerup.com/forums/threads/amd-dragged-to-court-over-core-count-on-bulldozer.217327/page-21#post-3535907
Modules use less power fully powered than two independent cores fully powered; however, one independent core will use less power than a semi-powered-down module. It's ironic that you compare the 9700K, a deliberately nerfed processor, to one that isn't (other than having an unconventional design). The 9900K is 9.48x using 8 independent cores (16 threads) versus the alleged "6.7x" using 8 shared cores. That's on the order of a 41% improvement, instead of a 16% loss. There's a reason why AMD dropped modules like they're hot and went SMT too.
techreport.com/review/31179/intel-core-i7-7700k-kaby-lake-cpu-reviewed/13
The 7700K pulls off 197 single-threaded and 998 multithreaded, for a 5.06x speed-up. Again, those 8 "shared" cores, as you called them, did 6.7x. I would say that if we were to purely compare HT vs CMT, this particular application shows substantial gains for CMT, almost at the same level as traditional cores.
This trend actually got better with more cores added. The quad-core, dual-module FX-based stuff did not do as well per core; it still handily beat the Intel HT offerings, but was not nearly as good as real cores.
So my personal A10-7800 ran 91 single and 308 multi, a 3.38x speed-up (4c/4t)
The G4560, just a couple of notches up, ran 142 single and 352 multi, a 2.47x speed-up (2c/4t)
Another random i3-6100: 193 single and 491 multi, a 2.54x speed-up (2c/4t)
The 4c/4t 4690K: 171 single and 646 multi, a 3.77x speed-up (4c/4t)
Another 6600K: 193 single and 729 multi, a 3.77x speed-up (4c/4t)
These are just yanked from the CB thread:
www.techpowerup.com/forums/threads/post-your-cinebench-score.213237/
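The speed-ups quoted in this exchange are simply the multi-threaded score divided by the single-threaded score. A minimal sketch that recomputes them from the scores posted above (the dictionary and function names are illustrative). Rounded to two decimals, a few come out 0.01 higher than the figures quoted in the thread (e.g. 2.48x rather than 2.47x for the G4560, 5.07x rather than 5.06x for the 7700K), because the thread truncated rather than rounded.

```python
# (single-threaded, multi-threaded) Cinebench scores as quoted in this thread.
SCORES = {
    "i7-9700K": (214, 1513),
    "i7-7700K": (197, 998),
    "A10-7800": (91, 308),
    "Pentium G4560": (142, 352),
    "i3-6100": (193, 491),
    "i5-4690K": (171, 646),
    "i5-6600K": (193, 729),
}


def speedup(single: float, multi: float) -> float:
    """Multi-threaded score divided by single-threaded score."""
    return multi / single


for name, (single, multi) in SCORES.items():
    print(f"{name}: {speedup(single, multi):.2f}x")
```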
Bulldozer, Piledriver, and Steamroller: a 67% improvement from integrating two integer cores. In theory they should only block when either is faced with a major FPU instruction; however, there's no switching available to keep the integer cores fully tasked. It gets more performance improvement because there are more transistors behind it, but it cannot exceed 100% because it lacks integer-core SMT. My problem with all of this is that I'm not sure how Cinebench even works. Is it ALU-heavy, FPU-heavy, or a mixture of both? Is it synchronous multithreading or asynchronous? From what I gather, it's a rendering benchmark, which is FPU-heavy. Assuming that, it's good to see that Bulldozer's FPU can manage 83.75%, but from your own numbers you can clearly see a significant difference between where Bulldozer performs and where independent cores perform (e.g. the 9700K at 88.375%), especially considering that Bulldozer is getting 100% of its architecturally possible threads, compared to the 9700K's 50%. Your 7700K figure demonstrates that: 126.5% performance out of four independent cores, versus 83.75% out of eight integer cores, or 167.5% out of four modules.
That's what I don't get: AMD could have owned the module argument. 167.5% per module is more attractive than 83.75% per "core." They stabbed themselves in the back by calling them "cores," because that just doesn't stand up to the 120%+ that Hyper-Threading can do. This is looking at it from the perspective of a customer comparing an "8-core" Intel/Zen to an "8-core" Bulldozer/Piledriver/Steamroller.
In the period from 2012 to 2019, Intel was able to offer 5% better multithreading efficiency, core for core, in what is considered a heavy workload. You are correct that I don't know if it is ALU- or FPU-heavy, but it scales very efficiently on both sides of the map.
Either way you cut this up, either in 2012 they had nearly equaled Intel's 2019 multithreading ability, or in 2012 CMT so vastly outperformed both AMD's replacement SMT and Intel's HT that it isn't even funny. Either way you chalk that up, you are saying the chip performed admirably in this specific scenario. Now, mind you, I do get what you are saying about the 7700K holding a 126% per-core efficiency, but its per-thread efficiency would be worse than Bulldozer's. That is what that speed-up shows. You can mix those numbers however you want, but the root of it doesn't change. Intel's own 7700K compared to a 9700K showed the same thing: 126% vs 88% when compared the same way. So why is it OK for Intel's efficiency, but not OK for AMD's? Again, you are also comparing a 2012 product to 2018/2019 ones.
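The disagreement above is mostly about the denominator: the same speed-up reads very differently when divided by integer cores, modules, physical cores, or hardware threads. A minimal sketch using only the speed-ups quoted in the thread (the helper name and case labels are illustrative, not from any benchmark tool):

```python
def efficiency(speedup: float, units: int) -> float:
    """Speed-up per counted unit, in percent. Which unit to divide by is the whole debate."""
    if units <= 0:
        raise ValueError("units must be positive")
    return 100.0 * speedup / units


# (label, quoted Cinebench speed-up, number of units divided by)
CASES = [
    ("FX, per integer core", 6.7, 8),
    ("FX, per module", 6.7, 4),
    ("i7-9700K, per core", 7.07, 8),
    ("i7-7700K, per core", 5.06, 4),
    ("i7-7700K, per thread", 5.06, 8),
    ("i9-9900K, per core", 9.48, 8),
    ("i9-9900K, per thread", 9.48, 16),
]

for label, s, n in CASES:
    print(f"{label}: {efficiency(s, n):.2f}%")
```

Per core, the 7700K (126.5%) beats the FX (83.75%); per hardware thread, the order flips (63.25% vs 83.75%); per module, the FX reaches 167.5%. None of these numbers settles what a "core" is, which is why both sides can quote them.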
Edit: Circling back to Cinebench, given that the 9700K shows a 12% loss in scaling, I'd say the multithreading code is either synchronous or has a lot of blocking scenarios. Async code with little crosstalk between threads should get damn close to 100%. The fact that SMT in the same test gives what is effectively a 37% uplift in performance proves it is not a good multithreading benchmark.
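One way to put a number on the 9700K's roughly 12% scaling loss is to invert Amdahl's law, a standard model that nobody in the thread actually used. It assumes identical, independent cores and attributes all lost scaling to a serial fraction of the work, so other effects (lower all-core turbo clocks, shared-cache or memory contention) get folded into that single number. The function names are illustrative.

```python
def amdahl_speedup(serial_fraction: float, n: int) -> float:
    """Amdahl's law: speed-up on n identical cores when serial_fraction of the work cannot run in parallel."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n)


def implied_serial_fraction(speedup: float, n: int) -> float:
    """Invert Amdahl's law: the serial fraction that would explain an observed speed-up on n cores."""
    if n < 2:
        raise ValueError("need at least two cores")
    return (n / speedup - 1.0) / (n - 1)


# i7-9700K: 8 cores, no SMT, 7.07x speed-up as quoted in the thread.
f = implied_serial_fraction(7.07, 8)
print(f"implied serial fraction: {f:.1%}")    # 1.9%
print(f"check: {amdahl_speedup(f, 8):.2f}x")  # 7.07x
```

Under this model, only about 2% of the work would need to be serial to produce the observed loss, so the 12% figure alone does not distinguish heavy blocking from more mundane causes such as clock behavior. The model also does not apply cleanly to SMT or CMT parts, where the "cores" are not independent.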
1x Module = 2x Integer CPU cores + 1 FPU core
4x Modules = 8 Integer CPU cores + 4 FPU cores
So technically an 8-core CPU, if all you want to do is x86 integer operations.
" C-Ray, a simple raytracer designed to test the floating-point CPU performance "
The i7-990X (6c/12t) got a 6x improvement.
The FX-8150 (8c/8t) also got a 6x improvement. No one is arguing it's a failed architecture, but the lawsuit is meritless... there are in fact 8 cores, both integer and FP.
Bulldozer lost significant IPC compared to Magny-Cours or Thuban. On the server side, replacing a 12-core Magny-Cours with a 16-core Bulldozer yielded the same performance at the same clock.
Bulldozer was on a newer node, used less power, and scaled to higher clocks. I kept the Cinebench crown with 48 Magny-Cours cores until Brickland (Intel 4P Ivy Bridge, 60c/120t) came out. A 3.8 GHz 48-core Magny-Cours setup beat a 64-core 4.2 GHz Interlagos (Bulldozer take 2)... then it was gobstomped by Brickland lol.
That said, FP was not half but more like 75% efficient; it was painfully bottlenecked. You can be mad that it was a shit architecture, but you cannot claim the cores weren't there... they clearly were.
I don't mean to be rude; it is just a greatly misunderstood architecture. It went backwards from Magny-Cours... and then made 5-10% gains per refresh while Intel was making 20% IPC uplifts.
By the way, this lines up with how software treats a core as well as how a core logically should be treated. Blaming Microsoft here is shortsighted: they clearly went by AMD's suggestions in showing Bulldozer as 8-core, a decision which had to be changed later. Linux changed its OS-level scheduling far quicker and with fewer arguments. No, you can't split a Bulldozer module into two functional cores. In a Bulldozer module there is one branch predictor, one fetch unit, one decoder, one L2 cache, etc.