Wednesday, January 23rd 2019

Bulldozer Core-Count Debate Comes Back to Haunt AMD

AMD in 2011 launched the FX-8150, the "world's first 8-core desktop processor," or so it says on the literal tin. AMD achieved its core count of 8 with an unconventional CPU core design. Its 8 cores are arranged in four sets of two cores each, called "modules." Each core has its own independent integer unit and L1 data cache, while the two cores in a module share a majority of their components: the front-end, a branch predictor, a 64 KB L1 code cache, a 2 MB L2 cache and, most importantly, an FPU. There was much debate across tech forums on what constitutes a CPU core.

Multiprocessor-aware operating systems had to be tweaked to properly address a "Bulldozer" processor. Their schedulers would initially treat "Bulldozer" cores as fully independent (as conventional logic would dictate), until AMD noticed multi-threaded application performance bottlenecks. Eventually, Windows and various *nix kernels received updates to their schedulers to treat each module as a core, and each core as an SMT unit (a logical processor). The FX-8350 is a 4-core/8-thread processor in the eyes of Windows 10, for example. These updates improved the processors' performance, but not before consumers started noticing that their operating systems weren't reporting the correct core count. In 2015, a class-action lawsuit was filed against AMD for false marketing of FX-series processors. The wheels of that lawsuit are finally moving, after a 12-member jury is set up to examine what constitutes a CPU core, and whether an AMD FX-8000 or FX-9000 series processor can qualify as an 8-core chip.

US District Judge Haywood Gilliam of the District Court for the Northern District of California rejected AMD's claim that "a significant majority of" consumers understood what constitutes a CPU core, and that they had a fair idea of what they were buying when they bought AMD FX processors. AMD has two main options before it. The company can reach an agreement with the plaintiffs that could cost the company millions of dollars in compensation; or fight it out in the jury trial, by trying to prove to 12 members of the public (not necessarily from an IT background) what constitutes a CPU core and why "Bulldozer" qualifies as 8-core silicon.

The plaintiffs and defendants each have a key technical argument. The plaintiffs could point out that operating systems treat 8-core "Bulldozer" parts as 4-core/8-thread (i.e. each module as a core and each "core" as a logical processor), while AMD could run multi-threaded floating-point benchmark tests to prove that a module cannot be simplified to the definition of a core. AMD's 2017 release of the "Zen" architecture sees a return to the conventional definition of a core, with each "Zen" core being as independent as an Intel "Skylake" core. We will keep an eye on this case.
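
The operating-system view the plaintiffs could point to can be inspected directly on Linux. The following is a minimal sketch, not something taken from the case: it counts distinct thread-sibling groups (what the kernel regards as cores) and logical processors from the `thread_siblings_list` files under sysfs. The helper names are hypothetical, and the sample input assumes a kernel that reports the two cores of each module as siblings.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Parse a sysfs CPU list such as '0-1' or '0,4' into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def count_cores_and_threads(sibling_lists):
    """Return (cores, logical processors) from per-CPU sibling lists."""
    # Each distinct sibling group is one "core" in the kernel's view.
    groups = {frozenset(parse_cpu_list(text)) for text in sibling_lists}
    logical = set().union(*groups)
    return len(groups), len(logical)


def read_sibling_lists(root="/sys/devices/system/cpu"):
    """Read every cpuN/topology/thread_siblings_list file (Linux only)."""
    pattern = "cpu[0-9]*/topology/thread_siblings_list"
    return [path.read_text() for path in Path(root).glob(pattern)]


# Hypothetical input: 8 logical CPUs reported as 4 sibling pairs.
paired = ["0-1", "0-1", "2-3", "2-3", "4-5", "4-5", "6-7", "6-7"]
print(count_cores_and_threads(paired))  # (4, 8)

# The same 8 logical CPUs reported with no siblings at all.
independent = [str(n) for n in range(8)]
print(count_cores_and_threads(independent))  # (8, 8)
```

On a real Linux machine, `count_cores_and_threads(read_sibling_lists())` gives the kernel's own answer; whether a given kernel pairs up "Bulldozer" cores this way is exactly the kind of detail that would need checking on the hardware.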
Source: The Register

369 Comments on Bulldozer Core-Count Debate Comes Back to Haunt AMD

#126
cdawall
where the hell are my stars
Shambles1980: and less than 2 cores would have, point is moot.

no one complained about them being modules.. the issue is they call them 8 cores when they are demonstrably not.
Except for the time they showed identical if not better scaling than any other 8 core product on the market when placed into a multithreaded environment.
Aquinus: You're the only person I quoted, bub. I'm not @cdawall. Also IEEE has members from just about every major hardware vendor.
With 420,000 members they have more than one :roll:
Shambles1980: 1st line of the quote
"Just adding traditional cores isn’t going to be enough, says AMD’s Moore. "

which also right there says they aren't cores. AMD said they aren't cores right there in that stupid thing you just quoted.

oh and the paper is for a "module" not a "core"

All the evidence you bring just contradicts what you say.. and yet you still say it!
I take it you actually read neither of the articles I linked. The second article speaks about the natural evolution of multicore designs as was seen in 2010. It brings up GPU cores, which apparently aren't cores, it brings up CELL cores, which apparently aren't all real cores and mentions how this is how the evolution of things is going. The article did falsely say Bulldozer would be great (paraphrasing), but a lot of what they said is absolutely holding true. We have mixed cores in so many different devices and Bulldozer's design absolutely is a part of that.

Here is another quote, since you didn't bother to read the first article linked which was written by no less than 8 PhD holding people from various companies including AMD, HP, HAL, etc.
The module includes two independent integer cores but shares the fetch, decode, floating-point, and L2 cache units to maximize single-threaded performance and multi-threaded throughput while significantly improving power and area efficiency compared to fully replicated CPU cores.
I want you to read that out loud to yourself. This is merely from the abstract. I would link to the actual article sections, but seeing how the conversation is going I could link a McDonald's menu and you would argue about it not being real fast food or some nonsense.
#127
Aquinus
Resident Wat-man
Shambles1980: Should this lawsuit continue?? Yes..
If for no other reason than for mfrs to just accept they can't just trick the uneducated.
Except it's not a trick. You literally got 8 shitty cores instead of 4 or 6 decent ones.
#128
Shambles1980
Aquinus: Except it's not a trick. You literally got 8 shitty cores instead of 4 or 6 decent ones.
you got 4 decent cores and some other bits that could usually but not always do tasks in conjunction

@cdawall read that out loud and it says "integer cores" and "compared to fully replicated CPU cores"
like i said they can call them "integer, imaginary, lite" whatever they want as long as they define it..

But lo and behold it didn't say "8 integer cores, not traditional" on the box.
You'd think they'd be yelling that from the rooftops if they were as good or better. Or that they wouldn't bother if they were trying to trick consumers.
#129
cdawall
where the hell are my stars
Shambles1980: you got 4 decent cores and some other bits that could usually but not always do tasks in conjunction
Those same bits don't even exist in multiple histories of multiple CPUs that were standalone. Per the actual IEEE tech publication for Bulldozer, each module consists of 2 integer cores with some shared resources.
#130
Aquinus
Resident Wat-man
Shambles1980: you got 4 decent cores and some other bits that could usually but not always do tasks in conjunction
The only SMT-esque part about this is the FPU... and now we're going full circle.
#131
cdawall
where the hell are my stars
Shambles1980: you got 4 decent cores and some other bits that could usually but not always do tasks in conjunction

@cdawall read that out loud and it says "integer cores" like i said they can call them "integer, imaginary, lite" whatever they want as long as they define it..

But lo and behold it didn't say "8 integer cores, not traditional" on the box.
You'd think they'd be yelling that from the rooftops if they were as good or better. Or that they wouldn't bother if they were trying to trick consumers.
Traditional cores do not have an FPU. I honestly don't know why that is difficult to understand, but since you cannot get that through your thick skull I guess you win.
#132
Aquinus
Resident Wat-man
Shambles1980: Or that they wouldn't bother if they were trying to trick consumers.
A typical consumer doesn't even know what an FPU is or what it does, man. Marketing has to be simple.
#133
cdawall
where the hell are my stars
Actually. I am fixing this for myself. Guy can't figure out that an integer calculation and floating point calculation are not the same thing. This is not worth my time. Hope you folks had a good read.

#134
Aquinus
Resident Wat-man
cdawall: Actually. I am fixing this for myself. Guy can't figure out that an integer calculation and floating point calculation are not the same thing. This is not worth my time. Hope you folks had a good read.

You are a wise man. :respect:
#135
Shambles1980
cdawall: Traditional cores do not have an FPU. I honestly don't know why that is difficult to understand, but since you cannot get that through your thick skull I guess you win.
ok tell me what part of this you don't understand..
Bulldozers were slow, because each module did not act like 2 cores in Windows.
MS had to change the scheduler to alleviate the issue.
AMD call the modules integer cores and define categorically that they are not actual cores.
AMD later abandoned the modules thing because it's just worse than using real cores.

BUT AMD advertised Bulldozer as having 8 real cores.
people were upset and so started a lawsuit.

all the evidence you have presented shows categorically that AMD didn't think they were traditional cores, didn't call them traditional cores, defined them as different to traditional cores and cut parts out to reduce power.
And yet advertised them as cores.
#136
FordGT90Concept
"I go fast!1!11!1!"
Vya Domus: I could've very easily disabled 1 core from each "module" on my FX 6300 and guess what, Windows would boot just fine and software ran as it should, floating point functionality still intact. How could that have worked if Piledriver didn't have independent cores? What am I missing? Can you still not see that your assertion that something like the 8350 didn't have 8 independent cores is plain and simple wrong?
Pretty sure you can't. Bulldozer and sons' power control scope is limited to modules. A module can soft-shutdown an idle integer core to conserve power but that's not something software has any control over. An independent core can be completely powered off.
cdawall: How many clock cycles would it take for a BD module to process two single cycle cost integer math tasks.

How much is the multicore speedup for integer math tasks on a 4 module AMD, is it around 3.xx or 7.xx? Now compare that to a traditional 4 core setup.

In these specific scenarios it is quite easy to make the argument that the AMD design acts as any standard 8 core unit would. Just because single threaded performance was dreadful, doesn't have anything to do with it scaling linearly across all 8 cores available assuming integer calculations.
It seems ghazi brushed on the point here:
ghazi: Core count is a more accurate predictor of performance than thread count, and also is a term that common people are familiar with. Even in the case of Bulldozer, the 8-core chips scaled around 6.7x in multithreaded workloads -- closer to 8 than 4, unlike quad-cores with SMT that don't do much better than 5x.
16% slower than an independent 8-core because of shared components. If I had a Bulldozer, I'd want my 16% back that I was promised. On the flipside, SMT processors promise you 100% but you're getting 125%. That's a bargain, not theft. AMD could have marketed 4-module processors as having 34% better SMT performance than Intel's 4-core processors but, no, they didn't do the smart and honest thing. Sad.
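
The percentages in this post follow from the quoted scaling figures. A minimal sketch of that arithmetic, assuming an ideal 8-core would scale exactly 8x and taking the roughly 5x SMT quad-core figure from the quote (the function names are illustrative, not from any benchmark tool):

```python
def shortfall_vs_ideal(speedup, cores):
    """Fraction by which a measured speedup falls short of perfect scaling."""
    return 1 - speedup / cores


def relative_gain(speedup_a, speedup_b):
    """Fractional advantage of speedup_a over speedup_b."""
    return speedup_a / speedup_b - 1


bulldozer = 6.7  # ~6.7x multithreaded scaling, from the quoted post
smt_quad = 5.0   # ~5x for a quad-core with SMT, from the quoted post

print(f"Shortfall vs. ideal 8-core: {shortfall_vs_ideal(bulldozer, 8):.2%}")
print(f"SMT quad vs. 4 plain cores: {relative_gain(smt_quad, 4):.0%}")
print(f"4 modules vs. SMT quad:     {relative_gain(bulldozer, smt_quad):.0%}")
```

This reproduces the roughly 16%, 25% and 34% figures used in the post; it says nothing about which way of counting is the right one.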
#137
Darmok N Jalad
Personally, I think this is going to be hard to prove. AMD can argue that each integer core was fully independent of the other, despite being part of the same module. Each had its own integer scheduler, register file and 16KB L1 data cache. Yes, they shared an FPU core, but that FPU was capable of handling 2 threads, and both integer cores could access both threads. It was certainly a unique design, but I think the only thing they could prove is that it was a bad design, but we don’t exactly need the court of law to prove that one.

Heck, Atom was launched as an in-order execution CPU, something we hadn’t seen from Intel since before Pentium Pro. For best performance on a Silvermont, you needed to target an x86 architecture from before 1995.
#138
cdawall
where the hell are my stars
Aquinus: You are a wise man. :respect:
Like you said, the discussion became a circular argument. That is not worth the time of day. The documentation was provided and approved by IEEE in 2012. The processor can complete 8 simultaneous integer core problems per clock cycle and the design existed to try and reduce the footprint of a core.
FordGT90Concept: Pretty sure you can't. Bulldozer and sons' power control scope is limited to modules. A module can soft-shutdown an idle integer core to conserve power but that's not something software has any control over. An independent core can be completely powered off.
I still have an FX9370 and CHV. You absolutely can power off 1 core in a module. I could boot the chip with 4 modules and 4 cores right now.
FordGT90Concept: It seems ghazi brushed on the point here:

16% slower than an independent 8-core because of shared components. If I had a Bulldozer, I'd want my 16% back that I was promised.
This is a review for the 9700K and 9900K. I will use Cinebench as a basis for this. Now mind you, this is the absolute latest Intel product.

www.techspot.com/review/1730-intel-core-i9-9900k-core-i7-9700k/

The 9700K is an 8 core 8 thread CPU. It scores 214 points for the single threaded CB test, it scores 1513 points for the multithreaded test. That is a 7.07x speed up. In 2012 AMD was able to pull off a 6.7x speed up in that same benchmark and you are going to sit there and tell me it only had 4 cores?
#139
FordGT90Concept
"I go fast!1!11!1!"
cdawall: I still have an FX9370 and CHV. You absolutely can power off 1 core in a module. I could boot the chip with 4 modules and 4 cores right now.
Fantastic! Still waiting on numbers on this thread:
www.techpowerup.com/forums/threads/amd-dragged-to-court-over-core-count-on-bulldozer.217327/page-21#post-3535907

Modules use less power fully powered than two independent cores fully powered; however, one independent core will use less power than a semi-powered down module.
cdawall: The 9700K is an 8 core 8 thread CPU. It scores 214 points for the single threaded CB test, it scores 1513 points for the multithreaded test. That is a 7.07x speed up. In 2012 AMD was able to pull off a 6.7x speed up in that same benchmark and you are going to sit there and tell me it only had 4 cores?
It's ironic you mention the 9700K...a deliberately nerfed processor compared to one that isn't (other than having an unconventional design). 9900K is 9.48x using 8 independent cores versus allegedly "6.7x" using 8 shared cores. That's on the order of 41% improvement instead of 16% loss. There's a reason why AMD dropped modules like it's hot and went SMT too.
#140
ghazi
FordGT90Concept: It's ironic you mention the 9700K...a deliberately nerfed processor compared to one that isn't (other than having an unconventional design). 9900K is 9.48x using 8 independent cores versus allegedly "6.7x" using 8 shared cores. That's on the order of 41% improvement instead of 16% loss. There's a reason why AMD dropped modules like it's hot and went SMT too.
To the contrary, the 9900K gets a ~19% improvement from its 8 virtual threads. If the FX were a 4-core, 8-thread CPU, its "virtual" (hardware) threads would give it a 68% improvement. That's more in-line with the performance uplift from fully independent cores than that of SMT. Let's also remember that fully independent cores don't scale totally perfectly either.
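
Both figures come from dividing the multithreaded speedup by the number of cores (or modules) and reading the excess over 1 as the contribution of the second thread per unit. A short sketch with the speedups quoted in this thread, under the simplifying assumption that the base units would otherwise scale perfectly (the function name is made up for illustration):

```python
def extra_thread_uplift(speedup, base_units):
    """Speedup beyond perfect scaling on base_units, as a fraction."""
    return speedup / base_units - 1


# Speedups quoted in this thread (Cinebench multi/single ratios).
i9_9900k = extra_thread_uplift(9.48, 8)   # 8 cores, 16 threads
fx_as_4c8t = extra_thread_uplift(6.7, 4)  # FX counted as 4 cores, 8 threads

print(f"9900K second-thread uplift: {i9_9900k:.1%}")
print(f"FX 'second-thread' uplift:  {fx_as_4c8t:.1%}")
```

The results, about 18.5% and 67.5%, match the rounded 19% and 68% in the post.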
#141
cdawall
where the hell are my stars
FordGT90Concept: Fantastic! Still waiting on numbers on this thread:
www.techpowerup.com/forums/threads/amd-dragged-to-court-over-core-count-on-bulldozer.217327/page-21#post-3535907

More like 2 modules and 4 integer cores. The modules are going to use more power in a semi-powered down state than independent cores in a full power down state simply because a lot more transistors aren't being used.
I am curious how it will do. If I have time this weekend I will see if I can get it up and running again. I actually have been wanting to turn it into an XP box for some older games for a while. Have a pair of 7950s that are going to get stuffed into it.
FordGT90Concept: I'd rather trust my own program in the link above than Cinebench. I know my program is extremely asynchronous, relies on ALU performance over FPU, and performance patterns fall exactly in line with expectations.

It's ironic you mention the 9700K...a deliberately nerfed processor compared to one that isn't (other than having an unconventional design). 9900K is 9.48x using 8 independent cores versus allegedly "6.7x" using 8 shared cores. That's on the order of 41% improvement instead of 16% loss. There's a reason why AMD dropped modules like it's hot and went SMT too.
I specifically picked the 9700K, because I thought we were comparing apples to apples. That is an 8 core 8 thread CPU compared to an 8 core 8 thread CPU. If the argument is that they aren't cores, then that is absolutely A-OK, we can compare a 7700K for the 4 core 8 thread scaling vs 4 module 8 threads.

techreport.com/review/31179/intel-core-i7-7700k-kaby-lake-cpu-reviewed/13

7700K pulls off 197 single threaded and 998 multithreaded for a 5.06x speed up. Again those 8 "shared" cores as you called them did 6.7x. I would say if we were to purely compare HT vs CMT, this particular application is showing substantial gains to CMT, almost at the same level as traditional cores.

This trend actually got better with more cores added. The quad core dual module FX based stuff did not do as well per core, still heftily beat the Intel HT offerings, but was not nearly as good as cores.

So my personal A10-7800 ran 91 single and 308 multi, 3.38x speed up (4/4)
The G4560 just a couple notches up ran 142 single and 352 multi, 2.47x speed up (2/4)
Another random i3 6100, 193 single and 491 multi, 2.54x speed up (2/4)
and the 4/4 4690K, 171 single and 646 multi, 3.77x speed up (4/4)
another 6600K, 193 single and 729 multi, 3.77x speed up (4/4)

these are just yanked off of the CB thread

www.techpowerup.com/forums/threads/post-your-cinebench-score.213237/
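
The speedups listed in this post are simply the multithreaded score divided by the single-threaded score. A small sketch that recomputes them from the scores given above; the `scores` table and the `speedup` helper are illustrative names, and the values are rounded rather than truncated, so a few differ from the post in the last digit:

```python
def speedup(single, multi):
    """Multithreaded speedup: multi-thread score over single-thread score."""
    return multi / single


# (chip, cores/threads, single score, multi score), as quoted in the post.
scores = [
    ("A10-7800", "4/4", 91, 308),
    ("G4560", "2/4", 142, 352),
    ("i3 6100", "2/4", 193, 491),
    ("4690K", "4/4", 171, 646),
    ("6600K", "4/4", 193, 729),
    ("7700K", "4/8", 197, 998),
]

for chip, layout, single, multi in scores:
    print(f"{chip:<9} {layout:<4} {single:>4} {multi:>5} {speedup(single, multi):.2f}x")
```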
#142
FordGT90Concept
"I go fast!1!11!1!"
ghazi: To the contrary, the 9900K gets a ~19% improvement from its 8 virtual threads. If the FX were a 4-core, 8-thread CPU, its "virtual" (hardware) threads would give it a 68% improvement. That's more in line with the performance uplift from fully independent cores than that of SMT. Let's also remember that fully independent cores don't scale totally perfectly either.
Hyperthreading and Zen: 19% improvement from juggling two threads per core across 8 cores. When one thread hits a blocking state, it switches context to the other thread to maximize the usage of hardware resources.

Bulldozer, Piledriver, and Steamroller: 67% improvement from integrating two integer cores. They should only theoretically block when either is faced with a major FPU instruction; however, there's no switching available to keep the integer cores fully tasked. It gets more performance improvement because there's more transistors behind it but it cannot exceed 100% because it lacks integer core SMT.
cdawall: I specifically picked the 9700K, because I thought we were comparing apples to apples. That is an 8 core 8 thread CPU compared to an 8 core 8 thread CPU. If the argument is that they aren't cores, then that is absolutely A-OK, we can compare a 7700K for the 4 core 8 thread scaling vs 4 module 8 threads.

techreport.com/review/31179/intel-core-i7-7700k-kaby-lake-cpu-reviewed/13

7700K pulls off 197 single threaded and 998 multithreaded for a 5.06x speed up. Again those 8 "shared" cores as you called them did 6.7x I would say if we were to purely compare HT vs CMT this particular application is showing substantial gains to CMT, almost at the same level as traditional cores.

This trend actually got better with more cores added. The quad core dual module FX based stuff did not do as well per core, still heftily beat the intel HT offerings, but was not nearly as good as cores.

So my personal A10-7800 ran 91 single and 308 multi 3.38x speed up (4/4)
The G4560 just a couple notches up ran 142 single and 352 multi 2.47x speed up (2/4)
Another random i3 6100 193 single and 491 multi 2.54 speed up (2/4)
and the 4/4 4690K 171 single and 646 multi 3.77 speed up (4/4)
another 6600K 193 single and 729 multi 3.77 speed up (4/4)

these are just yanked off of the CB thread

www.techpowerup.com/forums/threads/post-your-cinebench-score.213237/
My problem with all of this is I'm not sure how Cinebench even works. Is it ALU heavy, FPU heavy, or a mixture of both? Is it synchronous multithreading or asynchronous? From what I gather, it's a rendering benchmark which is FPU heavy. Assuming that, it's good to see that Bulldozer's FPU can manage 83.75% but from your own numbers, you can clearly see that there's a significant difference between where Bulldozer performs compared to independent cores (e.g. 9700K at 88.375%), especially when considering that Bulldozer is getting 100% of possible threads, architecturally, compared to 9700K's 50% of possible threads architecturally. Your figure of 7700K demonstrates that: 126.5% performance out of four independent cores, versus 83.75% out of eight integer cores or 167.5% out of four modules.

That's what I don't get: AMD could have owned the module argument. 167.5% per module is more attractive than 83.75% per "core." They stabbed themselves in the back by calling them "cores" because it just doesn't stand up to the 120%+ that Hyper-Threading can do. This is looking at it from the perspective of a customer comparing an "8-core" Intel/Zen to an "8-core" Bulldozer/Piledriver/Steamroller.
#143
NC37
Easy case to win. Just look at benchmarks. It beat i7s back then in multithreading. Pretty clear when you got into the heavy workloads that it wasn't a quad core. Physical cores always are better than SMT. Sure it sucked big time in single thread and everything else, but it definitely was an *8 core.
#144
cdawall
where the hell are my stars
FordGT90Concept: My problem with all of this is I'm not sure how Cinebench even works. Is it ALU heavy, FPU heavy, or a mixture of both? Is it synchronous multithreading or asynchronous? From what I gather, it's a rendering benchmark which is FPU heavy. Assuming that, it's good to see that Bulldozer's FPU can manage 83.75% but from your own numbers, you can clearly see that there's a significant difference between where Bulldozer performs compared to independent cores (e.g. 9700K at 88.375%), especially when considering that Bulldozer is getting 100% of possible threads, architecturally, compared to 9700K's 50% of possible threads architecturally. Your figure of 7700K demonstrates that: 126.5% performance out of four independent cores, versus 83.75% out of eight integer cores or 167.5% out of four modules.

That's what I don't get: AMD could have owned the module argument. 167.5% per module is more attractive than 83.75% per "core." They stabbed themselves in the back by calling them "cores" because it just doesn't stand up to the 120%+ that Hyper-Threading can do. This is looking at it from the perspective of a customer comparing an "8-core" Intel/Zen to an "8-core" Bulldozer/Piledriver/Steamroller.
So you are calling a 5% difference between the 83% BD core for core and 88% Coffee Lake significantly more important, why?

In the time frame from 2012 to 2019 Intel was able to offer 5% better multithreading efficiency comparing core for core in what is considered a heavy workload. You are correct, I don't know if it is ALU or FPU heavy, but it performs very well for efficiency on both sides of the map.

Either way you cut this up, either in 2012 they had nearly equaled Intel's 2019 multithreading ability, or in 2012 CMT so vastly outperformed both AMD's replacement SMT and Intel's HT it isn't even funny. Either way you chalk that up, you are saying the chip performed admirably in this specific scenario. Now mind you, I do get what you are saying with the 7700K holding a 126% per core efficiency, but its per thread would be worse than Bulldozer. That would be what that speed up shows. You can mix those numbers however you want, but the root of it doesn't change. Intel's own 7700K when compared to a 9700K showed the same thing: 126% vs 88% when compared the same way. So why is it OK for Intel's efficiency, but not OK for AMD? Again, you are comparing a 2012 product to 2018/2019 right now as well.
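
The per-core and per-thread comparison argued over here can be laid out side by side. This is only a sketch using the speedups quoted earlier in the thread; the `chips` table and the `efficiency` helper are illustrative, and the FX is counted as 8 cores and 8 threads, as AMD marketed it, which is the very point in dispute:

```python
def efficiency(speedup, units):
    """Multithreaded speedup divided by the number of units it is spread over."""
    return speedup / units


# chip: (speedup, cores, threads); the FX row uses AMD's own core count.
chips = {
    "FX 8-core": (6.7, 8, 8),
    "9700K": (7.07, 8, 8),
    "7700K": (5.06, 4, 8),
}

for name, (speedup, cores, threads) in chips.items():
    per_core = efficiency(speedup, cores)
    per_thread = efficiency(speedup, threads)
    print(f"{name:<10} per core {per_core:.1%}   per thread {per_thread:.1%}")
```

Counted per thread, the 7700K lands at about 63%, below the FX's 83.75%; counted per core it sits at 126.5%, which is why the two sides read the same numbers differently.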
#145
FordGT90Concept
"I go fast!1!11!1!"
NC37: Easy case to win. Just look at benchmarks. It beat i7s back then in multithreading.
Because they were going against 4 cores. AMD is still doing the same thing today but with 8 cores instead of 4 modules.
NC37: Pretty clear when you got into the heavy workloads that it wasn't a quad core.
But it's also clear it isn't an 8 core either. AMD made a mistake not marketing them as 4 modules or 8 integer cores and they're liable to pay for it now.
NC37: Physical cores always are better than SMT.
No doubt but SMT increases efficiency of physical cores. That's why almost all modern CISC architectures do it.
cdawall: So you are calling a 5% difference between the 83% BD core for core and 88% Coffee Lake significantly more important, why?
Because you're under tasking the Coffee Lake architecture. 100% load both, you're looking at 125% versus 83%. That's the reason why Bulldozer/Piledriver/Steamroller didn't take mainframe marketshare by storm but Zen is.
cdawall: Either way you cut this up, either in 2012 they had nearly equaled Intel's 2019 multithreading ability, or in 2012 CMT so vastly outperformed both AMD's replacement SMT and Intel's HT it isn't even funny. Either way you chalk that up, you are saying the chip performed admirably in this specific scenario. Now mind you, I do get what you are saying with the 7700K holding a 126% per core efficiency, but its per thread would be worse than Bulldozer. That would be what that speed up shows. You can mix those numbers however you want, but the root of it doesn't change. Intel's own 7700K when compared to a 9700K showed the same thing: 126% vs 88% when compared the same way. So why is it OK for Intel's efficiency, but not OK for AMD? Again, you are comparing a 2012 product to 2018/2019 right now as well.
I'm not saying and never did say that CMT was a bad architecture. AMD just went about describing it poorly to the public. Think the Seagate lawsuit about the definition of "GB." That's what this is fundamentally about but it's "core" instead.


Edit: Circling back to Cinebench, the fact 9700K is 12% loss in scaling, I'd say the multithreading code is either synchronous or has a lot of blocking scenarios. Async code with little cross talk between threads should get damn close to 100%. The fact SMT in the same test gives what is effectively a 37% uplift in performance proves it is not a good multithreading benchmark.
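
One way to put a number on that scaling loss is Amdahl's law, which the post does not invoke by name. The sketch below is an editorial illustration that assumes the whole shortfall comes from serial (blocking) code and ignores other causes, such as lower clocks when all cores are loaded; the function names are made up for the example:

```python
def amdahl_speedup(parallel_fraction, n):
    """Amdahl's law: speedup on n cores for a given parallel fraction."""
    return 1 / ((1 - parallel_fraction) + parallel_fraction / n)


def parallel_fraction_from_speedup(speedup, n):
    """Invert Amdahl's law: 1/S = 1 - p * (1 - 1/n), solved for p."""
    return (1 - 1 / speedup) / (1 - 1 / n)


# 9700K figure quoted in this thread: 7.07x on 8 cores.
p = parallel_fraction_from_speedup(7.07, 8)
print(f"Implied parallel fraction: {p:.2%}")
print(f"Implied serial fraction:   {1 - p:.2%}")
print(f"Check, speedup on 8 cores: {amdahl_speedup(p, 8):.2f}x")
```

Under those assumptions, a serial share of only about 2% is enough to pull an 8-core from 8x down to roughly 7x, which is consistent with the post's guess that the benchmark has some blocking in it.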
#146
Athlonite
I can remember years ago, when first getting into PCs, some motherboards for the 286 CPU had 2 sockets on them: 1 for the x86 integer CPU and 1 for the x87 FPU side of things. Did it work without the x87 FPU? Yes. Was it slower without it? Yes, but only in FPU intensive tasks ... So as far as I'm concerned any CPU that has 2 x86 compute units is a dual core CPU

1x Module = 2x Integer CPU cores + 1 FPU core
4x Modules = 8 Integer CPU cores + 4 FPU cores

so technically an 8 core CPU if all you want to do is x86 integer operations
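
The counting in this post can be written down as a tiny model. This is only a sketch of the poster's arithmetic, with a hypothetical `Module` type whose defaults (2 integer cores, 1 FPU) are taken from the lines above:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    """One module as described in the post: 2 integer cores sharing 1 FPU."""
    integer_cores: int = 2
    fpus: int = 1


def totals(modules):
    """Return (integer cores, FPUs) summed over a list of modules."""
    integer_cores = sum(m.integer_cores for m in modules)
    fpus = sum(m.fpus for m in modules)
    return integer_cores, fpus


four_module_chip = [Module() for _ in range(4)]
print(totals(four_module_chip))  # (8, 4)
```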
#147
Patriot
Midland Dog: 8 alu from memory, for integer ops its an 8 core for floating point its a 4 core, simples, i personally would have called it an 8 thread cpu, not an 8 core. at the same time its up to the customer to do some research, as a quad core its decent perf but for an 8 core its kind of pathetic
Negative, for floating point it is an 8 core... it had a double wide floating point unit (AVX) that could operate 2x 128bit fmac at a time or 1 double wide. The problem was the design just didn't work as designed and the shared scheduler hamstrung it. With scheduler changes in Windows it greatly improved and did fine on many multithreaded applications. Just because it had a poor design and architectural bottleneck doesn't mean it isn't an 8 core.

" C-Ray, a simple raytracer designed to test the floating-point CPU performance "



i7 990x 6c/12t got 6x improvement.
fx8150 8c/8t also got 6x improvement. No one is arguing it isn't a failed architecture, but the lawsuit is meritless... there are in fact 8 cores, both int and fp.

Bulldozer lost significant IPC from Magny-Cours or Thuban. On the server side, replacing 12c Magny-Cours with 16c Bulldozer yielded the same performance at the same clock.
Bulldozer was on a newer node and used less power and scaled to higher clocks. I kept the cinebench crown with 48 Magnycours cores till bricktown (Intel 4p ivy) came out. (60c/120t), 3.8ghz 48c magny beat off 64c 4.2ghz interlagos (bulldozer take 2)... then it was gobstomped by bricktown lol.

That said FP was not half but more like 75% efficient, it was painfully bottlenecked. you can be mad that it was a shit architecture, but you cannot claim the cores weren't there... they clearly were.
Don't mean to be rude, it is just a greatly misunderstood architecture, it went backwards from magny...and then made 5-10% gains per refresh as intel was making 20% ipc uplifts.
#148
londiste
seronx: The public and industry understands a "core" as these components;
1. A Control Bus/Control Logic.
2. An Instruction Bus.
3. An Address/Data Bus which is usually connected to a Load/Store Unit.
4. A datapath, this is the ALU/AGU.

The Bulldozer module has;
2 Retire Queues -> Instruction Bus.
2 Schedulers(etc componentry) -> Control Logic.
2 clusters of 2 ALU/2 AGLUs -> Superscalar datapath
2 Address/Data buses which interconnect to a Load/Store unit. -> Address/Data Bus
Thus,
2 Cores.

The Bulldozer module by industrial and educational definition is two real/processing/physical cores.

Front-end of the module <-- not part of the core.
FPU in the module <-- not part of the core.
Shared L2 cache unit in the module <-- not part of the core.

The cores in the Bulldozer module are as independent as any fully replicated microprocessor.
Public and industry understands core as a unit that can take instructions from a set (in this case x86), execute them and get the compute results out. Front end is definitely part of the core. The gist of it is - you cannot take that integer core out and use it as a functional x86 CPU.

By the way, this lines up with how software treats a core as well as how a core logically should be treated. Blaming Microsoft here is shortsighted, they clearly went by AMD's suggestions in showing Bulldozers as 8 core, a decision which had to be changed later. Linux changed the OS level scheduling far quicker and with fewer arguments.
seronx: Split Bulldozer Module;
2x 2K L2 BTB
2x 256K L1 BTB
2x Branch Predictor
2x 32 KB L1i
2x 16B Fetcher/Prefetcher
2x IBB/Pick
2x 2-wide decode
2x 2 ALU/2AGU
2x 1 FMAC+1 FMMX(FMISC/FSTORE)
2x LSU
2x 16 KB L1d
2x 1 MB L2

You can easily split a Bulldozer module and get two functional cores. However, those two cores would utilize more space than the two-core module. Which in turn would provide less performance than the Bulldozer module.
No you can't split a Bulldozer module into two functional cores. In a Bulldozer module there is one Branch Predictor, one Fetcher, one Decode, One L2 cache etc.
#149
qubit
Overclocked quantum bit
eidairaman1: They already learned, hence Zen
I know, I was just making the point if they ever have thoughts of doing it again in the future.
#150
Flyordie
danbert2000: The problem is that AMD made a design that was just hyperthreading with an extra integer unit thrown in. So technically, it is more capable than just having two execution states for one execution engine (containing an FPU and IU), since two threads can technically run at the same time, but if both threads need the FPU, then it is basically back to a multithreaded single core.

I think they may have a case in that early processors didn't even have an FPU, and were still processors. So it is technically an eight core integer CPU, and a four core floating point CPU. So you technically have eight cores that could run at once. It will depend on how much a modern processor design matters, as AMD didn't include any caveats in their marketing for the FX series to indicate that it wasn't always going to run 8 threads simultaneously. And in my personal opinion it was not correct to call them 8 core processors. They should have just called them 4 module processors, or made clear it was 8 integer cores and would perform at half speed for FP math. I wouldn't be surprised if AMD loses. I don't consider them 8 core processors since they aren't 8 core all the time.

Maybe the most damning thing is that AMD now sells 8 core processors that actually do have the FPU and IU per core, and 4 core processors with SMT that are 4 core/8 thread. In a way, that's admitting that the FX core terminology was a load of shit the whole time.
The code we ran was optimized with AMD's help. We were able to get the FPUs to operate as 2 in parallel. The problem is, most programmers don't wanna do the extra few steps or just use an Intel based compiler which views the FX series as a 4-core CPU and doesn't take advantage of any of its extra resources. The FPU in each Bulldozer module is technically 2x 64bit units but merged into using just 1 scheduler. You had to know how to code for Bulldozer to get the scheduler to run things that could utilize both.