Wednesday, January 23rd 2019

Bulldozer Core-Count Debate Comes Back to Haunt AMD

AMD in 2011 launched the FX-8150, the "world's first 8-core desktop processor," or so it says on the literal tin. AMD achieved its core count of 8 with an unconventional CPU core design. Its 8 cores are arranged in four sets of two cores each, called "modules." Each core has its own independent integer unit and L1 data cache, while the two cores in a module share a majority of their components: the front-end, a branch predictor, a 64 KB L1 code cache, a 2 MB L2 cache, and, most importantly, an FPU. There was much debate across tech forums on what constitutes a CPU core.

Multiprocessor-aware operating systems had to be tweaked to properly address a "Bulldozer" processor. Their schedulers would initially treat "Bulldozer" cores as fully independent (as conventional logic would dictate), until AMD noticed multi-threaded application performance bottlenecks. Eventually, Windows and various *nix kernels received updates to their schedulers to treat each module as a core, and each core as an SMT unit (a logical processor). The FX-8350 is a 4-core/8-thread processor in the eyes of Windows 10, for example. These updates improved the processors' performance, but not before consumers started noticing that their operating systems weren't reporting the advertised core count. In 2015, a class-action lawsuit was filed against AMD for false marketing of FX-series processors. The wheels of that lawsuit are finally moving, with a 12-member jury being set up to examine what constitutes a CPU core, and whether an AMD FX-8000 or FX-9000 series processor can qualify as an 8-core chip.
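To make the reporting difference concrete, here is a minimal sketch of how a tool might derive "cores" and "logical processors" from per-CPU sibling lists (Linux exposes lists of this shape in its sysfs topology files). The sample mappings and the function names are hypothetical and only illustrate the two ways of counting; they are not captured from a real FX system.

```python
def parse_cpu_list(text):
    """Parse a Linux-style CPU list such as "0-1" or "0,4" into a frozenset of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        elif part:
            cpus.add(int(part))
    return frozenset(cpus)


def count_cores_and_threads(siblings_by_cpu):
    """Return (cores, logical processors) from a {cpu_id: sibling_list} mapping."""
    # Logical CPUs that report the same sibling set are counted as one core.
    cores = {parse_cpu_list(s) for s in siblings_by_cpu.values()}
    return len(cores), len(siblings_by_cpu)


# Hypothetical data: eight logical CPUs paired up, as a module-as-core scheduler sees them.
module_as_core = {0: "0-1", 1: "0-1", 2: "2-3", 3: "2-3",
                  4: "4-5", 5: "4-5", 6: "6-7", 7: "6-7"}
# Hypothetical data: the same eight logical CPUs, each treated as its own core.
core_as_core = {cpu: str(cpu) for cpu in range(8)}

print(count_cores_and_threads(module_as_core))
print(count_cores_and_threads(core_as_core))
```

The same eight logical processors come out as four cores or eight cores depending purely on how the siblings are grouped, which is the reporting difference consumers noticed.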
US District Judge Haywood Gilliam of the District Court for the Northern District of California rejected AMD's claim that "a significant majority of" consumers understood what constitutes a CPU core, and that they had a fair idea of what they were buying when they bought AMD FX processors. AMD has two main options before it. The company can reach an agreement with the plaintiffs that could cost it millions of dollars in compensation, or fight it out in the jury trial by trying to prove to 12 members of the public (not necessarily from an IT background) what constitutes a CPU core and why "Bulldozer" qualifies as 8-core silicon.

The plaintiffs and the defendant each have a key technical argument. The plaintiffs could point out that operating systems treat 8-core "Bulldozer" parts as 4-core/8-thread (i.e. each module as a core and each "core" as a logical processor), while AMD could run multi-threaded floating-point benchmark tests to prove that a module cannot be simplified to the definition of a core. AMD's 2017 release of the "Zen" architecture sees a return to the conventional definition of a core, with each "Zen" core being as independent as an Intel "Skylake" core. We will keep an eye on this case.
Source: The Register

369 Comments on Bulldozer Core-Count Debate Comes Back to Haunt AMD

#276
Vya Domus
FordGT90Concept said: 486DX is the equivalent of 20 lifetimes old in terms of technology. It's not relevant to processors that debuted in 2011.
But the FPUs, which have been around since before that chip, are relevant? Ain't gonna float (no pun intended). You either ignore all of these decades of computing because it's all in the past, or you don't. You can't pluck things out selectively.
#277
FordGT90Concept
"I go fast!1!11!1!"
lexluthermiester said: Absolutely it is, as everything currently in use today owes its heritage to that generation of CPUs, just like all modern ARM-based RISC CPUs owe their existence to the early Acorn CPUs. Just because CPU designs have improved and evolved does not make the older iterations irrelevant.

However, FPUs are not required. An Integer Unit can do floating point the long way, which is the way floating point was done before FPUs were designed. A CPU is still a CPU with or without an FPU. Likewise, a CPU core is still an individual core whether it has its own FPU or shares one with another core.
486DX didn't do branch prediction like modern processors do. It is also missing a lot of instructions and completely devoid of multithreading capability. Additionally, 8087 was sold separately because of design limitations of technology at the time (power, heat, transistor density, etc.). They were merged into a unified architecture as soon as it was technically viable to do so. In other words, all of these references to processors that debuted in the 1980s are pathetic excuses to this debate.

Also, 486DX is the first processor that supported both x86 and x87 instructions. It's processors before that which had x87 coprocessors.
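As an illustration of the quoted point that an integer unit can do floating point "the long way," here is a minimal sketch of a double-precision multiply that uses only integer operations on the IEEE 754 bit pattern. It is a simplified assumption-laden example, not how any particular CPU or runtime library does it: it truncates instead of rounding and only handles normal, finite, non-zero inputs whose product stays in range. The function names are hypothetical.

```python
import struct


def float_to_bits(x):
    """Reinterpret a Python float as its 64-bit IEEE 754 pattern."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def bits_to_float(bits):
    """Reinterpret a 64-bit pattern as a Python float."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def soft_mul(a, b):
    """Multiply two normal doubles using integer shifts, adds and multiplies only."""
    bits_a, bits_b = float_to_bits(a), float_to_bits(b)
    sign = (bits_a >> 63) ^ (bits_b >> 63)
    exp_a = (bits_a >> 52) & 0x7FF
    exp_b = (bits_b >> 52) & 0x7FF
    # Restore the implicit leading 1 of each 53-bit significand.
    man_a = (bits_a & ((1 << 52) - 1)) | (1 << 52)
    man_b = (bits_b & ((1 << 52) - 1)) | (1 << 52)

    product = man_a * man_b          # between 2**104 and 2**106
    exponent = exp_a + exp_b - 1023  # remove one copy of the exponent bias
    if product >> 105:               # significand product landed in [2, 4)
        product >>= 53
        exponent += 1
    else:                            # significand product landed in [1, 2)
        product >>= 52
    fraction = product & ((1 << 52) - 1)  # drop the implicit leading 1 again
    return bits_to_float((sign << 63) | (exponent << 52) | fraction)


print(soft_mul(1.5, 2.25), soft_mul(-3.0, 0.5), soft_mul(3.0, 3.0))
```

Even this stripped-down version needs a dozen integer steps for one multiply, which is why dedicated floating-point hardware was worth the transistors.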
lexluthermiester said: By that logic, the Core2Quads and any other CPU that has two or more dies bridged together, and shares resources, do not qualify as a single CPU. They are dual CPU packages. So should we all sue Intel and AMD for that deception?
Core 2 Duo only shared L2 cache between cores:

Core 2 Quad had two of those packages on the same PCB. All four are independent, complete processors. The only cache that is necessary to the operation of a processor is L1.

Also, by AMD's definition, Core 2 Duo module would be a quad core because there's 4 ALUs. :p
lexluthermiester said: AMD never said that. They called it an 8 core CPU, which by technical definition, it is.
Look at any other processor on the market and there's an equal number of fetchers to cores. Bulldozer is an exception, not the rule.


Oh look, an Athlon 64 X2 slide from AMD!

See what they did there? Called the whole independent processor package an "execution core." AMD didn't draw a little box in the box saying this little integer bit here is the "core." :roll:

There's an enormous amount of hypocrisy by AMD on display here. In fact, there's really no references to the arithmetic units of a processor being called a "core" outside of Bulldozer's design.

Even Intel calls the whole independent processor a core. Here's a slide from Haswell:


As you can see, the industry has a very clear understanding of what a core is. AMD twisted that understanding to give the appearance of an edge against competing products. That's "false advertising." How is it okay for AMD to redefine the word in a way that is misleading to consumers?


For the record, one can make a CPU that has no addressable ALUs, only FPUs. There's really nothing an FPU can't do that an ALU can do. It is just slower and requires more transistors.
Degenerate said: Intel i486SX
Funny story there: disabling the FPU in the i486SX was similar to disabling a core in, for example, Zen. The FPU by far used the most die space on the 486, so instead of pitching chips that had a defect in the FPU, they disconnected it and sold the chip as an i486SX. The x87 instruction set and the IEEE 754 standard were still in their infancy at the time, so not much software used them.


The newest architecture that comes to my mind is IA-64. It was created from scratch long after IEEE 754. Are ALUs and FPUs intrinsic to its cores? Why yes, of course:


Mmm, MIPS is kind of an oddball mostly designed for network routing. Does it have an FPU? R16000 does:


Doesn't matter what architecture you look at, cores are clearly defined and they are not just the arithmetic calculators like AMD claims*.
* Only when on the subject of Bulldozer, Steamroller, and Excavator.

I'm not saying a processor needs an FPU because that's completely dependent on the scenario in which it will be used. I'm saying that AMD had one definition of the word "core," changed it for Bulldozer through Excavator, and then went back to their original definition for Zen. That bit in the middle deserves a slap on the wrist.
#278
Shambles1980
The issue is still what the perception of a core was at the time.
We really started with the Pentium D, Athlon 64 X2, Core 2, i5 and so on.
AMD significantly changed the formula but did not represent the differences in the advertising.

People like us who have an interest in this sort of stuff made our own minds up about the processors. I decided they were not 8 cores. Some others agreed with AMD.
But the issue is that there genuinely is a difference. Even in all the "evidence" provided by the "they are 8 cores" people in this thread, AMD admits they are not 8 traditional cores.
For the most part AMD does not even refer to them as cores.
By AMD's own definition of a core from the Pentium D era, they are not cores.
But AMD advertised it as 8 cores to the masses, who would have thought it must be the same as a Core 2, a Phenom, or another relevant multi-core processor of the time, when it simply wasn't and still isn't.
There's a reason why Phenoms and Core 2 Quads outperformed or performed as well as Bulldozer at the time, and there's a reason why i5s still outperform them to this day.

I'm not going to say that it's only down to the layout of the die, because that simply isn't true. There were corner-cutting and cost-saving methods taken during manufacturing, which did lead to potential performance losses that could amount to low double-digit percentage drops compared to having designed some parts manually rather than via software.

But regardless of that, the issue that the lawsuit is about remains.

People expected cores to mean the same thing as they did with Phenoms and other similar CPUs of the time. Sure, most of them didn't know what that meant, and probably 80% of them don't even know how it is different.
But they are different, and AMD did not adequately advertise it as such.

You can say what you want.
But when people lose a lawsuit because they advertise 1 billion bytes as a GB instead of 1.07 billion, then "slight" differences do matter.
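For reference, the "1.07 billion" figure is the binary gigabyte (2 to the power 30 bytes) compared with the decimal one (10 to the power 9 bytes). A quick sketch of the arithmetic, with variable names chosen only for illustration:

```python
GB_DECIMAL = 10**9   # bytes in a gigabyte as marketed on drive packaging
GB_BINARY = 2**30    # bytes in a gigabyte as many operating systems count it

ratio = GB_BINARY / GB_DECIMAL
print(f"{GB_BINARY:,} bytes vs {GB_DECIMAL:,} bytes (ratio {ratio:.4f})")
```

The gap is a little over 7%, which is the kind of "slight" difference being referred to.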

It also does not help that AMD has changed back to traditional cores, virtually admitting that the Bulldozer modules were in fact worse than traditional cores.
#279
Patriot
Hate to call people out... but Ford, man... you really need to stop trying to compare block diagrams of vastly different granularity and assuming they are comparable.
You are making many assumptions about how things work internally that... just aren't so.
You are also cherry picking the fuck out of the facts hoping everyone else won't notice.

You claim you can't compare old FPUs, but you keep comparing a modern AVX2 FPU to a first-gen AVX FPU while repeatedly ignoring the design just one generation back.
Please just stop.

The FPU is the same per int as in Thuban, but organizing a pair of int cores into a module along with a dual FPU and a shared fetcher enables a flexibility that Thuban did not have, and enables AVX across 2 FPU units. As per the scaling performance given (better than Sandy Bridge), it is clear there are 8 FPU units.
The configuration is indeed different, to enable AVX gen 1 support.

At the time, shared resources were the only way to enable 8 cores on 32nm...
The FPU of Zen is vastly superior because there is die room to enable it... just like moving to 7nm enables another doubling of cores...

You keep trying to look at 1 detail and declaring AMD was out to get people when it was a solid solution at the time.
1 module, 2 cores. 4 modules, 8 cores. It had a unique structure, there was no SMT involved, and it outperformed the Intel solution on multithreading... but due to the deeeep pipeline the IPC was decreased and it required higher clocks to be competitive.
#280
FordGT90Concept
"I go fast!1!11!1!"
Patriot said: At the time, the shared resources was the only way to enable 8 cores on 32nm...
You do realize that 8087 was made a coprocessor because that was the only way they could accelerate floating point operations on the 3 μm process, right? Even then, yields were terrible which is why most computers didn't have them. For reference, 8087 had 45,000 transistors compared to 8086's 29,000.
Processor|Architecture|Structure|Transistors (billions)
Ryzen 1800X|Zen|8c/16t|4.8
FX-9590|Vishera|4m/8t|1.2
Phenom II X6 1100T BE|Thuban|6c/6t|0.9

...an 8 core Thuban would have ended up having about the same number of transistors. There's a lot of thread management overhead in doing what AMD did with Excavator that simply did not exist in Thuban.
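The estimate above can be reproduced with naive linear scaling of the table's Thuban figure. This is only a back-of-the-envelope sketch: it assumes transistor count scales purely with core count and ignores uncore, cache, and process differences, and the variable names are illustrative.

```python
# Figures from the table above, in billions of transistors.
thuban_transistors_bn = 0.9
thuban_cores = 6
vishera_transistors_bn = 1.2

# Naive assumption: transistors scale linearly with the number of cores.
eight_core_thuban_bn = thuban_transistors_bn / thuban_cores * 8

print(f"Hypothetical 8-core Thuban: {eight_core_thuban_bn:.2f} billion")
print(f"FX-9590 (Vishera):          {vishera_transistors_bn:.2f} billion")
```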
Patriot said: there was no smt involved
Floating points were calculated using SMT.
Patriot said: it outperformed the intel solution on multithreading
Only when you compare 1 AMD "module" with two threads to 1 Intel "core" with two threads. If you compare 1 Intel "core" to 1 AMD "integer core," Intel's solution is higher performing.
Patriot said: ... but due to the deeeep pipeline the ipc was decreased and required higher clocks to be competitive.
AMD side-graded to make their product more attractive to server and mainframe operators. It was designed to be cost effective in those use cases, not consumer use cases.
#281
Shambles1980
Patriot said: At the time, the shared resources was the only way to enable 8 cores on 32nm...
The FPU of zen is vastly superior because there is die room to enable it... just like moving to 7nm enables another doubling of cores...
You would think they'd still be using it if it's that good.
#282
lexluthermiester
FordGT90Concept said: AMD twisted that understanding to give the appearance of an edge against competing products.
Incorrect. AMD didn't "twist" anything. They tried a new way of building a device that could execute code in an attempt to compete.

All the rest of your very fancy display amounts to flash/bang nitpicking. You did succeed in doing one thing though; you displayed for all to see that you understand that an execution unit does count as a full and complete core in and of its own. It's good you're not an attorney, as you would have effectively tanked your own case with that display and argument.
#283
FordGT90Concept
"I go fast!1!11!1!"
As previously discussed, an Execution Core does Fetch-Decode-Execute (everything required to turn inputs into outputs). Fetch is shared in Excavator. Fetch and Decode are shared in Bulldozer. The execution core is incomplete in Bulldozer unless you consider the execution core an entire module which is what the plaintiff is arguing in favor of.

ArbitraryAffection gave excellent proof of this earlier:


There's only one fetch, one L1 instruction cache, and one decoder. Omit those components from either integer core and you have transistors that just look pretty in a picture. You cannot cleave a bulldozer module in two and have two functional processors. You can do so with virtually every other multi-core architecture out there.

What you see in that picture is a core by textbook definition. It just happens to be able to process two threads simultaneously when circumstances are favorable to doing so.
#284
Vya Domus
FordGT90Concept said: What you see in that picture is a core by textbook definition.
Elaborate please, point us to a couple of books or papers in which CPU cores are described as such.
#285
FordGT90Concept
"I go fast!1!11!1!"
The whole die shot pictured is a self-contained processor which fits the definition of a singular core. Have a source:

searchdatacenter.techtarget.com/definition/multi-core-processor
A core is synonymous with "CPU." Initial dual core processors were two CPUs sharing the same socket on the same bus, not unlike a dual socket, single CPU machine.

1800X can rightfully be called an 8 CPU machine on a single socket. FX-8350 can only be called a 4 CPU machine on a single socket. Because that gets awfully confusing, AMD, Intel, ARM, MIPS, etc. have taken to calling them "cores" instead so they can distinguish multi-socketed solutions from multi-CPUs on one socket solutions.
#286
lexluthermiester
Vya Domus said: Elaborate please, point us to a couple of books or papers in which CPU cores are described as such.
Couldn't have said that better myself...
FordGT90Concept said: The whole die shot pictured is a self-contained processor which fits the definition of a singular core.
Your opinion, not supported by historical fact or any citations.
#287
Vya Domus
FordGT90Concept said: The whole die shot pictured is a self-contained processor which fits the definition of a singular core.
Book or paper, we've had enough of looking at die shots and reading 2 paragraph explanations on the internet. You are vehemently claiming this is a textbook example of a core. I am assuming you have stumbled across this very exact description, and by that I mean cores that have multiple decode entries, multiple load/store, multiple ALUs/FPUs, etc. as is the case with a Bulldozer module. Surely you can pull one example out for us from all the material you read. In everything that I have read, however, I see this:



"Central Processing Unit main components"

No mention of independent fetch/decode stages, no load/store units, no FPUs, nothing that you argue constitutes an "independent processor" aka core. All I see is either a generic "instruction decoder" or "timing and control".

You seem to be extremely fixated on the idea that CPU cores have to be independent processors. Let's see, independent meaning it can operate on its own and fulfill all the functions that it could previously perform while inside its multi-core arrangement, right?


The only thing that I can think of that fits that description is something like this : www.pcper.com/reviews/Processors/Intel-Atom-330-Dual-core-Processor-Review.

In this case you can totally pluck one core/processor out of the assembly and you can use it completely on its own. It's undoubtedly self-contained and self-sufficient.

Not even AMD's upcoming chiplet designs would count as being made out of independent processors, because they rely on external logic, which they share, to operate. Why won't you understand that independent processors do not exist anymore in the context of modern CPUs? They share caches, memory controllers, and interconnects, which in particular are absolutely critical to their functionality. Intel even has a word for it: Uncore, and it usually occupies a considerable portion of the die. It's also the reason why whenever Intel/AMD wants a new chip with fewer cores they have to redesign the whole damn thing instead of just "cleaving it in two".
#288
FordGT90Concept
"I go fast!1!11!1!"
lexluthermiester said: Your opinion, not supported by historical fact or any citations.
Have this one:
accel.cs.vt.edu/files/lecture2.pdf

Breaks multicore processors into homogeneous (copy-pasta) and heterogeneous (CELL is named, but think SoC in general, where there are many components put together to make a flexible whole).

Page 11: Dies, Observations
Core replication obvious.
Page 14: Multicore-present:
Operating systems schedule processes out to the various cores in the same way they always have on traditional multiprocessor systems.
The fetcher is what the operating system sees. This is why Windows 8/10 report FX-8350 as a "4 core," "8 logical processors" on 1 socket.


Have another (sourced from Intel no less):
www.ecs.umass.edu/ece/andras/courses/ECE668/Mylectures/Introduction_to_Multi_Core.pdf
Page 17, diagram and explanation of what it means to transition from one core to two. Area is 2x. Homogeneous cores by definition are copy-pasta and include all parts required to process instructions.

Page 18, Intel highlights the two homogeneous cores on Conroe.

Page 24, processor resources:
-Caches
-General Purpose Registers
-Segment Registers & TLB
-FP registers, XMM registers
-System Flags
-Control and Data registers, Debug registers, MSRs
-Many more

Page 25, explains differences between CMP, SMP, Hyper Threading, and Software Threading. Particularly relevant:
"Chip Multi Processing, refers to multiple physical core engines that have unique resources."
Bulldozer's FP registers and XMM registers are not unique resources to the integer cores, they're unique resources to the module. Bulldozer doesn't fit under SMP because that requires sharing all resources.

Page 26, "Core Architecture (Prescott)" diagram includes everything from instruction TLB to L2 cache. This mirrors AMD's slide showing an Excavator "core" next to a Zen "core."

Page 27, "Core Architecture (Xeon - Dual Core)" diagram which tells the same story as Prescott. The diagram only includes a single core, but on the left-most side of it there is a label depicting "Second core," which means mirror what you see here on the other side of the L2 cache. More confirmation that a "core" is holistic (fetch-decode-execute), not just what AMD calls an "integer core."

Page 28, "Multi-core platform (Freescale: embedded)" diagram which depicts two clear "e500-mc cores" with "accelerators" and "connectivity" attached to it via "CoreNet fabric."

Page 29, "Multi-core platform (RMI-XLR: embedded)" diagram depicting 8 clear cores on a "Memory Distributed Interconnect."

Page 30, "Tilera - 64 core CPU" diagram depicting 64 interconnected processors.

Page 34, "Tiled Design & Mesh Network" depicts Intel's 80-core Polaris showing each "core" as a "compute element" + "router"

Page 39, "Multi-core: Design Challenges" says "replicating cores improves productivity."
Vya Domus said: Book or paper, we had enough of looking at die shots and reading 2 paragraph explanations on the internet. You are vehemently claiming this is a textbook example of core. I am assuming you have stumbled across this very exact description and by that I mean cores that have multiple decode entries, multiple load/store, multiple ALUs/FPUs, etc as is the case with a Bulldozer module. Surely you can pull one example out for us from all the material you read. In everything that I have read however I see this :

First, I'll fix the diagram so it's relevant to Bulldozer:

Then I'll point out that #1 proves my point:
1. The next instruction to be executed, whose address is obtained from the PC, is fetched from the memory and stored in the IR.
I assume IR stands for "Instruction Register." On all Bulldozer processors, this is part of the Fetch block which is shared for both threads. As far as #1 is concerned, you're only looking at one CPU.
#2 continues to drive that point home when considering Bulldozer (not Steamroller/Excavator):
2. The instruction is decoded.
The Decode block is shared in Bulldozer so as far as this is concerned, there's only one CPU.
#3 is another task of Fetch block so rewind to what I said above. Two or three steps here dictate we're only dealing with one CPU. See how my tweaked diagram makes a whole lot of sense now?
Finally, step #4 and #5, we get to the sole components where the cycle deviates but only if the instruction doesn't include a floating point instruction otherwise it is back to shared which means #4 and #5 are part of the singular CPU.

TL;DR: At least 3 steps say Bulldozer is a single CPU and without those steps, those integer clusters know not what to do. Pretty clear case a module is a core.
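For readers following the numbered steps, here is a minimal sketch of the fetch-decode-execute cycle as a toy interpreter. The three-instruction "ISA" and the function name are hypothetical, invented purely for illustration; this models the textbook cycle and does not model Bulldozer or any real processor.

```python
def run(program):
    """Run a list of instruction strings through a fetch-decode-execute loop."""
    registers = {}
    pc = 0  # program counter: index of the next instruction
    while pc < len(program):
        # Step 1, fetch: read the instruction the PC points at into the IR.
        ir = program[pc]
        pc += 1
        # Step 2, decode: split the instruction into an opcode and operands.
        opcode, *operands = ir.split()
        # Remaining steps, execute: perform the operation and store the result.
        if opcode == "LOAD":
            registers[operands[0]] = int(operands[1])
        elif opcode == "ADD":
            dest, left, right = operands
            registers[dest] = registers[left] + registers[right]
        elif opcode == "HALT":
            break
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return registers


print(run(["LOAD a 2", "LOAD b 3", "ADD c a b", "HALT"]))
```

Without the fetch and decode stages at the top of the loop, the execute branch never receives anything to work on, which is the dependency the steps above describe.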
Vya Domus said: All I see is either generic "instruction decoder" or "timing and control".
Bulldozer shares "instruction decoder" (literally, I didn't modify the diagram at all) and "timing and control" via "Core Interface Unit" (Core IF in diagram):

This diagram has been posted at least twice now. I believe it was sourced from Tom's Hardware which is cited in the lawsuit.


And before you retort that Bulldozer can do two threads simultaneously, remember that it hits blocking scenarios more often than dual-core (or more) processors do as mouacyk pointed out:
mouacyk said: This is what baseline core scaling efficiency looks like with the data from openbenchmarking.org/result/1110227-AR-AMDSCAL0184:



Even the 2384 does well, because it's got 4 fully independent cores.
Bulldozer underperforms independent cores/processors in c-ray, compress-7zip, npb BT.A, npb FT.B, npb LU.A, npb UA.A, and clomp when comparing Opteron 2384 to FX-8150. For example, in 7-zip, FX-8150 only did 48% better where Opteron 2384 did 102% better. 7-zip, as far as I can tell, is very ALU and cache intensive.
#289
londiste
lexluthermiester said: Incorrect. AMD didn't "twist" anything. They tried a new way of building a device that could execute code in an attempt to compete.
What exactly was new in the way in how Bulldozer was built?
Vya Domus said: In everything that I have read however I see this :



"Central Processing Unit main components"

No mentioning of independent fetch/decode stages, no load/store units, no FPUs, nothing that you argue constitutes a "independent processor" aka core. All I see is either generic "instruction decoder" or "timing and control".
Thank you for the textbook page: Just underneath the figure, in instruction cycle:
1. Fetch
2. Decode
Both of which are shared in Bulldozer.
These are part of the control unit on Figure 5.1.
Vya Domus said: You seem to be extremely fixated on the idea that CPU cores have to be independent processor. Let's see, independent meaning it can operate on it's own and fulfill all the functionalities that it could previously do while inside it's multi-core arrangement, right ?


The only thing that I can think of that fits that description is something like this : www.pcper.com/reviews/Processors/Intel-Atom-330-Dual-core-Processor-Review.

In this case you can totally pluck one core/processor out of the assembly and you can use it completely on it's own. It's undoubtedly self contained and self sufficient.

Not even AMD's upcoming chiplets designs would count as being made out of independent processors because they rely on external logic, which they share, to operate . Why wont you understand that independent processors do not exist anymore in the context of modern CPUs, they share caches, memory controllers , interconnects which, in particular are absolutely critical to their functionality. Intel even has a word for it : Uncore and it usually occupies a considerable portion of the die. It's also the reason why whenever Intel/AMD wants a new chip with less cores they to have redesign the whole damn thing instead of just "cleaving it in two".
I am not sure why, but you seem to have a skewed understanding of independent here. There is absolutely no need for a core to be a separate chip. Independent core/CPU means it is able to perform its function - execute instructions - independently. No more, no less. Instructions are fetched from somewhere else and results are stored somewhere else - generally either the data bus or cache, depending on how the wider system is built.
#290
FordGT90Concept
"I go fast!1!11!1!"
Thread TL;DR: the Bulldozer module is a different way to multi-thread but it does not represent a multi-core.


...roughly, anyway. Take those numbers times the number of cores and you'll get an approximation of multithreading scaling.
#291
Vya Domus
FordGT90Concept said: First, I'll fix the diagram
Well, don't. You just can't help yourself but change every bit of information just so it can fit with your narrative.

I'll have to accept that you will never be able to use correct information to prove your points and you'll always skew facts.
#292
lexluthermiester
FordGT90Concept said: Have this one:
accel.cs.vt.edu/files/lecture2.pdf

Breaks multicore processors into homogeneous (copy-pasta) and heterogenous (CELL is named but think SoC in general where there's many components put together to make a flexible whole).

Page 11: Dies, Observations
Page 14: Multicore-present:
The fetcher is what the operating system sees. This is why Windows 8/10 report FX-8350 as a "4 core," "8 logical processors" on 1 socket.
Isn't it interesting how you skipped over the comparisons involving the Itaniums and the Terascale information? The Terascale CPU shows very clearly that the integer execution units (80 of them) exist without anything other than an IO connection to a separate die with additional functionality features that operate in addition to the main die. So are those 80 cores not qualified as individual cores? Or is that all just one CPU? Using your argument, that is a single CPU with 80 sub-cores. But that's not what Intel calls it. So should they be sued? That citation does not help your argument, as it demonstrates and illustrates that there are many varying ways to build a functional CPU, including multiple functionally independent cores. We could also explore the other citation, as it also demonstrates a variety of methodologies to build a CPU.
Those citations do not help your position. They actually work against it.
Vya Domus said: Well, don't. You just can't help yourself but change every bit of information just so it can fit with your narrative.
Exactly correct.
Vya Domus said: I'll have to accept that you will never be able to use correct information to prove your points and you'll always skew facts.
And that is clearly being demonstrated.

Ford, you have lost this debate on merit and by providing citations against your position. Let it go. AMD is going to win this case.
#293
FordGT90Concept
"I go fast!1!11!1!"
Vya Domus said: Well, don't. You just can't help yourself but change every bit of information just so it can fit with your narrative.

I'll have to accept that you will never be able to use correct information to prove your points and you'll always skew facts.
How is the image I presented not representative of Bulldozer? I know the registers aren't right but that's because it's deliberately vague in that regard. The rest is spot on.
#294
lexluthermiester
FordGT90Concept said: How is the image I presented not representative of Bulldozer?
The original image was a concept of a basic CPU. Your alteration does not change the context. Just stop.
#295
FordGT90Concept
"I go fast!1!11!1!"
lexluthermiester said: Isn't interesting how you skipped over the comparisons involving the Itaniums and Terascale information. The Terascale CPU shows very clearly that the integer execution units(80 of them) exist without anything other than an IO connection to a separate die with other additional functionality features that operate in addition to the main die. So are those 80 core not individual core? Or is that all just one CPU? Using you argument, that is a single, with 80 different sub-cores. But that's not what Intel calls it. So should they sued? That citation does not help your argument as it demonstrate and illustrates that there many varying ways to build a functional CPU, including multiple functionally independent cores. We could also explorer the other citation as it also demonstrates a variety of methodologies to build a CPU.
Those citations do not help your position. They actually work against it.
The RIB in each tile/core is effectively the fetcher. Tera-Scale has more in common with a GPU than a CPU; nevertheless, it still has discrete cores that perform the same fetch-decode-execute routine (just software scheduled instead of hardware scheduled)...

More info: www.anandtech.com/show/2170/3

I get the strong impression that Tera-Scale is entirely incapable of ALU work: everything exposed is floating point. That said, it was a prototype meant to reach 1 TFLOP of dynamic compute power and that's exactly what it did.
#296
Vya Domus
That was not for you to photoshop whatever you thought a Bulldozer module would look like. It was to prove that the elements which you claim are mandatory for a CPU core to be independent are never even stipulated as discrete, distinguishable components in the overwhelming majority of descriptions out there of what a CPU contains.

Amazingly even the slides that you provided contradict some of your claims about shared resources :



"Functional units" aka execution units, which may contain their own separate logic, as is the case with the FP scheduler in the Bulldozer module. And that was one of your main points on why Bulldozer wasn't an 8 core CPU. Try as you may, it seems you can never get away from these facts.
#297
FordGT90Concept
"I go fast!1!11!1!"
You cited the only example where "functional units" is used in that whole document. It's not expanded on anywhere what it meant.

Edit: Looking at the whole page, I'm pretty sure he was referring to Hyper-threading, so two threads sharing the same ALUs and FPUs. Hyper-threading impacts caches, the TLB, and the BTB. The line directly below it also strongly suggests Hyper-threading (the tradeoff being transistors spent on improving utilization in one core versus adding another core). Fits like a glove but again, just an educated guess.
lexluthermiester: The original image was a concept of a basic CPU. Your alteration does not change the context. Just stop.
Bulldozer is anything but "basic." :roll:
Posted on Reply
#298
Vya Domus
Sure thing, man. Anyway, just keep on selectively picking information out of everything that you are presented with and ignoring the rest. Argumentation on the internet 101.
Posted on Reply
#299
londiste
lexluthermiester: The Terascale CPU shows very clearly that the integer execution units (80 of them) exist without anything other than an IO connection to a separate die with additional functionality features that operate in addition to the main die. So do those 80 cores not qualify as individual cores? Or is that all just one CPU? Using your argument, that is a single CPU with 80 different sub-cores. But that's not what Intel calls it. So should they be sued? That citation does not help your argument, as it demonstrates and illustrates that there are many varying ways to build a functional CPU, including multiple functionally independent cores. We could also explore the other citation, as it also demonstrates a variety of methodologies to build a CPU.
Context is important.
- You are right, Tera-Scale can be looked at both ways: whether these 80 units should be called cores can be argued, as can whether Tera-Scale is even a general purpose CPU. Intel Tera-Scale is a specialized application processor; both the intended application and the architecture are much closer to a GPU than a CPU. It is also a much simpler VLIW architecture with a very simple instruction set and execution units (a couple of FPUs). This has more than a few similarities to AMD's similarly named VLIW GPU architecture TeraScale - HD2000-HD4000 series :)
- A simple CPU may only need instruction and data passed into it and control logic can be minimal or nonexistent, especially the decode part.
- Bulldozer on the other hand is an x86 CPU. This is effectively a RISC processor masquerading as CISC. Fetch and Decode have a large part to play in its operation.
Posted on Reply
#300
FordGT90Concept
"I go fast!1!11!1!"
Tera-Scale is kind of an oxymoron. In one way, it is extremely flexible (lots of fully programmable cores) but in another, it's terribly inflexible (software programmers have to tell the processor how to do almost everything). I think it can be used as a general purpose CPU but it needs to be coupled with a tailor-made operating system that's unlike any operating system on the market today. It's kind of in-between a lot of ideas: part GPU (tiles remind me of CELL SPEs), part CPU (can branch far deeper than GPUs), part ASIC (whatever the architecture can do, it will do well).

Intel mostly designed Tera-Scale to test the idea of high-speed interconnects. It will never find its way into a commercial product most likely because they can't convince anyone to create and maintain the operating system for it.

For the record: Tera-Scale "tiles" are definitely cores just as Bulldozer "modules" are definitely cores. Fetch -> decode -> execute. Both do that, and so does every other core. If you try to call Tera-Scale's FPMACs "cores" like you call Bulldozer's integer cluster "cores," you end up with the same incomplete understanding of what a core must do. FPMACs and integer clusters are glorified calculators--not processors. They can tell you 1+1=2 but they can't tell you 2 is an index into an array of values and whether the referenced value is odd or even and if odd, is it prime? That takes a processor, not a calculator.
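
To make the calculator-versus-processor point concrete, here is a rough toy sketch. Everything in it is made up for illustration (the instruction names, the register count, the `alu` and `run` helpers); it is not how Bulldozer or Tera-Scale actually work internally. The `alu` function can only do arithmetic on values it is handed, while `run` does the fetch -> decode -> execute loop that decides what gets loaded, computed, and branched on.

```python
# Toy illustration only: a hypothetical instruction set, not any real hardware.

def alu(op, a, b):
    """The 'calculator': pure arithmetic, no idea what to do next."""
    if op == "add":
        return a + b
    if op == "sub":
        return a - b
    if op == "mod":
        return a % b
    raise ValueError(f"unknown ALU op: {op}")


def run(program, memory):
    """The 'processor': fetch -> decode -> execute, with loads and branches."""
    regs = [0] * 4
    pc = 0
    while True:
        instr = program[pc]      # fetch
        op, *args = instr        # decode
        pc += 1
        if op == "halt":         # execute
            return regs
        elif op == "li":         # load immediate: regs[d] = value
            d, value = args
            regs[d] = value
        elif op == "load":       # regs[d] = memory[regs[s]]
            d, s = args
            regs[d] = memory[regs[s]]
        elif op == "jz":         # branch to target if regs[s] == 0
            s, target = args
            if regs[s] == 0:
                pc = target
        else:                    # anything else is handed to the calculator
            d, a, b = args
            regs[d] = alu(op, regs[a], regs[b])


# "2 is an index into an array; is the referenced value odd?"
memory = [10, 11, 7, 8]
program = [
    ("li", 0, 2),       # r0 = 2 (the index)
    ("load", 1, 0),     # r1 = memory[r0]
    ("li", 2, 2),       # r2 = 2
    ("mod", 3, 1, 2),   # r3 = r1 % 2
    ("li", 0, 0),       # r0 = 0 (assume even)
    ("jz", 3, 7),       # if remainder is 0, skip the next instruction
    ("li", 0, 1),       # r0 = 1 (value is odd)
    ("halt",),
]
regs = run(program, memory)  # regs[0] is 1 if the referenced value is odd
```

The arithmetic itself is the easy part; the load through an index and the conditional branch only happen because something is fetching and decoding instructions around the calculator.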
Posted on Reply