
Bulldozer Core-Count Debate Comes Back to Haunt AMD

Degenerate

New Member
Joined
Jan 27, 2019
Messages
1 (0.00/day)
I think they may have a case in that early processors didn't even have an FPU, and were still processors.

This. The FPU argument has no weight, because it's not an essential part of a processor.

Intel 8086
Intel 80286
Intel 80386
Intel i486SX
Motorola 68000
Motorola 68020
Motorola 68030

And many more were all "zero-core processors" if you follow this nonsense logic.
 
Joined
Jan 8, 2017
Messages
9,503 (3.27/day)
System Name Good enough
Processor AMD Ryzen R9 7900 - Alphacool Eisblock XPX Aurora Edge
Motherboard ASRock B650 Pro RS
Cooling 2x 360mm NexXxoS ST30 X-Flow, 1x 360mm NexXxoS ST30, 1x 240mm NexXxoS ST30
Memory 32GB - FURY Beast RGB 5600 Mhz
Video Card(s) Sapphire RX 7900 XT - Alphacool Eisblock Aurora
Storage 1x Kingston KC3000 1TB 1x Kingston A2000 1TB, 1x Samsung 850 EVO 250GB , 1x Samsung 860 EVO 500GB
Display(s) LG UltraGear 32GN650-B + 4K Samsung TV
Case Phanteks NV7
Power Supply GPS-750C
486DX is the equivalent of 20 lifetimes old in terms of technology. It's not relevant to processors that debuted in 2011.

But the FPUs which have been around since before that chip are relevant? Ain't gonna float (no pun intended): you either ignore all of these decades of computing because it's all in the past or you don't. You can't pluck things out selectively.
 

FordGT90Concept

"I go fast!1!11!1!"
Joined
Oct 13, 2008
Messages
26,259 (4.44/day)
Location
IA, USA
System Name BY-2021
Processor AMD Ryzen 7 5800X (65w eco profile)
Motherboard MSI B550 Gaming Plus
Cooling Scythe Mugen (rev 5)
Memory 2 x Kingston HyperX DDR4-3200 32 GiB
Video Card(s) AMD Radeon RX 7900 XT
Storage Samsung 980 Pro, Seagate Exos X20 TB 7200 RPM
Display(s) Nixeus NX-EDG274K (3840x2160@144 DP) + Samsung SyncMaster 906BW (1440x900@60 HDMI-DVI)
Case Coolermaster HAF 932 w/ USB 3.0 5.25" bay + USB 3.2 (A+C) 3.5" bay
Audio Device(s) Realtek ALC1150, Micca OriGen+
Power Supply Enermax Platimax 850w
Mouse Nixeus REVEL-X
Keyboard Tesoro Excalibur
Software Windows 10 Home 64-bit
Benchmark Scores Faster than the tortoise; slower than the hare.
Absolutely it is, as everything in use today owes its heritage to that generation of CPUs, just like all modern ARM-based RISC CPUs owe their existence to the early Acorn CPUs. Just because CPU designs have improved and evolved does not make the older iterations irrelevant.

However, FPUs are not required. An integer unit can do floating point the long way, which is how floating point was done before FPUs were designed. A CPU is still a CPU with or without an FPU. Likewise, a CPU core is still an individual core whether it has its own FPU or shares one with another core.
486DX didn't do branch prediction like modern processors do. It is also missing a lot of instructions and completely devoid of multithreading capability. Additionally, 8087 was sold separately because of design limitations of technology at the time (power, heat, transistor density, etc.). They were merged into a unified architecture as soon as it was technically viable to do so. In other words, all of these references to processors that debuted in the 1980s are pathetic excuses to this debate.

Also, 486DX is the first processor that supported both x86 and x87 instructions. It's processors before that which had x87 coprocessors.
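
For anyone curious what "floating point the long way" on an integer unit looks like, here is a minimal sketch in Python. It is not any real chip's microcode or library routine. It multiplies two single-precision values using only integer masks, shifts, adds and one integer multiply. The helper names (`f32_bits`, `bits_f32`, `soft_mul`) are made up for illustration, `struct` is used only to get at the raw bits, and the sketch assumes normal, non-zero inputs, ignores exponent overflow, and truncates instead of rounding.

```python
import struct


def f32_bits(x: float) -> int:
    """Return the raw 32-bit pattern of x stored as an IEEE 754 single."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_f32(b: int) -> float:
    """Reinterpret a 32-bit pattern as an IEEE 754 single."""
    return struct.unpack("<f", struct.pack("<I", b))[0]


def soft_mul(a: float, b: float) -> float:
    """Multiply two normal, non-zero singles with integer operations only.

    Assumption: no zeros, subnormals, infinities or NaNs, and the result
    is truncated rather than rounded to nearest.
    """
    ba, bb = f32_bits(a), f32_bits(b)
    sign = (ba ^ bb) & 0x80000000            # signs combine with XOR
    ea, eb = (ba >> 23) & 0xFF, (bb >> 23) & 0xFF
    # Restore the implicit leading 1 of each 24-bit significand.
    ma = (ba & 0x7FFFFF) | 0x800000
    mb = (bb & 0x7FFFFF) | 0x800000
    exp = ea + eb - 127                      # add exponents, drop one bias
    prod = ma * mb                           # up to 48 bits wide
    if prod & (1 << 47):                     # product in [2, 4): renormalize
        prod >>= 24
        exp += 1
    else:                                    # product in [1, 2)
        prod >>= 23
    return bits_f32(sign | (exp << 23) | (prod & 0x7FFFFF))


print(soft_mul(1.5, 2.5))
print(soft_mul(-2.0, 3.0))
```

A hardware FPU does the same field juggling in dedicated logic, which is why it is faster, not why it is required.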

By that logic, the Core2Quads and any other CPU that has two or more dies bridged together, and shares resources, do not qualify as a single CPU. They are dual CPU packages. So should we all sue Intel and AMD for that deception?
Core 2 Duo only shared L2 cache between cores:

Core 2 Quad had two of those dies on the same package. All four cores are independent, complete processors. The only cache that is necessary to the operation of a processor is L1.

Also, by AMD's definition, Core 2 Duo module would be a quad core because there's 4 ALUs. :p

AMD never said that. They called it an 8 core CPU, which by technical definition, it is.
Look at any other processor on the market and there's an equal number of fetchers to cores. Bulldozer is an exception, not the rule.


Oh look, an Athlon 64 X2 slide from AMD!

See what they did there? Called the whole independent processor package an "execution core." AMD didn't draw a little box in the box saying this little integer bit here is the "core." :roll:

There's an enormous amount of hypocrisy by AMD on display here. In fact, there are really no references to the arithmetic units of a processor being called a "core" outside of Bulldozer's design.

Even Intel calls the whole independent processor a core. Here's a slide from Haswell:


As you can see, the industry has a very clear understanding of what a core is. AMD twisted that understanding to give the appearance of an edge against competing products. That's "false advertising." How is it okay for AMD to redefine the word in a way that is misleading to consumers?


For the record, one can make a CPU that has no addressable ALUs, only FPUs. There's really nothing an FPU can't do that an ALU can do. It is just slower and requires more transistors.

Intel i486SX
Funny story there: disabling the FPU in the i486SX was similar to disabling a core in, for example, Zen. The FPU by far used the most die space on the 486, so instead of pitching chips that had a defect in the FPU, they disconnected it and sold it as an i486SX. The x87 instruction set and IEEE 754 standard were still in their infancy at the time, so not much software used them.


The newest architecture that comes to my mind is IA-64. It was created from scratch long after IEEE 754. Are ALUs and FPUs intrinsic to its cores? Why yes, of course:


Mmm, MIPS is kind of an oddball mostly designed for network routing. Does it have an FPU? R16000 does:


Doesn't matter what architecture you look at, cores are clearly defined and they are not just the arithmetic calculators like AMD claims*.
* Only when on the subject of Bulldozer, Steamroller, and Excavator.

I'm not saying a processor needs an FPU because that's completely dependent on the scenario in which it will be used. I'm saying that AMD had one definition of the word "core," changed it for Bulldozer through Excavator, and then went back to their original definition for Zen. That bit in the middle deserves a slap on the wrist.
 
Last edited:
Joined
May 3, 2014
Messages
965 (0.25/day)
System Name Sham Pc
Processor i5-2500k @ 4.33
Motherboard INTEL DZ77SL 50K
Cooling 2 bay res. "2L of fluid in loop" 1x480 2x360
Memory 16gb 4x4 kingstone 1600 hyper x fury black
Video Card(s) hfa2 gtx 780 @ 1306/1768 (xspc bloc)
Storage 1tb wd red 120gb kingston on the way os, 1.5Tb wd black, 3tb random WD rebrand
Display(s) cibox something or other 23" 1080p " 23 inch downstairs. 52 inch plasma downstairs 15" tft kitchen
Case 900D
Audio Device(s) on board
Power Supply xion gaming seriese 1000W (non modular) 80+ bronze
Software windows 10 pro x64
The issue is still what the perception of a core was at the time.
And we really started with the Pentium D, Athlon 64 X2, Core 2, i5 and so on.
AMD significantly changed the formula but did not represent the differences in the advertising.

People like us who have an interest in this sort of stuff made our own minds up about the processors. I decided they were not 8 cores. Some others agreed with AMD.
But the issue is that there genuinely is a difference. Even in all the "evidence" provided by the "they are 8 cores" people in this thread, AMD admits they are not 8 traditional cores.
For the most part AMD does not even refer to them as cores.
By AMD's own definition of a core from the Pentium D era, they are not cores.
But AMD advertised it as 8 cores to the masses, who would have thought it must be the same as a Core 2, a Phenom, or another relevant multi-core processor of the time, when it simply wasn't and still isn't.
There's a reason why Phenoms and Core 2 Quads outperformed or performed as well as a Bulldozer at the time, and there's a reason why i5s still outperform them to this day.

I'm not going to say that it's only down to the layout of the die, because that simply isn't true. There were corner-cutting and cost-saving methods taken during manufacturing, which led to potential performance losses that could amount to low double-digit percentages compared to designing some parts manually rather than via software.

But regardless of that the issue that the law suit is regarding remains.

People expected cores to mean the same thing as they did with Phenoms and other similar CPUs of the time. Sure, most of them didn't know what that meant, and probably 80% of them don't even know how it is different.
But they are different and AMD did not adequately advertise it as such.

You can say what you want..
But when people lose a lawsuit because they advertise 1 billion bytes as a GB instead of 1.07 billion, then "slight" differences do matter.
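
For reference, the "1 billion vs 1.07 billion" gap is just the decimal gigabyte against the binary one. A quick sketch of the arithmetic follows; the constant and function names are my own and not taken from any filing.

```python
# Decimal gigabyte (what the label says) vs binary gibibyte (2**30 bytes).
GB_DECIMAL = 10**9
GIB_BINARY = 2**30


def shortfall_fraction(advertised: int = GB_DECIMAL, binary: int = GIB_BINARY) -> float:
    """Fraction of a binary gigabyte missing when a decimal one is delivered."""
    return 1 - advertised / binary


print(GIB_BINARY)                      # number of bytes in the binary unit
print(f"{shortfall_fraction():.1%}")   # relative size of the "slight" difference
```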

It also does not help that AMD has changed back to traditional cores, virtually admitting that the Bulldozer modules were in fact worse than traditional cores.
 
Joined
Oct 27, 2009
Messages
1,190 (0.22/day)
Location
Republic of Texas
System Name [H]arbringer
Processor 4x 61XX ES @3.5Ghz (48cores)
Motherboard SM GL
Cooling 3x xspc rx360, rx240, 4x DT G34 snipers, D5 pump.
Memory 16x gskill DDR3 1600 cas6 2gb
Video Card(s) blah bigadv folder no gfx needed
Storage 32GB Sammy SSD
Display(s) headless
Case Xigmatek Elysium (whats left of it)
Audio Device(s) yawn
Power Supply Antec 1200w HCP
Software Ubuntu 10.10
Benchmark Scores http://valid.canardpc.com/show_oc.php?id=1780855 http://www.hwbot.org/submission/2158678 http://ww
Hate to call people out.... but ford man... you really need to stop trying to compare block diagrams of vastly different granularity and assuming they are comparable.
You are making many assumptions about how things work internally that... just aren't so.
You are also cherry picking the fuck out of the facts hoping everyone else won't notice.

You claim old FPUs can't be compared, yet you keep comparing a modern AVX2 FPU to a first-gen AVX FPU while repeatedly ignoring the design just one generation back.
Please just stop.

FPU is the same per int as in Thuban, but by organizing a pair of int into modules along with a dual fpu with shared fetcher it enables a flexibility that Thuban did not have and enables AVX across 2 FPU units. As per the scaling performance given (better than sandybridge) it is clear there are 8 fpu units.
Configuration is indeed different to enable AVX gen 1 support.

At the time, the shared resources was the only way to enable 8 cores on 32nm...
The FPU of zen is vastly superior because there is die room to enable it... just like moving to 7nm enables another doubling of cores...

You keep trying to look at 1 detail and declaring AMD was out to get people when it was a solid solution at the time.
1 module, 2 cores. 4mod/8 cores. It had a unique structure, there was no smt involved, and it outperformed the intel solution on multithreading... but due to the deeeep pipeline the ipc was decreased and required higher clocks to be competitive.
 
Last edited:

FordGT90Concept

"I go fast!1!11!1!"
Joined
Oct 13, 2008
Messages
26,259 (4.44/day)
Location
IA, USA
At the time, the shared resources was the only way to enable 8 cores on 32nm...
You do realize that 8087 was made a coprocessor because that was the only way they could accelerate floating point operations on the 3 μm process, right? Even then, yields were terrible which is why most computers didn't have them. For reference, 8087 had 45,000 transistors compared to 8086's 29,000.
Processor|Architecture|Structure|Transistors (billions)
Ryzen 1800X|Zen|8c/16t|4.8
FX-9590|Vishera|4m/8t|1.2
Phenom II X6 1100T BE|Thuban|6c/6t|0.9
...an 8 core Thuban would have ended up having about the same number of transistors. There's a lot of thread management overhead in doing what AMD did with Excavator that simply did not exist in Thuban.
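
The estimate above can be checked with back-of-the-envelope arithmetic. This sketch assumes transistor count scales linearly with core count, which is a simplification since shared parts such as the memory controller and L3 would not grow with the cores. The function name is hypothetical; the figures are the ones from the table.

```python
def scale_linearly(total_billion: float, cores_from: int, cores_to: int) -> float:
    """Naive estimate: transistors per core times the new core count."""
    return total_billion / cores_from * cores_to


thuban_6c = 0.9   # Phenom II X6 1100T BE, billions of transistors (from the table)
vishera = 1.2     # FX-9590, billions of transistors (from the table)

thuban_8c_estimate = scale_linearly(thuban_6c, 6, 8)
print(round(thuban_8c_estimate, 2), "vs", vishera)
```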

there was no smt involved
Floating points were calculated using SMT.

it outperformed the intel solution on multithreading
Only when you compare 1 AMD "module" with two threads compared to 1 Intel "core" with two threads. If you compare 1 Intel "core" to 1 AMD "integer core," Intel's solution is higher performing.

... but due to the deeeep pipeline the ipc was decreased and required higher clocks to be competitive.
AMD side graded to make their product more attractive to server and mainframe operators. It was designed to be cost effective in those use cases, not consumer use cases.
 
Last edited:
Joined
May 3, 2014
Messages
965 (0.25/day)
At the time, the shared resources was the only way to enable 8 cores on 32nm...
The FPU of zen is vastly superior because there is die room to enable it... just like moving to 7nm enables another doubling of cores...

You would think they'd still be using it if it's that good.
 
Joined
Jul 5, 2013
Messages
28,238 (6.74/day)
AMD twisted that understanding to give the appearance of an edge against competing products.
Incorrect. AMD didn't "twist" anything. They tried a new way of building a device that could execute code in an attempt to compete.

All the rest of your very fancy display amounts to flash/bang nitpicking. You did succeed in doing one thing though; you displayed for all to see that you understand that an execution unit does count as a full and complete core in and of itself. It's good you're not an attorney, as you would have effectively tanked your own case with that display and argument.
 

FordGT90Concept

"I go fast!1!11!1!"
Joined
Oct 13, 2008
Messages
26,259 (4.44/day)
Location
IA, USA
As previously discussed, an Execution Core does Fetch-Decode-Execute (everything required to turn inputs into outputs). Fetch is shared in Excavator. Fetch and Decode are shared in Bulldozer. The execution core is incomplete in Bulldozer unless you consider the execution core an entire module which is what the plaintiff is arguing in favor of.

ArbitraryAffection gave excellent proof of this earlier:


There's only one fetch, one L1 instruction cache, and one decoder. Omit those components from either integer core and you have transistors that just look pretty in a picture. You cannot cleave a Bulldozer module in two and have two functional processors. You can do so with virtually every other multi-core architecture out there.

What you see in that picture is a core by textbook definition. It just happens to be able to process two threads simultaneously when circumstances are favorable to doing so.
 
Last edited:
Joined
Jan 8, 2017
Messages
9,503 (3.27/day)
What you see in that picture is a core by textbook definition.

Elaborate please, point us to a couple of books or papers in which CPU cores are described as such.
 

FordGT90Concept

"I go fast!1!11!1!"
Joined
Oct 13, 2008
Messages
26,259 (4.44/day)
Location
IA, USA
The whole die shot pictured is a self-contained processor which fits the definition of a singular core. Have a source:

https://searchdatacenter.techtarget.com/definition/multi-core-processor
A core is synonymous with "CPU." Initial dual core processors were two CPUs sharing the same socket on the same bus, not unlike a dual socket, single CPU machine.

1800X can rightfully be called an 8 CPU machine on a single socket. FX-8350 can only be called a 4 CPU machine on a single socket. Because that gets awfully confusing, AMD, Intel, ARM, MIPS, etc. have taken to calling them "cores" instead so they can distinguish multi-socketed solutions from multi-CPUs on one socket solutions.
 
Last edited:
Joined
Jan 8, 2017
Messages
9,503 (3.27/day)
The whole die shot pictured is a self-contained processor which fits the definition of a singular core.

Book or paper, we had enough of looking at die shots and reading 2 paragraph explanations on the internet. You are vehemently claiming this is a textbook example of core. I am assuming you have stumbled across this very exact description and by that I mean cores that have multiple decode entries, multiple load/store, multiple ALUs/FPUs, etc as is the case with a Bulldozer module. Surely you can pull one example out for us from all the material you read. In everything that I have read however I see this :

Untitled.png


"Central Processing Unit main components"

No mention of independent fetch/decode stages, no load/store units, no FPUs, nothing that you argue constitutes an "independent processor" aka core. All I see is either a generic "instruction decoder" or "timing and control".

You seem to be extremely fixated on the idea that CPU cores have to be independent processors. Let's see, independent meaning it can operate on its own and fulfill all the functions that it could previously perform while inside its multi-core arrangement, right?


The only thing that I can think of that fits that description is something like this: https://www.pcper.com/reviews/Processors/Intel-Atom-330-Dual-core-Processor-Review.

In this case you can totally pluck one core/processor out of the assembly and use it completely on its own. It's undoubtedly self-contained and self-sufficient.

Not even AMD's upcoming chiplet designs would count as being made out of independent processors, because they rely on external logic, which they share, to operate. Why won't you understand that independent processors do not exist anymore in the context of modern CPUs? They share caches, memory controllers, and interconnects, which in particular are absolutely critical to their functionality. Intel even has a word for it: Uncore, and it usually occupies a considerable portion of the die. It's also the reason why, whenever Intel/AMD wants a new chip with fewer cores, they have to redesign the whole damn thing instead of just "cleaving it in two".
 
Last edited:

FordGT90Concept

"I go fast!1!11!1!"
Joined
Oct 13, 2008
Messages
26,259 (4.44/day)
Location
IA, USA
Your opinion, not supported by historical fact or any citations.
Have this one:
http://accel.cs.vt.edu/files/lecture2.pdf

Breaks multicore processors into homogeneous (copy-pasta) and heterogeneous (CELL is named, but think SoC in general, where there are many components put together to make a flexible whole).

Page 11: Dies, Observations
Core replication obvious.
Page 14: Multicore-present:
Operating systems schedule processes out to the various cores in the same way they always have on traditional multiprocessor systems.
The fetcher is what the operating system sees. This is why Windows 8/10 report FX-8350 as a "4 core," "8 logical processors" on 1 socket.
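
If you want to see what your own OS reports, here is a minimal sketch using only the Python standard library. Note the assumption: `os.cpu_count()` returns the number of logical processors the OS exposes, not physical cores or modules. The standard library has no portable call for the physical core count, so this only shows one half of the "4 core, 8 logical processors" line. The wrapper name is hypothetical.

```python
import os


def logical_processor_count() -> int:
    """Logical processors as reported by the OS (falls back to 1 if unknown)."""
    count = os.cpu_count()  # may return None on platforms that cannot tell
    return count if count is not None else 1


print("logical processors:", logical_processor_count())
```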


Have another (sourced from Intel no less):
http://www.ecs.umass.edu/ece/andras/courses/ECE668/Mylectures/Introduction_to_Multi_Core.pdf
Page 17, diagram and explanation of what it means to transition from one core to two. Area is 2x. Homogeneous cores by definition are copy-pasta and include all parts required to process instructions.

Page 18, Intel highlights the two homogeneous cores on Conroe.

Page 24, processor resources:
-Caches
-General Purpose Registers
-Segment Registers & TLB
-FP registers, XMM registers
-System Flags
-Control and Data registers, Debug registers, MSRs
-Many more

Page 25, explains differences between CMP, SMP, Hyper Threading, and Software Threading. Particularly relevant:
"Chip Multi Processing, refers to multiple physical core engines that have unique resources."
Bulldozer's FP registers and XMM registers are not unique resources to the integer cores, they're unique resources to the module. Bulldozer doesn't fit under SMP because that requires sharing all resources.

Page 26, "Core Architecture (Prescott)" diagram includes everything from instruction TLB to L2 cache. This mirrors AMD's slide showing an Excavator "core" next to a Zen "core."

Page 27, "Core Architecture (Xeon - Dual Core)" diagram which tells the same story as Prescott. The diagram only includes a single core, but on the leftmost side of it there is a label depicting "Second core," which means: mirror what you see here on the other side of the L2 cache. More confirmation that a "core" is holistic (fetch-decode-execute), not just what AMD calls an "integer core."

Page 28, "Multi-core platform (Freescale: embedded)" diagram which depicts two clear "e500-mc cores" with "accelerators" and "connectivity" attached to it via "CoreNet fabric."

Page 29, "Multi-core platform (RMI-XLR: embedded)" diagram depicting 8 clear cores on a "Memory Distributed Interconnect."

Page 30, "Tilera - 64 core CPU" diagram depicting 64 interconnected processors.

Page 34, "Tiled Design & Mesh Network" depicts Intel's 80-core Polaris showing each "core" as a "compute element" + "router"

Page 39, "Multi-core: Design Challenges" says "replicating cores improves productivity."

Book or paper, we had enough of looking at die shots and reading 2 paragraph explanations on the internet. You are vehemently claiming this is a textbook example of core. I am assuming you have stumbled across this very exact description and by that I mean cores that have multiple decode entries, multiple load/store, multiple ALUs/FPUs, etc as is the case with a Bulldozer module. Surely you can pull one example out for us from all the material you read. In everything that I have read however I see this :

View attachment 115241
First, I'll fix the diagram so it's relevant to Bulldozer:

Then I'll point out that #1 proves my point:
1. The next instruction to be executed, whose address is obtained from the PC, is fetched from the memory and stored in the IR.
I assume IR stands for "Instruction Register." On all Bulldozer processors, this is part of the Fetch block which is shared for both threads. As far as #1 is concerned, you're only looking at one CPU.
#2 continues to drive that point home when considering Bulldozer (not Steamroller/Excavator):
2. The instruction is decoded.
The Decode block is shared in Bulldozer so as far as this is concerned, there's only one CPU.
#3 is another task of Fetch block so rewind to what I said above. Two or three steps here dictate we're only dealing with one CPU. See how my tweaked diagram makes a whole lot of sense now?
Finally, at steps #4 and #5, we get to the sole components where the cycle diverges, but only if the instruction is not a floating-point instruction; otherwise it is back to shared hardware, which means #4 and #5 are part of the singular CPU.

TL;DR: At least 3 steps say Bulldozer is a single CPU and without those steps, those integer clusters know not what to do. Pretty clear case a module is a core.
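
For readers who haven't seen the textbook instruction cycle written out, here is a toy fetch-decode-execute loop in Python. It is a sketch only: the three-instruction set (`LOAD`, `ADD`, `HALT`), the single accumulator and the `run` function are all hypothetical and model no real processor, Bulldozer included. The point is just that the execute step has nothing to do until fetch and decode have run.

```python
def run(program):
    """Toy CPU: program is a list of (opcode, operand) tuples."""
    pc = 0    # program counter: index of the next instruction
    acc = 0   # single accumulator register
    while pc < len(program):
        ir = program[pc]      # 1. fetch: instruction at PC goes into the IR
        pc += 1
        op, arg = ir          # 2. decode: split into opcode and operand
        if op == "LOAD":      # 3+. execute
            acc = arg
        elif op == "ADD":
            acc += arg
        elif op == "HALT":
            break
        else:
            raise ValueError(f"unknown opcode: {op}")
    return acc


print(run([("LOAD", 2), ("ADD", 3), ("HALT", 0)]))
```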

All I see is either generic "instruction decoder" or "timing and control".
Bulldozer shares "instruction decoder" (literally, I didn't modify the diagram at all) and "timing and control" via "Core Interface Unit" (Core IF in diagram):

This diagram has been posted at least twice now. I believe it was sourced from Tom's Hardware which is cited in the lawsuit.


And before you retort that Bulldozer can do two threads simultaneously, remember that it hits blocking scenarios more often than dual-core (or more) processors do as mouacyk pointed out:
This is what baseline core scaling efficiency looks like with the data from https://openbenchmarking.org/result/1110227-AR-AMDSCAL0184:

View attachment 115088

Even the 2384 does well, because it's got 4 fully independent cores.
Bulldozer underperforms independent cores/processors in c-ray, compress-7zip, npb BT.A, npb FT.B, npb LU.A, npb UA.A, and clomp when comparing Opteron 2384 to FX-8150. For example, in 7-zip, FX-8150 only did 48% better where Opteron 2384 did 102% better. 7-zip, as far as I can tell, is very ALU and cache intensive.
 
Last edited:
Joined
Feb 3, 2017
Messages
3,818 (1.33/day)
Processor Ryzen 7800X3D
Motherboard ROG STRIX B650E-F GAMING WIFI
Memory 2x16GB G.Skill Flare X5 DDR5-6000 CL36 (F5-6000J3636F16GX2-FX5)
Video Card(s) INNO3D GeForce RTX™ 4070 Ti SUPER TWIN X2
Storage 2TB Samsung 980 PRO, 4TB WD Black SN850X
Display(s) 42" LG C2 OLED, 27" ASUS PG279Q
Case Thermaltake Core P5
Power Supply Fractal Design Ion+ Platinum 760W
Mouse Corsair Dark Core RGB Pro SE
Keyboard Corsair K100 RGB
VR HMD HTC Vive Cosmos
Incorrect. AMD didn't "twist" anything. They tried a new way of building a device that could execute code in an attempt to compete.
What exactly was new in how Bulldozer was built?
In everything that I have read however I see this :

View attachment 115241

"Central Processing Unit main components"

No mention of independent fetch/decode stages, no load/store units, no FPUs, nothing that you argue constitutes an "independent processor" aka core. All I see is either a generic "instruction decoder" or "timing and control".
Thank you for the textbook page: Just underneath the figure, in instruction cycle:
1. Fetch
2. Decode
Both of which are shared in Bulldozer.
These are part of the control unit on Figure 5.1.
You seem to be extremely fixated on the idea that CPU cores have to be independent processors. Let's see, independent meaning it can operate on its own and fulfill all the functions that it could previously perform while inside its multi-core arrangement, right?


The only thing that I can think of that fits that description is something like this: https://www.pcper.com/reviews/Processors/Intel-Atom-330-Dual-core-Processor-Review.

In this case you can totally pluck one core/processor out of the assembly and use it completely on its own. It's undoubtedly self-contained and self-sufficient.

Not even AMD's upcoming chiplet designs would count as being made out of independent processors, because they rely on external logic, which they share, to operate. Why won't you understand that independent processors do not exist anymore in the context of modern CPUs? They share caches, memory controllers, and interconnects, which in particular are absolutely critical to their functionality. Intel even has a word for it: Uncore, and it usually occupies a considerable portion of the die. It's also the reason why, whenever Intel/AMD wants a new chip with fewer cores, they have to redesign the whole damn thing instead of just "cleaving it in two".
I am not sure why, but you seem to have a skewed understanding of "independent" here. There is absolutely no need for a core to be a separate chip. Independent core/CPU means it is able to perform its function - execute instructions - independently. No more, no less. Instructions are fetched from somewhere else and results are stored somewhere else - generally either the data bus or cache depending on how the wider system is built.
 
Last edited:

FordGT90Concept

"I go fast!1!11!1!"
Joined
Oct 13, 2008
Messages
26,259 (4.44/day)
Location
IA, USA
Thread TL;DR: the Bulldozer module is a different way to multi-thread but it does not represent a multi-core.


...roughly, anyway. Take those numbers times the number of cores and you'll get an approximation of multithreading scaling.
 
Last edited:
Joined
Jan 8, 2017
Messages
9,503 (3.27/day)
First, I'll fix the diagram

Well, don't. You just can't help yourself but change every bit of information just so it can fit with your narrative.

I'll have to accept that you will never be able to use correct information to prove your points and you'll always skew facts.
 
Joined
Jul 5, 2013
Messages
28,238 (6.74/day)
Have this one:
http://accel.cs.vt.edu/files/lecture2.pdf

Breaks multicore processors into homogeneous (copy-pasta) and heterogeneous (CELL is named, but think SoC in general, where there are many components put together to make a flexible whole).

Page 11: Dies, Observations
Page 14: Multicore-present:
The fetcher is what the operating system sees. This is why Windows 8/10 report FX-8350 as a "4 core," "8 logical processors" on 1 socket.
Isn't it interesting how you skipped over the comparisons involving the Itaniums and Terascale information? The Terascale CPU shows very clearly that the integer execution units (80 of them) exist without anything other than an IO connection to a separate die with other additional functionality features that operate in addition to the main die. So do those 80 cores not qualify as individual cores? Or is that all just one CPU? Using your argument, that is a single CPU with 80 sub-cores. But that's not what Intel calls it. So should they be sued? That citation does not help your argument, as it demonstrates and illustrates that there are many varying ways to build a functional CPU, including multiple functionally independent cores. We could also explore the other citation, as it also demonstrates a variety of methodologies to build a CPU.
Those citations do not help your position. They actually work against it.
Well, don't. You just can't help yourself but change every bit of information just so it can fit with your narrative.
Exactly correct.
I'll have to accept that you will never be able to use correct information to prove your points and you'll always skew facts.
And that is clearly being demonstrated.

Ford, you have lost this debate on merit and by providing citations against your position. Let it go. AMD is going to win this case.
 

FordGT90Concept

"I go fast!1!11!1!"
Joined
Oct 13, 2008
Messages
26,259 (4.44/day)
Well, don't. You just can't help yourself but change every bit of information just so it can fit with your narrative.

I'll have to accept that you will never be able to use correct information to prove your points and you'll always skew facts.
How is the image I presented not representative of Bulldozer? I know the registers aren't right but that's because it's deliberately vague in that regard. The rest is spot on.
 

FordGT90Concept

"I go fast!1!11!1!"
Joined
Oct 13, 2008
Messages
26,259 (4.44/day)
Isn't it interesting how you skipped over the comparisons involving the Itaniums and Terascale information? The Terascale CPU shows very clearly that the integer execution units (80 of them) exist without anything other than an IO connection to a separate die with other additional functionality features that operate in addition to the main die. So do those 80 cores not qualify as individual cores? Or is that all just one CPU? Using your argument, that is a single CPU with 80 different sub-cores. But that's not what Intel calls it. So should they be sued? That citation does not help your argument, as it demonstrates and illustrates that there are many varying ways to build a functional CPU, including multiple functionally independent cores. We could also explore the other citation, as it also demonstrates a variety of methodologies to build a CPU.
Those citations do not help your position. They actually work against it.
The RIB in each tile/core is effectively the fetcher. Tera-Scale has more in common with a GPU than a CPU; nevertheless, it still has discrete cores that perform the same fetch-decode-execute routine (just software scheduled instead of hardware scheduled)...

More info: https://www.anandtech.com/show/2170/3

I get the strong impression that Tera-Scale is entirely incapable of ALU work: everything exposed is floating point. That said, it was a prototype meant to reach 1 TFLOP of dynamic compute power and that's exactly what it did.
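
For anyone who wants the fetch-decode-execute routine mentioned above spelled out, here is a rough sketch in Python. It is only an illustration: the three-instruction toy ISA, the `ToyCore` class and the register names are all made up for this example and do not model Tera-Scale, Bulldozer or any real hardware.

```python
from dataclasses import dataclass, field

# Hypothetical toy ISA, invented for illustration only.
# Each instruction is a tuple: (opcode, operand_a, operand_b).
PROGRAM = [
    ("LOADI", "r0", 2),     # r0 = 2
    ("LOADI", "r1", 3),     # r1 = 3
    ("ADD", "r0", "r1"),    # r0 = r0 + r1
    ("HALT", None, None),   # stop the core
]


@dataclass
class ToyCore:
    """A made-up core with its own program counter and registers."""
    program: list
    pc: int = 0
    regs: dict = field(default_factory=lambda: {"r0": 0, "r1": 0})
    halted: bool = False

    def fetch(self):
        # Fetch: read the instruction at the program counter, then advance it.
        instr = self.program[self.pc]
        self.pc += 1
        return instr

    def decode(self, instr):
        # Decode: split the raw instruction into opcode and operands.
        opcode, a, b = instr
        return opcode, a, b

    def execute(self, opcode, a, b):
        # Execute: carry out the operation on the register file.
        if opcode == "LOADI":
            self.regs[a] = b
        elif opcode == "ADD":
            self.regs[a] = self.regs[a] + self.regs[b]
        elif opcode == "HALT":
            self.halted = True
        else:
            raise ValueError(f"unknown opcode: {opcode}")

    def run(self):
        # Repeat fetch -> decode -> execute until a HALT is executed.
        while not self.halted:
            self.execute(*self.decode(self.fetch()))
        return self.regs


result = ToyCore(PROGRAM).run()
print(result)
```

The only point of the sketch is that each stage is a separate step the core has to go through for every instruction; which of those stages have to be private to a unit before it counts as a "core" is exactly what this thread is arguing about.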
 
Joined
Jan 8, 2017
Messages
9,503 (3.27/day)
That was not for you to photoshop whatever you thought a Bulldozer module would look like. It was to prove that the elements which you claim are mandatory for a CPU core to be independent are never even stipulated as discrete, distinguishable components in the overwhelming majority of descriptions out there of what a CPU contains.

Amazingly even the slides that you provided contradict some of your claims about shared resources :

a.png


"Functional units" aka execution units, which may contain their own separate logic, as is the case with the FP scheduler in the Bulldozer module. And that was one of your main points on why Bulldozer wasn't an 8-core CPU. Try as you may, it seems you can never get away from these facts.
 

FordGT90Concept

"I go fast!1!11!1!"
Joined
Oct 13, 2008
Messages
26,259 (4.44/day)
You cited the only example where "functional units" is used in that whole document. What it means is not expanded on anywhere.

Edit: Looking at the whole page, I'm pretty sure he was referring to Hyper-threading, i.e. two threads sharing the same ALUs and FPUs. Hyper-threading impacts caches, TLB, and BTB. The line directly below it also strongly suggests Hyper-threading (the tradeoff being transistors spent on improving utilization in one core versus adding another core). Fits like a glove, but again, just an educated guess.

The original image was a concept of a basic CPU. Your alteration does not change the context. Just stop.
Bulldozer is anything but "basic." :roll:
 
Joined
Jan 8, 2017
Messages
9,503 (3.27/day)
Sure thing man, anyway just keep on selectively picking up information out of everything that you are presented with and ignore the rest. Argumentation on the internet 101.
 
Joined
Feb 3, 2017
Messages
3,818 (1.33/day)
Processor Ryzen 7800X3D
Motherboard ROG STRIX B650E-F GAMING WIFI
Memory 2x16GB G.Skill Flare X5 DDR5-6000 CL36 (F5-6000J3636F16GX2-FX5)
Video Card(s) INNO3D GeForce RTX™ 4070 Ti SUPER TWIN X2
Storage 2TB Samsung 980 PRO, 4TB WD Black SN850X
Display(s) 42" LG C2 OLED, 27" ASUS PG279Q
Case Thermaltake Core P5
Power Supply Fractal Design Ion+ Platinum 760W
Mouse Corsair Dark Core RGB Pro SE
Keyboard Corsair K100 RGB
VR HMD HTC Vive Cosmos
The Terascale CPU shows very clearly that the integer execution units (80 of them) exist without anything other than an IO connection to a separate die with other additional functionality features that operate in addition to the main die. So do those 80 cores not qualify as individual cores? Or is that all just one CPU? Using your argument, that is a single CPU with 80 different sub-cores. But that's not what Intel calls it. So should they be sued? That citation does not help your argument, as it demonstrates and illustrates that there are many varying ways to build a functional CPU, including multiple functionally independent cores. We could also explore the other citation, as it also demonstrates a variety of methodologies to build a CPU.
Context is important.
- You are right, Tera-Scale can be looked at both ways - whether these 80 units should be called cores can be argued, as can whether Tera-Scale is even a general purpose CPU. Intel Tera-Scale is a specialized application processor; both the intended application and the architecture are much closer to a GPU than a CPU. It is also a much simpler VLIW architecture with a very simple instruction set and execution units (a couple of FPUs). This has more than a few similarities to AMD's similarly named VLIW GPU architecture TeraScale - HD2000-HD4000 series :)
- A simple CPU may only need instructions and data passed into it, and control logic can be minimal or nonexistent, especially the decode part.
- Bulldozer on the other hand is an x86 CPU. This is effectively a RISC processor masquerading as CISC. Fetch and Decode have a large part to play in its operation.
 