Your opinion, not supported by historical fact or any citations.
Have this one:
http://accel.cs.vt.edu/files/lecture2.pdf
Breaks multicore processors into homogeneous (copy-pasta) and heterogeneous (CELL is named, but think SoC in general, where there are many components put together to make a flexible whole).
Page 11: Dies, Observations
Core replication obvious.
Page 14: Multicore-present:
Operating systems schedule processes out to the various cores in the same way they always have on traditional multiprocessor systems.
The fetcher is what the operating system sees. This is why Windows 8/10 report the FX-8350 as "4 cores" and "8 logical processors" on 1 socket.
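If you want to check the cores-versus-logical-processors arithmetic yourself, here's a minimal Python sketch. The 4 and 8 are the Windows figures quoted above; `threads_per_core` is a helper name I made up for illustration. `os.cpu_count()` is the standard library call that returns the number of logical processors the OS exposes (it can return `None` if that can't be determined).

```python
import os


def threads_per_core(logical_processors, cores):
    """Logical processors the OS schedules onto each reported core."""
    if cores <= 0 or logical_processors % cores != 0:
        raise ValueError("logical processors must be a positive multiple of cores")
    return logical_processors // cores


# Figures as Windows 8/10 report them for the FX-8350 (from the post above).
fx8350_threads_per_core = threads_per_core(logical_processors=8, cores=4)

# What the OS on *this* machine exposes; may be None on unusual platforms.
local_logical_processors = os.cpu_count()

print("FX-8350 threads per reported core:", fx8350_threads_per_core)
print("Logical processors on this machine:", local_logical_processors)
```

The point is only that the scheduler counts logical processors, and the "core" count Windows shows sits one level above that.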
Have another (sourced from Intel no less):
http://www.ecs.umass.edu/ece/andras/courses/ECE668/Mylectures/Introduction_to_Multi_Core.pdf
Page 17, diagram and explanation of what it means to transition from one core to two. Area is 2x. Homogeneous cores by definition are copy-pasta and include all parts required to process instructions.
Page 18, Intel highlights the two homogeneous cores on Conroe.
Page 24, processor resources:
-Caches
-General Purpose Registers
-Segment Registers & TLB
-FP registers, XMM registers
-System Flags
-Control and Data registers, Debug registers, MSRs
-Many more
Page 25, explains differences between CMP, SMP, Hyper Threading, and Software Threading. Particularly relevant:
"Chip Multi Processing, refers to multiple physical core engines that have unique resources."
Bulldozer's FP registers and XMM registers are not unique resources to the integer cores; they're unique resources to the module. Bulldozer doesn't fit under SMP because that requires sharing all resources.
Page 26, "Core Architecture (Prescott)" diagram includes everything from instruction TLB to L2 cache. This mirrors AMD's slide showing an Excavator "core" next to a Zen "core."
Page 27, "Core Architecture (Xeon - Dual Core)" diagram which tells the same story as Prescott. The diagram only includes a single core, but on the leftmost side of it they have a label depicting "Second core," which means: mirror what you see here on the other side of the L2 cache. More confirmation that a "core" is holistic (fetch-decode-execute), not just what AMD calls an "integer core."
Page 28, "Multi-core platform (Freescale: embedded)" diagram which depicts two clear "e500-mc cores" with "accelerators" and "connectivity" attached to it via "CoreNet fabric."
Page 29, "Multi-core platform (RMI-XLR: embedded)" diagram depicting 8 clear cores on a "Memory Distributed Interconnect."
Page 30, "Tilera - 64 core CPU" diagram depicting 64 interconnected processors.
Page 34, "Tiled Design & Mesh Network" depicts Intel's 80-core Polaris showing each "core" as a "compute element" + "router"
Page 39, "Multi-core: Design Challenges" says "replicating cores improves productivity."
Book or paper, we have had enough of looking at die shots and reading 2 paragraph explanations on the internet. You are vehemently claiming this is a textbook example of a core. I am assuming you have stumbled across this exact description, and by that I mean cores that have multiple decode entries, multiple load/store, multiple ALUs/FPUs, etc. as is the case with a Bulldozer module. Surely you can pull one example out for us from all the material you read. In everything that I have read, however, I see this:
View attachment 115241
First, I'll fix the diagram so it's relevant to Bulldozer:
Then I'll point out that #1 proves my point:
1. The next instruction to be executed, whose address is obtained from the PC, is fetched from the memory and stored in the IR.
I assume IR stands for "Instruction Register." On all Bulldozer processors, this is part of the Fetch block which is shared for both threads. As far as #1 is concerned, you're only looking at one CPU.
#2 continues to drive that point home when considering Bulldozer (not Steamroller/Excavator):
2. The instruction is decoded.
The Decode block is shared in Bulldozer so as far as this is concerned, there's only one CPU.
#3 is another task of Fetch block so rewind to what I said above. Two or three steps here dictate we're only dealing with one CPU. See how my tweaked diagram makes a whole lot of sense now?
Finally, at steps #4 and #5, we get to the sole components where the cycle deviates, but only if the instruction isn't a floating point instruction. Otherwise it is back to shared hardware, which means #4 and #5 are part of the singular CPU.
TL;DR: At least 3 steps say Bulldozer is a single CPU and without those steps, those integer clusters know not what to do. Pretty clear case a module is a core.
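The tally above can be written down as a minimal Python sketch. The step numbers come from the quoted cycle, and the shared/per-cluster labels are just my reading of Bulldozer as described in this post, not anything lifted from AMD documentation. `step_ownership` and `shared_step_count` are names made up for illustration.

```python
SHARED = "shared by the module"
PER_CLUSTER = "private to one integer cluster"


def step_ownership(is_floating_point):
    """Map each step (#1..#5) of the quoted cycle to the hardware that owns it.

    Assumption from the post: steps 1-3 go through the shared Fetch/Decode
    blocks; steps 4-5 use an integer cluster unless the instruction is
    floating point, in which case they use the shared FP unit.
    """
    back_end = SHARED if is_floating_point else PER_CLUSTER
    return {1: SHARED, 2: SHARED, 3: SHARED, 4: back_end, 5: back_end}


def shared_step_count(is_floating_point):
    """Count how many of the five steps run on module-wide hardware."""
    owners = step_ownership(is_floating_point).values()
    return sum(1 for owner in owners if owner == SHARED)


for label, is_fp in (("integer", False), ("floating point", True)):
    print(label, "instruction:", shared_step_count(is_fp), "of 5 steps shared")
```

Under those assumptions an integer instruction spends 3 of 5 steps on shared hardware and a floating point instruction spends all 5 there.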
All I see is either generic "instruction decoder" or "timing and control".
Bulldozer shares "instruction decoder" (literally, I didn't modify the diagram at all) and "timing and control" via "Core Interface Unit" (Core IF in diagram):
This diagram has been posted at least twice now. I believe it was sourced from Tom's Hardware which is cited in the lawsuit.
And before you retort that Bulldozer can do two threads simultaneously, remember that it hits blocking scenarios more often than dual-core (or more) processors do as mouacyk pointed out:
This is what baseline core scaling efficiency looks like with the data from https://openbenchmarking.org/result/1110227-AR-AMDSCAL0184:
View attachment 115088
Even the 2384 does well, because it's got 4 fully independent cores.
Bulldozer underperforms independent cores/processors in c-ray, compress-7zip, npb BT.A, npb FT.B, npb LU.A, npb UA.A, and clomp when comparing Opteron 2384 to FX-8150. For example, in 7-zip, FX-8150 only did 48% better where Opteron 2384 did 102% better. 7-zip, as far as I can tell, is very ALU and cache intensive.
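To put the 7-zip numbers side by side, here's a rough Python sketch. The 48% and 102% are the figures quoted above. I'm assuming each percentage is a gain over that same chip's own baseline run, which is how I read the linked result, and the function names are made up for illustration.

```python
def speedup_factor(percent_gain):
    """Convert 'N% better' into a multiplier over the baseline (48% -> 1.48x)."""
    return 1.0 + percent_gain / 100.0


def relative_gain(percent_gain_a, percent_gain_b):
    """How much of chip B's percentage gain chip A managed to achieve."""
    return percent_gain_a / percent_gain_b


fx8150 = speedup_factor(48)        # FX-8150 in 7-zip
opteron2384 = speedup_factor(102)  # Opteron 2384 in 7-zip

print(f"FX-8150 speedup:      {fx8150:.2f}x")
print(f"Opteron 2384 speedup: {opteron2384:.2f}x")
print(f"FX-8150 gain as a share of the 2384's gain: {relative_gain(48, 102):.0%}")
```

In other words, the FX-8150 picked up less than half of the improvement the Opteron 2384 did in that test.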