# AMD to Cough Up $12.1 Million to Settle "Bulldozer" Core Count Class-Action Lawsuit



## btarunr (Aug 28, 2019)

AMD has reached a settlement in the class-action lawsuit filed against it over allegedly false marketing of the core counts of its eight-core FX-series processors based on the "Bulldozer" microarchitecture. Each member of the Class receives a one-time payout of USD $35 per chip, while the company takes a hit of $12.1 million. The lawsuit dates back to 2015, when Tony Dickey, representing himself in the U.S. District Court for the Northern District of California, accused AMD of falsely marketing its FX-series "Bulldozer" processors as having 8 CPU cores. Over the following four years the case gained traction, and a Class Action was built against AMD this January.

In the months that followed the January set-up of a 12-member jury to examine the case, lawyers representing the Class and AMD argued over the underlying technology that makes "Bulldozer" a multi-core processor, and eventually discussed what a fair settlement would be for the Class. They settled on $12.1 million, or roughly $35 per chip AMD sold, a figure both sides agreed was "fair," yet significantly less than the "$60 million in premiums" consumers contended they paid for these processors. Sifting through these numbers, it's important to understand what the Class consists of: U.S. consumers who bought an 8-core processor based on the "Bulldozer" microarchitecture and chose to be part of the Class Action. It excludes buyers of every other "Bulldozer" derivative (4-core and 6-core parts, APUs, and follow-ups to "Bulldozer" such as "Piledriver," "Excavator," etc.).
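For a sense of scale, the figures above can be run through some quick arithmetic. The Python sketch below takes the article's numbers at face value. The variable names are illustrative, and it assumes the whole $12.1 million is divided at $35 per chip, ignoring any fees or costs that might come out of the fund.

```python
# Figures quoted in the article; everything derived from them is an estimate.
SETTLEMENT_USD = 12_100_000        # total settlement
PAYOUT_PER_CHIP_USD = 35           # payout per eligible chip
CLAIMED_PREMIUMS_USD = 60_000_000  # premiums consumers contended they paid

# Number of chips the fund would cover if all of it were paid out at $35 each
# (assumption: no fees or costs are deducted).
implied_chips = SETTLEMENT_USD / PAYOUT_PER_CHIP_USD

# Settlement as a fraction of the premiums the plaintiffs claimed.
settlement_share = SETTLEMENT_USD / CLAIMED_PREMIUMS_USD

print(f"Chips covered at $35 each: {implied_chips:,.0f}")
print(f"Settlement vs. claimed premiums: {settlement_share:.1%}")
```

Under those assumptions the fund covers roughly 346,000 chips and amounts to about a fifth of the claimed premiums.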



 
Image Credit: Taylor Alger

*View at TechPowerUp Main Site*


----------



## Space Lynx (Aug 28, 2019)

tsk tsk AMD.


----------



## RichF (Aug 28, 2019)

Ridiculous.

1) Meritless suit.

2) Only covers California purchases.

3) Doesn't even cover all of the 8 core consumer-grade FX chips.


----------



## biffzinker (Aug 28, 2019)

Always seen the first FX 8150 as nothing more than a Quad core with additional hardware to give the SMT extra punch when it could deliver.


----------



## RichF (Aug 28, 2019)

biffzinker said:


> Always seen the first FX 8150 as nothing more than a Quad core with additional hardware to give the SMT extra punch when it could deliver.


_Intel's CPUs with hyperthreading enabled, since Nehalem, don't actually have the claimed thread count because hyperthreading has to be disabled because it's a fundamentally insecure implementation.

Intel's CPUs with hyperthreading enabled, on the market after hyperthreading's insecurity (which requires disabling hyperthreading to fix) was made public don't actually have the claimed thread count — because hyperthreading has to be disabled because it's a fundamentally insecure implementation.

CPUs, including Intel CPUs, that don't have AVX-512 enabled or included aren't real CPUs. They simply don't have real cores. Real cores have to have AVX-512.

CPUs that don't have AVX-512 enabled or included aren't real CPUs if they came out after Intel's first consumer-grade CPU with AVX-512 hit the market. These imitation CPUs don't have real cores. Real cores have AVX-512.

CPUs that don't have the ability to process AVX-256 in a single cycle aren't real CPUs._

These are the kind of claims that fit right in with this lawsuit and how you see the FX 8150. They are arbitrary decisions made by the viewer, not backed by reality. (The only one of those claims with any teeth is the claim that Intel shouldn't be selling hyperthreading CPUs with an insecure implementation under the claimed thread count, unless it can be proven that Intel withheld knowledge of its hyperthreading insecurity for a number of years, in which case earlier CPUs would also be relevant. I am including this claim in my sample, though, to illustrate a few points that I think people can figure out on their own, given the context.)

The FPU count of a core is an arbitrary matter. Many CPUs have been sold that didn't even have FPUs. No CPUs have ever been sold that only have FPU cores, which is why counting the number of integer cores is non-arbitrary.


----------



## sutyi (Aug 28, 2019)

lynx29 said:


> tsk tsk AMD.



Bulldozer and its iterations throughout the years had modules with two pipelines in them, so technically they did sell people said amount of processing cores, albeit some of the front end and FP was shared.
Guess you can view these modules as double-wide INT pipelines and call these 4C/8T instead of 8C/8T, but that would not be completely true now, would it?

Did AMD lie about the architecture? Not really. Block diagrams show what's what.
Did AMD lie about core counts? Bit of a grey area, depending on how one views the architecture as a whole.
Was the Bulldozer uarch good? Hell no.
Does this entitle anyone to get money back 'cause they had chosen poorly even after the reviews? Not really, at least in my opinion.

But seemingly you can sue for money back if you band together with other people who bought into FX even after reading through the benchmarks, not once but even after years of Excavator and Piledriver reviews.
'Murica at work right there.


----------



## LDNL (Aug 28, 2019)

These class actions are a disgrace. Nobody wins except the lawyers. There should be a real punishment system to get companies to follow the rule of law. These "settlements" are not even a slap on the wrist.


----------



## RichF (Aug 28, 2019)

LDNL said:


> These class actions are a disgrace. Nobody wins except the lawyers. There should be a real punishment system to get companies to follow the rule of law. These "settlements" are not even a slap on the wrist.


You've already pointed to other people (not just lawyers) winning via these suits. Look at your post again. Who benefits from slaps on wrists? Not just lawyers. Lawyers are easy to blame but they're not the cause of the system of justice we have, certainly not the sole cause.

The system benefits the wealthy who don't care about the environment. In a nutshell.


----------



## Frick (Aug 28, 2019)

RichF said:


> _Intel's CPUs with hyperthreading enabled, since Nehalem, don't actually have the claimed thread count because hyperthreading has to be disabled because it's a fundamentally insecure implementation.
> 
> Intel's CPUs with hyperthreading enabled, on the market after hyperthreading's insecurity (which requires disabling hyperthreading to fix) was made public don't actually have the claimed thread count — because hyperthreading has to be disabled because it's a fundamentally insecure implementation.
> 
> ...



I like to define a core as something a real man uses. I has cores, but not many, but after my core enhancement surgery I’ll be loaded with the bastards.

Anyway. Yawn. I don’t agree with this. Also wasn’t there a time when Windows displayed the 8xxx cpu’s as quad cores with SMT?


----------



## Space Lynx (Aug 28, 2019)

sutyi said:


> Bulldozer and its iterations throughout the years had modules with two pipelines in them, so technically they did sell people said amount of processing cores, albeit some of the front end and FP was shared.
> Guess you can view these modules as double-wide INT pipelines and call these 4C/8T instead of 8C/8T, but that would not be completely true now, would it?
> 
> Did AMD lie about the architecture? Not really. Block diagrams show what's what.
> ...



You misunderstand, I actually don't care either way. I mean tsk tsk as in AMD's legal department wasn't smart enough to prevent this sort of thing to begin with. I'm sure Nvidia consults many lawyers over every tiny little detail, but even then they aren't perfect, aka the .5 GB of VRAM debate LOL, Nvidia lost that one in a lawsuit too.

these things are going to happen in this industry, they happen to all of them, I honestly don't think it has anything to do with deceit really.  /shrug most people go to work 9 to 5, they don't live and breathe this stuff like a lot of us here do... also /reddit showerthoughts - maybe these companies should hire us as consultants LOL


----------



## GreiverBlade (Aug 28, 2019)

biffzinker said:


> Always seen the first FX 8150 as nothing more than a Quad core with additional hardware to give the SMT extra punch when it could deliver.


But there were 8 physical cores... only the FP unit and scheduler were shared... so it was an octa-core no matter how you look at it. In terms of performance that gave it about 80% of an octa-core, so basically like a hexa-core but with 8 physical cores. Extra hardware cores are still cores.


----------



## FordGT90Concept (Aug 28, 2019)

This post...


FordGT90Concept said:


> How many "cores" do you see in this picture?
> 
> 
> 
> ...


…totally called it, especially that last sentence.

AMD could never win this argument without changing over a decade of precedents including by competitors like Sun, ARM, IBM, Intel, and even themselves (Athlon 64 X2).  "Integer core" got lost in marketing translation to become something it isn't, a "core."  If AMD accurately advertised the product as having "8 integer cores" this lawsuit would have never been filed.


----------



## seronx (Aug 28, 2019)

biffzinker said:


> Always seen the first FX 8150 as nothing more than a Quad core with additional hardware to give the SMT extra punch when it could deliver.


In which, you are wrong.


sutyi said:


> Bulldozer and its iterations throughout the years had modules with two pipelines in them, so technically they did sell people said amount of processing cores, albeit some of the front end and FP was shared.
> Guess you can view these modules as double wide INT pipelines and call these 4C/8T instead of 8C/8T, but that would not be completely true now would it?


Bulldozer and forwards are dual-core modules, or monolithic dual-cores without the glue. The front-end, floating-point unit, and L2 unit might be shared, but they are heavily optimized for dual-core(thus, dual-threaded) functionality.


----------



## FordGT90Concept (Aug 28, 2019)

seronx said:


> Bulldozer and forwards are dual-core modules, or monolithic dual-cores without the glue. The front-end, floating-point unit, and L2 unit might be shared, but they are heavily optimized for dual-core(thus, dual-threaded) functionality.


Nope, the "modules" are in fact dual-thread cores (see second point #5 in the quote below: Core Interface Unit).  Cores are holistic CPUs, which "integer cores" are not:


FordGT90Concept said:


> Maybe you can't, but I (and others) did, repeatedly. Have a recap:
> 1. Shared fetcher.
> 2. Depending on iteration, shared decoder.
> 3. Depending on iteration, shared dispatcher.
> ...


This is evidenced in various benchmarks that show under many load conditions, Bulldozer's performance plummets compared to Thuban cores (predating Bulldozer).  Either advertising on Thuban was wrong, or Bulldozer is wrong.  Thuban fit the mold of what is a core, Bulldozer does not; ergo, Bulldozer was misrepresented, not Thuban.


----------



## seronx (Aug 28, 2019)

FordGT90Concept said:


> Nope, the "modules" are in fact dual-thread cores.


Nope, the dual-core modules are in fact two single-thread cores.

=> Each core is dedicated to a thread and contains a four-wide integer execution unit, issue/retire logic, 16-KB four-way set associative L1 data cache, and a load/store unit.
Of which there are two.





FordGT90Concept said:


> Cores are holistic CPUs, which "integer cores" are not.  This is evidenced in various benchmarks that show under many load conditions, Bulldozer's performance plummets compared to Thuban cores (predating Bulldozer).  Either advertising on Thuban was wrong, or Bulldozer is wrong.  Thuban fit the mold of what is a core, Bulldozer does not; ergo, Bulldozer was misrepresented, not Thuban.


None of this proves what you have stated.



FordGT90Concept said:


> 1. The fetcher is incapable of saturating the ALUs in a lot of cases where it has to service both integer clusters. Thuban was able to in the same scenarios.
> 2. + 3. AMD chose to split the decoder and dispatcher for reasons revolving around power efficiency and performance.
> 4. AMD was really fixated on the idea that GPUs would take over FPU so, per thread, Bulldozer really offers no improvement over Thuban. Because collisions can happen, in practice it can be slower.



1. 4 macro-ops go to 4 ALUs+4 AGUs
2. The split decoder is 2 macro-ops for 2 ALUs+2 AGUs.
3. The FPU has 2x 128-bit FMACs + 2x 128-bit MMXs; 2x the FMA capability and 2x the FMISC capability compared to Thuban.


----------



## FordGT90Concept (Aug 28, 2019)

Cores share nothing other than memory ("core replication is obvious").  Bulldozer violates that by sharing fetchers, decoders, dispatchers, and Core Interface Unit.  That's shared logic, not just memory.


----------



## RichF (Aug 28, 2019)

Frick said:


> Anyway. Yawn. I don’t agree with this. Also wasn’t there a time when Windows displayed the 8xxx cpu’s as quad cores with SMT?


Windows has bugs, so you'll have to come up with a stronger argument. It's common for software, from CPU-Z to Sandra to anything else that displays spec info, to get those specs wrong until the software is fully updated.

Windows had never seen a CPU with Bulldozer's configuration before. There is also the matter of arbitrary/political choices made by coders and their managers.


FordGT90Concept said:


> Cores share nothing other than memory.


Citation please.


----------



## seronx (Aug 28, 2019)

FordGT90Concept said:


> Cores share nothing other than memory.  Bulldozer violates that by sharing fetchers, decoders, dispatchers, and Core Interface Unit.  That's shared logic, not just memory.


Um no.

A core only needs an instruction bus, a control unit, a datapath, and a data bus.



			https://img.techpowerup.org/190828/core-replication-is-obvious.png


----------



## PanicLake (Aug 28, 2019)

What a shitty money-grab lawsuit.


----------



## FordGT90Concept (Aug 28, 2019)

RichF said:


> Citation please.


It's all in this thread:








Bulldozer Core-Count Debate Comes Back to Haunt AMD (www.techpowerup.com): "AMD in 2012 launched the FX-8150, the "world's first 8-core desktop processor," or so it says on the literal tin. AMD achieved its core-count of 8 with an unconventional CPU core design. Its 8 cores are arranged in four sets of two cores each, called "modules." Each core has its own independent..."




I'm not going to reargue all this when it's been settled. 


Here's the specific quote though, just for you (page 11):


			http://accel.cs.vt.edu/files/lecture2.pdf


----------



## Deleted member 163934 (Aug 28, 2019)

RichF said:


> The FPU count of a core is an arbitrary matter. Many CPUs have been sold that didn't even have FPUs. No CPUs have ever been sold that only have FPU cores, which is why counting the number of integer cores is non-arbitrary.



When "Bulldozer" was released, how many PC CPUs (sold as new) had fewer FPU cores than integer cores?

The "Bulldozer" architecture was and still is a disgrace.
The FX-4100, running at 3.6 GHz (with boost to 3.8 GHz), is advertised as a 4-core CPU.
The Athlon II X4 640 runs at 3 GHz (no boost) and is a 4-core CPU.
The main problem is that in single-threaded performance the FX-4100 scales worse than the Athlon II X4 640, and in multi-threaded performance the FX-4100 is actually worse than the Athlon II X4 640.
So basically we had a newer architecture with worse performance per clock than the older architecture. This is something you see during the development phase. So what's the point of pushing a product onto the market that sucks compared to the older architecture?!?

Why AMD persisted with the Faildozer revisions is beyond my understanding. At that time the only real upgrade from a K10-family CPU was an Intel CPU (visible in AMD's shitty market share during the Faildozer era).

If it had been me making the call at AMD after seeing how crap the Faildozer architecture was, I would have looked at whether it was possible to get slightly higher clocks from the K10 family, add SSE 4.1/4.2 and AVX, and actually try to work on a new CPU architecture (trust me, without a single person who took part in the development of Faildozer).

Not really related to this topic: I kinda got tired and sick of people praising/defending AMD. They kinda forget that AMD actually wanted to sell Faildozer at prices that had nothing to do with the reality of that architecture; that on Windows AMD's drivers were and still are pure junk; that AMD decided to no longer release drivers for Win 8.1, an OS that still has more than 3 years to live; and that each Ryzen release has been full of problems (you kinda expect the 3rd iteration to be smooth, but not in AMD's case... most likely because they just rush products onto the market without real testing, something I said about AMD 7 years ago). I'm not bashing AMD, just pointing to problems that AMD just doesn't care about! I don't like Intel or Nvidia either; the deal is that Intel, Nvidia and AMD all use shady marketing techniques (people accused NVIDIA of crippling the performance of older architectures in drivers; if you think AMD is better, maybe you should check again, because AMD might have done something far worse).


----------



## FordGT90Concept (Aug 28, 2019)

thedukesd1 said:


> Why AMD insisted with the Faildozer revisions is beyond my understanding. At that time the only real solution to upgrade from a K10 family cpu was an Intel CPU.


Because of a paper written in the 2000s that suggested you could vastly improve integer performance with only a 10-15% increase in transistors by having more than one integer cluster per core.  AMD tried it and it didn't go over so great because software couldn't really make heads or tails of the thing.  Software was compiled for complete processors, not dual-thread, asymmetric processor designs.  Bulldozer could have been great if software was compiled to take advantage of it. ...but that doesn't really matter because AMD still lied about how many "cores" their Bulldozer products had.


----------



## seronx (Aug 28, 2019)

thedukesd1 said:


> When "Bulldozer" was released, how many PC CPUs (sold as new) had fewer FPU cores than integer cores?


Bulldozer's FPU is in fact two FPUs in one.  It is shared between the cores because that was a key advantage of a monolithic dual-core over a glued dual-core.


----------



## FordGT90Concept (Aug 28, 2019)

Do a 256-bit FMAC operation and the thread on the other integer cluster is stalled until it completes.  Impossible on a dual-core processor.


----------



## seronx (Aug 28, 2019)

FordGT90Concept said:


> Do a 256-bit FMAC operation and the thread on the other integer cluster is stalled until it completes.  Impossible on a dual-core processor.


None of the general-purpose cores get stalled by anything on the FPU core.  They are distinct from the FPU.


----------



## RichF (Aug 28, 2019)

thedukesd1 said:


> When "Bulldozer" was released, how many PC CPUs (sold as new) had fewer FPU cores than integer cores?
> 
> The "Bulldozer" architecture was and still is a disgrace.
> The FX-4100, running at 3.6 GHz (with boost to 3.8 GHz), is advertised as a 4-core CPU.
> ...


The performance of Bulldozer isn't relevant here.

The value of Bulldozer isn't relevant here.

How "shady" various corporations are isn't relevant here, with the possible exception of pointing out that corporations are shady by definition because profit is extracted by selling things for more than their worth. In order to do that, people must be tricked into parting with more of their money than they should.

What is relevant is the technical definition of a core. As I already posted, FPUs are not even required to have a CPU core. Please, at least, try to rebut what I've said instead of ignoring my arguments entirely. I've done more than just point out that FPUs aren't required.


----------



## Deleted member 163934 (Aug 28, 2019)

FordGT90Concept said:


> Software was compiled for complete processors, not dual-thread, asymmetric processor designs.  Bulldozer could have been great if software was compiled to take advantage of it. ...



Well, AMD at that point didn't have the highest market share in the PC CPU area. Also, when you develop a new CPU you need to make sure that older software, which might never see any optimization for your new CPU architecture, runs better on your new CPUs.
Both are important. The first because if you are the market leader, newer software will probably be optimized for your newer architecture, and maybe some older software will see optimizations too. The second because if older software runs like crap on your new CPU architecture, you might not sell that many CPUs.

By development I mean more than the CPU design phase; I also mean the sample testing phase. During sample testing it's impossible not to see the samples underperforming with existing software, and in that situation you need to go back to the design phase and fix what is wrong.
As the underdog, if you expect software to get optimized for your architecture then you are kinda suicidal. The software developers are not really gonna bother; they're gonna point out that their software works better on your older architecture and on your competition's CPUs, and they're not gonna waste their time with your architecture. As a result your CPUs are gonna perform badly and you are not gonna sell...


----------



## londiste (Aug 28, 2019)

RichF said:


> What is relevant is the technical definition of a core. As I already posted, FPUs are not even required to have a CPU core.


Technical definition of core includes or implies it being independent. Bulldozer modules are independent, Bulldozer cores are not.

If you want to read the discussion/argument going back and forth for several rounds, go through that thread. We probably won't be able to bring anything new to the table here.


FordGT90Concept said:


> It's all in this thread:
> 
> 
> 
> ...


----------



## RichF (Aug 28, 2019)

londiste said:


> Technical definition of core includes or implies it being independent.


So, a core is the entire chip. I've never seen that definition before.


----------



## londiste (Aug 28, 2019)

RichF said:


> So, a core is the entire chip. I've never seen that definition before.


In a multi-core processor, each core is effectively an independent CPU.

While wiki is not always the best source, its description is pretty accurate:


			
https://en.wikipedia.org/wiki/Multi-core_processor said:

> A multi-core processor is a computer processor integrated circuit with two or more separate processing units, called cores, each of which reads and executes program instructions, as if the computer had several processors.


----------



## FordGT90Concept (Aug 28, 2019)

seronx said:


> None of the general-purpose cores get stalled by anything on the FPU core.  They are distinct from the FPU.


You have two threads in the core: both threads have a 256-bit FMAC operation required before they can progress.  Both threads are stalled while the shared FPU executes the operations round-robin: only one thread per cycle is being worked on.  We see this a lot in Bulldozer multithreaded benchmarks, where it only performs at about 120% of a quad-core versus 200%.  Bulldozer is great at 7-zip because of the MIPS nature of 7-zip.  In most other tests, it looks comparable to Intel's Hyper-Threading (but slightly faster because it can simultaneously execute two integer operations).

Judging by performance, it's more like an SMT quad-core than a non-SMT octo-core.  Legit octo-cores (AMD and Intel alike) bitch slap Bulldozer.  ...and it's reflected in the increase of transistors too.
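The round-robin argument above can be put into a toy model. The Python sketch below is only an illustration of the claim as stated in this post, not a measurement or an accurate model of Bulldozer hardware: it assumes two threads that each retire one operation per cycle, that every `fp_every`-th operation is a wide FP operation, and that a shared FPU serves only one thread's wide operation per cycle. The function and parameter names are made up for this example.

```python
def cycles_to_finish(ops_per_thread, fp_every, shared_fpu):
    """Cycles needed for two threads to each retire `ops_per_thread` ops.

    Toy assumptions: one op per thread per cycle; every `fp_every`-th op
    is a wide FP op; with `shared_fpu` only one thread may execute a wide
    FP op in a given cycle, granted in round-robin order.
    """
    done = [0, 0]   # ops retired per thread
    turn = 0        # which thread wins the next FPU conflict
    cycles = 0
    while min(done) < ops_per_thread:
        cycles += 1
        # Snapshot which threads need the FPU this cycle.
        wants_fp = [
            done[t] < ops_per_thread and (done[t] + 1) % fp_every == 0
            for t in (0, 1)
        ]
        for t in (0, 1):
            if done[t] >= ops_per_thread:
                continue  # thread already finished
            if shared_fpu and wants_fp[t] and wants_fp[1 - t] and turn != t:
                continue  # stalled: the other thread holds the FPU
            done[t] += 1
        if shared_fpu and all(wants_fp):
            turn = 1 - turn  # alternate the winner after each conflict
    return cycles


def scaling(ops_per_thread, fp_every, shared_fpu):
    """Two-thread throughput relative to one thread running alone."""
    return 2 * ops_per_thread / cycles_to_finish(ops_per_thread, fp_every, shared_fpu)


print(scaling(100, 1, shared_fpu=False))  # separate FPUs, all-FP workload
print(scaling(100, 1, shared_fpu=True))   # shared FPU, all-FP workload
print(scaling(100, 2, shared_fpu=True))   # shared FPU, every 2nd op is FP
```

In this toy model an all-FP workload scales 2.0x with separate FPUs and 1.0x with the shared one, while mixed workloads land in between. Where real software falls depends on details the model leaves out.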


----------



## RichF (Aug 28, 2019)

londiste said:


> In a multi-core processor, each core is effectively an independent CPU.


But it's not independent. It shares resources unless it is two or more complete CPU chips on one die, where one can function with the others completely disabled. This independence, in fact, contradicts the concept of a "multi-core processor". Instead, it mandates that it be a multi-processor die.

So, I'm seeing a paradoxical claim. It's independent except that it's not.


----------



## londiste (Aug 28, 2019)

RichF said:


> But it's not independent. It shares resources unless it is two or more complete CPU chips on one die, where one can function with the others completely disabled. This independence, in fact, contradicts the concept of a "multi-core processor". Instead, it mandates that it be a multi-processor die.


What would you consider a CPU? What does a chip or a logic circuit have to do to be a CPU?



RichF said:


> So, I'm seeing a paradoxical claim. It's independent except that it's not.


CPU or processing unit as in the wiki sentence above is defined by performing instructions - in this case x86 instructions. CPU and core are defined as the same. CPU/core includes front-end, its execution resources along with control circuitry to manage it. Today L2 cache is usually included in the core while by academic definition it is outside it. Both by definition and in practice this bunch of things is independent.

Memory controller, higher level caches like L3 and newer additions to CPU die like IO are not core functions.

In a multicore CPU, one core can be disabled completely and remaining cores will work as expected. This has always been the case, in practice this is usually implemented with a setting in BIOS.


----------



## RichF (Aug 28, 2019)

londiste said:


> What would you consider a CPU? What does a chip or a logic circuit have to do to be a CPU?


I'm merely pointing out that there are problems with the independence angle. Taken to its full extent, it ends up with the notion that multi-core CPUs have to have fully independent CPUs, more than one, on a single die.


----------



## Deleted member 163934 (Aug 28, 2019)

RichF said:


> What is relevant is the technical definition of a core. As I already posted, FPUs are not even required to have a CPU core. Please, at least, try to rebut what I've said instead of ignoring my arguments entirely. I've done more than just point out that FPUs aren't required.



The industry (AMD included) for years made it like this: 1 core = 1 FPU.

OK. At Bulldozer's release date, what could you use a CPU for that totally lacked FPUs (and also had nothing to emulate an FPU)?
By what you are writing, you are basically saying that the FPU count doesn't matter. So I decided to push it to the limit and ask you what use a CPU that totally lacked FPUs had at Bulldozer's release date.


----------



## londiste (Aug 28, 2019)

RichF said:


> I'm merely pointing out that there are problems with the independence angle. Taken to its full extent, it ends up with the notion that multi-core CPUs have to have fully independent CPUs, more than one, on a single die.


This notion is completely accurate.


----------



## RichF (Aug 28, 2019)

thedukesd1 said:


> The industry (AMD included) for years made it like this: 1 core = 1 FPU.
> 
> OK. At Bulldozer's release date, what could you use a CPU for that totally lacked FPUs (and also had nothing to emulate an FPU)?
> By what you are writing, you are basically saying that the FPU count doesn't matter. So I decided to push it to the limit and ask you what use a CPU that totally lacked FPUs had at Bulldozer's release date.


Many CPUs had been sold without FPUs. There may have even been an Alpha at the time, shortly before, or after. It doesn't matter. FPUs were never a requirement to have a CPU.

Your argument is like saying that RAM wasn't really RAM until DDR. Doubling the data rate doesn't make RAM into RAM. It merely makes it DDR. An FPU, similarly, is an addition upon the basic spec. FPUs could be eliminated again from CPUs and emulated in software. The performance would be bad for FPU-dependent processing but it would still run. Software FPU was used for many years. The FPU is a superset of the CPU core.


londiste said:


> This notion is completely accurate.


Do you have a better citation than a wiki? The last time I checked, multi-core CPUs could share things, like a common pool of cache and cannot operate with some of the things they share disabled.

Besides, if you believe it's accurate then you really shouldn't use the terminology "multi-core CPU". Instead, you should use the terminology "multi-CPU die". This is because a core is a subset of a CPU. In multi-core CPUs in particular there is the expectation of resource sharing. That is what separates "core" from "cpu".


----------



## londiste (Aug 28, 2019)

RichF said:


> Do you have a better citation than a wiki? The last time I checked, multi-core CPUs could share things, like the way Broadwell C's cores shared the L4 cache.


I don't have good academic sources handy to reference. The basics should be the same in course materials, for example:








Central Processing Unit (CPU) | What, Definition & Summary (teachcomputerscience.com): "Candidates should be able to: state the purpose of the CPU describe the function of the CPU as fetching and executing instructions stored in memory explain how common characteristics of CPUs such as clock speed, cache size and number of cores affect their performance. What is the purpose and..."




Using both AMD's Zen and Intel's Skylake block diagrams from WikiChip as an example here. Check the SoC (entire die) diagram as well as the core parts that follow:








Zen - Microarchitectures - AMD - WikiChip (en.wikichip.org): "Zen (family 17h) is the microarchitecture developed by AMD as a successor to both Excavator and Puma. Zen is an entirely new design, built from the ground up for optimal balance of performance and power capable of covering the entire computing spectrum from fanless notebooks to high-performance..."

Skylake (client) - Microarchitectures - Intel - WikiChip (en.wikichip.org): "Skylake (SKL) Client Configuration is Intel's successor to Broadwell, a 14 nm process microarchitecture for mainstream workstations, desktops, and mobile devices. Skylake succeeded the short-lived Broadwell which experienced severe delays. Skylake is the 'Architecture' phase as part of Intel's..."




Any one core of these can be disabled independently, both in theory and in practice.
IO/Memory controllers, L3 cache and Infinity Fabric/Ring Bus are not part of the core.


----------



## FordGT90Concept (Aug 28, 2019)

RichF said:


> But it's not independent. It shares resources unless it is two or more complete CPU chips on one die, where one can function with the others completely disabled. This independence, in fact, contradicts the concept of a "multi-core processor". Instead, it mandates that it be a multi-processor die.
> 
> So, I'm seeing a paradoxical claim. It's independent except that it's not.


They are independent.  Each core fetches instructions, decodes them, executes them, and stuffs the result back into the memory where it can be used again.

In Bulldozer, there's actually four instruction fetchers: one for the core, one for integer cluster 1, one for integer cluster 2, and one for the floating point cluster.  As far as software is concerned, it only sees the first: the one for the core.  This is why Microsoft had a hell of a time trying to get Windows thread dispatching to work right.  Windows had to be modified to more intelligently control threading so it wouldn't inadvertently move threads around in a manner that would overwhelm one core while leaving another idle.  Older versions of Windows (I think it was 7) addressed this by making each thread (associated with an integer cluster) a "processor."  This solved the problem of Windows shifting threads around but it created a problem of overwhelming the floating point cluster because it couldn't appropriately load balance integer and floating point!

Years passed and in Windows 10 (I think 8 too), Microsoft finally tackled the problem by making the Windows thread scheduler aware of physical processors and logical processors.  Because of this, Windows can now appropriately delegate floating point (by physical processor) and integer (by logical processors).  Why?  Because it's a relevant problem for all SMT implementations.  Ryzen, Core i#, and Pentium 4 w/ HT only have one integer cluster and one floating point cluster each but they can accept two threads.  The underlying hardware has to be managed in a similar fashion to Bulldozer's.  The only difference is that tiny little detail of Bulldozer having two integer clusters instead of one.

FX-8350 has four cores, each core has two integer clusters.  They are not equivalent.  A core knows how to do SIMD (single instruction multiple data); an integer cluster does not.  Integer clusters are fundamentally calculators, not processors.  They lack awareness (have no knowledge of parallelism), logic (Boolean tests, branching, etc.), and access (only have the data they are given) to be processors.


----------



## RichF (Aug 28, 2019)

londiste said:


> I don't have good academic sources handy to reference. The basics should be the same in course materials, for example:
> 
> 
> 
> ...


Zen is a more Intel-like design than Bulldozer. Pairing it with Skylake doesn't resolve this issue.

As for the claim that Bulldozer doesn't function with intra-modular cores disabled... I guess you're unfamiliar with BIOS settings that can do that. I've run both the 8320E and 8370E with 4 cores via the _1 integer core per module_ setting.


----------



## londiste (Aug 28, 2019)

RichF said:


> Zen is a more Intel-like design than Bulldozer. Pairing it with Skylake doesn't resolve this issue.


Zen and Skylake are by-the-book multicore CPUs. So have been the rest of x86 CPUs including AMD's K8 and K10.


----------



## FordGT90Concept (Aug 28, 2019)

RichF said:


> As for the claim that Bulldozer doesn't function with intra-modular cores disabled... I guess you're unfamiliar with BIOS settings that can do that. I've run both the 8320E and 8370E with 4 cores via the _1 integer core per module _setting.


All that does is tell the core fetcher to only accept one thread instead of two.  Fundamentally no different than disabling Hyperthreading.


----------



## RichF (Aug 28, 2019)

londiste said:


> Zen and Skylake are by-the-book multicore CPUs. So have been the rest of x86 CPUs including AMD's K8 and K10.


By Intel's book.

Diverging from common design doesn't mean lawsuit. 


FordGT90Concept said:


> All that does is tell the core fetcher to only accept one thread instead of two.  Fundamentally no different than disabling Hyperthreading.


Citation please. Everything I've read said FX CPUs' CMT is not SMT. I am also interested in how that BIOS setting works, considering that you are saying it's mislabeled by Gigabyte.


----------



## londiste (Aug 28, 2019)

FordGT90Concept said:


> In Bulldozer, there's actually four instruction fetchers: one for the core, one for integer cluster 1, one for integer cluster 2, and one for the floating point cluster.  As far as software is concerned, it only sees the first: the one for the core.  This is why Microsoft had a hell of a time trying to get Windows thread dispatching to work right.  Windows had to be modified to more intelligently control threading so it wouldn't inadvertently move threads around in a manner that would overwhelm one core while leaving another idle.  Older versions of Windows (I think it was 7) addressed this by making each thread (associated with an integer cluster) a "processor."  This solved the problem of Windows shifting threads around but it created a problem of overwhelming the floating point cluster because it couldn't appropriately load balance integer and floating point!


There is only one fetch, this is in the frontend of the Bulldozer module. There are multiple schedulers for dispatched micro-ops owing to split execution stage. By the way, Thuban and Zen also have split execution stage and separate schedulers for Integer and FPU clusters.

Microsoft's problem with scheduling was strange. The eventual fix was a change in how a Bulldozer CPU was being handled. Initially Bulldozer was treated as full 8-core processor and situation improved considerably when it started to be treated as 4-core with SMT (it is noteworthy that Linux did the same much sooner). This inherently addressed both problems plaguing scheduling:
- Moving threads around to undesired cores. Simple example is a second core in single module where first core is already loaded.
- Because of the same reason any FPU-heavy load was now more likely to go to unused module largely negating the shared FPU issue.
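
The placement difference described in the two points above can be sketched roughly as follows. This is a toy model, not the actual Windows or Linux scheduler: the `place_threads` and `modules_in_use` helpers and the (module, core) numbering are assumptions made purely for illustration.

```python
def place_threads(num_threads, modules=4, cores_per_module=2, module_aware=True):
    """Return a list of (module, core) slots, one per thread.

    Toy model only. module_aware=False fills cores in numeric order
    (the "full 8-core" view); module_aware=True uses one core in every
    module before doubling up (the "4-core with SMT" view).
    """
    if num_threads > modules * cores_per_module:
        raise ValueError("more threads than cores")
    if module_aware:
        # Visit core 0 of every module before any core 1.
        order = [(m, c) for c in range(cores_per_module) for m in range(modules)]
    else:
        # Visit both cores of module 0, then module 1, and so on.
        order = [(m, c) for m in range(modules) for c in range(cores_per_module)]
    return order[:num_threads]


def modules_in_use(placement):
    """Count how many distinct modules a placement touches."""
    return len({module for module, _ in placement})


naive = place_threads(4, module_aware=False)
aware = place_threads(4, module_aware=True)
print(modules_in_use(naive))  # 2: four threads share two front ends and two FPUs
print(modules_in_use(aware))  # 4: every thread gets a module to itself
```

With four threads, the module-aware ordering leaves no thread sharing a front end or an FPU, which is the effect both list items describe.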


----------



## PerfectWave (Aug 28, 2019)

lawsuit for just 35 bucks each lol


----------



## seronx (Aug 28, 2019)

FordGT90Concept said:


> You have two threads in the core: both threads have a 256-bit FMAC operation required before they can progress.  Both threads are stalled while the shared FPU round robin executes the operations: only one thread per cycle is being worked on.


Both threads are not stalled if both cores schedule a 256-bit op.
2x 80-bit Lo
2x 64-bit Hi
2x 128-bit Mid

For a 256-bit op, it would execute Lo twice for 128-bit and Hi twice for 128-bit on a single port for each thread. Lo to Hi register moves are 1-cycle, Lo(Hi) to Lo(Hi) register moves are 0-cycle.  So, if the second thread is dependent on the first thread it could execute the first half on both. etc, etc, etc.

The FPU design is built with two cores in mind.  The front-end is built with two cores in mind.  The L2+interface is built with two cores in mind.  There are physically only two cores in the Bulldozer module, as it is the world's first monolithic dual-core x86 architecture.


----------



## RichF (Aug 28, 2019)

londiste said:


> Initially Bulldozer was treated as full 8-core processor and situation improved considerably when it started to be treated as 4-core with SMT


Zen 2 also performs better with AVX-256 than Zen 1 because it can execute 256-bit instead of combining 128s. Does that mean Zen 1 didn't have any real cores in it?

How the Windows scheduler acts with a design it wasn't made for is an interesting topic but hardly proof.


----------



## Deleted member 163934 (Aug 28, 2019)

RichF said:


> Many CPUs had been sold without FPUs. There may have even been an Alpha at the time, shortly before, or after. It doesn't matter. FPUs were never a requirement to have a CPU.
> 
> Your argument is like saying that RAM wasn't really RAM until DDR. Doubling the data rate doesn't make RAM into RAM. It merely makes it DDR. An FPU, similarly, is an addition upon the basic spec. FPUs could be eliminated again from CPUs and emulated in software. The performance would be bad for FPU-dependent processing but it would still run. Software FPU was used for many years. The FPU is a superset of the CPU core.



The FPU has become part of a core because it started to be needed often and because emulation sucks in terms of performance.
We are not living in the days when code was properly optimized. Today we are living in the days when "buy a better CPU", "buy more RAM", etc. is normal...

If I code something and I know for sure that whoever is leading keeps changing his/her mind regarding what he/she wants, then you can bet I will basically abuse FPU usage just so I don't have to go back and change things. You can say it's my fault; I will say it's the fault of whoever is leading, because he/she clearly has no clear goals in mind.
And trust me, the "insert bad words" leader who keeps changing his/her mind regarding what he/she wants also wants things done fast, totally ignoring the fact that his/her instability is making things slower and lacking optimization.

There is not going to be a real difference between 4 or 8 heavy FPU threads run on the 8150. This is a reason to raise the question of whether the 8150 is an 8-core or a 4-core. Someone might not even say what type of threads they are pushing on the 8150, and just make such a benchmark and show that there is basically no real difference between 4 and 8 threads run on the 8150. Do I have to say what type of threads I'm pushing on a CPU? I think not.


----------



## RichF (Aug 28, 2019)

thedukesd1 said:


> The FPU has become part of a core because it started to be need often and because emulation sucks in term of performance.


1) FPU emulation has always been vastly slower than having a hardware FPU. So, pointless.

2) I'll repeat my question: _Zen 2 also performs better with AVX-256 than Zen 1 because it can execute 256-bit instead of combining 128s. Does that mean Zen 1 didn't have any real cores in it?_

Zen 1 can't execute 256-bit AVX independently. It has to combine at the 128-bit level. So, it doesn't have any real cores, eh? Not only is it slower at doing 256-bit AVX, it can't do it independently.
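
As a rough sketch of what "combining at the 128-bit level" means, the example below models a 256-bit add over eight 32-bit lanes two ways: as one full-width operation and as two 128-bit halves. The function names and the operation counter are hypothetical and only illustrate the idea; they do not describe the real Zen 1 or Zen 2 pipelines.

```python
LANES_256 = 8  # eight 32-bit lanes in a 256-bit vector
LANES_128 = 4  # four 32-bit lanes in a 128-bit vector


def add_native_256(a, b):
    """Model a unit that handles all eight lanes in one internal operation."""
    assert len(a) == len(b) == LANES_256
    return [x + y for x, y in zip(a, b)], 1  # (result, internal ops)


def add_split_128(a, b):
    """Model a unit that processes the same vector as two 128-bit halves."""
    assert len(a) == len(b) == LANES_256
    result, ops = [], 0
    for start in range(0, LANES_256, LANES_128):
        half_a = a[start:start + LANES_128]
        half_b = b[start:start + LANES_128]
        result.extend(x + y for x, y in zip(half_a, half_b))
        ops += 1  # one internal operation per 128-bit half
    return result, ops


a = list(range(8))
b = [10] * 8
native_result, native_ops = add_native_256(a, b)
split_result, split_ops = add_split_128(a, b)
print(native_result == split_result)  # True: same architectural result
print(native_ops, split_ops)          # 1 2: only the internal work differs
```

Under this model the software-visible result is identical either way, which is the distinction the question turns on: the width of the execution hardware affects speed, not whether the instruction can be carried out.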


----------



## londiste (Aug 28, 2019)

RichF said:


> Everything I've read said FX CPUs' CMT is not SMT.


CMT is a little bit of this, a little bit of that. SMT implies there are no added execution resources for additional threads. AMD added an Integer Cluster for CMT. Aside from that they are very similar.

Keep in mind that added Integer Cluster does not necessarily mean a huge boost in execution resources. Bulldozer Integer Clusters contain 4 pipes each (2 ALU, 2 AGU) and FPU contains 3 pipes (2 FMAC + MMX). At the same time, Zen's Integer Cluster has 6 pipes (4 ALU, 2 AGU) and FPU has 3 pipes (2 FMAC + MMX).
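
The arithmetic behind that comparison can be laid out as a small sketch. The pipe counts are taken exactly as quoted in this post and are not independently verified; the dictionary layout and helper names are assumptions for illustration.

```python
# Pipe counts exactly as quoted in the post above (not independently verified).
bulldozer_module = {"integer_clusters": 2, "pipes_per_cluster": 4, "fpu_pipes": 3}
zen_core = {"integer_clusters": 1, "pipes_per_cluster": 6, "fpu_pipes": 3}


def integer_pipes(unit):
    """Total integer pipes across all integer clusters in the unit."""
    return unit["integer_clusters"] * unit["pipes_per_cluster"]


def total_pipes(unit):
    """Integer pipes plus FPU pipes for one module or core."""
    return integer_pipes(unit) + unit["fpu_pipes"]


for name, unit in (("Bulldozer module", bulldozer_module), ("Zen core", zen_core)):
    print(name, integer_pipes(unit), total_pipes(unit))
# Bulldozer module 8 11
# Zen core 6 9
```

On these figures a two-thread Bulldozer module has 8 integer pipes against 6 in a two-thread Zen core, so the second integer cluster adds two pipes over the later design rather than doubling anything.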


----------



## xtreemchaos (Aug 28, 2019)

I've a Bulldozer and 2 Piledrivers which I love. I've moved on now but still keep them for old times' sake. To tell the truth, it doesn't mean much to me if they're 4 or 8 core; they're a part of my life, I enjoyed them, and they made me happy.


----------



## RichF (Aug 28, 2019)

londiste said:


> CMT is a little bit of this, a little bit of that. SMT implies there are no added execution resources for additional threads. AMD added an Integer Cluster for CMT. Aside from that they are very similar.


To many consumers, SMT is also very similar to a real core. To others it's not. This lawsuit is about consumer perception, apparently. A man is upset that his CPU wasn't as fast as he expected it to be.



londiste said:


> Keep in mind that added Integer Cluster does not necessarily mean a huge boost in execution resources. Bulldozer Integer Clusters contain 4 pipes each (2 ALU, 2 AGU) and FPU contains 3 pipes (2 FMAC + MMX). At the same time, Zen's Integer Cluster has 6 pipes (4 ALU, 2 AGU) and FPU has 3 pipes (2 FMAC + MMX).


Zen is also a much more recent CPU that has the benefit of a smaller node for a larger transistor budget (as well as the talent of Keller and 20/20 hindsight vision). AMD made decisions with Bulldozer that resulted in weak cores. Weak cores don't mean justified lawsuit.


----------



## 64K (Aug 28, 2019)

PerfectWave said:


> law suite for just 35 bucks each lol



Well, Nvidia settled the GTX 970 class action lawsuit against them for $30 each.


----------



## RichF (Aug 28, 2019)

64K said:


> Well, Nvidia settled the GTX 970 class action lawsuit against them for $30 each.


That was a cut-and-dried case, by comparison. The 512 MB partition had abysmal performance, nowhere near the level of GDDR5. Plus, to top it off, it came with XOR contention.

Selling a "4 GB" card with 512 MB of its VRAM being sneaky hidden _slug RAM_ is fraud, pure and simple. This is especially true given the context of marketing it as a 4 GB part when 970 SLI was a big thing because of the high cost of the 4 GB 980.

Plenty of people felt duped because they would have more strongly considered a 980, gone AMD, or waited for something else before spending their money. But, hey, look at the great deal on the 970 in SLI — it has the same amount of VRAM as the expensive 980! Oops.

I remember how Witcher 3's visuals were downgraded to fit into the 970's 3.5 GB. Nvidia is powerful, indeed.


----------



## Deleted member 163934 (Aug 28, 2019)

RichF said:


> 1) FPU emulation has always been vastly slower than having a hardware FPU. So, pointless.
> 
> 2) I'll repeat my question: _Zen 2 also performs better with AVX-256 than Zen 1 because it can execute 256-bit instead of combining 128s. Does that mean Zen 1 didn't have any real cores in it?_
> 
> Zen 1 can't execute 256-bit AVX independently. It has to combine at the 128-bit level. So, it doesn't have any real cores, eh? Not only is it slower at doing 256-bit AVX, it can't do it independently.



Let's just say I have a workload that I can easily split evenly into 8 threads or 4 threads; it's a finite workload. Splitting it into 8 threads should finish it considerably faster than 4 threads.
If I do this with 8 FPU-only threads and 4 FPU-only threads on the 8150, the workload is going to finish in a similar amount of time in both cases. The main problem is that for a real 8-core CPU this is not what I would expect. If I don't say what type of threads I used (because I don't have to: you advertised an 8-core, 8-thread CPU, so I can put whatever type of threads I want on that CPU), then anyone who sees the result will start to wonder why a properly thread-split workload is showing no improvement at all when using 8 threads compared to 4.

You are going to talk about "faster" again. And you are wrong. (If the 8 and 4 threads have nothing to do with the FPU, the 8150 starts to behave as expected...)

If the 8150 had been advertised as an 8-core, 4-thread CPU, I wouldn't have had anything to say. (I would wonder why you need 2 cores for 1 thread and how exactly you split that 1 thread internally between the 2 cores...)
You are free to point to a CPU advertised as having more cores than threads. I'm not aware of such a case.
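
The scaling argument can be put into a toy model. It assumes, as the post does, that a single FPU-only thread saturates its module's shared FPU; real hardware would not behave this cleanly, and the `finish_time` helper and the work-unit numbers are made up for illustration.

```python
def finish_time(work_units, threads, execution_units):
    """Time to finish evenly split work when at most `execution_units`
    threads can make progress at once.

    Toy model: each busy execution unit retires one work unit per time unit.
    """
    busy = min(threads, execution_units)
    return work_units / busy


WORK = 800
INTEGER_CLUSTERS = 8  # one per advertised core
SHARED_FPUS = 4       # one per module

for threads in (4, 8):
    integer_time = finish_time(WORK, threads, INTEGER_CLUSTERS)
    fpu_time = finish_time(WORK, threads, SHARED_FPUS)
    print(threads, integer_time, fpu_time)
# 4 200.0 200.0
# 8 100.0 200.0
```

In this model the integer-only workload halves its run time when going from 4 to 8 threads, while the FPU-only workload does not improve at all, which is the benchmark outcome described above.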


----------



## 64K (Aug 28, 2019)

RichF said:


> That was a cut-and-dried case, by comparison. The 512 MB partition had abysmal performance, nowhere near the level of GDDR5. Plus, to top it off, it came with XOR contention.
> 
> Selling a "4 GB" card with 512 MB of its VRAM being sneaky hidden _slug RAM_ is fraud, pure and simple. This is especially true given the context of marketing it as a 4 GB part when 970 SLI was a big thing because of the high cost of the 4 GB 980.
> 
> Plenty of people felt duped because they would have more strongly considered a 980, gone AMD, or waited for something else before spending their money. But, hey, look at the great deal on the 970 in SLI — it has the same amount of VRAM as the expensive 980! Oops.



I wasn't using the GTX 970 as a comparison to AMD's case. I was just pointing out to PerfectWave that these mini settlements ($35) aren't unusual. I doubt anyone expected to get compensated the full amount that they paid for the Bulldozer CPUs.


----------



## RichF (Aug 28, 2019)

64K said:


> I wasn't usisng the GTX 970 as a comparison to AMD's case. I was just pointing out to PerfectWave that these mini settlements ($35) aren't unusual. I doubt anyone expected to get compensated the full amount that they paid for the Bulldozer CPUs.


Ok. The comparison is worth making.

Another point about it, too: The 970's VRAM issue wasn't even apparent initially. By contrast, anyone who had bothered to read reviews from legitimate sites would have seen that Bulldozer wasn't delivering (and Piledriver, although, at least, it was very cheap near the end). 970 buyers truly were tricked. Bulldozer and Piledriver buyers were either too impatient (pre-orders) or too lazy (didn't check reviews) and deserve the product they received. AMD was under no legal obligation to make its cores as powerful or more powerful when compared with Intel's.

It's a good thing for Intel that this guy didn't buy an Atom. It wasn't out-of-order, after all. He would have sued on the basis that any modern CPU is expected to have out-of-order design for performance since in-order hadn't been on the market since the Pentium 1. Of course, there's also the bait-and-switch of the chipset using far more power than the CPU, negating the justification for the performance-killing in-order design to conserve power.


----------



## FordGT90Concept (Aug 28, 2019)

RichF said:


> Citation please. Everything I've read said FX CPUs' CMT is not SMT. I am also interested in how that BIOS setting works, considering that you are saying it's mislabeled by Gigabyte.







CPU Instructions | pclt.sites.yale.edu
				





> The CPU has to fetch the data, then turn the instruction over to one of the processing units.


CPU = core
processing units = integer cluster


----------



## londiste (Aug 28, 2019)

FordGT90Concept said:


> CPU = core
> processing units = integer cluster


Incorrect. At least from CS point of view.
Integer cluster is execution unit.


----------



## FordGT90Concept (Aug 28, 2019)

Execution units are inside the integer clusters.

1+ CPU -> 1+ core -> 1+ integer cluster -> 1+ execution unit
FX-8350 -> 4 core -> 8 integer cluster -> 64 execution unit (don't quote me on that, fast research)
R7 1800 -> 8 core -> 8 integer cluster -> 48 execution unit (again, fast research)


----------



## RichF (Aug 28, 2019)

FordGT90Concept said:


> __
> 
> 
> 
> ...


And yet Gigabyte didn't label the BIOS setting "One thread per core". Differences of opinion apparently surround this hardware all over tech.


----------



## FordGT90Concept (Aug 28, 2019)

I'm not talking about Gigabyte at all.

I just stated a simple fact that as far as software is concerned, they're the same.  Each core is either given one thread or two.  It's no more complicated than that without drilling deep into the hardware design.


----------



## Vya Domus (Aug 28, 2019)

FordGT90Concept said:


> Do a 256-bit FMAC operation and the thread on the other integer cluster is stalled until it completes.  Impossible on a dual-core processor.



How can you still go back to this, has it not been made clear to you that this has nothing to do with whether or not something is a single or dual core processor ? In what world is it required for a core to do some arbitrary 256-bit instruction ?



FordGT90Concept said:


> Cores share nothing other than memory ("core replication is obvious").  Bulldozer violates that by sharing fetchers, decoders, dispatchers, and Core Interface Unit.  That's shared logic, not just memory.



You have been proven wrong many times on this, cores share caches, crossbars, memory controllers, and even other DSP like units without which they could not function independently.


----------



## RichF (Aug 28, 2019)

FordGT90Concept said:


> I'm not talking about Gigabyte at all.


You cited the words of a software engineer from Yale. I am pointing out that other software engineers may see things differently.

Unrelated to this post but related to the previous one about Atom... In the case of that bait-and-switch, it actually was worse for the shopper if one had done his/her homework early-on because there was a lot of hype around how Intel's engineers had worked so hard with a pure philosophy of only adding functionality if it could be put in without using too much power. It was very dreamy in its presentation, at least that was the intent.

The cold reality was that the chipset was a terrible pig that made the entire thing into a sad joke. Despite that, the hype machine was in full force and netbooks with this terrible tech sold very well as a result.


----------



## londiste (Aug 28, 2019)

RichF said:


> As for the claim that Bulldozer doesn't function with intra-modular cores disabled... I guess you're unfamiliar with BIOS settings that can do that. I've run both the 8320E and 8370E with 4 cores via the _1 integer core per module _setting.


'1 integer core per module' is honestly a straightforward setting and equates directly to disabling HT or SMT. In this case what it does is disabling CMT.

BIOS settings are sometimes a bit of smoke and mirrors. In the long thread there were examples of settings that allow disabling each core separately, which is technically impossible. I mean, you can disable part of a module (Integer Cluster) in Bulldozer and the rest will be a functional core. But you cannot disable anything else without compromising core functionality. This goes back to the problem of not being independent.


----------



## Mephis (Aug 28, 2019)

Cores are like porn. I know one when I see one.


----------



## FordGT90Concept (Aug 28, 2019)

Vya Domus said:


> In what world is it required for a core to do some arbitrary 256-bit instruction ?


It's not the type of operation that matters, it's the fact that it blocks a parallel thread.  Dual cores can't block because they're separate processors.  A single core can block when there are shared logic circuits (e.g. Hyperthreading and Bulldozer's design).



RichF said:


> You cited the words of a software engineer from Yale. I am pointing out that other software engineers may see things differently.


We're talking about truth in advertising.  Your average Tom, Dick, and Harry needs to understand it, not engineers.  AMD made a critical mistake by omitting "integer" on their products.


----------



## ProPain (Aug 28, 2019)

Big companies like AMD should learn from their mistakes .... otherwise CLASS ACTION LAWSUITS are a good reminder for them ....

AMD - will stop false marketing - and I like that !


----------



## RichF (Aug 28, 2019)

Mephis said:


> Cores are like porn. I know one when I see one.


That is actually one of the worst judicial atrocities in modern history.

Of course, being utterly corrupt, it's widely praised and influential to this day.


FordGT90Concept said:


> We're talking about truth in advertising.  Your average Tom, Dick, and Harry needs to understand it, not engineers.  AMD made a critical mistake by omitting "integer" on their products.


Too vague a statement to be useful here. One guy's problem with AMD's marketing managed to get some success in the courts. That's proof of very little.

Atom is actually the bigger fraud but since it was backed by Intel and not presented as a high-performance architecture people were more content to be duped. Being duped by the illusion of some grand power-conservation _Zen_ magic is more tolerable, I guess — despite the plethora of netbooks that filled stores and which vanished like the mighty dinosaur in a comparative flash.

Atom would have been lame with a good chipset but it would have been lame on its own terms, not false pretenses. At least it would have been quieter, with more battery life, and with less heat.


----------



## FordGT90Concept (Aug 28, 2019)

RichF said:


> Too vague a statement to be useful here. One guy's problem with AMD's marketing managed to get some success in the courts. That's proof of very little.


What's vague about this?




AMD screamed the lie from the top of the mountains.

New processors only have the core/thread count on the tiny little sticker sealing the box now. 

If they just snuck the little word "integer" in there, this would have never happened because the packaging is truthful.  But no.  They had to make their product stand out...


----------



## RichF (Aug 28, 2019)

You're back to recycling the whole FPU thing? Are we doing the time warp again?

(makes Magenta face)

quote: _"AMD made a critical mistake by omitting 'integer' on their products."_


----------



## FordGT90Concept (Aug 28, 2019)

No, this is what an 8 core looks like:




Not this:




This guy is many billion transistors short of what it claims to be.


----------



## Vya Domus (Aug 28, 2019)

FordGT90Concept said:


> Dual cores can't block because they're separate processors.



They can totally block each other and they do it all the time, for example when two threads from different cores try to write to the same cache line of say the L3 cache which is shared.

There isn't a single multi-core processor on the planet that can truly operate 100% independently and that doesn't share anything. This notion of separate processors does not exist in this context, stop using it.

The only truly separate processors are the ones on different boards. End of story.


----------



## londiste (Aug 28, 2019)

L3 cache is not core.


----------



## RichF (Aug 28, 2019)

So, a wafer-scale processor could have 3,000 "quasi-cores" but it's _a single-core chip_ if those "quasi-cores" share things other than cache?


----------



## FordGT90Concept (Aug 28, 2019)

Vya Domus said:


> They can totally block each other and they do it all the time, for example when two threads from different cores try to write to the same cache line of say the L3 cache which is shared.


That's a bandwidth issue, not a logic issue.  Whenever two threads require the same data, one is going to have to take precedence over the other.  In the case of Bulldozer, the data can be completely unrelated and the bandwidth available, but still collide because they are not independent processors.



RichF said:


> So, a wafer-scale processor could have 3,000 "quasi-cores" but it's _a single-core chip_ if those "quasi-cores" share things other than cache?


Yes, that's pretty much what a GPU is.


----------



## londiste (Aug 28, 2019)

RichF said:


> So, a wafer-scale processor could have 3,000 "quasi-cores" but it's _a single-core chip_ if those "quasi-cores" share things other than cache?


If these quasi-cores share a frontend - yes, that would be a single-core chip.


----------



## RichF (Aug 28, 2019)

londiste said:


> If these quasi-cores share a frontend - yes, that would be a single-core chip.


At which point we are in Neverland, where reality doesn't have anything to do with pedantry.


FordGT90Concept said:


> Yes, that's pretty much what a GPU is.


Funny, then, how the industry doesn't refer to GPUs as single core. AMD, for example, has been counting cores for many years. Ooo... he can sue them over that, too!


----------



## Vya Domus (Aug 28, 2019)

londiste said:


> L3 cache is not core.



Who gets to say this ? L1 and L2 didn't used to be core, they had their own sockets on the boards. FPUs didn't used to be core, they too had their own sockets on the board.

Do you see where this goes ? No matter where you go, if you try and reduce this problem you always end up in the same place: the definition of a core is archaic and it is no longer relevant in modern times.



FordGT90Concept said:


> That's a bandwidth issue, not a logic issue.



It has nothing to do with bandwidth, two entities try to write to the same addresses.

Can I write to it? Yes or no. It's a logic issue.
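
A minimal sketch of that yes/no decision, assuming a single shared cache line with exclusive write ownership. This is not a real coherence protocol such as MESI, and the `CacheLine` class with its methods is hypothetical; it only shows that the second writer is refused regardless of how much bandwidth is free.

```python
class CacheLine:
    """Toy model of a shared cache line with exclusive write ownership."""

    def __init__(self):
        self.owner = None  # core id currently allowed to write, or None
        self.value = 0

    def try_write(self, core_id, value):
        """Succeed only if the line is free or already owned by core_id."""
        if self.owner is not None and self.owner != core_id:
            return False  # the other core has to wait, whatever the bandwidth
        self.owner = core_id
        self.value = value
        return True

    def release(self, core_id):
        """Give up ownership so another core can write."""
        if self.owner == core_id:
            self.owner = None


line = CacheLine()
print(line.try_write(core_id=0, value=1))  # True: core 0 takes ownership
print(line.try_write(core_id=1, value=2))  # False: core 1 is blocked
line.release(core_id=0)
print(line.try_write(core_id=1, value=2))  # True: core 1 can now write
```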


----------



## ProPain (Aug 28, 2019)

Stop all this ....  

*AMD ( Advanced Micro Devices, Inc )* was ruled = *G.U.I.L.T.Y - that is 100% fact* .... the Courts, however, decide on compensation claims.

And AMD is happy with this - and all AMD fanboys have to accept !


----------



## FordGT90Concept (Aug 28, 2019)

Vya Domus said:


> FPUs didn't used to be core, they too had their own sockets on the board ?


In the case of x87, it wasn't a core, it was a co-processor that could not function without an x86 master.


----------



## londiste (Aug 28, 2019)

Vya Domus said:


> Who gets to say this ? L1 and L2 didn't used to be core, they had their own sockets on the boards. FPUs didn't used to be core, they too had their own sockets on the board ?


Core and CPU are defined by carrying out an instruction. In this thread, the context is x86, so x86 instructions. Core is a piece of logic that gets the instruction and outputs the result. L2 is in a bit of grey area as it does not really fit the classical core but today has become a necessary component to put right next to a core.

L3 fits in on a higher level, initially for communication between multiple CPUs - or today more commonly cores - as well as working with memory controller to make RAM less of a bottleneck. This is a multiprocessor system architecture - multiple processors and other logic units (IO Controller, RAM controller) on a shared bus (HyperTransport, Ring Bus, Infinity Fabric).



ProPain said:


> *AMD ( Advanced Micro Devices, Inc )* was ruled = *G.U.I.L.T.Y - that is 100% fact* .... the Courts, however, decide on compensation claims.


No it wasn't. It never got ruling in the court. AMD settled which means they decided that paying off the complaining parties was a more beneficial way of resolving this dispute.


----------



## RichF (Aug 28, 2019)

ProPain said:


> Stop all this ....
> 
> *AMD (Advanced Micro Devices, Inc )* was ruled = *G.U.I.L.T.Y - that is 100% fact* .... the Courts, however, decide on compensation claims.
> 
> And AMD is happy with this - and all AMD fanboy have to accept !


The judge has to accept the settlement. Moreover, bad rulings are hardly rare. One of the most egregious trends in recent judicial incompetence is the fad for convicting physicians of murder for prescribing opioids to adults. This feeding frenzy of stupidity is also involving massive cash grabs by states and corrupt judges, as in Oklahoma.

People have been lamenting the judiciary's ineptitude when it comes to tech for a long time. Many can't even get the basic concept of physician and adult responsibility right — as if prescriptions don't have labels and as if adults have vanished from America, replaced by large kids. If judges can't understand basic concepts like adult personal responsibility one can't expect much when it comes to advanced tech that even experts apparently disagree on.


----------



## ProPain (Aug 28, 2019)

In the end - AMD is guilty!


----------



## Vya Domus (Aug 28, 2019)

londiste said:


> Core and CPU are defined by carrying out an instruction. In this thread, the context is x86, so x86 instructions. Core is a piece of logic that gets the instruction and outputs the result. L2 is in a bit of grey area as it does not really fit the classical core but today has become a necessary component.



Where are these definitions ? Can you link books, papers anything ?

I have never in my life seen a core being labeled as the block that gets to execute instructions, that's usually simply called the ALU or execution unit. And the control portion gets to fetch and decode the instructions that are fed into the ALU.


----------



## londiste (Aug 28, 2019)

Vya Domus said:


> Where are these definitions ? Can you link books, papers anything ?


Will try to search some books or papers.





Vya Domus said:


> I have never in my life seen a core being labeled as the block that gets to execute instructions, that's usually simply called the ALU or execution unit. And the control portion gets to fetch and decode the instructions that are feed into the ALU.


At least when we are talking about x86 CPUs today - ALU or execution unit does not execute instructions. These execute (micro)operations. x86 instructions are often complex and are not executed directly or in a single cycle. This is where the decode part comes in - it breaks the instruction down to micro-operations that get sent to execution units.

For example, MOV between registers is a single micro-op that goes to one of the ALUs. MOV from register to memory takes several cycles (4?) and involves both an ALU and an AGU.
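
The decode step can be pictured with a small lookup-table sketch. Everything in it is hypothetical: the table entries, the unit names and the micro-op counts are illustrative only and do not describe any specific AMD or Intel decoder.

```python
# Hypothetical decode table: unit names and micro-op counts are illustrative
# and do not describe any real decoder.
DECODE_TABLE = {
    ("mov", "reg", "reg"): [("ALU", "move")],
    ("mov", "mem", "reg"): [("AGU", "compute_address"), ("STORE", "write_data")],
    ("add", "reg", "mem"): [
        ("AGU", "compute_address"),
        ("LOAD", "read_data"),
        ("ALU", "add"),
    ],
}


def decode(mnemonic, dst_kind, src_kind):
    """Break one instruction form into a list of (unit, micro-op) pairs."""
    key = (mnemonic, dst_kind, src_kind)
    try:
        return DECODE_TABLE[key]
    except KeyError:
        raise ValueError(f"no decode entry for {key}") from None


print(decode("mov", "reg", "reg"))       # one micro-op for an ALU
print(len(decode("add", "reg", "mem")))  # 3 micro-ops spread over several units
```

The point of the sketch is only that the execution units receive micro-ops, while the instruction as the programmer wrote it exists only up to the decode stage.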


----------



## Frick (Aug 28, 2019)

ProPain said:


> In the end - AMD is guilty!



Technically, no.


Vya Domus said:


> Where are these definitions ? Can you link books, papers anything ?
> 
> I have never in my life seen a core being labeled as the block that gets to execute instructions, that's usually simply called the ALU or execution unit. And the control portion gets to fetch and decode the instructions that are feed into the ALU.



That was what I was going to ask. That is what it boils down to. If you define a "core" as a "complete single Pentium and upwards" then yeah sure. Is there even a definite definition of what a core is? Surely it has to be seen as contextual.


As for the architecture, it's not great but it has its upsides fo sho. A modern refined Bulldozer would be great for some specific applications.


----------



## Zubasa (Aug 28, 2019)

ProPain said:


> In the end - AMD is guilty!


Common Law 101 = Any person / company is innocent unless proven guilty.
The court has to rule that they are guilty, not you.
Since that has not happened yet, they are legally not guilty.


----------



## Vya Domus (Aug 28, 2019)

londiste said:


> At least when we are talking about x86 CPUs today



There's a good deal of fallacy in this. We talk about modern x86 CPUs today but we also get to say what's a core in all designs of all time ?


----------



## londiste (Aug 28, 2019)

Vya Domus said:


> There's a good deal of fallacy in this. We talk about modern x86 CPUs today but we also get to say what's a core in all designs of all time ?


Core definition does not change. 
Instructions may be (and are) different for other architectures (for example RISC or VLIW) and implementations can vary considerably. 

Relevance of other instruction sets in context of Bulldozer - which is an x86 CPU - is questionable. I mean, academically, sure - we can say Integer Cluster is a CPU with whatever its micro-ops look like as an instruction set. But how would that be useful for an x86 CPU?


----------



## Vya Domus (Aug 28, 2019)

londiste said:


> Relevance of other instruction sets in context of Bulldozer - which is an x86 CPU - is questionable. I mean, academically, sure - we can say Integer Cluster is a CPU with whatever its micro-ops look like as an instruction set. But how would that be useful for an x86 CPU?



What do you mean useful for an x86 CPU ? None of these things impact the "x86" portion of it, that would just be the instruction set. When you put a "not a core" stamp on something this is independent from the instruction set.

You want to talk about modern CPUs, why not talk about all CPUs ? That ought to be more relevant. 

Don't selectively pick out this Integer Cluster out of all this and ask if that's useful or not, you need to take the whole design and consider whether it's useful or not. And as far as I am concerned it is, it's a way to minimize resources while keeping most of the performance intact.

Academically, you can say this has been settled by the paper that describes this where the authors still consider this arrangement as being made up of cores, cores as in plural. Aren't these things peer reviewed ? Don't you think someone would have pointed out "Hey dumbass this is not a core" if that was the case ? We are talking about people far more knowledgeable on the subject than most of us on here.


----------



## londiste (Aug 28, 2019)

Vya Domus said:


> What do you mean useful for an x86 CPU ? None of these things impact the "x86" portion of it, that would just be the instruction set. When you put a "not a core" stamp on something this is independent from the instruction set.


Core is defined via instructions.


----------



## Vya Domus (Aug 28, 2019)

londiste said:


> Core is defined via instructions.



No it's not. I honestly don't even know how you come up with this, it's unbelievably out of place.


----------



## Kissamies (Aug 28, 2019)

I always thought of FX as "HT on steroids" rather than the core count advertised. Wasn't that wrong I guess.


----------



## londiste (Aug 28, 2019)

Vya Domus said:


> No it's not. I honestly don't even know how you come up with this, it's unbelievably out of place.


How would you define a core?


----------



## Vya Domus (Aug 28, 2019)

londiste said:


> How would you define a core?



By its internals.

However, by instructions ? How would this work ?

Define me a dual core in terms of instructions, I am actually curious to see how this works.


----------



## FordGT90Concept (Aug 28, 2019)

Here you go:





x86 instruction listings - Wikipedia (en.wikipedia.org)




Look at those first ones...ASCII.  Integer clusters don't even know what ASCII is because they have no reason to.  A core (processor) can _process_ overarching concepts like strings.  Integer clusters and floating point clusters are tools the processing core uses to execute its instructions.

Like I said, AMD is trying to sell what was little more than a glorified calculator as a processor.  In AMD's own technical documents, they stressed it's an "integer core" and not a processing core.  That critical nuance was lost in AMD's marketing.


Edit: And this is why Bulldozer sucks: instructions have to be decoded twice: once to figure out ALU and FPU scheduling and again to actually execute it in their respective clusters.  You can't get those extra cycles back.


----------



## londiste (Aug 28, 2019)

Vya Domus said:


> However, by instructions ? How would this work ?
> Define me a dual core in terms of instructions, I am actually curious to see how this works.



Actually, Wiki's CPU article starts with pretty much the right thing:


			
				https://en.wikipedia.org/wiki/Central_processing_unit said:
			
		

> A central processing unit (CPU), also called a central processor or main processor, is the electronic circuitry within a computer that carries out the instructions of a computer program by performing the basic arithmetic, logic, controlling, and input/output (I/O) operations specified by the instructions.


Dual core and multicore are multiprocessor systems. Each core is a separate CPU in system connected via a bus.


----------



## Vya Domus (Aug 28, 2019)

londiste said:


> Actually, Wiki's CPU article starts with pretty much the right thing:
> 
> Dual core and multicore are multiprocessor systems. Each core is a separate CPU in system connected via a bus.





It doesn't start with everything, don't try to deflect this nonsense. I didn't ask if dual cores are multiprocessor systems.

How do you define a core by instructions ? What would be the instructions that would make something not a core ?


----------



## londiste (Aug 28, 2019)

Vya Domus said:


> How do you define a core by instructions ?


Ok, let's take a step back. 

Instruction is a specific term, not a generic one. There is a finite set of instructions you can feed to a CPU that it is able to process, called the instruction set.
An example - MOV is a specific instruction in the x86 instruction set that moves data from one location to another.
Extremely simple example - a calculator has an instruction set of 4: addition, subtraction, multiplication and division (let's assume no memory function or anything). 

CPU is the piece that carries out these instructions.
From the first example - an x86 CPU needs to be able to do MOV. I will use the example above again - MOV with two registers as operands will use an ALU (part of the execution stage and unit) and takes one cycle. MOV with a register and a memory location as operands will use an ALU and an AGU and will take a couple of cycles to complete.
Calculator example is simpler - these 4 instructions can be fed into execution stage pretty much directly, minor circuitry for fetch and no need for decode. Output can be fed directly to screen buffer.

When it comes to multicore CPUs, these are still defined based on a CPU and this is done via multiprocessor systems. Multiprocessor systems are simply what the name says - systems with multiple processors that are connected together. For multicore CPUs, the important part of these is mainly the homogeneous (in this context, same ISA) systems where multiple CPUs are in the same system connected together with a single bus. As time went by and technology evolved, the physical implementation has changed to put this onto a single die but both definition and principle are still the same. The only noteworthy addition to terminology is that a CPU in such situation is now called a core.

A note on that calculator example - the implementation of that instruction set is basically an ALU. A computer design course starts with building one pretty early on from gates and sometimes transistors, usually without the division though, as that is a bit more complex to do. Physically you only need addition and multiplication - subtraction is a slight extra bit on top of the addition circuitry.
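
A minimal sketch of that calculator "CPU", assuming the four operations are add, subtract, multiply and divide. The opcode names and the `run` helper are made up for illustration.

```python
import operator

# Hypothetical four-instruction set for the calculator example.
INSTRUCTION_SET = {
    "ADD": operator.add,
    "SUB": operator.sub,
    "MUL": operator.mul,
    "DIV": operator.truediv,
}


def execute(opcode, a, b):
    """Carry out one instruction; anything outside the set is rejected."""
    if opcode not in INSTRUCTION_SET:
        raise ValueError(f"unknown instruction: {opcode}")
    return INSTRUCTION_SET[opcode](a, b)


def run(program):
    """Fetch and execute instructions one at a time, returning the results."""
    return [execute(op, a, b) for op, a, b in program]
```

Anything that is not in the instruction set is simply rejected, which is the sense in which the set is finite.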


----------



## Vya Domus (Aug 28, 2019)

None of that has anything to do with your claim that cores can be defined as such.

The instruction set isn't tied to the hardware, instructions can't define hardware, they can't tell you what is a core or CPU and what isn't. Any Turing complete computer can be made to carry out any kind of instruction no matter how simple or complex, people figured this out nearly 100 years ago.

Point me to any instance in any book or article that says something along the line of "this thing has a MOV instruction therefore it's a core", whenever this is brought up it's done so from a pure hardware perspective. As far as I am concerned you are literally making all of this up.


----------



## R-T-B (Aug 28, 2019)

RichF said:


> Ridiculous.
> 
> 1) Meritless suit.
> 
> ...



Agreed on all but part 1.  I used to say it was meritless, but the definition of what a processor core is has changed since the 90s, and AMD is paying for that.  Honestly, they should.  I'm just sad so little of it goes to consumers.


----------



## londiste (Aug 28, 2019)

Vya Domus said:


> Any Turing complete computer can be made to carry out any kind of instruction no matter how simple or complex, people figured this out nearly 100 years ago.


True. But at the same time Turing completeness puts no restrictions on how complex it is to program said things or the time it takes for the computer to perform these instructions.


----------



## R-T-B (Aug 28, 2019)

thedukesd1 said:


> When "Bulldozer" was released how many PC CPUs (sold as new) had less FPU cores count compared to integer cores?



A lot of ARM cores back then did.  Devil's advocate, not sure we want to call early ARM cores "performance."


----------



## ManofGod (Aug 28, 2019)

sutyi said:


> Bulldozer and its iterations throughout the years had modules with two pipelines in them, so technically they did sell people said amount of processing cores albeit some of the front end and FP was shared.
> Guess you can view these modules as double wide INT pipelines and call these 4C/8T instead of 8C/8T, but that would not be completely true now would it?
> 
> Did AMD lie about the architecture? Not really. Block diagrams show whats what. G
> ...



The Bulldozer uarch was good, absolutely. However, they never fixed the weaknesses in the uarch, like too slow cache and other such stuff and instead, moved on. Probably best for them, since they never produced a new chipset beyond the 990FX.


----------



## londiste (Aug 28, 2019)

Vya Domus said:


> Where are these definitions ? Can you link books, papers anything ?





			http://inspirit.net.in/books/academic/Computer%20Organisation%20and%20Architecture%208e%20by%20William%20Stallings.pdf
		

Just a few excerpts. I would definitely recommend reading more, starting with part 1.


			
				Glossary (Page 741) said:
			
		

> central processing unit (CPU): That portion of a computer that fetches and executes instructions. It consists of an Arithmetic and Logic Unit (ALU), a control unit, and registers. Often simply referred to as a processor





			
				1.2 Structure and function (Page 15) said:
			
		

> Each of these components will be examined in some detail in Part Two. However, for our purposes, the most interesting and in some ways the most complex component is the CPU. Its major structural components are as follows:
> • Control unit: Controls the operation of the CPU and hence the computer
> • Arithmetic and logic unit (ALU): Performs the computer’s data processing functions
> • Registers: Provides storage internal to the CPU
> • CPU interconnection: Some mechanism that provides for communication among the control unit, ALU, and registers





			
				3.2 Computer function (Page 68) said:
			
		

> The processor does the actual work by executing instructions specified in the program. This section provides an overview of the key elements of program execution. In its simplest form, instruction processing consists of two steps: The processor reads (fetches) instructions from memory one at a time and executes each instruction. Program execution consists of repeating the process of instruction fetch and instruction execution. The  instruction execution may involve several operations and depends on the nature of the instruction (see, for example, the lower portion of Figure 2.4).





			
				Chapter 8: Multicore computers (page 685) said:
			
		

> A multicore computer, also known as a chip multiprocessor, combines two or more processors (called cores) on a single piece of silicon (called a die). Typically, each core consists of all of the components of an independent processor, such as registers, ALU, pipeline hardware, and control unit, plus L1 instruction and data caches. In addition to the multiple cores, contemporary multicore chips also include L2 cache and, in some cases, L3 cache.


The architectural organization of cores comes primarily from multiprocessor systems (of which a multicore system is one type). With Zen and talking about memory/core access latency, NUMA is a term to keep in mind.


			
				17.2 Symmetric Multiprocessors (page 632) said:
			
		

> The term SMP refers to a computer hardware architecture and also to the operating system behavior that reflects that architecture. An SMP can be defined as a standalone computer system with the following characteristics:
> 1. There are two or more similar processors of comparable capability.
> 2. These processors share the same main memory and I/O facilities and are interconnected by a bus or other internal connection scheme, such that memory access time is approximately the same for each processor.
> 3. All processors share access to I/O devices, either through the same channels or through different channels that provide paths to the same device.
> ...





			
				17.6 Nonuniform Memory Access (Page 660) said:
			
		

> Nonuniform memory access (NUMA):
> All processors have access to all parts of main memory using loads and stores. The memory access time of a processor differs depending on which region of main memory is accessed. The last statement is true for all processors; however, for different processors, which memory regions are slower and which are faster differ.


----------



## neatfeatguy (Aug 28, 2019)

About every 6 months one of these posts shows up about Bulldozer....


----------



## Vya Domus (Aug 28, 2019)

londiste said:


> http://inspirit.net.in/books/academic/Computer%20Organisation%20and%20Architecture%208e%20by%20William%20Stallings.pdf
> 
> 
> Just a few excerpts. I would definitely recommend reading more, starting with part 1.
> ...



Thanks but I looked really carefully throughout all that and not once was a core described by its ability to execute instructions. It's always done so under the more generic "processor" label and it's not clear at all how you go from processor to core, are these two really interchangeable and equivalent ? The author doesn't think so :



> Processor: A physical piece of silicon containing one or more cores. *The processor is the computer component that interprets and executes instructions*. If a processor contains multiple cores, it is referred to as a multicore processor.



So the attribute of executing instructions is given to the higher level abstraction that is the processor. I also find it funny you ignored the really interesting definition that was just below the one about the CPU:



> ■ Core: An individual processing unit on a processor chip. *A core may be equivalent in functionality to a CPU on a single-CPU system. Other specialized processing units, such as one optimized for vector and matrix operations, are also referred to as cores.*



"May be equivalent", not a requirement. Also the author reckons even SIMD-like execution units can be called cores, all of these things match AMD's claims. This is getting very murky, this endeavor to prove AMD's cores aren't cores sees no light at the end of the tunnel no matter where you look.


----------



## londiste (Aug 28, 2019)

Vya Domus said:


> "May be equivalent" also the author reckons even SIMD-like execution units can be called cores, all of these things match AMD's claims. This is getting very murky, this endeavor to prove AMD's cores aren't cores has no light at the end of tunnel no matter where you look.


This is AMD's heterogeneous computing initiative (read: APUs) that matches this description. Any type of processing unit still works the same way, processing instructions.


----------



## Vya Domus (Aug 28, 2019)

londiste said:


> This is AMD's heterogeneous computing initiative (read: APUs) that matches this description. Any type of processing unit still works the same way, processing instructions.



As far as I am concerned this is some independent author's definition irrespective of AMD's initiatives. So this is meant to apply to any type of processor not just AMD's APUs.

So what do we do after all of this, what is a core at the end of the day ? The material out there is in accordance with AMD's claims, or rather, it doesn't contradict them in any way if you prefer it that way.

Why would I consider that AMD settled because they thought they couldn't win ? I seriously doubt that, the plaintiffs could never prove this, there is simply no way to construct a counter argument that makes sense and which doesn't trump a million other definitions as well. They know this and that's why they accepted the settlement too.

They settled because this is actually a really small amount of money that they need to pay under obscure terms and this way they don't look like a big evil corporation squashing some displeased customers.


----------



## londiste (Aug 28, 2019)

So, do you think GTX970 4GB scandal was bullshit as well? They did deliver 4GB as promised, after all.


----------



## Vya Domus (Aug 28, 2019)

londiste said:


> So, do you think GTX970 4GB scandal was bullshit as well? They did deliver 4GB as promised, after all.



Don't try to lead this subject astray, that had nothing to do with cores or changing the definition of things.

Nvidia provided incorrect specifications about one of their products and not only that but there was no way you could verify this easily. AMD didn't, even if you thought their modules didn't actually contain two cores you could simply go and look at the diagrams and information that they released about their micro-architecture.

Nothing was hidden about what was going on under the hood of their products as was in Nvidia's case.


----------



## londiste (Aug 28, 2019)

Given how eager you are to try to twist things and find loopholes to nitpick, I was just curious if you could tell me why Nvidia decided to settle that one or why they shouldn't have.

But ontopic - How do you define a core?


----------



## Vya Domus (Aug 28, 2019)

londiste said:


> How do you define a core?



I'd say a core is one of those two things AMD put in a module. The book you provided made that pretty clear if you want to stick by it.


----------



## HD64G (Aug 28, 2019)

londiste said:


> So, do you think GTX970 4GB scandal was bullshit as well? They did deliver 4GB as promised, after all.


It wasn't GDDR5 all of it as it was written in the specs...

Whereas AMD had the 4 modules with 2 cores each that shared some resources, and that was clear from the start. No matter what we use to call cores, it has 8 of those. Might be weaker in some calculations and better in others (it was a server-focused arch after all, not helped by the manufacturing process of 32nm vs the 22nm used by Intel back then) but it wasn't a lie. That arch's shortcomings were clearly shown in the day-1 reviews' results where the previous gen AMD CPUs were almost equal in some tasks and fell back in others. In conclusion, AMD didn't gain market share or sales from the 8-core label on the box.


----------



## Vya Domus (Aug 28, 2019)

londiste said:


> why Nvidia decided to settle that one or why they shouldn't have.



Sure, they wrote that their product had something which it didn't and no material that they provided could have indicated that it didn't.

In other words, they were falsely advertising something and actively hiding information. There was no way that they could have denied this, if I remember correctly they even admitted they made a "mistake". That's why they settled.

But these two cases aren't alike, they deal with different matter. You brought this up because I think you like to twist and nitpick at things.


----------



## londiste (Aug 28, 2019)

HD64G said:


> It wasn't GDDR5 all of it as it was written in the specs...


It was. GTX970 had 4GB of GDDR5.


Vya Domus said:


> Sure, they wrote that their product had something which it didn't and no material that they provided could have indicated that it didn't.
> 
> In other words, they were falsely advertising something and actively hiding information. There was no way that they could have denied this, if I remember correctly they even admitted they made a "mistake". That's why they settled.


It wasn't memory. That they could have gotten away with.
It was ROPs and L2. Whether their admitted 'mistake' was a mistake is arguable but I am pretty sure it wasn't. With correct specs the memory problem would have been found much quicker.



HD64G said:


> Whereas AMD had the 4 modules with 2 cores each that shared some resources, and that was clear from the start. No matter what we use to call cores, it has 8 of those. Might be weaker in some calculations and better in others (it was a server-focused arch after all, not helped by the manufacturing process of 32nm vs the 22nm used by Intel back then) but it wasn't a lie. That arch's shortcomings were clearly shown in the day-1 reviews' results where the previous gen AMD CPUs were almost equal in some tasks and fell back in others. In conclusion, AMD didn't gain market share or sales from the 8-core label on the box.


Not some resources. Most of them. This is why definition of a core matters.

Given a sufficiently parallelized workload a perfectly valid expectation for 8-core CPU is 8 cores to perform 8 times better than 1 core (perhaps 7.9 times). With Bulldozer, that is not the case. It will perform about 80% of that, sometimes worse (with FPU load).


----------



## Vya Domus (Aug 28, 2019)

londiste said:


> Given a sufficiently parallelized workload a perfectly valid expectation for 8-core CPU is 8 cores to perform 8 times better than 1 core



No, it isn't. You can get widely different results and even witness a regression in performance, despite the fact that a task can be parallelized.

parallel =/= faster (not always faster)

more cores =/= faster parallel execution (definitely not always faster)

That's what your average Joe might think, that 8 means times 8 performance. But if he feels cheated by AMD he should feel cheated by any CPU manufacturer that makes multi-core CPUs because that's the case for all of them.

But it always gets down to the same complaint in the end, that AMD's CPU wasn't as fast as others. In the end this core debate garbage isn't even relevant to most people. I am sure as hell they wouldn't have given a damn if AMD didn't have real cores but they were 10 times faster.


----------



## londiste (Aug 28, 2019)

Post your Cinebench R23 Score (www.techpowerup.com)



Going through the list looking for x-core/x-thread CPUs with single core results and OC (to be kind of sure sc runs at the same clock):
i5 8600K - 6C/6T @ 5282 MHz - 557 cb - 3276 cb - 5.88x (98%)
FX 8370 - 8C/8T @ 5080 MHz - 263 cb - 1719 cb - 6.54x (82%)
i3 8350K - 3C/3T @ 4988 MHz - 525 cb - 1537 cb - 2.93x (98%)
FX-8320 - 8C/8T @ 3813 MHz - 195 cb - 1264 cb - 6.48x (81%)

FX-83x0 is Piledriver with minor improvements over Bulldozer.
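
For anyone who wants to check the math, the figures above are just multi score divided by single score, then divided by the core count. A quick sketch using the scores as listed; the function name and the dictionary layout are only for illustration.

```python
# Cinebench scores as listed above: name -> (cores, single score, multi score).
RESULTS = {
    "i5 8600K": (6, 557, 3276),
    "FX 8370": (8, 263, 1719),
    "i3 8350K": (3, 525, 1537),
    "FX-8320": (8, 195, 1264),
}


def scaling(cores, single, multi):
    """Return (speedup over one core, efficiency relative to linear scaling)."""
    speedup = multi / single
    return speedup, speedup / cores


for name, (cores, single, multi) in RESULTS.items():
    speedup, efficiency = scaling(cores, single, multi)
    print(f"{name}: {speedup:.2f}x ({efficiency:.0%})")
```

This assumes the single-core and all-core runs happened at the same clock, which is why only overclocked entries were picked.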


----------



## Vya Domus (Aug 28, 2019)

That's a cool listing of results for Cinebench, I don't know what you are trying to prove though.

Not everything is Cinebench and let's just say that the user that was expecting perfect scaling and felt lied to wasn't probably thinking about running just Cinebench.

There is a ton of software that hits a dead end after a certain point in terms of multithreading, as a matter of fact that's generally the rule not the exception.
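
One common way to model that dead end is Amdahl's law. It is not something brought up in this thread and is used here only as an illustration; the 90% parallel fraction in the usage note is an arbitrary assumption, not a measurement of any real program.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Upper bound on speedup when only part of the work can run in parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)
```

With `amdahl_speedup(0.9, 8)` the bound comes out at roughly 4.7x on 8 cores, and only a fully parallel task (`parallel_fraction` of 1.0) reaches 8x.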


----------



## londiste (Aug 28, 2019)

Cinebench is an example of a benchmark where CPU should scale perfectly to cores. It does and always has.
For some reason, it does not work like that on Bulldozer (or Piledriver).

20%-ish performance deficiency is noteworthy, isn't it?
This would be an awesome result for SMT and is quite noteworthy for CMT but is very worrying for independent cores.


----------



## Mistral (Aug 28, 2019)

Cheaper to pay up than to drag this through court and win, I guess...


----------



## Vya Domus (Aug 28, 2019)

londiste said:


> 20%-ish performance deficiency is noteworthy, isn't it?



Not really, I don't find it earth shattering. I suppose you can call anything nonzero noteworthy.



londiste said:


> For some reason, it does not work like that on Bulldozer (or Piledriver).



For some reason it works the other way around too for other CPUs where you get higher than the number of cores scaling. What about that ? Do those have extra cores and the manufacturers forgot to tell us ?

Or is it that software scales in all sorts of ways irrespective of how many cores there are ?


----------



## londiste (Aug 28, 2019)

Edit:
I get it, you are just trying to be a contrarian. OK.



Vya Domus said:


> For some reason it works the other way around too for other CPUs where you get higher than the number of cores scaling. What about that ? Do those have extra cores and the manufacturers forgot to tell us ?


Examples?


----------



## Vya Domus (Aug 28, 2019)

londiste said:


> Examples?




i9 9980XE - 18C/36T @ 5200 MHz - 525 cb - 11687 cb

22x scaling. This one does have SMT but hey, 18 cores are 18 cores. Your time x cores math is a valid expectation, according to you. Cores are cores and everything is set in stone. 

Also, you are aware that in your amazing comparison of Piledriver with other CPUs that those newer Intel processors have much more aggressive single core and multi core turbos, right ?

Check this out :


i5 6500 - 4C/4T @ 3200 MHz - 374 cb - 1326 cb

1326 / 374 = 3.54x, 88.5%. Interesting. Not quite 100%, and much closer to 80%: the gap to the FX results goes down to about 6%. Noteworthy or not noteworthy, what do you say? Is this really a quad core, I wonder ?



londiste said:


> I get it, you are just trying to be a contrarian. OK.



And you replying to every comment of mine for the past couple of pages are ... not ... right ?


----------



## jmcosta (Aug 28, 2019)

I always called the full Bulldozer a quad core simply because it was outperformed by an Intel quad core with HT. In many applications that loved multithreading, even an i5 3570K outperformed it.


----------



## danbert2000 (Aug 28, 2019)

I don't think anyone is very surprised by this. AMD could have changed its marketing to 4 cores, SMT+, or 4 FP core, 8 integer core. They didn't because they desperately needed some reason for consumers to buy their crappy CPUs over the reigning Intel 4 core 8 thread champ. By selling "8 cores" they were suggesting that they had twice the processing power of an Intel processor, or at least 50% more since Intel was using HT. The fact that AMD settled means that they were not sure they could win the case. And that's about as close to an admission of guilt as anything else you'll get out of a large company nowadays.

Let's bury the hatchet. AMD clearly figured out SMT with Zen. They took their lumps and now can truthfully say they have many 8 core processors. But there's not much argument any more, the FX line was a lie to try to help AMD limp past their technical disadvantage.


----------



## londiste (Aug 28, 2019)

Vya Domus said:


> i9 9980XE - 18C/36T @ 5200 MHz - 525 cb - 11687 cb
> 22x scaling. This one does have SMT but hey, 18 cores are 18 cores. Your time x cores math is a valid expectation, according to you. Cores are cores and everything is set in stone.
> Also, you are aware that in your amazing comparison of Piledriver with other CPUs that those newer Intel processors have much more aggressive single core and multi core turbos, right ?
> 
> ...


9980XE is 18 cores plus HT. 22.26x - 124%. 24% boost is accurate enough for HT.
i5 6500 runs at 3200MHz all core and 3600MHz single core (12.5% faster). Take that into account and it scales perfectly - 3.99x - effectively 100%.

SMT and similar do increase performance over 100% per core.
Intel HT gives a 25-30% extra, Zen/Zen2 SMT gives 30-35%.
Bulldozer's CMT tends to give 60-65% 
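
The two corrections used above (clock difference and SMT uplift) can be sketched like this. The function names are made up, and the clocks are the ones quoted in this post rather than verified specs.

```python
def normalized_scaling(single, multi, single_clock_mhz, all_core_clock_mhz):
    """Multi-thread speedup after scaling the single score down to the all-core clock."""
    single_at_all_core_clock = single * all_core_clock_mhz / single_clock_mhz
    return multi / single_at_all_core_clock


def smt_uplift(single, multi, cores):
    """Throughput beyond one-thread-per-core linear scaling, as a fraction."""
    return multi / single / cores - 1.0


# i5 6500: 374 cb single at 3600 MHz, 1326 cb multi at 3200 MHz all-core.
print(f"{normalized_scaling(374, 1326, 3600, 3200):.2f}x")
# i9 9980XE: 525 cb single, 11687 cb multi, 18 cores with HT.
print(f"{smt_uplift(525, 11687, 18):.0%}")
```

This treats score as scaling linearly with clock, which is a simplification.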



Vya Domus said:


> And you replying to every comment of mine for the past couple of pages are ... not ... right ?


I don't know. Hoping some meaningful discussion comes out of this somehow? 
Or at least we can learn something.


----------



## Jism (Aug 28, 2019)

Wasn't the Opteron 6x00 series based on 2 glued FX or Bulldozer chips? How about those? If there is or was a class action lawsuit, you'd say that they would chase the server area as well?

Really I don't understand the fuss from people. Yes it's a 4 core 8 thread CPU or a different approach to having SMT or some shit. It might not perform as it did but it did its task. It was sufficient for most tasks and esp hardcore rendering. Looking at the price point you couldn't complain either. It was one of the CPUs that allowed them to OC all the way up to 5GHz ~ 5.2GHz. They were a lot of fun to play with.

These days Intel locks their CPUs and AMD already pushes for max clocks using XFR. There isn't that much fun in the CPU world these days.


----------



## seronx (Aug 28, 2019)

Vya Domus said:


> I have never in my life seen a core being labeled as the block that gets to execute instructions, that's usually simply called the ALU or execution unit. And the control portion gets to fetch and decode the instructions that are fed into the ALU.


The control unit is within the core.  Bulldozer's core design is decoupled from the front-end, FPU execution, and the back-end through the L2.

The front-end doesn't decide that thread A can run on core B, or vice versa.  The cores decide which thread runs on them and it's hard-locked.

The scheduler unit, the 40-entry unified scheduler, decodes macro-ops from the retire queue, which can only be for one thread, into micro-ops to be executed within the superscalar datapath, whose results can be stored into memory via the data bus.

None of this is controlled by the supposed shared front-end, all of this is handled purely by the core.  Of which there are two cores in Bulldozer.


----------



## biffzinker (Aug 28, 2019)

jmcosta said:


> it was outperformed by a intel quad core with HT


You forgot to mention the outgoing Phenom II was outperforming the FX as well. When the prior CPU is outscoring the new hotness that's going to get noticed. At the time software wasn't ready for the CMT approach AMD took, some optimization work would have improved the benchmark scores.

Edit: At the time I had a Phenom II x4 960T that unlocked to five cores. I was looking to replace the Phenom II but reviews for the new FX 8150 showed my slight overclock of 3.7 GHz Phenom II x5 was faster.
Later confirmed in some benchmarks I tested. I ended up moving to an Intel Xeon 1240 V2.


----------



## londiste (Aug 28, 2019)

seronx said:


> The control unit is within the core.  Bulldozer's core design is decoupled from the front-end, FPU execution, and the back-end through the L2.
> The front-end doesn't decide that thread A can run on core B, or vice versa.  The cores decide which thread runs on them and it's hardlocked.
> The scheduler unit, the 40-entry unified scheduler, decodes macro-ops from the retire queue, which can only be for one thread, into micro-ops to be executed within the superscalar datapath, whose results can be stored into memory via the data bus.
> None of this is controlled by the supposed shared front-end, all of this is handled purely by the core.  Of which there are two cores in Bulldozer.





https://upload.wikimedia.org/wikipedia/commons/b/b0/AMD_Bulldozer_block_diagram_%28CPU_core_block%29.png

Fetch, decode and dispatch are frontend.


----------



## seronx (Aug 28, 2019)

londiste said:


> Fetch, decode and dispatch are frontend.


Nope, native fetch is done at the retire, native decode/dispatch is done at the scheduler.


----------



## londiste (Aug 28, 2019)

What are you talking about? Native?
Did you look at the linked diagram?


----------



## seronx (Aug 28, 2019)

londiste said:


> What are you talking about? Native?
> Did you look at the linked diagram?


The core in every processor architecture is the one that executes native instructions.  Everything else is a heterogeneous accelerator.

Branches don't execute in the front-end, they execute in the cores.
Instructions are fetched and retired by the cores, not the front-end.
etc.

Whether the front-end was VMT2 with absolute competitive sharing or cluster-based multithreading with no sharing would not determine the number of cores in the module.


----------



## londiste (Aug 28, 2019)

seronx said:


> The core in every processor architecture is the one that executes native instructions.  Everything else is a heterogeneous accelerator.


OK, cool. What is the native instruction set for Bulldozer?


seronx said:


> Branches don't execute in the front-end, they execute in the cores.
> Instructions are fetched and retired by the cores, not the front-end.
> etc.


Branches? 
Front end is part of core. 
Fetch is at the very top of the diagram. 
Retire... is not quite what you seem to think it is.


----------



## seronx (Aug 28, 2019)

londiste said:


> OK, cool. What is the native instruction set for Bulldozer?


RISC AMD64, of which control, general-purpose, integer is executed by the cores and FPU is executed in the FPU, etc.





londiste said:


> Front end is part of core.


Front-end is an optional unit.  It can have separate designs and it can be totally removed if the application doesn't need to be x86 or x86-64 compatible.

FE, L2, FPU => optional units.


----------



## londiste (Aug 28, 2019)

Interesting.
Tell me, how many cores are there in a Zen CCX?
Also, is FPU a core or not?


----------



## seronx (Aug 28, 2019)

londiste said:


> Tell me, how many cores are there in a Zen CCX?








A single Zen processor has a single core; there are eight Zen processors, which are eight cores.  In a CCX there are four cores, thus four single-core monolithic processors.




However, a single Bulldozer processor has two cores, thus four Bulldozer processors are also eight cores.

Stoney, a single dual-core monolithic processor => two cores
Carrizo, two dual-core monolithic processors => four cores
Kaveri, two dual-core monolithic processors => four cores
Trinity, two dual-core monolithic processors => four cores
Orochi, four dual-core monolithic processors => eight cores


----------



## londiste (Aug 28, 2019)

What defines a processor? A core?
Is there an OS (or software) that can run on RISC AMD64? Can such software be written?

Doesn't monolithic normally refer to a die?


----------



## seronx (Aug 28, 2019)

londiste said:


> What defines a processor? A core?


The industry does... FE/LSU/FPU/L2 doesn't make a core, only the mid_core makes the core.


londiste said:


> Is there an OS (or software) that can run on RISC AMD64? Can such software be written?


This is not a concern.


----------



## londiste (Aug 28, 2019)

Is ALU enough to be a core?


----------



## seronx (Aug 28, 2019)

londiste said:


> Is ALU enough to be a core?


It needs a control unit, instruction bus, data bus as well.

Instruction bus => Retire queue
Control unit => Scheduler
Data bus => Load/store unit
Datapath => EX0/EX1/AGLU0/AGLU1
^-- easily defined as a core.


----------



## Steevo (Aug 28, 2019)

About as smart as the suit against HDD manufacturers over the difference in data size, raw versus formatted, sector size, and much else the average consumer doesn't understand, twisted by lawyers looking for a paycheck.


----------



## londiste (Aug 28, 2019)

seronx said:


> It needs a control unit, instruction bus, data bus as well.


What functionality does it need in the control unit? 
When it comes to buses, it needs access to these and not much more, right?


----------



## seronx (Aug 28, 2019)

londiste said:


> What functionality does it need in the control unit?


The control unit interconnects the ALUs, AGUs, data buses, and manages the execution of instructions.  It also executes the branches, thread control if SMT, etc.


londiste said:


> When it comes to buses, it needs access to these and not much more, right?


It already has access to all buses in production or else the second core wouldn't work.


----------



## Totally (Aug 28, 2019)

FordGT90Concept said:


> This post...
> 
> …totally called it, especially that last sentence.
> 
> AMD could never win this argument without changing over a decade of precedents including by competitors like Sun, ARM, IBM, Intel, and even themselves (Athlon 64 X2).  "Integer core" got lost in marketing translation to become something it isn't, a "core."  If AMD accurately advertised the product as having "8 integer cores" this lawsuit would have never been filed.









When people don't understand, they are going to see what they want to see regardless of whether the proof is right there in front of them. Also, if you slice the second image horizontally across, swap the bottom and top, then rotate the image 90 degrees, it looks an awful lot like the first.


----------



## Vya Domus (Aug 28, 2019)

londiste said:


> Is ALU enough to be a core?



While AMD's design is certainly controversial to some, more interestingly Nvidia is getting much more liberal with these definitions. They reckon that yes, ALUs and FPUs are considered a core, hence their famous CUDA core counts that are in the thousands. We know for sure those can't be seen as CPU core equivalents at the very least; they are not magicians, they can't fit thousands of independent cores on a single die.

The secret sauce is that these are grouped in SMs which subsequently execute instructions, more specifically just one instruction, in groups of 32, because that's the only thing the control unit can do. It can't issue instructions to these cores independently because they can't do the fetch and decode or anything that has to do with control signals on their own. AMD does the same with their GPUs but at the very least they call them by the more generic label of "processors".
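
To make the lock-step point concrete, here is a toy Python model of one control unit driving a group of 32 lanes. It is only a sketch: the group size of 32 comes from the description above, while the function names and the tiny instruction format are made up for illustration. Real hardware is more involved (branch divergence, for example, is handled by masking lanes off).

```python
# Toy model of lock-step (SIMT-style) execution: one control unit, many lanes.
# WARP_SIZE = 32 matches the "groups of 32" in the post; everything else
# (names, the instruction format) is hypothetical and only for illustration.

WARP_SIZE = 32


def issue(instruction, lane_registers):
    """The single control unit decodes once, then every lane applies the op."""
    op, operand = instruction  # decoded a single time for the whole group
    if op == "add":
        return [r + operand for r in lane_registers]
    if op == "mul":
        return [r * operand for r in lane_registers]
    raise ValueError(f"unknown op: {op}")


def run_warp(program, lane_registers):
    """Run a program on one group of lanes; no lane has its own fetch/decode."""
    assert len(lane_registers) == WARP_SIZE
    for instruction in program:
        lane_registers = issue(instruction, lane_registers)
    return lane_registers


# Each lane holds different data, but all lanes see the same instruction stream.
lanes = list(range(WARP_SIZE))
result = run_warp([("add", 1), ("mul", 2)], lanes)
print(result[:4])
```

The lanes only ever differ in their data, never in which instruction they run, which is why they are hard to compare with independent CPU cores.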

If anyone should be crucified for ambiguous and misleading labels, it should be Nvidia.


----------



## londiste (Aug 28, 2019)

AMD definitely did try calling CUs compute cores when bringing in APUs and HSA initiative.


----------



## biffzinker (Aug 28, 2019)

Steevo said:


> understand twisted by lawyers looking for a paycheck.


Who's going to turn down the payout of up to 3.6 million out of the 12.1 million set aside by AMD?
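
For scale, a quick back-of-the-envelope sketch in Python using only the two figures above. It assumes the "up to 3.6 million" is the lawyers' maximum cut of the $12.1 million fund; the variable names are just for illustration.

```python
# Figures quoted in this thread: a fund of $12.1 million, with up to
# $3.6 million of it going to the lawyers (assumed reading of the post).
# Whole dollars keep the subtraction exact.
SETTLEMENT_FUND = 12_100_000
MAX_ATTORNEY_PAYOUT = 3_600_000

remaining_for_class = SETTLEMENT_FUND - MAX_ATTORNEY_PAYOUT
attorney_share_percent = 100 * MAX_ATTORNEY_PAYOUT / SETTLEMENT_FUND

print(f"Left for the class: ${remaining_for_class:,}")
print(f"Lawyers' share: {attorney_share_percent:.1f}%")
```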


----------



## Vya Domus (Aug 28, 2019)

Totally said:


> When people don't understand, they are going to see what they want to see regardless of whether the proof is right there in front of them.



Indeed, at the end of the day that's the biggest issue here.

Big square blocks that are arranged symmetrically, preferably with a bright red border, that's the only definition that they'll accept for a core because that's the extent to which their understanding goes.


----------



## londiste (Aug 28, 2019)

Well, seronx is of course completely correct in what constitutes a core.
The problem with applying this to Bulldozer is that the resulting core is not exactly useful to us. AMD64 RISC is not exposed in any way. X86 goes out the window the moment you remove frontend.


----------



## Chrispy_ (Aug 29, 2019)

Bulldozer was the perfect example of sunk-cost fallacy.

$12.1M is small change that AMD will gladly give to put dubious advertising behind them. If only such a lawsuit were applied to Donald Trump or Brexit....


----------



## seronx (Aug 29, 2019)

londiste said:


> X86 goes out the window the moment you remove frontend.


The benefit is that it is defined as an accelerator.  Just like upgrading a GPU (part of the system, but not part of the CPU), they can improve the front-end: it can be modified without modifying the cores.

The front-end for a monolithic dual-core can be statically partitioned between cores in multi-core fashion(CMP2).
The front-end for a monolithic dual-core can be competitively shared between cores in vertical multithreaded fashion(VMT2).
The front-end for a monolithic dual-core can be algorithmic-priority partitioned between cores in simultaneous multithreaded fashion(SMT2).
The front-end for a monolithic dual-core can be competitively partitioned between cores in clustered multithreaded fashion(CMT2).

Even in VMT just because it fetches/decodes/dispatches for a single core, doesn't mean there aren't two cores.  As the FE isn't actually a defining feature of a core, it's an optional feature.  They could always skip the front-end and directly interconnect to the cores themselves.

---
Including the FPU, one would have to prove against a similar design that they made a FPU that is only optimized for single-core usage.

Husky per core; 1x 128-bit add + 1x128-bit mul + 1x128-bit fmisc // 84-entry flight window + 42-entry FPU scheduler + 120-entry PRF <== 32-nm node
Bobcat per core; 1x 64-bit add + 1x 64-bit mul // 40-entry flight window + 18-entry FPU scheduler + 88-entry PRF <== 40-nm node
Bulldozer per dual-core; 2x 128-bit Fused-multiply add, 2x 128-bit Packed Integer vALUs // 2*128-entry flight window + 64-entry FPU scheduler + 160-entry PRF <== 32-nm node
Zen per core; 2x128-bit FMUL, 2x128-bit FADD // 192-entry flight window + 36-entry FPU scheduler + 160-entry PRF <== 14-nm node

Clearly, that is not the case.

40nm => 160nm CPP/120nm Mx, 130nm My, and within Intel's 32nm league with density, thus within spitting distance of GloFo's 32nm.
32nm => 130nm CPP/104nm Mx, two cores Husky and Bulldozer.
14nm => 78nm CPP/64nm Mx, basically two full nodes from 32nm.


----------



## FordGT90Concept (Aug 29, 2019)

londiste said:


> Is ALU enough to be a core?





seronx said:


> It needs a control unit, instruction bus, data bus as well.


Have a look here:





"control unit" = "Core IF"
There's only four of those, because there's only four cores.



Totally said:


> View attachment 130309, View attachment 130310
> 
> When people don't understand, they are going to see what they want to see regardless of whether the proof is right there in front of them. Also, if you slice the second image horizontally across, swap the bottom and top, then rotate the image 90 degrees, it looks an awful lot like the first.


Except your lines are completely wrong on Bulldozer (but right on the 8-core Xeon).  You just evidenced that they aren't cores because, as Feng said, "core replication is obvious."




...I'm not even sure that picture is 100% accurate...


AMD settled.








settlement: Definition of settlement in the Legal Dictionary by The Free Dictionary (legal-dictionary.thefreedictionary.com)





> In civil lawsuits, settlement is an alternative to pursuing litigation through trial. Typically, it occurs when the defendant agrees to some or all of the plaintiff's claims and decides not to fight the matter in court.


Integer clusters are not cores and the plaintiffs were, in fact, misled by AMD's claims to the contrary.  If AMD tries to create a CMT architecture in the future, this case will be used as precedent to again declare that integer clusters are not cores.


----------



## eidairaman1 (Aug 29, 2019)

Idk, Piledriver is pretty good today vs when it first appeared.

Ohwell, what I have still does what it does. Fx 8350.


----------



## seronx (Aug 29, 2019)

FordGT90Concept said:


> "control unit" = "Core IF"
> There's only four of those, because there's only four cores.


The control unit is within this image;





On a native octo-core, there are eight of them on the die.  On AMD's Husky this unit is called the Instruction Control Unit.  Even though it is renamed in this image, it is still the control unit.


----------



## FordGT90Concept (Aug 29, 2019)

That doesn't look like Bulldozer, how is it relevant?


----------



## seronx (Aug 29, 2019)

FordGT90Concept said:


> That doesn't look like Bulldozer, how is it relevant?


That is a Bulldozer core. Of which, there are two in a module.


----------



## GreiverBlade (Aug 29, 2019)

FordGT90Concept said:


> AMD settled.
> 
> 
> 
> ...


ok they "settled" ... just to get rid of this ... but that doesn't make the thing right ...

i.e.:  a friend has an argument i don't agree and is very stubborn with it, (in that case, that was also about core count on something but i can't remember what it was  ah yes ... it was on core count of a Pentium 4 HT back in the days ) i tell him "ok ok, you are right, in your own perception, ok i pay you a beer so we can settle this" so he can stop rambling about it and trying to prove his point.

4 dual core module which each core on each modules share the same FPU/Scheduler is, 4x2=8, an octacore (if it was a quadcore ... it wouldn't beat a Intel Equivalent quadcore on certain heavily threaded application and not only by a 2more core margin same for the hexacore FX 6XXX which had 3 dual core modules .)

they paid so it can shut up.


other than that, you are right, wanna grab a beer?


----------



## RichF (Aug 29, 2019)

londiste said:


> It was. GTX970 had 4GB of GDDR5.





HD64G said:


> Not all of it was GDDR5 as was written in the specs...


The performance was so poor that it didn't even come close to qualifying as GDDR5. So, simpleminded appraisals of the situation fail.

By contrast, despite the weakness of the individual cores in Bulldozer/Piledriver, 8 of its cores were faster than 6 or 4 or 2 or 1.

Apples and oranges comparison. The 512 MB partition of the 970 was completely unacceptable in its extreme slowness. It was a clear case of fraudulent marketing.

"Half-truths" are not truths. When the performance is as bad as it was for that partition, it doesn't qualify in anyone's book as GDDR5-class.


----------



## FordGT90Concept (Aug 29, 2019)

seronx said:


> That is a Bulldozer core. Of which, there are two in a module.


The layout is completely wrong.  On that note, I found a document where AMD themselves laid out the core structure:


			http://pds.ucdenver.edu/document/hardware/AMDbulldozer-IEEE-Computer-2011.pdf
		





There's also a contradiction in this document in the intro:


> It combines two independent cores intended to deliver high per-thread throughput with improved area and power efficiency. A monolithic building block, the Bulldozer module can execute two threads via a combination of shared and dedicated resources.


Those are antonyms.


----------



## GreiverBlade (Aug 29, 2019)

FordGT90Concept said:


> The layout is completely wrong.  On that note, I found a document where AMD themselves laid out the core structure:
> 
> 
> http://pds.ucdenver.edu/document/hardware/AMDbulldozer-IEEE-Computer-2011.pdf
> ...


good ... 2 core 1 fp 1 scheduler indeed.


----------



## FordGT90Concept (Aug 29, 2019)

GreiverBlade said:


> ok they "settled" ... just to get rid of this ... but that doesn't make the thing right …


You don't settle if the facts are on your side.


----------



## GreiverBlade (Aug 29, 2019)

FordGT90Concept said:


> You don't settle if the facts are on your side.


it's not about fact but about perception as i mentioned in the example i used ... an INT/LS (EX/LS) is a core ... but some don't view it as one ... so it isn't a core?

as i said ... wanna grab a beer? you are right.

additionally ... not everything ruled by a court is 100% right ...


----------



## RichF (Aug 29, 2019)

FordGT90Concept said:


> You don't settle if the facts are on your side.


The fact is that the court system isn't about objective truth. It's about optics.

(This, incidentally, also works in the favor of corporate executives. Insiders on MSNBC, for example, stated — without a hint of concern over the morality of the policy — that the Justice Department has a strong "unofficial" policy of going out of its way to not charge corporate executives with crimes — instead resorting to fines. The argument the insiders presented, which is utterly specious, is that the policy is better for the little people — the workers at the corporations — because, according to the broken logic, fines won't put companies out of business but, somehow, sending executive-class criminals to jail will. This illogic requires the absurd belief that those jobs can't be filled by others. It also ignores the apparent fact that those fines are more likely to come out of the compensation package/jobs of the lower-level workers than they are likely to dent the golden parachutes of the CEOs and such.)


----------



## FordGT90Concept (Aug 29, 2019)

GreiverBlade said:


> additionally ... not everything ruled by a court is 100% right ...





RichF said:


> The fact is that the court system isn't about objective truth. It's about optics.


Courts aren't involved in settlements other than filings.  Settlements are private contracts.


----------



## GreiverBlade (Aug 29, 2019)

FordGT90Concept said:


> Courts aren't involved in settlements.


same kind same kind ... settlements is the end of the path and well ... whatever ... sooooo, about that beer ...?


----------



## RichF (Aug 29, 2019)

FordGT90Concept said:


> Courts aren't involved in settlements other than filings.  Settlements are private contracts.


The article I linked to said _a judge_ has to approve the settlement.

"Fine and forget" is the standard operating procedure for the transfer of wealth into the upper crust these days. It gives the system the illusion of accountability.


----------



## FordGT90Concept (Aug 29, 2019)

GreiverBlade said:


> same kind same kind ... settlements is the end of the path and well ... whatever ... sooooo about that beer ...?


I don't do alcohol. 


There's something about Bulldozer that's unique to its first iteration of design that was never done before it:



Instruction decode is 1:2 instead of 1:1.  This proves dependence because without an independent means to decode instructions, the integer clusters cannot operate independently.

AMD proved this was a design flaw because in Steamroller, they decided to not share instruction decode...




...they wouldn't have done that if their original argument was correct.  They decided more independence is better, affirming the plaintiffs' argument that AMD misrepresented their product.


----------



## 64K (Aug 29, 2019)

The best thing AMD could have done was to settle for this trifle of money (12.1 million dollars), even if they believed they had a good defense and were lawyered up to the teeth. Most lawyers will tell you that juries are unpredictable. AMD might have won the case at trial, but they might also have lost in a much bigger way. You never know with a jury. I think in a civil case only 9 out of the 12 jurors have to agree on a verdict. They might have found for the plaintiffs and awarded them full compensation for the cost of the Bulldozer CPUs. Now we're up to around 80 to 100 million dollars. Additionally, I think, they might have even awarded punitive damages.


----------



## RichF (Aug 29, 2019)

FordGT90Concept said:


> AMD proved this was a design flaw because in Steamroller, they decided to not share instruction decode...
> 
> ...they wouldn't have done that if their original argument was correct.


Steamroller did not replace Piledriver. It was originally supposed to be designed for the high-performance bracket but AMD changed course and made Steamroller a weaker product to fit into the niche of reduced power consumption and production cost. That is why it never came in 8 cores and it was made on the inferior 28nm node. 

Jaguar also had design compromises to fill its reduced power consumption and reduced production cost niche. It is hardly a repudiation of Bulldozer/Piledriver either as it has worse IPC than even Bulldozer as far as I know.

Piledriver was the direct replacement for Bulldozer and it was never replaced until Zen 1.


----------



## seronx (Aug 29, 2019)

FordGT90Concept said:


> AMD proved this was a design flaw because in Steamroller, they decided to not share instruction decode...
> ...they wouldn't have done that if their original argument was correct.  They decided more independence is better, affirming the plaintiffs' argument that AMD misrepresented their product.


Um, that(those) instruction decode(s) isn't part of the core.


----------



## GreiverBlade (Aug 29, 2019)

FordGT90Concept said:


> I don't do alcohol.
> 
> 
> There's something about Bulldozer that's unique to its first iteration of design that was never done before it:
> ...


are all beer alcoholic? pffff ... not fun ... (plus it's one of the most wholesome drink in the whole world ... drink responsibly and alcohol is not an issue ... ) "point of view"

still 8 core on 4x2core module.

not a fact, a point of view on what is a INT/LS (EX/LS) (hint: a core...)

steamroller has the same INT/LS (EX/LS) pair of cores per module ... they just split the decode in 2 soooooo "2 INT/LS (EX/LS) 1 decode" is a single core and "2 INT/LS (EX/LS) 2 decode" is a dual core .... sooooo the cores are defined by the decode unit? (hint: they are not, that class action lawsuit was only a means to cash in on the fact that BD was slower than intel ... although on certain heavily threaded applications ... they weren't but those who use that wouldn't file a class action lawsuit ... because it only really mattered in gaming performance ... thus: pissing in the wind)

soooo how about that non alcoholic beverage of your choice?


----------



## 1d10t (Aug 29, 2019)

If I remember correctly, IBM PowerPC also used the same method in the early days of its multithreading implementation.
One question remains: how does this court filing apply? Will it apply to all Bulldozer uArch and derivatives?


----------



## RichF (Aug 29, 2019)

1d10t said:


> One question remains: how does this court filing apply? Will it apply to all Bulldozer uArch and derivatives?


As I noted in the post on the first page, the article I linked to said it doesn't cover all of the 8 core Bulldozer/Piledriver parts and only applies to purchases in California. I do not know if the settlement insulates AMD from lawsuits applying to customers from other states or not.

If the article I linked to is accurate you can see which parts are included.


----------



## FordGT90Concept (Aug 29, 2019)

RichF said:


> The article I linked to said _a judge_ has to approve the settlement.


Yes, because only a judge can close a case.  Plaintiff files a case -> plaintiff and defendant present evidence/make case -> either goes to trial or settles -> judge closes case (serves as a witness to the settlement).  Unless it goes to trial, the court just enforces procedure.



RichF said:


> Steamroller did not replace Piledriver. It was originally supposed to be designed for the high-performance bracket but AMD changed course and made Steamroller a weaker product to fit into the niche of reduced power consumption and production cost. That is why it never came in 8 cores and it was made on the inferior 28nm node.
> 
> Piledriver was the direct replacement for Bulldozer and it was never replaced until Zen 1.


Just because they never sold it as an 8-threaded product doesn't detract from the fact that they saw the need for a change and did it.


----------



## RichF (Aug 29, 2019)

FordGT90Concept said:


> Yes, because only a judge can close a case.  Plaintiff files a case -> plaintiff and defendant present evidence/make case -> either goes to trial or settles -> judge closes case (serves as a witness to the settlement).  Unless it goes to trial, the court just enforces procedure.
> 
> Just because they never sold it as an 8-threaded product doesn't detract from the fact that they saw the need for a change and did it.


Neither of those responses are effective rebuttals. I'm disengaging from you in this topic from this point forward.


----------



## FordGT90Concept (Aug 29, 2019)

seronx said:


> Um, that(those) instruction decode(s) isn't part of the core.


Every other microprocessor architecture on the market disagrees so, either you're wrong (and AMD too) or everyone else is.



1d10t said:


> One question remains: how does this court filing apply? Will it apply to all Bulldozer uArch and derivatives?


The law firm will have to collect the information of those that want to join the class action settlement and in that, they will specify what products specifically apply to the class action lawsuit.  It might already say in the settlement but...can't be arsed to go digging.


----------



## seronx (Aug 29, 2019)

FordGT90Concept said:


> Just because they never sold it as an 8-threaded product doesn't detract from the fact that they saw the need for a change and did it.


Technically, what they did was make it worse;
-> Processor models 00h–1Fh can perform an instruction block fetch every cycle, while model 30h–4Fh processors can perform a block fetch every 2 cycles.
-> In processor models 00h–1Fh, the decode unit scans two of these windows in a given cycle decoding a maximum of four instructions. In processor models 30–4Fh, the two decode units scan two of these windows every two cycles decoding a maximum of four instructions.

How is that a good change?  It is two times slower than the previous generation.
Bulldozer fetches up to 32B every cycle.
Steamroller fetches up to 16B every cycle.
Bulldozer decodes up to 4 macro-instructions every cycle.
Steamroller decodes up to 2 macro-instructions every cycle.
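
The arithmetic behind those four lines can be sketched in a few lines of Python. This follows the reading given in this post of the quoted manual text (each Steamroller decode unit averaging half of four instructions per cycle); the even split of Bulldozer's shared decoder between two busy threads is an assumption, and the helper name is made up for illustration.

```python
from fractions import Fraction


def per_cycle(amount, every_n_cycles):
    """Average throughput per cycle for 'amount every N cycles'."""
    return Fraction(amount, every_n_cycles)


# Bulldozer (models 00h-1Fh), as quoted: one shared decode unit per module,
# up to 4 instructions every cycle.
bd_module = per_cycle(4, 1)

# Steamroller (models 30h-4Fh), under this post's reading: each of the two
# decode units delivers up to 4 instructions every 2 cycles.
sr_unit = per_cycle(4, 2)
sr_module = 2 * sr_unit

# With both threads busy, Bulldozer's shared decoder is divided between them
# (assumed here to be an even split).
bd_per_thread_two_threads = bd_module / 2

print("Bulldozer, one thread alone:", bd_module)
print("Bulldozer, per thread with two threads:", bd_per_thread_two_threads)
print("Steamroller, per decode unit:", sr_unit)
print("Steamroller, per module:", sr_module)
```

Under these assumptions the per-module peak comes out the same on both designs, and the halving shows up only in what a single decode unit can deliver to one thread.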


----------



## RichF (Aug 29, 2019)

seronx said:


> Technically, what they did was make it worse;
> -> Processor models 00h–1Fh can perform an instruction block fetch every cycle, while model 30h–4Fh processors can perform a block fetch every 2 cycles.
> -> In processor models 00h–1Fh, the decode unit scans two of these windows in a given cycle decoding a maximum of four instructions. In processor models 30–4Fh, the two decode units scan two of these windows every two cycles decoding a maximum of four instructions.
> 
> How is that a good change?  It is two times slower than the previous generation.


The Stilt also said that the 32nm SOI process, particularly once it had matured, offered better characteristics than 28nm for high-performance parts. 28nm bulk was used because it was cheaper to make chips with, not because it was an upgrade.

Obviously, if AMD had decided to follow through with its original intention, it would have made Steamroller in no less than 8 core parts and wouldn't have cut away other things like cache. Steamroller was, obviously (as there was no 8-core part — not even a 6-core part), designed mainly to fit into the roles of reduced power consumption and reduced production cost. The minor IPC improvements from Steamroller and Excavator came at the cost of frequency and core count, both of which trumped the IPC gains in the high-performance realm — particularly when compared with mature-process Piledriver at performance-optimal clock, which is probably around 4.4 GHz. The designs were further hampered by an inferior socket/VRM spec and 28nm process.


----------



## FordGT90Concept (Aug 29, 2019)

seronx said:


> Technically, what they did was make it worse;
> -> Processor models 00h–1Fh can perform an instruction block fetch every cycle, while model 30h–4Fh processors can perform a block fetch every 2 cycles.
> -> In processor models 00h–1Fh, the decode unit scans two of these windows in a given cycle decoding a maximum of four instructions. In processor models 30–4Fh, the two decode units scan two of these windows every two cycles decoding a maximum of four instructions.
> 
> ...


There's two decoders per core in Steamroller: the throughput is the same when comparing apples to apples...less the opportunity for collision/blocking because the decoders are independent.





AMD's Steamroller Detailed: 3rd Generation Bulldozer Core (www.anandtech.com)
				





> One of the biggest issues with the front end of Bulldozer and Piledriver is the shared fetch and decode hardware. This table from our original Bulldozer review helps illustrate the problem:





> Steamroller addresses this by duplicating the decode hardware in each module. Now each core has its own 4-wide instruction decoder, and both decoders can operate in parallel rather than alternating every other cycle.





> Don’t expect a doubling of performance since it’s rare that a 4-issue front end sees anywhere near full utilization, but this is easily the single largest performance improvement from all of the changes in Steamroller.


If they were really independent cores in the first place then this change wouldn't matter.  Integer clusters aren't cores.  They never were and they never will be.


----------



## seronx (Aug 29, 2019)

FordGT90Concept said:


> There's two decoders per core in Steamroller: the throughput is the same when comparing apples to apples...less the opportunity for collision/blocking because the decoders are independent.


There was no collision or blocking in Bulldozer, btw.





FordGT90Concept said:


> If they were really independent cores in the first place then this change wouldn't matter.  Integer clusters aren't cores.  They never were and they never will be.


They aren't integer clusters, they are cores.


----------



## GreiverBlade (Aug 29, 2019)

FordGT90Concept said:


> There's two decoders per core in Steamroller: the throughput is the same when comparing apples to apples...less the opportunity for collision/blocking because the decoders are independent.
> 
> 
> 
> ...


decoder don't define core, which are what the INT/LS (EX/LS) unit are ... and there are indeed 2 of them per module ...

one more time


GreiverBlade said:


> steamroller has the same INT/LS (EX/LS) pair of cores per module ... they just split the decode in 2 soooooo "2 INT/LS (EX/LS) 1 decode" is a single core and "2 INT/LS (EX/LS) 2 decode" is a dual core .... sooooo the cores are defined by the decode unit? (hint: they are not, that class action lawsuit was only a means to cash in on the fact that BD was slower than intel ... although on certain heavily threaded applications ... they weren't but those who use that wouldn't file a class action lawsuit ... because it only really mattered in gaming performance ... thus: pissing in the wind)



point... of... view...


nooowwww i think we all should stop ... because it's becoming ridiculous, i am right, seronx is right, you are right (ok in a 2:1 ratio about point of view but well can't have the same point of view ... right?)

oh man ... how much i would gladly pay to settle this (and keep my point of view on what define a core.)


----------



## FordGT90Concept (Aug 29, 2019)

seronx said:


> There was no collision or blocking in Bulldozer, btw.


Then explain why AMD changed it and AnandTech explicitly said it resulted in the "largest performance improvement."


----------



## seronx (Aug 29, 2019)

FordGT90Concept said:


> Then explain why AMD changed it and AnandTech explicitly said AMD changed it for a large "performance improvement."


There was no large performance improvement from the fetch/decode switch.  It was made to reduce the overall power consumption, less work every cycle means faster clocks.  You talk about Steamroller's supposed front-end improvements, yet ignore the butchered floating-point unit?  Again, a change meant to reduce power consumption, not performance improvement.


----------



## GreiverBlade (Aug 29, 2019)

"pissing in the wind" : "To waste time on a pointless or fruitless task; do something that is ineffective. You can make a complaint if you like, but you'll just be *pissing in the wind*."


----------



## RichF (Aug 29, 2019)

Piledriver was a small modification of Bulldozer, released not long after it. The high-performance sector of AMD's processor business remained *frozen* from the time Piledriver was released until Zen 1 replaced it. The only improvement was a minor tightening of leakage due to the maturation of the 32nm SOI node, which resulted in the 8370E. Steamroller and Excavator are basically irrelevant to this discussion.

• Neither was released in 6+ cores.

• Neither was released on a high-performance node.

• Neither was released on a high-performance VRM socket spec.

• Neither was released with high-performance amounts of cache.

Minor improvements to IPC pale in comparison to the lack of cores and frequency in Steamroller and Excavator, except in the niche they targeted where power consumption and cheap production cost were favored over performance.


----------



## FordGT90Concept (Aug 29, 2019)

GreiverBlade said:


> decoder don't define core, which are what the INT/LS (EX/LS) unit are ... and there are indeed 2 of them per module …


A core can't do anything without instruction decoding.  A core is also independent.  The two facts combined are contradictory by AMD's definition of a core but not the industry's definition of a core.  AMD, therefore, was not truthful in advertising.



seronx said:


> There was no large performance improvement from the fetch/decode switch.  It was made to reduce the overall power consumption, less work every cycle means faster clocks.
> 
> You talk about Steamroller's supposed front-end improvements, yet ignore the butchered floating-point unit?


Read the article.  Splitting decode meant adding transistor real estate which naturally means higher power consumption; however, those are also transistors they can shut off when that thread isn't being used.  You know, a feature of independent cores.  Again, this is more proof that the integer clusters aren't cores.  With each iteration, AMD split more and more hardware to make it more closely mimic a dual core, but it never got there.


*An AMD Bulldozer/Piledriver/Steamroller/Excavator "module" is a "core" and AMD concurred by settling.* 'nuff said.


----------



## GreiverBlade (Aug 29, 2019)

ffs... how come unwatching thread doesn't work ... did they lie to me?



FordGT90Concept said:


> *An AMD Bulldozer/Piledriver/Steamroller/Excavator "module" is a "core" and AMD concurred by settling.* 'nuff said.


tho the core of that core... "module" ...  is made of 2 core ... sharing a scheduler and a FP but there isn't 2 core ... riiiiight

as i said


GreiverBlade said:


> decoder don't define core, which are what the INT/LS (EX/LS) unit are ... and there are indeed 2 of them per module ...
> 
> one more time
> 
> ...





GreiverBlade said:


> "pissing in the wind" : "To waste time on a pointless or fruitless task; do something that is ineffective. You can make a complaint if you like, but you'll just be *pissing in the wind*."


----------



## seronx (Aug 29, 2019)

FordGT90Concept said:


> Read the article.  Splitting decode meant adding transistor real estate which naturally means higher power consumption; however, those are also transistors they can shut off when that thread isn't being used.  You know, a feature of independent cores.  Again, this is more proof that the integer clusters aren't cores.  With each iteration, AMD split more and more hardware to make it more closely mimic a dual core, but it never got there.


That is not proof.  I have already talked about the split decode, it wasn't a performance enhancement but a power enhancement.  A core doesn't get four macro-ops every cycle, it only gets two macro-ops every cycle with Steamroller.  If the second core is second class in Bulldozer, it is definitely third class in Steamroller.

A decode is not a feature of independent cores.  The decode is independent of the cores.  All of this is antagonistic to your reasoning.  Which is more proof that the replicated parts in Bulldozer are in fact cores.


----------



## 64K (Aug 29, 2019)

GreiverBlade said:


> ffs... how come unwatching thread doesn't work ... did they lie to me?



They've sucked you into this debate and they will never let you leave now.


----------



## londiste (Aug 29, 2019)

This seems to boil down to terminology and expected functionality.

A minimal core is basically an ALU with a couple of registers.
Realistically, a core does need some control circuitry; this fits things like Bulldozer parts or GPUs (CU, CUDA Core, EU).
In most literature this gets called an execution core, as it's not all that useful by itself in a big complex processor and is part of the execution stage or unit.

When talking about Bulldozer, the question boils down to expected functionality. What exactly should a core be able to run?
- If it's what is generally referred to as micro-ops, then most pipes qualify as cores.
- If it's what I think Bulldozer calls macro-ops, which need some control logic, then the integer units qualify as cores. Technically, the FP unit could qualify as well.
- If we want a core to run x86 instructions there really is no way around frontend.
None of these is wrong.
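
To make that concrete, here is a rough Python sketch of how the count for one chip shifts with the definition you pick. The numbers are assumptions for illustration: four modules with two integer units each (the FX-8350 layout argued over in this thread) and a made-up four pipes per integer unit. The names `Chip` and `count_cores` are hypothetical, and the shared FP unit is left out to keep it short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chip:
    modules: int               # each module has one shared x86 front-end
    int_units_per_module: int  # integer units (clusters) per module
    pipes_per_int_unit: int    # execution pipes per integer unit (assumed value)


def count_cores(chip: Chip, definition: str) -> int:
    """Return how many 'cores' the chip has under a given definition."""
    int_units = chip.modules * chip.int_units_per_module
    if definition == "runs_micro_ops":   # every pipe counts as a core
        return int_units * chip.pipes_per_int_unit
    if definition == "runs_macro_ops":   # every integer unit counts as a core
        return int_units
    if definition == "runs_x86":         # a core needs a front-end; one per module
        return chip.modules
    raise ValueError(f"unknown definition: {definition}")


# Illustrative numbers only; pipes_per_int_unit=4 is an assumption.
fx8350_like = Chip(modules=4, int_units_per_module=2, pipes_per_int_unit=4)

counts = {
    d: count_cores(fx8350_like, d)
    for d in ("runs_micro_ops", "runs_macro_ops", "runs_x86")
}
print(counts)
```

Same silicon, three different answers, depending only on what you expect a core to be able to run.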


----------



## Vya Domus (Aug 29, 2019)

FordGT90Concept said:


> You don't settle if the facts are on your side.



So why didn't the plaintiffs go further with this instead of accepting the settlement?


----------



## Chomiq (Aug 29, 2019)

GreiverBlade said:


> ffs... how come unwatching thread doesn't work ... did they lie to me?
> 
> 
> tho the core of that core... "module" ...  is made of 2 core ... sharing a scheduler and a FP but there isn't 2 core ... riiiiight
> ...


----------



## seronx (Aug 29, 2019)

londiste said:


> - If we want a core to run x86 instructions there really is no way around frontend.


Global Front-end; translates x86 into native instructions.  No physical core has x86 fetch, x86 decode, branch predictors, etc.

However every physical core has a control unit, an instruction bus, a data bus, and the datapath.
Buffer for inflight native bundles, scheduler to control all parts, a datapath to execute instructions, and a way to load and store data.

The global front-end also has the capability of Intel's "Anaphase": it can project a virtual core across all physical cores in the design, as it contains a second-level control unit, instruction bus, and data bus, but no physical datapaths.

Intel bought the company for cheap, from those that developed Pentium 3.  So, the definition of a core is actually patented by Intel now.

Notice what is missing in this physical SMT4 quad-core with a single physical SMT4 core idled? That is right no decode!


----------



## FordGT90Concept (Aug 29, 2019)

Vya Domus said:


> So why didn't the plaintiffs go further with this instead of accepting the settlement?


It would only go to trial if AMD chose to fight it.


I was looking through AMD Zen slides and came across this one:




Which reminded me of a diagram from the Hot Chips PDF:




Aren't they strikingly similar? Zen's picture is undeniably a "core" (see the last line on the right). It makes no sense to redefine what a "core" is for Bulldozer when it was well established before and after Bulldozer existed.





Even going all the way back to the original Pentium, instruction decoding was not decoupled from the core (because you'd have a calculator instead of a processor):


			Microprocessor Systems: Understanding the Issues
		







*FX-8350 is a quad-core, eight-thread processor.*  AMD simply chose to add a second integer cluster (and later a decoder) to accelerate the second thread.  There's nothing wrong with that.  What is wrong is that AMD misrepresented their product to the public.


----------



## seronx (Aug 29, 2019)

FordGT90Concept said:


> FX-8350 is a quad-core, eight-thread processor.


You have yet to prove that.


----------



## Proedros (Aug 29, 2019)

This debate woke me up....

I just leave this here....









AMD lawsuit over false Bulldozer chip marketing is bogus - ExtremeTech

AMD is facing a lawsuit over claims that it misrepresented the core counts of its eight-core Bulldozer products, but the lawsuit's technical merit seems extremely weak.

www.extremetech.com


----------



## Vya Domus (Aug 29, 2019)

FordGT90Concept said:


> It makes no sense to redefine what a "core" is for Bulldozer when it was well established before and after Bulldozer existed.



We have to continue with this endless limbo but, when was it established and by who ?

All of you naysayers keep repeating this over and over and yet all material out there disagrees with you. It's accepted that a core doesn't need to fetch, decode and execute instructions on its own and can be something as simple as a SIMD unit. The only level at which fetching and decoding must happen (as in a requirement) is at the "processor" level which may or may not contain multiple cores.

I posed this question many times but I never got a definitive answer, are you telling me that the authors of the conjoined cores paper mislabeled the subject of their research ?

Were cores such as AMD's the norm ? No, but that doesn't mean they weren't part of this generic classification of "cores". Pointing fingers and saying this block does not look identical to this other block is a really, really primitive way of arguing about this. You are essentially throwing any information that goes more than skin deep out the window.


----------



## FordGT90Concept (Aug 29, 2019)

seronx said:


> You have yet to prove that.


Yes, I did.  Many references provided.  At no point until AMD pursued CMT was an integer cluster called a "core."  It might be called an "execution core" but that's a huge distinction from a multiprocessor environment where core means independent processor.



Vya Domus said:


> We have to continue with this endless limbo but, when was it established and by who ?


Alan Turing and the Turing Machine which all CPUs mimic.


----------



## Vya Domus (Aug 29, 2019)

FordGT90Concept said:


> Alan Turing and the Turing Machine which all CPUs mimic.



That's funny, but we both know you have zero answers for the questions that I posed. I didn't expect that you'd be able to address them anyway, thanks.


----------



## FordGT90Concept (Aug 29, 2019)

Vya Domus said:


> I posed this question many times but I never got a definitive answer, are you telling me that the authors of the conjoined cores paper mislabeled the subject of their research ?


Here's the paper:


			https://www.microarch.org/micro37/papers/18_Kumar-Conjoined-Core.pdf
		

Look at the title: "*Conjoined-core* Chip Multiprocessing"

If AMD put that on their box this suit wouldn't have happened.



Vya Domus said:


> Were cores such as AMD's the norm ?


Hell no, paper was published in 2004.  There's no record of the idea before that.

AMD's first dual-core debuted in 2005 and so did Intel's.  Bulldozer is the only commercial conjoined-core chip to be sold and it didn't debut until 2011.


----------



## Vya Domus (Aug 29, 2019)

FordGT90Concept said:


> If AMD put that on their box this suit wouldn't have happened.



Maybe, they'd still be cores though.



> The usage of the shared resource can be based on a policy decided either statically, such that it can be accessed only during fixed cycles by a certain core, or the accesses can be determined based on certain dynamic conditions visible to both cores (given adequate propagation time).



Certain core, both cores. Hmm, almost as if there is more than one core. But what do these researchers know with their years of academic experience, us forum dwellers have it figured out.


----------



## FordGT90Concept (Aug 29, 2019)

Execution cores, not multi-processor cores.  This distinction was drawn simultaneously by AMD and Intel in 2005, a year after that technical paper was published.

If the authors were to revise their paper today, they'd be more careful about how broadly they use the word "core" to describe things.  On page two, they actually contradict themselves:


> Both the studies conclude that the hybrid design, a chip multiprocessor where the individual cores are SMT, represents a good performance-complexity design point. They do not share resources between cores, however.


First instance of core describes a multiprocessor "core" where the second instance describes execution "core."  CMT shares multiprocessor "core" resources but not execution "cores."  Paper is confusing AF because they use multiple definitions of "core" interchangeably.


----------



## Vya Domus (Aug 29, 2019)

FordGT90Concept said:


> Execution cores, not multi-processor cores.



A core is a core; a distinction was never made in this regard because we'd open another can of worms. But it doesn't even matter.

A processor contains cores -> multiple cores -> multi-processor cores. An execution core, as you want to call it, is still a core alright.


----------



## NC37 (Aug 29, 2019)

RichF said:


> Ridiculous.
> 
> 1) Meritless suit.
> 
> ...



Because Cali is the only state where a suit like this could succeed. The fact that it only covered so few means AMD got off real easy.


----------



## FordGT90Concept (Aug 29, 2019)

Oh look, they did it again:


> *Conjoined-core chip multiprocessing* deviates from a conventional chip multiprocessor design by sharing selected hardware structures between *adjacent cores* to improve processor efﬁciency.


First is multiprocessor core, second is execution core.



Vya Domus said:


> A processor contains cores -> multiple cores -> multi-processor cores. An execution core, as you want to call it, is still a core alright.


Negatory.  Execution cores lack the ability to manage memory and logic.  They're glorified calculators.  When Athlon 64 X2 and Pentium D came on the market, AMD and Intel didn't market them as dual-core as in dual-glorified-calculators, they marketed them as dual-core, as in dual-processor, which is a statement of demonstrable fact (in marketing, in performance, and in design).  Further, in operating systems, this change was marked by the driver changing from "uniprocessor" to "multiprocessor."

A modern uniprocessor driver can drive both threads of Bulldozer with no degradation in performance...just like Pentium 4 w/ HT or Zen with one core enabled.


----------



## Vya Domus (Aug 29, 2019)

FordGT90Concept said:


> Oh look, they did it again:
> 
> First is multiprocessor core, second is execution core.
> 
> ...



Correct me if I am wrong, was this distinction ever made in this case that was filed against AMD ?


----------



## FordGT90Concept (Aug 29, 2019)

Yes, it was literally the entire point of the lawsuit.


			https://regmedia.co.uk/2019/01/22/amd-core-class-action.pdf
		



> Plaintiffs argue in their complaint that Defendant's Bulldozer products do not contain eight
> “cores” as claimed and advertised.  Id. ¶ 8.  According to Plaintiffs, a “core” is a processing unit
> that is able to operate (e.g., perform calculations and execute instructions) independent from other
> cores positioned on a chip.  Id. ¶ 23–24.


AMD made no attempt in marketing to explain to the public that "8-core" in their branding is "8-execution core." Lie by omission; false advertising.


----------



## Vya Domus (Aug 29, 2019)

FordGT90Concept said:


> Yes, it was literally the entire point of the lawsuit.
> 
> 
> https://regmedia.co.uk/2019/01/22/amd-core-class-action.pdf
> ...



This stipulation literally does not exist. I recommend going over the text again, there isn't a single instance where they make the distinction between an execution core and a multiprocessor core. And don't try to say this is supposed to be implied, it's not. 

A "core that is able to operate independent from other cores". Sorry, that's not it, an execution core can also operate independently from other execution cores.


----------



## FordGT90Concept (Aug 29, 2019)

Vya Domus said:


> This stipulation literally does not exist. I recommend going over the text again, there isn't a single instance where they make the distinction between an execution core and a multiprocessor core. And don't try to say this is supposed to be implied, it's not.


They don't have to because of Pentium D, Athlon 64 X2, Athlon X2, Core 2 Duo, Core 2 Quad, Core I#, Phenom X4, etc.  Every single one of these processors was marketed as "x core" which meant multiprocessor core ("core replication is obvious").  It wasn't a problem until AMD decided to be dishonest and go back to describing a core as an execution core; hence, the lawsuit.

Bulldozer was the exception, not the rule.  The rule is what the public understands it to be.



Vya Domus said:


> A "core that is able to operate independent from other cores". Sorry, that's not it, an execution core can also operate independently from other execution cores.


Except that AMD itself disagreed with that assessment when it launched Athlon 64 X2: the "cores" were multiprocessor and independent.

AMD can't redefine the word to its advantage: it must be truthful in advertising.


----------



## seronx (Aug 29, 2019)

Vya Domus said:


> Sorry, that's not it, an execution core can also operate independently from other execution cores.


Even with shared resources...











Core replication is obvious, it checks out; boys, let's pack up and go home.  Sixteen independent cores in this processor.


----------



## FordGT90Concept (Aug 29, 2019)

And what are those diagrams of?  Without extra context, I'd say that's a quad-core, 16-thread chip.


----------



## seronx (Aug 29, 2019)

Sun's unreleased 16-core processor called "Rock".
//
This design enables resource sharing among cores within a cluster, thus reducing the area requirements. All cores in a cluster share an instruction fetch unit (IFU) that includes the level-one (L1) instruction cache. We decided that all four cores in a cluster should share the IFU because it is relatively simple to fetch a large number of instructions per cycle. Thus, four cores can share one IFU in a round-robin fashion while maintaining full fetch bandwidth. Furthermore, the shared instruction cache enables constructive sharing of common code, as is encountered in shared libraries and operating system routines.

Each core cluster contains two L1 data caches (D cache) and two floating-point units (FGU), each of which is shared by a pair of cores, these structures are relatively large, so sharing them provides significant area savings.
//
Shailender Chaudhry, Robert Cypher, Magnus Ekman, Martin Karlsson, Anders Landin, Sherman Yip, Hakan Zeffer, Marc Tremblay - Sun Microsystems
ROCK, SUN’S THIRD-GENERATION CHIP-MULTITHREADING PROCESSOR, CONTAINS 16 HIGH-PERFORMANCE CORES


----------



## Vya Domus (Aug 29, 2019)

FordGT90Concept said:


> They don't have to



No, they totally have to. "Core" is not enough if you want to make a distinction between one kind of core and another, as you so adamantly want to prove. You don't file a lawsuit about obscure stuff such as this and then say you don't need to stipulate the one thing that would make your point clear.

Your distinction has nothing to do with what is written in this filing because that was never meant to be part of their argument; they simply never went that far.

Their only point was about cores being "independent", if you read carefully they never actually directly bring into question what is supposed to be a core, just that they believe it has to be  an "independent processing unit". Good luck with equating that to anything, there is an endless list of ICs that fit that description. 

And on that note, I'll repeat myself, execution cores are independent.


----------



## FordGT90Concept (Aug 29, 2019)

seronx said:


> Sun's unreleased 16-core processor called "Rock".
> //
> This design enables resource sharing among cores within a cluster, thus reducing the area requirements. All cores in a cluster share an instruction fetch unit (IFU) that includes the level-one (L1) instruction cache. We decided that all four cores in a cluster should share the IFU because it is relatively simple to fetch a large number of instructions per cycle. Thus, four cores can share one IFU in a round-robin fashion while maintaining full fetch bandwidth. Furthermore, the shared instruction cache enables constructive sharing of common code, as is encountered in shared libraries and operating system routines.
> 
> ...


So yeah, a quad-core, 16-thread chip.  Has four multi-processor cores and 16 execution cores.



Vya Domus said:


> And on that note, I'll repeat myself, execution cores are independent.


They're not processors.


Multi-processor cores don't share anything except memory.


----------



## seronx (Aug 29, 2019)

FordGT90Concept said:


> So yeah, a quad-core, 16-thread chip.  Has four multi-processor cores and 16 execution cores.


"Each Rock processor has 16 cores, each configurable to run one or two threads; thus, each chip can run up to 32 threads. The 16 cores are divided into core clusters, with four cores per cluster. This design enables resource sharing among cores within a cluster, thus reducing the area requirements."

Gotcha again.
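
For anyone who wants the arithmetic from the quoted paper spelled out, here is a small Python sketch. The figures (16 cores, four per cluster, one or two threads per core, one IFU per cluster, two D caches and two FGUs per cluster) come from the quotes above. The names and the one-core-per-cycle round-robin model in `ifu_owner` are my own simplification, not something the paper specifies.

```python
# Figures taken from the quoted Rock paper excerpts.
CORES = 16
CORES_PER_CLUSTER = 4
MAX_THREADS_PER_CORE = 2
IFUS_PER_CLUSTER = 1      # instruction fetch unit shared by all four cores
DCACHES_PER_CLUSTER = 2   # each L1 D cache shared by a pair of cores
FGUS_PER_CLUSTER = 2      # each floating-point unit shared by a pair of cores


def rock_summary() -> dict:
    """Derive chip-wide totals from the per-cluster figures."""
    clusters = CORES // CORES_PER_CLUSTER
    return {
        "clusters": clusters,
        "max_threads": CORES * MAX_THREADS_PER_CORE,
        "ifus": clusters * IFUS_PER_CLUSTER,
        "dcaches": clusters * DCACHES_PER_CLUSTER,
        "fgus": clusters * FGUS_PER_CLUSTER,
        "cores_per_fgu": CORES_PER_CLUSTER // FGUS_PER_CLUSTER,
    }


def ifu_owner(cycle: int) -> int:
    """Core index (0-3) within a cluster that gets the shared IFU this cycle.

    Assumed model: strict round-robin, one core served per cycle.
    """
    return cycle % CORES_PER_CLUSTER


summary = rock_summary()
print(summary)
print([ifu_owner(c) for c in range(8)])
```

Sixteen cores, four fetch units, eight FGUs, and Sun still counted sixteen.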


----------



## Vya Domus (Aug 29, 2019)

FordGT90Concept said:


> They're not processors.



It must be tiring moving that goal post every time. 

This is about cores, not processors.


----------



## seronx (Aug 29, 2019)

FordGT90Concept said:


> Multi-processor cores don't share anything except memory.


They share a glue, that is pretty slow and inconsistent some might say.

Dual-core communicating through the SRI => glue-interconnect
Dual-core communicating through cache unit => no glue


----------



## FordGT90Concept (Aug 29, 2019)

seronx said:


> "Each Rock processor has 16 cores, each configurable to run one or two threads; thus, each chip can run up to 32 threads. The 16 cores are divided into core clusters, with four cores per cluster. This design enables resource sharing among cores within a cluster, thus reducing the area requirements."
> 
> Gotcha again.


Not really, I wasn't aware that the CMT execution cores were also SMT.  Doesn't change anything other than the thread count.

Execution cores are sharing FGUs and instruction decoders.  Multiprocessor cores share nothing; ergo, my statement is correct: it's a quad-core processor. "Core replication is obvious."  A quad-core is obvious in that diagram, each having a dedicated L2.



Vya Domus said:


> It must be tiring moving that goal post every time.
> 
> This is about cores, not processors.


Define "core."  I'm spelling it out because people like to call two different things a "core"  when they're very different things (honestly, everyone in this thread should know better by now).  To be very blunt: "core" to the public is synonymous with "multiprocessor core."  If you're referring to the other kind of core (the very technical component of a processor which executes instructions), it must be clarified as, for example, an "integer cluster" or an "execution core."  "Core," since 2005, has never referred to "execution core" unless the surrounding text gives it that context.  AMD's marketing materials didn't.


----------



## Vya Domus (Aug 29, 2019)

FordGT90Concept said:


> Define "core."



Sure, I'll repost the definition that I found a couple of pages ago, maybe you missed it :



> ■ Core: An individual processing unit on a processor chip. A core may be equivalent in functionality to a CPU on a single-CPU system. Other specialized processing units, such as one optimized for vector and matrix operations, are also referred to as cores.



Found in this book : Computer Organization and Architecture Designing for Performance Tenth Edition, William Stallings

*"may be equivalent"*

May not, execution cores classify as cores too according to this definition and dependencies are not even brought into question. 



FordGT90Concept said:


> I'm spelling it out because people like to call two different things a "core"  when they're very different things



And one of them is wrong in assuming that core must mean what you're saying it means. Or what this lawsuit says it means.


----------



## FordGT90Concept (Aug 29, 2019)

That's funny because I found the 8th edition which was published in 2010, before Bulldozer released:


			https://inspirit.net.in/books/academic/Computer%20Organisation%20and%20Architecture%208e%20by%20William%20Stallings.pdf
		

On page 18:


> Multicore processors: The eighth edition now includes coverage of what has become the most prevalent new development in computer architecture: the use of multiple processors on a single chip. Chapter 18 is devoted to this topic.


They literally added a chapter describing multicore processors.  Chapter 18 starts on PDF page 707 which contradicts the 10th edition:


> A multicore computer, also known as a chip multiprocessor, combines two or more processors (called cores) on a single piece of silicon (called a die). Typically, each core consists of all of the components of an independent processor, such as registers, ALU, pipeline hardware, and control unit, plus L1 instruction and data caches. In addition to the multiple cores, contemporary multicore chips also include L2 cache and, in some cases, L3 cache.



In other words, the author, William Stallings, redefined "core" himself to accommodate AMD's lie.  I don't know what year the 10th edition was published but I guarantee you it is after Bulldozer debuted in 2011.  This is an inconsistent/poor source.


----------



## Vya Domus (Aug 29, 2019)

FordGT90Concept said:


> In other words, the author, William Stallings, redefined "core" himself to accommodate AMD's lie.



We are getting into conspiracy territory and it does not bode well if you want to look believable.

Why are you assuming he was accommodating AMD's lie and not that he was swapping an archaic definition for a more modern and relevant one instead ?



FordGT90Concept said:


> This is an inconsistent/poor source.



It sure as hell is better than none, I didn't write any book on the subject and neither did you, few tried to classify these things, unsurprisingly.

But let's go to the extremes: check out Flynn's taxonomy, which dates back to the 60s. It classifies units such as SISD and SIMD as types of "computer" and "processing units", not quite cores because that wasn't even a thing back then, but it sure gives you something to think about.

You're trying very hard to split the term core into something that bears multiple meanings and claim that only one is correct.

It's not my fault the lawsuit was formulated in the worst possible way from this perspective. If they had made it clear that they believed AMD was trying to market one _kind_ of core as something else, or that one certain type of core was counted in an incorrect manner, then yes, I would have agreed with you, but they didn't. They just said AMD lied about having "independent processing units"; as I said, a million things match that description.

AMD's Bulldozer architecture has, at the very least 8 independent cores/processing units.


----------



## FordGT90Concept (Aug 29, 2019)

Vya Domus said:


> Why are you assuming he was accommodating AMD's lie and not that he was swapping an archaic definition for a more modern and relevant one instead ?


There were 5 years of precedent leading up to the writing of the 8th edition, which is additionally consistent with decades of writing on the subject before that and still consistent with the definition of "core" = "multicore processor" today, which AMD agreed to settle with.

I really don't see the purpose of continuing this.  Your own sources turned against you.


----------



## Vya Domus (Aug 29, 2019)

Again, if you want to classify this as conspiracy theory, I too see no point in continuing this.


----------



## FordGT90Concept (Aug 29, 2019)

Vya Domus said:


> AMD's Bulldozer architecture has, at the very least 8 independent cores/processing units.


Execution cores/integer clusters, not multiprocessor cores.  "Core," since 2005, has only referred to the latter.  FX-8350 is a quad-core processor with eight execution units.  This is a statement of demonstrable fact that AMD agreed to in settling the case.



Vya Domus said:


> Again, if you want to classify this as conspiracy theory, I too see no point in continuing this.


The only "conspiracy" is that AMD tried to oversell their product and they admitted they did this by settling.


There's literally nothing left to debate: it's settled.  "Core" in the context of computer technology, henceforth means "multiprocessor core."  By that very basic test, FX-8350 is a quad-core.


----------



## Vya Domus (Aug 29, 2019)

Nope, core is any independent processing unit. There is no context really, not like this anyway.

That's why GPU manufacturers say their GPUs have thousands of cores because well, those are one type of core even though they are very different from CPU cores.

That's why even manufacturers that make AI chips say they have x amount of cores. Those are cores too even though they look nothing like CPU cores.


----------



## FordGT90Concept (Aug 29, 2019)

NVIDIA calls them "Tensor Cores" and "CUDA Cores," not simply "cores" because "multiprocessor cores" they're not.  AMD calls them "stream processors;" they don't use "core" nomenclature at all on GPU products.

I'd have to see specific AI chip models you're referring to in order to comment.


----------



## Vya Domus (Aug 29, 2019)

This is irrelevant, the meaning or the use of the term core is not contextual unless you want to talk about a specific type. You don't want to let go of the "multiprocessor core" thing even though this is never brought into question here. We are talking about the standalone term that is "core". That's it, don't attach anything to it. 

In this regard anyone that makes a chip which sports multiple processing units is free to call them cores, of what kind it's their business but they are cores nonetheless.


----------



## FordGT90Concept (Aug 29, 2019)

False, AMD can't call an "execution core" a "core" in marketing again or the FTC will string them up for false advertising.  $12.1 million fine would be a trifle compared to FTC's damages.


----------



## Vya Domus (Aug 29, 2019)

If an execution core is not a core ... then what is it ? A waffle ?

Stop attaching attributes to this term. "Core" should work as a standalone term and it does, a lot of people use it for different types of chips.

There are CPU cores, GPU cores, DSP cores, etc. And none of them look similar and they sure as hell don't have to comply with a certain type of description. 

An independent processing unit is a core.


----------



## FordGT90Concept (Aug 29, 2019)

Exclusively when it refers to a "multiprocessor core."  Any other use of "core" must be preceded by the type, examples: "CUDA core," "tensor core," "execution core."  In the context of CPUs, it means one thing and one thing alone: a complete processor.  This box only contained four complete processors:


----------



## Vya Domus (Aug 29, 2019)

FordGT90Concept said:


> Exclusively when it refers to a "multiprocessor core."



This is a requirement made up by yourself.

A more recent example of how liberally this term is used:



https://www.cerebras.net/wp-content/uploads/2019/08/Cerebras-Wafer-Scale-Engine-Whitepaper.pdf

Technology - Cerebras


This is plastered with the word core unbound to any type because it doesn't have to be.



FordGT90Concept said:


> This box only contained four complete processors:



It contained one processor with 8 cores.


----------



## FordGT90Concept (Aug 29, 2019)

That's another technical document.  We're talking about commercial products like that in the image I gave above.  Class action lawsuits are about public harm.

That said, technical documents should be clear in their definitions of "core" as well.  Your referenced document does this by specifying "compute cores."  The very next sentence it stresses FMAC:


> Accelerating calculation is most directly achieved by increasing the number of compute cores. More cores—specifically more floating point multiply accumulate units-- do more calculations in less time.


Every use of "core" in that document after that point has an established context: FMAC-heavy compute core.

This isn't rocket science.  Giving context to "core" is akin to giving proper units in algebra.  If it is without context (like computing products on store shelves), it applies to "multiprocessor cores."  Everyone should be on the same page now; there should be no further confusion from vague use of "core."



Vya Domus said:


> It contained one processor with 8 cores.


One socketable package with four processors.  The industry cleaned that mess up back in 2005 too when multiprocessor no longer implied multisocket.  Processors had to be divorced from their physical attachment in software.


----------



## Vya Domus (Aug 29, 2019)

FordGT90Concept said:


> That's another technical document.  We're talking about commercial products like that in the image I gave above.



I am sorry but no, there is no reason to believe this term behaves differently if it's written on a webpage, in a paper or on a box. It's a word that bears the same meaning, all the time. 

You are again moving the goalposts: one time it's about multiprocessor cores, the next about complete processors, and now you insist on convincing me that this debate should only be analyzed in a certain context.

There is no way I'll ever agree to any other definition for a core other than an individual processing unit, irrespective of the make of the chip that it's part of. 

This goes against every sane assumption that people made about chips for ages. The world isn't full of only Intel and AMD CPUs, there's a lot more out there.


----------



## FordGT90Concept (Aug 29, 2019)

Vya Domus said:


> There is no way I'll ever agree to any other definition for a core other than an individual processing unit, irrespective of the make of the chip that it's part of.


Then there's no reason to continue this; the matter is settled.


----------



## Vya Domus (Aug 29, 2019)

FordGT90Concept said:


> One socketable package with four processors.



If I ever meet anyone IRL that shows me a single die processor and tells me with a straight face that this is "four processors", I'll feel obliged to slap them.


----------



## Adam Krazispeed (Aug 29, 2019)

Where's my money??? I've got several Bulldozer and later CPUs. Where's my $35 each? WTF?

Anyway, there are 4x dual-core modules, so it did have 8 cores; each pair was 2 cores sharing resources. People are lame when they don't know what CMT is: it's two CPU cores with shared cache, FPU, etc.

It was just a flopped design that did suck. You can't sue for a bad design. Where are Intel's lawsuits over only giving us 4-core CPUs for 500+ dollars for over a decade?


----------



## FordGT90Concept (Aug 29, 2019)

Vya Domus said:


> If I ever meet anyone IRL that shows me a single die processor and tells me with a straight face that this is "four processors", I'll feel obliged to slap them.


Query the NUMBER_OF_PROCESSORS environment variable in Windows.  It returns the number of logical processors, which is the number of threads the underlying hardware accepts.

This is a relic dating back to at least Windows 2000 where multiprocessors only existed in multisockets.  Microsoft had to shape how the operating system sees processors over time in order to best schedule workloads.  If you're writing software and you want to know how many threads are required to saturate the hardware, NUMBER_OF_PROCESSORS is as correct today as it was two decades ago; however, Windows (and other operating systems) are now more knowledgeable about the hardware they run on as a function of necessity.
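A minimal sketch of that query in Python, for anyone who wants to try it. The helper name `logical_processor_count` is made up for illustration; `NUMBER_OF_PROCESSORS` is normally only set on Windows, so the sketch falls back to `os.cpu_count()`, which also reports logical processors.

```python
import os


def logical_processor_count(env=None):
    """Return the logical processor count, preferring NUMBER_OF_PROCESSORS."""
    env = os.environ if env is None else env
    raw = env.get("NUMBER_OF_PROCESSORS")  # set by Windows; usually absent elsewhere
    if raw is not None and raw.isdigit() and int(raw) > 0:
        return int(raw)
    # Fallback: os.cpu_count() also counts logical processors (it may return None).
    return os.cpu_count() or 1


if __name__ == "__main__":
    print("logical processors:", logical_processor_count())
```

Either way the number is a thread count; it says nothing about how those threads map onto cores or sockets.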

So let's dig up the first thread on this subject here:








AMD Dragged to Court over Core Count on "Bulldozer"

This had to happen eventually. AMD has been dragged to court over misrepresentation of its CPU core count in its "Bulldozer" architecture. Tony Dickey, representing himself in the U.S. District Court for the Northern District of California, accused AMD of falsely advertising the core count in...

www.techpowerup.com



Specifically this post:


VulkanBros said:


>


"Kerner" is Danish for "core." Would you look at that?  FX-9590 is a quad-core according to Microsoft!  But look at what AMD named it at the top: "eight-core."  False advertising much?

The operating system has to look at the cores when load balancing more so than the logical processors because multiple threads on a single core share resources.  This is a fact of CMT and SMT; CMT just shares less.


----------



## GreiverBlade (Aug 29, 2019)

FordGT90Concept said:


> One socketable package with four processors.  The industry cleaned that mess up back in 2005 too when multiprocessor no longer implied multisocket.  Processors had to be divorced from their physical attachment in software.


Nope, actually one processor with 4 modules containing 2 cores and 1 FP/scheduler each ... and it still stays like that: a multiprocessor is still a 2P, 4P, 8P, etc. system, and a multicore processor is still a single-socket unit having 2x, 4x, 6x, 8x, 10x, 12x cores.

My SuperMicro H8DCE with 2 Opteron 270 is a 2P system with 2 cores in each Processor 



FordGT90Concept said:


> "Kerner" is Danish for "core." Would you look at that?  FX-9590 is a quad-core according to Microsoft!  But look at what AMD named it at the top: "eight-core."  False advertising much?
> 
> The operating system has to look at the cores when load balancing more so than the logical processors because multiple threads on a single core share resources.  This is a fact of CMT and SMT; CMT just shares less.


Well, in one revision of W7 my FX6300 showed as 3 cores, and later, after some patch, it showed as 6 cores ... as if Microsoft were a reference; well, WMI was a reference for a long time.
Also, logical cores aren't physical, and my FX6300 had 6 physical cores ... Windows probably only looks at the number of schedulers and considers a pair of cores and 1 FP scheduler as "1 physical core".

Also ... conjoined-core, hehehe, well, that would mean ... 2 cores together, so if AMD did put "4 conjoined-cores" that would mean ... well ... 8 cores.



64K said:


> They've sucked you into this debate and they will never let you leave now.


Well, that's why I've got some means for killing time during my day off.


aaaaaaaaannddd now i am off ... i need to check my notification settings, seriously xD


----------



## Vya Domus (Aug 29, 2019)

FordGT90Concept said:


> FX-9590 is a quad-core according to Microsoft!



The same Windows which didn't know what to do with TR, Zen and Zen 2, basically every AMD CPU of the past couple of years? I'll take the liberty of assuming they didn't know what to do with this one either. Did they fix the bug with TR yet that made the OS stack the same threads onto the same cores all the time?

If you seriously consider Windows as a reliable source of information regarding what it does with the cores of the CPU it means you're getting desperate in your search for proof.  



FordGT90Concept said:


> This is a relic dating back to at least Windows 2000 where multiprocessors only existed in multisockets.



You called it, it's a relic, not indicative of modern architectures. Don't use Windows for evidence in this case OK ?

A processor is a single die of silicon which may contain multiple cores and threads.


----------



## FordGT90Concept (Aug 29, 2019)

GreiverBlade said:


> My SuperMicro H8DCE with 2 Opteron 270 is a 2P system with 2 cores in each Processor


It's equally accurate to call it four processors.  How they're arranged is an implementation detail.  That's why Microsoft elected to show "sockets" instead of "processors."  Have an example:




Calling it a "processor" instead of a socket is too vague.



GreiverBlade said:


> also ... conjoined-core hehehe well that would mean ... 2 core together to if AMD did put "4 conjoined-cores" that would mean ... well ... 8 cores.


A "conjoined-core" is a "core" which may contain two or more "execution cores."



Vya Domus said:


> You called it, it's a relic, not indicative of modern architectures. Don't use Windows for evidence in this case OK ?


I've already presented a mountain of evidence spanning three threads now.  Windows confirming it is just icing on the cake at this point.


----------



## Vya Domus (Aug 29, 2019)

FordGT90Concept said:


> Windows confirming



It doesn't confirm anything, just that it's an inaccurate way of counting the cores of a CPU.

There is plenty of other software that recognizes this CPU as having the correct number of cores, but you pick out just one. Not exactly a mountain of evidence, more like a tiny boulder.


----------



## FordGT90Concept (Aug 29, 2019)

Oh, it's totally accurate.  It just proves you wrong, so you disagree.


----------



## Vya Domus (Aug 29, 2019)

FordGT90Concept said:


> It just proves you wrong, so you disagree.



In order to prove someone wrong you need arguments and proof. One single case of software mislabeling a piece of hardware is anything but proof.

But then again you said that people were writing wrong definitions to hide AMD's lies and that implies it was all a conspiracy theory to prove AMD right. Oh, and let's not forget the processor with 4 processors, that one was also pretty good. So I expect you to use pretty much everything as proof at this point.

Also, didn't you say you were done like 2-3 times by now?


----------



## FordGT90Concept (Aug 29, 2019)

Vya Domus said:


> ...that implies it was all a conspiracy theory to prove AMD right.


No, it doesn't.  AMD threw it out there and people gobbled it up on their own without questioning the rationale for it.  The conspiracy is solely in AMD's original lie ("8-core").



Vya Domus said:


> Also, didn't you say you were done like 2-3 times by now?


Sure did. Sadly, I'm easily baited.


----------



## Vya Domus (Aug 29, 2019)

FordGT90Concept said:


> people gobbled it up on their own without questioning the rationale for it.



Right, and then they proceeded to write books and papers about this one wrong thing. Got it, makes sense. Doesn't sound like a conspiracy at all.


----------



## GreiverBlade (Aug 29, 2019)

FordGT90Concept said:


> Calling it a "processor" instead of a socket is too vague.


Well, a socket hosts ... a processor ... which in turn hosts one or multiple cores (rather, a socket is a ... oh, socket where the processor is ... uh, socketed? Isn't it so?)
And obviously yes, M$ would call it a socket ... 2P/4P/8P systems have 2/4/8 sockets and Windows references that, that's all.
Although they could have written CPU instead of socket ... but they wouldn't use a synonym of "processor" instead of the interface where the CPU/processor is connected to the mobo for that ... that would be confusing, right?



FordGT90Concept said:


> A "conjoined-core" is a "core" which may contain two or more "execution cores."


which are, as you write it, *cores... *


----------



## FordGT90Concept (Aug 29, 2019)

GreiverBlade said:


> well a socket host .... a processor ... which in turn host one or multiple core (rather a socket is a .... oh, socket where the processor is ... uh, socketed? isn't it so? )


It's a chip housing one or more processors.  Each processor on said chip may be called a core.

Since the advent of chip-multiprocessing: one or more sockets/land grid arrays -> central processing unit -> one or more physical processors (cores) -> one or more logical processors
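That hierarchy can be sketched as a small data structure. This is only an illustration: the `Topology` class and its field names are invented here, and the example values are the dual Opteron 270 board mentioned earlier in the thread (2 sockets, 2 cores each, one thread per core).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Topology:
    """Hypothetical model of socket -> physical processor -> logical processor."""
    sockets: int
    cores_per_socket: int
    threads_per_core: int

    @property
    def physical_processors(self):
        # Every core in every socket counts as one physical processor.
        return self.sockets * self.cores_per_socket

    @property
    def logical_processors(self):
        # Each physical processor exposes one or more hardware threads.
        return self.physical_processors * self.threads_per_core


# Dual Opteron 270: 2 sockets, 2 cores per socket, no SMT.
opteron_2p = Topology(sockets=2, cores_per_socket=2, threads_per_core=1)
print(opteron_2p.physical_processors, opteron_2p.logical_processors)  # prints: 4 4
```

What number goes into `cores_per_socket` for a Bulldozer chip is, of course, exactly what this thread is arguing about.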



GreiverBlade said:


> which are, as you write it, *cores... *


[facepalm.jpg]
An "execution core" understands this:

```
a += b
```
A "core" understands this:

```
if (c > 0) a += b
```
One is math, the other processing.  One is a calculator, the other is a processor.

A "core" relies on nothing other than memory subsystems to carry out instructions.  A "core" will call on various "execution cores" to carry out instructions by delegating tasks to them.  For example, floating point math gets sent to the floating point unit (a type of "execution core") while integer math gets sent to the arithmetic logic unit (another type of "execution core").  In the examples given above, if "a" is of float type, it will be routed to FPUs; if it is of int32 type, it will be routed to ALUs.

All of the parts combined make a processor and a processor, when packaged with many on one chip, is collectively called a "core."  "Core," without context in computing, is synonymous with "processor."
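To make the delegation idea concrete, here is a toy sketch. It assumes a made-up `ToyCore` class in which Python's operand type stands in for instruction decode; it is not how any real front end is built, just the routing described above.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCore:
    """Hypothetical model: a front end that routes work to execution units."""
    log: list = field(default_factory=list)

    def _alu_add(self, a, b):
        self.log.append("ALU")  # integer math handled by the ALU
        return a + b

    def _fpu_add(self, a, b):
        self.log.append("FPU")  # floating-point math handled by the FPU
        return a + b

    def add(self, a, b):
        # Routing by operand type stands in for decode and dispatch.
        if isinstance(a, float) or isinstance(b, float):
            return self._fpu_add(a, b)
        return self._alu_add(a, b)

    def add_if_positive(self, c, a, b):
        # The 'if (c > 0) a += b' case: the core decides control flow,
        # then delegates the arithmetic to a unit.
        return self.add(a, b) if c > 0 else a


core = ToyCore()
core.add(1, 2)                    # routed to the ALU
core.add(1.5, 2)                  # routed to the FPU
core.add_if_positive(0, 1, 2)     # condition false, no unit is used
print(core.log)                   # prints: ['ALU', 'FPU']
```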


----------



## Vya Domus (Aug 29, 2019)

A core has cores.
A processor has processors.

Something ain't right. This bizarre recursion needs to be addressed somehow.

An if statement is just a conditional jump that only needs an ALU to work, or an execution core. There is nothing special that is required and that can only be found in some other type of core.

Moreover, the execution of every instruction, no matter how complex, can be driven pretty much exclusively by lookup tables.

You can have a memory where every address corresponds to an instruction and the value at that address represents the control signals required for its execution. From then on you just need the ALU and wiring.

Of course actual CPUs aren't really implemented like this, but this proves everything else other than the execution core is redundant. That's the only dynamic bit required for a processor.
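A rough sketch of what I mean, as a toy machine. Everything here is invented for illustration (the opcode names, the `CONTROL_STORE` layout, the `run` function); the point is only that the per-instruction behaviour comes out of a table and the ALU does the rest, including the conditional jump.

```python
# Control store: opcode -> control signals (all names are illustrative).
CONTROL_STORE = {
    "ADD": {"alu": "add", "write": True, "branch": False},
    "SUB": {"alu": "sub", "write": True, "branch": False},
    "JLE": {"alu": "sub", "write": False, "branch": True},  # jump if a - b <= 0
}

ALU = {"add": lambda x, y: x + y, "sub": lambda x, y: x - y}


def run(program, regs):
    """Execute (opcode, dst, src, target) tuples against a register dict."""
    pc = 0
    while pc < len(program):
        op, dst, src, target = program[pc]
        signals = CONTROL_STORE[op]  # pure table lookup, no other decode logic
        result = ALU[signals["alu"]](regs[dst], regs[src])
        if signals["write"]:
            regs[dst] = result
        if signals["branch"] and result <= 0:
            pc = target  # conditional jump decided by the ALU result
        else:
            pc += 1
    return regs


# if (c > 0) a += b
PROGRAM = [
    ("JLE", "c", "zero", 2),  # skip the add when c <= 0
    ("ADD", "a", "b", None),
]

print(run(PROGRAM, {"a": 1, "b": 2, "c": 5, "zero": 0})["a"])  # prints: 3
print(run(PROGRAM, {"a": 1, "b": 2, "c": 0, "zero": 0})["a"])  # prints: 1
```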


----------



## GreiverBlade (Aug 30, 2019)

Previous to that settled matter, they still defined what a core is ... which is kinda confusing ... well, a core is a core, be it execution, CUDA, tensor, stream or plain simple core... point of view.


----------



## jaggerwild (Aug 30, 2019)

That's a lot of Ryzen CPUs, 12.1 mil, ugh!


----------



## seronx (Aug 30, 2019)

jaggerwild said:


> That's a lot of Ryzen CPUs, 12.1 mil, ugh!


Or around FX's cheapest, which is at least 121,000 native eight-core CPUs.


----------



## Keviny Oliveira (Aug 31, 2019)

It's clear that the ones who will win from this case are the lawyers. Besides, it actually has eight cores, though they were a kludge ("gambiarra") of modular cores. And before the Intel fanboys comment something stupid, remember that the Core 2 Quad was two glued Core 2 Duos. But because the Windows scheduler is bad on AMD's processors, it says 4/8; nothing could be more stupid on Microsoft's part.


----------



## FordGT90Concept (Sep 1, 2019)

Keviny Oliveira said:


> ...Core 2 Quad were two glued Core 2 Duo...


Core 2 Duo was a legit module containing two complete processors (aka cores).  Core 2 Quad was a multi-chip module where two Core 2 Duo dies were attached to the same package substrate, exposing a total of four complete processors on the front-side bus.




Here's what the individual dies look like ("core replication is obvious"):




In the case of Bulldozer, each "module" only contained one complete processor (aka core).  That's why in the literature, it's called a "conjoined-core."


			
Kumar et al said:

> This paper proposes *conjoined-core chip* multiprocessing – topologically feasible resource sharing between *adjacent cores* of a chip multiprocessor to reduce die area with minimal impact on performance and hence improving the overall computational efficiency


"conjoined-core" is referring to "chip"-level "core" which is synonymous with "processor" consistent with Pentium D and Athlon 64 X2 (which were out at the time).
"adjacent cores" is referring to execution "cores" which share resources in a "conjoined-core."  These do not qualify as "processors."


----------



## Vya Domus (Sep 1, 2019)

They totally qualify as cores and in the papers the authors refer to the arrangement as being a pair of cores, *as in two*.

They don't mean two execution cores or waffles or anything else, *they mean just two cores*.


----------



## FordGT90Concept (Sep 1, 2019)

Sure they do, further in, have another quote:


> Conjoined-core chip multiprocessing deviates from a conventional chip multiprocessor design by sharing selected hardware structures between adjacent cores to improve processor efficiency.


They're using two different definitions of "core" interchangeably.

"Conjoined-core" refers to this:


> What is a Core?
> 
> 
> Computer dictionary definition of what core means, including related links, information, and terms.
> ...


"Adjacent cores" refers to this:


> Discussion 10: Execution Unit
> 
> 
> The execution unit contains the data registers and the ALU.



The use of the phrase "execution core" is *rare* outside of conjoined-core literature.

So you see the problem? Bulldozer "execution cores" lack the hardware to decode AMD64 instructions, which is a function of the "core" (aka processor).  "Execution cores" as defined in Bulldozer lack the hardware necessary to be considered a "core:" they are merely "execution units."  ...and these are the wheels that turn the gears of false advertising.


----------



## Vya Domus (Sep 1, 2019)

This is purely an invention of yours, as are most of your arguments.

They simply do not ever make a distinction between the kinds of cores that they are talking about because they don't have to, a core is a core in any circumstance. "Adjacent" simply refers to the pair of cores that share the resources, nothing less nothing more.

Have these quotes in which it's crystal clear what they mean by those adjacent cores in relation to the traditional cores :



> Wires connecting the FPU to the left core and the right core can be interdigitated, so no additional horizontal wiring tracks are required



"Connecting the FPU *to the left and right core*." An FPU classifies as an execution core; clearly they don't mean it's shared between other _execution cores_.



> A core can alternate accesses between the two banks. It can fetch 4 instructions every cycle but only if their desired bank is available. A core has access to bank 0 one cycle, bank 1 the next, etc., with the other core having the opposite allocation



There you go, each core fetches instructions, in an alternating fashion. It cannot get any more obvious than this: they mean cores as in *not execution cores*. An execution core can't fetch instructions on its own.
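The alternating allocation in that quote is easy to sketch. The function names below are mine, and this only models which bank each of the two cores may touch on a given cycle, not the 4-instruction fetch width or bank conflicts.

```python
def bank_for(core, cycle):
    """Bank (0 or 1) that `core` (0 or 1) may access on `cycle`."""
    # Core 0 starts on bank 0, core 1 on bank 1, and both flip every cycle.
    return (cycle + core) % 2


def schedule(cycles):
    """Per-cycle (bank for core 0, bank for core 1) pairs."""
    return [(bank_for(0, t), bank_for(1, t)) for t in range(cycles)]


print(schedule(4))  # prints: [(0, 1), (1, 0), (0, 1), (1, 0)]
```

The two cores never hold the same bank in the same cycle, which is the "opposite allocation" the paper describes.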

You are simply wrong, end of story.


----------



## FordGT90Concept (Sep 1, 2019)

"Conjoined-core" is never plural.  Think of another context where "conjoined" is commonly used: "conjoined-twins."  Note "twins" is plural because they are, in fact, separate entities but they both share a birth defect: being joined to each other.

If the authors' intent was truly to say monolithic-core and conjoined-core were indistinguishable, they would have used the plural form of core: "cores."  They do not, because they're not independent processors; they are in fact very dependent on each other.  The two combined, therefore, make an indivisible new entity: a conjoined-core.

"left and right core" are referring to execution units, not the whole "conjoined-core."

"A core can alternate accesses between the two banks" is referring to the "conjoined-core" where the "two banks" are the "execution units."

As I said, and you just demonstrated again, the article is using two definitions of "core" interchangeably.  It's a technical document that assumes the reader will understand the difference.


----------



## Vya Domus (Sep 1, 2019)

*Execution cores can't fetch instructions.* They mean fully functional cores, if they meant one core, then who is the other core that they are talking about ?

You are out of touch with the technical aspects of this paper.


----------



## FordGT90Concept (Sep 1, 2019)

Vya Domus said:


> An execution core can't fetch instructions on its own.


And this is where Bulldozer is hilarious: there's actually two types of instructions:
1) x86 which is what the "conjoined-core" exposes to the system.
2) microOPs which is what the "adjacent cores" process and aren't directly accessible.

They both fetch their respective instructions.  This is probably why they love using two meanings of "core."  But only one of them matters to the public.


Not that it matters.  In Steamroller, they split instruction decode too but the "module" is still a "conjoined-core" sharing resources--aka a "core" (not plural).

Remember how Sun designed a conjoined-core on steroids?  Why do you think they never released it?  My guess: poor performance like AMD saw.  Even after four generations of conjoined-core designs, AMD abandoned it entirely.  Sun's chip likely had the same problems AMD's chip did, but four fold, because they shared a crapload more than AMD did.  There was no market for a chip that performs that badly, so they never launched it.  The cost to support it (hardware platforms and software) would have compounded the losses.


----------



## Vya Domus (Sep 1, 2019)

And here you end up contradicting yourself.

You've battled for the last couple of pages to prove execution cores can't be cores because they are just "glorified calculators". But now what do you know, turns out a calculator can even fetch instructions from memory, hmm.



FordGT90Concept said:


> A "core" relies on nothing other than memory subsystems to carry out instructions.



It's settled, they _are _cores.


----------



## FordGT90Concept (Sep 1, 2019)

How can they calculate if they have no data?  Point is, microOPs afford very little capability; hence, glorified calculators.

Anyway, the x86 decoder (as in all processors) hands the microOPs to the execution units on a silver platter known as L1 Instruction Cache.  You know where I'm going with this.


----------



## Vya Domus (Sep 1, 2019)

Everything a processor executes consists of microOPs. Either everything is a glorified calculator or nothing is.


----------



## FordGT90Concept (Sep 1, 2019)

The Bulldozer "execution unit" is incapable of processing FADD.  There's different types of execution units and the processor (aka "core") has to make sure the appropriate data gets to the appropriate unit then collates the results.

"conjoined-core" is very, very different from "adjacent cores."


----------



## Vya Domus (Sep 1, 2019)

You are clutching at straws with what a Bulldozer core can or can't do. It's no question that its capabilities are more limited compared to a conventional core, but it's a core nonetheless: it can fetch, decode and execute instructions on its own. If any of those stages are blocked by another core, it's a different matter, but the two are very much obviously distinct entities.


----------



## FordGT90Concept (Sep 1, 2019)

AMD disagrees with your assessment:


----------



## Vya Domus (Sep 1, 2019)

Well, for one there aren't two execution units, there are four. Two for integer, two for floating point and they can be driven independently by two threads with limitations.

That makes it a dual core.


----------



## FordGT90Concept (Sep 1, 2019)

Now you're confusing execution units for components of them (ALUs, AGUs, MMXs, and FMACs).  More detailed slide:


----------



## Vya Domus (Sep 1, 2019)

You literally have it spelled out for you mate.

*Dual 128-bit FMAC pipes.*

Plus the two integer clusters, four. Four execution units, two for integer, two for floating point.

If you want to break them down, fine, you'd have:

- 2x two ALUs
- 2x two AGUs
- 2x 128-bit FP units

But they are grouped like that for a reason, because each integer cluster can be used by one thread and the two FP units can either be shared or used by one thread in the case of 256-bit instructions.


----------



## FordGT90Concept (Sep 1, 2019)

Look at the picture again.  These are pipelines which are part of the execution units (two integer, one floating point):
4 ALUs ((EX/MUL pipeline + EX/DIV pipeline) * 2)
4 AGUs (AGen pipeline * 4)
2 128-bit MMX pipelines
2 128-bit FMAC pipelines

That's a total of 12 pipelines for each Bulldozer conjoined-core.   Each thread has 4 pipelines (2 x ALU + 2 x AGU) dedicated to it. When counting the FPU, pipeline usage can expand up to 8 when performing an AVX + 2 MMX instruction.  In these instances, the other thread is deprived of progress on FPU tasks.
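The arithmetic, spelled out as a quick sketch. The dictionary just restates the list above; the variable names are mine.

```python
# Pipelines per Bulldozer module, as listed above.
PIPELINES = {"ALU": 4, "AGU": 4, "MMX": 2, "FMAC": 2}

total = sum(PIPELINES.values())

# Each of the two threads gets half of the integer pipelines to itself.
per_thread_integer = PIPELINES["ALU"] // 2 + PIPELINES["AGU"] // 2

# One thread can additionally occupy the whole shared FPU (2 FMAC + 2 MMX).
per_thread_peak = per_thread_integer + PIPELINES["FMAC"] + PIPELINES["MMX"]

print(total, per_thread_integer, per_thread_peak)  # prints: 12 4 8
```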


Still don't know why you insist on carrying on with this train of thought: the decoder and fetcher in Bulldozer is undeniably shared and "cores" don't share logic.  It's a "conjoined core" which means the whole of it is a "core," not specific components as AMD would have you believe.  AMD intentionally called the execution units "cores" to mislead the public in respect to its performance (overselling the capabilities of its product).


----------



## Vya Domus (Sep 1, 2019)

I am looking and I see 4 groups, two for integer, two for floating point. This is better illustrated here, one blue block, one green and two yellow. That's the higher level grouping of these execution units.





The problem here is that you are getting confused because your definitions of what is an execution core or whatever fall into a strange twilight zone. It's neither a core nor an ALU; the only thing left is a collection of ALUs/FPUs, of which a Bulldozer module has 4.

Everyone either thinks in terms of cores or execution units (ALUs or FPUs). You are making this unnecessarily difficult in your pursuit of differentiating cores from anything else.



FordGT90Concept said:


> Still don't know why you insist on carrying on with this train of thought: the decoder and fetcher in Bulldozer is undeniably shared and "cores" don't share logic.



Because even though logic is shared multiple instructions end up being processed. That's the whole point, get work done with less logic.


----------



## FordGT90Concept (Sep 1, 2019)

Vya Domus said:


> I am looking and I see 4 groups, two for integer, two for floating point. This is better illustrated here, one blue block, one green and two yellow. That's the higher level grouping of these execution units.


I see four cores as clearly indicated by fetchers and decoders.

Oh look, Zen looks similar:




Look at the text below the diagram: AMD is referring the whole (from Fetch to L2) as the core (not just the integer execution unit).  AMD doesn't get to change the rules for its own advantage on Bulldozer.  It was well understood what a "core" was before and after Bulldozer debuted.

Oh look! Zen even has 2 x 256-bit FMACs + 1 x MMX per core!  Gee, I wonder why Bulldozer gets dragged through the mud for being pokey.  Maybe it's because AMD *really* skimped on floating-point performance in the name of supporting more integer-heavy threads?  Considering Zen's design, it's clear AMD believed this was a mistake in Bulldozer.



Vya Domus said:


> The problem here is that you are getting confused because your definitions of what is an execution core or whatever fall into a strange twilight zone. It's neither a core nor an ALU, the only thing left it's a collection of ALUs/FPUs of which a Bulldozer module has 4.


These phrases are not my own.  They're phrases used in different literature to describe the same circuits.  Why I keep changing phrasing is to stay consistent with the sourced documents.  To be perfectly clear: "integer cluster" = "execution core" = "adjacent core" which is not to be confused with the singular "core" which is synonymous with "processor."

The best way to describe Bulldozer is thusly:
FX-8350 is a quad-core processor with each core accepting two threads.  The integer payload of each thread is executed by a dedicated integer cluster while the floating-point payload is handed off to the shared floating-point cluster.  The result of this design is accelerated performance in multi-threaded, integer-heavy scenarios like 7-zip compression; however, in any workload that strains the processor cores' shared resources (like AVX), performance tanks.


----------



## Vya Domus (Sep 1, 2019)

FordGT90Concept said:


> I see four cores as clearly indicated by fetchers and decoders.



I see eight cores, each pair of two cores sharing some fetch and decode logic. You can see in the picture posted by yourself that the module has 4 decode units, enough to feed two independent threads, at the very least, and enough execution units to be driven by them.



FordGT90Concept said:


> The result of this design is accelerated performance in multi-threaded, integer-heavy scenarios like 7-zip compression; however, in any workload that strains the processor cores' shared resources (like AVX), performance tanks.



Again, it's irrelevant whether performance tanks or not. CPUs have behaved differently because of the way they use resources all throughout history; the first Pentium that had MMX suffered from major performance degradation in other workloads when MMX was used because it would stall other pipelines.


----------



## FordGT90Concept (Sep 1, 2019)

"You can lead a horse to water but you can't make him drink."

I've demonstrated repeatedly, from multiple angles, with multiple sources what a "core" is and AMD redefined it for personal gain; they also agreed to not do it again by settling.  What more is there to discuss?


----------



## Vya Domus (Sep 1, 2019)

Certainly not when the water is stale.

Bulldozer has 8 cores. You've only speculated what *you *think a core is by constantly inventing new definitions and rules outside the subject and context in which this was discussed, that's a big difference. And sources ? Don't make me laugh, you don't get to say that when your response to actual material that proved my point was "they lied".


----------



## hzy4 (Sep 1, 2019)

Next headline: AMD sued because Ryzen 3000 cannot reach advertised clock speeds.
Ryzen 3000 Boost Survey


----------



## seronx (Sep 1, 2019)

FordGT90Concept said:


> So you see the problem? Bulldozer "execution cores" lack the hardware to decode AMD64 instructions, which is a function of the "core" (aka processor).  "Execution cores" as defined in Bulldozer lack the hardware necessary to be considered a "core:" they are merely "execution units."  ...and these are the wheels that turn the gears of false advertising.


"More specifically, this invention relates to processors that convert an x86 instructions into RISC-type operations for execution on a RISC-type core."
"The core of the processor is a RISC superscalar processing engine."
"The heart of the AMD-K6 processor is a RISC core known as the enhanced RISC86 microarchitecture."
"The AMD-K5 processor’s superscalar RISC core consists of six execution units: two arithmetic logic units (ALU), two load/store units, one branch unit, and one floating-point unit (FPU).  This superscalar core is fully decoupled from the x86 bus through the conversion of variable-length x86 instructions into simple, fixed-length RISC operations (ROPs) that are easier to handle and execute faster. Once the x86 instruction has been converted, a dispatcher issues four ROPs at a time to the superscalar core. The processor’s superscalar core can execute at a peak rate of six ROPs per cycle. The superscalar core supports data forwarding and data bypassing to immediately forward the results of an execution to successive instructions."
"AMD-K6 MMX Processor : High-performance RISC core : Yes / 6-issue (RISC86)"
"The execution engine implements a superscalar, out-of-order, reduced instruction set computing (RISC) architecture"
"The dual instruction decoders translate X86 instructions on-the-fly into corresponding RISC86 Ops. The RISC86 Ops are executed by an instruction core that is essentially a RISC superscalar processing engine."
"From the viewpoint of packing multiple primitive operations into a coarser schedulable unit and performing schedule and execution of macro-ops, the proposed microarchitecture employs a counter approach to recent x86 processor implementations that crack a CISC instruction and convert it into multiple RISC semantics running on RISC-style cores."
"As a coarser-grained approach in the opposite direction, the AMD K7 and the Intel Pentium M have adopted techniques to allow an issue queue entry to accommodate multiple micro-ops as a form of fused operations for certain types of x86 instructions. Original micro-ops are loosely coupled in a fused operation from the scheduler’s perspective; they are scheduled individually according to the readiness of corresponding source operands."
So it went from RISC86, which mapped poorly onto x86, to macro-ops, which mapped onto x86 better.

"AMD-K7 ™ Processor Architecture => Three Parallel x86 Instruction Decoders => Decoding Pipelines can dispatch 3 MacroOps to Execution Unit Schedulers, Load / Store Queue Unit => Result Busses from Core"
^-- this one is the most intriguing as the only mention of a core is for the LSU slide.






"The AMD Athlon processor microarchitecture is a decoupled decode/execution design approach. In other words, the decoders essentially operate independent of the execution units, and the execution core uses a small number of instructions and simplified circuit design for fast single-cycle execution and fast operating frequencies."

Then, K8 happens... oh dear...
"The AMD64 architecture employs a decoupled decode/execution design approach. In other words, decoders and execution units essentially operate independently; the execution core uses a small number of instructions and simplified circuit design for fast single-cycle execution and fast operating frequencies."
"The AMD Athlon 64 and AMD Opteron processors implement the AMD64 instruction set by means of micro-ops—simple fixed-length operations designed to include direct support for AMD64 instructions and adhere to the high-performance principles of fixed-length encoding, regularized instruction fields, and a large register set. The enhanced microarchitecture enables higher processor core performance and promotes straightforward extensibility for future designs"

That is all dandy, but then it explodes: CPU cores, cores, processor cores, etc.  Which isn't the core they originally defined; "This superscalar core is fully decoupled from the x86 bus through the conversion of variable-length x86 instructions into simple, fixed-length RISC operations (ROPs) that are easier to handle and execute faster."

"A processor core for supporting the concurrent execution of mixed integer and floating point operations includes integer functional units utilizing 32-bit operand data and a floating point functional unit utilizing up to 82-bit operand data."


https://patentimages.storage.googleapis.com/ff/c4/e7/7da222a99f9ccb/US5574928-drawings-page-11.png

"FIG. 13 is a schematic diagram of a layout of a mixed floating point/integer processor core"


----------



## FordGT90Concept (Sep 1, 2019)

Bravo! TL;DR: that's the story of how the x86 front-end is interpreted into a series of microOPs which are executed by the execution units.



Spoiler: Technical mumbo jumbo






seronx said:


> That is all dandy, but then it explodes: CPU cores, cores, processor cores, etc.  Which isn't the core they originally defined; "This superscalar core is fully decoupled from the x86 bus through the conversion of variable-length x86 instructions into simple, fixed-length RISC operations (ROPs) that are easier to handle and execute faster."


Exactly why we have two definitions of "core" now and why it is imperative to declare which is being discussed, especially in marketing materials.  In this quote (without extra context), it sounds like they're referring to an execution unit as a "superscalar core."  P5 was the first superscalar x86 architecture and, if you look at what it had (MMX, FPU, ALU-Y, ALU-U), it's easy to understand why they went with a superscalar approach:




I make it clear what is what here:



You put two execution units in a core, you get two execution units, not two cores.  The reason why everyone went with SMT now is because SMT has all of the benefits of more execution units without putting a physical wall between them. Wider execution units are preferable to many execution units...at least when dealing with x86...because the odds of being able to saturate all of the pipelines in the execution unit are better.  This means greater efficiency.

The picture above is in line with the terms AMD settled with: "core" includes fetcher to L1 data cache.

If you disagree, I remind you that Kumar et al. (likely the inspiration for Bulldozer) described the design as a "conjoined-core," intentionally omitting the plural form of core.  That is not a mistake and the hyphen makes it clear which definition of the word "core" is used for the purpose of the technical paper.



Going back to K8...that was actually mirrored to become the first dual-core x86 processor...and AMD put it on the box loud and proud:




I actually found the product PDF too, straight off of AMD's servers:
Athlon 64 X2 Dual-Core Product Data Sheet

Looky what it says about "dual-core:"



*AMD explicitly *includes* L2 (off to the left) in their definition of "core" which means the definition of "core" goes far beyond just the "execution unit" (left part of the gray area):*



The only parts of this image that AMD effectively excluded from the definition of "core" are SRQ, XBar, and IMC.

See what happened there?  *AMD contradicted themselves with Bulldozer* and the motive is clear (over-represent their product).




Spoiler: Intel concurs with AMD Athlon 64 X2 definition of core



As for Intel?  Well, they were never so brash as to put "#-core" on a box as AMD did.  They usually like to hide it behind model numbers and fine print.  Anyway, let's have a look at Pentium D which is two MCM'd Pentium 4s:








Product Specifications (ark.intel.com)



When you click the little (?) next to "# of cores" it says: "Cores is a hardware term that describes the number of independent central processing units in a single computing component (die or chip)."  Remember how I said Core 2 Quad has four processors?  That's basically Intel's words, innit?  Intel has been extremely consistent in describing their "cores" as discrete processors networked together since the beginning.


----------



## seronx (Sep 1, 2019)

FordGT90Concept said:


> See what happened there?  *AMD contradicted themselves with Bulldozer* and the motive is clear (over-represent their product).


No, AMD contradicted themselves with K8.  What K8 defines as a core is a processor.  Whereas what Bulldozer defines as a core is a core.

K8 is a dual-processor CPU with each processor having a single core.
Family 15h 70h-7Fh is a single-processor CPU with that one processor having two cores.

Bulldozer uses the older definition, which has priority, rather than K8's, which seems to be a marketing gaffe.

Even Intel called it the core, as AMD did:
"In-Order Front End -> Its job is to supply a high-bandwidth stream of decoded instructions to the out-of-order execution core, which will do the actual completion of the instructions. These IA-32 instruction bytes are then decoded into basic operations called uops (micro-operations) that the execution core is able to execute."

"The P6 microarchitecture is made up of in-order front end, *out-of-order core* and in-order retirement units.

The front end includes Instruction Fetch, Instruction Decode, Branch Target Buffer, Micro-instruction Sequencer, and Register Address Table units. *The out-of-order core is made up several execution units; the units include Floating Point Execution units, Integer Execution units, and Address Generation units.* The in-order retirement back end includes the Re-order Buffer and the Register Retirement File units."

"Intel® Microarchitecture Code Name Sandy Bridge:
-> An in-order issue front end that fetches instructions and decodes them into micro-ops (micro-operations). The front end feeds the next pipeline stages with a continuous stream of micro-ops from the most likely path that the program will execute.
-> *An out-of-order, superscalar execution engine that dispatches up to six micro-ops to execution, per cycle. The allocate/rename block reorders micro-ops to "dataflow" order so they can execute as soon as their sources are ready and execution resources are available.*
-> An in-order retirement unit that ensures that the results of execution of the micro-ops, including any exceptions they may have encountered, are visible according to the original program order.

The out-of-order core consist of three execution stacks, where each stack encapsulates a certain type of data. The execution core contains the following execution stacks:
• General purpose integer
• SIMD integer and floating-point
• X87"

Intel used AMD's misdefinition in Core Duo, Core Quad, and pretty much everything relating to multi-processors.  However, the docs point to the out-of-order portion of the Intel processor as the core.

Multi-core processor => AMD's Family 15h(two cores)/Sun's Rock(four cores)
Multi-processors with single-cores => K8, Core Duo, Core 2 Quad, Athlon/Phenom X4, etc.
Multi-processors with multi-cores => Family 15h Model 00h-0Fh(8 cores)/Sun's Rock(16 cores)


----------



## FordGT90Concept (Sep 1, 2019)

seronx said:


> K8 is a dual-processor CPU with each processor having a single core.


That is not what AMD put on the box.  It says "dual-core" not "dual-processor."  They confirm the two are one and the same in the product data sheet.

They not only said the same thing on Phenom II but they added "true quad-core design:"



Remember why they did that? Because AMD's marketeers felt the Core 2 Duo "module" wasn't really a "dual-core" because of the shared L2 cache, which AMD processors didn't share until _Bulldozer_ (among many other things), if memory serves.

Hilarious, isn't it?  It's almost like AMD kept changing the definition of "core" to suit themselves.  Because they did.  The only difference is that AMD's moving goalpost didn't really mean anything to consumers until Bulldozer, when they were pushing this notion that consumers were getting twice the number of cores they really were.  That's an argument that can be taken to court for damages--and it was.


As I pointed out: 2005 established the definition of what a "dual-core processor" was and AMD and Intel were united on that front.  Intel has been consistent since; AMD has not: dual-core (Athlon 64 X2) = processor replication -> "true" dual-core (Phenom) = no sharing L2 cache (this was and remains a stupid argument) -> conjoined-core (Bulldozer) = really one core but we're going to sell it as two -> quad-core (Zen) = processor replication.  I think it's fairly safe to say that this ridiculous AMD chapter is closed for good.



seronx said:


> Even Intel called it the core, as AMD did:
> "In-Order Front End -> Its job is to supply a high-bandwidth stream of decoded instructions to the out-of-order execution core, which will do the actual completion of the instructions. These IA-32 instruction bytes are then decoded into basic operations called uops (micro-operations) that the execution core is able to execute."
> 
> "The P6 microarchitecture is made up of in-order front end, *out-of-order core* and in-order retirement units.
> ...


Quotations without citations are plagiarism.

Your Sandy Bridge quote yet again makes my point for me: they quit calling it a "core" because since 2005, that means exclusively processor; they instead call it a "superscalar execution engine" which eliminates the confusion.  Call anything that isn't a complete processor inside of a multi-processor chip a "core" today and beware of the lawyers.


----------



## seronx (Sep 1, 2019)

FordGT90Concept said:


> That is not what AMD put on the box.  It says "dual-core" not "dual-processor."  They confirm the two are one and the same in the product data sheet.
> 
> They not only said the same thing on Phenom II but they added "true quad-core design:"


The definition of a processor includes a core.  So, regardless, a processor must have a core.  So, four processors would always net at minimum four cores.

Phenom X4/Phenom II X6 can be referred to by either the processor count or the core count.  It would still be correct.  However, it isn't a true quad-core design for your reasoning.  It is a _monolithic_ quad-core design, whereas the Core 2 Quad was _two monolithic_ dual-core designs.

"Remember why they did that? Because AMD's marketeers felt the Core 2 Duo "module" wasn't really a "dual-core" because of the shared L2 cache, which AMD processors didn't share until _Bulldozer_ (among many other things), if memory serves."
As stated above, it was a jab at Intel for their MCM design.






Intel's Xeon is a true 28-core/28-processor design, while AMD's EPYC is only an 8-core/8-processor design * 4.

Relative to x86/x86-64, the Bulldozer compute unit processor microarchitecture is the first true x86 dual-core processor.  Athlon X2, for example, is two single-core processors that are glued together.

K8/10h/12h processor microarchitecture doesn't include two cores; only AMD Bulldozer's compute unit processor microarchitecture contains two cores.  Of the two, only Bulldozer can keep the native dual-core architecture. So, if AMD comes out with an octo-core processor microarchitecture, then neither FX nor Ryzen can be said to be natively octo-core.

With AMD's Stoney Ridge, it can be called a native/true/etc. dual-core processor and it will convey the message well.  However, AMD for Raven2 would definitely want to market it as a design of two single-core processors glued together through the shared L3 cache.  Raven2 isn't natively or truly a dual-core processor, as it is two replicated single-core processors which are glued together via L3.

Raven2 without SMT => Core 0 writes X in an FPU register, Core 1 is dependent on X for an FPU operation.  100s to 1000s of cycles from PRF(core 0) -> L1(core 0) -> L2(core 0) -> L3(shared) -> L2(core 1) -> L1(core 1) -> PRF(core 1) -> execution.
^- non-native dual-core

Stoney with CMT => Core 0 writes X in an FPU register, Core 1 is dependent on X for an FPU operation.  A few cycles from renaming -> PRF-tag with core 0 -> PRF-tag with core 1 -> execution.
^- native dual-core


----------



## FordGT90Concept (Sep 2, 2019)

You're missing the point. Again. AMD explicitly said that K8 "core" included L1 and L2 caches which includes front end, execution units, L2 cache, floating point units, and everything in between.  Intel mirrored that definition since Pentium D.

This thing you like calling a "core" has a dozen names that fundamentally mean the same thing, but it is no longer singularly called a "core"; it is always prefaced with a descriptor (superscalar, out of order, execution, integer, etc.).  "Core" by itself, especially in marketing, means one thing and one thing alone since 2005: a processor which includes front end, the various types of units, and sometimes L2+ caches.  It's hardware that receives x86 instructions, processes them, and returns the full result.

This feels like a merry-go-round.  It's settled.  I'm done here.


----------



## seronx (Sep 3, 2019)

FordGT90Concept said:


> AMD explicitly said that K8 "core" included L1 and L2 caches which includes front end, execution units, L2 cache, floating point units, and everything in between.  Intel mirrored that definition since Pentium D.


The earlier definition has priority or precedence in this case.  However, neither definition is technically correct.  A core is only: a control unit, a datapath, an instruction bus, and a data bus.

The control unit in K7/K8/10h/12h is called the instruction control unit, which is directly interconnected with the 3-wide OoO integer datapath and remotely interconnected with the 3-wide OoO floating point datapath.  The instruction bus and data bus are easily confirmed: can instructions flow in, and can data flow out?  Boom, it's a core.

K7 core is instruction control + integer datapath
K8 core is instruction control + integer datapath
Family 10h/12h core is instruction control + integer datapath
Family 15h core is instruction control + integer datapath, of which Family 15h's compute unit has TWO.
etc.

L1 cache, L2 cache, FPU, Front-end, etc are not part of the core.
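
For anyone trying to follow the two counting rules being argued over in this thread, here is a minimal sketch of how they diverge. It is only an illustration, not anything from AMD or the court filings: the `Chip` class, its field names, and both method names are hypothetical, and the layout numbers assume an FX-8xxx is four compute units that each share one front end between two integer clusters, while an Athlon 64 X2 is two replicated units with one integer cluster each.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chip:
    """Hypothetical, heavily simplified description of a CPU layout."""
    name: str
    front_ends: int                      # fetch/decode blocks on the chip
    integer_clusters_per_front_end: int  # integer datapaths fed by each one

    def cores_by_integer_datapath(self) -> int:
        # Counting rule argued above: core = instruction control + integer datapath.
        return self.front_ends * self.integer_clusters_per_front_end

    def cores_by_full_pipeline(self) -> int:
        # Opposing rule: core = everything from its own front end onward,
        # so units that share a front end count once.
        return self.front_ends


# Assumed layouts (simplified for illustration only).
athlon_64_x2 = Chip("Athlon 64 X2", front_ends=2, integer_clusters_per_front_end=1)
fx_8xxx = Chip("FX-8xxx", front_ends=4, integer_clusters_per_front_end=2)

for chip in (athlon_64_x2, fx_8xxx):
    print(chip.name,
          "| integer-datapath rule:", chip.cores_by_integer_datapath(),
          "| full-pipeline rule:", chip.cores_by_full_pipeline())
```

Under these assumptions both rules agree on the Athlon 64 X2 and only disagree once a front end is shared, which is the whole dispute in a nutshell.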


----------



## Keviny Oliveira (Sep 11, 2019)

FordGT90Concept said:


> "cores" which share resources in a "conjoined-core." These do not qualify as "processors."


But not make sense, lol, If the processor has 2 cores him is a dual core, even though it has lower performance than a 1-core processor, it is logical that FXs cores are non-logical physical, so the FX 8300 is an octa-core where certain tasks have the same performance as a quad core.


----------



## R-T-B (Sep 17, 2019)

FordGT90Concept said:


> Remember how Sun designed a conjoined-core on steroids? Why do you think they never released it?



Probably because Oracle bought them out and released it as one of the many shared-FPU SPARCs at the time...



Keviny Oliveira said:


> But not make sense, lol, If the processor has 2 cores him is a dual core, even though it has lower performance than a 1-core processor, it is logical that FXs cores are non-logical physical, so the FX 8300 is an octa-core where certain tasks have the same performance as a quad core.



Wut


----------



## gaximodo (Sep 19, 2019)

"Real Man use Real Cores"


----------



## Aquinus (Sep 19, 2019)

I've read this thread somewhere else... and I'm still not convinced because they're still the same flawed arguments predicated on "what has been in a core," not, "what actually defines a core," on top of the fact that such rigid definitions of a "core" are only going to tie new designs to an archaic way of looking at things just for the purposes of legal bullshit.

Honestly, I don't think this has anything to do with whether they're cores or not. That's just the legal argument. I honestly think that this really is about some rich idiot who feels duped because he didn't understand what he was buying. A normal person isn't going to buy a CPU and then choose to sue because it wasn't good enough, because most people are bright enough to look at reviews and make a judgement call themselves. Most people also aren't willing to invest the resources in suing because it takes money to do that.

I honestly think this entire debate is despicable.

I also think the last group of people who are qualified to make this call are armchair warriors.


----------



## svan71 (Sep 27, 2019)

ever notice these B.S. lawsuits all come out of cali? who can I go after to get my $90 shoes replaced after walking through the human feces and needles in downtown sanfransicko?


----------



## WHOFOUNDFUNGUS (Sep 30, 2019)

Lies. Lies, lies, lies, lies. AMD lies, Intel lies, they all lie. Anyway this is just a test to see if my signature works. It probably doesn't but just the same...


----------



## Keviny Oliveira (Oct 9, 2019)

WHOFOUNDFUNGUS said:


> Lies. Lies, lies, lies, lies. AMD lies, Intel lies, they all lie. Anyway this is just a test to see if my signature works. It probably doesn't but just the same...


Yes, but to say that an FX 8xxx is a quad core, given that it has eight very slow modular cores, is a joke.


----------

