Wednesday, August 28th 2019
AMD to Cough Up $12.1 Million to Settle "Bulldozer" Core Count Class-Action Lawsuit
AMD reached a settlement in the Class Action Lawsuit filed against it over the allegedly false marketing of the core counts of its eight-core FX-series processors based on the "Bulldozer" microarchitecture. Each member of the Class receives a one-time payout of USD $35 per chip, while the company takes a hit of $12.1 million. The lawsuit dates back to 2015, when Tony Dickey, representing himself in the U.S. District Court for the Northern District of California, accused AMD of falsely marketing its FX-series "Bulldozer" processors as having 8 CPU cores. Over the following four years the case gained traction, and a Class Action was built against AMD this January.
In the months that followed the January set-up of a 12-member Jury to examine the case, lawyers representing the Class and AMD argued over the underlying technology that makes "Bulldozer" a multi-core processor, and eventually discussed what a fair settlement would be for the Class. They eventually agreed on a number - $12.1 million, or roughly $35 per chip AMD sold - which they agreed was "fair," and yet significantly less than the "$60 million in premiums" consumers contended they paid for these processors. Sifting through these numbers, it's important to understand what the Class consists of. It consists of U.S. consumers who opted in to the Class Action and who bought an 8-core processor based on the "Bulldozer" microarchitecture. It excludes consumers of every other "Bulldozer" derivative (4-core and 6-core parts, APUs, and follow-ups to "Bulldozer" such as "Piledriver," "Excavator," etc.).
Image Credit: Taylor Alger
Source: The Register
291 Comments on AMD to Cough Up $12.1 Million to Settle "Bulldozer" Core Count Class-Action Lawsuit
If you want to read the discussion/argument going back and forth for several rounds, go through that thread. We probably won't be able to bring anything new to the table here.
While wiki is not always the best source, its description is pretty accurate:
Judging by performance, it's more like an SMT quad-core than a non-SMT octo-core. Legit octo-cores (AMD and Intel alike) bitch slap Bulldozer. :laugh: ...and it's reflected in the increase of transistors too.
So, I'm seeing a paradoxical claim. It's independent except that it's not.
The memory controller, higher-level caches like the L3, and newer additions to the CPU die like IO are not core functions.
In a multi-core CPU, one core can be disabled completely and the remaining cores will work as expected. This has always been the case; in practice it is usually implemented with a BIOS setting.
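On Linux you can see (and, with root, toggle) the same thing at runtime through sysfs. A minimal sketch, assuming the usual layout under /sys/devices/system/cpu; writing "0" to a CPU's online file takes it offline much like the BIOS toggle:

```python
from pathlib import Path

# List each logical CPU and whether it is currently online.
for cpu_dir in sorted(Path("/sys/devices/system/cpu").glob("cpu[0-9]*")):
    online_file = cpu_dir / "online"
    if online_file.exists():                      # cpu0 usually has no "online" file
        state = online_file.read_text().strip()
        print(f"{cpu_dir.name}: {'online' if state == '1' else 'offline'}")
    else:
        print(f"{cpu_dir.name}: always online (not hot-pluggable)")
```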
OK. At Bulldozer's release date, what could you have used a CPU for that totally lacked FPUs (and also had nothing to emulate an FPU)?
By what you are writing, you are basically saying that the FPU count doesn't matter. So I decided to push it to the limit and ask what use a CPU that totally lacked FPUs would have had at Bulldozer's release date.
Your argument is like saying that RAM wasn't really RAM until DDR. Doubling the data rate doesn't make RAM into RAM; it merely makes it DDR. An FPU, similarly, is an addition on top of the basic spec. FPUs could be eliminated from CPUs again and emulated in software. The performance would be bad for FPU-dependent processing, but it would still run. Software FPU was used for many years. A CPU with an FPU is a superset of the basic CPU core. Do you have a better citation than a wiki? The last time I checked, multi-core CPUs could share things, like a common pool of cache, and cannot operate with some of the things they share disabled.
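For what it's worth, "floating point without an FPU" just means doing the math with integer operations. A toy sketch of the idea using a made-up Q16.16 fixed-point format (the conversion helpers use Python floats purely for readability; the actual multiply is integer-only):

```python
FRAC_BITS = 16  # Q16.16: 16 integer bits, 16 fractional bits

def to_fixed(x):
    # convert a float to fixed-point (readability helper only)
    return int(round(x * (1 << FRAC_BITS)))

def fixed_mul(a, b):
    # multiply two Q16.16 numbers using nothing but integer ops
    return (a * b) >> FRAC_BITS

def to_float(x):
    # convert back for display
    return x / (1 << FRAC_BITS)

a, b = to_fixed(3.25), to_fixed(-1.5)
print(to_float(fixed_mul(a, b)))   # -4.875, no FPU instruction needed for the math itself
```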
Besides, if you believe it's accurate then you really shouldn't use the terminology "multi-core CPU". Instead, you should use the terminology "multi-CPU die". This is because a core is a subset of a CPU. In multi-core CPUs in particular there is the expectation of resource sharing. That is what separates "core" from "cpu".
teachcomputerscience.com/cpu/
Using both AMD's Zen and Intel's Skylake block diagrams from WikiChip as an example here. Check the SoC (entire die) diagram as well as the core parts that follow:
en.wikichip.org/wiki/amd/microarchitectures/zen#Block_Diagram
en.wikichip.org/wiki/intel/microarchitectures/skylake_(client)#Block_Diagram
Any one core of these can be disabled independently, both in theory and in practice.
IO/Memory controllers, L3 cache and Infinity Fabric/Ring Bus are not part of the core.
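A rough way to see that split on a live system (Linux-only, assuming the usual sysfs topology and cache files): which logical CPUs share a core, and which share the L3 that sits outside the cores:

```python
from pathlib import Path

for cpu in sorted(Path("/sys/devices/system/cpu").glob("cpu[0-9]*")):
    siblings = cpu / "topology" / "thread_siblings_list"
    if siblings.exists():
        print(f"{cpu.name}: shares a core with logical CPUs {siblings.read_text().strip()}")
    l3 = cpu / "cache" / "index3" / "shared_cpu_list"   # index3 is typically the L3 slice
    if l3.exists():
        print(f"{cpu.name}: shares L3 with logical CPUs {l3.read_text().strip()}")
```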
In Bulldozer, there's actually four instruction fetchers: one for the core, one for integer cluster 1, one for integer cluster 2, and one for the floating point cluster. As far as software is concerned, it only sees the first: the one for the core. This is why Microsoft had a hell of a time trying to get Windows thread dispatching to work right. Windows had to be modified to more intelligently control threading so it wouldn't inadvertently move threads around in a manner that would overwhelm one core while leaving another idle. Older versions of Windows (I think it was 7) addressed this by making each thread (associated with an integer cluster) a "processor." This solved the problem of Windows shifting threads around but it created a problem of overwhelming the floating point cluster because it couldn't appropriately load balance integer and floating point!
Years passed and in Windows 10 (I think 8 too), Microsoft finally tackled the problem by making the Windows thread scheduler aware of physical processors and logical processors. Because of this, Windows can now appropriately delegate floating point (by physical processor) and integer (by logical processor). Why? Because it's a relevant problem for all SMT implementations. Ryzen, Core i#, and Pentium 4 w/ HT only have one integer cluster and one floating point cluster each, but they can accept two threads. The underlying hardware has to be managed in a similar fashion to how Bulldozer's is. The only difference is that tiny little detail of Bulldozer having two integer clusters instead of one.
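The placement policy being described boils down to "one thread per physical core first, SMT siblings only after that." A toy model of that idea, with a made-up 4-core/8-thread sibling map rather than anything probed from real hardware:

```python
# core -> its logical CPUs (SMT siblings); hypothetical 4-core/8-thread layout
siblings = {0: [0, 4], 1: [1, 5], 2: [2, 6], 3: [3, 7]}

def place(num_threads):
    """Fill one logical CPU per physical core first, then the SMT siblings."""
    first_pass = [cpus[0] for cpus in siblings.values()]
    second_pass = [cpus[1] for cpus in siblings.values()]
    return (first_pass + second_pass)[:num_threads]

print(place(4))  # [0, 1, 2, 3]       -> one thread per physical core
print(place(6))  # [0, 1, 2, 3, 4, 5] -> only then start doubling up on cores
```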
The FX-8350 has four cores, and each core has two integer clusters. They are not equivalent. A core knows how to do SIMD (single instruction, multiple data); an integer cluster does not. Integer clusters are fundamentally calculators, not processors. They lack the awareness (no knowledge of parallelism), logic (Boolean tests, branching, etc.), and access (they only have the data they are given) needed to be processors.
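As a quick aside on the SIMD point: the idea is one operation applied across a whole lane of data at once, rather than one value at a time. A hedged illustration (NumPy is only used to express the "multiple data" shape; whether it maps to real SIMD instructions underneath depends on the build):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([10.0, 20.0, 30.0, 40.0])

scalar = [float(x) + float(y) for x, y in zip(a, b)]  # four separate adds, one element at a time
vector = a + b                                        # one "instruction" across all four lanes
print(scalar)   # [11.0, 22.0, 33.0, 44.0]
print(vector)   # [11. 22. 33. 44.]
```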
As for the claim that Bulldozer doesn't function with intra-modular cores disabled... I guess you're unfamiliar with BIOS settings that can do that. I've run both the 8320E and 8370E with 4 cores via the 1 integer core per module setting.
Diverging from common design doesn't mean lawsuit. :rolleyes: Citation please. Everything I've read said FX CPUs' CMT is not SMT. I am also interested in how that BIOS setting works, considering that you are saying it's mislabeled by Gigabyte.
Microsoft's problem with scheduling was strange. The eventual fix was a change in how a Bulldozer CPU was handled. Initially Bulldozer was treated as a full 8-core processor, and the situation improved considerably when it started to be treated as a 4-core with SMT (it is noteworthy that Linux did the same much sooner). This inherently addressed both problems plaguing scheduling:
- Moving threads around to undesired cores. A simple example is the second core in a module whose first core is already loaded.
- For the same reason, any FPU-heavy load was now more likely to go to an unused module, largely negating the shared-FPU issue.
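You can approximate that "one FPU-heavy worker per module" behaviour by hand with CPU affinity. A sketch only (Linux; os.sched_setaffinity isn't available on Windows), assuming an 8-thread part with the Bulldozer-style pairing where logical CPUs (0,1), (2,3), (4,5), (6,7) each share a module and its FPU:

```python
import os
from multiprocessing import Process

def fpu_heavy(cpu):
    os.sched_setaffinity(0, {cpu})      # pin this worker to one logical CPU
    x = 0.0
    for i in range(5_000_000):
        x += (i * 0.5) ** 0.5           # floating-point busywork
    print(f"worker pinned to CPU {cpu} done ({x:.1f})")

if __name__ == "__main__":
    # one FPU-heavy worker per module: logical CPUs 0, 2, 4, 6
    workers = [Process(target=fpu_heavy, args=(cpu,)) for cpu in (0, 2, 4, 6)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
```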
2x 80-bit Lo
2x 64-bit Hi
2x 128-bit Mid
If it's a 256-bit op, it would execute Lo twice for 128-bit and Hi twice for 128-bit on a single port for each thread. Lo to Hi register moves are 1-cycle; Lo(Hi) to Lo(Hi) register moves are 0-cycle. So, if the second thread is dependent on the first thread, it could execute the first half on both, etc., etc., etc.
The FPU design is built with two cores in mind. The front-end is built with two cores in mind. The L2 cache and its interface are built with two cores in mind. There are physically only two cores in the Bulldozer module, as it is the world's first monolithic dual-core x86 architecture.
How the Windows scheduler acts with a design it wasn't made for is an interesting topic but hardly proof.
We are not living in the days when code was properly optimized. Today we are living in the days when "buy a better CPU," "buy more RAM," etc. is the normal answer...
If I code something and I know for sure that whoever is leading keeps changing their mind about what they want, then you can bet I will basically abuse FPU usage just so I don't have to go back and change things. You can say it's my fault; I will say it's the leader's fault, because they clearly have no clear goals in mind.
And trust me, the "insert bad words" leader who keeps changing their mind about what they want also wants things done fast, totally ignoring the fact that their instability makes things slower and less optimized.
There is not going to be a real difference between 4 or 8 heavy FPU threads run on the 8150. That is a reason to question whether the 8150 is an 8-core or a 4-core. Someone might not even say what type of threads they are pushing on the 8150, just run such a benchmark, and show that there is basically no real difference between 4 and 8 threads on the 8150. Do I have to say what type of threads I'm pushing on a CPU? I think not.
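A naive sketch of the kind of benchmark being described (process-based so Python's GIL doesn't skew the comparison): run the same total amount of floating-point busywork split across 4 and then 8 workers and compare the wall-clock times:

```python
import time
from multiprocessing import Pool

TOTAL_ITERS = 40_000_000

def fpu_chunk(n):
    x = 0.0
    for i in range(n):
        x += (i * 0.5) ** 0.5           # floating-point busywork
    return x

if __name__ == "__main__":
    for workers in (4, 8):
        chunk = TOTAL_ITERS // workers  # same total work, split differently
        start = time.perf_counter()
        with Pool(workers) as pool:
            pool.map(fpu_chunk, [chunk] * workers)
        print(f"{workers} workers: {time.perf_counter() - start:.2f} s")
```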
2) I'll repeat my question: Zen 2 also performs better with AVX-256 than Zen 1 because it can execute 256-bit ops natively instead of combining two 128-bit ones. Does that mean Zen 1 didn't have any real cores in it?
Zen 1 can't execute 256-bit AVX independently. It has to combine at the 128-bit level. So, it doesn't have any real cores, eh? Not only is it slower at doing 256-bit AVX, it can't do it independently.
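To make the "combining at the 128-bit level" point concrete, here is a toy illustration: a 256-bit-wide vector add (8 x float32) computed in one pass versus as two 128-bit (4 x float32) halves. The results are identical; what differs is how many passes the hardware needs:

```python
import numpy as np

a = np.arange(8, dtype=np.float32)
b = np.arange(8, dtype=np.float32) * 10

full = a + b                               # one 256-bit-wide (8 x float32) operation
halves = np.concatenate([a[:4] + b[:4],    # low 128-bit half
                         a[4:] + b[4:]])   # high 128-bit half
print(np.array_equal(full, halves))        # True
```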
Keep in mind that an added Integer Cluster does not necessarily mean a huge boost in execution resources. Bulldozer's Integer Clusters contain 4 pipes each (2 ALU, 2 AGU) and its FPU contains 3 pipes (2 FMAC + MMX). At the same time, Zen's Integer Cluster has 6 pipes (4 ALU, 2 AGU) and its FPU has 3 pipes (2 FMAC + MMX).