Friday, November 6th 2015
AMD Dragged to Court over Core Count on "Bulldozer"
This had to happen eventually. AMD has been dragged to court over misrepresentation of the CPU core count in its "Bulldozer" architecture. Tony Dickey, representing himself in the U.S. District Court for the Northern District of California, accused AMD of falsely advertising the core count of its CPUs, contending that because of the way they're physically structured, AMD's 8-core "Bulldozer" chips really only have four cores.
The lawsuit alleges that Bulldozer processors were designed by stripping away components from two cores and combining what was left to make a single "module." In doing so, however, the cores no longer work independently. Due to this, AMD Bulldozer cannot perform eight instructions simultaneously and independently as claimed, or the way a true 8-core CPU would. Dickey is suing for damages, including statutory and punitive damages, litigation expenses, pre- and post-judgment interest, as well as other injunctive and declaratory relief as is deemed reasonable.
Source: LegalNewsOnline
511 Comments on AMD Dragged to Court over Core Count on "Bulldozer"
Seagate did not countersue Microsoft for incorrectly labeling hard drive capacity (doing the math in GiB while showing a GB label). AMD could certainly try to sue Microsoft, but where Seagate had a strong case against Microsoft (and still does), AMD really doesn't. What AMD wants to call a "core," nobody else does. Microsoft would have to make an exception for Bulldozer, and how could it adequately explain to the public what is weird about Bulldozer in two words? It can't. AMD brought this on itself by not making clear to the public that the product is different, and it will have to pay the price for it.
There is no "minus" for a core. It either is a complete processor or it isn't.
Pretty much everything is shared except the integer clusters. We're talking about 20% of a CPU that isn't shared, and 20% does not a processor make. One core: two integer clusters, two threads.
This is an FX-8320 "Piledriver," 8c/8t.
Now compare an Ivy Bridge 4c/8t.
It looks like, based on what I've seen, each module was designed as one core with extra hardware for multithreading under most workloads. Somewhere along the way AMD decided to market each module as two cores. I think this was a mistake; Bulldozer would have looked like a much faster chip if the FX-8150 had been marketed as a 4c/8t part. Instead we got an 8c/8t chip where scheduling two threads onto the same module marginally hurt performance, compared to spreading them across modules before doubling up. A minimal affinity test along these lines is sketched below.
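(If you want to check that module-sharing penalty yourself, here's a minimal sketch under some stated assumptions: a Linux box where logical CPUs 0 and 1 share a module while CPUs 0 and 2 don't — verify your topology with lstopo or /proc/cpuinfo first — and an arbitrary iteration count standing in for a real workload.)

```c
/* Sketch: time two integer-bound threads pinned to the same module
 * vs. different modules. Assumes CPUs 0/1 share a module and 0/2
 * don't -- check your topology first. Build: gcc -O2 -pthread pin.c */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

static void *spin(void *arg)
{
    int cpu = *(int *)arg;
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);

    volatile uint64_t x = 1;                 /* integer-only busy work */
    for (uint64_t i = 0; i < 500000000ULL; i++)
        x = x * 6364136223846793005ULL + 1442695040888963407ULL;
    return NULL;
}

static double run_pair(int cpu_a, int cpu_b)
{
    struct timespec t0, t1;
    pthread_t a, b;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    pthread_create(&a, NULL, spin, &cpu_a);
    pthread_create(&b, NULL, spin, &cpu_b);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int main(void)
{
    printf("same module    (cpu 0+1): %.2fs\n", run_pair(0, 1));
    printf("different mods (cpu 0+2): %.2fs\n", run_pair(0, 2));
    return 0;
}
```

If the shared front end really is the bottleneck, the first pair should run measurably slower than the second even though the work is pure integer.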
I take it Dhrystone sees HTT and limits itself to 4 cores?
Second is the parallelism of the workload you feed the processor. Not everything can be parallelized indefinitely. Code today is not well designed to run on so many cores, and on Intel's own 8-core and 12-core processors you'll begin to see the exact same behaviour.
If you want to read some good text on this, written by Intel, you can purchase this one: www.computer.org/csdl/mags/so/2011/01/mso2011010023-abs.html
Without even getting into Hyper-Threading, beyond four cores the way we currently compile and program doesn't take full advantage of parallelism. The graphic is also from Intel, showing how they expect cores to behave; it's part of the article above.
It's funny, though, seeing you try to analyse data and interpret it to suit your point of view.
I'll show you yet another example, from a program I personally worked on: compression.ca/pbzip2/
Nothing in real life has an infinitely linear curve. There's always a limit to how far an application can be parallelized. Every processor, every piece of software, and every instruction shows that kind of behaviour.
It is a law, and it is calculable: research.cs.wisc.edu/multifacet/amdahl/
Get your science right.
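(For the skeptics, Amdahl's law fits in a dozen lines of C. The 90% parallel fraction below is an assumed figure for illustration, not a measurement of any real program.)

```c
/* Amdahl's law: with parallel fraction p, the best-case speedup on
 * n cores is S(n) = 1 / ((1 - p) + p / n). */
#include <stdio.h>

int main(void)
{
    const double p = 0.90;    /* assumed parallelizable fraction */
    for (int n = 1; n <= 16; n *= 2)
        printf("%2d cores: %.2fx speedup\n",
               n, 1.0 / ((1.0 - p) + p / (double)n));
    return 0;
}
```

With p = 0.90 the curve flattens fast (about 1.8x at 2 cores, 4.7x at 8) and can never pass 10x no matter how many cores you add; that's exactly the behaviour in the graphs above.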
EDIT: You have remained blind to all the proof presented to you so far. Don't say there's no proof; everything converges as proof. If you had an ounce of honesty I could prove you are wrong. I ran the test myself some time ago, when I worked on the ondemand governor, which makes the core clock fluctuate depending on the workload. I tested to be 100% sure that if one core is at 1600MHz and the other is at 3900MHz, a virtual machine bound to one of the module's cores won't be affected by the other running at the lower speed. Both the integer cluster and the FPU are slower on the 1600MHz core, and both are faster on the 3900MHz core. Both work independently, until I throw an AVX instruction at it; then the entire FPU clocks at the speed of the core asking for the unification, until the instruction is done. You can even try it yourself with QEMU/KVM, setting core affinity on the VMs: one on the first module core, the other on the second. Very easy to replicate; a minimal building block for that setup is sketched below. When I don't use AVX, I have two cores in a module 100% of the time. But you keep putting your head in the sand, playing blind.
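(A minimal building block if you want to replicate that without the whole QEMU/KVM setup: pin a process to one logical CPU, then sample that CPU's frequency from the standard Linux cpufreq sysfs node. The governor configuration and the VM pinning itself are outside this sketch.)

```c
/* Pin this process to the CPU given in argv[1], then report that
 * CPU's current frequency from sysfs. Run one instance per module
 * core while a workload runs to watch the clocks move independently. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int cpu = (argc > 1) ? atoi(argv[1]) : 0;

    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return 1;
    }

    char path[128];
    snprintf(path, sizeof(path),
             "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_cur_freq", cpu);
    FILE *f = fopen(path, "r");
    if (!f) { perror(path); return 1; }

    long khz = 0;
    if (fscanf(f, "%ld", &khz) == 1)
        printf("pinned to cpu%d, current frequency: %ld MHz\n",
               cpu, khz / 1000);
    fclose(f);
    return 0;
}
```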
EDIT 2: I am also surprised at how inaccurate your points of view are. It's actually the opposite: AMD has a semi-SMT, because it can't run another thread on a core within a module. That is one of the bottlenecks of the cores in a module, along with the fact that it doesn't order instructions well. Hyper-Threading is a much, much better SMT implementation, and it takes a lot more space on the die. The complete opposite of your claim. As for what you seem to be saying, you claim it's transparent to the OS when it's not, at all. 95% of thread management is done by software: the kernel and the threading libraries. The processor only dispatches threads to the right core/module; it doesn't decide which core takes which thread, the kernel does. What the processor decides is what to do with the thread. Intel's SMT is also better because it doesn't have two cores to supply; the hyperthreaded one doesn't have to be on time and constantly fed the way the AMD Bulldozer does.
Microsoft is very, very bad at handling threads, unlike Linux, because Linux was used on SPARC and other servers with, say, 8 chips, 16 cores each, 128 threads. Before Hyper-Threading, the Windows kernel never had a proper threading library. It's normal for a SPARC to have so many threads, as it's RISC, not CISC. In Linux they just have to modify a few points in the kernel to make it recognize the module as a core with multiple threads. It doesn't change anything, except that it will address threads the way it should. If big SMT were detrimental to the design, you can be sure a SPARC or Alpha processor would bottleneck like there's no tomorrow. But it doesn't. Pretty much everything composing your logic is the opposite of what is established in computer engineering.
When people complain about Bulldozer, what is the #1 complaint? I'll give you a hint: it's not multi-core performance. The biggest complaint is single-threaded performance, so even without another task trying to use the FPU, performance still sucks, and that isn't because BD doesn't have "real cores." The fact that the FPU is shared is beside the point, but you seem incredibly intent on making it an upfront issue. The simple fact is that BD's performance blows because the number of uOps BD can execute at any given time was seriously reduced compared to K10. Since dispatch width per core is significantly reduced, instructions that might have taken 3 or 4 clock cycles before might now take 5 or 6, even for integer operations, because AMD slimmed down the whole core; they didn't just share parts like the FPU and the dispatch/decode hardware. Beefing up each core instead would have taken more die space and reduced how many cores you could cram into a given size. AMD's mistake was that the multi-core gains didn't make up for the loss in single-threaded performance. Pair that with poor cache hit rates and pipeline stalls from a very long pipeline, and you have a recipe for disaster.
People need to stop reducing this problem to something as simple as "it doesn't have real cores," because Bulldozer's problems are much greater and more numerous than a shared FPU, yet that's all anyone seems to focus on. Honestly, if you need so much floating-point bandwidth that a single SIMD unit is too slow, you should be using something optimized for massively parallel SIMD work, like a GPU.
Let's say for a minute that Bulldozer didn't have the second integer core, okay? Would you still be pissed off that performance is crap because the FPU has half the floating-point capability of both K10 and SB-and-later Intel CPUs? The FPU on K10 and SB+ can literally do twice as much because it's twice as wide as Bulldozer's.
So if you want to get pissed off about something, get pissed off about that, because a second integer core doesn't change the fact that the FPU is already seriously under-powered even where it isn't shared, and that will continue to plague AMD if they don't change it in Zen.

You mean how you can still buy a 1TB drive and find that 92.7GB is "missing," because people don't realize that HDD manufacturers state SI-prefixed bytes, not binary-prefixed bytes?
The class wanted a 7% refund on the drives they bought, which, as you can see below, nearly matches the decimal/binary difference at the gigabyte prefix. Also notice how the difference increases as hard drives get larger and use larger prefixes.
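(The decimal-vs-binary gap at each prefix is easy to recompute; a quick C sketch, link with -lm:)

```c
/* Decimal (what the drive label means) vs. binary (what the OS
 * divides by) at each prefix: the gap grows with drive size. */
#include <math.h>
#include <stdio.h>

int main(void)
{
    const char *prefix[] = { "KB", "MB", "GB", "TB" };
    for (int k = 1; k <= 4; k++) {
        double dec = pow(1000.0, k);    /* SI: 10^(3k) bytes     */
        double bin = pow(1024.0, k);    /* binary: 2^(10k) bytes */
        printf("%s: OS reports %.1f%% less than the label\n",
               prefix[k - 1], (1.0 - dec / bin) * 100.0);
    }
    return 0;
}
```

At the gigabyte prefix the gap is about 6.9%, which is where the class's 7% figure comes from; it climbs to roughly 9% at terabytes.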
Again: who do you think is sitting in the jury box? Ignorant consumers! This is why I'm not surprised AMD is getting sued over core counts.
The argument falls apart when you consider what would have happened if AMD had doubled the width of the single FPU per module (rather than adding a second one) and the impact that would have had on floating-point performance; I'm willing to bet it would instantly make up the difference. But that still doesn't fix the integer cores, which is where a lot of performance is lost. Once again, the class action makes it sound like Bulldozer sucks because it has a shared FPU, when it's really because it has gimped FPUs. Sharing it was smart; slimming it down was not. A similarly clocked Intel quad-core has double the floating-point performance of an "8-core" BD chip. It also happens to be the case (as I said before) that the FPU per module is half the width of the FPU on K10 and on SB through at least Haswell. If BD had FPUs twice as wide, they would still be shared, but given the clocks BD runs at, you'd make up some of that difference, and floating-point performance would line up more with a 6-core Intel CPU instead of landing somewhere between a dual-core and a quad-core Intel chip at the same clock.
Simply put, you could still have an FPU on every core, but if each were half as wide as the module FPU is now, you'd still be stuck with the same crappy performance, because your ability to dispatch hasn't improved. For any streaming SIMD task on floating-point data, the wider FPU at a given clock speed will always beat the narrower one: half the width means twice as many cycles to do the same work, and fewer cycles to complete a task means better IPC. So despite having twice as many FPUs, the reduced width of each unit hurts overall throughput. The toy model below makes the arithmetic concrete.
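(A toy model of that claim: the time to stream N floats through one SIMD pipe of a given width. The clock speeds and the one-op-per-cycle throughput are illustrative placeholders, not measured Bulldozer or Sandy Bridge numbers.)

```c
/* Time to push n single-precision floats through one SIMD pipe,
 * assuming one op per cycle: wider pipes beat faster clocks. */
#include <stdio.h>

static double seconds(double n_floats, int width_bits, double ghz)
{
    double lanes = width_bits / 32.0;        /* floats per cycle */
    return n_floats / lanes / (ghz * 1e9);   /* cycles / rate    */
}

int main(void)
{
    const double n = 1e10;                   /* assumed workload */
    printf("128-bit pipe @ 4.0 GHz: %.2fs\n", seconds(n, 128, 4.0));
    printf("256-bit pipe @ 3.5 GHz: %.2fs\n", seconds(n, 256, 3.5));
    return 0;
}
```

Even with a 500MHz clock advantage, the half-width pipe takes about 75% longer; width dominates clocks here, which is the whole point.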
tl;dr: Doubling the width of the already-shared FPU would have the same performance characteristics as doubling the number of FPUs at the current width, which is reason enough to reject the "it's not 8 cores" claim based strictly on the FPU itself. Simply put, caveat emptor.
Except it sounds exactly like what people are going to expect. Once again, people keep equating bad performance with "not really being cores."
I'm pretty sure you need to read #374 again and not just the beginning about Seagate.