Friday, November 6th 2015

AMD Dragged to Court over Core Count on "Bulldozer"

This had to happen eventually. AMD has been dragged to court over misrepresentation of its CPU core count in its "Bulldozer" architecture. Tony Dickey, representing himself in the U.S. District Court for the Northern District of California, accused AMD of falsely advertising the core count in its latest CPUs, and contended that because of the way they're physically structured, AMD's 8-core "Bulldozer" chips really only have four cores.

The lawsuit alleges that Bulldozer processors were designed by stripping away components from two cores and combining what was left to make a single "module." In doing so, however, the cores no longer work independently. Due to this, AMD Bulldozer cannot perform eight instructions simultaneously and independently as claimed, or the way a true 8-core CPU would. Dickey is suing for damages, including statutory and punitive damages, litigation expenses, pre- and post-judgment interest, as well as other injunctive and declaratory relief as is deemed reasonable.
Source: LegalNewsOnline

511 Comments on AMD Dragged to Court over Core Count on "Bulldozer"

#401
Aquinus
Resident Wat-man
FordGT90ConceptEven though the FPU is separate in SPARC, it behaves like an internal coprocessor.
Hint: It acts like an internal co-processor when it's dedicated per core as well. There is a very fine line where the FPU starts and ends and isn't fully coupled into the integer core like you claim. Yes, it does allow the result generated by the FPU to flow back to the integer core but, that's usually so the AGU can figure out where to put it in memory after the calculation is complete.

A FPU can not function as a processor of any kind by itself. Integer math is a requirement for any modern day machine used personally or in servers. Even GPUs which are designed to do massively parallel floating point computations must have the ability to do integer math because floating point means nothing without it. Is it really so hard to comprehend that a CPU can exist without a FPU but a CPU can't exist without integer logic?

Also, IBM's POWER7 has four DP FPUs per core and can do SMT with up to 4 threads per core. The dedicated FPUs didn't make it a core but, the singular pairs of ALUs and AGUs did. How is that any different from the reverse case? If I recall correctly, multi-core POWER CPUs have shared instruction decode logic that gets put onto queues for each core. So not only does it have dedicated FPUs contained within a single "core", it has shared logic for all of the cores to dispatch instructions. By your logic, the POWER7 is a one core CPU because it shares resources between all of the cores but, could be 4 times as many cores because of the number of FPUs.

Either way, even if BD had more FPUs or a beefier FPU, I think people would have still called foul on the terrible integer performance which begins with single-threaded applications running alone. AMD hoped that more cores was going to offset the degradation of IPC but, they were wrong. Haswell's integer core has twice as many ALUs as BD and one more AGU. That alone should tell you something.

Simple fact is that AMD told the public that Bulldozer was going to have a 256-bit FMA FPU per module. There was no deception. The problem is that most people don't know what the hell that means. People also probably don't know that their Intel CPU probably has dual dispatch 256-bit FPUs per integer core. Different CPUs with different goals. That's it.
Posted on Reply
#402
FordGT90Concept
"I go fast!1!11!1!"
cdawallIf it was established why can't you find a definition?
I already gave one from Webopedia. You can look at most architectures and see it matches Webopedia's definition. Example UltraSPARC T2 (UltraSPARC T1 had the FPU connected to the crossbar):
AquinusHint: It acts like an internal co-processor when it's dedicated per core as well.
The FPU is like x87 where it is connected to a system bus (crossbar in UltraSPARC T1). It's a discrete processor that handles its own instructions with its own caches. It shares nothing with any core. In Bulldozer, one instruction decoder handles three components (FPU + two integer clusters). No processor exists before or since with that kind of layout.
AquinusIs it really so hard to comprehend that a CPU can exist without a FPU but a CPU can't exist without integer logic?
I never said it couldn't but in recent history, every time it was done, it was considered an error in hindsight. Examples: UltraSPARC T1 had one FPU to 8 cores; UltraSPARC T2 moved the FPU into the 8 cores so there's a total of 8. Bulldozer and sons had one FPU per two integer clusters; Zen is moving to one FPU per core. Gimping the FPU is a great way to lose processor sales to the competition. So technically it can be done but in application, it's foolish.
AquinusAlso, IBM's POWER7 has four DP FPUs per core and can do SMT with up to 4 threads per core. The dedicated FPUs didn't make it a core but, the singular pairs of ALUs and AGUs did. How is that any different from the reverse case? If I recall correctly, multi-core POWER CPUs have shared instruction decode logic that gets put onto queues for each core. So not only does it have dedicated FPUs, it has shared logic for all of the cores. By your logic, the POWER7 is a one core CPU because it shares resources between all of the cores.
Oh look, it's all packed into each core like expected:

Seriously, stop thinking so hard. It is very simple.
Posted on Reply
#403
cdawall
where the hell are my stars
FordGT90ConceptI already gave one from Webopedia. You can look at most architectures and see it matches Webopedia's definition. Example UltraSPARC T2 (UltraSPARC T1 had the FPU connected to the crossbar):
So where in that image does it say every core has to be set up in this exact configuration to qualify as a core? That isn't even an x86-64 CPU so design on that end alone would allow differences.
FordGT90ConceptThe FPU is like x87 where it is connected to a system bus (crossbar in UltraSPARC T1). It's a discrete processor that handles its own instructions with its own caches. It shares nothing with any core. In Bulldozer, one instruction decoder handles three components (FPU + two integer clusters). No processor exists before or since with that kind of layout.
It took 3 generations of CPUs for Intel to implement HT again after the fiasco that was NetBurst. Remember that before playing the "never existed" card.
Posted on Reply
#404
FordGT90Concept
"I go fast!1!11!1!"
cdawallSo where in that image does it say every core has to be set up in this exact configuration to qualify as a core? That isn't even an x86-64 CPU so design on that end alone would allow differences.
Each core is fully autonomous. That is the defining feature of a core. Nothing is shared. Bulldozer shares a lot, UltraSPARC T1 shares nothing (has to leave the core to reach it making it a coprocessor).
cdawallIt took 3 generations of CPUs for Intel to implement HT again after the fiasco that was NetBurst. Remember that before playing the "never existed" card.
They're separate lineages:
Long pipelines: Pentium 4 --USA -> Core I#
Short pipelines: Pentium M --Israel-> Core/Core 2 (I think it lives on today as Atom)

HTT was never technically gone--they just weren't launching new processors of its design because Netburst was a clusterfuck that took years to clean up. That said, I really don't get your line of thought with this comment.
Posted on Reply
#405
Aquinus
Resident Wat-man
FordGT90ConceptSeriously, stop thinking so hard. It is very simple.
Take your own advice. A core is something that can (by itself) execute instructions independently.
FordGT90ConceptOh look, it's all packed into each core like expected:

Seriously, stop thinking so hard. It is very simple.
You do realize that each one of those POWER7 cores has the same integer hardware as Bulldozer's integer core and even has shared dispatch hardware not shown on that diagram which is only describing the memory hierarchy.
Posted on Reply
#406
FordGT90Concept
"I go fast!1!11!1!"
AquinusTake your own advice. A core is something that can (by itself) execute instructions independently.
Except that the integer cluster gets instructions decoded by separate hardware that it does not possess. It is dependent on the hardware around it--completely useless without it.
AquinusYou do realize that each one of those POWER7 cores has the same integer hardware as Bulldozer's integer core and even has shared dispatch hardware not shown on that diagram which is only describing the memory hierarchy.
I can't find anything to support this claim. All I could find is POWER8 which does have "predecode" but look further down the pipeline and each core still has a dedicated decoder:

It almost appears that it has at least two ALUs and two FPUs. And why not? With 8 threads in the core, it can certainly keep them busy. I got no problem with multiple integer clusters and floating point clusters inside a core. The point is, each one does not constitute a core--the whole of it does. Instruction to result, it never leaves the core. The same should be said of Bulldozer's "module."
Posted on Reply
#407
BiggieShady
FordGT90ConceptI can't find anything to support this claim.

It looks to me like the instruction dispatcher is shared between 4 fixed point units, and it's all inside the core boundary ... and since it's already shared, isn't what really matters how wide it is - how many instructions per clock it can dispatch ... how is this different from having a single double-wide dispatcher outside the core boundaries shared between two cores?
The answer is, it doesn't matter. This POWER7 core could be split into 2 weaker cores that would be less superscalar on their own; each would need more cycles for wider instructions, but they would be truly two independent, weaker cores.
Posted on Reply
#408
FordGT90Concept
"I go fast!1!11!1!"
Like POWER8, it appears to be a complete processor with lots of extra hardware to increase throughput. "Core boundary" is right.

I do see the similarities between that and Bulldozer yet IBM calls it what it is: a core. AMD does not. Like I said, all data points to AMD lying to make the processors look better next to Intel.

To be very clear: I have no issue with Bulldozer's design. I have an issue with AMD doubling the "core" count.
Posted on Reply
#409
BiggieShady
FordGT90ConceptTo be very clear: I have no issue with Bulldozer's design. I have an issue with AMD doubling the "core" count.
It is clear: you have an issue with code made of pure AVX 256-bit instructions not scaling beyond 4 threads, but you are completely fine with bad cache hits and a gimped uop scheduler. IMO it should be the other way round.
Posted on Reply
#410
FordGT90Concept
"I go fast!1!11!1!"
Look at the FX-8350 from the perspective of being a quad-core. AVX 256-bit becomes a non-issue.

Single-threaded performance is peripheral to the lawsuit. Yeah, it isn't the best but there's really nothing misleading about that part. AMD struggled in that department since Intel has prioritized it.
BiggieShadyIt looks to me like the instruction dispatcher is shared between 4 fixed point units, and it's all inside the core boundary ... and since it's already shared, isn't what really matters how wide it is - how many instructions per clock it can dispatch ... how is this different from having a single double-wide dispatcher outside the core boundaries shared between two cores?
Because the whole of it is one core--not a component inside. If IBM called those two "Fixed Point Units" "cores," I'd be as up in arms over that as I am over Bulldozer. But they didn't because sense. If only AMD had sense.
Posted on Reply
#411
Aquinus
Resident Wat-man
FordGT90ConceptAVX 256-bit becomes a non-issue.
AVX 256-bit is already a non-issue because hardly any software relies on 256-bit wide packed floating point math.
FordGT90ConceptIf IBM called those two "Fixed Point Units" "cores," I'd be as up in arms over that as I am over Bulldozer.
The other name for those "fixed point units" is ALUs. Remember when I said POWER7 has the same integer hardware as a single BD core? That's two ALUs and two AGUs.
Posted on Reply
#412
FordGT90Concept
"I go fast!1!11!1!"
AquinusThe other name for those "fixed point units" is ALUs. Remember when I said POWER7 has the same integer hardware as a single BD core? That's two ALUs and two AGUs.
Yet, nothing is shared with a neighboring "core."

Zen is going to have 4 ALUs and 2 AGUs. Does that redefine what a core is? Nope, it just increases the amount of parallelism the processor is capable of. Adding a second integer cluster does the same damn thing (not a "core").
Posted on Reply
#413
Aquinus
Resident Wat-man
FordGT90ConceptYet, nothing is shared with a neighboring "core."

Zen is going to have 4 ALUs and 2 AGUs. Does that redefine what a core is? Nope, it just increases the amount of parallelism the processor is capable of. Adding a second integer cluster does the same damn thing (not a "core").
I see the same gimped FMA FPU though. Weren't you complaining about FP throughput?
Posted on Reply
#414
BiggieShady
What would you say if Zen was presented as a 2 cores per module cpu like this? :laugh:
Posted on Reply
#415
cdawall
where the hell are my stars
FordGT90ConceptEach core is fully autonomous. That is the defining feature of a core. Nothing is shared. Bulldozer shares a lot, UltraSPARC T1 shares nothing (has to leave the core to reach it making it a coprocessor).
So by that logic sharing an L2 is not a core.
FordGT90ConceptThey're separate lineages:
Long pipelines: Pentium 4 --USA -> Core I#
Short pipelines: Pentium M --Israel-> Core/Core 2 (I think it lives on today as Atom)

HTT was never technically gone--they just weren't launching new processors of its design because Netburst was a clusterfuck that took years to clean up. That said, I really don't get your line of thought with this comment.
Simple: HT showed a performance degradation in a lot of scenarios back when it first came out. Software and hardware evolved and now SMT is the status quo. So the idea that sharing resources and an FPU is the devil and "isn't a real core" might be an issue right now, but this shit will come back. These chips were meant for an HPC cluster and performed better than Intel's offerings at the time and they did so for a reason. As you said yourself size wise the modules look more like a traditional core than what the cores do, yet in a massively multithreaded, non-biased environment you were seeing scaling near 100% per core. Something Intel hasn't been able to emulate until Haswell was released.
Posted on Reply
#416
FordGT90Concept
"I go fast!1!11!1!"
AquinusI see the same gimped FMA FPU though. Weren't you complaining about FP throughput?
There is one per core. It is not gimped because it is not shared. 8 cores = 8 FPUs. In Bulldozer, not only were there 4 FPUs, but each one was only adequate for one core.
BiggieShadyWhat would you say if Zen was presented as a 2 cores per module cpu like this? :laugh:
If they called the combined object a "module" and not a "core," throw Zen into the lawsuit.
cdawallSo by that logic sharing an L2 is not a core.
L2 has always been optional. The same goes with L3 and L4 (eDRAM). They only exist to speed up memory latency. They are not critical to the function of a core. That said, L1 -> system memory would be painfully slow.
cdawallSimple: HT showed a performance degradation in a lot of scenarios back when it first came out. Software and hardware evolved and now SMT is the status quo. So the idea that sharing resources and an FPU is the devil and "isn't a real core" might be an issue right now, but this shit will come back. These chips were meant for an HPC cluster and performed better than Intel's offerings at the time and they did so for a reason. As you said yourself size wise the modules look more like a traditional core than what the cores do, yet in a massively multithreaded, non-biased environment you were seeing scaling near 100% per core. Something Intel hasn't been able to emulate until Haswell was released.
Pentium 4 didn't originally come with HTT. Intel saw all of the cache misses with Pentium 4 and thought a solution to minimize performance loss when that happens is to give it a second thread to work on while the first thread was retrieving data. This was when most software was coded for a single processor. It was also something added in hindsight--not a very good implementation. When they went to design Nehalem, they started designing the architecture from the perspective of having HTT. That's why its implementation was much better.

Remember that Bulldozer was AMD's first attempt at simultaneous multithreading. First try was pretty bad (Bulldozer) and they improved it with each iteration but they couldn't fundamentally fix the blocking problems and poor single-threaded performance. Zen throws out Bulldozer's ideas and replaces them with HTT-like simultaneous multithreading. I'm not expecting AMD's Zen SMT performance to match HTT because Intel has a lot of practice. At least it is a step in the right direction.

8 Intel cores is going to beat 8 Bulldozer "cores." Intel is going to charge you a lot more for the privilege though.

Diagrams above showed 75% gain at best, 25% at worst, not "near 100%" (that would be a real dual core, not a hybrid like Bulldozer is). AMD sacrificed single-threaded performance for that though where Intel did not for 0-50% gain.
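The percentages being argued over here come down to simple arithmetic. Below is a minimal Python sketch, assuming "gain" means the extra throughput obtained by running a second thread on the same module (or core) relative to one thread alone. The function name and the scores are hypothetical placeholders, not figures from any benchmark in this thread.

```python
def second_thread_gain_percent(one_thread_score, two_thread_score):
    """Extra throughput (in percent) from adding a second thread.

    Both arguments are throughput scores (higher is better) measured on
    the same module or core. 100.0 would mean perfect dual-core-like
    scaling; 0.0 would mean the second thread added nothing.
    """
    if one_thread_score <= 0:
        raise ValueError("one_thread_score must be positive")
    return (two_thread_score / one_thread_score - 1.0) * 100.0


# Hypothetical scores, chosen only to illustrate the ranges quoted above.
best_case = second_thread_gain_percent(100, 175)   # the "75% at best" case
worst_case = second_thread_gain_percent(100, 125)  # the "25% at worst" case
ideal = second_thread_gain_percent(100, 200)       # a fully independent second core

print(best_case, worst_case, ideal)
```

Fed with real before and after scores from the same workload, the same calculation applies to either side's numbers, whether the second thread lands on a second integer cluster or on an SMT sibling.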
Posted on Reply
#417
cdawall
where the hell are my stars
FordGT90ConceptL2 has always been optional. The same goes with L3 and L4 (eDRAM). They only exist to speed up memory latency. They are not critical to the function of a core. That said, L1 -> system memory would be painfully slow.
FPU is optional as well. Hence the lack of its existence, obviously.
FordGT90ConceptPentium 4 didn't originally come with HTT. Intel saw all of the cache misses with Pentium 4 and thought a solution to minimize performance loss when that happens is to give it a second thread to work on while the first thread was retrieving data. This was when most software was coded for a single processor. It was also something added in hindsight--not a very good implementation. When they went to design Nehalem, they started designing the architecture from the perspective of having HTT. That's why its implementation was much better.
Bad argument, my point stands, Intel released a hunk of shit. Took something that worked in theory and applied it to a later CPU. There is no reason why we won't see the module ideology expand and continue. The design was ahead of its time and not targeted at peasant workloads. It is and always will be an HPC chip.
FordGT90ConceptRemember that Bulldozer was AMD's first attempt at simultaneous multithreading. First try was pretty bad (Bulldozer) and they improved it with each iteration but they couldn't fundamentally fix the blocking problems and poor single-threaded performance. Zen throws out Bulldozer's ideas and replaces them with HTT-like simultaneous multithreading. I'm not expecting AMD's Zen SMT performance to match HTT because Intel has a lot of practice. At least it is a step in the right direction.
Technically Bulldozer could handle 2 threads per core or 4 per module on top of the whole two core idea, so where in the Windows task manager did that fall?
FordGT90Concept8 Intel cores is going to beat 8 Bulldozer "cores." Intel is going to charge you a lot more for the privilege though.
Which generation? Massively multithreaded environments outside of Windows tell a tale...
FordGT90ConceptDiagrams above showed 75% gain at best, 25% at worst, not "near 100%" (that would be a real dual core, not a hybrid like Bulldozer is). AMD sacrificed single-threaded performance for that though where Intel did not for 0-50% gain.
Cool I can make diagrams where it shows nearly 100% scaling depending hugely on the OS it sits inside of. Even using your numbers what scaling does HT show? It sure isn't 75%. Another proof that these are "real" cores.
Posted on Reply
#418
Frick
Fishfaced Nincompoop
One day I'll read this thread and dole out thanks whenever I learn something. Should be good. :D
Posted on Reply
#420
FordGT90Concept
"I go fast!1!11!1!"
cdawallFPU is optional as well. Hence the lack of its existence, obviously.
In theory, not in practice.
cdawallBad argument, my point stands, Intel released a hunk of shit. Took something that worked in theory and applied it to a later CPU. There is no reason why we won't see the module ideology expand and continue. The design was ahead of its time and not targeted at peasant workloads. It is and always will be an HPC chip.
They are wide cores. This lawsuit will likely force AMD to call them cores too.
cdawallTechnically Bulldozer could handle 2 threads per core or 4 per module on top of the whole two core idea, so where in the Windows task manager did that fall?
It would still be a 4-threaded core. A lot of enterprise RISC processors already handle 8-threads per core (many FPUs and ALUs in each) so that isn't exactly new.
cdawallWhich generation? Massively multithreaded environments outside of Windows tell a tale...
Sandy Bridge/Ivy Bridge, which were out about the same time as Bulldozer.
cdawallCool I can make diagrams where it shows nearly 100% scaling depending hugely on the OS it sits inside of. Even using your numbers what scaling does HT show? It sure isn't 75%. Another proof that these are "real" cores.
Go ahead and run your benchmarks then. I'm waiting. Here's the post, by the way. Spoiler: it will never reach 95%+ that an actual dual core would.
Prima.VeraStill haven't got my answer, if you can oc each of the 8 cores independently?
FordGT90Concept@MalakiLab claims it is possible to change the clockspeeds on the integer clusters which begs the question what speed is the FPU, instruction decoder, and so on running at? Also note in the picture how Linux calls the FX-6350 a tri-core.
Posted on Reply
#422
FordGT90Concept
"I go fast!1!11!1!"
Power management circuits can be added pretty much anywhere in a processor to shut parts of it off. It only proves that Bulldozer has those circuits.
Posted on Reply
#423
Aquinus
Resident Wat-man
FordGT90ConceptIn theory, not in practice.
Theory is having only a FPU and no integer cores. Every x86 CPU since its inception to date has had an integer pipeline. Every single one. Whereas not every one has had an integrated FPU. In modern times it happens to be the case that the benefit of having a FPU is enough to include it all the time but, there is absolutely nothing to suggest that the FPU is required for the definition of a core because it used to be done. It was done once before, it can be done again. Once again, as the guy from AMD said in an interview, 90% of the work CPUs handle is integer in nature (and my work as a software engineer aligns with this statement.) It only makes sense to beef out a CPU to accommodate that kind of workload if die space is at a premium.
FordGT90ConceptThey are wide cores. This lawsuit will likely force AMD to call them cores too.
2 ALUs and 2 AGUs makes them skinny cores just as the single issue 256-bit FMA FPU (which can be split into dual issue 128-bit,) is a skinny FPU. They're also independent ALUs and AGUs which can receive their own instructions which feels a whole lot like a core. They have their own registers, its own control lines, and even its own instruction cache. Even the way that they scale feels, smells, and tastes like cores and not SMT. They're also not wide cores if you're comparing the integer pipeline against Haswell's 4 ALUs and 3 AGUs or the FPU against Intel's double wide FPU that can quad-issue 128-bit ops and dual issue 256-bit AVX.
FordGT90ConceptIt would still be a 4-threaded core. A lot of enterprise RISC processors already handle 8-threads per core (many FPUs and ALUs in each) so that isn't exactly new.
So now we're letting Microsoft define a core? Are you ever going to make up your mind or are you going to keep changing it to suit your argument?
FordGT90ConceptGo ahead and run your benchmarks then. I'm waiting. Here's the post, by the way. Spoiler: it will never reach 95%+ that an actual dual core would.
Spoiler: Most multi-threaded workloads that aren't purely parallel in nature will never have 100% speed up indefinitely. More cores means more overhead.
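One common way to put numbers on this is Amdahl's law, which caps the speedup of a workload by the fraction of it that cannot run in parallel. The sketch below is only an illustration of that textbook formula, not something taken from the benchmarks in this thread; the parallel fractions used are hypothetical, and the model ignores synchronization and scheduling overhead, which would lower the figures further.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Ideal speedup on `cores` cores under Amdahl's law.

    parallel_fraction is the share of the work (0.0 to 1.0) that can be
    spread across cores; the rest stays serial no matter how many cores
    are added.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Hypothetical workloads: fully parallel, 90% parallel, 50% parallel.
for fraction in (1.0, 0.9, 0.5):
    for cores in (2, 4, 8):
        speedup = amdahl_speedup(fraction, cores)
        efficiency = speedup / cores * 100.0
        print(f"p={fraction:.1f} cores={cores}: "
              f"speedup {speedup:.2f}x, efficiency {efficiency:.0f}%")
```

Only the fully parallel case keeps 100% efficiency as cores are added; anything with a serial portion falls short regardless of whether the cores share hardware.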
Posted on Reply
#424
FordGT90Concept
"I go fast!1!11!1!"
AquinusOnce again, as the guy from AMD said in an interview, 90% of the work CPUs handle is integer in nature (and my work as a software engineer aligns with this statement.)
He also said blocking was possible. Cores never block other cores ergo not a dual core.
Aquinus2 ALUs and 2 AGUs makes them skinny cores just as the single issue 256-bit FMA FPU (which can be split into dual issue 128-bit,) is a skinny FPU. They're also independent ALUs and AGUs which can receive their own instructions which feels a whole lot like a core. They have their own registers, its own control lines, and even its own instruction cache. Even the way that they scale feels, smells, and tastes like cores and not SMT. They're also not wide cores if you're comparing the integer pipeline against Haswell's 4 ALUs and 3 AGUs or the FPU against Intel's double wide FPU that can quad-issue 128-bit ops and dual issue 256-bit AVX.
Except that those "cores" don't understand x86 instructions. They understand opcodes given to them by the instruction decoder and fetcher. On the other hand, a real core (even the POWER7 and POWER8 behemoths) has the hardware to interpret instruction to a result without leaving the core. So either AMD's definition is wrong or Intel, IBM, ARM Holdings, and Sun are wrong. Considering IBM produces chips that are nearly identical to Bulldozer with four integer clusters and they don't call that a quad-core, I'd say AMD is definitively wrong.
AquinusSo now we're letting Microsoft define a core? Are you ever going to make up your mind or are you going to keep changing it to suit your argument?
All modern operating systems call FX-8350 a quad-core with 8 logical processors, not just Windows. When *nix has to work on POWER7 and Bulldozer, are they really going to use AMD's marketing terms to describe what is actually there? I'd hope not.
AquinusSpoiler: Most multi-threaded workloads that aren't purely parallel in nature will never have 100% speed up indefinitely. More cores means more overhead.
Asynchronous multithreading is always capable of loading systems to 100% so long as it can spawn enough threads and those threads are sufficiently heavy. Overhead is only encountered at the start in the main thread and at the end of the worker thread (well under 1% of compute time).
Posted on Reply
#425
cdawall
where the hell are my stars
FordGT90ConceptConsidering IBM produces chips that are nearly identical to Bulldozer with four integer clusters and they don't call that a quad-core, I'd say AMD is definitively wrong.
Not to nitpick, but isn't this the exact opposite of what you said earlier? I thought AMD was the only CPU to ever attempt this...
FordGT90ConceptThey are wide cores. This lawsuit will likely force AMD to call them cores too.
Doubtful. AMD can create words to describe things just as well as the next guy. If AMD can't call what they consider a module a module, I guess Intel will have to ditch HyperThreading in favor of SMT. That is literally what you are saying needs to happen.
FordGT90ConceptIt would still be a 4-threaded core. A lot of enterprise RISC processors already handle 8-threads per core (many FPUs and ALUs in each) so that isn't exactly new.
Difference is those only have ONE integer and ONE FPU, not TWO and ONE.
FordGT90ConceptGo ahead and run your benchmarks then. I'm waiting. Here's the post, by the way. Spoiler: it will never reach 95%+ that an actual dual core would.
I was very specific with the workloads that would show near 100% scaling, I would wager you cannot prove me wrong, but after reading your argument you find one useless benchmark (not real world scenario) that only uses the FPU for calculations and claim I am incorrect. As has been said a multitude of times the FPU isn't used for the majority of calculations. The real issue behind AMD isn't the configuration of the modules, it is the shit design of the internal cores themselves. The module works excellently and if they were stronger cores the very idea of this lawsuit wouldn't even exist. That my friend is actually the basic design of Zen mind you.
Posted on Reply