Friday, November 6th 2015

AMD Dragged to Court over Core Count on "Bulldozer"

This had to happen eventually. AMD has been dragged to court over misrepresentation of its CPU core count in its "Bulldozer" architecture. Tony Dickey, representing himself in the U.S. District Court for the Northern District of California, accused AMD of falsely advertising the core count of its latest CPUs, and contended that because of the way they're physically structured, AMD's 8-core "Bulldozer" chips really only have four cores.

The lawsuit alleges that Bulldozer processors were designed by stripping away components from two cores and combining what was left to make a single "module." In doing so, however, the cores no longer work independently. Due to this, AMD Bulldozer cannot perform eight instructions simultaneously and independently as claimed, or the way a true 8-core CPU would. Dickey is suing for damages, including statutory and punitive damages, litigation expenses, pre- and post-judgment interest, as well as other injunctive and declaratory relief as is deemed reasonable.
Source: LegalNewsOnline

511 Comments on AMD Dragged to Court over Core Count on "Bulldozer"

#251
Aquinus
Resident Wat-man
FordGT90Concept"Pipelines" diverge and converge. Look at the diagrams to compare. Core, Phenom II, and Bulldozer all start as one pipeline.


AMD says adding the extra "integer cluster" adds 12% to the die space. Intel has said that adding Hyper-Threading Technology adds 5% to its die space. The former begets more performance (in theory) because there are more dedicated transistors. How does 12% constitute a complete core when it is lacking the capability to prefetch and decode x86 instructions? It is a component of the core (which AMD calls a "module"), not a core unto itself.
Control logic isn't technically part of the execution unit or processing core. You could have a unified decoder for the entire CPU; that doesn't mean performance would be good, which is why it's not typically done in multi-core designs. There are parts of the CPU required for operation that aren't technically part of the processing core or execution unit, if you will. As a result, control logic is not a prerequisite for calling something a core or an EU.
Posted on Reply
#252
FordGT90Concept
"I go fast!1!11!1!"
When AMD debuted the FX-60, did it not have two prefetchers, two decoders, and two identical sets of execution units? When Intel debuted the Pentium Extreme Edition 840, did it not have two identical sets of those basic components of a processor? If the answer is yes to both of those questions, you see why a Bulldozer module represents a single core with a minor exception.

When subtracting CPU resources such as shared caches, HyperTransport, DMI, and so on, the number of transistors scales linearly with the number of cores added:
One Core = ~100%
Two Core = ~200%
Four Core = ~400%
Six Core = ~600%
Eight Core = ~800% and so on

This is the way cores are understood by the public and generally considered by the industry.

AMD on the other hand:
Bulldozer One "Core": 88%
Bulldozer One "Module" = 100% (marketed as "2-core")
Bulldozer Two "Core": 176%
Bulldozer Two "Module = 200% (marketed as "4-core")
Bulldozer Three "Core": 264%
Bulldozer Three "Module" = 300% (marketed as "6-core")
Bulldozer Four "Core" = 352%
Bulldozer Four "Module" = 400% (marketed as "8-core")

They're cooking the books to make their processors look more attractive on computer spec sheets ("Why buy an Intel quad-core when you can buy an AMD 8-core for substantially less? More is better, right?"). That doesn't make it true or accurate. Compare the two lists above. How is that not misleading? I'd argue it goes beyond misleading: it is false advertising.
Posted on Reply
#253
Aquinus
Resident Wat-man
FordGT90Concept: When AMD debuted the FX-60, did it not have two prefetchers, two decoders, and two identical sets of execution units? When Intel debuted the Pentium Extreme Edition 840, did it not have two identical sets of those basic components of a processor? If the answer is yes to both of those questions, you see why a Bulldozer module represents a single core with a minor exception.

When subtracting shared hardware resources (such as various levels of cache) and technologies such as DMI and HyperTransport, the number of transistors scales linearly with the number of cores added:
One Core = ~100%
Two Core = ~200%
Four Core = ~400%
Six Core = ~600%
Eight Core = ~800% and so on

This is the way cores are understood by the public and generally considered by the industry.

AMD on the other hand:
Bulldozer "Core": 88%
Bulldozer "Module" = 100% (marketed as "2-core")
Bulldozer Two "Core": 176%
Bulldozer Two "Module = 200% (marketed as "4-core")
Bulldozer Three "Core": 264%
Bulldozer Three "Module" = 300% (marketed as "6-core")
Bulldozer Four "Core" = 352%
Bulldozer Four "Module" = 400% (marketed as "8-core")

They're cooking the books to make their processors look more attractive on computer spec sheets. That doesn't make it true or accurate. Compare the two lists above. How is that not misleading?
You make it sound like CPU architectures aren't allowed to change. Isn't that a little closed-minded? Just because the trend in the past was to duplicate circuits when you needed more doesn't mean that's how it's going to work going forward. That's the exact reason why EEs make terrible software engineers. o_O
Posted on Reply
#254
FordGT90Concept
"I go fast!1!11!1!"
The problem isn't AMD's architecture. The problem is the word they used to account for its hardware resources, a word that is already very well defined and understood, as described in the last post.

AMD does not claim "8 integer execution units," "8 integer clusters," nor "8 integer cores" (all are true); they claim "8-core" (false).
Posted on Reply
#255
Aquinus
Resident Wat-man
An execution unit is a core. Your problem is that you seem to have this obsession with CPU control logic being part of it. It is not. Control logic is merely translation to drive the core, nothing more, nothing less. A CPU has cores, but a core is not a CPU, just as control logic is part of the CPU, not the cores. You can circle diagrams all day long until you're blue in the fingers, but it won't change reality. I went to school for this stuff, I eat, breathe, and dream about this stuff, and I can tell you that you're barking up the wrong tree.
Posted on Reply
#256
FordGT90Concept
"I go fast!1!11!1!"
CPU control logic is part of a discrete core (and every discrete core has its own control logic). Execution units are useless without it. A defining feature of CPU cores is that they are complete--they house everything they need to take instructions and output results.

Your definition of a "core" applies more to GPUs than CPUs (NVIDIA calls them CUDA "cores" where AMD calls them stream processors). Then again, CPUs are not highly parallel by nature (because logic) whereas GPUs are.
Posted on Reply
#257
Aquinus
Resident Wat-man
FordGT90Concept: CPU control logic is part of a discrete core (and every discrete core has its own control logic). Execution units are useless without it. A defining feature of CPU cores is that they are complete--they house everything they need to take instructions and output results.

Your definition of a "core" applies more to GPUs than CPUs (NVIDIA calls them CUDA "cores" where AMD calls them stream processors). Then again, CPUs are not highly parallel by nature (because logic) whereas GPUs are.
That's why you have a degree in computer science and work in the industry, right? Want to cite some sources there, big guy? No offense, but you're making stuff up at this point. A core doesn't need to control itself; that's the CPU's job. If it does, it does it as a feedback loop where the core provides data back to the control logic so it can react to things like changes to the core's status register, right? Come on, man. I learned this in Hardware 101; this isn't even the hard shit.
Posted on Reply
#258
FordGT90Concept
"I go fast!1!11!1!"
Every damn x86 processor on the planet.*

* Except Bulldozer and derivatives.
Posted on Reply
#259
Aquinus
Resident Wat-man
FordGT90Concept: Every damn x86 processor on the planet.*

* Except Bulldozer and derivatives.
Good job citing sources, brah. You get a gold star. Nothing you've provided actually says anything about where the CPU itself ends and the core begins. Maybe you can explain yourself instead of repeating yourself incessantly like a broken record.

Go to bed, Ford. You're drunk.
Posted on Reply
#260
FordGT90Concept
"I go fast!1!11!1!"
Like I need to cite sources for something that is EVERYWHERE on the internet. Here's one example:
searchdatacenter.techtarget.com/definition/multi-core-processor
A dual core set-up is somewhat comparable to having multiple, separate processors installed in the same computer, but because the two processors are actually plugged into the same socket, the connection between them is faster.
Not execution units. Not integer cores. Not integer clusters. "Processors!"

How about another?
techterms.com/definition/multi-core
Multi-core technology refers to CPUs that contain two or more processing cores. These cores operate as separate processors within a single chip.
"Processors!"

Or another?
www.techopedia.com/definition/5305/multicore
This technology is most commonly used in multicore processors, where two or more processor chips or cores run concurrently as a single system.

The concept of multicore technology is mainly centered on the possibility of parallel computing, which can significantly boost computer speed and efficiency by including two or more central processing units (CPUs) in a single chip.
"Processors!" and "CPUs!"

Not enough yet? Have another!
www.pcmag.com/encyclopedia/term/55926/multicore
A computer chip that contains two or more CPU processing units.
...okay, that one is just worded badly but..."CPUs!"

Here's a scholarly paper, and they have a block diagram showing dual core with separate fetch/decode for each discrete core on page 27:
www.cs.cmu.edu/~fp/courses/15213-s07/lectures/27-multicore.pdf
"Core" is consistently used to describe what could be considered a separate processor (having up to a private or shared L2).


You'll have to be satisfied with that because I'm not quoting more.

Generally speaking, each core starts with prefetch (or the "front end") and ends just before the L2 cache if the L2 cache is shared, or just after the L2 cache if the L2 cache is private. This unit is effectively a stand-alone processor missing only some communication parts (e.g. the interfaces to RAM and the chipset).
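For what it's worth, here's a minimal sketch (Linux-only, assuming sysfs is mounted at /sys) of how the OS draws that boundary: it prints which physical core the kernel assigns each logical CPU to and which logical CPUs it considers siblings. How a given FX chip shows up here depends on how that kernel models CMT modules, so treat the output as the kernel's opinion rather than a ruling on the definition.

#include <stdio.h>

int main(void)
{
    char path[128], buf[64];

    for (int cpu = 0; ; cpu++) {
        /* core_id: which physical core the kernel assigns this logical CPU to */
        snprintf(path, sizeof path,
                 "/sys/devices/system/cpu/cpu%d/topology/core_id", cpu);
        FILE *f = fopen(path, "r");
        if (!f)
            break;                              /* no more logical CPUs */
        if (fgets(buf, sizeof buf, f))
            printf("logical cpu %2d -> core_id %s", cpu, buf);
        fclose(f);

        /* thread_siblings_list: the logical CPUs that share this core */
        snprintf(path, sizeof path,
                 "/sys/devices/system/cpu/cpu%d/topology/thread_siblings_list", cpu);
        f = fopen(path, "r");
        if (f) {
            if (fgets(buf, sizeof buf, f))
                printf("                  siblings: %s", buf);
            fclose(f);
        }
    }
    return 0;
}

(lscpu and /proc/cpuinfo report from the same topology information.)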
Posted on Reply
#261
Aquinus
Resident Wat-man
FordGT90Concept: Like I need to cite sources for something that is EVERYWHERE on the internet. Here's one example:
searchdatacenter.techtarget.com/definition/multi-core-processor

Not execution units. Not integer cores. Not integer clusters. "Processors!"

How about another?
techterms.com/definition/multi-core

"Processors!"

Or another?
www.techopedia.com/definition/5305/multicore

"Processors!" and "CPUs!"
A processor is not a core. A processor contains cores. There is a big difference between a multi-core processor and a multi-processor system.

Once again, your sources say nothing about where control logic lives, which is in the CPU, not the core.

Von Neumann disagrees with you:
2.3 Second: The logical control of the device, that is the proper sequencing of its operations can be most efficiently carried out by a central control organ. If the device is to be elastic, that is as nearly as possible all purpose, then a distinction must be made between the specific instructions given for and defining a particular problem, and the general control organs which see to it that these instructions—no matter what they are—are carried out. The former must be stored in some way— in existing devices this is done as indicated in 1.2—the latter are represented by definite operating parts of the device. By the central control we mean this latter function only, and the organs which perform it form the second specific part: CC.
web.archive.org/web/20130314123032/http://qss.stanford.edu/~godfrey/vonNeumann/vnedvac.pdf

In other words, control logic is a separate entity from what carries out the operations themselves... but you know, I clearly don't know anything on the matter. :shadedshu:
Posted on Reply
#262
FordGT90Concept
"I go fast!1!11!1!"
Von Neumann died in 1957. Hardly relevant.

I'll play your game though: a dual-core processor has two "devices" by Von Neumann's definition. There are two "central controls" and at least one "organ" under each "central control" (usually integer and floating point execution units).
Posted on Reply
#263
Aquinus
Resident Wat-man
FordGT90Concept: Von Neumann died in 1957. Hardly relevant.
Von Neumann is not relevant to CPU design? That's a joke, right? Most modern CPUs exist because of him. Death doesn't make theory irrelevant; your brain-dead assertions, however, seem to. :shadedshu:
Posted on Reply
#265
Aquinus
Resident Wat-man
Now you're just making shit up. Simply put, the paper says the two are distinct entities, not the same. They might be on the same die, but they're not the same thing.
Posted on Reply
#266
FordGT90Concept
"I go fast!1!11!1!"
I never said they weren't "distinct entities." Control logic sits above execution units, but the point you're missing is that the two together comprise a core in multi-core processors. There may be an overarching control logic (especially for power saving) that encompasses all of the cores on a multi-core processor, but that's also something common to modern x86 processors. It's beyond the scope of this discussion because the debate is about what is or isn't a core and how many of them Bulldozer actually has.


I tried finding some more recent news on the class action suit and turned this up:
wccftech.com/amd-class-action-lawsuit-bulldozer-processor-core-count/
AMD has just officially replied to the allegations and stated the following: “We believe our marketing accurately reflects the capabilities of the “Bulldozer” architecture which, when implemented in an 8 core AMD FX processor is capable of running 8 instructions concurrently.”
That's a pretty weak defense because Hyper-Threading Technology does the same. Additionally, the threads do not run concurrently through prefetch (true of Bulldozer, Piledriver, Steamroller, & Excavator) and decoding (true of Bulldozer & Piledriver) so, as with all cases of SMT, that statement is only true some of the time.

pacermonitor.com/public/case/9674725/Dickey_v_Advanced_Micro_Devices,_Inc
It appears AMD motioned to dismiss the case. The hearing is scheduled for 2/26/2016.
Posted on Reply
#267
Aquinus
Resident Wat-man
FordGT90Concept: wccftech.com/amd-class-action-lawsuit-bulldozer-processor-core-count/
Since when is WCCFTech reliable, and since when does disagreeing with you make them wrong? :wtf:
FordGT90Concept: That's a pretty weak defense because Hyper-Threading Technology does the same. Additionally, the threads do not run concurrently through prefetch (true of Bulldozer, Piledriver, Steamroller, & Excavator) and decoding (true of Bulldozer & Piledriver) so, as with all cases of SMT, that statement is only true some of the time.
I think someone hasn't been reading my posts because that isn't how hyper-threading works:
Aquinus: That's not how hyper-threading works. Hyper-threading utilizes unused parts of the pipeline to run that second thread; it doesn't do any parallel execution on a single stage. What it can do is execute multiple of the same kind of uOp on the ALU at any given time. That is to say, if you have 3 of the same uOps in a row on data that's not dependent on the results, the CPU can execute them in parallel to some extent. You're conflating instruction-level parallelism (parallel uOps) and thread-level parallelism (parallel instructions). Two very different things. Simply put, the only time uOps can be executed in parallel on the ALU is when they're the same uOp. You can't add and sub at the same time.

With that said, most modern super-scalar CPUs already handle instruction-level parallelism internally when instructions are decoded.
That's not parallel execution, that's fitting two parallel tasks into the same serial pipeline by filling the gaps, hence why improvements tend to be minor and dependent on the workload. In other words, when one thread isn't using particular resources, another will, but two threads can never use the same resources in a core at the same time. Once again, you seem to be intent on conflating instruction-level parallelism and thread-level parallelism. Saying the same false thing over and over again doesn't make you right. :slap:
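If it helps to see the difference in code rather than diagrams, here's a minimal sketch of instruction-level parallelism inside a single core (plain C; compile with something like gcc -O2 and without -ffast-math, since fast-math would let the compiler re-associate the single chain; the loop count and constant are arbitrary and actual timings are machine-dependent). The same number of floating-point additions are done once as a single dependency chain and once split across four independent accumulators; on a superscalar core the second version typically finishes noticeably faster, with no second thread involved.

#include <stdio.h>
#include <time.h>

static double now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

/* one dependency chain: every add must wait for the previous result */
static double chain(long n)
{
    double s = 0.0;
    for (long i = 0; i < n; i++)
        s += 1.000001;
    return s;
}

/* four independent chains: the scheduler can keep several adds in flight at once */
static double split(long n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    for (long i = 0; i < n; i += 4) {
        s0 += 1.000001;
        s1 += 1.000001;
        s2 += 1.000001;
        s3 += 1.000001;
    }
    return s0 + s1 + s2 + s3;
}

int main(void)
{
    const long n = 400000000L;                  /* 400M additions either way */
    double t, r;

    t = now(); r = chain(n); printf("one chain   : %.3f s (sum %.0f)\n", now() - t, r);
    t = now(); r = split(n); printf("four chains : %.3f s (sum %.0f)\n", now() - t, r);
    return 0;
}

That speed-up is instruction-level parallelism. Thread-level parallelism via SMT is the separate trick of letting a second thread's instructions fill whatever issue slots the first thread leaves empty, which is why its benefit swings so much with the workload.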

It's not a weak defense because FX CPUs do actually execute concurrently as opposed to simultaneously and I'm sure come the hearing AMD will bring in some engineers to explain exactly why that's the case. Hyper-threading does not because the dedicated hardware to do it simply isn't there.

Now, there are SMT systems that are a little more complex and do have extra dedicated hardware like some of the latest SPARC CPUs in order to run 8 threads per core but, it's a very different animal than Intel's HT or AMD's FX modules.

Simply put, back to the initial argument, control logic is part of the CPU, not the core. A core (or execution unit) alone without the CPU doesn't mean diddly squat because there wouldn't be anything to drive it. There is absolutely no requirement that says that control logic has to be dedicated for every core. This is true for x86, this is true for GPUs, this is true for SPARC. It's true for just about every microprocessor in the world, but just because there are several cases where it is dedicated, you seem to think you can derive the truth from observation, which is simply a joke.
Posted on Reply
#268
FordGT90Concept
"I go fast!1!11!1!"
Aquinus: Since when is WCCFTech reliable, and since when does disagreeing with you make them wrong? :wtf:
It was a quote directly from AMD.
Aquinus: That's not parallel execution...
AMD said "running 8 instructions concurrently" and that's exactly what HTT does too. The instructions are prefetched and decoded while the execution units execute what they can when they can.

Simultaneous is synonymous with concurrent.

SMT is a grayscale, not black and white. On the black end, you have technologies like HTT where there are very few extra transistors to make it work; on the white end (but not including it) you have a second discrete processor. I'd argue that Bulldozer is as close to white as currently exists while HTT is very close to black. SPARC is in between the two. SPARC's SMT design is actually very similar to HTT, but where HTT assumes cache misses will be rare (thanks to huge caches), SPARC assumes they'll be common. SPARC fills in the gaps from cache misses by working on other threads.


In your definition of "core" does it only include integers or does it also include the floating point units? Additionally, what do you call the unit of hardware which encompasses prefetch, decode(rs), execution unit(s), and may or may not include L2 shared cache? Now when you take that unit of hardware and place them together two form four discreet hardware units one die, what do you call them? And I'm talking all x86; not just Bulldozer and derivatives.

I would answer as: a CPU (or processor) contains one or more cores, each consisting of one prefetcher, one or more decoders, and execution units.
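To put the two competing definitions in this thread side by side, here's a toy sketch in C struct form. The names are purely illustrative (not AMD's or Intel's terminology), and this is a restatement of the argument, not a description of the actual silicon:

/* building blocks (illustrative only) */
struct prefetcher  { int id; };
struct decoder     { int id; };
struct int_cluster { int id; };   /* integer scheduler + ALUs/AGUs */
struct fpu         { int id; };

/* the conventional definition argued above: a core owns its whole front end */
struct conventional_core {
    struct prefetcher  fetch;        /* private */
    struct decoder     decode;       /* private */
    struct int_cluster integer;      /* private */
    struct fpu         fp;           /* private */
};

/* a Bulldozer module as AMD counts it: two integer clusters, shared everything else */
struct bulldozer_module {
    struct prefetcher  fetch;        /* shared by both "cores" */
    struct decoder     decode;       /* shared on Bulldozer/Piledriver */
    struct int_cluster integer[2];   /* the two units marketed as cores */
    struct fpu         fp;           /* shared */
};

The whole disagreement is whether the second struct contains two of the first or one of something else.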
Posted on Reply
#269
RejZoR
AMD says cores because 99% of people don't know what "integer" cores or "virtual" cores even mean.

If the number of cores is the issue, then why is no one making a fuss about shaders on graphics cards? AMD has tons more of them compared to NVIDIA and yet no one makes any fuss about it. They aren't of the same performance either. They just are. AMD could call it a 2000-core CPU if they designed it in such a way. In the end, in either case you need benchmarks to assess performance. Even a quad-core to quad-core comparison NEVER yields the same results, especially not between different companies.

So, why all this fuss?
Posted on Reply
#270
Aquinus
Resident Wat-man
FordGT90Concept: AMD said "running 8 instructions concurrently" and that's exactly what HTT does too.
Did you not read my post? Intel HT does not execute concurrently; it executes serially by inserting workloads from concurrent processes into a single pipeline. THAT IS NOT PARALLEL EXECUTION.
Aquinus: That's not how hyper-threading works. Hyper-threading utilizes unused parts of the pipeline to run that second thread; it doesn't do any parallel execution on a single stage. What it can do is execute multiple of the same kind of uOp on the ALU at any given time. That is to say, if you have 3 of the same uOps in a row on data that's not dependent on the results, the CPU can execute them in parallel to some extent. You're conflating instruction-level parallelism (parallel uOps) and thread-level parallelism (parallel instructions). Two very different things. Simply put, the only time uOps can be executed in parallel on the ALU is when they're the same uOp. You can't add and sub at the same time.

With that said, most modern super-scalar CPUs already handle instruction-level parallelism internally when instructions are decoded.
Aquinus: That's not parallel execution, that's fitting two parallel tasks into the same serial pipeline by filling the gaps, hence why improvements tend to be minor and dependent on the workload. In other words, when one thread isn't using particular resources, another will, but two threads can never use the same resources in a core at the same time. Once again, you seem to be intent on conflating instruction-level parallelism and thread-level parallelism. Saying the same false thing over and over again doesn't make you right. :slap:
Posted on Reply
#271
FordGT90Concept
"I go fast!1!11!1!"
RejZoR: If the number of cores is the issue, then why is no one making a fuss about shaders on graphics cards? AMD has tons more of them compared to NVIDIA and yet no one makes any fuss about it. They aren't of the same performance either. They just are. AMD could call it a 2000-core CPU if they designed it in such a way. In the end, in either case you need benchmarks to assess performance.
Because the architectures are wildly different. I tried to compare block diagrams of GCN and Maxwell a while ago to find analogues and I wasn't getting anywhere. The parallelism of GPUs and the fact that GPUs serve as co-processors grants them a lot of flexibility CPUs can't afford. Xeon Phi demonstrates this well.

Bulldozer underperformed Thuban (Phenom II) in most cases.
Posted on Reply
#272
Aquinus
Resident Wat-man
FordGT90Concept: Bulldozer underperformed Thuban (Phenom II) in most cases.
Which is a result of what again? The core?! Maybe you forgot one of my earlier posts that showed how AMD's FX cores are gimped compared to prior uArchs.
Aquinus: Bullshit. There are a lot of instructions that not only execute in a single clock cycle, but can sometimes be executed several at once.

Before I grab part of this document, I will quote it:


Source: gmplib.org/~tege/x86-timing.pdf
Let's look at Sandy Bridge for a minute:
add, sub, and, or, xor, inc, dec, neg, and not all execute in a single clock cycle and can process 3 of these uOps at once per core. Haswell expanded that to 4 uOps per cycle from 3 on SB. Even AMD's K10 was the same way, but then you look at AMD's BD1 (which is what we're all huffy about) and you notice that these same instructions can only do 2 uOps per clock cycle on Bulldozer. Then there are cases like double shift left and right, which have a fraction of the performance on BD versus modern Intel CPUs.

People need to get their information right. Bulldozer is slow because dedicated components are skimped on: instructions usually take the same number of cycles as on their Intel counterparts, but in many cases have much less throughput, so uOps have to be run more often than they otherwise would, which increases latency and turns certain full instructions into a longer set of uOps. So you might have an instruction with uOps that an Intel CPU could execute in one clock cycle, but the AMD CPU might need two because it doesn't have enough resources in a single core to do it all at once.

For what it's worth, Intel cores might not execute individual instructions "faster"; it's that they can do more of them in a single clock cycle. Both AMD and Intel have a lot of core x86 instructions that not only complete in one cycle but can execute multiple of the same uOps in the same cycle, which is where pipelining comes into play for instructions that allow it.

It's also worth noting that there are x86 instructions that are not pipelined for various reasons. That's in this other document:
www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-optimization-manual.html
Posted on Reply
#273
FordGT90Concept
"I go fast!1!11!1!"
facepalm.jpg

Using that logic, Bulldozer should be almost twice as fast because there are two "cores" per core. But no! You need a separate thread to access those! Even when you compare 8 threads (making it a fairer comparison), Thuban is still competitive, likely because the decoder got overwhelmed. This is why they added a second decoder in Steamroller and Excavator. Thuban can't keep pace with Steamroller and Excavator, but it isn't clear if that is because of the process advantage or because the decoder really made that big of a difference. Even so, it's moot because that's not what the lawsuit is about. It is about the definition of a core. I searched high and low for anything calling an "execution unit" a core and I'm not finding anything that isn't related directly to Bulldozer and derivatives.


Meh, fuck it. You're never going to convince me and I'm never going to convince you. The court will decide if the case should be heard 2/26/2016.
Posted on Reply
#274
Aquinus
Resident Wat-man
FordGT90Concept: Meh, fuck it. 2/26/2016.
You're right, we're arguing in circles. However, I would like to offer you an honorary degree in using the Google. :p
Posted on Reply
#275
vega22
don't stop!

this thread has given me hours of entertainment :D

and some insight tbh :)
Posted on Reply