Friday, November 6th 2015

AMD Dragged to Court over Core Count on "Bulldozer"

Nov 6th, 2015 05:05 Discuss (511 Comments)

This had to happen eventually. AMD has been dragged to court over misrepresentation of its CPU core count in its "Bulldozer" architecture. Tony Dickey, representing himself in the U.S. District Court for the Northern District of California, accused AMD of falsely advertising the core count in its latest CPUs, and contended that because of the way they're physically structured, AMD's 8-core "Bulldozer" chips really only have four cores.

The lawsuit alleges that Bulldozer processors were designed by stripping away components from two cores and combining what was left to make a single "module." In doing so, however, the cores no longer work independently. Due to this, AMD Bulldozer cannot perform eight instructions simultaneously and independently as claimed, or the way a true 8-core CPU would. Dickey is suing for damages, including statutory and punitive damages, litigation expenses, pre- and post-judgment interest, as well as other injunctive and declaratory relief as is deemed reasonable.

Source: LegalNewsOnline


511 Comments on AMD Dragged to Court over Core Count on "Bulldozer"

#276

RejZoR

But again, what matters in the end is performance. AMD opted for such core design. Call them half cores or not true cores all you want, they are cores presented to the system and there are 8 of them. If they don't perform as expected, why the fuck are there 5 trillion review sites for then? Clueless people will get screwed (or shall we say they screw themselves) for not asking the right people or checking reviews. Technically speaking, if CPU had just 1 core and companies advertised it as such, no one would buy it, even if that single core literally raped all the multi-core CPU's in the market. Without looking at reviews, you can't possibly tell how well it performs. So, how different is going to the other extreme, 8 cores that supposedly aren't "real" cores?

Intel's HT really can't be called a core, because it can't be called so on any level even though I've seen really weird namings of i7 CPU's with HT on very popular German webpage Computer Universe. AMD can't just call it quad core with 6,5 threads. It would confuse the fuck out of users. So they opted for calling cores the way they are presented to the system.

Also, look at the task manager...

It's not exactly a tightly kept secret that required rocket scientists to figure it out. 1 processor, 4 cores, 8 logical units. Difference is, those are actually cores, even though they are of a different design than the one used by Intel. HT on the other hand doesn't have any kind of core appearance. It's just a side logic that tricks the OS into thinking it's another core and gives the CPU the ability to stack more computation on the same physical core. It's confusing to casual users, but I wouldn't call it cheating on AMD's end...

#277

Frick

Fishfaced Nincompoop

RejZoR said: Intel's HT really can't be called a core, because it can't be called so on any level even though I've seen really weird namings of i7 CPU's with HT on very popular German webpage Computer Universe. AMD can't just call it quad core with 6,5 threads. It would confuse the fuck out of users. So they opted for calling cores the way they are presented to the system.

Has nothing to do with the topic, but stores sold the first generation i3/i5/i7 CPU's as CPU's with three, five and seven cores.

#278

RejZoR

It has to do with the topic. Because what people consider as 4 core 8 thread Intel CPU cannot be applied to AMD CPU's. If it says 8 cores, it actually has that many cores. If they are really as effective as Intel's cores number vs number, that's debatable. And that's why reviews exist. In the end, it doesn't matter if number of cores is the same or how effective they are per core or in multi-core arrangement. You have to see benchmarks in either case.

#279

FordGT90Concept

"I go fast!1!11!1!"

Microsoft would call them cores if they fit the definition of a core.

#280

Aquinus

Resident Wat-man

L2 is part of the core, huh Ford? I'm pretty sure that Core 2 Duos, having a shared L2, still were individual cores. Might want to work on that diagram a bit instead of posting it incessantly. Just like control logic is part of the core too, huh? Let's stick with facts and less home-made bullshit.

#281

RejZoR

The reason why they prefer to use split dedicated caches is to avoid cache thrashing. L3 is so far away it's almost like RAM so it's not important anymore.

#282

FordGT90Concept

"I go fast!1!11!1!"

Aquinus said: L2 is part of the core, huh Ford? I'm pretty sure that Core 2 Duos, having a shared L2, still were individual cores. Might want to work on that diagram a bit instead of posting it incessantly. Just like control logic is part of the core too, huh? Let's stick with facts and less home-made bullshit.

A core doesn't share any resources with another core. If an L2 cache is shared between two or more cores, none of the cores can claim it as theirs.

In the case of Bulldozer, the L2 cache is shared between the FPU and the two integer clusters. It is not shared with another core so, as the diagram shows, it is correct. One bulldozer core (containing two integer clusters) includes the L2 cache.

In the case of Core 2 Duo, the L2 cache is shared between two cores so the L2 cache is not part of either core. The two discrete cores (purple background) packaged together with the L2 cache form a module (green square):

Core 2 Quad was created by combining two dual-core modules producing a multi-chip module (MCM) quad-core CPU:

RejZoR said: The reason why they prefer to use split dedicated caches is to avoid cache thrashing. L3 is so far away it's almost like RAM so it's not important anymore.

L3 was added because of the massive performance drop between L2 and RAM. Some processors are getting an L4 cache because of the massive performance drop between L3 and RAM.

#283

Aquinus

Resident Wat-man

RejZoR said: The reason why they prefer to use split dedicated caches is to avoid cache thrashing. L3 is so far away it's almost like RAM so it's not important anymore.

Yessir. The hit rates on CPU cache nowadays are nutty high, north of 85-90% in a lot of cases, which probably explains why faster memory doesn't do a whole lot of good.
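To put rough numbers on that point, here is a minimal sketch of average memory access time (AMAT) with a single cache level in front of RAM. The latencies (4 ns cache, 60 ns vs. 50 ns RAM) are made-up assumptions chosen only for illustration, not measurements of any CPU discussed here.

```python
def amat(hit_rate, cache_ns, ram_ns):
    """Average memory access time for one cache level in front of RAM."""
    return hit_rate * cache_ns + (1.0 - hit_rate) * ram_ns

# Hypothetical latencies, chosen only for illustration.
CACHE_NS = 4.0
SLOW_RAM_NS = 60.0
FAST_RAM_NS = 50.0

def ram_speedup_benefit(hit_rate):
    """Fractional reduction in AMAT from switching to the faster RAM."""
    slow = amat(hit_rate, CACHE_NS, SLOW_RAM_NS)
    fast = amat(hit_rate, CACHE_NS, FAST_RAM_NS)
    return (slow - fast) / slow

for rate in (0.50, 0.90, 0.99):
    print(f"hit rate {rate:.0%}: AMAT drops {ram_speedup_benefit(rate):.1%}")
```

Under these assumptions, the higher the hit rate, the smaller the share of accesses that ever reach RAM, so the same RAM improvement buys less and less.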

FordGT90Concept said: Some processors are getting an L4 cache because of the massive performance drop between L3 and RAM.

You mean the eDRAM cache? That's strictly for the iGPU if I recall correctly because the only chips that sport it are ones with Iris Pro.

#284

ThE_MaD_ShOt

I think that depends on what version of Windows you are using. Under Win 7, FX-8s show up as 8 CPUs and under Win 10 they show up as 4 CPUs with 8 threads. I think this was done to help with the performance of AMD processors, but I'm not totally sure about that.
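For anyone who wants to check what their own OS reports, here is a minimal sketch that counts logical processors and distinct physical cores from `/proc/cpuinfo`-style text. It assumes the `processor`, `physical id` and `core id` fields found on x86 Linux; the sample text is hypothetical and is not the output of any FX chip.

```python
def count_topology(cpuinfo_text):
    """Return (logical_processors, physical_cores) from /proc/cpuinfo-style text."""
    logical = 0
    cores = set()
    physical_id = core_id = None
    # Append an empty line so the final record is closed like the others.
    for line in cpuinfo_text.splitlines() + [""]:
        if not line.strip():
            # A blank line ends one logical-processor record.
            if physical_id is not None and core_id is not None:
                cores.add((physical_id, core_id))
            physical_id = core_id = None
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "processor":
            logical += 1
        elif key == "physical id":
            physical_id = value
        elif key == "core id":
            core_id = value
    return logical, len(cores)

# Hypothetical sample: one socket, two cores, two logical processors per core.
SAMPLE = """processor : 0
physical id : 0
core id : 0

processor : 1
physical id : 0
core id : 0

processor : 2
physical id : 0
core id : 1

processor : 3
physical id : 0
core id : 1
"""

print(count_topology(SAMPLE))
```

On a Linux box, `count_topology(open("/proc/cpuinfo").read())` should give the same pair of numbers the OS itself uses; what counts as a "core" there is whatever the kernel decided, which is exactly the point being argued in this thread.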

#285

Aquinus

Resident Wat-man

ThE_MaD_ShOt said: I think that depends on what version of Windows you are using. Under Win 7, FX-8s show up as 8 CPUs and under Win 10 they show up as 4 CPUs with 8 threads. I think this was done to help with the performance of AMD processors, but I'm not totally sure about that.

There is a minor performance hit when using the second core in the module. Probably, as @FordGT90Concept described, because the decoder was getting overwhelmed, which is why they added a second one in Steamroller.

#286

FordGT90Concept

"I go fast!1!11!1!"

Aquinus said: You mean the eDRAM cache? That's strictly for the iGPU if I recall correctly because the only chips that sport it are ones with Iris Pro.

The eDRAM can be used by Iris Pro and the CPU:
www.anandtech.com/show/6993/intel-iris-pro-5200-graphics-review-core-i74950hq-tested/3

AnandTech said: Unlike previous eDRAM implementations in game consoles, Crystalwell is true 4th level cache in the memory hierarchy. It acts as a victim buffer to the L3 cache, meaning anything evicted from L3 cache immediately goes into the L4 cache. Both CPU and GPU requests are cached. The cache can dynamically allocate its partitioning between CPU and GPU use. If you don’t use the GPU at all (e.g. discrete GPU installed), Crystalwell will still work on caching CPU requests. That’s right, Haswell CPUs equipped with Crystalwell effectively have a 128MB L4 cache.

It does not act as a frame buffer for the Iris Pro. Intel hinted that a separate 16-32 MiB ESRAM could be used exclusively for Iris Pro's frame buffer in the future. Skylake-H will likely be getting the same Crystalwell L4 cache as Broadwell. We could see the same Crystalwell cache spring up on even more chips in the future (Kaby Lake, maybe even Cannonlake).
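The "victim buffer" behavior AnandTech describes can be sketched as a toy model: anything evicted from L3 drops into L4, and a later access that misses L3 but hits L4 avoids the trip to RAM. The class name, the LRU replacement and the tiny capacities are all assumptions for illustration; this is not how Crystalwell is actually implemented.

```python
from collections import OrderedDict

class VictimHierarchy:
    """Toy L3 + victim L4: lines evicted from L3 are placed in L4."""

    def __init__(self, l3_lines, l4_lines):
        self.l3 = OrderedDict()   # oldest entry first (LRU order)
        self.l4 = OrderedDict()
        self.l3_lines = l3_lines
        self.l4_lines = l4_lines

    def _insert_l3(self, addr):
        self.l3[addr] = True
        if len(self.l3) > self.l3_lines:
            victim, _ = self.l3.popitem(last=False)  # evict least recently used
            self.l4[victim] = True                    # the victim goes to L4
            if len(self.l4) > self.l4_lines:
                self.l4.popitem(last=False)           # L4 overflow is dropped

    def access(self, addr):
        """Return which level served the access: 'L3', 'L4' or 'RAM'."""
        if addr in self.l3:
            self.l3.move_to_end(addr)
            return "L3"
        if addr in self.l4:
            del self.l4[addr]       # assumed: the line moves back up to L3
            self._insert_l3(addr)
            return "L4"
        self._insert_l3(addr)
        return "RAM"

h = VictimHierarchy(l3_lines=2, l4_lines=4)
print([h.access(a) for a in (1, 2, 3, 1, 3, 2)])
```

In this model the fourth access (address 1 again) is served from L4 rather than RAM, because address 1 was pushed out of the two-line L3 when address 3 arrived.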

Aquinus said: There is a minor performance hit when using the second core in the module. Probably, as @FordGT90Concept described, because the decoder was getting overwhelmed, which is why they added a second one in Steamroller.

Even in Excavator, the prefetch and FPUs are still shared. There's going to be a performance hit from them too. A legitimate dual-core doesn't share those things as demonstrated by the Core 2 Duo and Phenom II block diagrams.

I did some more digging on Core 2 Duo and it appears that neither core can be disabled. Conroe-L (single-core) appears to be a different chip altogether. This makes Core 2 Duo a true module because it has two of everything except L2 and control which makes them inseparable. Bulldozer is not a module because it doesn't have two of everything--it has one of some things. This is why FX-8350 should be considered a quad-core. What was previously understood as a module (complete but inseparable cores) is absent (needs two prefetchers at minimum).

#287

RejZoR

I was hoping Skylake would get L4 by default (current i7 6700k for example), but after I've seen it's basically just a smaller i7 5000 series, I just didn't bother and opted for more cores instead on 5820K.

#288

cdawall

where the hell are my stars

Scaling would show that an FX 8 core has more than 4 cores. Math would say it is physically impossible to say differently.
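The scaling argument boils down to one division: the all-threads score over the single-thread score. A minimal sketch follows; the function name and the scores are placeholders for illustration, not real benchmark results.

```python
def effective_cores(single_thread_score, all_threads_score):
    """How many single-thread equivalents the fully loaded chip delivers."""
    if single_thread_score <= 0:
        raise ValueError("single_thread_score must be positive")
    return all_threads_score / single_thread_score

# Placeholder scores for illustration only; not real benchmark results.
example = effective_cores(single_thread_score=100.0, all_threads_score=620.0)
print(f"effective cores: {example:.1f}")
```

Clock throttling under full load and memory bandwidth limits also move this ratio, so treat it as an estimate rather than a core count.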

#289

eidairaman1

The Exiled Airman

FordGT90Concept said: That was Windows XP and XP only has two states: uniprocessor (one thread at a time) and multiprocessor (two or more threads at a time). Multiprocessor could mean two physical sockets with one core each, one socket with two cores, or one physical + one logical processor. It was updated to better handle the three variations.

Bulldozer did the same thing with Vista. Vista (I believe 7 too) called it eight cores because it was incapable of distinguishing them but that apparently caused problems because updates were released to fix core parking issues. Come Windows 8 and newer, Microsoft updated the operating system to definitively account for sockets, cores, and logical processors which is where we see 4 cores and 8 logical processors.

CPU-Z doesn't need to schedule threads. Windows does. Microsoft did what they did deliberately so the scheduler best utilizes the processor resources.

Caches have always been tiered. The closer the tier is to the ALUs and FPUs, the faster it is. Caches completely lack logic and there are numerous advantages, and virtually no disadvantages, to sharing caches (scheduler will allot the cache evenly when the load is even).

There are only a handful of shared FPUs in the computing world outside of Bulldozer (and derivatives) and all of them are set up in a way that resembles a co-processor. That is, it has its own scheduler and all of the cores can queue work to it--effectively its own core. They don't market it as having an extra core though because that would be misleading.

Still on 7 myself.
Yes I got all updates plus core unparker tool. The FX8350 does more than I could imagine.

#290

MalakiLab

FordGT90Concept said: In the case of Core 2 Duo, the L2 cache is shared between two cores so the L2 cache is not part of either core. The two discrete cores (purple background) packaged together with the L2 cache form a module (green square):

Let me show you the Intel Silvermont, C2000, eight-core architecture.
All new Atoms have modules with 2 cores in each, sharing the same L2 cache. Are they liars too?

The graphic you made is also completely wrong. It shows that you don't understand OoO, PRF, branch prediction, resource monitoring. www.anandtech.com/show/3922/intels-sandy-bridge-architecture-exposed/3

In short, you don't understand how their microarchitecture works. 95% of the time, the module will work just the same as 2 cores, because both can use the shared resources at the SAME TIME. In most circumstances, it will use both integer cores, and each one will have a 128-bit FMAC with 128-bit integer execution. So they can simultaneously execute most instructions independently without having to wait their turn as with Hyper-Threading. Totally different microarchitecture. Things begin to degrade when both floating-point pipelines have to be ganged together for a single integer core, to execute a single 256-bit AVX instruction or two symmetrical SSE instructions. Then the entire FPU is taken, which leaves no resources for the other integer core. In theory, the dispatch controller should look in the instruction fetch buffer and give the other integer core some instructions that don't need the FPU, keeping it busy while the first core completes the cycles that need the whole FPU. On paper it looks awesome, but it's a very, very complex operation, and sadly it hasn't brought much success. Luckily, those instructions are not used very often. Still, it's a major problem AMD tried to improve in Piledriver, Steamroller and finally Excavator. It was also their way to deal with new instructions and stay competitive.
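The sharing described above can be turned into a toy issue model: each module has two 128-bit FMAC pipes, a 128-bit op takes one pipe, and a 256-bit op takes both. This only models the description in this post; it ignores the scheduler, latencies and the dispatch trick mentioned above, and the function names are made up for illustration.

```python
FMAC_PIPES_PER_MODULE = 2   # two 128-bit FMAC pipes shared by both integer cores
PIPE_WIDTH_BITS = 128

def module_ops_per_cycle(op_widths):
    """FP ops a module can start in one cycle.

    op_widths: width in bits (128 or 256) of the op each integer core
    wants to issue this cycle; at most two entries, one per core.
    """
    free_pipes = FMAC_PIPES_PER_MODULE
    issued = 0
    for width in op_widths:
        needed = width // PIPE_WIDTH_BITS  # a 256-bit op ties up both pipes
        if needed <= free_pipes:
            free_pipes -= needed
            issued += 1
    return issued

def chip_ops_per_cycle(width, modules=4):
    """All eight threads issuing ops of the same width, two threads per module."""
    return modules * module_ops_per_cycle([width, width])

print("128-bit ops per cycle:", chip_ops_per_cycle(128))
print("256-bit ops per cycle:", chip_ops_per_cycle(256))
```

In this simplified model the four-module chip starts eight 128-bit ops per cycle but only four 256-bit ops, which is the blocking case the two sides of this thread are arguing about.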

It's a good technology, but a little too audacious for today's market. Instead of focusing on having better IPC, they mostly developed ways to better dispatch the instructions. That's why they decided to come back to more traditional microarchitectures and be more competitive IPC-wise. It doesn't change the fact that a module behaves like 2 cores and is in fact 2 cores in a single module. Even Intel agrees with that and is using modules for their Atoms. Maybe we should drag them to court too, no?

#291

cdawall

where the hell are my stars

8 months

#292

FordGT90Concept

"I go fast!1!11!1!"

MalakiLab said: Let me show you the Intel Silvermont, C2000, eight-core architecture.
All new Atoms have modules with 2 cores in each, sharing the same L2 cache. Are they liars too?

That's an octo-core so Intel is not lying. The compute cores aren't broken up at all--nothing is shared except L2 cache.

A "core" only requires data + instruction cache. Additional caches are added for boosting performance (decreasing the gaps in latency between core and system RAM).

up to 32k = L1
up to 256k = L2
up to 4M = L3
up to 64M = L4 eDRAM in 4950HQ, system RAM otherwise.

As I specified above, if a quad-core processor has 4 L2 caches, then those L2 caches are part of the core because it is not a shared resource. If the resource is shared (as is the case with Silvermont) then the resource doesn't belong to a core--it's part of the CPU package (like L3, QPI, HyperTransport, memory controller, etc. usually are).

MalakiLab said: Then the entire FPU is taken, which leaves no resources for the other integer core.

This blocking situation is never encountered on Silvermont nor Core 2 Duo. If a blocking situation is possible, I'd argue (and have argued) the whole of it is a multithreaded core, not multi-core.

A core can take an instruction and execute the whole of it without sharing any parts with any other processor. Bulldozer and sons, when executing a floating point unit task, do not fit that definition. Silvermont will happily execute eight 256-bit AVX instructions simultaneously across all cores, unlike an FX-8350. It'll do that with ANY instruction because none of the execution hardware is shared.

#293

Aquinus

Resident Wat-man

FordGT90Concept said: A core can take an instruction and execute the whole of it without sharing any parts with any other processor. Bulldozer and sons, when executing a floating point unit task, do not fit that definition. Silvermont will happily execute eight 256-bit AVX instructions simultaneously across all cores, unlike an FX-8350. It'll do that with ANY instruction because none of the execution hardware is shared.

...but the FPU isn't what did Bulldozer in, it was the reduction in the number of uOps per clock that could be accomplished by either the FPU or the integer cores. Fewer uOps per cycle means that if the bandwidth resources aren't available, full instructions could take more clock cycles to complete which could further harm performance by essentially stalling the pipeline due to these limited resources on each integer core. The net result is relatively garbage performance.

If you look at Intel, all they've been doing is beefing up their cores when it comes to uOp bandwidth.

#294

BiggieShady

Aquinus said: If you look at Intel, all they've been doing is beefing up their cores when it comes to uOp bandwidth.

Additionally, let's not forget how late AMD is in introducing a uOp cache with Zen now, almost 6 years after Intel's Sandy Bridge ... I don't know how much, but the absence of a uOp cache in Bulldozer should also contribute to lower total net uOps/cycle.

#295

FordGT90Concept

"I go fast!1!11!1!"

Aquinus said: ...but the FPU isn't what did Bulldozer in, it was the reduction in the number of uOps per clock that could be accomplished by either the FPU or the integer cores. Fewer uOps per cycle means that if the bandwidth resources aren't available, full instructions could take more clock cycles to complete which could further harm performance by essentially stalling the pipeline due to these limited resources on each integer core. The net result is relatively garbage performance.

If you look at Intel, all they've been doing is beefing up their cores when it comes to uOp bandwidth.

That's irrelevant. What is relevant is that if the FX-8350 had 8 FPUs (one to go with each integer core like a traditional core), its multithreaded FPU performance would be better because there would no longer be any chance for blocking. The lawsuit is about AMD calling it an "8 core" processor when it is an "8 integer core" processor. AMD does not make that distinction on the box or in marketing material. It has misled the public by selling 4 multithreaded cores as 8. It would be akin to Intel calling the i7-6700 an "8 core" processor. It doesn't matter that AMD shored up the symmetrical multithreading in Bulldozer and sons with extra hardware for a performance boost. It's still a quad-core when you throw heavy FPU loads at it and they sold it as an eight-core.

#296

Frick

Fishfaced Nincompoop

FordGT90Concept said: That's irrelevant. What is relevant is that if the FX-8350 had 8 FPUs (one to go with each integer core like a traditional core), its multithreaded FPU performance would be better because there would no longer be any chance for blocking. The lawsuit is about AMD calling it an "8 core" processor when it is an "8 integer core" processor. AMD does not make that distinction on the box or in marketing material. It has misled the public by selling 4 multithreaded cores as 8. It would be akin to Intel calling the i7-6700 an "8 core" processor. It doesn't matter that AMD shored up the symmetrical multithreading in Bulldozer and sons with extra hardware for a performance boost. It's still a quad-core when you throw heavy FPU loads at it and they sold it as an eight-core.

Is there a universal definition of an x86 core though? They could have handled it better, but I wouldn't say they were lying.

#297

FordGT90Concept

"I go fast!1!11!1!"

AMD pretty much established it with Athlon 64 X2 and Intel followed suit with Pentium D: two processors, one die. The only anomaly is Bulldozer and sons.

The only other modern exception, which I believe @Aquinus pointed out earlier, was SPARC processors for databases. In that case, the FPU is practically a separate core (8:1 ratio) unto itself because databases usually don't have to deal with floating-point operations. If the cores encountered floating-point work, they'd farm it out to the floating-point core and wait for a response.

#298

newtekie1

Semi-Retired Folder

Frick said: Is there a universal definition of an x86 core though? They could have handled it better, but I wouldn't say they were lying.

Simply, if it can execute all the instructions in the x86, or in this case x86_64 instruction set, then it is an x86_64 core. You don't need an FPU to execute any of the instructions in the basic x86_64 instruction set, it just helps performance greatly for some of them.

FordGT90Concept said: Intel followed suit with Pentium D: two processors, one die.

Yeah, the Pentium D wasn't two processors on one die...oh and the Core 2 Quad wasn't 4 processors on 1 die either.

#299

Aquinus

Resident Wat-man

I think when push comes to shove, the core count isn't really what people are pissed off about. This is all about the lackluster performance of these CPUs and I think that this is just a facade for that. No one ever said 8 cores had to be fast. :laugh:

#300

FordGT90Concept

"I go fast!1!11!1!"

newtekie1 said: Simply, if it can execute all the instructions in the x86, or in this case x86_64 instruction set, then it is an x86_64 core. You don't need an FPU to execute any of the instructions in the basic x86_64 instruction set, it just helps performance greatly for some of them.

Indeed, FPU instructions are generally under x87 which stems from the 8087 co-processor for the 8086. Thing is, x87 has been a standard feature for about two decades now. AMD tried to deprecate it to force developers to use the GPU for FPU tasks. It failed.

newtekie1 said: Yeah, the Pentium D wasn't two processors on one die...oh and the Core 2 Quad wasn't 4 processors on 1 die either.

Die meaning CPU socket. Yes, they were MCM'd but that's a technical detail that doesn't matter in terms of core count. Pentium D was sold as a dual core and it had two cores in two modules. Core 2 Duo was sold as a dual core and it had two cores in one module. Core 2 Quad was sold as a quad core and it has four cores in two modules. The cores (L1 instruction to L1 data) did not share any components in any of those processors.

Aquinus said: I think when push comes to shove, the core count isn't really what people are pissed off about. This is all about the lackluster performance of these CPUs and I think that this is just a facade for that. No one ever said 8 cores had to be fast. :laugh:

It does matter. When you go to Best Buy and a guy comes up to you and says this AMD has 8 cores for $200 and this Intel has 4 cores for $300, most consumers will go with AMD, not realizing that AMD only has 4 complete cores. AMD deliberately misled the public to get more sales. The people that filed this lawsuit, in hindsight, know they should have gone with Intel's quad-core for $100 more.
