Friday, November 6th 2015
AMD Dragged to Court over Core Count on "Bulldozer"
This had to happen eventually. AMD has been dragged to court over misrepresentation of its CPU core count in its "Bulldozer" architecture. Tony Dickey, representing himself in the U.S. District Court for the Northern District of California, accused AMD of falsely advertising the core count in its latest CPUs, and contended that because of the way they're physically structured, AMD's 8-core "Bulldozer" chips really only have four cores.
The lawsuit alleges that Bulldozer processors were designed by stripping away components from two cores and combining what was left to make a single "module." In doing so, however, the cores no longer work independently. Due to this, AMD Bulldozer cannot perform eight instructions simultaneously and independently as claimed, or the way a true 8-core CPU would. Dickey is suing for damages, including statutory and punitive damages, litigation expenses, pre- and post-judgment interest, as well as other injunctive and declaratory relief as is deemed reasonable.
Source: LegalNewsOnline
511 Comments on AMD Dragged to Court over Core Count on "Bulldozer"
If AMD called it a 4-core with physical hyper threading this lawsuit would have been avoided, but somebody could still turn around and sue saying AMD has an "unfair" performance monopoly by deliberately under-spec'ing their processors to outperform the competition. Not saying they would win, but a lawsuit could still be filed.
Btw on a different note here's a couple of benchmarks if anyone is curious....
Check out the latency on bottom screenshot....
Phenom II @4.0 (Singlethreaded)
Phenom II @4.4 (Single)
Vishera @3.5 (stock) (Single)
Vishera @5.0 (Singlethreaded)
Vishera @4.7 (Multithreaded)
(No bandwidth, sorry; PC kept locking up with the full test.)
This isn't at all like with AMD. When Intel disabled the FPU to make a 486SX it didn't then market the chip as FPU capable, hence there's nothing to sue them over. In fact they actually marketed an FPU coprocessor to go with it to restore the missing function.
However, Bulldozer doesn't have 8 discrete cores, but rather 4 siamesed ones that share resources and have lower performance as a result. Completely different scenario so no wonder they're getting sued.
AFAIK, the 486 line didn't offer a separate FPU; there was no way to restore the lost function of the 486SX. It was the 386 line that had a separate FPU available. They made something called an i487SX, but it was really a full-blown 486DX; when installed, it would disable the 486SX completely and take over all CPU operations.
If sharing resources results in one core, then all of Intel's current desktop processors are single-core processors...
What a different company calls it doesn't really matter here. Remember when Microsoft used to call single-core Pentium 4 processors with hyperthreading "two processors"? I do. So what Microsoft says doesn't really matter.
Also, there is other software, ones that are far more geared towards dealing with processor specs, that say they are 8-cores. CPU-Z says 8-Cores and 8-Threads. Microsoft's software can't even read the clock speed properly half the time, so I'd say we should listen to CPU-Z over what Windows says.
...but we've already been over all that, haven't we?
Bulldozer did the same thing with Vista. Vista (I believe 7 too) called it eight cores because it was incapable of distinguishing them, but that apparently caused problems because updates were released to fix core parking issues. Come Windows 8 and newer, Microsoft updated the operating system to definitively account for sockets, cores, and logical processors, which is where we see 4 cores and 8 logical processors. CPU-Z doesn't need to schedule threads. Windows does. Microsoft did what they did deliberately so the scheduler best utilizes the processor resources. Caches have always been tiered. The closer the tier is to the ALUs and FPUs, the faster it is. Caches completely lack logic and there are numerous advantages, and virtually no disadvantages, to sharing caches (the scheduler will allot the cache evenly when the load is even).
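To make the sockets / cores / logical processors accounting concrete, here is a minimal Python sketch. It is not how Windows or CPU-Z actually enumerates anything; the function name `summarize_topology` and the input layout (one `(socket_id, core_id)` pair per logical processor) are assumptions made up for illustration. It only shows that the same eight logical processors come out as 4 cores or 8 cores depending on whether a module is reported as one core or two.

```python
from collections import defaultdict

def summarize_topology(logical_cpus):
    """Count sockets, cores and logical processors.

    logical_cpus: iterable of (socket_id, core_id) pairs, one pair per
    logical processor. This input format is hypothetical.
    """
    per_core = defaultdict(int)
    for socket_id, core_id in logical_cpus:
        per_core[(socket_id, core_id)] += 1
    sockets = {socket_id for socket_id, _ in per_core}
    return {
        "sockets": len(sockets),
        "cores": len(per_core),
        "logical_processors": sum(per_core.values()),
    }

# Hypothetical layout A: each module reported as one core with two logical processors.
module_as_core = [(0, i // 2) for i in range(8)]
# Hypothetical layout B: each integer cluster reported as its own core.
cluster_as_core = [(0, i) for i in range(8)]

print(summarize_topology(module_as_core))
print(summarize_topology(cluster_as_core))
```

Same silicon, same eight schedulable units; the only thing that changes between the two calls is what the reporting side decides to label a "core."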
There's only a handful of FPUs shared in the computing world outside of Bulldozer (and derivatives) and all of them are set up in a way that resembles a co-processor. That is, it has its own scheduler and all of the cores can queue work to it--effectively its own core. They don't market it as having an extra core though because that would be misleading.
Either way, L2 cache's story is very similar to the FPU. It was separate, it then was integrated, it then was shared between two cores. That doesn't make the two cores count as one.
If we are going to let Intel get away with sharing resources and still calling them separate cores, then we have to allow AMD.
Microsoft doesn't deal with clock speeds, they deal with processor states. The clock speed data they do provide is only as a convenience. That said, it appears accurate to me in Windows 10. It's pretty obvious Microsoft put a lot of effort into understanding the processor in more recent versions of Windows (probably because of their work with ARM).
Yes, but it would also cost a lot more, as well as consuming more power and producing more heat.
Let's use the test of disabling cores. Where L2 is shared, can you disable half of the cores above it and still have the processor function perfectly normally? With L2 (and L3, and L4, and so on), the definitive answer is "yes." Does Bulldozer pass the same test? The definitive answer is "no." The former consists of legitimate cores while the latter does not.
People shouldn't be bitching about if they're "real" cores or not. They should be questioning why the integer cores suck in the first place. It's not because of shared components, it's because each core is actually gimped. I posted this earlier but maybe people have short memories. Explain to me why BD can only process practically half as many instructions per clock versus Haswell? That alone will contribute to cruddy performance, you don't even have to look further than the integer cores to figure out that one.
People are blaming one thing, when they should be blaming another. Most operations in a CPU are going to be integer operations. While floating point math is used often, it's not used as often as the integer ALU in most circumstances which is why AMD shared it in the first place. What AMD screwed up is gimping the integer cores.
For those with a short memory or the inability to go back a page or two: Simply put, Bulldozer didn't suck because of a shared FPU, it sucks because they gimped the integer cores worse than on K10 (per clock).
I have an 8-core processor and it's definitely better at multi-threaded stuff than others. Plus it cost a fraction of what Intel has to offer.
All in all, I think we can say that Bulldozer sucked because of the length of the pipeline and its reduced ability to execute certain uOps in parallel. The long pipeline introduces stalls and increases the amount of work lost when a stall occurs. Not being able to process as many uOps per cycle could very well mean that certain x86 instructions might require more clock cycles to complete on BD than on K10 or on an Intel CPU. None of which has to do with the FPU.
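A back-of-the-envelope Python sketch of that argument, using the usual textbook-style approximation that cycles per instruction is roughly the ideal issue cost plus the average stall cost. The function name `effective_cpi` and every number below are placeholders picked for illustration; none of them are measured Bulldozer, K10 or Intel figures.

```python
def effective_cpi(issue_width, mispredict_rate, flush_penalty_cycles):
    """Rough cycles-per-instruction estimate.

    issue_width: instructions the core can ideally complete per cycle.
    mispredict_rate: pipeline flushes per instruction (e.g. 0.01 = 1 in 100).
    flush_penalty_cycles: cycles lost per flush; grows with pipeline depth.
    """
    base_cpi = 1.0 / issue_width
    stall_cpi = mispredict_rate * flush_penalty_cycles
    return base_cpi + stall_cpi

# Placeholder values only, chosen to contrast a wide/short design
# with a narrow/long one.
wide_short = effective_cpi(issue_width=4, mispredict_rate=0.01, flush_penalty_cycles=14)
narrow_long = effective_cpi(issue_width=2, mispredict_rate=0.01, flush_penalty_cycles=20)

print(f"wide, shorter pipeline:  {wide_short:.2f} cycles per instruction")
print(f"narrow, longer pipeline: {narrow_long:.2f} cycles per instruction")
```

Both terms get worse for the narrow, deep design, and neither term involves the FPU at all, which is the point being made above.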
I won't deny that the CPU's FP performance is lower than it would be with 8 dedicated FPUs, but the question is whether the CPU would suck less if it had them, and I don't think that's the case. There are a lot of things wrong with the architecture and the shared FPU isn't even among the biggest issues in my opinion. AMD went with slimmed-down cores in order to add more of them, which was a fatal mistake.
And Intel beats AMD with fewer cores because Intel's processors are way more powerful than AMD's. So even with 8 cores AMD can't top Intel. AMD's 6-core Phenoms couldn't beat Intel's 4 cores either; Intel's cores are just a lot, lot faster.
My argument really boils down to two pipelines = two cores. 1 decode unit that can do 4 uOps or 2 decode units that can do 2 uOps each doesn't make a difference to me, it's doing the same thing.
Intel's HT is simply filling the gaps in the pipeline when the first thread isn't utilizing the entire thing in order to run a second thread, that's it. It's also why scaling is task dependent with HT. Scaling on FX CPUs, however, once again tends to be almost linear, which isn't indicative of SMT-like behavior on a single core.
It should be able to do one basic ALU operation per clock, per thread. More complex operations will cause one thread to be blocked so it would fall to one ALU operation per clock across two threads.
And I should stress on that image that the whole thing is a "core," not just the integer portion. You can't have a discrete x86 "core" without an x86 decoder.
Note that SMT increases latency which is why the performance gains are not very good.
Edit: www.overclock.net/t/1469255/fx-8350-trying-to-get-best-performance-per-core Trying to find better benchmarks but I'm not coming up with them.
With that said, most modern super-scalar CPUs already handle instruction-level parallelism internally when instructions are decoded.
Side note, back in the day on older x86 processors, there were a lot fewer bells and whistles and the core of an x86 EU was the part you said isn't a core. ;)
AMD says adding the extra "integer cluster" adds 12% to the die space. Intel has said that adding Hyper-Threading Technology adds 5% to its die space. The former begets more performance (in theory) because there are more dedicated transistors. How does 12% constitute a complete core when it is lacking the capability to prefetch and decode x86 instructions? It is a component of the core (what AMD calls a "module") and not a core unto itself.