Friday, November 6th 2015
AMD Dragged to Court over Core Count on "Bulldozer"
This had to happen eventually. AMD has been dragged to court over misrepresentation of its CPU core count in its "Bulldozer" architecture. Tony Dickey, representing himself in the U.S. District Court for the Northern District of California, accused AMD of falsely advertising the core count in its latest CPUs, and contended that because of the way they're physically structured, AMD's 8-core "Bulldozer" chips really only have four cores.
The lawsuit alleges that Bulldozer processors were designed by stripping away components from two cores and combining what was left to make a single "module." In doing so, however, the cores no longer work independently. Due to this, AMD Bulldozer cannot perform eight instructions simultaneously and independently as claimed, or the way a true 8-core CPU would. Dickey is suing for damages, including statutory and punitive damages, litigation expenses, pre- and post-judgment interest, as well as other injunctive and declaratory relief as is deemed reasonable.
Source:
LegalNewsOnline
511 Comments on AMD Dragged to Court over Core Count on "Bulldozer"
Incidentally, when I first built my system, I of course posted about it on TPU. I then couldn't resist playing a little joke on everyone by saying how pleased I was with my new "5 core" CPU and actually posted a screenshot of TM running 5 threads. :laugh: The best bit was that it actually took a little while for people to catch on, lol.
Still though, I maintain my position that if they could, they would have made 3-core CPUs.
Anyway, it's up to the judge to determine if AMD is a cheat. Not us.
Anyway, AMD might have a genuine 8-core; it's all up to interpretation. Look at this thread alone: you can make a good argument either way. None of us here can say definitively whether AMD is in breach of anything. Let's wait for the legal system to decide.
AMD will win this if the judge is clued up; if not.....
But no, they didn't screw their customers because they always price their products based on the performance those products offer compared to Intel products. Anyone hoping to get a quad core FX chip and beat a 3-5 times more expensive i5, well, why pay for an FX chip? Go and buy a cheap quad core Braswell tablet/miniPC/Stick/whatever and destroy that i5 Skylake. Right? Riiiiiiiggghhhttttt.........
I'm apparently not the only person who thinks the FPU claim for what constitutes a core is bogus.
The FPU uses SMT. I'd argue that any core that uses SMT excludes itself from equating threads to cores.
The redeeming feature for Intel is AMD's slow cache.
secondcycle, it can sometimes do several of the same instruction at once. Before I grab part of this document, I will quote it:
Source: gmplib.org/~tege/x86-timing.pdf
Let's look at Sandy Bridge for a minute:
add, sub, and, or, xor, inc, dec, neg, and not all execute in a single clock cycle, and each core can process 3 of these uOps at once. Haswell expanded that from 3 uOps per cycle on SB to 4. Even AMD's K10 was the same way, but then you look at AMD's BD1 (which is what we're all huffy about) and you notice that these same instructions can only do 2 uOps per clock cycle on Bulldozer. Then there are cases like double shift left and right, which have a fraction of the performance on BD versus modern Intel CPUs.
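To put those per-cycle figures side by side, here is a minimal sketch that computes a lower bound on the cycles needed to issue a batch of independent single-cycle ALU uOps. The widths are the ones quoted in the comment above (3 for Sandy Bridge, 4 for Haswell, 2 for Bulldozer); the function names and the batch size of 12 are made up for illustration, and the model ignores dependencies, decode limits, and cache effects.

```python
import math

# Simple-ALU uOps per cycle per core, as quoted in the comment above
# (not independently measured here).
ALU_UOPS_PER_CYCLE = {"Sandy Bridge": 3, "Haswell": 4, "Bulldozer": 2}


def min_cycles(independent_uops: int, per_cycle: int) -> int:
    """Lower bound on cycles to issue fully independent single-cycle uOps."""
    return math.ceil(independent_uops / per_cycle)


def cycles_table(independent_uops: int) -> dict:
    """Apply the lower bound to every core type in the table."""
    return {
        name: min_cycles(independent_uops, width)
        for name, width in ALU_UOPS_PER_CYCLE.items()
    }


# Hypothetical batch of 12 independent add/sub/xor-style uOps.
print(cycles_table(12))
```

Under these assumptions, each uOp still completes in one cycle on every core; only the number that can be issued per cycle differs.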
People need to get their information right. Bulldozer is slow because dedicated components are skimped on. Instructions usually take the same number of cycles as on their Intel counterparts, but in many cases they have much less throughput, so uOps have to be issued over more cycles than they otherwise would, which increases latency. Certain full instructions are also translated into a longer set of uOps on this CPU. So you might have an instruction with uOps that an Intel CPU could execute in one clock cycle but the AMD CPU might need two because it doesn't have enough resources in a single core to do it all at once.
For what it's worth, it's not that Intel cores execute instructions "faster"; it's that they can do more of them in a single clock cycle. Both AMD and Intel have a lot of core x86 instructions that not only complete in one cycle but can also execute multiple of the same uOp in the same cycle, and pipelining comes into play for the instructions that allow it.
It's also worth noting that there are x86 instructions that are not pipelined for various reasons. That's in this other document:
www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-optimization-manual.html
It does appear I was backwards...assuming no cache misses; that's the point though, isn't it? With two threads in the core, more usually gets done. The difference between HTT and Bulldozer's implementation is that Bulldozer should theoretically (assuming all else was equal) be able to do more integer operations in the same time frame. That still doesn't change the definition of a core.
I have no problem with Bulldozer's design. I have a problem with AMD calling it 8-cores (except where appropriate in some Opterons).
I find it ironic K10 beats Bulldozer on pretty much every one. The only advantage Bulldozer has over K10 is the higher clockspeeds.
The simple fact is that BD has two real cores per module. The problem is that while uOps execute just as fast, instructions that have certain combinations of uOps are going to impact AMD's BD core a lot more than one of Intel's. Even Intel has shown that they would rather beef up a core, and AMD's problem is that two lanky cores aren't going to provide the single-threaded throughput you want. If there are instructions that are taking fewer cycles to complete on Intel CPUs, that's a pretty telltale sign that it's the cores themselves. Add to that the fact that BD cores scale almost linearly on purely parallel workloads (excluding certain FP applications, but that really depends on the particular instructions being used).
Nothing here to me says they're not 8 real cores. What people are pissed off about is that they're 8 gimped cores, even for integer operations, but that's not because of shared components. If it was a real implementation of hardware SMT like hyper-threading, we wouldn't see the kind of scaling we're seeing with modules, which is near linear for purely parallel workloads. What we're seeing is 8-core CPUs where every core is something like 80% of what it should be. It scales properly and runs properly, with the exception that single-threaded performance is 20% less than where it should have been. People were expecting Phenom II-like performance in single-threaded applications plus BD performance in multi-threaded applications, which wasn't the result.
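The "eight cores at roughly 80%" picture above can be put into numbers with a small sketch. The 0.8 factor is the ballpark from the comment, not a measurement, and the function names and the idealized linear-scaling model are assumptions made purely for illustration.

```python
def single_thread_perf(per_core: float) -> float:
    """Relative single-threaded performance: one core, nothing to scale."""
    return per_core


def parallel_perf(cores: int, per_core: float) -> float:
    """Idealized throughput on a purely parallel workload (linear scaling)."""
    return cores * per_core


# A hypothetical "full" core is 1.0; the comment puts a BD core near 0.8.
BD_PER_CORE = 0.8

bd_single = single_thread_perf(BD_PER_CORE)
bd_parallel = parallel_perf(8, BD_PER_CORE)
print(f"single-threaded: {bd_single:.1f}x, 8 threads: {bd_parallel:.1f}x")
```

In this toy model the chip still scales by a factor of 8 from one thread to eight, which is the commenter's point: the per-core deficit shows up at every thread count, including single-threaded, rather than only when cores share resources.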
AMD made some choices and it resulted in focusing on more cores and less on individual core performance. As a result, people got irritated that their skinny cores couldn't bite off enough at once and wanted their fatter cores that were more efficient in single-threaded applications back (here comes Zen!)
Our disagreement isn't that Bulldozer blows, it's how it blows, and I think blaming the FPU and shared components is a bit of a stretch given the amount of information that indicates that even integer performance is trailing K10 per clock. They only try to make up for that with clock speeds, as you said. None of this has to do with whether it has 8 real cores or not; it has to do with how shitty the slimmed-down integer cores are. Mix that with the shared FPU and added latencies on FP instructions, and you have a recipe for lackluster performance. All of which can still happen even if there are 8 real cores.
Take Intel's 8c Atom, the C2750 I think it is. Its performance trails Core series CPUs at the same clock speed that have half as many cores but with SMT, so does that mean the Atom doesn't have real cores? NO! It means the Atom's core is lacking in performance despite there being 8 real cores, and it doesn't use every clock cycle as efficiently as the i5 and i7s do, just like Bulldozer.
I think we can all agree there isn't much more to be said on this topic until there is a verdict. I'll take my leave until then.