Friday, November 6th 2015
AMD Dragged to Court over Core Count on "Bulldozer"
This had to happen eventually. AMD has been dragged to court over misrepresentation of its CPU core count in its "Bulldozer" architecture. Tony Dickey, representing himself in the U.S. District Court for the Northern District of California, accused AMD of falsely advertising the core count in its latest CPUs, and contended that because of the way they're physically structured, AMD's 8-core "Bulldozer" chips really only have four cores.
The lawsuit alleges that Bulldozer processors were designed by stripping away components from two cores and combining what was left to make a single "module." In doing so, however, the cores no longer work independently. Due to this, AMD Bulldozer cannot perform eight instructions simultaneously and independently as claimed, or the way a true 8-core CPU would. Dickey is suing for damages, including statutory and punitive damages, litigation expenses, pre- and post-judgment interest, as well as other injunctive and declaratory relief as is deemed reasonable.
Source:
LegalNewsOnline
511 Comments on AMD Dragged to Court over Core Count on "Bulldozer"
Intel's HT really can't be called a core, because it can't be called so on any level, even though I've seen really weird namings of i7 CPUs with HT on the very popular German webpage Computer Universe. AMD can't just call it a quad core with 6,5 threads. It would confuse the fuck out of users. So they opted for calling cores the way they are presented to the system.
Also, look at the task manager...
It's not exactly a tightly kept secret that required rocket scientists to figure it out. 1 processor, 4 cores, 8 logical units. Difference is, those are actually cores, even though they are a different design than the one used by Intel. HT on the other hand doesn't have any kind of core appearance. It's just side logic that tricks the OS into thinking it's another core and gives the CPU the ability to stack more computation on the same physical core. It's confusing to casual users, but I wouldn't call it cheating on AMD's end...
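For reference, the arithmetic behind that Task Manager readout can be sketched in a few lines of Python. The helper `logical_units` is a hypothetical name made up for this illustration, not an OS API. `os.cpu_count()` is the real standard-library call, and it reports logical processors, not physical cores, so it cannot settle the "core" argument by itself.

```python
import os


def logical_units(packages: int, cores_per_package: int, threads_per_core: int) -> int:
    """Hypothetical helper: logical processors implied by a given topology."""
    return packages * cores_per_package * threads_per_core


# Topology as quoted above for the FX chip: 1 processor, 4 cores, 8 logical units.
fx_as_reported = logical_units(packages=1, cores_per_package=4, threads_per_core=2)

# Logical processors on whatever machine runs this (may be None if undetermined).
logical_here = os.cpu_count()

print("FX as reported:", fx_as_reported)
print("This machine:", logical_here)
```

The same count of 8 comes out whether the 8 units are described as 4 cores with 2 threads each or as 8 cores with 1 thread each, which is why the logical count alone does not say what a "core" is.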
Microsoft would call them cores if they fit the definition of a core.
In the case of Bulldozer, the L2 cache is shared between the FPU and the two integer clusters. It is not shared with another core so, as the diagram shows, it is correct. One Bulldozer core (containing two integer clusters) includes the L2 cache.
In the case of Core 2 Duo, the L2 cache is shared between two cores so the L2 cache is not part of either core. The two discrete cores (purple background) packaged together with the L2 cache form a module (green square):
Core 2 Quad was created by combining two dual-core modules producing a multi-chip module (MCM) quad-core CPU:
L3 was added because of the massive performance drop between L2 and RAM. Some processors are getting an L4 cache because of the massive performance drop between L3 and RAM.
www.anandtech.com/show/6993/intel-iris-pro-5200-graphics-review-core-i74950hq-tested/3
It does not act as a frame buffer for the Iris Pro. Intel hinted that a separate 16-32 MiB ESRAM could be used exclusively for Iris Pro's frame buffer in the future. Skylake-H will likely be getting the same Crystalwell L4 cache as Broadwell. We could see the same Crystalwell cache spring up on even more chips in the future (Kaby Lake, maybe even Cannonlake).
Even in Excavator, the prefetch and FPUs are still shared. There's going to be a performance hit from them too. A legitimate dual-core doesn't share those things, as demonstrated by the Core 2 Duo and Phenom II block diagrams.
I did some more digging on Core 2 Duo and it appears that neither core can be disabled. Conroe-L (single-core) appears to be a different chip altogether. This makes Core 2 Duo a true module because it has two of everything except L2 and control which makes them inseparable. Bulldozer is not a module because it doesn't have two of everything--it has one of some things. This is why FX-8350 should be considered a quad-core. What was previously understood as a module (complete but inseparable cores) is absent (needs two prefetchers at minimum).
Yes I got all updates plus core unparker tool. The FX8350 does more than I could imagine.
All new Atoms have modules, with 2 cores in it, sharing the same L2 cache. Are they liars too?
The graphic you made is also completely wrong. It shows that you don't understand OoO, PRF, branch prediction, resource monitoring. www.anandtech.com/show/3922/intels-sandy-bridge-architecture-exposed/3
In short, you don't understand how their microarchitecture works. 95% of the time, the module will work just the same as 2 cores, because both can share the resource at the SAME TIME. In most circumstances, it will use both integer cores and each one will have a 128-bit FMAC with 128-bit integer execution. So they can simultaneously execute most of the instructions independently without having to wait for their turn like with hyperthreading. Totally different microarchitecture. Things begin to degrade when both floating point pipelines have to get together for a single integer core, to execute a single 256-bit AVX instruction, or two symmetrical SSE instructions. Then the entire FPU is taken, which leaves no resources for the other integer core. In theory the dispatch controller should give the other integer core some instructions that don't need any FPU interaction, by looking in the instruction fetch buffer, and keep it busy while the first one completes its cycles needing all of the FPU. On paper it looks awesome, but it's a very, very complex operation, sadly not bringing much success. Luckily, those instructions are not used very often. Still, it's a major problem AMD tried to improve in Piledriver, Steamroller and finally Excavator. It was their way to deal with new instructions too, and stay in competition.
It's a good technology, but a little too audacious for today's market. Instead of focusing on having better IPC, they mostly developed ways to better dispatch the instructions. That's why they decided to come back to more traditional microarchitectures and be more competitive IPC-wise. It doesn't change the fact that a module behaves like 2 cores and is in fact 2 cores in a single module. Even Intel agrees with that and is using modules for its Atoms. Maybe we should drag them to court too, no?
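The FPU contention described above can be illustrated with a toy model in Python. This is only a sketch under loud assumptions: one module has two 128-bit FMAC pipes, each integer core issues at most one FP op per cycle, a 256-bit op needs both pipes in the same cycle, and priority alternates between the cores. The function name `cycles_to_drain` is made up for this example, integer work is not modeled, and none of this is cycle-accurate for real Bulldozer hardware.

```python
from collections import deque

PIPES = 2  # assumption: two 128-bit FMAC pipes per module, as described above


def cycles_to_drain(queue_a, queue_b):
    """Toy model: cycles needed for two cores to finish their FP op queues.

    Each queue is a list of op widths in bits (128 or 256).
    """
    a, b = deque(queue_a), deque(queue_b)
    cycles = 0
    first = 0  # which core gets first pick this cycle (alternates)
    while a or b:
        free = PIPES
        order = (a, b) if first == 0 else (b, a)
        for q in order:
            if q:
                need = q[0] // 128  # 128-bit op -> 1 pipe, 256-bit op -> 2 pipes
                if need <= free:
                    q.popleft()
                    free -= need
        first ^= 1
        cycles += 1
    return cycles


both_128 = cycles_to_drain([128] * 4, [128] * 4)  # cores never block each other
both_256 = cycles_to_drain([256] * 4, [256] * 4)  # cores take turns on the FPU
print(both_128, both_256)
```

In this model two cores running 128-bit ops finish in the same time as one core would alone, while two cores running 256-bit ops take twice as long, which is the degradation case the comment describes.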
A "core" only requires data + instruction cache. Additional caches are added for boosting performance (decreasing the gaps in latency between core and system RAM).
up to 32k = L1
up to 256k = L2
up to 4M = L3
up to 64M = L4 eDRAM in 4950HQ, system RAM otherwise.
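The rule of thumb in the list above can be written as a small lookup in Python. The thresholds are taken as-is from the list and are the poster's rough tiers, not a spec for any particular CPU; the names `LIMITS`, `NAMES` and `tier_for` are invented for this sketch.

```python
from bisect import bisect_left

KIB = 1024
MIB = 1024 * KIB

# Upper bounds (inclusive) for each tier, copied from the list above.
LIMITS = [32 * KIB, 256 * KIB, 4 * MIB, 64 * MIB]
NAMES = ["L1", "L2", "L3", "L4"]


def tier_for(size_bytes: int) -> str:
    """Return the smallest tier whose upper bound covers a working set of this size."""
    i = bisect_left(LIMITS, size_bytes)
    return NAMES[i] if i < len(NAMES) else "system RAM"


for size in (16 * KIB, 200 * KIB, 2 * MIB, 48 * MIB, 512 * MIB):
    print(size, "->", tier_for(size))
```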
As I specified above, if a quad-core processor has 4 L2 caches, then those L2 caches are part of the core because it is not a shared resource. If the resource is shared (as is the case with Silvermont) then the resource doesn't belong to a core--it's part of the CPU package (like L3, QPI, HyperTransport, memory controller, etc. usually are). This blocking situation is never encountered on Silvermont nor Core 2 Duo. If a blocking situation is possible, I'd argue (and have argued) the whole of it is a multithreaded core, not multi-core.
A core can take an instruction and execute the whole of it without sharing any parts with any other processor. Bulldozer and sons, when executing a floating point unit task, do not fit that definition. Silvermont will happily execute eight 256-bit AVX instructions simultaneously across all cores, unlike an FX-8350. It'll do that with ANY instruction because none of the execution hardware is shared.
If you look at Intel, all they've been doing is beefing up their cores when it comes to uOp bandwidth.
The only other modern exception, which I believe @Aquinus pointed out earlier, was SPARC processors for databases. In that case, the FPU is practically a separate core (8:1 ratio) unto itself because databases usually don't have to deal with floating-point operations. If the cores encountered floating-point work, they'd farm it out to the floating-point core and wait for a response.