AMD Dragged to Court over Core Count on "Bulldozer"

cdawall · Sep 27, 2016

FordGT90Concept said:
This is all very irrelevant anyway because a core is a core, not an integer cluster. AMD, at best, is going to settle which means they don't admit guilt to misleading the public. At worst, it will go to court, AMD will lose, and they'll likely have to pay out hundreds of millions or billions for making consumers think they got twice what they got.

You keep saying AMD will lose. How? They are not incorrect: there is no standard for what a core is, and the second core of the module sure as hell isn't a thread. So does that mean AMD can countersue Microsoft for misleading the public on what a thread is? Where does this nonsense end? It can act independently, and it can do all of the work a core can minus the FPU, which has never been a written requirement of a core.

FordGT90Concept · Sep 27, 2016

The public believes "core" fits the Athlon X2 and Intel model, which is discrete processors in one socket. Bulldozer's "cores" are not discrete. That's all the judge has to look at and decide. It's not unlike how NVIDIA sold the GTX 970 with "4 GiB of VRAM" but didn't notify the public that the last 0.5 GiB of that underperforms the rest by a huge margin. Leaving out important information, like your "cores" sharing FPUs or your memory gimping itself, opens the door for the public to seek damages.

Seagate did not countersue Microsoft for not correctly labeling hard drive capacity (using math for GiB but showing a GB label). AMD could certainly try to sue Microsoft, but where Seagate had a strong case against Microsoft (and still does), AMD really doesn't. What AMD wants to call a "core," no one else does. Microsoft would have to make an exception for Bulldozer, and how can Microsoft adequately explain to the public what is weird about Bulldozer in two words? They can't. AMD really brought this on itself by not making it clear to the public that the product is different, and it will have to pay the price for it.

There is no "minus" for a core. It either is a complete processor or it isn't.

cdawall · Sep 27, 2016

It is a complete processor. Each core inside a module can function independently of the other. They are physically present; therefore they are not "logical cores," they are "physical cores."

FordGT90Concept · Sep 27, 2016

There's a lot of hardware there that indicates it isn't two physical processors:

Pretty much everything is shared except the integer clusters. We're talking about 20% of a CPU that isn't shared. 20% a processor does not make. One core: two integer clusters and two threads.

Aquinus · Sep 27, 2016

FordGT90Concept said:
We're talking about 20% of a CPU that isn't shared. 20% a processor does not make.

That "20% addition" gives you a full core worth of performance in most cases. The only time that changes is when you're exclusively using the FPU on both "integer clusters" at the same time which is an unrealistic use case. Once again, AMD gimped overall performance per clock, but, if you consider how applications scale to pure parallel workloads, it's pretty close to linear speed up for every thread added which feels a lot like a real core. Most forms of SMT don't have those kinds of performance characteristics, hyper-threading certainly doesn't.

FordGT90Concept said:
There is no "minus" for a core. It either is a complete processor or it isn't.

This line of reasoning disturbs me. Why does this have to be dealt with as an absolute? Our definition of a core should reflect the CPU and the architecture. CPU technology is far less monolithic than it used to be and it's only going to continue to move in that direction. Either that or we should just admit that the term "core" as Ford knows it is obsolete.

FordGT90Concept · Sep 27, 2016

Let's look at it from a different perspective: failure. If an instruction fetcher fails in a Bulldozer chip, you lose two integer clusters and one FPU. If an instruction decoder fails in a quad-core Deneb or Zen, you lose one FPU and one integer cluster. Is it really separate when a single point of failure (a component that is shared) can disable both?

FR@NK · Sep 28, 2016

AMD can claim there are two cores per module but thread scaling seems to disagree:

This is an FX-8320 Piledriver 8c/8t.

Now compare an Ivy Bridge 4c/8t.

Based on what I've seen, it looks like each module was designed as one core with extra hardware for multithreading under most workloads. Somewhere along the way they decided to market each module as two cores. I think this was a mistake; Bulldozer would have looked like a much faster chip if the FX-8150 had been marketed as a 4c/8t chip. Instead we got an 8c/8t chip where performance was marginally hurt when 2 threads were scheduled to the same module, compared to spreading them out between modules before doubling them up.

FordGT90Concept · Sep 28, 2016

It does have a much straighter line because of that extra hardware but yeah, each module definitely ain't no dual core. Absolutely nothing suggests it is except AMD's marketing material.

I take it Dhrystone sees HTT and limits itself to 4 cores?

Roph · Sep 28, 2016

I'd like to see such a graph generated on a Phenom 1, as you scale up and starve it of its pathetic cache. Is that not a quad core?

FordGT90Concept · Sep 28, 2016

Since it only has four cores and no simultaneous multithreading, it would look like both of those do up to 4 and then steady at 1 beyond that. It would look exactly like Dhrystone on Ivy--any quad-core without some kind of in-core multithreading would.

cdawall · Sep 28, 2016

Aquinus said:
This line of reasoning disturbs me. Why does this have to be dealt with as an absolute? Our definition of a core should reflect the CPU and the architecture. CPU technology is far less monolithic than it used to be and it's only going to continue to move in that direction. Either that or we should just admit that the term "core" as Ford knows it is obsolete.

This is the same issue I keep seeing: why do some people assume monolithic dies are the only things that can be described as a core? These have the ability to work independently, something you cannot do with HT. These are and always will be physically there; it isn't a "logical" core.

MalakiLab · Sep 28, 2016

FR@NK said:
AMD can claim there are two cores per module but thread scaling seems to disagree:

This is an FX-8320 Piledriver 8c/8t.

Now compare an Ivy Bridge 4c/8t.

Based on what I've seen, it looks like each module was designed as one core with extra hardware for multithreading under most workloads. Somewhere along the way they decided to market each module as two cores. I think this was a mistake; Bulldozer would have looked like a much faster chip if the FX-8150 had been marketed as a 4c/8t chip. Instead we got an 8c/8t chip where performance was marginally hurt when 2 threads were scheduled to the same module, compared to spreading them out between modules before doubling them up.

It is called Amdahl's Law. It depends on a lot of things. First, on the hardware being able to handle the whole workload, as dispatching takes more and more resources. The more processors, or the more cores you throw into a processor, the more overhead there will be in managing the flow.

Second is the parallelism of the workload you feed the processor. Not everything can be parallelized infinitely. Code nowadays is not well designed to run on so many cores, and on 8-core and 12-core Intel processors you'll begin to see the exact same behaviour.

If you want to read some good text, written by Intel, you can purchase this one: https://www.computer.org/csdl/mags/so/2011/01/mso2011010023-abs.html

Leaving hyperthreading and everything else aside, beyond 4 cores the way we currently compile and program stuff doesn't take full advantage of parallelization. The graphic is also from Intel, showing how they see cores behave; it's part of the article above.

It's funny, though, seeing you try to analyse data and interpret it to suit your point of view.

MalakiLab · Sep 28, 2016

FordGT90Concept said:
It does have a much straighter line because of that extra hardware but yeah, each module definitely ain't no dual core. Absolutely nothing suggests it is except AMD's marketing material.

I take it Dhrystone sees HTT and limits itself to 4 cores?

Wrong. Dhrystone calculates MIPS. Hyperthreading doesn't boost MIPS; it squeezes instructions together so they are dealt with more efficiently, so it can speed things up by making sure no core has execution holes and everything is well coordinated to maximize usage of all the processor's components. The only way to boost instructions per second is adding more cores. And it might not be linear after 4 cores.

FordGT90Concept · Sep 28, 2016

cdawall said:
These have the ability to independently work, something you cannot do with HT.

There is hardware both clusters rely on (not independent). AMD's implementation of SMT is only about 25% faster than Intel's implementation but at significantly higher cost in terms of design and die space. An extra physical core should always represent near 100% increase in performance no matter the application because it is truly independent (both cores show this going up to 4). What AMD did with Bulldozer is enable a single core to be able to handle two threads more efficiently when two threads are in the core. There's absolutely nothing wrong with that. In fact, AMD's implementation is quite better than Intel's but that in no way means it has two distinct cores. Nothing suggests it does except the box. The hardware isn't there, the relative performance isn't there, and newer operating systems show half of what AMD claims.

MalakiLab · Sep 28, 2016

FordGT90Concept said:
There is hardware both clusters rely on (not independent). AMD's implementation of SMT is only about 25% faster than Intel's implementation but at significantly higher cost in terms of design and die space. An extra physical core should always represent near 100% increase in performance no matter the application because it is truly independent (both cores show this going up to 4). What AMD did with Bulldozer is enable a single core to be able to handle two threads more efficiently when two threads are in the core. There's absolutely nothing wrong with that. In fact, AMD's implementation is quite better than Intel's but that in no way means it has two distinct cores. Nothing suggests it does except the box. The hardware isn't there, the relative performance isn't there, and newer operating systems show half of what AMD claims.

You're wrong. If you had worked with Xeons in datacenters, medium-sized company clusters or supercomputers, you'd know that. Software and workload have a huge effect on how multicore and multiprocessor systems behave.

I'll show you yet another example, from a program I personally worked on. http://compression.ca/pbzip2/

Nothing in real life has an infinite linear curve. There's always a limit to how far an application can be parallelized. Every processor, every piece of software and every instruction shows that kind of behaviour.

It is a law, it is calculated, it is calculable. http://research.cs.wisc.edu/multifacet/amdahl/
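For anyone who wants to play with the numbers, here is a minimal Python sketch of the textbook form of Amdahl's Law, speedup = 1 / ((1 - p) + p / n). The function name and the 90% parallel fraction are assumptions picked for illustration, not measurements from any chip in this thread, and the textbook form does not model dispatch or management overhead.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Ideal speedup when `parallel_fraction` of the work scales across `workers`."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Assumed workload: 90% of the run time can be parallelized.
    for n in (1, 2, 4, 8, 16):
        print(f"{n:>2} workers -> {amdahl_speedup(0.9, n):.2f}x")
```

With a 90% parallel fraction the curve flattens quickly: no number of workers can push the speedup past 1 / (1 - 0.9) = 10x.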

Get your science right.

EDIT: You have remained blind to all the proof presented to you so far. Don't say there's no proof; everything converges as proof. If you had an ounce of honesty, I could prove you are wrong. I ran the test myself some time ago, when I worked on the ondemand governor, which makes the core clock fluctuate depending on the workload. I ran the test to be 100% sure that, with one core at 1600MHz and the other at 3900MHz, a virtual machine bound to one core of the module isn't affected by the other core running at the lower speed. Both integer and FPU work are slower on the 1600MHz core, and both are faster on the 3900MHz core. Both work independently. That holds until I throw an AVX instruction at it: then the entire FPU is clocked at the speed of the core asking for the unification until the instruction is done. You can even try it yourself with QEMU/KVM by setting core affinity for the VMs, one on the first core of the module and the other on the second. Very easy to replicate. When I don't use AVX, I have 2 cores in a module 100% of the time. But you keep putting your head in the sand and playing blind.
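As a rough illustration of the core-affinity step, here is a minimal Python sketch that pins the current process to one logical CPU on Linux. It is not the QEMU/KVM setup described in the post: `pin_to_cpu` is a hypothetical helper, and which logical CPU numbers share a Bulldozer module is system dependent and has to be checked on the machine itself.

```python
import os


def pin_to_cpu(cpu: int) -> set:
    """Restrict the calling process to a single logical CPU (Linux only)."""
    allowed = os.sched_getaffinity(0)  # CPUs this process may currently use
    if cpu not in allowed:
        raise ValueError(f"CPU {cpu} is not in the allowed set {sorted(allowed)}")
    os.sched_setaffinity(0, {cpu})     # pid 0 means the calling process
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    original = os.sched_getaffinity(0)
    first = min(original)              # assumed: any allowed CPU will do for the demo
    print("pinned to", pin_to_cpu(first))
    os.sched_setaffinity(0, original)  # restore the original mask
```

Running one pinned benchmark process per core of the same module, then one per core of different modules, would be one way to compare the two placements.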

EDIT 2: I am also surprised at how inaccurate your points of view are. It's actually the opposite: AMD has a semi-SMT, because it can't have another thread on a core in a module. It's one of the bottlenecks of the cores in the module, along with the fact that it is not ordering well. Hyperthreading is a much, much better SMT implementation, and takes a lot more space on the die. Complete opposite. As for what you seem to be talking about, you're claiming it's transparent to the OS, when it's not, at all. 95% of thread management is done by the kernel and the threads library, all software. The processor only routes them to the right core/module and then to the right thread. The processor doesn't decide which core will take which thread; the kernel does. What the processor decides is what it will do with the thread. Intel's SMT is also better because it doesn't have 2 cores to supply; the hyperthreaded one doesn't have to be on time and constantly supplied like the AMD Bulldozer's does.

Microsoft is very, very bad at handling threads, unlike Linux, because Linux was used with SPARC and other servers that had something like 8 chips with 16 cores and 128 threads. Before hyperthreading, the Windows kernel never had a proper threading library. It's normal for a SPARC to have so many threads, as it's a RISC, not a CISC. In Linux they just have to modify some parts of the kernel to make it recognize the module as a core with multiple threads. It doesn't change anything, except that it will address threads like it should. If big SMT were detrimental to the design, you can be sure a SPARC or Alpha processor would be bottlenecked like there's no tomorrow. But it's not. Pretty much everything in your logic is the opposite of what is established in computer engineering.

cdawall · Sep 29, 2016

FordGT90Concept said:
There is hardware both clusters rely on (not independent). AMD's implementation of SMT is only about 25% faster than Intel's implementation but at significantly higher cost in terms of design and die space. An extra physical core should always represent near 100% increase in performance no matter the application because it is truly independent (both cores show this going up to 4). What AMD did with Bulldozer is enable a single core to be able to handle two threads more efficiently when two threads are in the core. There's absolutely nothing wrong with that. In fact, AMD's implementation is quite better than Intel's but that in no way means it has two distinct cores. Nothing suggests it does except the box. The hardware isn't there, the relative performance isn't there, and newer operating systems show half of what AMD claims.

That's great and all until you end up being incorrect.

FordGT90Concept · Sep 29, 2016

Meh. A whole lot of jargon that simply doesn't matter when it comes to what consumers understand. AMD is going to lose.

Aquinus · Sep 29, 2016

FordGT90Concept said:
Meh. A whole lot of jargon that simply doesn't matter when it comes to what consumers understand. AMD is going to lose.

The ignorance of the consumer makes AMD wrong? Interesting. Is that seriously what you've reduced your argument to? That makes absolutely no sense. It's like saying the consumer was ignorant so AMD will be punished. That's laughable at best.

FordGT90Concept · Sep 29, 2016

It made Seagate wrong when they were class-action sued.

FR@NK · Sep 29, 2016

Aquinus said:
The ignorance of the consumer makes AMD wrong?

Who do you think is sitting in the jury box? Ignorant consumers!

FordGT90Concept said:
It made Seagate wrong when they were class-action sued.

I remember feeling very cheated when my new 60GB drive was only showing 55.87GB. I imagine it would be much worse when you realize you are missing half of your cores.

Aquinus · Sep 30, 2016

FordGT90Concept said:
It made Seagate wrong when they were class-action sued.

The high failure rate class action? How was the consumer wrong and ignorant when their drives actually did have high failure rates? The funny thing about that is that failure rate can be quantified, just as performance can be. You'll find out very quickly that performance really drops off when you're doing all floating point math, which isn't a realistic load for a normal application: there are usually a lot of integer operations mixed in with floating point ops, so even with a shared FPU, multi-core performance realistically won't suffer as much.

When people complain about Bulldozer, what is the #1 complaint? I'll give you a hint: it's not multi-core performance. The biggest complaint is single-threaded performance, so even without another task trying to utilize the FPU, performance still sucks, and that isn't because BD doesn't have "real cores." The fact that the FPU is shared is beside the point, but you seem to be incredibly intent on making it an upfront issue. The simple fact is that BD's performance blows because the number of uOps BD can execute at any given time was seriously reduced compared to K10. Since dispatch width per core is significantly reduced, it's completely possible that instructions that previously took 3 or 4 clock cycles now take 5 or 6, even for integer operations. AMD slimmed down the core; they didn't just share some parts like the FPU and dispatch/decode hardware. Beefing up each core instead would have taken more die space and reduced how many cores you could cram in for a given size. AMD's mistake was that multi-core performance didn't make up for the loss in single-threaded performance. Pair that with poor cache hit rates and pipeline stalls due to a very long pipeline and you have a recipe for disaster.

People need to stop being obsessed with reducing this problem to something as simple as "it doesn't have real cores," because the problems with Bulldozer are much greater and larger in number than merely a shared FPU. That's all everyone seems to be focused on, but honestly, if you need so much floating point bandwidth that a single SIMD unit is too slow, you should be using something optimized for massively parallel SIMD operations, like GPUs.

Let's say for a minute Bulldozer didn't have the second integer core, okay? Would you still be pissed off because performance is crap because the FPU has half of the floating point capability of both K10 and SB and later Intel CPUs? The FPU literally can do twice as much on K10 and SB+ because it's twice as wide as Bulldozer's.

So if you want to get pissed off about something, get pissed off about that because a second integer core doesn't change the fact that the FPU already is seriously under-powered, even if it wasn't shared, which will continue to plague AMD if they don't change that in Zen.

FR@NK said:
I remember feeling very cheated when my new 60GB drive was only showing 55.87GB.

You mean how you can still buy a 1TB drive and find that about 68.7GB is "missing" because people don't realize that HDD manufacturers state SI prefixed bytes and not binary prefixed bytes?

FordGT90Concept · Sep 30, 2016

http://www.bit-tech.net/news/bits/2007/10/26/seagate_lawsuit_concludes_settlement_announced/1

FR@NK · Sep 30, 2016

Aquinus said:
The high failure rate class action?

I see you aren't familiar with the Seagate lawsuit. It really changed how anything technical had to have fine print explaining instead of assuming the consumer understood.

The class wanted a 7% refund on the drives they bought, which as you can see below nearly matches the difference when referring to gigabytes. Also notice how the difference increases as hard drives get larger and use larger prefixes.
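The arithmetic behind that gap can be sketched in a few lines of Python. The function names are made up for illustration; the only inputs are the definitions 1 KB = 1000 bytes and 1 KiB = 1024 bytes, applied once per prefix step.

```python
PREFIXES = [("KB", 1), ("MB", 2), ("GB", 3), ("TB", 4)]


def shortfall_percent(power: int) -> float:
    """Percent by which 1000**power bytes falls short of 1024**power bytes."""
    return (1 - (1000 ** power) / (1024 ** power)) * 100


def reported_size(decimal_units: float, power: int) -> float:
    """Size shown by an OS that divides by 1024**power but keeps the SI label."""
    return decimal_units * (1000 ** power) / (1024 ** power)


if __name__ == "__main__":
    for label, power in PREFIXES:
        print(f"{label}: {shortfall_percent(power):.2f}% smaller than the binary unit")
    print(f"60 GB drive shows as {reported_size(60, 3):.2f}")
    print(f"1 TB drive shows as {reported_size(1000, 3):.2f}")
```

At the GB level the shortfall works out to roughly 6.9%, which is where the 7% figure comes from, and a 60 GB drive comes out at about 55.88, matching the number quoted earlier in the thread. At the TB level the gap grows to roughly 9%.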

Aquinus said:
because people don't realize that HDD manufacturers state SI prefixed bytes

Again: Who do you think is sitting in the jury box? Ignorant consumers! This is why I'm not surprised AMD is getting sued over core counts.

Aquinus · Sep 30, 2016

FordGT90Concept said:
http://www.bit-tech.net/news/bits/2007/10/26/seagate_lawsuit_concludes_settlement_announced/1

That required Seagate to explain the difference between GB and GiB, not to adopt something that would be consistent with the OS (unless you see it being advertised as something like 1TiB.) The difference is that there is nothing wrong with stating that a core is merely registers and combinational logic. If AMD has to do anything, it will be the fine print on the back of the box that explains the difference between integer cores and their relationship to the FPU.

The argument falls apart when you consider what would happen if AMD had doubled the width of the single FPU per module (not added a second one) and the impact that would have had on floating point performance. I'm willing to bet that you would instantly make up the difference, but that still doesn't fix the integer cores, which is where a lot of performance is lost. Once again, the class action makes it sound like Bulldozer sucks because it has a shared FPU when it's really because it has gimped FPUs. Sharing it was smart, slimming it down was not. A similarly clocked Intel quad core will have double the floating point performance of an "8 core" BD chip. It also happens to be the case (as I said before) that the FPU per module is half the width of the FPU on K10 and SB through at least Haswell. If BD had FPUs that were twice as wide, they would still be shared, but if you consider the clocks that BD runs at, you make up some of that difference, and floating point performance would line up more with a 6c Intel CPU instead of somewhere between a dual-core and quad-core Intel chip at the same clock.

Simply put, you could still have an FPU on every core but, if each one is half as wide as the current per-module FPU, you're still stuck with the same crappy performance because your ability to dispatch hasn't been improved. When using any streaming SIMD task with floating point data, the wider FPU at any given clock speed will always be faster than a narrower one, because half the width means twice as many cycles to do the same thing, and fewer cycles to complete a task means better IPC. So despite having twice as many FPUs, the reduced width of each unit harms overall throughput.

tl;dr: Increasing the width of the already shared FPU by double would have the same performance characteristics as doubling the number of FPUs with the current width which is reason alone to reject the "it's not 8 cores," claim based strictly on the FPU itself. Simply put, caveat emptor.
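To make the tl;dr concrete, here is a toy Python model, not a description of any real FPU. It assumes perfect scheduling, one value retired per lane per cycle, and no dispatch or memory limits; `simd_cycles` and the lane counts are hypothetical names and numbers chosen only to show the arithmetic.

```python
import math


def simd_cycles(elements: int, units: int, lanes_per_unit: int) -> int:
    """Cycles to process `elements` values if every lane retires one value per cycle."""
    if elements < 0 or units < 1 or lanes_per_unit < 1:
        raise ValueError("elements must be >= 0; units and lanes_per_unit must be >= 1")
    total_lanes = units * lanes_per_unit
    return math.ceil(elements / total_lanes)


if __name__ == "__main__":
    work = 1024  # assumed number of floating point values in a streaming task
    print("one narrow unit     :", simd_cycles(work, units=1, lanes_per_unit=4))
    print("two narrow units    :", simd_cycles(work, units=2, lanes_per_unit=4))
    print("one double-wide unit:", simd_cycles(work, units=1, lanes_per_unit=8))
```

Under these assumptions, two narrow units and one double-wide unit finish in the same number of cycles, and both take half as long as a single narrow unit. Real chips add dispatch, latency and memory effects that this sketch leaves out.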

FordGT90Concept · Sep 30, 2016

FR@NK said:
I see you aren't familiar with the Seagate lawsuit. It really changed how anything technical had to have fine print explaining instead of assuming the consumer understood.

Yup, even DVD+/-Rs have 1 GB = 1,000,000,000 bytes on the packaging. I noticed with newer hard drives, they don't even put the capacity on the HDD label. All it has is the model number which you can usually figure out capacity from (e.g. ST1000 = 1TB, ST3000 = 3 TB).

Aquinus said:
That required Seagate to explain the difference between GB and GiB, not to adopt something that would be consistent with the OS (unless you see it being advertised as something like 1TiB.)

AMD needs to explain what is a core and what is not a core because what they provide doesn't fit the mold of what people expect.
