Wednesday, January 23rd 2019

Bulldozer Core-Count Debate Comes Back to Haunt AMD

AMD launched the FX-8150 in 2011 as the "world's first 8-core desktop processor," or so it says on the literal tin. AMD reached its core count of 8 with an unconventional CPU core design. The 8 cores are arranged in four sets of two cores each, called "modules." Each core has its own independent integer unit and L1 data cache, while the two cores in a module share most of their other components: the front-end, a branch predictor, a 64 KB L1 code cache, a 2 MB L2 cache and, most importantly, an FPU. There was much debate across tech forums on what constitutes a CPU core.
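
As a rough illustration of the layout described above, the module arrangement can be modeled as a small data structure. This is only a sketch: the class names, field names and core numbering are hypothetical, and only the resources named in the article are listed.

```python
from dataclasses import dataclass

# Illustrative model only. Names and core numbering are hypothetical;
# the resource lists come straight from the article text.

@dataclass(frozen=True)
class IntegerCore:
    core_id: int
    private: tuple = ("integer unit", "L1 data cache")

@dataclass(frozen=True)
class Module:
    module_id: int
    cores: tuple
    shared: tuple = (
        "front-end",
        "branch predictor",
        "64 KB L1 code cache",
        "2 MB L2 cache",
        "FPU",
    )

def build_chip(n_modules=4):
    """Build n_modules modules with two integer cores each."""
    return [
        Module(m, (IntegerCore(2 * m), IntegerCore(2 * m + 1)))
        for m in range(n_modules)
    ]

chip = build_chip()
marketed_cores = sum(len(m.cores) for m in chip)
fpus = sum(1 for m in chip if "FPU" in m.shared)
print(marketed_cores, "integer cores,", len(chip), "modules,", fpus, "FPUs")
```

Counting integer cores gives 8, while counting modules or FPUs gives 4, which is the whole disagreement in miniature.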

Multiprocessor-aware operating systems had to be tweaked to properly address a "Bulldozer" processor. Their schedulers initially treated "Bulldozer" cores as fully independent (as conventional logic would dictate), until AMD noticed performance bottlenecks in multi-threaded applications. Eventually, Windows and various *nix kernels received scheduler updates to treat each module as a core, and each core as an SMT unit (a logical processor). The FX-8350 is a 4-core/8-thread processor in the eyes of Windows 10, for example. These updates improved the processors' performance, but not before consumers started noticing that their operating systems weren't reporting the correct core count. In 2015, a class-action lawsuit was filed against AMD for false marketing of FX-series processors. The wheels of that lawsuit are finally moving, now that a 12-member jury is being set up to examine what constitutes a CPU core, and whether an AMD FX-8000 or FX-9000 series processor can qualify as an 8-core chip.

US District Judge Haywood Gilliam of the District Court for the Northern District of California rejected AMD's claim that "a significant majority of" consumers understood what constitutes a CPU core, and that they had a fair idea of what they were buying when they bought AMD FX processors. AMD has two main options before it. The company can reach an agreement with the plaintiffs that could cost it millions of dollars in compensation, or fight it out in the jury trial by trying to prove to 12 members of the public (not necessarily from an IT background) what constitutes a CPU core and why "Bulldozer" qualifies as an 8-core chip.

The plaintiffs and the defendant each have a key technical argument. The plaintiffs could point to operating systems treating 8-core "Bulldozer" parts as 4-core/8-thread (i.e. each module as a core and each "core" as a logical processor), while AMD could run multi-threaded floating-point benchmark tests to prove that a module cannot be simplified to the definition of a core. AMD's 2017 release of the "Zen" architecture sees a return to the conventional definition of a core, with each "Zen" core being as independent as an Intel "Skylake" core. We will keep an eye on this case.
Source: The Register

369 Comments on Bulldozer Core-Count Debate Comes Back to Haunt AMD

#352
FordGT90Concept
"I go fast!1!11!1!"
mouacyk: Might be referring to this test done at AnandTech, where the conclusion is flawed. If you normalize the 7-Zip scores to the same clock speed, they are identical. The relevance to this thread is that the FX-8150 is claiming 8 cores but the 2600K is only claiming 4 cores:

Also this post, done on Linux, with more chips:
www.techpowerup.com/forums/threads/bulldozer-core-count-debate-comes-back-to-haunt-amd.251758/page-10#post-3981565

7-Zip on FX-8350 behaves like 5.6 cores, not 8.
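
For readers who want to redo that arithmetic, here is a minimal sketch of the two calculations involved: scaling a score to a common clock speed, and expressing an all-thread score in single-thread equivalents. It assumes scores scale linearly with clock speed, which is a simplification, and every number in it is hypothetical (chosen only to make the arithmetic easy to follow, not taken from any benchmark).

```python
def normalize_to_clock(score, clock_ghz, reference_ghz):
    """Scale a benchmark score linearly to a reference clock speed."""
    return score * reference_ghz / clock_ghz

def effective_cores(multi_thread_score, single_thread_score):
    """Express an all-thread score in single-thread equivalents."""
    return multi_thread_score / single_thread_score

# Hypothetical scores and clocks, not measured data.
chip_a = normalize_to_clock(score=20000, clock_ghz=4.0, reference_ghz=3.5)
chip_b = normalize_to_clock(score=17500, clock_ghz=3.5, reference_ghz=3.5)
print(chip_a, chip_b)

# Hypothetical scores: an all-thread run worth 5.6 single-thread runs.
print(effective_cores(multi_thread_score=14000, single_thread_score=2500))
```

With these made-up inputs the two chips tie once the clock difference is removed, and the second calculation shows how a figure like "5.6 cores" is derived from a ratio of scores.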
lexluthermiester: Just because a quad-core CPU came close does not settle the question of how many actual cores are at play. 7-Zip is heavily floating-point dependent. Not all programs are. In fact, at the time these CPUs were being made/released, most CPU instructions were still being done on the integer side of things, thus the design logic. It was a gamble that didn't pay off. That doesn't mean that the integer cores are not individual cores.
Independent 8 cores come in at 8. Independent quad cores come in at 4. Independent dual cores come in at 2.

Look at the 7-Zip code. It uses very, very few FPU operations. It uses lots and lots and lots of ALU/memory operations, so many that the shared fetcher gets bogged down. There are other benchmarks that show a similar bottleneck.
mouacyk: Here's an integer scaling benchmark:

Source: techreport.com/discussion/21865/a-quick-look-at-bulldozer-thread-scheduling?post=592039



Mask 55: 4 threads on 4 modules, no resource sharing
Mask 0f: 4 threads on 2 modules, with resource sharing
Exactly. Bulldozer sucks if it doesn't get a mixed workload. The scheduler is inadequate to handle two threads in real time.
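
The two masks quoted above are CPU affinity bitmasks: bit n set means logical CPU n is allowed. A short sketch of how they decode, assuming (this mapping is an assumption, not something stated in the thread) that logical CPUs 2m and 2m+1 belong to module m:

```python
def cpus_in_mask(mask):
    """Return the logical CPU indices whose bits are set in an affinity mask."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]

def modules_in_mask(mask, cpus_per_module=2):
    """Return the modules touched by a mask.

    Assumption: logical CPUs 2m and 2m+1 belong to module m.
    """
    return sorted({cpu // cpus_per_module for cpu in cpus_in_mask(mask)})

for mask in (0x55, 0x0F):
    print(f"{mask:#04x}: cpus={cpus_in_mask(mask)} modules={modules_in_mask(mask)}")
```

Under that assumption, 0x55 selects CPUs 0, 2, 4 and 6 (one per module, four modules), while 0x0f selects CPUs 0 through 3 (both halves of two modules), matching the "no resource sharing" and "with resource sharing" labels.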
Posted on Reply
#353
bb1000
Bulldozer was not a bad CPU on price-performance, but it was not good for gaming. Now it's better than ever.

The problem was that AMD was right: the way forward was more cores. But they were not the market leader.
Intel was not interested in selling cheap multi-core systems. And the software makers did not change their software to support and use multiple cores either.

Ryzen also had the same problems with Windows at the beginning, where software could not figure out how to use the processor correctly.
- But the solution to Ryzen's problems has made Bulldozer a better CPU too.


The second time around, with Ryzen doing better, AMD corrected the mistakes they made in their first many-core series (Bulldozer).
Most importantly, software makers, and here that includes game makers, have started using the cores, maybe because they are now also available in PlayStation and Xbox.

Today, when software supports multiple CPUs, the old Bulldozer is actually better than it was 6 years ago. So it's not that bad.
So AMD was right. More CPUs were the future. They just made some mistakes, with poor design choices, and the market was not ready.
Posted on Reply
#354
londiste
The problem was never that Bulldozer had too many cores and this is not what is being argued in the court case or the article.
bb1000: Ryzen also had the same problems with Windows at the beginning, where software could not figure out how to use the processor correctly.
The solution to this was to handle a Bulldozer module as a single SMT-enabled core. The problem with this solution - including why it took as long as it did - is that it went against AMD's wishes.
Posted on Reply
#355
Vya Domus
There is no right way to use this type of architecture; as a matter of fact, that was the point: to make a CPU that had fewer resources which could be used transparently by software.
Posted on Reply
#356
londiste
Vya Domus: There is no right way to use this type of architecture; as a matter of fact, that was the point: to make a CPU that had fewer resources which could be used transparently by software.
There is a right way to use this type of architecture. Due to shared scheduling resources, a module can be treated the way a core normally would be, with 2 threads being run on it thanks to SMT (CMT in this case). This quite automatically solves most of the Windows scheduling issues. Also, this was the eventual fix. Why was it not done in the first place? Because a module would show up in Task Manager as a single core. That is the main reason.

From a technical perspective, AMD really had 2 objections: two threads in a module would share L2, which could provide a speed boost (this was fairly well debunked by some site, I think it was TechReport), and AMD's power management should be able to boost higher if only one module is used (which was true, but that did not help performance enough).

Edit:
I think this was the relevant Tech Report article:
techreport.com/review/21865/a-quick-look-at-bulldozer-thread-scheduling
Posted on Reply
#357
Vya Domus
londiste: SMT (CMT in this case).
Decide which it is; the two are not equivalent. SMT is one thing, CMT is another.
londiste: This quite automatically solves most of the Windows scheduling issues.
It didn't solve anything. All that the "fix" did was make it so that Windows prioritizes scheduling threads onto separate modules first. Given how many hundreds or thousands of threads are scheduled at any given time and how dependent they can be between each other you can take a guess as to how effective this would eventually become when you start loading all cores. Ideally, you would want independent threads being scheduled first on different modules and dependent ones onto the same modules. This obviously isn't feasible in practice and even if it would be, eventually, it wouldn't make much of a difference either.

There is no right way to do it. The same argument can be had for simple multi core processors where you'd want dependent threads to be scheduled onto the same core so they can share the same L1/L2 cache. But when do you stop so that it doesn't hurt performance ? Same thing, nothing out of the ordinary here. As astonishing as it may seem all processors share resources on a certain level and face the same types of limitations.
Posted on Reply
#358
londiste
Vya Domus: Decide which it is; the two are not equivalent. SMT is one thing, CMT is another.
CMT is effectively an SMT solution, and the distinction is more marketing than technical, in addition to trying to take over an existing and different acronym. Yes, there is an additional Integer Core with 2 additional ALUs and 2 additional AGUs, but the problem is still in scheduling and shared resources, and it is the same problem that SMT has, with largely the same solution.
Vya Domus: It didn't solve anything. All that the "fix" did was make it so that Windows prioritizes scheduling threads onto separate modules first. Given how many hundreds or thousands of threads are scheduled at any given time and how dependent they can be between each other you can take a guess as to how effective this would eventually become when you start loading all cores. Ideally, you would want independent threads being scheduled first on different modules and dependent ones onto the same modules. This obviously isn't feasible in practice and even if it would be, eventually, it wouldn't make much of a difference either.

There is no right way to do it. The same argument can be had for simple multi core processors where you'd want dependent threads to be scheduled onto the same core so they can share the same L1/L2 cache. Same thing, nothing out of the ordinary here. As astonishing as it may seem all processors share resources on a certain level and face the same types of limitations.
Windows prioritizes scheduling threads into separate modules first because this results in best possible performance. As you said yourself, separating threads into dependent and independent is not feasible.
What else would there be to fix?
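
A toy sketch of the "separate modules first" placement policy being discussed. This is not the actual Windows scheduler, only an illustration of the idea; the function name is hypothetical, and it assumes 8 logical CPUs where CPUs 2m and 2m+1 share module m.

```python
def place_thread(busy, n_cpus=8, cpus_per_module=2):
    """Pick a logical CPU for a new thread, preferring idle modules.

    busy is a set of occupied logical CPU indices. Returns None when
    every CPU is taken. Assumption: CPUs 2m and 2m+1 share module m.
    """
    free = [cpu for cpu in range(n_cpus) if cpu not in busy]
    if not free:
        return None

    def busy_siblings(cpu):
        first = cpu // cpus_per_module * cpus_per_module
        return sum(1 for c in range(first, first + cpus_per_module) if c in busy)

    # min() keeps the first minimum, so ties go to the lowest CPU index.
    return min(free, key=busy_siblings)

busy = set()
order = []
for _ in range(8):
    cpu = place_thread(busy)
    busy.add(cpu)
    order.append(cpu)
print(order)
```

The first four threads each land on their own module, and only the fifth thread onward has to share one. The policy knows nothing about which threads depend on each other, which is exactly the limitation raised in the quoted post.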
Posted on Reply
#360
FordGT90Concept
"I go fast!1!11!1!"
The lawsuit is about AMD not being entirely truthful in their marketing: failing to make the distinction between a conjoined core and a traditional core clear to the customer. That raised performance expectations for consumers which AMD generally did not deliver on.

I doubt the court is even interested in getting technical.
Posted on Reply
#361
londiste
Vya Domus: No it's not. Someone even posted on here a paper describing exactly what CMT was (not by that name since it predates AMD's implementation) and how it relates to SMT and CMP.
There you go: www.eecis.udel.edu/~cavazos/cisc879-spring2008/papers/conjoining_micro04.pdf
CMP is different from CMT. Technically, CMT is a part-SMT, part-CMP solution. When it comes to operating system scheduling arrangements, it is sufficiently SMT-like.

The only difference is the separate Integer Core assigned to each thread. This results in a 70-80% boost from using both threads in a module, as opposed to 30% from Intel's HT (which is textbook SMT). This does not play a part in operating system scheduling. A Bulldozer module has a single front-end; architectural features like Integer Cores and separate schedulers do not change what the operating system and its scheduler can do.
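
To put those percentages in terms of "how many cores", here is a small worked calculation. It assumes the boost figures quoted above apply uniformly to all four modules (or all four HT cores), which is a simplification; the function name is hypothetical.

```python
def single_thread_equivalents(units, second_thread_gain):
    """Throughput, in single-thread units, with two threads on every unit."""
    return units * (1 + second_thread_gain)

# Gains are the figures quoted in the post: 70-80% for CMT, 30% for HT.
for label, gain in (("CMT, 70%", 0.70), ("CMT, 80%", 0.80), ("HT, 30%", 0.30)):
    print(label, round(single_thread_equivalents(4, gain), 1))
```

Four modules come out at roughly 6.8 to 7.2 single-thread equivalents, and four HT cores at about 5.2. Each figure is relative to that chip's own single-thread throughput, so the two are not directly comparable across architectures.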
Posted on Reply
#363
RichF
The nonsensical nature of the lawsuit continues to grow.

Firstly, it's meritless — at the very least — due to the language of the claim.

Secondly, it only applies to purchases made in California, which is utterly arbitrary.

Thirdly, it only arbitrarily applies to a subset of the consumer-grade 8 core FX chips.

According to this article.
Posted on Reply
#364
FordGT90Concept
"I go fast!1!11!1!"
Why do you think they settled? To keep fighting it could draw the ire of the FTC which hugely expands the scope of the litigation. $12.1 million is a small price to pay for sweeping this under the rug.

This case does put everyone on notice about trying to redefine what a "core" is. Bulldozer had two "integer cores" per "core" and this misrepresentation in advertising led consumers to believe they got more than they really did.
Posted on Reply
#365
Vya Domus
35$ for each 8 core chip sold in one state ? :roll:

Everything went skin deep as expected, this ended up having almost nothing to do with AMD's definition of a core and more to do with the complaints of some consumers not getting the performance they expected.

There was simply no way the plaintiffs could come up with a coherent argument against AMD's choice of architecture that would have affected all CPUs.

AMD did well to settle with this garbage.
Posted on Reply
#366
FordGT90Concept
"I go fast!1!11!1!"
It was a state suit, not federal. Federal would need plaintiffs from many states to file and a law firm approved to argue cases in federal courts. For whatever reason, they decided to only keep it in California.
Posted on Reply
#367
lexluthermiester
FordGT90Concept: It was a state suit, not federal. Federal would need plaintiffs from many states to file and a law firm approved to argue cases in federal courts. For whatever reason, they decided to only keep it in California.
California is the special-snowflake capital of the continent after all.
Posted on Reply
#368
eidairaman1
The Exiled Airman
lexluthermiester: California is the special-snowflake capital of the continent after all.
That's right.
Posted on Reply
#369
Shambles1980
Never did I, nor will I, see these chips as 8 cores.
But given that AMD went back to using real cores, now has decent CPUs again, and this lawsuit is now settled,
maybe we can get back to having AMD vs Intel, with both being really viable choices.

If nothing else comes of this, we can see that even Bulldozer didn't manage to take AMD down, and now they are back in the game.
Given how fast people forgot all the good CPUs AMD made before Bulldozer, let's hope people forget about the terrible CPUs they made in an equally timely manner.
Posted on Reply