
Bulldozer Core-Count Debate Comes Back to Haunt AMD

The problem is that AMD made a design that was just hyperthreading with an extra integer unit thrown in. So technically, it is more capable than just having two execution states for one execution engine (containing an FPU and IU), since two threads can technically run at the same time, but if both threads need the FPU, then it is basically back to a multithreaded single core.

I think they may have a case in that early processors didn't even have an FPU, and were still processors. So it is technically an eight core integer CPU, and a four core floating point CPU. So you technically have eight cores that could run at once. It will depend on how much a modern processor design matters, as AMD didn't include any caveats in their marketing for the FX series to indicate that it wasn't always going to run 8 threads simultaneously. And in my personal opinion it was not correct to call them 8 core processors. They should have just called them 4 module processors, or made clear it was 8 integer cores and would perform at half speed for FP math. I wouldn't be surprised if AMD loses. I don't consider them 8 core processors since they aren't 8 core all the time.

Maybe the most damning thing is that AMD now sells 8 core processors that actually do have the FPU and IU per core, and 4 core processors with SMT that are 4 core/8 thread. In a way, that's admitting that the FX core terminology was a load of shit the whole time.
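To make the "eight integer cores, four floating point cores" point concrete, here is a rough toy model in Python. It is only a sketch of the argument above, not of how the hardware schedules work: the function name `max_concurrent_threads` is made up for illustration, and it assumes that an FP-heavy thread needs a module's FPU entirely to itself, which is a deliberate simplification.

```python
def max_concurrent_threads(modules: int, fp_heavy: bool) -> int:
    """Toy estimate of how many threads can run without sharing an execution unit.

    Assumption (hypothetical simplification): every module has two integer
    cores and one FPU, and an FP-heavy thread occupies the whole FPU.
    """
    if modules < 0:
        raise ValueError("modules must be non-negative")
    integer_cores_per_module = 2
    fpus_per_module = 1
    # Integer work is limited by integer cores, FP-heavy work by FPUs.
    per_module = fpus_per_module if fp_heavy else integer_cores_per_module
    return modules * per_module
```

Under these assumptions a 4-module FX chip gives 8 for integer work and 4 for FP-heavy work, which is exactly the gap the marketing never mentioned.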


The code we ran was optimized with AMD's help. We were able to get the FPUs to operate as 2 in parallel. The problem is, most programmers don't wanna do the extra few steps or just use an Intel based compiler which views the FX series as a 4-core CPU and doesn't take advantage of any of its extra resources. The FPU in each Bulldozer module is technically 2x 64bit units but merged into using just 1 scheduler. You had to know how to code for Bulldozer to get the scheduler to run things that could utilize both.
 
Pretty sure you can't.

You seem to be making a lot of assumptions here without checking first; cores can be turned off. I presume you understand what independent means, so I ask you again: how can you do this without breaking the functionality of the module?

Untitled.png
 
Disabling in no way shows that the disabled core was independent. To show that you would need to do this the other way around - disable everything else in the module and see what that core is capable of :D

Edit:
By the way did you notice how core disabling works on your screenshot - CPU core 1-2/CPU core 3-4/CPU core 5-6/CPU core 7-8. This strongly suggests that these are indeed not independent cores.
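The pairing in that BIOS listing can be written down as a tiny mapping. This is just a sketch assuming the 1-based numbering from the screenshot (cores 1-2 in the first module, 3-4 in the second, and so on); `module_of` and `sibling_of` are names made up here, not anything from AMD or the BIOS.

```python
def module_of(core: int) -> int:
    """Return the 1-based module index for a 1-based core number.

    Assumes cores are paired as 1-2, 3-4, 5-6, 7-8, as in the BIOS listing.
    """
    if core < 1:
        raise ValueError("core numbers start at 1")
    return (core - 1) // 2 + 1


def sibling_of(core: int) -> int:
    """Return the other core that sits in the same module."""
    if core < 1:
        raise ValueError("core numbers start at 1")
    # Odd cores pair with the next core, even cores with the previous one.
    return core + 1 if core % 2 == 1 else core - 1
```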
 
Wow this has been argued to death but here is my 2 cents.

It's technically an 8-core IMO and thus AMD marketing was correct. If you look at a die shot, within each module you will see duplicated logic blocks for all 8 cores; in a 4-core/8-thread SMT design you will not see this because the two threads share existing hardware. The processor has 8 integer cores, and they are cores, especially with the Steamroller and Excavator chips as they have their own dedicated instruction decoders too (BD and PD had to share a decoder IIRC). The floating point unit is capable of running SMT with two threads and has 4x FPU pipes so it can juggle two threads (and that's where it gets the 2 threads per module). I think I read somewhere that AMD stated a lot of work on desktop CPUs is integer based, so they opted for this design. At the end of the day, it is an 8-core chip, but it shares an FPU block between groups of 2 cores.

Stupid case to try and get a bit of money, when AMD really needs each penny it can get. IMO.
 
This strongly suggests that these are indeed not independent cores.

Nope, it suggests some resources are shared between each pair. And that in turn does not negate the independence of each core. Just like how the L3 cache is shared among every CPU core in some processors.
 
If you look at die shot within each module you will see duplicated logic blocks for all 8 cores, in 4/8 you will not see this because the two threads share existing hardware.
Are you sure about that?
Let's say this is an image of a CPU module. Would you consider the parts in rectangles separate cores? ;)
core.jpg


Nope, it suggests some resources are shared between each pair. And that in turn does not negate the independence of each core. Just like how the L3 cache is shared among every CPU core in some processors.
Agreed. But would you like to guess (or list) what the shared resources are? Which resources would you consider defines an independent core?
 
"12 members of the public (not necessarily from an IT background)"

It is stupid to have non-IT people in the jury in a case like this.
 
The processor has 8 Integer cores, and they are cores, especially with the Steamroller and Excavator chips as they have their own dedicated instruction decoders too. (BD and PD had to share a decoder IIRC). ... At the end of the day, it is an 8-core chip, but it shares an FPU block between groups of 2 cores.
Starting with Steamroller, AMD actually did split the decoder. Architecture details for these are pretty hard to come by though (Anandtech's article has some details).
At the same time, there is a lot more shared in a module than just the FPU block: fetch, instruction cache, L2 cache and probably some more stuff. In Bulldozer/Piledriver, the decoder as well.

Actually, the same question really comes up again - What would you consider defines a core?
 
Are you sure about that?
Let's say this is an image of a CPU module. Would you consider the parts in rectangles separate cores? ;)
View attachment 115029

Agreed. But would you like to guess (or list) what the shared resources are? Which resources would you consider defines an independent core?
No, I'm pretty sure they are the SIMD units. Bulldozer integer cores do not have SIMD units (obviously); they are in the FPU. This really is semantics at this point.


b33tNAp.jpg


I can see what you're saying honestly. I think AMD was a bit misleading with what they called a "core" especially when the marketshare was highest with Intel and they were using a "traditional" monolithic core design. But the point is: there's a lot more hardware duplicated in a BD module for each "core" than a 2-way SMT design such as the one you posted. I think a compromise is in order; It's a "4-module chip with 8 clustered integer cores" :) But I'm not sure where this leaves AMD and the case. I guess it will depend on a bunch of non techies looking at die shots and block diagrams lol
 
Yep, would you like to guess what the shared resources are? Which resources would you consider defines an independent core?

The dependencies that exist within each pair are irrelevant , each instruction assigned to each hardware thread is fetched, decoded, executed and stored back. The fact that there is still scaling in terms of throughput way past 4x and single thread performance is never affected clearly shows that there are enough resources such that instructions can be independently executed. Which is not the case for example with SMT capable cores, where despite the fact that there are two hardware threads available the lack of resources causes both pipelines to become interlocked to the point where you can even see a regression in performance. That never happens with Bulldozer.
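For anyone who wants to check the "scaling way past 4x" claim on their own chip, the arithmetic is simple enough to sketch. This assumes you have already measured throughput (any unit, e.g. items per second) at 1 thread and at N threads with your own benchmark; the helper names `scaling_factor` and `scales_past` are hypothetical and no measured numbers are implied here.

```python
from typing import Mapping


def scaling_factor(throughput_by_threads: Mapping[int, float], threads: int) -> float:
    """Throughput at `threads` threads relative to the single-thread throughput.

    `throughput_by_threads` maps a thread count to a throughput you measured
    yourself; it must contain an entry for 1 thread.
    """
    base = throughput_by_threads[1]
    if base <= 0:
        raise ValueError("single-thread throughput must be positive")
    return throughput_by_threads[threads] / base


def scales_past(throughput_by_threads: Mapping[int, float], threads: int, factor: float) -> bool:
    """True if the measured scaling at `threads` threads exceeds `factor`."""
    return scaling_factor(throughput_by_threads, threads) > factor
```

On a 4-module chip, `scales_past(results, 8, 4.0)` coming back true for a given workload would support the argument above for that workload; coming back false (for example on FP-heavy code) would support the other side.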
 
The dependencies that exist within each pair are irrelevant , each instruction assigned to each hardware thread is fetched, decoded, executed and stored back. The fact that there is still scaling in terms of throughput way past 4x and single thread performance is never affected clearly shows that there are enough resources such that instructions can be independently executed. Which is not the case for example with SMT capable cores, where despite the fact that there are two hardware threads available the lack of resources causes both pipelines to become interlocked to the point where you can even see a regression in performance. That never happens with Bulldozer.

And yet you cannot disable 1 core at a time, only 1 module at a time, showing that 1 of the "cores" cannot function without the other.
 
and yet you cannot disable 1 core at a time only 1 module at a time.

I have even shown the option in the BIOS to shut down cores within the module and not the module itself. At this point I am going to assume that there isn't just a lack of understanding on this forum but also an unwillingness to do so.
 
No, i'm pretty sure they are the SIMD units. Bulldozer Integer cores do not have SIMD units (obviously), they are in the FPU. This really is semantics at this point.
You were talking about duplicated module blocks on a die shot ;)
That is a Skylake core and you are right, these are SIMD units. Although INT, FP or Vector was a bit secondary here, technically these should be the execution units behind ports 0 and 1.
The dependencies that exist within each pair are irrelevant , each instruction assigned to each hardware thread is fetched, decoded, executed and stored back.
This is where you are wrong. Dependencies are not irrelevant.
But OK, you said instructions are fetched, decoded, executed and stored back. At least the fetch and store parts very clearly happen at the module level, and in Bulldozer and Piledriver so does decode. From Steamroller on, decode is (partially) done in separate units. The only stage of processing with separate hardware is execution.
The fact that there is still scaling in terms of throughput way past 4x and single thread performance is never affected clearly shows that there are enough resources such that instructions can be independently executed. Which is not the case for example with SMT capable cores, where despite the fact that there are two hardware threads available the lack of resources causes both pipelines to become interlocked to the point where you can even see a regression in performance. That never happens with Bulldozer.
Available resources are very conditional, and they do not really require execution units to be separated the way they are in Bulldozer. Execution units eventually boil down to several pipes that you force instructions and data down to compute stuff. To look at what resources are or could be available for compute, you need to look inside the execution units. How they are organized has a lot to do with scheduling and managing things but little to do with the actual compute.
I have even shown the option in the BIOS to shut down cores within the module and not the module itself. At this point I am going to assume that there isn't just a lack of understanding on this forum but also an unwillingness to do so.
You showed an option 'One Core Per Compute Unit'. Right after that are the options to disable... modules, 2 cores at a time. There is no option to shut down one core in a module. This combination makes the setting you showed simply a type of SMT, in Bulldozer's case officially CMT.
 
At least fetch and stored part are very clearly happening at the module level.

It simply doesn't matter if it's singular hardware block or not. Look how many entries the decode stage has.

AMD_Bulldozer_block_diagram_%28CPU_core_bloack%29.PNG

You showed an option 'One Core Per Compute Unit'. Right after that are the options to disable... modules, 2 cores at a time. There is no option to shut down one core in a module.

You have some serious reading issues.
 
I have even shown the option in the BIOS to shut down cores within the module and not the module itself. At this point I am going to assume that there isn't just a lack of understanding on this forum but also an unwillingness to do so.

My motherboard allows cores to be turned off as well.
 
You have some serious reading issues.
Really, the BIOS options are irrelevant. The setting you showed shuts down 4 "cores" not one. Depending on BIOS you may be able to shut down one "core" in a module. What you are shutting down is not a core, it is an Integer Execution Unit.
 
What you are shutting down is not a core, it is an Integer Execution Unit.

You're making me laugh, making stuff up now ? How do you know it does that ?
 
You're making me laugh, making stuff up now ? How do you know it does that ?
Because if it shut down the rest of the units the other core would no longer work.
For obvious reasons there is no BIOS setting to do that.
 
Really, the BIOS options are irrelevant. The setting you showed shuts down 4 "cores" not one. Depending on BIOS you may be able to shut down one "core" in a module. What you are shutting down is not a core, it is an Integer Execution Unit.

Do you have amd white papers of that happening?
 
Do you have amd white papers?
Look at the Bulldozer block diagram @Vya Domus posted a few posts up. It has been said in this thread repeatedly that AMD did provide fairly nice details about the Bulldozer Architecture.
That block diagram is a single module. The moment you shut down any parts that are not duplicated it will no longer work. The duplicated parts are the two Integer Clusters.
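To spell out what I mean, here is a toy model of a module in Python. It is only a sketch of my argument, not of the actual silicon: the block names are hypothetical labels, the shared list follows what was mentioned earlier in the thread (fetch, instruction cache, decode for Bulldozer/Piledriver, FPU, L2 cache), and the rule "disable any shared block and nothing works" is the assumption being illustrated.

```python
from typing import AbstractSet

# Hypothetical labels for the blocks of one Bulldozer/Piledriver module.
SHARED_BLOCKS = frozenset({"fetch", "instruction_cache", "decode", "fpu", "l2_cache"})
INTEGER_CLUSTERS = frozenset({"int_cluster_0", "int_cluster_1"})


def usable_integer_clusters(disabled: AbstractSet[str]) -> int:
    """Count the integer clusters that can still run code in this toy model.

    Assumption: a cluster needs every shared block of its module, so
    disabling any shared block leaves zero usable clusters.
    """
    disabled = frozenset(disabled)
    unknown = disabled - SHARED_BLOCKS - INTEGER_CLUSTERS
    if unknown:
        raise ValueError(f"unknown blocks: {sorted(unknown)}")
    if disabled & SHARED_BLOCKS:
        return 0
    return len(INTEGER_CLUSTERS - disabled)
```

In this model, disabling one integer cluster leaves the other one running, while disabling anything else takes both "cores" down with it.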
 
Ever heard of power gating ?
If we are talking functionality, a power-gated unit is shut down.
If you mean one of the Integer Clusters, then yes, it can be shut down, both in the sense of not being used and, I would expect, by being power gated.
I don't see the relevance?
 
Look at the Bulldozer block diagram @Vya Domus posted a few posts up. It has been said in this thread repeatedly that AMD did provide fairly nice details about the Bulldozer Architecture.
That block diagram is a single module. The moment you shut down any parts that are not duplicated it will no longer work. The duplicated parts are the two Integer Clusters.

So where's an official statement from amd, white papers?

If you have anything that states any of this, please do share, because I believe you are pulling this from thin air.
 
So where's an official statement from amd, white papers?
What exactly do you want a statement about?
AMD will not be saying it is 4 cores, the wording they have been using everywhere are 'modules' and 'integer cores' :)
 
What exactly do you want a statement about?

About the stuff you are making up such as the fact that only ALUs can be power gated and nothing else.
 