
FX-Series Processors Clock Speeds 'Revealed'

Joined
May 16, 2011
Messages
1,430 (0.29/day)
Location
A frozen turdberg.
System Name Runs Smooth
Processor FX 8350
Motherboard Crosshair V Formula Z
Cooling Corsair H110 with AeroCool Shark 140mm fans
Memory 16GB G-skill Trident X 1866 Cl. 8
Video Card(s) HIS 7970 IceQ X² GHZ Edition
Storage OCZ Vector 256GB SSD & 1Tb piece of crap
Display(s) acer H243H
Case NZXT Phantom 820 matte black
Audio Device(s) Nada
Power Supply NZXT Hale90 V2 850 watt
Software Windows 7 Pro
Benchmark Scores Lesbians are hot!!!
Joined
Jul 10, 2010
Messages
1,235 (0.23/day)
Location
USA, Arizona
System Name SolarwindMobile
Processor AMD FX-9800P RADEON R7, 12 COMPUTE CORES 4C+8G
Motherboard Acer Wasp_BR
Cooling It's Copper.
Memory 2 x 8GB SK Hynix/HMA41GS6AFR8N-TF
Video Card(s) ATI/AMD Radeon R7 Series (Bristol Ridge FP4) [ACER]
Storage TOSHIBA MQ01ABD100 1TB + KINGSTON RBU-SNS8152S3128GG2 128 GB
Display(s) ViewSonic XG2401 SERIES
Case Acer Aspire E5-553G
Audio Device(s) Realtek ALC255
Power Supply PANASONIC AS16A5K
Mouse SteelSeries Rival
Keyboard Ducky Channel Shine 3
Software Windows 10 Home 64-bit (Version 1607, Build 14393.969)
Yes, I know that. But which cores will it use?

It will use all that they need

The Windows OS schedules the threads, not the software or the CPU

If it is a 4-core app, you will see any 2 modules being used

If it is a 3-core app, you will see any 1 module fully used + any 1 half-utilized module
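The placement described above can be illustrated with a toy model. This is a minimal sketch, not how the Windows scheduler actually works: it assumes an FX-8xxx with 4 modules of 2 cores and compares a "pack" policy (fill a module before using the next, which gives the 2-module and 1.5-module patterns described above) with a "spread" policy (one thread per module first). The names `pack` and `spread` are hypothetical.

```python
# Toy model of an FX-8xxx: 4 modules x 2 cores (assumed for illustration).
MODULES = 4
CORES_PER_MODULE = 2


def _check(n_threads, modules, per_module):
    if not 0 <= n_threads <= modules * per_module:
        raise ValueError("n_threads must be between 0 and the number of cores")


def pack(n_threads, modules=MODULES, per_module=CORES_PER_MODULE):
    """Fill each module completely before using the next one.

    Returns the number of busy cores in each module.
    """
    _check(n_threads, modules, per_module)
    load = [0] * modules
    for t in range(n_threads):
        load[t // per_module] += 1
    return load


def spread(n_threads, modules=MODULES, per_module=CORES_PER_MODULE):
    """Put one thread on every module first, then start doubling up."""
    _check(n_threads, modules, per_module)
    load = [0] * modules
    for t in range(n_threads):
        load[t % modules] += 1
    return load


if __name__ == "__main__":
    for n in (3, 4):
        print(f"{n} threads  pack: {pack(n)}  spread: {spread(n)}")
```

With 4 threads, `pack` gives [2, 2, 0, 0] (two full modules) while `spread` gives [1, 1, 1, 1] (one core per module); with all 8 threads both policies end up identical.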
 
It will use all that they need

The Windows OS schedules the threads, not the software or the CPU

If it is a 4-core app, you will see 2 modules used in any order

If it is a 3 core app you will see any 1 module being used + any 1 half-utilized module

So there is no way to make it use 1 core per module? That would make more sense, so that all 4 threads get 100% of the resources.

In a 4 threaded application.
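One way to force one thread per module without relying on the scheduler is CPU affinity. A minimal sketch for Linux using Python's `os.sched_setaffinity`, assuming (hypothetically; check your own CPU topology first) that logical CPUs 2k and 2k+1 are the two cores of module k. The helper names are made up for illustration. The same CPU set written as a bitmask (0x55 for CPUs 0, 2, 4, 6) is the form of value the Windows `start /affinity` command takes.

```python
import os


def one_core_per_module(n_modules, cores_per_module=2):
    """Pick the first logical CPU of each module.

    ASSUMPTION: CPUs 2k and 2k+1 are the two cores of module k. Verify this
    against your system's CPU topology before relying on it.
    """
    return {m * cores_per_module for m in range(n_modules)}


def affinity_mask(cpus):
    """Convert a set of CPU ids into a bitmask (bit i set = CPU i allowed)."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask


def pin_current_process(cpus):
    """Restrict the calling process (pid 0) to `cpus`; Linux only."""
    if not hasattr(os, "sched_setaffinity"):
        raise OSError("os.sched_setaffinity is not available on this platform")
    os.sched_setaffinity(0, cpus)
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    cpus = one_core_per_module(4)
    print(sorted(cpus), hex(affinity_mask(cpus)))  # [0, 2, 4, 6] 0x55
```

Whether pinning like this actually helps is exactly what the replies below argue about; the sketch only shows how it could be done.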
 
So there is no way to make it use 1 core per module? That would make more sense so that all 4 threads are getting 100% of the resources.

In a 4 threaded application.

In real-world tasks there will be no difference between using 1 core and using 2 cores

1 core in a module has access to 100% of the resources
2 cores in a module has access to 100% of the resources

The idea of CMT is to make 2 cores use the same resources to increase throughput/speed

1 module provides 2x the resources 1 core needs

4 cores being used in any pattern or setup will have 100% access to all the resources they need

Simply put you do not need to worry about the module as a whole

Everything that needs to be dedicated is dedicated and everything that needs to be shared is shared
 

Wile E

Power User
Joined
Oct 1, 2006
Messages
24,318 (3.63/day)
System Name The ClusterF**k
Processor 980X @ 4Ghz
Motherboard Gigabyte GA-EX58-UD5 BIOS F12
Cooling MCR-320, DDC-1 pump w/Bitspower res top (1/2" fittings), Koolance CPU-360
Memory 3x2GB Mushkin Redlines 1600Mhz 6-8-6-24 1T
Video Card(s) Evga GTX 580
Storage Corsair Neutron GTX 240GB, 2xSeagate 320GB RAID0; 2xSeagate 3TB; 2xSamsung 2TB; Samsung 1.5TB
Display(s) HP LP2475w 24" 1920x1200 IPS
Case Technofront Bench Station
Audio Device(s) Auzentech X-Fi Forte into Onkyo SR606 and Polk TSi200's + RM6750
Power Supply ENERMAX Galaxy EVO EGX1250EWT 1250W
Software Win7 Ultimate N x64, OSX 10.8.4
AMD Zambezi has more IPC per module compared to Intel IPC per core

and AMD Zambezi mimics tri-channel simply due to how many predictors the IMC has

I'll believe it when I see it.
In real-world tasks there will be no difference between using 1 core and using 2 cores

1 core in a module has access to 100% of the resources
2 cores in a module has access to 100% of the resources only half of the time.

The idea of CMT is to make 2 cores use the same resources to increase throughput/speed

1 module provides 2x the resources 1 core needs

4 cores being used in any pattern or setup will have 100% access to all the resources they need

Simply put you do not need to worry about the module as a whole

Everything that needs to be dedicated is dedicated and everything that needs to be shared is shared in an ideal world, but we don't live in an ideal world.
Fixed.
 
Fabrication

You got it backwards

There is 800% worth of resources

1 core can only access 100% of those resources

It is a hardware limitation

2 cores will use 200% of those

So, 1 core will completely use the stuff dedicated to it and only half the stuff shared with it; there is only so much 1 core can do

The floating-point unit is a dedicated block shared between both cores, so it does not behave like what we think of as a normal FPU

That is why it is called a Flex FPU

1 core in a module has access to 100% of the resources in 1 module
2 cores in a module has access to 200% of the resources in 1 module

Module holds all the resources needed for 2 cores to run in it
There is no performance hit in this design

Performance to Resources used
100% -> 200% -> 300% -> 400% -> 500% -> 600% -> 700% -> 800%
{50% -> 100%} -> {150% -> 200%} -> {250% -> 300%} -> {350% -> All}

With 4 cores in use across any modules, the CPU will utilize half of its total resources
With 2 cores in use, it will utilize 1/4 of its total resources
With 6 cores in use, it will utilize 3/4 of its total resources
With 8 cores in use, it will utilize all of its total resources

Module holds 100% of the stuff needed for 1 core and 2 cores to operate without a bottleneck
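The linear model above (each active core claims exactly 1/8 of the chip) reduces to a single fraction. A minimal sketch, assuming 8 cores in total; note that it builds in the no-contention assumption disputed elsewhere in the thread, so it reproduces the stated fractions rather than validating them. `resource_share` is a hypothetical name.

```python
from fractions import Fraction

TOTAL_CORES = 8  # assumed: 4 modules x 2 cores


def resource_share(active_cores, total_cores=TOTAL_CORES):
    """Share of the chip's resources in use under the post's linear model,
    where every active core claims exactly 1/total_cores of them."""
    if not 0 <= active_cores <= total_cores:
        raise ValueError("active_cores must be between 0 and total_cores")
    return Fraction(active_cores, total_cores)


if __name__ == "__main__":
    for n in (2, 4, 6, 8):
        print(n, "cores ->", resource_share(n))
```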

---and now for something totally different--------

Brad_Hawthorne@EVGA said:
I helped setup an AMD event on the 15th and staffed the event on the 16th. The two systems I worked with were Bulldozer 3.4ghz engineering samples. I pulled up the system control panel to confirm the chips as 3.4ghz ES. I have mixed thoughts about it. I liked that they did 3.4ghz on the stock AMD heatsink by default. On the other hand, it was definitely ES silicon. I was getting random CTD on things while we were demoing. I did no benches on the hardware because I had no time. They were Dirt3 Eyefinity 3x1L demo rigs.

Brad_Hawthorne@EVGA said:
Everything was cranked up to maximums in the settings, running them 5760x1080. No noticeable lag spikes or FPS issues with real world use in game. I believe the video cards in the rigs were 6990. The port config was two dvi and 2 mini-dp. Ran two of the projectors via dvi and the third via minidp-to dvi adapter. The rigs were connected to D-Box motion actuated racing chairs with Logitech G27 setups. I have pics and video of the rig configurations on my Canon t2i. I just finished driving Dallas-Wichita though in 6 hours so I'm a bit tired. Will update pics when I wake up later today.



What he does^
 

Wile E

If both threads going to a module need to access the fpu, one has to wait. By definition, that's not 100% resource availability.

And eyefinity setups prove absolutely nothing. Gaming is an absolutely terrible metric to judge cpu performance.

Sorry, but I would happily bet money that Intel still wins IPC per core per clock.
 
Here is a quote from JF-AMD that says that you don't get 100% out of two cores in a module.

OK, daddy is going to do some math, everyone follow along please.

First: There is only ONE performance number that has been legally cleared, 16-core Interlagos will give 50% more throughput than 12-core Opteron 6100. This is a statement about throughput and about server workloads only. You CANNOT make any client performance assumptions about that statement.

Now, let's get started.

First, everything that I am about to say below is about THROUGHPUT and throughput is different than speed. If you do not understand that, then please stop reading here.

Second, ALL comparisons are against the same cores; these are not comparisons across different generations, nor are they comparisons against different architectures.

Assume that a processor core has 100% throughput.

Adding a second core to an architecture is typically going to give ~95% greater throughput. There is obviously some overhead because the threads will stall, the threads will wait for each other and the threads may share data. So, two completely independent cores would equal 195% (100% for the first core, 95% for the second core.)


Looking at SPEC int and SPEC FP, Hyperthreading gives you 14% greater throughput for integer and 22% greater throughput for FP. Let's just average the two together.

One core is 100%. Two cores are 118%. Everyone following so far? We have 195% for 2 threads on 2 cores and we have 118% for 2 threads on 1 core.

Now, one bulldozer core is 100%. Running 2 threads on 2 separate modules would lead to ~195%, it's consistent with running on two independent cores.

Running 2 threads on the same module is ~180%.

You can see why the strategy is more appealing than HT when it comes to threaded workloads. And, yes, the world is becoming more threaded.

Now, where does the 90% come from? What is 180% /2? 90%.

People have argued that there is a 10% overhead for sharing because you are not getting 200%. But, as we saw before, 2 cores actually only equals 195%, so the net per core if you divide the workload is actually 97.5%, so it is roughly a 7-8% delta from just having cores.

Now, before anyone starts complaining about this overhead and saying that AMD is compromising single-thread performance (because the fanboys will), keep in mind that a processor with HT equals ~118% for 2 threads, so per thread that equals 59%, so there is a ~36% hit for HT. This is specifically why I think that people need to stay away from talking about it. If you want to pick on AMD for the 7-8%, you have to acknowledge the ~36% hit from HT. But ultimately that is not how people judge these things. Having 5 people in a car consumes more gas than driving alone, but nobody talks about the increase in gas consumption because it is so much less than 5 individual cars driving to the same place.

So, now you know the approximate metrics about how the numbers work out. But what does that mean to a processor? Well, let's do some rough math to show where the architecture shines.

An Orochi die has 8 cores. Let's say, for sake of argument, that if we blew up the design and said not modules, only independent cores, we'd end up with about 6 cores.

Now let's compare the two with the assumption that all of the cores are independent on one and in modules on the other. For sake of argument we will assume that all cores scale identically and that all modules scale identically. The fact that incremental cores scale to something less than 100% is already comprehended in the 180% number, so don't fixate on that. In reality the 3rd core would not be at 95% but we are holding that constant for example.

Mythical 6-core bulldozer:
100% + 95% + 95% + 95% + 95% + 95% = 575%

Orochi die with 4 modules:
180% + 180% + 180% + 180% = 720%

What if we had just done a 4 core and added HT (keeping in the same die space):
100% + 95% + 95% + 95% + 18% + 18% + 18% + 18% = 457%

What about a 6 core with HT (has to assume more die space):
100% + 95% + 95% + 95% + 95% + 95% + 18% + 18% + 18% + 18% + 18% + 18% = 683%

(Spoiler alert - this is a comparison using the same cores, do NOT start saying that there is a 25% performance gain over a 6-core Thuban, which I am sure someone is already starting to type.)

The reality is that by making the architecture modular and by sharing some resources you are able to squeeze more throughput out of the design than if you tried to use independent cores or tried to use HT. In the last example I did not take into consideration that the HT circuitry would have delivered an extra 5% circuitry overhead....

Every design has some degree of tradeoff involved, there is no free lunch. The goal behind BD was to increase core count and get more throughput. Because cores scale better than HT, it's the most predictable way to get there.

When you do the math on die space vs. throughput, you find that adding more cores is the best way to get to higher throughput. Taking a small hit on overall performance but having the extra space for additional cores is a much better tradeoff in my mind.

Nothing I have provided above would allow anyone to make a performance estimate of BD vs. either our current architecture or our competition, so, everyone please use this as a learning experience and do not try to make a performance estimate, OK?

http://www.xtremesystems.org/forums...-From-AMD-at-ISSCC-2011&p=4755711#post4755711
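The quoted throughput arithmetic can be reproduced directly. A minimal sketch using only the percentages given in the post (100% for the first core, +95% per extra core, +14%/+22% SMT gains averaged to 18%, 180% per module); all of these are the post's illustrative figures, not measurements. One small observation from running the numbers: the "~36% hit for HT" equals 95 - 59; measured against the 97.5% per-thread figure for two separate cores, the gap is 38.5 points.

```python
# Percent of one core's throughput; all inputs are the figures quoted above.
FIRST_CORE = 100.0
EXTRA_CORE = 95.0                  # each additional independent core
HT_GAIN = (14.0 + 22.0) / 2        # SPECint / SPECfp SMT gains, averaged: 18.0
MODULE = 180.0                     # two threads sharing one Bulldozer module


def independent_cores(n):
    """Throughput of n independent cores: 100% + 95% for each extra core."""
    return FIRST_CORE + EXTRA_CORE * (n - 1)


# Per-thread throughput when two threads share the hardware.
per_thread = {
    "2 separate cores": independent_cores(2) / 2,   # 97.5
    "1 module (CMT)": MODULE / 2,                    # 90.0
    "1 core + HT": (FIRST_CORE + HT_GAIN) / 2,       # 59.0
}

# Whole-die comparisons from the post.
configs = {
    "6 independent cores": independent_cores(6),            # 575
    "4 modules (Orochi)": 4 * MODULE,                       # 720
    "4 cores + HT": independent_cores(4) + 4 * HT_GAIN,     # 457
    "6 cores + HT": independent_cores(6) + 6 * HT_GAIN,     # 683
}

if __name__ == "__main__":
    for name, value in {**per_thread, **configs}.items():
        print(f"{name:20s} {value:6.1f}%")
    delta = per_thread["2 separate cores"] - per_thread["1 module (CMT)"]
    print(f"CMT per-thread delta: {delta:.1f} points")  # 7.5
```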

So now I'm back to hoping that they can figure out a way to make 4 threads run on 4 modules.
 
If both threads going to a module need to access the fpu, one has to wait. By definition, that's not 100% resource availability.

And eyefinity setups prove absolutely nothing. Gaming is an absolutely terrible metric to judge cpu performance.

Sorry, but I would happily bet money that Intel still wins IPC per core per clock.

The FPU isn't a resource and both threads aren't going to wait for the FPU since it is tasked way differently than in CMP

256-bit commands are done at the module level, not the core level
(Meaning AVX support is the same as Intel's AVX for compatibility)

SSE5 is where it is at for AMD (XOP, CVT16, FMA4)

SSE is done at the core level (8x SSE) (SSE5 128-bit)
AVX is done at the module level (4x AVX) (AVX 128-bit + AVX 128-bit)

Sorry, but I would bet money that AMD and Intel have equal IPC per core per clock

Here is a quote from JF-AMD that says that you don't get 100% out of two cores in a module.

So now I'm back to hoping that they can figure out a way to make 4 threads run on 4 modules.

He isn't talking about what I am talking about

It's harder to explain once you go from the throughput world to the speed world

There is no overhead... that is the issue
The core CMP issue is there but isn't really bad

CMT scales on the module level not the core level
200% -> 397.5% -> 595% -> 792.5%
vs CMP
100% -> 197.5% -> 295% -> 392.5% -> 490% -> 587.5% -> 685% -> 782.5%

Do you see the trade off?
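The two scaling series above follow one rule: a fixed first unit plus a fixed increment per additional unit. A minimal sketch using the poster's own numbers (200% for the first module and +197.5% per extra module for CMT; 100% and +97.5% per extra core for CMP). These assume a module delivers the full 200%, whereas the JF-AMD post quoted earlier puts two threads on one module at ~180%. `cumulative` is a hypothetical helper.

```python
def cumulative(first, step, count):
    """Running totals: `first` for the first unit, then +`step` per extra unit."""
    if count < 1:
        raise ValueError("count must be at least 1")
    totals = [first]
    for _ in range(count - 1):
        totals.append(totals[-1] + step)
    return totals


cmt_series = cumulative(200.0, 197.5, 4)  # per module, 2 threads each
cmp_series = cumulative(100.0, 97.5, 8)   # per core, 1 thread each

if __name__ == "__main__":
    print("CMT:", cmt_series)
    print("CMP:", cmp_series)
    # Both series cover 8 threads; the final gap is what the trade-off rests on.
    print("gap at 8 threads:", cmt_series[-1] - cmp_series[-1])  # 10.0
```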
 

Wile E

The FPU isn't a resource and both threads aren't going to wait for the FPU since it is tasked way differently than in CMP

256-bit commands are done at the module level, not the core level
(Meaning AVX support is the same as Intel's AVX for compatibility)

SSE5 is where it is at for AMD (XOP, CVT16, FMA4)

SSE is done at the core level (8x SSE) (SSE5 128-bit)
AVX is done at the module level (4x AVX) (AVX 128-bit + AVX 128-bit)

Sorry, but I would bet money that AMD and Intel have equal IPC per core per clock



He isn't talking about what I am talking about

It's harder to explain once you go from the throughput world to the speed world

Bullshit, if both threads need floating point, one has to wait, plain and simple fact. I don't care about SSE5, that's a completely irrelevant distraction, and doesn't change the point at all.
 
Bullshit, if both threads need floating point, one has to wait, plain and simple fact. I don't care about SSE5, that's a completely irrelevant distraction, and doesn't change the point at all.

No...there is no waiting for each core

AVX done on both cores or done half-length
128bit AVX+128bit AVX
2x128bit AVX

Your not understanding this is lousy tidings
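Splitting one 256-bit AVX operation into two 128-bit halves, as described above, can be sketched at the data level. A minimal illustration, assuming single-precision (32-bit) lanes, so a 256-bit register holds 8 values and each 128-bit half holds 4. It only shows that the split produces the same result and takes two 128-bit operations; it says nothing about latency or throughput, which is what is actually being argued. The function names are hypothetical.

```python
LANES_128 = 4  # 128 bits / 32-bit floats
LANES_256 = 8  # 256 bits / 32-bit floats


def add_128(a, b):
    """One 128-bit vector add over 4 single-precision lanes."""
    if len(a) != LANES_128 or len(b) != LANES_128:
        raise ValueError("expected 4 lanes")
    return [x + y for x, y in zip(a, b)]


def add_256_split(a, b):
    """Emulate a 256-bit add as two 128-bit adds (low half, then high half).

    Returns (result, number_of_128_bit_ops).
    """
    if len(a) != LANES_256 or len(b) != LANES_256:
        raise ValueError("expected 8 lanes")
    result, ops = [], 0
    for start in (0, LANES_128):
        end = start + LANES_128
        result += add_128(a[start:end], b[start:end])
        ops += 1
    return result, ops


if __name__ == "__main__":
    a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    print(add_256_split(a, [0.5] * 8))
```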

Here is a quote from JF-AMD that says that you don't get 100% out of two cores in a module.

So now I'm back to hoping that they can figure out a way to make 4 threads run on 4 modules.

Back to you

4-core Orochi vs 4-core Phenom II

Both score ~4000 per core
But multithreading adds overhead, and the Orochi design alleviates it by keeping it at the module level rather than the core level

4-core Orochi will get a real-world score of ~15000, where in a no-overhead world it would get 16k
4-core Phenom II will get a real-world score of ~14000, where in a no-overhead world it would get 16k

The gap gets even bigger with more cores

8-core Orochi vs 8-core Phenom II

~4000 per core again
8-core Orochi will get a real-world score of 30000, where in a no-overhead world it would get 32k
8-core Phenom II will get a real-world score of 28000, where in a no-overhead world it would get 32k

But that is at the same clocks and for the same IPC

Phenom II has 3 IPC per core while Zambezi has 4 IPC per core (this is where the 25% comes in)

and Zambezi will have a higher clock

Same clocks, though:
Phenom II at 3.4GHz: 4200
Zambezi, ignoring all the extra stuff that adds a little bit: ~5000 (I'm going to say it will get 5000ish, ±400)

Phenom II 8C - 29400
Zambezi 8C - 34500

But that is if it is well programmed
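The score projections above are per-core score x core count x an overhead factor. A minimal sketch that recovers the factors from the post's own ~4000-per-core example (0.9375 for Orochi, 0.875 for Phenom II) and reapplies them; all scores are the poster's hypothetical figures, not benchmark results, and `efficiency`/`projected` are made-up names. Running the numbers shows the Phenom II figure (29400) follows from 4200 per core, while the Zambezi figure (34500) corresponds to 4600 per core, the low end of the 5000 ± 400 estimate; the 5000 midpoint would give 37500.

```python
def efficiency(real_score, ideal_score):
    """Fraction of the no-overhead score actually achieved."""
    return real_score / ideal_score


def projected(per_core, cores, eff):
    """Overhead-adjusted multithreaded score."""
    return per_core * cores * eff


# Efficiencies implied by the ~4000-per-core example in the post.
OROCHI_EFF = efficiency(30000, 32000)  # 0.9375 (same as 15000/16000)
PHENOM_EFF = efficiency(28000, 32000)  # 0.875  (same as 14000/16000)

if __name__ == "__main__":
    print("Phenom II 8C @ 4200/core:", projected(4200, 8, PHENOM_EFF))
    for per_core in (4600, 5000, 5400):  # the 5000 +/- 400 range
        print(f"Zambezi 8C @ {per_core}/core:", projected(per_core, 8, OROCHI_EFF))
```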
 

Wile E

Thanks, but I'll wait for the real info to release.
 
Thanks, but I'll wait for the real info to release.



Is this real enough?

128bit Execution per Core
256bit Execution per Module
 
Joined
Mar 24, 2011
Messages
2,356 (0.47/day)
Location
VT
Processor Intel i7-10700k
Motherboard Gigabyte Aurorus Ultra z490
Cooling Corsair H100i RGB
Memory 32GB (4x8GB) Corsair Vengeance DDR4-3200MHz
Video Card(s) MSI Gaming Trio X 3070 LHR
Display(s) ASUS MG278Q / AOC G2590FX
Case Corsair X4000 iCue
Audio Device(s) Onboard
Power Supply Corsair RM650x 650W Fully Modular
Software Windows 10
o_O

128-bit - 32 FLOPS
256-bit - 64 FLOPS

wait what?
 
o_O

128-bit - 32 FLOPS
256-bit - 64 FLOPS

wait what?

It's due to the FMACs

Intel doesn't have FMACs

1x128bit
or
1x256bit
per core

AMD has FMACs

1x128bit
per core
1x256bit
per module

The best I can come up with
 

Benetanegia

New Member
Joined
Sep 11, 2009
Messages
2,680 (0.48/day)
Location
Reaching your left retina.
I'm amazed at the sheer amount of BS that you can write in one night. lol, not meaning to be offensive, it's almost a compliment.

Anyway, FMAC has nothing to do with that. FMAC is the way the math is done. AMD used 2x 128-bit FMAC units, which means 2 fused multiply-accumulate units.

Intel used 1x 256-bit FMUL and 1x 256-bit FADD. The result is similar.

The difference is that BD can use 1x 128-bit unit for each "core", which may or may not be an advantage for legacy code that is heavily parallelized (8 threads). In the server arena this might be a real advantage; on the desktop it will most probably help nothing (8 threads required).

What AMD doesn't say either is that the 128+128 = 256-bit operation is slower than the "native" 256-bit operation, so it is slower for AVX; there is overhead. Pretending there is not is just like believing in fairies.

Or just like believing that GlobalFoundries or not, the yields are the same for an old architecture and a new architecture. :rolleyes:
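Whether "2x 128-bit FMAC" and "1x 256-bit FMUL + 1x 256-bit FADD" really give a similar result can be checked with peak-FLOP arithmetic: lanes per unit, doubled if the unit does a fused multiply-add. A minimal sketch assuming every unit issues one operation per cycle (theoretical peak only). Note the Bulldozer figure is per module (shared by two cores) while the Intel figure, for a Sandy Bridge-style core as described in the post, is per core. It does not explain the 32/64 FLOPS slide figures quoted earlier, whose basis is not given in the thread.

```python
def flops_per_cycle(vector_bits, element_bits, fma, units=1):
    """Peak FLOPs per cycle for `units` identical vector units.

    An FMA counts as 2 FLOPs (multiply + add) per lane.
    """
    lanes = vector_bits // element_bits
    return units * lanes * (2 if fma else 1)


PEAK = {}
for precision, bits in (("single", 32), ("double", 64)):
    # Bulldozer module: 2 x 128-bit FMAC, shared by the module's two cores.
    bd_module = flops_per_cycle(128, bits, fma=True, units=2)
    # Intel core as described: 1 x 256-bit FMUL + 1 x 256-bit FADD.
    intel_core = flops_per_cycle(256, bits, fma=False) + flops_per_cycle(256, bits, fma=False)
    PEAK[precision] = (bd_module, intel_core)

if __name__ == "__main__":
    for precision, (bd, intel) in PEAK.items():
        print(f"{precision}: BD module {bd} FLOPs/cycle, Intel core {intel} FLOPs/cycle")
```

Under these assumptions both designs peak at 16 single-precision (8 double-precision) FLOPs per cycle, which matches "the result is similar"; the difference is that the AMD figure is spread over two cores.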
 
Joined
Jun 17, 2007
Messages
7,336 (1.14/day)
Location
C:\Program Files (x86)\Aphexdreamer\
System Name Unknown
Processor AMD Bulldozer FX8320 @ 4.4Ghz
Motherboard Asus Crosshair V
Cooling XSPC Raystorm 750 EX240 for CPU
Memory 8 GB CORSAIR Vengeance Red DDR3 RAM 1922mhz (10-11-9-27)
Video Card(s) XFX R9 290
Storage Samsung SSD 254GB and Western Digital Caviar Black 1TB 64MB Cache SATA 6.0Gb/s
Display(s) AOC 23" @ 1920x1080 + Asus 27" 1440p
Case HAF X
Audio Device(s) X Fi Titanium 5.1 Surround Sound
Power Supply 750 Watt PP&C Silencer Black
Software Windows 8.1 Pro 64-bit
I have no idea what is going on here or what is being said but would honestly like to know.

I do plan on picking up one of these processors, of course after they are actually out and I've read a couple of reviews.

Should I worry about all the info that is being thrown around here?
 

Benetanegia

Should I worry about all the info that is being thrown around here?

As someone who only wants to buy the thing, not really. Wait until the reviews are performed and make your decision based on the performance for your preferred applications.

We just like to talk about and predict performance based on our knowledge of the architecture and the different tech utilized.
 
blah blah blah

Using 128-bit + 128-bit is 6% slower; I am too lazy to google up what we should already know

The yields are better

Because they have been producing AMD Zambezi chips since
Late August 2010 (8 weeks after Bulldozer was taped out)
Late August 2010 - Late October 2010 = A1
November 2010 - January 2011 = B0
February 2011- April 2011 = B1
May 2011 - July 2011 = B2
^That span of time I am pretty sure they don't have yield issues

Since the desktop market likes the legacy benchies, it will do great

I have no idea what is going on here or what is being said but would honestly like to know.

I do plan on picking up one of these processors, of course after they are actually out and I've read a couple of reviews.

Should I worry about all the info that is being thrown around here?

No, you shouldn't; we are bickering about stuff you won't have to worry about

As someone who only wants to buy the thing, not really. Wait until the reviews are performed and make your decision based on the performance for your preferred applications.

We just like to talk about and predict performance based on our knowledge of the architecture and the different tech utilized.

Basically
 
Joined
Mar 10, 2010
Messages
11,880 (2.19/day)
Location
Manchester uk
System Name RyzenGtEvo/ Asus strix scar II
Processor Amd R5 5900X/ Intel 8750H
Motherboard Crosshair hero8 impact/Asus
Cooling 360EK extreme rad+ 360$EK slim all push, cpu ek suprim Gpu full cover all EK
Memory Gskill Trident Z 3900cas18 32Gb in four sticks./16Gb/16GB
Video Card(s) Asus tuf RX7900XT /Rtx 2060
Storage Silicon power 2TB nvme/8Tb external/1Tb samsung Evo nvme 2Tb sata ssd/1Tb nvme
Display(s) Samsung UAE28"850R 4k freesync.dell shiter
Case Lianli 011 dynamic/strix scar2
Audio Device(s) Xfi creative 7.1 on board ,Yamaha dts av setup, corsair void pro headset
Power Supply corsair 1200Hxi/Asus stock
Mouse Roccat Kova/ Logitech G wireless
Keyboard Roccat Aimo 120
VR HMD Oculus rift
Software Win 10 Pro
Benchmark Scores laptop Timespy 6506
No, you shouldn't; we are bickering about stuff you won't have to worry about

and what you know little about. For all you know they may allow the use of the 2x128-bit FPU by 1 core if it is underused, meaning it would have 100+% resources. I'm not saying it will, just saying that not one person on here knows what them mofos are !actually! gonna be capable of regarding speed or function. You're spreading FUD, simples, and clearly sat on an AMD dildo:eek:

not once have i seen it be your imho:wtf:, no, you're waffling like it's a fact, and no spreadsheets or screenies please, I've seen 'em all and been following it as long as everyone else on TPU:D
 

Benetanegia

Using 128-bit + 128-bit is 6% slower; I am too lazy to google up what we should already know

The yields are better

Because they have been producing AMD Zambezi chips since
Late August 2010 (8 weeks after Bulldozer was taped out)
Late August 2010 - Late October 2010 = A1
November 2010 - January 2011 = B0
February 2011- April 2011 = B1
May 2011 - July 2011 = B2
^That span of time I am pretty sure they don't have yield issues

And why did they have so many revisions? Because yields were not good, my friend. And now they have delayed it again, for what reason? Obvious. They still have some issues.

Since the desktop market likes the legacy benchies, it will do great

Even if that was true:

legacy apps == poor multi-threading == don't dream of 4 threads being fully utilized, let alone 8 == 128+128 bit advantage goes down the drain.

And when major developers start using AVX in 1-2 years tops, BD will have the disadvantage. "Only" 6% if you will (I want proof btw), still a big one considering that the die size increases too.
 
Joined
Jul 10, 2010
Messages
1,235 (0.23/day)
Location
USA, Arizona
System Name SolarwindMobile
Processor AMD FX-9800P RADEON R7, 12 COMPUTE CORES 4C+8G
Motherboard Acer Wasp_BR
Cooling It's Copper.
Memory 2 x 8GB SK Hynix/HMA41GS6AFR8N-TF
Video Card(s) ATI/AMD Radeon R7 Series (Bristol Ridge FP4) [ACER]
Storage TOSHIBA MQ01ABD100 1TB + KINGSTON RBU-SNS8152S3128GG2 128 GB
Display(s) ViewSonic XG2401 SERIES
Case Acer Aspire E5-553G
Audio Device(s) Realtek ALC255
Power Supply PANASONIC AS16A5K
Mouse SteelSeries Rival
Keyboard Ducky Channel Shine 3
Software Windows 10 Home 64-bit (Version 1607, Build 14393.969)
And what you know little about. For all you know, they may allow one core to use the full 2x128-bit FPU if it's underused, meaning it would have 100+% resources. I'm not saying it will, just saying that not one person on here knows what them mofos are !actually! gonna be capable of regarding speed or function. You're spreading FUD, simples, and clearly sat on an AMD dildo:eek:

Not once have I seen you call it your IMHO:wtf:. No, you're waffling like it's a fact. And no spreadsheets or screenies please; I've seen 'em all and been following it as long as everyone else on TPU:D

http://blogs.amd.com/work/2010/10/25/the-new-flex-fp/

You need to read this

One of the most interesting features planned for our next generation core architecture, which features the new “Bulldozer” core, is something called the “Flex FP”, which delivers tremendous floating point capabilities for technical and financial applications.

For those of you not familiar with floating point math, this is the high level stuff, not 1+1 integer math that most applications use. Technical applications and financial applications that rely on heavy-duty use of floating point math could see huge increases in performance over our existing architectures, as well as far more flexibility.

The heart of this new feature is a flexible floating point unit called the Flex FP. This is a single floating point unit that is shared between two integer cores in a module (so a 16-core “Interlagos” would have 8 Flex FP units). Each Flex FP has its own scheduler; it does not rely on the integer scheduler to schedule FP commands, nor does it take integer resources to schedule 256-bit executions. This helps to ensure that the FP unit stays full as floating point commands occur. Our competitors’ architectures have had a single scheduler for both integer and floating point, which means that both integer and floating point commands are issued by a single shared scheduler vs. having dedicated schedulers for both integer and floating point executions.

There will be some instruction set extensions that include SSSE3, SSE 4.1 and 4.2, AVX, AES, FMA4, XOP, PCLMULQDQ and others.

One of these new instruction set extensions, AVX, can handle 256-bit FP executions. Now, let’s be clear, there is no such thing as a 256-bit command. Single precision commands are 32-bit and double precision are 64-bit. With today’s standard 128-bit FPUs, you execute four single precision commands or two double precision commands in parallel per cycle. With AVX you can double that, executing eight 32-bit commands or four 64-bit commands per cycle – but only if your application supports AVX. If it doesn’t support AVX, then that flashy new 256-bit FPU only executes in 128-bit mode (half the throughput). That is, unless you have a Flex FP.
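The lane counts in that paragraph are just vector width divided by element width. A quick Python sketch of that arithmetic; the helper name `lanes` is made up for illustration, and nothing here models real Bulldozer timing:

```python
SINGLE_BITS = 32  # single precision float
DOUBLE_BITS = 64  # double precision float


def lanes(vector_bits: int, element_bits: int) -> int:
    """Elements handled in parallel by one vector operation."""
    return vector_bits // element_bits


for width in (128, 256):
    print(f"{width}-bit: {lanes(width, SINGLE_BITS)} SP or "
          f"{lanes(width, DOUBLE_BITS)} DP ops per instruction")

# Non-AVX (128-bit) code on a 256-bit-wide unit only fills half the lanes.
legacy_fill = lanes(128, SINGLE_BITS) / lanes(256, SINGLE_BITS)
print(f"legacy code uses {legacy_fill:.0%} of a 256-bit unit")
```

That last line is the "half the throughput" case the post mentions for code that doesn't support AVX.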

In today’s typical data center workloads, the bulk of the processing is integer and a smaller portion is floating point. So, in most cases you don’t want one massive 256-bit floating point unit per core consuming all of that die space and all of that power just to sit around watching the integer cores do all of the heavy lifting. By sharing one 256-bit floating point unit per every 2 cores, we can keep die size and power consumption down, helping hold down both the acquisition cost and long-term management costs.

The Flex FP unit is built on two 128-bit FMAC units. The FMAC building blocks are quite robust on their own. Each FMAC can do an FMAC, an FADD or an FMUL per cycle. When you compare that to competitive solutions that can only do an FADD on their single FADD pipe or an FMUL on their single FMUL pipe, you start to see the power of the Flex FP – whether 128-bit or 256-bit, there is flexibility for your technical applications. With FMAC, the multiplication or addition commands don’t start to stack up like a standard FMUL or FADD; there is flexibility to handle either math on either unit. Here are some additional benefits:

Non-destructive DEST via FMA4 support (which helps reduce register pressure)
Higher accuracy (via elimination of intermediate round step)
Can accommodate FMUL OR FADD ops (if an app is FADD limited, then both FMACs can do FADDs, etc), which is a huge benefit
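The "higher accuracy" point is about rounding: a separate FMUL then FADD rounds twice, while a fused multiply-add rounds once. The Python sketch below emulates the fused case in software with `fractions.Fraction` (exact arithmetic, then a single rounding back to a double). It only shows the numerical effect, not how the hardware does it, and the function names are hypothetical:

```python
from fractions import Fraction


def mul_then_add(a: float, b: float, c: float) -> float:
    """Separate FMUL then FADD: the product is rounded to a double first."""
    product = a * b      # first rounding
    return product + c   # second rounding


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Emulated FMA: compute a*b + c exactly, then round once to a double."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)  # single rounding


a, b, c = 0.1, 10.0, -1.0
print(mul_then_add(a, b, c))        # 0.0
print(fused_multiply_add(a, b, c))  # 5.551115123125783e-17
```

The double nearest 0.1 times 10 is slightly above 1, but the unfused product rounds to exactly 1.0, so the tiny remainder is lost; the fused version keeps it.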

The new AES instructions allow hardware to accelerate the large base of applications that use this type of standard encryption (FIPS 197). The “Bulldozer” Flex FP is able to execute these instructions, which operate on 16 Bytes at a time, at a rate of 1 per cycle, which provides 2X more bandwidth than current offerings.
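To put "16 bytes at a rate of 1 per cycle" into numbers: the raw rate is 16 bytes per cycle, but each AES instruction performs one round and AES-128 needs 10 rounds per block, so even with round instructions issuing back to back (independent blocks) the fully encrypted rate is at most a tenth of that. This sketch assumes a hypothetical 3.6 GHz clock and one Flex FP; it is peak arithmetic only, not a benchmark prediction:

```python
BLOCK_BYTES = 16      # AES works on 128-bit (16-byte) blocks
ISSUE_PER_CYCLE = 1   # rate quoted in the post, per Flex FP unit
AES128_ROUNDS = 10    # one round instruction per round, 10 rounds for AES-128
CLOCK_GHZ = 3.6       # hypothetical clock, not a confirmed spec


def instruction_bytes_per_cycle() -> float:
    """Bytes fed through AES round instructions each cycle."""
    return BLOCK_BYTES * ISSUE_PER_CYCLE


def aes128_bytes_per_cycle() -> float:
    """Upper bound on fully encrypted bytes per cycle, assuming every
    round of independent blocks can issue back to back."""
    return instruction_bytes_per_cycle() / AES128_ROUNDS


def gb_per_s(bytes_per_cycle: float, clock_ghz: float = CLOCK_GHZ) -> float:
    """Convert bytes/cycle to GB/s (10^9 bytes per second)."""
    return bytes_per_cycle * clock_ghz


print(f"round data:     {gb_per_s(instruction_bytes_per_cycle()):.1f} GB/s")
print(f"AES-128 output: {gb_per_s(aes128_bytes_per_cycle()):.2f} GB/s")
```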

By having a shared Flex FP the power budget for the processor is held down. This allows us to add more integer cores into the same power budget. By sharing FP resources (that are often idle in any given cycle) we can add more integer execution resources (which are more often busy with commands waiting in line). In fact, the Flex FP is designed to reduce its active idle power consumption to a mere 2% of its peak power consumption.
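A rough way to see why the 2% active idle figure matters: if you assume (my simplification, not something the post states) that the FPU draws peak power when busy and 2% of peak when idle, average FPU power is just a weighted mix. The utilization values below are made up for illustration:

```python
IDLE_FRACTION = 0.02  # active idle power as a fraction of peak, per the post


def avg_power_fraction(utilization: float,
                       idle_fraction: float = IDLE_FRACTION) -> float:
    """Average power as a fraction of peak, assuming busy = peak
    and idle = idle_fraction of peak (a simplified two-state model)."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return utilization + (1.0 - utilization) * idle_fraction


for u in (0.0, 0.1, 0.5, 1.0):
    print(f"{u:.0%} busy -> {avg_power_fraction(u):.1%} of peak")
```

With a mostly integer workload (say 10% FP busy) the unit averages around 12% of its peak power under this model.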

The Flex FP gives you the best of both worlds: performance where you need it yet smart enough to save power when you don’t need it.

The beauty of the Flex FP is that it is a single 256-bit FPU that is shared by two integer cores. With each cycle, either core can operate on 256 bits of parallel data via two 128-bit instructions or one 256-bit instruction, OR each of the integer cores can execute 128-bit commands simultaneously. This is not something hard coded in the BIOS or in the application; it can change with each processor cycle to meet the needs at that moment. When you consider that most of the time servers are executing integer commands, this means that if a set of FP commands needs to be dispatched, there is probably a high likelihood that only one core needs to do this, so it has all 256 bits to schedule.
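The per-cycle sharing described above can be modelled as a toy allocator that hands out the two 128-bit FMAC pipes each cycle. The policy here (give both cores what they ask for if it fits, otherwise one pipe each) is my own simplification for illustration; the post does not describe AMD's actual arbitration logic:

```python
PIPE_BITS = 128  # width of one FMAC pipe
PIPES = 2        # two FMAC pipes per Flex FP


def allocate(demand_a: int, demand_b: int) -> tuple[int, int]:
    """Split the pipes between core A and core B for one cycle.

    demand_* is how many 128-bit pipes a core's ready FP work could use
    this cycle: 0 (integer only), 1 (a 128-bit op) or 2 (256 bits of work).
    """
    for d in (demand_a, demand_b):
        if not 0 <= d <= PIPES:
            raise ValueError("demand must be 0, 1 or 2 pipes")
    if demand_a + demand_b <= PIPES:
        return demand_a, demand_b  # everything fits this cycle
    return 1, 1                    # contention: one 128-bit pipe each


cycles = [(2, 0), (1, 1), (2, 2), (0, 1)]
for a, b in cycles:
    got_a, got_b = allocate(a, b)
    print(f"core A {got_a * PIPE_BITS:3d} bits | core B {got_b * PIPE_BITS:3d} bits")
```

The first cycle is the "only one core needs it, so it has all 256 bits" case from the post.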

Floating point operations typically have longer latencies so their utilization is typically much lower; two threads are able to easily interleave with minimal performance impact. So the idea of sharing doesn’t necessarily present a dramatic trade-off because of the types of operations being handled.
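The "easy to interleave" argument is basically probabilistic. Assuming (a simplification the post doesn't make) that each thread wants the FPU in a given cycle independently with probability p, both threads want it at the same time only with probability p squared, and even then they only stall if their combined demand exceeds the two pipes. The p values are made-up examples:

```python
def collision_probability(p_a: float, p_b: float) -> float:
    """Chance both threads want the shared FPU in the same cycle,
    assuming the two threads behave independently."""
    return p_a * p_b


for p in (0.1, 0.2, 0.5):
    print(f"p = {p:.1f}: both want the FPU in {collision_probability(p, p):.0%} of cycles")
```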



As you can see, the flexibility of the FPU really gives total flexibility to the system, designed to deliver optimized performance per core per cycle.

Also, each of our pipes can seamlessly handle SSE or AVX as well as FMUL, FADD, or FMAC providing the greatest flexibility for any given application. Existing apps will be able to take full advantage of our hardware with potential for improvement by leveraging the new ISAs.

Obviously, there are benefits of recompiled code that will support the new AVX instructions. But, if you think that you will have some older 128-bit FP code hanging around (and let’s face it, you will), then don’t you think having a flexible floating point solution is a more flexible choice for your applications? For applications to support the new 256-bit AVX capabilities they will need to be recompiled; this takes time and testing, so I wouldn’t expect to see rapid movement to AVX until well after platforms are available on the streets. That means in the meantime, as we all work through this transition, having flexibility is a good thing. Which is why we designed the Flex FP the way that we have.

If you have gotten this far, you are probably thinking that the technical discussion might be a bit beyond a guy with a degree in economics. I’d like to take a moment to thank Jay Fleischman and Kevin Hurd, two geniuses who really understand how all of these pieces fit together to make the Flex FP really unique in the industry.

And why did they have so many revisions? Because yields were not good, my friend. And now they have delayed it again, for what reason? Obvious. They still have some issues.

And those issues weren't yield-related


Even if that was true:

legacy apps == poor multi-threading == don't dream of 4 threads being fully utilized, let alone 8 == 128+128 bit advantage goes down the drain.

And when major developers start using AVX in 1-2 years tops, BD will have the disadvantage. "Only" 6% if you will (I want proof btw), still a big one considering that the die size increases too.

I am mainly talking about Cinebench, wPrime, and those other benchies
 
Joined
Mar 10, 2010
Messages
11,880 (2.19/day)
Location
Manchester uk
System Name RyzenGtEvo/ Asus strix scar II
Processor Amd R5 5900X/ Intel 8750H
Motherboard Crosshair hero8 impact/Asus
Cooling 360EK extreme rad+ 360$EK slim all push, cpu ek suprim Gpu full cover all EK
Memory Gskill Trident Z 3900cas18 32Gb in four sticks./16Gb/16GB
Video Card(s) Asus tuf RX7900XT /Rtx 2060
Storage Silicon power 2TB nvme/8Tb external/1Tb samsung Evo nvme 2Tb sata ssd/1Tb nvme
Display(s) Samsung UAE28"850R 4k freesync.dell shiter
Case Lianli 011 dynamic/strix scar2
Audio Device(s) Xfi creative 7.1 on board ,Yamaha dts av setup, corsair void pro headset
Power Supply corsair 1200Hxi/Asus stock
Mouse Roccat Kova/ Logitech G wireless
Keyboard Roccat Aimo 120
VR HMD Oculus rift
Software Win 10 Pro
Benchmark Scores laptop Timespy 6506