AMD Bulldozer Threading Hotfix Pulled

cadaveca

My name is Dave
Joined
Apr 10, 2006
Messages
17,232 (2.53/day)
LuLz. The problem is the shared L2 of a module is not fast enough to feed two threads. /end story. There's no big mystery as to why BD is slow. I knew it before the CPU was out.
 
Joined
Jul 10, 2010
Messages
1,233 (0.23/day)
Location
USA, Arizona
System Name SolarwindMobile
Processor AMD FX-9800P RADEON R7, 12 COMPUTE CORES 4C+8G
Motherboard Acer Wasp_BR
Cooling It's Copper.
Memory 2 x 8GB SK Hynix/HMA41GS6AFR8N-TF
Video Card(s) ATI/AMD Radeon R7 Series (Bristol Ridge FP4) [ACER]
Storage TOSHIBA MQ01ABD100 1TB + KINGSTON RBU-SNS8152S3128GG2 128 GB
Display(s) ViewSonic XG2401 SERIES
Case Acer Aspire E5-553G
Audio Device(s) Realtek ALC255
Power Supply PANASONIC AS16A5K
Mouse SteelSeries Rival
Keyboard Ducky Channel Shine 3
Software Windows 10 Home 64-bit (Version 1607, Build 14393.969)
LuLz. The problem is the shared L2 of a module is not fast enough to feed two threads. /end story. There's no big mystery as to why BD is slow. I knew it before the CPU was out.

The L2 doesn't feed the two cores. It stores results from the Floating Point Unit and provides instructions to the L1i. There is no problem there. It could be improved, but I wouldn't fix what isn't broken.
 

cadaveca

My name is Dave
Joined
Apr 10, 2006
Messages
17,232 (2.53/day)
OK. Please explain to me how, then, moving two threads from running on two cores within a module, to one thread per module, is faster, on workloads that don't require that the shared resources are used exclusively per thread?


In an Intel CPU, cache gets slower going from L1 to L2 to L3. Then RAM is a bit slower yet. The speed differences are offset by having a larger data store.

In Bulldozer, the L2 cache is a fraction of the speed of both the L1 and L3. Why? What benefit does this serve?
 
Joined
Jul 10, 2010
Messages
1,233 (0.23/day)
OK. Please explain to me how, then, moving two threads from running on two cores within a module, to one thread per module, is faster?

In what benchmark?

If I had to make a guess without knowing the benchmark, it would be the dispatch, not the L2.
 

qubit

Overclocked quantum bit
Joined
Dec 6, 2007
Messages
17,865 (2.88/day)
Location
Quantum Well UK
System Name Quantumville™
Processor Intel Core i7-2700K @ 4GHz
Motherboard Asus P8Z68-V PRO/GEN3
Cooling Noctua NH-D14
Memory 16GB (2 x 8GB Corsair Vengeance Black DDR3 PC3-12800 C9 1600MHz)
Video Card(s) MSI RTX 2080 SUPER Gaming X Trio
Storage Samsung 850 Pro 256GB | WD Black 4TB | WD Blue 6TB
Display(s) ASUS ROG Strix XG27UQR (4K, 144Hz, G-SYNC compatible) | Asus MG28UQ (4K, 60Hz, FreeSync compatible)
Case Cooler Master HAF 922
Audio Device(s) Creative Sound Blaster X-Fi Fatal1ty PCIe
Power Supply Corsair AX1600i
Mouse Microsoft Intellimouse Pro - Black Shadow
Keyboard Yes
Software Windows 10 Pro 64-bit
LuLz. The problem is the shared L2 of a module is not fast enough to feed two threads. /end story. There's no big mystery as to why BD is slow. I knew it before the CPU was out.

Isn't it also slow because there are only 4 FPUs in the 8-core model? I was aghast when I first saw this.
 

cadaveca

My name is Dave
Joined
Apr 10, 2006
Messages
17,232 (2.53/day)
Isn't it also slow because there are only 4 FPUs in the 8-core model? I was aghast when I first saw this.

There are not just 4 FPUs. There are 4 256-bit FPUs, which can each handle dual 128-bit operations. Nearly nothing currently uses the 256-bit capability.
 
Joined
Jul 10, 2010
Messages
1,233 (0.23/day)
Isn't it also slow because there are only 4 FPUs in the 8-core model? I was aghast when I first saw this.

It's 4 Floating Point Units, but you have two units for each core. If it was 256-bit units you wouldn't be complaining.
 

qubit

Overclocked quantum bit
Joined
Dec 6, 2007
Messages
17,865 (2.88/day)
There are not just 4 FPUs. There are 4 256-bit FPUs, which can each handle dual 128-bit operations. Nearly nothing currently uses the 256-bit capability.

It's 4 Floating Point Units, but you have two units for each core. If it was 256-bit units you wouldn't be complaining.

In other words, it's four double-width FPUs, making it equivalent to 8 single-width ones? And each can be logically split into two? If so, that would make it just fine, yes.

Now I think about it, what is the word size of an FPU on previous 64-bit processors? Should be the same for AMD and Intel, I'd expect. (Yes, I know I could google it, but I'd rather you guys just explain it to me. :p )
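
The FPU counting in this exchange can be written out as a few lines of arithmetic. This is only a sketch of the numbers quoted in the thread (4 modules, 2 cores per module, one 256-bit FPU per module that splits into 128-bit halves); the class and field names are made up for illustration and say nothing about real-world throughput.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FpuLayout:
    # Figures taken from the posts above; names are hypothetical.
    modules: int = 4            # 8-core part = 4 modules
    cores_per_module: int = 2
    fpu_width_bits: int = 256   # one shared FPU per module
    pipe_width_bits: int = 128  # each FPU can split into 128-bit operations

    @property
    def pipes_per_fpu(self) -> int:
        # How many 128-bit operations one 256-bit FPU can handle at once.
        return self.fpu_width_bits // self.pipe_width_bits

    @property
    def total_128bit_pipes(self) -> int:
        return self.modules * self.pipes_per_fpu

    @property
    def pipes_per_core(self) -> float:
        total_cores = self.modules * self.cores_per_module
        return self.total_128bit_pipes / total_cores


layout = FpuLayout()
print(layout.total_128bit_pipes)  # 8
print(layout.pipes_per_core)      # 1.0
```

So on paper the "4 FPUs" work out to one 128-bit pipe per core, as long as nothing is using the full 256-bit width.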
 
Joined
Jul 10, 2010
Messages
1,233 (0.23/day)
Now I think about it, what is the word size of an FPU on previous 64-bit processors? Should be the same for AMD and Intel, I'd expect. (Yes, I know I could google it, but I'd rather you guys just explain it to me. :p )

Mostly 128 bits.

ARM is just getting 128-bit SIMD with the Cortex-A15.
 

qubit

Overclocked quantum bit
Joined
Dec 6, 2007
Messages
17,865 (2.88/day)

cadaveca

My name is Dave
Joined
Apr 10, 2006
Messages
17,232 (2.53/day)
That's how AMD explains it.


The whole thing about a single scheduler revolves around the FP scheduler being shared for the separate 128-bit "pipes" (as is plain in the image I posted above), but it seems to me that even workloads that don't have any floating point, and are integer based, benefit from moving dual threads to individual cores.

And to me, the figure of 10% performance increases seems to fit with the L2 cache being slow, rather than with the FP scheduler not being wide enough.
 

qubit

Overclocked quantum bit
Joined
Dec 6, 2007
Messages
17,865 (2.88/day)
It sounds like the damned thing just wants some hand-tuned optimisation, doesn't it? Perhaps that would make it fly? We really need Intel to be lifted bodily out of its comfort zone.
 

cadaveca

My name is Dave
Joined
Apr 10, 2006
Messages
17,232 (2.53/day)
As far as I can tell, it's more than having workloads balanced.


Now, there's a difference between Windows 8 and Windows 7 in how workloads are managed on a CPU, due to Windows 8 allowing what is called "core parking". This is basically fully shutting off a core when it's not in use, for power savings. Naturally, such control needs to be finely tuned so that threads do not stall, and bringing similar functionality to Windows 7 is what this patch is supposed to be all about. The ability to dynamically move threads from one core to the next without stalling the thread is not really a big thing, and if it really was an issue with the FP scheduler, there'd be much more than just a 10% boost possible...sometimes it would be a doubling of speed.

That said, no, I do not think there is any "saving grace" for BD in this. I really feel the L2 cache is too slow, and the numbers seem to agree. When someone can tell us why the L2 cache seems to be slow, it might be more clear why BD "sucks".

Price the 8150 @ $200, and it's killer. There's really nothing wrong with BD's design. The only thing that makes it look wrong is the pricing, and that's because everyone considers BD to compete with SB(rightly so).
 
Joined
Jul 10, 2010
Messages
1,233 (0.23/day)
When someone can tell us why the L2 cache seems to be slow, it might be more clear why BD "sucks".

The L2 has to handle writes from two L1Ds, and that is handled by the WCC unit (the WCC can combine 4 x 8192Kb or send 4KB to the L2, write-through).

The problem with multithreading, once again, won't be there.

The memory subsystem isn't really the problem, other than the L3, and the L3 problem only starts when more than one module is being used.
 
Joined
Oct 30, 2008
Messages
1,768 (0.30/day)
System Name Lailalo
Processor Ryzen 9 5900X Boosts to 4.95Ghz
Motherboard Asus TUF Gaming X570-Plus (WIFI
Cooling Noctua
Memory 32GB DDR4 3200 Corsair Vengeance
Video Card(s) XFX 7900XT 20GB
Storage Samsung 970 Pro Plus 1TB, Crucial 1TB MX500 SSD, Segate 3TB
Display(s) LG Ultrawide 29in @ 2560x1080
Case Coolermaster Storm Sniper
Power Supply XPG 1000W
Mouse G602
Keyboard G510s
Software Windows 10 Pro / Windows 10 Home
We all know AMD's in Microsoft's pocket. They will do whatever they can software/driver-wise to help them.

Because Windows is clearly optimized for AMD :rolleyes:

M$ may like AMD some because they are tired of Intel's monopoly, but ultimately they make more money thanks to Intel than not. Money talks. But the patch for BD was a given. There were features of it that were not being implemented right now. Or at least, not as well as they could. M$ would do the same for Intel if they came out with a new tech.

They aren't directly in M$'s pocket. More likely they are in Intel's pocket because without them, Intel faces antitrust.
 

cadaveca

My name is Dave
Joined
Apr 10, 2006
Messages
17,232 (2.53/day)
The L2 has to handle writes from two L1Ds, and that is handled by the WCC unit (the WCC can combine 4 x 8192Kb or send 4KB to the L2, write-through).

The problem with multithreading, once again, won't be there.

The memory subsystem isn't really the problem, other than the L3, and the L3 problem only starts when more than one module is being used.

L3 is shared between ALL cores. The problem isn't multithreaded workloads. The problem with BD is that single-threaded performance is lower than even Thuban. The most obvious difference, to me, between the two is cache design and speed.

Nobody cares about BD's multi-threaded performance. I'm not sure we're on the same topic here.
 
Joined
Jul 10, 2010
Messages
1,233 (0.23/day)
L3 is shared between ALL cores. The problem isn't multithreaded workloads. The problem with BD is that single-threaded performance is lower than even Thuban. The most obvious difference, to me, between the two is cache design and speed.

Nobody cares about BD's multi-threaded performance. I'm not sure we're on the same topic here.

Cache Design isn't at fault and Speed isn't at fault

Single Threaded performance -> Dispatch
Scroll up, I said dispatch already

Dispatch is shared between the two cores and the shared FP. It is divided into 4 dispatches per clock (2 macro-ops per unit: Core A, Core B, FPU x 2), unless you disable a cluster, in which case it will be 2 dispatches per clock (Core A x 2, FPU x 2). That means 4 macro-ops to Core A and 4 macro-ops to the FPU, and the FPU will only need to use Core A's resources, making a ~17-stage pipeline effectively a ~14-stage pipeline. (Each core only needs 2 macro-ops, and to complete core commands the FPU only really needs 4 macro-ops; the decoder can do 8 macro-ops.)
 
Joined
Jan 2, 2009
Messages
9,899 (1.70/day)
Location
Essex, England
System Name My pc
Processor Ryzen 5 3600
Motherboard Asus Rog b450-f
Cooling Cooler master 120mm aio
Memory 16gb ddr4 3200mhz
Video Card(s) MSI Ventus 3x 3070
Storage 2tb intel nvme and 2tb generic ssd
Display(s) Generic dell 1080p overclocked to 75hz
Case Phanteks enthoo
Power Supply 650w of borderline fire hazard
Mouse Some wierd Chinese vertical mouse
Keyboard Generic mechanical keyboard
Software Windows ten
OK. Please explain to me how, then, moving two threads from running on two cores within a module, to one thread per module, is faster, on workloads that don't require that the shared resources are used exclusively per thread?



On my system, running a 4-threaded program on 4 cores (2 modules) is the same speed as running a 4-threaded program on 4 cores (4 modules).

On Cinebench anyway; not sure about anything else, as I've not tested it.

But Cinebench should be a program that would highlight this, right?
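
For anyone wanting to repeat the 4 cores / 2 modules versus 4 cores / 4 modules comparison, the two layouts come down to two different CPU affinity masks. The sketch below only computes the masks. It assumes logical CPUs 0 and 1 belong to module 0, CPUs 2 and 3 to module 1, and so on; that numbering is an assumption, so check it on your own system. The function names are made up for illustration.

```python
def affinity_mask(cpus):
    """Turn a list of logical CPU numbers into a bitmask (bit n = CPU n)."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask


def packed_cpus(threads, modules=4, cores_per_module=2):
    """Fill both cores of a module before using the next module."""
    if threads > modules * cores_per_module:
        raise ValueError("more threads than cores")
    return list(range(threads))


def spread_cpus(threads, modules=4, cores_per_module=2):
    """Use the first core of every module before any second core."""
    if threads > modules * cores_per_module:
        raise ValueError("more threads than cores")
    order = [
        module * cores_per_module + core
        for core in range(cores_per_module)
        for module in range(modules)
    ]
    return order[:threads]


# 4 threads on 2 modules (4C2M) versus 4 threads on 4 modules (4C4M).
print(hex(affinity_mask(packed_cpus(4))))  # 0xf
print(hex(affinity_mask(spread_cpus(4))))  # 0x55
```

Masks like these are the kind of value that affinity tools take as input (for example the hex argument to `start /affinity` in the Windows command prompt), so the same benchmark can be run under both layouts and compared.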
 
Joined
Mar 24, 2011
Messages
2,356 (0.47/day)
Location
VT
Processor Intel i7-10700k
Motherboard Gigabyte Aurorus Ultra z490
Cooling Corsair H100i RGB
Memory 32GB (4x8GB) Corsair Vengeance DDR4-3200MHz
Video Card(s) MSI Gaming Trio X 3070 LHR
Display(s) ASUS MG278Q / AOC G2590FX
Case Corsair X4000 iCue
Audio Device(s) Onboard
Power Supply Corsair RM650x 650W Fully Modular
Software Windows 10
I choose to believe shared resources AND slow L2 cache are problems.
 
Joined
Jan 20, 2010
Messages
868 (0.16/day)
Location
Toronto, ON. Canada
System Name Gamers PC
Processor AMD Phenom II X4 965 BE @ 3.80 GHz
Motherboard MSI 790FX-GD70 AM3
Cooling Corsair H50 Cooler
Memory Corsair XMS3 4GB (2x2GB) DDR3-1333
Video Card(s) XFX Radeon HD 5770 1GB GDDR5
Storage 2 x WD Caviar Green 1TB SATA300 w/64MB Buffer (RAID 0)
Display(s) Samsung 2494SW 1080p 24" WS LCD HD
Case CM HAF 932 Full Tower Case
Audio Device(s) Creative SB X-FI TITANIUM -PCIE x 1
Power Supply Corsair TX Series CMPSU-650TX (650W)
Software Windows 7 Ultimate 64-bit
There are 3 problems with Bulldozer, in order. If AMD can fix these in time for Piledriver, then they would be able to compete with Intel much better.

1 - It lacks hand-tuned optimisation (somebody already mentioned this)
2 - Dispatch unit needs major tweaking
3 - L1 and L2 cache need a speed boost.

On my system, running a 4-threaded program on 4 cores (2 modules) is the same speed as running a 4-threaded program on 4 cores (4 modules).

On Cinebench anyway; not sure about anything else, as I've not tested it.

But Cinebench should be a program that would highlight this, right?

Something's wrong there :confused: I've seen tests showing that a 4C4M setup beats a 4C2M setup in almost all tests done. And the higher you scale the CPU clock, the better 4C4M becomes versus 4C2M. This sharing within the Bulldozer design needs some real fine tuning IMO.
 

EastCoasthandle

New Member
Joined
Apr 21, 2005
Messages
6,885 (0.96/day)
System Name MY PC
Processor E8400 @ 3.80Ghz > Q9650 3.60Ghz
Motherboard Maximus Formula
Cooling D5, 7/16" ID Tubing, Maze4 with Fuzion CPU WB
Memory XMS 8500C5D @ 1066MHz
Video Card(s) HD 2900 XT 858/900 to 4870 to 5870 (Keep Vreg area clean)
Storage 2
Display(s) 24"
Case P180
Audio Device(s) X-fi Plantinum
Power Supply Silencer 750
Software XP Pro SP3 to Windows 7
Benchmark Scores This varies from one driver to another.
This is exciting stuff. Can't wait to see what kind of performance people can expect once the patch is released.
 
Joined
Jan 2, 2009
Messages
9,899 (1.70/day)
There are 3 problems with Bulldozer, in order. If AMD can fix these in time for Piledriver, then they would be able to compete with Intel much better.

1 - It lacks hand-tuned optimisation (somebody already mentioned this)
2 - Dispatch unit needs major tweaking
3 - L1 and L2 cache need a speed boost.

Something's wrong there :confused: I've seen tests showing that a 4C4M setup beats a 4C2M setup in almost all tests done. And the higher you scale the CPU clock, the better 4C4M becomes versus 4C2M. This sharing within the Bulldozer design needs some real fine tuning IMO.

What bios and boards were used?
 
Joined
Feb 19, 2007
Messages
12,453 (1.92/day)
Location
Yankee lost in the Mountains of East TN
Processor 5800x(2)/5700g/5600x/5600g/2700x/1700x/1700
Motherboard MSI B550 Carbon (2)/ MSI z490 Unify/Asus Strix B550-F/MSI B450 Tomahawk (3)
Cooling EK AIO 360 (2)/EK AIO 240, Arctic Cooling Freezer II 280/EVGA CLC 280/Noctua D15/Cryorig M9(2)
Memory 32 GB Ballistix Elite/32 GB TridentZ/16GB Mushkin Redline Black/16 GB Dominator
Video Card(s) Asus Strix RTX3060/EVGA 970(2)/Asus 750 ti/Old Quadros
Storage Samsung 970 EVO M.2 NVMe 500GB/WD Black M.2 NVMe 500GB/Adata 500gb NVMe
Display(s) Acer 1080p 22"/ (3) Samsung 22" 1080p
Case (2) Lian Li Lancool II Mesh/Corsair 4000D /Phanteks Eclipse 500a/Be Quiet Pure Base 500/Bones of HAF
Power Supply EVGA Supernova 850G(2)/EVGA Supernova GT 650w/Phantek Amps 750w/Seasonic Focus 750w
Mouse Generic Black wireless (5)
Keyboard Generic Black wireless (5)
Software Win 10/Ubuntu
As far as I can tell, it's more than having workloads balanced.


Now, there's a difference between Windows 8 and Windows 7 in how workloads are managed on a CPU, due to Windows 8 allowing what is called "core parking". This is basically fully shutting off a core when it's not in use, for power savings. Naturally, such control needs to be finely tuned so that threads do not stall, and bringing similar functionality to Windows 7 is what this patch is supposed to be all about. The ability to dynamically move threads from one core to the next without stalling the thread is not really a big thing, and if it really was an issue with the FP scheduler, there'd be much more than just a 10% boost possible...sometimes it would be a doubling of speed.

That said, no, I do not think there is any "saving grace" for BD in this. I really feel the L2 cache is too slow, and the numbers seem to agree. When someone can tell us why the L2 cache seems to be slow, it might be more clear why BD "sucks".

Price the 8150 @ $200, and it's killer. There's really nothing wrong with BD's design. The only thing that makes it look wrong is the pricing, and that's because everyone considers BD to compete with SB(rightly so).

Agreed. At the end of the day, through all of the technical jargon, this is all that matters to the majority of users. The pricing just doesn't parallel its performance. If AMD adjusts this, and makes it clear that the chip is not designed to really compete with SB, then the chip goes from POS to a good budget, mid-level enthusiast chip. I paid $195 for mine, and for that (even though I haven't received it yet), it feels like a bargain based on benchmarks.
 
Joined
Mar 10, 2010
Messages
11,878 (2.21/day)
Location
Manchester uk
System Name RyzenGtEvo/ Asus strix scar II
Processor Amd R5 5900X/ Intel 8750H
Motherboard Crosshair hero8 impact/Asus
Cooling 360EK extreme rad+ 360$EK slim all push, cpu ek suprim Gpu full cover all EK
Memory Corsair Vengeance Rgb pro 3600cas14 16Gb in four sticks./16Gb/16GB
Video Card(s) Powercolour RX7900XT Reference/Rtx 2060
Storage Silicon power 2TB nvme/8Tb external/1Tb samsung Evo nvme 2Tb sata ssd/1Tb nvme
Display(s) Samsung UAE28"850R 4k freesync.dell shiter
Case Lianli 011 dynamic/strix scar2
Audio Device(s) Xfi creative 7.1 on board ,Yamaha dts av setup, corsair void pro headset
Power Supply corsair 1200Hxi/Asus stock
Mouse Roccat Kova/ Logitech G wireless
Keyboard Roccat Aimo 120
VR HMD Oculus rift
Software Win 10 Pro
Benchmark Scores 8726 vega 3dmark timespy/ laptop Timespy 6506
IMHO the scheduling patch may well fix it. As for the issues with its L2 cache, it's quite simple: if one program runs 2 threads on different modules and one thread requires data from the other to proceed, there's a halt while the data is pulled from one module's L2 to the other, slowing things down. In this instance they would be better scheduled on the same module.

However, if one program runs two threads that don't share data, or two programs run a thread each, then the scheduler needs to run one thread per module to optimise performance. None of this is presently being done correctly by Windows, hence lower performance and higher heat and watts, so a patch should reap rewards if it works right.
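
The placement rule described in this post can be sketched as a toy function: threads that share data go on the same module, independent threads each get their own module where possible. This is only an illustration of the idea, not how the Windows scheduler or the hotfix works; the function name, the `groups` input format and the 4-module, 2-cores-per-module layout are all assumptions.

```python
def place_threads(groups, modules=4, cores_per_module=2):
    """Assign threads to (module, core) slots.

    groups: list of lists of thread names; threads in the same inner
    list are assumed to share data. Returns {thread: (module, core)}.
    """
    # Free core indices left in each module.
    free = [list(range(cores_per_module)) for _ in range(modules)]
    placement = {}

    # Place the biggest sharing groups first so they can sit together.
    for group in sorted(groups, key=len, reverse=True):
        remaining = list(group)
        while remaining:
            # Pick the module with the most free cores (an empty one if any).
            module = max(range(modules), key=lambda m: len(free[m]))
            if not free[module]:
                raise ValueError("more threads than cores")
            # Keep threads of one group on this module while cores remain.
            while remaining and free[module]:
                placement[remaining.pop(0)] = (module, free[module].pop(0))
    return placement


# "a" and "b" share data; "c" and "d" are independent.
print(place_threads([["a", "b"], ["c"], ["d"]]))
# {'a': (0, 0), 'b': (0, 1), 'c': (1, 0), 'd': (2, 0)}
```

With that input, the two sharing threads land on module 0 and the independent ones get a module each, which is the behaviour the post is asking the scheduler for.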
 