• Welcome to TechPowerUp Forums, Guest! Please check out our forum guidelines for info related to our community.

Lack of Async Compute on Maxwell Makes AMD GCN Better Prepared for DirectX 12

Joined
Feb 8, 2012
Messages
3,014 (0.64/day)
Location
Zagreb, Croatia
System Name Windows 10 64-bit Core i7 6700
Processor Intel Core i7 6700
Motherboard Asus Z170M-PLUS
Cooling Corsair AIO
Memory 2 x 8 GB Kingston DDR4 2666
Video Card(s) Gigabyte NVIDIA GeForce GTX 1060 6GB
Storage Western Digital Caviar Blue 1 TB, Seagate Baracuda 1 TB
Display(s) Dell P2414H
Case Corsair Carbide Air 540
Audio Device(s) Realtek HD Audio
Power Supply Corsair TX v2 650W
Mouse Steelseries Sensei
Keyboard CM Storm Quickfire Pro, Cherry MX Reds
Software MS Windows 10 Pro 64-bit
 

FordGT90Concept

"I go fast!1!11!1!"
Joined
Oct 13, 2008
Messages
26,259 (4.43/day)
Location
IA, USA
System Name BY-2021
Processor AMD Ryzen 7 5800X (65w eco profile)
Motherboard MSI B550 Gaming Plus
Cooling Scythe Mugen (rev 5)
Memory 2 x Kingston HyperX DDR4-3200 32 GiB
Video Card(s) AMD Radeon RX 7900 XT
Storage Samsung 980 Pro, Seagate Exos X20 TB 7200 RPM
Display(s) Nixeus NX-EDG274K (3840x2160@144 DP) + Samsung SyncMaster 906BW (1440x900@60 HDMI-DVI)
Case Coolermaster HAF 932 w/ USB 3.0 5.25" bay + USB 3.2 (A+C) 3.5" bay
Audio Device(s) Realtek ALC1150, Micca OriGen+
Power Supply Enermax Platimax 850w
Mouse Nixeus REVEL-X
Keyboard Tesoro Excalibur
Software Windows 10 Home 64-bit
Benchmark Scores Faster than the tortoise; slower than the hare.
We kind of already gathered that, no? Async on AMD cards is executed asynchronously while async on NVIDIA cards is executed synchronously.

Interesting that on both architectures, 100 threads appears to be the sweet spot.
 
Joined
Feb 8, 2012
Messages
3,014 (0.64/day)
Location
Zagreb, Croatia
System Name Windows 10 64-bit Core i7 6700
Processor Intel Core i7 6700
Motherboard Asus Z170M-PLUS
Cooling Corsair AIO
Memory 2 x 8 GB Kingston DDR4 2666
Video Card(s) Gigabyte NVIDIA GeForce GTX 1060 6GB
Storage Western Digital Caviar Blue 1 TB, Seagate Baracuda 1 TB
Display(s) Dell P2414H
Case Corsair Carbide Air 540
Audio Device(s) Realtek HD Audio
Power Supply Corsair TX v2 650W
Mouse Steelseries Sensei
Keyboard CM Storm Quickfire Pro, Cherry MX Reds
Software MS Windows 10 Pro 64-bit
Those nvidia cons managed to have a gpu architecture that is good for dx12 while it is good for dx11 at the same time, in the middle of a transition from dx11 to dx12 ... damn them.
Those bastards may even engineer Pascal completely with dx12 in mind.

But seriously, Maxwell architecture seems to handle async task concurrency between themselves just fine (latencies are in accordance with 32 queue depth)... problem is graphics workload being synchronous against async compute workload - if there is no architectural reason to be that way, this could be solved through a driver update. Troubling thing is, if nvidia knew they could fix it in driver update, they'd be faster with their response. Maybe Jen-Hsun Huang is writing a heartwarming letter.
 

qubit

Overclocked quantum bit
Joined
Dec 6, 2007
Messages
17,865 (2.87/day)
Location
Quantum Well UK
System Name Quantumville™
Processor Intel Core i7-2700K @ 4GHz
Motherboard Asus P8Z68-V PRO/GEN3
Cooling Noctua NH-D14
Memory 16GB (2 x 8GB Corsair Vengeance Black DDR3 PC3-12800 C9 1600MHz)
Video Card(s) MSI RTX 2080 SUPER Gaming X Trio
Storage Samsung 850 Pro 256GB | WD Black 4TB | WD Blue 6TB
Display(s) ASUS ROG Strix XG27UQR (4K, 144Hz, G-SYNC compatible) | Asus MG28UQ (4K, 60Hz, FreeSync compatible)
Case Cooler Master HAF 922
Audio Device(s) Creative Sound Blaster X-Fi Fatal1ty PCIe
Power Supply Corsair AX1600i
Mouse Microsoft Intellimouse Pro - Black Shadow
Keyboard Yes
Software Windows 10 Pro 64-bit
Why am I not surprised at all by this too ?

Nvidia is so dirty like pigs in the mud.

I keep telling you that this company is not good but how many listen to me ?
The world will become a better place when we get rid of nvidia.

Monopoly of AMD (with good hearts) will be better than monopoly of nvidia (who only look how to screw technological progress).
Say whut?! :eek: :laugh: Especially the bold bit. With idiotic statements like that, no wonder you're getting criticized by everyone.
 
Joined
Oct 9, 2009
Messages
716 (0.13/day)
Location
Finland
System Name RGB-PC v2.0
Processor AMD Ryzen 7950X
Motherboard Asus Crosshair X670E Extreme
Cooling Corsair iCUE H150i RGB PRO XT
Memory 4x16GB DDR5-5200 CL36 G.SKILL Trident Z5 NEO RGB
Video Card(s) Asus Strix RTX 2080 Ti
Storage 2x2TB Samsung 980 PRO
Display(s) Acer Nitro XV273K 27" 4K 120Hz (G-SYNC compatible)
Case Lian Li O11 Dynamic EVO
Audio Device(s) Audioquest Dragon Red + Sennheiser HD 650
Power Supply Asus Thor II 1000W + Cablemod ModMesh Pro sleeved cables
Mouse Logitech G500s
Keyboard Corsair K70 RGB with low profile red cherrys
Software Windows 11 Pro 64-bit
What a huge pile of dog turd over something which seems to have been a driver issue, completely expectable with Alpha level implementation of first ever DX12 title. NVIDIA DX12 driver does not seem to yet fully support Async Shaders, although Oxide dev thought it does.

Yeah, AMD technical marketing might not be your best source for info about competitor products. Combine that with a meltdown from a game dev... Then we have some good old fashioned NVIDIA bashing.

http://wccftech.com/nvidia-async-compute-directx-12-oxide-games/

"We actually just chatted with Nvidia about Async Compute, indeed the driver hasn’t fully implemented it yet, but it appeared like it was. We are working closely with them as they fully implement Async Compute.
"
 
Joined
Feb 8, 2012
Messages
3,014 (0.64/day)
Location
Zagreb, Croatia
System Name Windows 10 64-bit Core i7 6700
Processor Intel Core i7 6700
Motherboard Asus Z170M-PLUS
Cooling Corsair AIO
Memory 2 x 8 GB Kingston DDR4 2666
Video Card(s) Gigabyte NVIDIA GeForce GTX 1060 6GB
Storage Western Digital Caviar Blue 1 TB, Seagate Baracuda 1 TB
Display(s) Dell P2414H
Case Corsair Carbide Air 540
Audio Device(s) Realtek HD Audio
Power Supply Corsair TX v2 650W
Mouse Steelseries Sensei
Keyboard CM Storm Quickfire Pro, Cherry MX Reds
Software MS Windows 10 Pro 64-bit

rtwjunkie

PC Gaming Enthusiast
Supporter
Joined
Jul 25, 2008
Messages
14,020 (2.34/day)
Location
Louisiana
Processor Core i9-9900k
Motherboard ASRock Z390 Phantom Gaming 6
Cooling All air: 2x140mm Fractal exhaust; 3x 140mm Cougar Intake; Enermax ETS-T50 Black CPU cooler
Memory 32GB (2x16) Mushkin Redline DDR-4 3200
Video Card(s) ASUS RTX 4070 Ti Super OC 16GB
Storage 1x 1TB MX500 (OS); 2x 6TB WD Black; 1x 2TB MX500; 1x 1TB BX500 SSD; 1x 6TB WD Blue storage (eSATA)
Display(s) Infievo 27" 165Hz @ 2560 x 1440
Case Fractal Design Define R4 Black -windowed
Audio Device(s) Soundblaster Z
Power Supply Seasonic Focus GX-1000 Gold
Mouse Coolermaster Sentinel III (large palm grip!)
Keyboard Logitech G610 Orion mechanical (Cherry Brown switches)
Software Windows 10 Pro 64-bit (Start10 & Fences 3.0 installed)
That comedy team has been running these gags almost 10 years. There's been a few people in other threads that actually think he works for Nvidia, so I figured I would throw this up here:
 
Joined
Feb 8, 2012
Messages
3,014 (0.64/day)
Location
Zagreb, Croatia
System Name Windows 10 64-bit Core i7 6700
Processor Intel Core i7 6700
Motherboard Asus Z170M-PLUS
Cooling Corsair AIO
Memory 2 x 8 GB Kingston DDR4 2666
Video Card(s) Gigabyte NVIDIA GeForce GTX 1060 6GB
Storage Western Digital Caviar Blue 1 TB, Seagate Baracuda 1 TB
Display(s) Dell P2414H
Case Corsair Carbide Air 540
Audio Device(s) Realtek HD Audio
Power Supply Corsair TX v2 650W
Mouse Steelseries Sensei
Keyboard CM Storm Quickfire Pro, Cherry MX Reds
Software MS Windows 10 Pro 64-bit
Joined
Sep 21, 2013
Messages
61 (0.01/day)
System Name Biostar
Processor Intel i5 3470
Motherboard BIOSTAR TZ77A
Cooling Stock Intel Cooler
Memory 8GB G.Skill Ripjaws 1600 DDR3
Video Card(s) EVGA GTX 970 SC ACX 2.0 3.5GB + 512MB of very slow.
Storage 1x Seagate Barracuda 7200.12 500GB / 2x Segate Barracuda 7200.10 250 GB
Display(s) HP 2009m
Case Thermaltake Versa H22
Audio Device(s) Realtek ALC892
Power Supply Thermaltake SP-650P 650Watt.
Mouse Logitech M185
Keyboard Logitech K270
Software Windows 10 Enterprise LTSB 64
Benchmark Scores Why waste my hardware and wear it out for a stupid score similar to everyone else's?
Really, I never knew and actually don't wanna know that this fruit the apple is so divine. :laugh:

Seriously, how would I have known that ? When this is the first time I hear someone speaking like that ?

Because you are supposed to be smart enough to comprehend it. (No offense) But that's how life works.
 
Joined
Sep 29, 2011
Messages
217 (0.04/day)
Location
Ottawa, Canada
System Name Current Rig
Processor Intel 12700K@5.1GHz
Motherboard MSI Pro Z790-P
Cooling Arctic Cooling Liquid Freezer II 360mm
Memory 2x16GB DDR5-6000 G.Skill Trident Z RGB
Video Card(s) MSI Gaming X Trio 6800 16GB
Storage 1TB SSD
Case Cooler Master Storm Striker
Power Supply Antec True Power 750w
Keyboard IBM Model 'M"
No support, no problem, just pay them to gimp it on AMD cards: welcome to the wonderful world of the nVidia console, sorry, pc gaming, the way it's meant to be paid.

More like the way WE'RE meant to be played.
 
Joined
Sep 22, 2012
Messages
1,010 (0.23/day)
Location
Belgrade, Serbia
System Name Intel® X99 Wellsburg
Processor Intel® Core™ i7-5820K - 4.5GHz
Motherboard ASUS Rampage V E10 (1801)
Cooling EK RGB Monoblock + EK XRES D5 Revo Glass PWM
Memory CMD16GX4M4A2666C15
Video Card(s) ASUS GTX1080Ti Poseidon
Storage Samsung 970 EVO PLUS 1TB /850 EVO 1TB / WD Black 2TB
Display(s) Samsung P2450H
Case Lian Li PC-O11 WXC
Audio Device(s) CREATIVE Sound Blaster ZxR
Power Supply EVGA 1200 P2 Platinum
Mouse Logitech G900 / SS QCK
Keyboard Deck 87 Francium Pro
Software Windows 10 Pro x64
I don't see single DX12 game, only some calculation for possible scenario.
Until 10 DX12 games show on market Pascal will be old more than year.
Because of that I don't see reason for panic, if some card support DX12 that's not same as capable to offer playable fps.
I remember when 5870 with 2GB show up I bought immediately as first card with DX11 support.
Card was excellent but for DX9 and few DX10 environment, but first playable fps and much better with tessellation and DX11 was GTX580.
I changed and ATI5870 and ATI6970 but only with GTX580 situation become really better and with Tahiti later. Period between ATI 5870 and AMD 7970 AMD didn't improve nothing on DX11 field and people who wait and upgrade on GTX580 played much better, until HD7950/HD7970.
Because of that no reason to panic, NVIDIA will be ready when time come...
Only one other thing is bad and that's tendency to NVIDIA write driver only for last architecture.
If they continue to do that people will turn them back. At least middle segment.
That's much bigger reason for worry than Maxwell and DX12. We will not play nice DX12 games at least 2 years.
Maybe some very rich people with multi GPU. But I talk for people who play games as on beginning with single powerful graphic.
 
Last edited:
Joined
Feb 18, 2011
Messages
1,259 (0.25/day)
Only one other thing is bad and that's tendency to NVIDIA write driver only for last architecture.
If they continue to do that people will turn them back. At least middle segment.
That's much bigger reason for worry than Maxwell and DX12. We will not play nice DX12 games at least 2 years.
Maybe some very rich people with multi GPU. But I talk for people who play games as on beginning with single powerful graphic.
Nvidia has very good drivers for older generations, even Fermi cards run recent games quite nicely, one just needs to reduce some settings, but that's always the case as time goes by, you get a new card or reduce some settings. I recently helped somebody who has a 560ti + a 2500k (he bought those from me), and most of the games still look and run quite nicely with "medium" settings, and some of them even runs fine with "high". I can't imagine my Maxwell2 would need a replacement any time soon because of performance problems, if I will replace it, that will only happen because I won't be able to resist the upgrade itch again.
 
Joined
Nov 3, 2011
Messages
697 (0.15/day)
Location
Australia
System Name Eula
Processor AMD Ryzen 9 7900X PBO
Motherboard ASUS TUF Gaming X670E Plus Wifi
Cooling Corsair H150i Elite LCD XT White
Memory Trident Z5 Neo RGB DDR5-6000 64GB (4x16GB F5-6000J3038F16GX2-TZ5NR) EXPO II, OCCT Tested
Video Card(s) Gigabyte GeForce RTX 4080 GAMING OC
Storage Corsair MP600 XT NVMe 2TB, Samsung 980 Pro NVMe 2TB, Toshiba N300 10TB HDD, Seagate Ironwolf 4T HDD
Display(s) Acer Predator X32FP 32in 160Hz 4K FreeSync/GSync DP, LG 32UL950 32in 4K HDR FreeSync/G-Sync DP
Case Phanteks Eclipse P500A D-RGB White
Audio Device(s) Creative Sound Blaster Z
Power Supply Corsair HX1000 Platinum 1000W
Mouse SteelSeries Prime Pro Gaming Mouse
Keyboard SteelSeries Apex 5
Software MS Windows 11 Pro
That is a claim presented at the beginning of the article. Through the end, if you read it, it is proven in benchmark that it is not true (number of queues horizontally and time spent computing vertically - lower is better)
View attachment 67772
Maxwell is faster than GCN up to 32 queues, and it evens out with GCN to 128 queues, where GCN has same speed up to 128 queues.
It's also shown that with async shaders it's extremely important how they are compiled for each architecture.
Good find @RejZoR

From https://forum.beyond3d.com/posts/1870374/



For pure compute, AMD's compute latency (green color areas) rivals NVIDIA's compute latency (refer to the attached file).

http://www.overclock.net/t/1569897/...ingularity-dx12-benchmarks/1710#post_24368195

Here's what I think they did at Beyond3D:
  1. They set the amount of threads, per kernel, to 32 (they're CUDA programmers after-all).
  2. They've bumped the Kernel count to up to 512 (16,384 Threads total).
  3. They're scratching their heads wondering why the results don't make sense when comparing GCN to Maxwell 2

Here's why that's not how you code for GCN


Why?:
  1. Each CU can have 40 Kernels in flight (each made up of 64 threads to form a single Wavefront).
  2. That's 2,560 Threads total PER CU.
  3. An R9 290x has 44 CUs or the capacity to handle 112,640 Threads total.

If you load up GCN with Kernels made up of 32 Threads you're wasting resources. If you're not pushing GCN you're wasting compute potential. In slide number 4, it stipulates that latency is hidden by executing overlapping wavefronts. This is why GCN appears to have a high degree of latency but you can execute a ton of work on GCN without affected the latency. With Maxwell/2, latency rises up like a staircase with the more work you throw at it. I'm not sure if the folks at Beyond3D are aware of this or not.


Conclusion:

I think they geared this test towards nVIDIAs CUDA architectures and are wondering why their results don't make sense on GCN. If true... DERP! That's why I said the single Latency results don't matter. This test is only good if you're checking on Async functionality.


GCN was built for Parallelism, not serial workloads like nVIDIAs architectures. This is why you don't see GCN taking a hit with 512 Kernels.

What did Oxide do? They built two paths. One with Shaders Optimized for CUDA and the other with Shaders Optimized for GCN. On top of that GCN has Async working. Therefore it is not hard to determine why GCN performs so well in Oxide's engine. It's a better architecture if you push it and code for it. If you're only using light compute work, nVIDIAs architectures will be superior.

This means that the burden is on developers to ensure they're optimizing for both. In the past, this hasn't been the case. Going forward... I hope they do. As for GameWorks titles, don't count them being optimized for GCN. That's a given. Oxide played fair, others... might not.
 

Attachments

  • 64.png
    64.png
    26.2 KB · Views: 437
Last edited:
Joined
Feb 8, 2012
Messages
3,014 (0.64/day)
Location
Zagreb, Croatia
System Name Windows 10 64-bit Core i7 6700
Processor Intel Core i7 6700
Motherboard Asus Z170M-PLUS
Cooling Corsair AIO
Memory 2 x 8 GB Kingston DDR4 2666
Video Card(s) Gigabyte NVIDIA GeForce GTX 1060 6GB
Storage Western Digital Caviar Blue 1 TB, Seagate Baracuda 1 TB
Display(s) Dell P2414H
Case Corsair Carbide Air 540
Audio Device(s) Realtek HD Audio
Power Supply Corsair TX v2 650W
Mouse Steelseries Sensei
Keyboard CM Storm Quickfire Pro, Cherry MX Reds
Software MS Windows 10 Pro 64-bit
This test is only good if you're checking on Async functionality.
That's exactly what the test is for ... checking on how much latency Async functionality introduces on both architectures.
GCN has a constant latency, good enough for compute loads made of small number of async tasks and great for huge number of async tasks. Additionaly GCN mixes compute async load and graphics load in near perfect parallelism.
Maxwell shows varying latency that is extremely low for small number of async tasks and surpasses GCN over 128 async tasks. What's really bad is that in current drivers async compute load and graphics load are done serially.
Mind you, every single async compute task is parallel in itself and can occupy 100% of GPU if the job is suitable (parallelizable), so in most cases penalty boils down in how many times and how much context switching is done. Maxwell has nice cache hierarchy to help with that.
GCN should destroy Maxwell in special cases where huge number of async tasks depend on results calculated by huge number of other async tasks that are greatly varying in computational complexity ;)
 
Joined
Apr 30, 2012
Messages
3,881 (0.84/day)
Gears of War Ultimate Will Have Unlocked Frame Rate; Devs Explain How They’re Using DX12 & Async Compute

To begin with, Cam McRae (Technical Director for the Windows 10 PC version) explained how they’re going to use DirectX 12 and even Async Compute in Gears of War Ultimate.

We are still hard at work optimising the game. DirectX 12 allows us much better control over the CPU load with heavily reduced driver overhead. Some of the overhead has been moved to the game where we can have control over it. Our main effort is in parallelising the rendering system to take advantage of multiple CPU cores. Command list creation and D3D resource creation are the big focus here. We’re also pulling in optimisations from UE4 where possible, such as pipeline state object caching. On the GPU side, we’ve converted SSAO to make use of async compute and are exploring the same for other features, like MSAA.
 

rtwjunkie

PC Gaming Enthusiast
Supporter
Joined
Jul 25, 2008
Messages
14,020 (2.34/day)
Location
Louisiana
Processor Core i9-9900k
Motherboard ASRock Z390 Phantom Gaming 6
Cooling All air: 2x140mm Fractal exhaust; 3x 140mm Cougar Intake; Enermax ETS-T50 Black CPU cooler
Memory 32GB (2x16) Mushkin Redline DDR-4 3200
Video Card(s) ASUS RTX 4070 Ti Super OC 16GB
Storage 1x 1TB MX500 (OS); 2x 6TB WD Black; 1x 2TB MX500; 1x 1TB BX500 SSD; 1x 6TB WD Blue storage (eSATA)
Display(s) Infievo 27" 165Hz @ 2560 x 1440
Case Fractal Design Define R4 Black -windowed
Audio Device(s) Soundblaster Z
Power Supply Seasonic Focus GX-1000 Gold
Mouse Coolermaster Sentinel III (large palm grip!)
Keyboard Logitech G610 Orion mechanical (Cherry Brown switches)
Software Windows 10 Pro 64-bit (Start10 & Fences 3.0 installed)

FordGT90Concept

"I go fast!1!11!1!"
Joined
Oct 13, 2008
Messages
26,259 (4.43/day)
Location
IA, USA
System Name BY-2021
Processor AMD Ryzen 7 5800X (65w eco profile)
Motherboard MSI B550 Gaming Plus
Cooling Scythe Mugen (rev 5)
Memory 2 x Kingston HyperX DDR4-3200 32 GiB
Video Card(s) AMD Radeon RX 7900 XT
Storage Samsung 980 Pro, Seagate Exos X20 TB 7200 RPM
Display(s) Nixeus NX-EDG274K (3840x2160@144 DP) + Samsung SyncMaster 906BW (1440x900@60 HDMI-DVI)
Case Coolermaster HAF 932 w/ USB 3.0 5.25" bay + USB 3.2 (A+C) 3.5" bay
Audio Device(s) Realtek ALC1150, Micca OriGen+
Power Supply Enermax Platimax 850w
Mouse Nixeus REVEL-X
Keyboard Tesoro Excalibur
Software Windows 10 Home 64-bit
Benchmark Scores Faster than the tortoise; slower than the hare.
I'm sure they did. They did that to the Windows version of Minecraft already. Of course there's no technical reason for doing so.
 
Last edited:
Top