
delete this because less than half the people around here understand it or even want to

will AMD outperform NV while using DX12?


you no doubt have a better understanding of it. to me it's just low-level optimization at this point.
ok.. nv has had lower api overhead in dx11, leaving less untapped performance, but with dx12 not only will both be better, amd's gpus are simply capable of more when fully utilized.
Did AMD pay you for that last sentence?
 
Did AMD pay you for that last sentence?
it would have been better to say the cpu has been the master and the gpu the slave.. that has never been less true than with the next-gen apis. your cpu and gpu are more like equal partners than ever before. even if nvidia is able to cut overhead to amd's level, it's just not enough..
what are you going to buy if you have a heavy gpu-accelerated workflow? well, at the top it's probably going to be a complete intel xeon setup, followed by quadro and firepro.
amd gaming gpus hold a spot over nv gaming gpus for being able to do both, except that pushing cuda has made sales.
 
it would have been better to say the cpu has been the master and the gpu the slave.. that has never been less true than with the next-gen apis. your cpu and gpu are more like equal partners than ever before. even if nvidia is able to cut overhead to amd's level, it's just not enough..
what are you going to buy if you have a heavy gpu-accelerated workflow? well, at the top it's probably going to be a complete intel xeon setup, followed by quadro and firepro.
amd gaming gpus hold a spot over nv gaming gpus for being able to do both, except that pushing cuda has made sales.
Wow................
 
it would have been better to say the cpu has been the master and the gpu the slave.. that has never been less true than with the next-gen apis.
You mean the APIs we haven't really seen in action yet except for Mantle? Sounds like optimization still lies with the engine.
your cpu and gpu are more like equal partners than ever before.
Like HSA? DX12 doesn't really do anything about this.
even if nvidia is able to cut overhead to amd's level, it's just not enough..
...and what is that supposed to mean? Reducing overhead isn't going to result in some crazy amount of performance improvement.
what are you going to buy if you have a heavy gpu-accelerated workflow? well, at the top it's probably going to be a complete intel xeon setup, followed by quadro and firepro.
amd gaming gpus hold a spot over nv gaming gpus for being able to do both, except that pushing cuda has made sales.
Not precisely. There is a reason why the Quadro and FirePro use ECC memory and have a bigger feature set. It's not just a huge price tag; if it were, you could get consumer-grade hardware to do the same thing, and that would be insane. I also don't think CUDA has really done wonders for nVidia.

Lastly, re-read your posts. It took me 10 times through to understand what you were trying to say with your decrepit English.
Wow................
I thought so...
 
The problem isn't bandwidth, it's latency. Whenever the GPU needs to stream data from the CPU's memory pool, the GPU has to wait to get that data. So even if it only needs to fetch one packet of PCI-E data, it has to wait for that one packet. Bandwidth gives you more of something over time; it does not make it respond faster. This is by no means a result of PCI-E being bad, it's just a result of people forgetting that the physical length of the PCI-E path is long and it takes time for the electrical signal to travel. It's the same argument for moving from GDDR5 to HBM: you're moving memory closer to the GPU, so latency becomes less of a problem (and the bandwidth could be hiked with the super-wide bus precisely because the memory sits right next to the GPU, almost like another level of cache - think eDRAM on Iris Pro).

Also consider this: if a GPU uses its own memory, you have a workflow like this:
GPU Cores -> GPU IMC -> VRAM -> GPU IMC -> GPU Cores
If you have to stream data from system memory, you end up doing something like this:
GPU Cores -> GPU IMC -> GPU PCI-E interface -> (possible PLX chip or PCH) -> CPU PCI-E interface -> CPU IMC -> Main Memory -> CPU IMC -> CPU PCI-E interface -> (possible PLX chip or PCH) -> GPU PCI-E interface -> GPU IMC -> GPU Cores

I think you can clearly see why there is latency associated with a GPU needing system memory. The simple fact is that no interface is going to change this, because latency is determined by circuit distance and the number of devices involved in the entire process.
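To put rough numbers on that, here's a tiny C++ back-of-the-envelope sketch comparing a small fetch from local VRAM against the same fetch going over PCI-E to system memory. The latency and bandwidth figures are assumed, illustrative values (not measurements of any real platform); the point is only that the fixed round-trip wait dwarfs the actual transfer time for small requests, no matter how wide the link is.

#include <cstdio>

int main() {
    // Assumed, illustrative-only numbers -- not measured values for any specific system.
    const double vram_latency_us = 0.3;    // local VRAM access, memory sits next to the GPU
    const double pcie_latency_us = 1.5;    // round trip across PCI-E interfaces, IMCs, possible PLX/PCH
    const double pcie_gb_per_s   = 15.75;  // roughly PCI-E 3.0 x16 peak bandwidth

    const double bytes       = 4096.0;     // one small page of data
    const double transfer_us = bytes / (pcie_gb_per_s * 1000.0);  // 1 GB/s == 1000 bytes/us

    std::printf("local VRAM fetch : ~%.2f us\n", vram_latency_us);
    std::printf("system RAM fetch : ~%.2f us waiting + ~%.2f us transferring\n",
                pcie_latency_us, transfer_us);
    // More bandwidth only shrinks the second number; only a shorter path shrinks the first.
    return 0;
}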

I'd love to see FO used at some point for CPU/GPU interconnects, but there's more to this than just circuit length. Right now there's a lot of waiting for the CPU to send the GPU rendering data, but they're working on having more of the calculations handled by the GPU, and not just because more draw calls are possible. It's also because they're streamlining instruction sets. The Forza DX12 demo clearly points this out: where identical instructions are repeated, they're consolidating that into less work.
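Just to illustrate what consolidating repeated identical instructions can look like, here's a generic C++ sketch that drops back-to-back redundant state-setting commands from a command stream. It's a loose illustration of the idea only (the Command type and the stream are invented), not how the Forza demo or any actual driver implements it.

#include <cstdio>
#include <vector>

struct Command { int type; int value; };   // type 1 = set state (e.g. bind), type 0 = draw

static bool same(const Command& a, const Command& b) {
    return a.type == b.type && a.value == b.value;
}

int main() {
    // A stream where the same bind (type 1, value 7) is issued redundantly between draws.
    const std::vector<Command> stream = {
        {1, 7}, {0, 1}, {1, 7}, {0, 2}, {1, 7}, {0, 3}, {1, 9}, {0, 4}
    };

    std::vector<Command> consolidated;
    Command last_state{-1, -1};
    for (const Command& c : stream) {
        if (c.type == 1) {                       // state-set command
            if (same(c, last_state)) continue;   // identical to the current state: skip it
            last_state = c;
        }
        consolidated.push_back(c);
    }
    std::printf("%zu commands in, %zu out after consolidation\n",
                stream.size(), consolidated.size());
    return 0;
}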

Here's the thing though: I really don't get why so many are being pessimistic about this, almost implying there will be too much bottlenecking to get much gain at all out of low-level APIs. Mantle has already proven otherwise in the games where it's used well. Sure, it won't be several times the performance, but the gain is certainly enough to be worthwhile and a big step forward.

And now we have a universal low-level API that both AMD and Nvidia users can benefit from, and some act like it's just a bluff. These are good times for gaming, people.
 
You mean the APIs we haven't really seen in action yet except for Mantle? Sounds like optimization still lies with the engine.

Like HSA? DX12 doesn't really do anything about this.

...and what is that supposed to mean? Reducing overhead isn't going to result in some crazy amount of performance improvement.

Not precisely. There is a reason why the Quadro and FirePro use ECC memory and have a bigger feature set. It's not just a huge price tag; if it were, you could get consumer-grade hardware to do the same thing, and that would be insane. I also don't think CUDA has really done wonders for nVidia.

Lastly, re-read your posts. It took me 10 times through to understand what you were trying to say with your decrepit English.

I thought so...
yup, makes sense to me..
you're failing to grasp all the changes taking place in windows 10.. dx12 will by design run better on an hsa apu, but it brings such a big latency improvement even on traditional systems that microsoft makes few claims other than performance, performance, performance!
reducing overhead and low-level optimizations have nothing to do with performance? better get that memo out there and tell microsoft and amd they are doing it wrong..
 
I'd love to see FO used at some point for CPU/GPU interconnects, but there's more to this than just circuit length. Right now there's a lot of waiting for the CPU to send the GPU rendering data, but they're working on having more of the calculations handled by the GPU, and not just because more draw calls are possible. It's also because they're streamlining instruction sets. The Forza DX12 demo clearly points this out: where identical instructions are repeated, they're consolidating that into less work.
the multi-threaded command buffer
 
You missed the discussion here. We are talking about how DX12 pumps up AMD's performance compared to DX11. That benefit doesn't exist for nVidia's cards, because they are already well optimized for DX11 and have nothing to unlock with DX12.

Nvidia does get a boost to draw calls, just not as great as AMD's: where AMD may triple, Nvidia may only double. (This is a rough, vague example.)
 
Nvidia does get a boost to draw calls, just not as great as AMD's: where AMD may triple, Nvidia may only double. (This is a rough, vague example.)

AMD is cramming so many ACE units into their GPUs they ought to adopt Snoopy as their mascot.

ACE%20Snoopy_zpsefo4q0kz.jpg
 
amd-wallpaper-hd-4.jpg

even toasters can fly?
AMD_Bulldozer_FX.jpg

i couldn't always agree with arnold as a governor but he hit it on the head this time..
 
Actually, any optimizations in DX11 are minuscule compared to the performance increase DX12 will bring to both teams, because draw call overhead is a CPU-side cost. The increase will be so radical that the CPU bottleneck in gaming will be a thing of the past, and that completely shifts the balance to GPUs, where Nvidia has the upper hand in geometry and pixel-pushing power while AMD has more shading performance and memory bandwidth. That's where the imbalance is, not in the DX12 benefits; those will be a win for all.
The problem with AMD is a software stack limitation with DX11. Nvidia has done a better job of extracting parallel threaded operation in their driver stack, whereas AMD relies on a single CPU thread. DX11 allows only limited gains, but Nvidia went for it and it paid off with ~50% more performance than AMD. Peculiarly, AMD's command processor is reputedly more robust than Nvidia's and hence benefits more from the explicit multi-threading of DX12/Mantle/Vulkan. Additionally, the HW compromises Maxwell needed because of 28nm (which resulted in good perf/watt in current gaming workloads) may be less beneficial going forward. Though by the time it matters, I expect both IHVs will have new architectures on 14/16nm...
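For anyone wondering what that explicit multi-threading looks like in practice, here's a bare-bones C++ sketch of the model DX12/Mantle/Vulkan expose: each CPU thread records its own command buffer without funnelling through a single driver thread, and the finished buffers are handed to the GPU queue afterwards. The CommandBuffer/DrawCommand types here are invented stand-ins for illustration, not real D3D12 or Vulkan API objects.

#include <cstdio>
#include <thread>
#include <vector>

struct DrawCommand   { int object_id; };                 // stand-in for one draw call
struct CommandBuffer { std::vector<DrawCommand> cmds; };

int main() {
    const int objects = 8000;
    unsigned workers = std::thread::hardware_concurrency();
    if (workers == 0) workers = 4;                        // fall back if the count is unknown

    // Each worker records into its own buffer -- no shared lock, no single driver thread.
    std::vector<CommandBuffer> buffers(workers);
    std::vector<std::thread> threads;
    for (unsigned t = 0; t < workers; ++t) {
        threads.emplace_back([&buffers, t, workers, objects] {
            for (int i = static_cast<int>(t); i < objects; i += static_cast<int>(workers))
                buffers[t].cmds.push_back(DrawCommand{i});
        });
    }
    for (auto& th : threads) th.join();

    // The pre-built buffers are then submitted to the GPU queue in order.
    std::size_t total = 0;
    for (const auto& cb : buffers) total += cb.cmds.size();
    std::printf("recorded %zu draw commands across %u threads\n", total, workers);
    return 0;
}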
 
Nvidia does get a boost to draw calls, just not as great as AMD's: where AMD may triple, Nvidia may only double. (This is a rough, vague example.)

1 draw call time = overhead time of an API + rendering time for that specific object
1 synced frame at 60 fps = 16 ms
draw calls per frame = 16 ms / (overhead + rendering)

If the GPU draws non-shaded cubes (very small rendering time), then behold the explosion of draw calls once DX12 lowers the overhead - most of each draw call's time was overhead, and that gets cut down.
If the GPU draws complex shaded, tessellated objects (long render time per object), there is less of a gain when we reduce the overhead - here, draw call time was not predominantly wasted on overhead in the first place.
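A quick worked example of that formula, with made-up per-call costs purely to show the shape of the result (these are not measured DX11/DX12 numbers):

#include <cstdio>

int main() {
    const double frame_ms = 16.0;                        // one synced frame at 60 fps

    // Hypothetical per-draw-call costs in milliseconds.
    const double overhead_dx11 = 0.050, overhead_dx12 = 0.005;  // assumed API overhead per call
    const double render_cube   = 0.002;                  // trivial non-shaded cube
    const double render_heavy  = 0.400;                  // complex shaded, tessellated object

    auto calls_per_frame = [&](double overhead, double render) {
        return frame_ms / (overhead + render);
    };

    // Cheap objects: overhead dominates, so cutting it multiplies the draw call budget.
    std::printf("cubes : DX11 ~%.0f calls/frame, DX12 ~%.0f calls/frame\n",
                calls_per_frame(overhead_dx11, render_cube),
                calls_per_frame(overhead_dx12, render_cube));
    // Expensive objects: render time dominates, so the same overhead cut barely matters.
    std::printf("heavy : DX11 ~%.0f calls/frame, DX12 ~%.0f calls/frame\n",
                calls_per_frame(overhead_dx11, render_heavy),
                calls_per_frame(overhead_dx12, render_heavy));
    return 0;
}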

Because of this, game designers always separate static geometry from dynamic, in the sense that all static geometry gets baked together to reduce separate draw calls - for example, if you have 17 non-moving buildings in the distance, you don't issue 17 draw calls, one per building, but rather 1 draw call for the group of 17 buildings as a single object. It's called batching, and it's very effective at reducing draw calls when the same materials/shaders are reused all over the game world. This technique alone offsets most of the shortcomings DX11 has in terms of overhead.
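Here's a rough sketch of that batching idea in C++ (the Mesh type and submit_draw() are invented stand-ins, not any engine's actual API): the 17 static buildings share a material, so their geometry gets baked into one buffer and issued with a single draw call instead of 17 separate ones.

#include <cstdio>
#include <vector>

struct Vertex { float x, y, z; };
struct Mesh   { std::vector<Vertex> vertices; };

// Stand-in for issuing one draw call to the graphics API.
static void submit_draw(const Mesh& m) {
    std::printf("draw call: %zu vertices\n", m.vertices.size());
}

int main() {
    // 17 static, non-moving buildings that share the same material/shader.
    const std::vector<Mesh> buildings(
        17, Mesh{{{0.f, 0.f, 0.f}, {1.f, 0.f, 0.f}, {0.f, 1.f, 0.f}}});

    // Unbatched: 17 draw calls, each paying the full API overhead.
    for (const auto& b : buildings) submit_draw(b);

    // Batched: bake all the static geometry into one mesh -> one draw call.
    Mesh batched;
    for (const auto& b : buildings)
        batched.vertices.insert(batched.vertices.end(),
                                b.vertices.begin(), b.vertices.end());
    submit_draw(batched);
    return 0;
}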

So the number of draw calls per frame will grow, but not based on GPU architecture - based on the complexity of each object being drawn. The balance can go either way once the constraints are lifted: either toward more detailed objects, because the available draw call budget per frame is high enough, or toward more distinct objects on screen, because the quality of each individual object is already high enough.
This shift in balance will simply free developers to have more game objects animated and physically simulated on the CPU, and to use much more varied materials in their games.

The problem with AMD is a software stack limitation with DX11. Nvidia has done a better job of extracting parallel threaded operation in their driver stack, whereas AMD relies on a single CPU thread. DX11 allows only limited gains, but Nvidia went for it and it paid off with ~50% more performance than AMD. Peculiarly, AMD's command processor is reputedly more robust than Nvidia's and hence benefits more from the explicit multi-threading of DX12/Mantle/Vulkan. Additionally, the HW compromises Maxwell needed because of 28nm (which resulted in good perf/watt in current gaming workloads) may be less beneficial going forward. Though by the time it matters, I expect both IHVs will have new architectures on 14/16nm...

That's true for DX11 drivers; nvidia's gain there is not negligible. However, AMD's disadvantage in DX11 shouldn't be read as a direct relative advantage in DX12 - the complexity of each draw call will dictate the peak number of draw calls per frame more than anything else.

IMO it will come to this - Nvidia will still be better with highly detailed geometry, and AMD, as of Fiji, will be better with shader performance, especially if a shader samples from many different textures. DX12 overhead differences will translate into slightly different CPU core usage percentages (much below 100%, mind you; the bottleneck will never be on the CPU with DX12 unless you're playing on an Atom).
 
I feel like nothing good is going to come of this thread.
 
Meh. I am still looking forward to quantum computing.
 
1 draw call time = overhead time of an API + rendering time for that specific object
1 synced frame at 60 fps = 16 ms
draw calls per frame = 16 ms / (overhead + rendering)

If the GPU draws non-shaded cubes (very small rendering time), then behold the explosion of draw calls once DX12 lowers the overhead - most of each draw call's time was overhead, and that gets cut down.
If the GPU draws complex shaded, tessellated objects (long render time per object), there is less of a gain when we reduce the overhead - here, draw call time was not predominantly wasted on overhead in the first place.

Because of this, game designers always separate static geometry from dynamic, in the sense that all static geometry gets baked together to reduce separate draw calls - for example, if you have 17 non-moving buildings in the distance, you don't issue 17 draw calls, one per building, but rather 1 draw call for the group of 17 buildings as a single object. It's called batching, and it's very effective at reducing draw calls when the same materials/shaders are reused all over the game world. This technique alone offsets most of the shortcomings DX11 has in terms of overhead.

So the number of draw calls per frame will grow, but not based on GPU architecture - based on the complexity of each object being drawn. The balance can go either way once the constraints are lifted: either toward more detailed objects, because the available draw call budget per frame is high enough, or toward more distinct objects on screen, because the quality of each individual object is already high enough.
This shift in balance will simply free developers to have more game objects animated and physically simulated on the CPU, and to use much more varied materials in their games.



That's true for DX11 drivers; nvidia's gain there is not negligible. However, AMD's disadvantage in DX11 shouldn't be read as a direct relative advantage in DX12 - the complexity of each draw call will dictate the peak number of draw calls per frame more than anything else.

IMO it will come to this - Nvidia will still be better with highly detailed geometry, and AMD, as of Fiji, will be better with shader performance, especially if a shader samples from many different textures. DX12 overhead differences will translate into slightly different CPU core usage percentages (much below 100%, mind you; the bottleneck will never be on the CPU with DX12 unless you're playing on an Atom).
makes a lot of sense to me except a little of it..
they directly show how the multithreaded command buffer results in higher fps, and i have seen the amd gaming "scientist" explain how those measurements mean exactly that.
the Core i7-4960X that was used for the tests i have posted is much faster than an fx-9590 on 1 core.. do a little math and you will see that not only will budget cpus get a boost, expensive cpus have always been able to do more. on top of that the tests are also done at 4k, which is supposed to make things more gpu dependent and less cpu dependent.

AMD is cramming so many ACE units into their GPUs they ought to adopt Snoopy as their mascot.

ACE%20Snoopy_zpsefo4q0kz.jpg
this is exactly it^^^^ the multithreaded command buffer and async shaders that are better utilized on amd's architecture are revolutionary!!
 
@xfia

you do like to double post
try using the EDIT BUTTON to merge your posts
Mods have been known to put naughty boys who break the rules persistently over their virtual knee for a spanking
 
@CrAsHnBuRnXp

you're on page 3 of too late
I wasn't about to go through all 3 pages to find all the bitching and moaning. Even if I am late to the party, it still doesn't mean that vacations can't be given out from my post onward. :)
 
@dorsetknob hm well this site is horribly optimized for editing and finding anything, so why bother?
so you just came in here to troll and say nothing at all about the topic?
@CrAsHnBuRnXp so you came in just to say that and didn't even read any of it?
starting to think half the 'no' answers up there are just nvidia fanboys who have no understanding at all of what any of this means.
 
hm well this site is horribly optimized for editing and finding anything, so why bother?

There is an edit button at the bottom of each post you make
How difficult is it to click that?

by the way, it's next to the Delete button and the Report button
 
@dorsetknob hm well this site is horribly optimized for editing and finding anything, so why bother?
so you just came in here to troll and say nothing at all about the topic?
@CrAsHnBuRnXp so you came in just to say that and didn't even read any of it?
starting to think half the 'no' answers up there are just nvidia fanboys who have no understanding at all of what any of this means.
I don't need to read any of it because no one is going to know anything until Windows 10 and DX12 titles start appearing. Why bother speculating and causing arguments that don't need to be argued?

And this is coming from what appears to us as an AMD "fanboy", right?
 
I don't need to read any of it because no one is going to know anything until Windows 10 and DX12 titles start appearing. Why bother speculating and causing arguments that don't need to be argued?

And this is coming from what appears to us as an AMD "fanboy", right?
why bother coming here at all then? i mean a lot of the articles are stuff that an engineer would laugh at..
just how important is gpu vram bandwidth?? why the hell did they make compression and hbm?? idk, but i'm gonna write an article and explain how the measurements might be bs :laugh:
 
why bother coming here at all then? i mean a lot of the articles are stuff that an engineer would laugh at..
just how important is gpu vram bandwidth?? why the hell did they make compression and hbm?? idk, but i'm gonna write an article and explain how the measurements might be bs :laugh:
I don't know. Why do you bother coming here then? And why are we suddenly talking about articles when all you did was pull some AMD slideshow screenshots from god knows where and link a YouTube video? You didn't even bother providing anything to support your question/argument in your OP. You just posted a question in the subject and called it a day, then came back to argue with everyone else who posts anything.

And you didn't even source the screenshots. So please tell me why any of us in this thread should take anything you say seriously?
 
@dorsetknob hm well this site is horribly optimized for editing and finding anything, so why bother?
so you just came in here to troll and say nothing at all about the topic?
@CrAsHnBuRnXp so you came in just to say that and didn't even read any of it?
starting to think half the 'no' answers up there are just nvidia fanboys who have no understanding at all of what any of this means.

Nvidia fanboys don't make threads like this, so what does this say about you, who looks to only run AMD GPUs?
 