
AMD Radeon RX Vega Preview

Nvidia's drivers allow for multithreaded draw calls to be decoupled from what the API does. AMD's do not.

You can argue with me all day, this is a well-known fact: AMD's poor performance over the last couple of years had everything to do with a lack of multithreaded drivers.
That makes no sense. While the internal processing in the driver may utilize multiple threads, a single render pass executes its API calls from a single thread, and the internal queue ends up as a linear stream of native operations. There is no difference between AMD and Nvidia here. Lack of "multithreading" has never been the problem for GCN.
 
No, I actually understand this, and I don't refer random things from the Internet I don't comprehend.
As mentioned in #124, you can have multiple threads building a queue, but it comes at the cost of synchronization overhead. Rendering is a pipelined process of steps: you can parallelize inside each step, but the steps still have to be executed in a serial manner. So if a rendering pass consists of steps a) -> b) -> c) -> d) -> …, you can use this deferred context to have four threads submitting commands to the queue, but you can't have one thread working on c) while another is working on a). Synchronization of CPU threads is very expensive, and doing it many times during a single frame will cost milliseconds. It only makes sense when the overhead of the rendering thread (engine overhead, not API overhead) is greater than the synchronization overhead, which is unusual. That code example shows a simple scene with just simple cubes, while rendering in games is much more complex, so applying this is much more challenging. So this technique is only applicable to certain scenarios or edge cases.

As evident in the code example you clearly don't understand, this has to be designed into the rendering engine. This feature works around CPU overhead in the rendering engine itself, not the driver. I can guarantee that this is not what makes Pascal and Maxwell outperform GCN, since this is in the rendering engine's realm and outside Nvidia's control. And using deferred contexts makes no difference from the GPU side, this is purely a rendering engine optimization.
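
To make the mechanics concrete, here is a minimal sketch of the D3D11 deferred-context pattern being discussed (this is not the sample's code: RecordStep is a made-up placeholder for whatever work steps a) through d) would record, and device/resource creation and error handling are omitted):

// Sketch only: assumes a device and immediate context created elsewhere.
#include <d3d11.h>
#include <thread>
#include <vector>

void RecordStep(ID3D11DeviceContext* ctx, int step)
{
    // Hypothetical placeholder: the engine's state changes and draw calls for
    // one step (a, b, c, d) would be recorded into this deferred context here.
    (void)ctx; (void)step;
}

void RenderFrameWithDeferredContexts(ID3D11Device* device, ID3D11DeviceContext* immediateCtx)
{
    const int kWorkers = 4;
    ID3D11DeviceContext* deferred[kWorkers] = {};
    ID3D11CommandList*   lists[kWorkers]    = {};

    for (int i = 0; i < kWorkers; ++i)
        device->CreateDeferredContext(0, &deferred[i]);

    // Recording is the only part that runs in parallel; nothing reaches the GPU yet.
    std::vector<std::thread> workers;
    for (int i = 0; i < kWorkers; ++i)
        workers.emplace_back([&, i] {
            RecordStep(deferred[i], i);
            deferred[i]->FinishCommandList(FALSE, &lists[i]);
        });
    for (auto& t : workers)
        t.join(); // the synchronization point whose cost is discussed above

    // The immediate context still consumes the lists serially, in pass order:
    // a) executes before b), b) before c), and so on.
    for (int i = 0; i < kWorkers; ++i)
    {
        immediateCtx->ExecuteCommandList(lists[i], FALSE);
        lists[i]->Release();
        deferred[i]->Release();
    }
}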
 

That piece of documentation was for you to understand how this concept works. They implemented a form of this optimization as an automated feature in the driver shortly after the launch of Kepler, and they made a big deal out of how they suddenly got xx% more performance. There are also a ton of tests on DX11 games confirming that AMD's drivers hammer just one core/thread at a time, while Nvidia hardware shows a more balanced load across cores/threads.
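
As an aside, whether a DX11 driver actually implements command lists natively (instead of the runtime emulating them on the immediate context) can be queried at runtime; a small sketch, assuming a device created elsewhere:

#include <d3d11.h>
#include <cstdio>

// Sketch: asks the runtime whether the installed DX11 driver natively supports
// concurrent resource creation and driver command lists.
void PrintDriverThreadingCaps(ID3D11Device* device)
{
    D3D11_FEATURE_DATA_THREADING caps = {};
    if (SUCCEEDED(device->CheckFeatureSupport(D3D11_FEATURE_THREADING, &caps, sizeof(caps))))
    {
        std::printf("DriverConcurrentCreates: %d\n", caps.DriverConcurrentCreates);
        std::printf("DriverCommandLists:      %d\n", caps.DriverCommandLists);
    }
}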

Here: https://developer.nvidia.com/dx12-dos-and-donts

I'll just pick out some key things:

  • Consider a ‘Master Render Thread’ for work submission with a couple of ‘Worker Threads’ for command list recording, resource creation and Pipeline State Object (PSO) compilation
    • The idea is to have the worker threads generate command lists and for the master thread to pick those up and submit them
  • Expect to maintain separate render paths for each IHV minimum
    • The app has to replace driver reasoning about how to most efficiently drive the underlying hardware

  • Don’t rely on the driver to parallelize any Direct3D12 work in driver threads
    • On DX11 the driver does farm off asynchronous tasks to driver worker threads where possible – this doesn’t happen anymore under DX12
    • While the total cost of work submission in DX12 has been reduced, the amount of work measured on the application’s thread may be larger due to the loss of driver threading. The more efficiently one can use parallel hardware cores of the CPU to submit work in parallel, the more benefit in terms of draw call submission performance can be expected.
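
In code, the master/worker pattern those bullets describe comes out roughly like the sketch below (names are illustrative: RecordWorkerCommands stands in for the engine's actual recording work, and PSOs, barriers and fences are left out):

// Sketch of the master/worker submission pattern, assuming an ID3D12Device*
// and an ID3D12CommandQueue* created elsewhere. Each worker records into its
// own allocator + command list; only the master thread submits.
#include <d3d12.h>
#include <wrl/client.h>
#include <thread>
#include <vector>

using Microsoft::WRL::ComPtr;

void RecordWorkerCommands(ID3D12GraphicsCommandList* cl, int worker)
{
    // Hypothetical placeholder: the engine's draws for this worker's slice of
    // the frame would be recorded into cl here.
    (void)cl; (void)worker;
}

void SubmitFrame(ID3D12Device* device, ID3D12CommandQueue* queue)
{
    const int kWorkers = 4;
    ComPtr<ID3D12CommandAllocator>    allocators[kWorkers];
    ComPtr<ID3D12GraphicsCommandList> lists[kWorkers];

    for (int i = 0; i < kWorkers; ++i)
    {
        device->CreateCommandAllocator(D3D12_COMMAND_LIST_TYPE_DIRECT,
                                       IID_PPV_ARGS(&allocators[i]));
        device->CreateCommandList(0, D3D12_COMMAND_LIST_TYPE_DIRECT,
                                  allocators[i].Get(), nullptr,
                                  IID_PPV_ARGS(&lists[i]));
    }

    // Worker threads record in parallel -- under DX12 the driver no longer
    // farms this out for you, so the engine has to spread the work itself.
    std::vector<std::thread> workers;
    for (int i = 0; i < kWorkers; ++i)
        workers.emplace_back([&, i] {
            RecordWorkerCommands(lists[i].Get(), i);
            lists[i]->Close();
        });
    for (auto& t : workers)
        t.join();

    // The master render thread picks the recorded lists up and submits them once.
    ID3D12CommandList* submit[kWorkers];
    for (int i = 0; i < kWorkers; ++i)
        submit[i] = lists[i].Get();
    queue->ExecuteCommandLists(kWorkers, submit);
}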
But hey, this has nothing AT ALL to do with multithreading at the driver level. I mean, Nvidia clearly has no clue what they are talking about.

Look, we're not getting anywhere; you don't want to acknowledge that this is how their drivers work, for one reason or another. Carry on with your belief. In this situation I suggest we just drop this discussion, as it is way off topic.
 
That piece of documentation was for you to understand how this concept works. They implemented a form of this optimization as an automated feature in the driver shortly after the launch of Kepler
The documentation describes using a feature to have multiple threads dispatch commands.

In #125 you said this:
Nvidia's drivers allow for multithreaded draw calls to be decoupled from what the API does. AMD's do not.

You can argue with me all day, this is a well-known fact: AMD's poor performance over the last couple of years had everything to do with a lack of multithreaded drivers.
Would you please make up your mind? In one instance it's decoupled from the API, and in the next it's inside the driver?

Back to your code example: this has nothing to do with driver implementation or hardware architecture, but simply with how the rendering engine interfaces with the driver. Pascal and Maxwell don't scale better because engines interface differently with Nvidia hardware; no, both the render code and the API are in fact the same. All the rendering engine sees are queues of API commands, sometimes multiple queues, even both rendering and compute queues. The driver translates these commands into the GPU's native API.

The render engine never does low-level scheduling; it never knows which GPU cluster will do what in which clock cycle, it doesn't do resource dependency analysis and queue read/write operations, it doesn't estimate what will be in L2 cache and what not, etc. All of this is handled by the GPU's internal scheduler. Modern GPUs are fitted with multiple separate memory controllers. Only one GPU cluster can read from a memory bank at a time, so real-time dependency analysis is done in the GPU scheduler as it receives the queue from the driver. Whenever multiple clusters need the same resource or resources from the same bank, you'll get a stall. This is the core of the problem with utilization in GCN. This low-level GPU scheduling is not only impossible from the driver and game engine side, it would also result in single frames taking minutes to render.
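
To illustrate the point about queues: from the engine's side, D3D12 exposes the GPU as command queues that accept recorded command lists, and everything below that is scheduled by the driver and hardware. A minimal sketch, assuming a device created elsewhere:

// Sketch only: the engine creates a rendering queue and a compute queue and
// later calls ExecuteCommandLists() on them. Which cluster runs what, cache
// behaviour and memory-bank dependencies are resolved inside the GPU, not here.
#include <d3d12.h>
#include <wrl/client.h>

using Microsoft::WRL::ComPtr;

void CreateEngineQueues(ID3D12Device* device,
                        ComPtr<ID3D12CommandQueue>& gfxQueue,
                        ComPtr<ID3D12CommandQueue>& computeQueue)
{
    D3D12_COMMAND_QUEUE_DESC gfxDesc = {};
    gfxDesc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;      // rendering queue
    device->CreateCommandQueue(&gfxDesc, IID_PPV_ARGS(&gfxQueue));

    D3D12_COMMAND_QUEUE_DESC computeDesc = {};
    computeDesc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE; // async compute queue
    device->CreateCommandQueue(&computeDesc, IID_PPV_ARGS(&computeQueue));
}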
 
Would you please make up your mind?

I don't need to. I think I made it very clear what I had to say, and I brought up enough information. You can look further into this by yourself; I can't carry on with this forever. So I say it again: we'd better drop this discussion.
 
I honestly didn't mind this conversation at all, and would not call it way off-topic. You both were courteous to each other and cited sources wherever possible, so props for that. But I do agree it would be better to take this elsewhere, if only so that others don't come in to check for updates and see something they weren't expecting.
 