Thought with DX12 it would put any GPUs together as a pool, and the memory as well. Something along the lines of any GPUs working together to render games, no need for SLI or CrossFire. Was I wrong? Can't remember.
How Vulkan and DX12 work in general:
- Start the program and grab a list of all available physical devices (IGP, discrete GPU etc.)
- Query the physical devices to find one that supports the features you need, then create a logical device out of it.
- Create a command pool on the logical device (the thing you've heard of in articles that makes DX12 multithreaded: each thread gets its own pool to allocate command buffers from).
- Record work into command buffers allocated from the pool, then submit those buffers to a command queue to get the physical device to do work. Rinse and repeat.
The way multi-GPU works is you just create more than one logical device, each with its own command pools and queues. Then you can just spread the work between the different devices.
The reason why devs don't really use it is because this is all new and they're still trying to get to grips with it.
Different GPUs have different features available and perform differently, so spreading work evenly between them usually ends up slower. You have to load balance the work you send to each GPU, which takes a bunch of testing to get just right.
Then there's the problem of cross-GPU communication. Sometimes work depends on the results of previous work (e.g. postprocessing), which could be on a different GPU, requiring communication between GPUs, which adds latency. That latency could end up making things slower than doing everything on a single GPU, because GPUs are extremely fast and any time spent outside them (e.g. copying 3D model data from RAM to VRAM) is wasted time. Ideally the program would require zero cross-GPU communication, but that would take dev resources to redesign their renderer code.
Over time as devs learn more and tools get better, DX12/Vulkan (which are very similar) will end up replacing DX11/OpenGL* and we'll live in a magical world of cross vendor multi-GPU.
Of course the naysayers say that will never happen, but they said the same about DX9, DX10, DX11, dual cores, quad cores and now 6+ cores. So they can safely be ignored.
*Khronos said Vulkan isn't supposed to replace OpenGL, but that's what will happen.

Graphics programming is commonly referred to as black magic or a dark art by other programmers because, while it may be easy to do a basic "hello world", it's extremely time consuming getting good at it. The reason is that pre-DX12, the API was a black box (a bloated finite state machine that does mysterious things) which was interpreted by the driver, another black box (also bloated, because it's interpreting a bloated thing), which in turn relays commands to the GPU, another (you guessed it) black box.
The way you learn a black box is by throwing a bunch of inputs at it and seeing what it outputs. This is time consuming enough for a single black box, but when you have 3 layers of black boxes, each interchangeable, you can imagine how this becomes a dark art.
With DX12/Vulkan, instead of a bloated FSM they give you a model of a GPU (one that accurately maps to how modern GPUs work) and functions to directly interact with different parts of that model, removing its black-box nature.
Since the model maps almost one-to-one to the physical GPU, drivers can be much leaner (which is why AMD is so good now; they aren't hampered by their bad drivers anymore). The driver ends up being just a glorified conversion layer which you can ignore because it's not doing anything special, removing its black-box nature.
And finally, since the model maps so closely to the physical GPU and the driver is ignorable, the GPU also stops being a black box.
Writing a basic "hello world" is harder in DX12/Vulkan because you can essentially interact with any part of the GPU in any order. However, once you've written that "hello world" program you will have learnt most of the API. I've messed around with Vulkan and I'd compare it to C. It doesn't hold your hand, but it does make everything very transparent, so you can easily tell what your code is doing at the hardware level. It even feels the same with all the memory management.