Saturday, April 1st 2023
DirectX 12 API New Feature Set Introduces GPU Upload Heaps, Enables Simultaneous Access to VRAM for CPU and GPU
Microsoft has added two new features to its DirectX 12 API: GPU Upload Heaps and Non-Normalized sampling arrive via the latest Agility SDK 1.710.0 preview, and the former looks to be the more intriguing of the pair. The SDK preview, officially introduced on Friday 31 March, is currently accessible only to developers. Support has also been enabled in the latest graphics drivers issued by NVIDIA, Intel, and AMD. The Microsoft team has this to say about the preview version of the GPU upload heaps feature in DirectX 12: "Historically a GPU's VRAM was inaccessible to the CPU, forcing programs to have to copy large amounts of data to the GPU via the PCI bus. Most modern GPUs have introduced VRAM resizable base address register (BAR) enabling Windows to manage the GPU VRAM in WDDM 2.0 or later."
They go on to describe how the update allows the CPU to gain access to the pool of VRAM on the connected graphics card: "With the VRAM being managed by Windows, D3D now exposes the heap memory access directly to the CPU! This allows both the CPU and GPU to directly access the memory simultaneously, removing the need to copy data from the CPU to the GPU increasing performance in certain scenarios."
This GPU optimization could offer many benefits in the context of computer games, since memory requirements continue to grow in line with increasing visual sophistication and complexity. A shared pool of memory between the CPU and GPU could eliminate the need to keep duplicates of game data in both system memory and graphics card VRAM, reducing the data traffic between the two locations. Modern graphics cards tend to feature very fast on-board memory standards (GDDR6) in contrast to main system memory (DDR5 at best). In theory the CPU could benefit greatly from direct access to a pool of ultra quick VRAM, perhaps giving an early preview of a time when DDR6 becomes the daily standard in main system memory.
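As a rough illustration of the copy that the quoted Microsoft text refers to, the Python sketch below tallies what each upload path has to do for the same amount of data. It is a toy accounting model, not Direct3D code: the class and function names are hypothetical, and it assumes the traditional path stages data in a CPU-visible buffer in system memory before the GPU copies it into VRAM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UploadCost:
    staging_bytes: int  # bytes written to a staging buffer in system RAM
    pcie_bytes: int     # bytes that cross the PCIe link into VRAM
    gpu_copies: int     # explicit GPU copy operations needed


def staged_upload(size_bytes: int) -> UploadCost:
    """Assumed traditional path: CPU fills a staging buffer, GPU copies it to VRAM."""
    return UploadCost(staging_bytes=size_bytes, pcie_bytes=size_bytes, gpu_copies=1)


def direct_upload(size_bytes: int) -> UploadCost:
    """Assumed GPU-upload-heap path: CPU writes straight into CPU-visible VRAM."""
    return UploadCost(staging_bytes=0, pcie_bytes=size_bytes, gpu_copies=0)


if __name__ == "__main__":
    size = 256 * 1024 * 1024  # hypothetical 256 MiB of asset data
    print("staged:", staged_upload(size))
    print("direct:", direct_upload(size))
```

In this model the data still crosses the PCIe link once on either path; what goes away is the staging copy in system memory and the extra GPU copy operation.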
Sources:
Microsoft Dev Blogs, Zhang Doa
54 Comments on DirectX 12 API New Feature Set Introduces GPU Upload Heaps, Enables Simultaneous Access to VRAM for CPU and GPU
And then you need to sell the innovation.
AMD is notoriously bad at both timing and selling.
Puts on some GPU board picture.
GTX285. Supports max DX11.1
Stupid artists ®
MS is always working on things like this, and always has been. They've always wanted their OS to run well (to sell it on more PCs), and now that they have a gaming console using the same code under the hood, they really do care about performance. This is the sort of thing they can slip into all their Xbox games in a dev environment, then make a big media fuss about free performance for all Xbox owners if it works out.
Game devs either just use them silently, or cover them up with something that costs more performance than they gained (CoH and the '-nolitter' DX10 command come to mind, then Crysis).
This seems like it had to be developed to work with DirectStorage, since that's been getting a lot of traction this year. Anything fed from NVMe straight to the GPU had zero options to be modified or altered in any way, and this is a backup method that doesn't require reverting to the older system. It could be as simple as a single line of code for a bugfix that saves them having to send a 15GB texture pack out to every client.
In the far reaches of human history (prior to DX9), everything was essentially duplicated into system RAM as the CPU did all the work, so it needed a live copy of the active data to decompress, modify, whatever.
DX10 reduced a lot of this, allowing data to be fed over more efficiently with less CPU overhead, but it wasn't documented super well other than vague descriptions of "less CPU calls" and really nerdy documents. Example: Vista's Aero interface could run in an emulated software mode or a hardware mode that reduced CPU usage.
Windows 7 had the option of a DX9 mode or full DX10.1 hardware acceleration, freeing up CPU resources and system RAM. Those poor office workers needed every megabyte they could save (and then their Intel IGP took it from system RAM anyway, ironically).
This was why a lot of low-end Vista laptops (thanks, Intel Atom) felt sluggish and crap, as the IGP couldn't do hardware mode and the CPUs were too weak to do the animations smoothly.
Windows 7 DWM cuts memory consumption by 50% | istartedsomething
I remember quoting this to @W1zzard years ago and being unable to find the source, finally did! Thanks, Google!
(The 50% reduction was simply because they no longer had to duplicate it. There's a comment from 2008 that guessed it all the way back then.)
There are performance features made possible by having unified hardware between console and PC that have yet to be discovered, and I for one look forward to them. AMD also has a long history of working with MS to improve DX performance.
More generally, being accessible by the CPU does not mean the memory should be used by it for anything other than filling it with data to be processed by the GPU. It will still be very slow to access, as it sits behind a PCIe link, and actually using the same memory range would require memory coherency, which would cut both CPU and GPU performance by several orders of magnitude.
When DirectStorage lets them turn the entire NVMe drive into an extension of that memory pool (almost like a pre-filled page file, as far as game textures are concerned), it will totally change the hardware requirements for high quality textures on those consoles, as they can now stream in the data faster.
A quick Google search shows the PS5 can read from NVMe at around 5.5GB/s and the Series X at 2.4GB/s (sometimes with faster speeds mentioned with current decompression tech). That explains Microsoft's focus on DirectStorage with hardware decompression of these textures: they want to move the data over to the GPU and have the GPU decompress it, rather than use the console's limited CPU power to do so. They're using their software prowess to improve Direct3D to benefit their console so it can be cheaper than the competition, and to make their desktop OS the 'gamers choice'.
It's making what they have more efficient, which lets them avoid making a new console, but us Windows users reap the rewards too (which is a smaller side benefit to them).
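For a sense of scale, the two read speeds quoted above (around 5.5GB/s for the PS5 and 2.4GB/s for the Series X) can be turned into rough load times with a short Python sketch. The 10 GB asset size is a hypothetical figure chosen for illustration, the function name is made up, and the calculation ignores compression and any other overhead.

```python
def read_time_seconds(size_gb: float, throughput_gb_per_s: float) -> float:
    """Time to read size_gb of data at a sustained throughput, with no overhead."""
    if throughput_gb_per_s <= 0:
        raise ValueError("throughput must be positive")
    return size_gb / throughput_gb_per_s


if __name__ == "__main__":
    asset_gb = 10.0  # hypothetical amount of texture data to stream
    for name, speed in (("PS5", 5.5), ("Series X", 2.4)):
        print(f"{name}: {read_time_seconds(asset_gb, speed):.2f} s")
```

Under these assumptions the gap comes purely from raw drive throughput, which is the part hardware decompression is meant to offset.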
But that could probably be a good way for the CPU to modify data in VRAM without having to bring it back to main memory.
Nice stuff, but how long will it take to be used in actual games? Probably 4-5 years.
We still have yet to see games really using DirectStorage (yeah, Forspoken is there, but it's just one game and it does not use GPU decompression). And there are still no games using sampler feedback.