Saturday, April 1st 2023

DirectX 12 API New Feature Set Introduces GPU Upload Heaps, Enables Simultaneous Access to VRAM for CPU and GPU

Microsoft has added two new features to its DirectX 12 API: GPU Upload Heaps and Non-Normalized Sampling, both delivered via the latest Agility SDK 1.710.0 preview. The former looks to be the more intriguing of the pair. The SDK preview, officially introduced on Friday 31 March, is only accessible to developers at present. Support has also been added in the latest graphics drivers issued by NVIDIA, Intel, and AMD. The Microsoft team has this to say about the preview version of the GPU Upload Heaps feature in DirectX 12: "Historically a GPU's VRAM was inaccessible to the CPU, forcing programs to have to copy large amounts of data to the GPU via the PCI bus. Most modern GPUs have introduced VRAM resizable base address register (BAR) enabling Windows to manage the GPU VRAM in WDDM 2.0 or later."

They continue to describe how the update allows the CPU to gain access to the pool of VRAM on the connected graphics card: "With the VRAM being managed by Windows, D3D now exposes the heap memory access directly to the CPU! This allows both the CPU and GPU to directly access the memory simultaneously, removing the need to copy data from the CPU to the GPU increasing performance in certain scenarios." This GPU optimization could offer many benefits in the context of computer games, since memory requirements continue to grow in line with an increase in visual sophistication and complexity.
A shared pool of memory between the CPU and GPU would eliminate the need to keep duplicates of game data in both system memory and graphics card VRAM, thereby reducing the data traffic between the two locations. Modern graphics cards tend to feature very fast on-board memory (GDDR6), in contrast to main system memory (DDR5 at best). In theory the CPU could benefit greatly from direct access to a pool of ultra-fast VRAM, perhaps giving an early preview of a time when DDR6 becomes the everyday standard for main system memory.
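
The copy that Microsoft's quote refers to can be illustrated with a toy byte-counting model. This is only a sketch under stated assumptions, not Direct3D code: the class and function names are hypothetical. It assumes the classic path has the CPU fill a staging buffer in system RAM before the GPU copies it into VRAM, while a GPU upload heap lets the CPU write into VRAM directly.

```python
from dataclasses import dataclass, fields


@dataclass
class TransferLog:
    """Bytes moved in a toy model of a CPU-to-VRAM upload (hypothetical)."""
    cpu_writes_to_system_ram: int = 0  # CPU fills a staging buffer
    gpu_copies_over_pcie: int = 0      # GPU copies the staging buffer into VRAM
    cpu_writes_over_pcie: int = 0      # CPU writes straight into VRAM

    def total(self) -> int:
        """Sum of all bytes moved, regardless of who moved them."""
        return sum(getattr(self, f.name) for f in fields(self))


def upload_via_staging(size_bytes: int) -> TransferLog:
    """Assumed classic path: one CPU write plus one GPU copy."""
    return TransferLog(cpu_writes_to_system_ram=size_bytes,
                       gpu_copies_over_pcie=size_bytes)


def upload_via_gpu_upload_heap(size_bytes: int) -> TransferLog:
    """Assumed new path: a single CPU write that lands in VRAM."""
    return TransferLog(cpu_writes_over_pcie=size_bytes)


if __name__ == "__main__":
    size = 256 * 1024 * 1024  # arbitrary 256 MiB payload
    for name, log in (("staging", upload_via_staging(size)),
                      ("gpu upload heap", upload_via_gpu_upload_heap(size))):
        print(f"{name}: {log.total() // (1024 * 1024)} MiB moved in total")
```

The model only counts bytes. It says nothing about how fast each write is, which is why Microsoft limits the claim to "certain scenarios".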

Sources: Microsoft Dev Blogs, Zhang Doa

54 Comments on DirectX 12 API New Feature Set Introduces GPU Upload Heaps, Enables Simultaneous Access to VRAM for CPU and GPU

#1
Selaya
not sure if april fools
#2
Zunexxx
Selaya said: "not sure if april fools"
Nah, it was posted 2 days ago on the Microsoft dev channel and blog.
#5
silent majority
Will 12 GB be the de facto standard for graphics cards?
#6
Dr. Dro
silent majority said: "Will 12 GB be the de facto standard for graphics cards?"
Yeah. GPUs with less than 12 GB will have varying degrees of trouble with games going forward. 4-6 GB cards will be confined to lowest settings and low resolutions. That accompanies the past two generations of 8-12 GB GPUs being readily available at the midrange + 32 GB RAM being the new de facto standard for main memory.
#7
silent majority
Dr. Dro said: "Yeah. GPUs with less than 12 GB will have varying degrees of trouble with games going forward. 4-6 GB cards will be confined to lowest settings and low resolutions. That accompanies the past two generations of 8-12 GB GPUs being readily available at the midrange + 32 GB RAM being the new de facto standard for main memory."
Thank you. I'm Japanese but using a 3060Ti. I'll use it as a reference.
#8
R0H1T
One step closer to AMD's decade old hUMA/HSA dream :toast:
#9
Dr. Dro
R0H1T said: "One step closer to AMD's decade old hUMA/HSA dream :toast:"
Indeed. AMD has some incredibly forward-thinking engineers; it puzzles me why they end up in the "Hardware Vendor #3" situation over and over again.
silent majority said: "Thank you. I'm Japanese but using a 3060Ti. I'll use it as a reference."
けっこう, the 3060 Ti is still a great card, it might have a little trouble at 1440p or higher because of the 8 GB when you have ray tracing features enabled, but it should run games at 1080p very well for the foreseeable future :clap:
#10
Vayra86
Dr. Dro said: "Indeed. AMD has some incredibly forward-thinking engineers; it puzzles me why they end up in the 'Hardware Vendor #3' situation over and over again."
Every innovation needs the right time and the right market conditions to sell, really.

And then you need to sell the innovation.

AMD is notoriously bad at both timing and selling.
#11
R0H1T
I'd argue timing, they're great at selling! Just look at the first (dual core) Athlons, Zen & to a much lesser extent the early GCN cards. If you remove the cult of JHH (or fruity loops) AMD probably has the most loyal supporters out there!
#12
pavle
Interesting how the thing about which so much hot air was blown in the past is now finally being realised. Have to keep that DirectX relevant.
#13
Ferrum Master
One makes an article about DX12.

Puts on some GPU board picture.

GTX285. Supports max DX11.1

Stupid artists ®
#14
persondb
Dr. Dro said: "Indeed. AMD has some incredibly forward-thinking engineers; it puzzles me why they end up in the 'Hardware Vendor #3' situation over and over again."
It's easy to set those future goals and objectives. It's extraordinarily hard to implement them, as it isn't just a matter of developing a single thing but the whole platform, ecosystem and the applications that run on it. I don't think that AMD's current APUs even support their past dream (hardware-wise); I believe that there is no coherency between CPU and GPU caches, so memory accesses between them necessarily need to go straight to memory.
#15
InhaleOblivion
I'm amazed that Microsoft didn't save this for DirectX 13. It must not actually speed up performance in any substantial way.
#16
Mussels
Freshwater Moderator
Commented on this in another thread

MS is always working on things like this, always has been. They've always wanted their OS to run well (to sell it on more PCs) and now that they have a gaming console using the same code under the hood, they really do care about performance - this is the sort of thing they can slip into all their Xbox games in a dev environment, then make a big media fuss about free performance for all Xbox owners if it works out.

Game devs just either use them silently, or cover them up with something that costs more performance than they gained (CoH and the '-nolitter' DX10 command come to mind, then Crysis)


This seems like it had to be developed to work with DirectStorage since it's been getting a lot of traction this year - anything fed from NVMe to GPU had zero options to be modified or altered in any way, and this is a backup method without needing to revert to the older system. Could be as simple as a single line of code for a bugfix that saves them having to send a 15GB texture pack out to every client.


In the far reaches of human history (prior to DX9), everything was essentially duplicated into system RAM as the CPU did all the work, so it needed a live copy of the active data to decompress, modify, whatever.
DX10 reduced a lot of this, allowing data to be fed over more efficiently with less CPU overhead, but it wasn't documented super well other than vague descriptions of "less CPU calls" and really nerdy documents:
"Thanks to the architecture of the new WDDM (Windows Display Driver Model), applications now create Direct3D 10 resources with different usage flags to indicate how the application intends on using the resource data. The new driver model virtualizes the memory used by resources; it then becomes the responsibility of the operating system/driver/memory manager to place resources in the most performant area of memory possible given the expected usage."
Example: Vista's Aero interface could run in an emulated software mode or a hardware mode, reducing CPU usage.
Windows 7 had the option for a DX9 mode or full DX10.1 hardware acceleration, freeing up CPU resources and system RAM - those poor office workers needed every megabyte they could save (and then their Intel IGP took it from system RAM anyway, ironically)

This was why a lot of low-end Vista laptops (thanks, Intel Atom) felt sluggish and crap, as the IGP couldn't do hardware mode and the CPUs were too weak to do the animations smoothly.

Windows 7 DWM cuts memory consumption by 50% | istartedsomething
I remember quoting this to @W1zzard years ago and being unable to find the source, finally did! Thanks google!
(The 50% reduction was simply that they didn't have to duplicate it any longer - there's a comment from 2008 that guessed it all the way back then)
#17
Steevo
Mussels said: "Commented on this in another thread [...]"
I remember that.

There are performance features made possible by having unified hardware between console and PC that have yet to be discovered, and I for one look forward to them. AMD also has a long history of working with MS to improve DX performance.
#18
TumbleGeorge
I think it is important to have many different developments, so that a future AI has something to sort, compare and judge by quality and appropriateness, and can then assemble the next, many times better, complex API.
#19
biggermesh
I think the author misinterpreted the blog post a bit. This buffer type is meant to reduce the amount of copying necessary to put data in the memory accessible by the GPU for computing, NOT to reduce the amount of duplicated data between CPU and GPU, of which there's very little.

More generally, accessible by the cpu does not mean it should be used by it other than to fill it with data to be processed by the gpu. It will still be very slow to access as it is behind a pcie link, and actually using the same memory range would require memory coherency which would destroy both cpu and gpu performance by several orders of magnitude.
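
A back-of-the-envelope sketch of the "behind a pcie link" point. The figures are illustrative assumptions, not measurements: a PCIe 4.0 x16 link (16 GT/s per lane, 128b/130b encoding) and a 256-bit GDDR6 bus at 14 Gbit/s per pin. Both results are theoretical peaks and ignore latency, which would widen the gap further.

```python
def pcie_peak_gb_per_s(gigatransfers_per_s: float, lanes: int,
                       encoding_efficiency: float = 128 / 130) -> float:
    """Theoretical one-direction PCIe bandwidth in GB/s."""
    return gigatransfers_per_s * lanes * encoding_efficiency / 8


def gddr_peak_gb_per_s(bus_width_bits: int, gbit_per_s_per_pin: float) -> float:
    """Theoretical local VRAM bandwidth in GB/s."""
    return bus_width_bits * gbit_per_s_per_pin / 8


# Assumed example configuration (illustrative only).
PCIE4_X16 = pcie_peak_gb_per_s(16.0, 16)
GDDR6_256BIT = gddr_peak_gb_per_s(256, 14.0)

if __name__ == "__main__":
    print(f"PCIe 4.0 x16 peak: {PCIE4_X16:.1f} GB/s")
    print(f"256-bit GDDR6 peak: {GDDR6_256BIT:.1f} GB/s")
    print(f"VRAM is roughly {GDDR6_256BIT / PCIE4_X16:.0f}x faster for the GPU "
          "than for a CPU reaching it over the link")
```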
#20
Mussels
Freshwater Moderator
Steevo said: "I remember that. There are performance features made possible by having unified hardware between console and PC that have yet to be discovered, and I for one look forward to them. AMD also has a long history of working with MS to improve DX performance."
DirectStorage and all its accidental improvements is the biggest one I can think of in recent history - clearly designed to benefit the current console design with a large pool of 'memory' that software dictates is RAM or VRAM, so having *anything* duplicated there is silly and redundant.

When DirectStorage lets them turn the entire NVMe drive into an extension of that memory pool (almost like a pre-filled page file, as far as game textures are concerned) it will totally change the hardware requirements for high quality textures on those consoles, as they can now stream in the data faster.


A quick google shows the PS5 can read from NVMe at around 5.5GB/s and the Series X at 2.4GB/s (sometimes with faster speeds mentioned with current decompression tech) - which explains Microsoft's focus on DirectStorage with hardware decompression of these textures. They want to move the data over to the GPU and have the GPU decompress it, rather than use the console's limited CPU power to do so. They're using their software prowess to improve Direct3D to benefit their console so it can be cheaper than the competition, and to make their desktop OS the 'gamers choice'.
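
To put those read rates in perspective, here is a small arithmetic sketch. The two rates are the ones quoted above. The 10 GB asset size is an arbitrary assumption for illustration, and the helper name is hypothetical.

```python
def stream_time_seconds(size_gb: float, read_rate_gb_per_s: float) -> float:
    """Time to read size_gb at a sustained raw read rate, ignoring decompression."""
    if read_rate_gb_per_s <= 0:
        raise ValueError("read rate must be positive")
    return size_gb / read_rate_gb_per_s


# Raw read rates as quoted in the comment above (GB/s).
RAW_READ_RATES = {"PS5": 5.5, "Series X": 2.4}

if __name__ == "__main__":
    asset_size_gb = 10.0  # assumed asset size, for illustration only
    for console, rate in RAW_READ_RATES.items():
        seconds = stream_time_seconds(asset_size_gb, rate)
        print(f"{console}: {seconds:.2f} s to read {asset_size_gb:.0f} GB")
```

Compressed assets raise the effective rate, which is where GPU decompression comes in.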
#21
TumbleGeorge
It smells to me like a continuation of the trend started by Nvidia. We're selling you less and cheaper hardware for more money because we've found a way to make you believe it's more productive.
#22
Mussels
Freshwater Moderator
TumbleGeorge said: "It smells to me like a continuation of the trend started by Nvidia. We're selling you less and cheaper hardware for more money because we've found a way to make you believe it's more productive."
It's both.
It's making what they have more efficient, which lets them not make a new console - but us Windows users reap the rewards too (which is a smaller side benefit to them)
#23
Punkenjoy
I am not sure if the CPU could use the huge bandwidth of the graphics card. The latency alone would kill it.

But that could probably be a good way for the CPU to modify data in VRAM without having to bring it back to main memory.

Nice stuff, but how long will it take to be used in actual games? Probably 4-5 years.

We still have to see games really using DirectStorage (yeah, Forspoken is there, but it's just one game and it does not use GPU decompression). And still no games using sampler feedback.
#24
Vya Domus
R0H1T said: "One step closer to AMD's decade old hUMA/HSA dream"
CPU + GPU unified memory architecture is nothing new, just hasn't been done for consumer level software but you can get unified memory with CUDA or HIP in Linux right now.
#25
R0H1T
It's also available with CDNA & EPYC IIRC.