Wednesday, April 21st 2021

DirectStorage API Works Even with PCIe Gen3 NVMe SSDs
Microsoft on Tuesday, in a developer presentation, confirmed that the DirectStorage API, designed to speed up the storage sub-system, is compatible even with NVMe SSDs that use the PCI-Express Gen 3 host interface. It also confirmed that all GPUs compatible with DirectX 12 support the feature. A feature making its way to the PC from consoles, DirectStorage enables the GPU to directly access an NVMe storage device, paving the way for GPU-accelerated decompression of game assets.
This works to reduce latencies at the storage sub-system level, and offload the CPU. Any DirectX 12-compatible GPU technically supports DirectStorage, according to Microsoft. The company however recommends DirectX 12 Ultimate GPUs "for the best experience." The GPU-accelerated game asset decompression is handled via compute shaders. In addition to reducing latencies, DirectStorage is said to accelerate the Sampler Feedback feature in DirectX 12 Ultimate. More slides from the presentation follow.
Source:
NEPBB (Reddit)
76 Comments on DirectStorage API Works Even with PCIe Gen3 NVMe SSDs
Compatible means some features are emulated, or missing in the case of optional ones, and DirectStorage may use those optional ones.
My guess is games designed for this tech will have faster than average load times anyway, since they're being optimised for SSDs and not the 4,200 RPM laptop drives in consoles.
Hopefully 'it just works'.
I would not be shocked to find out it's a DX12 Ultimate feature
(I'd be happy if it wasn't)
Maybe APUs would benefit the most from something like that, as they're dealing with slower system RAM anyway
What we really need is a giant GPU in the ATX standard with a PCI-E card to slot in the CPU
GPU Memory Latency Tested on AMD's RDNA 2 and NVIDIA's Ampere Architecture
and backed up with this video
In theory it could be stretched to DirectX 11 titles by simply removing a performance target or some kind of limitation (in this case the transfer rate), but I don't think the impact would be as significant as that of the Variable Rate Shading and Resource Binding features that are embedded in DirectX 12.
I guess the software overhead etc was the reason it never took off.
But texture popping is caused by a missing resource, and the only real way to avoid that is prefetching. So this technology by itself will not solve that problem, but you can certainly build a game engine which prefetches textures combined with this technology. I would highly recommend having a separate boot drive. The OS will cause a lot of wear on TLC/QLC SSDs, so you better have your files somewhere else.
If your motherboard has a free PCIe x4 slot, you can buy an M.2 adapter for it.
All the files that I care about amount to only ~30GB and are backed up monthly on my Dropbox. Everything else can be downloaded & reinstalled easily, and I can also reconfigure my OS easily because I keep my .reg files, redists, and other necessary tweaks also on my Dropbox.
The NVMe is solely for OS files and programs I can redownload.
Also, is nobody going to answer whether the NVMe drive needs to be attached to the CPU for DirectStorage to work?
That's how next-gen consoles are. The SSD connects directly to the APU. Last-gen consoles had the HDD connected through the southbridge (which isn't a big deal for HDDs or SATA SSDs).
So, if you have an old Intel PC (Rocket Lake introduced 4 dedicated lanes for NVMe, just like AMD Ryzen/AM4 since 2017) where the NVMe is attached to the PCH (southbridge), you're screwed. There's no benefit, so expect to upgrade your platform.
The PCH connection is usually 4 lanes (roughly 4 GB/s), and any NVMe drive worth its salt is going to saturate that bus (let alone the fact that you may also have SATA HDDs, Gigabit Ethernet, a TV tuner card, etc.).
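For reference, the "4 lanes, about 4 GB/s" figure can be worked out from the published PCIe signalling rates. The sketch below is only back-of-the-envelope arithmetic: it assumes the PCH uplink behaves like a plain PCIe 3.0 x4 link, counts only the 128b/130b line encoding, and ignores packet and protocol overhead, so real-world throughput will be somewhat lower. The function and variable names are made up for illustration.

```python
# Raw per-lane signalling rate in GT/s for the PCIe generations that use
# 128b/130b encoding (Gen3 and Gen4).
GT_PER_S = {3: 8.0, 4: 16.0}

def pcie_bandwidth_gb_per_s(gen: int, lanes: int) -> float:
    """Approximate one-direction link bandwidth in GB/s.

    Only the 128b/130b line encoding is accounted for; packet headers and
    other protocol overhead are ignored (assumption for this sketch).
    """
    encoding_efficiency = 128 / 130
    bits_per_byte = 8
    return GT_PER_S[gen] * encoding_efficiency / bits_per_byte * lanes

gen3_x4 = pcie_bandwidth_gb_per_s(3, 4)
gen4_x4 = pcie_bandwidth_gb_per_s(4, 4)
print(f"PCIe 3.0 x4: {gen3_x4:.2f} GB/s")  # PCIe 3.0 x4: 3.94 GB/s
print(f"PCIe 4.0 x4: {gen4_x4:.2f} GB/s")  # PCIe 4.0 x4: 7.88 GB/s
```

So a Gen3 x4 drive and a Gen3 x4 uplink have the same ceiling, which is why a single fast NVMe drive can fill the link on its own.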
The current software stack is from a time where HDD was common, so the overhead didn't matter.
At most you just need to install whatever game supports DirectStorage on your 980 Pro.
The reality is the PCH is mostly idle, which is why the current system works. It's oversubscribed in theory, but the vast majority of people are not fully utilising several PCH-connected devices at the same time. Also, I expect that in most games a PCIe 3.0 drive would not typically be maxed out either. The benefit of DirectStorage is the extra I/O operations per second, not so much the overall burst bandwidth. It will work just fine, like how a 3080 can work fine on PCIe 3.0 x8.
On the Xbox it works via the PCH.
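The "oversubscribed in theory" point can be put into rough numbers. This is only a sketch: the ceilings are interface limits (PCIe 3.0 x4, SATA III at 0.6 GB/s per port, Gigabit Ethernet at 0.125 GB/s), not measured device speeds, and both the device mix and the "typical load" figures are made-up examples. All names are hypothetical.

```python
# Interface ceilings in GB/s, one direction. These are link limits, not
# measured device throughput; the device mix is an illustrative assumption.
UPLINK_GB_PER_S = 3.94  # PCH uplink, assumed equivalent to PCIe 3.0 x4
DEVICE_CEILINGS_GB_PER_S = {
    "NVMe SSD (PCIe 3.0 x4)": 3.94,
    "6 x SATA III ports": 6 * 0.6,
    "Gigabit Ethernet": 0.125,
}

def oversubscription_ratio(ceilings: dict, uplink: float) -> float:
    """Sum of all device ceilings divided by the shared uplink capacity."""
    return sum(ceilings.values()) / uplink

def is_saturated(active_loads_gb_per_s: list, uplink: float) -> bool:
    """True if the loads that are actually active exceed the uplink."""
    return sum(active_loads_gb_per_s) > uplink

ratio = oversubscription_ratio(DEVICE_CEILINGS_GB_PER_S, UPLINK_GB_PER_S)
print(f"Theoretical oversubscription: {ratio:.2f}x")

# Hypothetical load while gaming: the SSD streaming at 2 GB/s plus a busy
# network link. The uplink is not saturated in this case.
typical_load_saturated = is_saturated([2.0, 0.125], UPLINK_GB_PER_S)
print("Saturated under typical load:", typical_load_saturated)
```

A ratio above 1 only says that everything cannot run flat out at once; whether it matters depends on how many devices are actually busy at the same time.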
That's where the GPU is attached, along with NVMe (only for AM4/AMD Zen so far and some recent Intel platforms).
2) Nope. When Ratchet gets ported to PC, you'll understand what I'm talking about. You need raw bandwidth too for instant portal switching.
3) Have you studied the XBOX Series architecture? The NVMe is connected directly to the APU (SoC/uncore part), not the southbridge (that's a separate chip).
Pretty sure you haven't even seen XBOX Series PCB pics (there are 2 PCBs).
Come on guys, there's tons of info out there, educate yourselves! :)
Of course, those who use Linux are aware that it has always been ahead of the curve in terms of storage. I would advise reading up on fio and elbencho, although the former also works on Windows. Billy Tallis of AnandTech has been posting a lot of Linux and storage-related content on Reddit, including async I/O.
One such area is P2PDMA (or P2P DMA), which is basically what we have here and is associated with other technologies such as GPUDirect.
Notably, P2PDMA is compatible with a wide variety of chipsets, for example, "all AMD Zen chipsets."
"Root complex functionality may be implemented as a discrete device (northbridge chip), or may be integrated in the CPU."
It's a matter of pure geography: both the GPU and the SSD need to be as close as possible.
If you care to study console motherboards/PCBs, you'll notice that the SSD lanes (4 of them) lead straight to the APU chip, not the PCH.
I don't know why you have to confuse all these things.
Even if you have ample PCH bandwidth (like on TRX40), you're going to experience more latency if the SSD is not connected directly to the GPU via the SoC (PCIe root complex).
Game devs want guaranteed things: this means that if my PC has 6 x SATA HDDs in RAID0 seeding torrents, a TV tuner card recording stuff and a Gigabit Ethernet connection, saturation is inevitable.
The only way to guarantee (via DirectStorage API) zero saturation is by enforcing direct GPU <-> SSD communication via the SoC/northbridge. There's no other way.
There's a reason AMD has dedicated 4 lanes to NVMe since 2017. Intel was late to the game (Rocket Lake supports it, but only if the mobo has the actual PCB traces obviously).
2) X570 is not a normal chipset/southbridge, it's a hack.
B450/X470/B550 are the equivalent of southbridge for AMD.
Again: why do you have to confuse all these things?
AMD has had 4 dedicated lanes since 2017. X570 is not needed and in fact many people avoid it (due to active cooling and a certain SATA bug).
3) Linux is a server-oriented OS, so of course it would have a more advanced I/O stack (among other things).