
NVIDIA RTX IO Detailed: GPU-assisted Storage Stack Here to Stay Until CPU Core-counts Rise

Joined
Feb 20, 2019
Messages
8,280 (3.93/day)
System Name Bragging Rights
Processor Atom Z3735F 1.33GHz
Motherboard It has no markings but it's green
Cooling No, it's a 2.2W processor
Memory 2GB DDR3L-1333
Video Card(s) Gen7 Intel HD (4EU @ 311MHz)
Storage 32GB eMMC and 128GB Sandisk Extreme U3
Display(s) 10" IPS 1280x800 60Hz
Case Veddha T2
Audio Device(s) Apparently, yes
Power Supply Samsung 18W 5V fast-charger
Mouse MX Anywhere 2
Keyboard Logitech MX Keys (not Cherry MX at all)
VR HMD Samsung Odyssey, not that I'd plug it into this though...
Software W10 21H1, barely
Benchmark Scores I once clocked a Celeron-300A to 564MHz on an Abit BE6 and it scored over 9000.
Will this play nice with DirectStorage or is it going to be another Nvidia black box that only Nvidia 3000-series customers get to beta test for Jensen?
 
Joined
May 2, 2017
Messages
7,762 (2.81/day)
Location
Back in Norway
System Name Hotbox
Processor AMD Ryzen 7 5800X, 110/95/110, PBO +150MHz, CO -7,-7,-20(x6)
Motherboard ASRock Phantom Gaming B550 ITX/ax
Cooling LOBO + Laing DDC 1T Plus PWM + Corsair XR5 280mm + 2x Arctic P14
Memory 32GB G.Skill FlareX 3200c14 @3800c15
Video Card(s) PowerColor Radeon 6900XT Liquid Devil Ultimate, UC@2250MHz max @~200W
Storage 2TB Adata SX8200 Pro
Display(s) Dell U2711 main, AOC 24P2C secondary
Case SSUPD Meshlicious
Audio Device(s) Optoma Nuforce μDAC 3
Power Supply Corsair SF750 Platinum
Mouse Logitech G603
Keyboard Keychron K3/Cooler Master MasterKeys Pro M w/DSA profile caps
Software Windows 10 Pro
Death Stranding: 64 GB
Horizon Zero Dawn: 72 GB
Mount & Blade 2: 51 GB
Red Dead Redemption 2: 110 GB
Star Citizen: 60 GB
...and? We could all list a bunch of random games at random sizes. CoD: Warzone is still 175GB, and other high budget AAA games are likely to exceed this soon. I never said all games were at that level, but there will be plenty of >100GB games in the next couple of years.

Will this play nice with DirectStorage or is it going to be another Nvidia black box that only Nvidia 3000-series customers get to beta test for Jensen?
According to Anandtech:
At a high level this appears to be NVIDIA’s implementation of Microsoft’s forthcoming DirectStorage API
 
Joined
Dec 26, 2016
Messages
287 (0.10/day)
Processor Ryzen 3900x
Motherboard B550M Steel Legend
Cooling XPX (custom loop)
Memory 32GB 3200MHz cl16
Video Card(s) 3080 with Bykski block (custom loop)
Storage 980 Pro
Case Fractal 804
Power Supply Focus Plus Gold 750FX
Mouse G603
Keyboard G610 brown
Software yes, lots!
...and? We could all list a bunch of random games at random sizes. CoD: Warzone is still 175GB, and other high budget AAA games are likely to exceed this soon. I never said all games were at that level, but there will be plenty of >100GB games in the next couple of years.

Yes, there WILL be. But there ARE not. SSD prices WILL drop too.
 
Joined
Sep 3, 2020
Messages
10 (0.01/day)
Processor i5 13600KF
Motherboard Z790 Steel Legend WiFi
Cooling Arctic Freezer 2 420mm
Memory 32GB DDR5 6000MHz
Video Card(s) RTX 3080 FTW3 Ultra EVGA
Storage S70 Blade 1TB
Display(s) S2721DGF
Case Corsair 7000D Airflow
Power Supply Seasonic 1300w Gold
Mouse Logitech G Pro X Superlight
Joined
Oct 12, 2019
Messages
128 (0.07/day)
What review sites do you know of that systematically test games only in RT mode? Sure, RT benchmarks will become more of a thing this generation, but I would be shocked if that didn't mean additional testing on top of RT-off testing. And comparing RT-on vs. RT-off is obviously not going to happen (that would make the RT-on GPUs look terrible!).

You misunderstood. Look at CPU testing benchmarks: a 2080 Ti, of course, and far too many games at Ultimate detail in FHD, as if that proves anything, since most run at over 100 FPS. Now replace it with a 3090 and add 'RT' games and it will mean even less. 'Reducing the GPU bottleneck' is a double-edged sword, because it skips benchmarks that might actually mean something for the tested $200 CPU paired with a GPU that someone buying that processor would actually consider - which is certainly not one six times more expensive. Real gaming weaknesses might stay hidden because of this.

As for RT, as I said - I believe nothing until I see it. Right now, I think NVIDIA will pressure benchmarking sites to include RT titles in their benchmark suites.

As for looking terrible, what if it does? Should we hide those results? I think not - publish them, along with screenshots/videos, and let people see whether it is worth investing an extra $700 in a 3090 over a 3080 or whatever else... Buyers should decide, based on honest input - quality and quantity included.
 
Joined
May 2, 2017
Messages
7,762 (2.81/day)
Location
Back in Norway
System Name Hotbox
Processor AMD Ryzen 7 5800X, 110/95/110, PBO +150MHz, CO -7,-7,-20(x6)
Motherboard ASRock Phantom Gaming B550 ITX/ax
Cooling LOBO + Laing DDC 1T Plus PWM + Corsair XR5 280mm + 2x Arctic P14
Memory 32GB G.Skill FlareX 3200c14 @3800c15
Video Card(s) PowerColor Radeon 6900XT Liquid Devil Ultimate, UC@2250MHz max @~200W
Storage 2TB Adata SX8200 Pro
Display(s) Dell U2711 main, AOC 24P2C secondary
Case SSUPD Meshlicious
Audio Device(s) Optoma Nuforce μDAC 3
Power Supply Corsair SF750 Platinum
Mouse Logitech G603
Keyboard Keychron K3/Cooler Master MasterKeys Pro M w/DSA profile caps
Software Windows 10 Pro
Yes, there WILL be. But there ARE not. SSD prices WILL drop too.
I just showed you that there are. As for SSD prices dropping: sure, but there is no way on earth they will drop more than 20% per year - not until we have PLC SSDs, at least. Silicon manufacturing is expensive. The increase in game install sizes - which has been accelerating for 5+ years alongside increases in resolution and texture quality, and which shows no sign of slowing down - will far outstrip any drops in SSD pricing.
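Whether that holds is really a race between two compounding rates; a quick sketch (both rates below are my assumptions for illustration, not figures from the thread):

```python
# Cost to store one flagship game scales by (1 + growth) * (1 - drop) per year.
# Both rates are assumptions for illustration, not quoted figures.
price_per_gb = 0.15    # USD/GB, rough 2020 NVMe pricing (assumption)
game_size_gb = 175.0   # CoD: Warzone install size, per the thread
growth, drop = 0.30, 0.20   # assumed install-size growth vs. SSD price decline

for year in range(2020, 2026):
    print(f"{year}: {game_size_gb:6.0f} GB -> ${game_size_gb * price_per_gb:6.2f} of SSD")
    game_size_gb *= 1 + growth
    price_per_gb *= 1 - drop
```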

You misunderstood. Look at CPU testing benchmarks: a 2080 Ti, of course, and far too many games at Ultimate detail in FHD, as if that proves anything, since most run at over 100 FPS. Now replace it with a 3090 and add 'RT' games and it will mean even less. 'Reducing the GPU bottleneck' is a double-edged sword, because it skips benchmarks that might actually mean something for the tested $200 CPU paired with a GPU that someone buying that processor would actually consider - which is certainly not one six times more expensive. Real gaming weaknesses might stay hidden because of this.

As for RT, as I said - I believe nothing until I see it. Right now, I think NVIDIA will pressure benchmarking sites to include RT titles in their benchmark suites.

As for looking terrible, what if it does? Should we hide those results? I think not - publish them, along with screenshots/videos, and let people see whether it is worth investing an extra $700 in a 3090 over a 3080 or whatever else... Buyers should decide, based on honest input - quality and quantity included.
RT certainly doesn't remove any GPU bottleneck - it introduces a massive new one! - so it will never be used for CPU testing by any reviewer with even a modest amount of knowledge of their field.

As for the rest of that part: that's a critique completely unrelated to this thread, Ampere, and any new GPU in general. It's a critique of potential shortcomings of how most sites do GPU and CPU testing. And it is likely valid to some degree, but ... irrelevant here. I agree that it would be nice to see tests run on lower end hardware too, but that would double the workload on already overworked and underpaid reviewers, so it's not going to happen, sadly. At least not until people start paying for their content.

Nvidia won't have to pressure anyone to include RT titles in their benchmark suites. Any professional reviewer will add a couple of titles for it, as is done with all new major features, APIs, etc. That's the point of having a diverse lineup of games, after all. And any site only including RT-on testing would either need to present a compelling argument for this, or I would ignore them as that would be a clear sign of poor methodology on their part. That has nothing to do with Nvidia.

And I never commented on your opinions of how current RT looks, so I have no idea who you're responding to there.
 
Joined
Dec 26, 2016
Messages
287 (0.10/day)
Processor Ryzen 3900x
Motherboard B550M Steel Legend
Cooling XPX (custom loop)
Memory 32GB 3200MHz cl16
Video Card(s) 3080 with Bykski block (custom loop)
Storage 980 Pro
Case Fractal 804
Power Supply Focus Plus Gold 750FX
Mouse G603
Keyboard G610 brown
Software yes, lots!
Another point would be power consumption. I'd rather decompress stuff once and have a big game dir than continually have my GPU/CPU decompressing stuff and using power for that, just to dump it a second later and do it all over again and again...
 
Joined
May 2, 2017
Messages
7,762 (2.81/day)
Location
Back in Norway
System Name Hotbox
Processor AMD Ryzen 7 5800X, 110/95/110, PBO +150MHz, CO -7,-7,-20(x6)
Motherboard ASRock Phantom Gaming B550 ITX/ax
Cooling LOBO + Laing DDC 1T Plus PWM + Corsair XR5 280mm + 2x Arctic P14
Memory 32GB G.Skill FlareX 3200c14 @3800c15
Video Card(s) PowerColor Radeon 6900XT Liquid Devil Ultimate, UC@2250MHz max @~200W
Storage 2TB Adata SX8200 Pro
Display(s) Dell U2711 main, AOC 24P2C secondary
Case SSUPD Meshlicious
Audio Device(s) Optoma Nuforce μDAC 3
Power Supply Corsair SF750 Platinum
Mouse Logitech G603
Keyboard Keychron K3/Cooler Master MasterKeys Pro M w/DSA profile caps
Software Windows 10 Pro
Another point would be power consumption. I'd rather decompress stuff once and have a big game dir than continually have my GPU/CPU decompressing stuff and using power for that, just to dump it a second later and do it all over again and again...
That's a decent point, but it fails in the face of practicalities. Especially with a dedicated hardware decompression block you could probably decompress hundreds of petabytes of game data for the price of a 1TB NVMe SSD.
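A quick sanity check on that back-of-envelope claim - every input below is an assumption (a ~10 W dedicated decompression block sustaining 7 GB/s, $0.15/kWh electricity, ~$100 for a 1 TB NVMe drive):

```python
# Petabytes of game data a dedicated decompression block could process for
# the price of a 1 TB NVMe SSD. All inputs are assumptions for illustration.
block_power_w  = 10.0    # assumed power draw of the decompression block
throughput_gbs = 7.0     # assumed sustained input rate, GB/s
kwh_price_usd  = 0.15    # assumed electricity price
ssd_price_usd  = 100.0   # assumed price of a 1 TB NVMe drive

joules_per_gb = block_power_w / throughput_gbs   # J per GB processed
kwh_per_pb = joules_per_gb * 1e6 / 3.6e6         # 1 PB = 1e6 GB; 1 kWh = 3.6e6 J
usd_per_pb = kwh_per_pb * kwh_price_usd
print(f"~${usd_per_pb:.3f} per PB -> {ssd_price_usd / usd_per_pb:,.0f} PB for the SSD's price")
```

With those numbers it works out to well over a thousand petabytes, so "hundreds of petabytes" is, if anything, conservative.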
 
Joined
Dec 26, 2016
Messages
287 (0.10/day)
Processor Ryzen 3900x
Motherboard B550M Steel Legend
Cooling XPX (custom loop)
Memory 32GB 3200MHz cl16
Video Card(s) 3080 with Bykski block (custom loop)
Storage 980 Pro
Case Fractal 804
Power Supply Focus Plus Gold 750FX
Mouse G603
Keyboard G610 brown
Software yes, lots!
As I understand it, Nvidia's is not a dedicated fixed-function unit (like video en/decoding) but rather Tensor/CUDA cores that get allocated to do this, which would impact rendering performance and cost electrical power.

If it were a fixed-function block, why would Turing be able to do it? It's not like Huang would have shipped a new hardware function in Turing and forgotten to brag about it for two years.
 
Joined
May 19, 2011
Messages
107 (0.02/day)
Maybe Intel can put its iGPU to use and pull off something similar? Once it gets PCIe Gen 4, at least.
 
Joined
Nov 11, 2016
Messages
3,412 (1.16/day)
System Name The de-ploughminator Mk-III
Processor 9800X3D
Motherboard Gigabyte X870E Aorus Master
Cooling DeepCool AK620
Memory 2x32GB G.Skill 6400MT/s CL32
Video Card(s) Asus RTX4090 TUF
Storage 4TB Samsung 990 Pro
Display(s) 48" LG OLED C4
Case Corsair 5000D Air
Audio Device(s) KEF LSX II LT speakers + KEF KC62 Subwoofer
Power Supply Corsair HX850
Mouse Razer DeathAdder V3
Keyboard Razer Huntsman V3 Pro TKL
Software win11
Joined
Jul 7, 2019
Messages
916 (0.47/day)
This sounds similar to something AMD said they were working on a while back, where their future GPUs would be able to talk directly to the storage subsystem for the necessary assets and bypass the CPU. And true, we saw a form of it in the consoles: MS decided to use their DirectStorage API, whereas Sony used a fairly powerful custom controller instead. Here's hoping RDNA2 is also able to talk directly to SSD/NVMe drives.

I do wonder if that special workstation GPU of AMD's (the Radeon Pro SSG) that allowed installing an NVMe drive as a large cache also helped that concept along.
 
Joined
Dec 26, 2016
Messages
287 (0.10/day)
Processor Ryzen 3900x
Motherboard B550M Steel Legend
Cooling XPX (custom loop)
Memory 32GB 3200MHz cl16
Video Card(s) 3080 with Bykski block (custom loop)
Storage 980 Pro
Case Fractal 804
Power Supply Focus Plus Gold 750FX
Mouse G603
Keyboard G610 brown
Software yes, lots!
No man, throughput has never been the problem. Every now and then there are articles here on TechPowerUp comparing 4, 8 and 16 PCIe lanes for GPUs, and it always turns out that the impact of PCIe throughput is marginal.
It's all about disk I/O and decompressing assets.
 
Joined
Oct 12, 2019
Messages
128 (0.07/day)
RT certainly doesn't remove any GPU bottleneck - it introduces a massive new one! - so it will never be used for CPU testing by any reviewer with even a modest amount of knowledge of their field.

Yeah, that's what I said, too. Now CPU benchmarks will be burdened with a really uncommon workload - ray tracing is far from being just GPU-based. Unlike you, I think virtually all benchmarks will have something RT-based, which is kinda bad because it doesn't concern 99% of users right now.

I said what I think of this supposed GPU-accelerated storage - I'll believe it when I see it. NVIDIA has made a lot of dubious claims lately - I actually know quite a bit about ray tracing and know they are misrepresenting what they do, but they also ray-trace *sound* (sic) and whatnot - and now they accelerate PCIe 4.0 M.2 drives. Right. I need to see real-life proof that it's happening and what the gains are.
 
Joined
May 2, 2017
Messages
7,762 (2.81/day)
Location
Back in Norway
System Name Hotbox
Processor AMD Ryzen 7 5800X, 110/95/110, PBO +150Mhz, CO -7,-7,-20(x6),
Motherboard ASRock Phantom Gaming B550 ITX/ax
Cooling LOBO + Laing DDC 1T Plus PWM + Corsair XR5 280mm + 2x Arctic P14
Memory 32GB G.Skill FlareX 3200c14 @3800c15
Video Card(s) PowerColor Radeon 6900XT Liquid Devil Ultimate, UC@2250MHz max @~200W
Storage 2TB Adata SX8200 Pro
Display(s) Dell U2711 main, AOC 24P2C secondary
Case SSUPD Meshlicious
Audio Device(s) Optoma Nuforce μDAC 3
Power Supply Corsair SF750 Platinum
Mouse Logitech G603
Keyboard Keychron K3/Cooler Master MasterKeys Pro M w/DSA profile caps
Software Windows 10 Pro
Yeah, that's what I said, too. Now CPU benchmarks will be burdened with a really uncommon workload - ray tracing is far from being just GPU-based. Unlike you, I think virtually all benchmarks will have something RT-based, which is kinda bad because it doesn't concern 99% of users right now.

I said what I think of this supposed GPU-accelerated storage - I'll believe it when I see it. NVIDIA has made a lot of dubious claims lately - I actually know quite a bit about ray tracing and know they are misrepresenting what they do, but they also ray-trace *sound* (sic) and whatnot - and now they accelerate PCIe 4.0 M.2 drives. Right. I need to see real-life proof that it's happening and what the gains are.
Making their GPUs do their own decompression isn't exactly a dubious claim - MS has made a new API for it (well, for hardware-accelerated decompression to offload the CPU) that is also used on the upcoming Xboxes. As such it is likely to be a standard feature of many games in a couple of years.

Ray-tracing sound is of course a bit weird, but it makes sense: sound waves propagate in a way that can be simplified down to a collection of rays, including bouncing off surfaces and the like. Of course this behaviour (as well as bending around corners etc.) is different from how light rays behave, and it will always be a simplification rather than a simulation of how real sound waves work, but it's still a technique that can allow for far more realistic spatial audio than we have today.

As for Nvidia misrepresenting their RTRT implementation, I'll leave it to you to flesh that out. IMO it's pretty clear that what we have currently is relatively low-performance and not really suited to fully path-traced graphics, but well suited for low-bounce-count light effects like reflections and global illumination. Even for that, performance still needs to increase, but it's passable as a first-generation solution to something that was previously entirely impossible in real time (non-real-time RT has of course been around for ages). Nvidia does like to sell RTX as the second coming of raptor Jesus, but it's not like they're saying rasterization is dead or fully path-traced games are going to be the standard from now on.
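To make the 'sound as rays' idea concrete, here is a minimal sketch of the classic image-source method (my own illustration of the general principle, not Nvidia's implementation): a single wall reflection behaves like a direct ray from the source mirrored across the wall, and path length becomes arrival delay.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air

def delay_ms(a, b):
    """Propagation delay along a straight ray from a to b, in milliseconds."""
    return math.dist(a, b) / SPEED_OF_SOUND * 1000.0

# 2D room with a wall along x = 0. Image-source method: the echo off the
# wall is equivalent to a direct ray from the source mirrored across it.
source   = (2.0, 1.0)
listener = (5.0, 1.5)
mirrored = (-source[0], source[1])

print(f"direct path: {delay_ms(source, listener):.2f} ms")
print(f"wall echo  : {delay_ms(mirrored, listener):.2f} ms")
```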
 
Joined
Nov 11, 2016
Messages
3,412 (1.16/day)
System Name The de-ploughminator Mk-III
Processor 9800X3D
Motherboard Gigabyte X870E Aorus Master
Cooling DeepCool AK620
Memory 2x32GB G.Skill 6400MT/s CL32
Video Card(s) Asus RTX4090 TUF
Storage 4TB Samsung 990 Pro
Display(s) 48" LG OLED C4
Case Corsair 5000D Air
Audio Device(s) KEF LSX II LT speakers + KEF KC62 Subwoofer
Power Supply Corsair HX850
Mouse Razer DeathAdder V3
Keyboard Razer Huntsman V3 Pro TKL
Software win11
No man, throughput has never been the problem. Every now and then there are articles here on TechPowerUp comparing 4, 8 and 16 PCIe lanes for GPUs, and it always turns out that the impact of PCIe throughput is marginal.
It's all about disk I/O and decompressing assets.

Huh, did you read the DirectStorage blog from MS?
Basically, with the new API an NVMe drive can easily saturate its max bandwidth.
Pretty much all modern NVMe drives can handle >300,000 IOPS; at a 64K block size that's >19.2 GB/s of bandwidth,
while PCIe Gen 4 x4 maxes out at 7.8 GB/s.
Now when you compress/decompress the data stream, the effective bandwidth is 14 GB/s, as noted in Nvidia's slide - higher throughput than a RAMDISK on dual-channel DDR4.
I guess another option to increase throughput is running two PCIe 4.0 x4 NVMe drives in RAID 0. Either way you have plenty of options to take advantage of MS's DirectStorage API: RAID 0, a high-core-count CPU, or an Nvidia GPU.
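Spelling out that arithmetic (figures as quoted above and in Nvidia's slide):

```python
# Raw NVMe capability vs. the PCIe link, and the effective rate once a
# roughly 2:1 compressed stream is expanded on the GPU. Figures from the post.
iops        = 300_000      # modern NVMe drives, per the post
block_bytes = 64 * 1024    # 64K request size
pcie4_x4    = 7.8          # GB/s, max for a PCIe Gen 4 x4 link
ratio       = 14 / 7.8     # ~1.8:1, implied by Nvidia's 14 GB/s figure

drive_gbs = iops * block_bytes / 1e9
print(f"drive can source: {drive_gbs:.1f} GB/s (link-limited to {pcie4_x4} GB/s)")
print(f"effective after GPU decompression: {pcie4_x4 * ratio:.1f} GB/s")
```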
 
Joined
Mar 21, 2016
Messages
2,508 (0.79/day)
Perfect example of why I think AMD should make an Infinity Fabric mGPU/GPU bridge card: put an individual memory chip (or a pair in dual channel), a pair of M.2 slots, and a small CPU on it. Have it host its own OS and do compression/decompression on the fly, along with cache acceleration like StoreMI, completely offloaded from the primary socketed CPU/RAM/storage and OS - including all that random Windows 10 background telemetry and update nonsense - for HBCC. They could do that pretty easily with 2-4 dedicated cores, and they'd probably have enough processing headroom left over to add one of the newer-revision USB headers for a front-panel device as well. It could be semi-multi-purpose yet tie in directly with, and be compatible with, their GPUs as a good resource-offload device. It could sit in a PCIe x1 slot and draw both power and additional bandwidth from that - a good spot to mount it, and it allows for a 1-slot blower cooler to keep it ice cool.
 
Joined
Oct 12, 2005
Messages
707 (0.10/day)
The compression is not just useful for saving SSD space but also for saving bandwidth.

Let's say you have a link that can send 10 GB/s, and you want to send 2 GB uncompressed or 1 GB compressed. The first will take at least 200 ms, while the second would take 100 ms.

This is just pure data transfer, but you can see how it can reduce latency on large transfers.
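That example in code (assuming the same 10 GB/s link and a 2:1 ratio, counting pure wire time only):

```python
# Wire time for the same payload sent raw vs. 2:1 compressed over a 10 GB/s link.
link_gbs, payload_gb = 10.0, 2.0

for label, size_gb in [("uncompressed", payload_gb), ("2:1 compressed", payload_gb / 2)]:
    print(f"{label:>14}: {size_gb / link_gbs * 1000:.0f} ms")
```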

Also, these days the major energy cost comes from moving data around, not from doing the calculation itself. If you can move the data in a compressed state, you save power there.

But what I would like to know is: can we decompress only just before use, and keep saving bandwidth and storage while the data sits in GPU memory? Just-in-time decompression!

They don't seem to do that here, but I think it would be the thing to do as soon as we have decompression engines fast enough to handle the load.

Perfect example of why I think AMD should make an Infinity Fabric mGPU/GPU bridge card: put an individual memory chip (or a pair in dual channel), a pair of M.2 slots, and a small CPU on it. Have it host its own OS and do compression/decompression on the fly, along with cache acceleration like StoreMI, completely offloaded from the primary socketed CPU/RAM/storage and OS - including all that random Windows 10 background telemetry and update nonsense - for HBCC. They could do that pretty easily with 2-4 dedicated cores, and they'd probably have enough processing headroom left over to add one of the newer-revision USB headers for a front-panel device as well. It could be semi-multi-purpose yet tie in directly with, and be compatible with, their GPUs as a good resource-offload device. It could sit in a PCIe x1 slot and draw both power and additional bandwidth from that - a good spot to mount it, and it allows for a 1-slot blower cooler to keep it ice cool.


I think the future might be interesting. If AMD wants to be the leader on PC, they might bring OMI (Open Memory Interface) to the PC, where memory or storage is attached to the CPU via a super-fast serial bus (using far fewer pins and much less die space than current memory interfaces). The actual memory controller is shifted onto the memory stick itself, making the CPU memory-agnostic: you could upgrade your CPU or memory independently, and storage (like Optane) could also be attached this way.

The pin count is much smaller than with modern memory, so you can have way more channels if required.


This is based on the OpenCAPI protocol; OpenCAPI itself would be used to attach any kind of accelerator. AMD's chiplet architecture would probably make it easy for them to switch to these kinds of architectures, and it's probably the future.

These are open standards pushed by IBM, but I could see AMD using them, or pushing their own standard with a similar goal, in the future. With these standards, the GPU could connect directly to the memory controller and vice versa.
 
Joined
Mar 21, 2016
Messages
2,508 (0.79/day)
The compression is not just useful for saving SSD space but also for saving bandwidth.

Let's say you have a link that can send 10 GB/s, and you want to send 2 GB uncompressed or 1 GB compressed. The first will take at least 200 ms, while the second would take 100 ms.

This is just pure data transfer, but you can see how it can reduce latency on large transfers.

Also, these days the major energy cost comes from moving data around, not from doing the calculation itself. If you can move the data in a compressed state, you save power there.

But what I would like to know is: can we decompress only just before use, and keep saving bandwidth and storage while the data sits in GPU memory? Just-in-time decompression!

They don't seem to do that here, but I think it would be the thing to do as soon as we have decompression engines fast enough to handle the load.

I think the future might be interesting. If AMD wants to be the leader on PC, they might bring OMI (Open Memory Interface) to the PC, where memory or storage is attached to the CPU via a super-fast serial bus (using far fewer pins and much less die space than current memory interfaces). The actual memory controller is shifted onto the memory stick itself, making the CPU memory-agnostic: you could upgrade your CPU or memory independently, and storage (like Optane) could also be attached this way.

The pin count is much smaller than with modern memory, so you can have way more channels if required.

This is based on the OpenCAPI protocol; OpenCAPI itself would be used to attach any kind of accelerator. AMD's chiplet architecture would probably make it easy for them to switch to these kinds of architectures, and it's probably the future.

These are open standards pushed by IBM, but I could see AMD using them, or pushing their own standard with a similar goal, in the future. With these standards, the GPU could connect directly to the memory controller and vice versa.
Interesting, though how cost-effective would it be relative to the performance and storage capacity? If AMD wanted to do it cheaper they could just pair a 2GB high-frequency DDR4 chip with some extra GDDR6X - preferably as fast as or faster than what RDNA2 uses - and an inexpensive 2-4 core CPU that handles all the cache acceleration, compression and decompression, and provides a bit of persistent storage that HBCC can tap into, then throw on a 1-slot cooler and blower fan. I really think AMD could do all that for maybe $150, plus or minus $25-$50, and it would be a really good way of extending GPU VRAM and performance. Additionally, it's just a really great way to add a storage-acceleration card.

AMD could sell a lot of those types of devices outside of gaming as well - it's a device that is desirable in other areas these days: ML, data centers, etc. They could make the microSD slot PCIe-based and slot the card into a PCIe x1 slot; I doubt it would add much to the cost, and it's a good way to draw power for all of it. I'd think 75W would actually be overkill - probably closer to 10-25W, since about the only thing drawing much power would be the CPU, and a 2-4 core CPU isn't going to draw much these days. They could just use a mobile chip - a great way for AMD to bin those further, killing two birds with one stone. As mentioned, the compression saves a lot of bandwidth/latency, and part of VRAM usage is old data stuck in use because of bandwidth and latency constraints; if you can speed those up, you can stream data in and out more effectively at any given moment. Just having that extra capacity to send data to and fetch it back from quickly would be big for HBCC. It would be interesting if it reduced latency enough to eliminate most of the CF/SLI micro-stutter negatives as well. What you mention sounds cool, but I'm really unsure about the cost aspect; I think what I mention could be done at a good entry price point, and of course AMD could improve it gradually every year or two. You might actually speed up your own GPU with one instead of having to buy a whole new card in the process.

To me this is the kind of thing AMD should do with its GPUs: integrate just a fraction of this onto a GPU and make future GPUs compatible with it, so any upgrade works with it and can help with some mGPU duty - assigning the newer, quicker card to cache acceleration, compression/decompression, offload, and possibly a touch of on-the-fly post-processing, while matching the other card's performance, simply using excess CPU/GPU resources for the other improvements. Provided the two cards aren't too far apart in age and specs, I'd think it would work alright. Where there are more glaring gaps, certain features could be enabled or disabled in terms of what it accelerates - rendering, storage, compression/decompression - or perhaps an older card just handles the cache and compression while the newer card handles everything else. I just can't see how it could be a bad thing for AMD: they can use it to bin mobile chips better (which is more lucrative), add a selling feature to their GPUs, and diversify into storage acceleration entirely - it could even be used for path-tracing storage on either the CPU cache or the SSD. It frees up overhead for the rest of the system; the OS, CPU, memory, and storage all become less strained thanks to the offloading.
 
Joined
Dec 13, 2019
Messages
47 (0.03/day)
SSDs are huge and cheap, so why not just put uncompressed (or less compressed) data there? Even if a game were to use 200 or 300 GB, I would prefer that to the load times of Wasteland 3... I don't have 10 games installed at any given moment, so I could allocate SSD space for the 1-3 games that I am actually playing.

Ages ago every game would let you choose how much of the installation to put on the HDD and how much to leave on the CD/DVD. Why not add an option to choose the compression level of stored data?
I have many games installed. SSDs are expensive and you cannot have enough storage.
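The 'pick your compression level' idea from the quoted post is easy to demonstrate - a minimal sketch using zlib purely as a stand-in codec (real asset pipelines use different codecs and would also weigh unpack speed, not just size):

```python
import time
import zlib

# Higher compression level = smaller install, but more CPU time to (de)compress.
# zlib is a stand-in here; game engines use codecs like Kraken or zstd instead.
data = (b"texture block " * 4096) * 64   # ~3.5 MB of compressible dummy data

for level in (1, 6, 9):
    t0 = time.perf_counter()
    packed = zlib.compress(data, level)
    ms = (time.perf_counter() - t0) * 1000
    print(f"level {level}: {len(packed):>9,} B ({len(packed)/len(data):5.1%}) in {ms:6.1f} ms")
```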
 
Joined
Feb 18, 2009
Messages
1,825 (0.32/day)
Location
Slovenia
System Name Multiple - Win7, Win10, Kubuntu
Processor Intel Core i7 3820 OC@ 4.0 GHz
Motherboard Asus P9X79
Cooling Noctua NH-L12
Memory Corsair Vengeance 32GB 1333MHz
Video Card(s) Sapphire ATI Radeon RX 480 8GB
Storage Samsung SSD: 970 EVO 1TB, 2x870 EVO 250GB,860 Evo 250GB,850 Evo 250GB, WD 4x1TB, 2x2TB, 4x4TB
Display(s) Asus PB328Q 32" 1440p@75Hz
Case Cooler Master CM Storm Trooper
Power Supply Corsair HX750, HX550, Galaxy 520W
Mouse Multiple, Razer Mamba Elite, Logitech M500
Keyboard Multiple - Lenovo, HP, Dell, Logitech
This seems redundant with 8, 12 and 16-core CPUs.

16 cores seems like a lot, but games keep 10 threads busy these days - and what about streaming, recording, and doing other stuff on top? IO would bog down 4 of those cores; it's a waste when GPUs can do it much better and more directly.

The point of this is that it bypasses the CPU. The data used to travel through the CPU, RAM, and all that circuitry - a whole detour - before it reached the GPU; now all of that is bypassed.
 

nagmat

New Member
Joined
Sep 6, 2020
Messages
1 (0.00/day)
Where can I find detailed information (a repository) or source code about how it works algorithmically?
 
Joined
Oct 12, 2019
Messages
128 (0.07/day)
Making their GPUs do their own decompression isn't exactly a dubious claim - MS has made a new API for it (well, for hardware-accelerated decompression to offload the CPU) that is also used on the upcoming Xboxes. As such it is likely to be a standard feature of many games in a couple of years.

Ray-tracing sound is of course a bit weird, but it makes sense: sound waves propagate in a way that can be simplified down to a collection of rays, including bouncing off surfaces and the like. Of course this behaviour (as well as bending around corners etc.) is different from how light rays behave, and it will always be a simplification rather than a simulation of how real sound waves work, but it's still a technique that can allow for far more realistic spatial audio than we have today.

As for Nvidia misrepresenting their RTRT implementation, I'll leave it to you to flesh that out. IMO it's pretty clear that what we have currently is relatively low-performance and not really suited to fully path-traced graphics, but well suited for low-bounce-count light effects like reflections and global illumination. Even for that, performance still needs to increase, but it's passable as a first-generation solution to something that was previously entirely impossible in real time (non-real-time RT has of course been around for ages). Nvidia does like to sell RTX as the second coming of raptor Jesus, but it's not like they're saying rasterization is dead or fully path-traced games are going to be the standard from now on.

Hey, however nice I try to be, I seem to get a "sensei" here. There is very little I need to "flesh out" about ray tracing and rendering in general - I have been closely connected with the damned thing for over 3 decades. Pretty much everything was fleshed out a long time ago...

Seriously, try to be less patronizing... especially when making uncalled-for replies to people who know much more about the subject than you do...
 
Joined
May 2, 2017
Messages
7,762 (2.81/day)
Location
Back in Norway
System Name Hotbox
Processor AMD Ryzen 7 5800X, 110/95/110, PBO +150MHz, CO -7,-7,-20(x6)
Motherboard ASRock Phantom Gaming B550 ITX/ax
Cooling LOBO + Laing DDC 1T Plus PWM + Corsair XR5 280mm + 2x Arctic P14
Memory 32GB G.Skill FlareX 3200c14 @3800c15
Video Card(s) PowerColor Radeon 6900XT Liquid Devil Ultimate, UC@2250MHz max @~200W
Storage 2TB Adata SX8200 Pro
Display(s) Dell U2711 main, AOC 24P2C secondary
Case SSUPD Meshlicious
Audio Device(s) Optoma Nuforce μDAC 3
Power Supply Corsair SF750 Platinum
Mouse Logitech G603
Keyboard Keychron K3/Cooler Master MasterKeys Pro M w/DSA profile caps
Software Windows 10 Pro
Hey, however nice I try to be, I seem to get a "sensei" here. There is very little I need to "flesh out" about ray tracing and rendering in general - I have been closely connected with the damned thing for over 3 decades. Pretty much everything was fleshed out a long time ago...

Seriously, try to be less patronizing... especially when making uncalled-for replies to people who know much more about the subject than you do...
All I'm saying is that you're making some claims here that you're not backing up with anything of substance, beyond alluding to experience as if that explains anything. I'm not doubting your experience, nor the value of it - not whatsoever - but all that tells us is that you ought to know a lot about this, not what you know. Because that is what I'm asking for here: an explanation of what you are saying. I'm asking you to present your points.

I'm not contesting your claims (well, you could say that about decompression and RT audio, but you don't seem interested in discussing those), but you said something to the effect of current RT being complete garbage, which... well, needs fleshing out. How? Why? On what level? I mean, sure, we've all seen the examples of terrible reflection resolution in BF1 etc., but it seems you are claiming quite a bit more than that - though again, it's hard to judge going by your vague wording.

So maybe try not to be insulted when someone asks you to flesh out your claims, and instead... do so? Share some of that knowledge that you - rather patronizingly, I might add - claim that I should accept on blind faith? I think we're both talking past each other quite a bit here, but as far as I understand the general attitude on any discussion forum, making a vague claim - especially one backed by another claim of expert knowledge - and then refusing to go beyond this vagueness is a rather impolite thing to do. You're welcome to disagree, and I'll be happy to leave it at that, but the ball is firmly in your court.

I was also interested in seeing you argue your points about both the two first points in the post you quoted (about GPU-accelerated decompression and "RT" audio), but again you don't seem to have come here to have any kind of exchange of opinions or knowledge. Which is really too bad.

Where can I find detailed information (a repository) or source code about how it works algorithmically?
I would assume on Nvidia's internal servers and the work computers of their employees, and nowhere else. Nvidia doesn't tend to be very open source-oriented.
 
Joined
Apr 12, 2013
Messages
7,529 (1.77/day)
Well, here's food for thought: has anyone tried NVMe 4.0 drives with DirectStorage or RTX IO and seen whether PCIe 3.0 would be a limiting factor? I do believe that if this works the way the PS5 demos have shown, PCIe 3.0 could be a major bottleneck - mainly on Intel - especially a year or two down the line with AAA titles!
 