Is there anway to make use of unused VRAM?

Mussels · May 21, 2023

Gica said:
As you can see, the processor cannot access VRAM directly. Only the GPU can do it and any attempt to use this memory as RAM, by detours, is just fun and useless. Even the igp has its RAM memory reserved, impossible for the CPU to access.

View attachment 296919

That render you've shown isn't an official diagram of how they're actually designed or work

Direct3D12 ultimate and Directstorage are changing how it all works, and much of it is already active.
The CPU can edit content of VRAM, and the GPU can read from NVME. They're all directly linked over PCI-E now.

It's just not that fast at doing so, latencies are good but overall bandwidth isn't
So loading in a new texture to prevent microstutter - hell yes.

As a write cache or something... maybe?
The CUDA version definitely was GPU controlled and had no CPU usage on my testing, but the CPU version of the software would likely have been software driven by the CPU.

Gica · May 21, 2023

Mussels said:
??? I think you've misunderstood how that all works.
Very old image, but a clear example
View attachment 296917

I do not believe.

NVIDIA Explains GeForce RTX 40 Series VRAM Functionality

NVIDIA receives a lot of questions about graphics memory, also known as the frame buffer, video memory, or "VRAM", and so with the unveiling of our new GeForce RTX 4060 Family of graphics cards we wanted to share some insights, so gamers can make the best buying decisions for their gaming needs...

www.techpowerup.com

"Each chip uses two separate 16-bit channels to connect to a single 32-bit"

Mussels said:
That render you've shown isn't an official diagram of how they're actually designed or work

Direct3D12 ultimate and Directstorage are changing how it all works, and much of it is already active.
The CPU can edit content of VRAM, and the GPU can read from NVME. They're all directly linked over PCI-E now.

It's just not that fast at doing so, latencies are good but overall bandwidth isn't
So loading in a new texture to prevent microstutter - hell yes.

As a write cache or something... maybe?
The CUDA version definitely was GPU controlled and had no CPU usage on my testing, but the CPU version of the software would likely have been software driven by the CPU.

The idea is that the CPU cannot use the real performance of the VRAM memory.
To simplify: a defender cannot replace the lack of a striker, nor vice versa. Everyone has their role and there they give the maximum yield.

chrcoluk · May 21, 2023

80251 said:
It might be interesting to use a VRAM disk with primocache -- maybe as a multi-gigabyte write cache (to save on writes to SSD's without using up system RAM).

Why would I do that when system ram is plentiful and VRAM is scarce?

Mussels · Jun 1, 2023

Gica said:
I do not believe.

NVIDIA Explains GeForce RTX 40 Series VRAM Functionality

NVIDIA receives a lot of questions about graphics memory, also known as the frame buffer, video memory, or "VRAM", and so with the unveiling of our new GeForce RTX 4060 Family of graphics cards we wanted to share some insights, so gamers can make the best buying decisions for their gaming needs...

www.techpowerup.com

"Each chip uses two separate 16-bit channels to connect to a single 32-bit"

The idea is that the CPU cannot use the real performance of the VRAM memory.
To simplify: a defender cannot replace the lack of a striker, nor vice versa. Everyone has their role and there they give the maximum yield.

Bolded line, correct
Underlined line... incorrect. Poor analogy.

The CPU can't use the VRAM fast because it has no direct access, and no high-speed link to it.
The CPU has to tell the GPU what to do, which then uses the VRAM, and then passes the result back.
That's not about somethings role but about an order of events and the speed between each of them in multiple metrics of bandwidth, latency, and processing delays.

As stated, DX12 has updates to change that but it's not used yet and definitely not part of that CUDA VRAM code.

I'm still not sure what you think you're saying with regards to the memory.
They use two 1GB or 2GB modules paired together to connect to the GPUs 32 bit wide memory buses, and then multiply that to reach their desired amount - in this case, 128 bit wide (and 192 bit wide on the previous gen)
I can discuss a lot more of that, but stating random facts and numbers doesn't explain what you mean by any of it.
PCI-E 4.0 x16 has a theoretical max of 32GB/s without any overheads so it could never bypass that, and PCI-E 5.0 GPU's could do 64GB/s... if the GPU architecture could actually provide that level of speed.

this quote sums this thread up nicely

In testing with a variety of games and synthetic benchmarks, the 32 MB L2 cache reduced memory bus traffic by just over 50% on average compared to the performance of a 2 MB L2 cache. See the reduced VRAM accesses in the Ada Memory Subsystem diagram above.

This 50% traffic reduction allows the GPU to use its memory bandwidth 2X more efficiently. As a result, in this scenario, isolating for memory performance, an Ada GPU with 288 GB/sec of peak memory bandwidth would perform similarly to an Ampere GPU with 554 GB/sec of peak memory bandwidth. Across an array of games and synthetic tests, the greatly increased hit rates improve frame rates by up to 34%.

They care about internal speed. Not external.

By using a caching system they managed to get the same gaming performance despite halving the bandwidth - but that VRAM is now halved in speed for something as simple as file transfers on a VRAM drive. That cache would not help that sort of activity, as the entire GPU as a whole was never designed or optimised for it.

Yay, your 128 bit GPU with 288GB/s bandwidth can at most send 64GB/s of that at best to another location of equal speed, which cant exist within the system as that would fill the DRAM in seconds and overwhelm even the fastest NVME drives. You then take into account that various file sizes, types, overheads, compression and so on and even simple things like reading and writing at the same time all slow those values down and it becomes a very poor proposition to use VRAM for anything other than its intended purpose, of being where the GPU stores the code it's crunching away at.

OneMoar · Jun 1, 2023

yes there is:

GitHub - prsyahmi/GpuRamDrive: RamDrive that is backed by GPU Memory

RamDrive that is backed by GPU Memory. Contribute to prsyahmi/GpuRamDrive development by creating an account on GitHub.

github.com

System Name	Rainbow Sparkles (Power efficient, <350W gaming load)
Processor	Ryzen R7 5800x3D (Undervolted, 4.45GHz all core)
Motherboard	Asus x570-F (BIOS Modded)
Cooling	Alphacool Apex UV - Alphacool Eisblock XPX Aurora + EK Quantum ARGB 3090 w/ active backplate
Memory	2x32GB DDR4 3600 Corsair Vengeance RGB @3866 C18-22-22-22-42 TRFC704 (1.4V Hynix MJR - SoC 1.15V)
Video Card(s)	Galax RTX 3090 SG 24GB: Underclocked to 1700Mhz 0.750v (375W down to 250W))
Storage	2TB WD SN850 NVME + 1TB Sasmsung 970 Pro NVME + 1TB Intel 6000P NVME USB 3.2
Display(s)	Phillips 32 32M1N5800A (4k144), LG 32" (4K60) \| Gigabyte G32QC (2k165) \| Phillips 328m6fjrmb (2K144)
Case	Fractal Design R6
Audio Device(s)	Logitech G560 \| Corsair Void pro RGB \|Blue Yeti mic
Power Supply	Fractal Ion+ 2 860W (Platinum) (This thing is God-tier. Silent and TINY)
Mouse	Logitech G Pro wireless + Steelseries Prisma XL
Keyboard	Razer Huntsman TE ( Sexy white keycaps)
VR HMD	Oculus Rift S + Quest 2
Software	Windows 11 pro x64 (Yes, it's genuinely a good OS) OpenRGB - ditch the branded bloatware!
Benchmark Scores	Nyooom.

System Name	Main PC
Processor	13700k
Motherboard	Asrock Z690 Steel Legend D4 - Bios 13.02
Cooling	Noctua NH-D15S
Memory	32 Gig 3200CL14
Video Card(s)	4080 RTX SUPER FE 16G
Storage	1TB 980 PRO, 2TB SN850X, 2TB DC P4600, 1TB 860 EVO, 4TB WD SA510, 2x 3TB WD Red, 1x 4TB WD Red
Display(s)	LG 27GL850
Case	Fractal Define R4
Audio Device(s)	Soundblaster AE-9
Power Supply	Antec HCG 750 Gold
Software	Windows 10 21H2 LTSC

System Name	Rainbow Sparkles (Power efficient, <350W gaming load)
Processor	Ryzen R7 5800x3D (Undervolted, 4.45GHz all core)
Motherboard	Asus x570-F (BIOS Modded)
Cooling	Alphacool Apex UV - Alphacool Eisblock XPX Aurora + EK Quantum ARGB 3090 w/ active backplate
Memory	2x32GB DDR4 3600 Corsair Vengeance RGB @3866 C18-22-22-22-42 TRFC704 (1.4V Hynix MJR - SoC 1.15V)
Video Card(s)	Galax RTX 3090 SG 24GB: Underclocked to 1700Mhz 0.750v (375W down to 250W))
Storage	2TB WD SN850 NVME + 1TB Sasmsung 970 Pro NVME + 1TB Intel 6000P NVME USB 3.2
Display(s)	Phillips 32 32M1N5800A (4k144), LG 32" (4K60) \| Gigabyte G32QC (2k165) \| Phillips 328m6fjrmb (2K144)
Case	Fractal Design R6
Audio Device(s)	Logitech G560 \| Corsair Void pro RGB \|Blue Yeti mic
Power Supply	Fractal Ion+ 2 860W (Platinum) (This thing is God-tier. Silent and TINY)
Mouse	Logitech G Pro wireless + Steelseries Prisma XL
Keyboard	Razer Huntsman TE ( Sexy white keycaps)
VR HMD	Oculus Rift S + Quest 2
Software	Windows 11 pro x64 (Yes, it's genuinely a good OS) OpenRGB - ditch the branded bloatware!
Benchmark Scores	Nyooom.

System Name	RPC MK2.5
Processor	Ryzen 5800x
Motherboard	Gigabyte Aorus Pro V2
Cooling	Thermalright Phantom Spirit SE
Memory	CL16 BL2K16G36C16U4RL 3600 1:1 micron e-die
Video Card(s)	GIGABYTE RTX 3070 Ti GAMING OC
Storage	Nextorage NE1N 2TB ADATA SX8200PRO NVME 512GB, Intel 545s 500GBSSD, ADATA SU800 SSD, 3TB Spinner
Display(s)	LG Ultra Gear 32 1440p 165hz Dell 1440p 75hz
Case	Phanteks P300 /w 300A front panel conversion
Audio Device(s)	onboard
Power Supply	SeaSonic Focus+ Platinum 750W
Mouse	Kone burst Pro
Keyboard	SteelSeries Apex 7
Software	Windows 11 +startisallback

Is there anway to make use of unused VRAM?

Mussels

Freshwater Moderator

Gica

NVIDIA Explains GeForce RTX 40 Series VRAM Functionality

chrcoluk

Mussels

Freshwater Moderator

NVIDIA Explains GeForce RTX 40 Series VRAM Functionality

OneMoar

There is Always Moar

GitHub - prsyahmi/GpuRamDrive: RamDrive that is backed by GPU Memory