
AMD Radeon RX 9070 XT Could Get a 32 GB GDDR6 Upgrade

Joined
Feb 21, 2006
Messages
2,284 (0.33/day)
Location
Toronto, Ontario
System Name The Expanse
Processor AMD Ryzen 7 5800X3D
Motherboard Asus Prime X570-Pro BIOS 5013 AM4 AGESA V2 PI 1.2.0.Cc.
Cooling Corsair H150i Pro
Memory 32GB GSkill Trident RGB DDR4-3200 14-14-14-34-1T (B-Die)
Video Card(s) XFX Radeon RX 7900 XTX Magnetic Air (24.12.1)
Storage WD SN850X 2TB / Corsair MP600 1TB / Samsung 860Evo 1TB x2 Raid 0 / Asus NAS AS1004T V2 20TB
Display(s) LG 34GP83A-B 34 Inch 21: 9 UltraGear Curved QHD (3440 x 1440) 1ms Nano IPS 160Hz
Case Fractal Design Meshify S2
Audio Device(s) Creative X-Fi + Logitech Z-5500 + HS80 Wireless
Power Supply Corsair AX850 Titanium
Mouse Corsair Dark Core RGB SE
Keyboard Corsair K100
Software Windows 10 Pro x64 22H2
Benchmark Scores 3800X https://valid.x86.fr/1zr4a5 5800X https://valid.x86.fr/2dey9c 5800X3D https://valid.x86.fr/b7d
With the 5090, Nvidia has released a 32GB VRAM consumer GPU, so of course AMD is going to do the same (wasn't it the same with 24GB VRAM consumer GPUs?). The difference is that the 9070 (XT) is based on a 256-bit chip using GDDR6 at ~600 GB/s, vs. 512-bit GDDR7 at 1792 GB/s for the 5090. Still fast enough.
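
For anyone who wants to check the bandwidth figures, here is a minimal sketch of the usual peak-bandwidth formula (bus width times per-pin data rate, divided by 8). The 28 Gbit/s GDDR7 rate is the value that reproduces the 1792 GB/s quoted above; the 20 Gbit/s GDDR6 rate for the 9070 XT is an assumption on my part, not something stated in the post.

```python
def memory_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak bandwidth in GB/s = bus width (bits) * per-pin rate (Gbit/s) / 8."""
    return bus_width_bits * data_rate_gbps / 8


# 5090: 512-bit GDDR7; 28 Gbit/s per pin reproduces the 1792 GB/s quoted above.
rtx_5090 = memory_bandwidth_gbs(512, 28.0)

# 9070 XT: 256-bit GDDR6; 20 Gbit/s per pin is an ASSUMPTION, not from the post.
rx_9070_xt = memory_bandwidth_gbs(256, 20.0)

print(f"5090: {rtx_5090:.0f} GB/s")
print(f"9070 XT (assumed 20 Gbit/s): {rx_9070_xt:.0f} GB/s")
print(f"ratio: {rtx_5090 / rx_9070_xt:.1f}x")
```

Under that assumption the 9070 XT lands a bit above the "~600 GB/s" ballpark, and the 5090 has roughly 2.8x the bandwidth.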

AFAIK, only modded games may require more than 24GB VRAM in 4K right now, but 32GB is nice for fully offloading/hosting big-ish LLMs locally.

Regarding the CUDA/ML stack: indeed, I think of AMD GPUs only in terms of running/inferencing LLMs, not training/finetuning. I read that training is still possible and has supposedly gotten easier over the last few years, but CUDA is tier-agnostic and supports consumer GPUs, workstation GPUs and enterprise cards alike. To improve on this, UDNA (U for unified) will replace RDNA at some point.

The 5090's idle power consumption unfortunately increased to 30W (4090: 22W), but that's still not too bad considering there are 16 2GB modules (linear increase: 22W[4090]/12[GDDR6X]*16[GDDR7] = 29.33W). In video playback the increase is more than linear: 54W for the 5090 vs 26W for the 4090.
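
The linear-scaling estimate above can be written out as a small sketch. The wattages and module counts are the ones quoted in the post; the helper name `linear_power_estimate` is made up for illustration, and scaling purely by module count is of course a simplification.

```python
def linear_power_estimate(base_watts: float, base_modules: int, new_modules: int) -> float:
    """Scale a power figure linearly with the number of VRAM modules."""
    return base_watts / base_modules * new_modules


# 4090: 12 modules, 5090: 16 modules (figures from the post).
idle_estimate = linear_power_estimate(22.0, 12, 16)   # 4090 idle: 22 W
video_estimate = linear_power_estimate(26.0, 12, 16)  # 4090 video playback: 26 W

print(f"idle:  estimated {idle_estimate:.2f} W vs 30 W reported for the 5090")
print(f"video: estimated {video_estimate:.2f} W vs 54 W reported for the 5090")
```

Idle lands almost exactly on the linear estimate, while video playback is well above it.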

For me to consider this RDNA4 32GB GPU (in no particular order):
  • DLSS 2-like upscaling quality improvement
  • Fix HDMI 2.1 48 Gbps, aka HDMI 2.1a on Linux
  • Back to good power scaling like in RDNA2
  • Low idle power consumption: at worst, a linear increase with the amount of VRAM compared to the 16GB model
  • Just like the 5090, the 9070 (XT) 32GB must also be a consumer GPU, so that the price increase is minimal
So, AMD, it's 48GB VRAM consumer GPUs for the UDNA arch after RDNA4 then as well? That would allow fully offloading `Llama-3.3-70B-Instruct-Q4_K_M.gguf` (42.5GB) (by then we will have a different and more capable 70B LLM, of course), or allow for much higher context.


Yes, 24GB VRAM can't fit e.g. the 27GB `Qwen2.5-32B-Instruct-Q6_K.gguf` SOTA LLM. The .gguf format allows leaving the rest of the LLM layers in system RAM, though it will run much slower. The tokens-per-second rate climbs sharply (faster than linearly) the more layers are offloaded to the GPU; I did some testing:
View attachment 384684
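
A rough sketch of why partial offload behaves this way, assuming a toy model where the time per token is just the sum of per-layer times. Everything numeric here except the 27GB model size and the 24GB VRAM is hypothetical: the 64-layer count, the 2GB reserved for context, and the per-layer timings are placeholders, not measurements from the attachment.

```python
import math


def layers_that_fit(model_gb: float, n_layers: int, vram_gb: float, reserved_gb: float) -> int:
    """How many equally sized layers fit in VRAM after reserving room for context."""
    per_layer_gb = model_gb / n_layers
    usable_gb = max(vram_gb - reserved_gb, 0.0)
    return min(n_layers, math.floor(usable_gb / per_layer_gb))


def tokens_per_second(gpu_layers: int, n_layers: int,
                      gpu_ms_per_layer: float, cpu_ms_per_layer: float) -> float:
    """Toy model: time per token = GPU layers * GPU time + CPU layers * CPU time."""
    cpu_layers = n_layers - gpu_layers
    ms_per_token = gpu_layers * gpu_ms_per_layer + cpu_layers * cpu_ms_per_layer
    return 1000.0 / ms_per_token


N_LAYERS = 64  # ASSUMED layer count for the 27GB model
fit = layers_that_fit(model_gb=27.0, n_layers=N_LAYERS, vram_gb=24.0, reserved_gb=2.0)
print(f"layers that fit in 24GB (2GB reserved, hypothetical): {fit}/{N_LAYERS}")

# HYPOTHETICAL timings: 0.5 ms per layer on the GPU, 5 ms per layer on the CPU.
for gpu_layers in (0, 16, 32, 48, fit, N_LAYERS):
    rate = tokens_per_second(gpu_layers, N_LAYERS, 0.5, 5.0)
    print(f"{gpu_layers:2d} GPU layers -> {rate:5.2f} tok/s")
```

In this toy model the curve is 1/x-shaped rather than strictly exponential, but the effect is the same: the last few layers moved off the CPU give by far the biggest jump, because the slow CPU layers dominate the time per token until they are gone.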
I'm not sure they will do 48GB on consumer cards just yet, as that would give the W7800 and W7900 workstation GPUs some competition, but we shall see.

Right now this model gives me the best performance on a 24GB VRAM gpu

1739465206870.png


Doing about 28 tok / sec

1739465347775.png


Looks like this rumor is false.

1739476291342.png
 
Joined
Mar 12, 2024
Messages
82 (0.24/day)
System Name SOCIETY
Processor AMD Ryzen 7 7800X3D
Motherboard MSI MAG X670E TOMAHAWK
Cooling Arctic Liquid Freezer II 420
Memory 64GB 6000mhz
Video Card(s) Nvidia RTX 3090
Storage WD SN850X 4TB, Micron 1100 2TB, ZFS NAS over 10gbe network
Display(s) 27" Dell S2721DGF, 24" ASUS IPS, 24" Dell IPS
Case Corsair 750D
Power Supply Cooler Master 1200W Gold
Mouse Razer Deathadder
Keyboard ROG Falchion
VR HMD Pimax 8KX
Software Windows 10 with Debian VM
Yeah spot on.
Regarding VRAM and games: I think it's hard to get games to use more than 16GB, but the one exception, similar to what you said about mods, is VRChat. This game is quite unusual in that you're seeing user-uploaded Unity assets, and as a result the optimization is horrible. It's a tragedy-of-the-commons situation in which not enough individuals optimize their assets, so each avatar could take as much as 500MB of VRAM... so if you go to a populated room, there is no real upper bound on how much VRAM you'd like! And, as that's a VR game, DisplayPort 2 is a must for future-proofing, because current-gen VR headsets already saturate what DP 1.4 can do.

Regarding VRAM and AI: If you train a LoRA for SD XL, you'll already cross the 16GB boundary. SD 3 and Flux are going to be worse. Training speed isn't really an issue here, just VRAM.
Inference, as you say, doesn't need nearly as much VRAM, so the 9070 XT will shine even with less VRAM.

But as a competitor to, say, a 5080: the 5080 just doesn't have enough VRAM. I think 24GB is enough and 32 is a bonus, but 16 just isn't enough for these admittedly obscure tasks.
 
Joined
May 17, 2021
Messages
3,474 (2.54/day)
Processor Ryzen 5 5700x
Motherboard B550 Elite
Cooling Thermalright Peerless Assassin 120 SE
Memory 32GB Fury Beast DDR4 3200Mhz
Video Card(s) Gigabyte 3060 ti gaming oc pro
Storage Samsung 970 Evo 1TB, WD SN850x 1TB, plus some random HDDs
Display(s) LG 27gp850 1440p 165Hz 27''
Case Lian Li Lancool II performance
Power Supply MSI 750w
Mouse G502
Not sure the card needs it; it's mostly to serve the AI crowd and grab more money, with fewer GPUs available for gamers.
 
Joined
Apr 30, 2011
Messages
2,734 (0.54/day)
Location
Greece
Processor AMD Ryzen 5 5600@80W
Motherboard MSI B550 Tomahawk
Cooling ZALMAN CNPS9X OPTIMA
Memory 2*8GB PATRIOT PVS416G400C9K@3733MT_C16
Video Card(s) Sapphire Radeon RX 6750 XT Pulse 12GB
Storage Sandisk SSD 128GB, Kingston A2000 NVMe 1TB, Samsung F1 1TB, WD Black 10TB
Display(s) AOC 27G2U/BK IPS 144Hz
Case SHARKOON M25-W 7.1 BLACK
Audio Device(s) Realtek 7.1 onboard
Power Supply Seasonic Core GC 500W
Mouse Sharkoon SHARK Force Black
Keyboard Trust GXT280
Software Win 7 Ultimate 64bit/Win 10 pro 64bit/Manjaro Linux
Smart move if they manage to offer that 32GB iteration as a workstation GPU, so that non-gamers who need more VRAM for apps stop adding to the demand for the gaming GPU series. They will also be able to sell it at higher profit margins, which should allow AMD to keep the gaming GPUs at normal pricing.
 
Joined
Jan 14, 2019
Messages
14,378 (6.47/day)
Location
Midlands, UK
Processor Various Intel and AMD CPUs
Motherboard Micro-ATX and mini-ITX
Cooling Yes
Memory Overclocking is overrated
Video Card(s) Various Nvidia and AMD GPUs
Storage A lot
Display(s) Monitors and TVs
Case It's not about size, but how you use it
Audio Device(s) Speakers and headphones
Power Supply 300 to 750 W, bronze to gold
Mouse Wireless
Keyboard Mechanic
VR HMD Not yet
Software Linux gaming master race
Look at the update, guys. There's no 32 GB gamer card, but maybe a Radeon Pro workstation card coming later.
 