> I'm not sure they will do 48GB on consumer just yet, as that would give the W7800 and W7900 workstation GPUs some competition, but we shall see.

With the 5090, NV has released a 32GB VRAM consumer GPU, so ofc AMD is going to do the same (wasn't it the same story with 24GB VRAM consumer GPUs?). The difference is that the 9070 (XT) is based on a 256-bit chip using GDDR6 at ~600 GB/s, vs. 512-bit GDDR7 at 1792 GB/s for the 5090. Still fast enough.
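If anyone wants to sanity-check those bandwidth figures, peak bandwidth is just bus width times per-pin data rate. A quick sketch (the 20 Gbps GDDR6 and 28 Gbps GDDR7 data rates are my assumptions for these cards):

```python
# Peak memory bandwidth in GB/s = (bus width in bits / 8) * per-pin data rate in Gbps
def mem_bandwidth(bus_bits: int, gbps_per_pin: float) -> float:
    return bus_bits / 8 * gbps_per_pin

# Assumed data rates: 20 Gbps GDDR6 on the 9070 (XT), 28 Gbps GDDR7 on the 5090
print(mem_bandwidth(256, 20))  # 640.0 GB/s  -> the "~600 GB/s" class
print(mem_bandwidth(512, 28))  # 1792.0 GB/s -> matches the 5090 figure
```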
AFAIK, only modded games may require more than 24GB of VRAM at 4K right now, but 32GB is nice for fully offloading/hosting big-ish LLMs locally.
Regarding the CUDA/ML stack: indeed, I think of AMD GPUs only in terms of running/inferencing LLMs, not training/finetuning, though I've read that's still possible and has supposedly gotten easier over the last few years. CUDA, by contrast, is tier-agnostic and supports consumer GPUs, workstation GPUs, and enterprise cards alike. To improve this, UDNA (U for unified) will replace RDNA at some point.
The 5090's idle power consumption unfortunately increased to 30W (4090: 22W), but it's still not too bad considering there are 16 2GB modules (a purely linear increase would give 22W [4090] / 12 [GDDR6X modules] × 16 [GDDR7 modules] = 29.33W). Video playback scales worse than linearly, though: 54W on the 5090 vs. 26W on the 4090.
For me to consider this RDNA4 32GB GPU (in no particular order):
- DLSS 2-like upscaling quality improvement
- Fixed HDMI 2.1 48Gbps (aka HDMI 2.1a) support on Linux
- A return to good power scaling, like RDNA2 had
- Low idle power consumption; at worst, a linear increase with the amount of VRAM compared to the 16GB card
- Just like the 5090, the 32GB 9070 (XT) must also be a consumer GPU, so that the price increase is minimal

So, AMD, does that mean 48GB VRAM consumer GPUs for the UDNA arch after RDNA4 as well? That would allow fully offloading `Llama-3.3-70B-Instruct-Q4_K_M.gguf` (42.5GB) (by then we'll have a different, more capable 70B LLM, ofc), or allow for much higher context (see the sketch below).
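On the "much higher context" point: with full offload, the weights plus the KV cache have to fit in VRAM, and the cache grows linearly with context length. A rough sketch, assuming the published Llama-3 70B shape (80 layers, 8 KV heads, head dim 128) and an fp16 cache:

```python
# KV-cache size for a GQA model: 2 (K + V) * layers * kv_heads * head_dim
# * context length * bytes per element. Shape values assumed from the
# Llama-3 70B config; fp16 cache = 2 bytes/element.
def kv_cache_gib(n_layers=80, n_kv_heads=8, head_dim=128,
                 ctx_len=8192, bytes_per_elem=2):
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem / 2**30

print(kv_cache_gib())               # ~2.5 GiB at 8K context
print(kv_cache_gib(ctx_len=32768))  # ~10 GiB at 32K context
```

So 42.5GB of weights plus ~2.5GiB of cache fits on a 48GB card at 8K context, but 32K context (~10GiB of cache) would already be over budget without quantizing the cache.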
Yes, 24GB VRAM can't fit, e.g., the 27GB `Qwen2.5-32B-Instruct-Q6_K.gguf` SOTA LLM, but the .gguf format allows offloading the remaining LLM layers to RAM, at the cost of running much slower. Tokens-per-second climbs steeply the more layers are offloaded to the GPU; I did some testing:
[Attachment 384684: tokens/sec vs. number of layers offloaded to the GPU]
Right now this model gives me the best performance on a 24GB VRAM GPU, doing about 28 tok/sec.
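For anyone who wants to try the layer offloading themselves: in llama.cpp it's a single parameter. A minimal sketch using the llama-cpp-python bindings (the model path is a placeholder, and the `n_gpu_layers` value is just a starting point to tune against your VRAM):

```python
from llama_cpp import Llama

# Offload as many layers as fit in VRAM; the rest stay in system RAM.
# n_gpu_layers=-1 would offload everything (needs the whole model to fit).
llm = Llama(
    model_path="Qwen2.5-32B-Instruct-Q6_K.gguf",  # placeholder path
    n_gpu_layers=48,  # e.g. 48 of the model's 64 layers; tune for your card
    n_ctx=8192,
)

out = llm("Explain GDDR7 in one sentence.", max_tokens=128)
print(out["choices"][0]["text"])
```

Raising `n_gpu_layers` until VRAM is nearly full is what produces the steep tok/s curve from the testing above.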
Looks like this rumor is false.