Wednesday, January 29th 2025

AMD Details DeepSeek R1 Performance on Radeon RX 7900 XTX, Confirms Ryzen AI Max Memory Sizes

AMD today put out detailed guides on how to get DeepSeek R1 distilled reasoning models running on Radeon RX graphics cards and Ryzen AI processors. The guide confirms that the new Ryzen AI Max "Strix Halo" processors come hardwired to LPCAMM2 memory configurations of 32 GB, 64 GB, and 128 GB, so there won't be a 16 GB memory option for notebook manufacturers to cheap out with. The guide goes on to explain that "Strix Halo" will be able to locally accelerate DeepSeek-R1-Distill-Llama with 70 billion parameters on the 64 GB and 128 GB memory configurations of "Strix Halo" powered notebooks, while the 32 GB model should be able to run DeepSeek-R1-Distill-Qwen-32B. Ryzen AI "Strix Point" mobile processors should be capable of running DeepSeek-R1-Distill-Qwen-14B and DeepSeek-R1-Distill-Llama-14B on their RDNA 3.5 iGPUs and NPUs. Meanwhile, older-generation processors based on "Phoenix Point" and "Hawk Point" chips should be capable of DeepSeek-R1-Distill-Llama-14B. The company recommends running all of the above distills in Q4_K_M quantization.
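As a rough sanity check on those pairings, Q4_K_M quantization averages roughly 4.8 bits per weight (an approximation; the exact figure varies by model), so the weights-only footprint of each distill can be estimated like this:

```python
def model_size_gb(params_billion: float, bits_per_weight: float = 4.8) -> float:
    """Approximate weights-only footprint in GB of a quantized model.
    Excludes KV cache and runtime overhead, which add several GB more."""
    return params_billion * bits_per_weight / 8  # 1e9 params and 1e9 bytes cancel

# The distill sizes AMD pairs with each memory tier:
for p in (8, 14, 32, 70):
    print(f"{p}B at ~Q4_K_M: ~{model_size_gb(p):.0f} GB of weights")
```

This lines up with AMD's tiers: a roughly 19 GB 32B model fits into 24 GB or 32 GB of memory, while a roughly 42 GB 70B model needs the 64 GB or 128 GB configurations once context cache and overhead are added on top.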

Switching gears to discrete graphics cards, AMD is only recommending its Radeon RX 7000 series for now, since the RDNA 3 graphics architecture introduces AI accelerators. The flagship Radeon RX 7900 XTX is recommended for the DeepSeek-R1-Distill-Qwen-32B distill, while all SKUs with 12 GB to 20 GB of memory (the RX 7600 XT, RX 7700 XT, RX 7800 XT, RX 7900 GRE, and RX 7900 XT) are recommended up to DeepSeek-R1-Distill-Qwen-14B. The mainstream RX 7600 with its 8 GB of memory is only recommended up to DeepSeek-R1-Distill-Llama-8B. You will need LM Studio 0.3.8 or later and Radeon Software Adrenalin 25.1.1 beta or later drivers. AMD put out first-party LM Studio 0.3.8 tokens-per-second performance numbers for the RX 7900 XTX, comparing it with the NVIDIA GeForce RTX 4080 SUPER and the RTX 4090.
When compared to the RTX 4080 SUPER, the RX 7900 XTX posts up to 34% higher performance with DeepSeek-R1-Distill-Qwen-7B, up to 27% higher performance with DeepSeek-R1-Distill-Llama-8B, and up to 22% higher performance with DeepSeek-R1-Distill-Qwen-14B. Next up is the big face-off between the RX 7900 XTX and the GeForce RTX 4090 with its 24 GB of memory. The RX 7900 XTX is shown to prevail in 3 out of 4 tests, posting up to 13% higher performance with DeepSeek-R1-Distill-Qwen-7B, up to 11% higher performance with DeepSeek-R1-Distill-Llama-8B, and up to 2% higher performance with DeepSeek-R1-Distill-Qwen-14B. It only falls behind the RTX 4090, by 4%, with the larger DeepSeek-R1-Distill-Qwen-32B model.

Catch the step-by-step guide on getting DeepSeek R1 distilled reasoning models to run on AMD hardware at the source link below.
Source: AMD Community

28 Comments on AMD Details DeepSeek R1 Performance on Radeon RX 7900 XTX, Confirms Ryzen AI Max Memory Sizes

#26
Beermotor
hatyii: So if the 7900 XTX is faster for AI than the 4090, and AMD mentions that RDNA 3 specifically can run this model well because of hardware advantages over RDNA 2, explain to me why the new FSR version is supposed to be exclusive to their new GPUs? I mean, even an RTX 2000 GPU can benefit from DLSS, so I'm just confused about this stuff.
IIRC RDNA 3 has much higher throughput in some floating-point formats (e.g. FP32) than Lovelace, and vice versa for other formats.

I'm not sure what DLSS/FSR use and don't care.
#27
Vayra86
alwayssts: I think N41 (partially) got canned because they know once people have >80 TF and 24 GB (essentially a 4090), most ain't upgrading for a long, long time. Those that wanted that at $1000+ bought a 4090.
Cutting the price of 4080 from $1200 to $1000 probably also had something to do with it, as I think that's where AMD wanted to compete.
Similar reason for the gap in NV products: why GB203 is limited to <80 TF (one less cluster than half of GB202, plus PL locks) and doesn't have a 24 GB option. Gotta milk those upgrades as long as possible...
Very good points
#28
10tothemin9volts
The DeepSeek-R1-Distill-* models are not the real DeepSeek; that's DeepSeek-R1 (without "Distill" in its name), at 685B parameters. You can run quants, and the smallest one ("IQ1_S") requires around 134 GB of memory at minimum, but some say IQ2_* quants are the recommended minimum, which is 183 GB. If Strix Halo only has 128 GB of (quad-channel) RAM, then that's not enough, and it really should have 256 GB.
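The commenter's figures are internally consistent: inverting size = params x bits-per-weight / 8 (a weights-only approximation that ignores runtime overhead, and assuming IQ1_S averages about 1.56 bits per weight, as in llama.cpp) gives the effective bit-width behind each quoted quant size:

```python
def implied_bits_per_weight(size_gb: float, params_billion: float) -> float:
    # Invert size_gb = params_billion * bpw / 8 to sanity-check quoted quant sizes
    return size_gb * 8 / params_billion

print(implied_bits_per_weight(134, 685))  # ~1.57 b/w, consistent with IQ1_S (~1.56 b/w)
print(implied_bits_per_weight(183, 685))  # ~2.14 b/w, squarely in the IQ2_* range
```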