Monday, September 23rd 2024

AMD Ryzen AI Max 390 "Strix Halo" Surfaces in Geekbench AI Benchmark

In case you missed it, AMD's new madcap enthusiast silicon engineering effort, the "Strix Halo," is real, and comes with the Ryzen AI Max 300 series branding. These are chiplet-based mobile processors with one or two "Zen 5" CCDs, the same ones found in "Granite Ridge" desktop processors, paired with a large SoC die that has an oversized iGPU. This arrangement lets AMD give the processor up to 16 full-sized "Zen 5" CPU cores, an iGPU with as many as 40 RDNA 3.5 compute units (2,560 stream processors), and a 256-bit LPDDR5/x memory interface for UMA.
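
As a rough sanity check on those figures, the sketch below works out the stream processor count and the theoretical peak bandwidth of a 256-bit memory bus. The 64 stream processors per CU follows from the numbers above (2,560 / 40); the memory data rates are assumptions chosen for illustration, since the article does not state one, and the function names are hypothetical.

```python
# Back-of-the-envelope arithmetic for the specs quoted above.
# 64 stream processors per CU is implied by the article (2,560 / 40).
STREAM_PROCESSORS_PER_CU = 64


def stream_processors(compute_units: int) -> int:
    """Total stream processors for a given number of compute units."""
    return compute_units * STREAM_PROCESSORS_PER_CU


def peak_bandwidth_gb_s(bus_width_bits: int, data_rate_mt_s: int) -> float:
    """Theoretical peak bandwidth: bytes per transfer times transfers per second."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * data_rate_mt_s / 1000  # MB/s -> GB/s


if __name__ == "__main__":
    print(f"40 CU -> {stream_processors(40)} stream processors")
    # ASSUMPTION: these data rates are illustrative, not confirmed specs.
    for rate in (7500, 8000, 8533):
        print(f"256-bit @ {rate} MT/s -> {peak_bandwidth_gb_s(256, rate):.1f} GB/s")
```

These are upper bounds on paper; sustained bandwidth in practice is lower, and the retail memory speed may differ from any of the assumed rates.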

"Strix Halo" is designed for ultraportable gaming notebooks or mobile workstations where a small PCB footprint is essential and a discrete GPU is not an option. For enthusiast gaming notebooks with discrete GPUs, AMD is designing the "Fire Range" processor, which is essentially a mobile BGA version of "Granite Ridge," and a successor to the Ryzen 7045 series "Dragon Range." The Ryzen AI Max series has three models differentiated by CPU core and iGPU CU counts: the Ryzen AI Max 395+ (16-core/32-thread with 40 CU), the Ryzen AI Max 390 (12-core/24-thread with 40 CU), and the Ryzen AI Max 385 (8-core/16-thread with 32 CU). An alleged Ryzen AI Max 390 engineering sample surfaced on the Geekbench AI benchmark online database.

The online database entry for this Geekbench AI benchmark submission mentions a processor that identifies itself as "AMD Eng Sample: 100-000001421-50_Y," which corresponds with the Ryzen AI Max 390 (12-core/24-thread, 40 CU). The processor has a CPU base frequency of 3.20 GHz and a maximum boost frequency of 5.00 GHz, at least for this engineering sample (the retail chip could differ). This processor is driving a prototype HP ZBook Ultra 14 G1a mobile workstation, and is wired to 64 GB of memory.

The processor yielded a single-precision Geekbench AI score of 4733 points, a half-precision score of 4944 points, and a quantized score of 13944 points. HotHardware notes that this is a rather large 60% delta with the desktop Ryzen 9 9900X processor. There could be several reasons behind this. The screenshot shows that the notebook is running on the Balanced power plan, and the benchmark uses 256-bit AVX2 SIMD instructions rather than the newer AVX512. The "Zen 5" cores on the "Strix Halo" are carried over from "Granite Ridge" and EPYC "Turin," since they're the same 8-core CCD, and feature full 512-bit FP data-paths. This is unlike the "Zen 5" cores on the "Strix Point" monolithic silicon, which are restricted to a dual-pumped 256-bit FP data-path even when executing AVX512 or VNNI instructions. Therefore, AI benchmarks that use AVX512/VNNI could yield different results. Then there's the fact that this is an engineering sample, and AMD could be deliberately nerfing its performance.
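
To make the vector-width point concrete, here is a deliberately simplified sketch of how many elements fit in one SIMD register, and how many passes a data-path needs to execute one instruction of a given width. It is an idealized model with hypothetical function names; it ignores clock speeds, execution ports, and memory effects, so it says nothing definitive about real benchmark scores.

```python
import math


def lanes(vector_bits: int, element_bits: int) -> int:
    """Number of elements of a given size that fit in one vector register."""
    return vector_bits // element_bits


def passes(instruction_bits: int, datapath_bits: int) -> int:
    """Passes through the data-path needed to execute one vector instruction."""
    return math.ceil(instruction_bits / datapath_bits)


if __name__ == "__main__":
    # Element sizes: FP32 (single precision), FP16 (half precision), INT8 (quantized).
    for name, bits in (("FP32", 32), ("FP16", 16), ("INT8", 8)):
        print(f"{name}: {lanes(256, bits)} lanes at 256-bit, {lanes(512, bits)} lanes at 512-bit")
    # A 512-bit instruction on a 256-bit data-path is "dual-pumped": two passes.
    print("512-bit op on 256-bit data-path:", passes(512, 256), "passes")
    print("512-bit op on 512-bit data-path:", passes(512, 512), "pass")
```

In this model, a 256-bit AVX2 workload needs a single pass on either data-path width, which is consistent with the article's point that the full 512-bit data-paths would only show up in benchmarks that actually issue AVX512/VNNI instructions.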
Sources: Geekbench Browser, HotHardware

14 Comments on AMD Ryzen AI Max 390 "Strix Halo" Surfaces in Geekbench AI Benchmark

#2
mikesg
How to throw away the best product your company has had in years, call it "AI Max".
#3
zeljans
I am excited to see these chips in small form factor PCs.
Something like Minisforum or Beelink.
Just don't be slow with the rollout, like 9 months later.

Presumably it will cost a lot: 16-core CPU + GPU + fast integrated RAM, cooling, + ~220 W power adapter. $1,600 easy.

Mac Studio replacement.
#4
Caring1
mikesg: How to throw away the best product your company has had in years, call it "AI Max".
Would you rather Ai Janet?
#5
AVATARAT
This is interesting for gaming as the communication between CPU/GPU/RAM will be very low latency.
#6
_JP_
mikesg: How to throw away the best product your company has had in years, call it "AI Max".
It makes the investors/shareholders happy, so why not?
For the average Joe it's still about "My budget is X".
#7
SL2
mikesg: How to throw away the best product your company has had in years, call it "AI Max".
No one cares, give it a rest.

In other news: Gigabyte PSU explodes, Windows Vista RTM is a resource hog, some people used drugs at the Woodstock festival in 1969, and the Anglo-Saxon king Harold Godwinson has died at the Battle of Hastings.
#8
igormp
AVATARAT: This is interesting for gaming as the communication between CPU/GPU/RAM will be very low latency.
LPDDR5 memory actually has higher latencies compared to your regular DDR5 sticks.

That CPU is also chiplet-based, so you have your regular desktop CCDs communicating with the IO Die in a similar fashion to Ryzen 9000.
#9
trsttte
igormp: LPDDR5 memory actually has higher latencies compared to your regular DDR5 sticks.

That CPU is also chiplet-based, so you have your regular desktop CCDs communicating with the IO Die in a similar fashion to Ryzen 9000.
Higher latency but bigger bandwidth. Small enough latency for general computing and boosted bandwidth so those 40 compute units don't go to waste.
#10
JWNoctis
igormp: LPDDR5 memory actually has higher latencies compared to your regular DDR5 sticks.

That CPU is also chiplet-based, so you have your regular desktop CCDs communicating with the IO Die in a similar fashion to Ryzen 9000.
Doesn't that also imply a 64GB/s bandwidth limit between each CCD and the IOD, if they keep the same IF architecture there? That adds up to much less than half the theoretical bandwidth of a 256-bit LPDDR5 interface, although that is probably still a long way above current AM5 offerings, provided there are no bottlenecks elsewhere.

Either way, iGPU/NPU offload is definitely going to be needed for the workload it is expected to do.

Mobile chips starting to outstrip non-HEDT desktop CPU performance. What a world we live in.
#11
igormp
trsttte: Higher latency but bigger bandwidth. Small enough latency for general computing and boosted bandwidth so those 40 compute units don't go to waste.
The gpu is going to love it for sure. The cpu, however, not that much (but it's far from awful, don't get me wrong).
JWNoctis: Doesn't that also imply a 64GB/s bandwidth limit between each CCD and the IOD, if they keep the same IF architecture there? That adds up to much less than half the theoretical bandwidth of a 256-bit LPDDR5 interface, although that is probably still a long way above current AM5 offerings, provided there are no bottlenecks elsewhere.

Either way, iGPU/NPU offload is definitely going to be needed for the workload it is expected to do.

Mobile chips starting to outstrip non-HEDT desktop CPU performance. What a world we live in.
Yeah, that's likely going to be the case. But I believe that high bandwidth is meant to keep the gpu/npu fed (as our colleague said above), and not the CPU itself, so it still makes sense if that's indeed the case.
#12
AVATARAT
igormp: LPDDR5 memory actually has higher latencies compared to your regular DDR5 sticks.

That CPU is also chiplet-based, so you have your regular desktop CCDs communicating with the IO Die in a similar fashion to Ryzen 9000.
Yes it is, but all the components and memory are much closer than on a normal system.
#13
igormp
AVATARAT: Yes it is, but all the components and memory are much closer than on a normal system.
I don't think Strix Halo will have on-package RAM (like Apple chips or Lunar Lake), just soldered RAM, so it's not that close.

Anyhow, even though it's closer, latencies are still bad. Just take a look at Apple chips: even with the memory on the package, their latencies are still in the 100s (worse than even Ryzen desktop).
#14
JWNoctis
AVATARAT: Yes it is, but all the components and memory are much closer than on a normal system.
Between CPU and GPU since the memory is unified, and between GPU and RAM since GDDR6/X had greater bandwidth but even worse latencies, yes.