Tuesday, February 18th 2025

AMD Ryzen AI Max+ "Strix Halo" Die Exposed and Annotated

AMD's "Strix Halo" APU, marketed as Ryzen AI Max+, has just been exposed in a die-shot analysis. Confirming the processor's triple-die architecture, the package shows a total silicon footprint of 441.72 mm² that integrates CPU, GPU, and AI acceleration capabilities in a single package. The design centers on two 67.07 mm² CPU CCDs, each housing eight Zen 5 cores with a dedicated 8 MB L2 cache. These are complemented by a substantial 307.58 mm² I/O die that houses an RDNA 3.5-based integrated GPU featuring 40 CUs and AMD's XDNA 2 NPU. The memory subsystem uses a 256-bit LPDDR5X interface capable of delivering 256 GB/s of bandwidth, supported by 32 MB of strategically placed Last Level Cache to optimize data throughput.
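As a quick sanity check, the quoted figures are internally consistent: a 256-bit bus running at LPDDR5X-8000 (the 8000 MT/s transfer rate is an assumption inferred from the stated bandwidth, not something the die shots confirm) works out to exactly 256 GB/s:

```python
# Back-of-the-envelope check of the quoted 256 GB/s figure.
# Assumption (not stated in the article): LPDDR5X-8000, i.e. 8000 MT/s.
bus_width_bits = 256
transfers_per_second = 8000 * 10**6   # 8000 MT/s

bytes_per_transfer = bus_width_bits / 8                       # 32 bytes moved per transfer
bandwidth_gbps = bytes_per_transfer * transfers_per_second / 10**9

print(f"{bandwidth_gbps:.0f} GB/s")   # 256 GB/s
```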

The die shots reveal notable optimizations for mobile deployment, including shortened die-to-die interfaces that reduce the interconnect distance by 2 mm compared to desktop implementations. Some through-silicon via structures are present, which suggest potential compatibility with AMD's 3D V-Cache technology, though the company has not officially confirmed plans for such implementations. The I/O die integrates comprehensive connectivity options, including PCIe 4.0 x16 lanes and USB4 support, while also housing dedicated media engines with full AV1 codec support. Initial deployments of the Strix Halo APU will commence with the ASUS ROG Flow Z13 launch on February 25, marking the beginning of what AMD anticipates will be broad adoption across premium mobile computing platforms.
Sources: Tony Yu on Bilibili, Kurnal on X, via Tom's Hardware

27 Comments on AMD Ryzen AI Max+ "Strix Halo" Die Exposed and Annotated

#26
Tek-Check
igormp: It would not help, but rather hinder, performance if you have a dedicated GPU. Reminder that NPUs can only run stuff in INT4 or INT8 at best, and they are tied to CPU memory, which limits their performance regardless.
Yes, but the set-up of some models in LM Studio allows GPU offload only to a certain degree, such as 65% or 85%, depending on the allowed settings, so part of the workload would still bleed into RAM. It'd be good to see whether the NPU could help a bit in such situations, for example an increase from 20 t/s to 21 t/s. I'd like to see such measurements.
#27
igormp
Tek-Check: Yes, but the set-up of some models in LM Studio allows GPU offload only to a certain degree, such as 65% or 85%, depending on the allowed settings, so part of the workload would still bleed into RAM.
For bigger models, yeah, true.
Tek-Check: It'd be good to see whether the NPU could help a bit in such situations, for example an increase from 20 t/s to 21 t/s. I'd like to see such measurements.
Again, only for models in INT4 or INT8 format, which means sacrificing model quality to get space savings. But in this specific case you could indeed get a minor performance increase, albeit a minimal one, as in your example.