Tuesday, February 18th 2025

AMD Ryzen AI Max+ "Strix Halo" Die Exposed and Annotated
AMD's "Strix Halo" APU, marketed as Ryzen AI Max+, has just been exposed in die-shot analysis. Confirming the processor's triple-die architecture, the package showcases a total silicon footprint of 441.72 mm² that integrates advanced CPU, GPU, and AI acceleration capabilities within a single package. The processor's architecture centers on two 67.07 mm² CPU CCDs, each housing eight Zen 5 cores with a dedicated 8 MB L2 cache. A substantial 307.58 mm² I/O complements these die that houses an RDNA 3.5-based integrated GPU featuring 40 CUs and AMD's XDNA 2 NPU. The memory subsystem demonstrates a 256-bit LPDDR5X interface capable of delivering 256 GB/s bandwidth, supported by 32 MB of strategically placed Last Level Cache to optimize data throughput.
The die shots reveal notable optimizations for mobile deployment, including shortened die-to-die interfaces that reduce the interconnect distance by 2 mm compared to desktop implementations. Some through-silicon via structures are present, which suggest potential compatibility with AMD's 3D V-Cache technology, though the company has not officially confirmed plans for such implementations. The I/O die integrates comprehensive connectivity options, including PCIe 4.0 x16 lanes and USB4 support, while also housing dedicated media engines with full AV1 codec support. Initial deployments of the Strix Halo APU will commence with the ASUS ROG Flow Z13 launch on February 25, marking the beginning of what AMD anticipates will be broad adoption across premium mobile computing platforms.
Sources:
Tony Yu on Bilibili, Kurnal on X, via Tom's Hardware
27 Comments on AMD Ryzen AI Max+ "Strix Halo" Die Exposed and Annotated
It looks like they are trialing this interconnect on Strix Halo, and they will implement it across Zen 6 chiplets and the IOD as a new high-bandwidth, low-latency, high-efficiency interconnect standard. Quite exciting, indeed.
Some modern GPUs have a USB-C port as an interface for the DP video signal, not for carrying USB or PCIe data.
On the Strix Halo IOD, the USB3 and USB4 PHYs are additional pieces of logic, as is the NPU; these are not present on a GPU die.
It took them four iterations to arrive at this maximum-size design for the package used. It's better to have such a product on the market than to wait another year or so for yet another lab-chip iteration to be perfected. It's more practical the way it is. The final MI300 is CoWoS-S, though they did have CoWoS-R as a test chip. Navi 31 uses InFO-R/oS packaging with 4 RDL layers.
semianalysis.com/2023/06/12/amd-mi300-taming-the-hype-ai-performance/
For Strix Halo, we don't know until we know. It's either InFO-R or InFO-L, or another iteration.
It seems to me that it "needs" to be there only due to Microsoft's commercial tantrum.
Do you technically need NPU capabilities? Sure don't. Just like you technically don't need RT capabilities to run RT stuff. You can do it all on a GPU without RT cores, or even on a CPU. It'll just be slower and use more energy. The NPU can do something the CPU could do, but it'll do it much faster and use a lot less power. It's kinda why general processors can have multiple fixed-function processors/engines on the same package.
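To make the "same work, different engine" point concrete, here is a minimal sketch of how an application might prefer an NPU and fall back to the GPU or CPU for the same model, using ONNX Runtime's execution providers. The model file is hypothetical, and which provider names are actually available (VitisAI for AMD's XDNA NPUs, DirectML for the GPU on Windows) depends on the installed runtime build, so treat this as an illustration rather than a recipe.

```python
# Minimal sketch: run the same ONNX model on the NPU if a suitable
# execution provider is available, otherwise fall back to GPU or CPU.
# "model.onnx" is a hypothetical file; provider availability depends on
# the installed ONNX Runtime build.
import numpy as np
import onnxruntime as ort

available = ort.get_available_providers()
preferred = [
    "VitisAIExecutionProvider",  # AMD XDNA NPU (if the build supports it)
    "DmlExecutionProvider",      # GPU via DirectML on Windows
    "CPUExecutionProvider",      # always present
]
providers = [p for p in preferred if p in available]

session = ort.InferenceSession("model.onnx", providers=providers)
input_name = session.get_inputs()[0].name

# Dummy input; a real app would feed camera frames, audio chunks, etc.
x = np.random.rand(1, 3, 224, 224).astype(np.float32)
outputs = session.run(None, {input_name: x})
print("ran on:", session.get_providers()[0], "output shape:", outputs[0].shape)
```

The application code stays the same either way; only the execution provider list decides whether the work lands on the NPU, the GPU, or the CPU.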
AMD's software stack is still lacking when it comes to using their GPUs for ML acceleration, I guess that's why they are shoving an NPU in this product.
Most NPUs in computers don't do anything because almost nothing uses them. Even Copilot, which is what these NPUs are supposed to be for, doesn't use them (yet?).
NPUs are useful for some AI productivity tasks, but most people who need powerful AI accelerators for productivity buy graphics cards with more powerful capabilities than any integrated NPU.
A typical user's AI needs are more efficiently handled by cloud services which can dynamically distribute load across thousands of NPUs/TPUs, not by having their own NPU that does literally nothing 99%+ of the time.
Where are NPUs being used for power management, and what does an NPU do in those situations that can't be done by a basic microcontroller?
Maybe NPUs will end up great in the long run, but it would require a major shift in both how people use their computers and how software uses NPU hardware. I'm expecting it will end up the same way as "smart"/IOT devices, and that these NPUs in everything will be about as useful as wifi connections for fridges and toasters - maybe useful for a few people with specialised needs, but stupid and irrelevant for everyone else.
One example of more flexible data flow through an NPU:
For example, during an average Zoom call in a future version of the app, the NPU will handle denoising, improving picture and sound. It will be a better experience, but you will never think it's because there is an NPU. Hence the impression that the NPU does not do anything obvious and tangible. It will also be taking care of background things that we usually do not think about.
But the performance of just running the game does not improve if you suddenly plug an NPU in there. NPUs are meant for inference of models, period. Not "some tasks", only model inference, and there isn't much of that being run locally in the desktop world at the moment.
If your application doesn't make use of it, there's nothing to be gained.
Again, in Intel's example it did not boost the game's performance. They first hindered the game's performance by running a new task on the GPU, and then offloaded that task later. It's similar to people moving streaming onto a different device: it doesn't increase performance per se, it just offloads work, and given that you added a new piece of hardware, this is kinda expected. It would not help, but rather hinder, performance if you have a dedicated GPU.
Reminder that NPUs can only run stuff in INT4 or INT8 at best, and they are tied to the CPU's memory, which limits their performance regardless. The examples you gave are still model inference: they run a model to do those tasks, in a similar fashion to your example of webcam effects and whatnot.
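To make the INT8 constraint concrete, here is a minimal sketch of converting an FP32 ONNX model to INT8 weights with ONNX Runtime's quantization tooling. The file names are hypothetical, and many NPU toolchains actually require static (calibration-based) quantization rather than the dynamic variant shown here, so take this as the general shape of the workflow rather than a vendor-specific recipe.

```python
# Minimal sketch: convert an FP32 ONNX model to INT8 weights, the kind of
# precision NPUs are built around. File names are hypothetical; many NPU
# toolchains require static quantization with a calibration dataset instead.
from onnxruntime.quantization import quantize_dynamic, QuantType

quantize_dynamic(
    model_input="model_fp32.onnx",
    model_output="model_int8.onnx",
    weight_type=QuantType.QInt8,  # 8-bit integer weights
)
```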
Model inference can mean lots of things, not only LLMs.
But yeah, not much desktop software makes use of them. On mobile you have tons of smaller models running on-device already, from image gallery features, to camera processing, to audio transcription. I'm not sure if those will even make sense on a desktop, but we shall see.