Friday, April 19th 2024
AMD "Strix Halo" Zen 5 Mobile Processor Pictured: Chiplet-based, Uses 256-bit LPDDR5X
Enthusiasts on the ChipHell forum scored an alleged image of AMD's upcoming "Strix Halo" mobile processor, and set out to create some highly plausible schematic slides. These are speculative. While "Strix Point" is the mobile processor that succeeds the current "Hawk Point" and "Phoenix" processors, "Strix Halo" is in a category of its own: it aims to offer gaming experiences comparable to discrete GPUs in the ultraportable form-factor, where powerful discrete GPUs are generally not possible. "Strix Halo" also goes head-on against Apple's M3 Max and M3 Pro processors powering the latest crop of MacBook Pros. Like the M3 Max, it has the advantages of a single-chip solution.
The "Strix Halo" silicon is a chiplet-based processor, although very different from "Fire Range." The "Fire Range" processor is essentially a BGA version of the desktop "Granite Ridge" processor: it's the same combination of one or two "Zen 5" CCDs that talk to a client I/O die (cIOD), and is meant for performance-through-enthusiast segment notebooks. "Strix Halo," on the other hand, uses the same one or two "Zen 5" CCDs, but pairs them with a large SoC die featuring an oversized iGPU and 256-bit LPDDR5X memory controllers not found on the cIOD. This is key to what AMD is trying to achieve: CPU and graphics performance in the league of the M3 Pro and M3 Max at comparable PCB and power footprints.
The iGPU of the "Strix Halo" processor is based on the RDNA 3+ graphics architecture, and features a massive 40 RDNA compute units. These work out to 2,560 stream processors, 80 AI accelerators, 40 Ray accelerators, 160 TMUs, and an unknown number of ROPs (we predict at least 64). The slide predicts an iGPU engine clock as high as 3.00 GHz.
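The unit counts quoted for the iGPU follow from simple per-CU multipliers. The sketch below is only a sanity check of that arithmetic: the per-CU ratios are assumptions chosen to reproduce the totals in the article (64 stream processors, 2 AI accelerators, 1 Ray accelerator, and 4 TMUs per compute unit), and the function name is hypothetical.

```python
# Per-compute-unit resource counts. These ratios are assumptions picked to
# match the totals quoted in the article for a 40-CU RDNA 3+ iGPU.
STREAM_PROCESSORS_PER_CU = 64
AI_ACCELERATORS_PER_CU = 2
RAY_ACCELERATORS_PER_CU = 1
TMUS_PER_CU = 4


def igpu_resources(compute_units: int) -> dict[str, int]:
    """Scale the assumed per-CU counts by the number of compute units."""
    return {
        "stream_processors": compute_units * STREAM_PROCESSORS_PER_CU,
        "ai_accelerators": compute_units * AI_ACCELERATORS_PER_CU,
        "ray_accelerators": compute_units * RAY_ACCELERATORS_PER_CU,
        "tmus": compute_units * TMUS_PER_CU,
    }


if __name__ == "__main__":
    for name, count in igpu_resources(40).items():
        print(f"{name}: {count}")
```

The ROP count is left out on purpose, since the article only offers a prediction for it.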
Graphics is an extremely memory-sensitive application, and so AMD is using a 256-bit (quad-channel or octa-subchannel) LPDDR5X-8533 memory interface, for an effective cached bandwidth of around 500 GB/s. The memory controllers are cushioned by a 32 MB L4 cache located on the SoC die. The way we understand this cache hierarchy, the CCDs (CPU cores) can treat this as a victim cache, besides the iGPU treating this like an L2 cache (similar to the Infinity Cache found in RDNA 3 discrete GPUs).
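For context, the raw peak bandwidth of the memory interface can be estimated from the bus width and transfer rate alone. This is a minimal sketch assuming the usual bus-width times transfer-rate formula and decimal gigabytes; the function name is hypothetical. It gives roughly 273 GB/s for the DRAM itself, which suggests the figure of around 500 GB/s depends on hits in the 32 MB cache, consistent with the article calling it an effective cached bandwidth.

```python
def peak_bandwidth_gbs(bus_width_bits: int, transfer_rate_mts: int) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (decimal gigabytes).

    bus_width_bits:    total width of the memory interface, in bits
    transfer_rate_mts: data rate in megatransfers per second (MT/s)
    """
    bytes_per_transfer = bus_width_bits / 8
    megabytes_per_second = bytes_per_transfer * transfer_rate_mts
    return megabytes_per_second / 1000


if __name__ == "__main__":
    # 256-bit LPDDR5X-8533, as described in the article.
    print(f"{peak_bandwidth_gbs(256, 8533):.1f} GB/s")
```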
The iGPU isn't the only logic-heavy and memory-sensitive device on the SoC die; there's also an NPU. From what we gather, this is the exact same NPU model found in "Strix Point" processors, with a performance of around 45-50 AI TOPS, and is based on the XDNA 2 architecture developed by AMD's Xilinx team.
The SoC I/O of "Strix Halo" isn't as comprehensive as "Fire Range," because the chip has been designed on the idea that the notebook will use its large iGPU. It has PCIe Gen 5, but only a total of 12 Gen 5 lanes: 4 toward an M.2 NVMe slot, and 8 to spare for a discrete GPU (if present), although these can be used to connect any PCIe device, including additional M.2 slots. There's also integrated 40 Gbps USB4, and 20 Gbps USB 3.2 Gen 2x2.
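To put the 12-lane budget in perspective, the per-direction throughput of each link can be estimated. This sketch assumes the standard PCIe Gen 5 signaling rate of 32 GT/s per lane with 128b/130b encoding, neither of which is stated in the article, and ignores protocol overhead above the encoding layer. The function name and the lane split labels are hypothetical.

```python
PCIE5_GIGATRANSFERS_PER_SECOND = 32.0  # per lane; assumed from the PCIe 5.0 spec
ENCODING_EFFICIENCY = 128 / 130        # 128b/130b line encoding


def pcie5_link_gbs(lanes: int) -> float:
    """Approximate one-direction bandwidth of a PCIe Gen 5 link in GB/s."""
    gigabits_per_lane = PCIE5_GIGATRANSFERS_PER_SECOND * ENCODING_EFFICIENCY
    return lanes * gigabits_per_lane / 8


if __name__ == "__main__":
    # Lane split described in the article: x4 for M.2 NVMe, x8 spare.
    for label, lanes in (("M.2 NVMe", 4), ("spare / discrete GPU", 8)):
        print(f"{label}: x{lanes} ~ {pcie5_link_gbs(lanes):.2f} GB/s")
```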
As for the CPU, since "Strix Halo" is using one or two "Zen 5" CCDs, its CPU performance will be similar to "Fire Range." You get up to 16 "Zen 5" CPU cores, with 32 MB of L3 cache per CCD, or 64 MB of total CPU L3 cache. The CCDs are connected to the SoC die either using conventional IFOP (Infinity Fabric over package), just like "Fire Range" and "Granite Ridge," or there's even a possibility that AMD is using Infinity Fanout links like those on some of its chiplet-based RDNA 3 discrete GPUs.
Lastly, there are some highly speculative performance predictions for the "Strix Halo" iGPU, which put it in competition with the GeForce RTX 4060M and RTX 4070M.
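The CPU totals follow directly from the CCD count. A minimal sketch, assuming 8 cores per "Zen 5" CCD (implied by the article's figure of up to 16 cores across two CCDs) and the stated 32 MB of L3 per CCD; the function name is hypothetical.

```python
CORES_PER_CCD = 8    # assumption: implied by "up to 16 cores" with two CCDs
L3_MB_PER_CCD = 32   # stated in the article


def cpu_config(ccd_count: int) -> dict[str, int]:
    """Total core count and L3 capacity for a given number of CCDs."""
    if ccd_count not in (1, 2):
        raise ValueError("the article describes one or two CCDs only")
    return {
        "cores": ccd_count * CORES_PER_CCD,
        "l3_mb": ccd_count * L3_MB_PER_CCD,
    }


if __name__ == "__main__":
    for ccds in (1, 2):
        print(ccds, "CCD(s):", cpu_config(ccds))
```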
Sources:
ChipHell Forums, harukaze5719 (Twitter)
109 Comments on AMD "Strix Halo" Zen 5 Mobile Processor Pictured: Chiplet-based, Uses 256-bit LPDDR5X
www.phoronix.com/review/framework-16-windows-linux/6
My wild guess was that developing such chips could put new PCs closer to the performance/cost of consoles, so Sony and/or Microsoft got AMD to sign a clause preventing it from releasing a high-performance SoC for X amount of years. Akin to Samsung's contract for the Xclipse RDNA GPUs having a clause that prevents AMD from releasing SoCs that work below 5W.
There were so many things wrong with Kaby Lake G.
- It released in 2018 using the old 4-core Kaby Lake CPU from early 2017, already after Intel had released 6-core Coffee Lake mobile CPUs
- They called it a Vega GPU when in reality it used the ISA of a Polaris GPU, so no improved geometry processing, no rapid-packed-math, etc.
- Intel charged way too much for it, to the point that it was much cheaper to get laptops with a Core i5 + GTX1050 which also got better performance overall.
- Very large bandwidth (200GB/s) couldn't be taken advantage of with only 4GB available and using such a small 24CU GPU @ 1GHz.
In the end it was just bad design and planning. They launched a premium product with old silicon.
Not just bad players. I have a friend who perpetuates that myth because he had a bad experience with an AMD GPU... in 2007.
I was dual booting until literally yesterday.
And I wouldn't use the word "crap." But there is a difference. It's been improving yes, but I am impatient. :laugh:
I've always speculated a lot of that is down to the number of useless services always enabled/required on Windows. But in recent years there are also generally better schedulers available on Linux as well, which helps them a lot, especially post Android.
At this point, my only problem is the unreasonably high video playback power consumption on RDNA 3, but I don't think that can be improved with drivers, unfortunately.
I play mostly indie titles so...
My issue is that as long as people won't admit there are problems, AMD has no incentive to fix their Windows drivers.
That said, Nvidia wasn't trouble free either, but these messages need to not happen too often, or it's gonna be a short ride on RDNA3. If big titles like this can't run stable entirely, meh. Though it is known Rockstar didn't really give RDR2 PC that much aftercare either. Benefit of the doubt.
But I can imagine that they will fck up the price and energy management running that 256-bit wide memory bus on light load...
Sometimes I wonder what it would have been like if AMD and Nvidia merged as was the original attempt before AMD settled for ATI upon Jensen wanting more control than AMD was willing to give.
Somehow Valve managed to make the Steam Deck a success despite putting Arch of all distros on it, so it's not as bad as you'd think!
As a developer though, I noticed a funny thing about Windows... the I/O is such that if a program is ported in a naive way, though it may work, it will have worse performance than on Linux. "stat" is a fast call on Linux, but not so on Windows. I just do not do any JS development on Windows anymore, for example, because of how atrociously tools like npm and webpack handle thousands of tiny files. And text searching them is no better.
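For anyone who wants to compare this on their own machines, here is a rough micro-benchmark sketch, not a rigorous test: it creates a number of tiny files in a temporary directory and times a `stat` call on each. The function name and the default file count are arbitrary choices for illustration, and results will vary with filesystem, antivirus, and caching.

```python
import os
import tempfile
import time


def time_stat_calls(file_count: int = 2000) -> float:
    """Create file_count tiny files, then return the seconds spent
    calling os.stat() once on each of them."""
    with tempfile.TemporaryDirectory() as root:
        paths = []
        for index in range(file_count):
            path = os.path.join(root, f"file_{index}.txt")
            with open(path, "w") as handle:
                handle.write("x")
            paths.append(path)

        start = time.perf_counter()
        for path in paths:
            os.stat(path)
        return time.perf_counter() - start


if __name__ == "__main__":
    count = 2000
    elapsed = time_stat_calls(count)
    print(f"{count} stat calls took {elapsed * 1000:.1f} ms")
```

Running the same script on Windows and Linux on the same hardware gives a crude idea of the per-file metadata cost that tools like npm and webpack pay many times over.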