Monday, July 29th 2024
AMD Strix Point Silicon Pictured and Annotated
The first die shot of AMD's new 4 nm "Strix Point" mobile processor has surfaced, thanks to an enthusiast on Chinese social media. "Strix Point" is a significantly larger die than "Phoenix": it measures 12.06 mm x 18.71 mm (L x W), compared to 9.06 mm x 15.01 mm for "Phoenix." Much of this increase comes from the larger CPU, iGPU, and NPU. The process has also moved from TSMC N4, used on "Phoenix" and its derivative "Hawk Point," to the newer TSMC N4P node.
Nemez (GPUsAreMagic) annotated the die shot in great detail. The CPU now has 12 cores spread across two CCXs: one contains four "Zen 5" cores sharing a 16 MB L3 cache, and the other has eight "Zen 5c" cores sharing an 8 MB L3 cache. The two CCXs connect to the rest of the chip over Infinity Fabric. The rather large iGPU takes up the central region of the die. It is based on the RDNA 3.5 graphics architecture and features 8 workgroup processors (WGPs), or 16 compute units (CUs), worth 1,024 stream processors. Other key components include four render backends worth 16 ROPs, and control logic. The GPU has its own 2 MB L2 cache that cushions transfers to the Infinity Fabric.
Slightly separated from the iGPU are its allied components, the Media Engine and the Display Engine. The Media Engine provides hardware-accelerated encoding and decoding of H.264, H.265, and AV1, besides several legacy video formats. The Display Engine is responsible for encoding the iGPU's frame output into the various connector formats (such as DisplayPort, eDP, and HDMI), including hardware-accelerated Display Stream Compression, while the display PHYs handle the physical layer of the connectors.
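The unit counts above can be cross-checked with a few lines of arithmetic. This is a minimal sketch under stated assumptions: two CUs per WGP is the usual RDNA arrangement, while the 64 stream processors per CU and 4 ROPs per render backend are simply derived from the article's totals (1,024 / 16 and 16 / 4), not taken from AMD documentation. The helper names are hypothetical.

```python
# Back-of-the-envelope check of the "Strix Point" unit counts quoted in the article.
# Assumptions (not from AMD docs): 2 CUs per WGP; 64 SPs per CU and 4 ROPs per
# render backend are derived from the article's own totals.
CUS_PER_WGP = 2
SPS_PER_CU = 64   # 1,024 stream processors / 16 CUs
ROPS_PER_RB = 4   # 16 ROPs / 4 render backends


def igpu_totals(wgps: int, render_backends: int) -> dict:
    """Derive CU, stream-processor and ROP counts from WGPs and render backends."""
    cus = wgps * CUS_PER_WGP
    return {
        "compute_units": cus,
        "stream_processors": cus * SPS_PER_CU,
        "rops": render_backends * ROPS_PER_RB,
    }


def cpu_totals(ccxs: list[tuple[int, int]]) -> dict:
    """Each CCX is (core_count, l3_mb); L3 is private to its CCX, not pooled."""
    return {
        "cores": sum(cores for cores, _ in ccxs),
        "l3_mb_total": sum(l3 for _, l3 in ccxs),
        # A thread running inside one CCX can only use that CCX's L3 slice.
        "largest_l3_slice_mb": max(l3 for _, l3 in ccxs),
    }


gpu = igpu_totals(wgps=8, render_backends=4)
cpu = cpu_totals([(4, 16), (8, 8)])  # "Zen 5" CCX, then "Zen 5c" CCX
print(gpu)
print(cpu)
```

The split into per-CCX tuples is deliberate: the 24 MB total is not one shared pool, so the largest cache any single thread can use is the 16 MB slice of the "Zen 5" CCX.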
The NPU is the third major logic component of "Strix Point." This second-generation AMD NPU is visibly larger than the one found in "Phoenix." It is based on the more advanced XDNA 2 architecture and contains 32 AI engine tiles, which talk to their own high-speed local memory, plus control logic that interfaces with Infinity Fabric. This NPU is designed to meet and exceed the hardware requirements of Microsoft Copilot+, and provides a throughput of 50 TOPS.
The memory controller supports dual-channel (160-bit) DDR5 at native DDR5-5600 speeds, and 128-bit LPDDR5X at speeds of up to LPDDR5X-7500. The controller features an SRAM cache of unspecified size; Nemez notes that such a cache was also seen on the "Phoenix 2" and "Phoenix" dies, but not on the memory controller of the cIOD found in "Raphael" and "Dragon Range."
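To put the memory figures in perspective, theoretical peak DRAM bandwidth is just bus width (in bytes) multiplied by the transfer rate. The sketch below is a simplified model that ignores refresh, command overhead and real-world efficiency, and reports decimal GB/s; the function name is hypothetical.

```python
# Theoretical peak DRAM bandwidth = (bus width in bytes) x (transfers per second).
# Simplified model: ignores refresh, command overhead and achievable efficiency.
def peak_bandwidth_gbs(bus_width_bits: int, mega_transfers_per_s: int) -> float:
    bytes_per_transfer = bus_width_bits / 8
    # MB/s -> GB/s using decimal units (1 GB = 1000 MB)
    return bytes_per_transfer * mega_transfers_per_s / 1000


# 128-bit LPDDR5-class memory at 7500 MT/s, the top speed quoted for "Strix Point".
lp_peak = peak_bandwidth_gbs(128, 7500)
print(lp_peak)
```

The same function applies to the DDR5-5600 configuration, but the width to plug in depends on how many of the quoted 160 bits actually carry data, which the source does not break down.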
The "Strix Point" silicon has a smaller PCIe root complex than "Phoenix," which in turn has a smaller root complex than "Cezanne." AMD has cut four PCIe lanes with each generation across the last three. "Cezanne" features 24 PCIe Gen 3 lanes (x16 PEG + x4 NVMe + x4 chipset bus or GPP), while "Phoenix" trims this to 20 PCIe Gen 4 lanes (x8 PEG + x4 NVMe + x4 chipset bus or GPP + x4 configured as USB4). The newer "Strix Point" cuts it down further to just 16 PCIe Gen 4 lanes (x8 PEG + x4 NVMe + x4 configured as USB4 or GPP).
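The lane budgets listed above can be tallied per generation to confirm the four-lane step between each chip. This is only bookkeeping over the article's own numbers; the dictionary keys are labels chosen for illustration.

```python
# Per-generation PCIe lane splits, exactly as listed in the article.
LANES = {
    "Cezanne": {"PEG": 16, "NVMe": 4, "chipset/GPP": 4},               # Gen 3
    "Phoenix": {"PEG": 8, "NVMe": 4, "chipset/GPP": 4, "USB4": 4},     # Gen 4
    "Strix Point": {"PEG": 8, "NVMe": 4, "USB4/GPP": 4},               # Gen 4
}

# Total lanes per chip.
totals = {chip: sum(split.values()) for chip, split in LANES.items()}

# Lanes dropped between consecutive generations (dicts keep insertion order).
chips = list(LANES)
deltas = [totals[a] - totals[b] for a, b in zip(chips, chips[1:])]

print(totals)
print(deltas)
```

Keeping the per-use split rather than bare totals makes it visible where the cuts landed: PEG halves from "Cezanne" to "Phoenix," and the dedicated chipset/GPP block disappears in "Strix Point."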
The idea behind the PCIe lane reduction is that "Strix Point" is designed to square off against "Lunar Lake," which likewise has only x4 for PEG/GPP. When "Arrow Lake-H" and "Arrow Lake-HX" eventually hit the scene, they'll be met by AMD's "Fire Range" chip, which has a 28-lane PCIe Gen 5 interface and can be paired with even the fastest discrete mobile GPUs.
Sources:
harukaze5719 (Twitter), Nemez (Twitter)
41 Comments on AMD Strix Point Silicon Pictured and Annotated
You know what, forget the thin-and-lights, since I know your partners won't make them before the Q3 2025 rebranded parts. Just give us a Strix Flow X13 without a dGPU, and 32GB of RAM. Please. I don't give a shit about the AM5 incarnation that's gonna arrive 18 months late.
It makes sense since the XG mobile port seems to be only on PCI-E 3.0 x8 (and 3.0 x4 on the Ally non-X). No clue if they have plans on upgrading it to PCI-E 4.0 since the CPU can support it, especially if it doesn't have a dGPU alongside.
An updated Zen 5 one would be awesome, though I keep my laptops for a long time, usually 3-4 gens.
Intel has been using CPPC for years on laptops and since X299 HEDT. On the software side, Windows pretty much just sees that some cores are clocked higher than others.
Intel's E-core not only has lower clocks but also vastly different IPC compared to the P-cores, meaning instructions require a different number of clock cycles to complete. This is what makes scheduling difficult.
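The commenter's point can be illustrated with a toy model in which per-core throughput is approximated as IPC x clock. Everything here is hypothetical: the core names, IPC values and clocks are made up (the mixed case is contrived so the two rankings disagree), and real schedulers, CPPC-driven or otherwise, use far richer information than this. The sketch only shows that ranking by clock matches ranking by throughput when IPC is equal, and can diverge when it isn't.

```python
from dataclasses import dataclass


@dataclass
class Core:
    name: str
    ipc: float        # relative instructions per clock (hypothetical)
    clock_ghz: float  # hypothetical clock

    @property
    def throughput(self) -> float:
        # Toy approximation: work per second ~ IPC x frequency
        return self.ipc * self.clock_ghz


def rank_by_clock(cores: list[Core]) -> list[str]:
    return [c.name for c in sorted(cores, key=lambda c: c.clock_ghz, reverse=True)]


def rank_by_throughput(cores: list[Core]) -> list[str]:
    return [c.name for c in sorted(cores, key=lambda c: c.throughput, reverse=True)]


# Same microarchitecture, different clocks: the clock ranking is also the throughput ranking.
same_ipc = [Core("big", 1.0, 5.0), Core("dense", 1.0, 3.3)]
# Different microarchitectures: a clock-only ranking can prefer the slower core.
mixed_ipc = [Core("P", 1.0, 4.0), Core("E", 0.6, 4.2)]

print(rank_by_clock(same_ipc) == rank_by_throughput(same_ipc))
print(rank_by_clock(mixed_ipc) == rank_by_throughput(mixed_ipc))
```

This is why a frequency-only ordering is a reasonable hint when the cores share one microarchitecture, and an incomplete one when they don't.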
It's not just the base X13 (which would be PERFECT if it was sold outside the US, and came with 32GB). I'd buy the G14 again if they just, y'know, removed the dGPU for a base spec. Even if it's chunky, that's insane battery life and unparalleled cooling perf for the APU there.
But as it stands there's no way I'm buying the G14 again. Asus is using even the shitty entry level dGPU options to push the G14's price point ever higher. I very rarely turn on the dGPU, and there's no point paying for what I don't actually use. I only bought it because the market for actually decent (decent design, decent screen, decent battery, decent availability) iGPU-only Rembrandt laptops was an utter wasteland, and there was no other option.
The single slot SO-DIMM also makes for some wack-ass upgrade paths so if the base Phoenix X13 was available here, I would've just gotten that instead. Even if it's 16GB. I see AMD trying to wean laptop makers off their overdependence on the APU as a do-it-all CPU, and rightly so. But unless Fire Range makes a serious leap forward in terms of idle power (not happening without serious design changes), I don't see that happening anytime soon. Dragon Range is much better than desktop and is good enough for parity with Intel in the desktop replacement bricks where zero attention is paid to power optimization, but laptops like G14 are a different story.
I would have purchased it for the same price without the dGPU lol. I want to say at launch it was 1600 USD.
It's only for internet/youtube/email when I travel.
I seem to remember a news article last week saying that Strix is indeed segregated into 2 CCXs again. Which just means that we have made basically zero progress whatsoever (relative to chiplet parts) in improving APU gaming performance beyond 16MB L3, and have also essentially gone back to Zen 2 core-to-core latency.
The only other real comparison seems to be Bergamo, where the CCD is split into two with two clusters of 4c around 2 separate L3 clusters. And obviously that is a 2CCX affair.
And it's a valid point about scheduling. Apparently Zen 2 was so long ago that we don't remember anymore what it's like to be confined to a 4 core CCX.
On a normal out-of-the-box fresh install I can easily turn off anywhere between 10 and 40 applications/services combined, or even more in some cases.
4C/8T may not be very exciting, but it's enough to drive most casual computing without fuss, so for your average person who uses web/email/CAD/media applications it's plenty. Even most gaming is still okay on 4C/8T if you're not trying to feed a fairly serious dGPU that can keep minimum FPS close to or in the triple-digit framerates (case in point, Steam Deck).
So yeah, this increased focus on the CPU-to-IGP ratio in favour of the IGP is welcome. Fewer full-fat CPU cores and more graphics compute is what APUs have needed for a very very long time. On the desktop, for the last decade or more, it has been absolutely fine to buy an entry-level i5 that satisfies the bare minimum needed to feed a discrete graphics card, and then spend all the remaining budget on a beefy graphics card, since so many games are GPU-limited even with this massive bias towards the GPU budget. It's certainly way more sensible for gaming than blowing most of the budget on a flagship CPU and then slapping a GeForce GT 710 in there!
Either way, that's how APUs have felt for ages - their budget is obviously die area and shared power budget, but AMD have been putting 8 full-fat CPU cores on a mobile part that would struggle to feed those 8 cores enough power even if this was a CPU without an IGP, and then slapping on an inadequate number of last-gen graphics cores as an afterthought. Less than three years ago, the best APU was the 5700G, using 8 of the latest Zen 3 CPU cores but slapping a paltry 8CU of ancient Vega graphics into it, a stupid decision given that Vega was optimised for HBM and the shared DDR4 of an APU is the exact opposite of that!!
So yeah, AMD has finally started taking APU IGPs seriously. First we jumped from Vega to RDNA2 in the mobile space, and now we're getting RDNA 3.5 on this APU before seeing it in dGPUs. The CU count has also progressed upward again. I had Vega 10 in my 2700U what feels like an eternity ago, and then mainstream models regressed to 6CU or 8CU for half a decade - it was so bad that Intel caught up with AMD's IGPs :O
And as much as I hate to admit it, you know exactly why fewer cores is not the answer. More cores at a lower all-core clock is the way to go for efficiency.
I can really only think of two possible explanations for this thing existing:
- They want 12 cores to keep up with core counts but don't have a big enough ring for them all
- They went super low effort in order to rush a Copilot ready product to market, and just glued an enterprise 5c CCX into half of a consumer CCX
Kinda hard to put more WGPs in there if the memory controller only just barely managed to get enough bandwidth to fully feed 12CUs.
Need Strix to be in a design that will most likely only ever exist if Strix Halo comes to fruition. sadge
That is still an *if* though...
That was me saying that even 4C/8T is enough for the vast majority of casual users, not a reflection on Strix Point's 4C+8c configuration. And you're saying 8C would have taken up less space, but in this generation Dragon Range dedicated mobile parts are coming in with 16C, so 4C+8c is MUCH smaller than it could have been from a CPU-core perspective.
The APU die got quite a bit bigger at 232 vs 178 mm². Can't just pin that on the iGPU. Not sure where Dragon/Fire Range factors into this, it's a huge package and not the same segment.