Monday, July 29th 2024
AMD Strix Point Silicon Pictured and Annotated
The first die shot of AMD's new 4 nm "Strix Point" mobile processor surfaced, thanks to an enthusiast on Chinese social media. "Strix Point" is a significantly larger die than "Phoenix." It measures 12.06 mm x 18.71 mm (L x W), compared to the 9.06 mm x 15.01 mm of "Phoenix." Much of this die size increase comes from the larger CPU, iGPU, and NPU. The process has been improved from TSMC N4 on "Phoenix" and its derivative "Hawk Point," to the newer TSMC N4P node.
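To put the quoted dimensions in perspective, here is a minimal sketch that multiplies them out. It assumes both dies are plain rectangles and takes the article's figures at face value (they are measurements from a die shot, not official AMD numbers). The names `DIES` and `die_area_mm2` are made up for this illustration.

```python
# Die dimensions in millimetres, as quoted in the article (L x W).
DIES = {
    "Strix Point": (12.06, 18.71),
    "Phoenix": (9.06, 15.01),
}


def die_area_mm2(length_mm: float, width_mm: float) -> float:
    """Area of a rectangular die in square millimetres."""
    return length_mm * width_mm


areas = {name: die_area_mm2(*dims) for name, dims in DIES.items()}

# Relative growth of Strix Point over Phoenix, as a fraction.
growth = areas["Strix Point"] / areas["Phoenix"] - 1.0

for name, area in areas.items():
    print(f"{name}: {area:.1f} mm^2")
print(f"Strix Point is about {growth:.0%} larger than Phoenix")
```

On these numbers, "Strix Point" works out to roughly 226 mm² against roughly 136 mm² for "Phoenix," or about two-thirds more area.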
Nemez (GPUsAreMagic) annotated the die shot in great detail. The CPU now has 12 cores spread across two CCXs: one contains four "Zen 5" cores sharing a 16 MB L3 cache, and the other contains eight "Zen 5c" cores sharing an 8 MB L3 cache. The two CCXs connect to the rest of the chip over Infinity Fabric. The rather large iGPU takes up the central region of the die. It is based on the RDNA 3.5 graphics architecture, and features 8 workgroup processors (WGPs), or 16 compute units (CUs), worth 1,024 stream processors. Other key components include four render backends worth 16 ROPs, and control logic. The GPU has its own 2 MB of L2 cache that cushions transfers to the Infinity Fabric.
Slightly separated from the iGPU are its allied components, the Media Engine and the Display Engine. The Media Engine provides hardware acceleration for encoding and decoding of H.264, H.265, and AV1, besides several legacy video formats. The Display Engine is responsible for encoding the frame output of the iGPU to the various connector formats (such as DisplayPort, eDP, and HDMI), including hardware-accelerated display stream compression, while the display PHYs handle the physical layer of the connectors.
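The iGPU unit counts quoted above are internally consistent, which a quick sketch can confirm. The per-unit ratios below (2 CUs per WGP, 64 stream processors per CU, 4 ROPs per render backend) are assumptions inferred from the article's own totals rather than figures it states outright, and the constant names are invented for this example.

```python
# RDNA 3.5 iGPU block counts quoted in the article.
WGPS = 8
RENDER_BACKENDS = 4

# Assumed ratios, inferred from the quoted totals (not stated in the article).
CUS_PER_WGP = 2        # 16 CUs / 8 WGPs
SPS_PER_CU = 64        # 1,024 stream processors / 16 CUs
ROPS_PER_BACKEND = 4   # 16 ROPs / 4 render backends

compute_units = WGPS * CUS_PER_WGP
stream_processors = compute_units * SPS_PER_CU
rops = RENDER_BACKENDS * ROPS_PER_BACKEND

print(f"{WGPS} WGPs -> {compute_units} CUs -> {stream_processors} stream processors")
print(f"{RENDER_BACKENDS} render backends -> {rops} ROPs")
```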
The NPU is the third major logic component of "Strix Point." This second-generation NPU by AMD is visibly larger than the one found in "Phoenix." It is based on the more advanced XDNA 2 architecture, and contains 32 AI engine tiles that talk to their own high-speed local memory, along with control logic that interfaces with Infinity Fabric. This NPU is designed to meet and exceed the hardware requirements of Microsoft Copilot+, and provides a throughput of 50 TOPS.
The memory controller supports dual-channel (160-bit) DDR5 with native DDR5-5600, and 128-bit LPDDR5 at speeds of up to LPDDR5-7500. The controller features an SRAM cache of unspecified size, which Nemez notes was also seen on the "Phoenix 2" and "Phoenix" dies, but not on the memory controller of the cIOD found in "Raphael" and "Dragon Range."
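As a rough guide to what these memory speeds mean, the sketch below estimates theoretical peak bandwidth as transfer rate times data-bus width. It assumes 128 data bits in both configurations; the article does not break down the 160-bit DDR5 figure, which would be consistent with extra ECC bits on top of 128 data bits, so treat that as an assumption. The function name `peak_bandwidth_gb_s` is hypothetical, and real-world throughput will be lower than these peaks.

```python
def peak_bandwidth_gb_s(transfer_rate_mt_s: int, data_width_bits: int) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = data_width_bits / 8
    # MT/s * bytes per transfer = MB/s; divide by 1000 for GB/s.
    return transfer_rate_mt_s * bytes_per_transfer / 1000


# (transfer rate in MT/s, assumed data width in bits)
CONFIGS = {
    "DDR5-5600": (5600, 128),    # assumption: 128 of the quoted 160 bits carry data
    "LPDDR5-7500": (7500, 128),
}

bandwidth = {name: peak_bandwidth_gb_s(*cfg) for name, cfg in CONFIGS.items()}

for name, gb_s in bandwidth.items():
    print(f"{name}: {gb_s:.1f} GB/s peak")
```

Under these assumptions, DDR5-5600 peaks at about 89.6 GB/s and LPDDR5-7500 at about 120 GB/s, which matters for an iGPU that shares this bandwidth with the CPU and NPU.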
The "Strix Point" silicon has a smaller PCIe root complex than "Phoenix," which in turn has a smaller root complex than "Cezanne." AMD has trimmed four PCIe lanes with each step across these three generations. "Cezanne" features 24 PCIe Gen 3 lanes (x16 PEG + x4 NVMe + x4 chipset bus or GPP), while "Phoenix" truncates this to 20 PCIe Gen 4 lanes (x8 PEG + x4 NVMe + x4 chipset bus or GPP + x4 configured as USB4). The newer "Strix Point" cuts it down further to just 16 PCIe Gen 4 lanes (x8 PEG + x4 NVMe + x4 configured as USB4 or GPP).
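The lane breakdowns above can be tallied with a short sketch. The allocations are copied from the article; the dictionary layout and the names `LANE_ALLOCATION`, `totals`, and `steps` are just for illustration.

```python
# PCIe lane allocations per chip, as listed in the article.
LANE_ALLOCATION = {
    "Cezanne": {"gen": 3, "lanes": {"PEG": 16, "NVMe": 4, "chipset/GPP": 4}},
    "Phoenix": {"gen": 4, "lanes": {"PEG": 8, "NVMe": 4, "chipset/GPP": 4, "USB4": 4}},
    "Strix Point": {"gen": 4, "lanes": {"PEG": 8, "NVMe": 4, "USB4/GPP": 4}},
}

# Total lanes per chip, in generation order (dicts preserve insertion order).
totals = {name: sum(cfg["lanes"].values()) for name, cfg in LANE_ALLOCATION.items()}

# Lanes dropped at each generational step.
counts = list(totals.values())
steps = [older - newer for older, newer in zip(counts, counts[1:])]

for name, cfg in LANE_ALLOCATION.items():
    print(f"{name}: {totals[name]} lanes (PCIe Gen {cfg['gen']})")
print(f"Lanes dropped per step: {steps}")
```

The totals come out to 24, 20, and 16 lanes, a drop of four lanes at each step.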
The idea behind the PCIe lane reduction is that "Strix Point" is designed to square off against "Lunar Lake," which also has only x4 for PEG/GPP. When "Arrow Lake-H" and "Arrow Lake-HX" eventually hit the scene, they'll be met with AMD's "Fire Range" chip, which has a 28-lane PCIe Gen 5 interface and can be paired with even the fastest discrete mobile GPUs.
Sources:
harukaze5719 (Twitter), Nemez (Twitter)
41 Comments on AMD Strix Point Silicon Pictured and Annotated
tbh they should leave the WGP counts alone for a little while, they REALLY need to stop hiking the price and treating their APUs like they're made of gold, if the APUs are ever to get the mobile marketshare they deserve. There could be 50CUs in there and it wouldn't make a lick of difference if it just continues to be delayed vaporware every generation
Looking at the largest UK retailer for new laptops, filtering by AMD processor and sorting by popularity, the top 10 results are:
- 5700U - Zen2 and Vega rehashed architectures from 5 and 7 years ago, respectively.
- 7520U - Zen2 and minimal, near-worthless 2CU graphics.
- 7320U - Zen2 and minimal, near-worthless 2CU graphics
- 5500U - Zen2 and Vega rubbish again. 2017 called....
- 8840HS - Finally! A current-gen product.
- 7530U - Oh look, shitty, ancient Vega7
- 7530U - ...and again
- 7730U - Vega 8 doesn't make it better :\
- 8840U - Yay, another current-gen product.
- 8845HS - Current-gen, but the 780M is wasted because this thing has an RTX card.
It's a pretty sorry sight to see. Even if the products are good, AMD is barely selling them and their rejects from the Vega IGP days are still flooding the shelves in 2H 2024!
...with FHD 120 Hz/OLED, touchpads, the analog sticks with programmable rings around them, and the triggers with click or analog locking ...
So I look at the market here (and across the border) and what Zen 4/RDNA 3 can I find? The ROG Flow X13 (7940HS) @ $900, an HP Envy (8840HS) @ $850 in Paraguay (so add 50% over what exceeds $500) and an HP Firefly (7840HS) @ $1850 in Brazil. Now, pray tell, where the heck are the x840U devices?
AMD could be making a killing in this arena but there's almost nothing on sale so how does AMD expect to get sales?
I hope those tiny L3 caches don't gimp performance like they do on the desktop chips. Some odd decisions in there... But I'm sure they know what they're doing more than I do.
USB 2.0 is still plenty for lots of things, and things like onboard audio are actually moving towards USB 2.0 for some reason. One for audio, another for biometrics, another for any lighting controller the system might choose to implement, and the rest for keyboard and the touchpad, maybe. Embedded uses would probably also need a few of these.
I also wonder how much longer it will be before they make a monolithic APU with more than 16MB of L3 cache visible to a single core. Would that be power inefficient somehow, or is it something else? Hopefully it'd be an easy solution since similar non-uniform architectures have been around a while now. There probably won't be any solution if the game really does need more than 4 Zen 5 cores, though - It'd be interesting if some of them ended up faster on the 8-core Zen 5c cluster.
I feel they should have stuck to a single CPU type, and not mixed them - This may work on future games consoles that are purpose-built, have huge memory bandwidth on a custom bus, and run custom OSes and drivers tuned to that hardware, but PC/Windows still has serious scheduling issues, exacerbated by mixing different types of CPU cores, their physical location on the die, and how they communicate. I just wonder if they should have gone all Zen5c or 8 Zen5 cores and cut the c-cores entirely. 16MB is a ridiculous choice for the full Zen5 cores on a hand-held gaming console, as it instantly takes away 20% of its performance, but I guess there are only 4 cores to fight over it, although the chances of them all being fully occupied and fighting for cache during gameplay are higher.
An 8 core Zen5c with 32MB of L3 would have been my choice. Better battery life, better long-term performance due to not thermal throttling as much, no Windows scheduler issues, and just nicer for the customer with a cooler, more performant device in their hand/lap.