Wednesday, July 24th 2024

AMD Strix Point SoC Reintroduces Dual-CCX CPU, Other Interesting Silicon Details Revealed

Since its reveal last week, we got a slightly more technical deep-dive from AMD on its two upcoming processors—the "Strix Point" silicon powering its Ryzen AI 300 series mobile processors; and the "Granite Ridge" chiplet MCM powering its Ryzen 9000 desktop processors. We present a closer look into the "Strix Point" SoC in this article. It turns out that "Strix Point" takes a significantly different approach to heterogeneous multicore than "Phoenix 2." AMD gave us a close look at how this works. AMD built the "Strix Point" monolithic silicon on the TSMC N4P foundry node, with a die-area of around 232 mm².

The "Strix Point" silicon sees the company's Infinity Fabric interconnect as its omnipresent ether. This is a point-to-point interconnect, unlike the ringbus on some Intel processors. The main compute machinery on the "Strix Point" SoC is its two CPU compute complexes (CCX), each with a 32B (read)/16B (write) per cycle data-path to the fabric. The concept of CCX makes a comeback with "Strix Point" after nearly two generations of "Zen." The first CCX contains the chip's four full-sized "Zen 5" CPU cores, which share a 16 MB L3 cache among themselves. The second CCX contains the chip's eight "Zen 5c" cores that share a smaller 8 MB L3 cache. Each of the 12 cores has a 1 MB dedicated L2 cache.
This approach to heterogeneous multicore is significantly different from "Phoenix 2," where the two "Zen 4" and four "Zen 4c" cores were part of a common CCX, with a common 16 MB L3 cache accessible to all six cores.
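
To put the cache split in numbers, here is a minimal Python sketch that uses only the core counts and L3 sizes quoted above. The `CCX` class and the `summarize` helper are our own illustrative names, not anything from AMD, and an even per-core split of a shared L3 is a simplification, since any core in a CCX can use the whole cache.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CCX:
    """One CPU compute complex: a group of cores sharing an L3 cache."""
    name: str
    cores: int
    l3_mb: int

    @property
    def l3_per_core_mb(self) -> float:
        # Simplification: divide the shared L3 evenly among the cores.
        return self.l3_mb / self.cores


# Figures as quoted in the article.
STRIX_POINT = (CCX("Zen 5", 4, 16), CCX("Zen 5c", 8, 8))
PHOENIX_2 = (CCX("Zen 4 + Zen 4c", 6, 16),)


def summarize(ccxs):
    """Return total cores, total L3, and the per-core L3 share of each CCX."""
    return {
        "cores": sum(c.cores for c in ccxs),
        "l3_total_mb": sum(c.l3_mb for c in ccxs),
        "l3_per_core_mb": {c.name: c.l3_per_core_mb for c in ccxs},
    }


if __name__ == "__main__":
    for label, chip in (("Strix Point", STRIX_POINT), ("Phoenix 2", PHOENIX_2)):
        print(label, summarize(chip))
```

By this rough measure, a "Zen 5" core gets 4 MB of L3 to itself, a "Zen 5c" core gets 1 MB, and a "Phoenix 2" core sits in between at roughly 2.7 MB.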

The "Zen 5" cores on "Strix Point" will be able to sustain high boost frequencies, in excess of 5.00 GHz, and should benefit from the larger 16 MB L3 cache that's shared among just four cores (similar L3 cache per core to "Granite Ridge"). The "Zen 5c" cores, on the other hand, operate at lower base and boost frequencies than the "Zen 5" cores, and have less L3 cache available to them. For threads to migrate between the two core types, they will have to go through the fabric, and in some cases, even incur a round-trip to the main memory.
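
For software that wants to avoid cross-CCX migration altogether, one option on Linux is to restrict a process to the logical CPUs of a single CCX. The sketch below is only an illustration: the logical-CPU numbering (the "Zen 5" threads first, then the "Zen 5c" threads) and the two threads per core are assumptions of ours, not something AMD has documented here, so the real topology should be checked on an actual system (for example with `lscpu`) before relying on it. The helper names `ccx_cpu_sets` and `pin_to_ccx` are hypothetical.

```python
import os


def ccx_cpu_sets(zen5_cores=4, zen5c_cores=8, threads_per_core=2):
    """Return hypothetical logical-CPU ID sets for each CCX.

    ASSUMPTION: logical CPUs are numbered contiguously, Zen 5 CCX first.
    Real systems may enumerate them differently.
    """
    zen5_count = zen5_cores * threads_per_core
    zen5c_count = zen5c_cores * threads_per_core
    return {
        "zen5": set(range(0, zen5_count)),
        "zen5c": set(range(zen5_count, zen5_count + zen5c_count)),
    }


def pin_to_ccx(ccx, pid=0):
    """Restrict a process (0 = the calling process) to one CCX. Linux only."""
    if not hasattr(os, "sched_setaffinity"):
        raise NotImplementedError("CPU affinity calls are not available here")
    wanted = ccx_cpu_sets()[ccx]
    # Only keep CPUs the process is actually allowed to run on.
    usable = wanted & os.sched_getaffinity(pid)
    if not usable:
        raise ValueError(f"none of the assumed {ccx} CPUs are available")
    os.sched_setaffinity(pid, usable)
    return usable


if __name__ == "__main__":
    print(ccx_cpu_sets())
```

Calling `pin_to_ccx("zen5c")` would then keep a background task on the compact cores, so its working set stays in one L3 cache.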

The Zen 5c core is about 25% smaller in die-area than the Zen 5 core. For reference, the Zen 4c core is about 35% smaller than a regular Zen 4 core. AMD has worked to slightly improve the maximum boost frequencies of the Zen 5c core compared to its predecessor, so the frequency band of the Zen 5c cores is a tiny bit closer to that of the Zen 5 cores. The lower maximum voltages and maximum boost frequencies of Zen 5c cores put them at a significant power efficiency advantage over the Zen 5 cores. AMD is continuing to rely on a software-based scheduling solution that ensures the right kind of processing workload goes to the right kind of core. The company says that the software-based solution lets it correct "scheduling mistakes" over time.
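
The area figures above allow a quick back-of-the-envelope estimate of how much core area a mixed configuration takes, expressed in full-sized-core equivalents. This is a rough sketch that takes the approximate 25% and 35% reductions at face value and counts only the cores themselves, not the L3 caches or the rest of the SoC; `full_core_equivalents` is our own helper name.

```python
def full_core_equivalents(full_cores, compact_cores, compact_area_reduction):
    """Express a mix of full and compact cores in full-sized-core areas.

    compact_area_reduction is the fraction by which a compact core is
    smaller than a full core (0.25 means "25% smaller").
    """
    if not 0.0 <= compact_area_reduction < 1.0:
        raise ValueError("reduction must be in the range [0, 1)")
    return full_cores + compact_cores * (1.0 - compact_area_reduction)


if __name__ == "__main__":
    # Strix Point: 4x Zen 5 + 8x Zen 5c, Zen 5c about 25% smaller.
    print(f"Strix Point: {full_core_equivalents(4, 8, 0.25):.1f} Zen 5 core areas")
    # Phoenix 2: 2x Zen 4 + 4x Zen 4c, Zen 4c about 35% smaller.
    print(f"Phoenix 2:   {full_core_equivalents(2, 4, 0.35):.1f} Zen 4 core areas")
```

Under these assumptions, the 12 cores of "Strix Point" occupy about as much area as 10 full-sized "Zen 5" cores would.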

The iGPU is the most bandwidth-hungry device on the fabric, and gets the widest data-path, at 4x 32B/cycle. It is based on the RDNA 3.5 graphics architecture, which retains the SIMD engine and IPC of RDNA 3, but brings several improvements to performance/Watt. The iGPU features 8 workgroup processors (WGPs), compared to the 6 on the current "Phoenix" silicon. This works out to 16 CU, or 1,024 stream processors. The iGPU also features 4 render backends+, which work out to 16 ROPs.
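
The unit counts above follow from fixed ratios, which the short sketch below spells out. The ratios (2 CU per WGP, 64 stream processors per CU, 4 ROPs per render backend+) are simply the ones implied by the figures in this article; the function names are ours.

```python
# Ratios implied by the figures quoted in the article.
CUS_PER_WGP = 2
STREAM_PROCESSORS_PER_CU = 64
ROPS_PER_RENDER_BACKEND_PLUS = 4


def shader_counts(wgps):
    """Return (compute units, stream processors) for a WGP count."""
    cus = wgps * CUS_PER_WGP
    return cus, cus * STREAM_PROCESSORS_PER_CU


def rop_count(render_backends_plus):
    """Return the number of ROPs for a render backend+ count."""
    return render_backends_plus * ROPS_PER_RENDER_BACKEND_PLUS


if __name__ == "__main__":
    cus, sps = shader_counts(8)
    print(f"Strix Point: {cus} CU, {sps} stream processors, {rop_count(4)} ROPs")
    cus, sps = shader_counts(6)
    print(f"Phoenix:     {cus} CU, {sps} stream processors")
```

Applying the same ratios to the 6 WGPs of "Phoenix" gives 12 CU, which is where the one-third increase in compute units comes from.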

The third most bandwidth-hungry device is the XDNA 2 NPU, with a 32B/cycle data-path that's of a comparable bandwidth to a CCX. The NPU features four blocks of 8 XDNA 2 arrays and 32 AI engine tiles, good for 50 TOPS of AI inferencing throughput, and can be overclocked. It also supports the Block FP16 data format (not to be confused with bfloat16), which offers the precision of FP16, with the performance of FP8.

Besides the three logic-heavy components, there are other accelerators that are fairly demanding on the bandwidth, such as the Video CoreNext engine that accelerates encoding and decoding; the audio coprocessor that processes the audio stack when the system is "powered down," so it can respond to voice commands; the display controller that handles the display I/O, including display stream compression, if called for; the SMU, Microsoft Pluton, TPM, and other manageability hardware.

The I/O interfaces of the "Strix Point" SoC include a memory controller that supports 128-bit LPDDR5, LPDDR5x, and dual-channel DDR5 (160-bit). The PCI-Express root complex is slightly truncated compared to the one "Phoenix" comes with. There are a total of 16 PCIe Gen 4 lanes. All 16 should be usable in notebooks that lack a discrete FCH chipset, but the usable lane count should drop to 12 when AMD eventually adapts this silicon to Socket AM5 for desktop APUs. On gaming notebooks that use Ryzen AI HX or H 300 series processors, discrete GPUs should have a Gen 4 x8 connection. USB connectivity includes a 40 Gbps USB4, or two 20 Gbps USB 3.2 Gen 2x2, two additional 10 Gbps USB 3.2 Gen 2, and three classic USB 2.0.
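
For a sense of scale, the lane counts above can be converted into theoretical link bandwidth. The sketch below uses the standard PCIe Gen 4 signaling rate of 16 GT/s per lane with 128b/130b encoding; the results are raw per-direction maxima before protocol overhead, so real-world throughput will be lower. The function name is ours.

```python
PCIE_GEN4_GT_PER_S = 16.0          # giga-transfers per second, per lane
ENCODING_EFFICIENCY = 128 / 130    # 128b/130b line encoding


def pcie_gen4_gb_per_s(lanes):
    """Theoretical per-direction bandwidth in GB/s for a Gen 4 link."""
    if lanes <= 0:
        raise ValueError("lane count must be positive")
    gbit_per_s = PCIE_GEN4_GT_PER_S * ENCODING_EFFICIENCY * lanes
    return gbit_per_s / 8  # bits to bytes


if __name__ == "__main__":
    for label, lanes in (
        ("All lanes (notebook)", 16),
        ("Usable lanes (Socket AM5)", 12),
        ("Discrete GPU link", 8),
    ):
        print(f"{label}: x{lanes} = {pcie_gen4_gb_per_s(lanes):.2f} GB/s per direction")
```

A Gen 4 x8 link for the discrete GPU therefore tops out just under 16 GB/s in each direction.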

14 Comments on AMD Strix Point SoC Reintroduces Dual-CCX CPU, Other Interesting Silicon Details Revealed

#1
AnotherReader
That's a big increase in die size; the 8840HS is 178 mm^2. Given that Zen 5 is as compact as Zen 4, the increase is probably due to the much larger NPU with a smaller contribution from the larger GPU.
#2
Caring1
Centralized hot spot should be easier to cool.
#4
Daven
I'm not sure I would have gone with the term 'classic' to describe performance cores but okay.

The green block 'CPU core' for the classic cores is transistor to transistor the same as the green block 'CPU core' for the compact core. AMD has reduced the spacing between transistors which limits the max clock frequency but maintains IPC.

We now have both pictures of the chip arrangement of Turin Epycs:


[Image: Turin, 128 'Classic' cores (256 threads)]
[Image: Turin, 192 'Compact' cores (384 threads)]
#5
Klemc
Why does a CPU need to be small? I mean, it's a small piece of a few cm by a few cm. If size can help keep it cool easily, why not make the CPU bigger? Who cares?
#6
Patriot
Klemc wrote: "Why does a CPU need to be small? I mean, it's a small piece of a few cm by a few cm. If size can help keep it cool easily, why not make the CPU bigger? Who cares?"
You sound cheap. Bigger costs more, for them and then for us.
#7
Klemc
Patriot wrote: "You sound cheap. Bigger costs more, for them and then for us."
But fewer problems... right, if cheaper then I will not change.
#8
T1beriu
Klemc wrote: "Why does a CPU need to be small? I mean, it's a small piece of a few cm by a few cm. If size can help keep it cool easily, why not make the CPU bigger? Who cares?"
Every square mm costs $$$.
#9
Patriot
Klemc wrote: "But fewer problems... right, if cheaper then I will not change."
Not really: higher latency, and more power usage for longer traces to maintain signal integrity.
#10
Wirko
btarunr wrote: "For threads to migrate between the two core types, they will have to go through the fabric, and in some cases, even incur a round-trip to the main memory."
Separate L3 caches ... that's weird. It seems that thread migration will come with a large penalty of re-filling the other L3, and the two caches must also be kept coherent at all times.
#11
Minus Infinity
AnotherReader wrote: "That's a big increase in die size; the 8840HS is 178 mm^2. Given that Zen 5 is as compact as Zen 4, the increase is probably due to the much larger NPU with a smaller contribution from the larger GPU."
Really? The CPU has 50% more cores, there's a larger NPU and 33% more iGPU CUs, and yet it's only 30% larger. Way less than expected.

It will be funny if, after Zen 4c being way stronger than Gracemont E, Zen 5c is actually weaker than Skymont E, a strong possibility given Intel saying Skymont E is as strong as Raptor Cove P cores, and they clock to 4.7 GHz.
#12
R0H1T
Minus Infinity wrote: "Zen 5c is actually weaker than Skymont E"
Practically 0% chance of that, given they (Zen 5) have the same IPC.
#13
ratirt
Klemc wrote: "But fewer problems... right, if cheaper then I will not change."
You also need to understand that the wafers these are printed on have flaws. The bigger the chip, the higher the probability that the chip will be defective (something will not work as it should), meaning that getting a fully specced chip will be harder. Fewer of them will meet the requirements, which will boost the price for these.
#14
AnotherReader
Minus Infinity wrote: "Really? The CPU has 50% more cores, there's a larger NPU and 33% more iGPU CUs, and yet it's only 30% larger. Way less than expected.

It will be funny if, after Zen 4c being way stronger than Gracemont E, Zen 5c is actually weaker than Skymont E, a strong possibility given Intel saying Skymont E is as strong as Raptor Cove P cores, and they clock to 4.7 GHz."
8 of the new cores are smaller than the other 4 cores so it's more like 10 Zen 4 cores in die area. With such a great increase in die size, giving the IGP a large last level cache like its discrete counterparts wouldn't have increased die size that much.

As for Zen 5c vs Skymont E, SMT will allow Zen 5c to keep up with the latter in many workloads even at lower clocks. Of course, we don't know yet what clocks Zen 5c will reach.