Wednesday, July 24th 2024
AMD Strix Point SoC Reintroduces Dual-CCX CPU, Other Interesting Silicon Details Revealed
Since its reveal last week, we have received a slightly more technical deep-dive from AMD on its two upcoming processors: the "Strix Point" silicon powering its Ryzen AI 300 series mobile processors, and the "Granite Ridge" chiplet MCM powering its Ryzen 9000 series desktop processors. In this article, we take a closer look at the "Strix Point" SoC. It turns out that "Strix Point" takes a significantly different approach to heterogeneous multicore than "Phoenix 2," and AMD gave us a close look at how this works. AMD builds the monolithic "Strix Point" silicon on the TSMC N4P foundry node, with a die area of around 232 mm².
The "Strix Point" silicon sees the company's Infinity Fabric interconnect as its omnipresent ether. This is a point-to-point interconnect, unlike the ringbus on some Intel processors. The main compute machinery on the "Strix Point" SoC are its two CPU compute complexes (CCX), each with a 32b (read)/16b (write) per cycle data-path to the fabric. The concept of CCX makes a comeback with "Strix Point" after nearly two generations of "Zen." The first CCX contains the chip's four full-sized "Zen 5" CPU cores, which share a 16 MB L3 cache among themselves. The second CCX contains the chip's eight "Zen 5c" cores that share a smaller 8 MB L3 cache. Each of the 12 cores has a 1 MB dedicated L2 cache.This approach to heterogeneous multicore is significantly different from "Phoenix 2," where the two "Zen 4" and four "Zen 4c" cores were part of a common CCX, with a common 16 MB L3 cache accessible to all six cores.
The "Zen 5" cores on "Strix Point" will be able to sustain high boost frequencies, in excess of 5.00 GHz, and should benefit from the larger 16 MB L3 cache that's shared among just four cores (similar L3 cache per core to "Granite Ridge"). The "Zen 5c" cores, on the other hand, operate at lower base- and boost frequencies than the "Zen 5" cores, and have lesser amounts of available L3 caches. For threads to migrate between the two core types, they will have to go through the fabric, and in some cases, even incur a round-trip to the main memory.
The Zen 5c core is about 25% smaller in die-area than the Zen 5 core; for reference, the Zen 4c core is about 35% smaller than a regular Zen 4 core. AMD has worked to slightly improve the maximum boost frequencies of the Zen 5c core compared to its predecessor, so the frequency band of the Zen 5c cores sits a little closer to that of the Zen 5 cores. The lower maximum voltages and maximum boost frequencies of the Zen 5c cores give them a significant power-efficiency advantage over the Zen 5 cores. AMD continues to rely on a software-based scheduling solution to ensure that the right kind of workload goes to the right kind of core. The company says the software-based approach lets it correct "scheduling mistakes" over time.
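Applications that don't want to wait for the scheduler to learn can also steer themselves. The hedged sketch below pins a latency-sensitive thread to the first CCX using Linux's affinity API; the assumption that logical CPUs 0–7 map to the four SMT-enabled "Zen 5" cores is hypothetical and should be verified (for example, with the sysfs script above) before relying on it.

```python
# Hedged sketch: AMD's driver and the OS normally handle core selection, but
# an application can pin hot threads itself. The CPU numbering below is an
# assumption, not a guaranteed enumeration on Strix Point systems.
import os
import threading

ZEN5_CCX_CPUS = set(range(8))  # assumed: logical CPUs 0-7 = Zen 5 CCX (4 cores, SMT)

def latency_sensitive_work():
    os.sched_setaffinity(0, ZEN5_CCX_CPUS)  # pid 0 = the calling thread (Linux-only)
    # ... hot loop goes here ...

t = threading.Thread(target=latency_sensitive_work)
t.start()
t.join()
```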
The iGPU is the most bandwidth-hungry device on the fabric, and gets the widest data-path on the chip, at 4x 32B/cycle. It is based on the RDNA 3.5 graphics architecture, which retains the SIMD engine and IPC of RDNA 3 but brings several improvements to performance per Watt. The iGPU features 8 workgroup processors (WGPs), compared to 6 on the current "Phoenix" silicon, which works out to 16 CU, or 1,024 stream processors. It also packs 4 Render Backends+ (RB+), which work out to 16 ROPs.
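For readers keeping score, the shader math works out as follows, using RDNA's usual ratios of two CUs per WGP, 64 stream processors per CU, and four ROPs per RB+.

```python
# The shader-count arithmetic from the paragraph above.
wgps, cus_per_wgp, sps_per_cu = 8, 2, 64
rb_plus, rops_per_rb = 4, 4

print("Compute units:    ", wgps * cus_per_wgp)               # 16
print("Stream processors:", wgps * cus_per_wgp * sps_per_cu)  # 1024
print("ROPs:             ", rb_plus * rops_per_rb)            # 16
```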
The third most bandwidth-hungry device is the XDNA 2 NPU, with a 32B/cycle data-path of comparable bandwidth to a CCX. The NPU features 32 AI engine tiles, organized into four blocks of eight, for 50 TOPS of AI inferencing throughput, and it can even be overclocked. It also supports the Block FP16 data format (not to be confused with bfloat16), which offers precision close to FP16 at throughput comparable to 8-bit formats.
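To illustrate the block floating-point idea behind Block FP16, here is a generic toy sketch, not AMD's exact format: each block of values stores a single shared exponent plus an 8-bit mantissa per element, so per-element storage and arithmetic look like an 8-bit format while the dynamic range stays close to FP16.

```python
# Toy block floating-point quantizer (illustrative only, not AMD's format):
# one shared exponent per block, 8-bit signed mantissas per element.
import math

def quantize_block(values, mantissa_bits=8):
    shared_exp = max((math.frexp(v)[1] for v in values if v != 0.0), default=0)
    scale = 2.0 ** (shared_exp - (mantissa_bits - 1))
    qmax = 2 ** (mantissa_bits - 1) - 1
    mantissas = [max(-qmax - 1, min(qmax, round(v / scale))) for v in values]
    return shared_exp, mantissas

def dequantize_block(shared_exp, mantissas, mantissa_bits=8):
    scale = 2.0 ** (shared_exp - (mantissa_bits - 1))
    return [m * scale for m in mantissas]

block = [0.5, -1.25, 3.0, 0.0078125]
exp, q = quantize_block(block)
print(dequantize_block(exp, q))  # large values survive; very small ones lose precision
```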
Besides these three logic-heavy components, there are other accelerators that are fairly demanding on bandwidth: the Video CoreNext engine that accelerates video encoding and decoding; the audio coprocessor that runs the audio stack even when the system is "powered down," so it can respond to voice commands; the display controller that handles display I/O, including Display Stream Compression if called for; and the SMU, Microsoft Pluton, TPM, and other manageability hardware.
The I/O interfaces of the "Strix Point" SoC include a memory controller that supports 128-bit LPDDR5/LPDDR5X and dual-channel DDR5 (160-bit). The PCI-Express root complex is slightly truncated compared to the one "Phoenix" comes with, at a total of 16 PCIe Gen 4 lanes. All 16 should be usable in notebooks, which lack a discrete FCH chipset, but the usable lane count should drop to 12 when AMD eventually adapts this silicon to Socket AM5 for desktop APUs. On gaming notebooks that use Ryzen AI 300 series HX or H processors, discrete GPUs should get a Gen 4 x8 connection. USB connectivity includes one 40 Gbps USB4 port (or two 20 Gbps USB 3.2 Gen 2x2 ports), two additional 10 Gbps USB 3.2 Gen 2 ports, and three classic USB 2.0 ports.
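As a sanity check on what a 128-bit controller implies for bandwidth, the quick calculation below uses the LPDDR5X-7500 and DDR5-5600 transfer rates AMD lists for Ryzen AI 300 parts; those rates come from AMD's product pages, not from this deep-dive.

```python
# Peak theoretical memory bandwidth for a 128-bit data bus.
def peak_bandwidth_gbs(transfer_rate_mts, bus_width_bits=128):
    return transfer_rate_mts * bus_width_bits / 8 / 1000  # MT/s * bytes -> GB/s

print(f"LPDDR5X-7500: {peak_bandwidth_gbs(7500):.0f} GB/s")  # 120 GB/s
print(f"DDR5-5600:    {peak_bandwidth_gbs(5600):.1f} GB/s")  # 89.6 GB/s
```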
The "Strix Point" silicon sees the company's Infinity Fabric interconnect as its omnipresent ether. This is a point-to-point interconnect, unlike the ringbus on some Intel processors. The main compute machinery on the "Strix Point" SoC are its two CPU compute complexes (CCX), each with a 32b (read)/16b (write) per cycle data-path to the fabric. The concept of CCX makes a comeback with "Strix Point" after nearly two generations of "Zen." The first CCX contains the chip's four full-sized "Zen 5" CPU cores, which share a 16 MB L3 cache among themselves. The second CCX contains the chip's eight "Zen 5c" cores that share a smaller 8 MB L3 cache. Each of the 12 cores has a 1 MB dedicated L2 cache.This approach to heterogeneous multicore is significantly different from "Phoenix 2," where the two "Zen 4" and four "Zen 4c" cores were part of a common CCX, with a common 16 MB L3 cache accessible to all six cores.
The "Zen 5" cores on "Strix Point" will be able to sustain high boost frequencies, in excess of 5.00 GHz, and should benefit from the larger 16 MB L3 cache that's shared among just four cores (similar L3 cache per core to "Granite Ridge"). The "Zen 5c" cores, on the other hand, operate at lower base- and boost frequencies than the "Zen 5" cores, and have lesser amounts of available L3 caches. For threads to migrate between the two core types, they will have to go through the fabric, and in some cases, even incur a round-trip to the main memory.
The Zen 5c core is about 25% smaller in die-area than the Zen 5 core. For reference, the Zen 4c core is about 35% smaller than a regular Zen 4 core. AMD has worked to slightly improve the maximum boost frequencies of the Zen 5c core compared to its predecessor, so the frequency band of the Zen 5c cores are a tiny bit closer. The lower maximum voltages and maximum boost frequencies of Zen 5c cores put them at a significant power efficiency advantage over the Zen 5 cores. AMD is continuing to rely on a software based scheduling solution that ensures the right kind of processing workload goes to the right kind of core. The company says that the software based solution lets it correct "scheduling mistakes" over time.
The iGPU is the most bandwidth-hungry device on the fabric, and gets its widest data-path—4x 32B/cycle. Based on the RDNA 3.5 graphics architecture, which retains the SIMD engine and IPC of RDNA 3, but with several improvements to the performance/Watt, this iGPU also features 8 workgroup processors (WGPs), compared to the 6 on the current "Phoenix" silicon. This works out to 16 CU, or 1,024 stream processors. The iGPU also features 4 render backends+, which work out to 16 ROPs.
The third most bandwidth-hungry device is the XDNA 2 NPU, with a 32B/cycle data-path that's of a comparable bandwidth to a CCX. The NPU features four blocks of 8 XDNA 2 arrays, and 32 AI engine tiles; for 50 TOPS of AI inferencing throughput, and can be overclocked. It also supports the Block FP16 data format (not to be confused with bfloat16), which offers the precision of FP16, with the performance of FP8.
Besides the three logic-heavy components, there are other accelerators that are fairly demanding on the bandwidth, such as the Video CoreNext engine that accelerates encoding and decoding; the audio coprocessor that processes the audio stack when the system is "powered down," so it can respond to voice commands; the display controller that handles the display I/O, including display stream compression, if called for; the SMU, Microsoft Pluton, TPM, and other manageability hardware.
The I/O interfaces of the "Strix Point" SoC include a memory controller that supports 128-bit LPDDR5, LPDDR5x, and dual-channel DDR5 (160-bit). The PCI-Express root complex is slightly truncated compared to the one "Phoenix" comes with. There are a total of 16 PCIe Gen 4 lanes. All 16 should be usable in notebooks that lack a discrete FCH chipset, but the usable lane count should drop to 12 when AMD eventually adapts this silicon to Socket AM5 for desktop APUs. On gaming notebooks that use Ryzen AI HX or H 300 series processors, discrete GPUs should have a Gen 4 x8 connection. USB connectivity includes a 40 Gbps USB4, or two 20 Gbps USB 3.2 Gen 2x2, two additional 10 Gbps USB 3.2 Gen 2, and three classic USB 2.0.
16 Comments on AMD Strix Point SoC Reintroduces Dual-CCX CPU, Other Interesting Silicon Details Revealed
The green block 'CPU core' for the classic cores is transistor-for-transistor the same as the green block 'CPU core' for the compact core. AMD has reduced the spacing between transistors, which limits the max clock frequency but maintains IPC.
We now have pictures of both chip arrangements for the Turin EPYCs:
Turin 128 'Classic' Cores (256 threads)
Turin 192 'Compact' Cores (384 threads)
It will be funny if, after Zen 4c being way stronger than Gracemont E, Zen 5c turns out to be weaker than Skymont E, a strong possibility given that Intel says Skymont E is as strong as Raptor Cove P cores and that they clock up to 4.7 GHz.
As for Zen 5c vs Skymont E, SMT will allow Zen 5c to keep up with the latter in many workloads even at lower clocks. Of course, we don't know yet what clocks Zen 5c will reach.
NPU = wasted die space... Fricken' M$ and their co-pilot BS. That die space would be far more useful as more PCIe lanes.