
AMD Strix Point SoC Reintroduces Dual-CCX CPU, Other Interesting Silicon Details Revealed

btarunr

Editor & Senior Moderator
Staff member
Joined
Oct 9, 2007
Messages
46,821 (7.63/day)
Location
Hyderabad, India
System Name RBMK-1000
Processor AMD Ryzen 7 5700G
Motherboard ASUS ROG Strix B450-E Gaming
Cooling DeepCool Gammax L240 V2
Memory 2x 8GB G.Skill Sniper X
Video Card(s) Palit GeForce RTX 2080 SUPER GameRock
Storage Western Digital Black NVMe 512GB
Display(s) BenQ 1440p 60 Hz 27-inch
Case Corsair Carbide 100R
Audio Device(s) ASUS SupremeFX S1220A
Power Supply Cooler Master MWE Gold 650W
Mouse ASUS ROG Strix Impact
Keyboard Gamdias Hermes E2
Software Windows 11 Pro
Following last week's reveal, AMD gave us a slightly more technical deep-dive into its two upcoming processors: the "Strix Point" silicon powering its Ryzen AI 300 series mobile processors, and the "Granite Ridge" chiplet MCM powering its Ryzen 9000 desktop processors. This article takes a closer look at the "Strix Point" SoC. It turns out that "Strix Point" takes a significantly different approach to heterogeneous multicore than "Phoenix 2," and AMD showed us how this works. AMD built the monolithic "Strix Point" silicon on the TSMC N4P foundry node, with a die area of around 232 mm².

The "Strix Point" silicon sees the company's Infinity Fabric interconnect as its omnipresent ether. This is a point-to-point interconnect, unlike the ringbus on some Intel processors. The main compute machinery on the "Strix Point" SoC is its two CPU compute complexes (CCX), each with a 32B (read)/16B (write) per cycle data-path to the fabric. The concept of CCX makes a comeback with "Strix Point" after nearly two generations of "Zen." The first CCX contains the chip's four full-sized "Zen 5" CPU cores, which share a 16 MB L3 cache among themselves. The second CCX contains the chip's eight "Zen 5c" cores, which share a smaller 8 MB L3 cache. Each of the 12 cores has a 1 MB dedicated L2 cache.



This approach to heterogeneous multicore is significantly different from "Phoenix 2," where the two "Zen 4" and four "Zen 4c" cores were part of a common CCX, with a common 16 MB L3 cache accessible to all six cores.

The "Zen 5" cores on "Strix Point" will be able to sustain high boost frequencies, in excess of 5.00 GHz, and should benefit from the larger 16 MB L3 cache that's shared among just four cores (similar L3 cache per core to "Granite Ridge"). The "Zen 5c" cores, on the other hand, operate at lower base and boost frequencies than the "Zen 5" cores, and have less L3 cache available to them. For threads to migrate between the two core types, they will have to go through the fabric, and in some cases, even incur a round-trip to the main memory.
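To put the cache comparison in numbers, here is a minimal Python sketch of the L3-per-core arithmetic. The two "Strix Point" entries use the figures quoted above; the "Granite Ridge" entry (one 8-core CCD with 32 MB of L3) is an assumption that this article does not state, and the names `CCX_CONFIGS` and `l3_per_core_mb` are purely illustrative.

```python
# L3 capacity per core for each CCX.
# Strix Point figures come from the article; the Granite Ridge entry is an
# ASSUMPTION (one 8-core CCD with 32 MB of L3) used only for comparison.
CCX_CONFIGS = {
    "Strix Point Zen 5 CCX": {"cores": 4, "l3_mb": 16},
    "Strix Point Zen 5c CCX": {"cores": 8, "l3_mb": 8},
    "Granite Ridge CCD (assumed)": {"cores": 8, "l3_mb": 32},
}


def l3_per_core_mb(cores: int, l3_mb: int) -> float:
    """Return the shared L3 capacity divided evenly across the cores, in MB."""
    return l3_mb / cores


if __name__ == "__main__":
    for name, cfg in CCX_CONFIGS.items():
        print(f"{name}: {l3_per_core_mb(**cfg):.1f} MB of L3 per core")
```

Under these assumptions the "Zen 5" CCX and the assumed "Granite Ridge" CCD both land at 4 MB per core, while each "Zen 5c" core gets 1 MB.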

The Zen 5c core is about 25% smaller in die-area than the Zen 5 core. For reference, the Zen 4c core is about 35% smaller than a regular Zen 4 core. AMD has worked to slightly improve the maximum boost frequencies of the Zen 5c core compared to its predecessor, so the frequency band of the Zen 5c cores is a tiny bit closer to that of the Zen 5 cores. The lower maximum voltages and maximum boost frequencies of Zen 5c cores put them at a significant power efficiency advantage over the Zen 5 cores. AMD is continuing to rely on a software-based scheduling solution that ensures the right kind of processing workload goes to the right kind of core. The company says that the software-based solution lets it correct "scheduling mistakes" over time.

The iGPU is the most bandwidth-hungry device on the fabric, and gets its widest data-path—4x 32B/cycle. Based on the RDNA 3.5 graphics architecture, which retains the SIMD engine and IPC of RDNA 3, but with several improvements to the performance/Watt, this iGPU also features 8 workgroup processors (WGPs), compared to the 6 on the current "Phoenix" silicon. This works out to 16 CU, or 1,024 stream processors. The iGPU also features 4 render backends+, which work out to 16 ROPs.
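The shader and ROP counts above follow from simple multiplication, sketched below in Python. The per-unit ratios (2 CUs per WGP, 64 stream processors per CU, 4 ROPs per render backend+) are inferred from the article's own totals rather than stated by AMD here, so treat them as assumptions; `igpu_resources` is a hypothetical helper name.

```python
# Ratios implied by the article's totals (8 WGPs -> 16 CUs -> 1,024 SPs,
# 4 RB+ -> 16 ROPs). They are ASSUMPTIONS inferred from those numbers.
CUS_PER_WGP = 2
SPS_PER_CU = 64
ROPS_PER_RB_PLUS = 4


def igpu_resources(wgps: int, render_backends: int) -> dict:
    """Derive CU, stream processor, and ROP counts from WGP and RB+ counts."""
    cus = wgps * CUS_PER_WGP
    return {
        "cus": cus,
        "stream_processors": cus * SPS_PER_CU,
        "rops": render_backends * ROPS_PER_RB_PLUS,
    }


if __name__ == "__main__":
    print("Strix Point:", igpu_resources(wgps=8, render_backends=4))
    # Only the WGP count for Phoenix is given in the article, so only the
    # CU count is shown for it.
    print("Phoenix CUs:", igpu_resources(wgps=6, render_backends=4)["cus"])
```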

The third most bandwidth-hungry device is the XDNA 2 NPU, with a 32B/cycle data-path that's of a comparable bandwidth to a CCX. The NPU features four blocks of 8 XDNA 2 arrays, for a total of 32 AI engine tiles. It delivers 50 TOPS of AI inferencing throughput, and can be overclocked. It also supports the Block FP16 data format (not to be confused with bfloat16), which offers the precision of FP16, with the performance of FP8.
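As a rough way to compare these data-paths, the Python sketch below converts bytes per cycle into peak bandwidth. The fabric clock is not given in this article, so `ASSUMED_FABRIC_CLOCK_GHZ` is a purely hypothetical placeholder, and the CCX figures are read as bytes per cycle; the output is only meaningful as a ratio between clients, not as a measured number.

```python
# Data-path widths quoted in the article, in bytes per fabric clock cycle.
FABRIC_PORTS_BYTES_PER_CYCLE = {
    "iGPU": 4 * 32,
    "CCX (read)": 32,
    "CCX (write)": 16,
    "NPU": 32,
}

# HYPOTHETICAL fabric clock; the article does not state the real value.
ASSUMED_FABRIC_CLOCK_GHZ = 2.0


def peak_bandwidth_gb_s(bytes_per_cycle: int, fabric_clock_ghz: float) -> float:
    """Peak bandwidth in GB/s: bytes per cycle times cycles per nanosecond."""
    return bytes_per_cycle * fabric_clock_ghz


if __name__ == "__main__":
    for port, width in FABRIC_PORTS_BYTES_PER_CYCLE.items():
        bw = peak_bandwidth_gb_s(width, ASSUMED_FABRIC_CLOCK_GHZ)
        print(f"{port}: {width} B/cycle -> {bw:.0f} GB/s at the assumed clock")
```

Whatever the real clock turns out to be, the iGPU's port is four times as wide as the read path of a single CCX or the NPU.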

Besides the three logic-heavy components, there are other accelerators that are fairly demanding on the bandwidth, such as the Video CoreNext engine that accelerates encoding and decoding; the audio coprocessor that processes the audio stack when the system is "powered down," so it can respond to voice commands; the display controller that handles the display I/O, including display stream compression, if called for; the SMU, Microsoft Pluton, TPM, and other manageability hardware.

The I/O interfaces of the "Strix Point" SoC include a memory controller that supports 128-bit LPDDR5, LPDDR5x, and dual-channel DDR5 (160-bit). The PCI-Express root complex is slightly truncated compared to the one "Phoenix" comes with. There are a total of 16 PCIe Gen 4 lanes. All 16 should be usable in notebooks that lack a discrete FCH chipset, but the usable lane count should drop to 12 when AMD eventually adapts this silicon to Socket AM5 for desktop APUs. On gaming notebooks that use Ryzen AI HX or H 300 series processors, discrete GPUs should have a Gen 4 x8 connection. USB connectivity includes a 40 Gbps USB4, or two 20 Gbps USB 3.2 Gen 2x2, two additional 10 Gbps USB 3.2 Gen 2, and three classic USB 2.0.
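For a sense of scale on the PCIe numbers, here is a small Python sketch of per-direction link bandwidth. It relies on the standard PCIe Gen 4 figures of 16 GT/s per lane with 128b/130b encoding; protocol overhead beyond line encoding is ignored, so real-world throughput would be somewhat lower. The function name is illustrative.

```python
# PCIe Gen 4 signalling: 16 GT/s per lane, 128b/130b line encoding.
PCIE_GEN4_GT_PER_S = 16.0
ENCODING_EFFICIENCY = 128 / 130


def pcie_gen4_bandwidth_gb_s(lanes: int) -> float:
    """Approximate one-direction bandwidth in GB/s, ignoring packet overhead."""
    gbit_per_s = PCIE_GEN4_GT_PER_S * ENCODING_EFFICIENCY * lanes
    return gbit_per_s / 8  # bits -> bytes


if __name__ == "__main__":
    # Lane counts mentioned in the article: x8 for a discrete GPU,
    # 12 usable lanes on Socket AM5, 16 lanes in total.
    for lanes in (8, 12, 16):
        print(f"Gen 4 x{lanes}: ~{pcie_gen4_bandwidth_gb_s(lanes):.2f} GB/s")
```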

 
Joined
Nov 26, 2021
Messages
1,425 (1.46/day)
Location
Mississauga, Canada
Processor Ryzen 7 5700X
Motherboard ASUS TUF Gaming X570-PRO (WiFi 6)
Cooling Noctua NH-C14S (two fans)
Memory 2x16GB DDR4 3200
Video Card(s) Reference Vega 64
Storage Intel 665p 1TB, WD Black SN850X 2TB, Crucial MX300 1TB SATA, Samsung 830 256 GB SATA
Display(s) Nixeus NX-EDG27, and Samsung S23A700
Case Fractal Design R5
Power Supply Seasonic PRIME TITANIUM 850W
Mouse Logitech
VR HMD Oculus Rift
Software Windows 11 Pro, and Ubuntu 20.04
That's a big increase in die size; the 8840HS is 178 mm^2. Given that Zen 5 is as compact as Zen 4, the increase is probably due to the much larger NPU with a smaller contribution from the larger GPU.
 
Joined
Oct 22, 2014
Messages
13,630 (3.82/day)
Location
Sunshine Coast
System Name Lenovo ThinkCentre
Processor AMD 5650GE
Motherboard Lenovo
Memory 32 GB DDR4
Display(s) AOC 24" Freesync 1m.s. 75Hz
Mouse Lenovo
Keyboard Lenovo
Software W11 Pro 64 bit
Centralized hot spot should be easier to cool.
 
Joined
Jan 29, 2023
Messages
1,196 (2.20/day)
System Name KLM
Processor 7800X3D
Motherboard B-650E-E Strix
Cooling Arctic Cooling III 280
Memory 16x2 Fury Renegade 6000-32
Video Card(s) 4070-ti PNY
Storage 512+512+1+2+2+2+2+6+500+256+4+4+4
Display(s) VA 32" 4K@60 - OLED 27" 2K@240
Case 4000D Airflow
Audio Device(s) Edifier 1280Ts
Power Supply Shift 1000
Mouse 502 Hero
Keyboard K68
Software EMDB
Benchmark Scores 0>1000
Joined
Dec 12, 2016
Messages
1,465 (0.53/day)
I'm not sure I would have gone with the term 'classic' to describe performance cores but okay.

The green block 'CPU core' for the classic cores is, transistor for transistor, the same as the green block 'CPU core' for the compact core. AMD has reduced the spacing between transistors, which limits the max clock frequency but maintains IPC.

We now have both pictures of the chip arrangement of Turin Epycs:


Turin 128 'Classic' Cores (256 threads)

Turin 192 'Compact' Cores (384 threads)
 
Joined
Jan 29, 2023
Messages
1,196 (2.20/day)
System Name KLM
Processor 7800X3D
Motherboard B-650E-E Strix
Cooling Arctic Cooling III 280
Memory 16x2 Fury Renegade 6000-32
Video Card(s) 4070-ti PNY
Storage 512+512+1+2+2+2+2+6+500+256+4+4+4
Display(s) VA 32" 4K@60 - OLED 27" 2K@240
Case 4000D Airflow
Audio Device(s) Edifier 1280Ts
Power Supply Shift 1000
Mouse 502 Hero
Keyboard K68
Software EMDB
Benchmark Scores 0>1000
Why does a CPU need to be small? I mean, it's a small piece, a few cm by a few cm. If size can help keep it cool more easily, why not make the CPU bigger? Who cares?
 
Joined
Oct 27, 2009
Messages
1,156 (0.21/day)
Location
Republic of Texas
System Name [H]arbringer
Processor 4x 61XX ES @3.5Ghz (48cores)
Motherboard SM GL
Cooling 3x xspc rx360, rx240, 4x DT G34 snipers, D5 pump.
Memory 16x gskill DDR3 1600 cas6 2gb
Video Card(s) blah bigadv folder no gfx needed
Storage 32GB Sammy SSD
Display(s) headless
Case Xigmatek Elysium (whats left of it)
Audio Device(s) yawn
Power Supply Antec 1200w HCP
Software Ubuntu 10.10
Benchmark Scores http://valid.canardpc.com/show_oc.php?id=1780855 http://www.hwbot.org/submission/2158678 http://ww
Why does a CPU need to be small? I mean, it's a small piece, a few cm by a few cm. If size can help keep it cool more easily, why not make the CPU bigger? Who cares?
You sound cheap; bigger costs more, for them and then for us.
 
Joined
Jan 29, 2023
Messages
1,196 (2.20/day)
System Name KLM
Processor 7800X3D
Motherboard B-650E-E Strix
Cooling Arctic Cooling III 280
Memory 16x2 Fury Renegade 6000-32
Video Card(s) 4070-ti PNY
Storage 512+512+1+2+2+2+2+6+500+256+4+4+4
Display(s) VA 32" 4K@60 - OLED 27" 2K@240
Case 4000D Airflow
Audio Device(s) Edifier 1280Ts
Power Supply Shift 1000
Mouse 502 Hero
Keyboard K68
Software EMDB
Benchmark Scores 0>1000
You sound cheap; bigger costs more, for them and then for us.
But fewer problems... right, if it's cheaper then I will not change.
 
Joined
Oct 27, 2009
Messages
1,156 (0.21/day)
Location
Republic of Texas
System Name [H]arbringer
Processor 4x 61XX ES @3.5Ghz (48cores)
Motherboard SM GL
Cooling 3x xspc rx360, rx240, 4x DT G34 snipers, D5 pump.
Memory 16x gskill DDR3 1600 cas6 2gb
Video Card(s) blah bigadv folder no gfx needed
Storage 32GB Sammy SSD
Display(s) headless
Case Xigmatek Elysium (whats left of it)
Audio Device(s) yawn
Power Supply Antec 1200w HCP
Software Ubuntu 10.10
Benchmark Scores http://valid.canardpc.com/show_oc.php?id=1780855 http://www.hwbot.org/submission/2158678 http://ww
But fewer problems... right, if it's cheaper then I will not change.
Not really: higher latency, and more power usage, since longer traces need more power to maintain signal integrity.
 
Joined
Jan 3, 2021
Messages
2,978 (2.29/day)
Location
Slovenia
Processor i5-6600K
Motherboard Asus Z170A
Cooling some cheap Cooler Master Hyper 103 or similar
Memory 16GB DDR4-2400
Video Card(s) IGP
Storage Samsung 850 EVO 250GB
Display(s) 2x Oldell 24" 1920x1200
Case Bitfenix Nova white windowless non-mesh
Audio Device(s) E-mu 1212m PCI
Power Supply Seasonic G-360
Mouse Logitech Marble trackball, never had a mouse
Keyboard Key Tronic KT2000, no Win key because 1994
Software Oldwin
For threads to migrate between the two core types, they will have to go through the fabric, and in some cases, even incur a round-trip to the main memory.
Separate L3 caches ... that's weird. It seems that thread migration will come with a large penalty of re-filling the other L3, and the two caches must also be kept coherent at all times.
 
Joined
May 3, 2018
Messages
2,592 (1.14/day)
That's a big increase in die size; the 8840HS is 178 mm^2. Given that Zen 5 is as compact as Zen 4, the increase is probably due to the much larger NPU with a smaller contribution from the larger GPU.
Really, cpu 50% more cores, larger NPU, 33% more iGPU CU's and yet it's only 30% larger. Way less than expected.

It will be funny if, after Zen 4c being way stronger than Gracemont E, Zen 5c is actually weaker than Skymont E. That's a strong possibility given Intel saying Skymont E is as strong as Raptor Cove P cores, and they clock to 4.7GHz.
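For anyone who wants to check the percentages, a quick Python sketch is below. The die areas and the CU counts come from this thread and the article; the 8-core count for the older chip is an assumption implied by the "50% more cores" figure, and `percent_increase` is just an illustrative helper.

```python
def percent_increase(old: float, new: float) -> float:
    """Relative growth from old to new, expressed in percent."""
    return (new - old) / old * 100


if __name__ == "__main__":
    # Die area: 178 mm^2 (8840HS, per this thread) vs ~232 mm^2 (Strix Point).
    print(f"Die area: +{percent_increase(178, 232):.1f}%")
    # Core count: 8 is ASSUMED for the older chip; Strix Point has 12.
    print(f"CPU cores: +{percent_increase(8, 12):.1f}%")
    # CUs: 6 WGPs = 12 CUs on Phoenix vs 16 CUs on Strix Point.
    print(f"iGPU CUs: +{percent_increase(12, 16):.1f}%")
```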
 
Joined
May 31, 2016
Messages
4,405 (1.48/day)
Location
Currently Norway
System Name Bro2
Processor Ryzen 5800X
Motherboard Gigabyte X570 Aorus Elite
Cooling Corsair h115i pro rgb
Memory 32GB G.Skill Flare X 3200 CL14 @3800Mhz CL16
Video Card(s) Powercolor 6900 XT Red Devil 1.1v@2400Mhz
Storage M.2 Samsung 970 Evo Plus 500MB/ Samsung 860 Evo 1TB
Display(s) LG 27UD69 UHD / LG 27GN950
Case Fractal Design G
Audio Device(s) Realtec 5.1
Power Supply Seasonic 750W GOLD
Mouse Logitech G402
Keyboard Logitech slim
Software Windows 10 64 bit
But fewer problems... right, if it's cheaper then I will not change.
You also need to understand that the wafers these chips are printed on have flaws. The bigger the chip, the higher the probability that it will be defective (something will not work as it should), meaning a fully specced chip will be harder to get. Fewer of them will meet the requirements, which will push the price up.
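A simple way to see this is the Poisson yield model, where the fraction of defect-free dies is exp(-area × defect density). Here's a rough Python sketch; the defect density of 0.1 defects per cm² is a made-up illustrative value (not a TSMC figure), and real yield also depends on things like salvaging partly defective dies, which this ignores.

```python
import math

# HYPOTHETICAL defect density, for illustration only; not a real foundry figure.
ASSUMED_DEFECTS_PER_CM2 = 0.1


def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Fraction of dies expected to have zero defects under a Poisson model."""
    die_area_cm2 = die_area_mm2 / 100.0  # 100 mm^2 per cm^2
    return math.exp(-die_area_cm2 * defects_per_cm2)


if __name__ == "__main__":
    # Die sizes mentioned in this thread, plus a doubled die for contrast.
    for area in (178, 232, 464):
        y = poisson_yield(area, ASSUMED_DEFECTS_PER_CM2)
        print(f"{area} mm^2 die: {y:.1%} of dies defect-free (assumed density)")
```

On top of the lower yield, a bigger die also means fewer candidates per wafer, so the cost per good chip rises faster than the area does.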
 
Joined
Nov 26, 2021
Messages
1,425 (1.46/day)
Location
Mississauga, Canada
Processor Ryzen 7 5700X
Motherboard ASUS TUF Gaming X570-PRO (WiFi 6)
Cooling Noctua NH-C14S (two fans)
Memory 2x16GB DDR4 3200
Video Card(s) Reference Vega 64
Storage Intel 665p 1TB, WD Black SN850X 2TB, Crucial MX300 1TB SATA, Samsung 830 256 GB SATA
Display(s) Nixeus NX-EDG27, and Samsung S23A700
Case Fractal Design R5
Power Supply Seasonic PRIME TITANIUM 850W
Mouse Logitech
VR HMD Oculus Rift
Software Windows 11 Pro, and Ubuntu 20.04
Really, cpu 50% more cores, larger NPU, 33% more iGPU CU's and yet it's only 30% larger. Way less than expected.

It will be funny if, after Zen 4c being way stronger than Gracemont E, Zen 5c is actually weaker than Skymont E. That's a strong possibility given Intel saying Skymont E is as strong as Raptor Cove P cores, and they clock to 4.7GHz.
8 of the new cores are smaller than the other 4 cores so it's more like 10 Zen 4 cores in die area. With such a great increase in die size, giving the IGP a large last level cache like its discrete counterparts wouldn't have increased die size that much.

As for Zen 5c vs Skymont E, SMT will allow Zen 5c to keep up with the latter in many workloads even at lower clocks. Of course, we don't know yet what clocks Zen 5c will reach.
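The "10 cores in die area" estimate can be sketched in a few lines of Python. It uses the article's figure that a Zen 5c core is about 25% smaller than a Zen 5 core, and assumes (as stated earlier in the thread) that Zen 5 is roughly as compact as Zen 4. It counts core area only, not L2/L3 or the rest of the die, and the function name is just for illustration.

```python
def core_area_equivalents(full_cores: int, compact_cores: int,
                          compact_shrink: float) -> float:
    """Total core area expressed in full-size-core equivalents.

    compact_shrink is the fraction by which a compact core is smaller
    than a full-size core (0.25 means 25% smaller).
    """
    return full_cores + compact_cores * (1.0 - compact_shrink)


if __name__ == "__main__":
    # Strix Point: 4 Zen 5 + 8 Zen 5c, with Zen 5c about 25% smaller.
    strix = core_area_equivalents(4, 8, 0.25)
    print(f"Strix Point: ~{strix:.1f} full-size core equivalents")
    # Phoenix 2: 2 Zen 4 + 4 Zen 4c, with Zen 4c about 35% smaller.
    phoenix2 = core_area_equivalents(2, 4, 0.35)
    print(f"Phoenix 2: ~{phoenix2:.1f} full-size core equivalents")
```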
 