AMD Strix Point SoC "Zen 5" and "Zen 5c" CPU Cores Have 256-bit FPU Datapaths

btarunr

Editor & Senior Moderator
Staff member
Joined
Oct 9, 2007
Messages
47,233 (7.55/day)
Location
Hyderabad, India
System Name RBMK-1000
Processor AMD Ryzen 7 5700G
Motherboard ASUS ROG Strix B450-E Gaming
Cooling DeepCool Gammax L240 V2
Memory 2x 8GB G.Skill Sniper X
Video Card(s) Palit GeForce RTX 2080 SUPER GameRock
Storage Western Digital Black NVMe 512GB
Display(s) BenQ 1440p 60 Hz 27-inch
Case Corsair Carbide 100R
Audio Device(s) ASUS SupremeFX S1220A
Power Supply Cooler Master MWE Gold 650W
Mouse ASUS ROG Strix Impact
Keyboard Gamdias Hermes E2
Software Windows 11 Pro
In its architecture deep-dive Q&A session with the press, AMD confirmed that the "Zen 5" and "Zen 5c" cores on the "Strix Point" silicon only feature 256-bit wide FPU data-paths, unlike the "Zen 5" cores in the "Granite Ridge" Ryzen 9000 desktop processors. "The Zen 5c used in Strix has a 256-bit data-path, and so does the Zen 5 used inside of Strix," said Mike Clark, AMD corporate fellow and chief architect of the "Zen" CPU cores. "So there's no delta as you move back and forth [thread migration between the Zen 5 and Zen 5c complexes] in vector throughput," he added.

It doesn't seem like AMD disabled a physically available feature, but rather, the company developed a variant of both the "Zen 5" and "Zen 5c" cores that physically lack the 512-bit data-paths. "And you get the area advantage to be able to scale out a little bit more," Clark continued. This suggests that the "Zen 5" and "Zen 5c" cores on "Strix Point" are physically smaller than the ones on the 4 nm "Eldora" 8-core CCD that is featured in "Granite Ridge" and some of the key models of the upcoming 5th Gen EPYC "Turin" server processors.



One of the star-attractions of the "Zen 5" microarchitecture is its floating-point unit, which supports AVX512 with a full 512-bit data path. In comparison, the previous-generation "Zen 4" handled AVX512 using a dual-pumped 256-bit FPU. The new 512-bit FPU, depending on the exact workload and other factors, is about 20-40% faster than "Zen 4" at 512-bit floating-point workloads, which is why "Zen 5" is expected to post significant gains in AI inferencing performance, as well as plow through benchmarks that use AVX512.

We're not sure how the lack of a 512-bit FP data-path affects the performance of instructions relevant to AI acceleration, which matters because "Strix Point" is mainly being designed for Microsoft Copilot+ ready AI PCs. It's possible that AVX512 and AVX-VNNI are being run on a dual-pumped 256-bit data-path, similar to how it is done on "Zen 4." There could be some performance/Watt advantages to doing it this way, which could be relevant to mobile platforms.

 
Joined
Jan 11, 2022
Messages
871 (0.83/day)
That is going to be nasty in the future when there is no feature parity, and software that runs fine on a Zen 5 desktop won't run on a Zen 5 laptop/mini/AIO.
 
Joined
Jun 29, 2018
Messages
537 (0.23/day)
That is going to be nasty in the future when there is no feature parity, and software that runs fine on a Zen 5 desktop won't run on a Zen 5 laptop/mini/AIO.
There's no incompatibility here.
All Zen 5/5c cores support the exact same instructions. It's only the internal execution pipeline that's fully 512-bit wide on desktop/server variants and "2x256"-bit on mobile, with the latter being very close to what Zen 4/4c was doing, I suppose.
 
Joined
Jan 11, 2022
Messages
871 (0.83/day)
There's no incompatibility here.
All Zen 5/5c cores support the exact same instructions. It's only the internal execution pipeline that's fully 512-bit wide on desktop/server variants and "2x256"-bit on mobile, with the latter being very close to what Zen 4/4c was doing, I suppose.
Sure and in say 5 years or so, some very popular software is going to need the full 512 to fully function and the double pumped one isn't going to be enough.
I was seriously bummed out as a kid when my 486SX2 wouldn't play quake
 
Joined
Jun 29, 2018
Messages
537 (0.23/day)
Sure and in say 5 years or so, some very popular software is going to need the full 512 to fully function and the double pumped one isn't going to be enough.
I was seriously bummed out as a kid when my 486SX2 wouldn't play quake
The 486SX couldn't run Quake because it was a processor without an FPU, so it was unable to execute the x87 instructions the game needed.
In this situation, all Zen 5 variants support the same instruction sets, including AVX-512. From a software perspective, there is no difference between them other than execution speed.
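A rough sketch of what "no difference from a software perspective" means in practice, assuming Linux-style `/proc/cpuinfo` text. The helper names and the shortened flag lines are made up for illustration and were not captured from real chips. The point is only that a full-width core and a double-pumped core would advertise the same flags, so a feature check cannot tell them apart.

```python
def parse_cpu_flags(cpuinfo_text):
    """Collect feature flags from Linux-style /proc/cpuinfo text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def supports_avx512(flags):
    """True if the AVX-512 Foundation flag is advertised."""
    return "avx512f" in flags


# Hypothetical, heavily shortened flag lines (assumptions, not real dumps).
FULL_WIDTH_CORE = "flags\t: fpu sse2 avx avx2 avx512f avx512vl avx512_vnni"
DOUBLE_PUMPED_CORE = "flags\t: fpu sse2 avx avx2 avx512f avx512vl avx512_vnni"
NO_AVX512_CORE = "flags\t: fpu sse2 avx avx2"

for name, text in [("full-width", FULL_WIDTH_CORE),
                   ("double-pumped", DOUBLE_PUMPED_CORE),
                   ("no AVX-512", NO_AVX512_CORE)]:
    # The first two report the same answer; only the third differs.
    print(name, supports_avx512(parse_cpu_flags(text)))
```

Only the last case, a CPU that does not advertise the flag at all, is the 486SX-style situation.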
 
Joined
Jul 13, 2016
Messages
3,279 (1.07/day)
Processor Ryzen 7800X3D
Motherboard ASRock X670E Taichi
Cooling Noctua NH-D15 Chromax
Memory 32GB DDR5 6000 CL30
Video Card(s) MSI RTX 4090 Trio
Storage Too much
Display(s) Acer Predator XB3 27" 240 Hz
Case Thermaltake Core X9
Audio Device(s) Topping DX5, DCA Aeon II
Power Supply Seasonic Prime Titanium 850w
Mouse G305
Keyboard Wooting HE60
VR HMD Valve Index
Software Win 10
Sure and in say 5 years or so, some very popular software is going to need the full 512 to fully function and the double pumped one isn't going to be enough.
I was seriously bummed out as a kid when my 486SX2 wouldn't play quake

First, no developer making a mass-market app is going to ship a product that requires AVX-512 without a fallback implementation, unless you are talking about something very niche where the dev knows that everyone using the app has newer hardware. There will still be a significant chunk of users without AVX-512 support in 5 years, and devs won't just up and abandon them.

Second, people using CPUs with double-pumped AVX-512 do in fact have AVX-512 support. They will be able to use the app, unlike in your scenario where you could not play Quake. Double-pumped AVX-512 is pretty performant on Zen 4 processors, and I expect the same to apply to these mobile processors as well.

The mobile CPUs being double-pumped is a non-issue for compatibility.
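For anyone wondering what a fallback implementation looks like, here is a minimal sketch of runtime dispatch. Everything in it is hypothetical: `dot_wide` is a plain-Python stand-in for an AVX-512 kernel (it just walks the data 16 elements at a time, the number of 32-bit lanes in 512 bits), and `flags` is assumed to be a set of feature names obtained elsewhere. Real apps do this in native code, but the shape is the same.

```python
LANES = 16  # 512 bits / 32-bit elements


def dot_scalar(a, b):
    """Fallback path: one element at a time."""
    return sum(x * y for x, y in zip(a, b))


def dot_wide(a, b):
    """Stand-in for a vectorized path: processes LANES elements per step."""
    total = 0.0
    for i in range(0, len(a), LANES):
        total += sum(x * y for x, y in zip(a[i:i + LANES], b[i:i + LANES]))
    return total


def pick_dot(flags):
    """Choose the kernel once, based on what the CPU advertises."""
    return dot_wide if "avx512f" in flags else dot_scalar


a = list(range(40))
b = [2] * 40

# A double-pumped core advertises avx512f too, so it gets the wide path.
with_avx512 = pick_dot({"avx2", "avx512f"})
without_avx512 = pick_dot({"avx2"})
print(with_avx512(a, b) == without_avx512(a, b))
```

Both paths give the same answer; the only thing that changes between CPUs is which one gets picked and how fast it runs.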
 
Joined
Jun 1, 2021
Messages
306 (0.24/day)
Considering how AMD included the Geekbench AES benchmark when calculating that IPC increase, this change will probably cause a pretty significant drop if you were to calculate the IPC uplift from the same benchmarks as AMD did.
 

W1zzard

Administrator
Staff member
Joined
May 14, 2004
Messages
27,835 (3.71/day)
Processor Ryzen 7 5700X
Memory 48 GB
Video Card(s) RTX 4080
Storage 2x HDD RAID 1, 3x M.2 NVMe
Display(s) 30" 2560x1600 + 19" 1280x1024
Software Windows 10 64-bit
Sure and in say 5 years or so, some very popular software is going to need the full 512 to fully function and the double pumped one isn't going to be enough.
This won't be the case. To software there is no detectable difference; these are the exact same instructions. It's just that the 512-bit datapath runs faster than the other (not by a factor of 2).
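One way to see why the gap is less than 2x is a back-of-the-envelope Amdahl's law estimate. This is a simplified model with made-up inputs, not AMD data: `vector_fraction` is the assumed share of runtime limited by the vector datapath, and `vector_gain` is the assumed speedup of that share alone.

```python
def overall_speedup(vector_fraction, vector_gain=2.0):
    """Amdahl's law: only the vector-bound share of the runtime gets faster."""
    if not 0.0 <= vector_fraction <= 1.0:
        raise ValueError("vector_fraction must be between 0 and 1")
    if vector_gain <= 0.0:
        raise ValueError("vector_gain must be positive")
    return 1.0 / ((1.0 - vector_fraction) + vector_fraction / vector_gain)


# Assumed workload mixes, for illustration only.
for fraction in (0.25, 0.5, 0.75, 1.0):
    print(f"{fraction:.2f} vector-bound -> {overall_speedup(fraction):.2f}x")
```

Under these assumptions, the full 2x only shows up when the whole runtime is datapath-limited; anything spent on loads, stores, or scalar code pulls the number down.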
 
Joined
Mar 16, 2017
Messages
2,098 (0.75/day)
Location
Tanagra
System Name Budget Box
Processor Xeon E5-2667v2
Motherboard ASUS P9X79 Pro
Cooling Some cheap tower cooler, I dunno
Memory 32GB 1866-DDR3 ECC
Video Card(s) XFX RX 5600XT
Storage WD NVME 1GB
Display(s) ASUS Pro Art 27"
Case Antec P7 Neo
Sure and in say 5 years or so, some very popular software is going to need the full 512 to fully function and the double pumped one isn't going to be enough.
I was seriously bummed out as a kid when my 486SX2 wouldn't play quake
I think you're more likely to run into some app that won't run without an NPU, but even that should fall back to the GPU in a pinch. I can't imagine any popular software targeting specific hardware, especially something like AVX512, where desktop Intel processors since Alder Lake don't support that feature at all. It's coming back again, but talk about a setback if you're hoping for popular consumer adoption.
 
Joined
Jan 3, 2021
Messages
3,491 (2.46/day)
Location
Slovenia
Processor i5-6600K
Motherboard Asus Z170A
Cooling some cheap Cooler Master Hyper 103 or similar
Memory 16GB DDR4-2400
Video Card(s) IGP
Storage Samsung 850 EVO 250GB
Display(s) 2x Oldell 24" 1920x1200
Case Bitfenix Nova white windowless non-mesh
Audio Device(s) E-mu 1212m PCI
Power Supply Seasonic G-360
Mouse Logitech Marble trackball, never had a mouse
Keyboard Key Tronic KT2000, no Win key because 1994
Software Oldwin
The AVX512 units don't only perform FP operations but also integer and bitwise operations on vectors. I don't know enough to judge but those may have a bigger impact on games and other consumer workloads than FP operations if the performance is halved or significantly reduced. Integer math is used everywhere, FP math has a narrower range of usability.
 
Joined
May 22, 2024
Messages
411 (2.20/day)
System Name Kuro
Processor AMD Ryzen 7 7800X3D@65W
Motherboard MSI MAG B650 Tomahawk WiFi
Cooling Thermalright Phantom Spirit 120 EVO
Memory Corsair DDR5 6000C30 2x48GB (Hynix M)@6000 30-36-36-76 1.36V
Video Card(s) PNY XLR8 RTX 4070 Ti SUPER 16G@200W
Storage Crucial T500 2TB + WD Blue 8TB
Case Lian Li LANCOOL 216
Power Supply MSI MPG A850G
Software Ubuntu 24.04 LTS + Windows 10 Home Build 19045
Benchmark Scores 17761 C23 Multi@65W
The AVX512 units don't only perform FP operations but also integer and bitwise operations on vectors. I don't know enough to judge but those may have a bigger impact on games and other consumer workloads than FP operations if the performance is halved or significantly reduced. Integer math is used everywhere, FP math has a narrower range of usability.
Current benchmark results seem to point towards games and consumer workloads not making good use of such features anyway, outside a few exceptions. But the capability had to be there first, a capability that one of the major makers is no longer (or not yet, with AVX10) providing.

I think applications making really good use of AVX512 tend to be memory-bandwidth bound, if not load/store bound, on current consumer hardware anyway.

Back on topic, I wonder whether it had anything to do with more than power consumption and efficiency, and whether there would be a separate moniker for these reduced cores.
 

tabascosauz

Moderator
Supporter
Staff member
Joined
Jun 24, 2015
Messages
8,147 (2.37/day)
Location
Western Canada
System Name ab┃ob
Processor 7800X3D┃5800X3D
Motherboard B650E PG-ITX┃X570 Impact
Cooling NH-U12A + T30┃AXP120-x67
Memory 64GB 6400CL32┃32GB 3600CL14
Video Card(s) RTX 4070 Ti Eagle┃RTX A2000
Storage 8TB of SSDs┃1TB SN550
Case Caselabs S3┃Lazer3D HT5
The AVX512 units don't only perform FP operations but also integer and bitwise operations on vectors. I don't know enough to judge but those may have a bigger impact on games and other consumer workloads than FP operations if the performance is halved or significantly reduced. Integer math is used everywhere, FP math has a narrower range of usability.

Is there really that much significance in this difference of true AVX-512 capability vs. AVX-512 on 256-bit hardware? APU dies were born with and have never escaped the half L3 curse. We have already been expecting poorer CPU performance in all aspects from them every year since 2017, so this is just more of the same.
 
Joined
Oct 24, 2022
Messages
191 (0.25/day)
If AMD put the memory controller on the same die as the x86 cores, I think Ryzen CPUs would have a performance gain of around 20%.
 
Joined
Sep 1, 2009
Messages
1,232 (0.22/day)
Location
CO
System Name 4k
Processor AMD 5800x3D
Motherboard MSI MAG b550m Mortar Wifi
Cooling ARCTIC Liquid Freezer II 240
Memory 4x8Gb Crucial Ballistix 3600 CL16 bl8g36c16u4b.m8fe1
Video Card(s) Nvidia Reference 3080Ti
Storage ADATA XPG SX8200 Pro 1TB
Display(s) LG 48" C1
Case CORSAIR Carbide AIR 240 Micro-ATX
Audio Device(s) Asus Xonar STX
Power Supply EVGA SuperNOVA 650W
Software Microsoft Windows10 Pro x64
If AMD put the memory controller on the same die as the x86 cores, I think Ryzen CPUs would have a performance gain of around 20%.
Even if true, this design would go 100% against the chiplet approach. The whole reason the controller is separate is that the cores are the same across different product stacks. The memory controller and I/O die are the only changes between the product stacks (it's more complicated than that, but for simplicity).
 
Joined
Oct 24, 2022
Messages
191 (0.25/day)
Even if true, this design would go 100% against the chiplet approach. The whole reason the controller is separate is that the cores are the same across different product stacks. The memory controller and I/O die are the only changes between the product stacks (it's more complicated than that, but for simplicity).
I know that, but AMD is already making several different core configurations. And they may have already developed (AI) apps that do much of the chip design work in a few minutes or hours, work that used to take several months.

With the memory controller integrated on the same die as the x86 cores, the x86 cores would have a much lower RAM access latency and, thus, the chip's IPC would increase.
 
Joined
Sep 1, 2009
Messages
1,232 (0.22/day)
Location
CO
System Name 4k
Processor AMD 5800x3D
Motherboard MSI MAG b550m Mortar Wifi
Cooling ARCTIC Liquid Freezer II 240
Memory 4x8Gb Crucial Ballistix 3600 CL16 bl8g36c16u4b.m8fe1
Video Card(s) Nvidia Reference 3080Ti
Storage ADATA XPG SX8200 Pro 1TB
Display(s) LG 48" C1
Case CORSAIR Carbide AIR 240 Micro-ATX
Audio Device(s) Asus Xonar STX
Power Supply EVGA SuperNOVA 650W
Software Microsoft Windows10 Pro x64
I know that, but AMD is already making several different core configurations. And they may have already developed (AI) apps that do much of the chip design work in a few minutes or hours, work that used to take several months.

With the memory controller integrated on the same die as the x86 cores, the x86 cores would have a much lower RAM access latency and, thus, the chip's IPC would increase.
This is speculation, but I will assume it's not worth it financially compared to how flexible their product stack is now.
 