Wednesday, July 24th 2024
AMD Strix Point SoC "Zen 5" and "Zen 5c" CPU Cores Have 256-bit FPU Datapaths
AMD, in its architecture deep-dive Q&A session with the press, confirmed that the "Zen 5" and "Zen 5c" cores on the "Strix Point" silicon only feature 256-bit wide FPU data-paths, unlike the "Zen 5" cores in the "Granite Ridge" Ryzen 9000 desktop processors. "The Zen 5c used in Strix has a 256-bit data-path, and so does the Zen 5 used inside of Strix," said Mike Clark, AMD corporate fellow and chief architect of the "Zen" CPU cores. "So there's no delta as you move back and forth [thread migration between the Zen 5 and Zen 5c complexes] in vector throughput," he added.
It doesn't seem like AMD disabled a physically available feature, but rather, the company developed a variant of both the "Zen 5" and "Zen 5c" cores that physically lack the 512-bit data-paths. "And you get the area advantage to be able to scale out a little bit more," Clark continued. This suggests that the "Zen 5" and "Zen 5c" cores on "Strix Point" are physically smaller than the ones on the 4 nm "Eldora" 8-core CCD that is featured in "Granite Ridge" and some of the key models of the upcoming 5th Gen EPYC "Turin" server processors.

One of the star attractions of the "Zen 5" microarchitecture is its floating-point unit, which supports AVX512 with a full 512-bit data path. In comparison, the previous-generation "Zen 4" handled AVX512 using a dual-pumped 256-bit FPU. The new 512-bit FPU, depending on the exact workload and other factors, is about 20-40% faster than "Zen 4" at 512-bit floating-point workloads, which is why "Zen 5" is expected to post significant gains in AI inferencing performance, as well as plow through benchmarks that use AVX512.
We're not sure how the lack of a 512-bit FP data-path affects the performance of instructions relevant to AI acceleration, which matters because "Strix Point" is mainly being designed for Microsoft Copilot+ ready AI PCs. It's possible that AVX512 and AVX-VNNI are run on a dual-pumped 256-bit data-path, similar to how it is done on "Zen 4." There could be some performance-per-Watt advantages to doing it this way, which could be relevant to mobile platforms.
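To make the distinction concrete, here is a minimal sketch (our own illustration, not something AMD showed) of the kind of AVX512-VNNI code that int8 AI inferencing kernels lean on. The intrinsics and the resulting binary are identical on every "Zen 5" part; what differs is whether the core executes each 512-bit operation in one pass or as two 256-bit halves. Function and variable names are ours, and it assumes a compiler flag set along the lines of gcc -mavx512f -mavx512vnni.

```c
/* Hypothetical int8 dot product using AVX512-VNNI (vpdpbusd).
 * Runs unchanged on full 512-bit and dual-pumped 256-bit Zen 5 cores;
 * only internal throughput differs. Tail elements omitted for brevity. */
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

int32_t dot_u8s8(const uint8_t *a, const int8_t *b, size_t n) {
    __m512i acc = _mm512_setzero_si512();
    for (size_t i = 0; i + 64 <= n; i += 64) {
        __m512i va = _mm512_loadu_si512((const void *)(a + i));
        __m512i vb = _mm512_loadu_si512((const void *)(b + i));
        /* 64 u8*s8 products accumulated into 16 x int32 lanes per call */
        acc = _mm512_dpbusd_epi32(acc, va, vb);
    }
    return _mm512_reduce_add_epi32(acc);  /* horizontal sum of the lanes */
}
```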
15 Comments on AMD Strix Point SoC "Zen 5" and "Zen 5c" CPU Cores Have 256-bit FPU Datapaths
All Zen 5/5c cores support the exact same instructions. It's only the internal execution pipeline that's fully 512 bits wide on desktop/server variants and 2x256 bits on mobile, with the latter being very close to what Zen 4/4c was doing, I suppose.
I was seriously bummed out as a kid when my 486SX2 wouldn't play quake
In this situation, all Zen 5 variants support the same instruction sets, including AVX-512. From a software perspective there is no difference between them other than execution speed.
Second, people using CPUs with double-pumped AVX-512 do in fact have AVX-512 support. They will be able to use the app, unlike in your scenario where you could not play Quake. Double-pumped AVX-512 is pretty performant on Zen 4 processors, and I expect the same to apply to these mobile processors as well.
The mobile CPUs being double-pumped is a non-issue for compatibility.
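For what it's worth, here's a quick sketch (my own, using the GCC/Clang cpu-feature builtins) of why software can't tell the two apart: feature detection returns the same answers on the full 512-bit and the dual-pumped 256-bit cores, so the same AVX-512 code path gets dispatched on both.

```c
/* Feature detection is identical across Zen 5 variants; only throughput
 * differs. Compiles with GCC or Clang on x86-64. */
#include <stdio.h>

int main(void) {
    __builtin_cpu_init();
    printf("AVX-512F:    %s\n", __builtin_cpu_supports("avx512f")    ? "yes" : "no");
    printf("AVX512-VNNI: %s\n", __builtin_cpu_supports("avx512vnni") ? "yes" : "no");
    /* On any Zen 5 part, both are expected to report "yes". */
    return 0;
}
```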
I think applications making really good use of AVX-512 tend to be memory-bandwidth bound, if not load/store bound, on current consumer hardware anyway.
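Rough numbers to illustrate the point (my own back-of-envelope, assuming a ~5 GHz core and dual-channel DDR5-5600): two 512-bit FMA pipes can consume 2 × 64 B = 128 B of fresh operands per cycle, which is on the order of 600 GB/s per core for purely streaming data, while the whole socket only has roughly 90 GB/s of DRAM bandwidth to share. Anything that doesn't fit in cache hits the bandwidth wall long before the FP units become the bottleneck.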
Back on topic, I wonder whether this decision had anything to do with more than power consumption and efficiency, and whether there will be a separate moniker for these reduced cores.
With the memory controller integrated on the same die as the x86 cores, the x86 cores would have much lower RAM access latency and, thus, the chip's IPC would increase.