Friday, April 5th 2024
AMD Zen 5 Execution Engine Leaked, Features True 512-bit FPU
AMD's "Zen 5" CPU microarchitecture will introduce a significant performance increase for AVX-512 workloads, with some sources reporting gains of up to 40% over "Zen 4" in benchmarks that use AVX-512. A Moore's Law is Dead report detailing the execution engine of "Zen 5" holds the answer to how the company managed this: a true 512-bit FPU. Currently, AMD uses a dual-pumped 256-bit FPU to execute AVX-512 workloads on "Zen 4," meaning each 512-bit operation is processed as two 256-bit halves. The updated FPU should significantly improve the core's performance in workloads that take advantage of 512-bit AVX or VNNI instructions, such as AI.
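To make the difference between a dual-pumped and a native datapath concrete, here is a minimal sketch of a toy issue model. It is an illustration only, not a description of AMD's actual pipeline: the function name `issue_cycles` and the assumption that one pass through the datapath takes one cycle are hypothetical simplifications.

```python
import math


def issue_cycles(num_ops: int, op_width_bits: int, datapath_bits: int) -> int:
    """Toy model: cycles one pipe needs to issue num_ops vector operations.

    Assumption (not from the leak): each pass through the datapath costs one
    cycle, and an operation wider than the datapath is split into equal passes.
    """
    passes_per_op = math.ceil(op_width_bits / datapath_bits)
    return num_ops * passes_per_op


ops = 1000
dual_pumped = issue_cycles(ops, op_width_bits=512, datapath_bits=256)
native = issue_cycles(ops, op_width_bits=512, datapath_bits=512)

print(f"dual-pumped 256-bit: {dual_pumped} cycles")
print(f"native 512-bit:      {native} cycles")
print(f"peak ratio in this toy model: {dual_pumped / native:.1f}x")
```

In this simplified model the native datapath halves the issue cycles for pure 512-bit work. That is an upper bound for the FPU alone; the reported figure of up to 40% is lower, which is consistent with real workloads also being limited by other parts of the core and the memory system.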
Giving "Zen 5" a 512-bit FPU meant that AMD also had to scale up the ancillaries, that is, all the components that keep the FPU fed with data and instructions. The company therefore increased the capacity of the L1 DTLB. The load-store queues have been widened to meet the needs of the new FPU. The L1 data cache has been doubled in bandwidth and increased in size by 50%, to 48 KB from 32 KB in "Zen 4." FPU MADD latency has been reduced by 1 cycle. Besides the FPU, AMD also increased the number of integer execution pipes to 10, from 8 on "Zen 4." The exclusive L2 cache per core remains 1 MB in size.
Update 07:02 UTC: Moore's Law is Dead reached out to us and said that the slide they had previously posted, which we had used in an earlier version of this article, is fake, but that the information contained in that slide is correct, and that they stand by it.
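The size figures quoted for the L1D and the integer pipes can be checked with simple arithmetic. The sketch below is only a sanity check on the numbers in the article; the helper names are hypothetical, and the "vectors that fit" figure ignores everything else a real cache holds.

```python
def percent_increase(old: float, new: float) -> float:
    """Relative increase from old to new, in percent."""
    return (new - old) / old * 100


def vectors_that_fit(cache_kb: int, vector_bits: int = 512) -> int:
    """How many full-width vectors fit in a cache of cache_kb kilobytes."""
    return (cache_kb * 1024) // (vector_bits // 8)


l1d_growth = percent_increase(32, 48)    # L1D: 32 KB -> 48 KB
pipe_growth = percent_increase(8, 10)    # integer pipes: 8 -> 10

print(f"L1D size increase: {l1d_growth:.0f}%")
print(f"Integer pipe increase: {pipe_growth:.0f}%")
print(f"512-bit vectors in 32 KB: {vectors_that_fit(32)}")
print(f"512-bit vectors in 48 KB: {vectors_that_fit(48)}")
```

The 32 KB to 48 KB change matches the stated 50% increase, and the larger cache holds 768 rather than 512 full 512-bit vectors.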
Source:
Moore's Law is Dead (YouTube)
63 Comments on AMD Zen 5 Execution Engine Leaked, Features True 512-bit FPU
So whoever made it added elements of truth here and there.
Only trust the reviews and the FPS in game :)
And it's also borderline criminal that AMD does not rectify the L3 cache starvation issue without the "3D cache band-aid" cash grab. Even a better memory controller would help in this regard.
Someone must have just made slides on top of this info.
I've stopped counting all the times I've read Zen as Ryzen in a leak, without thinking. That's not to say that Ryzen won't have this.
Similar to NVIDIA when they were releasing benchmarks with the tiniest of writing saying "using DLSS."
And I'm not even saying AVX-512 is bad, my question was more about what changed in the meantime.
I'd just like to see more mainstream consumer applications using such an instruction set.
At the same time, I realize this is basically a chicken-and-egg problem: if AVX-512 isn't available, apps that use it won't be either.
I mean, of course AMD could lower the price for various reasons, but a smaller die size alone isn't a very likely reason, I'm afraid.