Friday, April 5th 2024

AMD Zen 5 Execution Engine Leaked, Features True 512-bit FPU

AMD "Zen 5" CPU microarchitecture will introduce a significant performance increase for AVX-512 workloads, with some sources reporting gains as high as 40% over "Zen 4" in benchmarks that use AVX-512. A Moore's Law is Dead report detailing the execution engine of "Zen 5" holds the answer to how the company managed this: a true 512-bit FPU. Currently, AMD uses a dual-pumped 256-bit FPU to execute AVX-512 workloads on "Zen 4." The updated FPU should significantly improve the core's performance in workloads that take advantage of 512-bit AVX or VNNI instructions, such as AI.
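
As a rough illustration of what "dual-pumped" means, the sketch below counts how many passes one vector operation needs through a datapath of a given width. It is a deliberately simplified model, not a description of AMD's actual scheduler: the function name `passes_per_op` and the assumption that a wide operation is simply split into datapath-sized pieces are introduced here for illustration only.

```python
def passes_per_op(op_bits: int, datapath_bits: int) -> int:
    """Passes needed to push one vector operation through the datapath.

    Simplified model (an assumption, not AMD documentation): an operation
    wider than the datapath is split into datapath-sized pieces that are
    issued one after another.
    """
    if op_bits <= 0 or datapath_bits <= 0:
        raise ValueError("widths must be positive")
    # Integer ceiling division: round up when the operation does not fit.
    return -(-op_bits // datapath_bits)


# 512-bit operation on a dual-pumped 256-bit FPU ("Zen 4", per the article)
zen4_passes = passes_per_op(512, 256)
# 512-bit operation on a full 512-bit FPU ("Zen 5", per the leak)
zen5_passes = passes_per_op(512, 512)

print(f"Zen 4 model: {zen4_passes} passes per 512-bit op")
print(f"Zen 5 model: {zen5_passes} pass per 512-bit op")
```

Halving the pass count is only an upper bound on the per-instruction effect; measured gains, such as the 40% figure cited above, also depend on the rest of the pipeline and the memory system.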

Giving "Zen 5" a 512-bit FPU meant that AMD also had to scale up the ancillaries, that is, all the components that keep the FPU fed with data and instructions. The company therefore increased the capacity of the L1 DTLB and widened the load-store queues to meet the needs of the new FPU. The L1 data cache has been doubled in bandwidth and increased in size by 50%, to 48 KB from 32 KB in "Zen 4." FPU MADD latency has been reduced by 1 cycle. Besides the FPU, AMD also increased the number of integer execution pipes to 10, from 8 on "Zen 4." The exclusive L2 cache per core remains 1 MB in size.
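
The generational figures quoted above can be sanity-checked with a few lines of arithmetic. The sketch uses only the numbers from the leak, which AMD has not confirmed; `percent_change` is a helper introduced here for illustration.

```python
def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, expressed in percent."""
    return (new - old) / old * 100.0


# Figures as reported in the leak (unconfirmed by AMD).
l1d_growth = percent_change(32, 48)        # L1D size in KB, "Zen 4" to "Zen 5"
int_pipe_growth = percent_change(8, 10)    # integer execution pipes

print(f"L1D size: +{l1d_growth:.0f}%")
print(f"Integer pipes: +{int_pipe_growth:.0f}%")
```

The L1D result matches the 50% increase stated in the article; the integer pipe count works out to a 25% increase.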
Update 07:02 UTC: Moore's Law is Dead reached out to us and said that the slide they previously posted, which we had used in an earlier version of this article, is fake, but that the information contained in it is correct and that they stand by it.
Source: Moore's Law is Dead (YouTube)

63 Comments on AMD Zen 5 Execution Engine Leaked, Features True 512-bit FPU

#1
cellar door
At this point, I'm pretty sure this guy makes up these charts and plasters their YT name on it. ...womp womp, this is a 0/10 leak
#2
ghazi
I have no interest in AVX512 but the upgrades on the integer side look to be compelling. Look forward to seeing how beefed up the front end is, it's about time to go wider.
#3
freeagent
cellar door said: At this point, I'm pretty sure this guy makes up these charts and plasters their YT name on it. ...womp womp, this is a 0/10 leak
Let's see you do better :laugh:
#4
AsRock
TPU addict
"Leaked", sure... I will believe it when someone gets fired for it.
#5
JohH
This is a fake slide. But gcc patches do show 6 ALU and 4 AGU for znver5.
So whoever made it added elements of truth here and there.
#6
KrazyT
Fake, legit... who cares?
Only trust the reviews and the FPS in games :)
#7
qcmadness
Full 512-bit FPUs could be a curse, with high power consumption and a high transistor budget.
#8
stimpy88
The low L2 cache size is an obvious planned mistake and low-hanging fruit for Zen 6 to fix. We know AMD was experimenting with larger L2 cache sizes, that 2MB was the sweet spot, and that 3MB offered only a slight, low single-digit uplift in perf over 2MB. It is one of the reasons for the infamous "AMD dip".

And it's also borderline criminal that AMD does not rectify the L3 cache starvation issue without the "3D cache band-aid" cash grab. Even a better memory controller would help in this regard.
#9
Rus4kova
cellar door said: At this point, I'm pretty sure this guy makes up these charts and plasters their YT name on it. ...womp womp, this is a 0/10 leak
Because it makes perfect sense to show the leaked slides unedited... oh wait.
#10
Wirko
ghazi said: I have no interest in AVX512 but the upgrades on the integer side look to be compelling. Look forward to seeing how beefed up the front end is, it's about time to go wider.
AVX512 is for integer and bitwise operations too, not only for FP. That's where SPEC-int gains, purportedly very big, come from.
#11
Denver
JohH said: This is a fake slide. But gcc patches do show 6 ALU and 4 AGU for znver5. So whoever made it added elements of truth here and there.
Yes, it's practically confirmed that Zen 5 will bring some drastic changes compared to its predecessor.
Someone must have just made slides on top of this info.
#12
bug
qcmadness said: Full 512-bit FPUs could be a curse, with high power consumption and a high transistor budget.
I'm a bit confused. A few years ago we were burning Intel at the stake for AVX-512 (linuxiac.com/linus-torvalds-criticizes-intel-avx-512/, but not only). Now we're cheering for the same AVX-512?
#13
Denver
bug said: I'm a bit confused. A few years ago we were burning Intel at the stake for AVX-512 (linuxiac.com/linus-torvalds-criticizes-intel-avx-512/, but not only). Now we're cheering for the same AVX-512?
Maybe they've found a way to use the full AVX512 without the thermal implications and power consumption.
#14
bug
Denver said: Maybe they've found a way to use the full AVX512 without the thermal implications and power consumption.
Thermals have certainly improved, but the discussion was more about the large amount of die space being used for specialized purposes. That's still the case. Considering the increased competition for fab capacity, you'd think "wasted" transistors are more of a problem today than they were 4 years ago.
#15
Daven
bug said: Thermals have certainly improved, but the discussion was more about the large amount of die space being used for specialized purposes. That's still the case. Considering the increased competition for fab capacity, you'd think "wasted" transistors are more of a problem today than they were 4 years ago.
Aren't there some AI / machine learning algorithms that can use AVX512 now?
#16
bug
Daven said: Aren't there some AI / machine learning algorithms that can use AVX512 now?
If run locally, maybe. But currently most models worth anything are too big to run on a consumer PC. And that's not going to change: no matter how capable PCs will grow, the cloud will always be better.
#17
SL2
bug said: If run locally, maybe. But currently most models worth anything are too big to run on a consumer PC. And that's not going to change: no matter how capable PCs will grow, the cloud will always be better.
Zen 5 isn't for consumer PCs alone, tho.

I've stopped counting all the times I've read Zen as Ryzen in a leak, without thinking. That's not to say that Ryzen won't have this.
#18
Panther_Seraphin
bug said: I'm a bit confused. A few years ago we were burning Intel at the stake for AVX-512 (linuxiac.com/linus-torvalds-criticizes-intel-avx-512/, but not only). Now we're cheering for the same AVX-512?
There was a lot of hubbub about Intel marketing using AVX benchmarks to show it still had a massive lead in general, when in actual fact there was little to no lead in anything that didn't use AVX-512.

Similar to Nvidia when they were releasing benchmarks with the tiniest of writing saying "using DLSS".
#19
SL2
bug said: Now we're cheering for the same AVX-512?
Who's cheering?
#20
bug
SL2 said: Zen 5 isn't for consumer PCs alone, tho. I've stopped counting all the times I've read Zen as Ryzen in a leak, without thinking. That's not to say that Ryzen won't have this.
That's true, but so far AMD has made no difference in that regard between server and desktop.

And I'm not even saying AVX-512 is bad, my question was more about what changed in the meantime.
#21
SL2
bug said: That's true, but so far AMD has made no difference in that regard between server and desktop.
That's why there's no point questioning any feature in a Ryzen CPU as long as it makes sense in EPYC. I'm pretty sure the latter dictates a lot of the design due to $.
bug said: And I'm not even saying AVX-512 is bad, my question was more about what changed in the meantime.
I think you've answered that already. ;)
bug said: And that's not going to change: no matter how capable PCs will grow, the cloud will always be better.
#23
Denver
bug said: Thermals have certainly improved, but the discussion was more about the large amount of die space being used for specialized purposes. That's still the case. Considering the increased competition for fab capacity, you'd think "wasted" transistors are more of a problem today than they were 4 years ago.
If it translates into an advantage in AMD's most valuable market, I suppose it's worth it. The gains that AVX512 brings when used properly are massive.

I'd just like to see more mainstream consumer applications using such an instruction set.
#24
bug
Denver said: If it translates into an advantage in AMD's most valuable market, I suppose it's worth it. The gains that AVX512 brings when used properly are massive. I'd just like to see more mainstream consumer applications using such an instruction set.
I'm a bit more in the other camp: if it only benefits like 10% of the typical workloads, I'd rather do without and have CPUs that are 20-30% cheaper instead.

At the same time, I realize this is basically a chicken-and-egg problem: if AVX-512 isn't available, apps that use it won't be either.
#25
SL2
bug said: I'd rather do without and have CPUs that are 20-30% cheaper instead.
You mean due to smaller die? Yeah, I don't think that's gonna happen.

I mean, of course AMD could lower the price for various reasons, but the reason being smaller die size alone isn't very likely I'm afraid.