
AMD Zen 5 Execution Engine Leaked, Features True 512-bit FPU

SL2

Joined
Jan 27, 2006
Messages
1,856 (0.28/day)
I'd rather do without and have CPUs that are 20-30% cheaper instead.
You mean due to smaller die? Yeah, I don't think that's gonna happen.

I mean, of course AMD could lower the price for various reasons, but the reason being smaller die size alone isn't very likely I'm afraid.
 

bug

Joined
May 22, 2015
Messages
13,274 (4.04/day)
Processor Intel i5-12600k
Motherboard Asus H670 TUF
Cooling Arctic Freezer 34
Memory 2x16GB DDR4 3600 G.Skill Ripjaws V
Video Card(s) EVGA GTX 1060 SC
Storage 500GB Samsung 970 EVO, 500GB Samsung 850 EVO, 1TB Crucial MX300 and 2TB Crucial MX500
Display(s) Dell U3219Q + HP ZR24w
Case Raijintek Thetis
Audio Device(s) Audioquest Dragonfly Red :D
Power Supply Seasonic 620W M12
Mouse Logitech G502 Proteus Core
Keyboard G.Skill KM780R
Software Arch Linux + Win10
You mean due to smaller die? Yeah, I don't think that's gonna happen.

I mean, of course AMD could lower the price for various reasons, but the reason being smaller die size alone isn't very likely I'm afraid.
Die size makes the biggest impact on the retail price of a CPU. Wafers have predetermined sizes, they cost the same to make. The more chips you turn them into, the lower the price.
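To put rough numbers on that, here is a minimal sketch using the common first-order dies-per-wafer approximation. The wafer price and the two die areas are made-up illustration values, not AMD or TSMC figures, and the estimate ignores yield, scribe lines and packaging.

```python
import math


def dies_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> int:
    """First-order estimate of gross dies per wafer.

    Wafer area divided by die area, minus a correction for partial
    dies lost at the round edge. Defects are not modelled.
    """
    radius = wafer_diameter_mm / 2
    gross = math.pi * radius ** 2 / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return int(gross - edge_loss)


def cost_per_die(wafer_cost: float, wafer_diameter_mm: float, die_area_mm2: float) -> float:
    """Wafer cost spread evenly over the estimated gross die count."""
    return wafer_cost / dies_per_wafer(wafer_diameter_mm, die_area_mm2)


if __name__ == "__main__":
    WAFER_COST = 15000.0  # hypothetical price per 300 mm wafer
    for area in (70.0, 90.0):  # hypothetical die areas in mm^2
        count = dies_per_wafer(300.0, area)
        cost = cost_per_die(WAFER_COST, 300.0, area)
        print(f"{area:.0f} mm^2: {count} dies, {cost:.2f} per die")
```

The per-die silicon cost scales roughly with area, but it is only one component of the retail price.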
 
Joined
Aug 13, 2020
Messages
86 (0.06/day)
I'm sure the shutdown at TSMC from the earthquakes will definitely impact AMD...Delay or reduced shipments if delivered on time...
 
Joined
Dec 16, 2017
Messages
2,733 (1.17/day)
Location
Buenos Aires, Argentina
System Name System V
Processor AMD Ryzen 5 3600
Motherboard Asus Prime X570-P
Cooling Cooler Master Hyper 212 // a bunch of 120 mm Xigmatek 1500 RPM fans (2 ins, 3 outs)
Memory 2x8GB Ballistix Sport LT 3200 MHz (BLS8G4D32AESCK.M8FE) (CL16-18-18-36)
Video Card(s) Gigabyte AORUS Radeon RX 580 8 GB
Storage SHFS37A240G / DT01ACA200 / WD20EZRX / ST10000VN0008 / SA400S37960G / SNV21000G / NM620 2TB
Display(s) LG 22MP55 IPS Display
Case NZXT Source 210
Audio Device(s) Logitech G430 Headset
Power Supply Corsair CX650M
Mouse Microsoft Trackball Optical 1.0
Keyboard HP Vectra VE keyboard (Part # D4950-63004)
Software Whatever build of Windows 11 is being served in Dev channel at the time.
Benchmark Scores Corona 1.3: 3120620 r/s Cinebench R20: 3355 FireStrike: 12490 TimeSpy: 4624
I'd just like to see more mainstream consumer applications using such an instruction set.

There are some mainstream uses, such as Blender and some image/video encoding/decoding libraries, but not much else. Maybe RPCS3 if you want to consider PS3 emulation as "mainstream"

Wonder if this will be a compelling upgrade for Zen3 gamers.
Gotta change board and RAM for this, at least, so it'd probably need some impressive numbers (+20% over Zen4).
 
Joined
Jul 13, 2016
Messages
2,889 (1.01/day)
Processor Ryzen 7800X3D
Motherboard ASRock X670E Taichi
Cooling Noctua NH-D15 Chromax
Memory 32GB DDR5 6000 CL30
Video Card(s) MSI RTX 4090 Trio
Storage Too much
Display(s) Acer Predator XB3 27" 240 Hz
Case Thermaltake Core X9
Audio Device(s) Topping DX5, DCA Aeon II
Power Supply Seasonic Prime Titanium 850w
Mouse G305
Keyboard Wooting HE60
VR HMD Valve Index
Software Win 10
If run locally, maybe. But currently most models worth anything are too big to run on a consumer PC. And that's not going to change: no matter how capable PCs will grow, the cloud will always be better.

This is simply not true. You have large models like Llama2, Mistral, etc. with a massive amount of parameters working well on regular desktop PCs. You also have Stable Diffusion XL and the upcoming Stable Diffusion 3 models. There are also plenty of AI models that don't require much to run, like AI voice enhancers, voice isolation, layer isolation, etc. You are assuming that every AI model worth having is super big and resource intensive, but you can see from things like DLSS and SDXL Lightning that AI can be a powerful tool without needing a massive amount of resources. These smaller models can be extremely handy and light on resources.
 
Joined
Jan 2, 2019
Messages
61 (0.03/day)
Location
Calgary, Canada
Here are a couple of comments...

- A source for that leak is Very Questionable

- Intel AVX-512 ISA is a Complete Tech Disaster ( * )

( * )
It is based on my experience using an Intel Xeon Phi server. We reached its performance limitations in less than 4 weeks after a project was started.
 
Joined
Jun 29, 2018
Messages
467 (0.22/day)
I'm a bit confused. A few years ago we were burning Intel at the stake for AVX-512 (https://linuxiac.com/linus-torvalds-criticizes-intel-avx-512/, but not only). Now we're cheering for the same AVX-512?
We were burning Intel at the stake because their implementation was subpar. Engaging early AVX-512 implementations caused severe downclocking for the entire CPU even if only a single core was using it. The same issue affected AVX2 to a lesser extent. This made using AVX-512 a hazard for normal CPU operations, often resulting in performance significantly worse than AVX/AVX2 versions.
Since then Intel designs have reduced the penalty and almost eliminated it altogether for Sapphire Rapids.
Thermals have certainly improved, but the discussion was more about the large amount of die space being used for specialized purposes. That's still the case. Considering the increased competition for fab capacity, you'd think "wasted" transistors are more of a problem today than they were 4 years ago.
Even with an older Skylake-X implementation that contained 2 AVX-512-capable units (one created by combining two 256-bit units, and one dedicated) the difference isn't as big, since only the red part is "dedicated" for AVX-512. Obviously there's other parts of the CPU that need to be extended for it as well.


Source

I'm a bit more in the other camp: if it only benefits like 10% of the typical workloads, I'd rather do without and have CPUs that are 20-30% cheaper instead.

At the same time, I realize this is basically a chicken-and-egg problem: if AVX-512 isn't available, apps that use it won't be either.
Current Intel desktop/mobile P-cores contain the transistors for one AVX-512 unit (the combined 2x256-bit), and the miscellaneous stuff all over the core. The server parts extend this base core with a second dedicated 512-bit unit, more cache, a mesh agent and an AMX unit, among other things we can't be sure of just from die shots.
Meteor Lake is also built on the same principle using Redwood Cove cores. It would be prohibitively expensive for Intel to design a special version of the core without them when the combined unit is used for AVX2 anyway. All that makes the E-core business even more controversial.
I doubt purging AVX-512 completely would result in 20-30% less area.

Gains from AVX-512 can be significant: some benchmarks on Phoronix show up to a 20x improvement using AVX-512-FP16, but most are not as drastic. Another recent example is a 10x gain in LLM prompt evaluation speed. We're starting to see some Linux distributions compiling software specifically for the x86-64-v4 target, which includes AVX-512. It's not only about the vector length, since AVX-512 contains other general improvements usable even by strictly integer-based software.
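For anyone wondering whether their own machine would qualify for such a build, a minimal sketch follows. It only checks the five AVX-512 subsets that x86-64-v4 adds on top of v3 (it assumes the v3 baseline is already met), and it reads the Linux-specific /proc/cpuinfo; the helper names are made up for this example.

```python
# AVX-512 subsets that the x86-64-v4 level adds on top of x86-64-v3,
# spelled the way Linux reports them in /proc/cpuinfo.
V4_REQUIRED = frozenset({"avx512f", "avx512bw", "avx512cd", "avx512dq", "avx512vl"})


def missing_v4_flags(flags_line: str) -> set:
    """Return the v4 AVX-512 flags absent from a space-separated flag string."""
    present = set(flags_line.lower().split())
    return set(V4_REQUIRED - present)


def read_cpu_flags(path: str = "/proc/cpuinfo") -> str:
    """Return the flag list of the first CPU entry (Linux only)."""
    with open(path) as cpuinfo:
        for line in cpuinfo:
            if line.startswith("flags"):
                return line.split(":", 1)[1]
    return ""


if __name__ == "__main__":
    try:
        missing = missing_v4_flags(read_cpu_flags())
    except OSError:
        print("Could not read /proc/cpuinfo (not Linux?)")
    else:
        if missing:
            print("Missing for x86-64-v4:", ", ".join(sorted(missing)))
        else:
            print("All AVX-512 subsets required by x86-64-v4 are present")
```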
 
Joined
Feb 10, 2023
Messages
161 (0.35/day)
Location
Lake Superior
In znver5 FP store ports are fused for 512-bit operations but can be used separately for 256-bit operations. In some AVX(2) workloads this will improve performance as well.

Code:
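;; 256-bit stores: "|" means either FP store port can be used.
(define_reservation "znver5-fp-store256" "znver5-fp-store0|znver5-fp-store1")
;; 512-bit stores: "+" means both FP store ports are reserved together.
(define_reservation "znver5-fp-store-512" "znver5-fp-store0+znver5-fp-store1")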
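A toy model of what those two reservations imply for store throughput, as a rough sketch only: it assumes two store ports, in-order issue, and that every store occupies its port(s) for exactly one cycle. Real hardware issues out of order and has other limits, so the function below (a made-up helper, not anything from GCC) only illustrates the port-sharing idea.

```python
def store_cycles(widths) -> int:
    """Count cycles to issue a sequence of FP stores in a toy two-port model.

    widths: store widths in program order, each 256 or 512.
    A 256-bit store takes one free port; a 512-bit store needs both
    ports in the same cycle. Issue is strictly in order.
    """
    cycles = 0
    free_ports = 0  # ports still unused in the current cycle
    for width in widths:
        if width not in (256, 512):
            raise ValueError(f"unsupported store width: {width}")
        needed = 1 if width == 256 else 2
        if free_ports < needed:
            # Not enough ports left this cycle: start a new one.
            cycles += 1
            free_ports = 2
        free_ports -= needed
    return cycles


if __name__ == "__main__":
    print("four 256-bit stores:", store_cycles([256] * 4), "cycles")
    print("four 512-bit stores:", store_cycles([512] * 4), "cycles")
```

In this model the bytes stored per cycle are the same either way; the gain for AVX(2) code comes from being able to issue two independent 256-bit stores per cycle.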
 
Joined
Dec 12, 2016
Messages
1,298 (0.48/day)
Die size makes the biggest impact on the retail price of a CPU. Wafers have predetermined sizes, they cost the same to make. The more chips you turn them into, the lower the price.
Don’t forget the law of mass production where reductions in cost can be achieved at scale. It’s cheaper to make millions of a single complex, large core design than a much smaller volume of a few simpler, smaller cores. That’s why AMD has the same chiplet for both Epyc and Ryzen.
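A quick sketch of that argument, spreading a one-off design cost over the production volume. All figures are invented purely to show the shape of the trade-off and do not reflect real AMD costs.

```python
def unit_cost(design_cost: float, per_unit_cost: float, volume: int) -> float:
    """Cost per chip: one-off design cost amortized over volume, plus manufacturing."""
    if volume <= 0:
        raise ValueError("volume must be positive")
    return design_cost / volume + per_unit_cost


if __name__ == "__main__":
    # Hypothetical: one large shared chiplet made in huge volume...
    shared = unit_cost(design_cost=500_000_000, per_unit_cost=40.0, volume=10_000_000)
    # ...versus a smaller, cheaper-to-make design sold in much lower volume.
    niche = unit_cost(design_cost=300_000_000, per_unit_cost=25.0, volume=2_000_000)
    print(f"shared large design: {shared:.2f} per chip")
    print(f"separate small design: {niche:.2f} per chip")
```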
 
Joined
Aug 20, 2007
Messages
20,821 (3.40/day)
System Name Pioneer
Processor Ryzen R9 7950X
Motherboard GIGABYTE Aorus Elite X670 AX
Cooling Noctua NH-D15 + A whole lotta Sunon and Corsair Maglev blower fans...
Memory 64GB (4x 16GB) G.Skill Flare X5 @ DDR5-6000 CL30
Video Card(s) XFX RX 7900 XTX Speedster Merc 310
Storage 2x Crucial P5 Plus 2TB PCIe 4.0 NVMe SSDs
Display(s) 55" LG 55" B9 OLED 4K Display
Case Thermaltake Core X31
Audio Device(s) TOSLINK->Schiit Modi MB->Asgard 2 DAC Amp->AKG Pro K712 Headphones or HDMI->B9 OLED
Power Supply FSP Hydro Ti Pro 850W
Mouse Logitech G305 Lightspeed Wireless
Keyboard WASD Code v3 with Cherry Green keyswitches + PBT DS keycaps
Software Gentoo Linux x64 / Windows 11

bug

Joined
May 22, 2015
Messages
13,274 (4.04/day)
Processor Intel i5-12600k
Motherboard Asus H670 TUF
Cooling Arctic Freezer 34
Memory 2x16GB DDR4 3600 G.Skill Ripjaws V
Video Card(s) EVGA GTX 1060 SC
Storage 500GB Samsung 970 EVO, 500GB Samsung 850 EVO, 1TB Crucial MX300 and 2TB Crucial MX500
Display(s) Dell U3219Q + HP ZR24w
Case Raijintek Thetis
Audio Device(s) Audioquest Dragonfly Red :D
Power Supply Seasonic 620W M12
Mouse Logitech G502 Proteus Core
Keyboard G.Skill KM780R
Software Arch Linux + Win10
The criticism was due to the product segmentation not the product.
You didn't even open the link I provided, did you?
 
Joined
Mar 13, 2021
Messages
402 (0.35/day)
Processor AMD 7600x
Motherboard Asrock x670e Steel Legend
Cooling Silver Arrow Extreme IBe Rev B with 2x 120 Gentle Typhoons
Memory 4x16Gb Patriot Viper Non RGB @ 6000 30-36-36-36-40
Video Card(s) XFX 6950XT MERC 319
Storage 2x Crucial P5 Plus 1Tb NVME
Display(s) 3x Dell Ultrasharp U2414h
Case Coolermaster Stacker 832
Power Supply Thermaltake Toughpower PF3 850 watt
Mouse Logitech G502 (OG)
Keyboard Logitech G512
You didn't even open the link I provided, did you?
Read what he says

He complained at the time that Intel was trying to market AVX512 as the magic bullet to solve all problems, when in actual fact, if you used it, it was horrible.

You run AVX512 code on Alder Lake and you're down in 3.5 GHz territory when the turbos were 5 GHz+ for most other things. It also meant the P-cores were physically larger per core for near-zero benefit in most workloads, whereas a 10-12 core design with only AVX2 would have been better for most use cases. And the other half of your die was completely useless for AVX512 workloads, since you had to disable your E-cores to use it effectively.


AMD at the time was giving him everything he wanted: more cores, decent power consumption per core, and no gimmicky tools needed to extract extra performance. As he stated at the time, AVX512 should have been only in HPC/server areas, and the desktop had little to no benefit from it then.
 
Joined
Aug 20, 2007
Messages
20,821 (3.40/day)
System Name Pioneer
Processor Ryzen R9 7950X
Motherboard GIGABYTE Aorus Elite X670 AX
Cooling Noctua NH-D15 + A whole lotta Sunon and Corsair Maglev blower fans...
Memory 64GB (4x 16GB) G.Skill Flare X5 @ DDR5-6000 CL30
Video Card(s) XFX RX 7900 XTX Speedster Merc 310
Storage 2x Crucial P5 Plus 2TB PCIe 4.0 NVMe SSDs
Display(s) 55" LG 55" B9 OLED 4K Display
Case Thermaltake Core X31
Audio Device(s) TOSLINK->Schiit Modi MB->Asgard 2 DAC Amp->AKG Pro K712 Headphones or HDMI->B9 OLED
Power Supply FSP Hydro Ti Pro 850W
Mouse Logitech G305 Lightspeed Wireless
Keyboard WASD Code v3 with Cherry Green keyswitches + PBT DS keycaps
Software Gentoo Linux x64 / Windows 11
You didn't even open the link I provided, did you?
I've read it before. I know what Torvalds argues.

Have a quote:

He also cautioned against placing too much weight on floating-point performance benchmarks. Especially those that take advantage of exotic new instruction sets that have a fragmented and varied implementation across product lines.
 
Joined
Feb 11, 2020
Messages
201 (0.13/day)
Oh, man, what a huge let-down. I had my hopes up it was the general instruction pipeline that was up by 40%. But alas not it seems.

Zen4 AVX512 is already a huge winner the way it is. It single-handedly turned the AVX512 ship around. It didn't need any measurable extra power to do amazing amounts of work. I now fear Zen5 is not going to be that good.
 
Joined
Feb 10, 2023
Messages
161 (0.35/day)
Location
Lake Superior
Oh, man, what a huge let-down. I had my hopes up it was the general instruction pipeline that was up by 40%. But alas not it seems.

Zen4 AVX512 is already a huge winner the way it is. It single-handedly turned the AVX512 ship around. It didn't need any measurable extra power to do amazing amounts of work. I now fear Zen5 is not going to be that good.
Because of a fake slide?
The way Zen 5 implements 512-bit operations is not yet clear. It may simply be fusing ports fp0/fp1, like they do for stores, in one cycle instead of doing it sequentially. It wouldn't take much extra area. Nor extra power compared to a dense AVX2 loop.

And the evidence we do have, from Zen 5 changes to Linux and GCC, suggests general pipeline improvements too: 8-wide dispatch from the micro-op cache, 6 ALUs and 4 AGUs. The only confirmed change for FP is a second FP store unit, which does suggest improved throughput for AVX2 and AVX512 programs.

And where did you get the idea it'd be 40% faster? Discredited RDNA3 hypebeasts on twitter?
 
Joined
Nov 27, 2023
Messages
1,151 (6.69/day)
System Name The Workhorse
Processor AMD Ryzen R9 5900X
Motherboard Gigabyte Aorus B550 Pro
Cooling CPU - Noctua NH-D15S Case - 3 Noctua NF-A14 PWM at the bottom, 2 Fractal Design 180mm at the front
Memory GSkill Trident Z 3200CL14
Video Card(s) NVidia GTX 1070 MSI QuickSilver
Storage Adata SX8200Pro
Display(s) LG 32GK850G
Case Fractal Design Torrent
Audio Device(s) FiiO E-10K DAC/Amp, Samson Meteorite USB Microphone
Power Supply Corsair RMx850 (2018)
Mouse Razer Viper (Original)
Keyboard Cooler Master QuickFire Rapid TKL keyboard (Cherry MX Black)
Software Windows 11 Pro (23H2)
And where did you get the idea it'd be 40% faster? Discredited RDNA3 hypebeasts on twitter?
Yeah, this should obviously be ridiculous - there hasn’t been a gen on gen improvement this massive… in a while. Not solely from the general instructions. Definitely not between generations of the same architecture. Otherwise we would be talking about a jump in overall performance that would be the biggest for AMD since Zen 1 when compared to Bulldozer and its derivatives. CPUs simply don’t increase in performance this drastically. Even the leaks and estimates for Zen 5 go for saner numbers like 10-15% IPC improvement (plausible) and 20-30% overall performance uplift compared to Zen 4 (again, tracks pretty well with what we’ve seen with previous gen increases, Zen+ aside for obvious reasons).
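As a sanity check on those rumored ranges, here is a small sketch that assumes single-thread performance is simply IPC times clock speed (a simplification; core counts, memory and boost behavior are ignored). It computes the clock gain that would be needed to get from a given IPC gain to a given overall uplift.

```python
def required_clock_gain(ipc_gain: float, overall_gain: float) -> float:
    """Clock gain needed so that (1 + IPC gain) * (1 + clock gain) = 1 + overall gain."""
    return (1 + overall_gain) / (1 + ipc_gain) - 1


if __name__ == "__main__":
    # Rumored ranges quoted in the post: 10-15% IPC, 20-30% overall.
    for ipc in (0.10, 0.15):
        for overall in (0.20, 0.30):
            clock = required_clock_gain(ipc, overall)
            print(f"IPC +{ipc:.0%}, overall +{overall:.0%} -> clock +{clock:.1%}")
```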
 
Joined
Jun 10, 2014
Messages
2,906 (0.80/day)
Processor AMD Ryzen 9 5900X ||| Intel Core i7-3930K
Motherboard ASUS ProArt B550-CREATOR ||| Asus P9X79 WS
Cooling Noctua NH-U14S ||| Be Quiet Pure Rock
Memory Crucial 2 x 16 GB 3200 MHz ||| Corsair 8 x 8 GB 1333 MHz
Video Card(s) MSI GTX 1060 3GB ||| MSI GTX 680 4GB
Storage Samsung 970 PRO 512 GB + 1 TB ||| Intel 545s 512 GB + 256 GB
Display(s) Asus ROG Swift PG278QR 27" ||| Eizo EV2416W 24"
Case Fractal Design Define 7 XL x 2
Audio Device(s) Cambridge Audio DacMagic Plus
Power Supply Seasonic Focus PX-850 x 2
Mouse Razer Abyssus
Keyboard CM Storm QuickFire XT
Software Ubuntu
The low L2 cache size is an obvious planned mistake and low hanging fruit for Zen 6 to fix, we know AMD were experimenting with larger L2 cache sizes, and that 2MB was the sweet spot, and 3MB offering only slight low single-digit uplift in perf over 2MB. One of the reasons for the infamous "AMD dip".
Even though we know the slide is fake, I just want to point out that no one, including the best engineers, could precisely assess the effect of a cache change without evaluating the performance of a specific microarchitecture. A change in cache size on one microarchitecture might not translate to the same proportional change on another. L2, and L1 especially, are closely tied to how the pipeline works, which is why the cache configuration might change a lot between generations. And contrary to what most people believe, they don't design the microarchitecture around the cache, it's the other way around. If throwing in another MB or so would make a huge benefit, I'm sure they would. They do simulate all kinds of core configurations before they do a tapeout, so they have quite likely already simulated what a larger L2 cache would do, and whichever they pick is the overall best performing within the constraints of the architecture and node.

Also, keep in mind there are many more attributes than just size, like latency, number of banks, bandwidth, etc. If the next generation is moved to a new node with different characteristics, a larger cache may become achievable without worsening the latency significantly.
Additionally, many heavy AVX workloads are more sensitive to bandwidth than cache size.
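The size-versus-latency trade-off can be illustrated with the textbook average memory access time formula. This is only a sketch: the hit latencies, miss rates and miss penalty below are invented numbers, not measurements of any Zen core.

```python
def amat(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """Average memory access time in cycles: hit time + miss rate * miss penalty."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_time + miss_rate * miss_penalty


if __name__ == "__main__":
    L3_PENALTY = 50.0  # hypothetical extra cycles when L2 misses
    # Hypothetical smaller L2: faster hits, more misses.
    small_l2 = amat(hit_time=14.0, miss_rate=0.30, miss_penalty=L3_PENALTY)
    # Hypothetical larger L2: fewer misses, but slower hits.
    large_l2 = amat(hit_time=17.0, miss_rate=0.27, miss_penalty=L3_PENALTY)
    print(f"smaller L2: {small_l2:.1f} cycles on average")
    print(f"larger L2:  {large_l2:.1f} cycles on average")
```

With these made-up inputs the larger cache comes out slightly slower on average, which is the kind of outcome only simulation of the actual design can rule in or out.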

And it's also borderline criminal AMD do not rectify the L3 cache starvation issue without the "3D cache band-aid" cash grab. Even a better memory controller would help in this regard.
I've often criticized the large L3, as it's a very "brute force" attempt to make up for shortcomings in the architecture, a sort of "band-aid" like you rightfully call it. But if Zen 5 is significantly better, especially in the front-end and scheduling of instructions, the usefulness of extra L3 may be actually reduced.
There will obviously still be the edge-case scenarios where the extra L3 shines (mostly very bloated code), but the overall gain is close to negligible, and it's such a waste of silicon for most uses.

AVX512 is for integer and bitwise operations too, not only for FP. That's where SPEC-int gains, purportedly very big, come from.
AVX certainly supports integer operations too, as you say, but I suspect SPECint isn't compiled to use it, although I haven't checked thoroughly. But even so, modern CPUs do auto-vectorize in some cases, but I don't know if the front-end will be fast enough to vectorize more than 4 64-bit or 8 32-bit ops (per vector unit, so 2x) per clock. I suspect it will be very underutilized in reality, but still, in the worst case with AMD having their vector units on separate execution ports, it will allow each vector unit to work as a single ALU. Or probably split, so each FMA-pair as ALU+MUL. (whether it's worth it in power draw is uncertain)
 
Joined
Dec 12, 2016
Messages
1,298 (0.48/day)
I for one am glad the nonsense of a one year cadence between Zen 4 and Zen 5 is dead. So many were saying why buy Zen 4 when Zen 5 would come a year later. AMD processor architectures are on a two year cadence just like GPUs. It's possible it could be up to six months early or up to six months late for some releases as circumstances dictate. But never less or more than that for a major release.

Longer cadence with more features and performance on the same established platform as the last gen. This is a big reason I buy AMD.
 
Joined
Jul 13, 2016
Messages
2,889 (1.01/day)
Processor Ryzen 7800X3D
Motherboard ASRock X670E Taichi
Cooling Noctua NH-D15 Chromax
Memory 32GB DDR5 6000 CL30
Video Card(s) MSI RTX 4090 Trio
Storage Too much
Display(s) Acer Predator XB3 27" 240 Hz
Case Thermaltake Core X9
Audio Device(s) Topping DX5, DCA Aeon II
Power Supply Seasonic Prime Titanium 850w
Mouse G305
Keyboard Wooting HE60
VR HMD Valve Index
Software Win 10
Oh, man, what a huge let-down. I had my hopes up it was the general instruction pipeline that was up by 40%. But alas not it seems.

Zen4 AVX512 is already a huge winner the way it is. It single-handedly turned the AVX512 ship around. It didn't need any measurable extra power to do amazing amounts of work. I now fear Zen5 is not going to be that good.

I'll assume based on your reaction here that you are not into tech news enough to know that a single slide cannot contain all the details of a given chip. Typically the press is given a deck of slides, not just a single slide, when a company releases a new CPU or GPU.

Never mind that the slide turned out to be fake, you are drawing a conclusion based on wholly incomplete information. As usual with these kinds of rumors and "leaked" slides, they are designed to generate clicks and engagement like what you've provided here. Don't fall for it, wait for official info to draw an informed conclusion.
 
Joined
Jan 11, 2022
Messages
500 (0.58/day)
You mean due to smaller die? Yeah, I don't think that's gonna happen.

I mean, of course AMD could lower the price for various reasons, but the reason being smaller die size alone isn't very likely I'm afraid.
AMD does chiplets, they can just cut back the number of cores per chiplet and have small dies.
 