Monday, March 1st 2021
AMD "Zen 4" Microarchitecture to Support AVX-512
The next-generation "Zen 4" CPU microarchitecture powering AMD's 4th Gen EPYC "Genoa" enterprise processors will support 512-bit AVX instruction sets, according to an alleged company slide leaked to the web on the ChipHell forums. The slide references "AVX3-512" support in addition to BFloat16 and "other ISA extensions." This would make "Zen 4" the first AMD microarchitecture to support AVX-512. It remains to be seen which specific instructions the architecture supports, and whether all of them will be available to both the enterprise and client implementations of "Zen 4," or whether AMD will take an approach similar to Intel's and enable only certain "relevant" instructions on client parts. The slide also mentions core counts "greater than 64," corresponding with our story from earlier today.
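Because AVX-512 is a family of subsets rather than a single feature, software typically discovers support at runtime rather than assuming it. As a rough illustration (not from the slide or this article), a minimal sketch using GCC/Clang's __builtin_cpu_supports() might look like the following; the subset names follow the compiler documentation, and the exact list a program should check depends on which instructions it actually uses.

#include <stdio.h>

int main(void)
{
    /* __builtin_cpu_init() must run before the feature queries below in code
       that may execute before constructors; calling it here is harmless. */
    __builtin_cpu_init();

    /* AVX-512 is split into subsets; a program checks the ones it needs.
       A BFloat16 check ("avx512bf16") also exists in newer compiler releases. */
    printf("AVX2        : %s\n", __builtin_cpu_supports("avx2")       ? "yes" : "no");
    printf("AVX-512F    : %s\n", __builtin_cpu_supports("avx512f")    ? "yes" : "no");
    printf("AVX-512BW   : %s\n", __builtin_cpu_supports("avx512bw")   ? "yes" : "no");
    printf("AVX-512VL   : %s\n", __builtin_cpu_supports("avx512vl")   ? "yes" : "no");
    printf("AVX-512VNNI : %s\n", __builtin_cpu_supports("avx512vnni") ? "yes" : "no");
    return 0;
}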
Sources:
ChipHell Forums, via VideoCardz
43 Comments on AMD "Zen 4" Microarchitecture to Support AVX-512
64 cores over 4 GHz...:pimp:
It's a good feature family - it's just not conceptually aligned with Intel and AMD's perpetual pursuit of high clock speeds on consumer mobile/desktop platforms. Hell, look at AVX2 downclocking on Ryzen 3000/5000 stock boost algorithms. You can have your AVX - but you'd better be prepared to drop those clocks unless you want to double your power draw or burn your chip, and that applies to both Intel and AMD.
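As a purely illustrative example of the kind of workload being described, a dense 256-bit FMA loop like the sketch below keeps the vector units saturated and is exactly what the AVX clock/voltage offsets exist for; the function name is made up, and it would be built with something like gcc -O2 -mavx2 -mfma.

#include <immintrin.h>
#include <stddef.h>

/* y[i] += a * x[i], eight floats per 256-bit iteration */
void saxpy_avx2(float *y, const float *x, float a, size_t n)
{
    __m256 va = _mm256_set1_ps(a);
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 vx = _mm256_loadu_ps(x + i);
        __m256 vy = _mm256_loadu_ps(y + i);
        vy = _mm256_fmadd_ps(va, vx, vy);   /* fused multiply-add, 8 lanes */
        _mm256_storeu_ps(y + i, vy);
    }
    for (; i < n; ++i)                      /* scalar tail */
        y[i] += a * x[i];
}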
Pretty likely that they implement it the same way they did AVX2 at first - using two narrower units for the actual execution part.
Give it some time and every AMD Ryzen 5 will cost as much as a Xeon processor.
It'd be nice to see wider adoption. These instruction sets are great but seemingly nobody uses them....
AMD supports AVX and AVX2, in fact, even VIA does...
en.wikipedia.org/wiki/Advanced_Vector_Extensions
Some examples:
Death Stranding and Horizon, The Crew 2, GRID 2, Path of Exile, and Project CARS
And here is a list of software that does; it includes Microsoft Teams (AVX2), which I use every day for work.
en.wikipedia.org/wiki/Advanced_Vector_Extensions#Software
github.com/RPCS3/rpcs3/pull/8700
github.com/RPCS3/rpcs3/pull/8712
It's the same CPU architecture for both enterprise and consumer products. I think this announcement is more interesting for enterprise users in general.
At a minimum, AVX-512 lets a 256-bit core turn each instruction into 2 uops (effectively doubling decoder throughput, which is beginning to look like a bottleneck! Remember: Apple's M1 decodes 8 instructions per clock, while AMD Zen decodes only 4 per clock, or 6 when running from the uop cache). More "work" per instruction, so to speak, which was the design philosophy of the original Crays from the 1970s.
Intel is going with a native 512-bit implementation, but Centaur CNS (and probably AMD) will likely stick with native 256-bit execution while still supporting 512-bit instructions. That greatly reduces power in the decoder and lets more instructions fit in the L1 instruction cache, because it would normally take two 256-bit AVX instructions to do a 512-bit operation - or, seen the other way, one 512-bit instruction does two 256-bit units' worth of work. Honestly, there are just a ton of advantages to supporting 512-bit, especially when you consider all the possible designs AMD could do here. There's really no reason NOT to support 512-bit.
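To illustrate the instruction-count argument (a sketch under the assumptions above, with made-up function names): the same 16-float multiply-add needs two 256-bit FMA instructions under AVX2 but only one 512-bit instruction under AVX-512, so there is less to fetch and decode even if the hardware internally cracks it into two 256-bit uops.

#include <immintrin.h>

/* AVX2: two 256-bit FMAs cover 16 floats */
void fma16_avx2(float *y, const float *x, float a)
{
    __m256 va = _mm256_set1_ps(a);
    _mm256_storeu_ps(y,     _mm256_fmadd_ps(va, _mm256_loadu_ps(x),     _mm256_loadu_ps(y)));
    _mm256_storeu_ps(y + 8, _mm256_fmadd_ps(va, _mm256_loadu_ps(x + 8), _mm256_loadu_ps(y + 8)));
}

#ifdef __AVX512F__
/* AVX-512: one 512-bit FMA covers the same 16 floats */
void fma16_avx512(float *y, const float *x, float a)
{
    __m512 va = _mm512_set1_ps(a);
    _mm512_storeu_ps(y, _mm512_fmadd_ps(va, _mm512_loadu_ps(x), _mm512_loadu_ps(y)));
}
#endif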