
AMD "Zen 4" Microarchitecture to Support AVX-512

Joined
Apr 24, 2020
Messages
2,741 (1.59/day)
Like I said many times, wide SIMD is kind of stupid in CPUs, I just hope AVX512 doesn't ruin the power consumption of these chips.

It no longer has a major power-effect on Intel Rocket Lake chips, and AVX512 doesn't have any power-consumption increases on the Centaur CNS (an x86 with AVX512).

At a minimum, AVX512 lets a 256-bit core issue 2 uops per instruction, effectively doubling the throughput of the decoder, which is beginning to look like a bottleneck. (Remember: Apple M1 decodes 8 instructions per clock tick, while AMD Zen manages only 4 instructions per clock when decoding and 6 when running from the uop cache.) More "work" per instruction, so to speak, which was the design of the original Crays from the 1970s.

Intel is going with a native 512-bit implementation, but Centaur CNS (and probably AMD) will likely stick with 256-bit native execution units that accept 512-bit instructions. This greatly reduces power in the decoder and lets more work fit in the L1 instruction cache, because it would normally take two 256-bit AVX instructions to make one 512-bit operation, whereas here one 512-bit instruction does 2x 256-bit native work. Honestly, there's just a ton of advantages to supporting 512-bit, especially when you consider all the possible designs AMD can do here. There's really no reason NOT to support 512-bit.
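To put rough numbers on the decoder argument, here is a back-of-the-envelope sketch in Python. The function name and the 4096-byte buffer are made up for illustration, and it assumes every 512-bit instruction is split into exactly two 256-bit uops, which is a guess at how a 256-bit native design would behave rather than a documented fact.

```python
def instructions_and_uops(total_bytes, instr_bits, native_bits=256):
    """Count decoded instructions and executed uops for one pass over a buffer.

    Assumption: an instruction wider than the native datapath is split into
    instr_bits // native_bits uops; narrower or equal widths cost one uop.
    """
    instr_bytes = instr_bits // 8
    instructions = total_bytes // instr_bytes
    uops_per_instr = max(1, instr_bits // native_bits)
    return instructions, instructions * uops_per_instr


buf = 4096  # hypothetical buffer size in bytes
avx2 = instructions_and_uops(buf, 256)    # 256-bit instructions on a 256-bit core
avx512 = instructions_and_uops(buf, 512)  # 512-bit instructions on the same core
print("AVX2   (instructions, uops):", avx2)
print("AVX512 (instructions, uops):", avx512)
```

Under those assumptions the execution units do the same number of uops either way, but the decoder and the instruction cache only see half as many instructions with the 512-bit encoding.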
 
Joined
Oct 25, 2020
Messages
10 (0.01/day)
I get why AMD wants to take away Intel's last shiny toy: money.

AVX-512 is hard to program for effectively. Even if you manage to find a workload that benefits from it, the power scaling will more than likely ruin your day, unless all you do on that entire system is crunch numbers for a long time.

Sources:
- https://blog.cloudflare.com/on-the-dangers-of-intels-frequency-scaling/ (For people not too familiar with programming, OpenSSL is a very popular implementation of TLS and other encryption algorithms)
- https://www.phoronix.com/scan.php?page=news_item&px=Linus-Torvalds-On-AVX-512

Edit:
On Intel's newest platforms, Ice Lake (2019) and up, this effect is greatly reduced. Thanks @dragontamer5788 for pointing this out.
 
Joined
Dec 29, 2010
Messages
3,815 (0.74/day)
Processor AMD 5900x
Motherboard Asus x570 Strix-E
Cooling Hardware Labs
Memory G.Skill 4000c17 2x16gb
Video Card(s) RTX 3090
Storage Sabrent
Display(s) Samsung G9
Case Phanteks 719
Audio Device(s) Fiio K5 Pro
Power Supply EVGA 1000 P2
Mouse Logitech G600
Keyboard Corsair K95
It must suck extra hard working for Intel today after reading this news.
 
Joined
Apr 24, 2020
Messages
2,741 (1.59/day)
AVX-512 is hard to program for effectively. Even if you manage to find a workload that benefits from it, the power scaling will more than likely ruin your day, unless all you do on that entire system is crunch numbers for a long time.

Your links are out-of-date.


The AVX512 power-scaling problem is null and void as of Ice Lake / Rocket Lake. Just because Skylake-X had a poor AVX512 implementation doesn't mean that the next processors have that same issue. In fact, Intel commonly has "poor implementations" of SIMD for a generation or two. (Ex: AVX on Sandy Bridge was utter crap and not worthwhile until Haswell.) The point of the early implementations is to get assembly programmers used to the instruction set before a newer, better processor implements the ISA for real.
 
Joined
Feb 8, 2021
Messages
92 (0.06/day)
Your links are out-of-date.


The AVX512 power-scaling problem is null and void as of Ice Lake / Rocket Lake. Just because Skylake-X had a poor AVX512 implementation doesn't mean that the next processors have that same issue. In fact, Intel commonly has "poor implementations" of SIMD for a generation or two. (Ex: AVX on Sandy Bridge was utter crap and not worthwhile until Haswell.) The point of the early implementations is to get assembly programmers used to the instruction set before a newer, better processor implements the ISA for real.

And what happens when 70% of the world's processors are Intel and you write software? You don't use what the newcomer puts in its processors; you don't have time to lose.
 
Joined
Dec 26, 2006
Messages
3,884 (0.59/day)
Location
Northern Ontario Canada
Processor Ryzen 5700x
Motherboard Gigabyte X570S Aero G R1.1 BiosF5g
Cooling Noctua NH-C12P SE14 w/ NF-A15 HS-PWM Fan 1500rpm
Memory Micron DDR4-3200 2x32GB D.S. D.R. (CT2K32G4DFD832A)
Video Card(s) AMD RX 6800 - Asus Tuf
Storage Kingston KC3000 1TB & 2TB & 4TB Corsair MP600 Pro LPX
Display(s) LG 27UL550-W (27" 4k)
Case Be Quiet Pure Base 600 (no window)
Audio Device(s) Realtek ALC1220-VB
Power Supply SuperFlower Leadex V Gold Pro 850W ATX Ver2.52
Mouse Mionix Naos Pro
Keyboard Corsair Strafe with browns
Software W10 22H2 Pro x64
It seems that AMD may be slowly encroaching into areas where Intel likes to toot its horn. Currently, Intel has a significant advantage in AVX-512 workloads, and also in the area of AI. So I will not be surprised if AMD starts attacking these "strongholds". But I do feel AVX-512 is quite unlikely to be used by most folks.

But the Intel fanbois have been saying this is awesome and AMD sucks since they don't have it... but those were fanbois though...
 
Joined
Feb 25, 2012
Messages
63 (0.01/day)
My friend and I were unable to try StarCitizen a week ago, because his CPU is lacking AVX which the game needs... Not a big deal, every CPU since 2012 has it, but he uses a first gen i7 which unfortunately does not...
Do Intel Pentiums and Celerons not count as "every CPU"?
For example, the latest Pentium G6600 (LGA1200) does not have it either.
 
Joined
Jun 10, 2014
Messages
3,006 (0.78/day)
Processor AMD Ryzen 9 5900X ||| Intel Core i7-3930K
Motherboard ASUS ProArt B550-CREATOR ||| Asus P9X79 WS
Cooling Noctua NH-U14S ||| Be Quiet Pure Rock
Memory Crucial 2 x 16 GB 3200 MHz ||| Corsair 8 x 8 GB 1333 MHz
Video Card(s) MSI GTX 1060 3GB ||| MSI GTX 680 4GB
Storage Samsung 970 PRO 512 GB + 1 TB ||| Intel 545s 512 GB + 256 GB
Display(s) Asus ROG Swift PG278QR 27" ||| Eizo EV2416W 24"
Case Fractal Design Define 7 XL x 2
Audio Device(s) Cambridge Audio DacMagic Plus
Power Supply Seasonic Focus PX-850 x 2
Mouse Razer Abyssus
Keyboard CM Storm QuickFire XT
Software Ubuntu
Understandably, there is a lot of confusion about AVX-512, so some of us have to clear up a few things.

Like I said many times, wide SIMD is kind of stupid in CPUs
And like I've told you numerous times, there are different levels of parallelism. The main advantage of CPU-based SIMD is the virtually zero overhead, and the ability to mix SIMD with other instructions seamlessly. This is a stark contrast to the GPU, where there is a huge latency barrier. So it depends totally on the workload whether it should be GPU accelerated or not, or perhaps even both.

I just hope AVX512 doesn't ruin the power consumption of these chips.
Even if the core loses a few hundred MHz, it will still complete far more work compared to earlier AVX-versions or non-SIMD implementations.
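A quick sanity check of that claim, as a sketch with made-up clock speeds: the 4.0 GHz and 3.6 GHz figures below are hypothetical, and the model assumes one vector operation per cycle on 32-bit elements and ignores memory bandwidth entirely.

```python
def elements_per_ns(vector_bits, clock_ghz, element_bits=32):
    """Peak elements processed per nanosecond, assuming one vector op per cycle.

    GHz is cycles per nanosecond, so lanes * GHz gives elements per nanosecond.
    """
    lanes = vector_bits // element_bits
    return lanes * clock_ghz


avx2_rate = elements_per_ns(256, 4.0)    # hypothetical 4.0 GHz while running AVX2
avx512_rate = elements_per_ns(512, 3.6)  # hypothetical 400 MHz drop under AVX-512
speedup = avx512_rate / avx2_rate
print(f"AVX-512 vs AVX2 peak throughput: {speedup:.2f}x")
```

With those numbers the wider vectors still come out at roughly 1.8x the peak throughput despite the lower clock. Real code rarely hits peak, so treat this as an upper bound on the benefit.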

Cool, but what consumer applications actually USE AVX-512? Actually, what consumer applications use AVX to begin with? Games certainly don't, they use SSE4 at most. I guess there are some production applications? But it doesn't seem to help at all given AMD crushes Intel in every production market out there.

It'd be nice to see wider adoption. These instruction sets are great but seemingly nobody uses them....
AVX-512 is so far limited to custom enterprise software and educational institutions, I'm not aware of any mainstream consumer software benefiting from it yet.

So if you're a buyer today, you should probably not care about it unless you do plan to use custom software which benefits from it, or you do applications development.

Keep in mind that we are talking about Zen 4 here, which is probably 1-1.5 years away, and will stay in the market for >2 years, so by then AVX-512 may be very much relevant. If suddenly Photoshop, Blender or ffmpeg starts using it, then suddenly it will matter for many, and people will not be going back.

There is also some interesting movement in the Linux ecosystem, where Red Hat has been pushing x86 feature levels to make it easier to compile Linux and related software for more modern ISA features. So within the next couple of years we should expect large Linux distros to ship completely compiled for e.g. x86-64-v3 (Haswell and Zen) or x86-64-v4 (e.g. Ice Lake and Zen 4). This will be huge for the adoption rate of AVX.
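For anyone curious which level their own CPU lands on, here is a minimal sketch that classifies a set of CPU flags. The flag lists are my transcription of the feature levels using Linux /proc/cpuinfo spellings, so double-check them against the x86-64 psABI document before relying on this; the function name and sample flag sets are purely illustrative.

```python
# Flag names follow Linux /proc/cpuinfo spelling ("pni" is SSE3, "abm" covers LZCNT).
# Each level also requires everything from the levels before it.
LEVELS = [
    ("x86-64-v2", {"cx16", "lahf_lm", "popcnt", "pni", "sse4_1", "sse4_2", "ssse3"}),
    ("x86-64-v3", {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm", "movbe", "xsave"}),
    ("x86-64-v4", {"avx512f", "avx512bw", "avx512cd", "avx512dq", "avx512vl"}),
]


def feature_level(flags):
    """Return the highest feature level whose requirements are all in `flags`."""
    level = "x86-64-v1"  # assumed baseline for any 64-bit x86 CPU
    for name, required in LEVELS:
        if not required <= flags:
            break  # levels are cumulative, so stop at the first gap
        level = name
    return level


# Hypothetical flag sets, not read from real hardware.
haswell_like = LEVELS[0][1] | LEVELS[1][1]
avx512_capable = haswell_like | LEVELS[2][1]
print(feature_level(haswell_like))
print(feature_level(avx512_capable))
```

On a Linux box you could feed it the real flags with something like `set(line.split(":")[1].split())` on the "flags" line of /proc/cpuinfo.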

AVX(1)/AVX2 is already used in numerous applications which you are familiar with: Blender, ffmpeg, WinRAR, 7zip, Chrome, etc. So you are probably using it a lot without even knowing it. But the majority of applications, libraries, drivers and the OS itself are compiled for baseline x86-64 with SSE2 at best, so basically lagging >17 years behind. There is huge potential here for performance, lower latency and energy efficiency.

CPU-based SIMD is suitable for anything computationally intensive. It is kind of hard to use, but many pieces of software still get some "free" performance gains just from enabling compiler optimizations. The really heavy workloads which use AVX are generally hand-optimized using intrinsics, which is time consuming, but luckily for most programs only a tiny fraction of the code base is performance critical.

What AVX-512 brings to the table is obviously an increased vector width of 512 bits. But it also brings a new, more flexible instruction encoding and many more operations, which means many more algorithms can be implemented efficiently with AVX instead of application-specific instructions. Hopefully the new push in GCC and LLVM for x86 feature levels will lead to more compiler optimizations for auto-vectorization; I believe there is huge potential here.

Possibly alluding to the fact that there is AVX(1), then AVX2 (256); AVX-512 would be AVX3 in that case.
I believe Intel screwed up with this naming scheme;
AVX(1) - Partial 128-bit and 256-bit, mostly a "small" extension of SSE 4.
AVX2 - Fully 256-bit and adds the very useful FMA. (I would argue this should have been the first AVX version)
AVX-512 - Fully 512-bit and more flexible

It must suck extra hard working for Intel today after reading this news.
I'm pretty sure Intel wants their ISA to be adopted. Right now the best argument against AVX-512 is AMD's lack of support.
 
Joined
Apr 24, 2020
Messages
2,741 (1.59/day)
AVX-512 is so far limited to custom enterprise software and educational institutions, I'm not aware of any mainstream consumer software benefiting from it yet.

To be fair, that's the right strategy.

AVX512 is a major change to how AVX was done. With opcode masks and other features, Intel was likely gathering data on how programmers would use those AVX512 features. Originally AVX512 was for the Xeon Phi series, but it's important to "test" and see who adopts AVX512 in the server/desktop market in some way first.

Intel can now remove extensions that aren't being used, as well as focus on which instructions need to be improved. In particular, Intel clearly has noticed the "throttling" issue in Skylake X and has fixed it (which was probably the biggest criticism of AVX512).

-----

You really want these issues to be figured out BEFORE mainstream programmers start relying upon those instructions. Once the mainstream starts using the instructions, you pretty much can never change them again.


 
Joined
Jun 10, 2014
Messages
3,006 (0.78/day)
To be fair, that's the right strategy.

AVX512 is a major change to how AVX was done. With opcode masks and other features, Intel was likely gathering data on how programmers would use those AVX512 features. Originally AVX512 was for the Xeon Phi series, but it's important to "test" and see who adopts AVX512 in the server/desktop market in some way first.

Intel can now remove extensions that aren't being used, as well as focus on which instructions need to be improved. In particular, Intel clearly has noticed the "throttling" issue in Skylake X and has fixed it (which was probably the biggest criticism of AVX512).

-----

You really want these issues to be figured out BEFORE mainstream programmers start relying upon those instructions. Once the mainstream starts using the instructions, you pretty much can never change them again.
I agree.
AVX-512 did emerge from requirements from researchers and enterprises, contrary to popular belief which seems to think it's something Intel concocted to show off in benchmarks.

I think Intel made a very smart move by making it modular, both to make it easier to implement for different segments, to make it easier for AMD to implement, and to evolve it further over time. It's very hard to figure out a balanced and well featured ISA revision without someone using it for a while.

Intel is probably at fault for the slow adoption rate of AVX-512. If Ice Lake desktop had released three years ago, we would possibly have seen some software by now.
 
Joined
Oct 12, 2005
Messages
720 (0.10/day)
Like I said many times, wide SIMD is kind of stupid in CPUs, I just hope AVX512 doesn't ruin the power consumption of these chips.
I don't think it will if it's not used, but it might add a lot of silicon real estate for something the end users might rarely use.

But there are still cases where wide SIMD on a CPU is a good idea, like very large simulations that require a lot of RAM as well as wide SIMD. Some of these machines will be able to have 192 cores / 384 threads with terabytes of memory, whereas the largest GPUs are very far from that.

I think AMD is not interested in having multiple CPU SKUs. They will just create one for all desktop Ryzen without APU, Threadripper and EPYC.
 
Joined
Jan 8, 2017
Messages
9,536 (3.26/day)
System Name Good enough
Processor AMD Ryzen R9 7900 - Alphacool Eisblock XPX Aurora Edge
Motherboard ASRock B650 Pro RS
Cooling 2x 360mm NexXxoS ST30 X-Flow, 1x 360mm NexXxoS ST30, 1x 240mm NexXxoS ST30
Memory 32GB - FURY Beast RGB 5600 Mhz
Video Card(s) Sapphire RX 7900 XT - Alphacool Eisblock Aurora
Storage 1x Kingston KC3000 1TB 1x Kingston A2000 1TB, 1x Samsung 850 EVO 250GB , 1x Samsung 860 EVO 500GB
Display(s) LG UltraGear 32GN650-B + 4K Samsung TV
Case Phanteks NV7
Power Supply GPS-750C
Honestly, there's just a ton of advantages to supporting 512-bit, especially when you consider all the possible designs AMD can do here. There's really no reason NOT to support 512-bit.

I know there are advantages but a "ton" is a gross exaggeration. The problem I have with SIMD in modern CPUs is that it's an archaic approach to solving data parallelism in hardware and adding wider and wider extensions with a million new instructions seems really inefficient and stupid to me, I wish there was more innovation.

If you really want the ability to have fast and robust vector-type processing, maybe they could come up with something along the lines of a modified GPU compute unit without any of the graphics hardware and with an open universal ISA. AMD/Intel could pull something like this off and make heterogeneous computing an actual reality. You can't scale stuff like AVX easily, it just doesn't work; look how dog slow AVX512 adoption is.

If this stuff really was critical the industry would have demanded it to be implemented everywhere from day one, but they haven't.
 
Joined
Apr 24, 2020
Messages
2,741 (1.59/day)
I know there are advantages but a "ton" is a gross exaggeration. The problem I have with SIMD in modern CPUs is that it's an archaic approach to solving data parallelism in hardware and adding wider and wider extensions with a million new instructions seems really inefficient and stupid to me, I wish there was more innovation.

But these new instructions are in fact, new and different.

4x parallel AES seems stupid when you first see it, but suddenly AES-GCM mode becomes popular and lo-and-behold, 4x128 bit parallel AES streams on a 512b instruction suddenly makes sense. And while the ARM-guys are stuck in 128-bit land, Intel benefits from 4x-parallel (and AMD Zen3 already benefits from the 2x-parallel) versions of those instructions. Turns out there's an AES mode-of-operation that actually benefits from that.

IMO, Intel needs to spend more time studying GPUs and implement a proper shuffle-crossbar (like NVidia and AMD's permute / bpermute instructions). Intel is kinda-sorta getting there with pshufb but it's not quite as flexible as GPU-assembly instructions yet. Even then, AMD GPUs still opt for a flurry of new modes of operation for their shuffles (the DPP instructions: butterfly shuffle, 4x shuffle, etc. etc.). It seems like a highly-optimized (but non-flexible) shuffle is still ideal. (The butterfly shuffle deserves its own optimization because of its application to the Fast Fourier Transform, and it seems to be a pattern in sorting networks and the map/reduce pattern.)

Getting a faster butterfly shuffle, rather than a "generic shuffle" has proven itself useful in GPU-land. To the point that it requires its own instruction. Not that AVX512 implements butterfly shuffle yet (at least, last time I checked. AVX512 keeps adding new instructions...)... but... yeah. There's actually a lot of data-movement instructions at the byte-level that are very useful, and are still being defined today. See NVidia "shfl.bfly.b32" PTX instruction: that butterfly shuffle is very important!!
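For anyone who hasn't met the butterfly shuffle before, here is a plain-Python simulation of the idea (not real GPU or AVX512 code). Each lane reads from the lane whose index is its own XORed with a mask, and repeating that with masks n/2, n/4, ..., 1 gives every lane the full sum in log2(n) steps. The function names are made up for this sketch.

```python
def butterfly_shuffle(lanes, mask):
    """Every lane i reads the value held by lane i XOR mask."""
    return [lanes[i ^ mask] for i in range(len(lanes))]


def butterfly_sum(lanes):
    """All-lanes sum in log2(n) shuffle+add steps; n must be a power of two."""
    n = len(lanes)
    assert n > 0 and n & (n - 1) == 0, "lane count must be a power of two"
    values = list(lanes)
    mask = n // 2
    while mask:
        partner = butterfly_shuffle(values, mask)          # exchange with lane i ^ mask
        values = [a + b for a, b in zip(values, partner)]  # lane-wise add
        mask //= 2
    return values


result = butterfly_sum([1, 2, 3, 4, 5, 6, 7, 8])
print(result)  # every lane ends up holding the total
```

Three shuffle+add steps for 8 lanes, and no lane ever needs an arbitrary permutation, only the fixed XOR pattern. That is why a dedicated (cheaper) butterfly path can be worth having next to a generic shuffle.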

-------

AVX512 restarts SIMD instructions under the opcode-mask paradigm, finally allowing Intel to catch up to NVidia's 2008-era technology. It's... a bit late. But hey, I welcome AVX512's sane decisions with regards to SIMD-ISA design. Intel actually has gather/scatter instructions (though not implemented quickly yet), and has a baseline for future extensions (which I hope will serve as placeholders for important movement like butterfly shuffles). Intel needs to catch up: they are missing important and efficient instructions that NVidia figured out uses for and already deployed.
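To illustrate what the opcode-mask paradigm means in practice, here is a toy Python model (again, a simulation, not intrinsics). One bit per lane decides whether that lane takes part in the operation. AVX512 offers two flavors: merge-masking keeps the old destination value in inactive lanes, zero-masking clears them. The helper names are invented for this sketch.

```python
def masked_add(dst, a, b, mask):
    """Merge-masking: lane i gets a[i] + b[i] if bit i of mask is set, else keeps dst[i]."""
    return [x + y if (mask >> i) & 1 else old
            for i, (old, x, y) in enumerate(zip(dst, a, b))]


def masked_add_zero(a, b, mask):
    """Zero-masking: inactive lanes are cleared instead of preserved."""
    return [x + y if (mask >> i) & 1 else 0
            for i, (x, y) in enumerate(zip(a, b))]


dst = [9, 9, 9, 9]
a = [1, 2, 3, 4]
b = [10, 20, 30, 40]
mask = 0b0101  # lanes 0 and 2 are active

merged = masked_add(dst, a, b, mask)
zeroed = masked_add_zero(a, b, mask)
print(merged, zeroed)
```

This is what lets a vectorized loop handle an `if` inside the loop body or a ragged tail without falling back to scalar code, which is the same trick GPUs use for divergent threads.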

If anything: Intel is still missing an important set of data-movement instructions. Intel needs more instructions, not fewer instructions.
 
Joined
Jul 13, 2016
Messages
3,383 (1.09/day)
Processor Ryzen 7800X3D
Motherboard ASRock X670E Taichi
Cooling Noctua NH-D15 Chromax
Memory 32GB DDR5 6000 CL30
Video Card(s) MSI RTX 4090 Trio
Storage P5800X 1.6TB 4x 15.36TB Micron 9300 Pro 4x WD Black 8TB M.2
Display(s) Acer Predator XB3 27" 240 Hz
Case Thermaltake Core X9
Audio Device(s) JDS Element IV, DCA Aeon II
Power Supply Seasonic Prime Titanium 850w
Mouse PMM P-305
Keyboard Wooting HE60
VR HMD Valve Index
Software Win 10
lmao. catching up to Intel. it sure took amd long enough. DECADES in fact, with of course the Iranian bail out (massive stock buy in when it was 3 dollars per share) to save them.

Decade, not decades.

I would call it recovering, not catching up. Given that Intel only took the lead in the first place by paying OEMs to not carry AMD products, AMD isn't so much catching up as it is recovering from Intel's illegal market manipulation. The amount AMD got in court from Intel for cornering the market is paltry compared to the damages Intel caused to the market and to AMD.

In any case, everyone should be thankful that the market actually has competition. I remember some people actually believed Intel's marketing BS that 10% or greater IPC increases were a thing of the past and that TIM was superior to solder. Longer lasting my behind, I've had zero Intel 2000 series CPUs that have had issues with their solder, I've had to delid over a dozen 4000 series CPUs because the TIM lost contact with the IHS.
 
Joined
Feb 25, 2012
Messages
63 (0.01/day)
And while the ARM-guys are stuck in 128-bit land, Intel benefits from 4x-parallel (and AMD Zen3 already benefits from the 2x-parallel) versions of those instructions.
Wait. But you can issue a 128-bit instruction four times, and if you have four 128-bit ALUs you get the same result. There is no need to extend an ISA if the ISA is modern and effective. ARM can issue 4 vector instructions, but x86 can't. That explains why the ancient architecture keeps getting uglier.
The "stuck" ARM is conquering the world. The leading x86 is dying out.
 
Joined
Apr 24, 2020
Messages
2,741 (1.59/day)
Wait. But you can issue a 128-bit instruction four times, and if you have four 128-bit ALUs you get the same result. There is no need to extend an ISA if the ISA is modern and effective. ARM can issue 4 vector instructions, but x86 can't. That explains why the ancient architecture keeps getting uglier.
The "stuck" ARM is conquering the world. The leading x86 is dying out.

First: ARM requires TWO-instructions (AESE and AESMC) to do the same work as one x86 AESenc instruction

Second: That means you need EIGHT instructions, (AESE + AESMC) x 4, to do the same work as one 512-bit AESenc instruction.

Third: Intel can perform 4x instructions per clock tick. Which means in one clock tick, it can issue the 512-bit AESenc instruction AND 3 other instructions (add, multiply, whatever). In contrast, Apple M1 needs to spend all of its 8x decoder on that singular 512-bit operation.

Fourth: Intel / AMD are 4GHz processors executing these things every 0.25 nanoseconds. Apple ARM M1 is a 2.8GHz processor closer to 0.35ish ns.

It's pretty clear to me that the Intel (and future AMD) 512-bit AVX512 aesenc instruction is superior and better designed for this situation, compared to ARM's AESE + AESMC pair. Now I recognize that ARM systems are optimized to macro-op fuse AESE+AESMC pairs (so those 8 instructions only need 4 pipeline cycles to execute), but I'm pretty sure that'd still take up the slot in the ARM decoder.

ARM's decision is cleaner in theory: AESE is all you need for the final iteration, while x86 requires a 2nd instruction, AESENCLAST, for the final iteration. But in terms of performance, the x86 approach is clearly faster and better designed IMO, though it probably uses more transistors. But more and more transistors are being given to encryption these days thanks to HTTPS's popularity, so Intel's "more transistors" but more efficient design is superior to ARM's original design.
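The instruction-count arithmetic from the points above, spelled out as a small sketch. It only counts instructions going through the decoder for one AES round over four 128-bit blocks, using the clock speeds quoted in this post; it does not model macro-op fusion, execution latency or how many AES units a core has. The function names are made up.

```python
def instructions_for_aes_round(blocks, blocks_per_instr, instrs_per_round):
    """Instructions needed to apply one AES round to `blocks` 128-bit blocks."""
    groups = -(-blocks // blocks_per_instr)  # ceiling division
    return groups * instrs_per_round


def cycle_ns(clock_ghz):
    """Length of one clock cycle in nanoseconds."""
    return 1.0 / clock_ghz


# ARM: AESE + AESMC per block, one 128-bit block per instruction.
arm = instructions_for_aes_round(4, blocks_per_instr=1, instrs_per_round=2)
# x86 with a 512-bit AESENC: four blocks in a single instruction.
x86_512 = instructions_for_aes_round(4, blocks_per_instr=4, instrs_per_round=1)

print("ARM instructions:", arm)
print("x86 512-bit instructions:", x86_512)
print(f"4.0 GHz cycle: {cycle_ns(4.0):.2f} ns, 2.8 GHz cycle: {cycle_ns(2.8):.3f} ns")
```

So on decoder slots alone it's eight versus one for the same four blocks, before you even account for the difference in clock period.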
 
Joined
Sep 26, 2012
Messages
871 (0.19/day)
Location
Australia
System Name ATHENA
Processor AMD 7950X
Motherboard ASUS Crosshair X670E Extreme
Cooling ASUS ROG Ryujin III 360, 13 x Lian Li P28
Memory 2x32GB Trident Z RGB 6000Mhz CL30
Video Card(s) ASUS 4090 STRIX
Storage 3 x Kingston Fury 4TB, 4 x Samsung 870 QVO
Display(s) Acer X38S, Wacom Cintiq Pro 15
Case Lian Li O11 Dynamic EVO
Audio Device(s) Topping DX9, Fluid FPX7 Fader Pro, Beyerdynamic T1 G2, Beyerdynamic MMX300
Power Supply Seasonic PRIME TX-1600
Mouse Xtrfy MZ1 - Zy' Rail, Logitech MX Vertical, Logitech MX Master 3
Keyboard Logitech G915 TKL
VR HMD Oculus Quest 2
Software Windows 11 + Universal Blue
My problem with AVX 512 is every implementation seen to date has the rest of the chip throttling heavily whilst running an AVX workload.

I'm not exactly sold on the requirement for large and wide SIMD on a CPU rather than something more appropriate for the package, but I'm happy to be proven wrong.
 
Joined
Feb 25, 2012
Messages
63 (0.01/day)
First: ARM requires TWO-instructions (AESE and AESMC) to do the same work as one x86 AESenc instruction

Second: That means you need EIGHT instructions, (AESE + AESMC) x 4, to do the same work as one 512-bit AESenc instruction.

Third: Intel can perform 4x instructions per clock tick. Which means in one clock tick, it can issue the 512-bit AESenc instruction AND 3 other instructions (add, multiply, whatever). In contrast, Apple M1 needs to spend all of its 8x decoder on that singular 512-bit operation.

Fourth: Intel / AMD are 4GHz processors executing these things every 0.25 nanoseconds. Apple ARM M1 is a 2.8GHz processor closer to 0.35ish ns.
1-2. How does the AESE/AESMC pair change the fact that recent x86 cores cannot issue more than one AESENC?
Two RISC instructions per AES round are irrelevant. ARM A78 can decode four, ARM X1 can take five, IBM Power9 can chew through eight. And nobody can guarantee that ARM will (or will not) implement a fused instruction.

3. Recent Intel CPUs can decode up to 5 instructions per clock, but the 16-byte decode window and the limit of one complex instruction per clock turn your dream into a fart.
Do you think this might be what forced Intel to create AVX-512?

4. Try to look at the situation from the point of view of energy efficiency. You can't compare architectures with different goals. If there is a real need for a high-performance ARM core, then just wait.

It's pretty clear to me that the Intel (and future AMD) 512-bit AVX512 aesenc instruction is superior and better designed for this situation, compared to ARM's AESE + AESMC pair. Now I recognize that ARM systems are optimized to macro-op fuse AESE+AESMC pairs (so those 8 instructions only need 4 pipeline cycles to execute), but I'm pretty sure that'd still take up the slot in the ARM decoder.
A cheap RISC decoder vs. a "superior" x86 monster that chokes on a single 9-byte instruction. Nice.

ARM's decision is cleaner in theory: AESE is all you need for the final iteration, while x86 requires a 2nd instruction, AESENCLAST, for the final iteration. But in terms of performance, the x86 approach is clearly faster and better designed IMO, though it probably uses more transistors. But more and more transistors are being given to encryption these days thanks to HTTPS's popularity, so Intel's "more transistors" but more efficient design is superior to ARM's original design.
I think ARM has found a balance between decoder and ALU.
And you cannot guarantee that ARM will (or will not) implement a fused instruction.
 
Joined
Oct 22, 2014
Messages
14,179 (3.80/day)
Location
Sunshine Coast
System Name H7 Flow 2024
Processor AMD 5800X3D
Motherboard Asus X570 Tough Gaming
Cooling Custom liquid
Memory 32 GB DDR4
Video Card(s) Intel ARC A750
Storage Crucial P5 Plus 2TB.
Display(s) AOC 24" Freesync 1m.s. 75Hz
Mouse Lenovo
Keyboard Eweadn Mechanical
Software W11 Pro 64 bit
Did somebody say KFC?
 