AMD Ryzen 9 5950X Review

Architecture

The Zen 3 Microarchitecture


Since the 2017 debut of its groundbreaking "Zen" CPU microarchitecture, AMD has delivered a new iteration each year, each with IPC improvements. As mentioned earlier, AMD claims the new "Zen 3" microarchitecture offers a massive 19 percent IPC uplift over its predecessor, "Zen 2." This is accomplished by improvements at both the micro and macro level. We already detailed the macro (beyond the core) changes above. In this section, we talk about what's new inside each core. AMD talks about updates to practically all key components of the core, including its front end, fetch/decode, the integer and floating-point components, load-store, and dedicated caches.


Modern processors execute multiple instructions in parallel to improve performance. Computer programs contain large numbers of "if ... then ... else" constructs, which slow down the processor because it has to evaluate the condition before it knows which branch to execute. To overcome this limitation, the branch predictor was invented: a piece of circuitry that guesses the more likely outcome of the condition check and speculatively executes that branch's instructions. Of course, there's a chance that the prediction is wrong, in which case a performance penalty is incurred from discarding the speculatively executed work. With "Zen 3," AMD uses an improved TAGE branch predictor, which is more accurate and recovers faster from mispredictions. AMD also changed the design to be "bubble free," which avoids inserting idle "wait for result" slots (pipeline bubbles) into the instruction stream whenever a branch is encountered.
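To make the idea of "guessing the more likely outcome" concrete, here is a minimal Python sketch of a classic 2-bit saturating-counter predictor. This is a textbook scheme that is far simpler than the TAGE predictor "Zen 3" uses, and it is not AMD's design; the function name and the two branch patterns are made up for illustration.

```python
def simulate_two_bit_predictor(outcomes):
    """Return the fraction of branch outcomes predicted correctly.

    The state is a saturating counter in 0..3; values 2 and 3 predict "taken".
    """
    counter = 2  # start in the "weakly taken" state
    correct = 0
    for taken in outcomes:
        predicted_taken = counter >= 2
        if predicted_taken == taken:
            correct += 1
        # Move the counter toward the actual outcome, saturating at 0 and 3.
        if taken:
            counter = min(counter + 1, 3)
        else:
            counter = max(counter - 1, 0)
    return correct / len(outcomes)


# A loop branch: taken 9 times, then not taken once, repeated 100 times.
loop_pattern = ([True] * 9 + [False]) * 100
# A branch that strictly alternates between taken and not taken.
alternating = [True, False] * 500

loop_accuracy = simulate_two_bit_predictor(loop_pattern)
alternating_accuracy = simulate_two_bit_predictor(alternating)
print(f"loop branch accuracy:        {loop_accuracy:.2f}")
print(f"alternating branch accuracy: {alternating_accuracy:.2f}")
```

The simple counter handles the loop branch well but gets only half of the alternating branch right, because it keeps no record of recent history. Predictors in the TAGE family address this by consulting several tables indexed with branch histories of different lengths, so a regular pattern like the alternating one can be learned.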

AMD generally increased ops/cycle: the front end now switches faster between the op and instruction caches. The 32 KB L1 instruction cache has been tweaked for better utilization through more efficient tagging and prefetching, and the op cache has been streamlined. Improvements to the branch predictor and front end add up to nearly a quarter of the overall 19% generational IPC uplift.


The execution engine, or combination of the integer and floating-point execution units, is the main math muscle of the CPU core. The "Zen 3" microarchitecture features improvements to both over "Zen 2." Both the INT and FP issue queues, which feed work to the two engines, have been widened, and the execution window enlarged. This ensures that fewer units are idle in typical programs, which increases overall performance.


AMD worked to minimize latencies at every stage of the INT execution engine and enlarged its key structures, including the integer scheduler (96 entries vs. 92 on "Zen 2") and the physical register file (192 entries vs. 180 on "Zen 2"). The engine can now issue 10 operations per cycle, up from 7 on "Zen 2." Data picker bandwidth has been significantly increased despite the same number of ALUs. The floating-point engine features the same 256-bit FPUs, but just as with the INT engine, it has latency and bandwidth improvements across the board, a faster 4-cycle FMAC, and a larger scheduler. The INT and FP improvements contribute around a fifth of the 19% overall IPC uplift.
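Why FMAC latency matters can be shown with a back-of-the-envelope sketch. When every multiply-add needs the previous one's result (for example, a sum accumulated into a single variable), the chain cannot be overlapped, so its run time is roughly the number of operations times the latency. The 4-cycle figure comes from the text; the 5-cycle comparison value is an assumption used only for illustration, as are the function and constant names.

```python
def dependent_chain_cycles(num_ops, latency_cycles):
    """Cycles for a chain in which each op waits for the previous result."""
    return num_ops * latency_cycles


CHAIN_LENGTH = 1000
ZEN3_FMA_LATENCY = 4         # from the text: "4-cycle FMAC"
ASSUMED_OLD_FMA_LATENCY = 5  # assumption, not stated in the text

zen3_cycles = dependent_chain_cycles(CHAIN_LENGTH, ZEN3_FMA_LATENCY)
old_cycles = dependent_chain_cycles(CHAIN_LENGTH, ASSUMED_OLD_FMA_LATENCY)
speedup = old_cycles / zen3_cycles

print(f"{CHAIN_LENGTH} dependent FMAs: {old_cycles} -> {zen3_cycles} cycles")
print(f"speedup on this latency-bound chain: {speedup:.2f}x")
```

This is a simplified model: code with many independent operations is limited by throughput rather than latency, and would see less benefit from the shorter FMAC pipeline.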


With the "Zen 3" microarchitecture, AMD addressed many bottlenecks and "intelligence" issues with the load/store unit. The biggest has to be bandwidth. The store queue has been enlarged to 64 entries from 48 on "Zen 2," and the L2 DTLB holds 2K entries. The 32 KB L1 data cache has been made faster, with lower latencies. Memory dependence detection has been improved. Much like the front end, the load/store improvements contribute nearly a quarter of the 19% overall IPC uplift. In other words, just by optimizing the non-execution components of its core (the front end and load/store), AMD managed to pull off roughly a 9% overall IPC uplift.
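The arithmetic behind the roughly 9% figure can be checked with a short sketch. It treats the article's approximate shares ("nearly a quarter," "around a fifth") as exact fractions of the 19% total, which is a simplifying assumption; the dictionary labels are ours, not AMD's.

```python
TOTAL_UPLIFT_PERCENT = 19.0

# Approximate shares quoted in the text, rounded to simple fractions.
approx_share = {
    "front end + branch predictor": 1 / 4,  # "nearly a quarter"
    "integer + floating point": 1 / 5,      # "around a fifth"
    "load/store": 1 / 4,                    # "nearly a quarter"
}

# Percentage points of IPC uplift attributed to each area.
contribution = {
    name: share * TOTAL_UPLIFT_PERCENT for name, share in approx_share.items()
}

non_execution = (
    contribution["front end + branch predictor"] + contribution["load/store"]
)
remainder = TOTAL_UPLIFT_PERCENT - sum(contribution.values())

for name, points in contribution.items():
    print(f"{name}: about {points:.2f} points")
print(f"non-execution total: about {non_execution:.1f} points")
print(f"not covered by these three areas: about {remainder:.1f} points")
```

Because both "quarter" shares are described as "nearly" a quarter, 9.5 points is an upper bound, which is consistent with the roughly 9% quoted above. The remainder is whatever the three listed areas do not account for; the text does not break it down further.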

ISA and Security Changes


Each new microarchitecture heralds support for newer instruction sets and security hardening, and the same is the case with "Zen 3." However, a notable absentee is AVX-512. Granted, Intel has adopted a less than perfect method of proliferating AVX-512, with certain instructions being exclusive to enterprise-segment microarchitectures and only a handful of client-relevant instructions on its "Ice Lake" and "Tiger Lake" architectures, but there's no movement from AMD in this direction.

You do still get 256-bit instructions from the AVX2 set. Also missing in action is something to rival Intel's DL Boost, which is essentially a software exposure of fixed-function hardware that accelerates matrix multiplication, and in effect deep-learning neural network building and training. A lot of client applications, particularly image manipulation and video editing, are leveraging edge AI, and some investment from AMD on this would have been nice. That said, "Zen 3" adds two ISA extensions: MPK (memory protection keys) and AVX2-width (256-bit) versions of the AES and carry-less multiply instructions (VAES/VPCLMULQDQ). AMD has been ahead of Intel in the perception of CPU core security vulnerabilities, and with "Zen 3," AMD is introducing CET (control-flow enforcement technology), which should provide hardening against ROP-type attacks.
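On Linux, one way to see which of these extensions a given CPU reports is to look at the `flags` line in `/proc/cpuinfo`. The sketch below is a minimal, hedged example: the helper names are ours, the flag spellings (`avx2`, `avx512f`, `vaes`, `vpclmulqdq`, `pku`) are the ones the Linux kernel normally prints but should be checked against your kernel, and the sample text is a made-up excerpt, not output captured from a Ryzen 9 5950X.

```python
def parse_cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


# Feature label -> flag name as usually shown in /proc/cpuinfo (assumed).
FEATURES = {
    "AVX2": "avx2",
    "AVX-512 Foundation": "avx512f",
    "VAES (256-bit AES)": "vaes",
    "VPCLMULQDQ (256-bit carry-less multiply)": "vpclmulqdq",
    "Memory protection keys": "pku",
}


def report_features(flags):
    """Map each feature label to True/False depending on the flag set."""
    return {label: flag in flags for label, flag in FEATURES.items()}


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the cpuinfo file; only meaningful on Linux."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()


# Hypothetical excerpt for illustration only.
sample = "processor\t: 0\nflags\t\t: fpu sse2 avx avx2 vaes vpclmulqdq pku\n"
sample_report = report_features(parse_cpu_flags(sample))
for label, present in sample_report.items():
    print(f"{label}: {'yes' if present else 'no'}")
```

On a real Linux system, `report_features(parse_cpu_flags(read_cpuinfo()))` would run the same check against the machine's own CPU. CET is left out because the flag name the kernel uses for it depends on the kernel version.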