Intel Core 12th Gen Alder Lake Preview



The Performance Core (P-core) and Efficiency Core (E-core)


On the previous pages, we described how the two CPU core types that make up the hybrid architecture are laid out on the Alder Lake-S silicon. On this page, we'll talk a bit about each of the two core types. Alder Lake is not Intel's first processor to feature a hybrid architecture. That title goes to Lakefield, a pioneering mobile processor that featured one Sunny Cove P-core and four Tremont E-cores. With Alder Lake, Intel is taking this concept forward. The processor has eight Golden Cove P-cores that are two generations ahead of Sunny Cove, and Gracemont E-cores, which are one generation ahead of Tremont.

Intel claims that Golden Cove provides a massive 19% IPC uplift over the Cypress Cove cores driving 11th Gen Rocket Lake, and a whopping 28% IPC uplift over 10th Gen "Comet Lake" (Skylake cores). The engineering miracle, however, has to be Gracemont. With roughly a quarter of the die area of a Golden Cove core, Gracemont is able to closely trail Skylake in IPC when unshackled for power and run at the right clock speed.
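
As a rough sanity check, the two claimed uplifts can be chained to estimate the step they imply between Skylake and Cypress Cove. The sketch below assumes both figures were measured on comparable workloads, which Intel's numbers may not guarantee; the function name is purely illustrative.

```python
def implied_uplift(total_gain: float, partial_gain: float) -> float:
    """Return the gain implied for the remaining step when gains compound.

    Both arguments are fractional gains, e.g. 0.28 for 28%.
    """
    return (1.0 + total_gain) / (1.0 + partial_gain) - 1.0


# Intel's claims as quoted in the text.
golden_vs_skylake = 0.28   # Golden Cove over Skylake
golden_vs_cypress = 0.19   # Golden Cove over Cypress Cove

# Assumption: the two claims compound cleanly (same workloads, same clocks).
cypress_vs_skylake = implied_uplift(golden_vs_skylake, golden_vs_cypress)
print(f"Implied Cypress Cove over Skylake: {cypress_vs_skylake:.1%}")
```

Under that assumption, the implied Cypress Cove gain over Skylake works out to roughly 7.6%.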


The Gracemont E-core receives massive upgrades to all three key components (the front-end, execution, and load-store) in an attempt to bridge the ISA gap between it and the P-core. The front-end is upgraded with a double-size 64 KB L1 instruction cache, a more powerful branch-prediction unit, and two sets of three-wide out-of-order decoders. The out-of-order engine features a wide 256-entry OoO window and 17 execution ports for more parallelism. The execution stage sees a nearly 33% increase in both scalar and vector execution units, as well as a doubling of load-store capacity. The core shares up to 4 MB of L2 cache with the three other cores in its cluster, and together they talk to the 30 MB L3 cache on Alder Lake.


Much of Intel's effort to improve the performance of the E-core seems to have to do with ISA coherence between the two core types. The Gracemont core supports the AVX2 and AVX-VNNI (256-bit) instruction sets, something Intel's "little" cores aren't supposed to have. The net result is that Gracemont achieves 40% more performance than a Skylake core at iso-power, or draws 40% less power at iso-performance. Intel strikes a balance between the two by using the right power and clock speeds to achieve some semblance of performance parity between the E-core as deployed on Alder Lake, and a Skylake core.
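
The two 40% figures describe different operating points rather than one efficiency number. A minimal sketch, assuming a Skylake core is normalized to 1.0 for both performance and power (the helper name is hypothetical), shows what each claim implies for performance per watt:

```python
def perf_per_watt(relative_perf: float, relative_power: float) -> float:
    """Performance per watt relative to a baseline normalized to 1.0 / 1.0."""
    return relative_perf / relative_power


# Claim 1: 40% more performance at the same power as Skylake (iso-power).
iso_power = perf_per_watt(relative_perf=1.40, relative_power=1.00)

# Claim 2: the same performance as Skylake at 40% less power (iso-performance).
iso_perf = perf_per_watt(relative_perf=1.00, relative_power=0.60)

print(f"iso-power:       {iso_power:.2f}x Skylake perf/W")
print(f"iso-performance: {iso_perf:.2f}x Skylake perf/W")
```

The iso-performance point comes out more efficient (about 1.67x versus 1.40x), which is consistent with the usual observation that cores are more efficient at lower power targets.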


The eight Golden Cove performance cores are the main number-crunching muscle of Alder Lake-S. Intel claims these have 28% higher IPC than the Skylake core, and an impressive 19% IPC gain over the Cypress Cove core from the 11th Gen. Since Alder Lake is built on Intel's latest silicon fabrication process, the company took an uncompromised approach to the P-core, giving it all of the latest hardware upgrades to work toward the IPC increase. The front-end of Golden Cove sees a double-sized instruction-TLB, a "smarter" branch predictor, and a double-wide decode unit.


There are numerical increments to the decode unit, micro-op queue, and micro-op cache. The out-of-order (OoO) engine sees similar increments, with 6-wide allocation and 12 execution ports, compared to 5-wide allocation and 10 execution ports for Cypress Cove. The execution stage sees the addition of a 5th execution port and ALU, FMA with FP16 support, and an updated fast adder (FADD). Similar improvements are made to the cache and memory sub-system. The dedicated L2 cache is 1.25 MB on client versions of Golden Cove and 2 MB on server/HEDT versions. The star attraction here is the new Matrix Execution engine, which is fixed-function hardware that accelerates matrix functions and, together with the new AMX instruction set, enables acceleration of neural network building/training and other tensor ops. An interesting change with Golden Cove is that Intel has removed AVX-512. There's no client-relevant truncated version of AVX-512, either. This is probably because there isn't much demand for 512-bit AVX right now, and Intel wants some ISA coherency between the P-core and E-core.

Hybrid Architecture and Intel Thread Director


Intel Thread Director is a highly specialized middleware that interfaces with the operating system and software on one side and the two groups of CPU cores on the other. Its job is to analyze a workload and assist the OS scheduler in distributing it between the P-core and E-core clusters at a granular level (both process-level and thread-level). Windows 11 is required for this symbiosis to work, but Windows 10 should still work somewhat well because Alder Lake also includes support for "preferred cores." Windows 11 also introduces the concept of quality of service (QoS) for software. This essentially lets applications tell the OS what the nature of the workload is, giving the Windows Scheduler a hint on whether it deserves the resources of the P-cores or could be relegated, and even confined, to the E-cores.

Thread Director monitors processor operation with nanosecond precision to accomplish this. There is mostly ISA coherency between the two core types; however, some processes may request features only found on the P-cores, such as AMX or DL Boost. Thread Director ensures that such processes are allocated only to the P-cores. The "dialog" between Thread Director and the OS scheduler also ensures that processes in the background, or idling, are relegated to the E-cores.
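
The two placement rules described above can be illustrated with a toy policy. This is only a sketch of the behavior as the text describes it, not Intel's or Microsoft's actual scheduling logic; every name in it (`Thread`, `place`, the feature labels) is hypothetical, and the default for ordinary foreground work is an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum


class CoreType(Enum):
    P_CORE = "P-core"
    E_CORE = "E-core"


# Hypothetical labels for the features the text lists as P-core-only.
P_CORE_ONLY_FEATURES = frozenset({"AMX", "DL Boost"})


@dataclass
class Thread:
    name: str
    required_features: frozenset = field(default_factory=frozenset)
    background: bool = False


def place(thread: Thread) -> CoreType:
    """Pick a core type for a thread using the two rules from the text."""
    # Rule 1: a thread that needs a P-core-only feature must run on a P-core,
    # even if it is otherwise background work.
    if thread.required_features & P_CORE_ONLY_FEATURES:
        return CoreType.P_CORE
    # Rule 2: background or idling work is relegated to the E-cores.
    if thread.background:
        return CoreType.E_CORE
    # Assumption: foreground work with no special needs defaults to P-cores.
    return CoreType.P_CORE


threads = [
    Thread("inference", frozenset({"AMX"}), background=True),
    Thread("indexer", background=True),
    Thread("game"),
]
placements = {t.name: place(t) for t in threads}
for name, core in placements.items():
    print(f"{name}: {core.value}")
```

Checking the ISA requirement before the background flag is a deliberate choice here: a thread cannot execute an instruction the core lacks, so the feature constraint has to win over the efficiency preference.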