Intel Core Ultra Arrow Lake Preview 159

Intel Core Ultra Arrow Lake Preview

Intel LGA1851 Platform & Z890 »

The CPU Cores of Arrow Lake

Lion Cove P-core


The new "Lion Cove" P-core replaces the "Raptor Cove" cores on the previous generation, as the main compute muscle of Arrow Lake. This is also the P-core on "Lunar Lake," and much like on that chip, the P-cores on Arrow Lake are arranged along a ring-bus with an L3 cache being shared among the P-cores. There is a slight difference, though, the dedicated L2 cache has been enlarged to 3 MB from 2.5 MB on the P-cores of Lunar Lake. The eight P-cores share a 36 MB L3 cache, compared to the 12 MB on "Lunar Lake."


Intel has come out with a new set of IPC gain claims for "Lion Cove." For "Arrow Lake," Intel claims a 9% IPC gain for "Lion Cove," compared to the "Raptor Cove" core of "Raptor Lake-S." In the "Lunar Lake" architecture reveal, the company claimed a 14% IPC gain, but that claim was made against the "Redwood Cove" P-core of a 15 W model of "Meteor Lake," since "Lunar Lake" is a one of a kind fully-integrated chip that has no predecessors except chips of a comparable power class from "Meteor Lake."

Much like on "Lunar Lake," the "Lion Cove" P-core of "Arrow Lake" lacks Hyper-Threading, Intel has physically removed the on-core components needed for SMT. This was done to reduce the size of the core, and with the idea that the IPC and energy-efficiency gains would overcome the need for SMT, together with physical threads on E-Cores. Having gained both die-area and power headroom from deprecating SMT, Intel moved on to overhauling the microarchitecture itself. There are improvements to all key components, including a redesigned front-end, with an 8 times larger branch prediction block. The fetch unit and decode bandwidth has been increased. The micro-op cache has increased in capacity, and Intel has introduced the concept of "nano-ops," which are groups of similar broken down micro-operation tasks that can be executed in tandem.


The Integer and Vector domains, which form the out-of-order execution engine, has now split, with individual access to the micro-op queue, and independent schedulers. The out-of-order engine sees the allocation rename/move elimination queue to now be 8-wide compared to 6-wide on Redwood Cove. The retirement queue is broadened 50% to 12-wide from 8-wide. The instruction window depth has increased from 512 to 576. Execution ports have gone up from 12 to 18.

The Integer ALUs have increased in number from 5 to 6, with 3 jump units instead of 2, 3 shift units instead of 2, and three MUL units instead of 1. The vector execution engine sees 4 SIMD ALUs instead of 3, two FMA units @ 4 cycle, and 2 divider units. The load-store subsystem sees 128 DTLB size, up from 96, and 3 STE address generators instead of 2.

Intel has also redone the core-level cache subsystem, with the introduction of an intermediate data cache between the 48 KB L1 and L2. The L1D cache is now referred to as the L0 D-cache, which retires to a 192 KB L1 D-cache. This talks to the L2 cache. On Arrow Lake, this core gets 3 MB (3,072 KB) of L2 dedicated L2 cache. Intel has deployed a new AI-based power management system for the P-cores, which is controlled by the SMU of the cores themselves. The CPU clock of the P-cores now have a fine 16.67 MHz granularity.

Skymont E-core


The new "Skymont" E-core was the single biggest talking point about "Lunar Lake." Intel achieved a 38% IPC uplift over the "Crestmont" E-cores of "Meteor Lake" in integer workloads, and a mammoth 68% IPC uplift over "Crestmont" in floating point workloads. There is a catch here. The E-cores on "Lunar Lake" are not part of a ring-bus with the P-cores (i.e. sharing their L3 cache), but separated out into a low-power island E-core cluster, and so Intel was making these IPC comparisons with the "Crestmont" low-power island E-cores of "Meteor Lake," located in the SoC tile. What's changed with "Arrow Lake" is that now all the "Skymont" E-core clusters are part of the Compute ring-bus, and share the L3 cache with the P-cores. The comparison is hence being made with the "Gracemont" E-cores of "Raptor Lake-S." All this said, the company claims a 32% IPC uplift over "Gracemont," which is still a mighty impressive generational gain.


This 32% IPC uplift of the "Skymont" E-cores on "Arrow Lake" play a crucial role in ensuring that the processor ends up posting generational gains in multithreaded productivity performance despite the P-cores no longer featuring SMT. The 16 E-cores on "Arrow Lake" are arranged in clusters of 4 cores each, and each cluster shares a 4 MB L2 cache among its cores. Intel claims to have doubled the bandwidth of this shared L2 cache compared to the "Gracemont" E-core clusters of "Raptor Lake-S."

Skymont's journey towards becoming something special begins with the fetch and branch prediction unit, which now looks 128 bytes ahead for possible branches, which speeds up the instruction fetch. Up to 96 instruction bytes are fetched in parallel. The front-end sees a new 9-wide decode (compared to 6-wide on Crestmont), support for nano-code (similar segments of microcode clumped together for greater parallelism), and a broader micro-op queue of 96 entries instead of 64 on the previous generation.


The out-of-order engine sees the meat of the updates. The allocation queue is 8-wide (up from 6-wide), and the retire queue is 16-wide (up from 8-wide). There is intelligence behind dependency breaking. The out-of-order window is broadened to 416 entries, up from 256, as are the physical register files, reservation stations, and load-store unit buffering.

There are 26 dispatch ports to the execution engine, leading to 8 integer ALUs, with 3 jump ports, and 3 loads per cycle (50% increase).

The vector engine features four 128-bit FPUs, doubling GigaFLOPS. The FMUL, FADD, and FMA latencies are reduced. FP rounding now sees native hardware acceleration. Additional execution units should also benefit AI performance. The load/store unit sees 33% to 50% increases in loads, and store address generation performance. The L2 TLB has increased in size to 4,192 entries, up from 3,096.

Intel 3rd Gen Thread Director


With "Arrow Lake," Intel is introducing the third generation of Thread Director, its hardware-based scheduler that ensures the right kind of software workload is handled by the right kind of CPU core. This new version comes with a more accurate hardware-based E-core performance feedback mechanism, which tells it the kind of E-core resources available. Intel also introduced new P-core performance telemetry for more accurate direction of threads to the P-cores. Intel also gave Thread Director its most accurate prediction model.


With "Skymont" cores having such significantly higher IPC, Thread Director on "Arrow Lake-S" prioritizes all non-gaming productivity workloads to the E-cores, and upgrades threads to the P-cores only as needed. Thread Director plays a significant role in improving the processor's overall power efficiency.
Next Page »Intel LGA1851 Platform & Z890
View as single page
Dec 22nd, 2024 02:04 EST change timezone

New Forum Posts

Popular Reviews

Controversial News Posts