Intel Lunar Lake Technical Deep Dive - So many Revolutions in One Chip


Introduction


Intel today unveiled its Core Ultra 200V mobile processor series, codenamed Lunar Lake. This processor marks a fundamental shift in the way Intel builds processors, through a high degree of aggregation that includes memory-on-package (MoP). The Core Ultra 200V series targets a very specific kind of device: one that is thin and light, yet capable enough to power today's AI PC, including Microsoft's latest Copilot+ experiences, through on-device acceleration.

The PC industry is experiencing an explosion of client AI applications, from simple things like image manipulation, background blurring, or live language translation, to complex generative content tasks. Microsoft decided that an NPU (neural processing unit) with a certain minimum amount of AI inferencing power is what unlocks most of these AI experiences. The CPU is a very inefficient way to accelerate AI, as is the GPU. All GPU vendors now integrate AI acceleration hardware alongside their main SIMD machinery, but the kind of AI experiences Microsoft has planned, such as Windows Recall, require the AI to be running at all times, and keeping a GPU running all the time would impose too much of a power cost. The answer is the NPU: a device that meets a flat performance target (in this case, the 40 TOPS required for Copilot+) at a significantly lower power draw than the CPU cores or the GPU.



Intel Core Ultra 200V Lunar Lake isn't just a processor; it is a system on a chip (SoC), and there is no chipset beyond this package. There's no separate system memory either, as the processor comes with up to 32 GB of LPDDR5X memory on the package.

Intel has been making processors with hybrid CPU cores for years now. The idea behind hybrid is to have two (or more) kinds of CPU cores that operate in different performance-per-watt bands, so that the processor can better respond to software processing loads. The performance cores (or P-cores) are brought up to deal with intense compute workloads, while the efficiency cores (or E-cores) are prioritized for most kinds of idle or low-priority workloads. The Intel Thread Director is a hardware component that makes sure the right kind of CPU core deals with a given workload.
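To make the hybrid idea concrete, here is a minimal sketch of a placement heuristic. It is a toy illustration only and does not reflect how Thread Director or any operating system scheduler actually works; the `Task` fields, the `place` function, and the 0.6 threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    demand: float     # hypothetical 0.0-1.0 estimate of how compute-heavy the task is
    background: bool  # hypothetical flag marking low-priority work


def place(task: Task, threshold: float = 0.6) -> str:
    """Return "P" or "E" under a deliberately simple, assumed policy."""
    if task.background:
        # Low-priority work stays on the efficiency cores regardless of demand.
        return "E"
    # Foreground work goes to a P-core only when it is demanding enough.
    return "P" if task.demand >= threshold else "E"


tasks = [
    Task("video_export", demand=0.9, background=False),
    Task("mail_sync", demand=0.2, background=True),
    Task("text_editor", demand=0.3, background=False),
    Task("indexer", demand=0.8, background=True),
]
placement = {t.name: place(t) for t in tasks}
print(placement)
```

In this sketch only the demanding foreground task lands on a P-core; everything else is kept on the E-cores, which mirrors the general intent described above rather than any real scheduling algorithm.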

With Lunar Lake, Intel has updated the microarchitecture of all four key components of its SoC: the CPU compute complex introduces two new generations of CPU cores (one for the P-cores and one for the E-cores), the integrated graphics (iGPU) debuts a new graphics architecture, and the NPU has been both updated and supercharged to meet Copilot+ AI PC requirements. Besides these, Intel made many other on-silicon updates. The decision to go with a memory-on-package design has to do with where the competition is: the Apple M3 and Qualcomm Snapdragon X Elite have tiny PCB footprints and very tight power budgets, yet are able to offer contemporary AI PC experiences in a thin-and-light form factor. This is what Intel is after, and Lunar Lake builds on the packaging and microarchitecture innovations of both the Meteor Lake and Lakefield series.

In this article, we bring you a technical deep-dive into the Core Ultra 200V processor, and the Lunar Lake microarchitecture driving it. As of this writing, Intel hasn't announced specific processor models; it will probably do so at its Computex reveal.

The Lunar Lake Package and Tiles


Lunar Lake introduces a new way of organizing the various components of the SoC. The processor's chiplet design is built on essentially the same Foveros packaging technology as Meteor Lake, except that Intel has moved the various components around. If you recall, Meteor Lake was a highly disaggregated processor, with the CPU cores located in the Compute tile built on the Intel 4 node, the SoC and I/O tile on the TSMC 6 nm node, and the Graphics tile on the 5 nm node, all sitting on a 20 nm-class base tile that facilitates high-density microscopic wiring among the various tiles, like an interposer would.

With Lunar Lake, Intel has consolidated the Compute Tile, Graphics Tile, and most logic-heavy components of the former SoC tile, into a single Compute tile built on TSMC's 3 nm foundry node, while all the onboard controllers and I/O heavy components are disaggregated to the Platform I/O tile, which is built on the TSMC 6 nm process.
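The consolidation can be summarized as a simple mapping from functional block to tile. The sketch below only restates the tile and node names given in the text; the dictionary layout and the `active_tiles` helper are hypothetical, and the passive base tile is left out of both layouts.

```python
# Functional block -> (tile, process node), as described in the text.
METEOR_LAKE = {
    "CPU cores": ("Compute tile", "Intel 4"),
    "Graphics": ("Graphics tile", "5 nm"),
    "SoC and I/O": ("SoC and I/O tile", "TSMC 6 nm"),
}

LUNAR_LAKE = {
    "CPU cores": ("Compute tile", "TSMC 3 nm"),
    "Graphics": ("Compute tile", "TSMC 3 nm"),
    "SoC logic": ("Compute tile", "TSMC 3 nm"),
    "Controllers and I/O": ("Platform I/O tile", "TSMC 6 nm"),
}


def active_tiles(layout: dict) -> list:
    """Return the distinct logic tiles used by a layout, sorted by name."""
    return sorted({tile for tile, _node in layout.values()})


print(active_tiles(METEOR_LAKE))
print(active_tiles(LUNAR_LAKE))
```

Read this way, the logic that Meteor Lake spread across three kinds of tiles ends up on two in Lunar Lake.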


The base tile sits on the fiberglass substrate, which has fine, high-density wiring to the two LPDDR5X memory chips. These provide up to 32 GB of memory across two ranks, at a memory speed of LPDDR5X-8500. Placing the memory on the package lowers memory PHY (physical layer) power by 40% compared with memory chips on the motherboard, or socketed SO-DIMM or CAMM2 modules.
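As a rough worked example, the theoretical peak bandwidth implied by that memory speed can be estimated from the transfer rate and the bus width. The 8500 MT/s figure comes from the text; the 128-bit total bus width is an assumption that the text does not state, and the function name is hypothetical.

```python
def peak_bandwidth_gb_s(transfer_rate_mt_s: float, bus_width_bits: int) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return transfer_rate_mt_s * 1e6 * bytes_per_transfer / 1e9


RATE_MT_S = 8500       # LPDDR5X-8500, as stated in the text
BUS_WIDTH_BITS = 128   # ASSUMPTION: total memory bus width, not given in the text

bandwidth = peak_bandwidth_gb_s(RATE_MT_S, BUS_WIDTH_BITS)
print(f"{bandwidth:.0f} GB/s")
```

Under these assumptions the peak works out to 136 GB/s. Real-world throughput will be lower, and a different bus width scales the result proportionally.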

The CPU Cores


Intel Core Ultra 200V isn't a direct successor to the entire Core Ultra 100 Meteor Lake family, but rather a new class of processors meant for thin-and-light notebooks, the same class that is powered by the Apple M3 or the Qualcomm Snapdragon X Elite. Given this, Intel has a very specific set of targets for CPU, graphics, and AI acceleration performance, with the key driver being performance-per-watt competitiveness against the Apple and Qualcomm chips.

The Lunar Lake CPU complex has a total of 8 CPU cores: four are the new Lion Cove performance cores (P-cores), and the other four are Skymont efficiency cores (E-cores). Unlike in Meteor Lake, or indeed all past generations of Intel hybrid processors, the P-cores and E-cores do not share an L3 cache or a common ringbus. They do share the same die, and are connected through the die's internal high-bandwidth fabric.


The four P-cores are part of a small ringbus network, with ring-stops along the four P-cores, and segments of a 12 MB L3 cache that's shared among the four. The E-core cluster, on the other hand, is an "island," much like the low-power island cores of Meteor Lake. The cluster's 4 MB L2 cache serves as the last-level cache for the four Skymont E-cores. Processing threads migrate between the P-core ring and the E-core island cluster seamlessly. Intel has made several improvements to Thread Director, which we'll get to in a bit.
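The cache arrangement described above can be captured in a small data model. This is a sketch built only from the figures in the text (four Lion Cove cores sharing 12 MB of L3, four Skymont cores sharing 4 MB of L2); the `Cluster` class, the helper functions, and the core numbering (0-3 for P-cores, 4-7 for E-cores) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    name: str
    core_type: str
    core_ids: tuple  # hypothetical core numbering
    llc_level: str   # which cache acts as the last-level cache
    llc_mb: int


P_RING = Cluster("P-core ring", "Lion Cove", (0, 1, 2, 3), "L3", 12)
E_ISLAND = Cluster("E-core island", "Skymont", (4, 5, 6, 7), "L2", 4)
CLUSTERS = (P_RING, E_ISLAND)


def cluster_of(core_id: int) -> Cluster:
    """Find the cluster that contains a given core."""
    for cluster in CLUSTERS:
        if core_id in cluster.core_ids:
            return cluster
    raise ValueError(f"unknown core id: {core_id}")


def shares_llc(core_a: int, core_b: int) -> bool:
    """Two cores share a last-level cache only if they sit in the same cluster."""
    return cluster_of(core_a) is cluster_of(core_b)


def llc_per_core_mb(cluster: Cluster) -> float:
    """Average last-level cache capacity per core in a cluster."""
    return cluster.llc_mb / len(cluster.core_ids)


total_cores = sum(len(c.core_ids) for c in CLUSTERS)
print(total_cores, llc_per_core_mb(P_RING), llc_per_core_mb(E_ISLAND))
print(shares_llc(0, 3), shares_llc(3, 4))
```

The model makes one consequence of the design easy to see: a thread that moves between the P-core ring and the E-core island also moves to a different last-level cache.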
