Intel Lunar Lake Technical Deep Dive

jallenlabs · Jun 4, 2024

Wondering why they didn't discuss the upcoming launch of desktop chips...

atomsymbol · Jun 4, 2024

AnotherReader said:
That claim for equivalent IPC is probably for low clocks. Skymont is unlikely to clock as high as Raptor Cove.

IPC is a number that is independent from clock speed (because you are dividing by clock speed when computing IPC).

AnotherReader said:
Given that they are using TSMC's N3 process for Lunar Lake, the claim of equivalent IPC is plausible.

IPC doesn't depend on whether the process is 5nm or 3nm.

qcmadness · Jun 4, 2024

SL2 said:
Here are some numbers for Meteor.
View attachment 350047

For your information, one Zen 4c core with L1 / L2 cache is around 2.5 mm^2.

That means a 4-thread Zen 4c unit is smaller than 4-thread Redwood Cove cores.

AnotherReader · Jun 4, 2024

atomsymbol said:
IPC is a number that is independent from clock speed (because you are dividing by clock speed when computing IPC).

IPC doesn't depend on whether the process is 5nm or 3nm.

Average IPC for most non-trivial programs is affected by DRAM latency which is higher (in terms of clock cycles) for higher clocked processors. Average IPC is also a property of the microarchitecture and it's influenced by the node as a denser node allows the designer to spend more transistors on branch prediction and wider structures. Skymont is an outstanding example of this: using the latest TSMC process allowed Intel to create a much wider core than they could have with Intel 7 or even Intel 4.

atomsymbol · Jun 4, 2024

AnotherReader said:
Average IPC for most non-trivial programs is affected by DRAM latency which is higher (in terms of clock cycles) for higher clocked processors.

If a program were limited by DRAM latency or bandwidth, it would *already* have an IPC lower than 0.5. Most programs are slowed down by only a few percent when DRAM latency and bandwidth change by 200%. In other words: caches work well.

dgianstefani · Jun 4, 2024

AnotherReader said:
Average IPC for most non-trivial programs is affected by DRAM latency which is higher (in terms of clock cycles) for higher clocked processors.

Higher-clocked CPUs typically show lower latency in AIDA with the same memory speed and timings. E.g., a 7700X will post a lower memory latency in ns than a 7800X3D with identical memory and timings because it clocks higher.

AnotherReader · Jun 4, 2024

dgianstefani said:
Higher-clocked CPUs typically show lower latency in AIDA with the same memory speed and timings. E.g., a 7700X will post a lower memory latency in ns than a 7800X3D with identical memory and timings because it clocks higher.

I think I was unclear. I am referring to DRAM latency in terms of CPU clock cycles. A Raptor Cove core clocking close to a hypothetical 4 GHz will see lower effective DRAM latency than one clocked at 6 GHz despite the higher latency in ns.
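
As a rough illustration of the cycle-count point, here is a minimal Python sketch. The 80 ns and 75 ns latencies are made-up assumptions (not measurements), chosen so that the slower-clocked core is slightly worse in ns yet still better in cycles.

```python
def dram_latency_cycles(latency_ns: float, clock_ghz: float) -> float:
    """Convert a DRAM latency in nanoseconds into CPU clock cycles."""
    # 1 GHz means 1 cycle per nanosecond, so cycles = ns * GHz.
    return latency_ns * clock_ghz


# Hypothetical (clock in GHz, DRAM latency in ns) pairs for the same core.
for clock_ghz, latency_ns in [(4.0, 80.0), (6.0, 75.0)]:
    cycles = dram_latency_cycles(latency_ns, clock_ghz)
    print(f"{clock_ghz:.1f} GHz, {latency_ns:.0f} ns -> {cycles:.0f} cycles")
```

With these assumed numbers, the 4 GHz case waits 320 cycles per miss and the 6 GHz case waits 450 cycles.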

atomsymbol · Jun 4, 2024

AnotherReader said:
I think I was unclear. I am referring to DRAM latency in terms of CPU clock cycles. A Raptor Cove core clocking close to a hypothetical 4 GHz will see lower effective DRAM latency than one clocked at 6 GHz despite the higher latency in ns.

You can check whether GHz affects IPC in Linux with "perf stat -- program" by forcing your CPU into powersave mode. My Ryzen desktop CPU runs at 2.2 GHz when in powersave mode and at about 4.5 GHz in performance mode (single thread).
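
To compare the two runs, divide the `instructions` count by the `cycles` count that perf stat reports for each. A minimal Python sketch follows; the counter values are placeholders to be replaced with your own measurements, not real results.

```python
def ipc(instructions: int, cycles: int) -> float:
    """Instructions per cycle from two perf stat counters."""
    return instructions / cycles


# Placeholder counter values: substitute the numbers from your own runs.
runs = {
    "powersave": {"instructions": 10_000_000_000, "cycles": 5_000_000_000},
    "performance": {"instructions": 10_000_000_000, "cycles": 5_000_000_000},
}

for governor, counters in runs.items():
    value = ipc(counters["instructions"], counters["cycles"])
    print(f"{governor}: IPC = {value:.2f}")
```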

AnotherReader · Jun 4, 2024

atomsymbol said:
If a program were limited by DRAM latency or bandwidth, it would *already* have an IPC lower than 0.5. Most programs are slowed down by only a few percent when DRAM latency and bandwidth change by 200%. In other words: caches work well.

Caches work well, but even with large caches there are enough misses to main memory that the impact of DRAM latency is non-negligible. This is the reason why the stacked cache Ryzens do so well in gaming; they reduce effective memory latency by reducing the number of misses to DRAM. On the other hand, the regular Zen 4 SKUs, despite clocking higher, see lower IPC.

atomsymbol · Jun 4, 2024

AnotherReader said:
Caches work well, but even with large caches there are enough misses to main memory that the impact of DRAM latency is non-negligible. This is the reason why the stacked cache Ryzens do so well in gaming; they reduce effective memory latency by reducing the number of misses to DRAM. On the other hand, the regular Zen 4 SKUs, despite clocking higher, see lower IPC.

View attachment 350085

I think you are missing the fact that DRAM latency (in nanoseconds) has been approximately the same since the first DDR was introduced in 1998: DDR, DDR2, DDR3, DDR4 and DDR5 have approximately the same latency in nanoseconds!

AnotherReader · Jun 4, 2024

atomsymbol said:
I think you are missing the fact that DRAM latency (in nanoseconds) has been approximately the same since the first DDR was introduced in 1998: DDR, DDR2, DDR3, DDR4 and DDR5 have approximately the same latency in nanoseconds!

I didn't claim that latency of DRAM has gone down, and in any case, that has nothing to do with the fact that misses to DRAM affect the average IPC for many programs. My claim was about the effective access time for a load which depends upon the latency of various caches as well as DRAM and the hit rates at each level.
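
A minimal sketch of that effective access time, written as the usual average-memory-access-time recursion. All latencies and miss rates below are illustrative assumptions, not measurements of any real CPU.

```python
def effective_access_cycles(levels, dram_cycles):
    """Average load latency in cycles.

    levels: list of (hit_latency_cycles, miss_rate) ordered from L1 outward.
    dram_cycles: latency of a load that misses every cache level.
    """
    total = dram_cycles
    # Fold from the last-level cache back toward L1:
    # time(level) = hit_latency + miss_rate * time(next level).
    for hit_latency, miss_rate in reversed(levels):
        total = hit_latency + miss_rate * total
    return total


# Hypothetical hierarchy: (hit latency in cycles, miss rate) for L1, L2, L3.
baseline = [(4, 0.10), (14, 0.50), (50, 0.50)]
# Same hierarchy with a larger L3 that halves the L3 miss rate (assumption).
bigger_l3 = [(4, 0.10), (14, 0.50), (50, 0.25)]

print(effective_access_cycles(baseline, dram_cycles=300))
print(effective_access_cycles(bigger_l3, dram_cycles=300))
```

In this toy model, fewer misses to DRAM lower the average load latency even though the DRAM latency itself is unchanged.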

atomsymbol · Jun 4, 2024

AnotherReader said:
I didn't claim that latency of DRAM has gone down, and in any case, that has nothing to do with the fact that misses to DRAM affect the average IPC for many programs.

You were, by extension, claiming that CPUs in 1998 must have had much higher IPC than CPUs in 2024! No?

AnotherReader said:
Caches work well, but even with large caches there are enough misses to main memory that the impact of DRAM latency is non-negligible. This is the reason why the stacked cache Ryzens do so well in gaming; they reduce effective memory latency by reducing the number of misses to DRAM. On the other hand, the regular Zen 4 SKUs, despite clocking higher, see lower IPC.

View attachment 350085

From this chart, it is impossible to conclude whether it is DRAM latency or DRAM bandwidth, or both equally. L3 cache has lower latency than DRAM and higher bandwidth than DRAM.

Thus, the above chart doesn't prove your claim that DRAM latency matters for IPC.

AnotherReader · Jun 4, 2024

atomsymbol said:
You were, by extension, claiming that CPUs in 1998 must have had much higher IPC than CPUs in 2024! No?

From this chart, it is impossible to conclude whether it is DRAM latency or DRAM bandwidth, or both equally. L3 cache has lower latency than DRAM and higher bandwidth than DRAM.

Thus, the above chart doesn't prove your claim that DRAM latency matters for IPC.

No, I didn't claim that CPUs in 1998 had higher IPC than CPUs in 2024. My claim is that the same CPU, when restricted to a lower clock speed, will have slightly higher IPC for programs with working sets larger than the last level cache. Previous studies have demonstrated that it's DRAM latency that's more impactful than bandwidth for most programs. On the other hand, there are some scientific workloads that benefit from higher bandwidth. In any case, this discussion isn't really pertinent to the Lunar Lake deep dive so if you wish to continue it, we can PM each other.
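
A toy model of that claim, as a sketch only: it assumes a constant DRAM latency in ns, fully exposed misses (no overlap between them), and made-up values for the core CPI and the DRAM miss rate.

```python
def modeled_ipc(core_cpi, dram_misses_per_instr, dram_ns, clock_ghz):
    """IPC under a simple 'core CPI plus DRAM stall CPI' model."""
    # Each miss stalls for dram_ns nanoseconds, which is dram_ns * clock_ghz cycles.
    stall_cpi = dram_misses_per_instr * dram_ns * clock_ghz
    return 1.0 / (core_cpi + stall_cpi)


# Hypothetical workload: 0.4 cycles per instruction without DRAM stalls,
# 0.2 DRAM misses per 1000 instructions, 80 ns DRAM latency.
for clock_ghz in (4.0, 6.0):
    value = modeled_ipc(0.4, 0.0002, 80.0, clock_ghz)
    print(f"{clock_ghz:.1f} GHz: modeled IPC = {value:.3f}")
```

Under these assumptions the modeled IPC is slightly higher at 4 GHz than at 6 GHz, and the two are identical when the miss rate is zero.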

Daven · Jun 4, 2024

tfp said:
How is this any more expensive than for any other CPU designer that doesn't own its own fabs? If anything, it should be the same price or cheaper with Intel doing its own packaging. While not ideal, neither is Intel's current node or that node's capacity.

This is business 101. The fewer parts in a BOM that include another company's profit, the cheaper the product. Any part fabbed at a third-party fab has that fab's profit built into its price. For example, compare baking my own cakes in my kitchen and selling them with paying a bakery to make my cakes and then selling them.

atomsymbol said:
Skymont E-cores having the same IPC as Lunar Lake P-cores is most likely a temporary phenomenon. When Intel P-cores adopt the same front-end architecture as Zen5 and Skymont, P-cores will once again be outperforming E-cores in IPC by a significant margin. Note: Zen5 and Skymont actually have similar front-ends on a conceptual level, the only differences being (1) the lack of µop cache in Skymont compared to Zen5 and (2) Zen5 can fetch up to 2 basic blocks while Skymont can fetch up to 3 basic blocks (see https://en.wikipedia.org/wiki/Basic_block).

Or Intel drops P-cores and only sells E-cores now that the performance is getting closer.

atomsymbol · Jun 4, 2024

Daven said:
Or Intel drops P-cores and only sells E-cores now that the performance is getting closer.

P-cores (6 GHz) have 50% higher performance than E-cores (4 GHz) if both have the same IPC (6/4=1.5).

Note: The slide comparing Raptor Cove IPC with Skymont IPC says "Fixed frequency (iso)" which means that the P-core was running at about 4 GHz in that particular test.
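
The arithmetic above, as a small Python sketch. It assumes single-thread performance scales as IPC times clock, and the IPC value of 1.0 is only a placeholder since just the ratio matters.

```python
def relative_performance(ipc_a, ghz_a, ipc_b, ghz_b):
    """Performance of core A relative to core B, modeled as IPC * clock."""
    return (ipc_a * ghz_a) / (ipc_b * ghz_b)


# Equal IPC (placeholder value 1.0), P-core at 6 GHz versus E-core at 4 GHz.
print(relative_performance(1.0, 6.0, 1.0, 4.0))  # 1.5
```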

R0H1T · Jun 4, 2024

For some reason, probably one of the biggest points has been missed ~ AMD and Intel will compete on the same nodes for the first time ever! Probably the truest test of their uarch, head to head.

Denver · Jun 4, 2024

R0H1T said:
For some reason, probably one of the biggest points has been missed ~ AMD and Intel will compete on the same nodes for the first time ever! Probably the truest test of their uarch, head to head.

No, Lunar Lake is based on 3nm (TSMC) vs AMD Strix on 4nm. Intel will have an inherent efficiency advantage in the process.
It will be fun if they continue to lose in this regard to AMD.

ARF · Jun 4, 2024

R0H1T said:
For some reason, probably one of the biggest points has been missed ~ AMD and Intel will compete on the same nodes for the first time ever! Probably the truest test of their uarch, head to head.

AMD stays on the old node.

AMD: CCD: TSMC N4; IOD: TSMC N6
Intel: TSMC N3B + TSMC N6

The shrink from N5/N4 to N3 is larger and more substantial than the shrink from N7/N6 to N5/N4.

Denver said:
No, Lunar Lake is based on 3nm (TSMC) vs AMD Strix on 4nm. Intel will have an inherent efficiency advantage in the process.
It will be fun if they continue to lose in this regard to AMD.

Or the other option is that Intel strikes back and releases a new Conroe.

atomsymbol said:
P-cores (6 GHz) have 50% higher performance than E-cores (4 GHz) if both have the same IPC (6/4=1.5).

Note: The slide comparing Raptor Cove IPC with Skymont IPC says "Fixed frequency (iso)" which means that the P-core was running at about 4 GHz in that particular test.

They are comparing the 2024 E-core with the 2021 P-core, which is two generations older but still the latest one released in retail desktop CPUs.

R0H1T · Jun 4, 2024

Ok not exactly this gen but maybe zen5+ or zen6 next year?

THANATOS · Jun 4, 2024

The Xe-LPG iGPU powering graphics for Meteor Lake is impressive enough, as it beat the Radeon 780M RDNA 3 iGPU of competing Ryzen 7040 Phoenix processors of its time; and here, Intel is promising a 50% generational gain in iGPU performance on the back of the new Xe2 Battlemage architecture,

The 50% increase was against the 165U in Time Spy, which has a weaker IGP.
View attachment L3JjHnTd4H8ZBfvLVQRJTG-1200-80.jpg.webp

Still, according to the current graph, this LNL IGP is faster by an unknown amount than even MTL-H while consuming less power.
It will be interesting to see how it performs in reality.

R0H1T · Jun 4, 2024

ARF said:
Zen 6 is definitely not next year.

Pretty confident AMD will release at least some version of Zen 6 next year, even if it's just a paper launch. Between Apple, QC, Intel & other ARM entrants like Nvidia, AMD is again at significant risk. They certainly wouldn't want to hand over the laptop market on a platter by resting on their laurels like Intel did a decade back. That would be their second biggest mistake after Dozer :slap:

Remember they pushed up the Zen launch by half a year (1Q?) just because they wanted to get it out early.

ARF · Jun 4, 2024

R0H1T said:
Ok not exactly this gen but maybe zen5+ or zen6 next year?

Actually, they already compete on the same nodes.
Intel's Meteor Lake (Redwood Cove + Crestmont) is Intel 4 + TSMC N6 + TSMC N5
AMD's Raphael (Zen 4) is CCD: TSMC N5 & IOD: TSMC N6

R0H1T · Jun 4, 2024

Yeah I'm not keeping a close track of various flavors of 7/5/4/3/2 nm either, it's just marketing speak. The key point is they're all on TSMC ~

TSMC's Roadmap at a Glance: N3X, N2P, A16 Coming in 2025/2026

www.anandtech.com

ARF · Jun 4, 2024

R0H1T said:
Yeah I'm not keeping a close track of various flavors of 7/5/4/3/2 nm either, it's just marketing speak. The key point is they're all on TSMC ~

TSMC's Roadmap at a Glance: N3X, N2P, A16 Coming in 2025/2026

www.anandtech.com

What's important:

N5 to N3 = 10-15% more performance | N5 to N3E = 18% more performance

N3E to N3P = 5% more performance | N3P to N3X = 5% more performance
N3E to N2 = 10-15% more performance
N3E to N2P = 15-20% more performance

N2P to A16 = 8-10% more performance

-----------------------------------

N5 to N3X = 28% more performance
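
A quick check of how the per-step figures above combine, as a sketch. Whether such vendor figures should be added or compounded is an assumption here. Adding 18% + 5% + 5% gives the 28% above, while compounding the same three steps gives slightly more.

```python
from math import prod


def compounded_gain(step_gains):
    """Total gain when each step multiplies the previous performance level."""
    return prod(1.0 + gain for gain in step_gains) - 1.0


# N5 -> N3E, N3E -> N3P, N3P -> N3X, using the per-step figures listed above.
steps = [0.18, 0.05, 0.05]

print(f"additive:   {sum(steps):.1%}")
print(f"compounded: {compounded_gain(steps):.1%}")
```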

DavidC1 · Jun 4, 2024

Thanks for the overview but,

You have to understand that E-cores were developed by "removing things," from a typical core and are a frugal product of reduction, while the P-cores are developed by "adding things" to a typical core, and are a product of addition. This is why Intel has a vast canvas to add capabilities and performance to its E-cores; and this is precisely why Skymont is a breakthrough.

What the heck are you saying? E-cores aren't "P-cores with things removed". They are the product of a well-executing, balanced team. A 3x difference in core size, higher power efficiency, and only a 14% gap between them should tell you that. The fact that they perform that well should be applauded, because they got very close under that kind of constraint.

The conclusion is a big disappointment.

I know from people with insider info that the P-core IDC team is in shambles.

@atomsymbol

Skymont E-cores having the same IPC as Lunar Lake P-cores is most likely a temporary phenomenon. When Intel P-cores adopt the same front-end architecture as Zen5 and Skymont, P-cores will once again be outperforming E-cores in IPC by a significant margin. Note: Zen5 and Skymont actually have similar front-ends on a conceptual level, the only differences being (1) the lack of µop cache in Skymont compared to Zen5 and (2) Zen5 can fetch up to 2 basic blocks while Skymont can fetch up to 3 basic blocks (see https://en.wikipedia.org/wiki/Basic_block).

No. Just no. That's not why they did it.
