
Intel Lunar Lake Technical Deep Dive

Joined
Jun 2, 2014
Messages
536 (0.14/day)
Location
Midwest USA
System Name Core
Processor Intel 12700k @ 5.2/4.0
Motherboard ASRock z690 Steel Legend
Cooling Arctic Cooling Freezer 420 AiO
Memory GSkill 64GB 3200 cas 14 b die
Video Card(s) ASRock Intel ARC a750
Storage Optane 900p x2, SK Hynix p41 Pro, p31 Pro
Display(s) ACER 250hz 1080p 25" IPS display/AOC 22" display
Case Phanteks p500a with all Arctic/Thermaltake fans
Audio Device(s) Focusrite interface, Presonus Studio Monitors and Subwoofer
Power Supply Seasonic 850w plat with cable mod cables
Mouse Logitech G502 Hero
Keyboard Corsair mech k65
Software Win 11 Pro
Wondering why they didn't discuss the upcoming launch of desktop chips...
 
Joined
Mar 17, 2017
Messages
97 (0.03/day)
Location
Europe
Processor Ryzen 9 9950X
Motherboard X670 chipset
Cooling SPC Fera 5
Memory 64 GiB
Video Card(s) RX 6700XT
Storage WD Black SN750, Seagate FireCuda 530, Samsung SSD 850 Pro, WD Blue HDD, Seagate IronWolf HDD
Display(s) Samsung (4K, FreeSync)
Power Supply EVGA 750 B5
Mouse Eternico wireless mouse
Keyboard HyperX Alloy Origins Core Aqua with Corsair Onyx Black keycaps
Software Linux + KVM
That claim for equivalent IPC is probably for low clocks. Skymont is unlikely to clock as high as Raptor Cove.

IPC is a number that is independent from clock speed (because you are dividing by clock speed when computing IPC).

Given that they are using TSMC's N3 process for Lunar Lake, the claim of equivalent IPC is plausible.

IPC doesn't depend on whether the process is 5nm or 3nm.
 
Joined
Nov 26, 2021
Messages
1,600 (1.49/day)
Location
Mississauga, Canada
Processor Ryzen 7 5700X
Motherboard ASUS TUF Gaming X570-PRO (WiFi 6)
Cooling Noctua NH-C14S (two fans)
Memory 2x16GB DDR4 3200
Video Card(s) Reference Vega 64
Storage Intel 665p 1TB, WD Black SN850X 2TB, Crucial MX300 1TB SATA, Samsung 830 256 GB SATA
Display(s) Nixeus NX-EDG27, and Samsung S23A700
Case Fractal Design R5
Power Supply Seasonic PRIME TITANIUM 850W
Mouse Logitech
VR HMD Oculus Rift
Software Windows 11 Pro, and Ubuntu 20.04
IPC is a number that is independent from clock speed (because you are dividing by clock speed when computing IPC).



IPC doesn't depend on whether the process is 5nm or 3nm.
Average IPC for most non-trivial programs is affected by DRAM latency which is higher (in terms of clock cycles) for higher clocked processors. Average IPC is also a property of the microarchitecture and it's influenced by the node as a denser node allows the designer to spend more transistors on branch prediction and wider structures. Skymont is an outstanding example of this: using the latest TSMC process allowed Intel to create a much wider core than they could have with Intel 7 or even Intel 4.
 
Joined
Mar 17, 2017
Messages
97 (0.03/day)
Location
Europe
Average IPC for most non-trivial programs is affected by DRAM latency which is higher (in terms of clock cycles) for higher clocked processors.

If the program were affected by DRAM latency or bandwidth, it would *already* have an IPC lower than 0.5. Most programs slow down by only a few percent when DRAM latency or bandwidth changes by a factor of 2. In other words: caches work well.
 

dgianstefani

TPU Proofreader
Staff member
Joined
Dec 29, 2017
Messages
4,990 (2.00/day)
Location
Swansea, Wales
System Name Silent
Processor Ryzen 7800X3D @ 5.15ghz BCLK OC, TG AM5 High Performance Heatspreader
Motherboard ASUS ROG Strix X670E-I, chipset fans replaced with Noctua A14x25 G2
Cooling Optimus Block, HWLabs Copper 240/40 + 240/30, D5/Res, 4x Noctua A12x25, 1x A14G2, Mayhems Ultra Pure
Memory 32 GB Dominator Platinum 6150 MT 26-36-36-48, 56.6ns AIDA, 2050 FCLK, 160 ns tRFC, active cooled
Video Card(s) RTX 3080 Ti Founders Edition, Conductonaut Extreme, 18 W/mK MinusPad Extreme, Corsair XG7 Waterblock
Storage Intel Optane DC P1600X 118 GB, Samsung 990 Pro 2 TB
Display(s) 32" 240 Hz 1440p Samsung G7, 31.5" 165 Hz 1440p LG NanoIPS Ultragear, MX900 dual gas VESA mount
Case Sliger SM570 CNC Aluminium 13-Litre, 3D printed feet, custom front, LINKUP Ultra PCIe 4.0 x16 white
Audio Device(s) Audeze Maxwell Ultraviolet w/upgrade pads & LCD headband, Galaxy Buds 3 Pro, Razer Nommo Pro
Power Supply SF750 Plat, full transparent custom cables, Sentinel Pro 1500 Online Double Conversion UPS w/Noctua
Mouse Razer Viper Pro V2 8 KHz Mercury White w/Tiger Ice Skates & Pulsar Supergrip tape
Keyboard Wooting 60HE+ module, TOFU-R CNC Alu/Brass, SS Prismcaps W+Jellykey, LekkerV2 mod, TLabs Leath/Suede
Software Windows 11 IoT Enterprise LTSC 24H2
Benchmark Scores Legendary
Average IPC for most non-trivial programs is affected by DRAM latency which is higher (in terms of clock cycles) for higher clocked processors.
Higher-clocked CPUs typically test lower AIDA latency with the same memory speed/timings. E.g., a 7700X will post lower memory latency (in ns) than a 7800X3D with identical memory and timings because it clocks higher.
 
Joined
Nov 26, 2021
Messages
1,600 (1.49/day)
Location
Mississauga, Canada
Higher-clocked CPUs typically test lower AIDA latency with the same memory speed/timings. E.g., a 7700X will post lower memory latency (in ns) than a 7800X3D with identical memory and timings because it clocks higher.
I think I was unclear. I am referring to DRAM latency in terms of CPU clock cycles. A Raptor Cove core clocking close to a hypothetical 4 GHz will see lower effective DRAM latency than one clocked at 6 GHz despite the higher latency in ns.
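
To put rough numbers on the cycles-versus-nanoseconds point, here is a minimal Python sketch. The 80 ns latency is a made-up value chosen purely for illustration, not a measurement of any real system, and the function name is hypothetical:

```python
def dram_latency_cycles(latency_ns: float, clock_ghz: float) -> float:
    """Convert a DRAM latency in nanoseconds into CPU clock cycles.

    One GHz is one cycle per nanosecond, so cycles = ns * GHz.
    """
    return latency_ns * clock_ghz


# Hypothetical latency, assumed identical at both clocks for simplicity.
ASSUMED_LATENCY_NS = 80.0

for clock_ghz in (4.0, 6.0):
    cycles = dram_latency_cycles(ASSUMED_LATENCY_NS, clock_ghz)
    print(f"{clock_ghz:.1f} GHz: {cycles:.0f} cycles per DRAM access")
```

Under that assumption the 6 GHz core waits 1.5x as many cycles per miss as the 4 GHz core. In this example the 4 GHz core's latency would have to rise to 120 ns before its cycle count caught up.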
 
Joined
Mar 17, 2017
Messages
97 (0.03/day)
Location
Europe
I think I was unclear. I am referring to DRAM latency in terms of CPU clock cycles. A Raptor Cove core clocking close to a hypothetical 4 GHz will see lower effective DRAM latency than one clocked at 6 GHz despite the higher latency in ns.

You can check whether GHz affects IPC in Linux with "perf stat -- program" by forcing your CPU into powersave mode. My Ryzen desktop CPU runs at 2.2 GHz when in powersave mode and at about 4.5 GHz in performance mode (single thread).
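
A rough way to script that comparison is sketched below in Python. It assumes `perf` is installed and that `perf stat -x,` writes comma-separated lines to stderr whose first three fields are the counter value, a unit, and the event name; the helper names are hypothetical, and switching the governor between runs is left to you:

```python
import subprocess


def parse_perf_csv(stderr_text: str) -> dict:
    """Extract instruction and cycle counts from `perf stat -x,` output.

    Assumes each CSV line starts with: value, unit, event name, ...
    """
    counts = {}
    for line in stderr_text.splitlines():
        fields = line.split(",")
        if len(fields) < 3:
            continue
        try:
            value = float(fields[0])
        except ValueError:
            continue  # e.g. "<not counted>" or output from the program itself
        event = fields[2]
        for name in ("instructions", "cycles"):
            if name in event:
                counts[name] = counts.get(name, 0.0) + value
    return counts


def measure_ipc(command: list) -> float:
    """Run `command` under perf stat and return instructions per cycle."""
    result = subprocess.run(
        ["perf", "stat", "-x,", "-e", "instructions,cycles", "--", *command],
        stderr=subprocess.PIPE,
        stdout=subprocess.DEVNULL,
        text=True,
        check=True,
    )
    counts = parse_perf_csv(result.stderr)
    return counts["instructions"] / counts["cycles"]
```

Calling `measure_ipc(["./program"])` once in powersave mode and once in performance mode (with `./program` standing in for your own benchmark) gives two IPC values to compare directly.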
 
Joined
Nov 26, 2021
Messages
1,600 (1.49/day)
Location
Mississauga, Canada
If the program were affected by DRAM latency or bandwidth, it would *already* have an IPC lower than 0.5. Most programs slow down by only a few percent when DRAM latency or bandwidth changes by a factor of 2. In other words: caches work well.
Caches work well, but even with large caches there are enough misses to main memory that the impact of DRAM latency is non-negligible. This is the reason why the stacked cache Ryzens do so well in gaming; they reduce effective memory latency by reducing the number of misses to DRAM. On the other hand, the regular Zen 4 SKUs, despite clocking higher, see lower IPC.

1717525255597.png
 
Joined
Mar 17, 2017
Messages
97 (0.03/day)
Location
Europe
Caches work well, but even with large caches there are enough misses to main memory that the impact of DRAM latency is non-negligible. This is the reason why the stacked cache Ryzens do so well in gaming; they reduce effective memory latency by reducing the number of misses to DRAM. On the other hand, the regular Zen 4 SKUs, despite clocking higher, see lower IPC.

View attachment 350085

I think you are missing the fact that DRAM latency (in nanoseconds) has been approximately the same since the first DDR was introduced in 1998: DDR, DDR2, DDR3, DDR4 and DDR5 have approximately the same latency in nanoseconds!
 
Joined
Nov 26, 2021
Messages
1,600 (1.49/day)
Location
Mississauga, Canada
I think you are missing the fact that DRAM latency (in nanoseconds) has been approximately the same since the first DDR was introduced in 1998: DDR, DDR2, DDR3, DDR4 and DDR5 have approximately the same latency in nanoseconds!
I didn't claim that latency of DRAM has gone down, and in any case, that has nothing to do with the fact that misses to DRAM affect the average IPC for many programs. My claim was about the effective access time for a load which depends upon the latency of various caches as well as DRAM and the hit rates at each level.
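
The effective access time described here can be sketched in a few lines of Python. This is a simplified model under my own assumptions: each level is described by its load-to-use latency and hit rate, a load pays only the latency of the level that finally serves it, and the function name is hypothetical:

```python
def effective_access_cycles(levels, dram_cycles):
    """Average load latency in cycles for a cache hierarchy.

    `levels` is a list of (hit_latency_cycles, hit_rate) pairs ordered
    from L1 outward; loads that miss every level pay `dram_cycles`.
    """
    total = 0.0
    reach_probability = 1.0  # fraction of loads that get this far
    for hit_latency, hit_rate in levels:
        total += reach_probability * hit_rate * hit_latency
        reach_probability *= 1.0 - hit_rate
    return total + reach_probability * dram_cycles


# Invented hit rates and latencies, for illustration only.
hierarchy = [(4, 0.95), (14, 0.80), (50, 0.70)]
for dram_cycles in (320, 480):
    average = effective_access_cycles(hierarchy, dram_cycles)
    print(f"DRAM at {dram_cycles} cycles -> {average:.2f} cycles per load")
```

Even with only a small fraction of loads reaching DRAM, the DRAM term is a visible share of the average, and it grows when the same latency in ns is counted in more cycles.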
 
Joined
Mar 17, 2017
Messages
97 (0.03/day)
Location
Europe
I didn't claim that latency of DRAM has gone down, and in any case, that has nothing to do with the fact that misses to DRAM affect the average IPC for many programs.

You were, by extension, claiming that CPUs in 1998 must have had much higher IPC than CPUs in 2024! No?

Caches work well, but even with large caches there are enough misses to main memory that the impact of DRAM latency is non-negligible. This is the reason why the stacked cache Ryzens do so well in gaming; they reduce effective memory latency by reducing the number of misses to DRAM. On the other hand, the regular Zen 4 SKUs, despite clocking higher, see lower IPC.

View attachment 350085

From this chart, it is impossible to conclude whether it is DRAM latency or DRAM bandwidth, or both equally. L3 cache has lower latency than DRAM and higher bandwidth than DRAM.

Thus, the above chart doesn't prove your claim that DRAM latency matters for IPC.
 
Joined
Nov 26, 2021
Messages
1,600 (1.49/day)
Location
Mississauga, Canada
You were, by extension, claiming that CPUs in 1998 must have had much higher IPC than CPUs in 2024! No?



From this chart, it is impossible to conclude whether it is DRAM latency or DRAM bandwidth, or both equally. L3 cache has lower latency than DRAM and higher bandwidth than DRAM.

Thus, the above chart doesn't prove your claim that DRAM latency matters for IPC.
No, I didn't claim that CPUs in 1998 had higher IPC than CPUs in 2024. My claim is that the same CPU, when restricted to a lower clock speed, will have slightly higher IPC for programs with working sets larger than the last level cache. Previous studies have demonstrated that it's DRAM latency that's more impactful than bandwidth for most programs. On the other hand, there are some scientific workloads that benefit from higher bandwidth. In any case, this discussion isn't really pertinent to the Lunar Lake deep dive so if you wish to continue it, we can PM each other.
 
Joined
Dec 12, 2016
Messages
1,751 (0.61/day)
How is this any more expensive than for any other CPU designer that doesn't own its own fabs? If anything, it should be the same price or cheaper with Intel doing its own packaging. While not ideal, neither is Intel's current node or that node's capacity.
This is Business 101. The fewer parts in a BOM that include other companies' profits, the cheaper the product. Any part fabbed at a third-party fab includes that fab's profit in its price. For example, compare making my own cakes in my kitchen and selling them with paying a bakery to make the cakes that I then sell.

Skymont E-cores having the same IPC as Lunar Lake P-cores is most likely a temporary phenomenon. When Intel P-cores adopt the same front-end architecture as Zen5 and Skymont, P-cores will once again be outperforming E-cores in IPC by a significant margin. Note: Zen5 and Skymont actually have similar front-ends on a conceptual level, the only differences being (1) the lack of µop cache in Skymont compared to Zen5 and (2) Zen5 can fetch up to 2 basic blocks while Skymont can fetch up to 3 basic blocks (see https://en.wikipedia.org/wiki/Basic_block).
Or Intel drops P-cores and only sells E-cores now that the performance is getting closer.
 
Joined
Mar 17, 2017
Messages
97 (0.03/day)
Location
Europe
Or Intel drops P-cores and only sells E-cores now that the performance is getting closer.

P-cores (6 GHz) have 50% higher performance than E-cores (4 GHz) if both have the same IPC (6/4=1.5).

Note: The slide comparing RaptorCove IPC with Skymont IPC says "Fixed frequency (iso)" which means that the P-core was running at about 4 GHz in that particular test.
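
The 6/4 arithmetic can be written out as a minimal Python sketch, assuming the usual first-order model in which single-thread performance is IPC times clock frequency (the function name is hypothetical):

```python
def relative_performance(ipc_a, ghz_a, ipc_b, ghz_b):
    """Single-thread performance of core A relative to core B.

    Uses the first-order model: performance = IPC * clock frequency.
    """
    return (ipc_a * ghz_a) / (ipc_b * ghz_b)


# Equal IPC (the value cancels out), P-core at 6 GHz vs E-core at 4 GHz.
ratio = relative_performance(1.0, 6.0, 1.0, 4.0)
print(f"P-core is {(ratio - 1) * 100:.0f}% faster")  # prints "P-core is 50% faster"
```

At a fixed (iso) frequency the clock terms cancel as well, so the ratio reduces to the IPC ratio alone, which is what that slide is comparing.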
 
Joined
Apr 12, 2013
Messages
7,473 (1.77/day)
For some reason, probably one of the biggest points has been missed: AMD and Intel will compete on the same nodes for the first time ever! Probably the truest test of their uarch, head to head o_O
 
Joined
Oct 6, 2021
Messages
1,605 (1.43/day)
For some reason, probably one of the biggest points has been missed: AMD and Intel will compete on the same nodes for the first time ever! Probably the truest test of their uarch, head to head o_O
No, Lunar Lake is based on 3nm (TSMC) vs AMD Strix on 4nm. Intel will have an inherent efficiency advantage in the process.
It will be fun if they continue to lose in this regard to AMD.
 

ARF

Joined
Jan 28, 2020
Messages
4,670 (2.69/day)
Location
Ex-usa | slava the trolls
For some reason, probably one of the biggest points has been missed: AMD and Intel will compete on the same nodes for the first time ever! Probably the truest test of their uarch, head to head o_O

AMD stays on the old node.

AMD: CCD: TSMC N4; IOD: TSMC N6
Intel: TSMC N3B + TSMC N6

The shrink from N5/N4 to N3 is larger and more substantial than the shrink from N7/N6 to N5/N4.

No, Lunar Lake is based on 3nm (TSMC) vs AMD Strix on 4nm. Intel will have an inherent efficiency advantage in the process.
It will be fun if they continue to lose in this regard to AMD.

Or the other option is that Intel strikes back and releases a new Conroe o_O

P-cores (6 GHz) have 50% higher performance than E-cores (4 GHz) if both have the same IPC (6/4=1.5).

Note: The slide comparing RaptorCove IPC with Skymont IPC says "Fixed frequency (iso)" which means that the P-core was running at about 4 GHz in that particular test.

They are comparing the 2024 E-core with the 2021 P-core, which is two generations older but still the latest one released in retail desktop CPUs.
 
Joined
Jan 24, 2011
Messages
179 (0.04/day)
The Xe-LPG iGPU powering graphics for Meteor Lake is impressive enough, as it beat the Radeon 780M RDNA 3 iGPU of competing Ryzen 7040 Phoenix processors of its time; and here, Intel is promising a 50% generational gain in iGPU performance on the back of the new Xe2 Battlemage architecture,
The 50% increase was against the 165U in Time Spy, which has a weaker IGP.
View attachment L3JjHnTd4H8ZBfvLVQRJTG-1200-80.jpg.webp

Still, according to the current graph, this LNL IGP is faster by an unknown amount than even MTL-H while consuming less power.
It will be interesting to see how it performs in reality.
 
Joined
Apr 12, 2013
Messages
7,473 (1.77/day)
Zen 6 is definitely not next year.
I'm pretty confident AMD will release at least some version of Zen 6 next year, even if it's just a paper launch. Between Apple, QC, Intel, and other ARM entrants like Nvidia, AMD is again at significant risk. They certainly wouldn't want to hand over the laptop market on a platter by resting on their laurels like Intel did a decade ago. That would be their second-biggest mistake after Dozer :slap:

Remember, they moved the Zen launch up by half a year (1Q?) just because they wanted to get it out early.
 

ARF

Joined
Jan 28, 2020
Messages
4,670 (2.69/day)
Location
Ex-usa | slava the trolls
Ok not exactly this gen but maybe zen5+ or zen6 next year?

Actually, they already compete on the same nodes.
Intel's Meteor Lake (Redwood Cove + Crestmont) is Intel 4 + TSMC N6 + TSMC N5
AMD's Raphael (Zen 4) is CCD: TSMC N5 & IOD: TSMC N6
 

ARF

Joined
Jan 28, 2020
Messages
4,670 (2.69/day)
Location
Ex-usa | slava the trolls
Yeah I'm not keeping a close track of various flavors of 7/5/4/3/2 nm either, it's just marketing speak. The key point is they're all on TSMC ~

What's important:

N5 to N3 = 10-15% more performance | N5 to N3E = 18% more performance

N3E to N3P = 5% more performance | N3P to N3X = 5% more performance
N3E to N2= 10-15% more performance
N3E to N2P = 15-20% more performance

N2P to A16 = 8-10% more performance

-----------------------------------

N5 to N3X = 28% more performance
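
As a quick check on how the chained figures combine, here is a small Python sketch. It assumes the per-step gains listed above can simply be chained from N5 through N3E and N3P to N3X, which is my assumption rather than anything TSMC states:

```python
from math import prod

# Per-step gains copied from the list above (N5 -> N3E -> N3P -> N3X).
steps = [0.18, 0.05, 0.05]

additive = sum(steps)
compounded = prod(1.0 + gain for gain in steps) - 1.0

print(f"additive:   {additive * 100:.1f}%")
print(f"compounded: {compounded * 100:.1f}%")
```

The 28% figure matches the plain sum of the three steps; if the gains compound instead, the total comes out at roughly 30%.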
 
Joined
May 25, 2022
Messages
114 (0.13/day)
Thanks for the overview but,
You have to understand that E-cores were developed by "removing things," from a typical core and are a frugal product of reduction, while the P-cores are developed by "adding things" to a typical core, and are a product of addition. This is why Intel has a vast canvas to add capabilities and performance to its E-cores; and this is precisely why Skymont is a breakthrough.
What the heck are you saying? E-cores aren't "P-cores with things removed". They are the product of a well-run, balanced team. A 3x difference in core size, higher power efficiency, and only a 14% gap between them should tell you that. The fact that they perform that well should be applauded, because the team worked under that kind of constraint and still came very close.

The conclusion is a big disappointment.

People with insider info have told me that the P-core IDC team is in shambles.

@atomsymbol
Skymont E-cores having the same IPC as Lunar Lake P-cores is most likely a temporary phenomenon. When Intel P-cores adopt the same front-end architecture as Zen5 and Skymont, P-cores will once again be outperforming E-cores in IPC by a significant margin. Note: Zen5 and Skymont actually have similar front-ends on a conceptual level, the only differences being (1) the lack of µop cache in Skymont compared to Zen5 and (2) Zen5 can fetch up to 2 basic blocks while Skymont can fetch up to 3 basic blocks (see https://en.wikipedia.org/wiki/Basic_block).
No. Just no. That's not why they did it.
 