
Intel Lunar Lake Technical Deep Dive

Wondering why they didn't discuss the upcoming launch of desktop chips...
 
That claim for equivalent IPC is probably for low clocks. Skymont is unlikely to clock as high as Raptor Cove.

IPC is a number that is independent from clock speed (because you are dividing by clock speed when computing IPC).

Given that they are using TSMC's N3 process for Lunar Lake, the claim of equivalent IPC is plausible.

IPC doesn't depend on whether the process is 5nm or 3nm.
 
IPC is a number that is independent from clock speed (because you are dividing by clock speed when computing IPC).



IPC doesn't depend on whether the process is 5nm or 3nm.
Average IPC for most non-trivial programs is affected by DRAM latency which is higher (in terms of clock cycles) for higher clocked processors. Average IPC is also a property of the microarchitecture and it's influenced by the node as a denser node allows the designer to spend more transistors on branch prediction and wider structures. Skymont is an outstanding example of this: using the latest TSMC process allowed Intel to create a much wider core than they could have with Intel 7 or even Intel 4.
 
Average IPC for most non-trivial programs is affected by DRAM latency which is higher (in terms of clock cycles) for higher clocked processors.

If the program were limited by DRAM latency or bandwidth, its IPC would *already* be below 0.5. Most programs slow down by only a few percent when DRAM latency and bandwidth change by a factor of 2. In other words: caches work well.
 
Average IPC for most non-trivial programs is affected by DRAM latency which is higher (in terms of clock cycles) for higher clocked processors.
Higher clocked CPUs typically test lower AIDA latency with the same memory speed/timings. E.g. A 7700X will post lower memory ns than a 7800X3D with identical memory and timings because it clocks higher.
 
Higher clocked CPUs typically test lower AIDA latency with the same memory speed/timings. E.g. A 7700X will post lower memory ns than a 7800X3D with identical memory and timings because it clocks higher.
I think I was unclear. I am referring to DRAM latency in terms of CPU clock cycles. A Raptor Cove core clocking close to a hypothetical 4 GHz will see lower effective DRAM latency than one clocked at 6 GHz despite the higher latency in ns.
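
To make the "latency in clock cycles" point concrete, here is a minimal sketch of the conversion. The 80 ns figure is an assumed value used only for illustration, not a measurement, and it is held constant across both clocks for simplicity.

```python
def dram_latency_cycles(latency_ns: float, clock_ghz: float) -> float:
    """Convert a DRAM latency in nanoseconds to CPU clock cycles."""
    # 1 GHz is one cycle per nanosecond, so cycles = ns * GHz.
    return latency_ns * clock_ghz


# Assumed latency, chosen only for illustration.
ASSUMED_LATENCY_NS = 80.0

for ghz in (4.0, 6.0):
    cycles = dram_latency_cycles(ASSUMED_LATENCY_NS, ghz)
    print(f"{ghz:.1f} GHz: {cycles:.0f} cycles per DRAM access")
# 4.0 GHz: 320 cycles per DRAM access
# 6.0 GHz: 480 cycles per DRAM access
```

The same miss costs the core more cycles at 6 GHz than at 4 GHz, even though the wall-clock latency is unchanged.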
 
I think I was unclear. I am referring to DRAM latency in terms of CPU clock cycles. A Raptor Cove core clocking close to a hypothetical 4 GHz will see lower effective DRAM latency than one clocked at 6 GHz despite the higher latency in ns.

You can check whether GHz affects IPC in Linux with "perf stat -- program" by forcing your CPU into powersave mode. My Ryzen desktop CPU runs at 2.2 GHz when in powersave mode and at about 4.5 GHz in performance mode (single thread).
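
A rough sketch of how the two runs could be compared: run `perf stat -x, -e instructions,cycles -- program` once per governor and feed the CSV text to a small parser. The function name and the sample counter values below are made up for illustration, and the parser assumes the plain CSV layout (counter value, unit, event name, ...), which can differ between perf versions and on hybrid CPUs.

```python
def ipc_from_perf_csv(text: str) -> float:
    """Compute IPC from `perf stat -x,` style CSV text.

    Assumes each line starts with: counter value, unit, event name.
    """
    counts = {}
    for line in text.splitlines():
        fields = line.split(",")
        if len(fields) < 3:
            continue
        try:
            value = float(fields[0])
        except ValueError:
            continue  # skips entries such as "<not counted>"
        event = fields[2].split(":")[0]  # drop modifiers such as ":u"
        counts[event] = value
    return counts["instructions"] / counts["cycles"]


# Invented sample data, NOT real measurements.
SAMPLE_POWERSAVE = "2000000000,,instructions,1,100.00\n1000000000,,cycles,1,100.00"
SAMPLE_PERFORMANCE = "2000000000,,instructions,1,100.00\n1100000000,,cycles,1,100.00"

print("powersave IPC:  ", ipc_from_perf_csv(SAMPLE_POWERSAVE))
print("performance IPC:", ipc_from_perf_csv(SAMPLE_PERFORMANCE))
```

If the two IPC values come out nearly identical for a given program, clock speed is not affecting its IPC much; a visible gap suggests memory stalls are in play.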
 
If the program were limited by DRAM latency or bandwidth, its IPC would *already* be below 0.5. Most programs slow down by only a few percent when DRAM latency and bandwidth change by a factor of 2. In other words: caches work well.
Caches work well, but even with large caches there are enough misses to main memory that the impact of DRAM latency is non-negligible. This is the reason why the stacked cache Ryzens do so well in gaming; they reduce effective memory latency by reducing the number of misses to DRAM. On the other hand, the regular Zen 4 SKUs, despite clocking higher, see lower IPC.

1717525255597.png
 
Caches work well, but even with large caches there are enough misses to main memory that the impact of DRAM latency is non-negligible. This is the reason why the stacked cache Ryzens do so well in gaming; they reduce effective memory latency by reducing the number of misses to DRAM. On the other hand, the regular Zen 4 SKUs, despite clocking higher, see lower IPC.

View attachment 350085

I think you are missing the fact that DRAM latency (in nanoseconds) has been approximately the same since the first DDR introduced in year 1998: DDR, DDR2, DDR3, DDR4 and DDR5 have approximately the same latency in nanoseconds!
 
I think you are missing the fact that DRAM latency (in nanoseconds) has been approximately the same since the first DDR introduced in year 1998: DDR, DDR2, DDR3, DDR4 and DDR5 have approximately the same latency in nanoseconds!
I didn't claim that latency of DRAM has gone down, and in any case, that has nothing to do with the fact that misses to DRAM affect the average IPC for many programs. My claim was about the effective access time for a load which depends upon the latency of various caches as well as DRAM and the hit rates at each level.
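
The effective access time described above can be sketched as a weighted sum over the cache levels. This is a simplified textbook-style model, not anything from Intel or AMD; the hit rates and latencies below are assumed numbers picked only to show the shape of the calculation.

```python
def effective_access_time(levels, dram_latency):
    """Average latency per load, in cycles.

    levels: list of (hit_rate, latency) pairs ordered from L1 outward,
            where hit_rate is the local hit rate at that level.
    dram_latency: cost of a load that misses every cache level.
    """
    total = 0.0
    reach = 1.0  # fraction of loads that get as far as this level
    for hit_rate, latency in levels:
        total += reach * hit_rate * latency
        reach *= 1.0 - hit_rate
    return total + reach * dram_latency


# Assumed values for illustration only.
small_l3 = [(0.90, 5), (0.70, 15), (0.60, 50)]
large_l3 = [(0.90, 5), (0.70, 15), (0.85, 50)]

print(effective_access_time(small_l3, dram_latency=400))
print(effective_access_time(large_l3, dram_latency=400))
```

Raising only the L3 hit rate lowers the average, which is the stacked-cache effect in miniature: fewer loads pay the DRAM penalty.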
 
I didn't claim that latency of DRAM has gone down, and in any case, that has nothing to do with the fact that misses to DRAM affect the average IPC for many programs.

You were, by extension, claiming that CPUs in year 1998 must have had much higher IPC than CPUs in year 2024! No?

Caches work well, but even with large caches there are enough misses to main memory that the impact of DRAM latency is non-negligible. This is the reason why the stacked cache Ryzens do so well in gaming; they reduce effective memory latency by reducing the number of misses to DRAM. On the other hand, the regular Zen 4 SKUs, despite clocking higher, see lower IPC.

View attachment 350085

From this chart, it is impossible to conclude whether it is DRAM latency or DRAM bandwidth, or both equally. L3 cache has lower latency than DRAM and higher bandwidth than DRAM.

Thus, the above chart doesn't prove your claim that DRAM latency matters for IPC.
 
You were, by extension, claiming that CPUs in year 1998 must have had much higher IPC than CPUs in year 2024! No?



From this chart, it is impossible to conclude whether it is DRAM latency or DRAM bandwidth, or both equally. L3 cache has lower latency than DRAM and higher bandwidth than DRAM.

Thus, the above chart doesn't prove your claim that DRAM latency matters for IPC.
No, I didn't claim that CPUs in 1998 had higher IPC than CPUs in 2024. My claim is that the same CPU, when restricted to a lower clock speed, will have slightly higher IPC for programs with working sets larger than the last level cache. Previous studies have demonstrated that it's DRAM latency that's more impactful than bandwidth for most programs. On the other hand, there are some scientific workloads that benefit from higher bandwidth. In any case, this discussion isn't really pertinent to the Lunar Lake deep dive so if you wish to continue it, we can PM each other.
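
The "same CPU, lower clock, slightly higher IPC" claim can be illustrated with a very simple stall model. Everything here is an assumption for illustration: the base CPI, the miss rate, the 80 ns latency, and above all the idea that every DRAM miss stalls the core for its full latency (real cores overlap misses, so this overstates the effect).

```python
def modeled_ipc(base_cpi, misses_per_instr, dram_latency_ns, clock_ghz):
    """IPC under a no-overlap stall model.

    base_cpi: cycles per instruction with a perfect memory system.
    misses_per_instr: DRAM misses per instruction (0.0002 = 0.2 MPKI).
    """
    # Each miss costs latency_ns * GHz cycles, which grows with the clock.
    stall_cycles = misses_per_instr * dram_latency_ns * clock_ghz
    return 1.0 / (base_cpi + stall_cycles)


for ghz in (4.0, 6.0):
    ipc = modeled_ipc(base_cpi=0.25, misses_per_instr=0.0002,
                      dram_latency_ns=80.0, clock_ghz=ghz)
    print(f"{ghz:.1f} GHz: IPC {ipc:.2f}, relative throughput {ipc * ghz:.1f}")
```

In this model IPC drops as the clock rises, while throughput (IPC times clock) still goes up. With zero misses the modeled IPC is the same at every clock.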
 
How is this any more expensive than any other CPU designer that doesn't own its own fabs? If anything it should be the same price or cheaper with Intel doing its own packaging. While not ideal, neither is Intel's current node or that node's capacity.
This is business 101. The fewer parts in a BOM that include another company's profit, the cheaper the product. Any part fabbed at a third-party fab has that fab's profit built into its price. For example, compare making my own cakes in my kitchen and selling them with paying a bakery to make the cakes and then selling them.

Skymont E-cores having the same IPC as Lunar Lake P-cores is most likely a temporary phenomenon. When Intel P-cores adopt the same front-end architecture as Zen5 and Skymont, P-cores will once again be outperforming E-cores in IPC by a significant margin. Note: Zen5 and Skymont actually have similar front-ends on a conceptual level, the only differences being (1) the lack of µop cache in Skymont compared to Zen5 and (2) Zen5 can fetch up to 2 basic blocks while Skymont can fetch up to 3 basic blocks (see https://en.wikipedia.org/wiki/Basic_block).
Or Intel drops P-cores and only sells E-cores now that the performance is getting closer.
 
Or Intel drops P-cores and only sells E-cores now that the performance is getting closer.

P-cores (6 GHz) have 50% higher performance than E-cores (4 GHz) if both have the same IPC (6/4=1.5).

Note: The slide comparing RaptorCove IPC with Skymont IPC says "Fixed frequency (iso)" which means that the P-core was running at about 4 GHz in that particular test.
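
The 6/4 arithmetic above follows from treating single-thread performance as IPC times clock. A minimal sketch, with the function name being my own and the clocks taken from the post:

```python
def relative_performance(ipc_a, ghz_a, ipc_b, ghz_b):
    """Ratio of single-thread performance of core A to core B,
    modeling performance as IPC * clock frequency."""
    return (ipc_a * ghz_a) / (ipc_b * ghz_b)


# Equal IPC, P-core at 6 GHz versus E-core at 4 GHz.
print(relative_performance(1.0, 6.0, 1.0, 4.0))  # 1.5
```

Under the iso-frequency condition on the slide (both cores at the same clock), the ratio reduces to the IPC ratio alone.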
 
For some reason, what is probably one of the biggest points has been missed ~ AMD and Intel will compete on the same nodes for the first time ever! Probably the truest test of their uarch, head to head o_O
 
For some reason, what is probably one of the biggest points has been missed ~ AMD and Intel will compete on the same nodes for the first time ever! Probably the truest test of their uarch, head to head o_O
No, Lunar Lake is based on 3nm (TSMC) vs AMD Strix on 4nm. Intel will have an inherent efficiency advantage in the process.
It will be fun if they continue to lose in this regard to AMD.
 
For some reason, what is probably one of the biggest points has been missed ~ AMD and Intel will compete on the same nodes for the first time ever! Probably the truest test of their uarch, head to head o_O

AMD stays on the old node.

AMD: CCD: TSMC N4; IOD: TSMC N6
Intel: TSMC N3B + TSMC N6

The shrink from N5/N4 to N3 is larger and more substantial than the shrink from N7/N6 to N5/N4.

No, Lunar Lake is based on 3nm (TSMC) vs AMD Strix on 4nm. Intel will have an inherent efficiency advantage in the process.
It will be fun if they continue to lose in this regard to AMD.

Or the other option is that Intel strikes back and releases a new Conroe o_O

P-cores (6 GHz) have 50% higher performance than E-cores (4 GHz) if both have the same IPC (6/4=1.5).

Note: The slide comparing RaptorCove IPC with Skymont IPC says "Fixed frequency (iso)" which means that the P-core was running at about 4 GHz in that particular test.

They compare the 2024 E-core which is two generations newer than the 2021 P-core which is still the latest released in retail desktop CPUs.
 
The Xe-LPG iGPU powering graphics for Meteor Lake is impressive enough, as it beat the Radeon 780M RDNA 3 iGPU of competing Ryzen 7040 Phoenix processors of its time; and here, Intel is promising a 50% generational gain in iGPU performance on the back of the new Xe2 Battlemage architecture,
The 50% increase was against the 165U in Time Spy, which has a weaker IGP.
View attachment L3JjHnTd4H8ZBfvLVQRJTG-1200-80.jpg.webp

Still, according to the current graph, this LNL IGP is faster by an unknown amount than even MTL-H while consuming less power.
It will be interesting to see how it performs in reality.
 
Zen 6 is definitely not next year.
Pretty confident AMD will release at least some version of Zen 6 next year, even if it's just a paper launch. Between Apple, QC, Intel, and other ARM entrants like Nvidia, AMD is again at significant risk. They certainly wouldn't want to hand over the laptop market on a platter by resting on their laurels like Intel did a decade back. That would be their second biggest mistake after Dozer :slap:

Remember they pushed up the Zen launch by half a year (1Q?) just because they wanted to get it out early.
 
OK, not exactly this gen, but maybe Zen 5+ or Zen 6 next year?

Actually, they already compete on the same nodes.
Intel's Meteor Lake (Redwood Cove + Crestmont) is Intel 4 + TSMC N6 + TSMC N5
AMD's Raphael (Zen 4) is CCD: TSMC N5 & IOD: TSMC N6
 
Yeah I'm not keeping a close track of various flavors of 7/5/4/3/2 nm either, it's just marketing speak. The key point is they're all on TSMC ~

What's important:

N5 to N3 = 10-15% more performance | N5 to N3E = 18% more performance

N3E to N3P = 5% more performance | N3P to N3X = 5% more performance
N3E to N2 = 10-15% more performance
N3E to N2P = 15-20% more performance

N2P to A16 = 8-10% more performance

-----------------------------------

N5 to N3X = 28% more performance
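
The 28% total matches adding the individual steps (18 + 5 + 5). If each step is instead read as a multiplier on the previous node, which is an assumption about how these figures are meant, the steps compound and the total comes out slightly higher. A quick sketch of that arithmetic:

```python
import math


def compound_gain(step_gains_percent):
    """Total percentage gain when each step multiplies the previous result."""
    factor = math.prod(1 + g / 100 for g in step_gains_percent)
    return (factor - 1) * 100


steps = [18, 5, 5]  # N5 -> N3E, N3E -> N3P, N3P -> N3X, from the list above
print(f"additive:   {sum(steps)}%")
print(f"compounded: {compound_gain(steps):.1f}%")
# additive:   28%
# compounded: 30.1%
```

Either way the difference is small next to the uncertainty in the per-node figures themselves.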
 
Thanks for the overview but,
You have to understand that E-cores were developed by "removing things," from a typical core and are a frugal product of reduction, while the P-cores are developed by "adding things" to a typical core, and are a product of addition. This is why Intel has a vast canvas to add capabilities and performance to its E-cores; and this is precisely why Skymont is a breakthrough.
What the heck are you saying? E-cores aren't "P-cores with things removed". They are the product of a well-run, balanced team. A 3x difference in core size, higher power efficiency, and only a 14% gap between them should tell you that. The fact that they perform that well should be applauded, because they got very close while working under that kind of constraint.

The conclusion is a big disappointment.

People with insider info have told me that the P-core IDC team is in shambles.

@atomsymbol
Skymont E-cores having the same IPC as Lunar Lake P-cores is most likely a temporary phenomenon. When Intel P-cores adopt the same front-end architecture as Zen5 and Skymont, P-cores will once again be outperforming E-cores in IPC by a significant margin. Note: Zen5 and Skymont actually have similar front-ends on a conceptual level, the only differences being (1) the lack of µop cache in Skymont compared to Zen5 and (2) Zen5 can fetch up to 2 basic blocks while Skymont can fetch up to 3 basic blocks (see https://en.wikipedia.org/wiki/Basic_block).
No. Just no. That's not why they did it.
 