
Intel Lunar Lake Technical Deep Dive

No. Just no. That's not why they did it.

*Who* did *what*? Who/what are you referring to? There are multiple combinations - which combination do you mean?
 
I assume higher clocks on Arrow Lake.

Assuming higher clocks on E-cores that are much wider than previous E-core designs is a questionable assumption.
 
E cores aren't "P cores removed".
It never says so. Maybe you should read it again. :roll:
You have to understand that E-cores were developed by "removing things," from a typical core and are a frugal product of reduction,
while the P-cores are developed by "adding things" to a typical core, and are a product of addition.
^^These are two different subjects, the comma and the "while" should give you a hint.

I think your conclusion is a bigly disappointment.

I interpret the E-core development as similar to the Pentium M development, i.e., removing things to get where you want to go. No, I don't have that quote from Intel.
 
It never says so. Maybe you should read it again. :roll:
It's basically what it says. Considering that back with the very first Atom (in-order) they tried new ideas, and did it again and again, while recent cores like Tremont, Gracemont, and Skymont are wider and have bigger structures than the P-core, it's really a bad conclusion. The P-core team has pretty much stalled since Sandy Bridge.

The Intel cores were criticized by many architects for many, many generations for having tiny L1 caches and little fetch bandwidth. That continues to this day. The E-cores surpassed those limits back with Gracemont. The P-core team's approach is basically expand, expand, expand. That's why it's so bloated. It's a laughingstock and why AMD is kicking them in servers and power consumption in desktops so easily.
Assuming higher clocks on E-cores that are much wider than previous E-core designs is a questionable assumption.
Yet it is, according to a deleted leak that claims 5.7 GHz top Turbo for the P-cores, 5.4 GHz all-core, and 4.6 GHz for Skymont on Arrow Lake. This core is going to have big ramifications not just for Intel but, based on the Zen 5 reveal, for AMD too.
*Who* did *what*? Who/what are you referring to? There are multiple combinations - which combination do you mean?
You. I am referring to you, who said Zen 5 and Skymont are better because of the clustered decode design. It is a compromise to overcome limitations of x86 ISA decoding, where traditionally widening results in a quadratic rise in transistor usage in the decoders (hence the never-ending ARM vs. x86 argument). When it comes to pure decoder performance, Golden Cove's single 6-wide is better. Of course, when it comes to the overall design of a core, saving a great deal of transistors lets you beef up other areas. And based on Tre/Grace/Sky's results, clustered decode is the way to go.
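As a very rough illustration of that transistor argument (a toy model of my own, not anything from Intel's material): if a monolithic decoder has to speculatively length-decode across a fetch window that grows with its width, its cost grows roughly with the square of the width, while splitting the same total width into clusters keeps each cluster cheap.

Code:
# Toy model (mine, not Intel's): assume a W-wide x86 decoder must
# speculatively length-decode at roughly W * avg_len byte positions per
# cycle, so its cost scales ~ W^2 * avg_len. Real decoders are far more
# sophisticated; this only illustrates why clustering saves transistors.

AVG_LEN = 4  # assumed average x86 instruction length in bytes

def monolithic_cost(width: int) -> int:
    """Relative cost of one W-wide decoder block."""
    return width * width * AVG_LEN

def clustered_cost(clusters: int, width_per_cluster: int) -> int:
    """Relative cost of several narrower decoder clusters."""
    return clusters * monolithic_cost(width_per_cluster)

print("6-wide monolithic  :", monolithic_cost(6))     # 144
print("9-wide monolithic  :", monolithic_cost(9))     # 324
print("3 x 3-wide clusters:", clustered_cost(3, 3))   # 108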

Skymont is better than both Lion Cove and Zen 5. It means that, per clock, Lion Cove/Zen 5 will be less than 15% faster. Lion Cove is 3x the size with less efficiency. It's a done deal.

Again, it was the E-core team that brought the revolutionary clustered decode design. Saying it was done by "removing things" is doing the team a disservice, because it's going to kick ass.
 
Intel didn't have much choice but to really up their game with the E-cores, since Zen C-cores give up much less in terms of features and performance. When AMD crammed that many C-cores into a single server chip, the writing on the wall appeared for Intel's big server money.

Still, I'm actually kinda excited to see this kind of effort from Intel. They had gotten pretty stale, but now they seem to be offering a pretty balanced mobile solution. Even if it's not the top performer, it has fewer weak points. Ironically, Apple has been doing 4P + 4E with no SMT, an NPU, and a decent iGPU since 2020. No wonder Apple went their own way--it took Intel 4 years to get here.
 
You. I am referring to you,

You don't know me. How can you be referring to something you have little knowledge about?

Please refer to the text, not to people's minds.

who said Zen 5 and Skymont are better because of the clustered decode design. It is a compromise to overcome limitations of x86 ISA decoding

That clustered design in E-cores (and in Zen 5) has very little to do with the complexity of x86 ISA decoding. Instead, it has to do with branch prediction.
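To make that concrete, here is a conceptual sketch (mine, heavily simplified, and not a description of Skymont's actual pipeline): the branch predictor hands out several predicted fetch blocks per cycle, each decode cluster works on its own block, and the micro-ops are stitched back into program order afterwards.

Code:
# Conceptual sketch (mine): clustered decode chews on several
# branch-predictor-supplied fetch blocks in parallel, then merges the
# decoded micro-ops back into predicted program order.
from collections import deque

def decode_block(block):
    """Pretend every instruction decodes into exactly one micro-op."""
    return [f"uop({insn})" for insn in block]

def clustered_decode(predicted_blocks, num_clusters=3):
    queue = deque(predicted_blocks)   # fetch blocks in predicted program order
    ordered_uops = []
    while queue:
        # Each cluster grabs the next predicted block this "cycle".
        batch = [queue.popleft() for _ in range(min(num_clusters, len(queue)))]
        decoded = [decode_block(b) for b in batch]  # parallel in hardware
        for uops in decoded:                        # merged back in order
            ordered_uops.extend(uops)
    return ordered_uops

blocks = [["add", "cmp", "jne L1"], ["mov", "sub", "jmp L2"], ["xor", "ret"]]
print(clustered_decode(blocks))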

where traditionally widening results in a quadratic rise in transistor usage in the decoders

That quadratic increase is already solved by µop caches (P-cores, Zen cores) and by the on-demand instruction length decoder in E-cores. AMD's K8 (from 2003) already had an on-demand instruction length decoder, and predecoded instructions were stored in the L1I cache just like in Skymont (if Skymont retains the OD-ILD from previous E-core designs)!
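Roughly what that looks like (another toy sketch of mine; find_length() is a hypothetical stand-in for a real length decoder, and real cache organization is ignored): boundaries are found once when a line is filled, stored alongside it, and every later fetch of that line skips the length decode entirely.

Code:
# Toy sketch (mine): instruction-boundary ("predecode") information is
# computed once when an instruction cache line is filled and reused on
# every later fetch, so the expensive x86 length decode happens on demand
# rather than every cycle.

def find_length(line: bytes, offset: int) -> int:
    """Hypothetical stand-in for a real x86 length decoder."""
    return 1 + (line[offset] % 4)  # pretend lengths of 1..4 bytes

class PredecodedICache:
    def __init__(self):
        self.lines = {}  # address -> (line bytes, instruction start offsets)

    def fill(self, addr: int, line: bytes):
        boundaries, off = [], 0
        while off < len(line):              # slow path: runs once per fill
            boundaries.append(off)
            off += find_length(line, off)
        self.lines[addr] = (line, boundaries)

    def fetch(self, addr: int):
        return self.lines[addr][1]          # fast path: boundaries already known

icache = PredecodedICache()
icache.fill(0x1000, bytes(range(16)))
print(icache.fetch(0x1000))  # start offsets, no re-decode needed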

And based on Tre/Grace/Sky's results, clustered decode is the way to go.

Clustered decode is the way to go, but the primary reason is different from what you wrote.

Skymont is better than both Lion Cove and Zen 5.

Skymont's decode isn't universally better than Zen 5's. It is better than Zen 5's only in a subset of scenarios.
 
This was really well written, thank you! I'm honestly kind of excited for it. I like my current Meteor Lake laptop, and I was impressed with its performance given what had come out of Intel pre-Meteor Lake.
 
Windows Recall sounds like hibernation mode re-imagined by Microsoft to help steer people towards Copilot, for obvious reasons like ad revenue and ad revenue, along with generative-gibberish marketing and ad revenue.
 
Last sentence in the conclusion:

“If you want to see Lion Cove, Skymont, Xe2 Battlemage, and NPU 4 in a more familiar package, you should look out for Arrow Lake, which not just covers other mobile form-factors, but also desktop.”

So where is Arrow Lake? Did Intel make one mention of it?
Arrow Lake is not getting Xe2 at all; it's using Xe-Plus, a tarted-up Alchemist offering. Still going to be a piss-weak iGPU and frankly pointless, like AMD's piss-weak 2-CU RDNA3 iGPU.

The shrink from N5/N4 to N3 is larger and more substantial than the shrink from N7/N6 to N5/N4.
Not according to TSMC:

[Attached image: TSMC node density scaling comparison]
 
I think I'd prefer to wait for official numbers to determine if this is a good chip. I do appreciate that Intel is looking at more efficient CPUs, rather than the likes of Raptor Lake, which draws obscene amounts of power at full load just to edge out competitors that are not far behind while using half or less of the power. But I still don't like the idea of Intel's P and E cores because as you can tell, Intel is charging consumers quite a fair bit for higher end chips with higher clockspeed and stupid amounts of E-cores.
 
I still don't like the idea of Intel's P and E cores because as you can tell, Intel is charging consumers quite a fair bit for higher end chips with higher clockspeed and stupid amounts of E-cores.
Well, if that bothers you, look at the size of the NPU: as big as 66% of 4 P-cores, and we still have no information on what it is good for at all. No idea.
The 4 E-cores take only 6% of the CPU tile; that's 2.1 mm² per core. Impressive. You could have a ton of them for free and it wouldn't hurt a fly.
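Just to put numbers on that (plain arithmetic on the figures above, nothing measured):

Code:
# Arithmetic on the numbers quoted above (not measured data):
E_CORE_AREA_MM2 = 2.1       # claimed area of one E-core
E_CLUSTER_SHARE = 0.06      # claimed share of the CPU tile for the 4 E-cores

cluster_area = 4 * E_CORE_AREA_MM2           # ~8.4 mm^2
tile_area = cluster_area / E_CLUSTER_SHARE   # ~140 mm^2 implied tile size
extra_share = cluster_area / tile_area       # cost of another quad cluster

print(f"Implied CPU tile area : {tile_area:.0f} mm^2")
print(f"Cost of 4 more E-cores: {extra_share:.0%} of the tile")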

Not according to TSMC:
And for a mixed design consisting of 50% logic, 30% SRAM, and 20% analog, that drops to a 30% density gain, and who knows what the actual mix is in CPUs.
N5 to N2 can't get more than 50%.
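Here's roughly how that mixed-design number falls out (a back-of-envelope of mine; the per-block scaling factors are illustrative assumptions in the ballpark of TSMC's public N5-to-N3E claims, not official figures): the chip-level gain is the area-weighted combination of the per-block gains, so poor SRAM and analog scaling drags the total down fast.

Code:
# Back-of-envelope (mine). Scaling factors below are illustrative
# assumptions, not official TSMC numbers.
mix = {"logic": 0.50, "sram": 0.30, "analog": 0.20}    # area share on the old node
scaling = {"logic": 1.7, "sram": 1.05, "analog": 1.1}  # assumed density gains

# New chip area = sum of each block's old area divided by its density gain.
new_area = sum(share / scaling[block] for block, share in mix.items())
chip_density_gain = 1 / new_area

print(f"Chip-level density gain: {chip_density_gain:.2f}x "
      f"(~{chip_density_gain - 1:.0%} denser)")  # ~1.3x, i.e. ~30%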
 
Well, if that bothers you, look at the size of the NPU: as big as 66% of 4 P-cores, and we still have no information on what it is good for at all. No idea.

An NPU (in a CPU, or in a GPU) will enable you to talk to NPCs during gameplay in a natural way. NPCs will also have memory of what you did previously during gameplay and will act accordingly the next time you meet them. Logically, the hardware (NPU) has to be in PCs before games can take advantage of it. Such games are either in development (best-case scenario), in experimental stages (realistic scenario), or haven't been thought of yet (pessimistic scenario). Scripted dialogues in games will be a thing of the past.
 
An NPU (in a CPU, or in a GPU) will enable you to talk to NPCs during gameplay in a natural way. NPCs will also have memory of what you did previously during gameplay and will act accordingly the next time you meet them. Logically, the hardware (NPU) has to be in PCs before games can take advantage of it. Such games are either in development (best-case scenario), in experimental stages (realistic scenario), or haven't been thought of yet (pessimistic scenario). Scripted dialogues in games will be a thing of the past.
Game mods with such a feature already exist, and I think there are a few prototype games that required an OpenAI API key to function as intended. I'm under the impression that they are currently more amusingly quirky and weird, and they get weirder if you use much less capable local models running as a surrogate OpenAI API.

It's still remarkable that hardware offerings responded as quickly as they did, when the current local-AI boom only took off in the second half of 2022, (no) thanks to the likes of Stable Diffusion and LLaMA.
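Wiring one of those surrogate endpoints up is honestly the easy part. A minimal sketch (mine), assuming a local OpenAI-compatible server is listening on localhost; the URL, model name, and guard persona are placeholders, not taken from any shipping game or mod:

Code:
# Minimal sketch (mine): an NPC whose "memory" is just the running chat
# history, sent to a local OpenAI-compatible endpoint. URL, model name
# and persona are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

class NPC:
    def __init__(self, persona: str):
        self.memory = [{"role": "system", "content": persona}]

    def talk(self, player_line: str) -> str:
        self.memory.append({"role": "user", "content": player_line})
        reply = client.chat.completions.create(
            model="local-model",   # whatever the local server exposes
            messages=self.memory,
        ).choices[0].message.content
        self.memory.append({"role": "assistant", "content": reply})
        return reply

guard = NPC("You are a town guard. Remember what the player did earlier.")
print(guard.talk("I returned the stolen amulet to the temple."))
print(guard.talk("Do you remember what I did for this town?"))

Whether the model runs on an NPU, a GPU, or a cloud API is invisible at this level; the hard part is keeping the answers consistent with the game state.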
 
Game mods with such a feature already exist, and I think there are a few prototype games that required an OpenAI API key to function as intended. I'm under the impression that they are currently more amusingly quirky and weird, and they get weirder if you use much less capable local models running as a surrogate OpenAI API.

I think a major obstacle might be that for a 3D game to behave in a natural way, the 3D models of the NPCs would need to be in sync with the output of a large language model. I haven't seen anything like that anywhere yet; such a game engine doesn't seem possible today or in the near future, and I cannot imagine how to train an AI for such a scenario. An OpenAI API key is pointless for this scenario because the API doesn't output 3D models (nor 2D models). But if it is possible, somebody will eventually figure it out. Nevertheless, it is good to see that NPUs, albeit still an experimental technology, are becoming a standard part of PCs.
 
That's why it's so bloated. It's a laughingstock and why AMD is kicking them in servers and power consumption in desktops so easily.
Bloat, or even architecture, has little to do with power consumption and AMD kicking them in servers in this case. AMD is manufacturing their CPUs on a node that is basically a full node ahead. That is a big difference. It's the same thing Intel did constantly back in the day. We can compare architectures in terms of efficiency once they are on comparable enough nodes.
Skymont is better than both Lion Cove and Zen 5. It means that, per clock, Lion Cove/Zen 5 will be less than 15% faster. Lion Cove is 3x the size with less efficiency. It's a done deal.
15% is a big difference though.
There seems to be a ceiling for IPC, and a lot of things that have been treated as kind of "natural limits" - not going too wide, not going too complex in certain parts, widths of memory buses, etc. - are now being challenged to wring more performance out of architectures, since clock speeds are no longer increasing. Increasing caches has been a thing for a while; AMD's huge L3 caches seem to show there is a limit to their effectiveness in general-purpose use. Widening is now constant; Apple went really wide in almost everything (and can get away with it largely thanks to their entire ecosystem being under their control). Apple went with wide memory buses and Intel seems to be following suit - others are likely to follow. And of course a bunch of other things.

But before getting stuck trying to think of examples, the point I wanted to make is that the last 15% is hard. ARM, RISC-V, and some other competitors are coming up fast because the path is known. Intel, AMD, and others have already tried a bunch of stuff and found what works, what doesn't, and why. New things are starting to crop up - Apple and its M-series as the obvious example - but that is because the easy wins are now depleted. And clock speeds are no longer increasing in mobile either.
You. I am referring to you, who said Zen 5 and Skymont are better because of the clustered decode design. It is a compromise to overcome limitations of x86 ISA decoding, where traditionally widening results in a quadratic rise in transistor usage in the decoders (hence the never-ending ARM vs. x86 argument). When it comes to pure decoder performance, Golden Cove's single 6-wide is better. Of course, when it comes to the overall design of a core, saving a great deal of transistors lets you beef up other areas. And based on Tre/Grace/Sky's results, clustered decode is the way to go.
I bet the clustered decoder is not due to limitations in decoding - it's purely an efficiency boost. The decoder is pretty beefy, so having the ability to turn 1/3 or 2/3 of it off should be pretty nice.
 
Bloat, or even architecture, has little to do with power consumption and AMD kicking them in servers in this case. AMD is manufacturing their CPUs on a node that is basically a full node ahead. That is a big difference. It's the same thing Intel did constantly back in the day. We can compare architectures in terms of efficiency once they are on comparable enough nodes.
15% is a big difference though.
Yeah, it always amazes me how people just casually claim AMD is kicking Intel in power consumption. Intel is literally the only company that makes 35-watt desktop CPUs. AMD chips need that amount of power just to sit there idle. But sure, they are kicking...
 
Yeah, it always amazes me how people just casually claim AMD is kicking Intel in power consumption. Intel is literally the only company that makes 35-watt desktop CPUs. AMD chips need that amount of power just to sit there idle. But sure, they are kicking...
To be fair, Intel does not really make 35 W desktop CPUs. The "35 W TDP" T variants still run PL2 at 105 W or something, which is stupid. And AMD could if they wanted to, just not by limiting the chiplet CPUs but by limiting the G-series APUs. Either way, 35 W is probably too low for good efficiency from either.
 
To be fair, Intel does not really make 35 W desktop CPUs. The "35 W TDP" T variants still run PL2 at 105 W or something, which is stupid. And AMD could if they wanted to, just not by limiting the chiplet CPUs but by limiting the G-series APUs. Either way, 35 W is probably too low for good efficiency from either.
Yeah, but the PL2 lasts for 50 seconds or something. When you are actually using it, it drops to 35 W. Due to the I/O die, that's just impossible for AMD desktop chips.
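For anyone unfamiliar with how those two limits interact, here's a deliberately simplified model (mine; real Intel turbo enforces PL1 over a rolling time window (Tau) rather than a hard cutoff, and the exact boost duration varies by SKU and board):

Code:
# Simplified model (mine) of PL1/PL2 behaviour under a sustained load:
# the package may draw up to PL2 while the boost window lasts, then
# settles to PL1. Real hardware averages power over a rolling window,
# so this is only an approximation of "boost for ~50 s, then drop".
PL1, PL2 = 35.0, 105.0   # watts (the T-series figures mentioned above)
TAU = 50.0               # seconds the PL2 boost is allowed to last (rough)

def package_power(t_seconds: float, demand: float = 200.0) -> float:
    """Power actually drawn at time t under an all-core load."""
    limit = PL2 if t_seconds < TAU else PL1
    return min(demand, limit)

for t in (1, 10, 49, 51, 300):
    print(f"t={t:>3d}s -> {package_power(t):.0f} W")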
 
Windows Recall sounds like hibernation mode re-imagined by Microsoft to help steer people towards Copilot, for obvious reasons like ad revenue and ad revenue, along with generative-gibberish marketing and ad revenue.
Yes, in articles like this, links to basic explanations of terms would be nice.
It's been decades since I had those classes, and I've forgotten a lot of them.
 
I bet the clustered decoder is not due to limitations in decoding - it's purely an efficiency boost. The decoder is pretty beefy, so having the ability to turn 1/3 or 2/3 of it off should be pretty nice.

Skymont cannot "turn on 1/3 or 2/3 of decoders".
 
50% increase was against 165U in TimeSpy, which has a weaker IGP.
[Attached graph: iGPU performance vs. power]

Still, according to the graph, this LNL iGPU is faster than even MTL-H by an unknown amount while consuming less.
It will be interesting to see how it performs in reality.
Intel and its tricks. This iGPU has only half the shaders... I think Lunar Lake might even be more efficient than Strix Point, but the difference in performance will be huge.
 
Intel and its tricks. This iGPU has only half the shaders... I think Lunar Lake might even be more efficient than Strix Point, but the difference in performance will be huge.
More efficient, I agree, but "huge" difference in performance? I don't think so. Better? Yes, but not huge.
 
Informal poll: What was the best Computex Launch/Teaser?

Lunar Lake
Strix Point
Granite Ridge
Sierra Forest
Turin
Granite Rapids
Arrow Lake
Panther Lake
Whatever Nvidia showed
 
Informal poll: What was the best Computex Launch/Teaser?

Lunar Lake
Strix Point
Granite Ridge
Sierra Forest
Turin
Granite Rapids
Arrow Lake
Panther Lake
Whatever Nvidia showed
- Intel only showed slides that didn't say anything useful or exciting.
- Nvidia only talked about AI.
- Qualcomm maintained its smoke-and-mirrors approach.

I can't help saying that AMD, as the only company that actually showed benchmarks, was the best overall lol
 