
Intel Lunar Lake Technical Deep Dive

Joined
Mar 17, 2017
Messages
97 (0.03/day)
Location
Europe
Processor Ryzen 9 9950X
Motherboard X670 chipset
Cooling SPC Fera 5
Memory 64 GiB
Video Card(s) RX 6700XT
Storage WD Black SN750, Seagate FireCuda 530, Samsung SSD 850 Pro, WD Blue HDD, Seagate IronWolf HDD
Display(s) Samsung (4K, FreeSync)
Power Supply EVGA 750 B5
Mouse Eternico wireless mouse
Keyboard HyperX Alloy Origins Core Aqua with Corsair Onyx Black keycaps
Software Linux + KVM
No. Just no. That's not why they did it.

*Who* did *what*? Who/what are you referring to? There are multiple combinations - which combination do you mean?
 
Joined
Mar 17, 2017
Messages
97 (0.03/day)
Location
Europe
I assume higher clocks on Arrow Lake.

Assuming higher clocks on E-cores that are much wider than previous E-core designs is a questionable assumption.
 

SL2

Joined
Jan 27, 2006
Messages
2,410 (0.35/day)
E cores aren't "P cores removed".
It never says so. Maybe you should read it again. :roll:
You have to understand that E-cores were developed by "removing things," from a typical core and are a frugal product of reduction,
while the P-cores are developed by "adding things" to a typical core, and are a product of addition.
^^These are two different subjects, the comma and the "while" should give you a hint.

I think your conclusion is a bigly disappointment.

I interpret the E core development as similar to the Pentium M development, i.e. removing things to get where you want to go. No, I don't have that quote from Intel.
 
Joined
May 25, 2022
Messages
114 (0.13/day)
It never says so. Maybe you should read it again. :roll:
It's basically what it says. Considering that back with the very first Atom (in-order) they tried new ideas, and did it again and again, while recent cores like Tremont, Gracemont, and Skymont are wider and bigger in structures than the P core, it's really a bad conclusion. The P core team has pretty much stalled since Sandy Bridge.

The Intel cores were criticized by many architects for many, many generations for having tiny L1 caches and little fetch bandwidth. It continues to today. The E cores surpassed those limits back in Gracemont. The P core team is basically expand, expand, expand. That's why it's so bloated. It's a laughingstock and why AMD is kicking them in servers and power consumption in desktops so easily.
Assuming higher clocks on E-cores that are much wider than previous E-core designs is a questionable assumption.
Yet it is, according to a deleted leak that says 5.7GHz top Turbo for P, 5.4GHz all-core, and 4.6GHz for Skymont on Arrow Lake. This core is going to have big ramifications not just for Intel but, based on the Zen 5 reveal, for AMD too.
*Who* did *what*? Who/what are you referring to? There are multiple combinations - which combination do you mean?
You. I am referring to you, who said Zen 5 and Skymont are better because of the clustered decode design. It is a compromise to try to overcome limitations of x86 ISA decoding - where traditional increase results in quadratic rise in transistor usage in decoders (hence the never-ending ARM vs x86 argument). When it comes to pure decoder performance, Golden Cove's single 6-wide decoder is better. Of course, when it comes to the overall design of a core, saving a great deal of transistors allows you to beef up other areas. And based on Tre/Grace/Sky's results, clustered decode is the way to go.

Skymont is better than both Lion Cove and Zen 5. It means per clock Lion Cove/Zen 5 will be less than 15% faster. Lion Cove is 3x the size with less efficiency. It's a done deal.

Again, it was the E core team that brought the revolutionary clustered decode design. Saying it is done by "removing things" is doing the team a disservice, because it's going to kick ass.
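To see how the two numbers claimed in this post interact, here is a minimal sketch. It assumes performance scales as per-clock throughput times frequency, which ignores memory stalls and power limits. The 15% per-clock gap and the 5.7GHz / 4.6GHz clocks are taken from the post and the leak it cites and are unverified; the function name is made up for illustration.

```python
def relative_performance(ipc_ratio, clock_a_ghz, clock_b_ghz):
    """Speed of core A relative to core B, assuming perf = IPC x clock.

    ipc_ratio is A's per-clock throughput divided by B's.
    This is a simplification: real workloads rarely scale linearly with clock.
    """
    return ipc_ratio * clock_a_ghz / clock_b_ghz


# Upper bound from the post: P core at most 15% faster per clock than Skymont.
# Clocks from the (unverified) leak: 5.7 GHz P core turbo, 4.6 GHz Skymont.
p_core_vs_skymont = relative_performance(1.15, 5.7, 4.6)
print(f"P core vs Skymont at peak clocks: {p_core_vs_skymont:.2f}x")
```

Under those assumptions the single-thread gap at peak clocks comes out to roughly 1.4x, so a small per-clock difference can still turn into a sizeable gap once the clock difference is included.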
 
Joined
Mar 16, 2017
Messages
2,063 (0.74/day)
Location
Tanagra
System Name Budget Box
Processor Xeon E5-2667v2
Motherboard ASUS P9X79 Pro
Cooling Some cheap tower cooler, I dunno
Memory 32GB 1866-DDR3 ECC
Video Card(s) XFX RX 5600XT
Storage WD NVME 1GB
Display(s) ASUS Pro Art 27"
Case Antec P7 Neo
Intel didn't have much choice but to really up their game with the E cores, since Zen C cores give up much less in terms of features and performance. When AMD crammed that many C cores into a single server chip, that's when the big money writing on the wall showed up for Intel.

Still, I'm actually kinda excited to see this kind of effort from Intel. They had gotten pretty stale, but now they seem to be offering a pretty balanced mobile solution. Even if it's not the top performer, it has fewer weak points. Ironically, Apple has been doing 4P + 4E with no SMT, an NPU, and a decent iGPU since 2020. No wonder Apple went their own way--it took Intel 4 years to get here.
 
Joined
Mar 17, 2017
Messages
97 (0.03/day)
Location
Europe
You. I am referring to you,

You don't know me. How can you be referring to something you have little knowledge about?

Please refer to the text, not to people's minds.

who said Zen 5 and Skymont are better because of the clustered decode design. It is a compromise to try to overcome limitations of x86 ISA decoding

That clustered design in E-cores (and in Zen5) has very little to do with the complexity of x86 ISA decoding. Instead, it has to do with branch prediction.

- where traditional increase results in quadratic rise in transistor usage in decoders

That quadratic increase is already solved by µop caches (P-cores, Zen cores) and by the on-demand instruction length decoder in E-cores. AMD K8 (2003) already had an on-demand instruction length decoder, and predecoded instructions were stored in the L1I cache just like in Skymont (if Skymont retains the OD-ILD from previous E-core designs)!

And based on Tre/Grace/Sky's results, clustered decode is the way to go.

Clustered decode is the way to go, but the primary reason is different from what you wrote.

Skymont is better than both Lion Cove and Zen 5.

Skymont decode isn't universally better than Zen5. It is better than Zen5 only in a subset of scenarios.
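A toy model may help show why the answer depends on the scenario. It rests on my own simplifications, not on a description of real hardware: a decoder cannot continue past a predicted-taken branch within one cycle, the branch predictor hands whole blocks to the clusters round-robin, and a long block is never split across clusters (real designs may load-balance long blocks). Function names and block lengths are made up for illustration.

```python
from math import ceil


def monolithic_cycles(block_lengths, width):
    """Decode cycles for one wide decoder.

    Each entry in block_lengths is the number of instructions between
    predicted-taken branches. Assumption: the decoder stops at the end
    of a block and starts the next block in the following cycle.
    """
    return sum(ceil(n / width) for n in block_lengths)


def clustered_cycles(block_lengths, clusters, width_per_cluster):
    """Decode cycles for several narrow clusters working in parallel.

    Assumption: blocks are assigned round-robin and each cluster decodes
    its own blocks independently, so the total is the busiest cluster.
    """
    busy = [0] * clusters
    for i, n in enumerate(block_lengths):
        busy[i % clusters] += ceil(n / width_per_cluster)
    return max(busy)


branchy = [3, 2, 4, 3, 2, 3]  # many short blocks, frequent taken branches
straight = [36]               # one long run without taken branches

for name, blocks in (("branchy", branchy), ("straight", straight)):
    one_wide = monolithic_cycles(blocks, width=6)
    three_narrow = clustered_cycles(blocks, clusters=3, width_per_cluster=3)
    print(f"{name}: 1x6-wide = {one_wide} cycles, 3x3-wide = {three_narrow} cycles")
```

In this model the clusters only help when the branch predictor can supply several block start addresses ahead of time, which is why the benefit is tied to branch prediction and shows up in some instruction streams but not others.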
 

Solaris17

Super Dainty Moderator
Staff member
Joined
Aug 16, 2005
Messages
26,845 (3.83/day)
Location
Alabama
System Name RogueOne
Processor Xeon W9-3495x
Motherboard ASUS w790E Sage SE
Cooling SilverStone XE360-4677
Memory 128gb Gskill Zeta R5 DDR5 RDIMMs
Video Card(s) MSI SUPRIM Liquid X 4090
Storage 1x 2TB WD SN850X | 2x 8TB GAMMIX S70
Display(s) Odyssey OLED G9 (G95SC)
Case Thermaltake Core P3 Pro Snow
Audio Device(s) Moondrop S8's on schitt Modi+ & Valhalla 2
Power Supply Seasonic Prime TX-1600
Mouse Lamzu Atlantis mini (White)
Keyboard Monsgeek M3 Lavender, Akko Crystal Blues
VR HMD Quest 3
Software Windows 11 Pro Workstation
Benchmark Scores I dont have time for that.
This was really well written, thank you! I'm honestly kind of excited for it. I like my current Meteor Lake laptop, and I was impressed with its performance given what had come out of Intel before Meteor Lake.
 
Joined
Mar 21, 2016
Messages
2,508 (0.80/day)
Windows Recall sounds like hibernation mode re-imagined by Microsoft to help steer people towards Copilot for obvious reasons like ad revenue and ad revenue, along with generative gibberish marketing and ad revenue.
 
Joined
May 3, 2018
Messages
2,881 (1.21/day)
Last sentence in the conclusion:

“If you want to see Lion Cove, Skymont, Xe2 Battlemage, and NPU 4 in a more familiar package, you should look out for Arrow Lake, which not just covers other mobile form-factors, but also desktop.”

So where is Arrow Lake? Did Intel make one mention of it?
Arrow Lake is not getting Xe2 at all; it's using Xe-Plus, a tarted-up Alchemist offering. Still going to be a piss weak iGPU and frankly pointless, like AMD's piss weak RDNA3 2CU iGPU.

The shrink from N5/N4 to N3 is larger and more substantial than the shrink from N7/N6 to N5/N4.
Not according to TSMC:

[Attached image: 1717549637394.jpeg]
 
Joined
Mar 28, 2020
Messages
1,742 (1.04/day)
I think I prefer to wait for official numbers to determine if this is a good chip. I do appreciate the fact that Intel is looking at more efficient CPUs, rather than the likes of Raptor Lake, which draws obscene amounts of power at full load just to edge out competitors that are not that far off while using half the power or less. But I still don't like the idea of Intel's P and E cores because as you can tell, Intel is charging consumers quite a fair bit for higher end chips with higher clockspeed and stupid amounts of E-cores.
 
Joined
Dec 31, 2020
Messages
968 (0.69/day)
I still don't like the idea of Intel's P and E cores because as you can tell, Intel is charging consumers quite a fair bit for higher end chips with higher clockspeed and stupid amounts of E-cores.
Well, if that bothers you, look at the size of the NPU, as big as 66% of 4 P-cores, and we still have no information on what it is good for at all, no idea.
The 4 E-cores take only 6% of the CPU tile; that's 2.1 mm² per core. Impressive. You could have a ton of them for free and it wouldn't hurt a fly.
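As a quick consistency check on those two figures, the tile area they imply can be worked out directly. This is only arithmetic on the numbers in this post (4 cores, 2.1 mm² each, 6% of the tile); the function name is made up for illustration.

```python
def implied_tile_area(core_area_mm2, core_count, share_of_tile):
    """Tile area implied by a per-core area and the cores' share of the tile."""
    return core_area_mm2 * core_count / share_of_tile


# Figures from the post: 4 E-cores, 2.1 mm^2 each, 6% of the CPU tile.
area = implied_tile_area(2.1, 4, 0.06)
print(f"Implied CPU tile area: {area:.0f} mm^2")
```

If the per-core area and the 6% share are both right, the whole tile would be about 140 mm².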

Not according to TSMC:
And for the mixed bag consisting of 50% logic, 30% SRAM, and 20% analog that drops to 30% density and who knows what the actual mix is in CPUs.
N5 to N2 can't get more than 50%.
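The effect of the mix can be sketched with area-weighted scaling: each block type shrinks by its own factor, and the overall gain follows from the total area after the shrink. The 50/30/20 split is from this post; the per-block gains below are hypothetical placeholders, not TSMC figures, and the function name is made up for illustration.

```python
def chip_density_gain(area_mix, block_gain):
    """Overall density gain for a chip made of blocks that scale differently.

    area_mix:   fraction of the old die area per block type.
    block_gain: density improvement per block type (1.6 means 1.6x denser).
    The new area of each block is its old area divided by its gain.
    """
    old_area = sum(area_mix.values())
    new_area = sum(area_mix[block] / block_gain[block] for block in area_mix)
    return old_area / new_area


mix = {"logic": 0.5, "sram": 0.3, "analog": 0.2}      # split from the post
gain = {"logic": 1.6, "sram": 1.05, "analog": 1.0}    # hypothetical values

print(f"Whole-chip density gain: {chip_density_gain(mix, gain):.2f}x")
```

The blocks that barely shrink dominate the result, so the whole-chip gain lands well below the headline logic number. What the real mix in a CPU looks like is a separate question that this sketch does not answer.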
 
Joined
Mar 17, 2017
Messages
97 (0.03/day)
Location
Europe
Well, if that bothers you, look at the size of the NPU, as big as 66% of 4 P-cores, and we still have no information on what it is good for at all, no idea.

An NPU (in a CPU, or in a GPU) will enable you to talk to NPCs during gameplay in a natural way. NPCs will also have memory of what you did previously during gameplay and will act accordingly the next time you meet them. Logically, the hardware (NPU) has to be in PCs before games take advantage of the NPU. Such games are either in development (best case scenario), are in experimental stages (realistic scenario), or haven't been thought of yet (pessimistic scenario). Scripted dialogues in games will be a thing of the past.
 
Joined
May 22, 2024
Messages
407 (2.50/day)
System Name Kuro
Processor AMD Ryzen 7 7800X3D@65W
Motherboard MSI MAG B650 Tomahawk WiFi
Cooling Thermalright Phantom Spirit 120 EVO
Memory Corsair DDR5 6000C30 2x48GB (Hynix M)@6000 30-36-36-76 1.36V
Video Card(s) PNY XLR8 RTX 4070 Ti SUPER 16G@200W
Storage Crucial T500 2TB + WD Blue 8TB
Case Lian Li LANCOOL 216
Power Supply MSI MPG A850G
Software Ubuntu 24.04 LTS + Windows 10 Home Build 19045
Benchmark Scores 17761 C23 Multi@65W
An NPU (in a CPU, or in a GPU) will enable you to talk to NPCs during gameplay in a natural way. NPCs will also have memory of what you did previously during gameplay and will act accordingly the next time you meet them. Logically, the hardware (NPU) has to be in PCs before games take advantage of the NPU. Such games are either in development (best case scenario), are in experimental stages (realistic scenario), or haven't been thought of yet (pessimistic scenario). Scripted dialogues in games will be a thing of the past.
Game mods with such a feature already exist, and I think there are a few prototype games that require an OpenAI API key to function as intended. I'm under the impression that they are currently more amusingly quirky and weird, and get weirder if you use much less capable local models running as a surrogate OpenAI API.

It's still remarkable that hardware offerings responded as quickly as they did, when the current local AI boom only took off as late as the second half of 2022, (no) thanks to the likes of Stable Diffusion and LLaMA.
 
Joined
Mar 17, 2017
Messages
97 (0.03/day)
Location
Europe
Game mods with such a feature already exist, and I think there are a few prototype games that require an OpenAI API key to function as intended. I'm under the impression that they are currently more amusingly quirky and weird, and get weirder if you use much less capable local models running as a surrogate OpenAI API.

I think a major obstacle might be that, for a 3D game to behave in a natural way, the 3D models of the NPCs would need to be in sync with the output of a large language model. I haven't seen anything like that anywhere yet; such a game engine doesn't seem possible today or in the near future, and I cannot imagine how to train an AI for such a scenario. An OpenAI API key is pointless for this scenario because the API doesn't output 3D models (nor 2D models). But if it is possible, somebody will eventually figure it out. Nevertheless, it is good to see that NPUs, albeit still an experimental technology, are becoming a standard part of PCs.
 
Joined
Feb 3, 2017
Messages
3,720 (1.32/day)
Processor Ryzen 7800X3D
Motherboard ROG STRIX B650E-F GAMING WIFI
Memory 2x16GB G.Skill Flare X5 DDR5-6000 CL36 (F5-6000J3636F16GX2-FX5)
Video Card(s) INNO3D GeForce RTX™ 4070 Ti SUPER TWIN X2
Storage 2TB Samsung 980 PRO, 4TB WD Black SN850X
Display(s) 42" LG C2 OLED, 27" ASUS PG279Q
Case Thermaltake Core P5
Power Supply Fractal Design Ion+ Platinum 760W
Mouse Corsair Dark Core RGB Pro SE
Keyboard Corsair K100 RGB
VR HMD HTC Vive Cosmos
That's why it's so bloated. It's a laughingstock and why AMD is kicking them in servers and power consumption in desktops so easily.
Bloat, or even architecture, has little to do with power consumption and AMD kicking them in servers in this case. AMD is manufacturing their CPUs on a node that is basically a full node ahead. That is a big difference. Same deal as Intel was doing constantly back yonder. We can compare architectures in terms of efficiency once they are on a comparable enough node.
Skymont is better than both Lion Cove and Zen 5. It means per clock Lion Cove/Zen 5 will be less than 15% faster. Lion Cove is 3x the size with less efficiency. It's a done deal.
15% is a big difference though.
There seems to be a ceiling for IPC with a lot of things that have been treated as kind of "natural limits" - not going too wide, not going too complex in parts, widths of memory buses etc - that are being challenged to wring more performance out of architecture now that clock speeds are no longer increasing. Increasing caches has been a thing for a while, AMD's huge L3 caches seem to show there is a limit to its effectiveness in general purpose use. Widening is now constant, Apple went really wide in almost everything (and can get away with it largely thanks to their entire ecosystem being under their control). Apple went with wide memory buses and Intel seems to be following suit - others are likely to follow. And of course a bunch of other things.

But before getting stuck in trying to think of examples, the point I wanted to make was that the last 15% is hard. ARM, RISC-V and some other competitors are coming up fast because the path is known. Intel, AMD and others have already tried a bunch of stuff and found what works, what doesn't, and why. New things are starting to crop up - Apple and its M-series as the obvious example - but this is because the easy wins are now depleted. And clock speeds are no longer increasing in mobile either.
You. I am referring to you, who said Zen 5 and Skymont are better because of the clustered decode design. It is a compromise to try to overcome limitations of x86 ISA decoding - where traditional increase results in quadratic rise in transistor usage in decoders (hence the never-ending ARM vs x86 argument). When it comes to pure decoder performance, Golden Cove's single 6-wide decoder is better. Of course, when it comes to the overall design of a core, saving a great deal of transistors allows you to beef up other areas. And based on Tre/Grace/Sky's results, clustered decode is the way to go.
I bet the clustered decoder is not due to limitations in decoding - this is purely an efficiency boost. The decoder is pretty beefy, so having the ability to turn 1/3 or 2/3 of it off should be pretty nice.
 
Joined
Jun 14, 2020
Messages
3,275 (2.05/day)
System Name Mean machine
Processor 12900k
Motherboard MSI Unify X
Cooling Noctua U12A
Memory 7600c34
Video Card(s) 4090 Gamerock oc
Storage 980 pro 2tb
Display(s) Samsung crg90
Case Fractal Torent
Audio Device(s) Hifiman Arya / a30 - d30 pro stack
Power Supply Be quiet dark power pro 1200
Mouse Viper ultimate
Keyboard Blackwidow 65%
Bloat, or even architecture, has little to do with power consumption and AMD kicking them in servers in this case. AMD is manufacturing their CPUs on a node that is basically a full node ahead. That is a big difference. Same deal as Intel was doing constantly back yonder. We can compare architectures in terms of efficiency once they are on a comparable enough node.
15% is a big difference though.
Yeah, it always amazes me how people just casually claim AMD is kicking them in power consumption. Intel is literally the only company that makes 35 watt desktop CPUs. AMD chips need that amount of power just to sit there idle. But sure, they are kicking...
 
Joined
Feb 3, 2017
Messages
3,720 (1.32/day)
Yeah, it always amazes me how people just casually claim AMD is kicking them in power consumption. Intel is literally the only company that makes 35 watt desktop CPUs. AMD chips need that amount of power just to sit there idle. But sure, they are kicking...
To be fair, Intel does not really make 35W desktop CPUs. The "35W TDP" T variants still run PL2 up to 105W or something, which is stupid. And AMD could if they wanted to, just not by limiting the chiplet CPUs but by limiting the G APUs. Either way, 35W should be too low in terms of efficiency for both.
 
Joined
Jun 14, 2020
Messages
3,275 (2.05/day)
To be fair, Intel does not really make 35W desktop CPUs. The "35W TDP" T variants still run PL2 up to 105W or something, which is stupid. And AMD could if they wanted to, just not by limiting the chiplet CPUs but by limiting the G APUs. Either way, 35W should be too low in terms of efficiency for both.
Yeah, but PL2 lasts for 50 seconds or something. When you are actually using it, it drops to 35W. Due to the IO die, that's just impossible for AMD desktop chips.
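How much the boost window matters depends on how long the load runs, which a rough sketch can show. It assumes a simple step model: the chip runs at PL2 for a fixed window and then drops to PL1. Real power-limit behaviour is based on a running average rather than a hard step, so treat this as an approximation. The 35W, 105W, and 50 second figures are the "or something" numbers from this exchange, and the function name is made up for illustration.

```python
def average_power(pl1_w, pl2_w, boost_window_s, duration_s):
    """Average package power over a sustained load, step model.

    The chip draws pl2_w for the first boost_window_s seconds and pl1_w
    afterwards. duration_s must be greater than zero.
    """
    boost_time = min(boost_window_s, duration_s)
    sustained_time = max(0.0, duration_s - boost_window_s)
    return (pl2_w * boost_time + pl1_w * sustained_time) / duration_s


# Figures quoted in the thread: PL1 = 35 W, PL2 = 105 W, about 50 s of boost.
for seconds in (10, 50, 600, 3600):
    watts = average_power(35, 105, 50, seconds)
    print(f"{seconds:>5} s load: {watts:.1f} W average")
```

Short bursts run entirely at PL2 in this model, while a ten-minute load averages only a few watts above PL1, so both posts can be right depending on the workload.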
 
Joined
Jan 11, 2022
Messages
825 (0.80/day)
Windows Recall sounds like hibernation mode re-imagined by Microsoft to help steer people towards Copilot for obvious reasons like ad revenue and ad revenue, along with generative gibberish marketing and ad revenue.
Yes, in articles like this, links to basic explanations of terms would be nice.
It's been decades since I had those classes, and I've forgotten a lot of it.
 
Joined
Mar 17, 2017
Messages
97 (0.03/day)
Location
Europe
I bet the clustered decoder is not due to limitations in decoding - this is purely an efficiency boost. The decoder is pretty beefy, so having the ability to turn 1/3 or 2/3 of it off should be pretty nice.

Skymont cannot "turn off 1/3 or 2/3 of decoders".
 
Joined
Oct 6, 2021
Messages
1,605 (1.43/day)
50% increase was against 165U in TimeSpy, which has a weaker IGP.
[Attachment 350096]

Still, according to the current graph, this LNL IGP is faster by an unknown amount than even MTL-H while consuming less.
It will be interesting to see how it performs in reality.
Intel and its tricks. This iGPU has only half the shaders... I think LunarLake might even be more efficient than the Strix, but the difference in performance will be huge.
 

iNinja9K

New Member
Joined
May 27, 2022
Messages
26 (0.03/day)
Intel and its tricks. This iGPU has only half the shaders... I think LunarLake might even be more efficient than the Strix, but the difference in performance will be huge.
More efficient, I agree, but "huge" difference in performance? I don't think so. Better? Yes, but not huge.
 
Joined
Dec 12, 2016
Messages
1,751 (0.61/day)
Informal poll: What was the best Computex Launch/Teaser?

Lunar Lake
Strix Point
Granite Ridge
Sierra Forest
Turin
Granite Rapids
Arrow Lake
Panther Lake
Whatever Nvidia showed
 
Joined
Oct 6, 2021
Messages
1,605 (1.43/day)
Informal poll: What was the best Computex Launch/Teaser?

Lunar Lake
Strix Point
Granite Ridge
Sierra Forest
Turin
Granite Rapids
Arrow Lake
Panther Lake
Whatever Nvidia showed
- Intel only showed slides that didn't say anything useful or exciting.
- Nvidia only talked about AI.
- Qualcomm maintained its smoke-and-mirrors approach.

I can't help saying that AMD, as the only company that actually showed benchmarks, was the best overall lol
 