
Intel Lunar Lake Technical Deep Dive

No. Just no. That's not why they did it.

*Who* did *what*? Who/what are you referring to? There are multiple combinations - which combination do you mean?
 
I assume higher clocks on Arrow Lake.

Assuming higher clocks on E-cores that are much wider than previous E-core designs is a questionable assumption.
 
E cores aren't "P cores removed".
It never says so. Maybe you should read it again. :roll:
You have to understand that E-cores were developed by "removing things," from a typical core and are a frugal product of reduction,
while the P-cores are developed by "adding things" to a typical core, and are a product of addition.
^^These are two different subjects, the comma and the "while" should give you a hint.

I think your conclusion is a bigly disappointment.

I interpret the E-core development as similar to the Pentium M development, i.e., removing things to get where you want to go. No, I don't have that quote from Intel.
 
It never says so. Maybe you should read it again. :roll:
It's basically what it says. Considering that back with the very first Atom (in-order) they tried new ideas, and did it again and again, while recent cores like Tremont, Gracemont, and Skymont are wider and have bigger structures than the P-core, it's really a bad conclusion. The P-core team has pretty much stalled since Sandy Bridge.

The Intel cores were criticized by many architects for many, many generations for having tiny L1 caches and little fetch bandwidth. That continues to this day. The E-cores surpassed those limits back with Gracemont. The P-core team's approach is basically expand, expand, expand. That's why it's so bloated. It's a laughingstock and why AMD is kicking them in servers and power consumption in desktops so easily.
Assuming higher clocks on E-cores that are much wider than previous E-core designs is a questionable assumption.
Yet it is, according to a deleted leak that claims 5.7 GHz top Turbo for the P-cores, 5.4 GHz all-core, and 4.6 GHz for Skymont on Arrow Lake. This core is going to have big ramifications not just for Intel but, based on the Zen 5 reveal, for AMD too.
*Who* did *what*? Who/what are you referring to? There are multiple combinations - which combination do you mean?
You. I am referring to you, who said Zen 5 and Skymont are better because of the clustered decode design. It is a compromise to overcome limitations of x86 ISA decoding, where traditionally widening results in a quadratic rise in transistor usage in the decoders (hence the never-ending ARM vs. x86 argument). When it comes to pure decoder performance, Golden Cove's single 6-wide is better. Of course, when it comes to the overall design of a core, saving a great deal of transistors lets you beef up other areas. And based on Tre/Grace/Sky's results, clustered decode is the way to go.
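As a very rough illustration of that transistor argument (a toy model of my own, not anything from Intel's material): if a monolithic decoder has to speculatively length-decode across a fetch window that grows with its width, its cost grows roughly with the square of the width, while splitting the same total width into clusters keeps each cluster cheap.

Code:
# Toy model (mine, not Intel's): assume a W-wide x86 decoder must
# speculatively length-decode at roughly W * avg_len byte positions per
# cycle, so its cost scales ~ W^2 * avg_len. Real decoders are far more
# sophisticated; this only illustrates why clustering saves transistors.

AVG_LEN = 4  # assumed average x86 instruction length in bytes

def monolithic_cost(width: int) -> int:
    """Relative cost of one W-wide decoder block."""
    return width * width * AVG_LEN

def clustered_cost(clusters: int, width_per_cluster: int) -> int:
    """Relative cost of several narrower decoder clusters."""
    return clusters * monolithic_cost(width_per_cluster)

print("6-wide monolithic  :", monolithic_cost(6))     # 144
print("9-wide monolithic  :", monolithic_cost(9))     # 324
print("3 x 3-wide clusters:", clustered_cost(3, 3))   # 108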

Skymont is better than both Lion Cove and Zen 5. It means that, per clock, Lion Cove/Zen 5 will be less than 15% faster. Lion Cove is 3x the size with less efficiency. It's a done deal.

Again, it was the E-core team that brought the revolutionary clustered decode design. Saying it was done by "removing things" is doing the team a disservice, because it's going to kick ass.
 
Intel didn't have much choice but to really up their game with the E-cores, since Zen C-cores give up much less in terms of features and performance. When AMD crammed that many C-cores into a single server chip, the writing on the wall appeared for Intel's big server money.

Still, I'm actually kinda excited to see this kind of effort from Intel. They had gotten pretty stale, but now they seem to be offering a pretty balanced mobile solution. Even if it's not the top performer, it has fewer weak points. Ironically, Apple has been doing 4P + 4E with no SMT, an NPU, and a decent iGPU since 2020. No wonder Apple went their own way--it took Intel 4 years to get here.
 
You. I am referring to you,

You don't know me. How can you be referring to something you have little knowledge about?

Please refer to the text, not to people's minds.

who said Zen 5 and Skymont are better because of the clustered decode design. It is a compromise to overcome limitations of x86 ISA decoding

That clustered design in E-cores (and in Zen 5) has very little to do with the complexity of x86 ISA decoding. Instead, it has to do with branch prediction.
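To make that concrete, here is a conceptual sketch (mine, heavily simplified, and not a description of Skymont's actual pipeline): the branch predictor hands out several predicted fetch blocks per cycle, each decode cluster works on its own block, and the micro-ops are stitched back into program order afterwards.

Code:
# Conceptual sketch (mine): clustered decode chews on several
# branch-predictor-supplied fetch blocks in parallel, then merges the
# decoded micro-ops back into predicted program order.
from collections import deque

def decode_block(block):
    """Pretend every instruction decodes into exactly one micro-op."""
    return [f"uop({insn})" for insn in block]

def clustered_decode(predicted_blocks, num_clusters=3):
    queue = deque(predicted_blocks)   # fetch blocks in predicted program order
    ordered_uops = []
    while queue:
        # Each cluster grabs the next predicted block this "cycle".
        batch = [queue.popleft() for _ in range(min(num_clusters, len(queue)))]
        decoded = [decode_block(b) for b in batch]  # parallel in hardware
        for uops in decoded:                        # merged back in order
            ordered_uops.extend(uops)
    return ordered_uops

blocks = [["add", "cmp", "jne L1"], ["mov", "sub", "jmp L2"], ["xor", "ret"]]
print(clustered_decode(blocks))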

where traditionally widening results in a quadratic rise in transistor usage in the decoders

That quadratic increase is already solved by µop caches (P-cores, Zen cores) and by the on-demand instruction length decoder in E-cores. AMD's K8 (from 2003) already had an on-demand instruction length decoder, and predecoded instructions were stored in the L1I cache just like in Skymont (if Skymont retains the OD-ILD from previous E-core designs)!
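Roughly what that looks like (another toy sketch of mine; find_length() is a hypothetical stand-in for a real length decoder, and real cache organization is ignored): boundaries are found once when a line is filled, stored alongside it, and every later fetch of that line skips the length decode entirely.

Code:
# Toy sketch (mine): instruction-boundary ("predecode") information is
# computed once when an instruction cache line is filled and reused on
# every later fetch, so the expensive x86 length decode happens on demand
# rather than every cycle.

def find_length(line: bytes, offset: int) -> int:
    """Hypothetical stand-in for a real x86 length decoder."""
    return 1 + (line[offset] % 4)  # pretend lengths of 1..4 bytes

class PredecodedICache:
    def __init__(self):
        self.lines = {}  # address -> (line bytes, instruction start offsets)

    def fill(self, addr: int, line: bytes):
        boundaries, off = [], 0
        while off < len(line):              # slow path: runs once per fill
            boundaries.append(off)
            off += find_length(line, off)
        self.lines[addr] = (line, boundaries)

    def fetch(self, addr: int):
        return self.lines[addr][1]          # fast path: boundaries already known

icache = PredecodedICache()
icache.fill(0x1000, bytes(range(16)))
print(icache.fetch(0x1000))  # start offsets, no re-decode needed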

And based on Tre/Grace/Sky's results, clustered decode is the way to go.

Clustered decode is the way to go, but the primary reason is different from what you wrote.

Skymont is better than both Lion Cove and Zen 5.

Skymont's decode isn't universally better than Zen 5's. It is better than Zen 5's only in a subset of scenarios.
 
This was really well written, thank you! I'm honestly kind of excited for it. I like my current Meteor Lake laptop, and I was impressed with its performance given what had come out of Intel pre-Meteor Lake.
 
Windows Recall sounds like hibernation mode re-imagined by Microsoft to help steer people towards Copilot, for obvious reasons like ad revenue and ad revenue, along with generative-gibberish marketing and ad revenue.
 
Last sentence in the conclusion:

“If you want to see Lion Cove, Skymont, Xe2 Battlemage, and NPU 4 in a more familiar package, you should look out for Arrow Lake, which not just covers other mobile form-factors, but also desktop.”

So where is Arrow Lake? Did Intel make one mention of it?
Arrow Lake is not getting Xe2 at all; it's using Xe-Plus, a tarted-up Alchemist offering. Still going to be a piss-weak iGPU and frankly pointless, like AMD's piss-weak 2-CU RDNA3 iGPU.

The shrink from N5/N4 to N3 is larger and more substantial than the shrink from N7/N6 to N5/N4.
Not according to TSMC:

[Attached image: TSMC node density scaling comparison]
 
I think I'd prefer to wait for official numbers to determine if this is a good chip. I do appreciate that Intel is looking at more efficient CPUs, rather than the likes of Raptor Lake, which draws obscene amounts of power at full load just to edge out competitors that are not far behind while using half or less of the power. But I still don't like the idea of Intel's P and E cores because as you can tell, Intel is charging consumers quite a fair bit for higher end chips with higher clockspeed and stupid amounts of E-cores.
 
I still don't like the idea of Intel's P and E cores because as you can tell, Intel is charging consumers quite a fair bit for higher end chips with higher clockspeed and stupid amounts of E-cores.
Well, if that bothers you, look at the size of the NPU: as big as 66% of 4 P-cores, and we still have no information on what it is good for at all. No idea.
The 4 E-cores take only 6% of the CPU tile; that's 2.1 mm² per core. Impressive. You could have a ton of them for free and it wouldn't hurt a fly.
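Just to put numbers on that (plain arithmetic on the figures above, nothing measured):

Code:
# Arithmetic on the numbers quoted above (not measured data):
E_CORE_AREA_MM2 = 2.1       # claimed area of one E-core
E_CLUSTER_SHARE = 0.06      # claimed share of the CPU tile for the 4 E-cores

cluster_area = 4 * E_CORE_AREA_MM2           # ~8.4 mm^2
tile_area = cluster_area / E_CLUSTER_SHARE   # ~140 mm^2 implied tile size
extra_share = cluster_area / tile_area       # cost of another quad cluster

print(f"Implied CPU tile area : {tile_area:.0f} mm^2")
print(f"Cost of 4 more E-cores: {extra_share:.0%} of the tile")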

Not according to TSMC:
And for a mixed design consisting of 50% logic, 30% SRAM, and 20% analog, that drops to a 30% density gain, and who knows what the actual mix is in CPUs.
N5 to N2 can't get more than 50%.
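Here's roughly how that mixed-design number falls out (a back-of-envelope of mine; the per-block scaling factors are illustrative assumptions in the ballpark of TSMC's public N5-to-N3E claims, not official figures): the chip-level gain is the area-weighted combination of the per-block gains, so poor SRAM and analog scaling drags the total down fast.

Code:
# Back-of-envelope (mine). Scaling factors below are illustrative
# assumptions, not official TSMC numbers.
mix = {"logic": 0.50, "sram": 0.30, "analog": 0.20}    # area share on the old node
scaling = {"logic": 1.7, "sram": 1.05, "analog": 1.1}  # assumed density gains

# New chip area = sum of each block's old area divided by its density gain.
new_area = sum(share / scaling[block] for block, share in mix.items())
chip_density_gain = 1 / new_area

print(f"Chip-level density gain: {chip_density_gain:.2f}x "
      f"(~{chip_density_gain - 1:.0%} denser)")  # ~1.3x, i.e. ~30%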
 
Well, if that bothers you, look at the size of the NPU: as big as 66% of 4 P-cores, and we still have no information on what it is good for at all. No idea.

An NPU (in a CPU, or in a GPU) will enable you to talk to NPCs during gameplay in a natural way. NPCs will also have memory of what you did previously during gameplay and will act accordingly the next time you meet them. Logically, the hardware (NPU) has to be in PCs before games can take advantage of it. Such games are either in development (best-case scenario), in experimental stages (realistic scenario), or haven't been thought of yet (pessimistic scenario). Scripted dialogues in games will be a thing of the past.
 
An NPU (in a CPU, or in a GPU) will enable you to talk to NPCs during gameplay in a natural way. NPCs will also have memory of what you did previously during gameplay and will act accordingly the next time you meet them. Logically, the hardware (NPU) has to be in PCs before games can take advantage of it. Such games are either in development (best-case scenario), in experimental stages (realistic scenario), or haven't been thought of yet (pessimistic scenario). Scripted dialogues in games will be a thing of the past.
Game mods with such a feature already exist, and I think there are a few prototype games that required an OpenAI API key to function as intended. I'm under the impression that they are currently more amusingly quirky and weird, and they get weirder if you use much less capable local models running as a surrogate OpenAI API.

It's still remarkable that hardware offerings responded as quickly as they did, when the current local-AI boom only took off in the second half of 2022, (no) thanks to the likes of Stable Diffusion and LLaMA.
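Wiring one of those surrogate endpoints up is honestly the easy part. A minimal sketch (mine), assuming a local OpenAI-compatible server is listening on localhost; the URL, model name, and guard persona are placeholders, not taken from any shipping game or mod:

Code:
# Minimal sketch (mine): an NPC whose "memory" is just the running chat
# history, sent to a local OpenAI-compatible endpoint. URL, model name
# and persona are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

class NPC:
    def __init__(self, persona: str):
        self.memory = [{"role": "system", "content": persona}]

    def talk(self, player_line: str) -> str:
        self.memory.append({"role": "user", "content": player_line})
        reply = client.chat.completions.create(
            model="local-model",   # whatever the local server exposes
            messages=self.memory,
        ).choices[0].message.content
        self.memory.append({"role": "assistant", "content": reply})
        return reply

guard = NPC("You are a town guard. Remember what the player did earlier.")
print(guard.talk("I returned the stolen amulet to the temple."))
print(guard.talk("Do you remember what I did for this town?"))

Whether the model runs on an NPU, a GPU, or a cloud API is invisible at this level; the hard part is keeping the answers consistent with the game state.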
 
Game mods with such a feature already exist, and I think there are a few prototype games that required an OpenAI API key to function as intended. I'm under the impression that they are currently more amusingly quirky and weird, and they get weirder if you use much less capable local models running as a surrogate OpenAI API.

I think a major obstacle might be that for a 3D game to behave in a natural way, the 3D models of the NPCs would need to be in sync with the output of a large language model. I haven't seen anything like that anywhere yet; such a game engine doesn't seem possible today or in the near future, and I cannot imagine how to train an AI for such a scenario. An OpenAI API key is pointless for this scenario because the API doesn't output 3D models (nor 2D models). But if it is possible, somebody will eventually figure it out. Nevertheless, it is good to see that NPUs, albeit still an experimental technology, are becoming a standard part of PCs.
 
That's why it's so bloated. It's a laughingstock and why AMD is kicking them in servers and power consumption in desktops so easily.
Bloat, or even architecture, has little to do with power consumption and AMD kicking them in servers in this case. AMD is manufacturing their CPUs on a node that is basically a full node ahead. That is a big difference. It's the same thing Intel did constantly back in the day. We can compare architectures in terms of efficiency once they are on comparable enough nodes.
Skymont is better than both Lion Cove and Zen 5. It means that, per clock, Lion Cove/Zen 5 will be less than 15% faster. Lion Cove is 3x the size with less efficiency. It's a done deal.
15% is a big difference though.
There seems to be a ceiling for IPC, and a lot of things that have been treated as kind of "natural limits" - not going too wide, not going too complex in certain parts, widths of memory buses, etc. - are now being challenged to wring more performance out of architectures, since clock speeds are no longer increasing. Increasing caches has been a thing for a while; AMD's huge L3 caches seem to show there is a limit to their effectiveness in general-purpose use. Widening is now constant; Apple went really wide in almost everything (and can get away with it largely thanks to their entire ecosystem being under their control). Apple went with wide memory buses and Intel seems to be following suit - others are likely to follow. And of course a bunch of other things.

But before getting stuck trying to think of examples, the point I wanted to make is that the last 15% is hard. ARM, RISC-V, and some other competitors are coming up fast because the path is known. Intel, AMD, and others have already tried a bunch of stuff and found what works, what doesn't, and why. New things are starting to crop up - Apple and its M-series as the obvious example - but that is because the easy wins are now depleted. And clock speeds are no longer increasing in mobile either.
You. I am referring to you, who said Zen 5 and Skymont are better because of the clustered decode design. It is a compromise to overcome limitations of x86 ISA decoding, where traditionally widening results in a quadratic rise in transistor usage in the decoders (hence the never-ending ARM vs. x86 argument). When it comes to pure decoder performance, Golden Cove's single 6-wide is better. Of course, when it comes to the overall design of a core, saving a great deal of transistors lets you beef up other areas. And based on Tre/Grace/Sky's results, clustered decode is the way to go.
I bet the clustered decoder is not due to limitations in decoding - it's purely an efficiency boost. The decoder is pretty beefy, so having the ability to turn 1/3 or 2/3 of it off should be pretty nice.
 
Bloat, or even architecture, has little to do with power consumption and AMD kicking them in servers in this case. AMD is manufacturing their CPUs on a node that is basically a full node ahead. That is a big difference. It's the same thing Intel did constantly back in the day. We can compare architectures in terms of efficiency once they are on comparable enough nodes.
15% is a big difference though.
Yeah, it always amazes me how people just casually claim AMD is kicking Intel in power consumption. Intel is literally the only company that makes 35-watt desktop CPUs. AMD chips need that amount of power just to sit there idle. But sure, they are kicking...
 
Yeah, it always amazes me how people just casually claim AMD is kicking Intel in power consumption. Intel is literally the only company that makes 35-watt desktop CPUs. AMD chips need that amount of power just to sit there idle. But sure, they are kicking...
To be fair, Intel does not really make 35 W desktop CPUs. The "35 W TDP" T variants still run PL2 at 105 W or something, which is stupid. And AMD could if they wanted to, just not by limiting the chiplet CPUs but by limiting the G-series APUs. Either way, 35 W is probably too low for good efficiency from either.
 
To be fair, Intel does not really make 35 W desktop CPUs. The "35 W TDP" T variants still run PL2 at 105 W or something, which is stupid. And AMD could if they wanted to, just not by limiting the chiplet CPUs but by limiting the G-series APUs. Either way, 35 W is probably too low for good efficiency from either.
Yeah, but the PL2 lasts for 50 seconds or something. When you are actually using it, it drops to 35 W. Due to the I/O die, that's just impossible for AMD desktop chips.
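For anyone unfamiliar with how those two limits interact, here's a deliberately simplified model (mine; real Intel turbo enforces PL1 over a rolling time window (Tau) rather than a hard cutoff, and the exact boost duration varies by SKU and board):

Code:
# Simplified model (mine) of PL1/PL2 behaviour under a sustained load:
# the package may draw up to PL2 while the boost window lasts, then
# settles to PL1. Real hardware averages power over a rolling window,
# so this is only an approximation of "boost for ~50 s, then drop".
PL1, PL2 = 35.0, 105.0   # watts (the T-series figures mentioned above)
TAU = 50.0               # seconds the PL2 boost is allowed to last (rough)

def package_power(t_seconds: float, demand: float = 200.0) -> float:
    """Power actually drawn at time t under an all-core load."""
    limit = PL2 if t_seconds < TAU else PL1
    return min(demand, limit)

for t in (1, 10, 49, 51, 300):
    print(f"t={t:>3d}s -> {package_power(t):.0f} W")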
 
Windows Recall sounds like hibernation mode re-imagined by Microsoft to help steer people towards Copilot, for obvious reasons like ad revenue and ad revenue, along with generative-gibberish marketing and ad revenue.
Yes, in articles like this, links to basic explanations of terms would be nice.
It's been decades since I had those classes, and I've forgotten a lot of them.
 
I bet the clustered decoder is not due to limitations in decoding - it's purely an efficiency boost. The decoder is pretty beefy, so having the ability to turn 1/3 or 2/3 of it off should be pretty nice.

Skymont cannot "turn on 1/3 or 2/3 of decoders".
 
50% increase was against 165U in TimeSpy, which has a weaker IGP.
[Attached graph: iGPU performance vs. power]

Still, according to the graph, this LNL iGPU is faster than even MTL-H by an unknown amount while consuming less.
It will be interesting to see how it performs in reality.
Intel and its tricks. This iGPU has only half the shaders... I think Lunar Lake might even be more efficient than Strix Point, but the difference in performance will be huge.
 
Intel and its tricks. This iGPU has only half the shaders... I think Lunar Lake might even be more efficient than Strix Point, but the difference in performance will be huge.
More efficient, I agree, but "huge" difference in performance? I don't think so. Better? Yes, but not huge.
 
Informal poll: What was the best Computex Launch/Teaser?

Lunar Lake
Strix Point
Granite Ridge
Sierra Forest
Turin
Granite Rapids
Arrow Lake
Panther Lake
Whatever Nvidia showed
 
Informal poll: What was the best Computex Launch/Teaser?

Lunar Lake
Strix Point
Granite Ridge
Sierra Forest
Turin
Granite Rapids
Arrow Lake
Panther Lake
Whatever Nvidia showed
- Intel only showed slides that didn't say anything useful or exciting.
- Nvidia only talked about AI.
- Qualcomm maintained its smoke-and-mirrors approach.

I can't help saying that AMD, as the only company that actually showed benchmarks, was the best overall lol
 