Thursday, May 30th 2024
Intel's "Skymont" E-core Posts a Double-digit IPC Gain Over "Crestmont": Leaked Presentation
Amid all the attention the next-generation "Lion Cove" P-cores powering the upcoming "Lunar Lake" and "Arrow Lake" microarchitectures get as they compete with AMD's "Zen 5," it's easy to lose sight of the next-generation "Skymont" E-cores that will feature in both upcoming Intel microarchitectures, and as standalone cores in the "Twin Lake" low-power processor. Pictures from an Intel presentation, possibly to PC OEMs, have leaked to the web. These are just thumbnails, so we can't see the whole slides, but the person who took the pictures captioned them in a now-deleted post on the Chinese microblogging platform Weibo.
And now, the big reveal: the "Skymont" E-core is said to offer a double-digit IPC gain over the "Crestmont" E-core powering the current "Meteor Lake" processor, which itself posted a roughly 4% IPC gain over the "Gracemont" E-cores found in the "Raptor Lake" and "Alder Lake" microarchitectures. Such an IPC gain over "Gracemont" should make the "Skymont" E-core match the IPC of the "Sunny Cove" or "Willow Cove" P-cores powering the "Ice Lake" and "Tiger Lake" microarchitectures, respectively, which were both within the 90th percentile of the AMD "Zen 3" core in IPC.
Intel is achieving this double-digit IPC gain over "Crestmont" through an improved branch prediction unit, a broader 9-wide decode unit (compared to the 6-wide decode unit of "Crestmont"), an 8-wide integer ALU (compared to 4 integer ALUs on its predecessor), a dependency optimization in the out-of-order engine, and deeper queuing across the engine. The E-cores might still be arranged in clusters that share an L2 cache among a certain number of cores.
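Because generational IPC gains multiply rather than add, the cumulative gain of "Skymont" over "Gracemont" would be somewhat larger than the sum of the two steps. The sketch below illustrates that arithmetic only. The roughly 4% "Crestmont" figure comes from the text above, while the 10%, 15%, and 20% values for "Skymont" are hypothetical placeholders, since the leak only says "double-digit." The function name `compound_ipc_gain` is likewise made up for this illustration.

```python
def compound_ipc_gain(*step_gains):
    """Combine successive IPC gains (given as fractions) into one overall gain."""
    total = 1.0
    for gain in step_gains:
        total *= 1.0 + gain  # each generation scales the previous result
    return total - 1.0


# Roughly 4% gain of Crestmont over Gracemont, as stated in the article.
CRESTMONT_OVER_GRACEMONT = 0.04

# Hypothetical Skymont-over-Crestmont gains; the leak gives no exact number.
ASSUMED_SKYMONT_GAINS = (0.10, 0.15, 0.20)

for skymont_gain in ASSUMED_SKYMONT_GAINS:
    overall = compound_ipc_gain(CRESTMONT_OVER_GRACEMONT, skymont_gain)
    print(f"Skymont +{skymont_gain:.0%} over Crestmont "
          f"=> +{overall:.1%} over Gracemont")
```

Under these assumed inputs, even the minimum double-digit gain of 10% would put "Skymont" a little over 14% ahead of "Gracemont" in IPC.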
Source:
HXL (Twitter)
27 Comments on Intel's "Skymont" E-core Posts a Double-digit IPC Gain Over "Crestmont": Leaked Presentation
Improved iGPU…check
Improved AI…check
Improved E-cores…check
Improved P-cores…syntax error…
…recompiling…
Lower P core clocks…check
Fewer threads…check
Lower P-core IPC…………………………
But I left the question open.
That would be wild if true. Also a market share killer.
Given that Lion Cove will be fabbed on a more advanced node (than Raptor Lake), I also expect much better power efficiency. Intel needs to get its power consumption in check.
What I can make out:
-Flexible & Scalable: Shows Lunarlake and Arrowlake
-Increased IPC Gains: Left is likely Int and right is FP. FP gains are greater. Integer gains seem to be not 1.1n x but 1.2n x or 1.3n x.
-Two graphs of "something" Power and Performance compared to another core. Guessing one is for performance and the other is for power (Arrowlake and Lunarlake). For the Performance graph it shows 2x performance with more power. At the same power it seems to be showing 1.5 or 1.7x. It also seems to say "1/3 power" at same performance. The low power graph shows massive gains (2.4x at the same power?), probably compared to Crestmont LP? Scales to 4x or 5x compared to Crestmont LP at higher power.
-Skymont uArch Goals
-"something" Decode: 9-wide(3x3), "Nanocache"?
-"something" Predict: 128 bytes, Faster
-"something" OoOE Engine: 8-wide and 16-wide, Dependency "streaming"?
-Deeper Queueing with More Resources
Predictions: 10% faster in Int and a little bit behind in FP compared to Golden Cove.
This should bode well for the efficiency of lower clocked parts.
Isn't Zen 4 only 4 wide?
No idea what micro is, I haven't looked at the diagrams for Zen 4. I'm sure it's around 6 or 8 though for micro, as it's usually double the macro.
Their biggest mistake with RPL was that instead of increasing the cache amount, they decided to clock the chips well beyond their capability. The ST performance of the RPL core is amazing even at 5.3/5.2 GHz - if they just had more cache on that design they could have kept up in gaming at much lower wattage, and taken a very minor L in multithreading performance to the 7950X. Costs were probably the reason for this choice, but still - it would be nice to use the E-core space for cache or HBM instead.
Typically just dropping E-core clock speeds a bit and pushing P-cores a bit harder and/or ensuring they can boost longer w/o thermal problems is fine in practice. Most workloads don't need peak ST performance on more than 8 cores anyway. I mean, hell, we used to live in a world where all we heard was that four cores is all you need.
That's still fairly true a good amount of the time, since most workloads aren't exactly pegging 8 P-cores to death, or even more than 4 P-cores in many cases. Workloads vary, of course, and you can point to whichever data you wish in order to illustrate or make most points in these kinds of arguments.
I'm satisfied with my 14700K; for what I got it for, it was a good deal. Is it perfect? Not exactly. But does it matter to me? Not really. Do I even notice in daily operation? Not at all.