Monday, September 30th 2019

Intel Sunny Cove Successor Significantly Bigger: Jim Keller

Sunny Cove is the codename for Intel's first truly new performance CPU core design since "Skylake." It made its debut with the company's 10 nm "Ice Lake" processors, packing the first tangible IPC increase in years. VLSI guru Jim Keller is leading the effort to build Intel's future CPU core designs, and dropped a big hint on what to expect while speaking at a gathering at U.C. Berkeley. It's unclear which specific core Keller is referring to. The immediate successor to "Sunny Cove" is codenamed "Willow Cove," and Intel's own public sketch hints at an incremental upgrade over Sunny Cove, with faster caches and process-level optimization. It's only with "Golden Cove," slated for 2021, that Intel speaks of its next round of IPC increases (dubbed "ST perf"). It's plausible that Keller is referring to this core, since a 2021 launch would fit better with a 2018-19 design phase.

In his talk, Keller describes Intel's next big CPU core as being "significantly bigger" than "Sunny Cove," with its 800-wide instruction window, and "massive" data- and branch-predictors, to put Intel back on a linear performance growth trajectory between generations. Keller also commented on this being a "mindset change" at Intel, which over the past decade delivered only minor IPC increments between generations and focused on other areas, such as efficiency. In stark contrast, through the 1990s and 2000s, Intel delivered IPC leaps between generations, such as the one between "Netburst" and "Conroe," and onwards to "Nehalem." These were helped in part by rapid process advancements that slowed in the 2010s as Intel approached the sub-10 nm scale.
The video presentation by U.C. Berkeley follows.


59 Comments on Intel Sunny Cove Successor Significantly Bigger: Jim Keller

#26
Unregistered
Having only a 2 CPU pony race is bad enough and far from advantageous for consumers, why do so many comments from people want corporate monopolies?
#27
phanbuey
yakk said: Having only a 2 CPU pony race is bad enough and far from advantageous for consumers, why do so many comments from people want corporate monopolies?
On some irrational level it makes them feel better.
#28
Mistral
FordGT90Concept said: Welp, Intel is slated to make a come back in 2021. AMD better milk it while they can.
No worries; Keller will visit AMD to build a Zen successor after that cycle, it's just how that maniac operates... Intel will probably have a good time from 2021 to 2024.
#30
Chrispy_
Uh, biggest IPC gain since Skylake? Intel haven't made a proper architectural change since Sandy Bridge.

Anand did a Sandy/Ivy/Haswell/Broadwell/Skylake comparison, clocking all CPUs at 3.0GHz and generational improvements are in the 2-5% if you discard some outliers that were effectively fixed bugs rather than IPC improvements. (Here, if you want a nostalgic read)

AMD changed architectures from Bulldozer to Zen, and the result was something like a 50% IPC lift in a single jump. Intel has had far more time and money for R&D and in a decade has managed a measly 25% IPC average - an average that is heavily skewed by applications that take advantage of updated extensions like AVX and x264 encoders that older generations lack. It's not an apples-to-apples way to measure general-purpose IPC which is what Jim's talking about here, and it's why games and other applications that didn't use AVX or fixed-function hardware saw no IPC improvements whatsoever.

For gaming - especially those limited by single-core IPC, the only reason Intel gaming performance has improved since Sandy Bridge is higher clockspeeds and faster RAM.
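
As a rough check on the per-generation figures quoted above, small gains compound multiplicatively rather than add up. The sketch below assumes four equal generational steps (Sandy, Ivy, Haswell, Broadwell, Skylake) at the 2% and 5% bounds mentioned in the comment. The function name `compound_gain` is made up for illustration, and real per-step gains were not uniform.

```python
def compound_gain(per_generation_gains):
    """Return the total fractional gain from chaining per-generation gains."""
    total = 1.0
    for gain in per_generation_gains:
        total *= 1.0 + gain  # each generation multiplies the previous level
    return total - 1.0


# Four steps: Sandy -> Ivy -> Haswell -> Broadwell -> Skylake (assumed uniform).
low = compound_gain([0.02] * 4)
high = compound_gain([0.05] * 4)

print(f"2% per step compounds to {low:.1%}")
print(f"5% per step compounds to {high:.1%}")
```

Under those assumptions the cumulative gain lands between roughly 8% and 22%, which is in the same neighborhood as the decade-long average cited in the comment.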
#31
londiste
Anandtech has some tests that are not CPU-limited. Even with that, Haswell is a 10% jump from Ivy (and a bit more from Sandy). Skylake would be a noticeable jump if there were no Broadwell, and Broadwell was a fairly limited release with eDRAM that effectively works like an L4 cache, which makes it a pretty special CPU. 5-6% from Haswell to Skylake isn't that bad either.

Jumps in performance and timeframes are very different here.
Remember that all these CPUs are basically a year apart (2011, 2012, 2014 and 2015). DDR4 is also a factor in case of Skylake, as it is for Zen.
Bulldozer is from 2011, and Excavator, its latest incarnation from 2015, is effectively the same in IPC. Zen was released in 2017.

Sweclockers have run some tests on same clocks across a bunch of CPUs in their reviews for a while now:
www.sweclockers.com/test/23426-amd-ryzen-7-1800x-och-7-1700x/29
www.sweclockers.com/test/27760-amd-ryzen-9-3900x-och-7-3700x-matisse/28
#33
Midland Dog
TheGuruStud said: And Jim has lowered himself to marketing for Intel lol

All I hear is inadequacy like a truck jacked up 2ft.
This is so far away that newborns will be running around asking where it's at lol
I doubt that Jim would stand behind something Cove if he knew that his design was a flop.
londiste said: Zen2 is still Zen with minor improvements and it seems to be fine in 7nm.
GCN resulted in a power hungry monster at 14nm (and 28nm) as well :)
"Fine on 7nm," if fine means telling us a clock and not giving it.
#34
xkm1948
yakk said: Having only a 2 CPU pony race is bad enough and far from advantageous for consumers, why do so many comments from people want corporate monopolies?
Or when there are 3rd contenders like the recent VIA/Zhaoxing(?) people would also bitch about “bad communism china evil”

So basically Intel evil, nvidia evil, foreign evil and etc. Only AMD is our lord and savior.


Some folks just want AMD. Only AMD.
#35
dicktracy
Icelake IPC is already ahead of competition and will most likely remain so until Zen 4. And remember, Intel destroyed AMD by focusing on mobile first.
#36
Unregistered
xkm1948 said: Or when there are 3rd contenders like the recent VIA/Zhaoxing(?) people would also bitch about “bad communism china evil”

So basically Intel evil, nvidia evil, foreign evil and etc. Only AMD is our lord and savior.


Some folks just want AMD. Only AMD.
Underdog concept, not necessarily supporting a specific corporation.

Meaning somebody can support the underdog, the problem is there's only ONE underdog, in this case AMD. Because of the limited competition AMD becomes the only underdog choice in duopoly market situations.

If there were a pool of at least 4 to 6+ corporations making x86 CPUs, and 4 to 6+ corporations making GPUs then there would be multiple underdog possibilities, not just AMD.

Also reducing oligopoly price fixing possibilities would be nice.
#37
mouacyk
Jim said his friend can't rule out that Quantum computing is not a fraud! :toast:

Parallelism... leverage the compiler that Google built, accelerate profile-guided optimization with super computers for 90% of the coverage, then re-organize the single-threaded instructions for out-of-order execution = win for all legacy software that no programmer wants to touch (re-factor) with a 10-foot pole.
#38
MrPotatoHead
xkm1948 said: Or when there are 3rd contenders like the recent VIA/Zhaoxing(?) people would also bitch about “bad communism china evil”

So basically Intel evil, nvidia evil, foreign evil and etc. Only AMD is our lord and savior.


Some folks just want AMD. Only AMD.
And some people want only Intel. There's no difference; you just happen to be in the Intel camp whilst others are in the AMD one. The only saving grace Intel has right now is that it still holds a tiny overall performance lead in gaming; that's literally it. If it wasn't for the Ryzen 1600 and 1700, and then the subsequent Ryzen+ parts, Intel would still be spewing out 4c/8t processors at $350-$450 for its highest-clocked ones. People can now enjoy a 6-core or even 8-core CPU for less than that thanks only to AMD. That's a bloody fact, as is how things went for 10 years: Intel restricted 6+ cores to HEDT whilst making slight clock changes and minute incremental changes to its desktop CPUs, releasing the same architecture and performance (save for clock speeds, as mentioned) year on year to keep people happy. So what the hell is so bad about rooting for the underdog, supporting AMD, and coming out and saying it? If it wasn't for AMD, we would be even further behind in terms of innovation than we were before Ryzen.
#39
efikkan
londiste said: There have been meaningful architectural upgrades in Intel CPUs. Haswell and Skylake are OKish updates. Since Skylake they have been stuck at 14nm which is their main problem. They seem to have architectures ready but 10nm was intended to be in full production in 2016. This did not happen and Intel has been limping along on Skylake and 14nm ever since.
Haswell and Skylake were major upgrades, but didn't focus much on IPC gains. They did spend a lot of their "transistor budget" on AVX2 and then dual AVX512 units, which deliver massive performance gains over Sandy Bridge in vector workloads. They also improved multicore scaling. So these have laid a lot of the groundwork for future scaling, but in my opinion did not balance the priorities well enough; I would have preferred a bit more of "everything".
FordGT90Concept said: Honestly, it all makes sense. Intel can't move to 10 nm because the architecture...which dates back to Nehalem...isn't meant for it.
That's incorrect. The 10nm node is holding back the architecture, not the other way around.
And Nehalem and Sunny Cove are not the "same" architecture. Performance is not how you determine if two things are the "same" or not, it's the architectural design that matters.
FordGT90Concept said: Architecture and process manufacturing go hand in hand.…
Sandy Bridge and Haswell both spanned two nodes.
Architectures are tuned to node(s), but nodes are not tuned to architectures.
Chrispy_ said: AMD changed architectures from Bulldozer to Zen, and the result was something like a 50% IPC lift in a single jump.
It was a major jump mainly because AMD was making up for the mistakes it made with Bulldozer. Remember that Bulldozer was competing with Sandy Bridge back in the day, and Sandy Bridge -> Skylake is only a ~15% IPC gain for comparison.
Chrispy_ said: For gaming - especially those limited by single-core IPC, the only reason Intel gaming performance has improved since Sandy Bridge is higher clockspeeds and faster RAM.
No, actually not.
Firstly, while IPC is very important for gaming, it's only important up to the point where the CPU is no longer the bottleneck for the GPU. For Skylake this starts to happen around ~4 GHz with current games, at ~4.5 GHz the curve becomes flat, and beyond that you only really improve stutter a tiny bit.

RAM speed matters little to nothing for gaming. Gaming is not bottlenecked by memory bandwidth, and "faster" memory really only improves bandwidth, not latency.

While Skylake has improved clocks a lot over Sandy Bridge, especially with "aggressive" boosting, the CPU front-end improvements have also helped a lot. It's important to remember that IPC is a measure of "arbitrary" workloads, and many things affect IPC. One of the reasons why Intel still has an edge in gaming is a stronger front-end, while AMD has higher peak ALU/FPU throughput in some cases; both affect IPC, but only the first really affects gaming.
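
The argument above rests on the usual first-order relation that single-thread performance scales with IPC times clock speed. A minimal sketch, assuming the ~15% Sandy Bridge to Skylake IPC gain quoted above and purely illustrative boost clocks of 3.8 GHz and 5.0 GHz (these clock values and the function name are assumptions, not measurements from this thread):

```python
def relative_performance(ipc_ratio, old_clock_ghz, new_clock_ghz):
    """First-order speedup estimate: performance ~ IPC x clock frequency."""
    return ipc_ratio * (new_clock_ghz / old_clock_ghz)


# Assumed inputs: 15% more IPC, clocks rising from 3.8 GHz to 5.0 GHz.
ipc_only = relative_performance(1.15, 3.8, 3.8)    # same clock, IPC gain alone
clock_only = relative_performance(1.00, 3.8, 5.0)  # same IPC, clock gain alone
combined = relative_performance(1.15, 3.8, 5.0)

print(f"IPC alone:   {ipc_only:.2f}x")
print(f"Clock alone: {clock_only:.2f}x")
print(f"Combined:    {combined:.2f}x")
```

This model ignores memory latency, boost behavior under load, and GPU bottlenecks, so it only separates the two factors being argued about; it does not predict game framerates.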
#40
danbert2000
efikkan said: RAM speed matters little to nothing for gaming. Gaming is not bottlenecked by memory bandwidth, and "faster" memory really only improve bandwidth, not latency.
This isn't really true any more, especially with Ryzen systems. Take a look at this article comparing 3000 MHz to 3600 MHz

www.eurogamer.net/articles/digitalfoundry-2019-amd-ryzen-5-3600x-vs-core-i5-9600k-review?page=4

Sure it's just 5% or so, but compare 2400 MHz to 3600 MHz and you're going to see some pretty big gains from memory speed.
#41
efikkan
danbert2000 said: This isn't really true any more, especially with Ryzen systems. Take a look at this article comparing 3000 MHz to 3600 MHz
That's not really the memory bandwidth causing a performance difference though, but how the memory controller is integrated with core-to-core communication etc.
Edit: The claim was that memory speeds have given Intel the advantage in gaming, which is not true. We see today that a Coffee Lake with 2666 MHz memory still beats a Zen 2 with memory overclocked to 3600 MHz.
#42
danbert2000
Chrispy_ said: Uh, biggest IPC gain since Skylake? Intel haven't made a proper architectural change since Sandy Bridge.

Anand did a Sandy/Ivy/Haswell/Broadwell/Skylake comparison, clocking all CPUs at 3.0GHz and generational improvements are in the 2-5% if you discard some outliers that were effectively fixed bugs rather than IPC improvements. (Here, if you want a nostalgic read)

AMD changed architectures from Bulldozer to Zen, and the result was something like a 50% IPC lift in a single jump. Intel has had far more time and money for R&D and in a decade has managed a measly 25% IPC average - an average that is heavily skewed by applications that take advantage of updated extensions like AVX and x264 encoders that older generations lack. It's not an apples-to-apples way to measure general-purpose IPC which is what Jim's talking about here, and it's why games and other applications that didn't use AVX or fixed-function hardware saw no IPC improvements whatsoever.

For gaming - especially those limited by single-core IPC, the only reason Intel gaming performance has improved since Sandy Bridge is higher clockspeeds and faster RAM.
You say it hasn't changed much, but I'm seeing a lot of change between Ivy Bridge, Haswell, Broadwell, and Skylake. Like up to a 30% increase:
#43
FordGT90Concept
"I go fast!1!11!1!"
"We're working on a generation that's significantly bigger than [Sunny Cove] and closer to the linear curve on performance. This is a really big mindset change."

"We have a roadmap to 50x more transistors and huge steps to make on each piece of the stack."

He immediately goes into "discrete AI" here.

He refuses to say much on voltage and frequency scaling.

These two slides pretty much show Intel's problem:


Beating a dead horse (diminishing returns) doesn't get the accelerating returns curve everyone demands. Intel did this with Netburst derivatives and they're doing it again with Nehalem derivatives.

So yeah, Keller is working on something that isn't Nehalem-based. It's fundamentally an architecture problem, not a process problem.
#44
efikkan
FordGT90Concept said: He refuses to say much on voltage and frequency scaling.
Yeah, even when asked directly about it he said he didn't want to talk about it, just that they're working on it.
But we both know what this means; until completely different semiconductors are ready, clock speeds will be stagnant or declining. The best we can hope for until then is an unexpected breakthrough in alloys which allows them to maintain or slightly increase clocks. But even AMD expects clocks to decline going forward.
FordGT90Concept said: Beating a dead horse (diminishing returns) doesn't get the accelerating returns curve everyone demands. Intel did this with Netburst derivatives and they're doing it again with Nehalem derivatives.

So yeah, Keller is working on something that isn't Nehalem-based. It's fundamentally an architecture problem, not a process problem.
Their future designs will of course derive from their previous ones, all of them technically derive from the old P6 architecture, it all depends on how far you stretch the definition. Their next one, Sapphire Rapids/Golden Cove, has been in the works for about five years, and mostly developed before Keller even joined the company.

I'm a little disappointed he spent most of the talk on anecdotes about how dense chips can become, we all get that, I wish he spent some time on what he would use these transistors for.
#45
Mephis
efikkan said: Their future designs will of course derive from their previous ones, all of them technically derive from the old P6 architecture, it all depends on how far you stretch the definition. Their next one, Sapphire Rapids/Golden Cove, has been in the works for about five years, and mostly developed before Keller even joined the company.
Intel's current processor microarchitecture is more of a combination of the best things from the P6 and Netburst ever since Nehalem. I would be willing to venture a guess that whatever core comes after Golden Cove, the one Keller mentioned, will be a significant departure from past designs.
I'm a little disappointed he spent most of the talk on anecdotes about how dense chips can become, we all get that, I wish he spent some time on what he would use these transistors for.
I'm not. Don't forget that he works for Intel and would have to get the talk approved before he gave it. They have a recent history of being really tight lipped about details of future plans.
#46
FordGT90Concept
"I go fast!1!11!1!"
efikkan said: Their future designs will of course derive from their previous ones, all of them technically derive from the old P6 architecture, it all depends on how far you stretch the definition. Their next one, Sapphire Rapids/Golden Cove, has been in the works for about five years, and mostly developed before Keller even joined the company.

I'm a little disappointed he spent most of the talk on anecdotes about how dense chips can become, we all get that, I wish he spent some time on what he would use these transistors for.
Look at the block diagrams between Nehalem and Skylake. There's been optimizations (increasing the width of things and more cache) and the like but they're fundamentally the same:


They can't really get significant gains anymore from further optimization/increasing cache. Here's Pentium Pro for comparison:

Which is so much leaner/simpler.


I can't shake the feeling that Keller might be working on an AI product, or at least a hybrid AI design where it has some (like 4) SMT cores + an AI engine. Those new SMT cores will be branched off into a non-AI product as their new x86 architecture.
#47
efikkan
FordGT90Concept said: Look at the block diagrams between Nehalem and Skylake. There's been optimizations (increasing the width of things and more cache) and the like but they're fundamentally the same:
<snip>
They can't really get significant gains anymore from further optimization/increasing cache.
By your logic, Skylake and Zen 2 must be closely related then, more closely "related" than Nehalem; on a block diagram level they look very much "the same", but we both know they are not.

The front-end especially has changed a lot in Sandy Bridge, Haswell and Skylake. It's still a CPU front-end, it still does decoding, prefetching, branch prediction etc., but these similarities are only on a superficial level. In the execution engine almost all modern CPU designs are fairly similar; the difference here is in the ALU, FPU, AGU, … configuration. This configuration is changed on every new microarchitecture.
FordGT90Concept said: I can't shake the feeling that Keller might be working on an AI product, or at least a hybrid AI design where it has some (like 4) SMT cores + an AI engine. Those new SMT cores will be branched off into a non-AI product as their new x86 architecture.
Intel is surely working on various "AI" related technologies, as they are already starting to face serious competition from AI ASICs. But whether this will be more AI related instructions, tensor FPUs or separate "cores" remains to be seen.

I doubt Intel is moving in the SMT 4 direction, at least in the long term. I know they are researching what they call "threadlets", where dependencies between chains of instructions (including branching) are explicit, so the CPU can scale its superscalar abilities much better, and also not flush the entire pipeline every time there is a branch misprediction etc. I don't think this is right around the corner, but when it arrives it should give a massive boost to single threaded performance, even for fairly "poor" code, and will more or less "eliminate" those idle cycles that today are used for other threads with SMT.
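
To see why avoiding full pipeline flushes could matter, a textbook first-order model adds a misprediction stall term to the base cycles per instruction. The sketch below uses entirely hypothetical numbers (base CPI, branch fraction, misprediction rate and flush penalties are all assumptions) and does not describe Intel's actual design or the "threadlets" research mentioned above.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, penalty_cycles):
    """Base CPI plus the average stall cycles per instruction from mispredicts."""
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


# Hypothetical core: 4 instructions/cycle at best, 20% branches, 2% mispredicted.
full_flush = effective_cpi(0.25, 0.20, 0.02, penalty_cycles=16)
partial_flush = effective_cpi(0.25, 0.20, 0.02, penalty_cycles=4)

print(f"IPC with a 16-cycle flush penalty: {1 / full_flush:.2f}")
print(f"IPC with a 4-cycle flush penalty:  {1 / partial_flush:.2f}")
```

The model treats every misprediction as costing a fixed number of cycles and ignores cache misses and dependency stalls, so it only illustrates the direction of the effect, not its size on real code.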
#48
FordGT90Concept
"I go fast!1!11!1!"
efikkan said: By your logic, Skylake and Zen 2 must be closely related then, more closely "related" than Nehalem; on a block diagram level they look very much "the same", but we both know they are not.

Especially the front-end have changed a lot in Sandy Bridge, Haswell and Skylake. It's still a CPU front-end, it still does decoding, prefetching, branch prediction etc., but these similarities are only on a superficial level. In the execution engine almost all modern CPU designs are fairly similar, the difference here is in the ALU, FPU, AGU, … configuration. This configuration is changed on every new microarchitecture.
Front end: AMD uses dispatch where Intel uses Allocation Queues and a multiplexer and AMD uses Branch Table Buffers where Intel uses Decoded Stream Buffer.

Execution Engine: AMD has dedicated FP/SIMD and integer execution units where Intel has many different execution units with different capabilities (some just address generation, some do FP). They are very, very different in overhead (AMD divides and conquers with separate hardware resources where Intel divides and conquers with shared resources).
#49
Midland Dog
efikkan said: Haswell and Skylake were major upgrades, but didn't focus much on IPC gains. They did spend a lot of their "transistor budget" on AVX2 and then dual AVX512 units, which have massive performance over Sandy Bridge in vector workloads. They also improved on multicore scaling. So these have been laying a lot of the ground work for future scaling, but did in my opinion not balance the priorities well enough, I would have preferred more a bit of "everything".


That's incorrect. The 10nm node is holding back the architecture, not the other way around.
And Nehalem and Sunny Cove are not the "same" architectures. Performance is not how you determine if two things are the "same" or not, it's the architectural design that matters.


Sandy Bridge and Haswell both spanned across two nodes.
Architectures are tuned to node(s), but nodes are not tuned to architectures.


It was a major jump mainly because AMD were making up for the mistakes they did with Bulldozer. Remember that Bulldozer were competing with Sandy Bridge back in the days, and Sandy Bridge -> Skylake is only ~15% IPC gains for comparison.


No actually not.
Firstly, while IPC is very important for gaming, it's only important up to the point where the CPU is no longer the bottleneck for the GPU. For Skylake this starts to happen around ~4 GHz with current games, at ~4.5 GHz becomes flat, and beyond that you only really improve stutter a tiny bit.

RAM speed matters little to nothing for gaming. Gaming is not bottlenecked by memory bandwidth, and "faster" memory really only improve bandwidth, not latency.

While Skylake have improved clocks a lot over Sandy Bridge, especially with "aggressive" boosting, the CPU front-end improvements have also helped a lot. It's important to remember that IPC is a measure of "arbitrary" workloads, and many things affect IPC. One of the reasons why Intel still have an edge in gaming is a stronger front-end, while AMD have higher peak ALU/FPU throughput in some cases, both of which affect IPC, but only the first really affect gaming.
That RAM speed spiel is BS. Any CPU will gain in 0.1% lows from good timings and a RAM OC. Haswell needs a good memory OC to get the most out of it in high-refresh-rate games; heck, my G3258 went from a stuttery mess with 1400 MHz RAM to just being slow, minus the stutters, with 2133 MHz RAM.
#50
Chrispy_
efikkan said: RAM speed matters little to nothing for gaming. Gaming is not bottlenecked by memory bandwidth, and "faster" memory really only improve bandwidth, not latency.

While Skylake have improved clocks a lot over Sandy Bridge, especially with "aggressive" boosting, the CPU front-end improvements have also helped a lot. It's important to remember that IPC is a measure of "arbitrary" workloads, and many things affect IPC. One of the reasons why Intel still have an edge in gaming is a stronger front-end, while AMD have higher peak ALU/FPU throughput in some cases, both of which affect IPC, but only the first really affect gaming.
Like Danbert and Midland, I strongly disagree with you on this. You're comparing Sandy (DDR3-1333MHz max spec) with modern platforms that manage a minimum of ~2.5x the bandwidth and significantly lower latency at the same time.
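
The bandwidth comparison can be checked with the standard peak-bandwidth formula: transfers per second times bytes per transfer times channels. The sketch below assumes dual-channel operation with 64-bit channels; the function name is made up for illustration, and DDR4-3600 is used only because that speed comes up earlier in the thread.

```python
BYTES_PER_TRANSFER = 8  # one 64-bit DDR channel moves 8 bytes per transfer


def peak_bandwidth_gbs(mega_transfers_per_s, channels=2):
    """Theoretical peak memory bandwidth in GB/s (assumes dual channel by default)."""
    return mega_transfers_per_s * BYTES_PER_TRANSFER * channels / 1000.0


ddr3_1333 = peak_bandwidth_gbs(1333)
ddr4_3600 = peak_bandwidth_gbs(3600)

print(f"DDR3-1333 dual channel: {ddr3_1333:.1f} GB/s")
print(f"DDR4-3600 dual channel: {ddr4_3600:.1f} GB/s")
print(f"Ratio: {ddr4_3600 / ddr3_1333:.2f}x")
```

These are theoretical peaks only. Latency, which depends on timings as well as transfer rate, is not captured here, and sustained bandwidth in games will be lower.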

Typically, RAM bandwidth is one of the leading contributors to low minimum framerates and there is no shortage of articles and videos going back a decade or so that make this painfully obvious. I must have watched and read over a hundred mainstream videos on this topic alone.

Who cares about average framerates when their 1% low and 0.1% low framerates are absolutely tanking performance when it matters?