Monday, September 30th 2019
![Intel](https://tpucdn.com/images/news/intel-v1721205152158.png)
Intel Sunny Cove Successor Significantly Bigger: Jim Keller
Sunny Cove is the codename for Intel's first truly new performance CPU core design since "Skylake," and made its debut with the company's 10 nm "Ice Lake" processors, packing the first tangible IPC increase in years. VLSI guru Jim Keller is leading the effort to build Intel's future CPU core designs, and dropped a big hint on what to expect, speaking at a gathering at U.C. Berkeley. It's unclear which specific core Keller is referring to. The immediate successor to "Sunny Cove" is codenamed "Willow Cove," and Intel's own public sketch hints at an incremental upgrade over Sunny Cove, with faster caches and process-level optimization. It's only with "Golden Cove," slated for 2021, that Intel speaks of its next round of IPC increases (dubbed "ST perf"). It's plausible that Keller is referring to this core, since a 2021 launch would fit better with a 2018-19 design phase.
In his talk, Keller describes Intel's next big CPU core as being "significantly bigger" than "Sunny Cove," with its 800-wide instruction window, and "massive" data- and branch-predictors, to put Intel back on a linear performance growth trajectory between generations. Keller also commented on this being a "mindset change" at Intel, which, over the past decade, delivered only minor IPC increments between generations, and focused on other areas, such as efficiency. In stark contrast, through the 1990s and 2000s, Intel delivered IPC leaps between generations, such as the one between "Netburst" and "Conroe," and onwards to "Nehalem." These were in part helped by rapid process advancements that slowed in the 2010s as Intel approached the sub-10 nm scale. The video presentation by U.C. Berkeley follows.
59 Comments on Intel Sunny Cove Successor Significantly Bigger: Jim Keller
Anand did a Sandy/Ivy/Haswell/Broadwell/Skylake comparison, clocking all CPUs at 3.0GHz, and generational improvements are in the 2-5% range if you discard some outliers that were effectively fixed bugs rather than IPC improvements. (Here, if you want a nostalgic read)
AMD changed architectures from Bulldozer to Zen, and the result was something like a 50% IPC lift in a single jump. Intel has had far more time and money for R&D and in a decade has managed a measly 25% IPC average - an average that is heavily skewed by applications that take advantage of updated extensions like AVX and x264 encoders that older generations lack. It's not an apples-to-apples way to measure general-purpose IPC which is what Jim's talking about here, and it's why games and other applications that didn't use AVX or fixed-function hardware saw no IPC improvements whatsoever.
For gaming - especially those limited by single-core IPC, the only reason Intel gaming performance has improved since Sandy Bridge is higher clockspeeds and faster RAM.
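As a rough sanity check on the per-generation numbers above, here is a minimal Python sketch that compounds a 2-5% gain over the four steps from Sandy Bridge to Skylake. The helper name `compound_gain` and the assumption that every step delivers the same gain are illustrative only; they are not taken from the AnandTech comparison.

```python
def compound_gain(per_gen_gain: float, generations: int) -> float:
    """Total fractional gain after compounding a fixed per-generation gain."""
    return (1.0 + per_gen_gain) ** generations - 1.0


# Sandy Bridge -> Ivy Bridge -> Haswell -> Broadwell -> Skylake is 4 steps.
# Assumption: each step delivers the same gain (2% at the low end, 5% at the high end).
STEPS = 4
low = compound_gain(0.02, STEPS)
high = compound_gain(0.05, STEPS)

print(f"cumulative IPC gain: {low:.1%} to {high:.1%}")
# prints "cumulative IPC gain: 8.2% to 21.6%"
```

Under that simplifying assumption the cumulative gain lands between roughly 8% and 22%, which brackets the ~15% Sandy Bridge to Skylake figure that comes up later in the thread.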
Jumps in performance and timeframes are very different here.
Remember that all these CPUs are basically a year apart (2011, 2012, 2014 and 2015). DDR4 is also a factor in case of Skylake, as it is for Zen.
Bulldozer is from 2011, and Excavator, its latest incarnation from 2015, has effectively the same IPC. Zen was released in 2017.
Sweclockers have run some tests on same clocks across a bunch of CPUs in their reviews for a while now:
www.sweclockers.com/test/23426-amd-ryzen-7-1800x-och-7-1700x/29
www.sweclockers.com/test/27760-amd-ryzen-9-3900x-och-7-3700x-matisse/28
So basically Intel evil, nvidia evil, foreign evil and etc. Only AMD is our lord and savior.
Some folks just want AMD. Only AMD.
Meaning somebody can support the underdog, the problem is there's only ONE underdog, in this case AMD. Because of the limited competition AMD becomes the only underdog choice in duopoly market situations.
If there were a pool of at least 4 to 6+ corporations making x86 CPUs, and 4 to 6+ corporations making GPUs then there would be multiple underdog possibilities, not just AMD.
Also reducing oligopoly price fixing possibilities would be nice.
Parallelism... leverage the compiler that Google built, accelerate profile-guided optimization with super computers for 90% of the coverage, then re-organize the single-threaded instructions for out-of-order execution = win for all legacy software that no programmer wants to touch (re-factor) with a 10-foot pole.
And Nehalem and Sunny Cove are not the "same" architectures. Performance is not how you determine if two things are the "same" or not, it's the architectural design that matters. Sandy Bridge and Haswell both spanned across two nodes.
Architectures are tuned to node(s), but nodes are not tuned to architectures. It was a major jump mainly because AMD was making up for the mistakes it made with Bulldozer. Remember that Bulldozer was competing with Sandy Bridge back in the day, and Sandy Bridge -> Skylake is only ~15% IPC gains for comparison. No actually not.
Firstly, while IPC is very important for gaming, it's only important up to the point where the CPU is no longer the bottleneck for the GPU. For Skylake this starts to happen around ~4 GHz with current games, at ~4.5 GHz becomes flat, and beyond that you only really improve stutter a tiny bit.
RAM speed matters little to nothing for gaming. Gaming is not bottlenecked by memory bandwidth, and "faster" memory really only improves bandwidth, not latency.
While Skylake has improved clocks a lot over Sandy Bridge, especially with "aggressive" boosting, the CPU front-end improvements have also helped a lot. It's important to remember that IPC is a measure of "arbitrary" workloads, and many things affect IPC. One of the reasons why Intel still has an edge in gaming is a stronger front-end, while AMD has higher peak ALU/FPU throughput in some cases, both of which affect IPC, but only the first really affects gaming.
www.eurogamer.net/articles/digitalfoundry-2019-amd-ryzen-5-3600x-vs-core-i5-9600k-review?page=4
Sure it's just 5% or so, but compare 2400 MHz to 3600 MHz and you're going to see some pretty big gains from memory speed.
Edit: The claim was that memory speeds have given Intel the advantage in gaming, which is not true. We see today that a Coffee Lake at 2666 MHz memory still beats a Zen 2 with memory overclocked to 3600 MHz.
"We have a roadmap to 50x more transistors and huge steps to make on each piece of the stack."
He immediately goes into "discreet AI" here.
He refuses to say much on voltage and frequency scaling.
These two slides pretty much show Intel's problem:
Beating a dead horse (diminishing returns) doesn't get the accelerating returns curve everyone demands. Intel did this with Netburst derivatives and they're doing it again with Nehalem derivatives.
So yeah, Keller is working on something that isn't Nehalem-based. It's fundamentally an architecture problem, not a process problem.
But we both know what this means; until completely different semiconductors are ready, clock speeds will be stagnant or declining. The best we can hope for until then is an unexpected breakthrough in alloys which allows them to maintain or slightly increase clocks. But even AMD expects clocks to decline going forward. Their future designs will of course derive from their previous ones, all of them technically derive from the old P6 architecture, it all depends on how far you stretch the definition. Their next one, Sapphire Rapids/Golden Cove, has been in the works for about five years, and mostly developed before Keller even joined the company.
I'm a little disappointed he spent most of the talk on anecdotes about how dense chips can become, we all get that, I wish he spent some time on what he would use these transistors for.
They can't really get significant gains anymore from further optimization/increasing cache. Here's Pentium Pro for comparison:
Which is so much leaner/simpler.
I can't shake the feeling that Keller might be working on an AI product, or at least a hybrid AI design where it has some (like 4) SMT cores + an AI engine. Those new SMT cores will be branched off into a non-AI product as their new x86 architecture.
The front-end especially has changed a lot in Sandy Bridge, Haswell and Skylake. It's still a CPU front-end, it still does decoding, prefetching, branch prediction etc., but these similarities are only on a superficial level. In the execution engine almost all modern CPU designs are fairly similar, the difference here is in the ALU, FPU, AGU, … configuration. This configuration is changed on every new microarchitecture. Intel is surely working on various "AI" related technologies, as they are already starting to face serious competition from AI ASICs. But whether this will be more AI related instructions, tensor FPUs or separate "cores" remains to be seen.
I doubt Intel is moving in the SMT 4 direction, at least in the long term. I know they are researching what they call "threadlets", where dependencies between chains of instructions (including branching) are explicit, so the CPU can scale its superscalar abilities much better, and also not flush the entire pipeline every time there is a branch misprediction etc. I don't think this is right around the corner, but when it arrives it should give a massive boost to single-threaded performance, even for fairly "poor" code, and will more or less "eliminate" those idle cycles that today are used for other threads with SMT.
Execution Engine: AMD has dedicated FP/SIMD and integer execution units where Intel has many different execution units with different capabilities (some just address generation, some do FP). They are very, very different in overhead (AMD divides and conquers with separate hardware resources where Intel divides and conquers with shared resources).
Typically, RAM bandwidth is one of the leading contributors to low minimum framerates and there is no shortage of articles and videos going back a decade or so that make this painfully obvious. I must have watched and read over a hundred mainstream videos on this topic alone.
Who cares about average framerates when their 1% low and 0.1% low framerates are absolutely tanking performance when it matters?