Friday, February 10th 2017

8th Gen Core "Cannon Lake" Over 15% Faster Than Kaby Lake: Intel
At an investor meeting in February, Intel touched upon its performance guidance for its 8th generation Core processor family, due later this year. Based on the 14 nm "Cannon Lake" silicon, these processors are expected to post a bigger performance gain over the preceding 7th gen Core "Kaby Lake" micro-architecture than Kaby Lake managed over its own predecessor, the 6th gen Core "Skylake."
In a slide titled "advancing Moore's Law on 14 nm," Intel illustrated how Kaby Lake processors are on average 15 percent faster than Skylake parts in SYSmark. While Kaby Lake offers negligible IPC gains over Skylake, the newer chips are clocked significantly higher, which accounts for the bulk of Intel's performance targets. Unless Cannon Lake is a significantly reworked micro-architecture relative to Kaby Lake, we can expect these chips to ship with even higher clock speeds. Will the Core i7-8700K be a 5 GHz chip?
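A quick sanity check of that closing question: if the 15 percent SYSmark uplift were to come almost entirely from clock speed, the required frequency follows from simple arithmetic. This sketch assumes the Core i7-7700K's 4.5 GHz max turbo as the baseline; treating the uplift as pure clock gain is a simplification, since even small IPC improvements would lower the required clock.

```python
# Back-of-envelope: clock needed for a 15% uplift with no IPC gain.
kaby_lake_boost_ghz = 4.5   # Core i7-7700K max turbo clock
uplift = 1.15               # Intel's claimed generational gain

required_ghz = kaby_lake_boost_ghz * uplift
print(f"{required_ghz:.2f} GHz")  # roughly 5.2 GHz
```

By this crude estimate, a clock-driven 15 percent gain would indeed put the successor part in 5 GHz territory.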
Source:
VideoCardz
97 Comments on 8th Gen Core "Cannon Lake" Over 15% Faster Than Kaby Lake: Intel
I've been reading for a while about graphene and using light to transfer information within CPUs and this and that, but the reality is that the cost to implement these kinds of changes is probably sky high.
Intel probably has the capital and financing to pretty much do whatever they have to, but then the question of WHY pops up.
An Intel shareholder or board member will want to know why Intel should bother investing to bring these advancements and changes into reality.
If Intel can milk these incremental and pitiful performance increases for the next few years, advancing NOTHING, but still making SOME profits, while NOT heavily increasing spending, why not?
They have what most of us would probably consider to be a massive R & D budget. They have cutting edge fabs and the smartest people in the world on their payroll. They have billions of dollars. What they don't have is motivation.
People have no idea of what future computer hardware and software might be capable of, so for the most part, they are satisfied with how things are, but what if you knew that a year from now artificial intelligence could be 10x as advanced as it is now? What if you knew that a Holodeck could be an affordable reality for every home in America?
Intel doesn't HAVE to ask these questions right now, so they won't. If you're a software company or a hardware company or a computer person, start thinking about these things and asking those questions because otherwise we're going to be sitting here, 10 years from now, in the same situation.
In fact, the total IPC gain from Sandy Bridge to Skylake is about 6-15% for generic calculations. Intel always touts 10-15% IPC gains with each refresh, but that has not been true since Sandy Bridge. Intel's marketing IPC claims are inflated by special features like AES acceleration, which of course have no impact on most applications. The only real changes since Sandy Bridge have been very minor, the biggest being more vector extensions (which of course mainly help certain applications), prefetcher improvements, and cache and memory improvements. Fundamentally, the architecture of Skylake and Sandy Bridge is the same, and until Intel adds more execution ports, all new revisions should be considered minor refreshes. Cannon Lake, like Kaby Lake, is not going to feature anything significant; that will have to wait at least until Ice Lake or later.
That is correct, minor tweaks all over. As I always say, the only reason to upgrade from Sandy Bridge or newer is to get more cores.
Moore's law has never meant anything here; it is nothing but an oft-repeated quote. It has never said anything about performance, only about the "number of components per integrated circuit." Anyone with a basic understanding of math will understand that a revolutionary new technology will see exponential growth for a limited period and eventually flatten out. Moore's law is not worth a mention, especially since there is no real correlation between transistor count and performance.
Yes, Sandy Bridge was the last overhaul of the architecture. It increased the number of execution ports (those executing instructions) from two to three, and everything since has mostly been about optimizing the front-end and memory/cache, so throughput has not increased. It is possible to break through the current "IPC ceiling," but IPC gains are going to be minor until Intel decides to add more execution ports.
That's not accurate at all; the legacy part of x86 probably makes up less than 1% of the current transistor count. All modern x86 CPUs are implemented internally as a RISC architecture, thereby attaining most of the advantages of their RISC-based competitors.
If Intel were to replace x86 with ARM or something similar, almost all architectural features would remain the same, including the prefetcher, memory controller, the superscalar execution ports, vector engines, special accelerated features, etc. The gain from replacing x86 with any of the competitors would probably be less than 2%, and considering all the software that would need to be re-optimized, it's not remotely worth it. x86 will remain until a revolutionary new architecture arrives, one which fundamentally changes how we control a CPU at the microcode level.
You could argue the paging structure could use some tweaks, but none of that is blocked by the "legacy" in x86. Netburst's problem, by contrast, was a super-long pipeline which caused huge performance penalties from branch mispredictions.
What this means is they basically need to pull off an architecture that runs at sub-3 GHz and is as efficient as, or preferably more efficient than, current Skylake/Kaby Lake at 4.2+ GHz. There is no way around this, imo.
Frankly, I wonder if it is time to reinvent the x86 wheel again. It would be nice to see a flavor of x86 that can compete with ARM in terms of efficiency and simplicity. Perhaps a more modular approach to x86 is what is needed.
We do in fact need more performance, and there is no future in scaling clock frequencies. We will only get minor gains there, not because higher clocks don't work logically, but because of physical limitations in the production technology. That's why it's hard to scale beyond 4-5 GHz, and each additional 100 MHz is going to require more effort. The advances are no longer what we experienced in the 80s and 90s, when clock frequencies kept doubling. Intel intended Netburst to scale to 6-8 GHz, but its energy inefficiency forced them to look for other ways to scale performance.
The "why" matters for technical discussions, because it's a clear indication of what gains we can expect in the years to come. If Intel continues with no major overhaul, performance will continue to be flat. Yes, they will continue to gain 100 MHz here and there, and yes, they will get a 0.5% performance gain here and there, but none of that is going to be significant.
Even though scaling is harder now that we can't scale with clocks any more, it's still highly possible. In fact there are two ways to increase IPC going forward: more execution ports and more vector extensions. In the future we are going to need more of both. AMD's Zen will in fact increase the superscalar abilities. While Intel has used three execution ports since Sandy Bridge (not counting load/store, etc.), Zen will seemingly feature 4 ports with ALUs and 2 additional ports for FPUs. This means that AMD, with a smaller core and a less advanced front-end (prefetcher), will still have an advantage in some workloads, such as Blender and other benchmarks. Hopefully Intel will get their act together and do the same, especially adding more ALUs, since they are relatively cheap and can still boost IPC a lot. If, in ~10 years, we have cores with 8 ALUs, a few more FPUs, and improved vector engines, we could get nearly twice the current IPC; of course, such an increase would happen gradually.
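A back-of-envelope version of that last estimate. The port counts are the post's own figures, and assuming ALU-bound IPC scales linearly with the number of ALU ports is an optimistic upper bound: in practice, dependency chains and front-end width would keep real gains below it.

```python
# Hypothetical scaling from the post's figures: 4 ALU ports (Zen, per
# the post) up to an 8-ALU core ~10 years out. Linear scaling with ALU
# count is a ceiling, not a prediction.
current_alu_ports = 4
future_alu_ports = 8

ipc_ceiling_ratio = future_alu_ports / current_alu_ports
print(ipc_ceiling_ratio)  # 2.0 -- "nearly twice the current IPC" at best
```

That 2x figure is why the post frames the improvement as gradual: each added port contributes less than the last once other bottlenecks dominate.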
Speaking of the mistakes of Netburst and Bulldozer: the problem for Netburst was not cache misses; the penalty of a cache miss is pretty much constant and does not scale with clock frequency. The problem was the length of the pipeline (31 stages vs. 12 for Athlon 64), which makes each branch misprediction vastly more costly. In fact Netburst had a better branch predictor than AMD, so Intel was better at guessing, but the cost of guessing wrong was so high that it lost a lot of performance anyway. And since these CPUs are superscalar, issuing two operations per clock, the penalty difference is actually twice as large as you might think: 31 - 12 = 19 clocks => up to 38 operations. This means that for every branch both CPUs mispredicted, AMD got 19 "free" clocks while Intel lost the equivalent of 19-38 operations.
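The penalty arithmetic above, spelled out. The pipeline depths and issue width are the figures quoted in the post; published stage counts for Netburst and Athlon 64 vary by source and stepping.

```python
# Extra branch-misprediction cost of a deep pipeline, per the post's
# numbers: a mispredict flushes the pipeline, so the recovery cost is
# roughly proportional to pipeline depth.
netburst_stages = 31    # post's figure for Netburst (Prescott-era)
athlon64_stages = 12    # post's figure for Athlon 64
issue_width = 2         # superscalar: up to 2 operations per clock

extra_clocks = netburst_stages - athlon64_stages   # 19 extra clocks lost
lost_ops_max = extra_clocks * issue_width          # up to 38 operations
print(extra_clocks, lost_ops_max)                  # prints: 19 38
```

So on every branch both CPUs got wrong, the deeper pipeline paid roughly 19 additional clocks, which at two operations per clock is up to 38 operations of lost throughput.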
Bulldozer not only repeated the mistake of a long pipeline, but also shared prefetcher resources and the FPU between two cores, which basically cripples efficiency. Like I've mentioned, there is great potential in expanding the superscalar features.
Specialized features will play a part, but only for specific needs, like AES, video acceleration, etc.
Honestly, fanboys of either brand, and everyone else who frequents sites like this, need to realize this.
Some red pills are really hard to swallow, especially when they concern things you really like.