Friday, July 17th 2020
Windows 10 Scheduler Aware of "Lakefield" Hybrid Topologies, Benchmarked
A performance review of the Intel Core i5-L16G7 "Lakefield" hybrid processor (powering the Samsung Galaxy Book S notebook) was recently published by Golem.de. It takes an in-depth look at Intel's ambitious new processor design, which sets in motion the two philosophies Intel will build its future processors on. The first is packaging modularity, enabled by new chip-packaging technologies such as Foveros. The second is hybrid processing, where two sets of CPU cores with vastly different microarchitectures and significantly different performance/Watt curves let the processor respond to different kinds of workloads while keeping power draw low. Arm was the first to commercialize this concept with its big.LITTLE topology, which reached the market around 2013. The "Lakefield" i5-L16G7 combines one high-performance "Sunny Cove" CPU core with four smaller "Tremont" cores and a Gen11 iGPU.
The Golem.de report reveals that the Windows 10 thread scheduler is aware of the hybrid multi-core topology of "Lakefield," and that it can classify workloads at a very fine-grained level so that the right kind of core is in use at any given time. The "Sunny Cove" core is called upon when interactive, largely serial processing loads are in demand. This could be something as simple as launching applications, opening new tabs in a multi-process web browser, or less-parallelized media encoding. The four "Tremont" cores keep the machine "cruising," handling much of an application's operational workload, and are also better suited to highly parallelized workloads. This is similar to a hybrid automobile, where the combustion engine provides tractive effort from 0 km/h while the electric motor sustains a cruising speed.

The Core i5-L16G7 has an SDP (scenario design power) rating of 7 W, and its package PL1 value is also 7 W. Intel gave the chip a PL2 value of 9.5 W and a Tau value of 28 seconds. Some notebook vendors, however, are expected to set PL1 at 5 W; on those machines, raising it to 7 W will only be possible through a UEFI firmware update. Throughout its testing, Golem observed that the "Sunny Cove" core kicks in during interactive workloads that require burst performance from the CPU, typically clocked at 2.50 GHz and occasionally hitting 2.90 GHz. The smaller "Tremont" cores are typically clocked at 1.90 GHz under load, and can boost up to 2.70 GHz.
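The interplay of PL1, PL2 and Tau can be illustrated with a simplified model: the chip may draw up to PL2 for a while, and Tau sets how quickly a running average of package power catches up to PL1. The sketch below assumes an exponentially weighted moving average and a constant demand above PL2; the function names and the 12 W demand figure are hypothetical, and real firmware (and each vendor's tuning) will differ.

```python
import math


def simulate_power_limits(demand_w, seconds, pl1_w=7.0, pl2_w=9.5,
                          tau_s=28.0, dt_s=0.1, start_avg_w=0.0):
    """Return (time_s, granted_w) samples for a constant power demand.

    Simplified model (an assumption, not Intel's exact firmware logic):
    the package may draw up to PL2 while an exponentially weighted moving
    average (time constant tau) of its power stays below PL1; once that
    average reaches PL1, the draw is clamped to PL1.
    """
    alpha = 1.0 - math.exp(-dt_s / tau_s)  # EWMA weight for one time step
    avg_w = start_avg_w
    samples = []
    for i in range(int(round(seconds / dt_s))):
        limit_w = pl2_w if avg_w < pl1_w else pl1_w
        granted_w = min(demand_w, limit_w)
        avg_w += alpha * (granted_w - avg_w)  # update the running average
        samples.append((i * dt_s, granted_w))
    return samples


def boost_duration_s(samples, pl1_w=7.0, dt_s=0.1):
    """Total time spent drawing more than PL1, in seconds."""
    return sum(dt_s for _, granted_w in samples if granted_w > pl1_w)


# Intel's defaults from the article vs. a vendor capping PL1 at 5 W.
for pl1 in (7.0, 5.0):
    trace = simulate_power_limits(demand_w=12.0, seconds=120, pl1_w=pl1)
    print(f"PL1={pl1} W: ~{boost_duration_s(trace, pl1_w=pl1):.0f} s at PL2, "
          f"then {trace[-1][1]} W")
```

Starting from an idle (0 W) average, this model holds the package at PL2 for a little over half a minute before it settles at PL1, and lowering PL1 to 5 W shortens that window noticeably.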
Perhaps the biggest dividend of the Windows scheduler's topology awareness is its core rotation policy. By default, the Windows scheduler rotates a single-threaded workload across multiple cores in sequence. AMD had to work with Microsoft to make Windows aware of the topology of its multi-CCX Ryzen processors, so that workloads aren't spread across two CCXs unless they have to be. Similarly, with "Lakefield," core rotation is localized to the "Tremont" cores.
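The idea of localized core rotation can be shown with a toy model: a single thread is round-robined over cores, but only within the cluster it already occupies. This is a sketch, not Windows' actual scheduler logic; the core names, cluster lists and helper functions are all hypothetical.

```python
# Toy model only: hypothetical core names, not Windows' real scheduler data.
LAKEFIELD_CLUSTERS = [["SNC0"], ["TNT0", "TNT1", "TNT2", "TNT3"]]
RYZEN_CCXS = [[0, 1, 2, 3], [4, 5, 6, 7]]
# A topology-unaware view that treats every core as one rotation domain.
ALL_CORES = [core for cluster in LAKEFIELD_CLUSTERS for core in cluster]


def rotation_domain(current_core, clusters):
    """Return the cluster that contains current_core."""
    for cluster in clusters:
        if current_core in cluster:
            return cluster
    raise ValueError(f"unknown core: {current_core!r}")


def rotate(current_core, clusters, quanta):
    """Round-robin a single thread for `quanta` slices, staying in its cluster."""
    domain = rotation_domain(current_core, clusters)
    start = domain.index(current_core)
    return [domain[(start + k) % len(domain)] for k in range(1, quanta + 1)]


print(rotate("TNT1", LAKEFIELD_CLUSTERS, 5))  # ['TNT2', 'TNT3', 'TNT0', 'TNT1', 'TNT2']
print(rotate(5, RYZEN_CCXS, 4))               # [6, 7, 4, 5]
print(rotate("TNT1", [ALL_CORES], 5))         # ['TNT2', 'TNT3', 'SNC0', 'TNT0', 'TNT1']
```

The last call shows what a topology-unaware rotation would do: the thread wanders onto the big core and back, which localized rotation avoids.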
Golem outlined the performance relationship between the single "Sunny Cove" core and the four "Tremont" cores. A single "Sunny Cove" core offers 25-65% higher performance than a single "Tremont" core. On the other hand, the block of four "Tremont" cores together offers roughly 2x the performance of the single "Sunny Cove" core. This gives the two core blocks very different performance and power characteristics.

The Core i5-L16G7 consistently tests ahead of the 8-core (4 big + 4 LITTLE) Qualcomm Snapdragon 8cx, which has the same 7 W TDP. Golem ran the processor through 25 tests, comparing it with the Core i7-1065G7 "Ice Lake-U" (15 W), the Core i5-10210U "Comet Lake-U" (15 W), and the Pentium Silver N5000, an SoC with only Atom-class "Goldmont Plus" cores. Raising the power limits appears to increase the i5-L16G7's performance by 40-60%.

Much of what Intel learns from "Lakefield" will be implemented in future client-segment architectures such as "Meteor Lake," which will combine larger hybrid CPU core arrays to achieve high core counts. The i5-L16G7 lets notebook designers build ultra-portable devices with the power envelope of a Snapdragon, but with the benefits of x86.
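Golem's two per-core figures can be combined into a rough estimate of how well the four "Tremont" cores scale together. This is a back-of-the-envelope sketch that assumes the single-core gap and the 2x block figure were measured on comparable workloads, which the article does not state; `implied_scaling` is a hypothetical helper.

```python
def implied_scaling(snc_over_tnt, block_over_snc=2.0, n_small=4):
    """Fraction of ideal linear scaling the small-core block achieves.

    If one Tremont core delivers 1/snc_over_tnt of a Sunny Cove core, four
    of them would ideally deliver n_small / snc_over_tnt Sunny Cove cores'
    worth of work. The measured block result divided by that ideal gives
    the scaling efficiency.
    """
    ideal_block_over_snc = n_small / snc_over_tnt
    return block_over_snc / ideal_block_over_snc


# Both ends of the 25-65% single-core gap reported by Golem.
for gap in (1.25, 1.65):
    print(f"Sunny Cove {gap}x a Tremont core -> "
          f"{implied_scaling(gap):.1%} of ideal 4-core scaling")
```

Under that assumption, the block reaches only about 62% to 83% of ideal linear scaling. That would be consistent with all cores sharing the same small power budget (the Tremont cores typically ran at 1.90 GHz under load versus a 2.70 GHz boost), though other factors could also contribute.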
Find more benchmark results and commentary in the source link below.
Source:
Golem.de
35 Comments on Windows 10 Scheduler Aware of "Lakefield" Hybrid Topologies, Benchmarked
Standing ovation to Intel for coming 7 years late to the party.
Intel i5-L16G7 SoC at up to 7W
Consumer price: $281
Note: this is the consumer price, NOT the manufacturer purchase price, which is much lower.
Practically the whole computer is on this one chip, including the memory and I/O (not the SSD, though).
BUT you still have to pay $1000-1400 for a low performance computer with this CPU.
Ryzen 4800U - 8C/16T has a base clock of 1.8GHz at 10W. That's 8 full-fat cores with HT compared to Intel's single full-fat core and four Atoms at 7W.
Is that not on the charts because Intel would need to quadruple the scale just to fit it on the page?
Single-core: 1.25x the performance of the Snapdragon 8cx at 0.7x the power.
Looks like a very good performance increase for single-core loads without hurting multi-core.
I'm not sure the 12 core follow up will be any better either.
I'm saying 4800U is rated down to 10W (AMD's official spec for 1.8GHz base clock with a cTDP) and the article is saying 7W typical and 9.5W boost for these Intel Hybrids. Sure, 10W is still bigger than 7W, but it's not the power gulf that you're trying to make out it is.
One SNC-core is 25 to 67 percent faster than one TNT-core, on the other side four TNT-cores offer roughly 2x performance of one SNC-core.
Going by Intel's marketing, Sunny Cove should offer an average of 1.18x the IPC of Skylake.
Tremont IPC vs. Sunny Cove IPC:
worst case: 1 : 1.67 = 60%
best case: 1 : 1.25 = 80%
This chart shows an average of 24% IPC-gain Sandy Bridge to Skylake.
IPC compared to Skylake:
Broadwell 97%
Haswell 94%
Ivy Bridge 85%
Sandy Bridge 81%
Which Core µArch's IPC is Tremont's IPC similar to?
Let's do the math:
Skylake to SunnyCove = x1.18
SunnyCove vs Tremont varies from x1.67 to x1.25
leads to
Worst case: 1.18 : 1.67 = 0.71, i.e. Tremont = 71% of Skylake IPC
Best case: 1.18 : 1.25 = 0.94, i.e. Tremont = 94% of Skylake IPC
Averaged between the best and worst cases, Tremont should deliver about 82.5% of Skylake's IPC, which is between Sandy Bridge and Ivy Bridge.
But the range spans from below Sandy Bridge up to Haswell.
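The calculation in this comment can be reproduced in a few lines. One caveat worth keeping in mind: the 1.25x-1.67x figures are performance ratios, not clock-normalized IPC, so if the cores ran at different clocks during those tests, the true IPC gap would differ. The helper names below are hypothetical.

```python
SNC_OVER_SKYLAKE = 1.18  # Intel's claimed Sunny Cove IPC gain over Skylake

# IPC relative to Skylake, from the chart quoted in the comment above.
IPC_VS_SKYLAKE = {
    "Sandy Bridge": 0.81,
    "Ivy Bridge": 0.85,
    "Haswell": 0.94,
    "Broadwell": 0.97,
    "Skylake": 1.00,
}


def tremont_vs_skylake(snc_over_tnt):
    """Tremont/Skylake = (Sunny Cove/Skylake) / (Sunny Cove/Tremont)."""
    return SNC_OVER_SKYLAKE / snc_over_tnt


def nearest_uarch(ipc):
    """Name of the listed core whose relative IPC is closest to `ipc`."""
    return min(IPC_VS_SKYLAKE, key=lambda name: abs(IPC_VS_SKYLAKE[name] - ipc))


worst = tremont_vs_skylake(1.67)
best = tremont_vs_skylake(1.25)
middle = (worst + best) / 2
for label, ipc in (("worst", worst), ("best", best), ("middle", middle)):
    print(f"{label}: {ipc:.1%} of Skylake, nearest {nearest_uarch(ipc)}")
```

This reproduces the roughly 71%, 94% and 82.5% figures above, with the midpoint sitting closest to Sandy Bridge.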
AMD Zen's CCX architecture is very new. I'm not sure if any other architecture has "localized" L3 cache to 4 physical cores with very slow (slower than DDR4) transfers between L3 clusters. I can very well see Microsoft having to start from scratch to implement a good scheduler on AMD Zen's architecture.
And the results are very impressive considering this is a first-gen attempt. At only ~71% of the 8CX's power (5W vs 7W) Lakefield is faster in all non-synthetic benchmarks, despite having a 3-core deficit.
In short, Windows-on-ARM just died, along with Qualcomm's ambitions for ultra-portables in the x86 space. Conversely, Microsoft now has a reason to try their hand at Windows Phone again.
This is the most exciting and most important thing to happen in the CPU industry since Ryzen.

Yeah, it's pretty fast to support something when you've already added the code to support it a long time ago. Funny how that works.

Apparently you're incapable of doing basic math... 10W is ~42% more power than 7W and 100% more than 5W, which are massive proportions at this level, especially when talking about devices with incredibly limited passive cooling capabilities.
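The percentages argued over in this thread are simple ratios. A short sketch makes the arithmetic explicit; the helper names are hypothetical.

```python
def percent_more(a, b):
    """How much larger a is than b, in percent."""
    return (a / b - 1.0) * 100.0


def ratio_percent(a, b):
    """a expressed as a percentage of b."""
    return a / b * 100.0


print(f"10 W vs 7 W: +{percent_more(10, 7):.1f}%")    # 10 W vs 7 W: +42.9%
print(f"10 W vs 5 W: +{percent_more(10, 5):.1f}%")    # 10 W vs 5 W: +100.0%
print(f"5 W is {ratio_percent(5, 7):.1f}% of 7 W")    # 5 W is 71.4% of 7 W
```

"Percent more" (10 W vs 5 W is +100%) and "percent of" (5 W is about 71% of 7 W) answer different questions, so it helps to be explicit about which one is meant.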
4800U is rated 1.8GHz at 15W. At 10W the frequencies will drop, probably quite sharply. Renoir's power limits (similar to Matisse's) seem more akin to what Intel does, with running 35% over the stated TDP (when temperatures allow it) being common, at least in the H models and the lower U models I have seen reviews of.
There is a pretty big difference here.
www.microsoft.com/en-us/research/wp-content/uploads/2012/05/main.pdf
docs.microsoft.com/en-us/windows-hardware/customize/power-settings/static-configuration-options-for-heterogeneous-power-scheduling
Microsoft has literally been working on heterogeneous schedulers for the last decade. I'm pretty sure this stuff was implemented as far back as Windows 8 (in some initial form, and probably optimized over the years as big.LITTLE architectures came out).
Nowadays, people are full of hate. If Intel is having a rough time, everyone wishes for its death. If Intel then comes out with an interesting product, everyone says it is shit and boring. If AMD had come out with something similar, everyone would be praising how good it is. Dear people, set aside the hate/love and try to see objectively what each company is doing. AMD has good products, I agree. But AMD also has bad products, very bad ones actually. Likewise for Intel, for Nvidia, etc. So let's learn to judge each situation separately and not bring the same stupid hate to every product launch.
This product is a very interesting piece of design, silicon, power delivery, uArch, software, firmware, etc. It has some specific constraints that require a lot of effort, for example the very small package, as in ARM SoCs. Stop being so ignorant and try to read every single piece of news with a clear mind. Intel will still launch great products in the future; don't get bogged down in this 14nm+++++ hate. They didn't want to stay on it for so long, but making mistakes in the fab business is very costly and, as we saw, a multi-year affair to fix. We should appreciate what AMD has done, but stop being so polarized and also appreciate what the competition does, because it might just be a good product.

Yeah, another big difference is that they are in two different leagues altogether in terms of package size, power requirements, minimum PCB size, etc. Intel can also drop their i7 15W CPUs to 4.5W (which they have actually done for a very long time), but Lakefield is not about beating benchmarks; it is about packaging, big.LITTLE in the x86 space, die stacking, etc.
- Lakefield is reportedly 82mm^2 but probably gets added cost from new-ish packaging stuff. 8cx is ~112mm^2, Renoir is in the neighborhood of 150mm^2.
- PCB minimum size is a question of target market. Intel wants Lakefield to compete with high end of ARM and had engineered and packaged Lakefield to do that. Renoir is a mobile chip but is simply not aimed that small or low.
- Intel has been doing ULV for a long while and the results have not been very good. Couple cores at 1GHz on 4.5W last I checked... meh. Atom is pretty competitive with these as well.
While the lines have been and are blurred, there are different target segments: 2-3W, ~5W, <10W, 15W and 35-45W seem to be the main ones in mobile, each with different requirements in addition to the power limit.
Bottom line is that process is a key element in this industry. I've said many times that AMD's current success is in no small part thanks to TSMC's process advantage. Heck, ARM's success nowadays is in big part thanks to TSMC being better. So Intel really needs to get back to full speed on their process, or just license TSMC or whatever.
He's simply trying to take the worst/worst scenario to exaggerate his point by looking only at the extreme options in order to falsely prop up his argument.
The article clearly states it's 7W, not 5W.
"The Core i5-L16G7 has a rated SDP (scenario driven power) rating of 7 W"
AMD's official cTDP values for a 4800U are 10-25W, and those official figures guarantee the 1.8GHz base clock assuming adequate cooling is provided. If your 4800U doesn't achieve 1.8GHz at 10W, RMA it because it's out of spec, ergo faulty.
10W compared to 7W may still be a sizeable 42% increase, but it's not the 200% increase he's trying to make it out to be. I'd certainly wager that a 10W 4800U is more than 42% faster than this 7W Core i5-L16G7....