Monday, May 4th 2020

Intel's Alder Lake Processors Could Use Foveros 3D Stacking and Feature 16 Cores

Intel is preparing many interesting designs for the future and is slowly shaping its vision for the next generation of computing devices. Following Arm's big.LITTLE design principle, Intel decided to build its own version, called Lakefield, using x86-64 cores instead of Arm ones. We already have some information about the new Alder Lake CPUs based on the Lakefield design that are set to be released in the future. Thanks to a report from Chrome Unboxed, which found patches submitted to the Chromium open-source browser (used as a base for many browsers such as Google Chrome and the new Microsoft Edge), there is a piece of potential information suggesting that Alder Lake CPUs could arrive very soon.

Rumored to feature up to 16 cores, Alder Lake CPUs could represent an x86 take on the big.LITTLE design, pairing eight "big" and eight "small" cores that are activated according to increased or decreased performance requirements, thus bringing the best of both worlds: power efficiency and performance. This design would be built on Intel's 3D packaging technology called Foveros. The Alder Lake support patch was added to the Chrome OS repository on April 27th, which would indicate that Intel will be pushing these CPUs out relatively quickly. The commit, titled "add support for ADL gpiochip", contained the following message: "On Alderlake platform, the pinctrl (gpiochip) driver label is "INTC105x:00", hence declare it properly." Chrome Unboxed speculates that Alder Lake could come out in mid or late 2021, depending on how fast Intel can supply OEMs with enough volume.
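For context, that label is what the operating system uses to identify the Alder Lake pin controller. As a rough illustration only (this is not the actual Chrome OS patch, and the usage is hypothetical), a small userspace program using libgpiod v1 could look the controller up by that label:

```c
/*
 * Illustrative sketch (not the actual Chrome OS patch): locate the Alder Lake
 * pinctrl/GPIO controller from userspace by the ACPI label quoted in the
 * commit message. Assumes libgpiod v1.x is installed (link with -lgpiod).
 */
#include <stdio.h>
#include <gpiod.h>

int main(void)
{
    /* Label taken from the quoted commit message; usage here is hypothetical. */
    const char *adl_label = "INTC105x:00";

    struct gpiod_chip *chip = gpiod_chip_open_by_label(adl_label);
    if (!chip) {
        perror("gpiod_chip_open_by_label");
        return 1;
    }

    printf("Found gpiochip '%s' with %u lines for label %s\n",
           gpiod_chip_name(chip), gpiod_chip_num_lines(chip), adl_label);

    gpiod_chip_close(chip);
    return 0;
}
```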
Intel Lakefield
Sources: @chiakokhua (Twitter), Chrome Unboxed

40 Comments on Intel's Alder Lake Processors Could Use Foveros 3D Stacking and Feature 16 Cores

#26
Tom Yum
Assimilator: CPUs have been doing this for over a decade already...
I think you misunderstood the level of power gating I am referring to, though yes, dynamic cache has been a feature for some time. I am referring to shutting down execution units, instruction decoders, branch prediction caches, switching from OoO to in-order processing, etc., dynamically as processing resources require, effectively allowing a Skylake core to morph into a Goldmont core as required.

Big.LITTLE is a crude way of doing it, effectively throwing silicon at the problem. The benefit is that it is simpler to set up; the negative is that the coarser the power gating, the slower the responsiveness, as the time taken to 'turn on' and initialise the powered-up processor element increases.
theoneandonlymrk: That's exactly what their competition does; as I said before, they gate the power per core on or off. Ryzens do this, Intel does this, and Intel also developed race-to-idle so that they can turn cores off sooner.

Unfortunately for Intel, it's just as much an issue of power use under load; I'm not thinking this will fix that.
Agreed, it does nothing for power use under load. Like I said, this comes more out of Intel's marketing requirements than any actual better end-user experience. It allows Intel to spruik bullsh*t 'up to XX hours battery life' metrics (bullsh*t because no one idles their laptop for 12 hours straight) and 'moar cores', even if those extra cores perform like potatoes. My point was more that the principles of big.LITTLE as a power saving measure are sound, but also crude, and I would expect someone with the R&D budget of Intel to implement something more sophisticated than rehashing a nine-year-old idea from ARM.
#27
ARF
ARM's marketing material promises up to a 75% savings in power usage for some activities.[2]
Vayra86: Serious answers are not available and it's the same sentiment everywhere, right? We really don't know anything other than 'it uses big.LITTLE'. We can speculate :)

More cores equals more power used. And from that conclusion... it's easy to draw other conclusions. Such as:
1. The Windows scheduler and good allocation of workloads will be the key to gaining an advantage over other products.
2. Intel's goal must be: faster when it's needed (it can turbo high), fall back on little when possible (big cores can cool down and clear TDP budget for a new boost). Any other approach is not feasible, because then they are not competitive against stripped AND full-fat performance cores.
3. A new reduction of base clocks on the BIG cores is likely, to clear more TDP headroom for turbo. Or maybe even dial back entirely to idle clock, some 800 MHz, and just have a turbo on top of that. Or maybe fully shut down, but then I'm thinking of latency problems.

So, using the cores at the same time will bring what advantage exactly? I'm not seeing it, do you? For this product to be viable, it needs to be better than either variant of the cores used in it. 8 fast and 8 slow cores are still worse than 16 regular ones at base clock, I reckon...

Interesting stuff indeed :) What I personally think is that Alder Lake is a way to get 10nm dies out that were planned anyway, and still keep a competitive product across the whole stack. Forget 'glued together', Intel is going full scrapyard dive. It also confirms yet again that 10nm scales like shit into performance territory.
There are three ways of arranging and using big.LITTLE (a toy sketch of the third model follows the links below):
1. Clustered switching - the one described by you - either the big cores or the small cores run, never both at the same time;
2. In-kernel switcher - a big and a small core are coupled into pairs, so with 8 + 8 you would have something like 8 big cores with hyper-threading enabled;
and the third:
3. Heterogeneous multi-processing (global scheduling):
The most powerful use model of big.LITTLE architecture is Heterogeneous Multi-Processing (HMP), which enables the use of all physical cores at the same time. Threads with high priority or computational intensity can in this case be allocated to the "big" cores while threads with less priority or less computational intensity, such as background tasks, can be performed by the "LITTLE" cores.[10][11]
This model has been implemented in the Samsung Exynos starting with the Exynos 5 Octa series (5420, 5422, 5430),[12][13] and Apple mobile application processors starting with the Apple A11.[14]
en.wikipedia.org/wiki/ARM_big.LITTLE#cite_note-14


en.wikipedia.org/wiki/ARM_big.LITTLE
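To make that third model concrete, here is a minimal, purely hypothetical sketch in C of how a userspace program could steer its own threads under global scheduling. The core numbering is an assumption (an imagined 8+8 part with cores 0-7 big and 8-15 LITTLE); real HMP scheduling lives in the kernel and migrates threads automatically.

```c
/*
 * Toy illustration of model 3 (heterogeneous multi-processing / global
 * scheduling): all cores stay online, and each thread is pinned to the
 * "big" or "LITTLE" cluster based on how demanding it is. Core numbering
 * below is hypothetical (cores 0-7 big, 8-15 LITTLE on an imagined 8+8 part).
 */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

#define NUM_BIG    8
#define NUM_LITTLE 8

/* Pin the calling thread to one cluster; a real HMP scheduler would also
 * migrate threads dynamically as their load changes. */
static int pin_to_cluster(int demanding)
{
    cpu_set_t set;
    CPU_ZERO(&set);

    if (demanding) {
        for (int cpu = 0; cpu < NUM_BIG; cpu++)
            CPU_SET(cpu, &set);                     /* big cores */
    } else {
        for (int cpu = NUM_BIG; cpu < NUM_BIG + NUM_LITTLE; cpu++)
            CPU_SET(cpu, &set);                     /* LITTLE cores */
    }
    return sched_setaffinity(0, sizeof(set), &set); /* 0 = current thread */
}

int main(void)
{
    /* e.g. a render thread goes to the big cluster; a background sync
     * thread would call pin_to_cluster(0) instead. */
    if (pin_to_cluster(1) != 0)
        perror("sched_setaffinity");
    else
        printf("pinned to big cluster\n");
    return 0;
}
```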


Another thing that should be considered is that there is a frequency wall on the 14nm process, so no matter the approach, more performance would not be possible.

And the whole approach will still be inferior to Zen 3 and Zen 4, especially with 16 big cores (or double) with SMT.
#28
TheoneandonlyMrK
Tom Yum: I think you misunderstood the level of power gating I am referring to, though yes, dynamic cache has been a feature for some time. I am referring to shutting down execution units, instruction decoders, branch prediction caches, switching from OoO to in-order processing, etc., dynamically as processing resources require, effectively allowing a Skylake core to morph into a Goldmont core as required.

Big.LITTLE is a crude way of doing it, effectively throwing silicon at the problem. The benefit is that it is simpler to set up; the negative is that the coarser the power gating, the slower the responsiveness, as the time taken to 'turn on' and initialise the powered-up processor element increases.



Agreed, it does nothing for power use under load. Like I said, this comes more out of Intel's marketing requirements than any actual better end-user experience. It allows Intel to spruik bullsh*t 'up to XX hours battery life' metrics (bullsh*t because no one idles their laptop for 12 hours straight) and 'moar cores', even if those extra cores perform like potatoes. My point was more that the principles of big.LITTLE as a power saving measure are sound, but also crude, and I would expect someone with the R&D budget of Intel to implement something more sophisticated than rehashing a nine-year-old idea from ARM.
I think in time they might do that; AMD will certainly advance their power saving systems in that direction. I agree with all you just said, though I think it's too extensive: once you're down to one core on, why would you want the complication of turning threads off dynamically? I can see some gain in turning off the FP units and some others, but there is a limit to the usefulness of doing some things, as everything costs transistors and space, even a power gate.
#29
ARF
theoneandonlymrk: I think in time they might do that; AMD will certainly advance their power saving systems in that direction. I agree with all you just said, though I think it's too extensive: once you're down to one core on, why would you want the complication of turning threads off dynamically? I can see some gain in turning off the FP units and some others, but there is a limit to the usefulness of doing some things, as everything costs transistors and space, even a power gate.
Yup, I think Bulldozer with its clustered multi-threading was designed with this in mind - saving transistors.

Power saving means being able to clock the cores anywhere between 0 MHz and 4800 MHz, and allowing the cores to execute tasks even at 50 or 25 MHz.
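Whether a given chip can actually run that low is easy to check. A minimal sketch, assuming the standard Linux cpufreq sysfs paths, that prints the reported floor and ceiling for core 0:

```c
/*
 * Quick sketch: read the cpufreq limits Linux reports for core 0, to see
 * what the real minimum/maximum clocks are (values are in kHz). Standard
 * cpufreq sysfs paths are assumed; drivers rarely expose anything near
 * 25-50 MHz as a usable operating point.
 */
#include <stdio.h>

static long read_khz(const char *path)
{
    long khz = -1;
    FILE *f = fopen(path, "r");
    if (f) {
        if (fscanf(f, "%ld", &khz) != 1)
            khz = -1;
        fclose(f);
    }
    return khz;
}

int main(void)
{
    const char *base = "/sys/devices/system/cpu/cpu0/cpufreq/";
    const char *files[] = { "cpuinfo_min_freq", "cpuinfo_max_freq",
                            "scaling_cur_freq" };
    char path[128];

    for (int i = 0; i < 3; i++) {
        snprintf(path, sizeof(path), "%s%s", base, files[i]);
        printf("%-17s %ld kHz\n", files[i], read_khz(path));
    }
    return 0;
}
```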
#30
tabascosauz
Tom Yum: I think you misunderstood the level of power gating I am referring to, though yes, dynamic cache has been a feature for some time. I am referring to shutting down execution units, instruction decoders, branch prediction caches, switching from OoO to in-order processing, etc., dynamically as processing resources require, effectively allowing a Skylake core to morph into a Goldmont core as required.

Big.LITTLE is a crude way of doing it, effectively throwing silicon at the problem. The benefit is that it is simpler to set up; the negative is that the coarser the power gating, the slower the responsiveness, as the time taken to 'turn on' and initialise the powered-up processor element increases.
That just exposes their process and architectural failings of the past few years. Intel was the first, in Skylake and Kaby Lake, to push for a smarter, more responsive core with Speed Shift. Yet AMD completely showed them up with Matisse and how much of a practical difference can be made by a core that responds to loads in 2ms. Renoir's monolithic die enabled a dynamically clocked IF, and that, combined with Zen 2's signature CPPC2 features, resulted in a 50-100% improvement in battery life over similarly specced Coffee Lake in light and moderate workloads. You don't see AMD resorting to Jaguar on half the die to maintain efficiency at low loads, or power gating half of a Zen 2 core to gimp it down to Puma-level performance just to save 2/10ths of a watt at low loads. Matisse and Renoir already have the best of both worlds; they don't need to be that desperate.

This "allowing a Skylake core to morph into a Goldmont core" isn't happening. Intel moved to a considerably larger core with Sunny Cove (and if the rumors are true, even bigger in Willow Cove) in order to leverage that performance over traditional Core, to stay competitive. All these Alder Lake rumors reek of Intel engineers finally giving up on trying to optimize this larger Core for efficiency because their 10nm+ process still isn't worth a damn and 7nm is nowhere in sight, and instead turning to shitty Atom for the lower end of the power spectrum.

Intel can forget trying to turn off half a core, running cores at 25 MHz, or juggling Atom and Core on the same substrate if they can't even get their own Speed Shift technology down to where it rivals AMD's CPPC2. That's a prerequisite to all this nonsense. And if they do in fact perfect that concept, that would just enable Tiger Lake to perform in an adaptive manner as Renoir does, so then what's the point of using Goldmont? Mainstream consumers want a thin and light notebook that draws power like it's not even on when at idle, but ramps up to provide the requisite performance at a moment's notice. What Renoir is capable of hits that nail right on the head.
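For what it's worth, on Linux it is easy to see whether hardware P-states (Speed Shift/HWP) are actually in control of a given machine. A small sketch, assuming the standard intel_pstate sysfs files:

```c
/*
 * Hedged sketch: on a Linux box with the intel_pstate driver, these sysfs
 * files (assumed to be the standard ones) show whether hardware P-states,
 * i.e. Speed Shift / HWP, are in control, and what energy/performance
 * preference the core has been given.
 */
#include <stdio.h>
#include <string.h>

static void show(const char *label, const char *path)
{
    char buf[64] = "unavailable";
    FILE *f = fopen(path, "r");
    if (f) {
        if (fgets(buf, sizeof(buf), f))
            buf[strcspn(buf, "\n")] = '\0';
        fclose(f);
    }
    printf("%-28s %s\n", label, buf);
}

int main(void)
{
    show("intel_pstate status:",
         "/sys/devices/system/cpu/intel_pstate/status");
    show("cpu0 scaling driver:",
         "/sys/devices/system/cpu/cpu0/cpufreq/scaling_driver");
    show("cpu0 energy/perf preference:",
         "/sys/devices/system/cpu/cpu0/cpufreq/energy_performance_preference");
    return 0;
}
```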

And then there's the Windows scheduler, the worst cockblock of all.
#31
Caring1
The way I see it, Intel will use clustered switching to achieve low clocks/power consumption during idle or low-intensity usage, switching to the big cores during high usage.
I could be wrong, but I don't think all cores will be usable at the same time.
#32
TheoneandonlyMrK
ARF: Yup, I think Bulldozer with its clustered multi-threading was designed with this in mind - saving transistors.

Power saving means being able to clock the cores anywhere between 0 MHz and 4800 MHz, and allowing the cores to execute tasks even at 50 or 25 MHz.
They already do that too.

A lot of the obvious stuff has already been done.
#33
ARF
theoneandonlymrk: They already do that too.

A lot of the obvious stuff has already been done.
Really? Has it, though?

I don't see either the Task Manager or a third-party program like Core Temp report anything lower than 1496 MHz on my APU.
#34
Mussels
Freshwater Moderator
ARF: Really? Has it, though?

I don't see either the Task Manager or a third-party program like Core Temp report anything lower than 1496 MHz on my APU.
Disabled cores don't *give* readings - the very act of reading from them forces them awake (see the dramas with 'waaah my Ryzen reads high voltage at idle' because the one active core is boosting to its max).
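A rough way to see this without poking the frequency at all is to read the cpuidle residency counters, which the kernel accumulates whenever a core exits an idle state. A minimal sketch, assuming the standard Linux cpuidle sysfs layout:

```c
/*
 * Rough sketch of the point above: instead of polling a sleeping core's
 * frequency (which wakes it), read the cpuidle residency counters that the
 * kernel accumulates when the core exits an idle state. Standard cpuidle
 * sysfs layout is assumed; "time" is microseconds spent in each state.
 */
#include <stdio.h>

int main(void)
{
    for (int state = 0; state < 10; state++) {
        char name_path[96], time_path[96], name[32];
        long long usec = 0;

        snprintf(name_path, sizeof(name_path),
                 "/sys/devices/system/cpu/cpu0/cpuidle/state%d/name", state);
        snprintf(time_path, sizeof(time_path),
                 "/sys/devices/system/cpu/cpu0/cpuidle/state%d/time", state);

        FILE *fn = fopen(name_path, "r");
        if (!fn)
            break;                      /* no more idle states */
        if (fscanf(fn, "%31s", name) != 1)
            name[0] = '\0';
        fclose(fn);

        FILE *ft = fopen(time_path, "r");
        if (ft) {
            if (fscanf(ft, "%lld", &usec) != 1)
                usec = 0;
            fclose(ft);
        }
        printf("cpu0 %-10s %lld us asleep\n", name, usec);
    }
    return 0;
}
```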
#35
ARF
Mussels: Disabled cores don't *give* readings - the very act of reading from them forces them awake (see the dramas with 'waaah my Ryzen reads high voltage at idle' because the one active core is boosting to its max).
Yes, I know this. So, between the "disabled" state and 1496 MHz at 0.7-something volts there are no other states in between?
#36
Mussels
Freshwater Moderator
ARF: Yes, I know this. So, between the "disabled" state and 1496 MHz at 0.7-something volts there are no other states in between?
Probably not, no. That's probably an extremely low wattage state for the CPU, deemed an efficient enough point to just leave it as the minimum.
#37
ARF
Mussels: Probably not, no. That's probably an extremely low wattage state for the CPU, deemed an efficient enough point to just leave it as the minimum.
It is not that extremely low - it drains my battery like no tomorrow. And it has only roughly 40-50% of the efficiency achieved in Renoir.
Renoir is the benchmark against which we should compare everything else.


Actually, laptops have much larger batteries than phones, and despite this, phones can last for weeks in standby, while poor laptops can at best last half a day in standby.
#38
T1beriu
There's no way Alder Lake is using 3D die stacking.
#39
ARF
T1beriu: There's no way Alder Lake is using 3D die stacking.
But Lakefield with 22nm/10nm is exactly this: a 22nm base die and a 10nm compute die.
1 big Sunny Cove core + 4 small Tremont cores.
This is the so-called non-symmetric, heterogeneous grouping of big.LITTLE cores.

en.wikichip.org/wiki/intel/microarchitectures/lakefield
#40
T1beriu
Just because this uses big+small cores doesn't mean it's a 3D-stacked chip. Come on. Good luck cooling a 95W+ die-stacked CPU. Lakefield is only 5-7W TDP, so there's no problem cooling stacked dies at that low power.