
Intel's Alder Lake Processors Could use Foveros 3D Stacking and Feature 16 Cores

Joined
Mar 10, 2010
Messages
11,878 (2.20/day)
Location
Manchester uk
System Name RyzenGtEvo/ Asus strix scar II
Processor Amd R5 5900X/ Intel 8750H
Motherboard Crosshair hero8 impact/Asus
Cooling 360EK extreme rad+ 360$EK slim all push, cpu ek suprim Gpu full cover all EK
Memory Corsair Vengeance Rgb pro 3600cas14 16Gb in four sticks./16Gb/16GB
Video Card(s) Powercolour RX7900XT Reference/Rtx 2060
Storage Silicon power 2TB nvme/8Tb external/1Tb samsung Evo nvme 2Tb sata ssd/1Tb nvme
Display(s) Samsung UAE28"850R 4k freesync.dell shiter
Case Lianli 011 dynamic/strix scar2
Audio Device(s) Xfi creative 7.1 on board ,Yamaha dts av setup, corsair void pro headset
Power Supply corsair 1200Hxi/Asus stock
Mouse Roccat Kova/ Logitech G wireless
Keyboard Roccat Aimo 120
VR HMD Oculus rift
Software Win 10 Pro
Benchmark Scores 8726 vega 3dmark timespy/ laptop Timespy 6506
A big.little strategy could work for x86 but the devil will be in the details about how quickly the CPU can transition processes between cores when there is such a performance disparity between the little and big cores. Big.little works as a power saving measure because leakage current scales with transistor numbers, so very large cores have much higher leakage current than smaller cores. This puts a floor in how low processors can drop their power consumption during idle, and this effect gets worse with smaller process nodes. If the presence of small cores allows the processor to completely power down the larger cores during light usage scenarios, then power consumption during light usage will be lower. But, for highly variable loads like gaming, the time it takes to move processes from small to large cores will likely lead to degraded performance and prevent any measurable power saving.
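To put rough numbers on the argument above, here is a minimal sketch in Python. Every figure in it (leakage and active watts, relative throughput, the migration delay) is a made-up assumption for illustration, not a measurement of any real core, and the `Core` class and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Core:
    name: str
    leakage_w: float   # power drawn while powered on but idle (assumed value)
    active_w: float    # total power drawn while executing (assumed value)
    throughput: float  # relative work units completed per millisecond (assumed)


# Hypothetical cores: the big one is 4x faster but leaks 10x more when idle.
BIG = Core("big", leakage_w=0.50, active_w=5.0, throughput=4.0)
LITTLE = Core("little", leakage_w=0.05, active_w=0.8, throughput=1.0)


def energy_mj(core: Core, work_units: float, window_ms: float) -> float:
    """Energy in millijoules to finish work_units inside window_ms,
    then sit idle (still leaking) for the rest of the window."""
    busy_ms = work_units / core.throughput
    if busy_ms > window_ms:
        raise ValueError(f"{core.name} core cannot finish inside the window")
    idle_ms = window_ms - busy_ms
    # watts * milliseconds = millijoules
    return core.active_w * busy_ms + core.leakage_w * idle_ms


def burst_latency_ms(work_units: float, migrate_ms: float) -> float:
    """Time to finish a burst that arrives while only the little core is on
    and has to be moved to the big core first. Assumes no useful progress
    is made during the migration itself."""
    return migrate_ms + work_units / BIG.throughput


if __name__ == "__main__":
    # Light load: 2 work units every 100 ms, idle time dominates the energy.
    print("light load, big   :", energy_mj(BIG, 2, 100), "mJ")
    print("light load, little:", energy_mj(LITTLE, 2, 100), "mJ")
    # Bursty load: the migration delay is added on top of the compute time.
    print("burst, 3 ms migration:", burst_latency_ms(40, 3.0), "ms")
    print("burst, already on big:", burst_latency_ms(40, 0.0), "ms")
```

With these assumed numbers, the light-load case is dominated by idle leakage, which is where the little core wins, while the bursty case pays the migration delay on every burst. How large that delay really is on hardware is exactly the open question.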

A more sophisticated way would be to allow each core to power off certain elements within the core when not required. For example, powering off a FPU unit when not required, or half the L2 and L3 cache when not needed. But that doesn't allow marketing to scream 'MOAR CORES!' so that option is off the table.
That's exactly what their competition does. As I said before, gating the power per core on or off is something Ryzens do and Intel does too; Intel also developed race-to-idle so that it can turn cores off sooner.

Unfortunately for Intel, it's just as much an issue of power use under load, and I don't think this will fix that.
 
Joined
Apr 29, 2020
Messages
141 (0.08/day)
CPUs have been doing this for over a decade already...

I think you misunderstood the level of power gating I am referring to, though yes, dynamic cache has been a feature for some time. I am referring to shutting down execution units, instruction decoders, branch prediction caches, switching from OoO to in-order processing, etc., dynamically as processing resources require. Effectively, allowing a Skylake core to morph into a Goldmont core as required.

Big.little is a crude way of doing it, effectively throwing silicon at the problem. The benefit is that it is simpler to set up; the negative is that the coarser the power gating, the slower the responsiveness, as the time taken to 'turn on' and initialise the powered-up processor element increases.

That's exactly what their competition does. As I said before, gating the power per core on or off is something Ryzens do and Intel does too; Intel also developed race-to-idle so that it can turn cores off sooner.

Unfortunately for Intel, it's just as much an issue of power use under load, and I don't think this will fix that.

Agree, it does nothing for power use under load. Like I said, this is more coming out of Intel marketing requirements than actual better end-user experience. It allows Intel to spruik bullsh*t 'up-to XX hours battery life' metrics (bullsh*t because no one idles their laptop for 12 hours straight) and 'Moar cores', even if those extra cores perform like potatoes. My point was more that the principles of big.little as a power saving measure are sound, but also crude, and I would expect someone with the R&D budget of Intel to implement something more sophisticated than rehashing a 9 year old idea from ARM.
 

ARF

Joined
Jan 28, 2020
Messages
4,670 (2.61/day)
Location
Ex-usa | slava the trolls
ARM's marketing material promises up to a 75% savings in power usage for some activities.[2]



Serious answers are not available and it's the same sentiment everywhere, right? We really don't know anything other than 'it uses Big little'. We can speculate :)

More cores equals more power used. And from that conclusion... it's easy to draw other conclusions. Such as:
1. Windows scheduler and good allocation of workloads will be the key to gaining an advantage over other products
2. Intel's goal must be: faster when it's needed (it can turbo high), fall back on little when possible (big cores can cool down and clear TDP budget for a new boost). Any other approach is not feasible, because then they are not competitive against stripped AND full fat performance cores.
3. A new reduction of base clocks on the BIG cores is likely, to clear more TDP headroom for turbo. Or maybe even dial back entirely to idle clock, some 800 MHz, and just have a turbo on top of that. Or maybe fully shut down, but I'm then thinking of latency problems.

So, using the cores at the same time will bring what advantage exactly? I'm not seeing it, do you? For this product to be viable, it needs to be better than either variant of the cores used in it. 8 fast and 8 slow cores are still worse than 16 regular ones at base clock, I reckon...

Interesting stuff indeed :) What I personally think is that Alder Lake is a way to get 10nm dies out that were planned anyway, and still keep a competitive product across the whole stack. Forget 'glued together', Intel is going full scrapyard dive. It also confirms yet again that 10nm scales like shit into performance territory.


There are 3 ways of arranging and using big.LITTLE:
1. Clustered switching - the one you described - either the big cores or the small cores are active, never both at the same time;
2. In-kernel switcher - a big and a small core are coupled into pairs, so with 8 + 8 you would have something like 8 big cores with hyper-threading enabled;
and the third:
3. Heterogeneous multi-processing (global scheduling):

The most powerful use model of big.LITTLE architecture is Heterogeneous Multi-Processing (HMP), which enables the use of all physical cores at the same time. Threads with high priority or computational intensity can in this case be allocated to the "big" cores while threads with less priority or less computational intensity, such as background tasks, can be performed by the "LITTLE" cores.[10][11]
This model has been implemented in the Samsung Exynos starting with the Exynos 5 Octa series (5420, 5422, 5430),[12][13] and Apple mobile application processors starting with the Apple A11.[14]
https://en.wikipedia.org/wiki/ARM_big.LITTLE#cite_note-14
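The three use models listed above can be sketched as toy placement functions. This is only an illustration under loud assumptions: the `intensity` score, the 0.5 threshold, the 8 + 8 core names and the round-robin placement are all hypothetical, and no real scheduler (ARM's, Windows' or Intel's) is claimed to work this way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thread:
    name: str
    intensity: float  # hypothetical score: 0.0 = background, 1.0 = compute heavy


# Assumed 8 + 8 layout, matching the rumoured core count discussed in the thread.
BIG = [f"big{i}" for i in range(8)]
LITTLE = [f"little{i}" for i in range(8)]


def clustered_switching(threads, threshold=0.5):
    """Only one cluster is powered: big if any thread is demanding, else little."""
    demanding = any(t.intensity > threshold for t in threads)
    cluster = BIG if demanding else LITTLE
    return {t.name: cluster[i % len(cluster)] for i, t in enumerate(threads)}


def in_kernel_switcher(threads, threshold=0.5):
    """Cores are paired (big_i, little_i); each pair behaves as one virtual
    core and runs on whichever half suits the heaviest thread assigned to it."""
    pairs = {}
    for i, t in enumerate(threads):
        pairs.setdefault(i % len(BIG), []).append(t)
    placement = {}
    for pair, group in pairs.items():
        use_big = any(t.intensity > threshold for t in group)
        core = BIG[pair] if use_big else LITTLE[pair]
        for t in group:
            placement[t.name] = core
    return placement


def hmp(threads, threshold=0.5):
    """All 16 cores are usable at once: heavy threads go to big cores,
    light threads to little cores, each filled round-robin."""
    placement = {}
    heavy = light = 0
    for t in threads:
        if t.intensity > threshold:
            placement[t.name] = BIG[heavy % len(BIG)]
            heavy += 1
        else:
            placement[t.name] = LITTLE[light % len(LITTLE)]
            light += 1
    return placement


if __name__ == "__main__":
    work = [Thread("game", 0.9), Thread("updater", 0.1), Thread("chat", 0.2)]
    for model in (clustered_switching, in_kernel_switcher, hmp):
        print(model.__name__, model(work))
```

In this toy version, one demanding thread drags everything onto the big cluster under clustered switching, while HMP keeps the background threads on little cores. That difference is why the scheduler matters so much for the third model.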




Another thing that should be considered is that there is a frequency wall on the 14nm process, so no matter the approach, more performance would not be possible.

And the whole approach will still be inferior to Zen 3 and Zen 4, especially with 16 big cores (or double) with SMT.
 
Joined
Mar 10, 2010
Messages
11,878 (2.20/day)
Location
Manchester uk
I think you misunderstood the level of power gating I am referring to, though yes, dynamic cache has been a feature for some time. I am referring to shutting down execution units, instruction decoders, branch prediction caches, switching from OoO to in-order processing, etc., dynamically as processing resources require. Effectively, allowing a Skylake core to morph into a Goldmont core as required.

Big.little is a crude way of doing it, effectively throwing silicon at the problem. The benefit is that it is simpler to set up; the negative is that the coarser the power gating, the slower the responsiveness, as the time taken to 'turn on' and initialise the powered-up processor element increases.



Agree, it does nothing for power use under load. Like I said, this is more coming out of Intel marketing requirements than actual better end-user experience. It allows Intel to spruik bullsh*t 'up-to XX hours battery life' metrics (bullsh*t because no one idles their laptop for 12 hours straight) and 'Moar cores', even if those extra cores perform like potatoes. My point was more that the principles of big.little as a power saving measure are sound, but also crude, and I would expect someone with the R&D budget of Intel to implement something more sophisticated than rehashing a 9 year old idea from ARM.
I think in time they might do that. AMD will certainly advance its power-saving systems in that direction, and I agree with all you just said, though I think it's too extensive: once you're down to one core on, why would you want the complication of turning threads off dynamically? I can see some gain in turning off the FP units and some others, but there is a limit to the usefulness of doing so, as everything costs transistors and space, even a power gate.
 

ARF

Joined
Jan 28, 2020
Messages
4,670 (2.61/day)
Location
Ex-usa | slava the trolls
I think in time they might do that. AMD will certainly advance its power-saving systems in that direction, and I agree with all you just said, though I think it's too extensive: once you're down to one core on, why would you want the complication of turning threads off dynamically? I can see some gain in turning off the FP units and some others, but there is a limit to the usefulness of doing so, as everything costs transistors and space, even a power gate.


Yup, I think Bulldozer with its clustered multi-threading was designed with this in mind - saving transistors.

Power-saving systems should mean clocking the cores anywhere between 0 MHz and 4800 MHz, and allowing the cores to execute tasks even at 50 or 25 MHz.
 

tabascosauz

Moderator
Supporter
Staff member
Joined
Jun 24, 2015
Messages
8,182 (2.36/day)
Location
Western Canada
System Name ab┃ob
Processor 7800X3D┃5800X3D
Motherboard B650E PG-ITX┃X570 Impact
Cooling NH-U12A + T30┃AXP120-x67
Memory 64GB 6400CL32┃32GB 3600CL14
Video Card(s) RTX 4070 Ti Eagle┃RTX A2000
Storage 8TB of SSDs┃1TB SN550
Case Caselabs S3┃Lazer3D HT5
I think you misunderstood the level of power gating I am referring to, though yes, dynamic cache has been a feature for some time. I am referring to shutting down execution units, instruction decoders, branch prediction caches, switching from OoO to in-order processing, etc., dynamically as processing resources require. Effectively, allowing a Skylake core to morph into a Goldmont core as required.

Big.little is a crude way of doing it, effectively throwing silicon at the problem. The benefit is that it is simpler to set up; the negative is that the coarser the power gating, the slower the responsiveness, as the time taken to 'turn on' and initialise the powered-up processor element increases.

That just exposes their process and architectural failings in the past few years. Intel was the first in Skylake and Kaby to push for a smarter, more responsive core with Speed Shift. Yet, AMD completely showed them up with Matisse and how much a practical difference can be made with a core that responds to loads in 2ms. Renoir's monolithic die enabled a dynamically clocked IF, and that combined with Zen 2's signature CPPC2 features resulted in the 50-100% improvement in battery life in direct comparison to similarly specced Coffee Lake, in light and moderate workloads. You don't see AMD resorting to Jaguar on half the die to maintain efficiency at low loads, or power gate half of a Zen 2 core to gimp it down to Puma-level performance just to save 2/10ths of a watt at low loads. Matisse and Renoir already have the best of both worlds, they don't need to be that desperate.

This "allowing a Skylake core to morph into a Goldmont core" isn't happening. Intel moved to a considerably larger core with Sunny Cove (and if the rumors are true, even bigger in Willow Cove) in order to leverage that performance over traditional Core, to stay competitive. All these Alder Lake rumors reek of Intel engineers finally giving up on trying to optimize this larger Core for efficiency because their 10nm+ process still isn't worth a damn and 7nm is nowhere in sight, and instead turning to shitty Atom for the lower end of the power spectrum.

Intel can forget trying to turn off half a core, running cores at 25MHz, or juggling Atom and Core on the same substrate, if they can't even get their own Speed Shift technology down to where it rivals AMD's CPPC2. That's a prerequisite to all this nonsense. And if they do in fact perfect that concept, that would just enable Tiger Lake to perform in an adaptive manner as Renoir does, so then what's the point of using Goldmont? Mainstream consumers want a thin and light notebook that draws power like it's not even on when at idle, but ramps up to provide the requisite performance at a moment's notice. What Renoir is capable of hits that nail right on the head.

And then there's the Windows scheduler, the worst cockblock of all.
 
Joined
Oct 22, 2014
Messages
14,170 (3.81/day)
Location
Sunshine Coast
System Name H7 Flow 2024
Processor AMD 5800X3D
Motherboard Asus X570 Tough Gaming
Cooling Custom liquid
Memory 32 GB DDR4
Video Card(s) Intel ARC A750
Storage Crucial P5 Plus 2TB.
Display(s) AOC 24" Freesync 1m.s. 75Hz
Mouse Lenovo
Keyboard Eweadn Mechanical
Software W11 Pro 64 bit
The way I see it, Intel will use Cluster Switching to achieve low clocks and power consumption during idle and low-intensity usage, switching to Big cores during high usage.
I could be wrong, but I don't think all cores will be usable at the same time.
 
Joined
Mar 10, 2010
Messages
11,878 (2.20/day)
Location
Manchester uk
Yup, I think Bulldozer with its clustered multi-threading was designed with this in mind - saving transistors.

Power-saving systems should mean clocking the cores anywhere between 0 MHz and 4800 MHz, and allowing the cores to execute tasks even at 50 or 25 MHz.
They already do that too.

A lot of the obvious stuff has already been done.
 

ARF

Joined
Jan 28, 2020
Messages
4,670 (2.61/day)
Location
Ex-usa | slava the trolls
They already do that too.

A lot of the obvious stuff has already been done.


Really? Has it, though?

I don't see either Task Manager or a third-party program like Core Temp report anything lower than 1496 MHz on my APU.
 

Mussels

Freshwater Moderator
Joined
Oct 6, 2004
Messages
58,413 (7.91/day)
Location
Oystralia
System Name Rainbow Sparkles (Power efficient, <350W gaming load)
Processor Ryzen R7 5800x3D (Undervolted, 4.45GHz all core)
Motherboard Asus x570-F (BIOS Modded)
Cooling Alphacool Apex UV - Alphacool Eisblock XPX Aurora + EK Quantum ARGB 3090 w/ active backplate
Memory 2x32GB DDR4 3600 Corsair Vengeance RGB @3866 C18-22-22-22-42 TRFC704 (1.4V Hynix MJR - SoC 1.15V)
Video Card(s) Galax RTX 3090 SG 24GB: Underclocked to 1700Mhz 0.750v (375W down to 250W))
Storage 2TB WD SN850 NVME + 1TB Sasmsung 970 Pro NVME + 1TB Intel 6000P NVME USB 3.2
Display(s) Phillips 32 32M1N5800A (4k144), LG 32" (4K60) | Gigabyte G32QC (2k165) | Phillips 328m6fjrmb (2K144)
Case Fractal Design R6
Audio Device(s) Logitech G560 | Corsair Void pro RGB |Blue Yeti mic
Power Supply Fractal Ion+ 2 860W (Platinum) (This thing is God-tier. Silent and TINY)
Mouse Logitech G Pro wireless + Steelseries Prisma XL
Keyboard Razer Huntsman TE ( Sexy white keycaps)
VR HMD Oculus Rift S + Quest 2
Software Windows 11 pro x64 (Yes, it's genuinely a good OS) OpenRGB - ditch the branded bloatware!
Benchmark Scores Nyooom.
Really? Has it, though?

I don't see either Task Manager or a third-party program like Core Temp report anything lower than 1496 MHz on my APU.

Disabled cores don't *give* readings - the very act of reading from them forces them awake (see the dramas with 'waaah my ryzen reads high voltage at idle' because the one active core is boosting to its max)
 

ARF

Joined
Jan 28, 2020
Messages
4,670 (2.61/day)
Location
Ex-usa | slava the trolls
Disabled cores don't *give* readings - the very act of reading from them forces them awake (see the dramas with 'waaah my ryzen reads high voltage at idle' because the one active core is boosting to its max)


Yes, I know this. So, between the "disabled" state and 1496 MHz at 0.7-something volts, there are no other states in between?
 

Mussels

Freshwater Moderator
Joined
Oct 6, 2004
Messages
58,413 (7.91/day)
Location
Oystralia
Yes, I know this. So, between the "disabled" state and 1496 MHz at 0.7-something volts, there are no other states in between?

Probably not, no. That's probably an extremely low wattage state for the CPU, deemed an efficient point to just leave as the minimum.
 

ARF

Joined
Jan 28, 2020
Messages
4,670 (2.61/day)
Location
Ex-usa | slava the trolls
Probably not, no. That's probably an extremely low wattage state for the CPU, deemed an efficient point to just leave as the minimum.


It is not that extremely low - it drains my battery like there's no tomorrow. And it has only roughly 40-50% of the efficiency achieved in Renoir.
Renoir is the benchmark against which we should compare everything else.


Actually, laptops have much larger batteries than phones and despite this, the phones can last for weeks in standby, while poor laptops in the best case can last half a day in standby.
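The gap described above is easier to see as average standby power, which is just battery capacity divided by standby time. A minimal sketch follows; the capacities (50 Wh laptop, 15 Wh phone) and durations (12 hours, 2 weeks) are assumed round numbers for illustration, not figures from any specific device, and `implied_standby_watts` is a hypothetical helper.

```python
def implied_standby_watts(capacity_wh: float, standby_hours: float) -> float:
    """Average power draw implied by draining capacity_wh over standby_hours."""
    if standby_hours <= 0:
        raise ValueError("standby_hours must be positive")
    return capacity_wh / standby_hours


if __name__ == "__main__":
    # Assumed example values, not measurements.
    laptop_w = implied_standby_watts(capacity_wh=50.0, standby_hours=12.0)
    phone_w = implied_standby_watts(capacity_wh=15.0, standby_hours=14 * 24.0)
    print(f"laptop standby draw: {laptop_w:.3f} W")
    print(f"phone standby draw:  {phone_w:.3f} W")
    print(f"ratio: {laptop_w / phone_w:.0f}x")
```

Under those assumptions the laptop draws on the order of watts in standby while the phone draws tens of milliwatts, so the larger battery does not come close to compensating.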
 

ARF

Joined
Jan 28, 2020
Messages
4,670 (2.61/day)
Location
Ex-usa | slava the trolls
There's no way Alder Lake is using 3D die stacking.


But Lakefield with 22nm/10nm is exactly this: a 22nm base die and a 10nm compute die.
1 big Sunny Cove core + 4 small Tremont cores.
This is the so-called non-symmetric heterogeneous multi-core grouping of big.LITTLE cores.

 
Joined
May 3, 2016
Messages
137 (0.04/day)
Just because this uses big+small cores, does it mean it's a 3D-stacked chip? Come on. Good luck cooling a 95W+ die-stacked CPU. Lakefield is only 5-7W TDP, so there's no problem in cooling stacked dies at this low power.
 