Intel "Alder Lake" CPU Core Segmentation Sketched

btarunr

Editor & Senior Moderator
Staff member
Joined
Oct 9, 2007
Messages
47,300 (7.53/day)
Location
Hyderabad, India
System Name RBMK-1000
Processor AMD Ryzen 7 5700G
Motherboard ASUS ROG Strix B450-E Gaming
Cooling DeepCool Gammax L240 V2
Memory 2x 8GB G.Skill Sniper X
Video Card(s) Palit GeForce RTX 2080 SUPER GameRock
Storage Western Digital Black NVMe 512GB
Display(s) BenQ 1440p 60 Hz 27-inch
Case Corsair Carbide 100R
Audio Device(s) ASUS SupremeFX S1220A
Power Supply Cooler Master MWE Gold 650W
Mouse ASUS ROG Strix Impact
Keyboard Gamdias Hermes E2
Software Windows 11 Pro
Intel's 12th Gen Core "Alder Lake-S" desktop processors in the LGA1700 package could see the desktop debut of Intel's Hybrid Technology that it introduced with the mobile segment "Lakefield" processor. Analogous to Arm big.LITTLE, Intel Hybrid Technology is a multi-core processor topology that sees the combination of high-performance CPU cores with smaller high-efficiency cores that keep the PC ticking through the vast majority of the time/tasks when the high-performance cores aren't needed and hence power-gated. The high-performance cores are woken up only as needed. "Lakefield" combines one "Sunny Cove" high-performance core with four "Tremont" low-power cores. "Alder Lake-S" will take this concept further.

According to Intel slides leaked to the web by HXL (aka @9550pro), the 10 nm-class "Alder Lake-S" silicon will physically feature 8 "Golden Cove" high-performance cores, and 8 "Gracemont" low-power cores, along with a Gen12 iGPU that comes in three tiers - GT0 (iGPU disabled), GT1 (some execution units disabled), and GT2 (all execution units enabled). In its top trim with 125 W TDP, "Alder Lake-S" will be a "16-core" processor with 8 each of "Golden Cove" and "Gracemont" cores enabled. There will be 80 W TDP models with the same 8+8 core configuration, which are probably "locked" parts. Lastly, the lower rungs of the product stack will completely lack "small" cores, and be 6+0, with only high-performance cores. A recurring theme with all parts is the GT1 trim of the Gen12 iGPU.



Intel is working on a way to reconcile the vast feature-set and ISA differences between its "big" and "small" cores. The big "Golden Cove" core supports certain AVX-512 instructions, besides TSX-NI (hardware transactional memory), and FP16 (half precision floating point). The smaller "Gracemont" core lacks these instruction sets. So whenever software issues work that requires these instructions, the processor will be forced to wake up a "Golden Cove" core, and additional such cores as needed.
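The wake-on-demand behaviour described above can be sketched as a toy model. This is only an illustration under stated assumptions, not Intel's mechanism: the `Core` class, the `dispatch` function, the feature names, and the idea that both core types share a common SSE/AVX2 baseline are all hypothetical.

```python
from dataclasses import dataclass

# Big-core-only features as listed in the article; the shared baseline
# (COMMON) is an assumption made purely for this illustration.
BIG_ONLY = frozenset({"avx512", "tsx", "fp16"})
COMMON = frozenset({"sse", "avx2"})


@dataclass
class Core:
    name: str
    features: frozenset
    awake: bool = False


def make_cores(big=8, small=8):
    """Small cores start awake; big cores start power-gated."""
    cores = [Core(f"small{i}", COMMON, awake=True) for i in range(small)]
    cores += [Core(f"big{i}", COMMON | BIG_ONLY) for i in range(big)]
    return cores


def dispatch(cores, needed):
    """Pick a core for a task; wake a gated core only if no awake core qualifies."""
    needed = frozenset(needed)
    for core in cores:
        if core.awake and needed <= core.features:
            return core
    for core in cores:
        if needed <= core.features:
            core.awake = True  # power-gated core is woken on demand
            return core
    raise ValueError(f"no core supports {sorted(needed)}")
```

In this model a task that only needs `sse` lands on a small core, while the first task that needs `avx512` wakes exactly one big core; later AVX-512 tasks reuse it until more capacity is needed.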

A quick reminder of the LGA1700 socket - this platform could see Intel introducing PCI-Express 5.0 I/O. There's also a possibility of DDR5 unbuffered memory support. The significant increase in pin-count for the mainstream-desktop segment is probably attributable to a Ryzen-like migration of platform I/O from the PCH to the CPU socket, along with more CPU-attached PCIe lanes.

 
Joined
Feb 15, 2019
Messages
1,666 (0.78/day)
System Name Personal Gaming Rig
Processor Ryzen 7800X3D
Motherboard MSI X670E Carbon
Cooling MO-RA 3 420
Memory 32GB 6000MHz
Video Card(s) RTX 4090 ICHILL FROSTBITE ULTRA
Storage 4x 2TB Nvme
Display(s) Samsung G8 OLED
Case Silverstone FT04
What is the point of big.LITTLE on a desktop platform?
 
Joined
Mar 23, 2016
Messages
4,844 (1.52/day)
Processor Core i7-13700
Motherboard MSI Z790 Gaming Plus WiFi
Cooling Cooler Master RGB something
Memory Corsair DDR5-6000 small OC to 6200
Video Card(s) XFX Speedster SWFT309 AMD Radeon RX 6700 XT CORE Gaming
Storage 970 EVO NVMe M.2 500GB,,WD850N 2TB
Display(s) Samsung 28” 4K monitor
Case Phantek Eclipse P400S
Audio Device(s) EVGA NU Audio
Power Supply EVGA 850 BQ
Mouse Logitech G502 Hero
Keyboard Logitech G G413 Silver
Software Windows 11 Professional v23H2
Joined
Mar 28, 2020
Messages
1,761 (1.02/day)
Energy efficiency?

This may be true, but it is not a need for desktop processors that pull power from the mains. To me there may be 2 issues here:

(1) Another layer of software optimization is required for switching between high- and low-performance cores, which may cause issues due to buggy drivers/OS
(2) Consumers are forced to pay Intel extra for the Tremont cores that they don't need

Maybe to offer "More cores" in PR materials to compete with AMD?
I was thinking the same thing as well. Just so that they can advertise as having "up to 16 cores".
 
Joined
Feb 3, 2017
Messages
3,822 (1.33/day)
Processor Ryzen 7800X3D
Motherboard ROG STRIX B650E-F GAMING WIFI
Memory 2x16GB G.Skill Flare X5 DDR5-6000 CL36 (F5-6000J3636F16GX2-FX5)
Video Card(s) INNO3D GeForce RTX™ 4070 Ti SUPER TWIN X2
Storage 2TB Samsung 980 PRO, 4TB WD Black SN850X
Display(s) 42" LG C2 OLED, 27" ASUS PG279Q
Case Thermaltake Core P5
Power Supply Fractal Design Ion+ Platinum 760W
Mouse Corsair Dark Core RGB Pro SE
Keyboard Corsair K100 RGB
VR HMD HTC Vive Cosmos
That "available only when the big cores are enabled" sounds suspiciously like it should say when only big cores are enabled.
 
Joined
Jun 29, 2018
Messages
542 (0.23/day)
That "available only when the big cores are enabled" sounds suspiciously like it should say when only big cores are enabled.

That's my interpretation as well. I highly doubt that it will dynamically switch execution to big cores when AVX-512 is used. This would end in applications that try to detect AVX-512 at startup being locked to the big cores forever. Also I'm not sure if there's any OS capable of scheduling processes in an environment like that (different cores having different instruction sets), but I might be wrong ;)
 
Joined
Apr 12, 2013
Messages
7,563 (1.77/day)
That's my interpretation as well. I highly doubt that it will dynamically switch execution to big cores when AVX-512 is used. This would end in applications that try to detect AVX-512 at startup being locked to the big cores forever. Also I'm not sure if there's any OS capable of scheduling processes in an environment like that (different cores having different instruction sets), but I might be wrong ;)
Not sure why that'd be an issue, it really doesn't depend on the OS as much as the application & if Intel tries to shoehorn this (AVX512) into something like a Lakefield it'll fail harder than ever!
 
Joined
Jun 29, 2018
Messages
542 (0.23/day)
Not sure why that'd be an issue, it really doesn't depend on the OS as much as the application & if Intel tries to shoehorn this (AVX512) into something like a Lakefield it'll fail harder than ever!

Well it's the OS' responsibility to schedule processes onto cores. So either the OS becomes aware of differing instruction sets or somehow it passes this responsibility to the hardware. Either way will require OS modification and will make older systems unable to utilize this new CPU fully.
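To make the "OS becomes aware of differing instruction sets" option concrete, here is a minimal sketch of how a scheduler could derive a per-thread affinity mask from per-core feature sets. The function names, the feature strings, and the core layout in the usage note are hypothetical; no real OS is claimed to work this way.

```python
def affinity_mask(core_features, required):
    """Return a bitmask with bit i set when core i supports every required feature."""
    required = set(required)
    mask = 0
    for index, features in enumerate(core_features):
        if required <= set(features):
            mask |= 1 << index
    return mask


def mask_to_cpus(mask):
    """Expand a bitmask into the set of core indices it allows."""
    cpus = set()
    index = 0
    while mask:
        if mask & 1:
            cpus.add(index)
        mask >>= 1
        index += 1
    return cpus
```

With a made-up layout of cores 0-7 as big cores and 8-15 as small cores, a thread that needs AVX-512 gets the mask `0x00FF` and everything else gets `0xFFFF`. On Linux the resulting set could be handed to `os.sched_setaffinity`, but that still leaves open how the OS would learn what a thread needs in the first place.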
 
Joined
Apr 12, 2013
Messages
7,563 (1.77/day)
It will, that's a given, & if the application supports it (AVX512) then ideally it ought to load the big core anyway, unless power or thermal constrained. We've had this same debate, what, 2 or 3 years back with the AIDA64 developer(?) & the consensus was similar to what I see in this thread; of course almost everyone agreed that big.LITTLE won't come to pass on x86, which obviously didn't turn out so well. As for an application being locked to a single core, that's the job of the scheduler & I don't remember anything about instruction sets having a say in how processes & threads run on an(y) OS.
 
Joined
Jun 10, 2014
Messages
2,995 (0.78/day)
Processor AMD Ryzen 9 5900X ||| Intel Core i7-3930K
Motherboard ASUS ProArt B550-CREATOR ||| Asus P9X79 WS
Cooling Noctua NH-U14S ||| Be Quiet Pure Rock
Memory Crucial 2 x 16 GB 3200 MHz ||| Corsair 8 x 8 GB 1333 MHz
Video Card(s) MSI GTX 1060 3GB ||| MSI GTX 680 4GB
Storage Samsung 970 PRO 512 GB + 1 TB ||| Intel 545s 512 GB + 256 GB
Display(s) Asus ROG Swift PG278QR 27" ||| Eizo EV2416W 24"
Case Fractal Design Define 7 XL x 2
Audio Device(s) Cambridge Audio DacMagic Plus
Power Supply Seasonic Focus PX-850 x 2
Mouse Razer Abyssus
Keyboard CM Storm QuickFire XT
Software Ubuntu
I'm not a fan of this hybrid technology. I don't believe it belongs in desktop computers, and will contribute to making scheduling harder, especially with different ISA features on the various cores. It's hard to tell, this could turn out okay, or very bad (like Itanium).

Well it's the OS' responsibility to schedule processes onto cores. So either the OS becomes aware of differing instruction sets or somehow it passes this responsibility to the hardware. Either way will require OS modification and will make older systems unable to utilize this new CPU fully.
It's not a problem to query a core to find out all the supported ISA features. I just hope the executables have this flagged as well.
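As a rough illustration of querying each core, the sketch below parses text in the Linux `/proc/cpuinfo` format and reports flags that only some logical CPUs advertise. The helper names and the sample text in the usage note are made up for illustration; whether a hybrid part would actually expose differing flags this way is an open assumption.

```python
def flags_per_core(cpuinfo_text):
    """Map logical CPU number -> set of feature flags from /proc/cpuinfo-style text."""
    result = {}
    current = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines between CPU entries
        key = key.strip()
        if key == "processor":
            current = int(value)
        elif key == "flags" and current is not None:
            result[current] = set(value.split())
    return result


def uneven_features(per_core):
    """Return flags that some cores report and others do not."""
    all_sets = list(per_core.values())
    if not all_sets:
        return set()
    return set.union(*all_sets) - set.intersection(*all_sets)
```

On a Linux box this could be fed with `open("/proc/cpuinfo").read()`. For a hypothetical two-core sample where only CPU 0 lists `avx512f`, `uneven_features` returns just that flag, which is exactly the situation a scheduler (or a flagged executable) would have to handle.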
 
Joined
Feb 3, 2017
Messages
3,822 (1.33/day)
Processor Ryzen 7800X3D
Motherboard ROG STRIX B650E-F GAMING WIFI
Memory 2x16GB G.Skill Flare X5 DDR5-6000 CL36 (F5-6000J3636F16GX2-FX5)
Video Card(s) INNO3D GeForce RTX™ 4070 Ti SUPER TWIN X2
Storage 2TB Samsung 980 PRO, 4TB WD Black SN850X
Display(s) 42" LG C2 OLED, 27" ASUS PG279Q
Case Thermaltake Core P5
Power Supply Fractal Design Ion+ Platinum 760W
Mouse Corsair Dark Core RGB Pro SE
Keyboard Corsair K100 RGB
VR HMD HTC Vive Cosmos
The implementation will likely be the same as Arm's big.LITTLE - same ISA features across all cores. This makes scheduling a lot easier. I don't really see Intel having a choice, especially when Lakefield has already shown that scheduling on it (at least on Windows) is tricky and needs refinement.
 
Joined
Apr 24, 2020
Messages
2,723 (1.60/day)
I'm trying to figure out how the CPU decides if a thread / process needs AVX512 or not.

I know that there's "vzeroupper", which helps differentiate between SSE and AVX code. If a 256-bit register is "half full" of zeros and flagged by vzeroupper, then Windows knows to only save 128 bits instead of 256 bits between context switches. (https://community.intel.com/t5/Inte...-is-the-status-of-VZEROUPPER-use/td-p/1098375).

I'd imagine that a similar flag is used for AVX512. Saving 1/2 or 1/4th the registers is certainly a noble goal and probably already implemented in Linux and Windows. I'm not expert enough to know if that's the case for sure... but that'd be my guess for what Intel is going for here.
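To put rough numbers on the "save only part of the registers" idea, here is a back-of-the-envelope sketch. It counts only the register payload per x86-64 state component and ignores the XSAVE header, x87 state, and alignment; the component names and the `bytes_to_save` helper are hypothetical labels for this illustration.

```python
# Register payload sizes in bytes on x86-64 (XSAVE header and x87 state ignored).
COMPONENT_BYTES = {
    "sse_xmm": 16 * 16,           # XMM0-15, 128 bits each
    "avx_ymm_hi": 16 * 16,        # upper 128 bits of YMM0-15
    "avx512_opmask": 8 * 8,       # k0-k7, 64 bits each
    "avx512_zmm_hi256": 16 * 32,  # upper 256 bits of ZMM0-15
    "avx512_hi16_zmm": 16 * 64,   # ZMM16-31, 512 bits each
}


def bytes_to_save(in_use):
    """Sum the payload of the state components a thread has actually dirtied."""
    return sum(COMPONENT_BYTES[name] for name in in_use)
```

By this count a thread that only touched SSE state needs 256 bytes saved, SSE plus AVX needs 512 bytes, and the full AVX-512 set needs 2112 bytes, so tracking which components are in use can cut the per-switch cost considerably.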
 
Joined
Jul 7, 2019
Messages
141 (0.07/day)
The reason for big.LITTLE on desktop is pretty clear: their x86 uArch innovation stagnated along with their lithography node R&D. Rocket Lake leaks give us hints already - Ring Bus. RKL has an odd HT vs physical core design, and the only explanation that comes to my mind is that the Ring Bus doesn't scale with their post-Skylake architectures, so they are relying on fewer cores with high clock speed scaling and ST performance at the loss of HT performance, probably to stay relevant in gaming. But this is going to hit them in SMT performance again. With consoles going to Zen 2 based CPUs and more people buying high-core-count parts, this is not good at all. AMD's SMT is already very strong, and Ryzen 4000 will probably decimate Intel Z400 and Z500, esp. since the Z500 doesn't have the damn Gen 4 lanes from the chipset. Horrible, since X570 did it 1 year back.

This doesn't have any damned benefit in the Desktop LGA processors, even in the Alienware Area51M series or Clevo P870DM series LGA notebooks nobody gives a fuck about the damn big little like phones, where the li-ion battery power sipping increases rapidly by the higher performance cores in the ARM SoC along with ton of other dedicated modules for RF/GPU/Memory etc. Maybe their Mobile might benefit but still at the loss of powerful cores it's a hogwash, when AMD's BGA processors are beating Intel BGA lineup at perf/efficiency, loss - loss unless the ST performance of 8 physical cores is higher along with those 8 or 4 HT cores (RKL has 4 cores HT disabled as per rumors) .

This requires a lot of OS work AGAIN, AMD's NUMA processors had already seen their lack of adoption even AMD abandoned them, X399 didn't have support for the TR3000, and afaik only Milan moved more parts to the powerful cloud service providers like AWS. So Apple also probably thought along with their R&D cash into A series ARM processors a huge waste of money to put into OS rewrite when their Mac sales are also just 10% of their profit cut better spend it on their own x86-ARM translation and A series SoC since many users are into ultra thin and light and don't care about BGA BS or not.

This doesn't paint a good picture, as Intel doesn't seem to have any confidence in their lineup; it also looks like a temporary band-aid again on LGA1700. I hope AMD doesn't chase this bullshit and stays true to their desktop x86 performance leadership. TBH this won't make it to Xeon for sure: having cheap-arse crappy cores on a Xeon means server OS / software / HW changes, and NO ONE wants to do that. Esp. when AMD is piledriving and steamrolling with their EPYC and Ryzen CPUs in both server and consumer DIY.
 
Joined
Aug 7, 2020
Messages
10 (0.01/day)
Intel's oneAPI could conceptually be extended to enable requesting a device of a different CPU type as easily as requesting a GPU, NNP or FPGA accelerator.
 
Joined
Apr 24, 2020
Messages
2,723 (1.60/day)
AMD's NUMA processors had already seen their lack of adoption even AMD abandoned them,

Note: Zen2 is still NUMA. Its UMA-mode however is just far faster than Zen 1 or Zen+ was. As such, it is acceptable to run Zen2 in its UMA mode.

Netflix still uses Zen2 in its NUMA mode for far faster performance. See this for details: https://people.freebsd.org/~gallatin/talks/euro2019.pdf

Many programmers are unaware of the benefits of NUMA. So what Zen2 proves is that your UMA-mode needs to be reasonably fast, but not necessarily as fast as your NUMA mode. For the few programmers willing to go the extra mile and NUMA-optimize their code, NUMA will likely remain the fastest way of doing things as chiplets continue to become more common.

----------

Intel also does Sub-NUMA Clustering (SNC) for a similar effect on Skylake / Cascade Lake systems. When you have 16, 32, 64 cores... it turns out that some RAM locations are "closer" than others depending on the core you're using. The reality of NUMA is inevitable as we get more and more cores.

The question is whether UMA-emulation (by round-robin distributing the data across all memory controllers) will remain fast enough that we can ignore the difficulties of NUMA in typical workloads (i.e., Zen2). But NUMA is the underlying ground truth of the physics and reality of these chips.
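A minimal sketch of what round-robin distribution across memory controllers means, assuming a simple modulo interleave. The 4096-byte granule and both function names are assumptions for illustration; real chips use their own interleave sizes and address hashing.

```python
def controller_for(address, n_controllers, granule=4096):
    """Map a physical address to a memory controller by round-robin interleaving."""
    return (address // granule) % n_controllers


def spread(start, length, n_controllers, granule=4096):
    """Count how many granules of a buffer (length > 0) land on each controller."""
    counts = [0] * n_controllers
    first = start // granule
    last = (start + length - 1) // granule
    for g in range(first, last + 1):
        counts[g % n_controllers] += 1
    return counts
```

Under this scheme a large buffer is spread evenly over every controller, so every core sees the same average latency (the UMA illusion), but no core gets to keep its data on the controller closest to it, which is the part NUMA-aware code can exploit.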
 
Joined
Aug 7, 2020
Messages
10 (0.01/day)
(2) Consumers are forced to pay Intel extra for the Tremont cores that they don't need


I was thinking the same thing as well. Just so that they can advertise as having "up to 16 cores".

Gracemont adds AVX2, I believe. It will be interesting to see if its AVX2 performs comparably to AMD's AVX2. If so, then Intel should be able to match AMD's SIMD processing results, but with reduced chip area and power vs. using Intel's large cores.

Tremont cores were about 1/4 the area of Sunny Cove cores on Lakefield, based on die pictures I've seen posted.
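As a rough worked example of what that ratio implies, the sketch below converts a core configuration into "big-core area equivalents". The 1/4 figure is only the Lakefield die-shot estimate above; carrying it over to Gracemont vs Golden Cove is an assumption, and the function name is made up. Cache, ring, and iGPU area are ignored.

```python
from fractions import Fraction


def big_core_equivalents(big, small, small_to_big_area=Fraction(1, 4)):
    """Core area expressed in units of one big core, ignoring cache and uncore."""
    return big + small * small_to_big_area
```

Under that assumption the 8+8 configuration costs about as much core area as 10 big cores, the 6+0 parts cost 6, and Lakefield's 1+4 works out to 2.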
 