Wednesday, May 5th 2021
Intel Core-1800 Alder Lake Engineering Sample Spotted with 16C/24T Configuration
Intel's upcoming Alder Lake generation of processors is going to be the company's first mainstream iteration of a heterogeneous x86 architecture. That means Intel will combine smaller, low-power cores with big, high-performance cores to cover all kinds of workloads. If a task doesn't need much power, such as a background task, the smaller cores are used. If you need to render something or want to fire up a game, the big cores provide the power needed. Intel has decided to build this architecture on its advanced 10 nm SuperFin process, which represents a major upgrade over the existing 14 nm process.
Today, we got some information from Igor's Lab showing the leaked specification of an Intel Core-1800 processor engineering sample. While this may not be the final name, the leaked information shows that the processor is at B0 stepping, which suggests the CPU will see more changes before the final silicon arrives. The CPU has 16 cores with 24 threads: eight are big cores with Hyper-Threading, while the remaining eight are smaller Atom cores. The base clock is 1800 MHz, while the boost speeds are 4.6 GHz with two cores, 4.4 GHz with four cores, and 4.2 GHz with six cores. When all cores are used, the boost speed is locked at 4.0 GHz. The CPU has a PL1 TDP of 125 Watts, while the PL2 configuration boosts the TDP to 228 Watts. The CPU was reportedly running at 1.3147 Volts during the test. You can check out the complete datasheet below.
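As a quick sanity check on the numbers above, here is a minimal Python sketch of the thread count and the boost tiers. The names `total_threads` and `boost_ghz` are made up for illustration, and treating each leaked boost figure as applying "up to N active cores" is our assumption; the leak only lists values for two, four, six, and all cores.

```python
# Figures taken from the leaked engineering-sample data quoted above.
BIG_CORES = 8        # big cores, two threads each (Hyper-Threading)
SMALL_CORES = 8      # small Atom cores, one thread each
BOOST_TIERS_GHZ = {2: 4.6, 4: 4.4, 6: 4.2}  # active cores -> boost clock
ALL_CORE_BOOST_GHZ = 4.0


def total_threads(big: int = BIG_CORES, small: int = SMALL_CORES) -> int:
    """Big cores contribute two threads each, small cores one."""
    return big * 2 + small


def boost_ghz(active_cores: int) -> float:
    """Assumption: each tier applies up to and including its core count."""
    if active_cores < 1:
        raise ValueError("at least one core must be active")
    for limit in sorted(BOOST_TIERS_GHZ):
        if active_cores <= limit:
            return BOOST_TIERS_GHZ[limit]
    return ALL_CORE_BOOST_GHZ


print(total_threads())   # 24
print(boost_ghz(2))      # 4.6
print(boost_ghz(16))     # 4.0
```

The 16C/24T label follows directly: 8 × 2 threads from the big cores plus 8 × 1 from the small ones.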
Sources:
Igor's LAB, via VideoCardz
46 Comments on Intel Core-1800 Alder Lake Engineering Sample Spotted with 16C/24T Configuration
Atom cores? Hmmmm. Not sure how I feel about that...
Bigger issue here is that we've only had one prior preview of this big-little setup in Lakefield and it was pathetic, because Intel treated the big core as an addon to the Atoms. Say what you will about the exponential difference in TDP between Lakefield and Alder Lake etc. but regardless, Alder Lake both needs to be much more appreciative of its Golden Cove cores, and the Windows scheduler also needs to accommodate that and avoid a repeat of Lakefield.
Even though it doesn't do big-little, Ryzen has also taken on some of these ideas as its scheduler has improved over time. Dual CCD Ryzen 5000 segregates the two groups of cores pretty clearly in practice. Even though CCD1 always has the 2 CPPC priority cores, the Ryzens seem to offload a surprising amount of light background processing to one or two preferred CCD2 cores when lightly loaded.
big.LITTLE can only work effectively in low-power mobile devices where you're fine with things running suboptimally when the device idles or stuff like that. On a desktop you typically want high performance all the time.
Having stuff like maybe the browser running on the low-power cores sounds good, but it almost never works like it should. Because how do you know what belongs where? You can do stuff like maybe target code that only contains 32-bit instructions at the small cores and code that contains SIMD at the big cores, but it's complicated and it's not gonna work most of the time because applications mix and match different kinds of workloads.
The scheduler needs time, and the purported release schedule for Alder Lake doesn't allow for a lot of that. Intel is better positioned than AMD to have MS listen to its needs, but the Rocket Lake launch clearly proved that Intel isn't always 100% prepared on microcode either.
Even though Lakefield might technically have served as Intel's first foray into hybrid arch, if Intel simply follows Lakefield's philosophy then like I said Alder Lake is going to be one hell of a ride.
I feel like this is the exact same situation, I suspect that the engineers know that this architecture makes no sense on a desktop but the management wants a marketable product with many cores because the competition is totally crushing them in that department.
I just wonder how CPU-Z and Task Manager will report on it. Really, Task Manager should separate it into 3 sets of graphs for each.
P.S. It will be reported as a Freak :D
Besides the point that I think hybrid cores are a nonsense feature on desktops, I think Intel made a huge mistake by having different ISA support on the cores. It would be much better if the slow cores had all the same features, including AVX-512, but implemented them using more clock cycles if needed. Then the scheduler wouldn't have to worry, and could use heuristics alone to move threads around. I haven't yet found details on how schedulers will determine whether an application will use AVX or not. Pretty much any user application would have to run on the fast cores, especially the web browser. Otherwise the user experience will be painfully slow. The engineers at Intel & AMD don't write the scheduler of Windows.
Also remember that this scheduler has to be flexible enough to work with all the supported microarchitectures.
125 W PL1, while PL2 is up in the sky again. If it was really such a major upgrade, we'd see way over 5 GHz turbo speeds with this power consumption... unless IPC has drastically increased over Rocket Lake, which I highly doubt. Even in that case, I'd much rather see more modest clock speeds with more realistic power targets (and no PL2 - nobody plays games or does any work in 56-second intervals). So far, I'm not impressed.
Barely know anything, but latency could be atrocious unless Intel pushes Microsoft into making the scheduler aware of the smaller cores. On top of that, read-ahead on this version of Windows is abysmal at worst, so they might overhaul Windows entirely or just launch a new version of it.
That 10 nm SuperFin is really less appealing when compared to a regular 12c/24t or 16c/32t at 14 nm.
But with Windows machines, you're talking a fraction of the power it uses. Even with the Atom cores, these Alder Lake chips are going to suck down power on the desktop, and in mobile AMD's arch is still better, and Zen 4 is going to pull further ahead.
This is a solution to a problem nobody wanted. If they could produce Atom cores with Skylake performance, they could have just made Atom processors that performed as well as regular Coffee Lake CPUs without the huge power draw. The fact they didn't makes me very suspicious of how capable these cores really are.
I'm thinking the opposite: 16c/24t is an actual selling point vs 5900X or 5950X for Intel ~ the chance of this SKU (125W TDP) making it to notebooks is close to zero!
Why does a thread running on small core even need to run AVX, if it doesn't support it?
I might have to get one of the hybrid ones though, but only because I might have to verify some software. By software(?) emulation I presume you mean that the CPU frontend will translate it into different instructions (hardware emulation), which is what modern x86 microarchitectures do already; all FPU, MMX, SSE instructions are converted to AVX. This is also how legacy instructions are implemented.
But there will be challenges when there isn't a binary compatible translation, e.g. FMA operations. Doing these separately will result in rounding errors. There are also various other shuffling etc. operations in AVX which will require a lot of instructions to achieve. In such cases I do wonder if the CPU will just freeze the thread and ask the scheduler to move it, because this detection has to happen on the hardware side.
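The rounding concern with splitting an FMA can be shown with a small Python sketch. It does not call any hardware FMA instruction; it models the fused result with exact `fractions.Fraction` arithmetic rounded once, and compares that with a multiply and an add that each round. The function names and the chosen inputs are only for illustration.

```python
from fractions import Fraction


def separate_mul_add(a: float, b: float, c: float) -> float:
    """a*b is rounded to a double, then the add is rounded again."""
    return a * b + c


def fused_mul_add(a: float, b: float, c: float) -> float:
    """Model of FMA: compute a*b+c exactly, then round a single time."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

# The exact product is 1 - 2**-60, which rounds to 1.0 as a double,
# so the separate version loses the small term entirely.
print(separate_mul_add(a, b, c))  # 0.0
print(fused_mul_add(a, b, c))     # -2**-60, about -8.67e-19
```

So a core that emulated FMA with a plain multiply followed by an add could return a different bit pattern than a core with a real FMA unit, which is why this case is harder than a simple instruction translation.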
One additional aspect to consider, is that Linux distributions are moving to shipping versions where the entire software repositories are compiled with e.g. AVX2 optimizations, so virtually nothing can use the weak cores, so clearly Intel made a really foolish move here. Developers have fairly little control over this in normal Windows or Linux environments. They can control how many threads are spawned, set attributes like affinity and priority, and of course detect CPU features such as core count, SMT, etc. at runtime. But the actual management of this is done by the OS scheduler.
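For what that developer-side control looks like in practice, here is a minimal Python sketch using the standard `os` module. `os.sched_setaffinity` and `os.sched_getaffinity` are only available on some Unix platforms such as Linux, so the sketch falls back to doing nothing elsewhere. The helper names are hypothetical, and nothing here knows which logical CPUs are big or small cores; that mapping would have to come from somewhere else.

```python
import os


def logical_cpu_count() -> int:
    """Number of logical CPUs the OS reports (threads, not physical cores)."""
    return os.cpu_count() or 1


def pin_current_process(core_ids):
    """Restrict the calling process to the given logical CPUs.

    Returns the resulting affinity set, or None where the platform
    does not expose the sched_* affinity calls.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    os.sched_setaffinity(0, set(core_ids))  # pid 0 means "this process"
    return os.sched_getaffinity(0)


print(logical_cpu_count())
```

A program could call `pin_current_process({0, 1})` to stay on two specific logical CPUs, but the OS scheduler still decides when and how the threads run within that set.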
Browsers like Chrome already spawn an incredible number of threads. Normally, very few of these have any major load, but the browser may still need to synchronize them, so this can be an issue if important low-load threads end up on cores which are slow to respond. I know Chrome gets slow due to high thread count long before CPU load with high tab count. Windows and the default Linux kernel have very little x86-specific code, and even less specific to particular microarchitectures. While you certainly can compile your own Linux kernel with a different scheduler, compile-time arguments and CPU optimizations, this is something you have to do yourself and keep repeating every time you want kernel patches.
So with a few exceptions, the OS schedulers are running mostly generic code.
They do, however, as the dragon tamer said in your link, apply a lot of heuristics and adjustments at runtime, including moving threads around to distribute heat. Whether these algorithms are "optimal" or not depends on the workload. We'll see if this changes when Intel and AMD release hybrid designs. You'd better prepare for a bumpy ride.