Wednesday, May 5th 2021

Intel Core-1800 Alder Lake Engineering Sample Spotted with 16C/24T Configuration

Intel's upcoming Alder Lake generation of processors is set to be the company's first heterogeneous x86 design. For the first time, Intel will combine small, low-power cores with big, high-performance cores on a single chip. If a task doesn't need much power, such as a background task, the small cores handle it; if you need to render something or want to fire up a game, the big cores provide the performance those tasks need. Intel has decided to build this architecture on its advanced 10 nm SuperFin node, which represents a major upgrade over the existing 14 nm process.

Today, we got some information from Igor's Lab showing the leaked specifications of an Intel Core-1800 processor engineering sample. While this may not be the final name, the leaked information shows that the processor is a B0 stepping, meaning the design could still see changes before final silicon arrives. The CPU has 16 cores and 24 threads. Eight of those are big cores with HyperThreading, while the remaining eight are smaller Atom cores. They run at a base clock of 1800 MHz, while boost speeds are 4.6 GHz on two cores, 4.4 GHz on four cores, and 4.2 GHz on six cores. When all cores are loaded, the boost speed is locked at 4.0 GHz. The CPU has a PL1 TDP of 125 W, while the PL2 configuration raises that to 228 W. The CPU was reportedly running at 1.3147 V during the test. You can check out the complete datasheet below.
Sources: Igor's LAB, via VideoCardz

46 Comments on Intel Core-1800 Alder Lake Engineering Sample Spotted with 16C/24T Configuration

#26
lexluthermiester
1d10tOn top of that, read-ahead on this version of Windows is abysmal at worst, so they might overhaul Windows entirely or just launch a new version of it.
Very doubtful. There is no need for a new version of Windows, as all existing modern versions of Windows have drivers for, and know how to use, both sets of CPU cores in question.
voltageThe biggest mistake Intel can make for this release is not releasing Alder Lake with DDR5 from the start.
Disagreed. DDR4 is perfectly acceptable.
#27
TheUn4seen
Heterogeneous cores?! It's a freeeeak, grab the pitchforks!

The way I see it, it's great for mobile; it gives even more flexibility as far as power usage goes. For desktops, well, meh. My old and certainly not the fastest or most power-efficient 9700K idles at around 10 W (reported "package power") and doesn't really cross 40 W during normal work, so the Mugen 5 handles it almost passively. I recently started using the 9600K, and it idles even lower, even when running Windows with all the data-stealing shenanigans going on in the background.
Now let's just hope that Microsoft can get their lazy asses to work on a reasonable scheduler. Linux hippies figured it out years ago.
#28
WhitetailAni
TheUn4seenNow let's just hope that Microsoft can get their lazy asses to work on a reasonable scheduler. Linux hippies figured it out years ago.
Maybe they'll have to do the custom power plans again like they did for Ryzen.

Anyway, for this CPU on desktop.
Idle power consumption isn't a big deal on desktop - 10W vs 15W is basically nothing.

Also, I can't wait for people complaining that "my CPU isn't being used 100%". Gotta add another copy+paste reply to my repository.

Here's my view on big.LITTLE.
It's a great idea for laptops. Less power consumption = less heat = better laptop because you aren't burning people's legs/hands/etc.
Not so much desktops. There's absolutely no point, really.
#29
Tablet
the whole is greater than the sum of the parts
#30
efikkan
RealKGBAnyway, for this CPU on desktop.
Idle power consumption isn't a big deal on desktop - 10W vs 15W is basically nothing.

Here's my view on big.LITTLE.
It's a great idea for laptops. Less power consumption = less heat = better laptop because you aren't burning people's legs/hands/etc.
Not so much desktops. There's absolutely no point, really.
Totally agree.
But you have to realize that these are primarily designed for the OEM market. Dell, HP, Lenovo etc. will love to sell 16c/24t "5 GHz" 65W TDP CPUs in their tiny boxes with undersized cooling and PSU.

Power users should probably be looking at the HEDT segment anyway, and not just to get more unleashed CPU cores, but also for I/O, like more SSDs.
#31
Axaion
On the other hand, I'd rather have DDR4 for Alder Lake due to maturity; DDR5 is not really there yet. Maybe it will be in a year or so.
#32
ADB1979
Vya DomusEven if they have similar IPC to Skylake, they're gonna run at lower clocks and still be pretty slow. No matter how much you mess with the scheduler, the small cores will range from worthless (the scheduler never prioritizes them) to detrimental (the scheduler places the wrong threads on them).

big.LITTLE can only work effectively in low-power mobile devices where you're fine with things running suboptimally when the device idles, or stuff like that. On a desktop you typically want high performance all the time.

Having stuff like maybe the browser running on the low-power cores sounds good, but it almost never works like it should. Because how do you know that? You can do stuff like maybe targeting code that only contains 32-bit instructions at the small cores and code that contains SIMD at the big cores, but it's complicated, and it's not gonna work most of the time because applications mix and match different kinds of workloads.
A lot of your points were talked about by Dr. Ian Cutress; the scheduler is going to be a real problem with these new CPUs and will take a while to get ironed out. Ian also covers various other things as well, in an interesting and useful video.

#33
iBruceypoo
Wondering how large the Alder Lake DDR5 latency penalty will be?

RKL is about 10ns - I'll be very happy to get an RKL DDR4 AIDA 64 latency timing somewhere within the 40ns - 50ns range. :ohwell:

Right now, I'm really loving the 11600K (moving from an 8086K) and just discovering its potential; the IPC increase from 8th gen to 11th gen is extremely apparent. So I can wait for Raptor Lake next year and give DDR5 some time to mature before buying.
#34
Makaveli
iBruceypooWondering how large the Alder Lake DDR5 latency penalty will be?

RKL is about 10ns - I'll be very happy to get an RKL DDR4 AIDA 64 latency timing somewhere within the 40ns - 50ns range. :ohwell:

Think I'll wait for Raptor Lake next year and give DDR5 some time to mature, before buying.
I would be more concerned about actual performance than AIDA 64 latency numbers.

Comet Lake has better latency than Zen 3 in AIDA, yet is slower. And as with all first-gen memory, it will most likely be slower than DDR4 at the start. The main point of DDR5 is to bring more bandwidth.
#35
iBruceypoo
MakaveliI would be more concerned about actual performance than AIDA 64 latency numbers.

Comet Lake has better latency than Zen 3 in AIDA, yet is slower. And as with all first-gen memory, it will most likely be slower than DDR4 at the start. The main point of DDR5 is to bring more bandwidth.
Agree 100%.

I'm kind of a lover of low latency and track-racer responsiveness at low queue depth, thus the Optane SSD in my build, and my work apps never exceed 9 threads maximum (light load). :)

So 6 cores / 12 faster threads offer more for my workflow. "I have no problem buying an i5 when Intel doesn't bring their A-game"

Loving the 11600K Air-Cooled. :love:

------

AMD is doing such an amazing job with IPC, my hat is off to them, this is an amazing and exciting time for CPU development.

I'm all in for both camps red and blue! :)



#36
efikkan
MakaveliI would be more concerned about actual performance than AIDA 64 latency numbers.
Yes, synthetics are for technical discussions.
Buying decisions on the other hand should be dictated by real world performance.

I haven't studied the differences in the signaling protocols between DDR4 and DDR5, and all the various latencies involved, but I believe it doubles the banks. So there might be access patterns which are faster and some that are slower. Time will tell.

I'm actually more concerned about price and availability. What will a lot of you do if DDR5 is scarce when Alder Lake ships?
#37
Makaveli
efikkanYes, synthetics are for technical discussions.
Buying decisions on the other hand should be dictated by real world performance.

I haven't studied the differences in the signaling protocols between DDR4 and DDR5, and all the various latencies involved, but I believe it doubles the banks. So there might be access patterns which are faster and some that are slower. Time will tell.

I'm actually more concerned about price and availability. What will a lot of you do if DDR5 is scarce when Alder Lake ships?
It being scarce will not be a problem for me personally, as I don't intend to upgrade my rig any time soon, and when that time arrives, I will not be looking at first-gen products. For someone looking to build a new rig in the coming months, it will be a concern, I guess.
#38
Minus Infinity
I'm not sure anyone should be rushing out to get first-gen DDR5 if first-gen DDR4 is anything to go by. Those original DDR4 chips had pretty bad latencies, and it took a good year or so to start getting memory that was noticeably better than the best of the fastest DDR3. Also, there will most likely be a huge premium for early adopters.

I'm sure Alder Lake will be competitive, especially with its promised 100% multithreaded performance uplift (compared to Skylake, I presume), and the Gracemont cores are pretty good according to Moore's Law Is Dead, at about two-thirds the performance of Skylake. The PL2 power state seems rather poor though; I was expecting much better, but let's wait and see.
#39
chrcoluk
Vya DomusThey can handle it, I just pointed out that the best they can do is prevent the small cores from tanking performance.


It doesn't take an army of engineers to know that there is no "correct" solution to this. And you're making a wrong assumption here: even if the engineers know better, the end product can still be a failure. I am sure the engineers knew how to build a better processor back in the day when they came up with Netburst, but the end result was obviously terrible because upper management wanted a marketable product with more GHz on the box than the competition. See, it's not that simple.

I feel like this is the exact same situation, I suspect that the engineers know that this architecture makes no sense on a desktop but the management wants a marketable product with many cores because the competition is totally crushing them in that department.
I think you have nailed it; it seems a marketing solution, not an engineering one. Even on phones it doesn't really work well, but it's kind of just accepted, as there is a recognition of the constraints being worked within and the need to stretch out battery life. Usually on my phone I root it and adjust the scheduler to stop using the small cores.

I expect on the PC we will get people trying to disable the small cores as much as possible. It might even be affected by the power profile, so e.g. in the "high performance" profile nothing gets scheduled to the small cores.
#40
1d10t
lexluthermiesterVery doubtful. There is no need for a new version of Windows as all existing modern versions of Windows have drivers for and know how to use both sets of the CPU's in question.
Given that Intel is closer to Microsoft than AMD is, and also considering that Windows' five-year cycle is overdue, I still think it's possible. Microsoft also baked their own in-house chip based on ARM, and I don't think they're reckless enough not to provide an OS that natively supports it.
#41
Sabishii Hito
iBruceypooI'm all in for both camps red and blue! :)


#43
Wirko
1d10tMicrosoft also baked their own in-house chip based on ARM, and I don't think they're reckless enough to not provide an OS that natively supports it.
If they are at all clever, they're now using that sur-snap-face-dragon as a great learning tool, and they will figure out what (mostly) proper scheduling looks like by 2022.
efikkanBy software(?) emulation I presume you mean that the CPU frontend will translate it into different instructions (hardware emulation), which is what modern x86 microarchitectures do already; all FPU, MMX, SSE instructions are converted to AVX. This is also how legacy instructions are implemented.

But there will be challenges when there isn't a binary compatible translation, e.g. FMA operations. Doing these separately will result in rounding errors. There are also various other shuffling etc. operations in AVX which will require a lot of instructions to achieve.
I mean emulation in software. The basic mechanism exists in all modern CPUs: if the decoder encounters an unknown instruction, an interrupt is triggered and the interrupt handler can do calculations instead of that instruction. Obviously, AVX-512 registers have to be replaced by data structures in memory. That's utterly inefficient but would prevent the thread and the process from dying.
efikkanIn such cases I do wonder if the CPU will just freeze the thread and ask the scheduler to move it, because this detection has to happen on the hardware side.
Is that possible in today's CPUs?
efikkanOne additional aspect to consider, is that Linux distributions are moving to shipping versions where the entire software repositories are compiled with e.g. AVX2 optimizations, so virtually nothing can use the weak cores, so clearly Intel made a really foolish move here.
Small cores are supposed to have AVX2 (but it's still a guess).
efikkanWindows and the default Linux kernel have very little x86 specific code, and even less specific to particular microarchitectures. While you certainly can compile your own Linux kernel with a different scheduler, compile time arguments and CPU optimizations, this is something you have to do yourself and keep repeating every time you want kernel patches.

So with a few exceptions, the OS schedulers are running mostly generic code.
They do, however, as the dragon tamer said in your link, do a lot of heuristics and adjustments at runtime, including moving threads around to distribute heat. Whether these algorithms are "optimal" or not depends on the workload.
I certainly don't understand much of the description of Linux CFS (this one or others), but it seems to be pretty much configurable, with all those "domains" and "groups" and stuff. The code itself can be universal but can still account for specifics by means of parameters, like those that can be obtained from /proc/cpuinfo.
efikkanWe'll see if this changes when Intel and AMD releases hybrid designs, you better prepare for a bumpy ride.
Yes sir. I just believe there will be a smooth ride after a bumpy period. (And later, some turbulent flight when MS inevitably issues a Windows update that's been thoroughly alpha-tested.)
chrcolukI think you have nailed it, it seems a marketing solution not an engineering one
The engineering (and business) decision that we see in every generation of CPUs is to have as few variants of silicon as possible. It looks like design, validation, photomask production and such things, which need to be done repeatedly for each variant and each stepping, are horribly expensive. So Intel may decide to bake only two variants, for example 8 big + 8 small + IGP and 4 big + 8 small + IGP, and break those two down into a hundred different laptop and desktop chips.
chrcolukI expect on the PC we will get people trying to disable the small cores as much as possible. Might even be affected by profile so e.gl in "high performance" profile it doesnt schedule anything to small cores.
Who knows, we may even get a BIOS option to disable them.
#44
efikkan
WirkoI mean emulation in software. The basic mechanism exists in all modern CPUs: if the decoder encounters an unknown instruction, an interrupt is triggered and the interrupt handler can do calculations instead of that instruction.
I'm skeptical about the feasibility of this.
WirkoObviously, AVX-512 registers have to be replaced by data structures in memory. That's utterly inefficient but would prevent the thread and the process from dying.
Actually, that's the one part that's trivial.
You shouldn't need to change the memory at all. AVX data is just packed floats or ints, so if you can split the AVX operation up into e.g. 16 individual ADD/SUB/MUL/DIV operations, you can just use a pointer with an offset.

The real challenge with your approach is to inject the replacement code. Machine code works with pointer addresses, so if you add more instructions in the middle all addresses would have to be offset. Plus there could be side effects from the usage of registers in the injected code. So I'm not convinced about your approach.
WirkoIs that possible in today's CPUs?
Perhaps. Currently, if a CPU encounters an invalid opcode, the thread is normally terminated. I haven't studied what happens at the low level, or whether it's possible for the OS to move the thread before the cleanup.
#45
Wirko
efikkanI'm skeptical about the feasibility of this.

Actually, that's the one part that's trivial.
You shouldn't need to change the memory at all. AVX data is just packed floats or ints, so if you can split the AVX operation up into e.g. 16 individual ADD/SUB/MUL/DIV operations, you can just use a pointer with an offset.

The real challenge with your approach is to inject the replacement code. Machine code works with pointer addresses, so if you add more instructions in the middle all addresses would have to be offset. Plus there could be side effects from the usage of registers in the injected code. So I'm not convinced about your approach.
It has nothing to do with translation or replacement of code. An invalid opcode triggers an exception #6: Invalid opcode, and an exception handler is then run (in the OS kernel, I presume). The exception handler saves the registers, then it reads the offending instruction and its parameters. If it recognizes an AVX-512 instruction, it performs the operations that this instruction should perform. It doesn't operate on AVX registers because there are none but rather on 32 x 512 bits of data stored in memory, which is not part of the user process space. The exception handler then restores the values of registers and returns control to the user process, which then continues instead of being terminated.
I can find very few resources on that (Anand forums, MIT courses) ... it doesn't seem to be very common.
#46
AusWolf
chrcolukI think you have nailed it, it seems a marketing solution not an engineering one, even on phones it doesnt really work well, but kind of just accepted as there is a recognition of the constraints been worked with and the need to stretch out battery life. Usually on my phone I root it and adjust scheduler to stop using small cores.

I expect on the PC we will get people trying to disable the small cores as much as possible. Might even be affected by profile so e.gl in "high performance" profile it doesnt schedule anything to small cores.
To be honest, I don't really need large cores in my phone as I'm only using it to check my messages every now and then. I'm really not into the modern smartphone gaming / social media culture.

Similarly, I don't need small cores in my PC. I only need large cores with decent power management to keep temperatures in check.

Big.LITTLE is a waste of die area on all platforms, in my opinion.