Friday, February 2nd 2024
Intel Arrow Lake-S 24 Thread CPU Leaked - Lacks Hyper-Threading & AVX-512 Support
An interesting Intel document leaked out last month. It contained detailed pre-release information covering Intel's upcoming 15th Gen Core Arrow Lake-S desktop CPU platform, including a possible best-case 8+16+1 core configuration. Close analysis of the spec sheet turned up a surprise: the next-generation Core processor family could "lack Hyper-Threading (HT) support." The rumor mill had produced similar claims in the past, but the internal technical memo confirmed that Arrow Lake's "expected eight performance cores without any threads enabled via SMT." These specifications could be subject to change, but tipster InstLatX64 has unearthed an Arrow Lake-S engineering sample: "I spotted (CPUID C0660, 24 threads, 3 GHz, without AVX 512) among the Intel test machines."
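The "CPUID C0660" in that quote is a raw processor signature, the value CPUID leaf 1 returns in EAX. The minimal Python sketch below assumes Intel's standard bit layout for that register and shows how the value splits into family, model and stepping. The resulting reading of family 6, model 0xC6 is our own decoding, not something the leak states.

```python
def decode_cpuid_signature(eax: int) -> dict:
    """Split a CPUID leaf 1 EAX signature into its documented bit fields."""
    stepping = eax & 0xF             # bits 3:0
    base_model = (eax >> 4) & 0xF    # bits 7:4
    base_family = (eax >> 8) & 0xF   # bits 11:8
    ext_model = (eax >> 16) & 0xF    # bits 19:16
    ext_family = (eax >> 20) & 0xFF  # bits 27:20

    # Intel's rules: the extended family is only added when the base family is 0xF,
    # and the extended model is only prepended when the base family is 0x6 or 0xF.
    family = base_family + ext_family if base_family == 0xF else base_family
    if base_family in (0x6, 0xF):
        model = (ext_model << 4) | base_model
    else:
        model = base_model
    return {"family": family, "model": model, "stepping": stepping}


sig = decode_cpuid_signature(0xC0660)
print(f"family {sig['family']:#x}, model {sig['model']:#x}, stepping {sig['stepping']}")
# prints: family 0x6, model 0xc6, stepping 0
```

The same function decodes any Intel signature. For example, a Raptor Lake desktop part reporting 0xB0671 comes out as family 6, model 0xB7, stepping 1.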
The leaker uncovered several pre-launch Meteor Lake SKUs last year. With those 14th Gen laptop processors now on the market, InstLatX64 has turned his attention to next-generation parts. Yesterday's Arrow Lake-S find has chins wagging over its 24-thread count, two more than the fanciest Meteor Lake Core Ultra 9 processor. Given the evident lack of Hyper-Threading on the leaked engineering sample, this could be an actual 24-core configuration. Tom's Hardware reckons that the AVX-512 instruction set could be disabled via firmware or motherboard UEFI. If InstLatX64's "without AVX-512" claim rings true, PC users with such workloads are best advised to turn to Ryzen 7040 and 8040 series processors, or (less likely) Team Blue's own 5th Gen Xeon "Emerald Rapids" server CPUs.
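Simple arithmetic checks the thread-count reasoning. The minimal Python sketch below makes two assumptions. First, it takes the leaked 8 P-cores plus 16 E-cores and leaves out the "+1" in 8+16+1, because the article does not say what it refers to. Second, it uses the Core Ultra 9 185H layout of 6 HT-enabled P-cores, 8 E-cores and 2 low-power E-cores. The sketch shows why 24 threads fits an 8+16 part only with Hyper-Threading off.

```python
from dataclasses import dataclass


@dataclass
class CoreCluster:
    name: str
    cores: int
    threads_per_core: int  # 2 with Hyper-Threading, 1 without


def total_threads(clusters: list[CoreCluster]) -> int:
    """Sum hardware threads across all core clusters."""
    return sum(c.cores * c.threads_per_core for c in clusters)


# Assumed Arrow Lake-S layout: 8 P-cores + 16 E-cores, no Hyper-Threading.
arrow_lake_s = [CoreCluster("P", 8, 1), CoreCluster("E", 16, 1)]

# The same 8+16 layout if the P-cores kept Hyper-Threading.
arrow_lake_s_with_ht = [CoreCluster("P", 8, 2), CoreCluster("E", 16, 1)]

# Core Ultra 9 185H: 6 P-cores with HT, 8 E-cores, 2 low-power E-cores.
meteor_lake_u9 = [CoreCluster("P", 6, 2), CoreCluster("E", 8, 1), CoreCluster("LP-E", 2, 1)]

print(total_threads(arrow_lake_s))          # 24
print(total_threads(arrow_lake_s_with_ht))  # 32
print(total_threads(meteor_lake_u9))        # 22
```

Only the no-HT case lands on the 24 threads InstLatX64 reported. With HT enabled, the same core count would show 32.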
Sources:
InstLatX64, Tom's Hardware, VideoCardz
42 Comments on Intel Arrow Lake-S 24 Thread CPU Leaked - Lacks Hyper-Threading & AVX-512 Support
Lower P-core IPC
Lower P-core clocks
No AVX-512
No HT
Emphasis on AI, iGPU and E-cores
Zen 5 is gonna wipe the floor with this thing.
The claim is that the Arrow Lake CPU will have P-cores supporting 512-bit vectors and E-cores supporting 256-bit vectors, but that functionally the chip will support the full AVX-512 instruction set.
From the Intel AVX10 paper: 256-bit as the baseline, with 512-bit for P-cores. It further clarifies that the 512-bit length is available on processors containing only P-cores, so most likely only Xeons. It would be nice if Intel allowed disabling the E-cores to make the CPU "fully P-core" and enable the 512-bit vector registers, but we'll have to see. Intel wasn't very happy with the early Alder Lake BIOS switches that did this.
You won't be able to run current AVX-512 software on AVX10 E-cores without recompilation either, and if that software uses 512-bit vectors, you will need to change the code. Again, on P-core-only CPUs (Xeons) it will work without recompilation.
The GCC documentation also confirms that 512-bit register support is a separate feature.
AVX10 is bringing a lot of AVX-512 goodness to E-core designs, but it's not seamless nor fully backwards compatible with current AVX-512 software.
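Under AVX10, software is supposed to query the supported vector lengths through CPUID rather than infer them from the feature name. The minimal Python sketch below assumes the bit layout from the initial AVX10 specification. In that layout, CPUID leaf 24h reports the AVX10 version in EBX bits 7:0, and bits 16, 17 and 18 flag 128-, 256- and 512-bit support. The sketch decodes such a value. The register values in the usage lines are hypothetical, and later revisions of the specification may enumerate this differently.

```python
# Assumed EBX bit positions for CPUID leaf 24h, per the initial AVX10 spec revision.
AVX10_VL128 = 1 << 16
AVX10_VL256 = 1 << 17
AVX10_VL512 = 1 << 18


def decode_avx10_leaf(ebx: int) -> tuple[int, list[int]]:
    """Return (AVX10 version, supported vector lengths in bits) from leaf 24h EBX."""
    version = ebx & 0xFF  # bits 7:0
    flags = ((128, AVX10_VL128), (256, AVX10_VL256), (512, AVX10_VL512))
    lengths = [bits for bits, flag in flags if ebx & flag]
    return version, lengths


# Hypothetical values: a P-core-only part vs. a part limited to 256-bit vectors.
print(decode_avx10_leaf(0x70001))  # (1, [128, 256, 512])
print(decode_avx10_leaf(0x30001))  # (1, [128, 256])
```

The point of the split is that a binary built for 512-bit AVX10 has to check the 512-bit flag separately, which is why the comments above treat 512-bit register support as its own feature.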
- The features AVX10-256 adds over 256-bit AVX2 matter more for a desktop CPU like Arrow Lake than the lack of AVX10-512
- Zen 5 presumably won't have APX, an instruction set extension that matters more for the performance of general-purpose code than AVX10-512, because most general-purpose code cannot be vectorized with AVX10
I think the previous post suggesting that E-cores might implement AVX10-512 meant that a large part of the AVX10-512 instruction set could (in theory) be implemented by 256-bit ALUs on E-cores.
In either case, the past failure of heterogeneous x86 Intel CPUs is purely a software failure (operating systems, compilers). Intel isn't holding back the industry. Operating systems and compilers, whose architecture cannot support heterogeneous CPUs, are holding it back.
Implementing support for APX will touch every layer of software, from operating systems through compilers to (specific) libraries. I'm not sure it will be a success for Intel. AVX-512 software only started picking up relatively recently, and Intel's consumer SKUs dropped it after Rocket/Ice/Tiger Lake supported it, so it looks like Intel can't stick to its own technology. It wouldn't be the first time either: SGX and TSX were also removed.
Sure they can; that's what Centaur's CHA microarchitecture did. AMD's implementation in Zen 4 is a bit more complex, with more of the CPU optimized for 512-bit. Adjusting the decoding stage for AVX10 should be relatively cheap area-wise, but only Intel knows for sure whether it's feasible.
Not sure why you're blaming compilers when it's Intel that is responsible for developing and wiring up compiler support for its own microarchitectures. Operating systems are much the same: Intel worked closely with Microsoft to implement support in Windows 11, and it still sometimes fails to assign threads correctly. Linux support was also Intel's to complete, yet it still isn't done, with no equivalent to Intel Thread Director support.
Do you know of any mainstream operating system that actually supports heterogeneous ISAs? As far as I know, it isn't done even on ARM. All the SoCs sporting big.LITTLE (and "midDLE" nowadays) cores support the same ARM specification level on every core so that processes can be migrated, the same as Intel's E-/P-core designs. The problem here is proper scheduling.
That's not enough, as you need modifications to the lowest levels of the OS kernel; Intel outlines what's needed in its documentation. To do that, you need hardware in the hands of kernel developers. To support APX, you likewise need the hardware to develop or port software in the first place. It's a common problem with all new technology.
Yes, but the history of Intel's additions to x86, and their subsequent removals, can make potential developers wary of supporting APX in the first place. The same goes for AVX10.
Yes, and that's what Intel's Clear Linux does. Another method would be dynamic dispatching based on runtime detection of CPU flags, as in Intel IPP or common math libraries.
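Dynamic dispatching based on runtime CPU-flag detection picks the best available code path once, at startup. The minimal Python sketch below illustrates the selection logic. It assumes Linux, where `/proc/cpuinfo` lists flags such as `avx512f` and `avx2`, and falls back to an empty flag set elsewhere. The code-path names are hypothetical. In a real C/C++ library each one would be a separately compiled variant of the same routine.

```python
from pathlib import Path


def detect_cpu_flags(cpuinfo: Path = Path("/proc/cpuinfo")) -> set[str]:
    """Return CPU feature flags from Linux /proc/cpuinfo, or an empty set if unavailable."""
    try:
        for line in cpuinfo.read_text().splitlines():
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    except OSError:
        pass  # not Linux, or /proc is not available
    return set()


# Candidate code paths, most specialised first, each with the flags it requires.
# Names are hypothetical stand-ins for variants built with -mavx512f, -mavx2, or neither.
CODE_PATHS: list[tuple[str, frozenset[str]]] = [
    ("avx512", frozenset({"avx512f", "avx512bw", "avx512vl"})),
    ("avx2", frozenset({"avx2", "fma"})),
    ("baseline", frozenset()),
]


def select_code_path(flags: set[str]) -> str:
    """Pick the first code path whose required flags are all present."""
    for name, required in CODE_PATHS:
        if required <= flags:
            return name
    return "baseline"  # unreachable in practice: baseline requires no flags


if __name__ == "__main__":
    print(select_code_path(detect_cpu_flags()))
```

The single startup decision stays valid only because every core exposes the same ISA. That matches the point above: if a thread could migrate to a core lacking AVX-512, the choice made at startup could become invalid mid-run.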
On the other hand, if AMD is on board and has been implementing APX, AVX10 and X86S in its ~Zen 5/6 designs, it's going to bring good additions to x86 in general. They do have cross-licensing agreements (after Intel lost in court).
Yeah, there are many potential software solutions, and unfortunately all of them have downsides. IMO the proper way would have been to equip E-cores with AVX-512 capabilities, even if executed on 256-bit datapaths like Centaur's CHA. I'm not knowledgeable enough to say what it would take to modify the Atom cores to do it, but Intel didn't think it was worth it. It apparently still doesn't, given that even the Arrow/Lunar/Panther Lake E-cores lack AVX-512 (per GCC).
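Running AVX-512 on 256-bit hardware, as Centaur's CHA did, is often called "double pumping." The toy Python model below uses hypothetical function names to show the idea for a 16-lane float32 add. The architectural 512-bit operation is split into two 8-lane passes, so the result is identical and only throughput suffers. Real hardware does this at the micro-op level, and lane-crossing operations such as full-width permutes are harder to split this way.

```python
LANES_512 = 16  # a 512-bit vector holds 16 float32 lanes
LANES_256 = 8   # a 256-bit datapath processes 8 float32 lanes per pass


def add_256(a: list[float], b: list[float]) -> list[float]:
    """Model one pass through a 256-bit adder: exactly 8 float32 lanes."""
    if len(a) != LANES_256 or len(b) != LANES_256:
        raise ValueError("256-bit pass expects 8 lanes per operand")
    return [x + y for x, y in zip(a, b)]


def add_512_double_pumped(a: list[float], b: list[float]) -> list[float]:
    """Execute a 512-bit add as two 256-bit passes: low half, then high half."""
    if len(a) != LANES_512 or len(b) != LANES_512:
        raise ValueError("512-bit add expects 16 lanes per operand")
    low = add_256(a[:LANES_256], b[:LANES_256])
    high = add_256(a[LANES_256:], b[LANES_256:])
    return low + high  # concatenate halves back into one 16-lane result


a = [float(i) for i in range(16)]
b = [0.5] * 16
print(add_512_double_pumped(a, b))
```

Software sees one 512-bit instruction either way. That is why this approach would let E-cores run unmodified AVX-512 binaries, at half the per-cycle throughput of a native 512-bit unit.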
Nowadays, patches for almost every new instruction set are released to the public a few months early, and you can just download a patch or update for your OS, software, games, compilers, etc.
The downside, of course, is that nothing gets tested anymore, but that's a different discussion.
AVX10 is crap; they should just let AVX-512 be the last "extension" (and both should start working on x86-S or APX, or the original idea behind AMD Fusion/HSA).
For AMD/Intel, adding a few tiny ARM accelerators on chiplets/tiles is a matter of cost-benefit. Qualcomm/Samsung/Apple, on the other hand, cannot use x86 at all. AMD has had ARM co-processors on-chip since the FX era.
Then again, maybe they both believe they're too big to fail. I guess we'll see in 5-10 years.
Edit: Unless you count the Sapphire Rapids Xeon W series, which is already out.
Newer process nodes don't automatically guarantee higher clocks at first. Case in point: TSMC's various node variants, each optimized for a different target (power, clocks or density).
Arrow Lake will probably align closely with Meteor Lake. Meteor Lake's P-cores on the Intel 4 node have lower IPC than Raptor Lake's P-cores on Intel 7. The reason for this specific drop is that processor design is all about transistor real estate. CPUs can gain IPC on smaller nodes because there is room to add more functional units, cache, etc., but Intel is choosing to spend the extra real estate of smaller nodes on the NPU, better E-cores and the iGPU instead.
I believe Arrow Lake's P-cores are on the Intel 4 node, just like Meteor Lake's. They could move to Intel 3, which is due at the end of this year, but it seems Intel is saving its most cutting-edge capacity for third-party chip designers in order to boost IFS. The Intel 20A node is expected by the end of 2025 unless there are delays.