Friday, February 2nd 2024

Intel Arrow Lake-S 24 Thread CPU Leaked - Lacks Hyper-Threading & AVX-512 Support

Feb 2nd, 2024 10:19

An interesting Intel document leaked out last month. It contained detailed pre-release information on Intel's upcoming 15th Gen Core Arrow Lake-S desktop CPU platform, including a possible best-case 8+16+1 core configuration. Thorough analysis of the spec sheet turned up a surprise: the next-generation Core processor family could "lack Hyper-Threading (HT) support." The rumor mill had produced similar claims in the past, but the internal technical memo confirmed Arrow Lake's "expected eight performance cores without any threads enabled via SMT." These specifications could be subject to change, but tipster InstLatX64 has now unearthed an Arrow Lake-S engineering sample: "I spotted (CPUID C0660, 24 threads, 3 GHz, without AVX 512) among the Intel test machines."

The leaker uncovered several pre-launch Meteor Lake SKUs last year; now that those Core Ultra laptop processors have hit the market, InstLatX64 has turned his attention to next-generation parts. Yesterday's Arrow Lake-S find has tongues wagging about its 24-thread count, two more than the fanciest Meteor Lake Core Ultra 9 processor. Given the evident lack of Hyper-Threading on the leaked engineering sample, this could be a 24-core configuration in total. Tom's Hardware reckons that the AVX-512 instruction set could be disabled via firmware or motherboard UEFI. If InstLatX64's "without AVX-512" claim rings true, PC users with such workloads are best advised to turn to Ryzen 7040 and 8040 series processors or (less likely) Team Blue's own 5th Gen Xeon "Emerald Rapids" server CPUs.
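The thread arithmetic behind the "24 threads means 24 cores" reading can be checked with a few lines of Python. This is a minimal sketch that assumes the rumoured 8 P-core + 16 E-core Arrow Lake-S layout, and the Meteor Lake Core Ultra 9 layout of 6 P-cores with Hyper-Threading plus 8 E-cores and 2 low-power E-cores. The `CoreGroup` type and `thread_count` helper are illustrative names, not taken from any leak or Intel tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreGroup:
    """A group of identical cores and the hardware threads each one exposes."""
    name: str
    cores: int
    threads_per_core: int  # 2 with Hyper-Threading/SMT, 1 without


def thread_count(groups: list[CoreGroup]) -> int:
    """Total hardware threads the OS would see for a given core layout."""
    return sum(g.cores * g.threads_per_core for g in groups)


# Rumoured Arrow Lake-S layout with SMT disabled on every core (assumption).
arrow_lake_no_ht = [CoreGroup("P", 8, 1), CoreGroup("E", 16, 1)]
# The same layout if the P-cores kept Hyper-Threading, as on Raptor Lake.
arrow_lake_with_ht = [CoreGroup("P", 8, 2), CoreGroup("E", 16, 1)]
# Meteor Lake Core Ultra 9: 6 P-cores with HT, 8 E-cores, 2 low-power E-cores.
meteor_lake_u9 = [CoreGroup("P", 6, 2), CoreGroup("E", 8, 1), CoreGroup("LP-E", 2, 1)]

print(thread_count(arrow_lake_no_ht))    # 24
print(thread_count(arrow_lake_with_ht))  # 32
print(thread_count(meteor_lake_u9))      # 22
```

Only the SMT-off layout lands on the 24 threads reported for the engineering sample, and the gap to the 22-thread Core Ultra 9 is the "two more" mentioned above.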

Sources: InstLatX64, Tom's Hardware, VideoCardz

42 Comments on Intel Arrow Lake-S 24 Thread CPU Leaked - Lacks Hyper-Threading & AVX-512 Support

Hyderz

so 8p + 16e?

Daven

My prediction for Arrow Lake:

Lower p core IPC
Lower p core clocks
No AVX512
No HT
Emphasis on AI, iGPU and e cores

Zen 5 is gonna wipe the floor with this thing.

trparky

Daven wrote: Zen 5 is gonna wipe the floor with this thing.

Agreed.

pressing on

Daven wrote: My prediction for Arrow Lake:

Lower p core IPC
Lower p core clocks
No AVX512
No HT
Emphasis on AI, iGPU and e cores

Zen 5 is gonna wipe the floor with this thing.

The point about AVX-512 could be correct, sort of, because Intel has replaced it with AVX10, which for Arrow Lake is rumoured to be AVX10.2 (see the Intel diagram below).

The claim is that the Arrow Lake CPU will have P-cores supporting 512 bit vectors, and E-cores 256 bit vectors but functionally the chip will support the full AVX-512 instruction set.

phints

I think Zen 5 is going to be killer, especially if AMD is able to use TSMC 3nm, but I do hope Intel is able to bring some competition to HEDT with their Intel 4 node. It's not sounding like they will right now, but it's too early to know.

TumbleGeorge

pressing on wrote: The claim is that the Arrow Lake CPU will have P-cores supporting 512 bit vectors, and E-cores 256 bit vectors but functionally the chip will support the full AVX-512 instruction set.

The claim? Hmm, the claim is "in future P-cores and E-cores". It doesn't mention an exact time frame or CPU series.

ncrs

pressing on wrote: The point about AVX-512 could be correct, sort of, because Intel has replaced it with AVX10, which for Arrow Lake is rumoured to be AVX10.2 (see the Intel diagram below).

The claim is that the Arrow Lake CPU will have P-cores supporting 512 bit vectors, and E-cores 256 bit vectors but functionally the chip will support the full AVX-512 instruction set.

Unfortunately it doesn't work like that. Even with AVX10.2 you still have to choose the vector width at compile time; it's not like Arm's Scalable Vector Extension (SVE), which is vector-length agnostic.

From the Intel AVX10 paper:

The converged version of the Intel AVX10 vector ISA will include Intel AVX-512 vector instructions with an AVX512VL feature flag, a maximum vector register length of 256 bits, as well as eight 32-bit mask registers and new versions of 256-bit instructions supporting embedded rounding. This converged version will be supported on both P-cores and E-cores. While the converged version is limited to a maximum 256-bit vector length, Intel AVX10 itself is not limited to 256 bits, and optional 512-bit vector use is possible on supporting P-cores. Thus, Intel AVX10 carries forward all the benefits of Intel AVX-512 from the Intel® Xeon® with P-core product lines, supporting the key instructions, vector and mask register lengths, and capabilities that have comprised the ISA to date. Future P-core based Xeon processors will continue to support all Intel AVX-512 instructions ensuring that legacy applications continue to run without impact.

So 256-bit is the baseline, with 512-bit for P-cores. It further clarifies that the 512-bit length is available only on processors containing P-cores exclusively, so most likely only Xeons:

[...] with 128-bit and 256-bit vector lengths being supported across all processors, and 512-bit vector lengths additionally supported on P-core processors.

It would be nice if they allowed disablement of E-cores to make the CPU "fully P-core" to enable 512-bit vector registers, but we'll have to see. Intel wasn't very happy with early Alder Lake BIOS switches to do this.
You won't be able to use current AVX-512 software on AVX10 E-cores without recompilation either, and if they use 512-bit vectors you will need to make changes in code:

Existing Intel AVX-512 applications, many of them already using maximum 256-bit vectors, should see the same performance when compiled to Intel AVX10/256 at iso-vector length. For applications that can leverage greater vector lengths, Intel AVX10/512 will be supported on Intel P-cores, continuing to deliver the best-in-class performance for AI, scientific, and other high-performance codes.

Again, on P-core CPUs (Xeons) it will work without recompilation.

The GCC documentation also confirms that 512-bit register support is a separate feature.

AVX10 is bringing a lot of AVX-512 goodness to E-core designs, but it's not seamless nor fully backwards compatible with current AVX-512 software.
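The compatibility rules in the quoted AVX10 paper can be condensed into a small lookup. This is a rough Python model, not Intel's enumeration logic: it assumes, per the quotes above, that every AVX10 processor handles 128- and 256-bit vectors, that 512-bit vectors exist only on processors built solely from P-cores, and that a binary compiled for a fixed maximum vector width runs only where that width is available. Legacy AVX-512 binaries using 512-bit registers are modelled as width 512. The names `Processor`, `max_vector_bits` and `can_run` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Processor:
    name: str
    has_p_cores: bool
    has_e_cores: bool
    supports_avx10: bool = True


def max_vector_bits(cpu: Processor) -> int:
    """Widest AVX10 vector length this model lets the processor expose.

    Assumptions from the quoted AVX10 paper: 128/256-bit vectors on every
    AVX10 processor, 512-bit only on processors built solely from P-cores.
    A hybrid chip exposes one width on all cores so threads can migrate.
    """
    if not cpu.supports_avx10:
        return 0
    if cpu.has_p_cores and not cpu.has_e_cores:
        return 512
    return 256


def can_run(compiled_width: int, cpu: Processor) -> bool:
    """A binary compiled for a fixed maximum vector width needs at least that width."""
    return compiled_width <= max_vector_bits(cpu)


xeon = Processor("P-core Xeon", has_p_cores=True, has_e_cores=False)
hybrid = Processor("hybrid desktop", has_p_cores=True, has_e_cores=True)
e_only = Processor("E-core only", has_p_cores=False, has_e_cores=True)

for width in (256, 512):
    for cpu in (xeon, hybrid, e_only):
        print(f"{width}-bit build on {cpu.name}: {can_run(width, cpu)}")
```

In this model the 512-bit build runs only on the P-core-only Xeon, while the 256-bit build runs everywhere, which is the recompilation issue described above.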

mtosev

Is Intel planning on releasing new generation HEDT CPUs?

trparky

ncrs wrote: It would be nice if they allowed disablement of E-cores to make the CPU "fully P-core" to enable 512-bit vector registers, but we'll have to see. Intel wasn't very happy with early Alder Lake BIOS switches to do this.
You won't be able to use current AVX-512 software on AVX10 E-cores without recompilation either, and if they use 512-bit vectors you will need to make changes in code:

And I can't help but think that Intel is really holding back the rest of the industry with these kinds of shenanigans. We could have universal AVX-512 support but we can't because... Intel.

atomsymbol

Daven wrote: My prediction for Arrow Lake:

Lower p core IPC
Lower p core clocks
No AVX512
No HT
Emphasis on AI, iGPU and e cores

Zen 5 is gonna wipe the floor with this thing.

Just some notes:

- Extended features of AVX10-256 instructions over AVX2-256 instructions are more important for a desktop CPU like Arrow Lake than the lack of AVX10-512

- Zen 5 presumably won't have APX, which is an instruction set extension more important for the performance of general-purpose code than AVX10-512, because most general-purpose code cannot be vectorized with AVX10

ncrs wrote: Unfortunately it doesn't work like that. Even with AVX10.2 you still have to choose the vector width at compile time; it's not like Arm's Scalable Vector Extension (SVE), which is vector-length agnostic.

I think the previous post suggesting that E-cores might implement AVX10-512 meant that a large part of the AVX10-512 instruction set could (in theory) be implemented by 256-bit ALUs on E-cores.

In either case, the past failure of heterogeneous x86 Intel CPUs is purely a software failure (operating systems, compilers).

trparky wrote: And I can't help but think that Intel is really holding back the rest of the industry with these kinds of shenanigans. We could have universal AVX-512 support but we can't because... Intel.

Intel isn't holding back the industry. The architecture of operating systems and compilers, incapable of supporting heterogeneous CPUs, is holding it back.

ncrs

atomsymbol wrote: Just some notes:

- Extended features of AVX10-256 instructions over AVX2-256 instructions are more important for a desktop CPU like Arrow Lake than the lack of AVX10-512

Agreed, however this fragments the ecosystem even further.

atomsymbol wrote: - Zen 5 presumably won't have APX, which is an instruction set extension more important for the performance of general-purpose code than AVX10-512, because most general-purpose code cannot be vectorized with AVX10

IMO Intel is playing a dangerous game with APX. This looks similar to the attempt made during the 32-bit to 64-bit transition. Itanium (IA-64) was supposed to be the 64-bit architecture, obviously under Intel/HP control, while x86 remained 32-bit. The industry wasn't happy about that prospect and chose the amd64 extension to x86 instead, which retained 100% software compatibility.

Implementing support for APX will touch every aspect of software, from operating systems through compilers to (specific) libraries. I'm not sure if it will be a success for Intel. AVX-512 software only started picking up relatively recently, and with Intel's consumer SKUs dropping it after Rocket/Ice/Tiger Lake supported it, it looks like Intel can't stick with its own technology. It wouldn't be the first time either: SGX and TSX were also removed.

atomsymbol wrote: I think the previous post suggesting that E-cores might implement AVX10-512 meant that a large part of the AVX10-512 instruction set could (in theory) be implemented by 256-bit ALUs on E-cores.

Sure they can; that's what Centaur's CHA microarchitecture did. AMD's implementation in Zen 4 is a bit more complex, with more of the CPU being 512-bit optimized. Adjusting the decoding part for AVX10 should be relatively cheap area-wise, but only Intel knows for sure whether it's feasible.
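The idea of running 512-bit instructions on 256-bit hardware can be illustrated by splitting one wide operation into two halves, often called "double pumping". This Python model treats a 512-bit register as sixteen 32-bit lanes and a hypothetical 256-bit ALU as handling eight lanes per pass. It shows only the general idea; it does not describe how Centaur CHA or Zen 4 actually schedule these micro-ops, and the function names are illustrative.

```python
LANE_BITS = 32
LANE_MASK = (1 << LANE_BITS) - 1


def alu_add_256(a: list[int], b: list[int]) -> list[int]:
    """One pass of a hypothetical 256-bit ALU: 8 lanes of wrapping 32-bit add."""
    assert len(a) == len(b) == 256 // LANE_BITS
    return [(x + y) & LANE_MASK for x, y in zip(a, b)]


def add_512_on_256(a: list[int], b: list[int]) -> list[int]:
    """Model a 512-bit packed 32-bit add executed as two 256-bit passes."""
    assert len(a) == len(b) == 512 // LANE_BITS
    half = len(a) // 2
    low = alu_add_256(a[:half], b[:half])    # first pass: lanes 0-7
    high = alu_add_256(a[half:], b[half:])   # second pass: lanes 8-15
    return low + high


print(add_512_on_256(list(range(16)), [100] * 16))
```

The result is identical to a single 512-bit pass; the cost is that the instruction occupies the narrower ALU twice, which is the throughput trade-off being discussed.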

atomsymbol wrote: In either case, the past failure of heterogeneous x86 Intel CPUs is purely a software failure (operating systems, compilers).

atomsymbol wrote: Intel isn't holding back the industry. The architecture of operating systems and compilers, incapable of supporting heterogeneous CPUs, is holding it back.

Not sure why you're blaming compilers when it's Intel that develops and wires up compiler support for its own microarchitectures. As for operating systems, it's kind of the same: they did work closely with Microsoft to implement support in Windows 11, and it still fails to assign threads correctly sometimes. Linux support was also Intel's to complete, yet it's still not done, with no equivalent of Intel Thread Director support.

atomsymbol

ncrs wrote: Not sure why you're blaming compilers when it's Intel that develops and wires up compiler support for its own microarchitectures. As for operating systems, it's kind of the same: they did work closely with Microsoft to implement support in Windows 11, and it still fails to assign threads correctly sometimes. Linux support was also Intel's to complete, yet it's still not done, with no equivalent of Intel Thread Director support.

It is an operating system's choice whether or not to support some form of dynamic recompilation as a core feature of its architecture. This choice cannot be made by a CPU designed and manufactured to be heterogeneous. The "top killer" or "alpha predator" of heterogeneous x86 CPUs is the architecture of operating systems.

Upgrayedd

Daven wrote: My prediction for Arrow Lake:

Lower p core IPC
Lower p core clocks
No AVX512
No HT
Emphasis on AI, iGPU and e cores

Zen 5 is gonna wipe the floor with this thing.

Where do the lower IPC and clocks come from? Isn't it on the new Intel 20A?

atomsymbol

ncrs wrote: Implementing support for APX will touch every aspect of software, from operating systems through compilers to (specific) libraries.

In Linux, the existing infrastructure for distributing software packages could be used to get APX binaries. In Windows, adoption of APX might be slower than in Linux.

ncrs wrote: I'm not sure if it will be a success for Intel. AVX-512 software only started picking up relatively recently, and with Intel's consumer SKUs dropping it after Rocket/Ice/Tiger Lake supported it, it looks like Intel can't stick with its own technology. It wouldn't be the first time either: SGX and TSX were also removed.

The curvature of the adoption rate of APX cannot be inferred from the historical record of AVX-512 adoption rate.

trparky

atomsymbol wrote: The architecture of operating systems and compilers, incapable of supporting heterogeneous CPUs, is holding it back.

But wouldn't that ultimately lead to bigger executable binaries and associated DLLs since there would have to be two code paths? One for AVX-512 equipped CPUs and another for everything else.

atomsymbol wrote: dynamic recompilation as a core feature of its architecture.

That's a possibility but more disk space would be required since you essentially would want to cache the recompiled binary.
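A cached-recompilation scheme like the one trparky describes would likely key each stored binary on both the original code and the ISA level it was rebuilt for, so one program can accumulate several cached variants. The sketch below is hypothetical: the `RecompileCache` class, the stand-in `fake_recompile` step and the ISA level names are illustrative assumptions, not an existing OS mechanism, and entries are kept in memory rather than on disk for brevity.

```python
import hashlib
from typing import Callable


class RecompileCache:
    """Hypothetical cache of binaries rebuilt per ISA level (in memory here, on disk in practice)."""

    def __init__(self, recompile: Callable[[bytes, str], bytes]) -> None:
        self._recompile = recompile
        self._store: dict[tuple[str, str], bytes] = {}
        self.misses = 0

    def get(self, binary: bytes, isa_level: str) -> bytes:
        # Key on a content hash so an updated binary never reuses a stale entry.
        key = (hashlib.sha256(binary).hexdigest(), isa_level)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._recompile(binary, isa_level)
        return self._store[key]

    def stored_bytes(self) -> int:
        """Extra storage the cache costs: the disk-space concern raised above."""
        return sum(len(v) for v in self._store.values())


def fake_recompile(binary: bytes, isa_level: str) -> bytes:
    # Stand-in for a real compiler backend: just tags the payload with its target.
    return isa_level.encode() + b":" + binary


cache = RecompileCache(fake_recompile)
program = b"\x90" * 10
cache.get(program, "avx10-256")
cache.get(program, "avx10-256")  # served from the cache, no second recompile
cache.get(program, "avx10-512")  # a second target means a second cached copy
print(cache.misses, cache.stored_bytes())
```

Every additional target level adds another full copy of the program to the cache, which is where the extra disk space goes.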

ncrs

atomsymbol wrote: It is an operating system's choice whether or not to support some form of dynamic recompilation as a core feature of its architecture. This choice cannot be made by a CPU designed and manufactured to be heterogeneous. The "top killer" or "alpha predator" of heterogeneous x86 CPUs is the architecture of operating systems.

Sorry, I'm having trouble understanding what you're trying to say. It's the CPU vendor's obligation to provide support, not the other way around.
Do you know of any mainstream operating system that actually supports heterogeneous ISAs? As far as I know it isn't done even on ARM: all the SoCs sporting big.LITTLE (and "midDLE" nowadays) cores support the same ARM specification level on all of them, so that processes can be migrated, the same as Intel's E-/P-core designs. The problem here is proper scheduling.

atomsymbol wrote: In Linux, the existing infrastructure for distributing software packages could be used to get APX binaries. In Windows, adoption of APX might be slower than in Linux.

That's not enough as you need modifications to the lowest levels of the OS kernel. Intel outlines what's needed in their documentation. In order to do that you need hardware in the hands of kernel developers. In order to support APX software you also need the hardware to develop/port them in the first place. It's a common problem with all new technology.

atomsymbol wrote: The curvature of the adoption rate of APX cannot be inferred from the historical record of AVX-512 adoption rate.

Yes, but the history of Intel's additions to x86, and their subsequent removals, can make potential developers wary of supporting APX in the first place. This is also the case for AVX10.

trparky wrote: But wouldn't that ultimately lead to bigger executable binaries and associated DLLs since there would have to be two code paths? One for AVX-512 equipped CPUs and another for everything else.

Yes, and that's what Intel's Clear Linux does:

To fully use the capabilities in different generations of CPU hardware, Clear Linux OS will perform multiple builds of libraries with CPU-specific optimizations. For example, Clear Linux OS builds libraries with Intel® Advanced Vector Extensions 2 (Intel® AVX2) and Intel® Advanced Vector Extensions 512 (Intel® AVX-512). Clear Linux OS can then dynamically link to the library with the newest optimization based on the processor in the running system. Runtime libraries used by ordinary applications benefit from these CPU specific optimizations.

Another method would be dynamic dispatching dependent on runtime detection of CPU flags, like in Intel IPP or common math libraries.
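A minimal sketch of the runtime-dispatch approach, written in Python for illustration: it reads the CPU flags Linux exposes in `/proc/cpuinfo` and picks the newest kernel whose required flags are all present. The variant names, the flag set chosen for each tier and the `dot` kernels are illustrative assumptions; real libraries such as Intel IPP typically do this in native code by querying CPUID rather than parsing a text file.

```python
from pathlib import Path
from typing import Callable


def read_cpu_flags(cpuinfo: str = "/proc/cpuinfo") -> set[str]:
    """Feature flags Linux reports for the first CPU; empty set if unavailable (e.g. non-Linux)."""
    try:
        text = Path(cpuinfo).read_text()
    except OSError:
        return set()
    for line in text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()


# Variants ordered newest first, each with the flags it requires (names as Linux prints them).
VARIANTS: list[tuple[str, frozenset[str]]] = [
    ("avx512", frozenset({"avx512f", "avx512vl"})),
    ("avx2", frozenset({"avx2", "fma"})),
    ("baseline", frozenset()),
]


def pick_variant(flags: set[str]) -> str:
    """Return the newest variant whose required flags are all present."""
    for name, required in VARIANTS:
        if required <= flags:
            return name
    return "baseline"


def _dot_portable(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


# Hypothetical kernels: a real library would ship separately compiled AVX2/AVX-512
# builds here; this sketch maps every variant to the portable code so it runs anywhere.
KERNELS: dict[str, Callable[[list[float], list[float]], float]] = {
    "avx512": _dot_portable,
    "avx2": _dot_portable,
    "baseline": _dot_portable,
}

# Resolve once at import time, as a dispatching library typically would.
dot = KERNELS[pick_variant(read_cpu_flags())]
```

The Clear Linux approach quoted above moves the same decision to the dynamic linker, which chooses among separately built copies of a library instead of among function pointers inside one library.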

atomsymbol

ncrs wrote: Sorry, I'm having trouble understanding what you're trying to say. It's the CPU vendor's obligation to provide support, not the other way around.

Just to put this in another perspective: the idea "It's the CPU vendor's obligation to provide [OS] support" would have seemed crazy around 1980. Should the Z80 CPU vendor be responsible for providing software support to machines built with the Z80? The year 2024 isn't the end of history.

ncrs wrote: Do you know of any mainstream operating system that actually supports heterogeneous ISAs?

How does non-existence of such operating systems invalidate my previous claim that the operating system architecture is the top "alpha predator" of heterogeneous CPUs?

ncrs wrote: That's not enough as you need modifications to the lowest levels of the OS kernel. Intel outlines what's needed in their documentation. In order to do that you need hardware in the hands of kernel developers. In order to support APX software you also need the hardware to develop/port them in the first place. It's a common problem with all new technology.

AVX10.1 support has already been posted to the gcc compiler. I don't know whether the developers who posted it have access to a physical CPU with AVX10.1.

ncrs wrote: Yes, but the history of Intel's additions to x86, and their subsequent removals, can make potential developers wary of supporting APX in the first place. This is also the case for AVX10.

No. The fact is that there hasn't been any such x86 ISA extension since the introduction of amd64. APX is the first-ever extension on top of amd64 for general-purpose computation.

trparky wrote: That's a possibility but more disk space would be required since you essentially would want to cache the recompiled binary.

Do you know what the size of Vulkan shader caches is on a gaming machine?

Wirko

ncrs wrote: Another method would be dynamic dispatching dependent on runtime detection of CPU flags, like in Intel IPP or common math libraries.

There are also other techniques available, and I'm amazed that Intel didn't implement them in their P+E CPUs. If an E core encounters an AVX-512 instruction, and given proper OS support, it can do one of two things that don't kill the process: either emulate that instruction or suspend execution so the scheduler can migrate the thread to a P core. That's not how you gain performance of course, but you get stable execution, the processor can still run code that only some of its cores are able to execute, and large areas of silicon are not wasted.
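The trap-and-emulate or trap-and-migrate idea can be modelled as a toy scheduler loop: when a thread on an E-core hits an instruction that core cannot execute, the fault handler either emulates it in software or moves the thread to a P-core and retries. Everything here (the `Core` class, the instruction names, the `emulate` switch) is a hypothetical illustration of the control flow, not how Windows, Linux or any Intel CPU actually behaves.

```python
from dataclasses import dataclass, field


class UnsupportedInstruction(Exception):
    """Models the fault an E-core would raise on an instruction it lacks."""


@dataclass
class Core:
    name: str
    supported: frozenset[str]
    log: list[str] = field(default_factory=list)

    def execute(self, insn: str) -> None:
        if insn not in self.supported:
            raise UnsupportedInstruction(insn)
        self.log.append(insn)


def run_thread(program: list[str], start: Core, p_core: Core,
               emulate: bool) -> tuple[str, list[str]]:
    """Run instructions in order; on a fault, emulate in software or migrate to the P-core.

    Returns the name of the core the thread finished on and the fault-handling events.
    """
    core, events = start, []
    for insn in program:
        try:
            core.execute(insn)
        except UnsupportedInstruction:
            if emulate:
                # Slow software path, but the process keeps running on the E-core.
                events.append(f"emulated {insn} on {core.name}")
            else:
                events.append(f"migrated to {p_core.name} at {insn}")
                core = p_core
                core.execute(insn)  # retry the faulting instruction on the capable core
    return core.name, events


e_core = Core("E-core", frozenset({"add", "avx2"}))
p_core = Core("P-core", frozenset({"add", "avx2", "avx512"}))
print(run_thread(["add", "avx512", "add"], e_core, p_core, emulate=False))
```

Either path keeps the process alive; the migration path trades a context switch for native speed, while emulation keeps the thread in place at a large per-instruction cost.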

trparky

atomsymbol wrote: Do you know what the size of Vulkan shader caches is on a gaming machine?

No, and a part of me is afraid to ask.

Minus Infinity

Daven wrote: My prediction for Arrow Lake:

Lower p core IPC
Lower p core clocks
No AVX512
No HT
Emphasis on AI, iGPU and e cores

Zen 5 is gonna wipe the floor with this thing.

Arrow Lake will have P-core clocks ~1 GHz lower than Raptor Lake. However, the real killer blow will be the pricing, which is said to be very high for motherboards. AMD will obliterate them on performance per dollar with Zen 5. Zen 5 is already said to be faster than Zen 4 X3D in gaming.
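One rough way to weigh a ~1 GHz clock deficit is the first-order relation: single-thread performance is proportional to IPC times clock speed. The Python sketch below computes the IPC uplift needed to break even. The 6.0 GHz figure is the Core i9-14900K's peak turbo, 5.0 GHz is simply that minus the rumoured ~1 GHz, and the model ignores memory latency, boost behaviour and everything else that makes real performance non-linear.

```python
def break_even_ipc_gain(old_clock_ghz: float, new_clock_ghz: float) -> float:
    """Fractional IPC increase needed so that IPC x clock stays constant.

    First-order model only: single-thread performance ~ IPC * frequency.
    """
    if old_clock_ghz <= 0 or new_clock_ghz <= 0:
        raise ValueError("clock speeds must be positive")
    return old_clock_ghz / new_clock_ghz - 1.0


def relative_performance(ipc_gain: float, old_clock_ghz: float, new_clock_ghz: float) -> float:
    """New-chip performance relative to the old one under the same first-order model."""
    return (1.0 + ipc_gain) * new_clock_ghz / old_clock_ghz


gain = break_even_ipc_gain(6.0, 5.0)
print(f"IPC must rise by {gain:.0%} to offset 6.0 -> 5.0 GHz")  # 20%
print(f"{relative_performance(0.10, 6.0, 5.0):.3f}")  # 0.917
```

Under this simplification a 10% IPC gain alone would leave such a chip roughly 8% behind in single-thread performance.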

DemonicRyzen666

pressing on wrote: The point about AVX-512 could be correct, sort of, because Intel has replaced it with AVX10, which for Arrow Lake is rumoured to be AVX10.2 (see the Intel diagram below).

The claim is that the Arrow Lake CPU will have P-cores supporting 512 bit vectors, and E-cores 256 bit vectors but functionally the chip will support the full AVX-512 instruction set.

That "promote SSE to AVX" thing hasn't been working as well in recent years, since E-cores don't have AVX.

InVasMani

One thing they could do is 8 P-cores with HT along with 8 E-cores, not clusters. They should just keep the E-core shared cache the same and reduce the E-cores to clusters of two. That would give them more flexibility to add more actual P-cores, something many have been wanting, and at the same time provide more shared cache per E-core cluster than current designs. It would also reduce power and improve thermals by running fewer cores in total, in exchange for better ST across more cores and higher efficiency. It's not a bad trade-off. The E-cores would also have more consistent latency with more cache per cluster to access.

ncrs

atomsymbol wrote: Just to put this in another perspective: the idea "It's the CPU vendor's obligation to provide [OS] support" would have seemed crazy around 1980. Should the Z80 CPU vendor be responsible for providing software support to machines built with the Z80? The year 2024 isn't the end of history.

From Wikipedia:

The first samples were returned from Mostek on 9 March 1976. By the end of the month, they had also completed an assembler-based development system.

So... it was Z80's creators that provided support after all. Other operating systems obviously used this and documentation to implement support, but it is the CPU vendor's job to provide the initial support, development environments and documentation.

atomsymbol wrote: How does non-existence of such operating systems invalidate my previous claim that the operating system architecture is the top "alpha predator" of heterogeneous CPUs?

There is no such operating system because nobody actually created a CPU like that, as far as I know.

atomsymbol wrote: AVX10.1 support has already been posted to the gcc compiler. I don't know whether the developers who posted it have access to a physical CPU with AVX10.1.

If you actually looked into it you'd know who those developers were: Intel employees who obviously have access to hardware. They are the ones who always do enablement of new parts in the Linux kernel and GCC/LLVM.

atomsymbol wrote: No. The fact is that there hasn't been any such x86 ISA extension since the introduction of amd64. APX is the first-ever extension on top of amd64 for general-purpose computation.

I guess we'll have to see how far APX can go. There's still a risk that software vendors will simply not bother and will continue to support amd64 only, or invest in ARM/RISC-V instead as the "next big thing". This is the danger I wrote about before. When Intel tried this with Itanium, they were in a much stronger market position than they are now, yet it still failed.
On the other hand, if AMD is on board and has been implementing APX, AVX10 and X86S in its ~Zen 5/6 designs, it's going to bring good additions to x86 in general. They do have cross-licensing agreements (after Intel lost in court).

Wirko wrote: There are also other techniques available, and I'm amazed that Intel didn't implement them in their P+E CPUs. If an E core encounters an AVX-512 instruction, and given proper OS support, it can do one of two things that don't kill the process: either emulate that instruction or suspend execution so the scheduler can migrate the thread to a P core. That's not how you gain performance of course, but you get stable execution, the processor can still run code that only some of its cores are able to execute, and large areas of silicon are not wasted.

Yeah, there are many potential software solutions, and unfortunately all of them have downsides. IMO the proper way would have been equipping E-cores with AVX-512 capabilities, even if it's executed on 256-bit hardware like Centaur's CHA. I'm not knowledgeable enough to see what it would take to modify Atom cores to do it, but Intel did not think it was worth it, and apparently still doesn't, given that even Arrow/Lunar/Panther Lake E-cores do not have AVX-512 (via GCC).

marios15

While the move from 32-bit to 64-bit was slow, it was also held back by the limited internet infrastructure of the time: we usually got service packs etc. on CDs, so backwards compatibility was very important.
Now, patches for almost every new instruction set are released to the public a few months early, and you can just download an update for your OS, software, games, compilers, etc.

The downside, of course, is that nothing is being tested anymore, but that's a different discussion.
AVX10 is crap; they should just let AVX-512 be the last "extension" (and both start working on x86-S or APX, or the original idea behind AMD Fusion/HSA).

For AMD/Intel, adding a few tiny ARM accelerators on chiplets/tiles is a matter of cost-benefit; Qualcomm/Samsung/Apple, on the other hand, cannot use x86 at all. AMD has had ARM co-processors on-chip since the FX era.

Then again, maybe they both believe they're too big to fail. I guess we'll see in 5-10 years.

Daven

mtosev wrote: Is Intel planning on releasing new generation HEDT CPUs?

No

Edit: Unless you count Sapphire Rapids Xeon W series which are already out.

Upgrayedd wrote: Where do the lower IPC and clocks come from? Isn't it on the new Intel 20A?

Newer process nodes don’t automatically guarantee higher clocks at first. Case in point: the various TSMC node variants, each optimized for a different target (power, clocks or density).

Arrow Lake will probably closely align itself with Meteor Lake. Meteor Lake P-cores on the Intel 4 node have lower IPC than Raptor Lake P-cores on the Intel 7 node. The reason for this specific drop is that processor design is all about transistor real estate: CPUs can have higher IPC on smaller nodes because there is room to add more functional units, cache, etc. Intel is choosing to allocate the extra real estate of smaller nodes to the NPU, better E-cores and the iGPU instead.

I believe Arrow Lake p cores are on the Intel 4 node just like Meteor Lake’s p cores. However, they could go Intel 3 which is due at the end of this year but it seems that Intel is saving its most cutting edge capacity for third party chip designers in order to boost IFS. The Intel 20A node is expected by the end of 2025 unless there are delays.
