Wednesday, October 16th 2024

What the Intel-AMD x86 Ecosystem Advisory Group is, and What it's Not

AVX-512 was proposed by Intel more than a decade ago, in 2013 to be precise. More than a decade later, the implementation of this instruction set on CPU cores remains wildly spotty. Intel implemented it first on an HPC accelerator, then on its Xeon server processors, then on its client processors, before realizing that hardware hadn't caught up enough to execute AVX-512 instructions in an energy-efficient manner, and deprecating it on the client. AMD implemented it just a couple of years ago with Zen 4, using a dual-pumped 256-bit FPU on 5 nm, before finally implementing a true 512-bit FPU on 4 nm with Zen 5. AVX-512 is a microcosm of what's wrong with the x86 ecosystem.

There are only two x86 CPU core vendors: the IP owner Intel, and its only surviving licensee capable of contemporary CPU cores, AMD. Any new addition to the ISA introduced by either of the two has to go through the grind of their duopolistic competition before software vendors can assume that there's a uniform install base on which to implement something new. x86 is the net loser here, and Arm the net winner. Arm Holdings makes no hardware of its own; it continuously develops the Arm machine architecture, along with a first-party set of reference-design CPU cores that any licensee can implement. Arm's great march began with tiny embedded devices, before its explosion into client computing with smartphone SoCs. There are now Arm-based server processors, and the architecture is making inroads into the last market that x86 holds sway over: the PC. Apple's M-series processors compete with all segments of PC processors, from the 7 W class right up to the HEDT/workstation class. Qualcomm entered this space with its Snapdragon X Elite family, and now Dell believes NVIDIA will take a swing at client processors in 2025. Then there's RISC-V. Intel finally did something it should have done two decades ago: set up a multi-brand Ecosystem Advisory Group. Here's what it is, and more importantly, what it's not.

On Tuesday, 15th October, Intel and AMD jointly announced the x86 Ecosystem Advisory Group. The two companies are equals in this group as x86 processor vendors. The founding members are big names in the tech industry, joined by a couple of eminent industry luminaries. The members include Dell, Broadcom, Google Cloud, HP, HPE, Lenovo, Meta, Microsoft, Oracle, and Red Hat. The luminaries include Linus Torvalds, the creator of Linux, and Tim Sweeney of Epic Games. You can categorize the above list of founding members and luminaries into client-relevant and enterprise-relevant. Tim Sweeney is one of the biggest names in the gaming industry, with Unreal Engine dominating all gaming platforms. Linux is predominantly an enterprise OS; and no, Android is not a Linux distribution, it's a highly differentiated OS with its own APIs that happens to use the Linux kernel.

What the x86 Ecosystem Advisory Group is
It is a special interest group consisting of Intel and AMD (the hardware vendors), the founding members, and industry luminaries. It aims to make sure that x86 is consistent as a machine architecture, and that there's two-way communication between the hardware vendors and the members of the group to shape the future of x86. Put simply, it aims to create and implement standards in architectural interfaces and, most importantly, the ISA, or instruction sets.

We began our write-up with the test case of AVX-512. The x86 Ecosystem Advisory Group is being set up to prevent exactly that from happening again: 11 years after its conception, AVX-512 has a wildly inconsistent implementation across Intel, AMD, and their product stacks, and so ISVs would rather not implement it. As a result, x86 loses performance competitiveness against other machine architectures and their instruction sets.

The Advisory Group's main aim is to ensure that the latest ISA extensions and hardware interfaces are jointly developed and implemented, and that there is compatibility across the ecosystem, so that future technology is more predictable and ISVs can respond to it better.

What the Ecosystem Advisory Group is Not
Intel "Arrow Lake" and AMD "Granite Ridge" are nothing alike at the hardware level. They are two completely different pieces of silicon with different chip designs, and their CPU cores are nothing alike either. The only things common between them are the x86 ISA and a few industry-standard platform interfaces such as memory and PCIe. And yet, despite such vast differentiation in hardware design, Intel and AMD processors end up with performance deltas within 5% in a given price segment. This diversity of hardware design is not going to change.

The Ecosystem Advisory Group does not aim to standardize the x86 core, just the ISA. It is a means for the ISV ecosystem to constantly tell Intel and AMD what it expects, and for the two companies to deliver on those expectations. "Here's our CPU core, it can handle the same instructions as our competitor's core, but with better performance and efficiency": this would be the end goal of the Advisory Group as far as the hardware vendors are concerned. For the ISVs, it's the assurance that by the year 2029, the next new instruction set will be generally available from both Intel and AMD, so that they can plan their software product development roadmaps to align with that.

What's Next? Is this Enough?
Setting up of this Ecosystem Advisory Group could not have been possible without Intel, which is the IP owner for x86. AMD probably got on board because it sees the value in having such an ecosystem, and a more equitable sharing of technologies with Intel concerning instruction sets and architecture interfaces. But is this enough to go up against Arm and RISC-V? Arm has had a two-decade head-start in having an architecture review board, and the list of hardware vendors implementing Arm dwarfs x86 by a factor of 20. Even someone like MediaTek, which primarily focuses on smartphone SoCs, can develop a new server processor in under a year. x86 needs fresh blood in the hardware vendor space, but this can only happen if Intel and AMD are willing to give up some market share.

The x86 machine architecture is in serious need of housekeeping, and x86S is its future. x86S is a 64-bit-only version of x86 that sheds 32-bit operating system support while still letting 32-bit applications run under a 64-bit OS, and its standardization could be sped up by the setting up of the Ecosystem Advisory Group. x86S sheds the 16-bit real mode, 32-bit protected mode, and v86 (virtual 8086) mode; gets rid of the legacy task-switching mechanism; vastly simplifies interrupt handling; enhances security by dropping the ring-1 and ring-2 privilege levels (leaving just ring-0 kernel mode and ring-3 user mode); and improves memory management by eliminating non-long-mode paging structures. These changes vastly simplify x86, improve security, and make it more future-ready. The transition to x86S will prove crucial for the future of x86, and something like the x86 Ecosystem Advisory Group couldn't have come at a better time. There are other allied forward-facing developments, such as UCIe, which makes designing disaggregated chips easier, and openSIL, an open-source firmware library for silicon initialization.

In conclusion, the Intel-AMD x86 Ecosystem Advisory Group is nice to have. There is finally something to mitigate the harmful effects of an intensely competitive duopoly and to ensure x86 can face Arm better over the next couple of decades, by smoothing the much-needed transition to x86S, openSIL, and other future technologies. This does not hamper innovation, and there remains sufficient incentive for Intel and AMD to keep pushing for faster and more efficient microarchitectures.

18 Comments on What the Intel-AMD x86 Ecosystem Advisory Group is, and What it's Not

#1
JWNoctis
I thought it is more nuanced and tangled-up than "Intel is the IP owner for x86." There are more patents and cross-licencing agreements between the two remaining major players than any non-lawyer could parse, since and after x86-64.
#2
user556
Yeah, dreaming. x86 is no weak guy here. Intel did themselves in. The fact that, maybe, in the future, there could potentially be an alternative to the PC is good news. Sadly, M$'s still dominating though. Get rid of Windoze and then maybe we're talking.
#3
Chaitanya
Ian Cutress did a nice job explaining this announcement earlier today.
#4
londiste
JWNoctis: I thought it is more nuanced and tangled-up than "Intel is the IP owner for x86." There are more patents and cross-licencing agreements between the two remaining major players than any non-lawyer could parse, since and after x86-64.
This. To create and sell an x86-64 CPU you will need a licensing agreement with AMD as much as with Intel. AMD and Intel have very-VERY extensive cross-licensing agreements on a whole lot of things.
#5
Dirt Chip
Intel chip with AMD logo. Lol
#6
ncrs
The x86 machine architecture is in serious need of housekeeping, and x86S is its future. A 64-bit only version of x86, which sheds 32-bit application compatibility, the standardization of x86S could be sped up with the setting up of the Ecosystem Advisory Group.
It does not remove 32-bit application compatibility. It eliminates 32-bit kernel level support - 32-bit operating systems and 32-bit drivers in 64-bit OS. You can still run 32-bit applications under a 64-bit OS.

As for AVX-512 it should be mentioned that even on ARM advanced instruction support in the AArch64 ISA is split into ARMv8.x-A and ARMv9.x-A. Some vendors only implement the former without Scalable Vector Extension (SVE) or Scalable Matrix Extension (SME) which are roughly equivalent to the AVX family and Intel AMX.

The plans for future Intel implementations of AVX-512 with AVX10.x are still problematic. AVX10.x-256 is supposed to target all processors with 256-bit width and AVX10.x-512 would be for servers. They are not binary compatible as in you can't run AVX10.x-512 on a 256-bit width CPU - it has to be recompiled for AVX10.x-256 or use some kind of dynamic targeting (just like AVX2/AVX-512 now). It still brings desirable features to 256-bit width SKUs, and guarantees backward compatibility in versions, so it is a step forward.
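To make "dynamic targeting" concrete, here is a minimal sketch of runtime dispatch in Python. It is only an illustration, not how any particular library does it: the three `sum_*` kernels are hypothetical stand-ins for builds of the same routine compiled for different vector ISA levels, and the flag names `avx512f` and `avx2` are the ones Linux reports in `/proc/cpuinfo` on x86.

```python
# Hypothetical kernels: each stands in for the same routine compiled
# for a different vector ISA level. Here they all just call sum().
def sum_scalar(xs):
    return sum(xs)

def sum_avx2(xs):
    return sum(xs)

def sum_avx512(xs):
    return sum(xs)

# Ordered from most to least capable. Flag names follow Linux /proc/cpuinfo.
CANDIDATES = [
    ("avx512f", sum_avx512),
    ("avx2", sum_avx2),
]

def read_cpu_flags(path="/proc/cpuinfo"):
    """Return the CPU feature flags on Linux x86, or an empty set elsewhere."""
    try:
        with open(path) as f:
            for line in f:
                if line.startswith("flags"):
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass
    return set()

def pick_kernel(flags):
    """Pick the most capable kernel whose required flag is present."""
    for flag, kernel in CANDIDATES:
        if flag in flags:
            return kernel
    return sum_scalar  # portable fallback

kernel = pick_kernel(read_cpu_flags())
print(kernel.__name__, kernel([1, 2, 3]))
```

The cost of this approach is that every code path has to be built, shipped, and tested separately, which is the burden on ISVs that a consistent ISA baseline would reduce.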
#7
csendesmark
Dirt Chip: Intel chip with AMD logo. Lol
They had the license D8086-1 - AMD
AMD wasn't the only one that produced it, but not everyone had or purchased the rights; see the Soviet Union:
KR580VM80A
#8
mikesg
The +10% average boost on Windows 11 24H2 via branch prediction improvements says more about the future of x86 than anything else, with or without back compatibility removal/optimisations.

If x86 is about a high number of hardware implemented instructions (vs ARM/RISC), shouldn't engineers extend and dictate how code and compilers should be written more... Instead of predicting what comes next, where's the hardware array/list functions... one cycle instructions... the hard work.
#9
Wirko
Dirt Chip: Intel chip with AMD logo. Lol
Here's another example:

The 80186 was (roughly) the microcontroller version of the 80286. Interestingly, I can find pics of them marked with Ⓜ AMD © INTEL, or © AMD, or Ⓜ © INTEL (still made by AMD), or Ⓜ © AMD © INTEL. AMD also used both type numbers, 80186 and Am186. This probably hints at their magnificent army of lawyers, engineers, reverse engineers, and reverse lawyers.

The M in circle means ... this?
ncrs: The plans for future Intel implementations of AVX-512 with AVX10.x are still problematic. AVX10.x-256 is supposed to target all processors with 256-bit width and AVX10.x-512 would be for servers. They are not binary compatible as in you can't run AVX10.x-512 on a 256-bit width CPU - it has to be recompiled for AVX10.x-256 or use some kind of dynamic targeting (just like AVX2/AVX-512 now). It still brings desirable features to 256-bit width SKUs, and guarantees backward compatibility in versions, so it is a step forward.
AMD's 256-bit implementation of AVX-512 is binary compatible with the "true" 512-bit implementation, only the performance is lower, right?
#10
ncrs
Wirko: AMD's 256-bit implementation of AVX-512 is binary compatible with the "true" 512-bit implementation, only the performance is lower, right?
Yes, it's a true AVX-512 implementation.
Maybe Intel will create something similar that will allow 256-bit wide CPUs to run AVX10.x-512 binaries, but at the moment from the specifications it doesn't look like it.

Funnily enough Skymont, the latest E-core microarchitecture already implements 256-bit AVX2 on two 128-bit units, and the core itself has 4x 128-bit FP units:
#11
Wirko
mikesg: If x86 is about a high number of hardware implemented instructions (vs ARM/RISC), shouldn't engineers extend and dictate how code and compilers should be written more... Instead of predicting what comes next, where's the hardware array/list functions... one cycle instructions... the hard work.
But ARM ISA has that famous FJCVTZS instruction specifically to speed up Javascript!
#12
JWNoctis
Wirko: Here's another example:

The 80186 was (roughly) the microcontroller version of the 80286. Interestingly, I can find pics of them marked with Ⓜ AMD © INTEL, or © AMD, or Ⓜ © INTEL (still made by AMD), or Ⓜ © AMD © INTEL. AMD also used both type numbers, 80186 and Am186. This probably hints at their magnificent army of lawyers, engineers, reverse engineers, and reverse lawyers.

The M in circle means ... this?
Sounds like good old second sourcing to me, when both companies were still not readily distinguishable from other IC design and foundry houses of the era. That just required competent lawyers and engineers, rather than armies of them plus reverse engineers.

Incidentally, AMD made other interesting things, with chip art too.
Wirko: AMD's 256-bit implementation of AVX-512 is binary compatible with the "true" 512-bit implementation, only the performance is lower, right?
AMD's double-pumping arrangements have been remarkably efficient, and faster than just continuing to use AVX2/FMA3. I think y-cruncher's author did a deep dive of this when Zen 4 came out.
#13
Wirko
JWNoctis: I thought it is more nuanced and tangled-up than "Intel is the IP owner for x86." There are more patents and cross-licencing agreements between the two remaining major players than any non-lawyer could parse, since and after x86-64.
Many crucial patents have expired, some are about to expire. Patents expire 20 years after filing and SSE3 was filed in March 2005. Fused multiply-add expires in October 2026 (the 20-year term seems to have been extended somehow, maybe Intel asked Disney for help). These dates are uncomfortably close, which is one of the reasons Intel is looking for new ways to squeeze some more $ from old tech.
#14
JWNoctis
Wirko: But ARM ISA has that famous FJCVTZS instruction specifically to speed up Javascript!
There was Jazelle, a whole extension meant to run Java bytecode on ARMv5. I think it was supposed to be a big thing in feature phone days.

Arm is not exactly foreign to that sort of thing.
#15
Wirko
JWNoctis: Incidentally, AMD made other interesting things, with chip art too.
That must have been the world's first smiley. Now that we know who the author is, it's obvious that he made an illustration of his own face.
JWNoctis: AMD's double-pumping arrangements have been remarkably efficient, and faster than just continuing to use AVX2/FMA3. I think y-cruncher's author did a deep dive of this when Zen 4 came out.
Yeah, it's not half the speed of the 512-bit implementation ... Is it because real-world AVX-512 code often uses 256-bit and shorter operands? I don't know.
#16
ncrs
JWNoctis: There was Jazelle, a whole extension meant to run Java bytecode on ARMv5. I think it was supposed to be a big thing in feature phone days.

Arm is not exactly foreign to that sort of thing.
FJCVTZS is a relatively modern addition to the current AArch64 ISA and is not really related to Jazelle or Java. Here's an explanation of why it was added: basically, JavaScript uses a peculiar floating-point-to-integer conversion that was too slow with more generic instructions.
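For anyone wondering what is peculiar about that conversion, here is a small Python sketch of the double-to-int32 rule JavaScript applies (ToInt32 in the ECMAScript spec), as I understand it. The function name `js_to_int32` is made up for illustration; the key point is that out-of-range values wrap around modulo 2^32 rather than clamping.

```python
import math

def js_to_int32(x: float) -> int:
    """Sketch of JavaScript's ToInt32: truncate toward zero, wrap modulo 2**32."""
    if math.isnan(x) or math.isinf(x):
        return 0                      # NaN and infinities map to 0
    n = math.trunc(x)                 # round toward zero
    n &= 0xFFFFFFFF                   # keep the low 32 bits (modulo 2**32)
    # Reinterpret the low 32 bits as a signed two's-complement value.
    return n - 0x100000000 if n >= 0x80000000 else n

print(js_to_int32(3.7))          # 3
print(js_to_int32(-1.5))         # -1
print(js_to_int32(2.0**32 + 5))  # 5, wraps instead of clamping
print(js_to_int32(2.0**31))      # -2147483648
```

Without a dedicated instruction, the wrap-around and special cases have to be handled with extra instructions around a generic conversion, which is the slowness mentioned above.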
Wirko: Yeah, it's not half the speed of the 512-bit implementation ... Is it because real-world AVX-512 code often uses 256-bit and shorter operands? I don't know.
AMD's "double pumped" version of AVX-512 isn't using 256-bit units in sequence, as in first half and then second half separately. It instead uses two 256-bit units simultaneously. What is more, AVX-512 instructions aren't split into 2 microcoded operations, but instead are just 1, further optimizing the flow of instructions. Some parts of the core are also 512-bit wide while others are not. It's like they knew that in Zen 5 they would widen those parts and chose to compromise in Zen 4 when it made sense ;)

In any case, since AVX-512 allows denser packing of math operations than AVX2 (it's a wider SIMD), in Zen 4's case it allowed the frontend to be less loaded. In turn, the execution units were not as starved and performed more work in the same time. Since the decoding frontend was less loaded, it consumed less power, allowing the core to clock higher. A detailed analysis of it can be read here.
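The front-end point is really just arithmetic, so here is a rough back-of-the-envelope sketch in Python. The helper `vector_ops_needed` is hypothetical, and it only counts how many vector instructions are needed to touch every element once; it ignores memory, masking, and everything else that matters in a real core.

```python
import math

def vector_ops_needed(n_elements: int, vector_bits: int, element_bits: int = 32) -> int:
    """Number of vector instructions needed to process n_elements once."""
    lanes = vector_bits // element_bits   # elements handled per instruction
    return math.ceil(n_elements / lanes)

n = 1024  # float32 values, an arbitrary example size
print(vector_ops_needed(n, 256))  # 128 instructions with 256-bit vectors
print(vector_ops_needed(n, 512))  # 64 instructions with 512-bit vectors
```

Half as many instructions to fetch and decode for the same amount of math is the reason the frontend ends up less loaded, even when the execution units underneath are 256-bit wide.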
#17
londiste
Wirko: Here's another example:

The 80186 was (roughly) the microcontroller version of the 80286. Interestingly, I can find pics of them marked with Ⓜ AMD © INTEL, or © AMD, or Ⓜ © INTEL (still made by AMD), or Ⓜ © AMD © INTEL. AMD also used both type numbers, 80186 and Am186. This probably hints at their magnificent army of lawyers, engineers, reverse engineers, and reverse lawyers.

The M in circle means ... this?
AMD was a second source manufacturer for Intel CPUs.
#18
Wirko
londiste: AMD was a second source manufacturer for Intel CPUs.
Yes, I'm aware of that, that's how AMD entered the x86 world. The variety of copyright marks might indicate that some 80186/80286 chips were made using blueprints that Intel was forced to hand to AMD, and others were the result of reverse engineering.
ncrs: AMD's "double pumped" version of AVX-512 isn't using 256-bit units in sequence, as in first half and then second half separately. It instead uses two 256-bit units simultaneously. What is more, AVX-512 instructions aren't split into 2 microcoded operations, but instead are just 1, further optimizing the flow of instructions. Some parts of the core are also 512-bit wide while others are not. It's like they knew that in Zen 5 they would widen those parts and chose to compromise in Zen 4 when it made sense ;)
So the 256-bit implementation wasn't meant to save die space but rather to save power?