Wednesday, April 8th 2020

x86 Lacks Innovation, Arm is Catching up. Enough to Replace the Giant?

Intel's x86 processor architecture has been the dominant CPU instruction set for decades, ever since IBM chose the Intel 8088 (a variant of the 8086) for its first Personal Computer. Later, in 2006, Apple replaced the PowerPC processors in its Macintosh computers with Intel chips, too. From then on, x86 was effectively the only option for the masses to use and develop their software on. While mobile phones and embedded devices are mostly Arm today, x86 clearly remains the dominant ISA (Instruction Set Architecture) for desktop computers, with both Intel and AMD producing processors for it, and those processors going into millions of PCs used every day. Today I would like to share my thoughts on the demise of the x86 platform and how it might give way to the RISC-based Arm architecture.

Both AMD and Intel as producers, and countless companies as consumers, have invested heavily in the x86 architecture, so why would x86 ever go extinct if "it just works"? The answer is that it doesn't just work.
Comparing x86 to Arm
The x86 architecture is massive, with well over a thousand instructions, some of them very complex. This approach is called Complex Instruction Set Computing (CISC). Internally, these instructions are split into micro-ops, which further complicates processor design. Arm's RISC (Reduced Instruction Set Computing) philosophy is much simpler, and intentionally so: the design goal is a simple, manageable core with a focus on power efficiency. If you want to learn more, I would recommend reading this; it is a simple explanation of the differences and the design goals each approach pursues. Today, however, this comparison is becoming pointless, as both design approaches borrow the best parts from each other. Neither architecture is static; both are constantly evolving. For example, Intel invented the original x86, but AMD later added support for 64-bit computing. Various extensions such as MMX, SSE, AVX and virtualization have addressed specific requirements to keep the architecture modern and performant. On the Arm side, things have progressed, too: 64-bit and floating-point math support were added, as were SIMD multimedia instructions and crypto acceleration.
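To make the CISC/RISC distinction concrete, here is a deliberately simplified sketch of the idea that a single CISC-style instruction operating directly on memory decomposes into several RISC-like micro-ops. The instruction names and the three-op split are invented for illustration; real x86 decoders are far more sophisticated.

```python
# Illustrative sketch: a CISC-style instruction that operates directly on
# memory decomposed into RISC-like micro-ops (load / compute / store).
# Names and encoding are hypothetical, purely for illustration.

def decompose(instruction):
    """Split a CISC-style 'add register into memory' op into micro-ops."""
    op, dest, src = instruction
    if op == "ADD_MEM":  # add a register into a memory location
        return [
            ("LOAD", "tmp", dest),   # tmp <- memory[dest]
            ("ADD", "tmp", src),     # tmp <- tmp + src
            ("STORE", dest, "tmp"),  # memory[dest] <- tmp
        ]
    # simple register-to-register ops pass through unchanged
    return [instruction]

for uop in decompose(("ADD_MEM", "0x1000", "eax")):
    print(uop)
```

A RISC ISA simply exposes the three small operations directly, which is part of why its decoders (and hence its front end) can stay simpler.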

Licensing
Originally developed by Intel, the x86 ISA is the property of Intel Corporation. To use it, companies such as AMD and VIA sign a licensing agreement with Intel for an upfront fee. Since Intel controls who can use its technology, it decides who gets to build an x86 processor, and obviously wants as little competition as possible. However, another company comes into play here. Around 1999, AMD developed an extension to x86, called x86-64, which enables the 64-bit computing capabilities we all use in our computers today. A few years later, the first 64-bit x86 processors were released and took the market by storm, with both Intel and AMD using the exact same x86-64 extensions for compatibility. This means Intel has to license the 64-bit extension from AMD, while Intel licenses the base x86 spec to AMD. This is the famous "cross-licensing agreement," in which AMD and Intel give each other access to their technology so both sides benefit, because it wouldn't be possible to build a modern x86 CPU without both.

Arm's licensing model, on the other hand, is completely different. Arm will license its ISA to anyone willing to pay a (very modest) fee. The licensee pays an upfront fee, in return gaining extensive documentation and the right to design a processor based on the Arm ISA. Once the final product ships to customers, Arm charges a small royalty on every chip sold. The licensing agreement is very flexible: companies can either design their cores from scratch or use predefined IP blocks available from Arm.
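As a rough sketch of how an upfront-fee-plus-royalty model adds up, here is the structure in code. Every figure below is hypothetical; Arm's actual fees and royalty rates are confidential and vary per agreement.

```python
# Hypothetical illustration of an upfront-fee-plus-royalty licensing model.
# None of these figures are Arm's real terms; they only show the structure:
# total cost = upfront fee + royalty rate * chip price * units shipped.

def license_cost(upfront_fee, royalty_rate, chip_price, units):
    """Total licensing cost for a given shipment volume."""
    return upfront_fee + royalty_rate * chip_price * units

# Example: $1M upfront, 2% royalty on a $20 chip, 10 million units shipped.
total = license_cost(1_000_000, 0.02, 20.0, 10_000_000)
print(f"total licensing cost: ${total:,.0f}")        # upfront + $4M royalties
print(f"per-chip licensing cost: ${total / 10_000_000:.2f}")
```

The upfront fee dominates at low volumes, while at high volumes the per-chip royalty becomes the bulk of the cost, which is why the model scales well for both small and huge licensees.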

Software Support
The x86 architecture is today's de facto standard for high-performance applications: every developer creates software for it, and they have to if they want to sell it. In the open-source world things are similar, but thanks to the openness of that ecosystem, many developers embrace alternative architectures, too. Popular Linux distributions have added native support for Arm, which means that if you want to run that platform you won't have to compile every piece of software yourself; you can install ready-to-use binary packages, just like on x86. Microsoft only recently started supporting Arm seriously with its Windows-on-Arm project, which aims to bring Arm-based devices to the hands of millions of consumers. Microsoft had already tried once with Windows RT, a Windows 8 edition for Arm CPUs; Windows 10 on ARM is its successor.
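Software that wants one codebase across x86 and Arm typically detects the architecture at runtime or install time to pick the right binaries. A minimal sketch using Python's standard `platform` module (the normalization table is illustrative; `platform.machine()` returns OS-specific spellings):

```python
# Minimal sketch: detecting the CPU architecture at runtime so one codebase
# can pick architecture-specific binaries or code paths.
import platform

def normalized_arch(machine=None):
    """Map the many machine-name spellings onto 'x86_64' / 'arm64'."""
    machine = (machine or platform.machine()).lower()
    if machine in ("x86_64", "amd64"):
        return "x86_64"
    if machine in ("arm64", "aarch64"):
        return "arm64"
    return machine  # e.g. 'i686', 'riscv64', ...

print(f"running on: {normalized_arch()}")
```

Package managers on Arm-enabled Linux distributions do essentially this when they select `aarch64` packages instead of `x86_64` ones.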

Performance
The Arm architecture is most popular in low-power embedded and portable devices, where its energy-efficient design wins. That's also why high performance was a problem until recently. Marvell Technology Group (ThunderX processors), for example, started out with first-generation Arm server designs in 2014. Those weren't nearly as powerful as the x86 alternatives, but they sent buyers of server CPUs a signal: Arm processors are here. Today Marvell is shipping ThunderX2 processors that are very powerful and offer performance comparable to x86 alternatives (Broadwell to Skylake level), depending on the workload of course. Next-generation ThunderX3 processors are on their way this year. Another company doing processor design is Ampere Computing, which just introduced its Altra CPUs; these should be very powerful as well.
What is their secret sauce? The base of every core is Arm's Neoverse N1 server core, designed for the best possible performance. The folks over at AnandTech have tested Amazon's Graviton2 design, which uses these Neoverse N1 cores, and came to an amazing conclusion: the chip is incredibly fast and competes directly with Intel. Something unimaginable a few years ago. We already have performance decent enough to compete with Intel and AMD offerings, but you might wonder why that matters when options already exist in the form of Xeon and EPYC CPUs. It matters because it creates competition, and competition is good for everyone. Cloud providers are looking into deploying these processors because they promise much better performance per dollar and higher power efficiency; power cost is one of the largest expenses for these companies.
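The two metrics cloud providers weigh here are easy to state in code. The numbers below are made up purely to illustrate the calculation; they are not benchmark results for any real Xeon, EPYC or Graviton part.

```python
# Hypothetical performance-per-dollar and performance-per-watt comparison.
# Scores, prices and power figures are invented for illustration only.

def perf_per_dollar(score, price):
    return score / price

def perf_per_watt(score, watts):
    return score / watts

chips = {
    # name: (benchmark score, price in $, power draw in W); all hypothetical
    "x86 server CPU": (100.0, 4000.0, 200.0),
    "Arm server CPU": (90.0, 2500.0, 110.0),
}

for name, (score, price, watts) in chips.items():
    print(f"{name}: {perf_per_dollar(score, price):.4f} perf/$, "
          f"{perf_per_watt(score, watts):.3f} perf/W")
```

On these invented figures, the Arm part loses slightly on raw performance but wins on both ratios, which is exactly the pitch hyperscalers respond to.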
Arm Neoverse
Arm isn't sitting idle either: it is doing a lot of R&D on its Neoverse ecosystem, with next-generation cores almost ready. Intel's innovation has stagnated, and while AMD caught up and started to outrun them, that is not enough to keep x86 safe from the joint effort of Arm and startups that are gathering incredible talent. Just take a look at Nuvia Inc., which is bringing together some of the best CPU architects in the world: Gerard Williams III, Manu Gulati and John Bruno are all well-known names in the industry, and they lead a company promising to beat everything on CPU performance. You can call these "just claims," but look at products like Apple's A13 SoC: its performance in some benchmarks is comparable to AMD's Zen 2 cores and Intel's Skylake, showing how far the Arm ecosystem has come and that it has the potential to beat x86 at its own game.

The performance-per-watt disparity between Arm and x86 defines the fiefdoms of the two. Arm chips offer high performance per watt in smartphone and tablet form factors, where Intel failed to make a dent with its x86-based "Medfield" SoCs. Intel, on the other hand, consumes a lot more power to get a lot more work done in larger form factors. It's like comparing a high-speed railway locomotive to a Tesla Model X: both do 200 km/h, but the former pulls in a lot more power and transports a lot more people. Recent attempts at scaling Arm up to an enterprise platform have met with limited success. A test server based on a 64-core Cavium ThunderX2 pulls 800 W at the wall, which isn't much different from high-core-count Xeons. At the very least, it doesn't justify the cost for enterprise customers of re-tooling their infrastructure around Arm. Enterprise Linux distributions such as SUSE Linux Enterprise and Red Hat Enterprise Linux haven't invested much in large Arm-based servers (besides microservers), and Microsoft has no Windows Server for Arm.

Apple & Microsoft
If Apple's plan to replace Intel x86 CPUs in its products materializes, x86 will lose one of its bigger customers. Apple's design teams have proven over the years that they can design some really good cores; the Ax lineup of processors (A11, A12 and most recently A13) is testament to that. The question remains, however, how well they can scale such a design and how quickly they can adapt the ecosystem to it. With Apple having a tight grip on its App Store for Mac, it wouldn't be too difficult for them to force developers to ship an Arm-compatible binary, too, if they want to keep their product in the App Store.

On the Microsoft Windows side, things are different. There is no centralized store (Microsoft has tried, and failed), and plenty of legacy software exists that was developed for x86 only. Even major developers of Windows software currently provide no Arm binaries. Adobe's Creative Suite, the backbone of the creative industry, is x86-only, for example. Game developers are busy enough learning DirectX 12 or Vulkan; they certainly don't want to start developing titles with Arm support, too, in addition to Xbox and PlayStation. An exception is the Microsoft Office suite, which is available for Windows RT and fully functional on that platform. A huge percentage of Windows users are tied to their software stack for work or entertainment, so the whole software development industry would need to pay more attention to Arm and offer software on that platform as well. For now, that seems impossible: besides Microsoft Edge, there is not even a third-party web browser available. Firefox is in beta, and Google's Chrome has seen some development but no public release. That's probably why Microsoft went the emulation route, unlike Apple. According to Microsoft, applications compiled for the Windows platform can run "unmodified, with good performance and a seamless user experience." This emulation does not support 64-bit applications at this time. Microsoft's Universal Windows Platform (UWP) "Store" apps can easily be ported to run on Arm, because the API was designed for that from the ground up.
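The emulation approach boils down to translating guest (x86) instructions into something the host (Arm) can execute. Here is a toy interpreter-style sketch of that idea; Microsoft's actual translator is far more complex and caches translated blocks of native code rather than interpreting one instruction at a time, and the tiny instruction set below is invented for illustration.

```python
# Toy sketch of instruction-level emulation: fetch "guest" instructions
# (standing in for x86) and execute them on the "host" (standing in for Arm).

def emulate(program):
    """Interpret a tiny guest instruction stream; return register state."""
    regs = {"eax": 0, "ebx": 0}
    for op, *args in program:
        if op == "MOV":        # MOV reg, immediate
            reg, imm = args
            regs[reg] = imm
        elif op == "ADD":      # ADD dst, src (register + register)
            dst, src = args
            regs[dst] += regs[src]
        else:
            raise ValueError(f"unsupported guest instruction: {op}")
    return regs

state = emulate([("MOV", "eax", 2), ("MOV", "ebx", 40), ("ADD", "eax", "ebx")])
print(state)  # {'eax': 42, 'ebx': 40}
```

The overhead of this translation layer is also why emulated x86 software on Arm rarely matches native performance, and why native Arm binaries remain the end goal.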

Server & Enterprise
The server market is important for x86: it has the best margins and high volume, and it is growing fast thanks to cloud computing. Historically, Intel has held more than 95% of server shipments with its Xeon lineup, AMD occupied the rest, and Arm really played no role. Recently AMD started production of EPYC processors that deliver good performance, run power-efficiently and are priced well, making a big comeback and gnawing away at Intel's market share. Most of the codebases in this sector should be able to run on Arm, and even supercomputers can use the Arm ISA; the biggest example is the Fugaku pre-exascale supercomputer. With custom-designed Arm CPUs, vendors could make x86 a thing of the past.

Conclusion
Arm-based processors are lower-cost than Intel and AMD solutions while offering comparable performance and consuming less energy. At least, that's the promise. I think servers are the first front where x86 will slowly phase out, with consumer products second, as Apple pursues custom chips and Microsoft already offers Arm-based laptops.

On the other hand, eulogies of x86 tend to be cyclical. Just when it appears that Arm has achieved enough performance per Watt to challenge Intel in the ultra-compact client-computing segments, Intel pushes back. Lakefield is an ambitious effort by Intel to take on Arm by combining high-efficiency and high-performance cores onto a single chip, along with packaging innovations relevant to ultra-portables. When it comes out, Lakefield could halt Arm in its tracks as it seeks out high-volume client-computing segments such as Apple's MacBooks. Lakefield has the potential to make Apple second-guess itself. It's very likely that Apple's forward-looking decisions were the main reason Intel sat down to design it.

So far, the Arm ISA is dominant in the mobile space: phones from Samsung, Apple, Huawei and many more feature Arm-based CPUs. Intel tried to get into mobile with its x86 CPUs but failed due to their inefficiency; adoption was low, and some manufacturers, like Apple, preferred custom designs. However, SoftBank didn't pay $31 billion to acquire Arm just to eke out revenue from licensing IP to smartphone makers: the architecture is designed for processors of all shapes and sizes. Right now it takes companies with complete control over their product stack, such as Amazon and Apple, to get Arm to the point of being a viable choice in the desktop and server space. By switching to Arm, vendors could see financial benefits as well. It is reported that Apple could see processor prices drop anywhere from 40% to 60% by going custom Arm, and Amazon offers Graviton2-based instances that are lower-priced than Xeon- or EPYC-based ones. Of course, complete control of both hardware and software comes with its own benefits: a vendor can implement any feature its users need without hoping that a third party will implement it. Custom design does add upfront development costs, but the vendor is later rewarded with a lower cost per processor.

217 Comments on x86 Lacks Innovation, Arm is Catching up. Enough to Replace the Giant?

#101
ARF
They have already been compiled. 2.87 million apps currently in Google Play Store alone.
Posted on Reply
#102
notb
ARF: Yes, we are speaking about consumer apps, he shifts to offtopic workstations...
You're making a fundamental logical error here. You're saying: people spend most of their time on smartphones, so they should be fine with an ARM laptop.
For some reason you can't grasp the fact that they use laptops precisely because they can't do something on their smartphones.

IMO you're a little confused about how consumers use their PCs. You probably think it's just browsing the web, watching movies, listening to music and using messaging apps.

Sooner or later, probably every consumer will run into a problem because a program they use on x86 has no ARM version.
ARM is not a usable consumer platform. It could be, but it isn't.
Posted on Reply
#103
bug
ARF: They have already been compiled. 2.87 million apps currently in Google Play Store alone.
So workstations when talking ARM computers are off topic, but Android/Google Play Store isn't. Noted.
Posted on Reply
#104
trparky
ARF: They have already been compiled. 2.87 million apps currently in Google Play Store alone.
There is a big damn difference between an app and what most people, even regular users, expect full-blown programs to be like on a desktop platform. This is essentially why Windows 8.x failed so badly on the desktop: Microsoft underestimated what people would be doing with their desktops, and so do you. People don't want apps; they want full programs with full functionality on a desktop.

By the way, did you work at Microsoft during the creation of Windows 8.x? Sure sounds like you did.
Posted on Reply
#105
mtcn77
Whenever someone hypes Arm, I remember when Jim Keller said they thought they had found a bug in the Arm processor (it turned out to be a bug in their verification tool) @3:00.
Posted on Reply
#106
Kursah
Let's be done with the petty drama here folks. Anyone wants to push it past this post will be met with infraction points.
Posted on Reply
#107
mtcn77
I like living dangerously.
Posted on Reply
#108
ARF
A new Premiere Pro Beta with AMD and Nvidia hardware encoding.
Intel no longer has a lead over AMD with Premiere and Quick Sync.
Posted on Reply
#109
notb
ARF: A new Premiere Pro Beta with AMD and Nvidia hardware encoding.
Intel no longer has a lead over AMD with Premiere and Quick Sync.
You don't understand this article or you didn't read it. Which one is it?

Adobe added NVENC support, i.e. hardware-accelerated encoding for Nvidia GPUs. This article is about NVENC.
All red bars are for encoding using Nvidia GPU. Blue are for the CPU. Green are for the Intel IGP.

Any questions?
Posted on Reply
#110
Chehh984
I would suggest going back a bit, to a time when RISC and CISC were the options. There is a reason one of these won in personal computers, and that reason is still valid today. ARM has great advantages which it uses really well in the mobile world, but suggesting it could replace x86 in all spheres is pure speculation. If ARM were able to do everything x86 does, then it would not be ARM anymore....
Posted on Reply
#111
ARF
Chehh984: I would suggest going back a bit to a time when RISC and CISC were the options. There is a reason one of these won in personal computers. That reason is still valid today, ARM has great advantages which it uses really well in mobile world, but suggesting it could replace x86 in all spheres is just pure speculation. If ARM would be able to do everything x86 does than it would not be ARM anymore....
Huh. Since Intel can't do anything on a more modern process than 14 nm, and since AMD's tremendous competitive advantages mean Intel will only have more trouble going forward, I would not claim with such certainty what is speculation and what is not.

ARM didn't exist when Intel made the first x86 processors some 40 years ago.
Things change, and they change pretty fast.

Do you know what x86 actually is? It's basically the 1978 version, just with some width added here and there, more instructions which take transistor budget, and multiplied by more cores.
There is no innovation there, just pure evolution, and very slow at Intel's pace.
Posted on Reply
#112
londiste
x86 and ARM are not that far apart. x86 started with the 8086 in 1978. Arm Ltd. was founded in 1990, but the first ARM CPU appeared in 1985.

ARM isn't that much different from what you describe: some width added here and there, more instructions which take transistor budget, multiplied by more cores.
In practice, that is a hugely simplified way of looking at it. There are major changes in the microarchitecture of both, even where the ISA stays largely the same.
Posted on Reply
#113
Aquinus
Resident Wat-man
ARF: ARM didn't exist when Intel made the first x86 processors back 40-50 years ago.
No, but x86 was definitely not the only ISA out there at the time and RISC ISAs like MIPS and SPARC were being introduced around the same time 32-bit x86 CPUs were showing up in the mid 80s.
ARF: Do you know what x86 actually is? It's basically the 1978 version, just with some width added here and there, more instructions which take transistor budget, and multiplied by more cores.
There is no innovation there, just pure evolution, and very slow at Intel's pace.
There have been a lot of changes since the 8086. Just because the core instructions haven't changed doesn't mean the rest of the CPU hasn't.

Edit: One of the things that makes x86 a potent option is the set of extensions to x86: dedicated hardware for tasks that would otherwise take a boatload of clock cycles to accomplish. This is why vector extensions exist, and why you don't just call things like add and multiply a bunch of times instead. So no, a modern x86 processor is very different from an 8086. Even extensions like x86_64 added a boatload of registers in addition to increasing their widths. The only thing in common is the core ISA.
Posted on Reply
#114
bug
Chehh984: I would suggest going back a bit to a time when RISC and CISC were the options. There is a reason one of these won in personal computers. That reason is still valid today, ARM has great advantages which it uses really well in mobile world, but suggesting it could replace x86 in all spheres is just pure speculation. If ARM would be able to do everything x86 does than it would not be ARM anymore....
There is a reason, but that reason may not be valid anymore: compilers.
If your target instruction set is smaller, translating everything is harder. Back in the day this was also hindered by the available computing power, which doesn't seem to be the case anymore.
Which one is better overall (and I'm not confusing x86 with CISC or ARM with RISC here) or even if there will be a one size fits all solution, I couldn't tell you.
Posted on Reply
#115
mtcn77
bug: There is a reason, but that reason may not be valid anymore: compilers.
If your target instruction set is smaller, translating everything is harder. Back in the day this was also hindered by the available computing power, which doesn't seem to be the case anymore.
Which one is better overall (and I'm not confusing x86 with CISC or ARM with RISC here) or even if there will be a one size fits all solution, I couldn't tell you.
Since I am unfettered with firsthand knowledge, I think I know an easy shortcut.
A simple instruction set will put more pressure on the data caches, either by running further cycles, or further instructions (i-cache), to do the same amount of work, so will use up more overhead to do the same amount of work (d-cache).
However, its data caches are aligned - there are no divergent flow rates, so control management is simpler.
I'll guess it comes to how much overhead is present from wasted cycles due to the complex vs. simple instruction set difference, in reference to how much transistor - and therefore power - budget is saved from simplifying the instruction flow.
Posted on Reply
#116
bug
mtcn77: Since I am unfettered with firsthand knowledge, I think I know an easy shortcut.
A simple instruction set will put more pressure on the data caches, either by running further cycles, or further instructions(i-cache) to do the same amount of work, so will use up more overhead to do the same amount of work(d-cache).
However, its data caches are aligned -there are no divergent flow rates, so control management is simpler.
I'll guess it comes to how much overhead is present from wasted cycles due to the complex vs. simple instruction set difference, in reference to how much transistor - and therefore power - budget is saved from simplifying the instruction flow.
I'm not up to date with what ARM does these days, but one of RISC's advantages was that executing any instruction within the same timeframe/cycles dramatically simplifies scheduling. By contrast, ever since Intel went pipelined (Pentium, iirc), they essentially have a sizeable silicon chunk breaking complex instructions down into simple ones, emulating what RISC does.
Like I said, I don't know whether one will prevail over the other. Or whether a hybrid design will trump both.
Posted on Reply
#117
ARF
The more energy-efficient design, which ARM clearly is, has to prevail.
Posted on Reply
#118
notb
ARF: The more energy efficient, which clearly ARM is, has to prevail.
And yet you're wasting all that energy on repeating the same theories over and over again.
Posted on Reply
#119
mtcn77
ARF: The more energy efficient, which clearly ARM is, has to prevail.
But there is always the mediator compiler involved.
X load fewer cycles which is precisely what x86 is compiled to doing (race to sleep), in deference to Y<X load 'more cycles' is some proposition to pass up.
Posted on Reply
#120
ARF
mtcn77: But there is always the mediator compiler involved.
X load fewer cycles which is precisely what x86 is compiled to doing(race to sleep), in deference to Y<X load 'more cycles' is some proposition to pass up.
x86 can't even sleep normally. Have you seen how many monitoring apps, for example, wake up the "idle" cores just to get a reading of their current clock? CPU-Z does it, it's ridiculous.

As to my "theories", they are not exactly theories because it has been proved multiple times that a 3-watt Snapdragon is 3-4-5 TIMES faster than an Atom with equal wattage.
Posted on Reply
#121
mtcn77
ARF: x86 can't even sleep normally. Have you seen how many screening apps, for example, wake up the "idle" cores in order to get a reading about its current clock. CPU-Z does it, it's ridiculous.

As to my "theories", they are not exactly theories because it has been proved multiple times that a 3-watt Snapdragon is 3-4-5 TIMES faster than an Atom with equal wattage.
Except, the Atom FPU operates at a 2:1 rate (half, sorry if I mismatched in writing). It doesn't even match its architected speed quotient.
Something on Snapdragons: Qualcomm, on the other hand, is doing heavy customisation.
Posted on Reply
#122
londiste
ARF: x86 can't even sleep normally. Have you seen how many screening apps, for example, wake up the "idle" cores in order to get a reading about its current clock. CPU-Z does it, it's ridiculous.
What does this have to do with x86?
Posted on Reply
#123
mtcn77
londiste: What does this have to do with x86?
x86 is not built like an SoC. There are multiple backdrop execution units on Arm; maybe he mentioned that.
Posted on Reply
#124
bug
ARF: The more energy efficient, which clearly ARM is, has to prevail.
If ARM was so clearly more energy efficient, all servers would be running ARM today.
While not necessarily a limitation, at least ARM's designs do not scale up as well as we need. Why? Energy efficiency, that's why.
Posted on Reply
#125
Aquinus
Resident Wat-man
mtcn77: I'll guess it comes to how much overhead is present from wasted cycles due to the complex vs. simple instruction set difference, in reference to how much transistor - and therefore power - budget is saved from simplifying the instruction flow.
I would assume that engineers are smart enough to build extensions when there is a need for them, not because they're bored and feel like it. The point of having these more complicated operations is to save clock cycles. I don't know if the cache thing is really all that true, because at the end of the day you're still fiddling around with the same data if you don't use something like vector extensions for a bunch of floating-point operations. The point of vectorization is that you can do everything (the same kind of operation every time) in a couple of cycles in parallel instead of, say, 20 cycles serially. This works when data doesn't have any interdependencies with other results calculated at the same time.

...but now here is the kicker:
mtcn77: in reference to how much transistor - and therefore power - budget is saved from simplifying the instruction flow.
The thing is that a lot of these "faster" extensions are likely using more power, but probably not for the throughput they provide. So think of it this way, if AVX2 can speed up a workload by 100%, but power only increases by 50% while the AVX circuitry is active, then that's still a win.
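That trade-off is easy to make explicit: energy per task is power times runtime, so a 100% speedup at 50% higher power still cuts total energy. The wattages and runtimes below are hypothetical, chosen only to match the 100%/50% figures in this example.

```python
# Energy per task = power draw * runtime. Hypothetical figures matching the
# example above: a 100% speedup (runtime halved) at 50% higher power draw.

def energy_per_task(power_watts, runtime_s):
    """Total energy in joules consumed to finish one task."""
    return power_watts * runtime_s

baseline = energy_per_task(100.0, 10.0)  # 100 W for 10 s -> 1000 J
with_avx = energy_per_task(150.0, 5.0)   # +50% power, half the time -> 750 J
print(f"baseline: {baseline:.0f} J, with vector unit: {with_avx:.0f} J")
assert with_avx < baseline  # faster AND less total energy: a net win
```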
Posted on Reply