Wednesday, April 8th 2020

x86 Lacks Innovation, Arm is Catching up. Enough to Replace the Giant?

Intel's x86 processor architecture has been the dominant CPU instruction set for decades, ever since IBM decided to put the Intel 8088, a cost-reduced variant of the 8086, into its first Personal Computer. Later, in 2006, Apple decided to replace the PowerPC-based processors in its Macintosh computers with Intel chips, too. That transition left x86 as practically the only option for the masses to use and develop all their software on. While mobile phones and embedded devices are mostly Arm today, x86 is clearly still the dominant ISA (Instruction Set Architecture) for desktop computers, with both Intel and AMD producing processors for it and shipping them inside millions of PCs that are used every day. Today I would like to share my thoughts on the possible demise of the x86 platform and how it might vanish in favor of the RISC-based Arm architecture.

Both AMD and Intel as producers, and millions of companies as consumers, have invested heavily in the x86 architecture, so why would x86 ever go extinct if "it just works"? The answer is that it doesn't just work.
Comparing x86 to Arm
The x86 architecture is massive, with more than a thousand instructions, some of which are very complex. This approach is called Complex Instruction Set Computing (CISC). Internally, these instructions are split into micro-ops, which further complicates processor design. Arm's RISC (Reduced Instruction Set Computing) philosophy is much simpler, and intentionally so: the design goal is to build simple designs that are easy to manage, with a focus on power efficiency, too. If you want to learn more, I would recommend reading this. It is a simple explanation of the differences and of the design goals each approach serves. However, today this comparison is becoming less meaningful, as both design approaches copy from each other and borrow each other's best parts. Neither architecture is static; both are constantly evolving. For example, Intel created the original x86, but AMD later added support for 64-bit computing. Various extensions like MMX, SSE, AVX and virtualization have addressed specific requirements to keep the architecture modern and performant. On the Arm side, things have progressed, too: 64-bit and floating-point math support were added, as were SIMD multimedia instructions and crypto acceleration.
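To make the difference concrete, here is a minimal sketch. The C function is hypothetical, and the commented assembly reflects typical compiler output for each ISA rather than any guaranteed encoding:

```c
/* Increment a counter that lives in memory. */
void bump(long *counter)
{
    *counter += 1;
}

/*
 * x86-64 (CISC): one instruction can read, modify and write memory in a
 * single step; the CPU then cracks it into load/add/store micro-ops:
 *
 *     add qword ptr [rdi], 1
 *
 * AArch64 (RISC): only load/store instructions touch memory, and ALU
 * instructions work register-to-register, so three instructions result:
 *
 *     ldr x1, [x0]
 *     add x1, x1, #1
 *     str x1, [x0]
 */
```

Either way roughly the same micro-operations end up executing; the difference is whether the complexity lives in the instruction set or in the decoder.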

Licensing
Originally developed by Intel, the x86 ISA is the property of Intel Corporation. To use it, companies such as AMD and VIA sign a licensing agreement with Intel for an upfront fee. Since Intel controls who can use its technology, it decides who gets to build an x86 processor, and obviously it wants as little competition as possible. However, another company comes into play here. Around 1999, AMD developed an extension to x86, called x86-64, which enables the 64-bit computing capabilities that we all use in our computers today. A few years later the first 64-bit x86 processors were released and took the market by storm, with both Intel and AMD using the exact same x86-64 extensions for compatibility. This means that Intel has to license the 64-bit extension from AMD, while Intel licenses the base x86 spec to AMD. This is the famous "cross-licensing agreement" in which AMD and Intel give each other access to their technology so that both sides benefit; it wouldn't be possible to build a modern x86 CPU without both.

Arm's licensing model, on the other hand, is completely different. Arm will allow anyone to use its ISA, as long as that company pays a (comparatively modest) licensing cost. The licensee pays an upfront fee to gain a ton of documentation and the rights to design a processor based on the Arm ISA, and once the final product ships to customers, Arm charges a small royalty for every chip sold. The licensing agreement is very flexible, as companies can either design their cores from scratch or use predefined IP blocks available from Arm.

Software Support
The x86 architecture is today's de facto standard for high-performance applications—every developer creates software for it, and they have to, if they want to sell it. In the open-source world, things are similar, but thanks to the openness of that whole ecosystem, many developers are embracing alternative architectures, too. Popular Linux distributions have added native support for Arm, which means that if you want to run that platform, you won't have to compile every piece of software yourself; you're free to install ready-to-use binary packages, just like on x86. Microsoft only recently started supporting Arm with its Windows-on-Arm effort, which aims to bring Arm-based devices to the hands of millions of consumers. Microsoft had tried before with Windows RT, which brought a Windows 8 edition to Arm CPUs; Windows 10 on ARM is its successor.
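For portable code, "Arm support" mostly means a recompile. Here is a minimal sketch (the program is hypothetical; the predefined macros are the usual GCC/Clang and MSVC conventions) of one source file that builds unchanged for either ISA:

```c
#include <stdio.h>

int main(void)
{
    /* Predefined compiler macros identify the architecture being targeted. */
#if defined(__x86_64__) || defined(_M_X64)
    puts("compiled for x86-64");
#elif defined(__aarch64__) || defined(_M_ARM64)
    puts("compiled for 64-bit Arm");
#else
    puts("compiled for some other architecture");
#endif
    return 0;
}
```

Building the same file with a cross toolchain such as aarch64-linux-gnu-gcc yields a native Arm binary, which is how distributions can offer ready-to-use Arm packages for most of their archive.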

Performance
The Arm architecture is most popular for low-powered embedded and portable devices, where it wins with its energy-efficient design. That's why high performance has been a problem until recently. For example, Marvell Technology Group (whose ThunderX processors originated at Cavium, acquired by Marvell in 2018) started out with first-generation Arm server designs in 2014. Those weren't nearly as powerful as the x86 alternatives; however, they gave buyers of server CPUs a sign: Arm processors are here. Today Marvell is shipping ThunderX2 processors that are very powerful and offer performance comparable to x86 alternatives (Broadwell to Skylake level), depending on the workload of course. Next-generation ThunderX3 processors are on their way this year. Another company doing processor design is Ampere Computing, which just introduced its Altra CPUs, which should be very powerful as well.
What is their secret sauce? The base of these new designs is Arm's Neoverse N1 server core, designed to give the best possible performance. The folks over at AnandTech have tested Amazon's Graviton2, which uses these Neoverse N1 cores, and came to a remarkable conclusion: the chip is incredibly fast and competes directly with Intel, something unimaginable a few years ago. Today Arm designs already deliver the performance needed to compete with Intel and AMD offerings, but you might wonder why that matters when options already exist in the form of Xeon and EPYC CPUs. It does matter: it creates competition, and competition is good for everyone. Cloud providers are looking into deploying these processors as they promise much better performance per dollar and higher power efficiency—power cost is one of the largest expenses for these companies.
Arm Neoverse
Arm isn't sitting idle; it is doing a lot of R&D on its Neoverse ecosystem, with next-generation cores almost ready. Intel's innovation has been stagnant and, while AMD caught up and started to outrun it, that is not enough to keep x86 safe from the joint effort of Arm and startup companies that are gathering incredible talent. Just take a look at Nuvia Inc., which is bringing together some of the best CPU architects in the world: Gerard Williams III, Manu Gulati and John Bruno are all well-known names in the industry, and they lead a company promising to beat everything in CPU performance. You can call these "just claims", but take a look at products like Apple's A13 SoC: its performance in some benchmarks is comparable to AMD's Zen 2 cores and Intel's Skylake, showing how far the Arm ecosystem has come and that it has the potential to beat x86 at its own game.

The performance-per-Watt disparity between Arm and x86 defines fiefdoms for the two. Arm chips offer high performance/Watt in smartphone and tablet form-factors, where Intel failed to make a dent with its x86-based "Medfield" SoCs. Intel, on the other hand, consumes a lot more power to get a lot more work done in larger form-factors. It's like comparing a high-speed railway locomotive to a Tesla Model X: both do 200 km/h, but the former pulls in a lot more power and transports a lot more people. Recent attempts at scaling Arm up to an enterprise platform have met with limited success. A test server based on a 64-core Cavium ThunderX2 pulls 800 Watts off the wall, which isn't much different from high core-count Xeons. At the very least, it doesn't justify the cost for enterprise customers to re-tool their infrastructure around Arm. Enterprise Linux distributions like SUSE Linux Enterprise or RHEL haven't invested much in scalar Arm-based servers (besides microservers), and Microsoft has no Windows Server for Arm.

Apple & Microsoft
If Apple's plan to replace Intel x86 CPUs in its products materializes, x86 will have lost one of its bigger customers. Apple's design teams have proven over the years that they can design some really good cores; the Ax lineup of processors (A11, A12 and most recently A13) is testament to that. The question remains, however, how well they can scale such a design and how quickly they can adapt the ecosystem to it. With Apple having a tight grip on its App Store for Mac, it wouldn't be too difficult to force developers to ship an Arm-compatible binary, too, if they want to keep their product on the App Store.

On the Microsoft Windows side, things are different. There is no centralized store—Microsoft has tried, and failed. Plenty of legacy software exists that is developed for x86 only. Even major developers of Windows software are currently not providing Arm binaries. For example, Adobe's Creative Suite, which is the backbone of the creative industry, is x86 only. Game developers are busy enough learning DirectX 12 or Vulkan; they sure don't want to start developing titles with Arm support, too—in addition to Xbox and PlayStation. An exception is the Microsoft Office suite, which is available for Windows RT and fully functional on that platform. A huge percentage of Windows users are tied to their software stack for either work or entertainment, so the whole software development industry would need to pay more attention to Arm and offer its software on that platform as well. However, that seems impossible for now. Besides Microsoft Edge, there is not even a third-party web browser available: Firefox is in beta, and Google's Chrome has seen some development, but there is no public release.

That's probably why Microsoft went the "emulation" route, unlike Apple. According to Microsoft, applications compiled for the Windows platform can run "unmodified, with good performance and a seamless user experience". This emulation does not support 64-bit applications at this time. Microsoft's Universal Windows Platform (UWP) "Store" apps can easily be ported to run on Arm, because the API was designed for that from the ground up.
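A Win32 application can even ask at runtime whether it is being emulated. This is a minimal sketch, assuming Windows 10 version 1709 or newer, using the documented IsWow64Process2() API; compiled as x86 and run on a Windows-on-Arm device, it reports the Arm host underneath:

```c
#include <windows.h>
#include <stdio.h>

int main(void)
{
    USHORT process_machine = 0, native_machine = 0;

    /* Reports both the architecture this process was built for and the
     * architecture of the machine it is actually running on. */
    if (!IsWow64Process2(GetCurrentProcess(), &process_machine, &native_machine)) {
        fprintf(stderr, "IsWow64Process2 failed: %lu\n", GetLastError());
        return 1;
    }

    if (process_machine == IMAGE_FILE_MACHINE_UNKNOWN)
        puts("running natively");                     /* no emulation layer */
    else if (native_machine == IMAGE_FILE_MACHINE_ARM64)
        puts("x86 binary emulated on an Arm64 host"); /* Windows on Arm */

    return 0;
}
```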

Server & Enterprise
The server market is important for x86—it has the best margins and high volume, and it is growing fast thanks to cloud computing. Historically, Intel has held more than 95% of server shipments with its Xeon lineup of CPUs, while AMD occupied the rest; Arm has really played no role here. Recently AMD started production of EPYC processors that deliver good performance, run power-efficiently and are priced well, making a big comeback and gnawing away at Intel's market share. Most of the codebases in this sector should be able to run on Arm, and even supercomputers can use the Arm ISA, the biggest example being the Fugaku pre-exascale supercomputer. If vendors keep doing custom designs of Arm CPUs like these, they could make x86 a thing of the past.

Conclusion
Arm-based processors are lower-cost than Intel and AMD-based solutions, while offering comparable performance and consuming less energy. At least that's the promise. I think that servers are the first front where x86 will slowly fade away, and consumer products are second, with Apple pursuing custom chips and Microsoft already offering Arm-based laptops.

On the other hand, eulogies of x86 tend to be cyclical. Just when it appears that Arm has achieved enough performance per Watt to challenge Intel in the ultra-compact client-computing segments, Intel pushes back. Lakefield is an ambitious effort by Intel to take on Arm by combining high-efficiency and high-performance cores onto a single chip, along with packaging innovations relevant to ultra-portables. When it comes out, Lakefield could halt Arm in its tracks as it seeks out high-volume client-computing segments such as Apple's MacBooks. Lakefield has the potential to make Apple second-guess itself. It's very likely that Apple's forward-looking decisions were the main reason Intel sat down to design it.

So far, the Arm ISA is dominant in the mobile space. Phones manufactured by Samsung, Apple, Huawei and many more feature a processor with an Arm-based CPU inside. Intel tried to get into the mobile space with its x86 CPUs but failed due to their inefficiency. The adoption rate was low, and some manufacturers like Apple preferred to do custom designs. However, SoftBank didn't pay $31 billion to acquire ARM just so it could eke out revenues from licensing the IP to smartphone makers. The architecture is designed for processors of all shapes and sizes. Right now it takes companies with complete control over their product stack, such as Amazon and Apple, to get Arm to a point where it is a viable choice in the desktop and server space. By switching to Arm, vendors could see financial benefits as well. It is reported that Apple could see processor prices reduced by anywhere from 40% to 60% by going custom Arm. Amazon offers Graviton2-based instances that are lower-priced than Xeon- or EPYC-based solutions. Of course, complete control of both hardware and software comes with its own benefits, as a vendor can implement any feature that users need, without having to hope that a third party will implement it. Custom design does add upfront development costs; however, the vendor is later rewarded with a lower cost per processor.

217 Comments on x86 Lacks Innovation, Arm is Catching up. Enough to Replace the Giant?

#126
Ashtr1x
I simply don't get all these junk discussions. But here it is in one line.

x86 is a standard set; ARM is always custom BS. As a person who relies on a PC for most of my work and uses a smartphone for mobile computing needs (Android specifically, because the filesystem is accessible to the user, making it a perfect on-the-go computer vs. the locked-down iOS trash ecosystem), I simply do not see the market for ARM; the software ecosystem is just like Apple's, it will always be custom.

Qualcomm had Centriq CPUs, which were state-of-the-art ARM server chips heralded by Cloudflare and later abandoned entirely. The team which designed them was the top cream behind the SD820, a pure custom ARM core vs. the regular Cortex-based designs like the SD835/845/855/865, much like Samsung's M cores in Exynos (which sucked anyway) and Apple's cores. And that company, which prides itself on its tons of patents, *abandoned* the ARM server race. Why? There's simply no ROI in it. That corporation is heavily focused on R&D, unlike Broadcom, which came to gulp Qualcomm down, with Apple the primary beneficiary because of the patents; it's all over at EETimes in the Apple vs. Qualcomm history, where Apple is shameless as expected.

The ARM Annapurna-based Graviton 2 processors are made by Amazon directly, and they do not sell them at all; it's just an initiative to make more profit vs. purchasing from Intel and AMD. It's fully custom, I repeat. Same with Marvell as well. This ARM BS should end right now. The Win32 ecosystem which we all enjoy is built upon x86, and both are perfect in terms of legacy compatibility, making for a more powerful OS and HW solution; name one platform in both SW and HW which rivals that? Android is the closest, but with Google copying Apple's BS everywhere, from marketing to HW ("Made for Google", a trash clone of the MFi licensing program) to SW (filesystem lockdown from Android 10 and up with the new nerf called Scoped Storage, copied gestures), it's going to be a bust and a locked-down system.

This ARM glorification began when AnandTech started pushing those SPEC BS numbers, when in real-world application performance they are beaten even by the fastest phones like the OnePlus. And guess what? Those all-powerful, uber-omega-class BS A-series processors do not even have emulation on their locked-down Apple Store, and the majority use them for garbage social media. Finally, the scaling: it simply cannot scale like x86 cores, and businesses have been trying hard for a decade to dethrone x86. I'm glad it's failing; I hope it fails over and over and stops creeping into the PC area.
#127
bug
Ashtr1x: I simply don't get all these junk discussions. But here it is in one line.
In one line, ARM is as standard as x86. It gets more custom implementations, but there's a well-defined instruction set, just like for x86/x86_64.
#128
ARF
The results show Graviton 2 is pretty much on par with high-end x86 performance while being clocked significantly lower, using about half the power and a third of the silicon area of EPYC 7742. That's impressive for a cost-optimized cloud product; it doesn't need to win every single benchmark to be successful.
100+ Benchmarks Of Amazon's Graviton2 64-Core CPU Against AMD's EPYC 7742
www.phoronix.com/scan.php?page=article&item=epyc-vs-graviton2&num=1
#129
Aquinus
Resident Wat-man
ARF100+ Benchmarks Of Amazon's Graviton2 64-Core CPU Against AMD's EPYC 7742
www.phoronix.com/scan.php?page=article&item=epyc-vs-graviton2&num=1
It really depends on what you're doing, and when the geometric mean is this far off, you really have to wonder how different some of the actual benchmarks are.


This doesn't sound too bad, until someone like me, who might want to run a PostgreSQL server on it, sees those results:

#130
bug
Aquinus: It really depends on what you're doing, and when the geometric mean is this far off, you really have to wonder how different some of the actual benchmarks are.


This doesn't sound too bad, until someone like me, who might want to run a PostgreSQL server on it, sees those results:

Geometric mean is just the aggregate score; of course it's made up of better and worse numbers. You found a case where Graviton2 loses badly, but there are tests where it bests Epyc, too.
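To put rough, made-up numbers on it: three relative scores of 1.0, 1.0 and 0.1 (two ties and one badly-losing test) aggregate to

\[ \mathrm{GM} = \left(\textstyle\prod_{i=1}^{n} x_i\right)^{1/n} = (1.0 \times 1.0 \times 0.1)^{1/3} \approx 0.46, \]

so a single outlier can drag the overall mean far below parity even when most tests are even.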
But the thing is, the CPUs are not for you (or the public, in general). They're for Amazon and their use cases. Even if it's overall slower but better in perf/W, AMD (and Intel, for that matter) just lost a crapload of sales.
#131
Aquinus
Resident Wat-man
bug: Geometric mean is just the aggregate score; of course it's made up of better and worse numbers. You found a case where Graviton2 loses badly, but there are tests where it bests Epyc, too.
But the thing is, the CPUs are not for you (or the public, in general). They're for Amazon and their use cases. Even if it's overall slower but better in perf/W, AMD (and Intel, for that matter) just lost a crapload of sales.
It's not just "a case that performs badly"; it performs badly with a technology that I've been using for a decade in my career. That means something to me, even if it doesn't to you. This is kind of important when you talk about practical uses of server technology. I always look at how PostgreSQL performs because that matters to me.

Edit: As I said...
Aquinus: It really depends on what you're doing
#132
bug
Aquinus: It's not just "a case that performs badly"; it performs badly with a technology that I've been using for a decade in my career. That means something to me, even if it doesn't to you. This is kind of important when you talk about practical uses of server technology. I always look at how PostgreSQL performs because that matters to me.
Besides the academic measurement of the performance (corners have been cut that affect some workloads), what does it really mean to you? You won't buy such a CPU, and if you use it on Amazon, you pay for the machine size/performance anyway.
Of course, it's a weakness and it is to be noted. But that won't affect you.
#133
Aquinus
Resident Wat-man
bug: Besides the academic measurement of the performance (corners have been cut that affect some workloads), what does it really mean to you? You won't buy such a CPU, and if you use it on Amazon, you pay for the machine size/performance anyway.
Of course, it's a weakness and it is to be noted. But that won't affect you.
It is when you're a person who architects these kinds of systems and when you have a say in what kind of hardware it'll be running on. Capacity planning is kind of important when you want to scale.
#134
bug
Aquinus: It is when you're a person who architects these kinds of systems and when you have a say in what kind of hardware it'll be running on. Capacity planning is kind of important when you want to scale.
And are you designing servers for Amazon?
#135
Aquinus
Resident Wat-man
bug: And are you designing servers for Amazon?
No, I design and build systems that run on PostgreSQL that are hosted on cloud services like AWS, GCP, or Azure.

Edit: I've mainly worked with GCP though. I'd use AWS if I thought it was the better option.
#136
bug
Aquinus: No, I design systems that run on PostgreSQL that are hosted on cloud services like AWS, GCP, or Azure.
Well then, worst case scenario, use whatever instance you were using already.

And keep in mind tests at Phoronix are always run at default settings. That's not how you run PostgreSQL in production anyway.
I'm also thinking this could be a problem with PostgreSQL on ARM, but even if true, it makes no difference unless/until it's patched.
#137
Aquinus
Resident Wat-man
bug: And keep in mind tests at Phoronix are always run at default settings. That's not how you run PostgreSQL in production anyway.
Have you ever maintained a PostgreSQL server running in a production setting? I don't screw with the defaults if performance is adequate. One of the nice things about Postgres defaults is that they're pretty conservative, so they handle large numbers of concurrent connections pretty well OOTB with respect to how many resources each connection consumes. Mucking with things like `work_mem` can have a negative impact on things like the number of concurrent connections you can process, since you're altering the ratio of CPU to memory required per connection, and depending on what you're doing, it might not even help you at all. Also, mucking with parallel query settings might not only use more resources, but actually hurt performance. So unless you really know what you're doing and how your system operates in the wild, it's always wise to stick with defaults until there is a reason to change them.
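Back-of-the-envelope, with made-up numbers: `work_mem` is granted per sort/hash operation, per connection, so raising it multiplies out fast:

\[ 200 \text{ connections} \times 2 \text{ sorts per query} \times 64\,\mathrm{MB} \approx 25.6\,\mathrm{GB} \text{ worst case,} \]

which is how a "faster" setting can take down a box that handled the defaults just fine.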

You are far better off optimizing your database design than trying to tweak your way to success with settings.
#138
bug
Aquinus: Have you ever maintained a PostgreSQL server running in a production setting? I don't screw with the defaults if performance is adequate. One of the nice things about Postgres defaults is that they're pretty conservative, so they handle large numbers of concurrent connections pretty well OOTB with respect to how many resources each connection consumes. Mucking with things like `work_mem` can have a negative impact on things like the number of concurrent connections you can process, since you're altering the ratio of CPU to memory required per connection, and depending on what you're doing, it might not even help you at all. Also, mucking with parallel query settings might not only use more resources, but actually hurt performance. So unless you really know what you're doing and how your system operates in the wild, it's always wise to stick with defaults until there is a reason to change them.

You are far better off optimizing your database design than trying to tweak your way to success with settings.
Let's just say my experience with PostgreSQL has been very different from yours ;)
#139
Aquinus
Resident Wat-man
bug: Let's just say my experience with PostgreSQL has been very different from yours ;)
I'm starting to realize that. :laugh:

That's enough of a tangent though; my point is that when the geometric mean is that far off, you know that you're bound to find cases you care about where performance is sub-par.
#140
bug
Aquinus: I'm starting to realize that. :laugh:

That's enough of a tangent though; my point is that when the geometric mean is that far off, you know that you're bound to find cases you care about where performance is sub-par.
Goes without saying, you can find those cases even when the geometric mean is not that far off.
#141
Aquinus
Resident Wat-man
bug: Goes without saying, you can find those cases even when the geometric mean is not that far off.
They tend to occur more frequently when it is though, but you're right. It can still occur when it's a lot closer. Just not usually like this.
#142
R-T-B
yeeeeman: Arm always wanted to get into High Performance computing, whereas x86 manufacturers always wanted to get into ultra low power devices. They never quite made it, because they develop optimal tools for completely different scenarios.
Thing is, ARM has made it into HPC, and several times now. The issue is more compatibility at this point than raw ability.
FordGT90Concept: my problem with ARM is that it isn't part of the familiar Windows ecosystem.
Technically with UWP and such, it is now.
bug: If ARM was so clearly more energy efficient, all servers would be running ARM today
We are already seeing signs of a migration; it's mainly fear of the unknown and recompiling certain legacy apps holding us back.
Vayra86: Before AMD Ryzen
- My birth. 1986.

After AMD Ryzen
- My daughter's birth, 2018.

Thanks, AMD

:confused::kookoo::roll::lovetpu:
Hey, same birthyear. Let's party like it's 1986!
#143
mtcn77
bug: one of RISC's advantages was that executing any instruction within the same timeframe/cycles dramatically simplifies scheduling. By contrast, ever since Intel went pipelined (Pentium, iirc), they essentially have a sizeable silicon chunk breaking complex instructions down into simple ones, emulating what RISC does
I like favors of this kind, you see. There is something visceral about conducting research. Applied sciences rock.
I wonder what Intel will do, they seem to have the best chances of deploying a breakthrough superconductor and what not, since they are more predisposed to search outlier hardware performance cases.
#144
ARF
This 64-core Graviton2 is (much) slower than the 80-core Ampere Altra.
#145
ARF
Excellent.

ARM-based Japanese supercomputer is now the fastest in the world
A Japanese supercomputer has taken the top spot in the biannual Top500 supercomputer speed ranking. Fugaku, a computer in Kobe co-developed by Riken and Fujitsu, makes use of Fujitsu’s 48-core A64FX system-on-chip. It’s the first time a computer based on ARM processors has topped the list.

Fugaku turned in a Top500 HPL result of 415.5 petaflops, 2.8 times as fast as IBM’s Summit, the nearest competitor. Fugaku also attained top spots in other rankings that test computers on different workloads, including Graph 500, HPL-AI, and HPCG. No previous supercomputer has ever led all four rankings at once.
www.theverge.com/2020/6/23/21300097/fugaku-supercomputer-worlds-fastest-top500-riken-fujitsu-arm
#147
ARF
And yet ARM-based home systems should be perfectly viable and we must start using them and start replacing the old x86.
#148
Vya Domus
ARF: And yet ARM-based home systems should be perfectly viable and we must start using them and start replacing the old x86.
Why? Just because?
#149
ARF
Vya Domus: Why? Just because?
First, because I don't like my x86 experience with AMD Ryzen: it lags in many cases, for example when opening the Start menu, where some parts take longer to load, and it gets ugly how one part of the Start menu is visible while the other needs another second to appear.

Second, because of power consumption: imagine smartphone-sized tiny computer boxes integrated into monitors or attached to them, with the latest Snapdragon Arm CPU.
That would be perfect for things like email, 4K viewing, YouTube, Facebook, video playback of any type, light smartphone-style gaming, etc.
#150
Vya Domus
ARF: First, because I don't like my x86 experience with AMD Ryzen: it lags in many cases, for example when opening the Start menu, where some parts take longer to load, and it gets ugly how one part of the Start menu is visible while the other needs another second to appear.
Nonsense, come on, I have never experienced any lag in the Windows interface (unless obviously I am running something extremely demanding in the background). But let's assume that's the case: what makes you think an ARM chip wouldn't "lag" on the Start menu as well?
ARF: Second, because of power consumption: imagine smartphone-sized tiny computer boxes integrated into monitors or attached to them, with the latest Snapdragon Arm CPU.
That would be perfect for things like email, 4K viewing, YouTube, Facebook, video playback of any type, light smartphone-style gaming, etc.
That's what you ARM advocates do not understand. ARM SoCs are designed to be power-efficient at all costs; desktop chips aren't, they are focused on performance. A desktop ARM replacement would be "just as bad" as an x86 counterpart if you want the same performance. ARM chips are no magic; they are just optimized for something else, and looking at smartphone SoCs and extrapolating that to desktops is exceedingly dumb.

And if you don't want the performance, just use your damn phone or tablet. There is absolutely no point in having the exact same limited performance as you do with a phone, without its only advantage, which is mobility. No one wants that; you would be getting the worst of both worlds.