Tuesday, April 3rd 2018
Apple to End the x86 Mac Era in 2020
One of the biggest tech stories of the 2000s was Apple's transition from the PowerPC machine architecture to Intel x86, which brought the Mac closer to being the PC it so loathed. The transition wasn't smooth: besides the operating system, practically every third-party software developer (e.g., Adobe) had to rewrite their software for the new architecture, with new APIs and new runtime environments. Apple could be bringing about a similar change before the turn of the decade.
Apple already builds its own application processors for iOS devices, and some of the newer chips such as the A11 Bionic and A10 Fusion have already reached the performance levels of entry-level x86 desktop processors. It's only a matter of time before Apple can build its own SoCs for Macs (that's not just iMac desktops, but also Mac Pro workstations, MacBook, MacBook Air, and MacBook Pro). That timeline is expected to be around 2020. Since these chips are based on the ARM machine architecture, they will mandate a major transformation of the entire software ecosystem Apple built over the past decade and a half. Intel shares dropped by as much as 9.2 percent at the first reports of this move.
Source: Bloomberg
48 Comments on Apple to End the x86 Mac Era in 2020
Instruction sets are overrated and mean little... All the big names are more than mature enough now. The microarchitecture behind them matters more.
ARM designs will always have inherent disadvantages which just can't be mitigated due to their RISC nature. Instruction sets do matter, quite a lot.
That being said, either Apple will be shooting themselves in the foot by attempting to become independent in a way which simply isn't fit for their current product stack, or they will just change said products, aka turning them into glorified iOS devices. And with a potential overhead, which can be significant in some cases. He is right, x86 software on ARM will be atrocious.
What you just said made no sense. If it's being translated to RISC, how in the world can a CISC instruction run as anything but RISC at final runtime? If it was going to be busy during a memory access, it will be busy. As in, it's all the same at the end game; it's just easier on the compiler, if anything, to "think" in CISC.
Illustration: I wrote a multiplication macro for my NES using a very light derivative of a homebrew BASIC someone made for it way back when. It multiplied using the old-school "additive method," adding the first number over and over, the number of times set by the second. It could be called in one line, but it still tied up the CPU for a godawful length of time. Conceptually, this was a "CISC" instruction of sorts, but the backend RISC was holding it up.
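Roughly what that macro boils down to, written as a C sketch (the function name and types are mine, not from the original post): the multiply looks like a single call at the source level, but underneath it is a long chain of simple adds, which is why it ties up the CPU.

```c
#include <stdint.h>

/* Additive-method multiply: one "CISC-like" call at the surface,
 * but underneath it is just b repeated additions. */
uint16_t mul_additive(uint8_t a, uint8_t b)
{
    uint16_t result = 0;
    for (uint8_t i = 0; i < b; i++) {
        result += a;   /* one simple add per iteration of the multiplier */
    }
    return result;     /* runtime grows with the value of b */
}
```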
RISC is silicon efficient; CISC is process efficient.
Case in point, ARM has no instructions dedicated to virtual machines. I'm pretty sure that Windows 10 natively runs in a virtual machine on systems that support it for security reasons (you can't disable it).
infocenter.arm.com/help/topic/com.arm.doc.dui0068b/DUI0068.pdf
When processing SIMD, some x86 instructions hijack the FPUs and ALUs. Sure, ALUs and FPUs only understand a reduced set of instructions but it's the instruction decoder at the top of the processor that determines RISC/CISC, not components inside.
Most intrinsics (that I'm familiar with at least) are closely or directly mapped to assembly instructions. If the specific ARM implementation has a comparable extension with matching parameters, then surely the compiler could convert them (in theory), but extensions like AVX are closely tied to how AVX is implemented on x86 designs; an automatic translation to another vector extension could result in sub-optimal use or even a performance loss versus normal instructions.
It's important to understand that intrinsics are usually only used in the most performance critical part of a program's code. When used properly, the alignment of data in memory is meticulously designed in order to scale well with those specific intrinsics. Switching to another set of intrinsics may require realignment of data structures and code logic to get maximum performance. Vector extensions are especially sensitive, and using these well or not can easily make a >10× difference in performance.
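To make that mapping concrete, here is a minimal sketch (my own example, not from the thread): the same four-float addition written with SSE intrinsics and with ARM NEON intrinsics. Each intrinsic compiles down to essentially one machine instruction, which is why porting between extensions means rewriting these sections rather than relying on automatic translation.

```c
#include <stdio.h>

#if defined(__SSE__)
#include <xmmintrin.h>
/* x86: _mm_add_ps maps essentially one-to-one to an ADDPS instruction. */
static void add4(const float *a, const float *b, float *out)
{
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));
}
#elif defined(__ARM_NEON)
#include <arm_neon.h>
/* ARM: vaddq_f32 maps essentially one-to-one to a vector FADD instruction. */
static void add4(const float *a, const float *b, float *out)
{
    float32x4_t va = vld1q_f32(a);
    float32x4_t vb = vld1q_f32(b);
    vst1q_f32(out, vaddq_f32(va, vb));
}
#else
/* Scalar fallback so the sketch compiles anywhere. */
static void add4(const float *a, const float *b, float *out)
{
    for (int i = 0; i < 4; i++)
        out[i] = a[i] + b[i];
}
#endif

int main(void)
{
    float a[4] = {1, 2, 3, 4}, b[4] = {5, 6, 7, 8}, c[4];
    add4(a, b, c);
    printf("%.1f %.1f %.1f %.1f\n", c[0], c[1], c[2], c[3]);
    return 0;
}
```

Note that there is no 8-wide NEON equivalent of a 256-bit AVX register, so code written around AVX widths may have to be restructured, not just re-spelled with different intrinsic names.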
Compilers are, for instance, very good at optimizing small things that come down to syntax, like unrolling small loops, rearranging some accesses, etc. But they can never deal with the "big stuff", like scaling problems resulting from your design choices.
If you want to hear some good explanations about how efficient code works, take a look at these:
CppCon 2014: Mike Acton "Data-Oriented Design and C++"
code::dive conference 2014 - Scott Meyers: CPU Caches and Why You Care
Even if you don't grasp all the details, it should still be an eye-opener as to how much the structure of the code matters. Yes, it always comes down to the skills of the coder and the understanding of the problem to be solved.
As Vya Domus mentions, vector instructions exploit data-level parallelism. No compiler can ever optimize your code to create this parallelism; you have to build tightly packed data structures which match the way you are going to process them.
Let's say you have 100 calculations in the form A + B = C. Usually this will be compiled into two instructions fetching A and B into registers, one instruction to do the addition, and then one instruction to copy the sum back to memory. If you want to exploit AVX, you'll first have to lay out your data structures,
not like this: A0 B0 C0 A1 B1 C1 …
But like this:
A0 A1 A2 A3 A4 …
B0 B1 B2 B3 B4 …
C0 C1 C2 C3 C4 …
If you are using AVX2 on 32-bit floats, you can compute 8 additions per cycle. But you can't do this if your data is fragmented, which it might be in a typical OOP structure with data scattered across hundreds of objects.
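A minimal sketch of that structure-of-arrays layout in C (the array names and size are my own, and I assume GCC/Clang alignment attributes): with A, B, and C each packed contiguously and 32-byte aligned, one AVX intrinsic adds eight 32-bit floats per instruction, with a scalar loop mopping up the tail.

```c
#include <immintrin.h>

#define N 100   /* 100 is not a multiple of 8, so a scalar tail is needed */

/* Structure-of-arrays layout: A, B and C packed contiguously,
 * 32-byte aligned so the 256-bit AVX loads/stores can use aligned access. */
static float A[N] __attribute__((aligned(32)));
static float B[N] __attribute__((aligned(32)));
static float C[N] __attribute__((aligned(32)));

void add_all(void)
{
    int i = 0;
    for (; i + 8 <= N; i += 8) {                 /* 8 additions per AVX instruction */
        __m256 va = _mm256_load_ps(&A[i]);
        __m256 vb = _mm256_load_ps(&B[i]);
        _mm256_store_ps(&C[i], _mm256_add_ps(va, vb));
    }
    for (; i < N; i++)                           /* remaining N % 8 elements */
        C[i] = A[i] + B[i];
}
```

With the interleaved A0 B0 C0 A1 B1 C1 … layout, the same loop would need gather or shuffle work before the add, which is exactly the kind of restructuring a compiler won't do for you.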
Applications using intrinsics may only use them in a few functions (typically some "tight" loops), but the data structure might be shared with major parts of the codebase. So the developers usually have to be aware of the constraints even when they are not touching these parts of the code.
I don't know what Vya Domus means by intrinsics being sparingly used. They are used in many applications that matter for productivity, like Adobe programs, (3D) modelers, simulators, encoders, etc., and in essential libraries for compression and the like. They're rarely used in games, and even when used, they "never" impact rendering performance. But as I mentioned, even when they're used, it's usually just a small percentage of the code.
To get back on topic; many performance critical applications can't be recompiled to another architecture and maintain acceptable performance without optimizations.
RISC is better and faster than CISC.
And x86/AMD64 CPUs are RISC nowadays.