Friday, June 1st 2018

An ARM to Rule Them All: ARM A76 To Challenge x86 Chips in the Laptop Space?

ARM has announced its next high-performance computing solution, the A76 design, which brings another large performance increase to the fledgling architecture. Touted for some time as a true contender to the aging x86 architecture, ARM has had a way of extracting impressive performance increases with each iteration of its computing designs, on the order of 20% to 40% on an almost annual basis. Compare that to the poster child of x86 computing, Intel, and its passivity-fueled 5% to 10% yearly performance increases, and the projections aren't that hard to grasp: at some point in time, ARM cores will surpass x86 in performance - at least in the mobile space.
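
To make that projection concrete, here is a minimal sketch of the compounding arithmetic. The 2x starting gap is purely a hypothetical input, not a figure from ARM or Intel; the yearly rates are simply midpoints of the ranges quoted above, and `years_to_parity` is an illustrative helper name.

```python
import math


def years_to_parity(gap: float, fast_rate: float, slow_rate: float) -> float:
    """Years until the faster-improving design closes a performance gap.

    gap       -- how many times faster the leading design is today (assumed)
    fast_rate -- yearly improvement of the trailing design, e.g. 0.30 for 30%
    slow_rate -- yearly improvement of the leading design, e.g. 0.075 for 7.5%
    """
    if gap <= 1.0:
        return 0.0  # already at parity or ahead
    if fast_rate <= slow_rate:
        return math.inf  # the gap never closes
    # Solve gap * (1 + slow)^t = (1 + fast)^t for t.
    return math.log(gap) / math.log((1.0 + fast_rate) / (1.0 + slow_rate))


# Hypothetical 2x gap; 30% and 7.5% are midpoints of the quoted ranges.
print(round(years_to_parity(2.0, 0.30, 0.075), 1))
```

Under those assumptions the gap closes in well under a decade, but the answer is very sensitive to the starting gap and to whether either side keeps its pace.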

The new ARM A76 design, to be manufactured on the 7 nm process, brings about a 35% increase in performance compared to last year's A75. This comes with 40% better power efficiency (partly from the 10 nm to 7 nm transition, the rest from architecture efficiency and performance improvements), despite the increase in maximum clocks to 3.0 GHz. With the added performance, ARM says the new A76 will deliver 4x the Machine Learning performance of its previous A75 design.
Adding to those CPU performance improvements is ARM's Mali-G76 GPU solution, which also packs a 30% increase in performance density (meaning 30% more performance for the same silicon footprint), accompanied by 30% better energy efficiency and 2.7x the Machine Learning performance for GPU-accelerated workloads. The new GPU architecture supports up to three execution engines per shader core, features a dual texture mapper, offers a configurable 2 to 4 slices of L2 cache, and supports up to 20 "cores" in devices for process and workload distribution.
This combination of CPU (with the ARM A76) and GPU (with the Mali-G76) performance improvements means that ARM is now within spitting distance of x86 solutions in the mobile space. This, together with the future performance projections should ARM be able to keep up its pace of development and performance improvement, may be one of the reasons why Microsoft has recently invested the way it did in adding ARM support to its Windows operating system. ARM solutions that employ Microsoft's OS do provide better battery life than their x86 counterparts, and the latest A76 improvements, which are seemingly more significant than any x86 performance and efficiency increases in recent times, may well mean a push for x86 towards higher levels of required performance, leaving the entry productivity and content consumption scenarios to ARM-powered devices and architectures.
Sources: AnandTech, Tom's Hardware

74 Comments on An ARM to Rule Them All: ARM A76 To Challenge x86 Chips in the Laptop Space?

#51
R-T-B
bug. I was just saying this was fixed on Linux long ago.
By being an open source product, one does tend to have access to the source code, yes.

I hate to break it to all you guys, but just because Windows is closed source doesn't mean it can't be rebuilt pretty easily for other machines. It can be. You just need... the source code! Which presumably, Microsoft has.

This isn't some *nix* magic marvel miracle. It's called recompiling. Linux solved the flexibility equation by making their box open. Microsoft's is opaque and needs the men who have special privileges to recompile the coveted MS source code to do the work. That's all.
Posted on Reply
#52
bug
R-T-B: By being an open source product, one does tend to have access to the source code, yes.

I hate to break it to all you guys, but just because Windows is closed source doesn't mean it can't be rebuilt pretty easily for other machines. It can be. You just need... the source code! Which presumably, Microsoft has.

This isn't some *nix* magic marvel miracle. It's called recompiling. Linux solved the flexibility equation by making their box open. Microsoft's is opaque and needs the men who have special privileges to recompile the coveted MS source code to do the work. That's all.
If only access to the source code would be all it takes.
Firefox, for example, was compiled for 64 bit Linux as early as 2011. The first 64 bit build for Windows was in 2015. And that was just a user-space application, not an entire OS ;)
R0H1T: You mean like Chromebooks? If Google brings more Android apps to Chrome OS then yeah it could be even more interesting.
Not really. I was thinking more like Ubuntu (or something) machines. Set one up for your parents, put the Firefox icon on the desktop and you've covered a lot of their use cases ;)
Posted on Reply
#53
R-T-B
bug: If only access to the source code would be all it takes.
Firefox, for example, was compiled for 64 bit Linux as early as 2011. The first 64 bit build for Windows was in 2015. And that was just a user-space application, not an entire OS
NT and most modern OS kernels are designed with portability in mind. It seriously isn't that hard anymore. Heck, even OS/2 was portable (there was a short-lived PowerPC version).

32-bit -> 64-bit is really a different animal though, because the memory model and all structures related to it change. It is kind of tough to jump that way.
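
As a rough illustration of why the structures change, the sketch below computes the size of a hypothetical C-style list node under 4-byte and 8-byte pointers. The `struct_size` helper and the node layout are made up for this example, and the alignment rule used (each field aligned to its own size) is the common convention, not a guarantee for every ABI.

```python
def struct_size(field_sizes):
    """Size in bytes of a C-style struct with natural alignment.

    Each field is aligned to its own size and the total is padded to the
    largest field alignment. Common convention only, not every ABI.
    """
    offset = 0
    max_align = 1
    for size in field_sizes:
        offset = (offset + size - 1) // size * size  # pad up to field alignment
        offset += size
        max_align = max(max_align, size)
    return (offset + max_align - 1) // max_align * max_align


# Hypothetical node: { void *next; int value; void *prev; }
node_32 = struct_size([4, 4, 4])  # 4-byte pointers
node_64 = struct_size([8, 4, 8])  # 8-byte pointers, padding after the int
print(node_32, node_64)  # 12 24
```

Any code that hard-codes the 12-byte layout (file formats, pointer-to-int casts, hand-written offsets) has to be found and fixed, which is a different kind of work from retargeting a compiler.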
Posted on Reply
#54
bug
R-T-B: NT and most modern OS kernels are designed with portability in mind. It seriously isn't that hard anymore. Heck, even OS/2 was portable (there was a short-lived PowerPC version).

32-bit -> 64-bit is really a different animal though, because the memory model and all structures related to it change. It is kind of tough to jump that way.
So I guess we're back where we started: Linux - works today; Windows - additional work required.
But you know what? Wild thought: let's wait and see the chips first. Because if ARM can't keep the perf/W advantage, we're beating a dead horse.
Posted on Reply
#55
lexluthermiester
bug: As software runs more and more in the cloud and the OS you use to consume it matters less and less
I am not alone in saying this: cloud computing is a non-starter for the vast majority of people. The device-local OS will likely never disappear.
There are simple reasons for this:
1. What happens when there is no network to connect to? No network and you effectively have a brick that looks pretty.
2. What happens when an update to a cloud app fails in some way? You can't roll it back to a working version because it's in the cloud and not installed local to the device.
3. Not everyone trusts the cloud for many and various reasons.

There are more reasons why computing in the cloud is only useful to a select group of users, but I digress.

Commenting on an idea hinted at earlier, I think it would be interesting to see performance numbers for Windows 10 with the x86 runtimes. And various distros of Linux/BSD running on this new SOC? There are some devs that are going to have fun with this.
R-T-B: I hate to break it to all you guys, but just because Windows is closed source doesn't mean it can't be rebuilt pretty easily for other machines. It can be.
And has been. Back in the day, Windows NT was compiled to run on MIPS and ARM, with good (not great) support for legacy software. WinNT ran smooth as silk on SGI machines of the day. At the time it was mind-blowing to see Windows running on non-PC hardware.
Posted on Reply
#56
bug
@lexluthermiester With games increasingly using the cloud to sync saves, browsers gaining the ability to remember where you were and let you pick it up on a different device, the OS does indeed matter less and less - mind you, it's still far from irrelevant. I'm not too thrilled about the cloud either (it's useful, but I'm not putting all my stuff in there), but what I meant is the whole ecosystem is starting to follow you rather than the other way around. Thus, having another architecture run your stuff will probably go mostly unnoticed these days.
Posted on Reply
#57
lexluthermiester
bug: @lexluthermiester With games increasingly using the cloud to sync saves, browsers gaining the ability to remember where you were and let you pick it up on a different device, the OS does indeed matter less and less - mind you, it's still far from irrelevant. I'm not too thrilled about the cloud either (it's useful, but I'm not putting all my stuff in there), but what I meant is the whole ecosystem is starting to follow you rather than the other way around. Thus, having another architecture run your stuff will probably go mostly unnoticed these days.
Ok, I see what you're saying. And you're right on some levels, a lot of these kinds of software are fast becoming platform/hardware agnostic. Still, I think that there will always be advantages and disadvantages to a lot of things based on hardware architecture. For example, complex software will likely always run better on CISC and simpler software will likely always run better on RISC, generally. There will of course be exceptions to this, as there always have been.
Posted on Reply
#58
R-T-B
bug: So I guess we're back where we started: Linux - works today; Windows - additional work required.
No, you are back where YOU started. I jumped in in the middle. Like a frog.
Posted on Reply
#59
londiste
Competitor to x86 in the mobile space. TDP-wise - and related, performance-wise - this is the higher end of the spectrum for ARM and lower end for x86. Betteridge's law of headlines comes to mind :)

ARM is evolving fast and heavy. What I would like to see though is what that evolution looks like in terms of transistors or die size.
By now, ARM and the whole mobile space are being driven to smaller and smaller processes; TSMC 10nm SOCs are quite widespread and 7nm is coming soon-ish. x86 and desktop are actually a bit behind.

Apple's A11 - 6-core, should be one of the (if not the) quickest mobile SOCs at the moment - has about twice the transistors of a Skylake i7. Thanks to the 10nm process, it does have an around 25% smaller die.
Posted on Reply
#60
Komshija
ARM has made major improvements over the last 10 years. I doubt that they will try to compete with the most powerful AMD and Intel CPUs, but there's no doubt that somewhere in the future (e.g. 6-7 years from now) an ARM SOC could reach current i7 or Ryzen 7 performance.

Kirin 970, Exynos 9810, Snapdragon 845 have at least the performance of upper-class desktop/laptop CPUs from 10 years ago, such as the C2D E7400 / T9600 / X2 5800+. I know that x86 and ARM architectures are two different things, but smartphone chips have become very powerful lately. Their GPUs have reached the level of mid-range AMD and NVIDIA GPUs from 7 years ago, such as the GT 545 / HD 6570.

Considering it's an SOC, it's rather amazing how such a small chip without big air fans or water cooling can reach such performance with only a fraction of power consumption.
Posted on Reply
#61
londiste
Komshija: Kirin 970, Exynos 9810, Snapdragon 845 have at least the performance of upper-class desktop/laptop CPUs from 10 years ago, such as the C2D E7400 / T9600 / X2 5800+. I know that x86 and ARM architectures are two different things, but smartphone chips have become very powerful lately. Their GPUs have reached the level of mid-range AMD and NVIDIA GPUs from 7 years ago, such as the GT 545 / HD 6570.

Considering it's an SOC, it's rather amazing how such a small chip without big air fans or water cooling can reach such performance with only a fraction of power consumption.
All on 10nm process.
- Kirin 970 is 8-core: 4*A73 @ 2.36GHz + 4*A53 @ 1.8GHz
- Exynos 9810 is 8-core: 4*Mongoose 3 @ 2.9GHz + 4*A55 @ 1.9GHz
- Snapdragon 845 is 8-core: 4*A75(custom) @ 2.8GHz + 4*A55(custom) @ 1.8GHz

As far as transistor count goes, these are all larger than current-generation Ryzen and probably also larger than 8-core Xeons. Very much up-to-date stuff.

- C2D E7400/T9600 are 2-core @ 2.8GHz at 45nm process from 2008 (10 years ago)
- X2 5800+ is 2-core @ 3.0GHz at 65nm process from 2006 (12 years ago)

Granted, both the old CPUs are slower than the 3 mobile SOCs but not as much as you would expect. Technology-wise, the gulf is huge. Over a decade of both process and architecture improvements.
Despite appearances, desktop has not been standing still and has been evolving (although more slowly than mobile).
Posted on Reply
#62
Komshija
^^ Transistor count isn't exclusively related to better performance. Keep in mind that these three are all SOCs, so it's logical for them to have more transistors than CPUs on the same process.
Posted on Reply
#63
londiste
You are right but the difference in transistor count is quite notable.
C2D has 228 million transistors. A64 X2 has 221 million.
Kirin 970 has 5.5 billion. Exynos 9810 and Snapdragon 845 should not be that much smaller. At least in the range of A11 at 4.3 billion.

What mobile SOCs have that desktop chips do not is primarily the (proper) GPU and modem.
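
To put the counts above side by side, a quick sketch of the ratios using only the figures quoted in this post. Snapdragon 845 and Exynos 9810 are left out because no number is given for them; the dictionary names are just for illustration.

```python
# Transistor counts as quoted in the post above.
old_cpus = {"C2D": 228e6, "A64 X2": 221e6}
socs = {"Kirin 970": 5.5e9, "Apple A11": 4.3e9}

# How many times more transistors each SOC has than each old desktop CPU.
for soc, soc_count in socs.items():
    for cpu, cpu_count in old_cpus.items():
        print(f"{soc} vs {cpu}: {soc_count / cpu_count:.1f}x")
```

That works out to roughly 19x to 25x, keeping in mind the SOC totals include GPU, modem and other blocks.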
Posted on Reply
#64
bug
lexluthermiester: Ok, I see what you're saying. And you're right on some levels, a lot of these kinds of software are fast becoming platform/hardware agnostic. Still, I think that there will always be advantages and disadvantages to a lot of things based on hardware architecture. For example, complex software will likely always run better on CISC and simpler software will likely always run better on RISC, generally. There will of course be exceptions to this, as there always have been.
You're not wrong, but even that software (and even software that runs on quantum computers, in the future) can be run remotely and have its output consumed over web services. Of course, the need to run some stuff locally won't go away entirely. Ever.
Posted on Reply
#65
R0H1T
londiste: Competitor to x86 in the mobile space. TDP-wise - and related, performance-wise - this is the higher end of the spectrum for ARM and lower end for x86. Betteridge's law of headlines comes to mind :)

ARM is evolving fast and heavy. What I would like to see though is what that evolution looks like in terms of transistors or die size.
By now, ARM and the whole mobile space is driven to a smaller and smaller process, TSMC 10nm SOCs are quite widespread and 7nm is coming soon-ish. x86 and desktop are actually a bit behind.

Apple's A11 - 6-core, should be one of the (if not the) quickest mobile SOC at the moment - has about twice the transistors of a Skylake i7. Thanks to 10nm process, it does have around 25% smaller die.
Apple is the current leader in this space, no one comes close (except in a handful of tasks) in their TDP range, not even the best of Intel.
londiste: You are right but the difference in transistor count is quite notable.
C2D has 228 million transistors. A64 X2 has 221 million.
Kirin 970 has 5.5 billion. Exynos 9810 and Snapdragon 845 should not be that much smaller. At least in the range of A11 at 4.3 billion.

Things that mobile SOCs have that desktop chips do not is primarily the (proper) GPU and modem.
Are we counting the total number of transistors on the SoC? If so, then it's truly not an apples vs apples comparison. For instance, we don't know how much space or how many transistors are dedicated to DSP, modem, GPU, AI et al.
Posted on Reply
#66
londiste
Just for the fun of it, I looked up specs for the GPUs in these SOCs as well as the two desktop GPUs (both at 40nm from 2011) you pointed out. They do match up well enough and are all in the same performance range.

Desktop GPUs:
- HD6570 - 480:24:8 @ 650MHz doing 624 GFLOPS
- GT545 - 144:24:16 @ 870/1740MHz doing 500 GFLOPS

Mobile GPUs:
- Kirin 970: Mali-G72 MP12 - 384:12:12 @ 746MHz doing 573 GFLOPs
- Exynos 9810: Mali-G72 MP18 - 576:18:18 @ 572MHz doing 659 GFLOPs
- Snapdragon 845: Adreno 630 - 256:24:16 @ 710MHz doing 727 GFLOPs

For comparison, let's add Intel's current main iGPU:
- HD/UHD 630 (GT2) - 192:16:8 @1200MHz doing 460 GFLOPs
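
Most of the GFLOPS figures above follow the usual peak-throughput rule of thumb: shader count times 2 FLOPs per clock (one fused multiply-add) times shader clock. The sketch below reproduces them under that assumption; `peak_gflops` is just an illustrative helper. The Adreno 630 is left out because its listed 256 ALUs at 710MHz would give about 364 GFLOPS under this rule, half of the 727 quoted, so that figure presumably counts its ALUs differently.

```python
def peak_gflops(shaders: int, clock_mhz: float, flops_per_clock: int = 2) -> float:
    """Theoretical peak: shaders * FLOPs per clock * clock (MHz -> GFLOPS)."""
    return shaders * flops_per_clock * clock_mhz / 1000.0


# (shader count, shader clock in MHz) as listed above.
gpus = {
    "HD6570": (480, 650),
    "GT545": (144, 1740),
    "Mali-G72 MP12": (384, 746),
    "Mali-G72 MP18": (576, 572),
    "HD/UHD 630": (192, 1200),
}
for name, (shaders, mhz) in gpus.items():
    print(f"{name}: {peak_gflops(shaders, mhz):.0f} GFLOPS")
```

These are theoretical peaks only; real-world performance also depends on memory bandwidth, drivers and thermal limits.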

Edit:
As far as transistor counts go:
HD6570 is Turks and 716 million transistors.
GT545 is cut down (25% of chip disabled) GF116 and 1170 million transistors.
R0H1T: Apple is the current leader in this space, no one comes close (except in a handful of tasks) in their TDP range, not even the best of Intel.
Are we counting the total number of transistors on the SoC? If so, then it's truly not an apples vs apples comparison. For instance, we don't know how much space or how many transistors are dedicated to DSP, modem, GPU, AI et al.
Yup. Intel and AMD architectures simply do not scale that low. Similarly, ARM or A11 does not scale much higher.

DSPs are large, modems should not be too large. There is quite a lot of extra stuff on the chip for sure. AI is a marketing term, probably for the GPU.
Looking at the specs above, all these GPUs (as well as Apple's in the A11) are roughly a billion transistors if not less. The DSP is probably in the same range. I would expect the CPU cores to be roughly a billion transistors, maybe a bit more for 8-core SOCs.

Transistor counts are a tricky thing. Not that SOC manufacturers are eager to disclose details but A11 is 4.3 billion, Kirin 970 is reportedly 5.5 billion, Snapdragon 845 is larger than 835 (3 billion), Exynos 9810 is unknown. Transistors and die size are related but not directly or linearly. Google gave this image for relative die sizes of Snapdragon 845, A11, and Exynos 9810: (Kirin 970 is 96.72 mm²).

Edit2:
Apple had some images in their A11 presentation highlighting separate parts of the SOC.
Based on images from presentation as well as the die picture linked above (where not all parts of GPU and CPU are highlighted) and assuming transistor density is roughly equal in A11 SOC (which it likely is), this is what the transistor counts of areas should be:
- CPU cores ~1 billion
- GPU ~0.85 billion
- ISP ~0.75 billion
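
Turning those estimates into shares of the whole chip, a small sketch using only the numbers in this post (4.3 billion total, and the three block estimates above). The uniform-density assumption is the one already stated in the post; the variable names are illustrative.

```python
TOTAL_A11 = 4.3  # billion transistors, as quoted earlier in the post

# Estimated block sizes in billions, from the list above.
blocks = {"CPU cores": 1.0, "GPU": 0.85, "ISP": 0.75}

# Share of the total taken by each block, and what is left for everything else.
shares = {name: count / TOTAL_A11 for name, count in blocks.items()}
rest = TOTAL_A11 - sum(blocks.values())

for name, share in shares.items():
    print(f"{name}: {share:.0%}")
print(f"Everything else: {rest:.2f} billion ({rest / TOTAL_A11:.0%})")
```

By this estimate the CPU cores are under a quarter of the chip, and about 1.7 billion transistors go to blocks other than the three listed.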
Posted on Reply
#67
Vya Domus
londiste: Apple's A11 - 6-core, should be one of the (if not the) quickest mobile SOCs at the moment - has about twice the transistors of a Skylake i7. Thanks to the 10nm process, it does have an around 25% smaller die.
If you look at the die shot of the A11, or any other SoC for that matter, you will realize half of the die is occupied by dedicated-function hardware blocks that have nothing to do with CPU performance. SoCs have a high transistor count and density because they are ... well, a "system on a chip", not just a CPU.
Posted on Reply
#68
Etna
R-T-B: By being an open source product, one does tend to have access to the source code, yes.

I hate to break it to all you guys, but just because Windows is closed source doesn't mean it can't be rebuilt pretty easily for other machines. It can be. You just need... the source code! Which presumably, Microsoft has.

This isn't some *nix* magic marvel miracle. It's called recompiling. Linux solved the flexibility equation by making their box open. Microsoft's is opaque and needs the men who have special privileges to recompile the coveted MS source code to do the work. That's all.
That's not true and you know it. Recompiling does squat if the code was never meant to be portable.

Look at LibreOffice, the de facto productivity suite on Linux. It only builds on x86 and x64; ARM builds require a shit ton of patches that are not part of the upstream source. It can't build on POWER properly without another shit ton of patches.

Or even better, look at the Chromium source code. Fully open, but only builds for x64. Again, ARM builds require patches not carried by upstream. And it is not buildable at all on any other architecture such as POWER or MIPS.
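
One generic example of code that "was never meant to be portable" is anything that assumes a byte order. The sketch below is only an illustration of that class of problem, not a claim about what actually breaks in LibreOffice or Chromium; it uses Python's standard `struct` module with explicit byte-order prefixes so the result is the same on any machine.

```python
import struct

value = 0x11223344
little = struct.pack("<I", value)  # little-endian layout (x86, typical ARM setups)
big = struct.pack(">I", value)     # big-endian layout (some POWER/MIPS setups)

# Non-portable habit: decode data assuming the byte order of the machine
# the code was originally written on.
wrong = struct.unpack("<I", big)[0]

# Portable habit: state the byte order of the data explicitly.
right = struct.unpack(">I", big)[0]

print(hex(wrong), hex(right))  # 0x44332211 0x11223344
```

A recompile alone does not catch this kind of assumption; someone has to find each one and patch it.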
Posted on Reply
#69
R-T-B
Etna: That's not true and you know it. Recompiling does squat if the code was never meant to be portable.
It was a simplification. At any rate, the NT kernel is designed to be portable.
Etna: ARM builds require a shit ton of patches that are not part of the upstream source.
Due to its dependence on closed binary Java, but meh... all your examples are really just demonstrating why dependency hell is bad. I will acknowledge using platform-exclusive features tends to make code... platform exclusive.
Posted on Reply
#70
bug
R-T-B: It was a simplification. At any rate, the NT kernel is designed to be portable.



Due to its dependence on closed binary Java, but meh... all your examples are really just demonstrating why dependency hell is bad. I will acknowledge using platform-exclusive features tends to make code... platform exclusive.
I think you're not wrong, but you're reducing it all to Windows. With Windows RT behind them, Microsoft probably has a pretty good grasp on porting the OS. But a shiny new laptop with an OS and no programs to run isn't much of a laptop, is it? As platform agnostic as the Windows kernel may be, virtually every piece of Windows software worth a damn is still closed source.

For bonus points, I'm not sure why you think LibreOffice depends on the closed version of Java. OpenJDK has pretty much the same features (and has been compiled for platforms I wouldn't think were capable of running it).
Posted on Reply
#71
R-T-B
bug: I think you're not wrong, but you're reducing it all to Windows. With Windows RT behind them, Microsoft probably has a pretty good grasp on porting the OS.
The original NT code had several non-x86 ports as far back as NT 3.1 as well, including things as exotic as DEC's Alpha chip. Heck, OS/2 (which was originally supposed to BE what NT is today, the 16-bit Windows replacement) shares some NT code in its core (and actually, as late as XP, NT kernels could still execute OS/2 command line apps, so NT most certainly has OS/2 code as well), and even it is portable due to Microsoft and IBM working together on it and both agreeing portability was the future. Yes, there are actually PowerPC builds of OS/2 around somewhere...

Anyhow, when Windows NT was being developed, RISC was viewed as the new cool thing and x86 as "bloated" and "slow." As such, hedging their bets on both platforms, Microsoft built NT (upon which all modern Windows versions are based) with portability as a top priority. That legacy remains to this day.

But don't take my word for it. Google is your friend.



Sorry for the history lesson. But you are absolutely right about open vs closed source apps.

Yeah, good points on OO as well, have a thanks. I thought it depended on the closed source Java JIT compiler, but it seems I am wrong.
Posted on Reply
#73
lexluthermiester
bug: Of course, the need to run some stuff locally won't go away entirely. Ever.
Exactly. A great many will never like, enjoy or want cloud apps. Just too restricting.
R-T-B: But don't take my word for it. Google is your friend.

I know there was an ARM based version made. Having a heck of a time finding it though. Maybe I saw a proof of concept early version?
Posted on Reply
#74
Yazz
Gasaraki: No, because this is projected to not even be faster than the Apple A11. Can't rule s***.



Yes, Windows 10 can run on ARM processors now but the performance is dog slow.
You've obviously never sat down and used an ARM Windows PC for a minute or more. Granted, it might be because it doesn't have the huge amount of legacy 32-bit Windows support as part of the OS, but compared to an x86_64 Windows edition everything seems like it's got several more cores running at twice the MHz. The responsiveness and smooth parallel processing made me want to switch all my machines forever and not look back - but then most of my software wouldn't work. =(
Vayra86: It'll probably cost an ARM and a leg.

:fear:
Hasn't Apple had plans for a while to supplement its PCs and laptops with its own ARM-based CPUs?
Posted on Reply