Wednesday, March 4th 2020
Ampere Computing Uncovers 80 Core "Cloud-Native" Arm Processor
Ampere Computing, a startup focused on Arm-based processors for HPC and cloud applications, today announced its first 80-core "cloud-native" processor based on the Arm Instruction Set Architecture (ISA). The new Ampere Altra CPU is meant for hyperscalers like Amazon AWS, Microsoft Azure, and Google Cloud. Built on TSMC's 7 nm semiconductor manufacturing process, the Altra uses a monolithic die to achieve maximum performance. The CPU implements Arm's v8.2+ instruction set and uses the Neoverse N1 platform as its core, to be ready for any data center workload. It also borrows a few security features from v8.3 and v8.5, namely the hardware mitigations for speculative attacks.
When it comes to the core itself, the CPU runs at 3.0 GHz and has some very interesting specifications. The core is a 4-wide superscalar design with Out of Order Execution (OoOE), which Ampere refers to as "aggressive", meaning that there is a lot of data throughput going on. Each core has 64 KB of L1D and 64 KB of L1I cache, along with 1 MB of L2 cache. For system-level cache, there is 32 MB of L3 available to the SoC. All of the caches have Error-correcting code (ECC) built in, giving the CPU a much-needed feature. There are two 128-bit wide Single Instruction Multiple Data (SIMD) units for parallel processing. There is no mention of whether they implement Arm's Scalable Vector Extensions (SVE).
The SoC supports 8-channel DDR4-3200 memory, with up to 4 TB of memory per socket. Given that the CPU is also available in dual-socket configurations, you can get up to 8 TB of RAM in your system. Each CPU provides 128 PCIe 4.0 lanes. However, if you opt for a dual-socket configuration, 32 of each CPU's lanes are used for CPU-to-CPU communication. That leaves 96 lanes per socket, for a total of 192 PCIe 4.0 lanes in the dual-socket configuration, which is a decent amount. Of course, if a system like this wants to be a solid choice for hyperscalers, there needs to be a cache coherency protocol in place. Ampere is implementing the CCIX protocol here, which runs over the PCIe lanes and provides speeds of 25 GB/s per x16 slot. The TDP of the whole SoC ranges from 45 W to 210 W, depending on the core count. Exact details on available SKUs are not yet known.
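The totals quoted above can be checked with a short back-of-the-envelope sketch. The constants come from the figures in the article; the function names are hypothetical, and the peak memory bandwidth figure is not from the article. It rests on the assumption of standard 64-bit DDR4 channels at 3200 MT/s, so treat it as a theoretical ceiling rather than a measured number.

```python
# Back-of-the-envelope totals for the Ampere Altra, using the article's figures.
# Function names are illustrative only; nothing here is an Ampere tool or API.

CORES = 80
L1D_KB_PER_CORE = 64
L1I_KB_PER_CORE = 64
L2_MB_PER_CORE = 1
L3_MB = 32
PCIE_LANES_PER_SOCKET = 128
LANES_RESERVED_FOR_SOCKET_LINK = 32  # per CPU, only in dual-socket systems
MAX_MEMORY_TB_PER_SOCKET = 4
MEMORY_CHANNELS = 8
TRANSFERS_PER_SECOND_M = 3200        # DDR4-3200, in mega-transfers per second
BYTES_PER_TRANSFER = 8               # ASSUMPTION: 64-bit wide DDR4 channel


def usable_pcie_lanes(sockets: int) -> int:
    """PCIe lanes left for devices in a 1- or 2-socket system."""
    if sockets == 1:
        return PCIE_LANES_PER_SOCKET
    if sockets == 2:
        return 2 * (PCIE_LANES_PER_SOCKET - LANES_RESERVED_FOR_SOCKET_LINK)
    raise ValueError("the article only describes 1- and 2-socket configurations")


def max_memory_tb(sockets: int) -> int:
    """Maximum supported RAM across all sockets, in TB."""
    return sockets * MAX_MEMORY_TB_PER_SOCKET


def total_cache_mb() -> float:
    """Sum of L1, L2 and L3 capacity on one 80-core SoC, in MB."""
    l1_mb = CORES * (L1D_KB_PER_CORE + L1I_KB_PER_CORE) / 1024
    l2_mb = CORES * L2_MB_PER_CORE
    return l1_mb + l2_mb + L3_MB


def peak_memory_bandwidth_gbs() -> float:
    """Theoretical peak bandwidth per socket in GB/s (assumed channel width)."""
    return MEMORY_CHANNELS * TRANSFERS_PER_SECOND_M * BYTES_PER_TRANSFER / 1000


if __name__ == "__main__":
    print("Dual-socket PCIe lanes:", usable_pcie_lanes(2))
    print("Dual-socket max RAM (TB):", max_memory_tb(2))
    print("On-chip cache per SoC (MB):", total_cache_mb())
    print("Peak memory bandwidth per socket (GB/s):", peak_memory_bandwidth_gbs())
```

Under these figures the dual-socket lane count works out to 2 × (128 − 32) = 192, matching the number in the article.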
Source:
AnandTech
30 Comments on Ampere Computing Uncovers 80 Core "Cloud-Native" Arm Processor
Third-party benchmarks will tell us more and let us compare it with x86 performance. Let's focus on performance for now and see where we get with this. Besides, Microsoft has a version of Windows designed just for ARM. What does that tell you?
Adopting a different platform in this space is incredibly difficult. Nobody wants emulation, which usually means a huge performance loss.
The server space has its own stuff; IBM has some nice CPUs too.
ARM's strength has never been performance. It has _enough_ performance and can keep both core size as well as power/heat low thanks to that.
Intel gave up on Atoms when they deemed the market to be too low-margin (and Atoms were too expensive). They keep on making Atoms for some niches but not as aggressively.
AMD had Jaguar that powers consoles to this day but the line was abandoned after Puma in 2015.
There are very, very few useful comparisons between ARM and x86. ARM does well in low power and does well in server workloads where CPUs are idle much of the time. ThunderX2 tests showed much of that for server CPUs. Mainstream x86 CPUs are bigger, hungrier and more powerful, competing in a different space.
www.anandtech.com/show/14892/the-apple-iphone-11-pro-and-max-review/4
I understand the SPEC results aren't exactly "IPC", but we have no desktop apps running on iOS, so that comparison will have to wait at least another 6 months or so. In fact it's quite likely that Axx will outperform the best from Intel on their upcoming (rumored) MacBooks, especially with the deep integration Apple already has with iOS and their custom ARM cores.
You can see there is a benefit from multithreading, and in my eyes this is the way you can actually gain performance. We can't rely on frequency any longer. Maybe AMD's or Intel's cores are faster than ARM's at the moment, but that doesn't mean ARM's cores can't get faster. There's plenty of room for improvement, unlike in x86, as we see with every new processor iteration being released.
Sure, Microsoft has had ARM Windows before, CE and then RT, but this one, Windows 10 for ARM, is more promising in my eyes. Besides, back in the '90s, x86 was gaining far more performance with each CPU release than it does now, since it was developing pretty fast. Maybe that is why ARM was left out of further development, but it hasn't been forgotten. ARM is starting now and it is growing. Apparently ARM has none of the trouble Intel has with getting more cores into its CPUs, so it is a matter of time before ARM surpasses Intel or even AMD. This architecture is more flexible than x86, that's for sure. Will it end up in the desktop market? I don't know, but I know that with ARM it is possible, and there are also benefits to this transition.
Look at it this way.
x86 computers nowadays are mostly servers and desktops. With some tweaks to power, x86 CPUs can be put in a laptop, or in low-power laptops when the TDP is right.
ARM is the mobile segment's power-efficiency master. With some tweaks to squeeze out more performance by giving it more power, it may become a desktop processor.
It's like both started in different segments and want to spread into other markets, and some things have to be adjusted in terms of performance for each to find its way into those markets.
So ARM is possible for a desktop PC now. Who knows if it will be beneficial to use ARM instead of x86 in the future. Whether it happens or not, time will tell. If it does happen, we will have to wait and see what it brings and how it progresses afterwards.
Intel was never in the mobile market. Atom was an attempt against ARM but it did not work too well.
www.anandtech.com/show/10288/intel-broxton-sofia-smartphone-socs-cancelled