Friday, July 17th 2020

Linux Performance of AMD Rome vs Intel Cascade Lake, 1 Year On

Michael Larabel over at Phoronix posted an extremely comprehensive analysis of the performance differential between AMD's Rome-based EPYC and Intel's Cascade Lake Xeons one year after release. The battery of tests, comprising more than 116 benchmark results, pits a Xeon Platinum 8280 2P system against an EPYC 7742 2P one. Both systems were benchmarked under Ubuntu 19.04, chosen as the "one year ago" baseline, and again under the newer Linux software stack (Ubuntu 20.10 daily + GCC 10 + Linux 5.8).

The benchmark conclusions are interesting. For one, Intel gained more ground than AMD over the course of the year: the Xeon platform picked up 6% performance across releases, while AMD's EPYC gained just 4% over the same period. Even so, AMD's system remains an average of 14% faster across all tests than the Intel platform, which speaks to AMD's silicon superiority. Check some benchmark results below, but follow the source link for the full rundown.
Source: Phoronix

33 Comments on Linux Performance of AMD Rome vs Intel Cascade Lake, 1 Year On

#26
efikkan
Vya Domus: I hope not; very wide SIMD is a fallacy in modern computer architecture design. SIMD was introduced in the days when other massively parallel compute hardware didn't exist and everyone thought frequency/number of transistors would just scale forever with increasingly lower power consumption.
The point of SIMD is to do the same logic across a larger vector of data, saving a lot of unnecessary logic.
Vya Domus: GPUs make CPU SIMD redundant. I can't think of a single application that couldn't be scaled up from x86 AVX to CUDA/OpenCL; in fact, the latter are way more robust anyway.
As I said in post #21, it has to do with overhead.
AVX is like having a tiny "GPU" with practically zero overhead, mixed with other instructions across the execution ports, while an actual GPU is a separate processor that costs you thousands of clock cycles to talk to and has its own memory system. Skipping between the CPU and the GPU every other instruction is never going to be possible; even if the GPU were on-die, there would always be a work-size threshold below which it isn't worth sending anything to the GPU. This should be obvious to anyone who has developed with this technology.
AVX and GPUs are both SIMD, but SIMD at different scales solving different problems.
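
To make the "zero overhead" point concrete, here is a minimal sketch in C++ with AVX intrinsics (illustrative only; compile with -mavx). The vector instruction issues on the core's own execution ports, so it mixes freely with the surrounding scalar code with no device handoff of any kind:

```cpp
#include <immintrin.h> // AVX intrinsics
#include <cstdio>

int main() {
    alignas(32) float a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    alignas(32) float b[8] = {8, 7, 6, 5, 4, 3, 2, 1};
    alignas(32) float c[8];

    // One 256-bit instruction adds eight floats at once. It runs inline
    // with scalar code on the same core; no transfer, no kernel launch.
    __m256 va = _mm256_load_ps(a);
    __m256 vb = _mm256_load_ps(b);
    _mm256_store_ps(c, _mm256_add_ps(va, vb));

    for (int i = 0; i < 8; ++i) printf("%.0f ", c[i]); // prints "9" eight times
    printf("\n");
    return 0;
}
```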
Posted on Reply
#27
Vya Domus
GoldenX: You add too much latency over PCIe.
That's inconsequential for many data-parallel algorithms. If the data set is non-trivial, it won't matter that it took 50 ms or whatever to move a couple of GBs to a GPU if you are then going to iterate over it using thousands of threads; in fact, that was the entire philosophy behind GPGPU. These days there is practically no worthwhile data-parallel problem that a GPU wouldn't be able to solve faster. If the host-device latency mattered that much, no one would be using GPUs for compute. Just to prove a point, I wrote a solver for a particular type of linear system, and by the time the data was something like 8-10 MB the GPU version was already faster, including the time it took for the data to be transferred over. Keep in mind that's not even as big as the CPU cache, and no one has any use for a linear system that small.
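
The economics here can be sketched with a back-of-envelope model (C++ below). Every constant is an assumption picked for illustration, not a measurement from the actual solver; with these particular numbers the break-even happens to land in the same single-digit-MB range:

```cpp
#include <cstdio>

// Rough break-even model for GPU offload: fixed setup cost (allocation,
// launches, sync) plus PCIe transfer, amortized against faster compute.
// All constants are illustrative assumptions, not measurements.
int main() {
    const double pcie_bw   = 12e9;   // effective PCIe bandwidth, bytes/s (assumed)
    const double setup_s   = 1e-3;   // fixed setup/sync cost, seconds (assumed)
    const double cpu_flops = 100e9;  // sustained CPU throughput, FLOP/s (assumed)
    const double gpu_flops = 5e12;   // sustained GPU throughput, FLOP/s (assumed)
    const double intensity = 20.0;   // FLOPs per byte of data (assumed)

    for (double mb = 1; mb <= 64; mb *= 2) {
        double bytes  = mb * 1e6;
        double work   = bytes * intensity;
        double cpu_ms = work / cpu_flops * 1e3;
        double gpu_ms = (bytes / pcie_bw + setup_s + work / gpu_flops) * 1e3;
        printf("%5.0f MB: CPU %7.2f ms | GPU %7.2f ms%s\n",
               mb, cpu_ms, gpu_ms, gpu_ms < cpu_ms ? "  <- GPU ahead" : "");
    }
    return 0;
}
```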

Posted on Reply
#29
GoldenX
Vya Domus: That's inconsequential for many data-parallel algorithms. If the data set is non-trivial, it won't matter that it took 50 ms or whatever to move a couple of GBs to a GPU if you are then going to iterate over it using thousands of threads; in fact, that was the entire philosophy behind GPGPU. These days there is practically no worthwhile data-parallel problem that a GPU wouldn't be able to solve faster. If the host-device latency mattered that much, no one would be using GPUs for compute. Just to prove a point, I wrote a solver for a particular type of linear system, and by the time the data was something like 8-10 MB the GPU version was already faster, including the time it took for the data to be transferred over. Keep in mind that's not even as big as the CPU cache, and no one has any use for a linear system that small.

For our use case in yuzu, it would be too much latency, and we are already very bandwidth-limited. FMA and AVX2 already boost speed by a nice 40%; AVX512 would help a lot, far more than using GPGPU capabilities. But we could do ASTC decoding via OpenCL/CUDA if desktop GPUs never add support; that would beat any CPU instruction set.
Posted on Reply
#30
HD64G
Even FX CPUs on Linux were much closer in performance to the Intel CPUs of that era than they were on Windows. Intel's compilers on Windows made Intel look better than it was, until recently, when AMD invested heavily in Ryzen and the software platform around Windows.
Posted on Reply
#31
Vya Domus
GoldenX: For our use case in yuzu, it would be too much latency, and we are already very bandwidth-limited. FMA and AVX2 already boost speed by a nice 40%; AVX512 would help a lot, far more than using GPGPU capabilities.
See, that's the thing: that means AVX512 wouldn't help at all. 40% scaling with AVX2 is already pretty bad, and it indicates that the limiting factor is not compute but memory bandwidth or branching. That's the problem with wider SIMD: it needs more bandwidth, which is already scarce.
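
A roofline-style estimate shows why (C++ sketch below; the bandwidth and clock figures are assumptions). For a streaming kernel like y[i] = a*x[i] + y[i], the memory ceiling is identical at every SIMD width, so widening the vectors only raises a compute peak the kernel can never reach:

```cpp
#include <algorithm>
#include <cstdio>

// Roofline-style estimate for a streaming kernel y[i] = a*x[i] + y[i]:
// 2 FLOPs per element, 12 bytes of DRAM traffic per float element
// (load x, load y, store y). Bandwidth and clock numbers are assumed,
// and FMA throughput is assumed at every width for simplicity.
int main() {
    const double mem_bw   = 40e9; // DRAM bandwidth available to the core(s), bytes/s (assumed)
    const double clock_hz = 4e9;  // core clock (assumed)
    const double flops_per_elem = 2.0, bytes_per_elem = 12.0;

    const double bw_ceiling = mem_bw * flops_per_elem / bytes_per_elem;
    for (int lanes : {4 /* SSE */, 8 /* AVX2 */, 16 /* AVX-512 */}) {
        double compute_peak = clock_hz * lanes * 2; // FMA = 2 FLOPs/lane/cycle
        printf("%2d lanes: compute peak %6.1f GFLOP/s, attainable %4.1f GFLOP/s\n",
               lanes, compute_peak / 1e9, std::min(compute_peak, bw_ceiling) / 1e9);
    }
    return 0;
}
```

With these assumed numbers the attainable rate is pinned at about 6.7 GFLOP/s at every width: the AVX-512 peak is sixteen times higher than what the memory system can feed.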
Posted on Reply
#32
GoldenX
Don't worry, the industry will "solve it" with 8GHz RAM sticks.
Posted on Reply
#33
Vya Domus
They wish. Even if you get faster memory, the latency gets bigger; GPUs are immune to that because of the way their threads are scheduled. It's a losing battle anyway, because the number of cores will keep increasing and you'll never have enough bandwidth for super-wide SIMD. Companies like Intel will have to accept that CPUs should remain CPUs and stop emulating GPUs. Actually, they probably already have; I'm willing to bet we won't see anything past 512-bit SIMD for a very, very long time.
Posted on Reply