Yeah, different benchmarks benchmarking different things.
We've been hearing this from the RISC camp since the late 80s: x86 has too much legacy overhead, RISC is more "efficient" and perhaps "faster". Even back then this was only partially true, but it's important to understand the premises. At the time, CISC chips like the 80386 and 80486 were larger than some of their RISC counterparts, and this was before CPUs hit the power wall, so die size was the deciding factor for scaling clock speed. The reduced instruction set of RISC resulted in smaller designs, which were cheaper to make and could be clocked higher, potentially reaching higher performance in some cases. But RISC always had much lower performance per clock, so higher clock speeds were always a requirement for RISC to perform competitively.
Since the 80s, CPU designs have changed radically. Modern x86 implementations have little in common with their ancestors, thanks to design features such as pipelining, out-of-order execution, caches, prefetching, branch prediction, superscalar execution, SIMD and application-specific acceleration. As clock speeds have increased beyond 3 GHz, new bottlenecks have emerged, such as the power wall and the memory wall. x86 today is just an ISA, implemented as different microarchitectures. All major x86 implementations since the mid 90s have adopted a "RISC-like" microarchitecture, where x86 instructions are translated into architecture-specific micro-operations, a sort of hybrid approach to get the best of both worlds.
Both x86 and ARM implementations have adopted all the techniques mentioned above to reach our current performance levels. Many ARM implementations also make heavy use of application-specific instructions; along with SIMD extensions, these mean they are no longer technically pure RISC designs. Application-specific instructions are the reason you can browse the web on your Android phone with a CPU consuming ~0.5 W, watch or record H.264 video in 1080p, and so on. Some chips even have instructions to accelerate Java bytecode. If modern smartphones were pure RISC designs, they would never be usable as we know them. The same goes for Blu-ray players; open one up and you'll probably find a ~5 W MIPS CPU inside, relying either on a separate ASIC or on special instructions for all the heavy lifting. One fact still remains: RISC needs more instructions for basic operations, and since the power wall limits clock speed, RISC will stay behind until it finds a way to translate into more efficient CISC-style operations.
I want to refer to some of the findings from the "VRG RISC vs CISC study" from the University of Wisconsin-Madison:
View attachment 109779
The only real efficiency advantage we see with ARM is in low-power CPUs. But this has nothing to do with the ISA, just Intel failing to make their low-end x86 implementations scale down well, which is why some ARM designs can compete with Atom.