
NVIDIA Claims Grace CPU Superchip is 2X Faster Than Intel Ice Lake

Just to put NVIDIA's claimed SPECrate 2017_int_base score of 740 into perspective:

Power10, 120 cores: 1700 / 2170 (base / peak)

EPYC 7773X, 128 cores: 864 / 928
Xeon 8380H, 224 cores: 1570 / 1620

Ampere Altra, 160 cores: 596 (base only)
 
x86 is overloaded with too many instruction-set extensions: MMX, MMX+, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, SSE4A, x86-64, AMD-V, AES, AVX, AVX2, FMA3, SHA.
I guess this is why Intel wanted (or is still designing) a brand-new, stripped-down x86 architecture that drops all those legacy modes and puts the transistors to work on modern apps.

ARM stopped being "RISC" a long time ago; they have a ton of instructions as well. But none of it really matters: most compilers only use a really small subset of those instructions. If you look at the assembly generated from the same code for ARM and x86, it'll be almost identical.
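For example (a rough sketch, not output from any particular compiler), take a trivial function and look at what -O2 typically emits on each side:

/* add.c -- a trivial function to compare compiler output across ISAs */
int add(int a, int b) {
    return a + b;
}

/* Typical -O2 output (illustrative; exact registers depend on compiler/ABI):
 *
 *   x86-64 (System V):            AArch64:
 *     lea  eax, [rdi + rsi]         add  w0, w0, w1
 *     ret                           ret
 *
 * Neither side touches MMX/SSE/AVX or any of ARM's more exotic instructions;
 * everyday compiled code lives in a small common subset of each ISA.
 */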
 

'Reduced' doesn't mean 0, it is a comparative adjective which literally means less than the other :D
 

There's no Intel equivalent of FJCVTZS (Floating-point JavaScript Convert to Signed fixed-point, rounding toward Zero), for one.
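For the curious, here's a rough C sketch (my own illustration, not anyone's library code) of the JavaScript ToInt32 semantics that FJCVTZS bakes into a single AArch64 instruction. On x86, cvttsd2si alone behaves differently on out-of-range values (it returns 0x80000000), so emulating this takes extra compares and branches:

#include <math.h>
#include <stdint.h>

int32_t js_to_int32(double x) {
    if (!isfinite(x))                      /* NaN, +Inf, -Inf -> 0 */
        return 0;
    double t = trunc(x);                   /* round toward zero */
    double m = fmod(t, 4294967296.0);      /* reduce modulo 2^32 ... */
    if (m < 0.0)
        m += 4294967296.0;                 /* ... into [0, 2^32)    */
    uint32_t u = (uint32_t)m;
    return (int32_t)u;                     /* two's-complement wrap to signed */
}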

Intel also has a single "aesenc" instruction, while ARM has to do "aese + aesmc" (AES encrypt plus AES mix-columns, because ARM split this up into two different instructions). Things get ridiculous when we get into ARM NEON instructions. There are literally 450 ways to load or store a SIMD register in ARM NEON.

I'm not kidding: https://developer.arm.com/architect...tionhierarchiesinstructiongroup=[Load,Stride]
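Going back to the aesenc vs. aese + aesmc point, here's a rough sketch using the vendor intrinsics (the function names are mine). Note the two rounds aren't drop-in equivalents: x86 AESENC XORs the round key at the end of the round, while ARM AESE XORs a key at the start, so real code lays out its key schedule differently on each side:

#include <stdint.h>

#if defined(__AES__)                      /* x86 with AES-NI */
#include <wmmintrin.h>
__m128i aes_round_x86(__m128i state, __m128i round_key) {
    /* one instruction: ShiftRows + SubBytes + MixColumns + AddRoundKey */
    return _mm_aesenc_si128(state, round_key);
}
#endif

#if defined(__ARM_FEATURE_CRYPTO)         /* AArch64 with the crypto extension */
#include <arm_neon.h>
uint8x16_t aes_round_arm(uint8x16_t state, uint8x16_t round_key) {
    /* two instructions: AESE (AddRoundKey + SubBytes + ShiftRows),
     * then AESMC (MixColumns) */
    return vaesmcq_u8(vaeseq_u8(state, round_key));
}
#endif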

"Reduced" my ass. The reason this exists is because ARM has a bunch of hardcoded ways to read/write to memory to coincide with the stupid number of file-formats (especially video formats) that exist out there. By hard-coding an ASIC to read/write memory in the right order, ARM reduces the amount of power per load operation, making video processing (aka: Youtube) ever so slightly more power-efficient.

EDIT: If you're interested in the details: https://community.arm.com/arm-commu...osts/coding-for-neon---part-1-load-and-stores

If you want high speed and low power per instruction, you make extremely specific instructions, such as "vector load, 2-way interleave, 16-bit elements" (aka the VLD2.16 instruction). You know, not to be confused with VLD4.8 or VLD1.32. These instructions exist because video is stored in YUV444 vs. YUV420, or RGB888, or RGBA8888 formats, and video/multimedia programs have to decode them and handle all of the possibilities efficiently.
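A rough sketch of what those structured loads buy you (my own example, assuming NEON is available): one VLD3/LD3 pulls 16 interleaved RGB pixels from memory and hands back three already de-interleaved vectors, one per channel:

#if defined(__ARM_NEON)
#include <arm_neon.h>
#include <stdint.h>

/* Split 16 packed RGB888 pixels (48 bytes of R,G,B,R,G,B,...) into three
 * separate 16-byte channel buffers. The structured load does the
 * de-interleaving itself, typically as a single LD3 on AArch64. */
void split_rgb888(const uint8_t *rgb, uint8_t *r, uint8_t *g, uint8_t *b) {
    uint8x16x3_t px = vld3q_u8(rgb);
    vst1q_u8(r, px.val[0]);   /* 16 red bytes   */
    vst1q_u8(g, px.val[1]);   /* 16 green bytes */
    vst1q_u8(b, px.val[2]);   /* 16 blue bytes  */
}
#endif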

I've said it before and I'll say it again: ARM is CISC these days. It's kind of ridiculous how specific their instructions get; in this case more specific than Intel (who just implemented the "pshufb" instruction instead). GPUs probably have the most elegant solution: "shared" memory that acts as a crossbar and can implement arbitrary shuffles as needed, instead of needing hundreds of instructions to handle every combination of 128-bit 1/2/4-way interleaved 8/16/32-bit patterns.
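For contrast, a rough sketch of the x86 approach (again my own example): pshufb / _mm_shuffle_epi8 takes the permutation as data, so splitting interleaved byte pairs is one generic instruction plus a constant mask, and any other interleave pattern is just a different mask rather than a different instruction:

#if defined(__SSSE3__)
#include <tmmintrin.h>

/* Split 8 interleaved byte pairs (e.g. U,V,U,V,...) into all the even bytes
 * followed by all the odd bytes, using one byte-shuffle instruction. */
__m128i deinterleave_pairs(__m128i interleaved) {
    const __m128i mask = _mm_setr_epi8(
        0, 2, 4, 6, 8, 10, 12, 14,     /* even (U) bytes -> low half  */
        1, 3, 5, 7, 9, 11, 13, 15);    /* odd  (V) bytes -> high half */
    return _mm_shuffle_epi8(interleaved, mask);
}
#endif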
 
Very Spinal Tap... this one goes to 11.

And I mean VERY, i.e. full of shite.
 