
NVIDIA Claims Grace CPU Superchip is 2X Faster Than Intel Ice Lake

Just to put NVIDIA's claimed SPECrate 2017_int_base score of 740 into perspective:

Power10, 120 cores: 1700 / 2170 (base / peak)

EPYC 7773X, 128 cores: 864 / 928
Xeon 8380H, 224 cores: 1570 / 1620

Ampere Altra, 160 cores: 596 (base only)
 
x86 is overloaded with too many instruction-set extensions: MMX, MMX+, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, SSE4A, x86-64, AMD-V, AES, AVX, AVX2, FMA3, SHA.
I guess this is why Intel wanted (or is still designing) a brand-new, stripped-down x86 architecture that drops all those legacy modes and puts the transistors to work on modern apps.

ARM stopped being "RISC" a long time ago; they have a ton of instructions as well. But none of it really matters: most compilers only use a really small subset of those instructions. If you look at the assembly generated from the same code for ARM and x86, it'll be almost identical.
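For example (a rough sketch, not output from any particular compiler), take a trivial function and look at what -O2 typically emits on each side:

/* add.c -- a trivial function to compare compiler output across ISAs */
int add(int a, int b) {
    return a + b;
}

/* Typical -O2 output (illustrative; exact registers depend on compiler/ABI):
 *
 *   x86-64 (System V):            AArch64:
 *     lea  eax, [rdi + rsi]         add  w0, w0, w1
 *     ret                           ret
 *
 * Neither side touches MMX/SSE/AVX or any of ARM's more exotic instructions;
 * everyday compiled code lives in a small common subset of each ISA.
 */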
 

'Reduced' doesn't mean 0, it is a comparative adjective which literally means less than the other :D
 

There's no Intel equivalent of FJCVTZS (Floating-point JavaScript Convert to Signed fixed-point, rounding toward Zero), for one.
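For the curious, here's a rough C sketch (my own illustration, not anyone's library code) of the JavaScript ToInt32 semantics that FJCVTZS bakes into a single AArch64 instruction. On x86, cvttsd2si alone behaves differently on out-of-range values (it returns 0x80000000), so emulating this takes extra compares and branches:

#include <math.h>
#include <stdint.h>

int32_t js_to_int32(double x) {
    if (!isfinite(x))                      /* NaN, +Inf, -Inf -> 0 */
        return 0;
    double t = trunc(x);                   /* round toward zero */
    double m = fmod(t, 4294967296.0);      /* reduce modulo 2^32 ... */
    if (m < 0.0)
        m += 4294967296.0;                 /* ... into [0, 2^32)    */
    uint32_t u = (uint32_t)m;
    return (int32_t)u;                     /* two's-complement wrap to signed */
}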

Intel also has a single "aesenc" instruction, while ARM has to do "aese + aesmc" (AES encrypt plus AES mix-columns, because ARM split this up into two different instructions). Things get ridiculous when we get into ARM NEON instructions. There are literally 450 ways to load or store a SIMD register in ARM NEON.

I'm not kidding: https://developer.arm.com/architect...tionhierarchiesinstructiongroup=[Load,Stride]
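Going back to the aesenc vs. aese + aesmc point, here's a rough sketch using the vendor intrinsics (the function names are mine). Note the two rounds aren't drop-in equivalents: x86 AESENC XORs the round key at the end of the round, while ARM AESE XORs a key at the start, so real code lays out its key schedule differently on each side:

#include <stdint.h>

#if defined(__AES__)                      /* x86 with AES-NI */
#include <wmmintrin.h>
__m128i aes_round_x86(__m128i state, __m128i round_key) {
    /* one instruction: ShiftRows + SubBytes + MixColumns + AddRoundKey */
    return _mm_aesenc_si128(state, round_key);
}
#endif

#if defined(__ARM_FEATURE_CRYPTO)         /* AArch64 with the crypto extension */
#include <arm_neon.h>
uint8x16_t aes_round_arm(uint8x16_t state, uint8x16_t round_key) {
    /* two instructions: AESE (AddRoundKey + SubBytes + ShiftRows),
     * then AESMC (MixColumns) */
    return vaesmcq_u8(vaeseq_u8(state, round_key));
}
#endif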

"Reduced" my ass. The reason this exists is because ARM has a bunch of hardcoded ways to read/write to memory to coincide with the stupid number of file-formats (especially video formats) that exist out there. By hard-coding an ASIC to read/write memory in the right order, ARM reduces the amount of power per load operation, making video processing (aka: Youtube) ever so slightly more power-efficient.

EDIT: If you're interested in the details: https://community.arm.com/arm-commu...osts/coding-for-neon---part-1-load-and-stores

If you want high speed and low power per instruction, you make extremely specific instructions, such as "vector load, 2-way interleave, 16-bit elements" (aka the VLD2.16 instruction). You know, not to be confused with VLD4.8 or VLD1.32. These instructions exist because video is stored in YUV444 vs. YUV420, or RGB888, or RGBA8888 formats, and video/multimedia programs have to decode them and handle all of the possibilities efficiently.
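A rough sketch of what those structured loads buy you (my own example, assuming NEON is available): one VLD3/LD3 pulls 16 interleaved RGB pixels from memory and hands back three already de-interleaved vectors, one per channel:

#if defined(__ARM_NEON)
#include <arm_neon.h>
#include <stdint.h>

/* Split 16 packed RGB888 pixels (48 bytes of R,G,B,R,G,B,...) into three
 * separate 16-byte channel buffers. The structured load does the
 * de-interleaving itself, typically as a single LD3 on AArch64. */
void split_rgb888(const uint8_t *rgb, uint8_t *r, uint8_t *g, uint8_t *b) {
    uint8x16x3_t px = vld3q_u8(rgb);
    vst1q_u8(r, px.val[0]);   /* 16 red bytes   */
    vst1q_u8(g, px.val[1]);   /* 16 green bytes */
    vst1q_u8(b, px.val[2]);   /* 16 blue bytes  */
}
#endif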

I've said it before and I'll say it again: ARM is CISC these days. It's kind of ridiculous how specific their instructions get; in this case more specific than Intel (who just implemented the "pshufb" instruction instead). GPUs probably have the most elegant solution: "shared" memory that acts as a crossbar and can implement arbitrary shuffles as needed, instead of needing hundreds of instructions to handle every combination of 128-bit 1/2/4-way interleaved 8/16/32-bit patterns.
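For contrast, a rough sketch of the x86 approach (again my own example): pshufb / _mm_shuffle_epi8 takes the permutation as data, so splitting interleaved byte pairs is one generic instruction plus a constant mask, and any other interleave pattern is just a different mask rather than a different instruction:

#if defined(__SSSE3__)
#include <tmmintrin.h>

/* Split 8 interleaved byte pairs (e.g. U,V,U,V,...) into all the even bytes
 * followed by all the odd bytes, using one byte-shuffle instruction. */
__m128i deinterleave_pairs(__m128i interleaved) {
    const __m128i mask = _mm_setr_epi8(
        0, 2, 4, 6, 8, 10, 12, 14,     /* even (U) bytes -> low half  */
        1, 3, 5, 7, 9, 11, 13, 15);    /* odd  (V) bytes -> high half */
    return _mm_shuffle_epi8(interleaved, mask);
}
#endif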
 
Very Spinal Tap... this one goes to 11.

And I mean VERY, i.e. full of shite.
 