NVIDIA Claims Grace CPU Superchip is 2X Faster Than Intel Ice Lake

Joined
Mar 1, 2008
Messages
287 (0.05/day)
Location
Antwerp, Belgium
Just to put nVidia's SPECrate 2017_int_base score of 740 into perspective:

Power10 120C = 1700 / 2170 (base / peak)

EPYC 7773X 128C = 864 / 928
Xeon 8380H 224C = 1570 / 1620

Ampere Altra 160C = 596
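For a rough per-core view of the base scores listed above, here is a minimal Python sketch. It only divides each base score by the core count quoted in the list; the dictionary layout and the `per_core` helper name are made up for illustration. The thread does not give a core count for NVIDIA's 740 figure, so it is left out.

```python
# Base SPECrate 2017_int_base scores and core counts, copied from the list above.
SCORES = {
    "Power10": (120, 1700),
    "EPYC 7773X": (128, 864),
    "Xeon 8380H": (224, 1570),
    "Ampere Altra": (160, 596),
}


def per_core(scores):
    """Return base score divided by core count for every entry."""
    return {name: base / cores for name, (cores, base) in scores.items()}


if __name__ == "__main__":
    for name, value in per_core(SCORES).items():
        print(f"{name}: {value:.2f} per core")
```

Per-core numbers ignore socket count, clocks and power, so treat them as a sanity check rather than a ranking.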
 
Joined
Jan 8, 2017
Messages
9,500 (3.27/day)
The x86 is overloaded with too many instruction sets: MMX, MMX+, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, SSE4A, x86-64, AMD-V, AES, AVX, AVX2, FMA3, SHA.
I guess this is why Intel wanted, or is still designing, a brand new x86 architecture that would drop all those legacy modes and put the transistors to work on modern apps.

ARM stopped being "RISC" a long time ago; it has a ton of instructions as well. But none of it really matters: most compilers only use a really small subset of those instructions, and if you look at the assembly generated from the same code for ARM and x86, the two will be almost identical.
 

ARF

Joined
Jan 28, 2020
Messages
4,670 (2.61/day)
Location
Ex-usa | slava the trolls
'Reduced' doesn't mean zero; it is a comparative term which literally means less than the other :D
 
Joined
Apr 24, 2020
Messages
2,721 (1.60/day)
There's no Intel equivalent of fjcvtzs (Floating-point JavaScript Convert to Signed fixed-point, rounding toward Zero).
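For anyone wondering what that instruction actually computes, here is a small Python sketch of the JavaScript double-to-int32 conversion (ECMAScript ToInt32) that fjcvtzs is meant to do in one step. The function name `js_to_int32` is made up for illustration, and this models only the numeric result, not the flags the instruction sets.

```python
import math


def js_to_int32(x: float) -> int:
    """Model of JavaScript's ToInt32: truncate toward zero, wrap modulo 2**32."""
    if math.isnan(x) or math.isinf(x):
        return 0  # NaN and infinities convert to 0
    wrapped = math.trunc(x) % (1 << 32)  # Python's % is non-negative here
    # Reinterpret the low 32 bits as a signed two's-complement value.
    return wrapped - (1 << 32) if wrapped >= (1 << 31) else wrapped


if __name__ == "__main__":
    for value in (3.7, -3.7, 2.0**32 + 5.5, 2147483648.0, float("nan")):
        print(value, "->", js_to_int32(value))
```

The wrap-around on overflow is the awkward part: a plain truncating convert on x86 does not wrap out-of-range inputs like this, so a JavaScript engine needs extra instructions to fix up that case.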

Intel also has a single "aesenc" instruction, while ARM has to do "aese + aesmc" (aes-encrypt plus aes-mix-columns, because ARM split this up into two different instructions). Things get ridiculous when we get into ARM-NEON instructions. There are literally 450 ways to load or store a SIMD register in ARM-NEON.
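To make the aesenc versus aese + aesmc split concrete, here is a byte-level Python sketch of one AES round modeled both ways. It assumes the usual definitions: aesenc does ShiftRows, SubBytes, MixColumns and then XORs the round key, while aese XORs the key first and then does SubBytes/ShiftRows, and aesmc does MixColumns. The function names are made up for illustration, and this is a model of the math, not of either vendor's hardware.

```python
def gmul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo the AES polynomial 0x11B."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def rotl8(x: int, n: int) -> int:
    return ((x << n) | (x >> (8 - n))) & 0xFF


def build_sbox() -> list:
    """AES S-box: multiplicative inverse followed by the affine transform."""
    sbox = []
    for x in range(256):
        inv = next((y for y in range(1, 256) if gmul(x, y) == 1), 0)
        sbox.append(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2)
                    ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63)
    return sbox


SBOX = build_sbox()


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def sub_bytes(state: bytes) -> bytes:
    return bytes(SBOX[b] for b in state)


def shift_rows(state: bytes) -> bytes:
    # State is column-major: index = row + 4 * column.
    return bytes(state[r + 4 * ((c + r) % 4)] for c in range(4) for r in range(4))


def mix_columns(state: bytes) -> bytes:
    out = bytearray(16)
    for c in range(4):
        a = state[4 * c:4 * c + 4]
        for r in range(4):
            out[4 * c + r] = (gmul(a[r], 2) ^ gmul(a[(r + 1) % 4], 3)
                              ^ a[(r + 2) % 4] ^ a[(r + 3) % 4])
    return bytes(out)


def x86_aesenc(state: bytes, round_key: bytes) -> bytes:
    """One instruction: ShiftRows, SubBytes, MixColumns, then XOR the key."""
    return xor(mix_columns(sub_bytes(shift_rows(state))), round_key)


def arm_aese(state: bytes, round_key: bytes) -> bytes:
    """First half on ARM: XOR the key, then SubBytes and ShiftRows."""
    return sub_bytes(shift_rows(xor(state, round_key)))


def arm_aesmc(state: bytes) -> bytes:
    """Second half on ARM: MixColumns only."""
    return mix_columns(state)


if __name__ == "__main__":
    state, key = bytes(range(16)), bytes(range(100, 116))
    two_step = xor(arm_aesmc(arm_aese(state, bytes(16))), key)
    print(x86_aesenc(state, key) == two_step)
```

Note that the key XOR sits at opposite ends of the round in the two models, which is why the comparison feeds aese a zero key and XORs the round key afterwards.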

I'm not kidding: https://developer.arm.com/architect...tionhierarchiesinstructiongroup=[Load,Stride]

"Reduced" my ass. The reason this exists is because ARM has a bunch of hardcoded ways to read/write to memory to coincide with the stupid number of file-formats (especially video formats) that exist out there. By hard-coding an ASIC to read/write memory in the right order, ARM reduces the amount of power per load operation, making video processing (aka: Youtube) ever so slightly more power-efficient.

EDIT: If you're interested in the details: https://community.arm.com/arm-commu...osts/coding-for-neon---part-1-load-and-stores

If you want high speed and low power usage per instruction, you make extremely specific instructions, such as vector-load interleave-pattern 2 16-bit (aka the vld2.16 instruction). You know, not to be confused with vld4.8 or vld1.32. These instructions exist because video data comes in YUV444, YUV420, RGB888, or RGBA8888 formats, and video reading / multimedia programs have to decode them and handle all of the possibilities efficiently.
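As a rough illustration of what a de-interleaving load like vld2.16 does, here is a Python sketch that models memory as a list of elements and registers as lists of lanes. The `deinterleave_load` name and the default of 4 lanes per register (a 64-bit register holding 16-bit elements) are assumptions for the example; real NEON has many more variants.

```python
def deinterleave_load(memory, ways, lanes=4):
    """Model an N-way de-interleaving load.

    Reads ways * lanes consecutive elements and returns `ways` registers,
    where register k holds elements k, k + ways, k + 2 * ways, ...
    """
    block = memory[:ways * lanes]
    if len(block) < ways * lanes:
        raise ValueError("not enough elements in memory")
    return [block[k::ways] for k in range(ways)]


if __name__ == "__main__":
    # Interleaved stereo samples: L0 R0 L1 R1 ...
    samples = ["L0", "R0", "L1", "R1", "L2", "R2", "L3", "R3"]
    left, right = deinterleave_load(samples, ways=2)
    print(left, right)
```

With ways=4 and byte-sized elements the same idea splits RGBA8888 pixels into separate R, G, B and A registers, which is the kind of format handling described above.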

I've said it before and I'll say it again: ARM is CISC these days. It's kind of ridiculous how specific their instructions get, in this case more specific than Intel (who just implemented the "pshufb" instruction instead). GPUs probably have the most elegant solution: "shared" memory that acts as a crossbar that can implement arbitrary shuffles as needed (instead of needing hundreds of instructions to handle every combination of 128-bit 1/2/4-way interleaved 8/16/32-bit patterns).
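For comparison, here is a Python sketch of the general-purpose approach: a model of the 128-bit pshufb byte shuffle, where a control mask picks which source byte lands in each output byte (a set high bit in a mask byte zeroes that output byte). The `DEINTERLEAVE_16` mask is a hypothetical example that regroups eight 16-bit elements held in one register into evens followed by odds.

```python
def pshufb(src: bytes, mask: bytes) -> bytes:
    """Model of a 16-byte pshufb: out[i] = 0 if mask[i] has bit 7 set,
    otherwise src[mask[i] & 0x0F]."""
    if len(src) != 16 or len(mask) != 16:
        raise ValueError("both operands must be 16 bytes")
    return bytes(0 if m & 0x80 else src[m & 0x0F] for m in mask)


# Hypothetical mask: gather even-numbered 16-bit elements into the low half
# of the register and odd-numbered ones into the high half.
DEINTERLEAVE_16 = bytes([0, 1, 4, 5, 8, 9, 12, 13,
                         2, 3, 6, 7, 10, 11, 14, 15])


if __name__ == "__main__":
    register = bytes(range(16))
    print(list(pshufb(register, DEINTERLEAVE_16)))
```

One instruction plus a mask constant covers every permutation within a register, at the cost of loading the mask and doing the shuffle as a separate step after the load.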
 
Joined
Mar 10, 2010
Messages
11,878 (2.20/day)
Location
Manchester uk
Very Spinal Tap... This one goes to 11.

And I mean VERY, i.e. full of shite.
 