NVidia A100 (the $10,000 server card) is only 19.5 FP32 TFlops: https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/a100/pdf/nvidia-a100-datasheet.pdf
And only 9.7 FP64 TFlops.
The Tensor-FLOPS figures are inflated numbers that only deep-learning folk care about (and apparently not all deep-learning folk actually use those tensor cores). Achieving ~20 FP32 TFLOPS in general-purpose code is basically the best available today (the MI100 is a little faster, but without as much of that NVLink connectivity).
So 45 TFLOPS of FP32 is pretty huge by today's standards. However, Intel will be competing against next-generation products, not the A100. I'm sure NVidia's numbers are going to grow too, but 45 TFLOPS per card will probably still be competitive.
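For context, these headline figures are just theoretical peaks: shader (or FP64) unit count x boost clock x 2 FLOPs per fused multiply-add. A quick back-of-the-envelope check, using the published core counts and boost clocks (real workloads land below these numbers):

```python
# Rough sanity check of the headline numbers quoted in this thread.
# Peak TFLOPS = unit count x boost clock (GHz) x 2 FLOPs per FMA / 1000.
# Core counts and boost clocks are published specs, not measured throughput.

def peak_tflops(units: int, boost_ghz: float) -> float:
    return units * boost_ghz * 2 / 1000  # 2 FLOPs per fused multiply-add

print(f"A100 FP32:      {peak_tflops(6912, 1.41):.1f} TFLOPS")   # ~19.5
print(f"A100 FP64:      {peak_tflops(3456, 1.41):.1f} TFLOPS")   # ~9.7 (half-rate FP64 units)
print(f"RTX 3090 FP32:  {peak_tflops(10496, 1.695):.1f} TFLOPS") # ~35.6
print(f"RTX A6000 FP32: {peak_tflops(10752, 1.80):.1f} TFLOPS")  # ~38.7
```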
Nope, the RTX 3090 has ~36 TFLOPS of FP32. The Tensor-TFLOPS figures are for lower-precision formats like INT4 or INT8; obviously the A100 is designed for a different type of workload that doesn't depend so much on FP32 or FP64. The workstation Ampere RTX A6000 has ~40 TFLOPS of FP32. I guess Nvidia doesn't care about FP64 performance anymore after the Titan X Maxwell.
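To put rough numbers on that: the A100 datasheet linked above lists dense Tensor Core throughput that scales up as the precision drops, which is where the much larger "Tensor" figures come from. A quick summary of the datasheet's non-sparsity numbers:

```python
# Dense (no structured sparsity) A100 peak throughput, per the linked datasheet.
# The big "Tensor" numbers correspond to lower-precision formats, not general FP32 code.
a100_peak = {
    "FP64 (CUDA cores)":   "9.7 TFLOPS",
    "FP32 (CUDA cores)":   "19.5 TFLOPS",
    "TF32 (Tensor Cores)": "156 TFLOPS",
    "FP16 (Tensor Cores)": "312 TFLOPS",
    "INT8 (Tensor Cores)": "624 TOPS",
    "INT4 (Tensor Cores)": "1248 TOPS",
}
for fmt, rate in a100_peak.items():
    print(f"{fmt:22s} {rate}")
```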