- Joined
- Apr 24, 2020
- Messages
- 2,709 (1.62/day)
One of the interesting things about AVX is the vast feature set which extends far beyond just arithmetics. It also support things like comparisons with masks, which essentially enables you to do conditionals without branching logic, and the feature set of AVX-512 is almost like a new instruction set. The potential here is huge, but it's still "inaccessible" to most programmers. If we get to a point where writing clean C code can be compiled into decent AVX instructions, even with more complex calculations and some basic conditionals, that would be huge for the adoption of AVX.
That's actually what makes me most excited about AVX512. All of these new AVX512 features allow auto-vectorization to happen far more easily. The details are complicated, but... lets just say that NVidia CUDA and AMD OpenCL has been doing this stuff for over a decade on GPUs. Intel finally is providing CPU-compilers the ability what GPU-compilers have been doing all along. It requires some additional support from the CPU instruction set to ease auto-vectorization and provide more SIMD-based branching controls. But once provided, the theory is already well studied from 1980s SIMD computers and is well known.
Honestly, Linus Torvalds is very clearly out of his depth in this subject matter. I'm no expert, but I can confidently say that I know more than Linus on this subject based on what he's saying here.
AVX and AVX2 are over a decade behind GPU-SIMD computers. AVX512 finally brings parity to CPU-autovectorizers to what GPUs have been doing since 2006. AVX512 is actually a really well designed instruction set... but Intel is certainly messing up the business side of things IMO.
Last edited: