Monday, October 19th 2020
Intel "Tiger Lake" Based Pentium and Celeron to Feature AVX2, an Instruction the Entry-Level Brands were Deprived Of
Intel's next-generation Pentium Gold and Celeron entry-level processors based on the "Tiger Lake" microarchitecture could finally receive the AVX2 instruction set. Intel had segmented AVX and AVX2 to be exclusive to the Core and Xeon brands, leaving the Pentium Gold and Celeron products based on the same microarchitectures to artificially lack these instructions.
Intel updated its ARK product information database with entries for "Tiger Lake" based Pentium Gold and Celeron products. The pages for the Pentium Gold 7505 and Celeron 6305 mention support for AVX2 alongside SSE4. Both are mobile chips with a 15 W TDP, and are built on the same 10 nm SuperFin process as the rest of the 11th Gen Core "Tiger Lake" processor family.
39 Comments on Intel "Tiger Lake" Based Pentium and Celeron to Feature AVX2, an Instruction the Entry-Level Brands were Deprived Of
FMA alone can produce some very nice 40% uplifts, for example.
And just for the record, your username made this very hilarious :laugh:
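To give a feel for where that kind of FMA uplift comes from, here is a minimal sketch (my own code, not from the post) of a dot product written with AVX2/FMA intrinsics: a single fused multiply-add replaces a separate multiply and add in the hot loop. Build with something like gcc -O2 -mavx2 -mfma.

```c
#include <immintrin.h>
#include <stdio.h>

/* Dot product of two float arrays; n is assumed to be a multiple of 8. */
static float dot_fma(const float *a, const float *b, int n)
{
    __m256 acc = _mm256_setzero_ps();
    for (int i = 0; i < n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        /* One fused multiply-add instead of a separate multiply and add,
         * roughly halving the arithmetic instruction count of the loop. */
        acc = _mm256_fmadd_ps(va, vb, acc);
    }
    /* Horizontal sum of the 8 partial sums. */
    float tmp[8];
    _mm256_storeu_ps(tmp, acc);
    float sum = 0.0f;
    for (int i = 0; i < 8; i++)
        sum += tmp[i];
    return sum;
}

int main(void)
{
    float a[16], b[16];
    for (int i = 0; i < 16; i++) { a[i] = (float)i; b[i] = 2.0f; }
    printf("%f\n", dot_fma(a, b, 16));  /* expect 240.0 */
    return 0;
}
```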
But at the instruction level it also changes the encoding to allow much more advanced operations on data sets, which is where the true power of AVX-512 lies, beyond just being a "double AVX2". AVX-512 is getting close to being a "sub instruction set" of x86.
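As one concrete illustration of "more than double AVX2" (my own sketch, not from the comment), AVX-512 adds per-lane mask registers, so a conditional update can be expressed as one masked instruction instead of a compare-and-blend sequence. This assumes an AVX-512F capable CPU and a flag like -mavx512f.

```c
#include <immintrin.h>
#include <stdio.h>

int main(void)
{
    float a[16] = { -1, 2, -3, 4, -5, 6, -7, 8, -9, 10, -11, 12, -13, 14, -15, 16 };
    float b[16];
    for (int i = 0; i < 16; i++) b[i] = 100.0f;

    __m512 va = _mm512_loadu_ps(a);
    __m512 vb = _mm512_loadu_ps(b);

    /* Build a 16-bit mask with one bit per lane where a > 0. */
    __mmask16 m = _mm512_cmp_ps_mask(va, _mm512_setzero_ps(), _CMP_GT_OQ);

    /* Masked add: lanes whose mask bit is 0 keep the value from 'va'. */
    __m512 r = _mm512_mask_add_ps(va, m, va, vb);

    float out[16];
    _mm512_storeu_ps(out, r);
    for (int i = 0; i < 16; i++)
        printf("%g ", out[i]);   /* negative lanes unchanged, positive lanes +100 */
    printf("\n");
    return 0;
}
```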
The challenge with all versions of AVX is that they are hard to use well; it takes expert-level programmers to extract substantial performance gains. The good news is that just enabling automatic optimizations usually gives ~10-30% performance "for free" (probably >50% with some minor effort), since the compiler can auto-vectorize and unroll some things, but to get that >10x performance gain it still requires handcrafted low-level code. I believe compilers have some potential to improve here, but ultimately they can only deal with the code written by the programmer. Some of these feature sets are mostly relevant to enterprise users, like those "AI" features.
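As a rough illustration of the "free" auto-vectorization case (a sketch under my own assumptions, not the poster's code), a plain scalar loop like the one below is typically vectorized by GCC or Clang at -O3 with -mavx2; the restrict qualifiers help the compiler prove the arrays don't alias.

```c
#include <stdio.h>
#include <stddef.h>

/* Plain scalar code; no SIMD knowledge needed from the programmer. The
 * 'restrict' qualifiers promise the arrays don't overlap, which is often
 * what unlocks vectorization. */
static void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

int main(void)
{
    enum { N = 1024 };
    static float x[N], y[N];
    for (int i = 0; i < N; i++) { x[i] = (float)i; y[i] = 1.0f; }
    saxpy(N, 2.0f, x, y);
    printf("%f\n", y[N - 1]);  /* 2*1023 + 1 = 2047 */
    return 0;
}
```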
The good thing about having feature sets is that it makes it easier for e.g. AMD to implement the relevant features for consumers.
Intrinsics are still useful for a few applications, but it's far harder to use intrinsics than to use a dedicated language like ISPC: ispc.github.io/
If that's still too much to ask for, then "#pragma omp simd" is the next recommendation. Works in C, C++, and Fortran on a variety of compilers (like GCC and LLVM). A shame about Microsoft Visual Studio... you can't win them all.
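A minimal sketch of that suggestion (array names and sizes are mine): the pragma asks the compiler to vectorize the loop that immediately follows it, and compiles with e.g. gcc -O2 -fopenmp-simd.

```c
#include <stdio.h>

#define N 1024

int main(void)
{
    static float x[N];
    for (int i = 0; i < N; i++) x[i] = 1.0f;

    float sum = 0.0f;
    /* The reduction clause tells the compiler it may reorder the
     * additions into SIMD lanes. */
    #pragma omp simd reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += x[i];

    printf("%f\n", sum);  /* 1024.0 */
    return 0;
}
```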
Skylake-SP, Cascade Lake & Cooper Lake are all niche workstation and 8-way multiprocessing server products. Not really relevant for desktop software.
That leaves Skylake-X, Ice Lake and Tiger Lake. Again, Skylake-X was a niche product.
Whatever's left for actual desktop usage isn't a mess.
Edit:
I forgot to add that, in any case, all AVX-512-capable CPUs support AVX-512F (AVX-512 Foundation). If you program for AVX-512, you can always rely on AVX-512F instructions and check for more (required anyway because of fallback code).
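A minimal sketch of that dispatch idea, assuming GCC or Clang (the kernel/fallback function names are hypothetical): treat AVX-512F as the baseline, probe optional extensions at run time, and fall back otherwise.

```c
#include <stdio.h>

/* Hypothetical code paths; in real code these would be built with the
 * matching -m flags or target attributes. */
void run_avx512_kernel(void)   { puts("using AVX-512F code path"); }
void run_avx2_fallback(void)   { puts("using AVX2 fallback"); }
void run_scalar_fallback(void) { puts("using scalar fallback"); }

int main(void)
{
    if (__builtin_cpu_supports("avx512f")) {
        /* AVX-512F is present on every AVX-512 CPU; the extensions are optional. */
        if (__builtin_cpu_supports("avx512bw"))
            puts("AVX-512BW also available");
        run_avx512_kernel();
    } else if (__builtin_cpu_supports("avx2")) {
        run_avx2_fallback();
    } else {
        run_scalar_fallback();
    }
    return 0;
}
```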