Monday, February 20th 2023

Intel Publishes Sorting Library Powered by AVX-512, Offers 10-17x Speed Up

Intel has recently updated its open-source C++ header library for high-performance SIMD-based sorting to support the AVX-512 instruction set. Building on the existing AVX2 code path, the sorting functions now use 512-bit vector extensions for greater performance. According to Phoronix, NumPy, the Python library for numerical computing that underpins a great deal of software, has updated its codebase to use the AVX-512-accelerated sorting functions, which yield a substantial performance uplift. The library uses AVX-512 to vectorize quicksort for 16-bit and 64-bit data types. Benchmarked on an Intel Tiger Lake system, NumPy sorting saw a 10-17x increase in performance.
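Whether a given system actually benefits depends on both the NumPy build and the CPU. The short Python sketch below is our illustration of how to check: it relies on NumPy's internal __cpu_features__ dictionary, which is not a stable public API and may move or change between releases.

```python
# Minimal sketch: report whether the running CPU exposes the AVX-512
# subsets that NumPy's runtime dispatcher can target. The
# __cpu_features__ dictionary is an internal NumPy detail (assumed
# present in recent 1.x releases), not a stable public API.
import numpy as np

try:
    features = np.core._multiarray_umath.__cpu_features__
    for flag in ("AVX2", "AVX512F", "AVX512BW", "AVX512DQ", "AVX512VL"):
        print(f"{flag}: {'yes' if features.get(flag) else 'no'}")
except AttributeError:
    # Older or differently structured NumPy; fall back to the build summary.
    np.show_config()
```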

Intel engineer Raghuveer Devulapalli authored the changes, which were merged into the NumPy codebase on Wednesday. Broken down by data type, the new implementation speeds up 16-bit integer sorting by 17x and 32-bit data types by 12-13x, while sorting random 64-bit float arrays is around 10x faster. Built on the x86-simd-sort code, this speed-up demonstrates what AVX-512 can do for the performance of widely used libraries. We hope to see more AVX-512 implementations, especially now that AMD has joined the party by bringing AVX-512 support to Zen 4.
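For a rough sense of the gains on your own hardware, a minimal timing sketch along these lines can be run against any NumPy build. The array size and repeat counts are illustrative choices of ours, and the 10-17x figures quoted above only materialize on builds that dispatch to the AVX-512 sort kernels on a capable CPU.

```python
# Minimal benchmark sketch: time np.sort on the data types the article
# cites (int16, int32, float64). Speed-ups only appear when the NumPy
# build dispatches to the AVX-512 sort kernels and the CPU supports them;
# the array size below is an arbitrary choice for illustration.
import timeit
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000  # assumed array size, not taken from the article

arrays = {
    "int16":   rng.integers(np.iinfo(np.int16).min, np.iinfo(np.int16).max,
                            size=n, dtype=np.int16),
    "int32":   rng.integers(np.iinfo(np.int32).min, np.iinfo(np.int32).max,
                            size=n, dtype=np.int32),
    "float64": rng.random(n),  # random doubles in [0, 1)
}

for name, arr in arrays.items():
    best = min(timeit.repeat(lambda a=arr: np.sort(a, kind="quicksort"),
                             repeat=5, number=10))
    print(f"{name:8s} best of 5 runs (10 sorts each): {best:.3f} s")
```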
Source: Phoronix

28 Comments on Intel Publishes Sorting Library Powered by AVX-512, Offers 10-17x Speed Up

#26
Assimilator
R-T-B: Yeah. That's sadly why it HASN'T seen wider adoption. It's segmentation hell.
IMO they removed it simply because it would've caused the nuclear reactors that are their CPUs to melt down. Maybe we'll see a return of AVX-512 in the consumer space with their next node shrink.
#27
AnotherReader
Let's hope that this code is vendor neutral. AVX-512 is a better vector ISA than SSE or AVX/AVX2, and for implementations like Zen 4, whose peak FLOPS for AVX and AVX-512 are identical, AVX-512 can be more power efficient because it needs less power for decode and scheduling than its 256-bit alternative.
#28
mrnagant
Didn't AMD refer to their AVX-512 implementation as "double pumping", i.e. just using two 256-bit halves? Phoronix shows that AMD's AVX-512 is 60% faster than AVX2 while using similar power and running at basically identical clock speeds.