Monday, February 20th 2023

Intel Publishes Sorting Library Powered by AVX-512, Offers 10-17x Speed Up

Intel has recently updated its open-source C++ header library for high-performance SIMD-based sorting to support the AVX-512 instruction set. Building on the existing AVX2 code path, the sorting functions now use 512-bit vector extensions for greater performance. According to Phoronix, NumPy, the Python library for numerical computing that underpins a great deal of software, has updated its codebase to use the AVX-512-accelerated sorting functions, which yield a substantial performance uplift. The library uses AVX-512 to vectorize quicksort for 16-bit and 64-bit data types. Benchmarked on an Intel Tiger Lake system, NumPy sorting saw a 10-17x increase in performance.
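Whether a given system actually benefits depends on both the NumPy build and the CPU. The short Python sketch below is our illustration of how to check: it relies on NumPy's internal __cpu_features__ dictionary, which is not a stable public API and may move or change between releases.

```python
# Minimal sketch: report whether the running CPU exposes the AVX-512
# subsets that NumPy's runtime dispatcher can target. The
# __cpu_features__ dictionary is an internal NumPy detail (assumed
# present in recent 1.x releases), not a stable public API.
import numpy as np

try:
    features = np.core._multiarray_umath.__cpu_features__
    for flag in ("AVX2", "AVX512F", "AVX512BW", "AVX512DQ", "AVX512VL"):
        print(f"{flag}: {'yes' if features.get(flag) else 'no'}")
except AttributeError:
    # Older or differently structured NumPy; fall back to the build summary.
    np.show_config()
```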

Intel engineer Raghuveer Devulapalli authored the changes, which were merged into the NumPy codebase on Wednesday. Broken down by data type, the new implementation speeds up 16-bit integer sorting by 17x and 32-bit data types by 12-13x, while sorting random 64-bit float arrays is around 10x faster. Built on the x86-simd-sort code, this speed-up demonstrates what AVX-512 can do for the performance of widely used libraries. We hope to see more AVX-512 implementations, especially now that AMD has joined the party by bringing AVX-512 support to Zen 4.
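For a rough sense of the gains on your own hardware, a minimal timing sketch along these lines can be run against any NumPy build. The array size and repeat counts are illustrative choices of ours, and the 10-17x figures quoted above only materialize on builds that dispatch to the AVX-512 sort kernels on a capable CPU.

```python
# Minimal benchmark sketch: time np.sort on the data types the article
# cites (int16, int32, float64). Speed-ups only appear when the NumPy
# build dispatches to the AVX-512 sort kernels and the CPU supports them;
# the array size below is an arbitrary choice for illustration.
import timeit
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000  # assumed array size, not taken from the article

arrays = {
    "int16":   rng.integers(np.iinfo(np.int16).min, np.iinfo(np.int16).max,
                            size=n, dtype=np.int16),
    "int32":   rng.integers(np.iinfo(np.int32).min, np.iinfo(np.int32).max,
                            size=n, dtype=np.int32),
    "float64": rng.random(n),  # random doubles in [0, 1)
}

for name, arr in arrays.items():
    best = min(timeit.repeat(lambda a=arr: np.sort(a, kind="quicksort"),
                             repeat=5, number=10))
    print(f"{name:8s} best of 5 runs (10 sorts each): {best:.3f} s")
```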
Source: Phoronix

28 Comments on Intel Publishes Sorting Library Powered by AVX-512, Offers 10-17x Speed Up

#26
Assimilator
R-T-B: Yeah. That's sadly why it HASN'T seen wider adoption. It's segmentation hell.
IMO they removed it simply because it would've caused the nuclear reactors that are their CPUs to melt down. Maybe we'll see a return of AVX-512 in the consumer space with their next node shrink.
#27
AnotherReader
Let's hope that this code is vendor neutral. AVX-512 is a better vector ISA than SSE or AVX/AVX2, and for implementations like Zen 4, whose peak FLOPS for AVX and AVX-512 are identical, AVX-512 can be more power efficient because it needs less power for decode and scheduling than its 256-bit alternative.
#28
mrnagant
Didn't AMD refer to their AVX-512 implementation as "double pumping", i.e. just using two 256-bit halves? Phoronix shows that AMD's AVX-512 is 60% faster than AVX2 while using similar power and running at basically identical clock speeds.