v3.0
Version 3.0 adds a new method, avx512_argselect, which computes arg nth_element (known as argpartition in NumPy): it returns an array of indices that would partition the data array. Highlights of this release include:
- x86-simd-sort v3.0 has been merged into the NumPy main branch. It provides AVX-512 vectorized versions of np.partition and np.argpartition, speeding up np.partition by up to 25x for 16-bit, 17x for 32-bit, and about 8x for 64-bit dtypes. Speedups for np.argpartition are up to 6.5x.
- A slightly modified version of x86-simd-sort has now been merged into OpenJDK. It speeds up sorting of 32-bit and 64-bit data by up to 15x and 7x, respectively.
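The avx512_argselect method is described here only by its result: an index array that partitions the data. As a hedged illustration of that contract, the sketch below computes the same kind of index array in scalar C++ with std::nth_element. It is not the library's AVX-512 implementation and does not reproduce its actual signature; the function name argselect_reference is hypothetical, and the sketch assumes the input contains no NaNs.

```cpp
#include <algorithm>  // std::nth_element
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>    // std::iota
#include <vector>

// Hypothetical scalar reference for arg nth_element (NumPy's argpartition).
// It only illustrates the result contract, not the vectorized algorithm.
// Assumes arr contains no NaNs, so operator< is a strict weak ordering.
template <typename T>
std::vector<int64_t> argselect_reference(const std::vector<T>& arr, std::size_t k) {
    assert(k < arr.size() && "k must be a valid position in arr");

    // Start from the identity permutation 0, 1, ..., n-1.
    std::vector<int64_t> idx(arr.size());
    std::iota(idx.begin(), idx.end(), int64_t{0});

    // Partially order the indices by the values they point to. Afterwards:
    //   arr[idx[k]] is the k-th smallest value (0-based),
    //   arr[idx[i]] <= arr[idx[k]] for every i < k, and
    //   arr[idx[i]] >= arr[idx[k]] for every i > k.
    std::nth_element(
        idx.begin(), idx.begin() + static_cast<std::ptrdiff_t>(k), idx.end(),
        [&arr](int64_t a, int64_t b) {
            return arr[static_cast<std::size_t>(a)] < arr[static_cast<std::size_t>(b)];
        });

    return idx;  // arr itself is left unmodified
}
```

For example, with data {7, 2, 9, 4, 1} and k = 2, idx[2] is 3, because data[3] = 4 is the third-smallest value; idx[0] and idx[1] hold the indices of 1 and 2 (4 and 1) in unspecified order. Sorting the indices rather than the values is what distinguishes argselect from a plain select, which would reorder the data array in place.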
What's Changed
- Fix typo in README by @r-devulap in https://github.com/intel/x86-simd-sort/pull/50
- Update workflow for changes in benchmark tooling by @mosullivan93 in https://github.com/intel/x86-simd-sort/pull/54
- Further Makefile updates by @mosullivan93 in https://github.com/intel/x86-simd-sort/pull/52
- Add avx512_argselect for 32-bit and 64-bit dtypes by @r-devulap in https://github.com/intel/x86-simd-sort/pull/56
- Use __builtin_cpu_supports instead of cpuinfo by @r-devulap in https://github.com/intel/x86-simd-sort/pull/58
- MAINT: Remove template specializations for quicksort methods by @r-devulap in https://github.com/intel/x86-simd-sort/pull/59
- Add benchmarks for small arrays by @r-devulap in https://github.com/intel/x86-simd-sort/pull/62
- Improvement to benchmarking scripts by @r-devulap in https://github.com/intel/x86-simd-sort/pull/66
- Bug fix in benchmark script by @r-devulap in https://github.com/intel/x86-simd-sort/pull/67
- Use global Macros for GCC specific keywords by @r-devulap in https://github.com/intel/x86-simd-sort/pull/68
- Fix compiler warnings in src and tests by @r-devulap in https://github.com/intel/x86-simd-sort/pull/69
- Fix more compiler warnings by @r-devulap in https://github.com/intel/x86-simd-sort/pull/70
Full Changelog: https://github.com/intel/x86-simd-sort/compare/v2.0...v3.0