This is a beta release of cuPyNumeric.
Pip wheels are available on PyPI at https://pypi.org/project/nvidia-cupynumeric/, for Linux (x86-64 and ARM64, with CUDA 12 and multi-node support) and macOS (for ARM64). Conda packages are available at https://anaconda.org/legate/cupynumeric, for Linux (x86-64 and ARM64, with CUDA 12/13 and multi-node support). GASNet-based (rather than UCX-based) conda packages are under the gex label. Windows is currently supported through WSL.
Documentation for this release can be found at https://docs.nvidia.com/cupynumeric/26.01/.
Highlights
Added functionality
- Implement cupynumeric.pad.
- Implement cupynumeric.linalg.pinv (single-CPU/GPU only).
- Implement from_dlpack for exporting cuPyNumeric ndarrays through the DLPack interface.
- Detect when an object being used to initialize a cuPyNumeric ndarray implements the DLPack interface, and use it if possible.
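The additions above can be tried with a short script along the following lines. This is a minimal sketch, not taken from the release itself: it assumes the new functions follow the NumPy signatures that cuPyNumeric mirrors, and that from_dlpack is exposed at the top level of the cupynumeric module. It falls back to NumPy when cuPyNumeric is not installed, so the same calls can be checked on any machine.

```python
import numpy as host_np  # used only to build a DLPack-capable source array

# Assumption: cuPyNumeric mirrors the NumPy API for the calls below.
# Fall back to NumPy itself when cuPyNumeric is not installed.
try:
    import cupynumeric as np
except ImportError:
    import numpy as np

# pad: surround a 2x2 array with one layer of zeros on every side.
a = np.array([[1.0, 2.0], [3.0, 4.0]])
padded = np.pad(a, 1, mode="constant")

# linalg.pinv: pseudo-inverse (single-CPU/GPU only in this release).
# For an invertible matrix it coincides with the ordinary inverse.
a_pinv = np.linalg.pinv(a)

# from_dlpack: build an array from an object that implements the
# DLPack interface (here a plain NumPy array).
# Assumption: from_dlpack is available as cupynumeric.from_dlpack.
source = host_np.arange(4.0)
imported = np.from_dlpack(source)

print(padded.shape)
print(a @ a_pinv)
print(imported)
```

Because pinv is listed as single-CPU/GPU only, a sketch like this is best run on a single processor rather than in a multi-node launch.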
Bugfixes
- Ensure unimplemented stub functions always return cuPyNumeric ndarrays.
Known issues
- We are aware of hangs when using cuSolverMp-based APIs on 4+ Perlmutter nodes. This appears to be a cluster-specific issue, which we are investigating.
- We are aware of performance regressions with cupynumeric.einsum on Blackwell GPUs, starting with cuBLAS 13.2. These are under investigation.
Full Changelog: https://github.com/nv-legate/cupynumeric/compare/v25.11.00...v26.01.00