v25.10.00

This is a beta release of cuPyNumeric.

Pip wheels are available on PyPI at https://pypi.org/project/nvidia-cupynumeric/, for Linux (x86-64 and ARM64, with CUDA and multi-node support) and macOS (for ARM64). Conda packages are available at https://anaconda.org/legate/cupynumeric, for Linux (x86-64 and ARM64, with CUDA and multi-node support). GASNet-based (rather than UCX-based) conda packages are under the gex label. Windows is currently supported through WSL.

Documentation for this release can be found at https://docs.nvidia.com/cupynumeric/25.10/.

Highlights

Added functionality

Implement cupynumeric.in1d.
Add DLPack import/export support to cuPyNumeric ndarrays.
Allow batched input for cupynumeric.linalg.solve.
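
As a rough illustration of the three additions above, the sketch below exercises each one through the NumPy-compatible API. It makes several assumptions that these notes do not confirm: that cupynumeric.in1d and batched cupynumeric.linalg.solve follow NumPy's semantics, and that DLPack import is exposed through a NumPy-style from_dlpack function. The import falls back to plain NumPy so the snippet still runs where cuPyNumeric is not installed.

```python
# Sketch only: falls back to NumPy when cuPyNumeric is unavailable.
try:
    import cupynumeric as np
except ImportError:
    import numpy as np

# Membership test on 1-D data. Recent NumPy releases deprecate in1d in
# favor of isin, so the fallback picks whichever name exists.
in1d = np.in1d if hasattr(np, "in1d") else np.isin
values = np.array([0, 1, 2, 5, 0])
candidates = np.array([0, 2])
mask = in1d(values, candidates)

# Batched solve: a stack of two 2x2 systems, one right-hand-side column each.
# Assumes NumPy's convention that leading dimensions are batch dimensions.
a = np.array([[[2.0, 0.0], [0.0, 4.0]],
              [[1.0, 1.0], [0.0, 1.0]]])   # shape (2, 2, 2)
b = np.array([[[2.0], [8.0]],
              [[3.0], [1.0]]])             # shape (2, 2, 1)
x = np.linalg.solve(a, b)                  # shape (2, 2, 1)

# DLPack round trip. Assumption: import uses a NumPy-style from_dlpack,
# and the source array implements __dlpack__ for export.
src = np.arange(6.0)
dst = np.from_dlpack(src)
```

Passing the right-hand sides as a stack of column matrices, rather than a 2-D array of vectors, avoids the ambiguity between "one matrix" and "a batch of vectors" for the second argument.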

Performance improvements

Optimized implementation for the special axis= case of cupynumeric.take.
Improve heuristics for choosing between batched and unbatched matrix multiplication.
Improved implementation of cupynumeric.nonzero that uses no additional scratch space.
Identify special cases of advanced indexing that can be executed faster using cupynumeric.einsum.
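
For orientation, the calls affected by these optimizations look like the sketch below. The results are the same as before; only the execution strategy changes. These notes do not say which take arguments or which indexing patterns hit the fast paths, so the diagonal example is just one assumed case where advanced indexing and einsum coincide. The import falls back to NumPy where cuPyNumeric is not installed.

```python
# Sketch only: falls back to NumPy when cuPyNumeric is unavailable.
try:
    import cupynumeric as np
except ImportError:
    import numpy as np

grid = np.arange(12).reshape(3, 4)

# take with an explicit axis: pick columns 0 and 2 from every row.
picked = np.take(grid, np.array([0, 2]), axis=1)

# nonzero returns one index array per dimension of the input.
sparse = np.array([[0, 3, 0], [4, 0, 5]])
row_idx, col_idx = np.nonzero(sparse)

# Illustrative assumption: reading the main diagonal via advanced indexing
# gives the same result as the einsum contraction "ii->i".
square = np.arange(9).reshape(3, 3)
i = np.arange(3)
diag_by_index = square[i, i]
diag_by_einsum = np.einsum("ii->i", square)
```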

Documentation / profiling

Add a tutorial on using Legate Tasks to extend cuPyNumeric.
Add a user warning when an operation (e.g. printing to the console) causes a sharded array to be gathered onto a single memory.
Add sub-boxes to the Legate profiler, showing how long the Python interpreter spends inside cuPyNumeric API calls.

Breaking changes

Move nightly conda packages to a dedicated channel, -c legate-nightly.

Known issues

We are aware of hangs on certain platforms and UCC configurations when using cuSolverMp-backed multi-GPU operations (Cholesky factorization and linear solve). We expect these to be fixed in the 25.11 release, which updates to cuSolverMp 0.7.

Full Changelog: https://github.com/nv-legate/cupynumeric/compare/v25.08.00...v25.10.00
