New
0.6.1
Summary
NVIDIA® NIXL Release 0.6.1 introduces a new Libfabric plugin for high-performance networking on AWS, expands language binding support and improves stability.
- Libfabric Networking for AWS: Introduces a new Libfabric plugin to leverage AWS Elastic Fabric Adapter (EFA) for high-performance, low-latency communication.
- Language Bindings: Rust bindings now include support for agent configuration and the getXferTelemetry API, improving integration and observability for Rust applications.
- Performance and Stability: Resolves several critical stability issues to enhance robustness.
New Features
- Introduced a new Libfabric plugin with topology-aware support for AWS EFA devices (#784, #801, #802, #809, #817, #826, #831, #833, #850, #852, #866, #867, #868).
- Added an agent configuration flag to enable or disable telemetry capture on a per-agent basis (#764).
- Made the etcd watch timeout configurable in the metadata listener to avoid timeouts under heavy load (#766).
- Added a ca_bundle option to the Object Storage plugin for compatibility with S3-compatible storage using self-signed certificates (#806).
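A configurable timeout such as the etcd watch timeout is commonly implemented as an override with a safe default. The release notes do not say how NIXL exposes the setting, so the following is only a minimal Python sketch of the pattern. The environment variable name `EXAMPLE_ETCD_WATCH_TIMEOUT_S`, the default value, and the function name are all hypothetical and are not taken from NIXL.

```python
import os

# Hypothetical default for illustration only; not NIXL's actual value.
DEFAULT_WATCH_TIMEOUT_S = 5.0


def watch_timeout_seconds(env=None, name="EXAMPLE_ETCD_WATCH_TIMEOUT_S"):
    """Return a watch timeout in seconds, falling back to the default.

    `name` is a hypothetical variable name chosen for this sketch.
    """
    env = os.environ if env is None else env
    raw = env.get(name)
    if raw is None:
        return DEFAULT_WATCH_TIMEOUT_S
    try:
        value = float(raw)
    except ValueError:
        # Unparseable override: keep the default rather than fail.
        return DEFAULT_WATCH_TIMEOUT_S
    # Reject non-positive values, which would expire the watch immediately.
    return value if value > 0 else DEFAULT_WATCH_TIMEOUT_S
```

Falling back to the default on invalid input keeps a misconfigured deployment running with known behavior. A stricter design could raise an error instead.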
Bugfixes
- [Core] Fixed a critical use-after-free error on disconnect where requests could be deleted without proper ownership (#782).
- [Listener] Prevented a crash in the etcd client when multiple metadata updates are received in rapid succession (#765).
- [Telemetry] Addressed minor issues in the telemetry framework (#750).
Bindings
- [Rust] Added bindings for ThreadSync and AgentConfig (#824).
- [Rust] Exposed the getXferTelemetry API in the Rust bindings for retrieving per-request performance metrics (#823).
Dependencies
- The UCX dependency is now optional, allowing NIXL to be built without UCX (#825).
- Upgraded the DOCA dependency to 3.1 in the nixlbench container (#760).
Benchmarks
- Fixed the nixlbench container runtime by ensuring the Python virtual environment is activated correctly (#848).
- Fixed a function signature mismatch in the NVSHMEM worker (#786).
Build and Test Infrastructure
- Improved CUDA detection in nixlbench build scripts with a fallback mechanism (#777).
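For readers unfamiliar with fallback-based CUDA detection, the general idea can be sketched in Python as below. This is not the nixlbench build script, whose actual logic is not described here. It is an illustrative sketch that assumes a common lookup order: the conventional `CUDA_HOME` or `CUDA_PATH` environment variables, then `nvcc` on `PATH`, then the conventional `/usr/local/cuda` location. The function name `find_cuda_home` is hypothetical.

```python
import os
import shutil


def find_cuda_home(env=None, which=shutil.which, isdir=os.path.isdir):
    """Return a likely CUDA toolkit root, or None if nothing is found.

    `which` and `isdir` are injectable so the lookup can be tested
    without a CUDA installation.
    """
    env = os.environ if env is None else env

    # 1. Explicit override through conventional environment variables.
    for var in ("CUDA_HOME", "CUDA_PATH"):
        path = env.get(var)
        if path and isdir(path):
            return path

    # 2. Fallback: derive the root from nvcc, assumed to be at <root>/bin/nvcc.
    nvcc = which("nvcc")
    if nvcc:
        return os.path.dirname(os.path.dirname(nvcc))

    # 3. Last resort: the conventional default install location.
    default = "/usr/local/cuda"
    return default if isdir(default) else None
```

Calling `find_cuda_home()` with no arguments inspects the real environment. Returning `None` instead of raising leaves the decision to the caller, which can then either abort or build without CUDA.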
Full Changelog: https://github.com/ai-dynamo/nixl/compare/0.6.0...0.6.1