v0.18.4 Patch Release
What's Changed
- Update version by @sfc-gh-truwase in https://github.com/deepspeedai/DeepSpeed/pull/7719
- Disable deterministic option in compile tests by @tohtana in https://github.com/deepspeedai/DeepSpeed/pull/7720
- Fix SuperOffloadOptimizer_Stage3 crash due to missing param_names parameter by @ImaGoodFella in https://github.com/deepspeedai/DeepSpeed/pull/7715
- [AMD][ROCm] Improve support of AMD by @k-artem in https://github.com/deepspeedai/DeepSpeed/pull/7448
- fix typo by @stas00 in https://github.com/deepspeedai/DeepSpeed/pull/7722
- Skip none in backward hook by @tohtana in https://github.com/deepspeedai/DeepSpeed/pull/7725
- [Engine] Only scale gradients if scale_wrt_gas is True by @kashif in https://github.com/deepspeedai/DeepSpeed/pull/7724
- Fix testcases that depend on triton by @k-artem in https://github.com/deepspeedai/DeepSpeed/pull/7731
- Fix rare hang in DeepSpeed Async I/O wait by releasing the Python GIL by @xylian86 in https://github.com/deepspeedai/DeepSpeed/pull/7727
- Fix #7733: Replace torch.sqrt with math.sqrt in scale_lr for sqrt method by @Rakshit-gen in https://github.com/deepspeedai/DeepSpeed/pull/7735
- replace moe checkpoint dp_world_size with seq_dp_world_size by @wukong1992 in https://github.com/deepspeedai/DeepSpeed/pull/7732
- [BUG] Fix UlyssesSPAttentionHF.register_with_transformers() crash with PEFT models by @Rakshit-gen in https://github.com/deepspeedai/DeepSpeed/pull/7737
- Add core api update blog by @tohtana in https://github.com/deepspeedai/DeepSpeed/pull/7738
- Fix Nebula checkpoint engine commit() API mismatch by @Rakshit-gen in https://github.com/deepspeedai/DeepSpeed/pull/7740
- Fix DecoupledCheckpointEngine deadlock and improve reliability by @Rakshit-gen in https://github.com/deepspeedai/DeepSpeed/pull/7742
- Fix OnebitLamb NaN propagation with empty parameters by @Rakshit-gen in https://github.com/deepspeedai/DeepSpeed/pull/7736
- fix: remove premature MPI environment variable check in OpenMPIRunner by @leejianwoo-collab in https://github.com/deepspeedai/DeepSpeed/pull/7751
- Enable python 3.11 and 3.12 tests by @loadams in https://github.com/deepspeedai/DeepSpeed/pull/7007
- Add CI workflow to run tests on AWS by @tohtana in https://github.com/deepspeedai/DeepSpeed/pull/7753
- Add fallback to BF16 support check by @tohtana in https://github.com/deepspeedai/DeepSpeed/pull/7754
- Fix DeepCompile for PyTorch 2.8/2.9 compatibility by @tohtana in https://github.com/deepspeedai/DeepSpeed/pull/7755
- Removed amp testcases by @k-artem in https://github.com/deepspeedai/DeepSpeed/pull/7745
- fix: avoid IndexError in BF16_Optimizer.destroy() when using DummyOptim by @leejianwoo-collab in https://github.com/deepspeedai/DeepSpeed/pull/7763
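
For readers wondering why the `scale_lr` fix (#7735) matters: `torch.sqrt` expects a tensor, so passing it a plain Python float ratio raises a `TypeError`, while `math.sqrt` accepts the float directly. The sketch below is only an illustration of that idea. The signature, parameter names, and the "linear" branch are assumptions for the example, not the actual DeepSpeed code.

```python
import math


def scale_lr(base_batch_size, batch_size, base_lr=1.0, method="linear"):
    """Hypothetical sketch of batch-size-based learning-rate scaling.

    The signature and the set of methods are assumptions made for
    illustration; see the linked PR for the real implementation.
    """
    ratio = batch_size / base_batch_size  # plain Python float
    if method == "linear":
        return base_lr * ratio
    if method == "sqrt":
        # math.sqrt works on Python floats; torch.sqrt would require a Tensor.
        return base_lr * math.sqrt(ratio)
    raise ValueError(f"unknown scaling method: {method}")
```

Under these assumptions, quadrupling the batch size with the `sqrt` method doubles the learning rate, while the `linear` method scales it by the full ratio.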
New Contributors
- @ImaGoodFella made their first contribution in https://github.com/deepspeedai/DeepSpeed/pull/7715
- @k-artem made their first contribution in https://github.com/deepspeedai/DeepSpeed/pull/7448
- @kashif made their first contribution in https://github.com/deepspeedai/DeepSpeed/pull/7724
- @Rakshit-gen made their first contribution in https://github.com/deepspeedai/DeepSpeed/pull/7735
- @leejianwoo-collab made their first contribution in https://github.com/deepspeedai/DeepSpeed/pull/7751
Full Changelog: https://github.com/deepspeedai/DeepSpeed/compare/v0.18.3...v0.18.4