v0.18.1 Patch Release
What's Changed
- Add ZenFlow code for Stage 3 by @JoshWoo2003 in https://github.com/deepspeedai/DeepSpeed/pull/7516
- [XPU][CI] recover xpu-max1100 workflow by @Liangliang-Ma in https://github.com/deepspeedai/DeepSpeed/pull/7630
- Take **kwargs in init of DeepSpeedZeroOptimizer subclasses by @eternalNight in https://github.com/deepspeedai/DeepSpeed/pull/7634
- add support for tensor learning rate (vs scalar) by @NirSonnenschein in https://github.com/deepspeedai/DeepSpeed/pull/7633
- Fix illegal memory access with multi_tensor_apply size above INT_MAX by @wangyan-mms in https://github.com/deepspeedai/DeepSpeed/pull/7639
- No Muon optimizer for embedding and lm_head layer by @delock in https://github.com/deepspeedai/DeepSpeed/pull/7641
- z2: report param name and not zero id in assert by @stas00 in https://github.com/deepspeedai/DeepSpeed/pull/7637
- z2: don't pass dtype to report_ipg_memory_usage by @stas00 in https://github.com/deepspeedai/DeepSpeed/pull/7636
- Ulysses HF Accelerate integration by @stas00 in https://github.com/deepspeedai/DeepSpeed/pull/7638
- Add DataStates-LLM: Asynchronous Checkpointing Engine Support by @mauryaavinash95 in https://github.com/deepspeedai/DeepSpeed/pull/7166
New Contributors
- @JoshWoo2003 made their first contribution in https://github.com/deepspeedai/DeepSpeed/pull/7516
- @wangyan-mms made their first contribution in https://github.com/deepspeedai/DeepSpeed/pull/7639
Full Changelog: https://github.com/deepspeedai/DeepSpeed/compare/v0.18.0...v0.18.1