NVIDIA Megatron Core 0.15.0
- Features
  - Performance
  - MoE
  - Model support
  - FSDP
    - Enable joint training of parallel modules (MR !3850)
  - Inference
  - Post-training
  - RL
  - Ease of use
- Bug fixes
  - Fix convergence bug in MXFP8 parameter gradient buffer reuse (MR !3999)
  - Fix loss mask cloning to prevent incorrect updates (MR !4164)
  - Fix metadata loss in checkpoints (MR !4182)
  - Fix FSDP grad accum fusion support (MR !4018)
  - Fix non-TE optimizer checkpoint issue (MR !3931)
  - Fix BERT virtual pipeline parallelism (MR !3993)
  - Fix gc.freeze() slowdown by adding gc.collect() on the last layer (MR !4003)
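The gc.freeze() fix above follows a common CPython pattern: run a full collection first, then freeze, so that surviving long-lived objects move into the permanent generation and are skipped by every later garbage-collection pass. A minimal sketch of that pattern, with a hypothetical model-setup stand-in (not Megatron code):

```python
import gc

def build_model_layers(num_layers):
    # Hypothetical stand-in for model construction; each layer
    # allocates many long-lived Python objects.
    return [{"layer": i, "weights": [0.0] * 1024} for i in range(num_layers)]

layers = build_model_layers(num_layers=8)

# Collect first so cyclic garbage created during setup is reclaimed,
# then freeze: surviving objects move to the permanent generation and
# are excluded from all future collections, so the collector no longer
# rescans the large, stable set of model objects on every pass.
gc.collect()
gc.freeze()

assert gc.get_freeze_count() > 0  # objects are now in the permanent generation
```

Freezing before collecting would pin dead setup-time garbage in the permanent generation; collecting first keeps the frozen set to genuinely live objects.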
- Known issues
- New Contributors
  - @marksverdhei made their first contribution in #1980
  - @Skylion007 made their first contribution in #2047
  - @azzhipa made their first contribution in 5db6704
  - @vicoooo26 made their first contribution in 5db6704
  - @A-transformer made their first contribution in e002b5c
  - @chaitanyadwivedii made their first contribution in 20b3954
- External Contributor Acknowledgements
  We'd like to thank all our external contributors whose work was merged in this release:
  - Fix ImportError and NameError in examples/run_simple_mcore_train_loop.py by @marksverdhei in #1980
  - Optimizer refactor: clean up public get_megatron_optimizer interface by @Skylion007 in #2047
  - Typo fixes from the community, with co-authors @vicoooo26, @azzhipa, and @A-transformer, in 5db6704 and e002b5c
  - Fix router input jitter dtype by @chaitanyadwivedii in 20b3954
Note: Some contributions came through internal MRs and are therefore listed by commit hash instead of PR number. Megatron Core is now GitHub-first, so all future PRs will be tested and merged publicly.