NVIDIA Neural Modules 2.6.0
Highlights
- Speech
- Removed the Automodel module. Automodel is available in the repo https://github.com/NVIDIA-NeMo/Automodel.
- Removed the Deploy module. Export & Deploy is available in the repo https://github.com/NVIDIA-NeMo/Export-Deploy.
- Non-Speech NeMo 2.0 collections are deprecated and will be removed in a later release. Their functionality is available in the Megatron Bridge repo at https://github.com/NVIDIA-NeMo/Megatron-Bridge.
Known Issues
- NeMo voice agent pipecat connection issues
Detailed Changelogs:
ASR
Changelog
- fixing kernel restarting when transcribing by @weiqingw4ng :: PR: #14665
- Downgrade "datasets" library version in ASR tutorial to ensure compatibility with HF Datasets used by @KunalDhawan :: PR: #14679
- Fixing Sortformer training tutorial notebook by @tango4j :: PR: #14680
- Fix for "EncDecRNNTBPEModel transcribe() failed with TypeError" by @andrusenkoau :: PR: #14698
- Force activations and weights cast to FP32 Jasper Encoder Squeeze-Excite (merge to main) by @erastorgueva-nv :: PR: #14743
- Use lhotse dataloader for ASR models to support in-manifest channel selection for multichannel recordings by @racoiaws :: PR: #14586
- add transducer timestamps without alignments, timestamps to streaming by @lilithgrigoryan :: PR: #14766
- Adding bf16 Sortformer train and inference by @tango4j :: PR: #14627
- Replace texterrors with kaldialign library by @andrusenkoau :: PR: #14775
- fix: Use shutil.copy fallback to handle file metadata permission errors by @vipnydav :: PR: #14639
- Add Customization Capabilities to Cache-Aware Models by @artbataev :: PR: #14757
- Documentation for gpu-based phrase boosting by @andrusenkoau :: PR: #14800
- Streaming decoding policies (Wait-K and AlignAtt) for Canary model by @andrusenkoau :: PR: #14765
- Add tests for streaming buffered and cache-aware transducer models by @artbataev :: PR: #14823
- Merge updates of Multi-Talker Parakeet Model, Modules, Dataloader and Utils PR 01 by @weiqingw4ng :: PR: #14905
- Merge updates of Multi-Talker Parakeet - Unit tests and CI tests PR 02 by @weiqingw4ng :: PR: #14932
- Add Parakeet Hybrid RNNT CTC BPE Model with Prompt support by @ealbasiri :: PR: #14561
- fix notebooks by @nithinraok :: PR: #15079
- cherry pick #15070 by @nithinraok :: PR: #15082
TTS
Changelog
- Remove outdated TTS Tutorials by @blisc :: PR: #14660
- Add KokoroTTS support for voice agent framework by @tango4j :: PR: #14910
- remove language_modeling by @dimapihtar :: PR: #14192
NLP / NMT
Changelog
- Add gpt-oss by @cuichenx :: PR: #14457
- Fix sequence packing loss calculation by @rayandasoriya :: PR: #14437
- [Perf script] Llama and GPT3 perf script use mlp cast fusion by @guyueh1 :: PR: #14575
- Delete tutorials/llm/llama/biomedical-qa directory by @cuichenx :: PR: #14653
- Add gpt-oss lora exporter by @cuichenx :: PR: #14589
- Replace MegatronTokenizer with MegatronLegacyTokenizer by @chtruong814 :: PR: #14721
- Update ModelCommPGs API from megatron-core by @yaoyu-33 :: PR: #14578
- feat: Compatibility modification of megatron-fsdp by @shjwudp :: PR: #14593
- imported get_moe_layer_wise_logging_tracker from megatron core moe_utils by @prathamk-tw :: PR: #14694
- Fix gpt-oss yarn_original_max_position_embeddings value by @cuichenx :: PR: #14706
- Update docs per guidance by @pablo-garay :: PR: #14841
- Fixing three mcore links by @aschilling-nv :: PR: #14839
- Documentation for gpu-based phrase boosting by @andrusenkoau :: PR: #14800
- Update gpt-oss configs by @cuichenx :: PR: #14674
- remove language_modeling by @dimapihtar :: PR: #14192
- cp: remove Export-Deploy into r2.6.0 by @pablo-garay :: PR: #15053
- cherry pick #15070 by @nithinraok :: PR: #15082
Export
Changelog
- fix: fix missing rope scaling in exporting llama embedding model by @ZhiyuLi-Nvidia :: PR: #14523
- Add gpt-oss lora exporter by @cuichenx :: PR: #14589
- Skip trt-llm and vllm install in install test by @chtruong814 :: PR: #14663
- Fix deepseek export dtype by @cuichenx :: PR: #14307
- Remove export-deploy, automodel, and eval tutorials by @chtruong814 :: PR: #14790
- cp: remove Export-Deploy into r2.6.0 by @pablo-garay :: PR: #15053
Uncategorized
Changelog
- Version bump to 2.6.0rc0.dev0 by @github-actions[bot] :: PR: #14512
- [Audio]: added conformer U-Net model for SE by @nasretdinovr :: PR: #14442
- hyena/evo2: Make sure to convert to real after fp32 conversion by @antonvnv :: PR: #14515
- Force-set restore path for student in KD mode by @AAnoosheh :: PR: #14532
- Skip PTQ if PTQ model path exists by @jenchen13 :: PR: #14536
- Support QwenVL for inference API by @meatybobby :: PR: #14534
- Hyena: Allow to use unfused RMSNorm + TELinear to restore accuracy and some speed by @antonvnv :: PR: #14542
- [Audio]: added streaming mode to SpectrogramToAudio by @nasretdinovr :: PR: #14524
- Update evo2 defaults so converted checkpoints have the right parameters by @jstjohn :: PR: #14514
- deprecate t0 scripts by @dimapihtar :: PR: #14585
- cfg typo correction by @malay-nagda :: PR: #14588
- [Perf script] Add use_te_activation_func and activation_func_fp8_input_store flags by @guyueh1 :: PR: #14522
- Modify logging message to signal that RestoreConfig will be used by @balvisio :: PR: #14469
- Bump TE and Mcore by @chtruong814 :: PR: #14568
- Avoid host-device sync in PTL logging by @WanZzzzzz :: PR: #14489
- Integrate implicit filter kernel with Hyena layer by @farhadrgh :: PR: #14621
- Fix kv_channels configuration for Gemma2 27b by @ananthsub :: PR: #14590
- [Flux] small fixes by @CarlosGomes98 :: PR: #14333
- [Flux] Add MXFP8 Support by @alpha0422 :: PR: #14473
- Use huggingface_hub for downloading the FLUX checkpoint by @suiyoubi :: PR: #14638
- Fine-tune embedding models (E5-Large-V2 and LLaMA-3.2-1B) on the allnli triplet dataset with NeMo Framework by @girihemant19 :: PR: #14584
- remove service launch scripts by @dimapihtar :: PR: #14647
- Warn instead of error when chat template doesn't contain generation keyword by @jenchen13 :: PR: #14641
- Fix function calling notebook by @cuichenx :: PR: #14643
- [Audio]: fixed bug in conformer unet by @nasretdinovr :: PR: #14626
- Fix code checkout during test by @chtruong814 :: PR: #14658
- Fix Flux seed as optional Arg by @suiyoubi :: PR: #14652
- Remove PEFT scheme condition from recipe by @JRD971000 :: PR: #14661
- Add NeMo Voice Agent by @stevehuang52 :: PR: #14325
- Update get_tensor_shapes function whose signature was refactored by @AAnoosheh :: PR: #14594
- Delete nemo1 notebooks by @cuichenx :: PR: #14677
- Bump latest Mcore 020abf01 by @chtruong814 :: PR: #14676
- [Flux] correct vae_downscale_factor by @CarlosGomes98 :: PR: #14425