v0.9.2: MiniCPM-o, SwanLab, APOLLO
We will attend the vLLM Beijing Meetup on Mar 16th! See you in Beijing!
- Event info: https://mp.weixin.qq.com/s/viPRDlhnzS3qO9-96fMeeA
New features
- 🔥 APOLLO optimizer by @zhuhanqing in #6617
- 🔥 SwanLab experiment tracker by @Zeyi-Lin in #6401
- 🔥 Ray Trainer by @erictang000 in #6542
- Batch inference with vLLM TP by @JieShenAI in #6190
- QLoRA on Ascend NPU by @codemayq in #6601
- YaRN and Llama 3 RoPE scaling by @hiyouga in #6693
- Support `uv run` by @erictang000 in #6907
- Ollama modelfile auto-generation by @codemayq in #4686
- Mistral tool prompt by @AlongWY in #5473
- Llama3 and Qwen2 tool prompt by @hiyouga in #6367 and #6369
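Two of the headline features are enabled from the training YAML rather than from code. A minimal sketch, assuming the option names `use_apollo`, `use_swanlab`, and `swanlab_project` follow the pattern of the existing GaLore and W&B integrations (the key names are assumptions, not verified against this release):

```yaml
### method (hypothetical fragment of a llamafactory-cli training config)
stage: sft
finetuning_type: full
use_apollo: true          # APOLLO memory-efficient optimizer (#6617); key name assumed

### logging
use_swanlab: true         # SwanLab experiment tracker (#6401); key name assumed
swanlab_project: my_proj  # hypothetical project name
```

Such a file would then be passed to the usual entry point, e.g. `llamafactory-cli train config.yaml`.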
New models
- Base models
  - GPT2 (0.1B/0.4B/0.8B/1.5B)
  - Granite 3.0-3.1 (1B/2B/3B/8B)
  - PaliGemma2 (3B/10B/28B) 🖼️
  - Moonlight (16B)
  - DeepSeek V2-V2.5 Base (236B)
  - DeepSeek V3 Base (671B)
- Instruct/Chat models
  - Granite 3.0-3.1 (1B/2B/3B/8B) by @Tuyohai in #5922
  - DeepSeek R1 (1.5B/7B/8B/14B/32B/70B/671B) by @Qwtdgh in #6767
  - TeleChat2 (3B/7B/12B/35B/115B) by @ge-xing in #6313
  - Qwen2.5-VL (3B/7B/72B) by @hiyouga in #6779 🖼️
  - PaliGemma2-mix (3B/10B/28B) by @Kuangdd01 in #7060 🖼️
  - Qwen2 Audio (7B) by @BUAADreamer in #6701
  - MiniCPM-V/MiniCPM-o (8B) by @BUAADreamer in #6598 and #6631 🖼️
  - InternLM3-Instruct (8B) by @hhaAndroid in #6640
  - Marco-o1 (8B)
  - Skywork-o1 (8B)
  - Phi-4 (14B)
  - Moonlight Instruct (16B)
  - Mistral Small (24B)
  - QwQ (32B)
  - Llama-3.3-Instruct (70B)
  - QvQ (72B) 🖼️
  - DeepSeek V2-V2.5 (236B)
  - DeepSeek V3 (671B)
New datasets
- Supervised fine-tuning datasets
  - OpenO1 (en)
  - Open Thoughts (en)
  - Open-R1-Math (en)
  - Chinese-DeepSeek-R1-Distill (zh)
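New SFT datasets are normally registered in `data/dataset_info.json` and then referenced by name via the `dataset` field of a training config. A hypothetical registration sketch (the entry name, repo id, and column names below are placeholders, not the actual entries shipped in this release):

```json
{
  "my_r1_distill_demo": {
    "hf_hub_url": "org-name/dataset-name",
    "columns": {
      "prompt": "instruction",
      "response": "output"
    }
  }
}
```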
Changes
- Refactor VLM registration by @hiyouga in #6600
- Refactor mm plugin by @hiyouga in #6895
- Refactor template by @hiyouga in #6896
- Refactor data pipeline by @hiyouga in #6901
- Update VLM arguments by @hiyouga in #6976
- We have cleaned large files from the git history using BFG Repo-Cleaner; the backup repo can be found here
Bug fixes
- Add `trust_remote_code` option by @yafshar in #5819
- Fix mllama config by @hiyouga in #6137 and #6140
- Fix mllama pad by @hiyouga in #6151 and #6874
- Pin tokenizers version by @hiyouga in #6157
- Fix tokenized data loading by @village-way in #6160
- Show hostname in webui by @hykilpikonna in #6170
- Fix VLMs zero3 training by @hiyouga in #6233
- Add `skip_special_tokens` by @hiyouga in #6363
- Support non-reentrant gradient checkpointing by @hiyouga in #6364
- Add `disable_shuffling` option by @hiyouga in #6388
- Fix gen kwargs by @hiyouga in #6395
- Enable module run by @youkaichao in #6457
- Fix eval loss value by @hiyouga in #6465
- Fix paligemma inference by @hiyouga in #6483
- Add deepseek v3 template by @piamo in #5507
- Add http proxy argument in dockerfile by @shibingli in #6462
- Fix trainer generate by @hiyouga in #6512
- Fix pixtral DPO training by @hiyouga in #6547
- Fix ray args by @stephen-nju in #6564
- Fix minicpm template by @BUAADreamer in #6620
- Fix stop tokens for visual detection by @hiyouga in #6624
- Pin vllm version by @hiyouga in #6629
- Fix mllama any image by @hiyouga in #6637 and #7053
- Fix tokenizer max length by @xiaosu-zhu in #6632
- Fix webui locale by @steveepreston in #6653
- Fix MiniCPM-o DPO training by @BUAADreamer in #6657
- Fix Qwen2 MoE training by @hiyouga in #6684
- Upgrade to gradio 5 by @hiyouga in #6688
- Support Japanese local file by @engchina in #6698
- Fix DPO loss by @yinpu in #6722
- Webui thinking mode by @hiyouga in #6778
- Upgrade to transformers 4.48 by @hiyouga in #6628
- Fix ci by @hiyouga in #6787
- Fix README instructions for installing FlashAttention-2 on Windows by @neavo in #6788
- Fix minicpmv plugin by @BUAADreamer in #6801, #6890, #6946 and #6998
- Fix qwen2 tool prompt by @yueqis in #6796
- Fix llama pro by @hiyouga in #6814
- Allow thought in function call by @yueqis in #6797
- Add `ALLOW_EXTRA_ARGS` by @hiyouga in #6831
- Fix Qwen2vl plugin by @hiyouga in #6855
- Upgrade vllm to 0.7.2 by @hiyouga in #6857
- Fix unit test for tool using by @hiyouga in #6865
- Skip broken data in sharegpt converter by @JJJYmmm in #6879
- Fix qwen2.5 plugin for video by @JJJYmmm in #6868
- Parsing chat template from tokenizer by @hiyouga in #6905 (experimental)
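Several of the fixes above also introduce user-facing options. A hedged sketch of how they might appear in a training config, assuming they are plain booleans (defaults and exact semantics not verified against this release):

```yaml
trust_remote_code: true    # allow custom model/tokenizer code from the Hub (#5819)
skip_special_tokens: true  # drop special tokens from decoded generations (#6363)
disable_shuffling: false   # keep dataset shuffling enabled (#6388)
```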
Full Changelog: https://github.com/hiyouga/LLaMA-Factory/compare/v0.9.1...v0.9.2