v0.9.2: MiniCPM-o, SwanLab, APOLLO
We will attend the vLLM Beijing Meetup on Mar 16th! See you in Beijing!
- Event info: https://mp.weixin.qq.com/s/viPRDlhnzS3qO9-96fMeeA
New features
- 🔥 APOLLO optimizer by @zhuhanqing in #6617
- 🔥 SwanLab experiment tracker by @Zeyi-Lin in #6401
- 🔥 Ray Trainer by @erictang000 in #6542
- Batch inference with vLLM TP by @JieShenAI in #6190
- QLoRA on Ascend NPU by @codemayq in #6601
- YaRN and Llama 3 RoPE scaling by @hiyouga in #6693
- Support `uv run` by @erictang000 in #6907
- Ollama modelfile auto-generation by @codemayq in #4686
- Mistral tool prompt by @AlongWY in #5473
- Llama3 and Qwen2 tool prompt by @hiyouga in #6367 and #6369
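Two of the headline features are enabled from the training YAML rather than from code. A minimal sketch, assuming the option names `use_apollo`, `use_swanlab`, and `swanlab_project` follow the pattern of the existing GaLore and W&B integrations (the key names are assumptions, not verified against this release):

```yaml
### method (hypothetical fragment of a llamafactory-cli training config)
stage: sft
finetuning_type: full
use_apollo: true          # APOLLO memory-efficient optimizer (#6617); key name assumed

### logging
use_swanlab: true         # SwanLab experiment tracker (#6401); key name assumed
swanlab_project: my_proj  # hypothetical project name
```

Such a file would then be passed to the usual entry point, e.g. `llamafactory-cli train config.yaml`.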
New models
- Base models
  - GPT2 (0.1B/0.4B/0.8B/1.5B)
  - Granite 3.0-3.1 (1B/2B/3B/8B)
  - PaliGemma2 (3B/10B/28B) 🖼️
  - Moonlight (16B)
  - DeepSeek V2-V2.5 Base (236B)
  - DeepSeek V3 Base (671B)
- Instruct/Chat models
  - Granite 3.0-3.1 (1B/2B/3B/8B) by @Tuyohai in #5922
  - DeepSeek R1 (1.5B/7B/8B/14B/32B/70B/671B) by @Qwtdgh in #6767
  - TeleChat2 (3B/7B/12B/35B/115B) by @ge-xing in #6313
  - Qwen2.5-VL (3B/7B/72B) by @hiyouga in #6779 🖼️
  - PaliGemma2-mix (3B/10B/28B) by @Kuangdd01 in #7060 🖼️
  - Qwen2 Audio (7B) by @BUAADreamer in #6701
  - MiniCPM-V/MiniCPM-o (8B) by @BUAADreamer in #6598 and #6631 🖼️
  - InternLM3-Instruct (8B) by @hhaAndroid in #6640
  - Marco-o1 (8B)
  - Skywork-o1 (8B)
  - Phi-4 (14B)
  - Moonlight Instruct (16B)
  - Mistral Small (24B)
  - QwQ (32B)
  - Llama-3.3-Instruct (70B)
  - QvQ (72B) 🖼️
  - DeepSeek V2-V2.5 (236B)
  - DeepSeek V3 (671B)
New datasets
- Supervised fine-tuning datasets
  - OpenO1 (en)
  - Open Thoughts (en)
  - Open-R1-Math (en)
  - Chinese-DeepSeek-R1-Distill (zh)
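New SFT datasets are normally registered in `data/dataset_info.json` and then referenced by name via the `dataset` field of a training config. A hypothetical registration sketch (the entry name, repo id, and column names below are placeholders, not the actual entries shipped in this release):

```json
{
  "my_r1_distill_demo": {
    "hf_hub_url": "org-name/dataset-name",
    "columns": {
      "prompt": "instruction",
      "response": "output"
    }
  }
}
```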
Changes
- Refactor VLM registration by @hiyouga in #6600
- Refactor mm plugin by @hiyouga in #6895
- Refactor template by @hiyouga in #6896
- Refactor data pipeline by @hiyouga in #6901
- Update VLM arguments by @hiyouga in #6976
- We have cleaned large files from the git history using BFG Repo-Cleaner; the backup repo can be found here
Bug fixes
- Add `trust_remote_code` option by @yafshar in #5819
- Fix mllama config by @hiyouga in #6137 and #6140
- Fix mllama pad by @hiyouga in #6151 and #6874
- Pin tokenizers version by @hiyouga in #6157
- Fix tokenized data loading by @village-way in #6160
- Show hostname in webui by @hykilpikonna in #6170
- Fix VLMs zero3 training by @hiyouga in #6233
- Add `skip_special_tokens` by @hiyouga in #6363
- Support non-reentrant gradient checkpointing by @hiyouga in #6364
- Add `disable_shuffling` option by @hiyouga in #6388
- Fix gen kwargs by @hiyouga in #6395
- Enable module run by @youkaichao in #6457
- Fix eval loss value by @hiyouga in #6465
- Fix paligemma inference by @hiyouga in #6483
- Add deepseek v3 template by @piamo in #5507
- Add http proxy argument in dockerfile by @shibingli in #6462
- Fix trainer generate by @hiyouga in #6512
- Fix pixtral DPO training by @hiyouga in #6547
- Fix ray args by @stephen-nju in #6564
- Fix minicpm template by @BUAADreamer in #6620
- Fix stop tokens for visual detection by @hiyouga in #6624
- Pin vllm version by @hiyouga in #6629
- Fix mllama any image by @hiyouga in #6637 and #7053
- Fix tokenizer max length by @xiaosu-zhu in #6632
- Fix webui locale by @steveepreston in #6653
- Fix MiniCPM-o DPO training by @BUAADreamer in #6657
- Fix Qwen2 MoE training by @hiyouga in #6684
- Upgrade to gradio 5 by @hiyouga in #6688
- Support Japanese local file by @engchina in #6698
- Fix DPO loss by @yinpu in #6722
- Webui thinking mode by @hiyouga in #6778
- Upgrade to transformers 4.48 by @hiyouga in #6628
- Fix ci by @hiyouga in #6787
- Fix README instructions for installing FlashAttention-2 on Windows by @neavo in #6788
- Fix minicpmv plugin by @BUAADreamer in #6801, #6890, #6946 and #6998
- Fix qwen2 tool prompt by @yueqis in #6796
- Fix llama pro by @hiyouga in #6814
- Allow thought in function call by @yueqis in #6797
- Add `ALLOW_EXTRA_ARGS` by @hiyouga in #6831
- Fix Qwen2vl plugin by @hiyouga in #6855
- Upgrade vllm to 0.7.2 by @hiyouga in #6857
- Fix unit test for tool using by @hiyouga in #6865
- Skip broken data in sharegpt converter by @JJJYmmm in #6879
- Fix qwen2.5 plugin for video by @JJJYmmm in #6868
- Parsing chat template from tokenizer by @hiyouga in #6905 (experimental)
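Several of the fixes above also introduce user-facing options. A hedged sketch of how they might appear in a training config, assuming they are plain booleans (defaults and exact semantics not verified against this release):

```yaml
trust_remote_code: true    # allow custom model/tokenizer code from the Hub (#5819)
skip_special_tokens: true  # drop special tokens from decoded generations (#6363)
disable_shuffling: false   # keep dataset shuffling enabled (#6388)
```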
Full Changelog: https://github.com/hiyouga/LLaMA-Factory/compare/v0.9.1...v0.9.2