v0.4.1 patch release: checkpoint fixes for MoE EP & LoRA, OpenAI/MCP tool calling schema, and SGLang memory optimizations
Key changes
PPO fixes and enhancements
- Fixed a bug in the PPO vf_loss coefficient that was introduced in v0.4 https://github.com/volcengine/verl/pull/2016
- Improved numerical stability when clamping KL divergence-related values https://github.com/volcengine/verl/pull/1779
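To illustrate why clamping matters for KL-related values, here is a minimal sketch. It is not the code from the PR above. It assumes a low-variance KL estimator of the form exp(r) - r - 1, where r is the log-probability ratio, and the helper names and bounds (`low_var_kl`, `log_ratio_bound`, `kl_bound`) are hypothetical choices for illustration.

```python
import math


def clamp(x: float, lo: float, hi: float) -> float:
    """Restrict x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def low_var_kl(
    logprob: float,
    ref_logprob: float,
    log_ratio_bound: float = 20.0,  # hypothetical bound, chosen for illustration
    kl_bound: float = 10.0,         # hypothetical bound, chosen for illustration
) -> float:
    """Per-token KL estimate exp(r) - r - 1 with r = ref_logprob - logprob.

    The log-ratio is clamped before exponentiation so that exp() cannot
    overflow, and the result is clamped again so that a single outlier
    token cannot dominate the penalty.
    """
    log_ratio = clamp(ref_logprob - logprob, -log_ratio_bound, log_ratio_bound)
    kl = math.exp(log_ratio) - log_ratio - 1.0
    return clamp(kl, -kl_bound, kl_bound)


if __name__ == "__main__":
    # Identical log-probabilities give zero divergence.
    print(low_var_kl(-1.0, -1.0))
    # An extreme log-ratio would overflow math.exp without the first clamp.
    print(low_var_kl(-1000.0, 0.0))
```

Clamping the log-ratio before calling `exp` is the step that prevents overflow; the second clamp only limits the influence of outliers.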
Checkpoint-related changes
- Switched the Megatron checkpointer to mcore's dist_checkpoint, which reduces peak memory usage and improves distributed model saving performance via *.checkpoint.async_save=True.
- [BREAKING] Megatron's checkpoint directory layout is updated accordingly (see the documentation).
- [BREAKING] Checkpoint manager constructor now takes as the keyword to replace https://github.com/volcengine/verl/pull/2125