Highlights

Agentic RL rollout interface [beta]

verl v0.5 introduces the AgentLoop abstraction that allows easy extension to custom rollout with tool/agent interactions. Server-based asynchronous rollout is adopted to efficiently utilize GPUs. verl provides a few example agent loop implementations including:

Multi-turn conversations and tool calls
LangGraph-based Agent

Please check the documentation for the system architecture design.

Disaggregated placement & async training [prototype]

verl v0.5 includes a community-contributed one-step-off async training recipe, with trainer and rollout deployed on disaggregated resources and off-policy model updates with staleness = 1. In a small scale experiment, the reference recipe provides 20-40% throughput gain compared to the on-policy baseline depending on the configuration. Please checkout the code and documentation for example configurations.

Remote generative reward models [prototype]

A recipe is provided as a prototype to demonstrate the recommended way to use generative reward models in verl. Documentation and code.

New features

LoRA RL support for VLMs: https://github.com/volcengine/verl/pull/2182
Better checkpoint manager support for SFT trainer https://github.com/volcengine/verl/pull/2292/
Support rollout trajectory tracing and RolloutViewer with improved debug-ability and visualization
Megatron with mbridge integration, which better supports hf model loading into megatron https://github.com/volcengine/verl/pull/2064

Important fixes & improvements

Fixed an issue with FSDP2 state_dict memory usage caused by torch 2.6. Either using verl v0.5 or torch 2.7 avoids OOMs https://github.com/volcengine/verl/pull/2606
Significantly reduced the overhead of vllm async server performance (v.s. vllm engine) https://github.com/volcengine/verl/pull/2246/
Fixed sglang + Megatron TP16 https://github.com/volcengine/verl/pull/2336
Improved SGLang + Megatron weight resharding by 10x https://github.com/volcengine/verl/pull/2418 and MoE weight resharding by 3x https://github.com/volcengine/verl/pull/2692
Significant rollout load balancing for GRPO-like algorithms via repeating samples before dispatching them https://github.com/volcengine/verl/pull/2324

Breaking changes and deprecations

Full list: https://github.com/volcengine/verl/discussions/2270

Rollout

When generate_sequences with sampling params n>1, change DataProto repeat behavior:
- chunk-dispatch-repeat: DataProto is chunked and dispatched to rollout workers, then repeated in rollout workers.
- repeat-chunk-dispatch: DataProto is repeated by n in driver, then chunked and dispatched to rollout workers. Switch from chunk-dispatch-repeat to repeat-chunk-dispatch, this change may break almost all recipes and projects using verl GRPO as submodules. https://github.com/volcengine/verl/pull/2324
verl.workers.rollout.sglang_rollout.AsyncSglangServer is now renamed as AsyncSGLangServer
vllm <= v0.6 support is dropped

Multi-turn

We are moving multi-turn supports from ChatScheduler to AgentLoop to improve usability. https://github.com/volcengine/verl/pull/2124

Megatron

Megatron recomputation options are moved to *.megatron.override_transformer_config. https://github.com/volcengine/verl/pull/2651 Default values are:

override_transformer_config:
  recompute_granularity: null
  recompute_modules:
  - core_attn
  recompute_method: null
  recompute_num_layers: null

Merged config actor_rollout_ref.(actor, ref, rollout).profiler to actor_rollout_ref.profiler

What's Changed

Trainer & FSDP

[fsdp] fix: Change the data in the update_actor function from to.('cpu') to to.(get_device_id()) by @Keilo001 in https://github.com/volcengine/verl/pull/2477
[fsdp] fix: vlm dynamic batch & unify dynamic batch api by @hiyouga in https://github.com/volcengine/verl/pull/2524
[fsdp] fix: change geo3k model name from non-vl to vl by @nanjiangwill in https://github.com/volcengine/verl/pull/2555
[trainer, recipe] feat: add support for external generative reward models by @yyDing1 in https://github.com/volcengine/verl/pull/2121
[trainer] fix: fix split placement by @vermouth1992 in https://github.com/volcengine/verl/pull/2227
[trainer, vllm] feat: add lora exclude_modules to support VL model lora training by @Cccei000 in https://github.com/volcengine/verl/pull/2182
[trainer] fix: pre-commit broken by #2354 by @ETOgaosion in https://github.com/volcengine/verl/pull/2358
[trainer, cfg] feat: add BaseConfig for all dataclass configs. Introduce dataclass for algorithm related configs by @eric-haibin-lin in https://github.com/
[trainer] fix: Use safe masked mean/sum to handle NaN values outside the mask by @Yangruipis in https://github.com/volcengine/verl/pull/2377
[trainer, data] feat: Dynamic Data Generation by @jwong8314 in https://github.com/volcengine/verl/pull/2312/verl/pull/2433
[trainer] fix: use .keys() to check 'response_mask' in TensorDict by @askender in https://github.com/volcengine/verl/pull/2491
[trainer] fix: Allow FSDP2 when doing strategy check by @HollowMan6 in https://github.com/volcengine/verl/pull/2497
[trainer] refactor: no need to call load_reward_manager in compute_reward_async by @eric-haibin-lin in https://github.com/volcengine/verl/pull/2557
[trainer, fsdp, vllm, recipe] feat: one step off async training recipe by @imh966 in https://github.com/volcengine/verl/pull/2231
[trainer] fix: maybe_filter_out_long_prompts on image and video by @firefighter-eric in https://github.com/volcengine/verl/pull/2553
[trainer] refactor: Training Engine Interface and Development Plan by @ZihengJiang in https://github.com/volcengine/verl/pull/1977
[trainer] feat: Add FSDPCheckpointManager for SFTtrainer, support resume training, manage the number of CKPTS in keep by @Pursuer-Hsf in https://github.com/volcengine/verl/pull/2292/

Rollout & SGLang

[rollout] feat: add agent loop by @wuxibin89 in https://github.com/volcengine/verl/pull/2124
[rollout] feat: add zeromq vllm distributed executor by @wuxibin89 in https://github.com/volcengine/verl/pull/2246
[BREAKING][rollout] refactor: drop vllm v0.5.4 and v0.6.3 support by @eric-haibin-lin in https://github.com/volcengine/verl/pull/2257
[rollout] feat: Allow customization of async server class by @ultmaster in https://github.com/volcengine/verl/pull/2326
[rollout] fix: fix hf rollout and add single gpu test by @eric-haibin-lin in https://github.com/volcengine/verl/pull/2371
[BREAKING][rollout] feat: repeat DataProto when n>1 in driver instead of rollout workers by @wuxibin89 in https://github.com/volcengine/verl/pull/2324
[misc] feat: trace rollout generation and tool calls using weave by @chenhaiq in https://github.com/volcengine/verl/pull/2345
[cfg] refactor: make the rollout & ref configs more modular by @eric-haibin-lin in https://github.com/volcengine/verl/pull/2410
[perf] feat: add range tag to start/stop profile; clean actor_rollout_ref.profiler by @davidmlw in https://github.com/volcengine/verl/pull/2456
[rollout] feat: support mlflow in rollout trace by @chenhaiq in https://github.com/volcengine/verl/pull/2440
[rollout] feat: add ReactAgentLoop based on LangGraph by @wuxibin89 in https://github.com/volcengine/verl/pull/2463
[rollout] fix: fix bug for remax when the rollout mode is async by @none0663 in https://github.com/volcengine/verl/pull/2574
[tool] chore: introduce RolloutViewer TUI tools by @Yangruipis in https://github.com/volcengine/verl/pull/2469
[rollout,vllm] fix: A major issue in random sampling of vllm engine by @guanning03 in https://github.com/volcengine/verl/pull/2646
[tool] chore: Add log for AsyncRolloutRequest ID, and rollout viewr to support request id display and search by @Hecate0821 in https://github.com/volcengine/
[rollout] fix: use flashattn3 backend in sglang to avoid error in tool call by @chenhaiq in https://github.com/volcengine/verl/pull/2244
[rollout] fix: Make free_cache_engine option workable in latest vLLM/SGLang by @HollowMan6 in https://github.com/volcengine/verl/pull/1464
[rollout] fix: #1646 stop words for sglang rollout by @linxxx3 in https://github.com/volcengine/verl/pull/1991
[sglang, rollout] refactor: use torch.Tensor in async rollout schemas by @nanjiangwill in https://github.com/volcengine/verl/pull/2362
[rollout] fix: sglang async fail with Multi-stage Awake feature by @chenhaiq in https://github.com/volcengine/verl/pull/2365
[sglang] feat: Add multi-interaction registry support and testing by @SwordFaith in https://github.com/volcengine/verl/pull/2184
[sglang] feat: Repeat sampling parameter n into requests of GRPO in SGLang by @zhaochenyang20 in https://github.com/volcengine/verl/pull/2258
[sglang,tool] feat: Add support for tools that generate multimodal data by @nanjiangwill in https://github.com/volcengine/verl/pull/2146
[sglang] fix: only wake up weights on infer_tp 0 by @zhaochenyang20 in https://github.com/volcengine/verl/pull/2403
[sglang] fix: Import Error in the latest sglang by @yyDing1 in https://github.com/volcengine/verl/pull/2275
[sglang] fix: Fix qwen2vl weight keys issue by @hebiao064 in https://github.com/volcengine/verl/pull/2434
[sglang] fix: Only flush cache on TP rank=0. by @SuperCB in https://github.com/volcengine/verl/pull/2455
[sglang] feat: update weights in batch with FSDP by @zhaochenyang20 in https://github.com/volcengine/verl/pull/2559
[sglang] fix: adding missing param for sgl async unit test by @zhaochenyang20 in https://github.com/volcengine/verl/pull/2561
[sglang] fix: update response handling and scoring method in GSM8K interaction by @aaronyeeio in https://github.com/volcengine/verl/pull/2428
[sglang] fix: rename Sglang to SGLang following SGLang's fashion by @zhaochenyang20 in https://github.com/volcengine/verl/pull/2672
[sglang] fix: Bug in megatron+sglang TP16 update_weights. by @SuperCB in https://github.com/volcengine/verl/pull/2336
[sglang, megatron, perf] feat: speed up megatron sglang weight update by 10x by @Yangruipis in https://github.com/volcengine/verl/pull/2418
[megatron] fix: wrong response_mask for megatron + sglang mutli-turn by @Yangruipis in https://github.com/volcengine/verl/pull/2543

Megatron

[megatron] feat: add megatron memory log by @ETOgaosion in https://github.com/volcengine/verl/pull/2272
[megatron] feat: use mbridge as megatron adaptor by @ISEEKYAN in https://github.com/volcengine/verl/pull/2064
[megatron] fix: optimizer scheduler misalignment with FSDP by @ETOgaosion in https://github.com/volcengine/verl/pull/2303
[cfg] refactor: split fsdp/megatron specific configs, consolidate shared ones for reward_model and critic by @eric-haibin-lin in https://github.com/volcengine
[megatron] feat: fused kernel lightweight by @ISEEKYAN in https://github.com/volcengine/verl/pull/2210
[megatron] feat: allow override DistributedDataParallelConfig by @ETOgaosion in https://github.com/volcengine/verl/pull/2523
[data, megatron] feat: add dynamic batching computational workload balance by @conver334 in https://github.com/volcengine/verl/pull/2452
[megatron] feat: support distributed megatron model converter and merger by @Yangruipis in https://github.com/volcengine/verl/pull/2281
[cfg] refactor: add flatten megatron trainer config generation and verification script by @eric-haibin-lin in https://github.com/volcengine/verl/pull/2582
[BREAKING][megatron] refactor: activation checkpointing APIs by @ETOgaosion in https://github.com/volcengine/verl/pull/2651
[megatron] fix: CUDA_DEVICE_MAX_CONNECTIONS not taking effect by @ETOgaosion in https://github.com/volcengine/verl/pull/2687

Hardware

[hardware] feat: support ray actor sharing situation on ASCEND NPU by @FightingZhen in https://github.com/volcengine/verl/pull/2341
[Hardware] feat: Support AMD (ROCMm Kernel) - Update Dockerfile/Docker Image by @yushengsu-thu in https://github.com/volcengine/verl/pull/2390
[hardware] fix: enable sleep mode on ASCEND NPU by @as12138 in https://github.com/volcengine/verl/pull/2459
[hardward] chore: Enable Generation of Wheel File During Docker Build by @rhiremat in https://github.com/volcengine/verl/pull/2332

Misc fixes

[ckpt] feat: support esi by @plutoZZZZ in https://github.com/volcengine/verl/pull/2192
[model] fix: separate minicpmo data by @hiyouga in https://github.com/volcengine/verl/pull/2212
[misc] chore: pin transformers under 4.53 by @hiyouga in https://github.com/volcengine/verl/pull/2241
[worker] fix: OOM on first iteration in multi-turn RL by @zTonyZhao in https://github.com/volcengine/verl/pull/2253
[algo] fix: correctly aggregate kl metrics in PPO actor by @0x404 in https://github.com/volcengine/verl/pull/2259
[recipe] feat: add retool recipe by @wuxibin89 in https://github.com/volcengine/verl/pull/2233
[cfg] fix: Security Enhancement Block Dangerous Modules in Sandbox Environment by @none0663 in https://github.com/volcengine/verl/pull/2170
[cfg] chore: add non-negative expected_len assertion by @LeavesLei in https://github.com/volcengine/verl/pull/2330
[algo] feat: mask out observation token in GAE by @wuxibin89 in https://github.com/volcengine/verl/pull/2337
[tool] fix: avoid exception when sandbox return None by @chenhaiq in https://github.com/volcengine/verl/pull/2346
[perf] feat: support entropy checkpointing without rmpad or sp by @FightingZhen in https://github.com/volcengine/verl/pull/2342
[ckpt] fix: edit esi doc by @plutoZZZZ in https://github.com/volcengine/verl/pull/2354
[docker] refactor: Migrate images to verlai, support latest flash attention and newer CUDA versions in future by @ETOgaosion in https://github.com/volcengine/verl/pull/2085volcengine/verl/pull/2147
[data] feat: add interface for user-defined curriculum sampler by @frrad in https://github.com/volcengine/verl/pull/2314
[cfg] fix: pickleing error in multiprocessing in the reward_fn by @none0663 in https://github.com/volcengine/verl/pull/2239
[ray] refactor: Seperate the constants into different file by @YeonwooSung in https://github.com/volcengine/verl/pull/2025
[misc] refactor: replace pkg_resources with importlib.metadata by @askender in https://github.com/volcengine/verl/pull/2392
[tool] fix: Add MCP usage documentation by @AlecHenx in https://github.com/volcengine/verl/pull/2261
[cfg] refactor: make actor config more modular by @eric-haibin-lin in https://github.com/volcengine/verl/pull/2379
[misc] fix: huggingface model config max_position_embeddings assertion for model with extended context length by @Wangmerlyn in https://github.com/volcengine/verl/pull/737
[data] refactor: move sampler api to experimental by @eric-haibin-lin in https://github.com/volcengine/verl/pull/2381
[perf] feat: add npu profiler for FSDP backend by @tongtong0613 in https://github.com/volcengine/verl/pull/2194
[misc] refactor: Replace deepcopy with tensor.clone by @ji-huazhong in https://github.com/volcengine/verl/pull/2442
[misc] fix: add *.yaml to pyproject due to modular config by @nanjiangwill in https://github.com/volcengine/verl/pull/2468
[misc] feat: add py.typed file to verl/ by @frrad in https://github.com/volcengine/verl/pull/2467
[env] feat: upgrade tensordict version by @vermouth1992 in https://github.com/volcengine/verl/pull/2460
[docker] feat: provide images with deepep by @ETOgaosion in https://github.com/volcengine/verl/pull/2480
[training_utils] feat: log_generations_to_swanlab use table by @Zeyi-Lin in https://github.com/volcengine/verl/pull/2489
[env] feat: safely bump py version to 3.10 by @Tavish9 in https://github.com/volcengine/verl/pull/2421
[BUG] fix bug for #2506, when passing as response_mask to policy_loss_fn by @none0663 in https://github.com/volcengine/verl/pull/2513
[single_controller] fix: replace unittest.mock.patch with context manager for env var handling by @PeterSH6 in https://github.com/volcengine/verl/pull/2498
[recipe] fix: DAPO rewards using sandbox fusion by @HollowMan6 in https://github.com/volcengine/verl/pull/2496
[cfg] refactor: support +extra.any_key usage for the base dataclass config in verl by @eric-haibin-lin in https://github.com/volcengine/verl/pull/2502
[ray] refactor: Use public method to get node IP by @kevin85421 in https://github.com/volcengine/verl/pull/2521
[env] fix: bump tensordict to 0.9.1 by @ultmaster in https://github.com/volcengine/verl/pull/2541
[data] fix: Add missing init files in verl experimental data folders by @JoostvDoorn in https://github.com/volcengine/verl/pull/2548
[ray] fix: strip [] for ipv6 address by @wuxibin89 in https://github.com/volcengine/verl/pull/2545
[tool] fix: correctly convert 'None' to null in sandbox fusion _process_single_case by @mathewjhan in https://github.com/volcengine/verl/pull/2409
[training_utils] fix: uneven support in split by @ultmaster in https://github.com/volcengine/verl/pull/2560
[perf] feat: Clip gsm8k solution string to optimize reward calculation by @PopSoda2002 in https://github.com/volcengine/verl/pull/2568
set use_kl_in_reward=True in reinforce_plus_plus by @Titanpku in https://github.com/volcengine/verl/pull/2580
[cfg] feat: add critic config class by @eric-haibin-lin in https://github.com/volcengine/verl/pull/2583
[tool] fix: supports variable arguments for marked_timer by @tardis-key in https://github.com/volcengine/verl/pull/2576
[single_controller] fix: padding for kwargs by @ShareLer in https://github.com/volcengine/verl/pull/2585
[docker] fix: downgrade TransformerEngine version 2.2.1 to allow mcore image using rope fusion and provide another set of v0.5 image by @ETOgaosion in https://github.com/volcengine/verl/pull/2611volcengine/verl/pull/2292
[recipe] feat: add QWen 30b moe dapo script that can run on a single 80GB node by @vermouth1992 in https://github.com/volcengine/verl/pull/2645verl/pull/2636
[perf] feat: mistral and gemma3_text mfu compute support by @xihuai18 in https://github.com/volcengine/verl/pull/2622
[misc] fix: fix prompt and response key in gemma7b example by @apeforest in https://github.com/volcengine/verl/pull/2610
[data, recipe] fix: remove redundant json parsing by @zhxieml in https://github.com/volcengine/verl/pull/2671

New Contributors

Welcome new contributors to the verl community! @rhiremat @LeavesLei @diqiuzhuanzhuan @frrad @shuyhere @askender @Tavish9 @Wangmerlyn @SuperCB @tongtong0613 @jwong8314 @ji-huazhong @Keilo001 @conver334 @JoostvDoorn @mathewjhan @PopSoda2002 @rudeigerc @Titanpku @firefighter-eric @meituan-search @xihuai18 @tardis-key @ZihengJiang @Pursuer-Hsf @beep-bebop @aaronyeeio @Hecate0821 @apeforest @zhxieml

Full Changelog: https://github.com/volcengine/verl/compare/v0.4.1...v0.5.0

Highlights

Agentic RL rollout interface [beta]

Multi-turn conversations and tool calls
LangGraph-based Agent

Please check the documentation for the system architecture design.

Disaggregated placement & async training [prototype]

Remote generative reward models [prototype]

A recipe is provided as a prototype to demonstrate the recommended way to use generative reward models in verl. Documentation and code.

New features

LoRA RL support for VLMs: https://github.com/volcengine/verl/pull/2182
Better checkpoint manager support for SFT trainer https://github.com/volcengine/verl/pull/2292/
Support rollout trajectory tracing and RolloutViewer with improved debug-ability and visualization
Megatron with mbridge integration, which better supports hf model loading into megatron https://github.com/volcengine/verl/pull/2064

Important fixes & improvements

Fixed an issue with FSDP2 state_dict memory usage caused by torch 2.6. Either using verl v0.5 or torch 2.7 avoids OOMs https://github.com/volcengine/verl/pull/2606
Significantly reduced the overhead of vllm async server performance (v.s. vllm engine) https://github.com/volcengine/verl/pull/2246/
Fixed sglang + Megatron TP16 https://github.com/volcengine/verl/pull/2336
Improved SGLang + Megatron weight resharding by 10x https://github.com/volcengine/verl/pull/2418 and MoE weight resharding by 3x https://github.com/volcengine/verl/pull/2692
Significant rollout load balancing for GRPO-like algorithms via repeating samples before dispatching them https://github.com/volcengine/verl/pull/2324

Breaking changes and deprecations

Full list: https://github.com/volcengine/verl/discussions/2270

Rollout

When generate_sequences with sampling params n>1, change DataProto repeat behavior:
- chunk-dispatch-repeat: DataProto is chunked and dispatched to rollout workers, then repeated in rollout workers.
- repeat-chunk-dispatch: DataProto is repeated by n in driver, then chunked and dispatched to rollout workers. Switch from chunk-dispatch-repeat to repeat-chunk-dispatch, this change may break almost all recipes and projects using verl GRPO as submodules. https://github.com/volcengine/verl/pull/2324
verl.workers.rollout.sglang_rollout.AsyncSglangServer is now renamed as AsyncSGLangServer
vllm <= v0.6 support is dropped

Multi-turn

We are moving multi-turn supports from ChatScheduler to AgentLoop to improve usability. https://github.com/volcengine/verl/pull/2124

Megatron

Megatron recomputation options are moved to *.megatron.override_transformer_config. https://github.com/volcengine/verl/pull/2651 Default values are:

override_transformer_config:
  recompute_granularity: null
  recompute_modules:
  - core_attn
  recompute_method: null
  recompute_num_layers: null

Merged config actor_rollout_ref.(actor, ref, rollout).profiler to actor_rollout_ref.profiler

What's Changed

Trainer & FSDP

[fsdp] fix: Change the data in the update_actor function from to.('cpu') to to.(get_device_id()) by @Keilo001 in https://github.com/volcengine/verl/pull/2477
[fsdp] fix: vlm dynamic batch & unify dynamic batch api by @hiyouga in https://github.com/volcengine/verl/pull/2524
[fsdp] fix: change geo3k model name from non-vl to vl by @nanjiangwill in https://github.com/volcengine/verl/pull/2555
[trainer, recipe] feat: add support for external generative reward models by @yyDing1 in https://github.com/volcengine/verl/pull/2121
[trainer] fix: fix split placement by @vermouth1992 in https://github.com/volcengine/verl/pull/2227
[trainer, vllm] feat: add lora exclude_modules to support VL model lora training by @Cccei000 in https://github.com/volcengine/verl/pull/2182
[trainer] fix: pre-commit broken by #2354 by @ETOgaosion in https://github.com/volcengine/verl/pull/2358
[trainer, cfg] feat: add BaseConfig for all dataclass configs. Introduce dataclass for algorithm related configs by @eric-haibin-lin in https://github.com/
[trainer] fix: Use safe masked mean/sum to handle NaN values outside the mask by @Yangruipis in https://github.com/volcengine/verl/pull/2377
[trainer, data] feat: Dynamic Data Generation by @jwong8314 in https://github.com/volcengine/verl/pull/2312/verl/pull/2433
[trainer] fix: use .keys() to check 'response_mask' in TensorDict by @askender in https://github.com/volcengine/verl/pull/2491
[trainer] fix: Allow FSDP2 when doing strategy check by @HollowMan6 in https://github.com/volcengine/verl/pull/2497
[trainer] refactor: no need to call load_reward_manager in compute_reward_async by @eric-haibin-lin in https://github.com/volcengine/verl/pull/2557
[trainer, fsdp, vllm, recipe] feat: one step off async training recipe by @imh966 in https://github.com/volcengine/verl/pull/2231
[trainer] fix: maybe_filter_out_long_prompts on image and video by @firefighter-eric in https://github.com/volcengine/verl/pull/2553
[trainer] refactor: Training Engine Interface and Development Plan by @ZihengJiang in https://github.com/volcengine/verl/pull/1977
[trainer] feat: Add FSDPCheckpointManager for SFTtrainer, support resume training, manage the number of CKPTS in keep by @Pursuer-Hsf in https://github.com/volcengine/verl/pull/2292/

Rollout & SGLang

[rollout] feat: add agent loop by @wuxibin89 in https://github.com/volcengine/verl/pull/2124
[rollout] feat: add zeromq vllm distributed executor by @wuxibin89 in https://github.com/volcengine/verl/pull/2246
[BREAKING][rollout] refactor: drop vllm v0.5.4 and v0.6.3 support by @eric-haibin-lin in https://github.com/volcengine/verl/pull/2257
[rollout] feat: Allow customization of async server class by @ultmaster in https://github.com/volcengine/verl/pull/2326
[rollout] fix: fix hf rollout and add single gpu test by @eric-haibin-lin in https://github.com/volcengine/verl/pull/2371
[BREAKING][rollout] feat: repeat DataProto when n>1 in driver instead of rollout workers by @wuxibin89 in https://github.com/volcengine/verl/pull/2324
[misc] feat: trace rollout generation and tool calls using weave by @chenhaiq in https://github.com/volcengine/verl/pull/2345
[cfg] refactor: make the rollout & ref configs more modular by @eric-haibin-lin in https://github.com/volcengine/verl/pull/2410
[perf] feat: add range tag to start/stop profile; clean actor_rollout_ref.profiler by @davidmlw in https://github.com/volcengine/verl/pull/2456
[rollout] feat: support mlflow in rollout trace by @chenhaiq in https://github.com/volcengine/verl/pull/2440
[rollout] feat: add ReactAgentLoop based on LangGraph by @wuxibin89 in https://github.com/volcengine/verl/pull/2463
[rollout] fix: fix bug for remax when the rollout mode is async by @none0663 in https://github.com/volcengine/verl/pull/2574
[tool] chore: introduce RolloutViewer TUI tools by @Yangruipis in https://github.com/volcengine/verl/pull/2469
[rollout,vllm] fix: A major issue in random sampling of vllm engine by @guanning03 in https://github.com/volcengine/verl/pull/2646
[tool] chore: Add log for AsyncRolloutRequest ID, and rollout viewr to support request id display and search by @Hecate0821 in https://github.com/volcengine/
[rollout] fix: use flashattn3 backend in sglang to avoid error in tool call by @chenhaiq in https://github.com/volcengine/verl/pull/2244
[rollout] fix: Make free_cache_engine option workable in latest vLLM/SGLang by @HollowMan6 in https://github.com/volcengine/verl/pull/1464
[rollout] fix: #1646 stop words for sglang rollout by @linxxx3 in https://github.com/volcengine/verl/pull/1991
[sglang, rollout] refactor: use torch.Tensor in async rollout schemas by @nanjiangwill in https://github.com/volcengine/verl/pull/2362
[rollout] fix: sglang async fail with Multi-stage Awake feature by @chenhaiq in https://github.com/volcengine/verl/pull/2365
[sglang] feat: Add multi-interaction registry support and testing by @SwordFaith in https://github.com/volcengine/verl/pull/2184
[sglang] feat: Repeat sampling parameter n into requests of GRPO in SGLang by @zhaochenyang20 in https://github.com/volcengine/verl/pull/2258
[sglang,tool] feat: Add support for tools that generate multimodal data by @nanjiangwill in https://github.com/volcengine/verl/pull/2146
[sglang] fix: only wake up weights on infer_tp 0 by @zhaochenyang20 in https://github.com/volcengine/verl/pull/2403
[sglang] fix: Import Error in the latest sglang by @yyDing1 in https://github.com/volcengine/verl/pull/2275
[sglang] fix: Fix qwen2vl weight keys issue by @hebiao064 in https://github.com/volcengine/verl/pull/2434
[sglang] fix: Only flush cache on TP rank=0. by @SuperCB in https://github.com/volcengine/verl/pull/2455
[sglang] feat: update weights in batch with FSDP by @zhaochenyang20 in https://github.com/volcengine/verl/pull/2559
[sglang] fix: adding missing param for sgl async unit test by @zhaochenyang20 in https://github.com/volcengine/verl/pull/2561
[sglang] fix: update response handling and scoring method in GSM8K interaction by @aaronyeeio in https://github.com/volcengine/verl/pull/2428
[sglang] fix: rename Sglang to SGLang following SGLang's fashion by @zhaochenyang20 in https://github.com/volcengine/verl/pull/2672
[sglang] fix: Bug in megatron+sglang TP16 update_weights. by @SuperCB in https://github.com/volcengine/verl/pull/2336
[sglang, megatron, perf] feat: speed up megatron sglang weight update by 10x by @Yangruipis in https://github.com/volcengine/verl/pull/2418
[megatron] fix: wrong response_mask for megatron + sglang mutli-turn by @Yangruipis in https://github.com/volcengine/verl/pull/2543

Megatron

[megatron] feat: add megatron memory log by @ETOgaosion in https://github.com/volcengine/verl/pull/2272
[megatron] feat: use mbridge as megatron adaptor by @ISEEKYAN in https://github.com/volcengine/verl/pull/2064
[megatron] fix: optimizer scheduler misalignment with FSDP by @ETOgaosion in https://github.com/volcengine/verl/pull/2303
[cfg] refactor: split fsdp/megatron specific configs, consolidate shared ones for reward_model and critic by @eric-haibin-lin in https://github.com/volcengine
[megatron] feat: fused kernel lightweight by @ISEEKYAN in https://github.com/volcengine/verl/pull/2210
[megatron] feat: allow override DistributedDataParallelConfig by @ETOgaosion in https://github.com/volcengine/verl/pull/2523
[data, megatron] feat: add dynamic batching computational workload balance by @conver334 in https://github.com/volcengine/verl/pull/2452
[megatron] feat: support distributed megatron model converter and merger by @Yangruipis in https://github.com/volcengine/verl/pull/2281
[cfg] refactor: add flatten megatron trainer config generation and verification script by @eric-haibin-lin in https://github.com/volcengine/verl/pull/2582
[BREAKING][megatron] refactor: activation checkpointing APIs by @ETOgaosion in https://github.com/volcengine/verl/pull/2651
[megatron] fix: CUDA_DEVICE_MAX_CONNECTIONS not taking effect by @ETOgaosion in https://github.com/volcengine/verl/pull/2687

Hardware

[hardware] feat: support ray actor sharing situation on ASCEND NPU by @FightingZhen in https://github.com/volcengine/verl/pull/2341
[Hardware] feat: Support AMD (ROCMm Kernel) - Update Dockerfile/Docker Image by @yushengsu-thu in https://github.com/volcengine/verl/pull/2390
[hardware] fix: enable sleep mode on ASCEND NPU by @as12138 in https://github.com/volcengine/verl/pull/2459
[hardward] chore: Enable Generation of Wheel File During Docker Build by @rhiremat in https://github.com/volcengine/verl/pull/2332

Misc fixes

[ckpt] feat: support esi by @plutoZZZZ in https://github.com/volcengine/verl/pull/2192
[model] fix: separate minicpmo data by @hiyouga in https://github.com/volcengine/verl/pull/2212
[misc] chore: pin transformers under 4.53 by @hiyouga in https://github.com/volcengine/verl/pull/2241
[worker] fix: OOM on first iteration in multi-turn RL by @zTonyZhao in https://github.com/volcengine/verl/pull/2253
[algo] fix: correctly aggregate kl metrics in PPO actor by @0x404 in https://github.com/volcengine/verl/pull/2259
[recipe] feat: add retool recipe by @wuxibin89 in https://github.com/volcengine/verl/pull/2233
[cfg] fix: Security Enhancement Block Dangerous Modules in Sandbox Environment by @none0663 in https://github.com/volcengine/verl/pull/2170
[cfg] chore: add non-negative expected_len assertion by @LeavesLei in https://github.com/volcengine/verl/pull/2330
[algo] feat: mask out observation token in GAE by @wuxibin89 in https://github.com/volcengine/verl/pull/2337
[tool] fix: avoid exception when sandbox return None by @chenhaiq in https://github.com/volcengine/verl/pull/2346
[perf] feat: support entropy checkpointing without rmpad or sp by @FightingZhen in https://github.com/volcengine/verl/pull/2342
[ckpt] fix: edit esi doc by @plutoZZZZ in https://github.com/volcengine/verl/pull/2354
[docker] refactor: Migrate images to verlai, support latest flash attention and newer CUDA versions in future by @ETOgaosion in https://github.com/volcengine/verl/pull/2085volcengine/verl/pull/2147
[data] feat: add interface for user-defined curriculum sampler by @frrad in https://github.com/volcengine/verl/pull/2314
[cfg] fix: pickleing error in multiprocessing in the reward_fn by @none0663 in https://github.com/volcengine/verl/pull/2239
[ray] refactor: Seperate the constants into different file by @YeonwooSung in https://github.com/volcengine/verl/pull/2025
[misc] refactor: replace pkg_resources with importlib.metadata by @askender in https://github.com/volcengine/verl/pull/2392
[tool] fix: Add MCP usage documentation by @AlecHenx in https://github.com/volcengine/verl/pull/2261
[cfg] refactor: make actor config more modular by @eric-haibin-lin in https://github.com/volcengine/verl/pull/2379
[misc] fix: huggingface model config max_position_embeddings assertion for model with extended context length by @Wangmerlyn in https://github.com/volcengine/verl/pull/737
[data] refactor: move sampler api to experimental by @eric-haibin-lin in https://github.com/volcengine/verl/pull/2381
[perf] feat: add npu profiler for FSDP backend by @tongtong0613 in https://github.com/volcengine/verl/pull/2194
[misc] refactor: Replace deepcopy with tensor.clone by @ji-huazhong in https://github.com/volcengine/verl/pull/2442
[misc] fix: add *.yaml to pyproject due to modular config by @nanjiangwill in https://github.com/volcengine/verl/pull/2468
[misc] feat: add py.typed file to verl/ by @frrad in https://github.com/volcengine/verl/pull/2467
[env] feat: upgrade tensordict version by @vermouth1992 in https://github.com/volcengine/verl/pull/2460
[docker] feat: provide images with deepep by @ETOgaosion in https://github.com/volcengine/verl/pull/2480
[training_utils] feat: log_generations_to_swanlab use table by @Zeyi-Lin in https://github.com/volcengine/verl/pull/2489
[env] feat: safely bump py version to 3.10 by @Tavish9 in https://github.com/volcengine/verl/pull/2421
[BUG] fix bug for #2506, when passing as response_mask to policy_loss_fn by @none0663 in https://github.com/volcengine/verl/pull/2513
[single_controller] fix: replace unittest.mock.patch with context manager for env var handling by @PeterSH6 in https://github.com/volcengine/verl/pull/2498
[recipe] fix: DAPO rewards using sandbox fusion by @HollowMan6 in https://github.com/volcengine/verl/pull/2496
[cfg] refactor: support +extra.any_key usage for the base dataclass config in verl by @eric-haibin-lin in https://github.com/volcengine/verl/pull/2502
[ray] refactor: Use public method to get node IP by @kevin85421 in https://github.com/volcengine/verl/pull/2521
[env] fix: bump tensordict to 0.9.1 by @ultmaster in https://github.com/volcengine/verl/pull/2541
[data] fix: Add missing init files in verl experimental data folders by @JoostvDoorn in https://github.com/volcengine/verl/pull/2548
[ray] fix: strip [] for ipv6 address by @wuxibin89 in https://github.com/volcengine/verl/pull/2545
[tool] fix: correctly convert 'None' to null in sandbox fusion _process_single_case by @mathewjhan in https://github.com/volcengine/verl/pull/2409
[training_utils] fix: uneven support in split by @ultmaster in https://github.com/volcengine/verl/pull/2560
[perf] feat: Clip gsm8k solution string to optimize reward calculation by @PopSoda2002 in https://github.com/volcengine/verl/pull/2568
set use_kl_in_reward=True in reinforce_plus_plus by @Titanpku in https://github.com/volcengine/verl/pull/2580
[cfg] feat: add critic config class by @eric-haibin-lin in https://github.com/volcengine/verl/pull/2583
[tool] fix: supports variable arguments for marked_timer by @tardis-key in https://github.com/volcengine/verl/pull/2576
[single_controller] fix: padding for kwargs by @ShareLer in https://github.com/volcengine/verl/pull/2585
[docker] fix: downgrade TransformerEngine version 2.2.1 to allow mcore image using rope fusion and provide another set of v0.5 image by @ETOgaosion in https://github.com/volcengine/verl/pull/2611volcengine/verl/pull/2292
[recipe] feat: add QWen 30b moe dapo script that can run on a single 80GB node by @vermouth1992 in https://github.com/volcengine/verl/pull/2645verl/pull/2636
[perf] feat: mistral and gemma3_text mfu compute support by @xihuai18 in https://github.com/volcengine/verl/pull/2622
[misc] fix: fix prompt and response key in gemma7b example by @apeforest in https://github.com/volcengine/verl/pull/2610
[data, recipe] fix: remove redundant json parsing by @zhxieml in https://github.com/volcengine/verl/pull/2671

New Contributors

Full Changelog: https://github.com/volcengine/verl/compare/v0.4.1...v0.5.0

Highlights

Agentic RL rollout interface [beta]

Disaggregated placement & async training [prototype]

Remote generative reward models [prototype]

New features

Important fixes & improvements

Breaking changes and deprecations

What's Changed

Trainer & FSDP

Rollout & SGLang

Megatron

Hardware

Misc fixes

New Contributors

More Python Projects

AutoGPT

stable-diffusion-webui

transformers

yt-dlp

Highlights

Agentic RL rollout interface [beta]

Disaggregated placement & async training [prototype]

Remote generative reward models [prototype]

New features

Important fixes & improvements

Breaking changes and deprecations

What's Changed

Trainer & FSDP

Rollout & SGLang

Megatron

Hardware

Misc fixes

New Contributors

More Python Projects

AutoGPT

stable-diffusion-webui

transformers

yt-dlp