[NEW] rhymerl: History Rhymes: Accelerating LLM Reinforcement Learning with RhymeRL
TransferQueue: support multiple data partitions and optimize zero-copy tensor serialization
One-step-off-policy / fully async: optimize weight synchronization via the checkpoint engine, with bucketing and pipelining support.
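The release note does not describe the checkpoint engine's internals. Below is a hedged, framework-agnostic sketch of what "bucket and pipeline" commonly means for weight synchronization. It greedily packs parameters into size-capped buckets, then overlaps packing the next bucket with sending the current one. Parameters are modeled as hypothetical (name, nbytes) pairs, and `make_buckets`, `pipelined_sync`, `pack` and `send` are illustrative names, not verl or checkpoint-engine APIs.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Iterable, List, Tuple

# Hypothetical representation of a parameter: (name, size in bytes).
Param = Tuple[str, int]


def make_buckets(params: Iterable[Param], bucket_bytes: int) -> List[List[Param]]:
    """Greedily pack parameters, in order, into buckets of at most bucket_bytes.

    A single parameter larger than bucket_bytes gets a bucket of its own.
    """
    buckets: List[List[Param]] = []
    current: List[Param] = []
    current_bytes = 0
    for name, nbytes in params:
        # Close the current bucket if adding this parameter would overflow it.
        if current and current_bytes + nbytes > bucket_bytes:
            buckets.append(current)
            current, current_bytes = [], 0
        current.append((name, nbytes))
        current_bytes += nbytes
    if current:
        buckets.append(current)
    return buckets


def pipelined_sync(
    buckets: List[List[Param]],
    pack: Callable[[List[Param]], Any],
    send: Callable[[Any], None],
) -> None:
    """Send buckets in order, packing bucket i+1 in the background while bucket i is sent."""
    if not buckets:
        return
    with ThreadPoolExecutor(max_workers=1) as pool:
        next_packed = pool.submit(pack, buckets[0])
        for i in range(len(buckets)):
            packed = next_packed.result()  # wait for the current bucket to be packed
            if i + 1 < len(buckets):
                # Start packing the next bucket...
                next_packed = pool.submit(pack, buckets[i + 1])
            # ...while the current bucket is being sent.
            send(packed)
```

Bucketing amortizes per-transfer overhead across many small tensors, while the single background worker keeps memory overhead to roughly two buckets: the one being sent and the one being packed.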
What's Changed
[data] fix: MultiturnSFTDataset handles messages with list args in tool calls by @gongyisheng in https://github.com/volcengine/verl/pull/4125
[ci, doc] feat: Update Ascend Dockerfile and docker build workflow to 8.3.RC1 version by @FightingZhen in https://github.com/volcengine/verl/pull/4123
[data] fix: fix global_seqlen metric by @conver334 in https://github.com/volcengine/verl/pull/4129
[ci] fix: Optimize ascend docker build workflow and dockerfile to solve OOM problem by @FightingZhen in https://github.com/volcengine/verl/pull/4137
[ci] fix: fix error limiting MindSpeed cloning depth to one by @FightingZhen in https://github.com/volcengine/verl/pull/4140
[ci] feat: specify torch and torch_npu version into ascend dockerfile by @FightingZhen in https://github.com/volcengine/verl/pull/4141
[ci] fix: move torch and torch_npu install order in ascend dockerfile to ensure the correct versions are installed by @FightingZhen in https://github.com/volcengine/verl/pull/4142
[ci] fix: Correct version relationship between torch and torchvision in ascend dockerfile by @FightingZhen in https://github.com/volcengine/verl/pull/4143
[doc] chore: Add one_step_off_policy support doc of Ascend NPU by @baymax591 in https://github.com/volcengine/verl/pull/4151
[rollout] fix: resource pool name in standalone mode by @PeterSH6 in https://github.com/volcengine/verl/pull/4149
[ci] feat: Update e2e_ascend CI image to 8.3.RC1 version, remove weekly validation workflow by @FightingZhen in https://github.com/volcengine/verl/pull/4146
[doc] chore: add pytorch conference materials by @hongpeng-guo in https://github.com/volcengine/verl/pull/4161
[rollout] fixup load_format=dummy update_weights not do process_weight… by @Annarine in https://github.com/volcengine/verl/pull/4130
[vllm] fix: Change parameter validation to align with vllm validation by @HelloWorldBeginner in https://github.com/volcengine/verl/pull/4153
[trainer] fix: reproducible problem when resume training by @wlhgtc in https://github.com/volcengine/verl/pull/4156
[recipe, tool] feat: support multi-turn and tool call for recipe/fully_async_policy by @sl-1314 in https://github.com/volcengine/verl/pull/4067
[cfg] fix: add rollout_correction config field with omegaconf.open_dict by @tongyx361 in https://github.com/volcengine/verl/pull/4167
[doc] fix: Misc doc fixes by @kerrickstaley in https://github.com/volcengine/verl/pull/4171
[recipe] feat: add qwen3 8b grpo one_step_off_policy script on ASCEND NPU by @baymax591 in https://github.com/volcengine/verl/pull/4163
[BREAKING][rollout] feat: change rollout to server mode by default by @wuxibin89 in https://github.com/volcengine/verl/pull/4106
[algo] feat: Add RateLimitedRewardLoopManager with three-layer rate limiting for API-based rewards by @JoyboyBrian in https://github.com/volcengine/verl/pull/4107
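The entry above does not say what the three rate-limiting layers are. As a hedged illustration of one common building block for throttling calls to an external reward API, here is a minimal token-bucket limiter. `TokenBucket` and its parameters are hypothetical names, not verl's `RateLimitedRewardLoopManager` API; the clock is injectable so the behavior can be checked deterministically.

```python
import time
from typing import Callable


class TokenBucket:
    """Hypothetical token bucket: refills `rate` tokens per second, holding at most `capacity`."""

    def __init__(self, rate: float, capacity: float, clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity  # start full so an initial burst is allowed
        self.last = clock()

    def _refill(self) -> None:
        # Add tokens proportional to elapsed time, capped at capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_acquire(self, cost: float = 1.0) -> bool:
        """Consume `cost` tokens if available; return False without blocking otherwise."""
        self._refill()
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A caller that gets False would typically wait and retry; separate buckets (for example, one per request count and one per token budget) can be stacked when an API enforces several limits at once.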
New Contributors
@gongyisheng made their first contribution in https://github.com/volcengine/verl/pull/4125
@Annarine made their first contribution in https://github.com/volcengine/verl/pull/4130
@HelloWorldBeginner made their first contribution in https://github.com/volcengine/verl/pull/4153
@wlhgtc made their first contribution in https://github.com/volcengine/verl/pull/4156
@sl-1314 made their first contribution in https://github.com/volcengine/verl/pull/4067
@kerrickstaley made their first contribution in https://github.com/volcengine/verl/pull/4171
@JoyboyBrian made their first contribution in https://github.com/volcengine/verl/pull/4107
@shevateng0 made their first contribution in https://github.com/volcengine/verl/pull/4139
@ashvinnihalani made their first contribution in https://github.com/volcengine/verl/pull/4091
@johnjunjun7 made their first contribution in https://github.com/volcengine/verl/pull/3427
@zjchenn made their first contribution in https://github.com/volcengine/verl/pull/4184
@HzZHoO made their first contribution in https://github.com/volcengine/verl/pull/4183
@EricMarcus-ai made their first contribution in https://github.com/volcengine/verl/pull/4185
@Shiguang-Guo made their first contribution in https://github.com/volcengine/verl/pull/4187
@Agoniii made their first contribution in https://github.com/volcengine/verl/pull/3519
@jQizhang made their first contribution in https://github.com/volcengine/verl/pull/4222
@JobQiu made their first contribution in https://github.com/volcengine/verl/pull/4248
@momo609 made their first contribution in https://github.com/volcengine/verl/pull/4166
@Kite0011 made their first contribution in https://github.com/volcengine/verl/pull/4250
@LLLLxmmm made their first contribution in https://github.com/volcengine/verl/pull/4175
@jprellberg made their first contribution in https://github.com/volcengine/verl/pull/4196
@chengminhua made their first contribution in https://github.com/volcengine/verl/pull/4209
@Leem-Li made their first contribution in https://github.com/volcengine/verl/pull/4253
@nuerxiati made their first contribution in https://github.com/volcengine/verl/pull/4165
@litianjian made their first contribution in https://github.com/volcengine/verl/pull/4101
@appletea233 made their first contribution in https://github.com/volcengine/verl/pull/4410
@jsfanfanfan made their first contribution in https://github.com/volcengine/verl/pull/4408
@icerain-alt made their first contribution in https://github.com/volcengine/verl/pull/4406
@Lokiscripter made their first contribution in https://github.com/volcengine/verl/pull/4398
Full Changelog: https://github.com/volcengine/verl/compare/v0.6.1...v0.7.0
[megatron] feat: load dist checkpoint with customized prefix for state dict keys. by @shevateng0 in https://github.com/volcengine/verl/pull/4139
[megatron] fix: Use tokenizer path or model path in config by @ashvinnihalani in https://github.com/volcengine/verl/pull/4091
[doc] chore: update docker installation guide by @wuxibin89 in https://github.com/volcengine/verl/pull/4155
[recipe] feat: DeepSeek-R1-Zero on Ascend NPU by @johnjunjun7 in https://github.com/volcengine/verl/pull/3427
[recipe] fix: compatibility with vLLM Qwen3Next model by @zjchenn in https://github.com/volcengine/verl/pull/4184
[recipe] fix: readme in recipe/r1_ascend by @HzZHoO in https://github.com/volcengine/verl/pull/4183
[recipe] fix: ReactAgentLoop error handling for failed LangGraph invocations by @le-czs in https://github.com/volcengine/verl/pull/4182
[ci] chore: Update e2e_ascend CI trigger policy by @FightingZhen in https://github.com/volcengine/verl/pull/4189
[recipe] fix: Qwen3-vl npu patch by @leisuzz in https://github.com/volcengine/verl/pull/4186
[rollout, doc] feat: limit tracing samples by @EricMarcus-ai in https://github.com/volcengine/verl/pull/4185
[worker, sglang] fix: Rename the file sglang_router.py to avoid circular imports by @Shiguang-Guo in https://github.com/volcengine/verl/pull/4187
[megatron, recipe] fix: error in megatron init when actor and rollout are detached by @lalala-2 in https://github.com/volcengine/verl/pull/4179
[ci, megatron] test: Add Qwen3 Megatron+Mindspeed Ascend NPU CI by @wlf-darkmatter in https://github.com/volcengine/verl/pull/3465
[rollout, vllm] feat: support blockwise fp8 rollout by @Agoniii in https://github.com/volcengine/verl/pull/3519
[ci] feat: Move hf_transfer dependency to requirement file by @FightingZhen in https://github.com/volcengine/verl/pull/4210
[misc] feat: init random model supports custom code in the model by @HollowMan6 in https://github.com/volcengine/verl/pull/4217
[single_controller] feat: support dispatch tensordict by @vermouth1992 in https://github.com/volcengine/verl/pull/4213
[recipe, doc, ckpt] fix: ckpt error in fully async mode by @lalala-2 in https://github.com/volcengine/verl/pull/4199
[megatron] feat: FP8 training by @ISEEKYAN in https://github.com/volcengine/verl/pull/4223
[megatron] feat: moe fp16 training by @HaochenYuan in https://github.com/volcengine/verl/pull/4158
[recipe] fix: incorrect reward function in fapo scripts by @yyDing1 in https://github.com/volcengine/verl/pull/4195
[rollout, vllm] feat: support blockwise FP8 rollout for vLLM v0.11 MoE RL by @jQizhang in https://github.com/volcengine/verl/pull/4222
[single_controller] feat: support multiple replicate worker in one resource pool by @yyDing1 in https://github.com/volcengine/verl/pull/4226
[megatron] fix: BF16 mode should use PAO as well by @ashvinnihalani in https://github.com/volcengine/verl/pull/4221
Revert "[megatron] fix: BF16 mode should use PAO as well" by @ISEEKYAN in https://github.com/volcengine/verl/pull/4234
[doc] feat: Add Search Self-Play to awesome work list by @Necolizer in https://github.com/volcengine/verl/pull/4245
[worker] feat: add support for colocate replicas by @yyDing1 in https://github.com/volcengine/verl/pull/4233
[trainer] feat: refactor workers with model engine by @wuxibin89 in https://github.com/volcengine/verl/pull/4211
[single_controller] feat: support resource pool split method by @yyDing1 in https://github.com/volcengine/verl/pull/4251
[recipe] fix: tighten async rollouter task handling by @le-czs in https://github.com/volcengine/verl/pull/4230
Revert "[single_controller] feat: support resource pool split method" by @vermouth1992 in https://github.com/volcengine/verl/pull/4258
Revert "[worker] feat: add support for colocate replicas" by @wuxibin89 in https://github.com/volcengine/verl/pull/4259
Revert "[single_controller] feat: support multiple replicate worker in one resource pool" by @vermouth1992 in https://github.com/volcengine/verl/pull/4260
[ci] fix: Fix triton-ascend unavailable error in Ascend dockerfile by @FightingZhen in https://github.com/volcengine/verl/pull/4254
[ci] fix: Fix error in ascend dockerfile by @FightingZhen in https://github.com/volcengine/verl/pull/4265
[rollout] fix: ensure weight sync regardless of free_cache_engine by @JobQiu in https://github.com/volcengine/verl/pull/4248
[doc] feat: add rollout&train consistency doc for Ascend Platform by @momo609 in https://github.com/volcengine/verl/pull/4166
[recipe] feat: allow customize agent name by @vermouth1992 in https://github.com/volcengine/verl/pull/4269
[ci] fix: Remove redundant uninstall command in e2e_ascend by @FightingZhen in https://github.com/volcengine/verl/pull/4267
[megatron] fix: fix bugs in mcore backend context-parallel code logic by @Kite0011 in https://github.com/volcengine/verl/pull/4250
[recipe] feat: add Experimental VLA RL Support by @The-Hierophant in https://github.com/volcengine/verl/pull/3918
[recipe, data] feat: TransferQueue - Support managing multiple data partitions for Train/Val/Test in controller by @LLLLxmmm in https://github.com/volcengine/verl/pull/4175
[ci] feat: Increase e2e_sft timeout from 25 to 30 minutes by @vermouth1992 in https://github.com/volcengine/verl/pull/4279
[megatron] feat: Integrate Megatron-Bridge and support LoRA/PEFT by @HollowMan6 in https://github.com/volcengine/verl/pull/4063
[single_controller] feat: support resource_pool split by @yyDing1 in https://github.com/volcengine/verl/pull/4273
[recipe] feat: move recipes to new repository verl-recipe by @wuxibin89 in https://github.com/volcengine/verl/pull/4283
[worker] feat: restore colocate workers based on new split resource pool by @yyDing1 in https://github.com/volcengine/verl/pull/4282
[misc] feat: Add actor_rollout_ref.actor.calculate_entropy for entropy fwd by @EduardDurech in https://github.com/volcengine/verl/pull/4239
[trainer] feat: Self-Normalized Importance Sampling by @EduardDurech in https://github.com/volcengine/verl/pull/3980
[ci, megatron] fix: add rotary_pos_cos_sin to forward by @HollowMan6 in https://github.com/volcengine/verl/pull/4291
[megatron] fix: pass trust_remote_code to get_generation_config by @jprellberg in https://github.com/volcengine/verl/pull/4196
[misc] fix: support nested datastructure in dataproto to convert to tensordict by @PeterSH6 in https://github.com/volcengine/verl/pull/4296
[ci] fix: use local hf model path by @wuxibin89 in https://github.com/volcengine/verl/pull/4299
[data] feat: TransferQueue - Support AgentLoop performance metrics & minor fix by @0oshowero0 in https://github.com/volcengine/verl/pull/4289
[recipe] feat: support reward_loop for recipe/fully_async_policy by @sl-1314 in https://github.com/volcengine/verl/pull/4224
[misc] fix: fix list conversion in get_tensordict by @PeterSH6 in https://github.com/volcengine/verl/pull/4304
[hardware] fix: Workaround for torch-npu's lack of support for creating nested tensors from NPU tensors. by @ji-huazhong in https://github.com/volcengine/verl/pull/4309
[rollout] fix: some compatibility changes in agent loop and reward by @pengwu22 in https://github.com/volcengine/verl/pull/4293
[worker] fix: do not pass router address and tokenizer if their values are None by @yyDing1 in https://github.com/volcengine/verl/pull/4310
[doc] chore: Update ascend quickstart doc by @FightingZhen in https://github.com/volcengine/verl/pull/4321
[misc] feat: add more utils of tensordict by @vermouth1992 in https://github.com/volcengine/verl/pull/4322
[recipe] fix: fix one_step_off_policy scripts where async was not implemented by @baymax591 in https://github.com/volcengine/verl/pull/4350
[model] feat: refactor engine folder structure by @vermouth1992 in https://github.com/volcengine/verl/pull/4352
[recipe] feat: move char count recipe to verl-recipe by @vermouth1992 in https://github.com/volcengine/verl/pull/4351
[ci] chore: switch ascend ci calculation resource by @FightingZhen in https://github.com/volcengine/verl/pull/4347
feat(actor): add loss_scale_factor for seq-mean-token-sum-norm mode by @szrlee in https://github.com/volcengine/verl/pull/4360
[misc] refactor: clean up unused sharding managers by @ji-huazhong in https://github.com/volcengine/verl/pull/4361
[worker] feat: Add TrainingWorker that resembles Tinker-like API by @vermouth1992 in https://github.com/volcengine/verl/pull/4371
[vllm] fix: Fix issues that occur during the ACLGraph initialization process in the NPU. by @chengminhua in https://github.com/volcengine/verl/pull/4209
[megatron] feat: support gpt-oss by @ISEEKYAN in https://github.com/volcengine/verl/pull/4323
[megatron] fix: megatron async save ckpt fix by @Leem-Li in https://github.com/volcengine/verl/pull/4253
[misc] feat: Update news section in README.md by @vermouth1992 in https://github.com/volcengine/verl/pull/4385
[misc] fix: handle empty TensorDict in DataProto serialization by @le-czs in https://github.com/volcengine/verl/pull/4379
[trainer,fsdp] feat: enable reproducibility for training by @ji-huazhong in https://github.com/volcengine/verl/pull/4378
[trainer] feat: support ray-based sft trainer by @vermouth1992 in https://github.com/volcengine/verl/pull/4382
[megatron] feat: optimize the mbridge checkpoint saving speed by @ISEEKYAN in https://github.com/volcengine/verl/pull/4386
[rollout] feat: add support for discriminative reward model in reward loop by @yyDing1 in https://github.com/volcengine/verl/pull/4358
[recipe] feat: refactor one step off to support server mode by @ArronHZG in https://github.com/volcengine/verl/pull/4307
[misc] feat: support TensorDict in DataProtoFuture by @vermouth1992 in https://github.com/volcengine/verl/pull/4395
[fsdp] fix: Fixing the error caused by empty tensors in the multi_turn + remove_padding scenario by @nuerxiati in https://github.com/volcengine/verl/pull/4165
[doc] fix: add Geo-RS-Seq-TIS estimators and update documentation by @szrlee in https://github.com/volcengine/verl/pull/4359
[worker] feat: custom master addr port by @tongyx361 in https://github.com/volcengine/verl/pull/4389
[doc] feat: update reward loop document by @yyDing1 in https://github.com/volcengine/verl/pull/4404
[algo] feat: support router replay by @litianjian in https://github.com/volcengine/verl/pull/4101
[recipe] fix: FlowRL actor to pure implementation by @Xuekai-Zhu in https://github.com/volcengine/verl/pull/4397
[doc] feat: add more user instructions to reward loop doc by @yyDing1 in https://github.com/volcengine/verl/pull/4409
[doc] feat: add OneThinker link in readme by @appletea233 in https://github.com/volcengine/verl/pull/4410
[ci] fix: NPU does not support router replay by @wuxibin89 in https://github.com/volcengine/verl/pull/4414
[worker] feat: custom reward_manager by @tongyx361 in https://github.com/volcengine/verl/pull/4387
[vllm] feat: retires vllm spmd mode in the codebase by @PeterSH6 in https://github.com/volcengine/verl/pull/4411
[sglang] fix: HTTP server startup issues for Prometheus and Grafana integration by @jsfanfanfan in https://github.com/volcengine/verl/pull/4408
[doc] chore: Update ascend quickstart and docker build guidance doc by @FightingZhen in https://github.com/volcengine/verl/pull/4420
[sglang] feat: retires sglang spmd mode in the codebase by @PeterSH6 in https://github.com/volcengine/verl/pull/4422
[fsdp] feat: update NPU fused kernels for Qwen3 moe block by @icerain-alt in https://github.com/volcengine/verl/pull/4406
[misc] refactor: clean up unused sharding manager by @ji-huazhong in https://github.com/volcengine/verl/pull/4439
[hardware] chore: clean npu_patch by @FightingZhen in https://github.com/volcengine/verl/pull/4436
[misc] fix: fix memory leakage when initializing multiple tools by @PeterSH6 in https://github.com/volcengine/verl/pull/4430
[trainer, vllm, megatron, recipe] feat: one/two step off async on-policy distillation recipe by @moehanabi in https://github.com/volcengine/verl/pull/3975
[misc] feat: optimize performance of index_select_tensor_dict by @vermouth1992 in https://github.com/volcengine/verl/pull/4444
[ci] test: Disable ReMax training test in vllm workflow by @PeterSH6 in https://github.com/volcengine/verl/pull/4445
[rollout] fix: RolloutConfig should support repetition_penalty config… by @Lokiscripter in https://github.com/volcengine/verl/pull/4398
[recipe] feat: add fully async comm between rollout and sim node in disagg mode by @HanlinDu in https://github.com/volcengine/verl/pull/4433
[misc] feat: optimize nested tensor index by @vermouth1992 in https://github.com/volcengine/verl/pull/4447
[model] feat: add qwen3-4b grpo script on ASCEND NPU A3 by @5082459 in https://github.com/volcengine/verl/pull/4432
[megatron] fix: Remove Deprecated Megatron Optimizer Args by @DaizeDong in https://github.com/volcengine/verl/pull/4396
[megatron] fix: respect use_distributed_optimizer in config by @HollowMan6 in https://github.com/volcengine/verl/pull/4392
[recipe, ci] fix: remove batch mode for remote generative reward model by @yyDing1 in https://github.com/volcengine/verl/pull/4448
[misc] feat: optimize rearrange_micro_batches by @vermouth1992 in https://github.com/volcengine/verl/pull/4451
[rollout, sglang] feat: support blockwise fp8 rollout by @Agoniii in https://github.com/volcengine/verl/pull/4415
[trainer] feat: model engine sft trainer support vlm model by @wuxibin89 in https://github.com/volcengine/verl/pull/4403
[trainer] feat: add reward loop config to default config by @yyDing1 in https://github.com/volcengine/verl/pull/4452
[vllm] feat: support abort generating requests in vllm server by @PeterSH6 in https://github.com/volcengine/verl/pull/4453
[ci] chore: cleanup some ci workflow by @wuxibin89 in https://github.com/volcengine/verl/pull/4459
[trainer] feat: allow override for reward_manager_worker in agent loop by @ryxli in https://github.com/volcengine/verl/pull/4423
[model] feat: enhances TrainingWorker by @vermouth1992 in https://github.com/volcengine/verl/pull/4461
[recipe] feat: Modify the way of obtaining default_runtime_env by @xichengpro in https://github.com/volcengine/verl/pull/4468
[rollout] fix: mlflow consecutive slashes by @BaiqingL in https://github.com/volcengine/verl/pull/4446
[fsdp] fix: reward model also reads override config attn_implementation by @pengwu22 in https://github.com/volcengine/verl/pull/4458
[vllm] fix: compatibility with vllm 0.12 by @ISEEKYAN in https://github.com/volcengine/verl/pull/4473
[model] feat: support manual control load/offload by @vermouth1992 in https://github.com/volcengine/verl/pull/4472
[ci] feat: Update e2e_ascend to improve CI execution efficiency by @FightingZhen in https://github.com/volcengine/verl/pull/4477
[ci] fix: Fix e2e_ascend sft test case error by @FightingZhen in https://github.com/volcengine/verl/pull/4481
[trainer] feat: support moving ppo actor logics to single controller by @vermouth1992 in https://github.com/volcengine/verl/pull/4480
[megatron] fix: correct typo in modeling_qwen2_megatron.py by @study8677 in https://github.com/volcengine/verl/pull/4486
[fsdp] fix: qwen3vlmoe with Monkey patch to fix a bug in transformers 4.57.x by @pengyanai in https://github.com/volcengine/verl/pull/4402
[ci] fix: fix format check error by @ji-huazhong in https://github.com/volcengine/verl/pull/4506
[hardware] feat: Auto set device_name to npu for Ascend NPU by @FightingZhen in https://github.com/volcengine/verl/pull/4489
[trainer] feat: make reward loop disrm default by @yyDing1 in https://github.com/volcengine/verl/pull/4466
[algo,doc] refactor: rollout correction by @szrlee in https://github.com/volcengine/verl/pull/4511
[trainer] feat: enable model engine based critic by @vermouth1992 in https://github.com/volcengine/verl/pull/4507
[vllm, rollout] feat: support reset prefix cache after abort by @PeterSH6 in https://github.com/volcengine/verl/pull/4519
[ci] chore: remove proxy settings in e2e_ascend by @FightingZhen in https://github.com/volcengine/verl/pull/4527
[rollout] fix: correct heap-based load balancing in AsyncLLMServerManager by @hellcatCS in https://github.com/volcengine/verl/pull/4505
[sglang, rollout] feat: delete remaining sglang spmd code by @PeterSH6 in https://github.com/volcengine/verl/pull/4523
[data] feat: TransferQueue - Add zero-copy serialization support & usage improvement by @0oshowero0 in https://github.com/volcengine/verl/pull/4429
[rollout] feat: pass agent_data to tool calling by @wuxibin89 in https://github.com/volcengine/verl/pull/4469
[megatron,ci] chore: update instructions and scripts for LoRA by @HollowMan6 in https://github.com/volcengine/verl/pull/4533
[megatron] chore: clean legacy code path part 1, make engine use mbridge by default by @ISEEKYAN in https://github.com/volcengine/verl/pull/4528
[megatron] chore: clean legacy code path part 2, clean legacy CI by @ISEEKYAN in https://github.com/volcengine/verl/pull/4529
[trainer] fix: model engine vlm multi_modal_inputs to NonTensorStack by @wuxibin89 in https://github.com/volcengine/verl/pull/4492
[ray] chore: Update Ray version dependency in requirements-npu.txt by @FightingZhen in https://github.com/volcengine/verl/pull/4543
[ci] chore: migrate all rm related ci to reward loop by @yyDing1 in https://github.com/volcengine/verl/pull/4520
[algo] fix: Add seq mean mask denominator option by @szrlee in https://github.com/volcengine/verl/pull/4510
[trainer] fix: change name for reward loop worker override by @ryxli in https://github.com/volcengine/verl/pull/4549
[rollout,vllm] feat: disable sleep mode in fully-async mode by @chenjiaoAngel in https://github.com/volcengine/verl/pull/4521
[rollout, trainer] feat: extend agent loop for custom implementations by @JoyboyBrian in https://github.com/volcengine/verl/pull/4548
[rollout] chore: update reward loop file names by @yyDing1 in https://github.com/volcengine/verl/pull/4547
[ci] fix: Add mbridge dependency into e2e_ascend by @FightingZhen in https://github.com/volcengine/verl/pull/4560
[doc] feat: add JupyterLab plugin instructions by @yqsstudy in https://github.com/volcengine/verl/pull/4536
[ci] feat: Increase e2e_sft timeout from 30 to 40 minutes by @vermouth1992 in https://github.com/volcengine/verl/pull/4552
[misc] chore: add "reward" tag to PR template by @yyDing1 in https://github.com/volcengine/verl/pull/4573
[BREAKING][recipe, ckpt] feat: support parameter sync via checkpoint-engine (fully_async mode only) by @zpltys in https://github.com/volcengine/verl/pull/4427
[training_utils] fix: fix logic error when acquiring model enum in registry by @FightingZhen in https://github.com/volcengine/verl/pull/4577
[megatron] feat: add script for qwen3next training by @ISEEKYAN in https://github.com/volcengine/verl/pull/4582
[ci] fix: exclude FSDP-related source files from Megatron CI by @zzhbrr in https://github.com/volcengine/verl/pull/4574
[reward,ci] fix: cast by @tongyx361 in https://github.com/volcengine/verl/pull/4594
[vllm] feat: TensorLoRARequest support newer vLLM versions by @HollowMan6 in https://github.com/volcengine/verl/pull/4606
[misc] feat: always use robust get_event_loop by @tongyx361 in https://github.com/volcengine/verl/pull/4603
[trainer] feat: Implemented VeomniEngine as an alternative training backend by @A1waysBeenHere in https://github.com/volcengine/verl/pull/4072
[perf] fix: modify the NPU profiler default configuration by @tardis-key in https://github.com/volcengine/verl/pull/4475
[megatron] feat: support discrete profiling for mindspeed by @tardis-key in https://github.com/volcengine/verl/pull/4271
[doc] chore: update LoRA docs with megatron guidelines by @HollowMan6 in https://github.com/volcengine/verl/pull/4565
[reward] feat: Optimize reward computation when use_reward_loop=True by @none0663 in https://github.com/volcengine/verl/pull/4581
[rollout] chore: rename reward loop class name and update ci by @yyDing1 in https://github.com/volcengine/verl/pull/4572
[log] fix: fix wandb log validate run error on async-tool by @chenjiaoAngel in https://github.com/volcengine/verl/pull/4591
[sglang] fix: warmup_thread_args->warmup_thread_kwargs in async_sglang_server.py by @EduardDurech in https://github.com/volcengine/verl/pull/4617
[reward] feat: use load_extern_object in get_custom_reward_fn, supporting pkg path by @tongyx361 in https://github.com/volcengine/verl/pull/4615
[vllm] fix: correctly pass params to from_lora_tensors in vLLM 0.12.0 by @HollowMan6 in https://github.com/volcengine/verl/pull/4614
[reward,doc] feat: enrich the reward loop documentation by @yyDing1 in https://github.com/volcengine/verl/pull/4619
[megatron] fix: fix MLA with sequence packing + CP by @wuweiqiang24 in https://github.com/volcengine/verl/pull/4611
[megatron, doc] refactor: update the megatron doc by @ISEEKYAN in https://github.com/volcengine/verl/pull/4630
[reward] feat: add retry to the request post method in the reward loop by @yyDing1 in https://github.com/volcengine/verl/pull/4628
[vllm] fix: LoRAModel import path change for vLLM 0.13.0 by @HollowMan6 in https://github.com/volcengine/verl/pull/4631
[misc] refactor: refactor flops counter by @vermouth1992 in https://github.com/volcengine/verl/pull/4633
[misc] feat: add importlib option to import external reward loop module by @PeterSH6 in https://github.com/volcengine/verl/pull/4635
[rollout] feat: ensure max_new_tokens is set correctly in sampling_params by @yanyc428 in https://github.com/volcengine/verl/pull/4634
[recipe] feat: accelerate rollout via model-free speculative decoding by @He-Jingkai in https://github.com/volcengine/verl/pull/4535
[training_utils] feat: use TMA to load Tiles in linear_cross_entropy kernels by @CtfGo in https://github.com/volcengine/verl/pull/4576
[data] feat: Add multimodal dataset filter for user-customized results by @Kite0011 in https://github.com/volcengine/verl/pull/4608
[vllm] feat: Support online quant for rollout with torchao by @jerryzh168 in https://github.com/volcengine/verl/pull/3084
[misc] feat: Update news section in README.md by @vermouth1992 in https://github.com/volcengine/verl/pull/4646
[algo] feat: add cispo by @xvlincaigou in https://github.com/volcengine/verl/pull/4508
[data] feat: TransferQueue - remove redundant data collect for both TQ and DataProto by @0oshowero0 in https://github.com/volcengine/verl/pull/4618
[recipe, perf] feat: add nsys profiler support for env worker by @chenchaoxu7575 in https://github.com/volcengine/verl/pull/4463
[worker] fix: Add profiler initialization for ActorRolloutRefWorker in engine_worker by @pqhgit in https://github.com/volcengine/verl/pull/4586
[recipe, megatron, fsdp] fix: checkpoint-engine fix trainer param offload in fully-async mode by @zpltys in https://github.com/volcengine/verl/pull/4655
[doc] feat: Add fine-grained profiling tutorial for FSDP and Megatron on Ascend by @mengchengTang in https://github.com/volcengine/verl/pull/4610
[misc] feat: .git-blame-ignore-revs for large but non-informative commits by @tongyx361 in https://github.com/volcengine/verl/pull/4661
[doc] feat: Add OpenTinker to awesome work list by @zhusq20 in https://github.com/volcengine/verl/pull/4669
[fsdp] feat: Support zero2 optional feature for FSDP1 by @ZLiao097 in https://github.com/volcengine/verl/pull/4659
[rollout] fix: delete problematic assert for max_tokens <= response_length in multi-turn scenario by @PeterSH6 in https://github.com/volcengine/verl/pull/4668
[data] feat: TransferQueue - Support sync TransferQueue client & optimize clear interface and validation procedure by @0oshowero0 in https://github.com/volcengine/verl/pull/4660
[misc] fix: .git-blame-ignore-revs file is invalid by @HollowMan6 in https://github.com/volcengine/verl/pull/4674
[training_utils] fix: no allocator set when using TMA for kernels by @HollowMan6 in https://github.com/volcengine/verl/pull/4676
[fsdp] fix: replicate ref compute_log_prob (disable calculate_entropy ...) in LoRA by @HollowMan6 in https://github.com/volcengine/verl/pull/4675
[algo] feat: SAPO algorithm by Qwen by @BounharAbdelaziz in https://github.com/volcengine/verl/pull/4345
[megatron] fix: megatron async save ckpt fix by @Leem-Li in https://github.com/volcengine/verl/pull/4638
[ci] fix: fix config by @vermouth1992 in https://github.com/volcengine/verl/pull/4685
Revert "[rollout] fix: delete problematic assert for max_tokens <= response_length in multi-turn scenario" by @vermouth1992 in https://github.com/volcengine/verl/pull/4687
[trainer, fsdp, megatron] feat: support one_step_off_policy on Ascend NPU by @baymax591 in https://github.com/volcengine/verl/pull/4686
[ci] test: add one step off policy test cases for npu by @ji-huazhong in https://github.com/volcengine/verl/pull/4485
fix(lora): use TOKEN_CLS task type for Critic model by @yurekami in https://github.com/volcengine/verl/pull/4695
fix: correct enable_activation_offload config parameter name by @yurekami in https://github.com/volcengine/verl/pull/4692
[misc] fix: deprecate rollout.mode config option by @yurekami in https://github.com/volcengine/verl/pull/4690
[ci] feat: Set Megatron related environment variable with ENV in Ascend dockerfile by @FightingZhen in https://github.com/volcengine/verl/pull/4699
[docker] feat: update stable image to vllm==0.12.0, sglang==0.5.6 by @Begunner in https://github.com/volcengine/verl/pull/4653
[rollout] fix: use configured response_length as default max_tokens in vLLM async server by @yurekami in https://github.com/volcengine/verl/pull/4703
[megatron] fix: set model to eval during compute_log_prob/compute_values by @HollowMan6 in https://github.com/volcengine/verl/pull/4708
[trainer] fix: fallback vision tower to flash_attention_2 for Qwen2.5-VL when u… by @aoshen524 in https://github.com/volcengine/verl/pull/4670
[docker] fix: new images for sgl056 and vllm012 have compatibility issues by @Begunner in https://github.com/volcengine/verl/pull/4714
[docs] feat: improve docstrings in tensordict_utils.py (#1345) by @yurekami in https://github.com/volcengine/verl/pull/4732
[rollout,docs] fix: improve error message (#4682) and docstrings (#1345) by @yurekami in https://github.com/volcengine/verl/pull/4729
docs: fix typos in code comments and messages by @yurekami in https://github.com/volcengine/verl/pull/4724
[training_utils] fix: RM extra scaling in KL/PG losses by @JacobHelwig in https://github.com/volcengine/verl/pull/4711
[deployment] feat: support build docker image with aarch64 platform by @rainj-me in https://github.com/volcengine/verl/pull/4605
[megatron] fix: Bump Megatron-Bridge commit for PEFT recompute by @HollowMan6 in https://github.com/volcengine/verl/pull/4702
[docs] feat: improve docstrings in seqlen_balancing.py (#1345) by @yurekami in https://github.com/volcengine/verl/pull/4731
[doc] feat: improve docstrings in torch_functional.py (#1345) by @yurekami in https://github.com/volcengine/verl/pull/4730
[reward] fix: make RateLimitedRewardManager accept legacy kwargs by @JoyboyBrian in https://github.com/volcengine/verl/pull/4739
[perf] feat: support profiler in model engine and sft trainer by @vermouth1992 in https://github.com/volcengine/verl/pull/4749
[ci] test: move cpu tests to volcengine machines by @Begunner in https://github.com/volcengine/verl/pull/4738
[trainer,megatron] fix: super tiny fix for repeatedly importing the mindspeed patch by @ji-huazhong in https://github.com/volcengine/verl/pull/4751
[perf] feat: GPT-OSS mfu compute support by @mikequan0425 in https://github.com/volcengine/verl/pull/4750
[tool] fix: attach to existing MLflow run when MLFLOW_RUN_ID is set by @dubin555 in https://github.com/volcengine/verl/pull/4740
[trainer] fix: use dp_size instead of world_size in _balance_batch by @yurekami in https://github.com/volcengine/verl/pull/4697
[rollout] feat: Add vllm logprob mode and default processed_logprob by @RobotGF in https://github.com/volcengine/verl/pull/4755
[ci] fix: fix precommit by @vermouth1992 in https://github.com/volcengine/verl/pull/4760
[ci] test: migrate sft test cases on npu to model engine implementation by @ji-huazhong in https://github.com/volcengine/verl/pull/4762
[doc, cfg] fix: correct typos in training and docker configurations by @Racktic in https://github.com/volcengine/verl/pull/4767
[vllm] fix: use packaging.version for correct semantic version comparison by @Racktic in https://github.com/volcengine/verl/pull/4768
Revert "[algo] fix: Add seq mean mask denominator option" by @wuxibin89 in https://github.com/volcengine/verl/pull/4769
[data] feat: major refactor RLHFDataset for multi-modal data by @wuxibin89 in https://github.com/volcengine/verl/pull/4759
[perf] feat: add remote reward manager and fix math verify issue by @yyDing1 in https://github.com/volcengine/verl/pull/4752
[trainer] feat: enable ray-based sft trainer on ascend npu by @ji-huazhong in https://github.com/volcengine/verl/pull/4764
[training_utils] fix: Nested tensor micro-batching by @JacobHelwig in https://github.com/volcengine/verl/pull/4776
[worker] fix: Config for PPO batch size by @JacobHelwig in https://github.com/volcengine/verl/pull/4773
[ci] fix: fix cpu unit test by @vermouth1992 in https://github.com/volcengine/verl/pull/4774
[cfg] chore: remove redundant fields and fix typo by @JoyboyBrian in https://github.com/volcengine/verl/pull/4754
[worker] fix: Model engine parameter offload by @JacobHelwig in https://github.com/volcengine/verl/pull/4777
[fsdp] feat: integrate TiledMLP for memory-efficient MLP computation by @kevssim in https://github.com/volcengine/verl/pull/4649
[doc] chore: Update ascend_quick_start.rst by @wucong25 in https://github.com/volcengine/verl/pull/4609
[sglang, vllm, rollout] fix: use model's max_position_embeddings for max_model_len by @PeterSH6 in https://github.com/volcengine/verl/pull/4779
[doc] fix: reward_loop enable flag name by @zhuangqh in https://github.com/volcengine/verl/pull/4788
[doc] feat: add v0.7 release blog by @wuxibin89 in https://github.com/volcengine/verl/pull/4796
New Contributors
@HanlinDu made their first contribution in https://github.com/volcengine/verl/pull/4433
@5082459 made their first contribution in https://github.com/volcengine/verl/pull/4432
@DaizeDong made their first contribution in https://github.com/volcengine/verl/pull/4396
@ryxli made their first contribution in https://github.com/volcengine/verl/pull/4423
@study8677 made their first contribution in https://github.com/volcengine/verl/pull/4486
@pengyanai made their first contribution in https://github.com/volcengine/verl/pull/4402
@hellcatCS made their first contribution in https://github.com/volcengine/verl/pull/4505
@yqsstudy made their first contribution in https://github.com/volcengine/verl/pull/4536
@zpltys made their first contribution in https://github.com/volcengine/verl/pull/4427
@zzhbrr made their first contribution in https://github.com/volcengine/verl/pull/4574
@wuweiqiang24 made their first contribution in https://github.com/volcengine/verl/pull/4611
@yanyc428 made their first contribution in https://github.com/volcengine/verl/pull/4634
@He-Jingkai made their first contribution in https://github.com/volcengine/verl/pull/4535
@CtfGo made their first contribution in https://github.com/volcengine/verl/pull/4576
@jerryzh168 made their first contribution in https://github.com/volcengine/verl/pull/3084
@xvlincaigou made their first contribution in https://github.com/volcengine/verl/pull/4508
@chenchaoxu7575 made their first contribution in https://github.com/volcengine/verl/pull/4463
@pqhgit made their first contribution in https://github.com/volcengine/verl/pull/4586
@mengchengTang made their first contribution in https://github.com/volcengine/verl/pull/4610
@zhusq20 made their first contribution in https://github.com/volcengine/verl/pull/4669
@BounharAbdelaziz made their first contribution in https://github.com/volcengine/verl/pull/4345
@yurekami made their first contribution in https://github.com/volcengine/verl/pull/4695
@Begunner made their first contribution in https://github.com/volcengine/verl/pull/4653
@JacobHelwig made their first contribution in https://github.com/volcengine/verl/pull/4711
@rainj-me made their first contribution in https://github.com/volcengine/verl/pull/4605
@mikequan0425 made their first contribution in https://github.com/volcengine/verl/pull/4750
@dubin555 made their first contribution in https://github.com/volcengine/verl/pull/4740
@RobotGF made their first contribution in https://github.com/volcengine/verl/pull/4755
@Racktic made their first contribution in https://github.com/volcengine/verl/pull/4767
@wucong25 made their first contribution in https://github.com/volcengine/verl/pull/4609
@zhuangqh made their first contribution in https://github.com/volcengine/verl/pull/4788