v0.27.0
Features
- Add `vllm_group_port` argument to GRPO, RLOO and OnlineDPO configuration by @pointerhacker in https://github.com/huggingface/trl/pull/4545
- Preserve truncated tokens in BFD packing by @qgallouedec in https://github.com/huggingface/trl/pull/4632
- Support async reward functions and parallelize call to reward functions. by @pramodith in https://github.com/huggingface/trl/pull/4567
- RLOO supports async rewards. by @pramodith in https://github.com/huggingface/trl/pull/4718
- Support vLLM 0.12.0 by @jiqing-feng in https://github.com/huggingface/trl/pull/4117
- feat: DeepSeek V3.2 Off-policy sequence masking by @casinca in https://github.com/huggingface/trl/pull/4689
- 🎭 Up to 50% less VRAM during forward with `forward_masked_logits` function by @qgallouedec in https://github.com/huggingface/trl/pull/4729
- [GRPO] Add a config to limit the number of tool calling iterations by @pramodith in https://github.com/huggingface/trl/pull/4761
- Switch gradient checkpointing default to use_reentrant=False (PyTorch recommended) by @qgallouedec in https://github.com/huggingface/trl/pull/4811
- Add support for GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization by @nbasyl in https://github.com/huggingface/trl/pull/4785
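The async reward support listed above can be pictured with a small standalone sketch. It does not use TRL itself: the two reward functions and the `score_all` helper are hypothetical names chosen for illustration, and the `(completions, **kwargs)` signature is an assumption about how such functions are typically written. The point is only that awaiting several coroutines with `asyncio.gather` lets slow, I/O-bound rewards overlap instead of running one after another.

```python
import asyncio


async def length_reward(completions, **kwargs):
    # Hypothetical reward: the sleep stands in for a slow I/O call (e.g. a remote judge).
    await asyncio.sleep(0.1)
    return [float(len(c)) for c in completions]


async def keyword_reward(completions, **kwargs):
    # Hypothetical reward: 1.0 if the completion mentions "answer", else 0.0.
    await asyncio.sleep(0.1)
    return [1.0 if "answer" in c else 0.0 for c in completions]


async def score_all(reward_funcs, completions):
    # Schedule every reward coroutine at once and wait for all of them together.
    return await asyncio.gather(*(f(completions) for f in reward_funcs))


completions = ["the answer is 4", "no idea"]
rewards = asyncio.run(score_all([length_reward, keyword_reward], completions))
print(rewards)  # one list of scores per reward function
```

With two rewards that each wait 0.1 s, the gathered call takes roughly 0.1 s rather than 0.2 s, which is the benefit the parallelized call is after.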
Experimental
- Move `AutoModelForCausalLMWithValueHead` and `AutoModelForSeq2SeqLMWithValueHead` to experimental by @qgallouedec in https://github.com/huggingface/trl/pull/4654
- Move DPODataCollatorWithPadding to `experimental.utils` by @qgallouedec in https://github.com/huggingface/trl/pull/4667
- Move `DataCollatorForChatML` to `experimental.utils` by @qgallouedec in https://github.com/huggingface/trl/pull/4668
- Move `add_bos_token_if_needed` and `add_eos_token_if_needed` to `experimental.utils` by @qgallouedec in https://github.com/huggingface/trl/pull/4674
- Move `truncate_right` and `SIMPLE_CHAT_TEMPLATE` to `experimental.utils` by @qgallouedec in https://github.com/huggingface/trl/pull/4677
- Move `prepare_model_for_kbit_training`, `enable_gradient_checkpointing`, `prepare_peft_model` to `experimental.utils` by @qgallouedec in https://github.com/huggingface/trl/pull/4704
- Move `get_reward` function to `experimental.utils` by @qgallouedec in https://github.com/huggingface/trl/pull/4683
- Remove experimental imports from testing_utils by @albertvillanova in https://github.com/huggingface/trl/pull/4727
- ORPO: Avoid catastrophic cancellation in loss function by @hartmans in https://github.com/huggingface/trl/pull/4763
- Refactor KTO [1/N]: Modernize model initialization by @albertvillanova in https://github.com/huggingface/trl/pull/4783
- [GOLD] add probability merging fix to implement chain rule by @kashif in https://github.com/huggingface/trl/pull/4765
- Refactor KTO coordinated with DPO [a/N]: Remove encoder-decoder support by @albertvillanova in https://github.com/huggingface/trl/pull/4792
- Refactor KTO coordinated with DPO [b/N]: Simplify truncation logic by @albertvillanova in https://github.com/huggingface/trl/pull/4808
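For readers unfamiliar with the catastrophic cancellation mentioned in the ORPO entry above, here is a minimal sketch of the general numerical issue. It is not the code from that PR: the helper names are hypothetical, and the assumption is only that an odds-ratio loss needs a term of the form `log(1 - exp(log_p))`, which loses all precision when `log_p` is very close to zero.

```python
import math


def log1mexp_naive(log_p):
    # Computes log(1 - exp(log_p)) directly; 1 - exp(log_p) cancels to 0.0 when log_p is near 0.
    return math.log(1.0 - math.exp(log_p))


def log1mexp_stable(log_p):
    # -expm1(x) equals 1 - exp(x) but is evaluated without subtracting two nearly equal numbers.
    return math.log(-math.expm1(log_p))


def log_odds(log_p):
    # Log odds of a probability given in log space: log(p / (1 - p)).
    return log_p - log1mexp_stable(log_p)


log_p = -1e-17  # a probability extremely close to 1

try:
    naive = log1mexp_naive(log_p)
except ValueError:
    naive = None  # the naive form ends up taking log(0.0)

stable = log1mexp_stable(log_p)
print(naive, stable, log_odds(log_p))
```

For moderate values such as `log_p = -1.0` the two forms agree; they diverge only near zero, which is where a confident model's log-probabilities tend to sit.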
Fixes
- Accounting for case `num_generations_eval=1` in the calculation of the advantage by @qgallouedec in https://github.com/huggingface/trl/pull/4662
- Fix vLLM error for tools usage not supported when running GRPO training by @apalmas-saifh in https://github.com/huggingface/trl/pull/4663
- Fix GRPO config validation in case `num_generations_eval` is specified and different than `num_generations` by @apalmas-saifh in https://github.com/huggingface/trl/pull/4682
- Fix top_k default value to 0 for disabling top-k filtering by @albertvillanova in https://github.com/huggingface/trl/pull/4695
- Include `generation_config` for tiny model uploads by @qgallouedec in https://github.com/huggingface/trl/pull/4643
- Fix KeyError with transformers 5.0.0+ where push_to_hub_token is removed by @Manodeepray in https://github.com/huggingface/trl/pull/4691
- Overwrite model default generation config used by model.generate by @albertvillanova in https://github.com/huggingface/trl/pull/4647
- Fix: handle multiple tool calls in `qwen3_schema` by @mattbui in https://github.com/huggingface/trl/pull/4709
- Fix bugs when using multi-gpu: dataset streaming for offline trainers + dtype initialization by @kaixuanliu in https://github.com/huggingface/trl/pull/3950
- Ensure llm-blender is importable with transformers >= v5 by @albertvillanova in https://github.com/huggingface/trl/pull/4781
- Monkey patch for `HybridCache` in Liger-Kernel with transformers v5 by @qgallouedec in https://github.com/huggingface/trl/pull/4798
- [fix] GRPOTrainer: proper access `args` by @carlyou in https://github.com/huggingface/trl/pull/4801
- Fix vllm compat patches to be applied only to affected versions by @albertvillanova in https://github.com/huggingface/trl/pull/4815
- fix bug when sft calc outputs.token_accuracy by @kaixuanliu in https://github.com/huggingface/trl/pull/4814
- fix xpu vllm client server by @jiqing-feng in https://github.com/huggingface/trl/pull/4780
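The `top_k` fix above relies on the convention that a value of 0 disables top-k filtering. The sketch below illustrates that convention in plain Python; `top_k_filter` is a hypothetical helper written for this note, not a function from TRL or vLLM.

```python
def top_k_filter(logits, top_k=0):
    # top_k <= 0 (or a value covering every entry) means "no filtering": return logits unchanged.
    if top_k <= 0 or top_k >= len(logits):
        return list(logits)
    # Otherwise keep entries at or above the k-th largest value and mask the rest.
    threshold = sorted(logits, reverse=True)[top_k - 1]
    return [x if x >= threshold else float("-inf") for x in logits]


logits = [1.0, 3.0, 2.0]
unfiltered = top_k_filter(logits, top_k=0)  # filtering disabled
top_two = top_k_filter(logits, top_k=2)     # lowest logit masked out
print(unfiltered, top_two)
```

Note that in this sketch ties at the threshold are all kept, so more than `top_k` entries can survive when values repeat.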
Documentation and Examples
- docs: add RapidFire AI integration section to SFT Trainer by @kamran-rapidfireAI in https://github.com/huggingface/trl/pull/4661
- Fix environment image name for BrowserGym example script by @sergiopaniego in https://github.com/huggingface/trl/pull/4680
- Docs(`grpo_trainer.md`): Added Qwen SAPO details under `Loss Types` by @casinca in https://github.com/huggingface/trl/pull/4681
- [docs] Adds GRPO, RSO and LoRA to Paper Index by @SSusantAchary in https://github.com/huggingface/trl/pull/4441
- Enable zero3 init and 16-bit model saving for ds ulysses config by @edbeeching in https://github.com/huggingface/trl/pull/4701
- Set version to packaged one in notebooks by @sergiopaniego in https://github.com/huggingface/trl/pull/4648
- BrowserGym example for LLMs (no vision) by @sergiopaniego in https://github.com/huggingface/trl/pull/4696
- docs: Add RapidFire AI cross-references to DPO and GRPO trainer docs by @kamran-rapidfireAI in https://github.com/huggingface/trl/pull/4705
- [docs] Fix RapidFire AI position in documentation by @qgallouedec in https://github.com/huggingface/trl/pull/4715
- Add inference example to GRPO agent training notebook by @sergiopaniego in https://github.com/huggingface/trl/pull/4710
- Upload FunctionGemma notebook by @sergiopaniego in https://github.com/huggingface/trl/pull/4721
- Update agents notebook dependencies by @sergiopaniego in https://github.com/huggingface/trl/pull/4724
- Add uv/hf jobs support to OpenEnv scripts by @sergiopaniego in https://github.com/huggingface/trl/pull/4720
- Add GRPO QLoRA free notebook by @sergiopaniego in https://github.com/huggingface/trl/pull/4660
- Hotfix for browsergym openenv notebook by @sergiopaniego in https://github.com/huggingface/trl/pull/4740
- docs: fix "Good Second Issue" redirection link by @casinca in https://github.com/huggingface/trl/pull/4749
- [Docs] Add SRL (Supervised Reinforcement Learning) to Community Tutorials by @s23deepak in https://github.com/huggingface/trl/pull/4758
- Add LFM2.5 to GRPO notebook by @sergiopaniego in https://github.com/huggingface/trl/pull/4793
- Sudoku GRPO example script using TextArena by @sergiopaniego in https://github.com/huggingface/trl/pull/4762
- [EXAMPLES] Update wordle to new openenv release by @burtenshaw in https://github.com/huggingface/trl/pull/4791
- Update the typos in docs/source/grpo_trainer.md by @Tianyi-Billy-Ma in https://github.com/huggingface/trl/pull/4804
- Update examples to new OpenEnv version by @sergiopaniego in https://github.com/huggingface/trl/pull/4796
- Update GRPO example to use Qwen2.5 instead of Qwen2 by @BurnyCoder in https://github.com/huggingface/trl/pull/4803
Deprecations
- Remove deprecated functions and parameters by @qgallouedec in https://github.com/huggingface/trl/pull/4651
- Remove `MergeModelCallback` from import structure by @qgallouedec in https://github.com/huggingface/trl/pull/4664
- Remove `ChatMlSpecialTokens` by @qgallouedec in https://github.com/huggingface/trl/pull/4666
- Remove unused `_win_rate_completions_df` function from callbacks by @qgallouedec in https://github.com/huggingface/trl/pull/4672
- Deprecate max_prompt_length in RLOOTrainer by @albertvillanova in https://github.com/huggingface/trl/pull/4703
- Small fix on contributing docs by @murilo-cunha in https://github.com/huggingface/trl/pull/4753
- Remove `DbrxForCausalLM` support by @qgallouedec in https://github.com/huggingface/trl/pull/4799
CI Improvements
- Hotfix CI due to generation config by setting tests as xfail by @albertvillanova in https://github.com/huggingface/trl/pull/4657
- Upgrade GitHub Actions to latest versions by @salmanmkc in https://github.com/huggingface/trl/pull/4734
- Upgrade GitHub Actions for Node 24 compatibility by @salmanmkc in https://github.com/huggingface/trl/pull/4733
- Include data type for tiny models and update tests by @qgallouedec in https://github.com/huggingface/trl/pull/4728
- Change tiny model dtype from float16 to bfloat16 to fix CUDA error by @albertvillanova in https://github.com/huggingface/trl/pull/4745
- Add revision override mechanism for testing tiny models by @albertvillanova in https://github.com/huggingface/trl/pull/4769
- Hotfix: Set float32 as default dtype for testing tiny models by @albertvillanova in https://github.com/huggingface/trl/pull/4770
- Hotfix CI with dev dependencies: xfail test_training_vlm_and_liger by @albertvillanova in https://github.com/huggingface/trl/pull/4777
- Add initial multi-GPU CI tests for distributed training by @qgallouedec in https://github.com/huggingface/trl/pull/4784
- Set dtype default to float32 by @albertvillanova in https://github.com/huggingface/trl/pull/4778
- Test FSDP2 by @qgallouedec in https://github.com/huggingface/trl/pull/4813
- Test ZeRO Stage 3 by @qgallouedec in https://github.com/huggingface/trl/pull/4821
- Hotfix CI main tests: Pin transformers 4.57.4 by @albertvillanova in https://github.com/huggingface/trl/pull/4830
- Hotfix CI distributed smoke tests: xfail test_sft_peft[zero3] by @albertvillanova in https://github.com/huggingface/trl/pull/4831
- Test ZeRO Stage 2 by @qgallouedec in https://github.com/huggingface/trl/pull/4822
Miscellaneous
- Move `compute_accuracy` to PRM Trainer file by @qgallouedec in https://github.com/huggingface/trl/pull/4656
- Move `clone_chat_template` to `chat_template_utils` by @qgallouedec in https://github.com/huggingface/trl/pull/4653
- Move `GeometricMixtureWrapper` to `nash_md_trainer.py` by @qgallouedec in https://github.com/huggingface/trl/pull/4670
- Move `exact_div`, `print_rich_table`, `truncate_response`, `forward` to `ppo_trainer` by @qgallouedec in https://github.com/huggingface/trl/pull/4676
- Merge `OnPolicyConfig` and `PPOConfig` and move `OnlineTrainerState` by @qgallouedec in https://github.com/huggingface/trl/pull/4671
- Move PEFT tests for `AutoModelForCausalLMWithValueHead` to `test_ppo_trainer` by @qgallouedec in https://github.com/huggingface/trl/pull/4678
- Move `generate` and `batch_generation` to `ppo_trainer.py` by @qgallouedec in https://github.com/huggingface/trl/pull/4675
- Import `TrainerCallback` from top-level transformers by @qgallouedec in https://github.com/huggingface/trl/pull/4694
- Fix typos by @qgallouedec in https://github.com/huggingface/trl/pull/4690
- Align import utils with transformers by @qgallouedec in https://github.com/huggingface/trl/pull/4684
- Align stable trainers by @qgallouedec in https://github.com/huggingface/trl/pull/4687
- Align GRPO and RLOO initialization by @qgallouedec in https://github.com/huggingface/trl/pull/4685
- Align use of vllm_max_model_length in RLOOTrainer by @albertvillanova in https://github.com/huggingface/trl/pull/4702
- Align RLOO with GRPO by @qgallouedec in https://github.com/huggingface/trl/pull/4706
- Fix test assertion for `top_k` parameter in `OnlineDPOTrainer` by @qgallouedec in https://github.com/huggingface/trl/pull/4714
- Disallow `PeftModel` + `peft_config` in trainers by @qgallouedec in https://github.com/huggingface/trl/pull/4713
- Fix deprecation version for RLOO max_prompt_length by @albertvillanova in https://github.com/huggingface/trl/pull/4726
- Refactor vLLM generation [3/N]: Decouple profiling from trainer by @albertvillanova in https://github.com/huggingface/trl/pull/4717
- Avoid docstyle formatting for `TestParseResponse` by @qgallouedec in https://github.com/huggingface/trl/pull/4736
- 🥂 Happy New Year by @qgallouedec in https://github.com/huggingface/trl/pull/4775
--
New Contributors
- @pointerhacker made their first contribution in https://github.com/huggingface/trl/pull/4545
- @apalmas-saifh made their first contribution in https://github.com/huggingface/trl/pull/4663
- @Manodeepray made their first contribution in https://github.com/huggingface/trl/pull/4691
- @salmanmkc made their first contribution in https://github.com/huggingface/trl/pull/4734
- @mattbui made their first contribution in https://github.com/huggingface/trl/pull/4709
- @murilo-cunha made their first contribution in https://github.com/huggingface/trl/pull/4753
- @hartmans made their first contribution in https://github.com/huggingface/trl/pull/4763
- @s23deepak made their first contribution in https://github.com/huggingface/trl/pull/4758
- @Tianyi-Billy-Ma made their first contribution in https://github.com/huggingface/trl/pull/4804
- @carlyou made their first contribution in https://github.com/huggingface/trl/pull/4801
- @BurnyCoder made their first contribution in https://github.com/huggingface/trl/pull/4803
Full Changelog: https://github.com/huggingface/trl/compare/v0.26.0...v0.27.0