v0.27.0

Features

Add vllm_group_port argument to GRPO, RLOO and OnlineDPO configuration by @pointerhacker in https://github.com/huggingface/trl/pull/4545
Preserve truncated tokens in BFD packing by @qgallouedec in https://github.com/huggingface/trl/pull/4632
Support async reward functions and parallelize call to reward functions. by @pramodith in https://github.com/huggingface/trl/pull/4567
RLOO supports async rewards. by @pramodith in https://github.com/huggingface/trl/pull/4718
Support vLLM 0.12.0 by @jiqing-feng in https://github.com/huggingface/trl/pull/4117
feat: DeepSeek V3.2 Off-policy sequence masking by @casinca in https://github.com/huggingface/trl/pull/4689
🎭 Up to 50% less VRAM during forward with forward_masked_logits function by @qgallouedec in https://github.com/huggingface/trl/pull/4729
[GRPO] Add a config to limit the number of tool calling iterations by @pramodith in https://github.com/huggingface/trl/pull/4761
Switch gradient checkpointing default to use_reentrant=False (PyTorch recommended) by @qgallouedec in https://github.com/huggingface/trl/pull/4811
Add support for GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization by @nbasyl in https://github.com/huggingface/trl/pull/4785
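
The async reward entries above (#4567, #4718) do not show how the calls are parallelized. As a rough illustration of the general idea only, and not TRL's actual implementation, the sketch below runs several async reward functions concurrently with asyncio.gather. The function names length_reward, exclamation_reward and score_all are hypothetical, and the sleep calls stand in for slow I/O such as a remote judge.

```python
import asyncio

# Hypothetical reward functions for illustration; they are not part of TRL.
async def length_reward(completions):
    await asyncio.sleep(0.05)  # stands in for a slow I/O-bound call
    return [float(len(c)) for c in completions]

async def exclamation_reward(completions):
    await asyncio.sleep(0.05)
    return [1.0 if c.endswith("!") else 0.0 for c in completions]

async def score_all(reward_funcs, completions):
    # Schedule every reward function at once and wait for all of them;
    # results come back in the same order as reward_funcs.
    return await asyncio.gather(*(f(completions) for f in reward_funcs))

completions = ["hi!", "hello"]
rewards = asyncio.run(score_all([length_reward, exclamation_reward], completions))
print(rewards)  # [[3.0, 5.0], [1.0, 0.0]]
```

Because the two coroutines wait concurrently, the total wall time is close to the slowest single reward function rather than the sum of all of them.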

Experimental

Move AutoModelForCausalLMWithValueHead and AutoModelForSeq2SeqLMWithValueHead to experimental by @qgallouedec in https://github.com/huggingface/trl/pull/4654
Move DPODataCollatorWithPadding to experimental.utils by @qgallouedec in https://github.com/huggingface/trl/pull/4667
Move DataCollatorForChatML to experimental.utils by @qgallouedec in https://github.com/huggingface/trl/pull/4668
Move add_bos_token_if_needed and add_eos_token_if_needed to experimental.utils by @qgallouedec in https://github.com/huggingface/trl/pull/4674
Move truncate_right and SIMPLE_CHAT_TEMPLATE to experimental.utils by @qgallouedec in https://github.com/huggingface/trl/pull/4677
Move prepare_model_for_kbit_training, enable_gradient_checkpointing, prepare_peft_model to experimental.utils by @qgallouedec in https://github.com/huggingface/trl/pull/4704
Move get_reward function to experimental.utils by @qgallouedec in https://github.com/huggingface/trl/pull/4683
Remove experimental imports from testing_utils by @albertvillanova in https://github.com/huggingface/trl/pull/4727
ORPO: Avoid catastrophic cancellation in loss function by @hartmans in https://github.com/huggingface/trl/pull/4763
Refactor KTO [1/N]: Modernize model initialization by @albertvillanova in https://github.com/huggingface/trl/pull/4783
[GOLD] add probability merging fix to implement chain rule by @kashif in https://github.com/huggingface/trl/pull/4765
Refactor KTO coordinated with DPO [a/N]: Remove encoder-decoder support by @albertvillanova in https://github.com/huggingface/trl/pull/4792
Refactor KTO coordinated with DPO [b/N]: Simplify truncation logic by @albertvillanova in https://github.com/huggingface/trl/pull/4808
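
The ORPO entry above (#4763) names catastrophic cancellation but not where it occurs or how it was removed. As a generic illustration, and assuming the problematic term has the form log(1 - exp(logp)) as in a log-odds computation, the sketch below compares a naive evaluation with a stable one based on math.expm1. The helper names are hypothetical and this is not taken from the pull request.

```python
import math

def log1mexp_naive(logp):
    # Computes log(1 - exp(logp)) directly. When logp is close to 0,
    # exp(logp) is close to 1 and the subtraction loses most significant digits.
    return math.log(1.0 - math.exp(logp))

def log1mexp_stable(logp):
    # -expm1(logp) equals 1 - exp(logp) but is evaluated without the
    # subtraction of two nearly equal numbers.
    return math.log(-math.expm1(logp))

logp = -1e-12  # log-probability of an almost certain sequence
naive = log1mexp_naive(logp)
stable = log1mexp_stable(logp)
print(naive, stable)
```

For this input, 1 - exp(logp) is approximately 1e-12, so the stable result agrees closely with math.log(1e-12), while the naive result is visibly off in the later digits.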

Fixes

Accounting for case num_generations_eval=1 in the calculation of the advantage by @qgallouedec in https://github.com/huggingface/trl/pull/4662
Fix vLLM error for tools usage not supported when running GRPO training by @apalmas-saifh in https://github.com/huggingface/trl/pull/4663
Fix GRPO config validation in case num_generations_eval is specified and different than num_generations by @apalmas-saifh in https://github.com/huggingface/trl/pull/4682
Fix top_k default value to 0 for disabling top-k filtering by @albertvillanova in https://github.com/huggingface/trl/pull/4695
Include generation_config for tiny model uploads by @qgallouedec in https://github.com/huggingface/trl/pull/4643
Fix KeyError with transformers 5.0.0+ where push_to_hub_token is removed by @Manodeepray in https://github.com/huggingface/trl/pull/4691
Overwrite model default generation config used by model.generate by @albertvillanova in https://github.com/huggingface/trl/pull/4647
Fix: handle multiple tool calls in qwen3_schema by @mattbui in https://github.com/huggingface/trl/pull/4709
Fix bugs when using multi-gpu: dataset streaming for offline trainers + dtype initialization by @kaixuanliu in https://github.com/huggingface/trl/pull/3950
Ensure llm-blender is importable with transformers >= v5 by @albertvillanova in https://github.com/huggingface/trl/pull/4781
Monkey patch for HybridCache in Liger-Kernel with transformers v5 by @qgallouedec in https://github.com/huggingface/trl/pull/4798
[fix] GRPOTrainer: proper access args by @carlyou in https://github.com/huggingface/trl/pull/4801
Fix vllm compat patches to be applied only to affected versions by @albertvillanova in https://github.com/huggingface/trl/pull/4815
fix bug when sft calc outputs.token_accuracy by @kaixuanliu in https://github.com/huggingface/trl/pull/4814
fix xpu vllm client server by @jiqing-feng in https://github.com/huggingface/trl/pull/4780
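
The top_k entry above (#4695) relies on the convention that a value of 0 turns top-k filtering off. A minimal pure-Python sketch of that convention follows; top_k_filter is a hypothetical helper written for this note, not a TRL or transformers function.

```python
import math

def top_k_filter(logits, top_k=0):
    # Convention assumed here: top_k <= 0 means "no filtering".
    if top_k <= 0 or top_k >= len(logits):
        return list(logits)
    # Keep the top_k largest logits and mask the rest with -inf.
    threshold = sorted(logits, reverse=True)[top_k - 1]
    return [x if x >= threshold else -math.inf for x in logits]

logits = [2.0, 0.5, 1.0, -1.0]
unfiltered = top_k_filter(logits, top_k=0)
top2 = top_k_filter(logits, top_k=2)
print(unfiltered)  # [2.0, 0.5, 1.0, -1.0]
print(top2)        # [2.0, -inf, 1.0, -inf]
```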

Documentation and Examples

docs: add RapidFire AI integration section to SFT Trainer by @kamran-rapidfireAI in https://github.com/huggingface/trl/pull/4661
Fix environment image name for BrowserGym example script by @sergiopaniego in https://github.com/huggingface/trl/pull/4680
Docs(grpo_trainer.md): Added Qwen SAPO details under Loss Types by @casinca in https://github.com/huggingface/trl/pull/4681
[docs] Adds GRPO, RSO and LoRA to Paper Index by @SSusantAchary in https://github.com/huggingface/trl/pull/4441
Enable zero3 init and 16-bit model saving for ds ulysses config by @edbeeching in https://github.com/huggingface/trl/pull/4701
Set version to packaged one in notebooks by @sergiopaniego in https://github.com/huggingface/trl/pull/4648
BrowserGym example for LLMs (no vision) by @sergiopaniego in https://github.com/huggingface/trl/pull/4696
docs: Add RapidFire AI cross-references to DPO and GRPO trainer docs by @kamran-rapidfireAI in https://github.com/huggingface/trl/pull/4705
[docs] Fix RapidFire AI position in documentation by @qgallouedec in https://github.com/huggingface/trl/pull/4715
Add inference example to GRPO agent training notebook by @sergiopaniego in https://github.com/huggingface/trl/pull/4710
Upload FunctionGemma notebook by @sergiopaniego in https://github.com/huggingface/trl/pull/4721
Update agents notebook dependencies by @sergiopaniego in https://github.com/huggingface/trl/pull/4724
Add uv/hf jobs support to OpenEnv scripts by @sergiopaniego in https://github.com/huggingface/trl/pull/4720
Add GRPO QLoRA free notebook by @sergiopaniego in https://github.com/huggingface/trl/pull/4660
Hotfix for browsergym openenv notebook by @sergiopaniego in https://github.com/huggingface/trl/pull/4740
docs: fix "Good Second Issue" redirection link by @casinca in https://github.com/huggingface/trl/pull/4749
[Docs] Add SRL (Supervised Reinforcement Learning) to Community Tutorials by @s23deepak in https://github.com/huggingface/trl/pull/4758
Add LFM2.5 to GRPO notebook by @sergiopaniego in https://github.com/huggingface/trl/pull/4793
Sudoku GRPO example script using TextArena by @sergiopaniego in https://github.com/huggingface/trl/pull/4762
[EXAMPLES] Update wordle to new openenv release by @burtenshaw in https://github.com/huggingface/trl/pull/4791
Update the typos in docs/source/grpo_trainer.md by @Tianyi-Billy-Ma in https://github.com/huggingface/trl/pull/4804
Update examples to new OpenEnv version by @sergiopaniego in https://github.com/huggingface/trl/pull/4796
Update GRPO example to use Qwen2.5 instead of Qwen2 by @BurnyCoder in https://github.com/huggingface/trl/pull/4803

Deprecations

Remove deprecated functions and parameters by @qgallouedec in https://github.com/huggingface/trl/pull/4651
Remove MergeModelCallback from import structure by @qgallouedec in https://github.com/huggingface/trl/pull/4664
Remove ChatMlSpecialTokens by @qgallouedec in https://github.com/huggingface/trl/pull/4666
Remove unused _win_rate_completions_df function from callbacks by @qgallouedec in https://github.com/huggingface/trl/pull/4672
Deprecate max_prompt_length in RLOOTrainer by @albertvillanova in https://github.com/huggingface/trl/pull/4703
Small fix on contributing docs by @murilo-cunha in https://github.com/huggingface/trl/pull/4753
Remove DbrxForCausalLM support by @qgallouedec in https://github.com/huggingface/trl/pull/4799

CI Improvements

Hotfix CI due to generation config by setting tests as xfail by @albertvillanova in https://github.com/huggingface/trl/pull/4657
Upgrade GitHub Actions to latest versions by @salmanmkc in https://github.com/huggingface/trl/pull/4734
Upgrade GitHub Actions for Node 24 compatibility by @salmanmkc in https://github.com/huggingface/trl/pull/4733
Include data type for tiny models and update tests by @qgallouedec in https://github.com/huggingface/trl/pull/4728
Change tiny model dtype from float16 to bfloat16 to fix CUDA error by @albertvillanova in https://github.com/huggingface/trl/pull/4745
Add revision override mechanism for testing tiny models by @albertvillanova in https://github.com/huggingface/trl/pull/4769
Hotfix: Set float32 as default dtype for testing tiny models by @albertvillanova in https://github.com/huggingface/trl/pull/4770
Hotfix CI with dev dependencies: xfail test_training_vlm_and_liger by @albertvillanova in https://github.com/huggingface/trl/pull/4777
Add initial multi-GPU CI tests for distributed training by @qgallouedec in https://github.com/huggingface/trl/pull/4784
Set dtype default to float32 by @albertvillanova in https://github.com/huggingface/trl/pull/4778
Test FSDP2 by @qgallouedec in https://github.com/huggingface/trl/pull/4813
Test ZeRO Stage 3 by @qgallouedec in https://github.com/huggingface/trl/pull/4821
Hotfix CI main tests: Pin transformers 4.57.4 by @albertvillanova in https://github.com/huggingface/trl/pull/4830
Hotfix CI distributed smoke tests: xfail test_sft_peft[zero3] by @albertvillanova in https://github.com/huggingface/trl/pull/4831
Test ZeRO Stage 2 by @qgallouedec in https://github.com/huggingface/trl/pull/4822

Miscellaneous

Move compute_accuracy to PRM Trainer file by @qgallouedec in https://github.com/huggingface/trl/pull/4656
Move clone_chat_template to chat_template_utils by @qgallouedec in https://github.com/huggingface/trl/pull/4653
Move GeometricMixtureWrapper to nash_md_trainer.py by @qgallouedec in https://github.com/huggingface/trl/pull/4670
Move exact_div, print_rich_table, truncate_response, forward to ppo_trainer by @qgallouedec in https://github.com/huggingface/trl/pull/4676
Merge OnPolicyConfig and PPOConfig and move OnlineTrainerState by @qgallouedec in https://github.com/huggingface/trl/pull/4671
Move PEFT tests for AutoModelForCausalLMWithValueHead to test_ppo_trainer by @qgallouedec in https://github.com/huggingface/trl/pull/4678
Move generate and batch_generation to ppo_trainer.py by @qgallouedec in https://github.com/huggingface/trl/pull/4675
Import TrainerCallback from top-level transformers by @qgallouedec in https://github.com/huggingface/trl/pull/4694
Fix typos by @qgallouedec in https://github.com/huggingface/trl/pull/4690
Align import utils with transformers by @qgallouedec in https://github.com/huggingface/trl/pull/4684
Align stable trainers by @qgallouedec in https://github.com/huggingface/trl/pull/4687
Align GRPO and RLOO initialization by @qgallouedec in https://github.com/huggingface/trl/pull/4685
Align use of vllm_max_model_length in RLOOTrainer by @albertvillanova in https://github.com/huggingface/trl/pull/4702
Align RLOO with GRPO by @qgallouedec in https://github.com/huggingface/trl/pull/4706
Fix test assertion for top_k parameter in OnlineDPOTrainer by @qgallouedec in https://github.com/huggingface/trl/pull/4714
Disallow PeftModel + peft_config in trainers by @qgallouedec in https://github.com/huggingface/trl/pull/4713
Fix deprecation version for RLOO max_prompt_length by @albertvillanova in https://github.com/huggingface/trl/pull/4726
Refactor vLLM generation [3/N]: Decouple profiling from trainer by @albertvillanova in https://github.com/huggingface/trl/pull/4717
Avoid docstyle formatting for TestParseResponse by @qgallouedec in https://github.com/huggingface/trl/pull/4736
🥂 Happy New Year by @qgallouedec in https://github.com/huggingface/trl/pull/4775

New Contributors

@pointerhacker made their first contribution in https://github.com/huggingface/trl/pull/4545
@apalmas-saifh made their first contribution in https://github.com/huggingface/trl/pull/4663
@Manodeepray made their first contribution in https://github.com/huggingface/trl/pull/4691
@salmanmkc made their first contribution in https://github.com/huggingface/trl/pull/4734
@mattbui made their first contribution in https://github.com/huggingface/trl/pull/4709
@murilo-cunha made their first contribution in https://github.com/huggingface/trl/pull/4753
@hartmans made their first contribution in https://github.com/huggingface/trl/pull/4763
@s23deepak made their first contribution in https://github.com/huggingface/trl/pull/4758
@Tianyi-Billy-Ma made their first contribution in https://github.com/huggingface/trl/pull/4804
@carlyou made their first contribution in https://github.com/huggingface/trl/pull/4801
@BurnyCoder made their first contribution in https://github.com/huggingface/trl/pull/4803

Full Changelog: https://github.com/huggingface/trl/compare/v0.26.0...v0.27.0
