v0.27.1

What's Changed

Fix: undefined current_gradient_accumulation_steps by @qgallouedec in https://github.com/huggingface/trl/pull/4852
fix(DeepSeek OPSM): passing correct (vLLM) logprobs by @casinca in https://github.com/huggingface/trl/pull/4857
Fix SFT training for prompt-completion type and transformers v5 by @qgallouedec in https://github.com/huggingface/trl/pull/4880
Bugfix: Logprob drift in vLLM serving mode (compared to colocate mode) by @kdubovikov in https://github.com/huggingface/trl/pull/4873
Fix RewardTrainer's results not reproducible by @liyc-ai in https://github.com/huggingface/trl/pull/4887

New Contributors

@kdubovikov made their first contribution in https://github.com/huggingface/trl/pull/4873
@liyc-ai made their first contribution in https://github.com/huggingface/trl/pull/4887

Full Changelog: https://github.com/huggingface/trl/compare/v0.27.0...v0.27.1
