v0.27.1
What's Changed
- Fix: undefined current_gradient_accumulation_steps by @qgallouedec in https://github.com/huggingface/trl/pull/4852
- fix(DeepSeek OPSM): passing correct (vLLM) logprobs by @casinca in https://github.com/huggingface/trl/pull/4857
- Fix SFT training for prompt-completion type and transformers v5 by @qgallouedec in https://github.com/huggingface/trl/pull/4880
- Bugfix: Logprob drift in vLLM serving mode (compared to colocate mode) by @kdubovikov in https://github.com/huggingface/trl/pull/4873
- Fix RewardTrainer's results not reproducible by @liyc-ai in https://github.com/huggingface/trl/pull/4887
New Contributors
- @kdubovikov made their first contribution in https://github.com/huggingface/trl/pull/4873
- @liyc-ai made their first contribution in https://github.com/huggingface/trl/pull/4887
Full Changelog: https://github.com/huggingface/trl/compare/v0.27.0...v0.27.1