gpt-oss Reinforcement Learning + Auto Kernel Notebook
We’re introducing gpt-oss RL support with the fastest RL inference and lowest VRAM use of any implementation. Blog: https://docs.unsloth.ai/new/gpt-oss-reinforcement-learning
- Unsloth now offers the fastest inference (~3x faster), lowest VRAM use (50% less), and longest context (8x longer) for gpt-oss RL of any implementation - with no accuracy loss.
- Since RL on gpt-oss isn't yet vLLM-compatible, we rewrote the Transformers inference code to enable faster inference.
- Try our free gpt-oss-20b GSPO Colab notebook.
- This notebook automatically creates faster matrix multiplication kernels and uses a new Unsloth reward function. We also show how to counteract reward hacking, which is one of RL's biggest challenges.
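One common way to counteract reward hacking is to combine the task reward with a penalty on degenerate behavior, such as padding completions to game verbosity-based scoring. A minimal sketch (the function name, weights, and threshold below are illustrative assumptions, not Unsloth's actual reward function):

```python
import re

# Hedged sketch: a reward function for RL fine-tuning that grants credit
# for a correct answer but subtracts a length penalty so the policy cannot
# inflate reward by padding its output. All constants are illustrative.
def correctness_with_length_penalty(completion: str, answer: str,
                                    max_chars: int = 400) -> float:
    """+1.0 if the expected answer appears in the completion, minus a
    penalty that grows once the completion exceeds max_chars."""
    # Task reward: did the model actually produce the expected answer?
    reward = 1.0 if re.search(re.escape(answer), completion) else 0.0
    # Anti-hacking penalty: charge for every character past the budget.
    overflow = max(0, len(completion) - max_chars)
    reward -= 0.001 * overflow
    return reward
```

For example, a concise correct completion like `"The answer is 42."` scores 1.0, while a 1000-character rambling answer is penalized even if correct. Penalizing length is only one anti-hacking measure; format checks and verifier-based rewards are common complements.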
- We previously released Vision RL with GSPO support
- ⚠️ Reminder to NOT use Flash Attention 3 for gpt-oss as it'll make your training loss wrong.
- DeepSeek-V3.1-Terminus is here, and you can run it locally via our GGUF. Read how our 3-bit GGUF beats Claude-4-Opus (thinking) on Aider Polyglot here.
- Magistral 1.2 is here: run it locally here, or fine-tune it for free using our Kaggle notebook.
- Fine-tuning the new Qwen3 models, including Qwen3-VL, Qwen3-Omni, and Qwen3-Next, should work in Unsloth if you install the latest version of Transformers. The models are large, however, so ensure you have enough VRAM.
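To pick up the new Qwen3 architectures, upgrade both packages before loading a model (a sketch; exact version pins may vary, so check the Unsloth docs for current requirements):

```shell
# Upgrade Unsloth and Transformers; the newest Qwen3-VL / Qwen3-Omni /
# Qwen3-Next architectures require a recent Transformers release.
pip install --upgrade unsloth
pip install --upgrade transformers
```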
- BERT is now fixed! Feel free to use our BERT fine-tuning notebook
Don't forget to also join our Reddit: r/unsloth 🥰
What's Changed
- Bug fixes by @danielhanchen in https://github.com/unslothai/unsloth/pull/3329
- Fix QAT + LoRA fast path, add tests by @andrewor14 in https://github.com/unslothai/unsloth/pull/3307
- Use gemma3n embedder patch + adjust FORCE_FLOAT32 match logic by @mmathew23 in https://github.com/unslothai/unsloth/pull/3332
- Synthetic Data updates by @mmathew23 in https://github.com/unslothai/unsloth/pull/3333
- Fix loading issues for BERT by @Etherll in https://github.com/unslothai/unsloth/pull/3339
- Bug fixes by @danielhanchen in https://github.com/unslothai/unsloth/pull/3335
- peft_config before model_config by @mmathew23 in https://github.com/unslothai/unsloth/pull/3342
- specify different tokenizer_path/name by @mmathew23 in https://github.com/unslothai/unsloth/pull/3343
- correct python support statement by @laz-001 in https://github.com/unslothai/unsloth/pull/3374
- GPT OSS RL by @danielhanchen in https://github.com/unslothai/unsloth/pull/3362
New Contributors
- @laz-001 made their first contribution in https://github.com/unslothai/unsloth/pull/3374
Full Changelog: https://github.com/unslothai/unsloth/compare/September-2025-v2...September-2025-v3