gpt-oss Reinforcement Learning + Auto Kernel Notebook
We’re introducing gpt-oss RL support with the fastest RL inference and lowest VRAM use of any implementation. Blog: https://docs.unsloth.ai/new/gpt-oss-reinforcement-learning
- Unsloth now offers the fastest inference (~3x faster), lowest VRAM use (50% less), and longest context (8x longer) for gpt-oss RL of any implementation - with no accuracy loss.
- Since RL on gpt-oss isn't yet vLLM-compatible, we rewrote the Transformers inference code to enable faster inference.
- Try our free gpt-oss-20b GSPO Colab notebook.
- This notebook automatically creates faster matrix multiplication kernels and uses a new Unsloth reward function. We also show how to counteract reward hacking, which is one of RL's biggest challenges.
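One common way to counteract reward hacking is to combine the task reward with a penalty on degenerate behavior, such as padding completions to game verbosity-based scoring. A minimal sketch (the function name, weights, and threshold below are illustrative assumptions, not Unsloth's actual reward function):

```python
import re

# Hedged sketch: a reward function for RL fine-tuning that grants credit
# for a correct answer but subtracts a length penalty so the policy cannot
# inflate reward by padding its output. All constants are illustrative.
def correctness_with_length_penalty(completion: str, answer: str,
                                    max_chars: int = 400) -> float:
    """+1.0 if the expected answer appears in the completion, minus a
    penalty that grows once the completion exceeds max_chars."""
    # Task reward: did the model actually produce the expected answer?
    reward = 1.0 if re.search(re.escape(answer), completion) else 0.0
    # Anti-hacking penalty: charge for every character past the budget.
    overflow = max(0, len(completion) - max_chars)
    reward -= 0.001 * overflow
    return reward
```

For example, a concise correct completion like `"The answer is 42."` scores 1.0, while a 1000-character rambling answer is penalized even if correct. Penalizing length is only one anti-hacking measure; format checks and verifier-based rewards are common complements.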
- We previously released Vision RL with GSPO support
- ⚠️ Reminder to NOT use Flash Attention 3 for gpt-oss as it'll make your training loss wrong.
- DeepSeek-V3.1-Terminus is here, and you can run it locally via our GGUF. Read how our 3-bit GGUF beats Claude-4-Opus (thinking) on Aider Polyglot here.
- Magistral 1.2 is here: run it locally here, or fine-tune it for free using our Kaggle notebook.
- Fine-tuning the new Qwen3 models, including Qwen3-VL, Qwen3-Omni, and Qwen3-Next, should work in Unsloth if you install the latest version of Transformers. The models are large, however, so ensure you have enough VRAM.
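To pick up the new Qwen3 architectures, upgrade both packages before loading a model (a sketch; exact version pins may vary, so check the Unsloth docs for current requirements):

```shell
# Upgrade Unsloth and Transformers; the newest Qwen3-VL / Qwen3-Omni /
# Qwen3-Next architectures require a recent Transformers release.
pip install --upgrade unsloth
pip install --upgrade transformers
```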
- BERT is now fixed! Feel free to use our BERT fine-tuning notebook
Don't forget to also join our Reddit: r/unsloth 🥰
What's Changed
- Bug fixes by @danielhanchen in https://github.com/unslothai/unsloth/pull/3329
- Fix QAT + LoRA fast path, add tests by @andrewor14 in https://github.com/unslothai/unsloth/pull/3307
- Use gemma3n embedder patch + adjust FORCE_FLOAT32 match logic by @mmathew23 in https://github.com/unslothai/unsloth/pull/3332
- Synthetic Data updates by @mmathew23 in https://github.com/unslothai/unsloth/pull/3333
- Fix loading issues for BERT by @Etherll in https://github.com/unslothai/unsloth/pull/3339
- Bug fixes by @danielhanchen in https://github.com/unslothai/unsloth/pull/3335
- peft_config before model_config by @mmathew23 in https://github.com/unslothai/unsloth/pull/3342
- specify different tokenizer_path/name by @mmathew23 in https://github.com/unslothai/unsloth/pull/3343
- correct python support statement by @laz-001 in https://github.com/unslothai/unsloth/pull/3374
- GPT OSS RL by @danielhanchen in https://github.com/unslothai/unsloth/pull/3362
New Contributors
- @laz-001 made their first contribution in https://github.com/unslothai/unsloth/pull/3374
Full Changelog: https://github.com/unslothai/unsloth/compare/September-2025-v2...September-2025-v3