Vision Reinforcement Learning + Memory Efficient RL

We're excited to support vision models for RL, and to make RL even more memory efficient and faster!

Unsloth now supports vision/multimodal RL with Gemma 3, Qwen2.5-VL and other vision models. Thanks to its unique weight sharing and custom kernels, Unsloth makes VLM RL 1.5–2× faster, uses 90% less VRAM, and enables 10× longer context lengths than FA2 setups, with no accuracy loss. Try the Qwen2.5-VL GSPO notebook or the Gemma 3 (4B) Vision GSPO notebook.

Full details are in our blog post: https://docs.unsloth.ai/new/vision-reinforcement-learning-vlm-rl

This update also introduces Qwen's GSPO algorithm.
Our new vision RL support is also faster and more memory efficient: new kernels and algorithms allow faster RL for text and vision LLMs with 50% less VRAM and 10× more context.
We're introducing a new RL feature called 'Standby'. Previously, RL required splitting the GPU between training and inference. With Unsloth Standby, you no longer have to, and Standby uniquely limits speed degradation compared to other implementations, sometimes even making training faster. Read our blog.
We released Aider Polyglot benchmarks for our DeepSeek-V3.1 Dynamic GGUFs, and Unsloth quants consistently perform better than others. Read the blog.
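For readers new to GSPO, here is a minimal pure-Python sketch of its core idea, written from the published GSPO formulation rather than from Unsloth's or TRL's code. Unlike token-level GRPO ratios, GSPO uses one length-normalized importance ratio per sampled sequence, then applies a PPO-style clipped objective with group-relative advantages. The function names and the `clip_eps` value are hypothetical and purely illustrative; real trainers work on batched tensors.

```python
import math
from typing import List


def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    # Group-relative advantage: normalize each sequence's reward against the
    # mean and (population) standard deviation of its sampling group.
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + eps) for r in rewards]


def sequence_ratio(logp_new: List[float], logp_old: List[float]) -> float:
    # GSPO importance ratio for one sequence: the sequence likelihood ratio
    # raised to 1/|y|, i.e. exp of the mean per-token log-prob difference.
    diffs = [a - b for a, b in zip(logp_new, logp_old)]
    return math.exp(sum(diffs) / len(diffs))


def gspo_objective(
    new_logps: List[List[float]],
    old_logps: List[List[float]],
    rewards: List[float],
    clip_eps: float = 0.2,  # hypothetical, illustrative clip range
) -> float:
    # Clipped surrogate averaged over the group (to be maximized).
    advs = group_advantages(rewards)
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logps, old_logps, advs):
        s = sequence_ratio(lp_new, lp_old)
        s_clipped = min(max(s, 1 - clip_eps), 1 + clip_eps)
        total += min(s * adv, s_clipped * adv)
    return total / len(advs)
```

Because the ratio is computed once per sequence, clipping acts on whole responses instead of individual tokens, which is the main design difference from GRPO.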

Don't forget to also join our Reddit: r/unsloth 🥰

What's Changed

GPT OSS Bug fixes by @danielhanchen in https://github.com/unslothai/unsloth/pull/3231
tests for mxfp4 and quantized models merge fix unsloth zoo pr 254 by @rolandtannous in https://github.com/unslothai/unsloth/pull/3223
Update mistral.py, showed flag to not call cut cross entropy by @pluesclues in https://github.com/unslothai/unsloth/pull/3233
Remove old version constraint in dependency list by @timkpaine in https://github.com/unslothai/unsloth/pull/3237
chore: Fix Typos by @DefiWimar7 in https://github.com/unslothai/unsloth/pull/3246
Fix incorrect function call in test_qwen3_grpo.py by @stevenxdavis in https://github.com/unslothai/unsloth/pull/3212
[Intel] make intel device support ROPE by @leizhenyuan in https://github.com/unslothai/unsloth/pull/3164
Support saving locally in model.save_pretrained_torchao by @jerryzh168 in https://github.com/unslothai/unsloth/pull/3263
fixed save_pretrained_torchao and associated tests by @rolandtannous in https://github.com/unslothai/unsloth/pull/3264
patch sftrainer to disable _is_vlm by @mmathew23 in https://github.com/unslothai/unsloth/pull/3265
Bug fixes by @danielhanchen in https://github.com/unslothai/unsloth/pull/3266
Filter vllm executor log by @Datta0 in https://github.com/unslothai/unsloth/pull/3268
llama vision inference fix by @mmathew23 in https://github.com/unslothai/unsloth/pull/3270
Add TorchAO quantization tests with FP16 models and serialization workarounds by @rolandtannous in https://github.com/unslothai/unsloth/pull/3269
GptAttention turn training off during inference by @mmathew23 in https://github.com/unslothai/unsloth/pull/3289
Add support for QAT full fine-tuning by @andrewor14 in https://github.com/unslothai/unsloth/pull/3238
simplify unsloth_base_fast_generate by @mmathew23 in https://github.com/unslothai/unsloth/pull/3291
Bug fixes by @danielhanchen in https://github.com/unslothai/unsloth/pull/3295
[ROCm] add hip device path by @billishyahao in https://github.com/unslothai/unsloth/pull/3301
Bug fixes by @danielhanchen in https://github.com/unslothai/unsloth/pull/3322
Add support for modules_to_save in FastModel.get_peft_model by @l1ghtsource in https://github.com/unslothai/unsloth/pull/3317
Fast Inference with vLLM for VLMs by @Datta0 in https://github.com/unslothai/unsloth/pull/2975
TRL Updated version of VLM GRPO update along with GSPO by @pluesclues in https://github.com/unslothai/unsloth/pull/3132

New Contributors

@timkpaine made their first contribution in https://github.com/unslothai/unsloth/pull/3237
@stevenxdavis made their first contribution in https://github.com/unslothai/unsloth/pull/3212
@l1ghtsource made their first contribution in https://github.com/unslothai/unsloth/pull/3317

Full Changelog: https://github.com/unslothai/unsloth/compare/August-2025-v2...September-2025-v2