Unsloth Flex Attention + Long context gpt-oss Training
We’re excited to introduce Unsloth Flex Attention support for OpenAI gpt-oss training. Compared to all other implementations, including those using Flash Attention 3 (FA3), it enables >8× longer context lengths, >50% less VRAM usage, and >1.5× faster training. Unsloth Flex Attention makes it possible to train with a 60K context length on just 80GB of VRAM for BF16 LoRA. Also:
- You can now export/save your QLoRA fine-tuned gpt-oss model to llama.cpp, vLLM, or HF.
- We fixed gpt-oss training losses going to infinity on float16 GPUs (like the T4 on Colab).
- We fixed gpt-oss implementation issues, most notably ensuring that `swiglu_limit = 7.0` is properly applied during MXFP4 inference in transformers.
- Unsloth Flex Attention scales with context: longer sequences yield bigger savings in both VRAM and training time.
Full details are in our blog post: https://docs.unsloth.ai/basics/long-context-gpt-oss-training
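As a back-of-the-envelope illustration of why the savings grow with context (this is generic attention-memory arithmetic, not Unsloth's measured numbers): a naive attention implementation materializes a full `seq_len × seq_len` score matrix per head, so its memory grows quadratically with context, while fused kernels like Flex Attention avoid materializing it entirely.

```python
def score_matrix_gib(seq_len: int, bytes_per_el: int = 2) -> float:
    """GiB needed to materialize one head's full seq_len x seq_len
    attention-score matrix (bytes_per_el=2 for BF16)."""
    return seq_len * seq_len * bytes_per_el / 2**30

# Quadratic growth: doubling context quadruples the score-matrix memory.
for n in (8_192, 32_768, 60_000):
    print(f"{n:>6} tokens: {score_matrix_gib(n):6.2f} GiB per head")
```

At 60K tokens the naive score matrix alone would need ~6.7 GiB per head per layer, which is why kernels that never materialize it are the only practical route to long-context training on a single 80GB GPU.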
What's Changed
- Add Qwen3 Instruct / Thinking chat templates by @Etherll in https://github.com/unslothai/unsloth/pull/3110
- Add Qwen3 4B to mapper.py by @Etherll in https://github.com/unslothai/unsloth/pull/3120
- Nightly by @danielhanchen in https://github.com/unslothai/unsloth/pull/3148
- Fix GPT OSS by @danielhanchen in https://github.com/unslothai/unsloth/pull/3154
- Nightly by @danielhanchen in https://github.com/unslothai/unsloth/pull/3169
- Update Blackwell install instructions for latest vLLM release by @qingy1337 in https://github.com/unslothai/unsloth/pull/3175
- Fix potential generator exhaustion bug in model loading file detection by @rolandtannous in https://github.com/unslothai/unsloth/pull/3167
- Fix vision model GGUF quantization_method error type by @rolandtannous in https://github.com/unslothai/unsloth/pull/3173
- Replace back ticks with single quotes by @rnowling in https://github.com/unslothai/unsloth/pull/3157
- Fix original_push_to_hub fallback by @Thiraput01 in https://github.com/unslothai/unsloth/pull/3115
- Add support for QAT + LoRA by @andrewor14 in https://github.com/unslothai/unsloth/pull/2976
- Bug fixes by @danielhanchen in https://github.com/unslothai/unsloth/pull/3180
- Torch 2.8 by @danielhanchen in https://github.com/unslothai/unsloth/pull/3186
- Fix extras transformers typo in pyproject.toml by @parth2510 in https://github.com/unslothai/unsloth/pull/3187
- Bug fixes by @danielhanchen in https://github.com/unslothai/unsloth/pull/3195
- allow torch.float32 dtype in FastLanguageModel by @mmathew23 in https://github.com/unslothai/unsloth/pull/3204
- fix is casual for qwen3 by @leizhenyuan in https://github.com/unslothai/unsloth/pull/3213
- Support `model.save_pretrained_torchao` by @jerryzh168 in https://github.com/unslothai/unsloth/pull/3111
- Fix gemma-3n by @mmathew23 in https://github.com/unslothai/unsloth/pull/3219
- Handle transformers move to dtype from torch_dtype by @mmathew23 in https://github.com/unslothai/unsloth/pull/3225
- chore: Fix Typos by @DefiWimar7 in https://github.com/unslothai/unsloth/pull/3224
New Contributors
- @rnowling made their first contribution in https://github.com/unslothai/unsloth/pull/3157
- @Thiraput01 made their first contribution in https://github.com/unslothai/unsloth/pull/3115
- @andrewor14 made their first contribution in https://github.com/unslothai/unsloth/pull/2976
- @parth2510 made their first contribution in https://github.com/unslothai/unsloth/pull/3187
- @jerryzh168 made their first contribution in https://github.com/unslothai/unsloth/pull/3111
- @DefiWimar7 made their first contribution in https://github.com/unslothai/unsloth/pull/3224
Full Changelog: https://github.com/unslothai/unsloth/compare/August-2025...August-2025-v2