v0.5.7

Highlights

New Model Support:
- Day 0 Support for Mimo-V2-Flash: #15207, https://lmsys.org/blog/2025-12-16-mimo-v2-flash/
- Day 0 Support for Nemotron-Nano-v3: https://lmsys.org/blog/2025-12-15-run-nvidia-nemotron-3-nano/
- Day 0 Support for LLaDA 2.0: https://lmsys.org/blog/2025-12-19-diffusion-llm/
- [SGLang-Diffusion] Day 0 Support for Qwen-Image-Edit-2509, Qwen-Image-Edit-2511, Qwen-Image-2512 and Qwen-Image-Layered
- EAGLE 3 speculative decoding draft models for popular models: https://lmsys.org/blog/2025-12-23-spec-bundle-phase-1/
Model Gateway v0.3.0 Release: https://docs.sglang.io/advanced_features/sgl_model_gateway.html
Scalable pipeline parallelism with dynamic chunking support for ultra-long contexts (PP Refactor Roadmap #11857)
Encoder Disaggregation for Multi-modal models (Roadmap #15118)
SGLang-Diffusion:
- Set --dit-layerwise-offload true to reduce peak VRAM usage by up to 30 GB and improve performance by up to 58% across all models
- Significantly reduce the latency of Qwen-Image-Edit, making it one of the fastest among open-source solutions. More improvements are on the way
- Add support for AMD, 4090, and 5090 GPUs, along with additional attention choices (sage-attn, sage-attn3), more parallelism options (TP), and enhancements to the HTTP API (Google Vertex supported)
- Cache-dit integration to improve performance by up to 165%

What's Changed

Refactor custom allreduce logics by @iforgetmyname in https://github.com/sgl-project/sglang/pull/13710
[Doc] Update DeepSeek-V3.2 document by @Fridge003 in https://github.com/sgl-project/sglang/pull/14321
Feature/support distilled vae generic by @baonudesifeizhai in https://github.com/sgl-project/sglang/pull/14195
[Performance] Optimize NSA Indexer K/S Buffer Access with Fused Triton Kernels by @Johnsonms in https://github.com/sgl-project/sglang/pull/13812
Update CODEOWNERS for multimodal by @mickqian in https://github.com/sgl-project/sglang/pull/14329
[bug fix] use npu phy id in container env by @jinke446 in https://github.com/sgl-project/sglang/pull/14266
[model-gateway] multimodality initialization by @slin1237 in https://github.com/sgl-project/sglang/pull/13350
[Doc] Fix DeepSeek V32 Doc by @Fridge003 in https://github.com/sgl-project/sglang/pull/14336
sync attention, deepseek doc by @b8zhong in https://github.com/sgl-project/sglang/pull/14335
[PD] Support decode pp for PD disaggregation by @ShangmingCai in https://github.com/sgl-project/sglang/pull/14265
[model-gateway] add image processor and transformer structure by @slin1237 in https://github.com/sgl-project/sglang/pull/14344
[CPU] Support chunk_gated_delta_rule kernel for Qwen3-Next by @Valentine233 in https://github.com/sgl-project/sglang/pull/12441
[bugfix] Fix prefill tbo disabled when --deepep-mode=auto by @yuhyao in https://github.com/sgl-project/sglang/pull/14333
[CI] update estimated elapsed time of some unittests by @ch-wan in https://github.com/sgl-project/sglang/pull/14347
[NPU] bug fix: w_vc need contiguous for NPU batch_matmul_transpose ops by @ZhengdQin in https://github.com/sgl-project/sglang/pull/13980
[bugfix] NpuFuseEPMoE miss initialization parameters by @chenxu140 in https://github.com/sgl-project/sglang/pull/14295
[Ascend] fix AscendAttnMaskBuilder bug to support float16 models by @MichelleWu351 in https://github.com/sgl-project/sglang/pull/14271
Tiny adjust CI testcases by @hnyls2002 in https://github.com/sgl-project/sglang/pull/14362
[NPU][Doc] updated installation guide for Ascend NPU by @VDV1985 in https://github.com/sgl-project/sglang/pull/13585
Feature/add vae path to cli doc#14004 by @baonudesifeizhai in https://github.com/sgl-project/sglang/pull/14355
[CPU] add fused_qkvzba_split_reshape_cat kernel for Qwen3-next by @blzheng in https://github.com/sgl-project/sglang/pull/12330
Single Batch Overlap for MoE Models by @Sulfur6 in https://github.com/sgl-project/sglang/pull/9660
Move custom_ops under layers; move _custom_ops.py → custom_all_reduce_ops.py by @merrymercy in https://github.com/sgl-project/sglang/pull/14326
[model-gateway] add llava model image processor and tests by @slin1237 in https://github.com/sgl-project/sglang/pull/14371

New Contributors

@baonudesifeizhai made their first contribution in https://github.com/sgl-project/sglang/pull/14195
@jinke446 made their first contribution in https://github.com/sgl-project/sglang/pull/14266
@Valentine233 made their first contribution in https://github.com/sgl-project/sglang/pull/12441
@dcampora made their first contribution in https://github.com/sgl-project/sglang/pull/14213
@rauletorresc made their first contribution in https://github.com/sgl-project/sglang/pull/14225
@cherryblo made their first contribution in https://github.com/sgl-project/sglang/pull/14143
@gmixiaojin made their first contribution in https://github.com/sgl-project/sglang/pull/13996
@Brain97 made their first contribution in https://github.com/sgl-project/sglang/pull/14234
@gwarmstrong made their first contribution in https://github.com/sgl-project/sglang/pull/14555
@btw616 made their first contribution in https://github.com/sgl-project/sglang/pull/14412
@momaek made their first contribution in https://github.com/sgl-project/sglang/pull/14573
@Prozac614 made their first contribution in https://github.com/sgl-project/sglang/pull/14606
@MingxuZh made their first contribution in https://github.com/sgl-project/sglang/pull/14687
@wplf made their first contribution in https://github.com/sgl-project/sglang/pull/14830
@trangdough made their first contribution in https://github.com/sgl-project/sglang/pull/14554
@yuchengz816-bot made their first contribution in https://github.com/sgl-project/sglang/pull/14313
@luketong777 made their first contribution in https://github.com/sgl-project/sglang/pull/14877
@Vladimir221 made their first contribution in https://github.com/sgl-project/sglang/pull/12287
@MikukuOvO made their first contribution in https://github.com/sgl-project/sglang/pull/14659
@cynial made their first contribution in https://github.com/sgl-project/sglang/pull/13989
@JamesBrianD made their first contribution in https://github.com/sgl-project/sglang/pull/15056
@thenumberouscode made their first contribution in https://github.com/sgl-project/sglang/pull/13969
@danielafrimi made their first contribution in https://github.com/sgl-project/sglang/pull/15113
@Ratish1 made their first contribution in https://github.com/sgl-project/sglang/pull/15052
@mmdbhs made their first contribution in https://github.com/sgl-project/sglang/pull/13914
@Goalina made their first contribution in https://github.com/sgl-project/sglang/pull/15152
@RuixiangMa made their first contribution in https://github.com/sgl-project/sglang/pull/15017
@XDaoHong made their first contribution in https://github.com/sgl-project/sglang/pull/13410

Full Changelog: https://github.com/sgl-project/sglang/compare/v0.5.6...v0.5.7
