Release Gateway-v0.2.3
๐ SGLang Model Gateway - New Release!
We're excited to announce another powerful update to SGLang Model Gateway with performance improvements and expanded database support!
โจ Headline Features
โก Bucket Mode Routing - 20-30% Performance Boost Introducing our new bucket-based routing algorithm that dramatically improves performance in PD mode. See up to 20-30% improvements in TTFT (Time To First Token) and overall throughput
๐พ PostgreSQL Support for Chat History Management Flexibility in data storage! We now support PostgreSQL alongside OracleDB and in-memory storage for chat history management.
๐ ๏ธ Enhanced Model Tool & Structured Output Support
- MinMax M2 model support!
- Structured model output for OpenAI and gRPC router
- Streaming parsing with Tool Choice in chat completions API
- Tool_choice support for Responses API
- OutputItemDone events with output item array storage for better observability
๐ Stability & Quality Improvements
Multiple bug fixes for model validation, streaming logic, reasoning content indexing, and CI stability enhancements.
๐ง Code Quality Enhancements
Refactored builders for chat and responses, restructured modules for better maintainability, and consolidated error handling.
Try the latest version: pip install sglang-router --upgrade
What's Changed in Gateway
Gateway Changes (45 commits)
- [model-gateway] smg release 0.2.3 (#13312) by @slin1237 in https://github.com/sgl-project/sglang/pull/13312
- [router]Replace requests lib with openai in e2e_response_api (#13293) by @XinyueZhang369 in https://github.com/sgl-project/sglang/pull/13293
- fix outdated router doc (#13255) by @fzyzcjy in https://github.com/sgl-project/sglang/pull/13255
- [router][grpc] Refine docs in minimax_m2 to match other parsers (#13218) by @CatherineSue in https://github.com/sgl-project/sglang/pull/13218
- fix: display served_model_name in /v1/models (#13155) by @Sunhaihua1 in https://github.com/sgl-project/sglang/pull/13155
- [router] minmax-m2 xml tool parser (#13148) by @slin1237 in https://github.com/sgl-project/sglang/pull/13148
- [router] remove worker url requirement (#13172) by @slin1237 in https://github.com/sgl-project/sglang/pull/13172
- [router] Fix Flaky test_circuit_breaker_opens_and_recovers (#13164) by @XinyueZhang369 in https://github.com/sgl-project/sglang/pull/13164
- [router] Add comprehensive validation to Responses API (#13127) by @key4ng in https://github.com/sgl-project/sglang/pull/13127
- bugfix: multi-model routing for /generate api (#12979) by @SYChen123 in https://github.com/sgl-project/sglang/pull/12979
- [router][grpc] Support vllm backend for grpc router (#13120) by @CatherineSue in https://github.com/sgl-project/sglang/pull/13120
- [router] add minmax m2 reasoning parser (#13137) by @slin1237 in https://github.com/sgl-project/sglang/pull/13137
- [router] Support complex assistant and tool messages in /chat/completions (#12860) by @hellodanylo in https://github.com/sgl-project/sglang/pull/12860
- [router] move radix tree to policy crate and addreses some code styles (#13131) by @slin1237 in https://github.com/sgl-project/sglang/pull/13131
- [Router] use call_id instead of id for matching function calls in Responses API for Harmony (#13056) by @zhaowenzi in https://github.com/sgl-project/sglang/pull/13056
- Revert "fix: display served_model_name in /v1/models" (#13093) by @CatherineSue in https://github.com/sgl-project/sglang/pull/13093
- fix: display served_model_name in /v1/models (#13063) by @Sunhaihua1 in https://github.com/sgl-project/sglang/pull/13063
- [router] add postgres databases data connector (#12218) by @lengrongfu in https://github.com/sgl-project/sglang/pull/12218
- [router][ci] Quick Improvement to make CI more stable (#12869) by @key4ng in https://github.com/sgl-project/sglang/pull/12869
- [router][ci] Fix maturin build (#13012) by @key4ng in https://github.com/sgl-project/sglang/pull/13012
- [router] bucket policy (#11719) by @syy-hw in https://github.com/sgl-project/sglang/pull/11719
- [router] Switch MCP tests from DeepWiki to self-hosted Brave search server (#12849) by @key4ng in https://github.com/sgl-project/sglang/pull/12849
- [router][grpc] Move all error logs to their call sites (#12859) by @CatherineSue in https://github.com/sgl-project/sglang/pull/12859
- [router][grpc] Refactor: Add builders for chat and responses (#12852) by @CatherineSue in https://github.com/sgl-project/sglang/pull/12852
- [router] Support structured model output for openai and grpc router (#12431) by @key4ng in https://github.com/sgl-project/sglang/pull/12431
- [router][grpc] Add more mcp test cases to responses api (#12749) by @CatherineSue in https://github.com/sgl-project/sglang/pull/12749
- fix ci (#12760) by @key4ng in https://github.com/sgl-project/sglang/pull/12760
- Add timing metrics for requests (#12646) by @cicirori in https://github.com/sgl-project/sglang/pull/12646
- [router][ci] Disable cache (#12752) by @key4ng in https://github.com/sgl-project/sglang/pull/12752
- [router][grpc] Support mixin tool calls in Responses API (#12736) by @CatherineSue in https://github.com/sgl-project/sglang/pull/12736
- Revert "[router] web_search_preview tool basic implementation" (#12716) by @key4ng in https://github.com/sgl-project/sglang/pull/12716
- [router] add basic ci tests for gpt-oss model support (#12651) by @key4ng in https://github.com/sgl-project/sglang/pull/12651
- [router][quick fix] Add minimal option for reasoning effort in spec (#12711) by @key4ng in https://github.com/sgl-project/sglang/pull/12711
- [router][grpc] Make harmony parser checks recipient first before channel (#12713) by @CatherineSue in https://github.com/sgl-project/sglang/pull/12713
- [router][ci] speed up python binding to 1.5 min (#12673) by @key4ng in https://github.com/sgl-project/sglang/pull/12673
- [router] fix: validate HTTP status codes in health check (#12631) by @wyx-0203 in https://github.com/sgl-project/sglang/pull/12631
- [router][grpc] Support streaming parsing with Tool Choice in chat completions API (#12677) by @CatherineSue in https://github.com/sgl-project/sglang/pull/12677
- [router][grpc] Implement tool_choice support for Responses API (#12668) by @CatherineSue in https://github.com/sgl-project/sglang/pull/12668
- [router][grpc] Emit OutputItemDone event and store output item array (#12656) by @CatherineSue in https://github.com/sgl-project/sglang/pull/12656
- [router][grpc] Fix index issues in reasoning content and missing streaming events (#12650) by @CatherineSue in https://github.com/sgl-project/sglang/pull/12650
- [router][grpc] Fix model validation, tool call check, streaming logic and misc in responses (#12616) by @CatherineSue in https://github.com/sgl-project/sglang/pull/12616
- Support aggregating engine metrics in sgl-router (#11456) by @fzyzcjy in https://github.com/sgl-project/sglang/pull/11456
- [router][grpc] Restructure modules and code clean up (#12598) by @CatherineSue in https://github.com/sgl-project/sglang/pull/12598
- [router][grpc] Consolidate error messages build in error.rs (#12301) by @CatherineSue in https://github.com/sgl-project/sglang/pull/12301
- [ci] install released version router (#12410) by @key4ng in https://github.com/sgl-project/sglang/pull/12410
New Contributors
- @XinyueZhang369 made their first contribution in https://github.com/sgl-project/sglang/commit/2cdde3d46
- @Sunhaihua1 made their first contribution in https://github.com/sgl-project/sglang/commit/a06c44f90
- @zhaowenzi made their first contribution in https://github.com/sgl-project/sglang/commit/7b877ab83
- @cicirori made their first contribution in https://github.com/sgl-project/sglang/commit/58095cb00
- @wyx-0203 made their first contribution in https://github.com/sgl-project/sglang/commit/3651cfbf6
- @syy-hw made their first contribution in https://github.com/sgl-project/sglang/commit/611a4fd08
- @SYChen123 made their first contribution in https://github.com/sgl-project/sglang/commit/4ef439054
- @hellodanylo made their first contribution in https://github.com/sgl-project/sglang/commit/d28caaf60
Paths Included
sgl-routerpython/sglang/srt/grpcpython/sglang/srt/entrypoints/grpc_server.py
Full Changelog: https://github.com/sgl-project/sglang/compare/gateway-v0.2.2...gateway-v0.2.3