We're excited to announce SGLang Model Gateway v0.2.4, a massive release focused on performance, security, and production-ready observability!
✨ Headline Features
⚡ Major Performance Optimizations
We've invested heavily in performance across the entire stack:
Optimized radix tree for cache-aware load balancing: smarter routing decisions with lower overhead
Tokenizer optimization: dramatically reduced CPU and memory footprint during tokenization
Core module optimization: HTTP and gRPC routers now run leaner and faster
Efficient OTEL implementation: production-grade observability with minimal performance impact
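For readers new to cache-aware load balancing, the idea can be sketched in a few lines of Python. This is an illustration only and not the gateway's Rust implementation: `PrefixIndex`, `pick_worker`, and the `min_match_ratio` threshold are hypothetical names, and the tree here is an uncompressed character trie rather than a true radix tree. The sketch assumes that a request goes to the worker that has already seen the longest prefix of the prompt, and falls back to the least-loaded worker when no worker has a useful match.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    children: dict = field(default_factory=dict)  # char -> Node
    workers: set = field(default_factory=set)     # workers that have seen this prefix


class PrefixIndex:
    """Toy character-level prefix tree (hypothetical); a radix tree would compress edges."""

    def __init__(self):
        self.root = Node()

    def insert(self, text, worker):
        # Record that `worker` has processed every prefix of `text`.
        node = self.root
        for ch in text:
            node = node.children.setdefault(ch, Node())
            node.workers.add(worker)

    def match_lengths(self, text):
        # Map each worker to the length of the longest prefix of `text` it has seen.
        lengths = {}
        node = self.root
        for depth, ch in enumerate(text, start=1):
            node = node.children.get(ch)
            if node is None:
                break
            for worker in node.workers:
                lengths[worker] = depth
        return lengths


def pick_worker(index, loads, text, min_match_ratio=0.5):
    # Prefer the longest cached prefix; break ties by lower load.
    lengths = index.match_lengths(text)
    if lengths:
        best = max(lengths, key=lambda w: (lengths[w], -loads[w]))
        if lengths[best] / max(len(text), 1) >= min_match_ratio:
            return best
    # No sufficiently long match: fall back to the least-loaded worker.
    return min(loads, key=loads.get)


index = PrefixIndex()
index.insert("hello world", "w1")
index.insert("help me", "w2")
loads = {"w1": 5, "w2": 1, "w3": 0}
chosen = pick_worker(index, loads, "hello world again")  # long shared prefix with w1
```

The fallback to minimum load mirrors the intent of the cache-aware routing fix listed in this release; the exact thresholds and data structures in the gateway may differ.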
Industry-First WASM Middleware Support
Programmable middleware using WebAssembly! Extend your gateway with safe, isolated plugins. Build custom routing logic, transform requests/responses, or integrate proprietary systems, all without touching core code. Your gateway, your rules.
Production-Grade Observability
Full OpenTelemetry integration with distributed tracing for both HTTP and gRPC. Track requests across your entire inference stack with native trace context propagation. Finally, real visibility into your LLM infrastructure.
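As a rough illustration of what trace context propagation involves, the sketch below handles the W3C Trace Context `traceparent` header, which OpenTelemetry uses by default for HTTP. It is not the gateway's code: `parse_traceparent` and `outgoing_headers` are hypothetical helpers, header names are assumed to be lowercased already, and a real deployment would rely on an OpenTelemetry SDK propagator rather than hand-rolled parsing. The key point is that the trace ID is preserved across the hop while each hop gets its own span ID.

```python
import re
import secrets

# traceparent layout: version-traceid-parentid-flags, all lowercase hex.
_TRACEPARENT = re.compile(r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def parse_traceparent(header):
    """Return (trace_id, parent_id, flags), or None if the header is invalid."""
    match = _TRACEPARENT.fullmatch(header.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # Version "ff" and all-zero IDs are invalid per the W3C specification.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return trace_id, parent_id, flags


def outgoing_headers(incoming):
    """Build the traceparent header to send to the upstream worker."""
    parsed = parse_traceparent(incoming.get("traceparent", ""))
    if parsed is not None:
        trace_id, _, flags = parsed          # continue the caller's trace
    else:
        trace_id, flags = secrets.token_hex(16), "01"  # start a new sampled trace
    span_id = secrets.token_hex(8)           # this hop's own span ID
    return {"traceparent": f"00-{trace_id}-{span_id}-{flags}"}


incoming = {"traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"}
forwarded = outgoing_headers(incoming)
```

For gRPC the same value would typically travel as request metadata instead of an HTTP header.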
⚡ Built for speed. Hardened for security. Ready for production.
Gateway Changes (98 commits)
[model-gateway] release gateway 0.2.4 (#14763) by @slin1237 in https://github.com/sgl-project/sglang/pull/14763
[Perf] Optimize radix tree for cache-aware load balancing (#14758) by @slin1237 in https://github.com/sgl-project/sglang/pull/14758
[SMG] perf: optimize tokenizer for reduced CPU and memory overhead (#14752) by @slin1237 in https://github.com/sgl-project/sglang/pull/14752
[model-gateway] optimize core modules (#14751) by @slin1237 in https://github.com/sgl-project/sglang/pull/14751
Tiny extract select_worker_min_load (#14648) by @fzyzcjy in https://github.com/sgl-project/sglang/pull/14648
[ci][smg] fix docker release ci and add it to pr test (#14683) by @slin1237 in https://github.com/sgl-project/sglang/pull/14683
Tiny support sgl-router http response status code metrics (#14689) by @fzyzcjy in https://github.com/sgl-project/sglang/pull/14689
[SMG]feat: implement TokenGuardBody for managing token return (#14653) by @jimmy-evo in https://github.com/sgl-project/sglang/pull/14653
[model-gateway] add OTEL integration to grpc router (#14671) by @slin1237 in https://github.com/sgl-project/sglang/pull/14671
Fix cache-aware router should pick min load instead of min tenant size (#14650) by @fzyzcjy in https://github.com/sgl-project/sglang/pull/14650
[model-gateway] Optimize memory usage in HTTP router (#14667) by @slin1237 in https://github.com/sgl-project/sglang/pull/14667
[model-gateway] fix WASM arbitrary file read security vuln (#14664) by @slin1237 in https://github.com/sgl-project/sglang/pull/14664
[model-gateway] reduce cpu overhead in grpc router (#14663) by @slin1237 in https://github.com/sgl-project/sglang/pull/14663
[model-gateway] reducing cpu overhead in various of places (#14658) by @slin1237 in https://github.com/sgl-project/sglang/pull/14658
Fix dp-aware incompatible with service-discovery (#14629) by @fzyzcjy in https://github.com/sgl-project/sglang/pull/14629
Super tiny fix unused code in router (#14618) by @fzyzcjy in https://github.com/sgl-project/sglang/pull/14618
[model-gateway] fix WASM unbounded request/response body read vuln (#14612) by @slin1237 in https://github.com/sgl-project/sglang/pull/14612
Super tiny remove unused select_worker_pair (#14609) by @fzyzcjy in https://github.com/sgl-project/sglang/pull/14609
[model-gateway] refactor otel to be more efficient (#14604) by @slin1237 in https://github.com/sgl-project/sglang/pull/14604
Tiny fix missing policy decision recording (#14605) by @fzyzcjy in https://github.com/sgl-project/sglang/pull/14605
[model-gateway] fix WASM memory limit per module (#14600) by @slin1237 in https://github.com/sgl-project/sglang/pull/14600
[model-gateway] reorganize metrics, logging, and otel to its own module (#14590) by @slin1237 in https://github.com/sgl-project/sglang/pull/14590
New Contributors
@tonyluj made their first contribution in https://github.com/sgl-project/sglang/commit/6bad6a365
@tom-jerr made their first contribution in https://github.com/sgl-project/sglang/commit/a95a38078
@RiversJin made their first contribution in https://github.com/sgl-project/sglang/commit/2a5773440
@jimmy-evo made their first contribution in https://github.com/sgl-project/sglang/commit/6f657070e
@dcampora made their first contribution in https://github.com/sgl-project/sglang/commit/842807843
@alisonshao made their first contribution in https://github.com/sgl-project/sglang/commit/cee93a6f2
Full Changelog: https://github.com/sgl-project/sglang/compare/gateway-v0.2.3...gateway-v0.2.4
[model-gateway] Fixed WASM Security Vulnerability - Execution Timeout (#14588) by @slin1237 in https://github.com/sgl-project/sglang/pull/14588
[model-gateway] extra accumulator and tool handler in oai router (#14587) by @slin1237 in https://github.com/sgl-project/sglang/pull/14587
[Bug fix] Add /model_info endpoint to mini_lb (#14535) by @alisonshao in https://github.com/sgl-project/sglang/pull/14535
[model-gateway][tracing]: implement request tracing using OpenTelemetry with trace context propagation (HTTP) (#13897) by @sufeng-buaa in https://github.com/sgl-project/sglang/pull/13897
[model-gateway] fix left over sgl-router names in wasm (#14514) by @slin1237 in https://github.com/sgl-project/sglang/pull/14514
[model-gateway] fix logs in smg workflow (#14513) by @slin1237 in https://github.com/sgl-project/sglang/pull/14513
[model-gateway] fix left over sgl-router names to sgl-model-gateway (#14512) by @slin1237 in https://github.com/sgl-project/sglang/pull/14512
[model-gateway] change sgl-router to sgl-model-gateway (#14312) by @slin1237 in https://github.com/sgl-project/sglang/pull/14312
[model-gateway] Make Tokenizer Builder Aware of Env Vars Like HF_ENDPOINT (#14405) by @xuwenyihust in https://github.com/sgl-project/sglang/pull/14405
Fix removing worker will make it healthy forever in prometheus metrics (#14420) by @fzyzcjy in https://github.com/sgl-project/sglang/pull/14420
[model-gateway] fix server info comment (#14508) by @slin1237 in https://github.com/sgl-project/sglang/pull/14508
[model-gateway] reorganized conversation handler (#14507) by @slin1237 in https://github.com/sgl-project/sglang/pull/14507
[model-gateway] Add WASM support for middleware (#12471) by @tonyluj in https://github.com/sgl-project/sglang/pull/12471
[model-gateway] move conversation to first class routing (#14506) by @slin1237 in https://github.com/sgl-project/sglang/pull/14506
[misc] add model arch and type to server info and use it for harmony (#14456) by @slin1237 in https://github.com/sgl-project/sglang/pull/14456
[model-gateway] grpc to leverage event type (#14450) by @slin1237 in https://github.com/sgl-project/sglang/pull/14450
[model-gateway] add mistral 3 image processor (#14445) by @slin1237 in https://github.com/sgl-project/sglang/pull/14445
[model-gateway] move all responses api event from oai to proto (#14446) by @slin1237 in https://github.com/sgl-project/sglang/pull/14446
[model-gateway] move oai header util to router header util (#14441) by @slin1237 in https://github.com/sgl-project/sglang/pull/14441
[model-gateway] extract conversation out of oai router (#14440) by @slin1237 in https://github.com/sgl-project/sglang/pull/14440
[model-gateway] add llama4 vision image processor (#14438) by @slin1237 in https://github.com/sgl-project/sglang/pull/14438
[model-gateway] introduce request ctx for oai router (#14434) by @slin1237 in https://github.com/sgl-project/sglang/pull/14434
[model-gateway] add phi4 vision image processor (#14430) by @slin1237 in https://github.com/sgl-project/sglang/pull/14430
Add Mistral Large 3 support. (#14213) by @dcampora in https://github.com/sgl-project/sglang/pull/14213
[model-gateway] introduce provider in openai router (#14394) by @slin1237 in https://github.com/sgl-project/sglang/pull/14394
[model-gateway] add phi3 vision image processor (#14381) by @slin1237 in https://github.com/sgl-project/sglang/pull/14381
[model-gateway][doc] Add STDIO Explicitly to Example in README (#14393) by @xuwenyihust in https://github.com/sgl-project/sglang/pull/14393
Fix sgl-router silently parse selector wrongly causing OME fail to discover pods (#14359) by @fzyzcjy in https://github.com/sgl-project/sglang/pull/14359
[model-gateway] add qwen3_vl model image processor (#14377) by @slin1237 in https://github.com/sgl-project/sglang/pull/14377
[model-gateway] use worker crate in openai router (#14330) by @slin1237 in https://github.com/sgl-project/sglang/pull/14330
[model-gateway] add qwen2.5_vl model image processor (#14375) by @slin1237 in https://github.com/sgl-project/sglang/pull/14375
[model-gateway] add qwen2_vl model image processor and tests (#14374) by @slin1237 in https://github.com/sgl-project/sglang/pull/14374
[model-gateway] add llava model image processor and tests (#14371) by @slin1237 in https://github.com/sgl-project/sglang/pull/14371
[model-gateway] add image processor and transformer structure (#14344) by @slin1237 in https://github.com/sgl-project/sglang/pull/14344
[model-gateway] multimodality initialization (#13350) by @slin1237 in https://github.com/sgl-project/sglang/pull/13350
[model-gateway] add workflow for external model providers (#14323) by @slin1237 in https://github.com/sgl-project/sglang/pull/14323
[model-gateway] change rust package name to sgl-model-gateway instead (#14283) by @slin1237 in https://github.com/sgl-project/sglang/pull/14283
[model-gateway] fix version output (#14276) by @slin1237 in https://github.com/sgl-project/sglang/pull/14276
[model-gateway] include smg version command in py binding (#14274) by @slin1237 in https://github.com/sgl-project/sglang/pull/14274
[model-gateway] add audio and moderation in model card (#14263) by @slin1237 in https://github.com/sgl-project/sglang/pull/14263
[model-gateway] Add e2e tests of streaming events and tool choice for response api (#13880) by @XinyueZhang369 in https://github.com/sgl-project/sglang/pull/13880
[model-gateway] Migrate Worker trait to model-aware methods (#14250) by @slin1237 in https://github.com/sgl-project/sglang/pull/14250
[model-gateway] add ModelCard support to WorkerMetadata (#14243) by @slin1237 in https://github.com/sgl-project/sglang/pull/14243
[model-gateway] add ModelCard and ProviderType for model configuration (#14237) by @slin1237 in https://github.com/sgl-project/sglang/pull/14237
[model-gateway] add ModelType bitflags and Endpoint enum for worker (#14230) by @slin1237 in https://github.com/sgl-project/sglang/pull/14230
[model-gateway] fix v1/models response format to be oai compatible (#13693) by @CatherineSue in https://github.com/sgl-project/sglang/pull/13693
[model-gateway] refactor oai router 1/n (#14228) by @slin1237 in https://github.com/sgl-project/sglang/pull/14228
[model-gateway] Avoid logging MCP connection token (#13887) by @xuwenyihust in https://github.com/sgl-project/sglang/pull/13887
[Minor] update docs (#14212) by @merrymercy in https://github.com/sgl-project/sglang/pull/14212
[model-gateway] support VL models in router (#14140) by @ooapex in https://github.com/sgl-project/sglang/pull/14140
Support numactl bind for CPU and memory before process starts (#14156) by @fzyzcjy in https://github.com/sgl-project/sglang/pull/14156
[model-gateway] Add version command support to SMG (#12558) by @tonyluj in https://github.com/sgl-project/sglang/pull/12558
[model-gateway] allow refill rate to be zero (#14030) by @slin1237 in https://github.com/sgl-project/sglang/pull/14030
[model-gateway] Fix flaky test_circuit_breaker_half_open_failure_reopens (#14019) by @XinyueZhang369 in https://github.com/sgl-project/sglang/pull/14019
[model-gateway][doc] Update transport terminology to protocol in README.md (#13872) by @xuwenyihust in https://github.com/sgl-project/sglang/pull/13872
[ci] allow manual label to trigger ci in rust, change ci order (#14016) by @slin1237 in https://github.com/sgl-project/sglang/pull/14016
[model gateway][grpc] Add tojson filter to override minijinja's tojson (#14013) by @CatherineSue in https://github.com/sgl-project/sglang/pull/14013
[model-gateway] fix xpu ci (#14012) by @slin1237 in https://github.com/sgl-project/sglang/pull/14012
[model-gateway] Add PostgreSQL support to binding (#13766) by @xuwenyihust in https://github.com/sgl-project/sglang/pull/13766
[Router bugfix] Fix router_manager selecting the wrong router when enable-igw. (#13572) by @SYChen123 in https://github.com/sgl-project/sglang/pull/13572
[model-gateway] Refactor router e2e responses tests (#13745) by @XinyueZhang369 in https://github.com/sgl-project/sglang/pull/13745
[Fix] Fix uvloop get_event_loop() is not suitable for 0.22.x (#13612) by @tom-jerr in https://github.com/sgl-project/sglang/pull/13612
[misc] Rename minilb install env & remove files & fix lint (#13831) by @hnyls2002 in https://github.com/sgl-project/sglang/pull/13831
[model-gateway] clean up router manager function order (#13776) by @slin1237 in https://github.com/sgl-project/sglang/pull/13776
Fix url: use https://roadmap.sglang.io for roadmap (#13733) by @merrymercy in https://github.com/sgl-project/sglang/pull/13733
[model-gateway] fix gateway cli arg parser to not use = (#13685) by @CatherineSue in https://github.com/sgl-project/sglang/pull/13685
[model-gateway] add both python and rust cli alias (#13678) by @slin1237 in https://github.com/sgl-project/sglang/pull/13678
[router][grpc] Support num_reasoning_tokens in harmony models (#13047) by @CatherineSue in https://github.com/sgl-project/sglang/pull/13047
[model-gateway] use worker startup time out for worker registration (#13473) by @slin1237 in https://github.com/sgl-project/sglang/pull/13473
[model-gateway] Add Gateway Release Tooling (#13420) by @slin1237 in https://github.com/sgl-project/sglang/pull/13420
refactor: replace worker pool with semaphore-based concurrency in jobqueue (#13383) by @RiversJin in https://github.com/sgl-project/sglang/pull/13383
[router] bindings for go (#13384) by @whybeyoung in https://github.com/sgl-project/sglang/pull/13384
[model-gateway] fix SDist step readme path (#13373) by @slin1237 in https://github.com/sgl-project/sglang/pull/13373
[model-gateway] remove grpc feature flag and mark as default (#13330) by @slin1237 in https://github.com/sgl-project/sglang/pull/13330
[router] Fix flaky router e2e tests (#13306) by @XinyueZhang369 in https://github.com/sgl-project/sglang/pull/13306
[model-gateway] move python to binding folder (#13295) by @slin1237 in https://github.com/sgl-project/sglang/pull/13295