FastChat v0.2.36
Breaking Changes
- Gradio server migrated to OpenAI v1 API client
Features
- SGLang worker for vision language models — lower latency, higher throughput
- Vision language WebUI with Gradio support
- OpenAI-compatible API server now accepts image input
- LightLLM worker integration for higher throughput
- Apple MLX worker with async support
- Yuan 2.0 model support
- OpenAI embedding support for topic clustering
- Training with custom templates (fixes tokenization mismatch)
- Google Colab Free Tier REST API enablement
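The image-input feature above follows the OpenAI multimodal message shape, where user content is a list of typed parts. A minimal sketch of building such a message (the helper name is hypothetical; the shape assumes the standard OpenAI v1 chat-completions format):

```python
import base64

def build_image_message(text: str, image_bytes: bytes, mime: str = "image/png") -> dict:
    """Build an OpenAI-style multimodal user message with an inline base64 image."""
    b64 = base64.b64encode(image_bytes).decode()
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url",
             "image_url": {"url": f"data:{mime};base64,{b64}"}},
        ],
    }

# Such a message would go into the `messages` list of a chat.completions request.
msg = build_image_message("What is in this image?", b"<raw png bytes>")
```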
Fixes
- Fixed ModelScope local path resolution
- Fixed BGE embedding pooling method
- Fixed vllm worker tokenizer configuration
- Fixed tokenization inconsistencies with Llama tokenizer
- Fixed handling of request content passed as a plain string
- Removed duplicate API endpoints
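The string-content fix concerns OpenAI-style messages, where `content` may be either a plain string or a list of typed parts; code that assumes the list form breaks on the string form. A hedged sketch of the normalization (function name hypothetical):

```python
def normalize_content(content):
    """Coerce message content to the list-of-parts form.

    The OpenAI chat API accepts content as either a plain string or a
    list of typed parts; downstream code that indexes into a list would
    fail on the string form, which is the class of bug fixed here.
    """
    if isinstance(content, str):
        return [{"type": "text", "text": content}]
    return content

print(normalize_content("hello"))  # -> [{'type': 'text', 'text': 'hello'}]
```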
Improvements
- Upgraded Gradio to 4.17
- Changed default tiktoken encoding to `cl100k_base`
- MLX worker updated to the new `generate_step` function signature
- WebUI copy button added
- Corrected type hints for `play_a_match_single`
Highlights
- Added SGLang worker for vision language models, with lower latency and higher throughput https://github.com/lm-sys/FastChat/pull/2928
- Vision language WebUI https://github.com/lm-sys/FastChat/pull/2960
- OpenAI-compatible API server now supports image input https://github.com/lm-sys/FastChat/pull/2928
- Added LightLLM worker for higher throughput https://github.com/lm-sys/FastChat/blob/main/docs/lightllm_integration.md
- Added Apple MLX worker https://github.com/lm-sys/FastChat/pull/2940
What's Changed
- fix specify local path issue use model from www.modelscope.cn by @liuyhwangyh in https://github.com/lm-sys/FastChat/pull/2934
- support openai embedding for topic clustering by @CodingWithTim in https://github.com/lm-sys/FastChat/pull/2729
- Remove duplicate API endpoint by @surak in https://github.com/lm-sys/FastChat/pull/2949
- Update Hermes Mixtral by @teknium1 in https://github.com/lm-sys/FastChat/pull/2938
- Enablement of REST API Usage within Google Colab Free Tier by @ggcr in https://github.com/lm-sys/FastChat/pull/2940
- Create a new worker implementation for Apple MLX by @aliasaria in https://github.com/lm-sys/FastChat/pull/2937
- feat: support Model Yuan2.0, a new generation Fundamental Large Language Model developed by IEIT System by @cauwulixuan in https://github.com/lm-sys/FastChat/pull/2936
- Fix the pooling method of BGE embedding model by @staoxiao in https://github.com/lm-sys/FastChat/pull/2926
- SGLang Worker by @BabyChouSr in https://github.com/lm-sys/FastChat/pull/2928
- Update mlx_worker to be async by @aliasaria in https://github.com/lm-sys/FastChat/pull/2958
- Integrate LightLLM into serve worker by @zeyugao in https://github.com/lm-sys/FastChat/pull/2888
- Copy button by @surak in https://github.com/lm-sys/FastChat/pull/2963
- feat: train with template by @congchan in https://github.com/lm-sys/FastChat/pull/2951
- fix content maybe a str by @zhouzaida in https://github.com/lm-sys/FastChat/pull/2968
- Adding download folder information in README by @dheeraj-326 in https://github.com/lm-sys/FastChat/pull/2972
- use cl100k_base as the default tiktoken encoding by @bjwswang in https://github.com/lm-sys/FastChat/pull/2974
- Update README.md by @merrymercy in https://github.com/lm-sys/FastChat/pull/2975
- Fix tokenizer for vllm worker by @Michaelvll in https://github.com/lm-sys/FastChat/pull/2984
- update yuan2.0 generation by @wangpengfei1013 in https://github.com/lm-sys/FastChat/pull/2989
- fix: tokenization mismatch when training with different templates by @congchan in https://github.com/lm-sys/FastChat/pull/2996
- fix: inconsistent tokenization by llama tokenizer by @congchan in https://github.com/lm-sys/FastChat/pull/3006
- Fix type hint for play_a_match_single by @MonkeyLeeT in https://github.com/lm-sys/FastChat/pull/3008
- code update by @infwinston in https://github.com/lm-sys/FastChat/pull/2997
- Update model_support.md by @infwinston in https://github.com/lm-sys/FastChat/pull/3016
- Update lightllm_integration.md by @eltociear in https://github.com/lm-sys/FastChat/pull/3014
- Upgrade gradio to 4.17 by @infwinston in https://github.com/lm-sys/FastChat/pull/3027
- Update MLX integration to use new generate_step function signature by @aliasaria in https://github.com/lm-sys/FastChat/pull/3021
- Update readme by @merrymercy in https://github.com/lm-sys/FastChat/pull/3028
- Update gradio version in `pyproject.toml` and fix a bug by @merrymercy in https://github.com/lm-sys/FastChat/pull/3029
- Update gradio demo and API model providers by @merrymercy in https://github.com/lm-sys/FastChat/pull/3030
- Gradio Web Server for Multimodal Models by @BabyChouSr in https://github.com/lm-sys/FastChat/pull/2960
- Migrate the gradio server to openai v1 by @merrymercy in https://github.com/lm-sys/FastChat/pull/3032
- Update version to 0.2.36 by @merrymercy in https://github.com/lm-sys/FastChat/pull/3033
New Contributors
- @teknium1 made their first contribution in https://github.com/lm-sys/FastChat/pull/2938
- @ggcr made their first contribution in https://github.com/lm-sys/FastChat/pull/2940
- @aliasaria made their first contribution in https://github.com/lm-sys/FastChat/pull/2937
- @cauwulixuan made their first contribution in https://github.com/lm-sys/FastChat/pull/2936
- @staoxiao made their first contribution in https://github.com/lm-sys/FastChat/pull/2926
- @zhouzaida made their first contribution in https://github.com/lm-sys/FastChat/pull/2968
- @dheeraj-326 made their first contribution in https://github.com/lm-sys/FastChat/pull/2972
- @bjwswang made their first contribution in https://github.com/lm-sys/FastChat/pull/2974
- @MonkeyLeeT made their first contribution in https://github.com/lm-sys/FastChat/pull/3008
Full Changelog: https://github.com/lm-sys/FastChat/compare/v0.2.35...v0.2.36