mistral.rs
Fast, flexible LLM inference
v0.3.4 - mistral.rs Release Notes
New features
- Qwen2-VL support
- Idefics 3/SmolVLM support
- 🔥 6x prompt performance boost (all benchmarks faster than or comparable to MLX and llama.cpp)!
- 🗂️ More efficient non-PagedAttention KV cache implementation!
- Public tokenization API
Python wheels
The wheels now include support for Windows, Linux, and macOS, on both x86_64 and aarch64.
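As a sketch of how the prebuilt wheels might be installed, the commands below assume the project publishes backend-specific packages on PyPI; the exact package names are an assumption and should be verified against the mistral.rs README before use:

```shell
# Install the mistral.rs Python bindings from a prebuilt wheel.
# Package names below are assumptions; check the project's README
# for the correct name for your platform and accelerator backend.
pip install mistralrs        # plain CPU build (assumed name)
pip install mistralrs-cuda   # NVIDIA GPU build (assumed name)
pip install mistralrs-metal  # Apple Silicon build (assumed name)
```

Since the wheels cover x86_64 and aarch64 on Windows, Linux, and macOS, pip should select the matching binary wheel automatically without requiring a local Rust toolchain.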
MSRV
The minimum supported Rust version is 1.79.0.
What's Changed
- Update Dockerfile by @Reckon-11 in https://github.com/EricLBuehler/mistral.rs/pull/895
- Add the Qwen2-VL model by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/894
- ISQ for mistralrs-bench by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/902
- Use tokenizers v0.20 by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/904
- Fix metal sdpa for v stride by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/905
- Better parsing of the image path by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/906
- Add some Metal kernels for HQQ dequant by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/907
- Handle assistant messages with 'tool_calls' by @Jeadie in https://github.com/EricLBuehler/mistral.rs/pull/824
- Attention-fused softmax for Metal by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/908
- Metal qmatmul mat-mat product (5.4x performance increase) by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/909
- Support --dtype in mistralrs bench by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/911
- Metal: Use mtl resource shared to avoid one copy by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/914
- Preallocated KV cache by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/916
- Fixes for kv cache grow by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/917
- Dont always compile with fp8, bf16 for cuda by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/920
- Expand attnmask on cuda by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/923
- Faster CUDA prompt speeds by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/925
- Paged Attention alibi support by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/926
- Default to SDPA for faster VLlama PP T/s by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/927
- VLlama vision model ISQ support by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/928
- Support fp8 on Metal by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/930
- Bump rustls from 0.23.15 to 0.23.18 by @dependabot in https://github.com/EricLBuehler/mistral.rs/pull/932
- Calculate perplexity of ISQ models by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/931
- Integrate fast MLX kernel for SDPA with long seqlen by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/933
- Always cast image to rgb8 for qwenvl2 by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/936
- Fix etag missing in hf hub by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/934
- Fix some examples for vllama 3.2 by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/937
- Improve memory efficency of vllama by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/938
- Implement the Idefics 3 models (Idefics 3, SmolVLM-Instruct) by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/939
- Expose a public tokenization API by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/940
- Prepare for v0.3.4 by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/942
New Contributors
- @Reckon-11 made their first contribution in https://github.com/EricLBuehler/mistral.rs/pull/895
Full Changelog: https://github.com/EricLBuehler/mistral.rs/compare/v0.3.2...v0.3.4