v0.6.0

Dockerfiles (CUDA, CPU): https://github.com/EricLBuehler/mistral.rs/pkgs/container/mistral.rs
PyPI packages (no features, cuda, mkl, metal, accelerate)

🔥 Highlights from v0.6.0

🚀 Major Features

Llama 4 support and Qwen 3 / MoE / VL models, including DeepSeek and DeepCoder integrations
Multimodal prefix caching, paged attention scheduler improvements, and faster Metal/CUDA backends
Web chat app with chat history, file uploads, speech generation, and revamped tool-calling/search
Fast sampler and CPU FlashAttention with improved performance and accuracy
Metal and CUDA: major improvements in quantization (AFQ, ISQ), UQFF handling, and memory optimizations
MCP (Model Context Protocol): new server endpoints, docs, and integrated client
Vision and audio expansion: support for SigLIP, Dia 1.6B TTS, conformer backbone (Phi-4MM), auto loaders, and vision tool prefixes

🧠 Inference Optimizations

Lightning-fast AFQ on CPU, optimized Qwen 3 MoE on Metal, and paged attention fixes
Unified FlashAttention backend and automatic method selection for ISQ
Metal precompilation support and reduced autorelease thrashing

🧰 Dev Improvements

Refactored engine architecture, KV cache, attention backends, and device mapping logic
Centralized dependency management and cleaner internal abstractions
Streamlined and faster LoRA support

🎉 Other

Revamped README, AGENTS.md, and new benchmarking scripts
Interactive mode now shows throughput, supports Gumbel sampling, and offers better runtime sampling controls
Expanded quant and GGUF support: AWQ, Qwen3 GGUF, and prequantized MLX compatibility

⸻

What's Changed

Fix handling of Metal fused attn head dims by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/1234
Support paged attn for vision model rust api by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/1235
[Breaking] Support setting HF cache path by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/1237
Support tool calling for DeepSeek models by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/1239
Server image processing refactor by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/1244
Optimized CUDA RoPE kernels by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/1247
Typo fix (add_speial_tokens to add_special_tokens) by @edwko in https://github.com/EricLBuehler/mistral.rs/pull/1246
Fixes for UQFF + distributed layers by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/1250
Automatic agentic search integration (web_search_options) by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/1243
Format kernels by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/1251
Add quantize guards for UQFF deserialize by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/1252
Refactor cuBLASlt-related code by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/1253
Update deps, bump pyo3 version by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/1259
Faster cuda FP8 performance by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/1257
Rust 1.86 clippy by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/1260
Refactor engine arch by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/1262
Revamped LoRA support - removing the Ordering system by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/1263
Fast Metal-specific quantization method: AFQ by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/1264
Support prequantized models from MLX by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/1265
Automatic ISQ to select fastest & most accurate method by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/1266
Improved usage metrics by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/1267
Bump tokio from 1.44.1 to 1.44.2 by @dependabot in https://github.com/EricLBuehler/mistral.rs/pull/1270
Gather MM ops in mistralrs-quant by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/1272
Improve performance of deepseek models by @guoqingbao in https://github.com/EricLBuehler/mistral.rs/pull/1274
Implement Llama 4 by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/1268

New Contributors

@edwko made their first contribution in https://github.com/EricLBuehler/mistral.rs/pull/1246
@beeender made their first contribution in https://github.com/EricLBuehler/mistral.rs/pull/1306
@Slowki made their first contribution in https://github.com/EricLBuehler/mistral.rs/pull/1314
@omahs made their first contribution in https://github.com/EricLBuehler/mistral.rs/pull/1329
@szepeviktor made their first contribution in https://github.com/EricLBuehler/mistral.rs/pull/1331
@matthewhaynesonline made their first contribution in https://github.com/EricLBuehler/mistral.rs/pull/1353
@sempervictus made their first contribution in https://github.com/EricLBuehler/mistral.rs/pull/1419

Full Changelog: https://github.com/EricLBuehler/mistral.rs/compare/v0.5.0...v0.6.0
