mistral.rs
Fast, flexible LLM inference
v0.4.0
New features
- 🔥 New models!
- DeepSeek V2
- DeepSeek V3 and R1
- MiniCPM-o 2.6
- 🧮 Imatrix quantization
- ⚙️ Automatic device mapping
- BNB quantization
- Support blockwise FP8 dequantization and FP8 on Metal
- Integrate the llguidance library (@mmoskal)
- Metal PagedAttention
- Many fixes and improvements from contributors!
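The blockwise FP8 support listed above follows the usual blockwise-quantization scheme: weights are stored in low precision with one scale per block, and dequantization multiplies each value by its block's scale. A minimal sketch of that scheme in Rust (the function name and signature are illustrative, not mistral.rs's kernel API):

```rust
/// Dequantize a blockwise-quantized tensor: each block of `block_size`
/// values shares one scale. Illustrative only; real FP8 kernels operate
/// on packed 8-bit floats, not `i8`.
fn dequantize_blockwise(q: &[i8], scales: &[f32], block_size: usize) -> Vec<f32> {
    q.iter()
        .enumerate()
        .map(|(i, &v)| f32::from(v) * scales[i / block_size])
        .collect()
}
```

With `block_size = 2`, values `[1, 2, 3, 4]` and scales `[0.5, 2.0]` dequantize to `[0.5, 1.0, 6.0, 8.0]`.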
Breaking changes
- The Rust device mapping API has changed.
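The API change concerns how model layers are assigned to devices. As a rough illustration of what an automatic device mapper computes, here is a proportional split of layers by available device memory; this is a hypothetical sketch, not the new mistral.rs API (the real mapper also accounts for activations, KV cache, and quantization):

```rust
/// Assign `n_layers` transformer layers to devices in proportion to each
/// device's available memory. Purely illustrative.
fn map_layers(n_layers: usize, device_mem: &[u64]) -> Vec<usize> {
    let total: u64 = device_mem.iter().sum();
    let mut assignment = Vec::with_capacity(n_layers);
    for (device, &mem) in device_mem.iter().enumerate() {
        // This device's proportional share of the layers.
        let share = ((mem as f64 / total as f64) * n_layers as f64).round() as usize;
        for _ in 0..share {
            if assignment.len() < n_layers {
                assignment.push(device);
            }
        }
    }
    // Rounding can leave a few layers unassigned; place them on the last device.
    while assignment.len() < n_layers {
        assignment.push(device_mem.len() - 1);
    }
    assignment
}
```

For example, 10 layers across two equally sized GPUs yields five layers per device.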
MSRV
The MSRV of this release is 1.83.0.
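To build against a known-compatible compiler, a project consuming this release can pin the toolchain with a standard `rust-toolchain.toml` (this file is a general rustup convention, not something the release itself mandates):

```toml
[toolchain]
channel = "1.83.0"
```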
What's Changed
- Use CUDA_COMPUTE_CAP if nvidia-smi not found by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/944
- fix(docs): fix broken link by @sammcj in https://github.com/EricLBuehler/mistral.rs/pull/945
- Better diffusion interactive mode by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/948
- Implement Imatrix for ISQ by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/949
- Support imatrix quantization for vision models by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/950
- Perplexity calculations with imatrix by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/952
- set minimum rustc version to 1.82 by @mmoskal in https://github.com/EricLBuehler/mistral.rs/pull/957
- Fix append_sliding_window by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/958
- Fix completion api behavior of best_of by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/959
- Ensure support for cuda cc 5.3 by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/960
- Improve test speeds on Windows by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/961
- use llguidance library for constraints (including json schemas) by @mmoskal in https://github.com/EricLBuehler/mistral.rs/pull/899
- Fix metal fp8 quantization by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/962
- Fix example gguf_locally to match chat template requirements by @msk in https://github.com/EricLBuehler/mistral.rs/pull/966
- Bitsandbytes quantization: loading and kernels by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/967
- updated the tokenizers dependency of core to 0.21 by @vkomenda in https://github.com/EricLBuehler/mistral.rs/pull/975
- Remove outdated binaries mention in the readme by @BafS in https://github.com/EricLBuehler/mistral.rs/pull/973
- Improve error handling by @cdoko in https://github.com/EricLBuehler/mistral.rs/pull/974
- Add None check to prevent panic in evict_all_to_cpu in prefix_cacher.rs by @cdoko in https://github.com/EricLBuehler/mistral.rs/pull/979
- Include start offset for metal bitwise ops by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/978
- Fail fast on TcpListener bind errors by @cdoko in https://github.com/EricLBuehler/mistral.rs/pull/982
- Inplace softmax long-seqlen attention optimizations by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/984
- Fix cuda cublaslt when using vllama mask by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/985
- Add cross attn quantization for mllama by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/987
- fix mistralrs-server ignoring interactive_mode arg by @haricot in https://github.com/EricLBuehler/mistral.rs/pull/990
- Adding streaming function to mistralrs server. by @Narsil in https://github.com/EricLBuehler/mistral.rs/pull/986
- Fixes for bnb and more apis in mistralrs-quant by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/972
- Support send + sync in loader by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/991
- More vllama optimizations by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/992
- Update docs by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/993
- Use metal autorelease to optimize memory usage by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/996
- Partial Fix for Sliding Window Attention by @cdoko in https://github.com/EricLBuehler/mistral.rs/pull/994
- Only dep on objc when building on metal by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/998
- Prefix cacher v2 by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/1000
- Add --cpu flag to mistralrs-server by @cdoko in https://github.com/EricLBuehler/mistral.rs/pull/997
- Metal PagedAttention support by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/1001
- Fix cross attention + prefix cacher v2 support by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/1006
- Support for normal cache for mllama, phi3v, qwen2vl by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/1007
- Cleaner creation of dummy pa input metadata by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/1014
- Support BF16 kvcache, rope and attentions for inference of GGUF/GGML models by @guoqingbao in https://github.com/EricLBuehler/mistral.rs/pull/1009
- Support device mapping for Paged Attention by @cdoko in https://github.com/EricLBuehler/mistral.rs/pull/1011
- Prefix cacher fixes by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/1018
- More fixes for the prefix cacher by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/1019
- Support uqff for idefics3 by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/1020
- Prepare for v0.3.5 by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/1021
- Cleaner pipeline no prefix cache setting by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/1022
- Support uqff load/save for idefics3 by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/1023
- Update license for 2025 by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/1024
- Implement DeepSeekV2 by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/1010
- Use cudarc fork to fix CUDA build on Windows by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/1032
- Fix metal paged attn phi3 by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/1033
- Use float8 mistralrs_cudarc_fork feature by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/1034
- Patch prefix caching to fix incorrect outputs by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/1035
- Allocate paged attn cache as empty instead of zeros by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/1036
- Remove ug and cudarc transient dep by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/1037
- Rename MemoryGpuConfig::Amount->MbAmount by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/1038
- CUDA dequant kernels conditional compilation by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/1039
- F16 support for mllama, introduce FloatInfo by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/1041
- Automatic device mapping support by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/1042
- Support automatic device mapping for gguf models by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/1044
- Support loading models without ISQ using device map by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/1045
- Fix GGUF auto device mapping by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/1047
- More efficient loading of safetensors when casting by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/1048
- Fix Loading and Running on CPU by @cdoko in https://github.com/EricLBuehler/mistral.rs/pull/1052
- Work on better device mapping for mllama by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/1049
- Mention interactive mode or server port in readme for gguf by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/1055
- Fix panic in mistralrs-server by @cdoko in https://github.com/EricLBuehler/mistral.rs/pull/981
- Include device memory avail in device map err by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/1060
- Fix --cpu on cuda by @cdoko in https://github.com/EricLBuehler/mistral.rs/pull/1056
- Improve pagedattn support in mistralrs bench by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/1063
- Paged attention support for multi gpu by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/1059
- Ergonomic automatic device mapping support by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/1054
- Examples for automatic device mapping by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/1065
- Fix metal pagedattn half8 vec impl by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/1067
- Improve support for GGUF auto device map by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/1069
- Fix missing field in idefics3 during loading by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/1070
- Fix missing field in idefics3 during loading by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/1072
- Fix paged attention for vision models on multiple devices by @cdoko in https://github.com/EricLBuehler/mistral.rs/pull/1071
- Fixes for idefics3 and idefics2 by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/1073
- Improve automatic device map by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/1076
- Implement the DeepSeekV3 model (support full DeepSeek R1) by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/1077
- Don't print GGUF model metadata when silent=true by @Jeadie in https://github.com/EricLBuehler/mistral.rs/pull/1079
- Allow ChatCompletionChunkResponse (and therefore streaming) to have Usage. by @Jeadie in https://github.com/EricLBuehler/mistral.rs/pull/1078
- Support loading blockwise quantized fp8 by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/1080
- Implement MiniCpm-O 2.6 by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/1074
- Bump version to v0.4.0 by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/1081
New Contributors
- @sammcj made their first contribution in https://github.com/EricLBuehler/mistral.rs/pull/945
- @mmoskal made their first contribution in https://github.com/EricLBuehler/mistral.rs/pull/957
- @vkomenda made their first contribution in https://github.com/EricLBuehler/mistral.rs/pull/975
- @BafS made their first contribution in https://github.com/EricLBuehler/mistral.rs/pull/973
- @cdoko made their first contribution in https://github.com/EricLBuehler/mistral.rs/pull/974
- @Narsil made their first contribution in https://github.com/EricLBuehler/mistral.rs/pull/986
Full Changelog: https://github.com/EricLBuehler/mistral.rs/compare/v0.3.4...v0.4.0