mistral.rs
Fast, flexible LLM inference
v0.3.4 - mistral.rs Release Notes
New features
- Qwen2-VL support
- Idefics 3/SmolVLM support
- 🔥 6x prompt performance boost (all benchmarks faster than or comparable to MLX and llama.cpp)!
- 🗂️ More efficient non-PagedAttention KV cache implementation!
- Public tokenization API
Python wheels
The wheels now include support for Windows, Linux, and macOS, on both x86_64 and aarch64.
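As a sketch of how the prebuilt wheels might be installed, the commands below assume the project publishes backend-specific packages on PyPI; the exact package names are an assumption and should be verified against the mistral.rs README before use:

```shell
# Install the mistral.rs Python bindings from a prebuilt wheel.
# Package names below are assumptions; check the project's README
# for the correct name for your platform and accelerator backend.
pip install mistralrs        # plain CPU build (assumed name)
pip install mistralrs-cuda   # NVIDIA GPU build (assumed name)
pip install mistralrs-metal  # Apple Silicon build (assumed name)
```

Since the wheels cover x86_64 and aarch64 on Windows, Linux, and macOS, pip should select the matching binary wheel automatically without requiring a local Rust toolchain.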
MSRV
The minimum supported Rust version is 1.79.0.
What's Changed
- Update Dockerfile by @Reckon-11 in https://github.com/EricLBuehler/mistral.rs/pull/895
- Add the Qwen2-VL model by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/894
- ISQ for mistralrs-bench by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/902
- Use tokenizers v0.20 by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/904
- Fix metal sdpa for v stride by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/905
- Better parsing of the image path by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/906
- Add some Metal kernels for HQQ dequant by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/907
- Handle assistant messages with 'tool_calls' by @Jeadie in https://github.com/EricLBuehler/mistral.rs/pull/824
- Attention-fused softmax for Metal by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/908
- Metal qmatmul mat-mat product (5.4x performance increase) by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/909
- Support --dtype in mistralrs bench by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/911
- Metal: Use mtl resource shared to avoid one copy by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/914
- Preallocated KV cache by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/916
- Fixes for kv cache grow by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/917
- Dont always compile with fp8, bf16 for cuda by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/920
- Expand attnmask on cuda by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/923
- Faster CUDA prompt speeds by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/925
- Paged Attention alibi support by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/926
- Default to SDPA for faster VLlama PP T/s by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/927
- VLlama vision model ISQ support by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/928
- Support fp8 on Metal by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/930
- Bump rustls from 0.23.15 to 0.23.18 by @dependabot in https://github.com/EricLBuehler/mistral.rs/pull/932
- Calculate perplexity of ISQ models by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/931
- Integrate fast MLX kernel for SDPA with long seqlen by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/933
- Always cast image to rgb8 for qwenvl2 by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/936
- Fix etag missing in hf hub by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/934
- Fix some examples for vllama 3.2 by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/937
- Improve memory efficency of vllama by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/938
- Implement the Idefics 3 models (Idefics 3, SmolVLM-Instruct) by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/939
- Expose a public tokenization API by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/940
- Prepare for v0.3.4 by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/942
New Contributors
- @Reckon-11 made their first contribution in https://github.com/EricLBuehler/mistral.rs/pull/895
Full Changelog: https://github.com/EricLBuehler/mistral.rs/compare/v0.3.2...v0.3.4