Highlights
- New CLI: mistralrs-cli
- Prefix Caching: We have implemented Prefix Caching for PagedAttention (#1750). This significantly accelerates multi-turn conversations and RAG workflows by reusing the KV cache for shared prompt prefixes instead of recomputing it.
- Major model expansion: Support for Embedding Gemma, Qwen 3 Embedding, Gemma 3n, GLM-4, Granite Hybrid MoE, GLM-4 MoE, GLM-4 MoE Lite...
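
To illustrate the idea behind the Prefix Caching highlight, here is a minimal sketch of block-level prefix reuse. It is not the mistral.rs implementation from #1750: `PrefixCache`, `chain_hash`, `allocate`, and `BLOCK_SIZE` are hypothetical names chosen for this example, and the sketch assumes a paged KV cache split into fixed-size token blocks with no eviction. Each full block is keyed by a hash chained over all preceding blocks, so a lookup hit means the whole prefix up to that block has been seen before and its KV block can be shared.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Tokens per KV cache block (hypothetical value for this sketch).
const BLOCK_SIZE: usize = 4;

/// Hypothetical cache mapping a chained block hash to a physical block id.
#[derive(Default)]
struct PrefixCache {
    blocks: HashMap<u64, usize>,
    next_block_id: usize,
}

/// Hash a block together with the hash of everything before it, so equal
/// hashes imply an equal prefix (ignoring hash collisions).
fn chain_hash(parent: u64, block: &[u32]) -> u64 {
    let mut hasher = DefaultHasher::new();
    parent.hash(&mut hasher);
    block.hash(&mut hasher);
    hasher.finish()
}

impl PrefixCache {
    /// Assign a block id to every full block of `tokens`.
    /// Returns the block ids and the number of tokens whose KV entries
    /// were reused from earlier prompts. A trailing partial block is
    /// ignored here; a real engine would compute it without caching.
    fn allocate(&mut self, tokens: &[u32]) -> (Vec<usize>, usize) {
        let mut ids = Vec::new();
        let mut reused_tokens = 0;
        let mut parent = 0u64;
        for block in tokens.chunks_exact(BLOCK_SIZE) {
            parent = chain_hash(parent, block);
            if let Some(id) = self.blocks.get(&parent).copied() {
                // Cache hit: share the existing block instead of recomputing.
                ids.push(id);
                reused_tokens += block.len();
            } else {
                // Cache miss: allocate a fresh block and remember its hash.
                let id = self.next_block_id;
                self.next_block_id += 1;
                self.blocks.insert(parent, id);
                ids.push(id);
            }
        }
        (ids, reused_tokens)
    }
}
```

Calling `allocate` on a second prompt that begins with the same tokens as an earlier one returns the earlier block ids for the shared full blocks, and allocates new blocks only from the first block that differs. This mirrors why multi-turn chats and RAG prompts with a common system prompt or document prefix benefit most.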