MNN is a blazing-fast, lightweight deep learning framework, battle-tested by business-critical use cases at Alibaba. Full multimodal LLM Android app: [MNN-LLM-Android](./apps/Android/MnnLlmChat/README.md). MNN TaoAvatar Android (local 3D avatar intelligence): apps/Android/Mnn3dAvatar/README.md
3.2.5 - MNN Release Notes | AnnounceHQ
MNN 3.2.5 Release Note
Core Feature Updates
1. Added Support for HQQ Quantization Algorithm
Integrated the HQQ (Half-Quadratic Quantization) algorithm into the MNNConvert tool; it is enabled via the --hqq parameter
HQQ uses asymmetric quantization, which significantly improves the accuracy of quantized models
HQQ can be combined with block-wise quantization to further improve model accuracy
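The two properties above, asymmetric and block-wise quantization, can be illustrated with a minimal NumPy sketch. This is the general technique only, not MNN's actual HQQ implementation (HQQ additionally optimizes the zero-points to minimize reconstruction error, which is omitted here); all names are illustrative.

```python
import numpy as np

def quantize_block(w, bits=4):
    # Asymmetric quantization: a per-block scale and zero-point let the
    # integer grid cover [w.min(), w.max()] exactly, instead of forcing
    # a symmetric range around zero.
    qmax = (1 << bits) - 1
    lo, hi = float(w.min()), float(w.max())
    scale = (hi - lo) / qmax if hi > lo else 1.0
    q = np.clip(np.round((w - lo) / scale), 0, qmax)
    return q, scale, lo

def dequantize_block(q, scale, zero):
    return q * scale + zero

# Block-wise: each group of 64 weights is quantized independently, so
# an outlier in one block does not inflate the error of all the others.
rng = np.random.default_rng(0)
w = rng.normal(size=256).astype(np.float32)
blocks = w.reshape(-1, 64)
recon = np.concatenate([dequantize_block(*quantize_block(b)) for b in blocks])
err = float(np.abs(recon - w).max())
```

The maximum per-weight error is bounded by half the block's scale, which is why smaller blocks (and zero-point optimization, as in HQQ) tighten accuracy.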
2. Added Support for EAGLE-3 Speculative Decoding Algorithm
Implemented the EAGLE-3 speculative decoding algorithm to improve large language model inference efficiency
Implemented the EagleGeneration class, which supports speculative decoding based on a draft model
Provided an Eagle model export tool that exports three components: eagle, eagle_fc, and eagle_d2t
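The core of any speculative decoding scheme is the accept/verify loop: a cheap draft model proposes several tokens, the target model checks them in a single forward pass, and generation keeps the longest agreeing prefix plus one target-chosen token. A toy greedy-acceptance sketch follows; EAGLE-3's distinctive part, feeding target-model hidden states into the draft head, is not shown, and none of these names reflect MNN's EagleGeneration API.

```python
def greedy_verify(draft_tokens, target_tokens):
    """Greedy acceptance rule: keep draft tokens while they match the
    target model's argmax; on the first mismatch, substitute the
    target's token and stop. If everything matches, take one bonus
    token from the target (it ran one position further)."""
    accepted = []
    for d, t in zip(draft_tokens, target_tokens):
        if d == t:
            accepted.append(d)
        else:
            accepted.append(t)  # target overrides, speculation ends
            return accepted
    return accepted + [target_tokens[len(draft_tokens)]]

# Draft proposes 4 tokens; target disagrees at position 2:
print(greedy_verify([5, 9, 2, 7], [5, 9, 3, 1, 8]))  # [5, 9, 3]
```

Each verify pass therefore emits between 1 and len(draft)+1 tokens for one target-model forward pass, which is where the inference speedup comes from.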
3. Enhanced Support for Qwen Series Models
Fixed and optimized inference issues with the Qwen3-Embedding model
Added support for the Qwen3-VL multimodal large model
Improved the llmexport tool's export support for Qwen-series models
Detailed Changes
Model Inference Optimization
Refactored the LLM model-loading logic, adding more robust error handling in the Llm::load() method
Optimized the KV cache manager implementation, improving memory-management efficiency during inference
Improved the attention-mechanism implementation in the Metal backend
Optimized convolution execution efficiency in the OpenCL backend
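The KV cache change above is internal to MNN, but the bookkeeping such a manager performs can be sketched in a few lines: per-layer key/value tensors grow as tokens are decoded, and a capacity limit bounds memory by evicting old entries. This is a toy illustration only; MNN's manager also handles quantized storage and backend-specific memory, and the class below is not part of its API.

```python
import numpy as np

class KVCache:
    """Toy per-layer key/value cache with a hard capacity limit."""
    def __init__(self, head_dim, max_len):
        self.max_len = max_len
        self.k = np.empty((0, head_dim), dtype=np.float32)
        self.v = np.empty((0, head_dim), dtype=np.float32)

    def append(self, k_new, v_new):
        # Append this step's key/value rows, then enforce the cap with
        # the simplest eviction policy: drop the oldest entries.
        self.k = np.concatenate([self.k, k_new])
        self.v = np.concatenate([self.v, v_new])
        if len(self.k) > self.max_len:
            self.k = self.k[-self.max_len:]
            self.v = self.v[-self.max_len:]

cache = KVCache(head_dim=8, max_len=16)
for _ in range(20):
    cache.append(np.ones((1, 8), np.float32), np.ones((1, 8), np.float32))
print(cache.k.shape)  # (16, 8)
```

Real implementations pre-allocate in chunks rather than reallocating on every step; the point here is only the append/evict contract the manager exposes to attention.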
Quantization Tool Improvements
Integrated the HQQ quantizer into WeightQuantAndCoding.cpp for more precise weight quantization
Improved the quantization-parameter configuration logic so that asymmetric quantization is selected automatically when HQQ is enabled
Fixed bugs in the quantization process, improving quantization stability
Model Export Enhancements
Improved error handling and log output in the llmexport tool
Streamlined the model export process to improve export stability
Revised the compression-tool documentation, adding HQQ quantization usage instructions