tpu-mlir
Machine learning compiler based on MLIR for Sophgo TPU.
v1.26
🚀 New Features
Model Support
• Added support for Qwen3VL (now using static ViT)
• Qwen3VL now supports multi-ViT processing
• Enabled Janus model compatibility
• Improved MiniCPM4 model handling
• Enhanced VLM dynamic compilation for ViT models
• Added support for bm1690e platform
• Introduced Lightstereo model support
• Enhanced LoRA support with refined paths
• Added contiguous_halves rope mode support
• Enabled multi-prefill input length support
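The changelog does not describe the contiguous_halves rope mode. As a rough illustration only, the sketch below assumes the name refers to the common "rotate-half" layout, in which element i of a head vector is paired with element i + d/2 (rather than with its interleaved neighbour i + 1). The function name, the `base` default, and the single-vector interface are hypothetical and are not taken from tpu-mlir.

```python
import math


def rope_contiguous_halves(x, position, base=10000.0):
    """Apply rotary position embedding to one head vector.

    Assumption (not from the changelog): "contiguous halves" means that
    x[i] is rotated together with x[i + d/2], where d = len(x).
    """
    d = len(x)
    if d % 2 != 0:
        raise ValueError("head dimension must be even")
    half = d // 2
    out = [0.0] * d
    for i in range(half):
        # Standard RoPE frequency schedule: theta_i = pos * base^(-2i/d).
        theta = position * base ** (-2.0 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[i], x[i + half]
        # 2-D rotation of the pair (a, b) by angle theta.
        out[i] = a * c - b * s
        out[i + half] = b * c + a * s
    return out
```

Because each pair is only rotated, the vector norm is unchanged, and position 0 leaves the input as it was. An interleaved mode would differ only in which two elements form a pair.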
Core Optimizations
• Implemented ppl-based codegen for bm1690 ops
• Added HdimIsBatch mm pattern cases
• Enhanced auto-round support for LLM models
• Improved AddPostprocess with additional information
• Added same_addr parameter in model_deploy
• CV184x now supports LSTM BF16 operations
• Introduced address mode 'in_reuse' for memory optimization
• Enabled 4K alignment for weights and binaries in bmodel files
• Added out_fixed mode in bmodel_checker
• model_tool can now refresh an existing bmodel to 4K alignment
• Enhanced layer group method optimization
• Added dynamic group quantization for bm1684x tpu.mlir
• Improved custom ops param support
• Added CUDA inference support
• Added debugger=5 option for enhanced debugging
• Enhanced MLIR inference with CUDA capability
• LoRA bmodel uses io_alone mode
• Refactored LoRA support for better performance
• Added sample_head dynamic usage optimization
• Unified gen_mlirs and compiles in LLM for consistency
• LLM now supports compressed-tensors mode
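The 4K alignment items above come down to simple offset arithmetic. The following is a minimal sketch of that arithmetic, assuming "4K" means 4096 bytes and that each section (a weight blob or binary) must start on such a boundary. The helper names and the section sizes are hypothetical; this is not the bmodel writer's actual code.

```python
ALIGNMENT = 4096  # assumed meaning of "4K"; must be a power of two


def align_up(offset, alignment=ALIGNMENT):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) & ~(alignment - 1)


def layout_sections(sizes, alignment=ALIGNMENT):
    """Place sections back to back, each starting on an aligned offset.

    Returns (start_offsets, total_size), where total_size is also
    rounded up to the alignment.
    """
    offsets = []
    cursor = 0
    for size in sizes:
        start = align_up(cursor, alignment)
        offsets.append(start)
        cursor = start + size
    return offsets, align_up(cursor, alignment)


if __name__ == "__main__":
    # Three hypothetical sections of 100, 5000 and 4096 bytes.
    print(layout_sections([100, 5000, 4096]))
```

The cost of this layout is padding between sections; the benefit is that every section starts on a page-sized boundary.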
⚡ Performance Improvements
• Optimized FAttention with keep_dims attribute
• Enhanced Qwen3VL with static ViT implementation
• Improved dynamic codegen for UpSample and other ops
• Refined shape pattern search for qtable optimization
• Strengthened Rope operation logic
• Improved IGEV pass handling
• Enhanced profile test regression capabilities
• Optimized layer group method for better performance
• Added more operations to GROUP_SMALL_C
🐛 Bug Fixes
• Fixed BM1688 GELU operation (now uses F32)
• Addressed CV184x GELU BF16 lowering
• Resolved BM1684 conv failure issues
• Fixed Mars3 global depthwise deconv bug
• Corrected CV184x interp input coordinates (now uses FP32/UINT32)
• Fixed TPUReshapeReorderPattern bug
• Fixed BM1684X-8119 issue
• Fixed BM1690e resnet50 e5m2 comparison failure
• Fixed CV186AHDEV-519 reference inference overflow with F16
• Addressed BM1690e backend issues
• Fixed CV184x LSTM int8 bug
• Fixed depthwise_asym_quant_data_split bug
• Resolved RCNN deploy failure issues
• Fixed MMDiT cmodel vs board result comparison failure
• Fixed fattention head slice error
• Addressed multi-deform_attn for broader cases
• Resolved ppl cv184x integration issues
• Fixed cv184x codegen issues
• Corrected bias_correction conv bug
• Fixed ViT models deploy failure in w4a16/w8a16 mode
• Addressed issues in the tensor-mul-vector matmul pass
• Fixed time_fixed_subnet errors
• Corrected postprocess shape issues
• Fixed io address assignment in io_alone mode
• Addressed MLIR-742 by adding a parameter to force quantized input/output to int8/uint8
• Added PixelNorm3 test case for model validation
• Fixed io placement in io_alone mode
• Addressed memory leaks in repeated invoke calls
• Resolved MLIR-693 qtable optimization
• Fixed final qtable bugs
• Corrected tune_num=0 issue in qtable search
• Addressed tpulang no_save mode logging issues
• Fixed inplaceOp liveRange issues
• Resolved dynamic local codegen for ReduceOp
• Fixed afterlayergroup pass no_save errors
• Handled all-zero tensor cases
• Fixed inf match in op name issues
• Added chip support for data types
• Enhanced MLIR-660 support
• Updated backend libraries
🛠️ Infrastructure Updates
• Added test cases for bm1690e ops and models
• Improved manual documentation
• Enhanced backend support for bm1684x/bm1688
• Removed deprecated 8ch interleave code
• Added shape parameter in set_tensor
• Added profile test regression capabilities
• Removed unused sophgo-mq version
• Enhanced bmodel_checker with out_fixed mode
• Added debugger=5 option
• Updated runtime libraries
• Removed sensitive words from codebase
• Added backend updates for compatibility
• Updated release records
• Added quant.drawio documentation
• Improved the manual's calibration documentation
• Fixed various documentation issues
• Added manual updates for new features
• Enhanced profile test regression
• Improved model validation capabilities
• Updated test infrastructure