onnxruntime

## 📢 Announcements & Breaking Changes

### Platform Support Changes

- **Python 3.10 wheels are no longer published.** Please upgrade to Python 3.11+.
- **Python 3.14 support added**
- **Free-threaded Python (PEP 703):** Added support for Python 3.13t and 3.14t on Linux ([#26786](https://github.com/microsoft/onnxruntime/pull/26786))
- **x86_64 binaries for macOS/iOS are no longer provided, and the minimum macOS version is raised to 14.0**

### API Version

- **ORT_API_VERSION** updated to **24** ([#26418](https://github.com/microsoft/onnxruntime/pull/26418))

---

## ✨ New Features

### 🤖 Execution Provider (EP) Plugin API

A major infrastructure enhancement enabling plugin-based EPs with dynamic loading:

- Initial kernel-based EP support ([#26206](https://github.com/microsoft/onnxruntime/pull/26206))
- Weight pre-packing support for plugin EPs ([#26754](https://github.com/microsoft/onnxruntime/pull/26754))
- EP Context model support ([#25124](https://github.com/microsoft/onnxruntime/pull/25124))
- Control flow kernel APIs ([#26927](https://github.com/microsoft/onnxruntime/pull/26927))
- `OrtKernelInfo` APIs for kernel-based plugin EPs ([#26803](https://github.com/microsoft/onnxruntime/pull/26803))

### 🔧 Core APIs

- **`OrtApi::CreateEnvWithOptions()`** and **`OrtEpApi::GetEnvConfigEntries()`** ([#26971](https://github.com/microsoft/onnxruntime/pull/26971))
- **EP Device Compatibility APIs** ([#26922](https://github.com/microsoft/onnxruntime/pull/26922))
- **External Resource Importer API** for D3D12 shared resources ([#26828](https://github.com/microsoft/onnxruntime/pull/26828))
- Session config access from `KernelInfo` ([#26589](https://github.com/microsoft/onnxruntime/pull/26589))

### 📊 Dependencies & Integration

- **ONNX upgraded to 1.20.1** ([#26579](https://github.com/microsoft/onnxruntime/pull/26579))
- **Protobuf updated** from 3.20.3 → **4.25.8** ([#26910](https://github.com/microsoft/onnxruntime/pull/26910))
- **CUDA Graph enabled by default** ([#26929](https://github.com/microsoft/onnxruntime/pull/26929))

---

## 🖥️ Execution Provider Updates

### NVIDIA

- **CUDA EP:** Flash Attention updates, GQA kernel fusion, BF16 support for MoE/qMoE/MatMulNBits, CUDA 13.0 support
- **TensorRT EP:** Upgraded to TensorRT 10.14, automatic plugin loading, NVFP4 custom ops
- **TensorRT RTX EP:** RTX runtime caching, CUDA graph support, BFloat16, memory-mapped engines

### Qualcomm QNN EP

- QNN SDK upgraded to **2.42.0** with new ops (RMSNorm, ScatterElements, GatherND, STFT, RandomUniformLike)
- Gelu pattern fusion, LPBQ quantization support, ARM64 wheel builds, v81 device support

### Intel & AMD

- **OpenVINO EP:** Upgraded to 2025.4.1
- **VitisAI EP:** External EP loader, compiled model compatibility API
- **MIGraphX EP:** QuickGelu, multihead attention, QLinear pooling ops

### ArmNN EP

Arm is formally deprecating the Arm NN Execution Provider (EP) in ONNX Runtime. The Arm NN EP is still experimental and depends on technology that is no longer actively maintained. Keeping it available now only adds complexity and potential confusion for users.

What to expect:

- Effective immediately, the Arm NN EP is deprecated and will no longer be maintained.
- All build options, documentation, and examples referencing ArmNN will be removed once the upstream change merges; the removal will appear in the first ONNX Runtime release that includes that change. We will confirm the release number as soon as it is known.
- Builds that still rely on Arm NN-specific options (for example `--use_armnn`) will fail after the change lands, so please adjust configurations in advance.

---

## 🌐 Web & JavaScript

- **WebGPU EP:** Flash Attention optimizations, graph capture, Split-K MatMul, qMoE support, WGSL templates
- **WebNN EP:** GQA local attention, GatherBlockQuantized, ConvInteger/MatMulInteger
- **Node.js/React Native:** Node.js v22, JSI for React Native, JSPI build support

---

## 🧠 CPU Improvements

- **KleidiAI:** SME1/SME2 Convolution and SGemm kernels, FP32 Gemv, Windows/Arm support
- **New ops:** MoE/qMoE kernels, RotaryEmbeddings opset 23, LayerNorm/RMSNorm broadcasting
- **Platform support:** S390x SIMD, LoongArch64 4-bit quantization, FP16 inference improvements
- **ARM NCHWc layout support:** NCHWc layout can improve the performance of Conv models. To enable it, build from source with `--enable_arm_neon_nchwc` ([#25580](https://github.com/microsoft/onnxruntime/pull/25580), [#26838](https://github.com/microsoft/onnxruntime/pull/26838), [#26691](https://github.com/microsoft/onnxruntime/pull/26691), [#26171](https://github.com/microsoft/onnxruntime/pull/26171)). This feature may be turned on by default in a future release based on community feedback.
- **ARM perf improvements:** Dedicated depthwise conv kernel ([#26688](https://github.com/microsoft/onnxruntime/pull/26688)) and `SiLU` activation perf improvement ([#26753](https://github.com/microsoft/onnxruntime/pull/26753))

---

## 🔌 Language Bindings

### C#

- .NET 9.0 MAUI targets ([#26463](https://github.com/microsoft/onnxruntime/pull/26463))

### Python

- `add_external_initializers_from_files` ([#26012](https://github.com/microsoft/onnxruntime/pull/26012))

### Java

- Auto EP and compile model support ([#25131](https://github.com/microsoft/onnxruntime/pull/25131))
- `OrtCompiledModelCompatibility` ([#26028](https://github.com/microsoft/onnxruntime/pull/26028))

---

## 🐛 Bug Fixes

### Critical Fixes

- DoS vulnerability in `FuseReluClip` ([#26878](https://github.com/microsoft/onnxruntime/pull/26878))
- Security issue loading arbitrary files as external data ([#26776](https://github.com/microsoft/onnxruntime/pull/26776))
- Memory leak fix for `KernelContext_GetAllocator` ([#26883](https://github.com/microsoft/onnxruntime/pull/26883))
- Local Attention off-by-1 bug ([#25927](https://github.com/microsoft/onnxruntime/pull/25927))

### EP-Specific Fixes

- [QNN] Clip op with min/max from QDQ ([#26601](https://github.com/microsoft/onnxruntime/pull/26601))
- [CoreML] Gather fp16 support ([#26442](https://github.com/microsoft/onnxruntime/pull/26442))

---

## 🙏 Contributors

Thanks to our **170 contributors** for this release!

[@fs-eire](https://github.com/fs-eire), [@tianleiwu](https://github.com/tianleiwu), [@edgchen1](https://github.com/edgchen1), [@qjia7](https://github.com/qjia7), [@yuslepukhin](https://github.com/yuslepukhin), [@hariharans29](https://github.com/hariharans29), [@Honry](https://github.com/Honry), [@qti-yuduo](https://github.com/qti-yuduo), [@adrianlizarraga](https://github.com/adrianlizarraga), [@snnn](https://github.com/snnn), [@eserscor](https://github.com/eserscor), [@vraspar](https://github.com/vraspar), [@xiaofeihan1](https://github.com/xiaofeihan1), [@guschmue](https://github.com/guschmue), [@daijh](https://github.com/daijh), [@quic-muchhsu](https://github.com/quic-muchhsu), [@qti-jkilpatrick](https://github.com/qti-jkilpatrick), [@tirupath-qti](https://github.com/tirupath-qti), [@Jiawei-Shao](https://github.com/Jiawei-Shao), [@qti-hungjuiw](https://github.com/qti-hungjuiw), [@quic-ashwshan](https://github.com/quic-ashwshan), [@titaiwangms](https://github.com/titaiwangms), [@qti-mattsinc](https://github.com/qti-mattsinc), [@chilo-ms](https://github.com/chilo-ms), [@jchen10](https://github.com/jchen10), [@xhcao](https://github.com/xhcao), [@skottmckay](https://github.com/skottmckay), [@quic-calvnguy](https://github.com/quic-calvnguy), [@JonathanC-ARM](https://github.com/JonathanC-ARM), [@Rohanjames1997](https://github.com/Rohanjames1997), [@sushraja-msft](https://github.com/sushraja-msft), [@jambayk](https://github.com/jambayk), [@adrastogi](https://github.com/adrastogi), [@xenova](https://github.com/xenova), [@quic-tirupath](https://github.com/quic-tirupath), [@justinchuby](https://github.com/justinchuby), [@HectorSVC](https://github.com/HectorSVC), [@kunal-vaishnavi](https://github.com/kunal-vaishnavi), [@wenqinI](https://github.com/wenqinI), [@prathikr](https://github.com/prathikr), [@baijumeswani](https://github.com/baijumeswani), [@preetha-intel](https://github.com/preetha-intel), [@jatinwadhwa921](https://github.com/jatinwadhwa921), 
[@umangb-09](https://github.com/umangb-09), [@qti-ashwshan](https://github.com/qti-ashwshan), [@carzh](https://github.com/carzh), [@bachelor-dou](https://github.com/bachelor-dou), [@ranjitshs](https://github.com/ranjitshs), [@gedoensmax](https://github.com/gedoensmax), [@xadupre](https://github.com/xadupre), [@nenad1002](https://github.com/nenad1002), [@TedThemistokleous](https://github.com/TedThemistokleous), [@keshavv27](https://github.com/keshavv27), [@zpye](https://github.com/zpye), [@jnagi-intel](https://github.com/jnagi-intel), [@jiafatom](https://github.com/jiafatom), [@mingyueliuh](https://github.com/mingyueliuh), [@Colm-in-Arm](https://github.com/Colm-in-Arm), [@borg323](https://github.com/borg323), [@chunghow-qti](https://github.com/chunghow-qti), [@Craigacp](https://github.com/Craigacp), [@BODAPATIMAHESH](https://github.com/BODAPATIMAHESH), [@AlekseiNikiforovIBM](https://github.com/AlekseiNikiforovIBM), [@hans00](https://github.com/hans00), [@thevishalagarwal](https://github.com/thevishalagarwal), [@MaanavD](https://github.com/MaanavD), [@qti-kromero](https://github.com/qti-kromero), [@damdoo01-arm](https://github.com/damdoo01-arm), [@BoarQing](https://github.com/BoarQing), [@naomiOvad](https://github.com/naomiOvad), [@yuhuchua-qti](https://github.com/yuhuchua-qti), [@hadiFute](https://github.com/hadiFute), [@vishalpandya1990](https://github.com/vishalpandya1990), [@rivkastroh](https://github.com/rivkastroh), [@minfhong-qti](https://github.com/minfhong-qti), [@kuanyul-qti](https://github.com/kuanyul-qti), [@xieofxie](https://github.com/xieofxie), [@ankitm3k](https://github.com/ankitm3k), [@RyanMetcalfeInt8](https://github.com/RyanMetcalfeInt8), [@MayureshV1](https://github.com/MayureshV1), [@bopeng1234](https://github.com/bopeng1234), [@vthaniel](https://github.com/vthaniel), [@mdvoretc-intel](https://github.com/mdvoretc-intel), [@ericcraw](https://github.com/ericcraw), [@javier-intel](https://github.com/javier-intel), 
[@saurabhkale17](https://github.com/saurabhkale17), [@sfatimar](https://github.com/sfatimar), [@Kotomi-Du](https://github.com/Kotomi-Du), [@intbf](https://github.com/intbf), [@n1harika](https://github.com/n1harika), [@TejalKhade28](https://github.com/TejalKhade28), [@gupta-pallavi](https://github.com/gupta-pallavi), [@cbourjau](https://github.com/cbourjau), [@nieubank](https://github.com/nieubank), [@r-devulap](https://github.com/r-devulap), [@wszqkzqk](https://github.com/wszqkzqk), [@sanketkaleoss](https://github.com/sanketkaleoss), [@amancini-N](https://github.com/amancini-N), [@fanchenkong1](https://github.com/fanchenkong1), [@meakbiyik](https://github.com/meakbiyik), [@hisham-hchowdhu](https://github.com/hisham-hchowdhu), [@shaoboyan091](https://github.com/shaoboyan091), [@Stonesjtu](https://github.com/Stonesjtu), [@qwu16](https://github.com/qwu16), [@wangw-1991](https://github.com/wangw-1991), [@bonktree](https://github.com/bonktree), [@naetherm](https://github.com/naetherm), [@nikhilfujitsu](https://github.com/nikhilfujitsu), [@Panxuefeng-loongson](https://github.com/Panxuefeng-loongson), [@selenayang888](https://github.com/selenayang888), [@moyo1997](https://github.com/moyo1997), [@chwarr](https://github.com/chwarr), [@patryk-kaiser-ARM](https://github.com/patryk-kaiser-ARM), [@fdwr](https://github.com/fdwr), [@SavaLione](https://github.com/SavaLione), [@shiyi9801](https://github.com/shiyi9801), [@mcost45](https://github.com/mcost45), [@aciddelgado](https://github.com/aciddelgado), [@prudhvi-qti](https://github.com/prudhvi-qti), [@Jonahcb](https://github.com/Jonahcb), [@lifang-zhang](https://github.com/lifang-zhang), [@zhaoxul-qti](https://github.com/zhaoxul-qti), [@gaugarg-nv](https://github.com/gaugarg-nv), [@cocotdf](https://github.com/cocotdf), [@WangFengtu1996](https://github.com/WangFengtu1996), [@orlmon01](https://github.com/orlmon01), [@weidu-tpvision](https://github.com/weidu-tpvision), [@theHamsta](https://github.com/theHamsta), 
[@kevinch-nv](https://github.com/kevinch-nv), [@XXXXRT666](https://github.com/XXXXRT666), [@movedancer](https://github.com/movedancer), [@melkap01-Arm](https://github.com/melkap01-Arm), [@KingSora](https://github.com/KingSora), [@urpetkov-amd](https://github.com/urpetkov-amd), [@junchao-loongson](https://github.com/junchao-loongson), [@jixiongdeng](https://github.com/jixiongdeng), [@wcy123](https://github.com/wcy123), [@GrigoryEvko](https://github.com/GrigoryEvko), [@anujj](https://github.com/anujj), [@peishenyan](https://github.com/peishenyan), [@quic-ankus](https://github.com/quic-ankus), [@jchen351](https://github.com/jchen351), [@yihonglyu](https://github.com/yihonglyu), [@satyajandhyala](https://github.com/satyajandhyala), [@co63oc](https://github.com/co63oc), [@mschofie](https://github.com/mschofie), [@quic-ashigarg](https://github.com/quic-ashigarg), [@asoldano](https://github.com/asoldano), [@nproshun](https://github.com/nproshun), [@jiangzhaoming](https://github.com/jiangzhaoming), [@seungtaek94](https://github.com/seungtaek94), [@liqunfu](https://github.com/liqunfu), [@jaholme](https://github.com/jaholme), [@hanbitmyths](https://github.com/hanbitmyths), [@quic-boyuc](https://github.com/quic-boyuc), [@rM-planet](https://github.com/rM-planet), [@qti-vaiskv](https://github.com/qti-vaiskv), [@AndreyOrb](https://github.com/AndreyOrb), [@pkubaj](https://github.com/pkubaj), [@xhan65](https://github.com/xhan65), [@Jaswanth51](https://github.com/Jaswanth51), [@quic-hungjuiw](https://github.com/quic-hungjuiw), [@jywu-msft](https://github.com/jywu-msft), [@mklimenk](https://github.com/mklimenk), [@derdeljan-msft](https://github.com/derdeljan-msft), [@ianfhunter](https://github.com/ianfhunter), [@NingW101](https://github.com/NingW101), [@feich-ms](https://github.com/feich-ms), [@Akupadhye](https://github.com/Akupadhye), [@wschin](https://github.com/wschin)

---

**Full Changelog:** [v1.23.2...rel-1.24.1](https://github.com/microsoft/onnxruntime/compare/v1.23.2...rel-1.24.1)
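
For anyone scripting an upgrade, the platform support changes in these notes (Python 3.11+ required, free-threaded builds listed for Linux) can be checked ahead of installation. The following is a minimal sketch and not part of ONNX Runtime: the function name `interpreter_notes` and its messages are hypothetical, and it relies only on the standard-library `sys` and `sysconfig` modules.

```python
import sys
import sysconfig

# From the release notes: Python 3.10 wheels are no longer published.
MIN_PYTHON = (3, 11)


def interpreter_notes(version=None, free_threaded=None, platform=None):
    """Return a list of warnings for an interpreter (hypothetical helper).

    All arguments default to the running interpreter, and can be overridden
    to evaluate another environment.
    """
    version = tuple(sys.version_info[:2]) if version is None else tuple(version)
    if free_threaded is None:
        # Py_GIL_DISABLED is 1 on free-threaded (PEP 703) builds, else 0 or None.
        free_threaded = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    platform = sys.platform if platform is None else platform

    notes = []
    if version < MIN_PYTHON:
        notes.append(
            "Python %d.%d is below 3.11; upgrade before installing this release."
            % version
        )
    if free_threaded and not platform.startswith("linux"):
        # The notes only mention free-threaded support on Linux.
        notes.append(
            "Free-threaded build on %s; the notes list 3.13t/3.14t support for Linux."
            % platform
        )
    return notes


if __name__ == "__main__":
    for note in interpreter_notes() or ["No issues found for this interpreter."]:
        print(note)
```

An empty list means none of the listed platform changes affect the interpreter. The check does not cover the macOS changes (x86_64 binaries dropped, minimum version 14.0).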

ONNX Runtime v1.24.1
