Ring buffering is now supported in schedules via Func::ring_buffer(). It differs from fold_storage in that it folds across time (the loop variables) rather than across space (the pure vars of the Func).
Fixed a longstanding bug in lossless_cast()
Lots of fixes for Vulkan backend
OpenGLCompute is no longer supported
Added support for ARM SVE2
Added (basic) support for Intel APX and AVX10
Added support for Hexagon HVX v68
Added support for numpy's .npy format to .debug_to_file() and the code in halide_image_io.h
Python bindings now support bfloat and int64 properly
Hacky code that auto-named Funcs, Vars etc via DWARF introspection was removed
The profiler was revamped to behave better when multiple Halide pipelines are in flight at the same time.
Numerous lowering passes were sped up, resulting in faster compilation for large pipelines. However, time spent in LLVM is still the long pole for most pipelines.
Fixed-point instruction selection has been improved via tracking constant integer bounds of expressions.
Added feature detection for ARM CPUs to the runtime library and to the host target feature computation. Supported on Windows, macOS, Linux, iOS, and Android.
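The ring-buffering item above distinguishes folding across time from folding across space. A rough way to picture the time-folded case is sketched below in plain C++. This is a conceptual model only, not the Halide API and not the code Halide generates: `produce`, `run_ring_buffered`, and the constants `kSlots`, `kWidth`, and `kFrames` are hypothetical names chosen for illustration. The storage is allocated once outside the time loop with an extra leading dimension, and the slot is selected from the loop variable rather than from a spatial coordinate.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Conceptual model only: plain C++, not Halide. All names are hypothetical.
constexpr int kSlots = 2;   // extent of the extra ring-buffer dimension
constexpr int kWidth = 4;   // spatial extent of the producer
constexpr int kFrames = 5;  // number of iterations of the outer (time) loop

// Hypothetical producer: the value at spatial position x for time step t.
int produce(int t, int x) {
    return 10 * t + x;
}

// Runs a producer/consumer pair over kFrames time steps and returns the
// consumer's result (a sum over x) for each step.
std::vector<int> run_ring_buffered() {
    // Storage lives outside the time loop and has kSlots copies of the
    // spatial extent, instead of a single copy.
    std::array<std::array<int, kWidth>, kSlots> storage{};
    std::vector<int> sums;
    for (int t = 0; t < kFrames; t++) {
        // The slot index is derived from the loop variable t, so the fold
        // happens across time. The spatial index x is left unfolded.
        std::array<int, kWidth> &slot = storage[t % kSlots];
        for (int x = 0; x < kWidth; x++) {
            slot[x] = produce(t, x);
        }
        // The consumer reads only the slot written for this time step.
        int sum = 0;
        for (int x = 0; x < kWidth; x++) {
            sum += slot[x];
        }
        sums.push_back(sum);
    }
    return sums;
}
```

Calling `run_ring_buffered()` returns one sum per time step. Because consecutive steps use different slots, a producer for step t+1 could in principle fill its slot while the consumer for step t is still reading the other one, which is presumably why the ring_buffer changes listed under What's Changed mention async scheduling.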
Deprecations / Removals
tuple_select() has been removed in favor of overloads to select().
Various fixed-point operators have been removed from the Halide::Internal namespace and are now in the public Halide namespace.
What's Changed
Detect ARM CPU features for host target and in runtime (#8298)
Scheduling directive to support ring buffering by @vksnk in https://github.com/halide/Halide/pull/7967
Don't add ring_buffer semaphores if the function is not scheduled as async by @vksnk in https://github.com/halide/Halide/pull/8015
Quick fix for crash that is occurring in SVE2 tests. by @zvookin in https://github.com/halide/Halide/pull/8020
Don't use variable-length arrays by @steven-johnson in https://github.com/halide/Halide/pull/8021
Set warnings on tests as well as src by @steven-johnson in https://github.com/halide/Halide/pull/8022
Stronger chain detection in LoopCarry pass by @vksnk in https://github.com/halide/Halide/pull/8016
adds mappings for f16 variants of halide float math by @mikewoodworth in https://github.com/halide/Halide/pull/8029
Require LLVM >= 16.0 by @steven-johnson in https://github.com/halide/Halide/pull/8003
Add test for #8029 by @steven-johnson in https://github.com/halide/Halide/pull/8032
Tweak the Printer code in runtime for smaller code by @steven-johnson in https://github.com/halide/Halide/pull/8023
Fix bounds_of_nested_lanes by @abadams in https://github.com/halide/Halide/pull/8039
Track whether or not let expressions failed to solve in solver by @abadams in https://github.com/halide/Halide/pull/7982
Fix type error in VectorizeLoops by @abadams in https://github.com/halide/Halide/pull/8055
Update makefile to use test/common/terminate_handler.cpp by @abadams in https://github.com/halide/Halide/pull/8066
add unsafe_promise_clamped by @wraith1995 in https://github.com/halide/Halide/pull/8071
Don't require Halide_WebGPU when using wasm (#8063) by @steven-johnson in https://github.com/halide/Halide/pull/8065
Outsmart the LLVM optimizer by @steven-johnson in https://github.com/halide/Halide/pull/8073
Add hexagon_benchmarks app for CMake builds by @prasmish in https://github.com/halide/Halide/pull/8069
Fix bool conversion bug in Vulkan code generator by @derek-gerstmann in https://github.com/halide/Halide/pull/8067
Better validation of gpu schedules by @abadams in https://github.com/halide/Halide/pull/8068
Add an easy way to print vectors in debug output. by @zvookin in https://github.com/halide/Halide/pull/8072
[WebGPU] Update to latest native headers by @jrprice in https://github.com/halide/Halide/pull/8081
Remove OpenGLCompute by @steven-johnson in https://github.com/halide/Halide/pull/8077
Add checks to prevent people from using negative split factors by @abadams in https://github.com/halide/Halide/pull/8076
Fix rfactor adding too many pure loops by @abadams in https://github.com/halide/Halide/pull/8086
New Contributors
@tylerhou made their first contribution in https://github.com/halide/Halide/pull/8013
@wraith1995 made their first contribution in https://github.com/halide/Halide/pull/8071
@prasmish made their first contribution in https://github.com/halide/Halide/pull/8069
@2022tgoel made their first contribution in https://github.com/halide/Halide/pull/8111
@FabianSchuetze made their first contribution in https://github.com/halide/Halide/pull/8182
@FindHao made their first contribution in https://github.com/halide/Halide/pull/8322
Full Changelog: https://github.com/halide/Halide/compare/v17.0.2...v18.0.0
Forward the partition methods from generator outputs by @abadams in https://github.com/halide/Halide/pull/8090
Parallelize some tests by @abadams in https://github.com/halide/Halide/pull/8078
Allow disabling of multithreading in simd op check by @steven-johnson in https://github.com/halide/Halide/pull/8096
clang does not support _Float16 when targeting i386 by @LebedevRI in https://github.com/halide/Halide/pull/8085
tests: correctness/float16_t: mark __extendhfsf2 with default visibility by @LebedevRI in https://github.com/halide/Halide/pull/8084
Fix reduce_expr_modulo of vector in Solve.cpp by @abadams in https://github.com/halide/Halide/pull/8089
[Vulkan] Region allocator fixes for memory requirements and allocations by @derek-gerstmann in https://github.com/halide/Halide/pull/8087
Ensure string(REPLACE) is called with the right number of arguments by @alexreinking in https://github.com/halide/Halide/pull/8097
Strip asserts right at the end of lowering by @abadams in https://github.com/halide/Halide/pull/8094
Fix clang-tidy error in runtime.printer.h (parameter shadows member) by @steven-johnson in https://github.com/halide/Halide/pull/8074
Fix an issue where the Halide compiler hits an internal error for bool types in widening intrinsics. by @zvookin in https://github.com/halide/Halide/pull/8099
Small Tutorial Fix by @2022tgoel in https://github.com/halide/Halide/pull/8111
Optionally print the time taken by each lowering pass by @abadams in https://github.com/halide/Halide/pull/8116
Do less redundant work in UnpackBuffers by @abadams in https://github.com/halide/Halide/pull/8104
Avoid redundant scope lookups by @abadams in https://github.com/halide/Halide/pull/8103
Add Intel APX and AVX10 target flags and LLVM attribute setting. by @zvookin in https://github.com/halide/Halide/pull/8052
Use a caching version of stmt_uses_vars in TightenProducerConsumer nodes by @abadams in https://github.com/halide/Halide/pull/8102
Fix hoist_storage not handling condition correctly. by @abadams in https://github.com/halide/Halide/pull/8123
Rewrite the skip stages lowering pass by @abadams in https://github.com/halide/Halide/pull/8115
Remove two dead vars from the Makefile by @abadams in https://github.com/halide/Halide/pull/8125
Add support for setting the default allocator and deallocator functions in Halide::Runtime::Buffer. by @mcourteaux in https://github.com/halide/Halide/pull/8132
Make realization order invariant to unique_name suffixes by @abadams in https://github.com/halide/Halide/pull/8124
Make gpu thread and block for loop names opaque by @abadams in https://github.com/halide/Halide/pull/8133
Add class template type deduction guides to avoid CTAD warning. by @zvookin in https://github.com/halide/Halide/pull/8135
[vulkan] Add conform API methods to memory allocator to fix block allocations by @derek-gerstmann in https://github.com/halide/Halide/pull/8130
Add sobel in hexagon benchmarks app for CMake builds by @prasmish in https://github.com/halide/Halide/pull/8127
Handle loads of broadcasts in FlattenNestedRamps by @abadams in https://github.com/halide/Halide/pull/8139
Use python itself to get the extension suffix, not python-config by @abadams in https://github.com/halide/Halide/pull/8148
Rewrite the pass that adds mutexes for atomic nodes by @abadams in https://github.com/halide/Halide/pull/8105
Feature: mark a Func as no_profiling, to prevent injection of profiling. (2nd implementation) by @mcourteaux in https://github.com/halide/Halide/pull/8143
Bound allocation extents for hoist_storage using loop variables one-by-one by @vksnk in https://github.com/halide/Halide/pull/8154
Support for ARM SVE2. by @zvookin in https://github.com/halide/Halide/pull/8051
Fix two compute_with bugs. by @abadams in https://github.com/halide/Halide/pull/8152
Python bindings: add_python_test(): do set HL_JIT_TARGET too by @LebedevRI in https://github.com/halide/Halide/pull/8156
fix ub in lower rounding shift right by @abadams in https://github.com/halide/Halide/pull/8173
Add some missing _Float16 support by @steven-johnson in https://github.com/halide/Halide/pull/8174
Add conversion code for Float16 that was missed in #8174 by @steven-johnson in https://github.com/halide/Halide/pull/8178
Tighten bounds of abs() by @rootjalex in https://github.com/halide/Halide/pull/8168
Clarify the meaning of Shuffle::is_broadcast() by @abadams in https://github.com/halide/Halide/pull/8158
Add .npy support to halide_image_io by @steven-johnson in https://github.com/halide/Halide/pull/8175
Update Hexagon Install Instructions by @FabianSchuetze in https://github.com/halide/Halide/pull/8182
Add .npy support to debug_to_file() by @steven-johnson in https://github.com/halide/Halide/pull/8177
Don't print on parallel task entry/exit with -debug flag by @abadams in https://github.com/halide/Halide/pull/8185
Fix corner case in if_then_else simplification by @abadams in https://github.com/halide/Halide/pull/8189
Rewrite IREquality to use a more compact stack instead of deep recursion by @abadams in https://github.com/halide/Halide/pull/8198
[HEXAGON] Keep support for hexagon_remote/Makefile by @aankit-quic in https://github.com/halide/Halide/pull/8186
Faster substitute_facts by @abadams in https://github.com/halide/Halide/pull/8200
Make Interval::is_single_point check for deep equality by @abadams in https://github.com/halide/Halide/pull/8202
Refactor ConstantInterval by @abadams in https://github.com/halide/Halide/pull/8179
Faster vars used tracking in simplify let visitor by @abadams in https://github.com/halide/Halide/pull/8205
More aggressively unify duplicate lets by @abadams in https://github.com/halide/Halide/pull/8204
Update debug_to_file API to remove type_code by @steven-johnson in https://github.com/halide/Halide/pull/8183
[x86 & HVX & WASM] Use bounds inference for saturating_narrow instruction selection by @rootjalex in https://github.com/halide/Halide/pull/7805
Insert apparently-missing break; in IREquality.cpp by @steven-johnson in https://github.com/halide/Halide/pull/8211
Fix Reinterpret cmp in IREquality by @rootjalex in https://github.com/halide/Halide/pull/8217
Fix give-up case in ModulusRemainder by @abadams in https://github.com/halide/Halide/pull/8221
Fix for top-of-tree LLVM by @steven-johnson in https://github.com/halide/Halide/pull/8223
Add some EVAL_IN_LAMBDAs to Simplify_Sub.cpp by @abadams in https://github.com/halide/Halide/pull/8230
Fix saturating add matching in associativity checking by @abadams in https://github.com/halide/Halide/pull/8220
Add HVX_v68 target to support Hexagon HVX v68. by @wangcheng22 in https://github.com/halide/Halide/pull/8232
Mark host_dirty() and device_dirty() with no_discard. by @mcourteaux in https://github.com/halide/Halide/pull/8248
Rework the simplifier to use ConstantInterval for bounds by @abadams in https://github.com/halide/Halide/pull/8222
Remove max size assert from Anderson2021 by @jansel in https://github.com/halide/Halide/pull/8253
Expose BFloat in Python bindings by @jansel in https://github.com/halide/Halide/pull/8255
Fix Metal handling for float16 literals by @shoaibkamil in https://github.com/halide/Halide/pull/8260
Python binding support for int64 literals by @jansel in https://github.com/halide/Halide/pull/8254
Report useful error to user if the promise_clamp call fails to losslessly cast. by @mcourteaux in https://github.com/halide/Halide/pull/8238
It's generally a bad idea for simplifier rules to multiply constants by @abadams in https://github.com/halide/Halide/pull/8234
[vulkan] Fix Vulkan SIMT mappings for GPU loop vars. by @derek-gerstmann in https://github.com/halide/Halide/pull/8259
Stop region costs from complaining about new intrinsics by @abadams in https://github.com/halide/Halide/pull/8262
No longer silently hide errors in Metal completion handlers (alternative approach) by @shoaibkamil in https://github.com/halide/Halide/pull/8240
Use upstream interface for consuming SPIR-V by @alexreinking in https://github.com/halide/Halide/pull/8265
Fix OpenCL positive and negative INF constants. by @alexreinking in https://github.com/halide/Halide/pull/8266
scoped_truth for the loop variable being always less than the loop extent. by @mcourteaux in https://github.com/halide/Halide/pull/8306
Fix incorrect type in emulation of float16 is_inf/nan by @abadams in https://github.com/halide/Halide/pull/8310
Don't try to codegen predicated atomic stores by @abadams in https://github.com/halide/Halide/pull/8285
Add ability to pass explicit RDom to Function::define_update by @abadams in https://github.com/halide/Halide/pull/8284
[vulkan] Dynamically load Vulkan loader library. Avoid Validation Layer crash on exit. by @derek-gerstmann in https://github.com/halide/Halide/pull/8289
Remove Introspection by @steven-johnson in https://github.com/halide/Halide/pull/8273
Per-pipeline-invocation profiling by @abadams in https://github.com/halide/Halide/pull/8153
Fix device slices for Buffer with fixed dimensionality in template. by @mcourteaux in https://github.com/halide/Halide/pull/8313
Remove deprecated operators by @steven-johnson in https://github.com/halide/Halide/pull/8321
Provide a minimum OS version for MachO objects by @alexreinking in https://github.com/halide/Halide/pull/8323
Fix horrifying bug in lossless_cast of a subtract by @abadams in https://github.com/halide/Halide/pull/8155