Version 0.8.2: CUDA 13.1 compatibility, minor additions & bug fixes
Changes since v0.8.1:
CUDA 13.x compatibility & features
- #755 : We can now obtain the graph containing a given node, even when we only have its handle (CUDA 13.1)
- #754 : Support some new CUDA 13.1 graph-related API calls involving IDs
- #748, #749, #746 : Now supporting the use of the CUDA 13 versions of some API functions.
- #749 : Can now advise on where to locate memory with much better resolution, including NUMA nodes.
CUDA 12.x features
- #735 : Can now query the new pointer attributes added in CUDA 12.6 and 12.8
Usability improvements
- #725 : One can now bypass the kernel argument validity checks by specializing `is_valid_kernel_argument_t<T>` for a given type `T`. This is useful for using this library with types which are safe to pass as kernel arguments in practice, but which, for various reasons, cannot be defined so as to guarantee that being the case.
- #739 : Can now construct `region_t`'s from C arrays (without specifying their size)
- #736 : The `divides()` function for non-grid dimensions is no longer a member, but a freestanding one.
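The opt-in described in #725 follows the usual trait-specialization pattern, which can be sketched as below. This is a minimal, self-contained illustration and not the library's code: the `sketch` namespace, the default definition of the trait (trivial copyability), the `tagged_id` type and the `check_kernel_argument()` function are all assumptions made for the example. Only the trait name `is_valid_kernel_argument_t` comes from the notes above, and its actual namespace and default in the library may differ.

```cpp
#include <cassert>
#include <type_traits>

namespace sketch {

// Hypothetical stand-in for the library's trait; its real namespace and
// default definition may differ. Here, the default accepts only trivially
// copyable types.
template <typename T>
struct is_valid_kernel_argument_t : std::is_trivially_copyable<T> {};

// A type with a user-provided copy constructor: it is not trivially
// copyable, so the default check above rejects it, even though copying it
// amounts to copying a single int.
struct tagged_id {
    int value;
    explicit tagged_id(int v) : value(v) {}
    tagged_id(const tagged_id& other) : value(other.value) {}
};

// The opt-in: an explicit specialization declaring the type valid.
template <>
struct is_valid_kernel_argument_t<tagged_id> : std::true_type {};

// Stand-in for the compile-time check a launch function might perform.
template <typename T>
void check_kernel_argument(const T&)
{
    static_assert(is_valid_kernel_argument_t<T>::value,
        "type is not a valid kernel argument");
}

// Exercises the check for a built-in type and for the opted-in type.
inline void demo()
{
    check_kernel_argument(42);
    check_kernel_argument(tagged_id{7});
    assert(!std::is_trivially_copyable<tagged_id>::value);
}

} // namespace sketch
```

Without the explicit specialization, the second call in `demo()` would fail to compile in this sketch; with it, the responsibility for the type being safe to copy to the device rests with whoever wrote the specialization.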
Bug fixes
- #729 : `memory::pool::ipc::imported_ptr_t` will no longer try to free its pointer on destruction when it is non-owning
'Unsoundness' fixes
- #767 : Removed gratuitous declaration of `cuda::link::create(const void *image, const link::options_t &options)`: We create empty links, then add binary data, but only when also specifying the type of said data.
- #766 : Aligned declaration and definition of `cuda::memory::virtual_::set_permissions()`
- #765 : Aligned declaration and definition of `cuda::memory::managed::make_unique_region()`
- #764 : Marked more methods of proxy/wrapper classes `const`
- #762 : The launch configuration builder now always validates the configuration when its `::build()` method is triggered.
- #728 : Now consistently avoiding throwing exceptions in destructors; but if you want destructors to throw on errors, you can enable that by defining `THROW_IN_DESTRUCTORS`.
- #741 : `make_unique_region()` takes a number of bytes; parameter was renamed from `num_elements` to `num_bytes`
- #751 : Avoided non-robust/ambiguous behavior when setting an endpoint of a transfer in `copy_parameters_t<N>` structures: If the endpoint setting takes dimensions, then the pitch and extents are overwritten with those dimensions (as opposed to checking whether they're 0 or not, since we cannot always rely on 0-initialization).
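The destructor behavior described in #728 can be pictured with the following sketch. It is an illustration under stated assumptions, not the library's implementation: `release_resource()` and `resource_wrapper` are hypothetical names, and the way the library itself consults `THROW_IN_DESTRUCTORS` internally may differ. The point is only that a destructor is `noexcept` by default in C++11 and later, so throwing from one has to be opted into explicitly.

```cpp
#include <cassert>
#include <stdexcept>

namespace sketch {

// Hypothetical stand-in for a resource-release API call:
// returns 0 on success, non-zero on failure.
inline int release_resource(int handle) { return handle < 0 ? 1 : 0; }

// Hypothetical wrapper class owning a resource handle.
class resource_wrapper {
public:
    explicit resource_wrapper(int handle) : handle_(handle) {}

#ifdef THROW_IN_DESTRUCTORS
    // Opt-in behavior: report release failures by throwing.
    ~resource_wrapper() noexcept(false)
    {
        if (release_resource(handle_) != 0) {
            throw std::runtime_error("failed releasing a resource");
        }
    }
#else
    // Default behavior: never throw; a release failure is ignored.
    ~resource_wrapper() noexcept
    {
        (void) release_resource(handle_);
    }
#endif

private:
    int handle_;
};

// Destroys a wrapper whose release fails. Returns true if destruction
// completed without an exception, false if the destructor threw.
inline bool destroy_failing_wrapper()
{
    try { resource_wrapper w(-1); }
    catch (const std::runtime_error&) { return false; }
    return true;
}

} // namespace sketch
```

In this sketch, the macro would be defined before the class definition is seen, e.g. by passing `-DTHROW_IN_DESTRUCTORS` on the compiler command line. Note that a destructor which throws during stack unwinding from another exception terminates the program, which is one reason not throwing is the safer default.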
Library build process
- #738 : Minimum CMake version bumped to 3.26
Other changes to the library
- #763 : A more explicit friend class declaration to prevent mis-resolution by IDE parsers
- #769 : A slew of trivial code style and clarity improvements due to static analysis suggestions
- #730 : Moved some code for supporting compilation with different C++ versions out of `detail/type_traits.hpp` into a new header, `detail/preamble.hpp`: Conditional definitions of `CPP14_CONSTEXPR`, `CAW_MAYBE_UNUSED` and such
- #769 : A number of code style and clarification tweaks with no functional effect
Changes in the Example programs
Bug fixes
- #672 : In `simpleCudaGraphs`, when building a graph in a DIY fashion rather than using a builder, now extending the lifetime of kernel argument information to circumvent a CUDA Driver issue of not copying the relevant data in time.
Build process
- #743 : Added CMake code to replace semicolons with spaces when necessary.
- #752 : When building with CUDA 13.x, prevent (old versions of) CMake from passing it compute capabilities it no longer supports.
- #745 : Avoid pulling `libcu++` code in example programs which include `<cooperative_groups.h>`.
- #727 : Custom compilation commands now use `$CMAKE_CUDA_FLAGS`, not just the architecture choice flags.