Rewrite JNI functions to use JNI_TRY/JNI_CATCH by @ttnghia in https://github.com/rapidsai/cudf/pull/19053
Remove compatibility with nvCOMP versions before 5.0 by @vuule in https://github.com/rapidsai/cudf/pull/20140
Remove DataFrame.apply_chunks, Groupby.apply_grouped by @mroeschke in https://github.com/rapidsai/cudf/pull/20194
Change .str.starts/endswith with tuple argument to match any pattern instead of pairwise matching by @mroeschke in https://github.com/rapidsai/cudf/pull/20249
[cudf-polars] CUDA stream by @madsbk in https://github.com/rapidsai/cudf/pull/20154
Chunked read parquet, prepend index column, and apply deletion vector by @mhaseeb123 in https://github.com/rapidsai/cudf/pull/20201
Zero-copy hostdevice_vector on integrated systems by @vuule in https://github.com/rapidsai/cudf/pull/20225
Use int64_t for the num_rows slot in parquet_reader_options by @wence- in https://github.com/rapidsai/cudf/pull/20256
Require CUDA 12.2+ by @jakirkham in https://github.com/rapidsai/cudf/pull/20416
Remove compatibility for CCCL < 3.1 by @bdice in https://github.com/rapidsai/cudf/pull/20468
Remove deprecated types and APIs by @vuule in https://github.com/rapidsai/cudf/pull/20422
Support signed integers and decimals in SUM_WITH_OVERFLOW groupby by @PointKernel in https://github.com/rapidsai/cudf/pull/19598
Change groupby-scan COUNT to 1-based results by @davidwendt in https://github.com/rapidsai/cudf/pull/20168
Change strings::like() pattern parameter from string_scalar to string_view by @davidwendt in https://github.com/rapidsai/cudf/pull/20428
No-op performance tracking wrappers by @galipremsagar in https://github.com/rapidsai/cudf/pull/20595
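Two of the behavior changes above are easiest to see on toy data. The following is a plain-Python model, not cudf code, and it rests on three assumptions:

- "Match any pattern" means a row is True when it starts with at least one of the tuple's patterns.
- The old "pairwise" behavior tested row i only against pattern i. This is our reading of the PR title.
- The 1-based groupby-scan COUNT is an inclusive running count per group, so the current row counts toward itself.

All helper names are hypothetical.

```python
from typing import Hashable, Optional, Sequence


def starts_with_any(
    values: Sequence[Optional[str]], prefixes: tuple[str, ...]
) -> list[Optional[bool]]:
    """Model of .str.startswith(tuple): True if a row matches ANY prefix.

    Missing values (None) propagate as None.
    """
    # str.startswith natively accepts a tuple and returns True on any match.
    return [None if v is None else v.startswith(prefixes) for v in values]


def starts_with_pairwise(
    values: Sequence[Optional[str]], prefixes: tuple[str, ...]
) -> list[Optional[bool]]:
    """Assumed model of the old pairwise behavior: row i vs. prefixes[i] only."""
    return [None if v is None else v.startswith(p) for v, p in zip(values, prefixes)]


def groupby_scan_count(keys: Sequence[Hashable]) -> list[int]:
    """Inclusive (1-based) running count of rows seen so far in each group."""
    seen: dict[Hashable, int] = {}
    out: list[int] = []
    for k in keys:
        seen[k] = seen.get(k, 0) + 1  # the current row counts toward its group
        out.append(seen[k])
    return out
```

For example, `starts_with_any(["apple", "banana", None], ("a", "b"))` gives `[True, True, None]`. `groupby_scan_count(["x", "y", "x", "x"])` gives `[1, 1, 2, 3]`, where a 0-based count would start each group at 0.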
🐛 Bug Fixes
Copy attrs at correct place in DataFrame constructor by @galipremsagar in https://github.com/rapidsai/cudf/pull/20074
Handle missing nightly runs in pandas tests job by @galipremsagar in https://github.com/rapidsai/cudf/pull/20081
Fix numpy ufunc for DataFrame by @galipremsagar in https://github.com/rapidsai/cudf/pull/20070
Unproxy a few unnecessary testing utilities in pandas by @galipremsagar in https://github.com/rapidsai/cudf/pull/20088
Fix libcudf groupby benchmarks to not include internal cache by @davidwendt in https://github.com/rapidsai/cudf/pull/20038
Fix cudf.date_range with non-iso start and end date strings by @mroeschke in https://github.com/rapidsai/cudf/pull/20116
Fix create_distinct_rows_column to create non-nullable columns by @davidwendt in https://github.com/rapidsai/cudf/pull/20082
Fix arrow timestamp frequency cases in cudf.pandas by @galipremsagar in https://github.com/rapidsai/cudf/pull/20128
Cast inputs to true division from decimal to float by @Matt711 in https://github.com/rapidsai/cudf/pull/20077
Handle NVMLError_NotSupported in cudf-polars by @TomAugspurger in https://github.com/rapidsai/cudf/pull/20179
Fix RMM JNI pinned_fallback_host_memory_resource for CCCL 3.1.0 by @bdice in https://github.com/rapidsai/cudf/pull/20160
Require passing memory resources to from_libcudf methods by @vyasr in https://github.com/rapidsai/cudf/pull/20171
Enable hash-groupby for decimal32/64 types and MEAN aggregation by @davidwendt in https://github.com/rapidsai/cudf/pull/20040
Align decimal dtypes in predicate before conditional join by @Matt711 in https://github.com/rapidsai/cudf/pull/20060
Change stream_checking_resource_adaptor::do_deallocate to noexcept by @vyasr in https://github.com/rapidsai/cudf/pull/20218
Deallocation should be noexcept by @bdice in https://github.com/rapidsai/cudf/pull/20219
Fix a race condition in the decode of delta encoded Parquet columns by @vuule in https://github.com/rapidsai/cudf/pull/20216
Fix the host-device tdigest offsets by using cuda::std::span by @PointKernel in https://github.com/rapidsai/cudf/pull/20220
Add stream and mr arguments to Column.from_arrow type stub by @TomAugspurger in https://github.com/rapidsai/cudf/pull/20244
Pin deltalake in cudf-polars-polars-tests CI job by @TomAugspurger in https://github.com/rapidsai/cudf/pull/20255
Pin ibis-framework<11.0.0 by @Matt711 in https://github.com/rapidsai/cudf/pull/20267
Add private attributes for cudf.pandas proxy objects by @galipremsagar in https://github.com/rapidsai/cudf/pull/20276
Add Proxy for SparseAccessor by @galipremsagar in https://github.com/rapidsai/cudf/pull/20278
We need this to pacify mypy by @wence- in https://github.com/rapidsai/cudf/pull/20285
Purge non-empty nulls for the generated lists columns in data generation utility by @ttnghia in https://github.com/rapidsai/cudf/pull/20283
Fix missing table compatibility check in two_table_comparator constructor by @PointKernel in https://github.com/rapidsai/cudf/pull/20305
Fix the check for equal num_cols across empty parquet sources by @mhaseeb123 in https://github.com/rapidsai/cudf/pull/20320
Add nans_to_nulls to Frame by @galipremsagar in https://github.com/rapidsai/cudf/pull/20314
Add support for list type in get by @galipremsagar in https://github.com/rapidsai/cudf/pull/20332
Fix decimal dtype serialization in cudf-polars by @Matt711 in https://github.com/rapidsai/cudf/pull/20300
Make the GroupedRollingWindow expression node reconstructable in cudf-polars by @Matt711 in https://github.com/rapidsai/cudf/pull/20288
Ensure pylibcudf.Scalar.from_py uses CUDA streams by @TomAugspurger in https://github.com/rapidsai/cudf/pull/20340
Skip failing cudf-polars test due to hash groupby bug by @Matt711 in https://github.com/rapidsai/cudf/pull/20356
Support order by keys for order-sensitive scalar aggregations in grouped windows by @Matt711 in https://github.com/rapidsai/cudf/pull/20350
Honor user-passed stream in slice_strings for scalar inputs by @mroeschke in https://github.com/rapidsai/cudf/pull/20349
Thread missing streams in column/table view creation to char size calculation by @vyasr in https://github.com/rapidsai/cudf/pull/20351
Fix missed-sync for mapping_indices_kernel in hash-based groupby aggregation by @ttnghia in https://github.com/rapidsai/cudf/pull/20370
Fix a few SPDX-related issues by @KyleFromNVIDIA in https://github.com/rapidsai/cudf/pull/20364
Fix a dtype bug in column constructor by @galipremsagar in https://github.com/rapidsai/cudf/pull/20384
Refactor as_column dtype parameter calls by @galipremsagar in https://github.com/rapidsai/cudf/pull/20379
Add CUDA stream to cudf_polars.Column.deserialize by @TomAugspurger in https://github.com/rapidsai/cudf/pull/20396
Add missing CUDA stream to cudf-polars left-semi join by @TomAugspurger in https://github.com/rapidsai/cudf/pull/20398
Fix various string APIs to work with extension types by @galipremsagar in https://github.com/rapidsai/cudf/pull/20368
Add parameter validation for merge and MultiIndex.from_frame by @galipremsagar in https://github.com/rapidsai/cudf/pull/20382
Fix nvtext::normalize_characters special token case by @davidwendt in https://github.com/rapidsai/cudf/pull/20242
Fix pinned memory resource shared_pointer lifetime in tests by @bdice in https://github.com/rapidsai/cudf/pull/20407
Support new nvcompStatus_t enum value by @vuule in https://github.com/rapidsai/cudf/pull/20376
Don't skip blank CSV rows after the header in cudf-polars scan_csv by @mroeschke in https://github.com/rapidsai/cudf/pull/20341
Fix OOB accesses in JSON_CornerCase_Empty test and get_row_array_parent_col_id function by @bdice in https://github.com/rapidsai/cudf/pull/20421
Change calls to cudaMemcpyToSymbol to cudaMemcpyToSymbolAsync by @davidwendt in https://github.com/rapidsai/cudf/pull/20374
Do not accelerate pandas._config.config by @Matt711 in https://github.com/rapidsai/cudf/pull/20413
Return timedelta instead of datetime type from std on datetime columns with missing values by @mroeschke in https://github.com/rapidsai/cudf/pull/20439
Disallow non-bool skipna arguments to reduction methods by @mroeschke in https://github.com/rapidsai/cudf/pull/20436
Fix parquet scans for DuckDB PDS-DS by @Matt711 in https://github.com/rapidsai/cudf/pull/20388
Support __array_function__ on the proxy array type by @Matt711 in https://github.com/rapidsai/cudf/pull/20419
Make memory_usage and __sizeof__ proxy attributes and always skip all memory usage tests by @Matt711 in https://github.com/rapidsai/cudf/pull/20425
Add input validation for from_records by @galipremsagar in https://github.com/rapidsai/cudf/pull/20412
Use computed reduction result type for empty sum and product aggregations by @mroeschke in https://github.com/rapidsai/cudf/pull/20438
Correct level arg validation for Index.isin, unique by @mroeschke in https://github.com/rapidsai/cudf/pull/20449
Add private _grouper attribute to DataFrameGroupBy proxy type by @Matt711 in https://github.com/rapidsai/cudf/pull/20448
Raise ValueError when indexing with zero step slice by @mroeschke in https://github.com/rapidsai/cudf/pull/20453
Raise IndexError for float-like indexers in RangeIndex/MultiIndex.getitem by @mroeschke in https://github.com/rapidsai/cudf/pull/20454
Disallow slice(bool, ...) in DataFrame.loc with MultiIndex by @mroeschke in https://github.com/rapidsai/cudf/pull/20457
Fix core dump in MemoryCleaner by @res-life in https://github.com/rapidsai/cudf/pull/19872
Disallow multiple Ellipsis values in loc/iloc indexing by @mroeschke in https://github.com/rapidsai/cudf/pull/20456
Fix scan operations for string columns by @galipremsagar in https://github.com/rapidsai/cudf/pull/20460
Fix UTF8 data generator in libcudf benchmarks utility by @davidwendt in https://github.com/rapidsai/cudf/pull/20465
Handle dealloc in stream-ordered cudf-polars ops by @TomAugspurger in https://github.com/rapidsai/cudf/pull/20467
Raise on unsupported unstack cases by @Matt711 in https://github.com/rapidsai/cudf/pull/20463
Allow early exit for left semi-/anti- joins with empty build/probe tables by @shrshi in https://github.com/rapidsai/cudf/pull/20452
Fix OOB memory access in JSON reader ingest_raw utility by @davidwendt in https://github.com/rapidsai/cudf/pull/20451
Round up small-type groupby outputs to 4-byte boundary by @PointKernel in https://github.com/rapidsai/cudf/pull/20455
Fix GPU acceleration bug in decimal type-cast by @galipremsagar in https://github.com/rapidsai/cudf/pull/20471
Add missing CUDA stream in cudf_polars Distinct by @TomAugspurger in https://github.com/rapidsai/cudf/pull/20477
Support __arrow_array__ on proxy extension array by @Matt711 in https://github.com/rapidsai/cudf/pull/20478
Enable scan operation for datetime64 and timedelta64 types by @galipremsagar in https://github.com/rapidsai/cudf/pull/20464
Remove unneeded type check in cudf::strings::slice_strings by @davidwendt in https://github.com/rapidsai/cudf/pull/20437
Fix join match context tests by @PointKernel in https://github.com/rapidsai/cudf/pull/20472
Fix the statistics_mr in benchmark fixture by @PointKernel in https://github.com/rapidsai/cudf/pull/20496
Guard __sizeof__ in pandas compatibility mode by @Matt711 in https://github.com/rapidsai/cudf/pull/20495
Fix OOB memory access in Orc and Parquet stacks from fixed-width unaligned loads by @mhaseeb123 in https://github.com/rapidsai/cudf/pull/20458
Fix cudf.pandas Timestamp/Timedelta not subclassing stdlib datetime objects by @mroeschke in https://github.com/rapidsai/cudf/pull/20433
Revert benchmark input generation logic for list type by @davidwendt in https://github.com/rapidsai/cudf/pull/20498
Avoid using pylibcudf directly in rapidsmpf runtime by @rjzamora in https://github.com/rapidsai/cudf/pull/20501
Suppress NVRTC arch warnings by @brandon-b-miller in https://github.com/rapidsai/cudf/pull/20517
Fix ChannelManager and Lineariser by @rjzamora in https://github.com/rapidsai/cudf/pull/20516
Synchronize streams in LocalShuffle by @rjzamora in https://github.com/rapidsai/cudf/pull/20515
Make argsort have return type np.intp to match pandas by @Matt711 in https://github.com/rapidsai/cudf/pull/20487
Fix polars.concat_str with one column in cudf_polars by @mroeschke in https://github.com/rapidsai/cudf/pull/20535
Override __sizeof__ for cudf.Index by @Matt711 in https://github.com/rapidsai/cudf/pull/20530
Fix pl.scan_csv(...).slice(...).collect(engine="gpu") with None endpoint by @mroeschke in https://github.com/rapidsai/cudf/pull/20519
Fix DataChunkSourceTest by syncing default stream by @davidwendt in https://github.com/rapidsai/cudf/pull/20492
Fix data size errors in some libcudf benchmarks by @davidwendt in https://github.com/rapidsai/cudf/pull/20512
Pin cython and pytest dependencies by @TomAugspurger in https://github.com/rapidsai/cudf/pull/20571
Pin Cython pre-3.2.0 and PyTest pre-9 by @jakirkham in https://github.com/rapidsai/cudf/pull/20573
Handle empty child IRs in _decompose by @Matt711 in https://github.com/rapidsai/cudf/pull/20409
Skip flaky pandas datetime test by @Matt711 in https://github.com/rapidsai/cudf/pull/20585
Fix max-pool-size-exceeded error in DATA_CHUNK_SOURCE_TEST by @davidwendt in https://github.com/rapidsai/cudf/pull/20534
Fix racecheck in nvtext wordpiece tokenizer kernel by @davidwendt in https://github.com/rapidsai/cudf/pull/20588
Fix the check to determine if all column chunk pages are dict encoded by @mhaseeb123 in https://github.com/rapidsai/cudf/pull/20524
Add stream synchronize to QUANTILES_TEST PercentileApprox gtests by @davidwendt in https://github.com/rapidsai/cudf/pull/20558
Update update-version.sh to handle release branch version changes by @rockhowse in https://github.com/rapidsai/cudf/pull/20598
Fix nvtext tokenizers handling invalid UTF8 data by @davidwendt in https://github.com/rapidsai/cudf/pull/20514
Fix overflow errors in distinct and filtered joins when hash table size exceeds int32 limits by @shrshi in https://github.com/rapidsai/cudf/pull/20594
[FEA] Optimize JIT Filter for Low-Selectivity by @lamarrr in https://github.com/rapidsai/cudf/pull/20222
Compute the boolean NOT function on integers as a bitwise invert by @Matt711 in https://github.com/rapidsai/cudf/pull/20599
Cast output dtype of rolling aggregations to match pandas by @Matt711 in https://github.com/rapidsai/cudf/pull/20526
Add noop path for Frame.astype by @Matt711 in https://github.com/rapidsai/cudf/pull/20581
Fix copy-semantics bugs, reducing copies and memory usage by @galipremsagar in https://github.com/rapidsai/cudf/pull/20121
Ensure the sum after expression decomposition for mean has float output dtype by @Matt711 in https://github.com/rapidsai/cudf/pull/20596
Use Decimal(0) literal for all-null decimal groups in groupby-sum by @Matt711 in https://github.com/rapidsai/cudf/pull/20591
Do not drop freq when constructing DatetimeIndex from pandas by @brandon-b-miller in https://github.com/rapidsai/cudf/pull/18778
Fix --validation flag for cudf.pandas PDSH benchmarks by @mroeschke in https://github.com/rapidsai/cudf/pull/20540
Enable GPU acceleration for more binops by @galipremsagar in https://github.com/rapidsai/cudf/pull/20507
Fix rmm function calls due to removed deprecated APIs and macro by @ttnghia in https://github.com/rapidsai/cudf/pull/20661
Fix orc reader bool bug due to not being able to resume rle decode by @pmattione-nvidia in https://github.com/rapidsai/cudf/pull/20666
Fix categorical comparisons in cudf to match pandas by @galipremsagar in https://github.com/rapidsai/cudf/pull/20674
Fix any and all to match pandas by @galipremsagar in https://github.com/rapidsai/cudf/pull/20679
Fix return types of string APIs in cudf.pandas by @galipremsagar in https://github.com/rapidsai/cudf/pull/20683
Resolve pandas test failures by @galipremsagar in https://github.com/rapidsai/cudf/pull/20704
Fix DatetimeIndex pickling by @vyasr in https://github.com/rapidsai/cudf/pull/20709
Make DatetimeIndex.serialize() headers msgpack-serializable by @TomAugspurger in https://github.com/rapidsai/cudf/pull/20714
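The 4-byte round-up for small-type groupby outputs (#20455) comes down to simple arithmetic. The sketch below shows one common way to pad a buffer of small elements (for example, int8 or int16) to a 4-byte multiple.

The function name is hypothetical. We assume the motivation is letting kernels operate on whole 32-bit words, but the PR title does not state this.

```python
def padded_size_bytes(num_rows: int, elem_size: int, alignment: int = 4) -> int:
    """Round num_rows * elem_size up to the next multiple of `alignment`.

    `alignment` must be a positive power of two for the bit-mask trick to hold.
    """
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    raw = num_rows * elem_size
    # Adding (alignment - 1) and then clearing the low bits rounds up.
    return (raw + alignment - 1) & ~(alignment - 1)
```

For example, five int8 values (5 bytes) round up to 8 bytes. Three int16 values (6 bytes) also round up to 8, and four int8 values stay at 4.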
📖 Documentation
Add note that --rmm-async only affects the distributed scheduler by @bdice in https://github.com/rapidsai/cudf/pull/20129
Add profiling guide by @bdice in https://github.com/rapidsai/cudf/pull/20292
Find RMM before CCCL by @wence- in https://github.com/rapidsai/cudf/pull/20336
Use current system architecture in conda environment creation command by @bdice in https://github.com/rapidsai/cudf/pull/20500
Use uname -m instead of arch command by @bdice in https://github.com/rapidsai/cudf/pull/20502
Use RAPIDS_BRANCH file for documentation links by @bdice in https://github.com/rapidsai/cudf/pull/20494
🚀 New Features
Add memory resources to unary, transform, and filling modules by @vyasr in https://github.com/rapidsai/cudf/pull/20054
Add memory resources to binaryop, copying, and stream_compaction by @vyasr in https://github.com/rapidsai/cudf/pull/20059
Add memory resources to groupby, datetime, and lists modules by @vyasr in https://github.com/rapidsai/cudf/pull/20102
Add memory resources to search, reshape, and partitioning modules by @vyasr in https://github.com/rapidsai/cudf/pull/20101
Add memory resources to rolling, sorting, and quantiles modules by @vyasr in https://github.com/rapidsai/cudf/pull/20099
[FEA] Implement JIT Filter for read_parquet by @lamarrr in https://github.com/rapidsai/cudf/pull/19831
Add memory resources to all nvtext APIs by @vyasr in https://github.com/rapidsai/cudf/pull/20119
Add memory resource to all strings modules by @vyasr in https://github.com/rapidsai/cudf/pull/20123
Add memory resources to reduce, column, column_factories, and contiguous_split by @vyasr in https://github.com/rapidsai/cudf/pull/20135
Add memory resources to I/O modules by @vyasr in https://github.com/rapidsai/cudf/pull/20136
Remove rounding from cudf java by @pmattione-nvidia in https://github.com/rapidsai/cudf/pull/20110
Add memory resources to replace, json, and hashing by @vyasr in https://github.com/rapidsai/cudf/pull/20150
Add support for maintain_order param in joins by @Matt711 in https://github.com/rapidsai/cudf/pull/17698
Add an example to inspect parquet files and dump row group and page level metadata information by @mhaseeb123 in https://github.com/rapidsai/cudf/pull/20117
Support forward/backward filling null values in a grouped window context by @Matt711 in https://github.com/rapidsai/cudf/pull/19907
Allow multiple calls to cudf::initialize and cudf::deinitialize by @vuule in https://github.com/rapidsai/cudf/pull/20111
Add remaining memory resources by @vyasr in https://github.com/rapidsai/cudf/pull/20197
Add memory resources to scalars by @vyasr in https://github.com/rapidsai/cudf/pull/20196
Add pylibcudf is_valid_reduce_aggregation API by @davidwendt in https://github.com/rapidsai/cudf/pull/20145
Support decimal literals in cudf-polars by @Matt711 in https://github.com/rapidsai/cudf/pull/20147
Support cum_sum(...).over(...) expressions in cudf-polars by @Matt711 in https://github.com/rapidsai/cudf/pull/19908
Passthrough unary ops through Parquet predicate pushdown by @mhaseeb123 in https://github.com/rapidsai/cudf/pull/20127
Implement ARGMIN and ARGMAX aggregations for reduction by @ttnghia in https://github.com/rapidsai/cudf/pull/20207
Skip decompression of pruned parquet pages by @mhaseeb123 in https://github.com/rapidsai/cudf/pull/20192
Add an example to demonstrate the use of next-gen parquet reader to read a parquet file with highly selective filters by @mhaseeb123 in https://github.com/rapidsai/cudf/pull/19469
Evaluate IS_NULL at row group and page level in Parquet filtering by @mhaseeb123 in https://github.com/rapidsai/cudf/pull/20144
[Java] Add optional native deps loader by @zpuller in https://github.com/rapidsai/cudf/pull/20414
Add cudf-polars + rapidsmpf CI check by @rjzamora in https://github.com/rapidsai/cudf/pull/20355
Add Python bindings for the hybrid scan reader by @vyasr in https://github.com/rapidsai/cudf/pull/20381
RapidsMPF streaming-engine translation by @rjzamora in https://github.com/rapidsai/cudf/pull/20161
[JNI] Use a read/write lock pattern in Rmm.class by @abellina in https://github.com/rapidsai/cudf/pull/20521
[Java] Support output projection indices for contiguousSplitGroupsAndGenUniqKeys by @res-life in https://github.com/rapidsai/cudf/pull/20391
Support Series.at and Series.iat for pandas compatibility by @Matt711 in https://github.com/rapidsai/cudf/pull/20529
Add COUNT_VALID aggregation support to groupby-scan by @davidwendt in https://github.com/rapidsai/cudf/pull/20531
Use RapidsMPF read_parquet in "rapidsmpf" runtime by @rjzamora in https://github.com/rapidsai/cudf/pull/20497
Support decimal128 SUM aggregation in hash-based groupby by @PointKernel in https://github.com/rapidsai/cudf/pull/20509
Add stream testing in pylibcudf by @vyasr in https://github.com/rapidsai/cudf/pull/20625
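As a reference for what an ARGMIN/ARGMAX reduction (#20207) computes, here is a plain-Python sketch. It is not the libcudf implementation. It assumes that:

- nulls (None) are skipped;
- ties resolve to the first occurrence;
- an empty or all-null input yields None.

These are our assumptions about the semantics, and the helper names are hypothetical.

```python
from typing import Optional, Sequence


def arg_extremum(values: Sequence[Optional[float]], *, want_max: bool) -> Optional[int]:
    """Return the index of the min (or max) non-null value, or None if none exist."""
    best_idx: Optional[int] = None
    for i, v in enumerate(values):
        if v is None:
            continue  # skip nulls
        if best_idx is None:
            best_idx = i
            continue
        best = values[best_idx]
        # Strict comparison keeps the first occurrence when values tie.
        if (v > best) if want_max else (v < best):
            best_idx = i
    return best_idx


def argmin(values: Sequence[Optional[float]]) -> Optional[int]:
    return arg_extremum(values, want_max=False)


def argmax(values: Sequence[Optional[float]]) -> Optional[int]:
    return arg_extremum(values, want_max=True)
```

Under these assumptions, `argmin([3, None, 1, 1, 5])` returns 2 (the first of the tied minimums) and `argmax` on the same input returns 4.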
🛠️ Improvements
Deprecate .from_pandas constructor by @mroeschke in https://github.com/rapidsai/cudf/pull/19996
Prune entries in Sphinx nitpick_ignore by @mroeschke in https://github.com/rapidsai/cudf/pull/20045
Avoid direct CategoricalColumn calls in dask_cudf by @mroeschke in https://github.com/rapidsai/cudf/pull/20080
Fix typing issues in pylibcudf by @vyasr in https://github.com/rapidsai/cudf/pull/20069
Avoid shadowing module names by @vyasr in https://github.com/rapidsai/cudf/pull/20071
Remove calling to purge_nonempty_nulls in make_lists_column by @ttnghia in https://github.com/rapidsai/cudf/pull/12873
Reduce verbosity of running the pandas test suite by @vyasr in https://github.com/rapidsai/cudf/pull/20107
Clean up detail device atomic logic using atomic_ref by @PointKernel in https://github.com/rapidsai/cudf/pull/19924
Use 8 processes for pandas tests, show top 10 test times by @bdice in https://github.com/rapidsai/cudf/pull/20109
Update nvbench by @bdice in https://github.com/rapidsai/cudf/pull/19619
Cleanup of some libcudf aggregation code by @davidwendt in https://github.com/rapidsai/cudf/pull/20053
Run cudf-polars conda unit tests with more than 1 process by @mroeschke in https://github.com/rapidsai/cudf/pull/19980
Avoid running pandas unit tests for private functionality with cudf.pandas by @mroeschke in https://github.com/rapidsai/cudf/pull/20115
Remove MultiIndex.from_pandas pytest benchmark by @mroeschke in https://github.com/rapidsai/cudf/pull/20112
Switch host_vector and host_span dependency by @davidwendt in https://github.com/rapidsai/cudf/pull/20106
Have ListColumn.from_sequence go through pylibcudf by @mroeschke in https://github.com/rapidsai/cudf/pull/20098
Fix RAPIDS_BRANCH version and update script by @galipremsagar in https://github.com/rapidsai/cudf/pull/20091
Add pyarrow stubs to mypy environment and fix associated errors by @vyasr in https://github.com/rapidsai/cudf/pull/20118
Fix slowdown in cudf-polars distributed tests by @TomAugspurger in https://github.com/rapidsai/cudf/pull/20137
Improve performance of string column size computation during parquet reads by @nvdbaranec in https://github.com/rapidsai/cudf/pull/19986
Disable async MR priming in cudf.pandas by @bdice in https://github.com/rapidsai/cudf/pull/20133
Rework reduction case statement as dispatch_type_and_aggregation by @davidwendt in https://github.com/rapidsai/cudf/pull/20078
Fix type annotations in cudf-polars by @TomAugspurger in https://github.com/rapidsai/cudf/pull/20131
Add tests for AUTO and HYBRID (de)compression modes by @vuule in https://github.com/rapidsai/cudf/pull/20126
Branch 25.12 merge branch 25.10 by @vyasr in https://github.com/rapidsai/cudf/pull/20152
Manual forward merger for Branch 25.12 - branch 25.10 by @galipremsagar in https://github.com/rapidsai/cudf/pull/20157
Temporarily disable conda-java-tests by @bdice in https://github.com/rapidsai/cudf/pull/20162
Remove unused ColumnBase.view by @mroeschke in https://github.com/rapidsai/cudf/pull/20141
Avoid NumericalColumn call from CategoricalColumn.children by @mroeschke in https://github.com/rapidsai/cudf/pull/20153
Deprecate legacy public row operators by @PointKernel in https://github.com/rapidsai/cudf/pull/20097
Avoid more explicit calls to IntervalColumn and StructColumn by @mroeschke in https://github.com/rapidsai/cudf/pull/20064
Run cudf-polars wheels unit tests with more than 1 process by @mroeschke in https://github.com/rapidsai/cudf/pull/20124
Trace node execution in cudf-polars by @TomAugspurger in https://github.com/rapidsai/cudf/pull/19895
Make ColumnBase.as_*_column convert via pylibcudf by @mroeschke in https://github.com/rapidsai/cudf/pull/20149
Reduce execution times for parquet dictionary tests by @mhaseeb123 in https://github.com/rapidsai/cudf/pull/20176
Update to rapids-logger 0.2 by @bdice in https://github.com/rapidsai/cudf/pull/20172
Adjust rmm pool handling in PDSH benchmarks by @TomAugspurger in https://github.com/rapidsai/cudf/pull/20138
Don't assume cudf_polars benchmarking scale factor is always an integer by @mroeschke in https://github.com/rapidsai/cudf/pull/20182
Skip filtering Parquet row groups with dictionaries if there are non-dict encoded pages by @mhaseeb123 in https://github.com/rapidsai/cudf/pull/20175
Remove unnecessary work from read_parquet_metadata by @vuule in https://github.com/rapidsai/cudf/pull/20180
Improve performance of groupby tdigests gtests by @davidwendt in https://github.com/rapidsai/cudf/pull/20173
Revert "Temporarily disable conda-java-tests" by @bdice in https://github.com/rapidsai/cudf/pull/20184
Add PDSH benchmark runner for cudf.pandas by @mroeschke in https://github.com/rapidsai/cudf/pull/20164
Make Column.set_mask go through pylibcudf by @mroeschke in https://github.com/rapidsai/cudf/pull/20103
Pin pydantic<2.12 in ci/test_cudf_polars_polars_tests.sh by @mroeschke in https://github.com/rapidsai/cudf/pull/20200
Add an overhead field to cudf-polars tracing by @TomAugspurger in https://github.com/rapidsai/cudf/pull/20198
Support binops between float scalars and decimal columns by @mroeschke in https://github.com/rapidsai/cudf/pull/20199
Reduce output buffer sizes for pruned pages of columns with a list parent by @mhaseeb123 in https://github.com/rapidsai/cudf/pull/20086
Make ListColumn._transform_leaves convert via pylibcudf by @mroeschke in https://github.com/rapidsai/cudf/pull/20151
Rename comparison_binop_generator to arg_minmax_binop_generator and corresponding file to nested_types_extrema_utils.cuh by @Copilot in https://github.com/rapidsai/cudf/pull/20212
Pin polars version <1.34 and >=1.29 by @Matt711 in https://github.com/rapidsai/cudf/pull/19912
Stop using libcudf default parameters in pylibcudf by @vyasr in https://github.com/rapidsai/cudf/pull/20204
Fix various typing errors by @vyasr in https://github.com/rapidsai/cudf/pull/20205
Cleanup parquet for simple columns by @pmattione-nvidia in https://github.com/rapidsai/cudf/pull/19869
Configuration for which metrics are enabled during tracing by @TomAugspurger in https://github.com/rapidsai/cudf/pull/20223
[pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in https://github.com/rapidsai/cudf/pull/20189
Fix parquet row number check for page bounds by @pmattione-nvidia in https://github.com/rapidsai/cudf/pull/20217
More mypy and docs fixes by @vyasr in https://github.com/rapidsai/cudf/pull/20224
Prevent accidental copies of expensive-to-copy object types by @vuule in https://github.com/rapidsai/cudf/pull/20226
Split row operator header by @PointKernel in https://github.com/rapidsai/cudf/pull/20166
Standardize setting StructDtype field names post libcudf conversion by @mroeschke in https://github.com/rapidsai/cudf/pull/20235
Add arm testing of cudf.pandas unit tests by @vyasr in https://github.com/rapidsai/cudf/pull/20251
Enable sccache-dist connection pool by @trxcllnt in https://github.com/rapidsai/cudf/pull/20264
Run polars tests with the streaming and in-memory executors by @Matt711 in https://github.com/rapidsai/cudf/pull/19354
Move and rename ScanPartitionPlan by @rjzamora in https://github.com/rapidsai/cudf/pull/20248
Unpin DuckDB and Ibis in cudf.pandas thirdparty tests by @mroeschke in https://github.com/rapidsai/cudf/pull/20269
Add pylibcudf to pre-commit linting and fix outstanding errors by @vyasr in https://github.com/rapidsai/cudf/pull/20250
Update ConfigOptions for rapidsmpf-streaming integration by @rjzamora in https://github.com/rapidsai/cudf/pull/20252
Handle unordered grouped windows properly for null filling and cum sums by @Matt711 in https://github.com/rapidsai/cudf/pull/20275
Add more type annotations to cudf/core/column subclasses by @mroeschke in https://github.com/rapidsai/cudf/pull/20277
Remove extraneous host_memory_resource include by @bdice in https://github.com/rapidsai/cudf/pull/20284
Add MultiIndex.dtypes by @galipremsagar in https://github.com/rapidsai/cudf/pull/20279
Skip mypy in pre-commit.ci by @bdice in https://github.com/rapidsai/cudf/pull/20286
Make ColumnBase.deserialize construct via pylibcudf by @mroeschke in https://github.com/rapidsai/cudf/pull/20142
Add numpy to the mypy pre-commit environment by @vyasr in https://github.com/rapidsai/cudf/pull/20282
Add ability to set the source_info of parquet_reader_options by @wence- in https://github.com/rapidsai/cudf/pull/20253
Add more Python type annotations to cudf/core by @mroeschke in https://github.com/rapidsai/cudf/pull/20287
Use main in RAPIDS_BRANCH by @bdice in https://github.com/rapidsai/cudf/pull/20312
Move "All rights reserved" statements to copyright line by @KyleFromNVIDIA in https://github.com/rapidsai/cudf/pull/20313
Add inferred_type and missing IntervalIndex properties by @galipremsagar in https://github.com/rapidsai/cudf/pull/20294
Avoid unseeded, random data generation in cuDF classic tests by @mroeschke in https://github.com/rapidsai/cudf/pull/20319
Improve hash-based groupby aggregation: direct write to the dense output columns whenever possible by @ttnghia in https://github.com/rapidsai/cudf/pull/19764
Avoid accessing range values in cudf::strings::contains_re logic by @davidwendt in https://github.com/rapidsai/cudf/pull/20122
Migrate mixed join to use the multiset data structure by @PointKernel in https://github.com/rapidsai/cudf/pull/19989
Add benchmark for strings cast to/from integer APIs by @davidwendt in https://github.com/rapidsai/cudf/pull/20247
Use main shared-workflows branch by @bdice in https://github.com/rapidsai/cudf/pull/20324
Use the thread pool for Parquet metadata processing by @vuule in https://github.com/rapidsai/cudf/pull/20263
Add .dt.day_of_week and .dt.daysinmonth by @galipremsagar in https://github.com/rapidsai/cudf/pull/20298
Avoid Column materialization in RangeIndex.nans_to_nulls by @mroeschke in https://github.com/rapidsai/cudf/pull/20331
Update the code to be compatible with the new cuco stream-ordered allocator by @PointKernel in https://github.com/rapidsai/cudf/pull/20258
Deprecate Series.data by @mroeschke in https://github.com/rapidsai/cudf/pull/20281
Align cudf Python's Column constructors by @mroeschke in https://github.com/rapidsai/cudf/pull/20233
Make type annotations of ColumnBase.set_mask stricter by @mroeschke in https://github.com/rapidsai/cudf/pull/20261
Make type annotations of ColumnBase.find_and_replace stricter by @mroeschke in https://github.com/rapidsai/cudf/pull/20259
Make type annotations of ColumnBase.apply_boolean_mask stricter by @mroeschke in https://github.com/rapidsai/cudf/pull/20262
Skip Python LZ4 tests when nvCOMP is disabled by @vuule in https://github.com/rapidsai/cudf/pull/20293
Move cudf/io/nvcomp_adapter.hpp to cudf/io/detail by @davidwendt in https://github.com/rapidsai/cudf/pull/20327
Add context to IR.do_evaluate by @TomAugspurger in https://github.com/rapidsai/cudf/pull/20322
Update mypy # type: ignore comments according to stricter mypy configs by @mroeschke in https://github.com/rapidsai/cudf/pull/20272
Remove duplicated enforce null consistency code by @mhaseeb123 in https://github.com/rapidsai/cudf/pull/20342
Use SPDX for all copyright headers by @KyleFromNVIDIA in https://github.com/rapidsai/cudf/pull/20321
Add more type annotations to cudf/core/series.py by @mroeschke in https://github.com/rapidsai/cudf/pull/20304
Remove/Replace uses of numba.cuda arrays in pytest benchmarks and tests by @mroeschke in https://github.com/rapidsai/cudf/pull/20359
Add DuckDB PDSH queries by @Matt711 in https://github.com/rapidsai/cudf/pull/20257
Use stream in cudf_polars.DataFrame.to_polars by @TomAugspurger in https://github.com/rapidsai/cudf/pull/20323
Add join_streams to pylibcudf API by @TomAugspurger in https://github.com/rapidsai/cudf/pull/20316
Use CUDA streams in all pylibcudf calls made by cudf-polars by @TomAugspurger in https://github.com/rapidsai/cudf/pull/20291
Add cudf/io/config_utils.hpp to doxygen by @davidwendt in https://github.com/rapidsai/cudf/pull/20329
Test coverage for parallel metadata parsing by @vuule in https://github.com/rapidsai/cudf/pull/20334
Support serializing more polars types by @Matt711 in https://github.com/rapidsai/cudf/pull/20347
Add CUDAStreamPolicy to cudf-polars configuration by @TomAugspurger in https://github.com/rapidsai/cudf/pull/20366
Unskip cudf-polars groupby test by @Matt711 in https://github.com/rapidsai/cudf/pull/20406
Deprecate pylibcudf interop arrow APIs by @Matt711 in https://github.com/rapidsai/cudf/pull/20405
Get rid of the hashing helper header by @PointKernel in https://github.com/rapidsai/cudf/pull/20360
Minor cleanup and fixes for libcudf generate_input.cu by @davidwendt in https://github.com/rapidsai/cudf/pull/20363
Ignore assert_produces_warning and shares_memory pandas unit tests for cudf.pandas by @mroeschke in https://github.com/rapidsai/cudf/pull/20434
Short circuit RangeIndex.append for length 0 input, proxy private attribute by @mroeschke in https://github.com/rapidsai/cudf/pull/20442
Mark DataFrame.insert as _external_only_api by @Copilot in https://github.com/rapidsai/cudf/pull/20403
Deprecate get_current_device_resource in favor of get_current_device_resource_ref by @PointKernel in https://github.com/rapidsai/cudf/pull/20386
Promote JoinNoneValue to public as JoinNoMatch for clear non-match Join semantics by @PointKernel in https://github.com/rapidsai/cudf/pull/20440
Remove duplicate entries in NODEIDS_THAT_FAIL_WITH_CUDF_PANDAS by @mroeschke in https://github.com/rapidsai/cudf/pull/20447
Use the thread pool in the compact protocol reader by @vuule in https://github.com/rapidsai/cudf/pull/20417
Update README.md generalizing all cuDF components by @mroeschke in https://github.com/rapidsai/cudf/pull/20357
Skip TestDatetimelikeCoercion pandas tests that assert ._value identity by @mroeschke in https://github.com/rapidsai/cudf/pull/20459
Add PDS-H Q2-9 for cudf.pandas by @mroeschke in https://github.com/rapidsai/cudf/pull/20418
Add s3fs to test_cudf_python common dependencies by @trxcllnt in https://github.com/rapidsai/cudf/pull/20473
Use public pandas APIs in StringColumn.to_pandas by @mroeschke in https://github.com/rapidsai/cudf/pull/20474
Expose java GatherMap internals and add toString to AST by @revans2 in https://github.com/rapidsai/cudf/pull/20483
Add create_ascii_string_column to the libcudf benchmark data generator by @davidwendt in https://github.com/rapidsai/cudf/pull/20354
Skip more pandas unit tests that tests BlockManager, private sparse types by @mroeschke in https://github.com/rapidsai/cudf/pull/20489
Add boto3/botocore/aiobotocore to common test dependencies by @trxcllnt in https://github.com/rapidsai/cudf/pull/20490
Use a lower bound when estimating the partial file-size by @rjzamora in https://github.com/rapidsai/cudf/pull/20193
Performance improvement for nvtext::edit_distance for long strings by @davidwendt in https://github.com/rapidsai/cudf/pull/20268
Add MemoryResourceConfig to cudf-polars config by @TomAugspurger in https://github.com/rapidsai/cudf/pull/20042
Improve project automation by @vyasr in https://github.com/rapidsai/cudf/pull/20523
Fuse simple streaming reductions in cudf-polars by @rjzamora in https://github.com/rapidsai/cudf/pull/18757
Migrate to new CCCL memory resource interface by @bdice in https://github.com/rapidsai/cudf/pull/20513
Add empty input gtest for cudf::transform by @davidwendt in https://github.com/rapidsai/cudf/pull/20505
Rework internal json headers to allow converting gtests files from .cu to .cpp by @davidwendt in https://github.com/rapidsai/cudf/pull/20491
Set continue on error in the cudf-polars-rapidsmpf nightly CI job by @Matt711 in https://github.com/rapidsai/cudf/pull/20550
Permanently back cuDF column by a pylibcudf.Column by @mroeschke in https://github.com/rapidsai/cudf/pull/20306
Skip flaky upstream polars rolling test by @Matt711 in https://github.com/rapidsai/cudf/pull/20552
Accelerate data page mask computation on device by @mhaseeb123 in https://github.com/rapidsai/cudf/pull/20280
Change default rapidsmpf stream policy to 'pool' by @TomAugspurger in https://github.com/rapidsai/cudf/pull/20527
Increase gtests coverage for cudf::strings::like patterns by @davidwendt in https://github.com/rapidsai/cudf/pull/20348
Add cuda::std::span operator to cudf::column_view by @davidwendt in https://github.com/rapidsai/cudf/pull/20541
Update ArrowStringView compare benchmark for gather by @davidwendt in https://github.com/rapidsai/cudf/pull/19935
Add pytest stubs and remove ujson usage by @vyasr in https://github.com/rapidsai/cudf/pull/20560
Skip arrow array constructor tests by @Matt711 in https://github.com/rapidsai/cudf/pull/20579
Add Polars to mypy environment and fix errors by @vyasr in https://github.com/rapidsai/cudf/pull/20563
Ensure table chunks are unspilled and available by @madsbk in https://github.com/rapidsai/cudf/pull/20583
Skip tests that assert behavior when copy-on-write is False by @Matt711 in https://github.com/rapidsai/cudf/pull/20506
Pass streams through Column.from_array/from_iterable_of_py by @Matt711 in https://github.com/rapidsai/cudf/pull/20569
Stop using Dtype annotation by @vyasr in https://github.com/rapidsai/cudf/pull/20590
Workaround to enable running PDS-H via WebHDFS by @kingcrimsontianyu in https://github.com/rapidsai/cudf/pull/20132
Update RMM includes from <rmm/mr/device/*> to <rmm/mr/*> by @bdice in https://github.com/rapidsai/cudf/pull/20607
Stricter typing import for cudf-polars by @TomAugspurger in https://github.com/rapidsai/cudf/pull/20614
Avoid the unnecessary H2H copy in the std::vector sink by @vuule in https://github.com/rapidsai/cudf/pull/20602
Preprocessing offsets for Parquet non-dictionary string columns by @pmattione-nvidia in https://github.com/rapidsai/cudf/pull/20430
Move more pandas unit tests that test private APIs by @mroeschke in https://github.com/rapidsai/cudf/pull/20511
Use .plc_column instead of .to_pylibcudf in rolling, string utilities by @mroeschke in https://github.com/rapidsai/cudf/pull/20562
Skip TestSetitemNADatetimeLikeDtype pandas unit tests due to private assertion by @mroeschke in https://github.com/rapidsai/cudf/pull/20578
Pin Polars version <1.35 by @Matt711 in https://github.com/rapidsai/cudf/pull/20266
Skip pandas unit tests in test_old_base.py that test private APIs by @mroeschke in https://github.com/rapidsai/cudf/pull/20572
Use .plc_column attribute instead of to_pylibcudf more internally by @mroeschke in https://github.com/rapidsai/cudf/pull/20559
Skip arrow-backed arithmetic tests and categorize the remaining failing tests by @Matt711 in https://github.com/rapidsai/cudf/pull/20577
Fix a pytest execution that is spawned in a subprocess by @galipremsagar in https://github.com/rapidsai/cudf/pull/20660
Accelerated parquet page header decoding when page index is available by @mhaseeb123 in https://github.com/rapidsai/cudf/pull/20369
feat: add error handling for non-existent columns in parquet reader by @gforsyth in https://github.com/rapidsai/cudf/pull/20659
Optimize row mask computation for single filter column by @mhaseeb123 in https://github.com/rapidsai/cudf/pull/20335
Skip MultiIndex pandas unit tests testing private functionality, test_chaining_and_caching.py by @mroeschke in https://github.com/rapidsai/cudf/pull/20575
Address minor comments from recent hybrid scan PRs by @mhaseeb123 in https://github.com/rapidsai/cudf/pull/20672
Add a timeout for the rapidsmpf test run by @vyasr in https://github.com/rapidsai/cudf/pull/20681
Use sccache-dist build cluster for conda and wheel builds by @trxcllnt in https://github.com/rapidsai/cudf/pull/20488
New Contributors
@Copilot made their first contribution in https://github.com/rapidsai/cudf/pull/20212
@rockhowse made their first contribution in https://github.com/rapidsai/cudf/pull/20598
Full Changelog: https://github.com/rapidsai/cudf/compare/v25.12.00a...v25.12.00