polars
Extremely fast Query Engine for DataFrames, written in Rust
Python Polars 1.37.0
🚀 Performance improvements
- Speed up SQL interface "ORDER BY" clauses (#26037)
- Add fast kernel for is_nan and use it for numpy NaN->null conversion (#26034)
- Optimize ArrayFromIter implementations for ObjectArray (#25712)
- New streaming NDJSON sink pipeline (#25948)
- New streaming CSV sink pipeline (#25900)
- Dispatch partitioned usage of sink_* functions to new-streaming by default (#25910)
- Replace ryu with faster zmij (#25885)
- Reduce memory usage for .item() count in grouped first/last (#25787)
- Skip schema inference if schema provided for scan_csv/ndjson (#25757)
- Add width-aware chunking to prevent degradation with wide data (#25764)
- Use new sink pipeline for write/sink_ipc (#25746)
- Reduce memory usage when scanning multiple parquet files in streaming (#25747)
- Don't call cluster_with_columns optimization if not needed (#25724)
✨ Enhancements
- Add new pl.PartitionBy API (#26004)
- ArrowStreamExportable and sink_delta (#25994)
- Release musl builds (#25894)
- Implement streaming decompression for CSV COUNT(*) fast path (#25988)
- Add nulls support for rolling_mean_by (#25917)
- Add lazy collect_all (#25991)
- Add streaming decompression for NDJSON schema inference (#25992)
- Improved handling of unqualified SQL JOIN columns that are ambiguous (#25761)
- Drop Python 3.9 support (#25984)
- Expose record batch size in {sink,write}_ipc (#25958)
- Add null_on_oob parameter to expr.get (#25957)
- Suggest correct timezone if timezone validation fails (#25937)
- Support streaming IPC scan from S3 object store (#25868)
- Implement streaming CSV schema inference (#25911)
- Support hashing of meta expressions (#25916)
- Improve SQLContext recognition of possible table objects in the Python globals (#25749)
- Add pl.Expr.(min|max)_by (#25905)
- Improve MemSlice Debug impl (#25913)
- Implement or fix json encode/decode for (U)Int128, Categorical, Enum, Decimal (#25896)
- Expand scatter to more dtypes (#25874)
- Implement streaming CSV decompression (#25842)
- Add Series sql method for API consistency (#25792)
- Mark Polars as safe for free-threading (#25677)
- Support Binary and Decimal in arg_(min|max) (#25839)
- Allow Decimal parsing in str.json_decode (#25797)
- Add shift support for Object data type (#25769)
- Add missing Series.arr.mean (#25774)
- Allow scientific notation when parsing Decimals (#25711)
🐞 Bug fixes
- Release GIL on collect_batches (#26033)
- Missing buffer update in String is_in Parquet pushdown (#26019)
- Make struct.with_fields data model coherent (#25610)
- Incorrect output order for order sensitive operations after join_asof (#25990)
- Use SeriesExport for pyo3-polars FFI (#26000)
- Add pl.Schema to type signature for DataFrame.cast (#25983)
- Don't write Parquet min/max statistics for i128 (#25986)
- Ensure chunk consistency in in-memory join (#25979)
- Fix varying block metadata length in IPC reader (#25975)
- Implement collect_batches properly in Rust (#25918)
- Fix panic on arithmetic with bools in list (#25898)
- Convert to index type with strict cast in some places (#25912)
- Empty dataframe in streaming non-strict hconcat (#25903)
- Infer large u64 in json as i128 (#25904)
- Set http client timeouts to 10 minutes (#25902)
- Correct lexicographic ordering for Parquet BYTE_ARRAY statistics (#25886)
- Raise error on duplicate group_by names in upsample() (#25811)
- Correctly export view buffer sizes nested in Extension types (#25853)
- Fix DataFrame.estimated_size not handling overlapping chunks correctly (#25775)
- Ensure Kahan sum does not introduce NaN from infinities (#25850)
- Trim excess bytes in parquet decode (#25829)
- Fix panic/deadlock sinking parquet with rows larger than 64MB estimated size (#25836)
- Fix quantile midpoint interpolation (#25824)
- Don't use cast when converting from physical in list.get (#25831)
- Invalid null count on int -> categorical cast (#25816)
- Update groups in list.eval (#25826)
- Use downcast before FFI conversion in PythonScan (#25815)
- Double-counting of row metrics (#25810)
- Cast nulls to expected type in streaming union node (#25802)
- Incorrect slice pushdown into map_groups (#25809)
- Fix panic writing parquet with single bool column (#25807)
- Fix upsample with group_by incorrectly introducing NULLs on group key columns (#25794)
- Panic in top_k pruning (#25798)
- Fix incorrect collect_schema for unpivot followed by join (#25782)
- Verify arr namespace is called from array column (#25650)
- Ensure LazyFrame.serialize() unchanged after collect_schema() (#25780)
- Function map_(rows|elements) with return_dtype = pl.Object (#25753)
- Fix incorrect cargo sub-feature (#25738)
📖 Documentation
- Fix display of deprecation warning (#26010)
- Document null behaviour for rank (#25887)
- Add QUALIFY clause and SUBSTRING function to the SQL docs (#25779)
- Update mixed-offset datetime parsing example in user guide (#25915)
- Update bare-metal docs for mounted anonymous results (#25801)
- Fix credential parameter name in cloud-storage.py (#25788)
- Configuration options update (#25756)
🛠️ Other improvements
- Update rust compiler (#26017)
- Improve csv test coverage (#25980)
- Ramp up CSV read size (#25997)
- Mark lazy parameter to collect_all as unstable (#25999)
- Update ruff action and simplify version handling (#25940)
- Run python lint target as part of pre-commit (#25982)
- Disable HTTP timeout for receiving response body (#25970)
- Fix mypy lint (#25963)
- Add AI contribution policy (#25956)
- Fix failing scan delta S3 test (#25932)
- Improve MemSlice Debug impl (#25913)
- Remove and deprecate batched csv reader (#25884)
- Remove unused AnonymousScan functions (#25872)
- Filter DeprecationWarning from pyparsing indirectly through pyiceberg (#25854)
- Various small improvements (#25835)
- Clear venv with appropriate version of Python (#25851)
- Skip schema inference if schema provided for scan_csv/ndjson (#25757)
- Ensure proper async connection cleanup on DB test exit (#25766)
- Ensure we uninstall other Polars runtimes in CI (#25739)
- Make 'make requirements' more robust (#25693)
- Remove duplicate compression level types (#25723)
Thank you to all our contributors for making this release possible!
@AndreaBozzo, @EndPositive, @Kevin-Patyk, @MarcoGorelli, @Voultapher, @alexander-beedie, @anosrepenilno, @arlyon, @azimafroozeh, @carnarez, @dependabot[bot], @dsprenkels, @edizeqiri, @eitanf, @gab23r, @henryharbeck, @hutch3232, @ion-elgreco, @jqnatividad, @kdn36, @lun3x, @m1guelperez, @mcrumiller, @nameexhaustion, @orlp, @ritchie46, @sachinn854 and @yonikremer