polars
Extremely fast Query Engine for DataFrames, written in Rust
Rust Polars 0.53.0
Highlights
- Add Extension types (#25322)
Performance improvements
- Don't always rechunk on gather of nested types (#26478)
- Enable zero-copy object_store put upload for IPC sink (#26288)
- Resolve file schemas and metadata concurrently (#26325)
- Run elementwise CSEE for the streaming engine (#26278)
- Disable morsel splitting for fast-count on streaming engine (#26245)
- Implement streaming decompression for scan_ndjson and scan_lines (#26200)
- Improve string slicing performance (#26206)
- Refactor scan_delta to use python dataset interface (#26190)
- Add dedicated kernel for group-by arg_max/arg_min (#26093)
- Add streaming merge-join (#25964)
- Generalize Bitmap::new_zeroed opt for Buffer::zeroed (#26142)
- Reduce fs stat calls in path expansion (#26173)
- Lower streaming group_by n_unique to unique().len() (#26109)
- Speed up SQL interface "UNION" clauses (#26039)
- Speed up SQL interface "ORDER BY" clauses (#26037)
- Add fast kernel for is_nan and use it for numpy NaN->null conversion (#26034)
- Optimize ArrayFromIter implementations for ObjectArray (#25712)
- New streaming NDJSON sink pipeline (#25948)
- New streaming CSV sink pipeline (#25900)
- Dispatch partitioned usage of sink_* functions to new-streaming by default (#25910)
- Replace ryu with faster zmij (#25885)
- Reduce memory usage for .item() count in grouped first/last (#25787)
- Skip schema inference if schema provided for scan_csv/ndjson (#25757)
- Add width-aware chunking to prevent degradation with wide data (#25764)
- Use new sink pipeline for write/sink_ipc (#25746)
- Reduce memory usage when scanning multiple parquet files in streaming (#25747)
- Don't call cluster_with_columns optimization if not needed (#25724)
- Tune partitioned sink_parquet cloud performance (#25687)
- New single file IO sink pipeline enabled for sink_parquet (#25670)
- New partitioned IO sink pipeline enabled for sink_parquet (#25629)
- Correct overly eager local predicate insertion for unpivot (#25644)
- Reduce HuggingFace API calls (#25521)
- Use strong hash instead of traversal for CSPE equality (#25537)
- Fix panic in is_between support in streaming Parquet predicate push down (#25476)
- Faster kernels for rle_lengths (#25448)
- Allow detecting plan sortedness in more cases (#25408)
- Enable predicate expressions on unsigned integers (#25416)
- Mark output of more non-order-maintaining ops as unordered (#25419)
- Fast find start window in group_by_dynamic with large offset (#25376)
- Add streaming native LazyFrame.group_by_dynamic (#25342)
- Add streaming sorted Group-By (#25013)
- Add parquet prefiltering for string regexes (#25381)
- Use fast path for agg_min/agg_max when nulls present (#25374)
- Fuse positive slice into streaming LazyFrame.rolling (#25338)
- Mark Expr.reshape((-1,)) as row separable (#25326)
- Use bitmap instead of Vec<bool> in first/last w. skip_nulls (#25318)
- Return references from aexpr_to_leaf_names_iter (#25319)
Enhancements
- Add primitive filter -> agg lowering in streaming GroupBy (#26459)
- Support for the SQL FETCH clause (#26449)
- Add get() to retrieve a byte from binary data (#26454)
- Remove with_context in SQL lowering (#26416)
- Avoid OOM for scan_ndjson and scan_lines if input is compressed and negative slice (#26396)
- Add JoinBuildSide (#26403)
- Support anonymous agg in-mem (#26376)
- Add unstable arrow_schema parameter to sink_parquet (#26323)
- Improve error message formatting for structs (#26349)
- Remove parquet field overwrites (#26236)
- Enable zero-copy object_store put upload for IPC sink (#26288)
- Improved disambiguation for qualified wildcard columns in SQL projections (#26301)
- Expose upload_concurrency through env var (#26263)
- Allow quantile to compute multiple quantiles at once (#25516)
- Allow empty LazyFrame in LazyFrame.group_by(...).map_groups (#26275)
- Use delta file statistics for batch predicate pushdown (#26242)
- Add streaming UnorderedUnion (#26240)
- Implement compression support for sink_ndjson (#26212)
- Add unstable record batch statistics flags to {sink/scan}_ipc (#26254)
- Cloud retry/backoff configuration via storage_options (#26204)
- Use same sort order for expanded paths across local / cloud / directory / glob (#26191)
- Expose physical plan NodeStyle (#26184)
- Add streaming merge-join (#25964)
- Serialize optimization flags for cloud plan (#26168)
- Add compression support to write_csv and sink_csv (#26111)
- Add scan_lines (#26112)
- Support regex in str.split (#26060)
- Add unstable IPC Statistics read/write to scan_ipc/sink_ipc (#26079)
- Add nulls support for all rolling_by operations (#26081)
- ArrowStreamExportable and sink_delta (#25994)
- Release musl builds (#25894)
- Implement streaming decompression for CSV COUNT(*) fast path (#25988)
- Add nulls support for rolling_mean_by (#25917)
- Add lazy collect_all (#25991)
- Add streaming decompression for NDJSON schema inference (#25992)
- Improved handling of unqualified SQL JOIN columns that are ambiguous (#25761)
- Expose record batch size in {sink,write}_ipc (#25958)
- Add null_on_oob parameter to expr.get (#25957)
- Suggest correct timezone if timezone validation fails (#25937)
- Support streaming IPC scan from S3 object store (#25868)
- Implement streaming CSV schema inference (#25911)
- Support hashing of meta expressions (#25916)
- Improve SQLContext recognition of possible table objects in the Python globals (#25749)
- Add pl.Expr.(min|max)_by (#25905)
- Improve MemSlice Debug impl (#25913)
- Implement or fix json encode/decode for (U)Int128, Categorical, Enum, Decimal (#25896)
- Expand scatter to more dtypes (#25874)
- Implement streaming CSV decompression (#25842)
- Add Series sql method for API consistency (#25792)
- Mark Polars as safe for free-threading (#25677)
- Support Binary and Decimal in arg_(min|max) (#25839)
- Allow Decimal parsing in str.json_decode (#25797)
- Add shift support for Object data type (#25769)
- Add node status to NodeMetrics (#25760)
- Allow scientific notation when parsing Decimals (#25711)
- Allow creation of Object literal (#25690)
- Don't collect schema in SQL union processing (#25675)
- Add bin.slice(), bin.head(), and bin.tail() methods (#25647)
- Add SQL support for the QUALIFY clause (#25652)
- New partitioned IO sink pipeline enabled for sink_parquet (#25629)
- Add SQL syntax support for CROSS JOIN UNNEST(col) (#25623)
- Add separate env var to log tracked metrics (#25586)
- Expose fields for generating physical plan visualization data (#25562)
- Allow pl.Object in pivot value (#25533)
- Extend SQL UNNEST support to handle multiple array expressions (#25418)
- Minor improvement for as_struct repr (#25529)
- Temporal quantile in rolling context (#25479)
- Add support for Float16 dtype (#25185)
- Add strict parameter to pl.concat(how='horizontal') (#25452)
- Add leftmost option to str.replace_many / str.find_many / str.extract_many (#25398)
- Add quantile for missing temporals (#25464)
- Expose and document pl.Categories (#25443)
- Support decimals in search_sorted (#25450)
- Use reference to Graph pipes when flushing metrics (#25442)
- Add SQL support for named WINDOW references (#25400)
- Add Extension types (#25322)
- Add having to group_by context (#23550)
- Allow elementwise Expr.over in aggregation context (#25402)
- Add SQL support for ROW_NUMBER, RANK, and DENSE_RANK functions (#25409)
- Automatically Parquet dictionary encode floats (#25387)
- Add empty_as_null and keep_nulls to {Lazy,Data}Frame.explode (#25369)
- Allow hash for all List dtypes (#25372)
- Support unique_counts for all datatypes (#25379)
- Add maintain_order to Expr.mode (#25377)
- Display function of streaming physical plan map node (#25368)
- Allow slice on scalar in aggregation context (#25358)
- Allow implode and aggregation in aggregation context (#25357)
- Add empty_as_null and keep_nulls flags to Expr.explode (#25289)
- Add ignore_nulls to first / last (#25105)
- Move GraphMetrics into StreamingQuery (#25310)
- Allow Expr.unique on List/Array with non-numeric types (#25285)
- Allow Expr.rolling in aggregation contexts (#25258)
- Support additional forms of SQL CREATE TABLE statements (#25191)
- Add LazyFrame.pivot (#25016)
- Support column-positional SQL UNION operations (#25183)
- Allow arbitrary expressions as the Expr.rolling index_column (#25117)
- Allow arbitrary Expressions in "subset" parameter of unique frame method (#25099)
- Support arbitrary expressions in SQL JOIN constraints (#25132)
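For readers unfamiliar with the term, the merge-join entry above (#25964) refers to joining two inputs that are already sorted on the join key by walking them in lockstep. The following is a textbook sketch in plain Python, not Polars' streaming implementation; the function name `merge_join` and the `(key, value)` row layout are assumptions made for illustration only.

```python
def merge_join(left, right):
    """Inner-join two lists of (key, value) rows that are already sorted by key."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1  # left key too small, advance the left cursor
        elif lk > rk:
            j += 1  # right key too small, advance the right cursor
        else:
            # Find the end of the run of equal keys on the right side.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Pair every left row carrying this key with the whole right run.
            while i < len(left) and left[i][0] == lk:
                for _, rv in right[j:j_end]:
                    out.append((lk, left[i][1], rv))
                i += 1
            j = j_end
    return out


left = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
right = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
joined = merge_join(left, right)
```

Because each cursor only moves forward, neither input needs to be hashed or held in memory in full, which is presumably what makes the approach attractive for a streaming engine.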
Bug fixes
- Do not overwrite used names in cluster_with_columns pushdown (#26467)
- Do not mark output of concat_str on multiple inputs as sorted (#26468)
- Fix CSV schema inference content line duplication bug (#26452)
- Fix InvalidOperationError using scan_delta with filter (#26448)
- Alias giving missing column after streaming GroupBy CSE (#26447)
- Ensure by_name selector selects only names (#26437)
- Restore compatibility of strings written to parquet with pyarrow filter (#26436)
- Update schema in cluster_with_columns optimization (#26430)
- Fix negative slice in groups slicing (#26442)
- Don't run CPU check on aarch64 musl (#26439)
- Remove the POLARS_IDEAL_MORSEL_SIZE monkeypatching in the parametric merge-join test (#26418)
- Correct off-by-one in RLE row counting for nullable dictionary-encoded columns (#26411)
- Support very large integers in env var limits (#26399)
- Fix PlPath panic from incorrect slicing of UTF8 boundaries (#26389)
- Fix Float dtype for spearman correlation (#26392)
- Fix optimizer panic in right joins with type coercion (#26365)
- Don't serialize retry config from local environment vars (#26289)
- Fix PartitionBy with scalar key expressions and diff() (#26370)
- Add {Float16, Float32} -> Float32 lossless upcast (#26373)
- Fix panic using with_columns and collect_all (#26366)
- Add multi-page support for writing dictionary-encoded Parquet columns (#26360)
- Ensure slice advancement when skipping non-inlinable values in is_in with inlinable needles (#26361)
- Pin xlsx2csv version temporarily (#26352)
- Bugs in ViewArray total_bytes_len (#26328)
- Overflow in i128::abs in Decimal fits check (#26341)
- Make Expr.hash on Categorical mapping-independent (#26340)
- Clone shared GroupBy node before mutation in physical plan creation (#26327)
- Fix lazy evaluation of replace_strict by making it fallible (#26267)
- Consider the "current location" of an item when computing rolling_rank_by (#26287)
- Reset is_count_star flag between queries in collect_all (#26256)
- Fix incorrect is_between filter on scan_parquet (#26284)
- Lower AnonymousStreamingAgg in group-by as aggregate (#26258)
- Avoid overflow in pl.duration scalar arguments case (#26213)
- Broadcast arr.get on single array with multiple indices (#26219)
- Fix panic on CSPE with sorts (#26231)
- Fix UB in DataFrame::transpose_from_dtype (#26203)
- Eager DataFrame.slice with negative offset and length=None (#26215)
- Use correct schema side for streaming merge join lowering (#26218)
- Implement expression keys for merge-join (#26202)
- Overflow panic in scan_csv with multiple files and skip_rows + n_rows larger than total row count (#26128)
- Respect allow_object flag after cache (#26196)
- Raise error on non-elementwise PartitionBy keys (#26194)
- Allow ordered categorical dictionary in scan_parquet (#26180)
- Allow excess bytes on IPC bitmap compressed length (#26176)
- Address buggy quadratic scaling fix in scan_csv (#26175)
- Address a macOS-specific compile issue (#26172)
- Fix deadlock on hash_rows() of 0-width DataFrame (#26154)
- Fix NameError filtering pyarrow dataset (#26166)
- Fix concat_arr panic when using categoricals/enums (#26146)
- Fix NDJSON/scan_lines negative slice splitting with extremely long lines (#26132)
- Incorrect group_by min/max fast path (#26139)
- Remove a source of non-determinism from lowering (#26137)
- Error when with_row_index or unpivot create duplicate columns on a LazyFrame (#26107)
- Panics on shift with head (#26099)
- Optimize slicing support on compressed IPC (#26071)
- CPU check for musl builds (#26076)
- Fix slicing on compressed IPC (#26066)
- Release GIL on collect_batches (#26033)
- Missing buffer update in String is_in Parquet pushdown (#26019)
- Make struct.with_fields data model coherent (#25610)
- Incorrect output order for order sensitive operations after join_asof (#25990)
- Use SeriesExport for pyo3-polars FFI (#26000)
- Don't write Parquet min/max statistics for i128 (#25986)
- Ensure chunk consistency in in-memory join (#25979)
- Fix varying block metadata length in IPC reader (#25975)
- Implement collect_batches properly in Rust (#25918)
- Fix panic on arithmetic with bools in list (#25898)
- Convert to index type with strict cast in some places (#25912)
- Empty dataframe in streaming non-strict hconcat (#25903)
- Infer large u64 in json as i128 (#25904)
- Set http client timeouts to 10 minutes (#25902)
- Prevent panic when comparing Date with Duration types (#25856)
- Correct lexicographic ordering for Parquet BYTE_ARRAY statistics (#25886)
- Raise error on duplicate group_by names in upsample() (#25811)
- Correctly export view buffer sizes nested in Extension types (#25853)
- Fix DataFrame.estimated_size not handling overlapping chunks correctly (#25775)
- Ensure Kahan sum does not introduce NaN from infinities (#25850)
- Trim excess bytes in parquet decode (#25829)
- Reshape checks size to match exactly (#25571)
- Fix panic/deadlock sinking parquet with rows larger than 64MB estimated size (#25836)
- Fix quantile midpoint interpolation (#25824)
- Don't use cast when converting from physical in list.get (#25831)
- Invalid null count on int -> categorical cast (#25816)
- Update groups in list.eval (#25826)
- Use downcast before FFI conversion in PythonScan (#25815)
- Double-counting of row metrics (#25810)
- Cast nulls to expected type in streaming union node (#25802)
- Incorrect slice pushdown into map_groups (#25809)
- Fix panic writing parquet with single bool column (#25807)
- Fix upsample with group_by incorrectly introduced NULLs on group key columns (#25794)
- Panic in top_k pruning (#25798)
- Fix documentation for new() (#25791)
- Fix incorrect collect_schema for unpivot followed by join (#25782)
- Fix documentation for tail() (#25784)
- Verify arr namespace is called from array column (#25650)
- Ensure LazyFrame.serialize() unchanged after collect_schema() (#25780)
- Function map_(rows|elements) with return_dtype = pl.Object (#25753)
- Avoid visiting nodes multiple times in PhysicalPlanVisualizationDataGenerator (#25737)
- Fix incorrect cargo sub-feature (#25738)
- Fix deadlock on empty scan IR (#25716)
- Don't invalidate node in cluster-with-columns (#25714)
- Move boto3 extra from s3fs in dev requirements (#25667)
- Binary slice methods missing from Series and docs (#25683)
- Mix-up of variable_name/value_name in unpivot (#25685)
- Invalid usage of drop_first in to_dummies when nulls present (#25435)
- Rechunk on nested dtypes in take_unchecked_impl parallel path (#25662)
- New single file IO sink pipeline enabled for sink_parquet (#25670)
- Fix streaming SchemaMismatch panic on list.drop_nulls (#25661)
- Correct overly eager local predicate insertion for unpivot (#25644)
- Fix "dtype is unknown" panic in cross joins with literals (#25658)
- Fix panic on Boolean rolling_sum calculation for list or array eval (#25660)
- Preserve List inner dtype during chunked take operations (#25634)
- Fix panic edge-case when scanning hive partitioned data (#25656)
- Fix lifetime for AmortSeries lazy group iterator (#25620)
- Improve SQL GROUP BY and ORDER BY expression resolution, handling aliasing edge-cases (#25637)
- Fix empty format handling (#25638)
- Prevent false positives in is_in for large integers (#25608)
- Optimize projection pushdown through HConcat (#25371)
- Differentiate between empty list and no list for unpivot (#25597)
- Properly resolve HAVING clause during SQL GROUP BY operations (#25615)
- Fix spearman panicking on nulls (#25619)
- Increase precision when constructing float Series (#25323)
- Make sum on strings error in group_by context (#25456)
- Hang in multi-chunk DataFrame .rows() (#25582)
- Bug in boolean unique_counts (#25587)
- Set Float16 parquet schema type to Float16 (#25578)
- Correct arr_to_any_value for object arrays (#25581)
- Have PySeries::new_f16 receive pf16s instead of f32s (#25579)
- Fix occurrence of exact matches of .join_asof(strategy="nearest", allow_exact_matches=False, ...) (#25506)
- Raise error on out-of-range dates in temporal operations (#25471)
- Fix incorrect .list.eval after slicing operations (#25540)
- Reduce HuggingFace API calls (#25521)
- Strict conversion AnyValue to Struct (#25536)
- Fix panic in is_between support in streaming Parquet predicate push down (#25476)
- Always respect return_dtype in map_elements and map_rows (#25504)
- Rolling mean/median for temporals (#25512)
- Add .rolling_rank() support for temporal types and pl.Boolean (#25509)
- Fix dictionary replacement error in write_ipc() (#25497)
- Fix group lengths check in sort_by with AggregatedScalar (#25503)
- Fix expr slice pushdown causing shape error on literals (#25485)
- Allow empty list in sort_by in list.eval context (#25481)
- Prevent panic when joining sorted LazyFrame with itself (#25453)
- Apply CSV dict overrides by name only (#25436)
- Incorrect result in aggregated first/last with ignore_nulls (#25414)
- Fix off-by-one bug in ColumnPredicates generation for inequalities operating on integer columns (#25412)
- Fix arr.{eval,agg} in aggregation context (#25390)
- Support AggregatedList in list.{eval,agg} context (#25385)
- Improve SQL UNNEST behaviour (#22546)
- Remove ClosableFile (#25330)
- Use Cargo.template.toml to prevent git dependencies from using template (#25392)
- Resolve edge-case with SQL aggregates that have the same name as one of the GROUP BY keys (#25362)
- Revert pl.format behavior with nulls (#25370)
- Remove Expr casts in pl.lit invocations (#25373)
- Nested dtypes in streaming first_non_null/last_non_null (#25375)
- Correct eq_missing for struct with nulls (#25363)
- Unique on literal in aggregation context (#25359)
- Allow implode and aggregation in aggregation context (#25357)
- Aggregation with drop_nulls on literal (#25356)
- Address multiple issues with SQL OVER clause behaviour for window functions (#25249)
- Schema mismatch with list.agg, unique and scalar (#25348)
- Correct drop_items for scalar input (#25351)
- SQL NATURAL joins should coalesce the key columns (#25353)
- Mark {forward,backward}_fill as length_preserving (#25352)
- Nested dtypes in streaming first/last (#25298)
- AnyValue::to_physical for categoricals (#25341)
- Fix link errors reported by markdown-link-check (#25314)
- Parquet is_in for mixed validity pages (#25313)
- Fix length preserving check for eval expressions in streaming engine (#25294)
- Fix building polars-plan with features lazy,concat_str (but no strings) (#25306)
- Fix building polars-mem-engine with the async feature (#25300)
- Don't quietly allow unsupported SQL SELECT clauses (#25282)
- Fix small bug with PyExpr to PyObject conversion (#25265)
- Reverse on chunked struct (#25281)
- Panic exception when calling Expr.rolling in .over (#25283)
- Correct {first,last}_non_null if there are empty chunks (#25279)
- Incorrect results for aggregated {n_,}unique on bools (#25275)
- Fix building polars-expr without timezones feature (#25254)
- Ensure out-of-range integers and other edge case values don't give wrong results for index_of() (#24369)
- Correctly prune projected columns in hints (#25250)
- Allow Null dtype values in scatter (#25245)
- Correctly handle requested stops in streaming shift (#25239)
- Make str.json_decode output deterministic with lists (#25240)
- Wide-table join performance regression (#25222)
- Fix single-column CSV header duplication with leading empty lines (#25186)
- Enhanced column resolution/tracking through multi-way SQL joins (#25181)
- Fix serialization of lazyframes containing huge tables (#25190)
- Use (i64, u64) for VisualizationData (offset, length) slices (#25203)
- Fix assertion panic on group_by (#25179)
- Fix format_str in case of multiple chunks (#25162)
- Fix incorrect drop_nans() result when used in group_by() / over() (#25146)
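As background for the Kahan sum entry in the list above (#25850): compensated summation tracks a running error term, and once the total becomes infinite that term evaluates to inf - inf, which is NaN and then contaminates every later step. The sketch below reproduces the general failure mode in plain Python and shows one possible guard. It is an illustration only, not the code Polars uses; `kahan_sum_naive` and `kahan_sum_guarded` are hypothetical names.

```python
import math


def kahan_sum_naive(values):
    """Classic Kahan summation; turns an infinite input into NaN."""
    total = 0.0
    comp = 0.0  # running compensation for lost low-order bits
    for x in values:
        y = x - comp
        t = total + y
        comp = (t - total) - y  # inf - inf = NaN once t is infinite
        total = t
    return total


def kahan_sum_guarded(values):
    """Kahan summation that drops the compensation when the total is not finite."""
    total = 0.0
    comp = 0.0
    for x in values:
        y = x - comp
        t = total + y
        # Assumed guard: only keep the error term while it is meaningful.
        comp = (t - total) - y if math.isfinite(t) else 0.0
        total = t
    return total


data = [1.0, math.inf, 1.0]
naive_result = kahan_sum_naive(data)      # NaN
guarded_result = kahan_sum_guarded(data)  # inf, matching a plain sum
```

The guarded variant still returns NaN for inputs such as `[inf, -inf]`, where NaN is the mathematically expected answer.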
Documentation
- Fix typo in max_by docstring (#26404)
- Remove deprecated cublet_id (#26260)
- Update for new release (#26255)
- Update MCP server section with new URL (#26241)
- Fix unmatched paren and punctuation in pandas migration guide (#26251)
- Add observatory database_path to docs (#26201)
- Note plugins in Python user-defined functions (#26138)
- Clarify min_by/max_by behavior on ties (#26077)
- Add QUALIFY clause and SUBSTRING function to the SQL docs (#25779)
- Update mixed-offset datetime parsing example in user guide (#25915)
- Update bare-metal docs for mounted anonymous results (#25801)
- Fix credential parameter name in cloud-storage.py (#25788)
- Configuration options update (#25756)
- Fix typos in Excel and Pandas migration guides (#25709)
- Add "right" to how options in join() docstrings (#25678)
- Document schema parameter in meta methods (#25543)
- Correct link to datetime_range instead of date_range in resampling page (#25532)
- Explain aggregation & sorting of lists (#25260)
- Update LazyFrame.collect_schema() docstring (#25508)
- Remove lzo from parquet write options (#25522)
- Update on-premise documentation (#25489)
- Fix incorrect 'bitwise' in any_horizontal/all_horizontal docstring (#25469)
- Add Extension and BaseExtension to doc index (#25444)
- Add polars-on-premise documentation (#25431)
- Fix link errors reported by markdown-link-check (#25314)
- Fix LanceDB URL (#25198)
Build system
- Address remaining Python 3.14 issues with make requirements-all (#26195)
- Address a macOS-specific compile issue (#26172)
- Fix make fmt and make lint commands (#25200)
Other improvements
Thank you to all our contributors for making this release possible!
@AndreaBozzo, @Atarust, @DannyStoll1, @EndPositive, @JakubValtar, @Jesse-Bakker, @Kevin-Patyk, @LeeviLindgren, @MarcoGorelli, @Matt711, @MrAttoAttoAtto, @TNieuwdorp, @Voultapher, @WaffleLapkin, @agossard, @alex-gregory-ds, @alexander-beedie, @anosrepenilno, @arlyon, @azimafroozeh, @bayoumi17m, @borchero, @c-peters, @cBournhonesque, @camriddell, @carnarez, @cmdlineluser, @coastalwhite, @cr7pt0gr4ph7, @davanstrien, @davidia, @dependabot[bot], @dsprenkels, @edizeqiri, @eitanf, @etiennebacher, @feliblo, @gab23r, @guilhem-dvr, @hallmason17, @hamdanal, @henryharbeck, @hutch3232, @ion-elgreco, @itamarst, @jamesfricker, @jannickj, @jetuk, @jqnatividad, @kdn36, @lun3x, @m1guelperez, @marinegor, @mcrumiller, @nameexhaustion, @orlp, @pomo-mondreganto, @qxzcode, @r-brink, @ritchie46, @sachinn854, @stijnherfst, @sweb, @tlauli, @vyasr, @wtn, @yonikremer and dependabot[bot]