New
v0.26.0
Added
- `path_filter` parameter in the `pw.io.s3.read` and `pw.io.minio.read` functions. It enables post-filtering of object paths using a wildcard pattern (`*`, `?`), allowing exclusion of paths that pass the main `path` filter but do not match `path_filter`.
- Input connectors now support backpressure control via `max_backlog_size`, allowing you to limit the number of read events in processing per connector. This is useful when the data source emits a large initial burst followed by smaller, incremental updates.
- `pw.reducers.count_distinct` and `pw.reducers.count_distinct_approximate` to count the number of distinct elements in a table. `pw.reducers.count_distinct_approximate` allows you to save memory by decreasing the accuracy. You can control this tradeoff with the `precision` parameter.
- `pw.Table.join` (and its variants) now has two additional parameters: `left_exactly_once` and `right_exactly_once`. If the elements from one side of a join should be joined exactly once, the `*_exactly_once` parameter of that side can be set to `True`. Then, after getting a match, an entry is removed from the join state and memory consumption is reduced.
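The wildcard semantics of `path_filter` described above (`*` matches any sequence of characters, `?` matches a single character) correspond to standard shell-style globbing. A minimal plain-Python sketch of that post-filtering step, using the standard `fnmatch` module rather than Pathway itself (the `post_filter` helper below is a hypothetical name for illustration):

```python
from fnmatch import fnmatchcase

def post_filter(paths, pattern):
    # Keep only the object paths matching the wildcard pattern:
    # * = any sequence of characters, ? = exactly one character.
    return [p for p in paths if fnmatchcase(p, pattern)]

paths = [
    "logs/2024/app.jsonl",
    "logs/2024/app.jsonl.gz",
    "logs/2024/metrics.csv",
]
# Only the plain .jsonl object survives the post-filter.
print(post_filter(paths, "logs/*/app.jsonl"))  # → ['logs/2024/app.jsonl']
```

In the connector, this filtering is applied after the main `path` listing, so it can only narrow down, never extend, the set of objects read.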
Changed
- Delta table compression logging has been improved: logs now include table names, and verbose messages have been streamlined while preserving details of important processing steps.
- Improved initialization speed of `pw.io.s3.read` and `pw.io.minio.read`.
- `pw.io.s3.read` and `pw.io.minio.read` now limit the number and the total size of objects to be predownloaded.
- BREAKING: optimized the implementation of the `pw.reducers.min`, `pw.reducers.max`, `pw.reducers.argmin`, `pw.reducers.argmax`, and `pw.reducers.any` reducers for append-only tables. It is a breaking change for programs using operator persistence. The persisted state will have to be recomputed.
- BREAKING: optimized the implementation of the `pw.reducers.sum` reducer on `float` and `np.ndarray` columns. It is a breaking change for programs using operator persistence. The persisted state will have to be recomputed.
- BREAKING: the implementation of data persistence has been optimized for the case of many small objects in filesystem and S3 connectors. It is a breaking change for programs using data persistence. The persisted state will have to be recomputed.
- BREAKING: the data snapshot logic in persistence has been optimized for the case of big input snapshots. It is a breaking change for programs using data persistence. The persisted state will have to be recomputed.
- Improved precision of `pw.reducers.sum` on `float` columns by introducing Neumaier summation.
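For reference, Neumaier summation is a variant of Kahan compensated summation: it carries a running compensation term so that small addends are not lost when accumulated into a much larger running total, and (unlike plain Kahan) it also handles the case where the addend is larger than the running sum. A plain-Python sketch of the algorithm, independent of Pathway's actual implementation:

```python
def neumaier_sum(values):
    # Compensated summation: `comp` accumulates the low-order bits
    # that floating-point addition would otherwise discard.
    total = 0.0
    comp = 0.0
    for x in values:
        t = total + x
        if abs(total) >= abs(x):
            comp += (total - t) + x  # low-order bits of x were lost
        else:
            comp += (x - t) + total  # low-order bits of total were lost
        total = t
    return total + comp

# Naive left-to-right summation loses the 1.0 entirely;
# the compensated version recovers it.
print(sum([1e16, 1.0, -1e16]))           # → 0.0
print(neumaier_sum([1e16, 1.0, -1e16]))  # → 1.0
```

This matters for streaming reducers because updates arrive incrementally, so rounding errors in a running `float` sum would otherwise accumulate over time.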