v0.7.2
What's Changed
Features
- feat: Add name and path properties to daft.File @everettVT (#6024)
- feat(mcap): support topic_start_time_resolver and raw-bytes non-seekable reader @Jay-ju (#5886)
- feat: Add configurable token limits to OpenAI text embedder @kyo-tom (#6017)
- feat: Add guess_mime_type scalar expression for MIME type detection from bytes @copilot-swe-agent[bot] (#5883)
- feat: Channel-less intermediate op @colin-ho (#5999)
- feat: support dropping dimensions in /v1/embeddings requests @rchowell (#5988)
- feat: Add Apache Gravitino virtual file system (gvfs://) write support in io module @shaofengshi (#5965)
- feat: async embed image @colin-ho (#5833)
- feat: support delimiter, quote, header options for csv writer @stayrascal (#5794)
- feat(agg): support map_groups with v2 udf @Jay-ju (#5927)
- feat: index_col option in explode @aaron-ang (#5842)
- feat: support resumable stream @stayrascal (#5824)
- feat(functions): add shell_op for distributed shell execution @huleilei (#5738)
- feat: Add Apache Gravitino virtual file system (gvfs://) read support in io module @shaofengshi (#5766)
- feat: Support configuring Conda Env for Class UDFs in Flotilla @plotor (#5117)
- feat: image to tensor @aaron-ang (#5847)
- feat: support native writer via tos @stayrascal (#5760)
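
The guess_mime_type entry above concerns detecting a MIME type from raw bytes. As a rough illustration of that general idea only, here is a plain-Python sketch that matches well-known file signatures ("magic numbers"). The names `MAGIC_PREFIXES` and `sniff_mime_type` are hypothetical and are not part of Daft; how Daft's expression is actually implemented or called is not stated in these notes.

```python
from typing import Optional

# Hypothetical illustration: a tiny table of well-known file signatures.
# This is NOT Daft's implementation and covers only a handful of formats.
MAGIC_PREFIXES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"%PDF-", "application/pdf"),
]


def sniff_mime_type(data: bytes) -> Optional[str]:
    """Return a MIME type if the leading bytes match a known signature."""
    for prefix, mime in MAGIC_PREFIXES:
        if data.startswith(prefix):
            return mime
    # Unknown or too-short input: report no match rather than guessing.
    return None
```

Calling `sniff_mime_type` on the first bytes of a PDF or PNG file would return the matching type, and unrecognized input yields `None`.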
Bug Fixes
- fix(observability): Clean up progress bar naming @srilman (#6028)
- fix(video): correct keyframe seek timestamp calculation for start_time @huleilei (#6005)
- fix: ".*" not handled correctly in SQL planner @Lucas61000 (#5784)
- fix: Optimize the small files issue of sink lance @caican00 (#5844)
- fix: overriding dimensions for openai embedding models @kevinzwang (#6013)
- fix: support externally-hosted models via OpenAI-compatible API when using embed_text func @caican00 (#5873)
- fix: respect model dtype when overriding embedding dimensions @fenfeng9 (#5899)
- fix: Pass csv option into native writer @colin-ho (#6003)
- fix: Nonzero morsel upper bound @colin-ho (#5989)
- fix: Clean up UDF display name in progress bar and plans @srilman (#5810)
- fix(cast): handle whitespace in string-to-number casting @ykdojo (#5955)
- fix: Incorrect buffer pool calculation strategy when reading CSV @plotor (#5857)
- fix: Optimize the display information of Join nodes in query plan @plotor (#5617)
- fix: Fast failure when dashboard is enabled in Ray Runner @plotor (#5867)
- fix(ai): resolve intermittent meta tensor error in classify_text/classify_image @rohitkulshreshtha (#5977)
- fix(ci): cargo machete error that slipped through ci somehow @universalmind303 (#5975)
- fix: Daft.ai link checker to ignore X @everettVT (#5879)
- fix: Supporting fractional gpu count on class udf @caican00 (#5840)
- fix: remove flaky datasets from read_huggingface tests @everettVT (#5926)
- fix: allows appending nulls to lists, null is compatible with all types @rchowell (#5921)
- fix(test): read all splits in HuggingFace integration tests @ykdojo (#5878)
- fix(iceberg): Correct test setup to ensure delete files are created @huleilei (#5864)
- fix(ray): namespace flotilla actor per job to avoid plan id collisions @huleilei (#5855)
Performance
- perf: dont eval empty recordbatches @universalmind303 (#5968)
Refactor
- refactor(swordfish): Channel-less blocking sink @colin-ho (#6023)
- refactor(arrow2): rename validity to nulls to align with arrow-rs @universalmind303 (#6027)
- refactor(arrow-rs): remove usages of build_is_equal and replace with … @universalmind303 (#6018)
- refactor(arrow2): arrow based take kernels @universalmind303 (#6022)
- refactor(swordfish): Channel-less streaming sink @colin-ho (#6021)
- refactor(arrow2): remove makegrowable from fsl array @universalmind303 (#6019)
- refactor(flotilla): Swordfish task builder @colin-ho (#5976)
- refactor(arrow-rs): remove makegrowable from concat_agg.rs @universalmind303 (#6020)
- refactor(arrow2): refactor build_probe_table_without_nulls @universalmind303 (#6011)
- refactor(arrow2): rest of comparison.rs @kevinzwang (#6000)
- refactor: abstract TosRetrier to retry all tos operation @stayrascal (#5858)
- refactor(arrow2): functions-utf8 utils @universalmind303 (#5996)
- refactor(arrow2): parquet/read @universalmind303 (#5997)
- refactor(arrow2): remove some arrow2 based from impls @universalmind303 (#5995)
- refactor(arrow2): migrate apply and binaryapply functions @universalmind303 (#5994)
- refactor(arrow2): DaftCompare between two DataArrays @kevinzwang (#5964)
- refactor(arrow-rs): Remove arrow2 from daft-writers @srilman (#5985)
- refactor(arrow2): add new from_iter_values and arange impl @universalmind303 (#5984)
- refactor(arrow-rs): Remove arrow2 from daft-scan and related @srilman (#5974)
- refactor(arrow-rs): Remove arrow2 from sort kernels @desmondcheongzx (#5963)
- refactor(arrow-rs): Upgrade arrow-rs to 57.1.0 @srilman (#5969)
- refactor(arrow2): migrate float.rs to arrow-rs @rohitkulshreshtha (#5953)
- refactor(arrow2): use arrow for file arrays @universalmind303 (#5972)
- refactor(arrow2): count and array/mod @universalmind303 (#5973)
- refactor(arrow-rs): Remove arrow2 from daft-sketch @srilman (#5967)
- refactor(arrow2): migrate repeat.rs to arrow-rs @universalmind303 (#5928)
- refactor(arrow2): migrate left.rs to arrow-rs @universalmind303 (#5930)
- refactor(arrow2): migrate daft-image/src/ops.rs to arrow-rs @universalmind303 (#5940)
- refactor(arrow2): migrate daft-image/series.rs to arrow-rs @universalmind303 (#5939)
- refactor(arrow2): migrate endswith.rs to arrow-rs @universalmind303 (#5933)
- refactor(arrow2): migrate replace.rs to arrow-rs @universalmind303 (#5932)
- refactor(arrow2): migrate find.rs to arrow-rs @universalmind303 (#5929)
- refactor(arrow2): recordbatch ops @kevinzwang (#5962)
- refactor(arrow-rs): Remove arrow2 from WARC reader @srilman (#5948)
- refactor(arrow2): bool_agg @kevinzwang (#5959)
- refactor(arrow2): migrate functions-utf8 repeat, replace, to_datetime @cckellogg (#5961)
- refactor(arrow2): migrate functions-utf8 to_date.rs @cckellogg (#5960)
- refactor(arrow-rs): Remove arrow2 use in jq kernel @colin-ho (#5957)
- refactor(arrow2): migrate functions-utf8 substr.rs @cckellogg (#5958)
- refactor(arrow-rs): remove arrow2 from PartitionedWriter @cckellogg (#5951)
- refactor(arrow2): binary kernels @kevinzwang (#5956)
- refactor(arrow-rs): use arrow-rs for delete map @colin-ho (#5950)
- refactor(arrow2): Remove arrow2 from utf8 array ops @desmondcheongzx (#5954)
- refactor(arrow2): Migrate utf8.right to use arrow-rs instead of arrow2 @huleilei (#5889)
- refactor(arrow-rs): Move IPC conversion from arrow2 to arrow-rs @srilman (#5805)
- refactor(arrow2): migrate streaming_sink/vllm @universalmind303 (#5922)
- refactor(arrow2): add more deprecation markers @universalmind303 (#5917)
- refactor(arrow2): array & series to/from arrow @universalmind303 (#5848)
Documentation
- docs: new section for openai compatible providers @everettVT (#5748)
- docs: add tos config and a specific schema example in write lance @huleilei (#5992)
Tests
- test: Parametrize limit offset tests @colin-ho (#5971)
- test: Don't skip all pyarrow 8.0.0 tests @colin-ho (#5944)
- test: strip whitespace from process output for run_process test @kevinzwang (#5942)
Maintenance
- chore(observability): Refactor progress bar to remove RuntimeStatsSubscriber @srilman (#6030)
- chore: Use tokio channel for swordfish channels @colin-ho (#6035)
- chore: optimizes slow tests in CI/CD @rchowell (#6029)
- chore: Add workflow permissions @desmondcheongzx (#6014)
- chore: Upgrade nextjs @desmondcheongzx (#6015)
- chore: Update dependencies for dependabot alerts @desmondcheongzx (#6002)
- chore: uv lock check @colin-ho (#6010)
- chore: add some additional iterator methods for daft arrays @universalmind303 (#5937)
- chore: add deprecation warnings on other arrow2 arrays @kevinzwang (#5946)
- chore: ignore all markdown files inside .claude @universalmind303 (#5943)
- chore: ignore all prompting agents inside .claude dir @universalmind303 (#5935)
- chore: consolidates arrow-* crates as workspace dependencies @rchowell (#5923)
- chore: bump mypy and ruff in pre-commit @aaron-ang (#5836)
- chore: Add basic usage instructions for daft-dashboard in development guide @plotor (#5865)
Full Changelog: https://github.com/Eventual-Inc/Daft/compare/v0.7.1...v0.7.2