Delta Lake 3.3.0
We are excited to announce the release of Delta Lake 3.3.0! This release includes several new features.
Highlights
- [Delta Spark] Support for Identity Columns, which assign a unique value to each record inserted into a table.
- [Delta Spark] Support for VACUUM LITE, a faster mode for periodically run VACUUM commands.
- [Delta Spark] Support for Row Tracking Backfill, which alters an existing table to enable Row Tracking. Row Tracking allows engines such as Spark to track row-level lineage in Delta Lake tables.
- [Delta Spark] Support for enhanced table state validation with version checksums, and faster Snapshot initialization that uses these checksums.
- [Delta UniForm] Support for enabling UniForm Iceberg on existing tables using ALTER TABLE, without rewriting the data files.
- [Delta Kernel] Support for reading Delta tables that have Type Widening enabled.
Details by component
Delta Spark
Delta Spark 3.3.0 is built on Apache Spark™ 3.5.3. As with Apache Spark, we have released Maven artifacts for both Scala 2.12 and Scala 2.13.
- Documentation: https://docs.delta.io/3.3.0/index.html
- API documentation: https://docs.delta.io/3.3.0/delta-apidoc.html#delta-spark
- Maven artifacts: delta-spark_2.12, delta-spark_2.13, delta-contribs_2.12, delta-contribs_2.13, delta-storage, delta-storage-s3-dynamodb
- Python artifacts: https://pypi.org/project/delta-spark/3.3.0/
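To use these artifacts from PySpark, a session needs the Delta jar plus Delta's SQL extension and catalog. The sketch below assembles those settings for a chosen Scala build. The helper names are hypothetical; the extension and catalog class names are the ones the Delta quickstart uses for Delta-enabled Spark sessions.

```python
DELTA_VERSION = "3.3.0"
SUPPORTED_SCALA = ("2.12", "2.13")


def delta_spark_coordinate(scala_binary: str = "2.12", version: str = DELTA_VERSION) -> str:
    """Maven coordinate of the delta-spark artifact for one Scala binary version."""
    if scala_binary not in SUPPORTED_SCALA:
        raise ValueError(f"Delta {version} is published for Scala {SUPPORTED_SCALA}, not {scala_binary}")
    return f"io.delta:delta-spark_{scala_binary}:{version}"


def delta_session_conf(scala_binary: str = "2.12") -> dict:
    """Spark settings for a Delta-enabled session (hypothetical helper)."""
    return {
        # Resolve the delta-spark jar from Maven when the session starts.
        "spark.jars.packages": delta_spark_coordinate(scala_binary),
        # Register Delta's SQL commands and parser extensions.
        "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
        # Make the default session catalog Delta-aware.
        "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
    }
```

With PySpark installed, each key/value pair can be passed to `SparkSession.builder.config(key, value)` before `getOrCreate()`. The Scala suffix must match the Scala build of the Spark distribution; the PySpark 3.5 wheel from PyPI uses Scala 2.12. Alternatively, the delta-spark PyPI package provides `configure_spark_with_delta_pip`, which adds the matching jar to a builder.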
The key features of this release are:
- Support for Identity Columns: Delta Lake identity columns are a type of generated column that automatically assigns a unique value to each record inserted into a table, so users do not need to provide values for these columns explicitly when inserting data. They offer a simple and efficient way to generate unique keys for table rows. See the documentation for more information.
- Support for VACUUM LITE, a faster mode for periodically run VACUUM commands. Instead of listing all files in the table directory, VACUUM in LITE mode uses the Delta transaction log to identify and remove files that are no longer referenced by any table version within the retention duration.
- Support for Row Tracking Backfill: Row Tracking, which tracks row-level lineage in Delta Spark, can now be enabled on existing Delta Lake tables; previously it was only available for new tables. Use the ALTER TABLE table_name SET TBLPROPERTIES (delta.enableRowTracking = true) syntax to enable Row Tracking on an existing table. Once it is enabled, users can identify rows across multiple versions of the table and access this tracking information through the two metadata fields and . Refer to the documentation on Row Tracking for more information and examples.
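A minimal sketch of declaring an identity column, assuming the SQL form from the Delta identity-column documentation (`GENERATED ALWAYS AS IDENTITY` with optional `START WITH` and `INCREMENT BY`). The helper `create_identity_table` and its `run_sql` callback are hypothetical: in a Delta-enabled session you would pass `spark.sql`, but any function that accepts a SQL string works.

```python
from typing import Callable, Sequence


def create_identity_table(
    run_sql: Callable[[str], object],
    table: str,
    other_columns: Sequence[str],
    start: int = 1,
    step: int = 1,
) -> str:
    """Create a Delta table whose BIGINT `id` column is filled in automatically."""
    if step == 0:
        # A step of 0 could never produce distinct values.
        raise ValueError("INCREMENT BY must be non-zero")
    columns = [
        f"id BIGINT GENERATED ALWAYS AS IDENTITY (START WITH {start} INCREMENT BY {step})",
        *other_columns,
    ]
    ddl = f"CREATE TABLE {table} ({', '.join(columns)}) USING delta"
    run_sql(ddl)  # e.g. spark.sql(ddl)
    return ddl
```

After `create_identity_table(spark.sql, "events", ["name STRING"])`, rows are inserted without the `id` column, for example `INSERT INTO events (name) VALUES ('a'), ('b')`, and Delta fills in `id`. Identity values are unique but not guaranteed to be consecutive; a `GENERATED BY DEFAULT AS IDENTITY` variant exists for columns that should also accept explicitly supplied values.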
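The LITE strategy described above can be modeled in a few lines. This is a simplified sketch, not Delta's implementation: the `Commit` record, the hour-based timestamps, and the function name are hypothetical, and real remove actions carry their own deletion timestamps. It decides what is deletable purely from the log, which also shows the trade-off: a file that never appeared in the log (for example, one left behind by an aborted write) is invisible to this approach, while the default mode lists the table directory.

```python
from dataclasses import dataclass, field


@dataclass
class Commit:
    version: int
    timestamp_hours: float                     # commit time, in hours on some fixed clock
    adds: set = field(default_factory=set)     # data files added by this commit
    removes: set = field(default_factory=set)  # data files logically removed by this commit


def lite_vacuum_candidates(log, now_hours, retention_hours=168):
    """Files mentioned in the log that no version inside the retention window can read."""
    cutoff = now_hours - retention_hours
    mentioned = set()  # every data file the log knows about; the directory is never listed
    protected = set()  # files readable by some version still inside the retention window
    live = set()       # files in the snapshot as of the current commit
    for commit in sorted(log, key=lambda c: c.version):
        mentioned |= commit.adds | commit.removes
        live = (live - commit.removes) | commit.adds
        if commit.timestamp_hours <= cutoff:
            # Only the newest snapshot at or before the cutoff remains reachable by time travel.
            protected = set(live)
        else:
            protected |= live
    return sorted(mentioned - protected)
```

In SQL the mode is selected per command, for example `VACUUM events LITE`; the default retention of 168 hours in the sketch mirrors Delta's default seven-day retention duration.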
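Conceptually, backfilling gives every pre-existing row an identifier. The sketch below models this with the `baseRowId` and row-ID high-water-mark scheme described in the Row Tracking section of the Delta protocol, where a row's fresh ID is its file's base ID plus the row's position in the file. It is a simplified illustration with hypothetical names, not the code path the ALTER TABLE command runs.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class AddFile:
    path: str
    num_records: int
    base_row_id: Optional[int] = None  # None: written before Row Tracking was enabled


def backfill_base_row_ids(
    files: List[AddFile], high_water_mark: int = -1
) -> Tuple[List[AddFile], int]:
    """Assign a base row ID to each file that lacks one.

    IDs are handed out upward from high_water_mark + 1, so they never collide with
    IDs already assigned. Returns the updated files and the new high-water mark.
    """
    updated = []
    for f in files:
        if f.base_row_id is None:
            f = replace(f, base_row_id=high_water_mark + 1)
            high_water_mark += f.num_records
        updated.append(f)
    return updated, high_water_mark


def fresh_row_ids(f: AddFile) -> List[int]:
    """Row ID of each row = the file's base row ID + the row's position in the file."""
    if f.base_row_id is None:
        raise ValueError(f"{f.path} has not been backfilled")
    return [f.base_row_id + i for i in range(f.num_records)]
```

The sketch covers only the assignment step; it does not model how Delta preserves these IDs when rows are later rewritten by operations such as UPDATE or MERGE.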
Other notable changes include:
- Protocol upgrade/downgrade improvements
  - Support dropping table features for columnMapping, vacuumProtocolCheck, and checkConstraints.
  - Improve table protocol transitions to simplify the critical user journey (CUJ) of altering the table protocol.
  - Support protocol version downgrades when all of the table's existing features are supported by the lower protocol version.
  - Update protocol upgrade behavior so that when a legacy feature is enabled via a table property (e.g. setting delta.enableChangeDataFeed=true), the protocol is upgraded to (1,7) and only that legacy feature is enabled. Previously, the minimum protocol version was selected and all preceding legacy features were enabled.
- Support enabling a table feature on a table using the Python DeltaTable API with .
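The change to protocol upgrades can be illustrated with a small model. The sketch is restricted to the writer-only legacy features introduced up to writer version 4 (column mapping and identity columns are left out because their legacy protocols also require reader version 2), and the function names are hypothetical.

```python
from typing import Dict, Set, Tuple

# Writer-only legacy features and the writer version that first supported each one.
LEGACY_WRITER_FEATURES: Dict[str, int] = {
    "appendOnly": 2,
    "invariants": 2,
    "checkConstraints": 3,
    "changeDataFeed": 4,
    "generatedColumns": 4,
}

Protocol = Tuple[int, int]  # (minReaderVersion, minWriterVersion)


def previous_upgrade(feature: str) -> Tuple[Protocol, Set[str]]:
    """Old behavior: pick the smallest legacy protocol that covers the feature.

    A legacy protocol implicitly carries every feature at or below its writer version.
    """
    writer = LEGACY_WRITER_FEATURES[feature]
    carried = {name for name, version in LEGACY_WRITER_FEATURES.items() if version <= writer}
    return (1, writer), carried


def new_upgrade(feature: str) -> Tuple[Protocol, Set[str]]:
    """3.3.0 behavior: move to table features (writer version 7) with only this feature."""
    if feature not in LEGACY_WRITER_FEATURES:
        raise KeyError(feature)
    return (1, 7), {feature}
```

For `changeDataFeed`, `previous_upgrade` yields `(1, 4)` together with appendOnly, invariants, checkConstraints, and generatedColumns, whereas `new_upgrade` yields `(1, 7)` with changeDataFeed alone.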
Delta Universal Format (UniForm)
- Documentation: https://docs.delta.io/3.3.0/delta-uniform.html
- Maven artifacts: delta-iceberg_2.12, delta-iceberg_2.13, delta-hudi_2.12, delta-hudi_2.13
You can now enable UniForm Iceberg on existing Delta tables without rewriting data files, and then read the table seamlessly downstream in Iceberg clients such as Spark and Snowflake. See "Enable by altering an existing table" in the UniForm documentation.
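A minimal sketch of the ALTER TABLE step, assuming the two table properties given in the Delta UniForm documentation (`delta.enableIcebergCompatV2` and `delta.universalFormat.enabledFormats`). The helper name and the `run_sql` callback are hypothetical (pass `spark.sql` in a real session), and the sketch does not check the prerequisites that the documentation lists for the table.

```python
from typing import Callable

# Table properties for enabling UniForm Iceberg on an existing Delta table.
UNIFORM_ICEBERG_PROPERTIES = {
    "delta.enableIcebergCompatV2": "true",
    "delta.universalFormat.enabledFormats": "iceberg",
}


def enable_uniform_iceberg(run_sql: Callable[[str], object], table: str) -> str:
    """Issue ALTER TABLE ... SET TBLPROPERTIES for UniForm Iceberg and return the statement."""
    assignments = ", ".join(
        f"'{key}' = '{value}'" for key, value in UNIFORM_ICEBERG_PROPERTIES.items()
    )
    statement = f"ALTER TABLE {table} SET TBLPROPERTIES ({assignments})"
    run_sql(statement)  # e.g. spark.sql(statement) in a Delta-enabled session
    return statement
```

Because the change is made through table properties rather than a rewrite, the existing Parquet data files stay in place; only table metadata is added for Iceberg readers.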
Other notable changes include:
- Support Timestamp-type partition columns for UniForm Iceberg.
- Support automatically running expireSnapshot on the UniForm Iceberg table to clean up old manifests whenever OPTIMIZE is run on the Delta table.
- Support retries for the Delta UniForm Iceberg conversion.
- Support list and map data types for UniForm Hudi.
- Miscellaneous bug fixes
Delta Kernel
- API documentation: https://docs.delta.io/3.3.0/api/java/kernel/index.html
- Maven artifacts: delta-kernel-api, delta-kernel-defaults
The Delta Kernel project is a set of Java and Rust libraries for building Delta connectors that can read from and write to Delta tables without needing to understand the details of the Delta protocol.
This release of Delta Kernel Java contains the following changes:
- Delta Kernel Java and Rust now support reading Delta tables that have Type Widening enabled. The default ParquetHandler implementations in both Delta Kernel libraries can read tables to which any of the type changes covered by the feature have been applied.
- Support cleaning up expired log files as part of checkpointing.
- Support data skipping on timestamp and timestamp_ntz type columns.
- Support writing to tables with the inCommitTimestamp table feature enabled.
- Other notable read-side changes:
  - Support reading tables whose schemas have field metadata of long type.
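Reading a table with Type Widening mostly means upcasting old data: Parquet files written before a type change still store the narrower type, so the reader converts their values to the type in the current table schema. The sketch below models that check for an illustrative subset of widenings (byte to short to integer, and float to double); it is not Kernel's ParquetHandler, the type names are plain strings, and the full list of covered changes is in the Type Widening section of the Delta protocol.

```python
# Illustrative subset of the widenings covered by Type Widening.
_INTEGRAL_ORDER = ["byte", "short", "integer"]


def is_widening(file_type: str, table_type: str) -> bool:
    """True if values stored as file_type can be read losslessly as table_type."""
    if file_type == table_type:
        return True
    if file_type in _INTEGRAL_ORDER and table_type in _INTEGRAL_ORDER:
        return _INTEGRAL_ORDER.index(file_type) < _INTEGRAL_ORDER.index(table_type)
    return (file_type, table_type) == ("float", "double")


def read_column(values, file_type: str, table_type: str) -> list:
    """Return a column's values converted to the current table type."""
    if file_type == table_type:
        return list(values)  # file already matches the schema
    if not is_widening(file_type, table_type):
        raise TypeError(f"cannot read {file_type} data as {table_type}")
    if table_type == "double":
        return [float(v) for v in values]
    # Python ints are unbounded, so an integral upcast needs no value change here.
    return [int(v) for v in values]
```

A narrowing request raises instead of silently truncating, which mirrors why the feature only permits widening changes: every value already written remains representable in the new type.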
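Data skipping on timestamp columns follows the usual min/max pattern: a file can be skipped when the value range recorded in its per-file statistics cannot overlap the query predicate. A minimal sketch with hypothetical names for a predicate `lower <= ts < upper`; Kernel's actual implementation evaluates predicates against the statistics stored in the Delta log and handles details that this sketch ignores.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class FileStats:
    path: str
    min_ts: datetime  # smallest value of the timestamp column in this file
    max_ts: datetime  # largest value of the timestamp column in this file


def files_to_scan(files: List[FileStats], lower: datetime, upper: datetime) -> List[str]:
    """Keep files whose [min_ts, max_ts] range may contain a value with lower <= ts < upper."""
    return [f.path for f in files if f.max_ts >= lower and f.min_ts < upper]
```

Skipping is conservative: a kept file may still contain no matching rows, but a skipped file is guaranteed to contain none, so results stay correct.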
Other projects
Delta Sharing Spark
- Upgrade delta-sharing-client to 1.2.2 from 1.1.1.
- Support typeWidening, variantType, and timestampNtz table features for Delta Format Sharing.
Delta Storage
- Fix an issue where the S3DynamoDBLogStore (used for safe, concurrent multi-cluster writes to S3) would make extraneous GET calls to DynamoDB during Delta VACUUM operations, impacting performance.
Delta Flink
- Support partition columns of Date type in the Delta Sink.
Credits
Abhishek Radhakrishnan, Adam Binford, Alden Lau, Aleksei Shishkin, Alexey Shishkin, Allison Portis, Ami Oka, Amogh Jahagirdar, Andreas Chatzistergiou, Andrew Xue, Anish, Annie Wang, Avril Aysha, Bart Samwel, Burak Yavuz, Carmen Kwan, Charlene Lyu, ChengJi-db, Chirag Singh, Christos Stavrakakis, Cuong Nguyen, Dhruv Arya, Eduard Tudenhoefner, Felipe Pessoto, Fokko Driesprong, Fred Storage Liu, Hao Jiang, Hyukjin Kwon, Jacek Laskowski, Jackie Zhang, Jade Wang, James DeLoye, Jiaheng Tang, Jintao Shen, Johan Lasperas, Juliusz Sompolski, Jun, Jungtaek Lim, Kaiqi Jin, Kam Cheung Ting, Krishnan Paranji Ravi, Lars Kroll, Leon Windheuser, Lin Zhou, Liwen Sun, Lukas Rupprecht, Marko Ilić, Matt Braymer-Hayes, Maxim Gekk, Michael Zhang, Ming DAI, Mingkang Li, Nils Andre, Ole Sasse, Paddy Xu, Prakhar Jain, Qianru Lao, Qiyuan Dong, Rahul Shivu Mahadev, Rajesh Parangi, Rakesh Veeramacheneni, Richard Chen, Richard-code-gig, Robert Dillitz, Robin Moffatt, Ryan Johnson, Sabir Akhadov, Scott Sandre, Sergiu Pocol, Shawn Chang, Shixiong Zhu, Sumeet Varma, Tai Le Manh, Taiga Matsumoto, Tathagata Das, Thang Long Vu, Tom van Bussel, Tulio Cavalcanti, Venki Korukanti, Vishwas Modhera, Wenchen Fan, Yan Zhao, YotillaAntoni, Yumingxuan Guo, Yuya Ebihara, Zhipeng Mao, Zihao Xu, zzl-7