Delta Lake

Delta Lake is one of the three major open table formats, alongside Apache Hudi and Apache Iceberg. It originated at Databricks, was open-sourced in 2019, and was donated to the Linux Foundation later that year. It adds ACID transactions, schema enforcement, time travel, and incremental processing to Parquet files on object storage. Delta is the native table format on Databricks and is increasingly treated as first-class on Snowflake, Microsoft Fabric, and AWS.

Key Features:

- ACID transactions over object storage, driven by an ordered transaction log
- Schema enforcement on write, with opt-in schema evolution
- Time travel: query a table as of an earlier version or timestamp
- Incremental processing: batch and streaming reads and writes against the same table
- DML on Parquet data: MERGE, UPDATE, and DELETE

Architecture:

A Delta table is a directory of Parquet data files plus a _delta_log/ folder of transaction-log entries. Each commit appends a numbered JSON file describing the added and removed Parquet files; periodic Parquet checkpoints summarize the log so readers don’t have to replay every entry. Optimistic concurrency control on the log gives serializable writes without a coordinator.
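The log replay described above can be sketched in a few lines. This is a simplified model, not the full Delta protocol: each numbered JSON commit is shown as one action per line, with only `add` and `remove` actions carrying a file `path` (real logs also include `metaData`, `protocol`, and `commitInfo` actions, and checkpoints collapse the replay). Applying the commits in order yields the set of live Parquet files:

```python
import json

# Simplified _delta_log commits: one JSON action per line, in commit order.
commits = [
    # 00000000000000000000.json — initial write adds two data files
    ['{"add": {"path": "part-0000.parquet"}}',
     '{"add": {"path": "part-0001.parquet"}}'],
    # 00000000000000000001.json — compaction rewrites them into one file
    ['{"remove": {"path": "part-0000.parquet"}}',
     '{"remove": {"path": "part-0001.parquet"}}',
     '{"add": {"path": "part-0002.parquet"}}'],
]

def replay(commits):
    """Return the set of live data files after applying every commit in order."""
    live = set()
    for commit in commits:
        for line in commit:
            action = json.loads(line)
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                live.discard(action["remove"]["path"])
    return live

print(sorted(replay(commits)))  # ['part-0002.parquet']
```

Replaying only a prefix of the log is what makes time travel cheap: version 0 of this table is just `replay(commits[:1])`.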
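The optimistic concurrency control can likewise be sketched as a put-if-absent write of the next numbered log file: a writer computes the next commit version and wins only if no other writer created that file first. This is a hypothetical illustration using a local directory and `O_EXCL` in place of an object store's conditional-put primitive; `try_commit` is an invented helper, not a Delta API:

```python
import os
import tempfile

def try_commit(log_dir: str, version: int, payload: str) -> bool:
    """Atomically create the commit file for `version`; fail if it exists.

    Mirrors the put-if-absent semantics an object store must provide for
    serializable Delta writes without a separate coordinator."""
    path = os.path.join(log_dir, f"{version:020d}.json")
    try:
        # O_CREAT | O_EXCL makes creation atomic: exactly one writer wins.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False  # lost the race: re-read the log, rebase, retry at version + 1
    with os.fdopen(fd, "w") as f:
        f.write(payload)
    return True

log_dir = tempfile.mkdtemp()
first = try_commit(log_dir, 0, '{"add": {"path": "part-0000.parquet"}}')
second = try_commit(log_dir, 0, '{"add": {"path": "part-0001.parquet"}}')
print(first, second)  # True False — the second writer must rebase and retry
```

A losing writer does not fail outright: it re-reads the commits it missed, checks that its changes still apply, and retries at the next version number.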

Use Cases:

- Lakehouse tables serving BI and ML workloads from a single copy of the data
- Streaming pipelines that need a transactional, exactly-once sink
- CDC and upsert-heavy workloads via MERGE
- Reproducibility and auditing via time travel over table versions

For deeper Databricks-specific patterns — managed vs. external tables, Delta Live Tables, and SCD Type 2 — see the Databricks Delta Lake reference.