Delta Live Tables (DLT) in Databricks

Delta Live Tables (DLT) is a Databricks framework for building reliable, scalable data pipelines with a declarative approach: you define the tables and the transformations between them, and DLT manages orchestration and dependencies. It is built on top of Delta Lake and simplifies creating, managing, and monitoring data pipelines.

Key Features of Delta Live Tables

- Declarative pipeline definition: tables are defined as Python functions (or SQL queries) decorated with @dlt.table, and DLT infers the execution order from the dependencies between them.
- Data quality enforcement: expectations (@dlt.expect and related decorators) validate rows and can warn, drop invalid records, or fail the pipeline on violations, as sketched below.
- Batch and streaming support within the same pipeline.
- Built-in monitoring: each run records lineage, data quality metrics, and an event log.
- Automatic infrastructure management: DLT handles cluster lifecycle, retries, and error recovery.

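To make the data quality feature concrete, here is a minimal sketch of an expectation attached to a table definition. The table name validated_users, the rule name valid_age, and the source table users_raw are illustrative assumptions, not part of the pipeline shown later.

import dlt

# A minimal sketch of DLT expectations; "users_raw" is a hypothetical
# upstream table used only for illustration.
@dlt.table(comment="Rows that pass the age quality check")
@dlt.expect_or_drop("valid_age", "age IS NOT NULL AND age > 18")
def validated_users():
    # Rows failing the expectation are dropped, and the drop count is
    # surfaced in the pipeline's data quality metrics.
    return dlt.read("users_raw")
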
Example Pipeline Workflow


import dlt
from pyspark.sql.functions import col, count

# Ingest raw data (the path and JSON format are placeholders)
@dlt.table
def raw_data():
    return spark.read.format("json").load("path/to/raw_data")

# Clean data by filtering out rows with invalid ages;
# dlt.read resolves the dependency on the raw_data table
@dlt.table
def clean_data():
    return dlt.read("raw_data").filter(col("age") > 18)

# Aggregate the cleaned data into per-country user counts
@dlt.table
def aggregated_data():
    return dlt.read("clean_data").groupBy("country").agg(count("*").alias("user_count"))

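Note that this code is not run directly: it is attached to a DLT pipeline, which resolves the raw_data -> clean_data -> aggregated_data dependency chain and materializes each table in order. The same pipeline can also ingest data incrementally. Below is a hedged sketch using Auto Loader (the cloudFiles streaming source); the landing path and the JSON format are assumptions for illustration.

import dlt

# Returning a streaming DataFrame from @dlt.table creates a streaming
# table that DLT keeps up to date as new files arrive.
@dlt.table
def raw_events():
    return (
        spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("path/to/landing_zone")
    )
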
Use Cases for Delta Live Tables

- Building medallion (bronze/silver/gold) ETL pipelines on Delta Lake.
- Continuously ingesting streaming data from cloud storage.
- Enforcing data quality rules before data reaches downstream consumers.
- Maintaining aggregated, analytics-ready tables that refresh incrementally.