Delta Live Tables (DLT) – Slowly Changing Dimensions SCD Type 2

Delta Live Tables (DLT) is a declarative framework in Databricks for building reliable, automated ETL pipelines. It can automatically manage SCD Type 2 logic, tracking complete historical changes for dimension tables.

What SCD Type 2 Means

How DLT Implements SCD Type 2

DLT simplifies SCD2 by using the dlt.apply_changes function, which automatically:

Pipeline Structure

  1. A source table or stream (CDC feed or Silver table)
  2. A target dimension table in DLT
  3. A configuration using dlt.apply_changes with SCD Type 2 options

Example Source Table (Customer Changes)


@dlt.table
def customer_changes():
    return spark.readStream.table("silver.customer_changes")

Example Target Table (Dimension Seed)


@dlt.table
def dim_customer():
    return spark.read.table("dim.dim_customer_seed")

DLT SCD Type 2 Logic


dlt.apply_changes(
    target = "dim_customer",
    source = "customer_changes",
    keys = ["customer_id"],
    sequence_by = col("change_ts"),
    stored_as_scd_type = "2",
    track_history_columns = ["name", "address"]
)

Why Use DLT for SCD Type 2

Summary

  1. DLT reads change data from a source table or stream
  2. A dimension table is defined as a DLT-managed table
  3. dlt.apply_changes uses SCD2 logic to insert new versions and expire old ones
  4. The result is a fully historical, audit-friendly dimension table