Databricks
Delta Live Tables (DLT), Managed Tables, and External Tables

Key Differences:

| Feature | Delta Live Tables (DLT) | Managed Tables | External Tables |
| --- | --- | --- | --- |
| Data Management | Managed pipelines with automation for data ingestion, transformation, and output | Fully managed by Databricks | Data stored externally; metadata managed by Databricks |
| Storage Location | Can use managed or external storage | Databricks File System (DBFS) or default cloud storage | External storage (e.g., S3, Azure Blob, HDFS) |
| Data Lifecycle | Lifecycle managed by DLT pipelines | Data is deleted when the table is dropped | Data remains after the table is dropped |
| Use Case | Automated ETL pipelines and real-time data processing | Temporary or internal datasets managed by Databricks | Persistent or shared datasets |
| Automation & Monitoring | Automated pipeline execution, monitoring, and quality checks | No built-in automation | No built-in automation |

1. Delta Live Tables (DLT)

Delta Live Tables (DLT) is a framework designed for building and managing ETL pipelines. It automates data processing, handles dependencies, and optimizes workflows in both batch and streaming data pipelines.

Example of Delta Live Tables Pipeline:


import dlt
from pyspark.sql.functions import col

# DLT materializes the returned DataFrame as a table named after the function.
@dlt.table
def clean_data():
    # spark.read is a DataFrameReader, so call load() with the source path,
    # then keep only rows where age is greater than 18.
    return spark.read.load("path/to/raw_data").filter(col("age") > 18)
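
The comparison table above also lists automated quality checks as a DLT feature; in DLT these are written as expectations. The sketch below is a minimal illustration, assuming the clean_data table from the previous example and an id column in the source data:

import dlt

# Expectation: drop any row whose id is NULL (the "id" column is assumed here
# for illustration). Violation counts appear in the pipeline's event log.
@dlt.table
@dlt.expect_or_drop("valid_id", "id IS NOT NULL")
def validated_data():
    # dlt.read() reads another table defined in the same DLT pipeline.
    return dlt.read("clean_data")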
    

2. Managed Tables

Managed tables are fully controlled by Databricks, which manages both the metadata and the underlying data storage. When you drop a managed table, both the metadata and the underlying data files are deleted.

Example of Creating a Managed Table:


CREATE TABLE my_managed_table (
    id INT,
    name STRING
) USING DELTA;
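
A managed table can also be created from a DataFrame in PySpark. The sketch below uses a made-up two-row DataFrame and the hypothetical table name my_managed_table_py; dropping the table afterwards removes the data files as well as the metadata:

# Save a small DataFrame as a managed Delta table; Databricks chooses the
# storage location (DBFS or the default cloud storage) automatically.
df = spark.createDataFrame([(1, "Alice"), (2, "Bob")], ["id", "name"])
df.write.format("delta").saveAsTable("my_managed_table_py")

# Dropping a managed table deletes both the metadata and the data files.
spark.sql("DROP TABLE my_managed_table_py")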
    

3. External Tables

External tables store their data outside Databricks, which manages only the metadata. The actual data remains in external storage, such as AWS S3, Azure Blob Storage, or HDFS, and is not deleted when the table is dropped.

Example of Creating an External Table:


CREATE TABLE my_external_table (
    id INT,
    name STRING
) USING DELTA LOCATION '/mnt/external_data/';
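
Because only the metadata lives in the metastore, dropping an external table leaves the files in place. Below is a minimal PySpark sketch of that behavior, assuming the table and location from the example above:

# Drop the external table: this removes only the metastore entry.
spark.sql("DROP TABLE my_external_table")

# The Delta files are still present at the external location and can be
# read directly by path (or registered as a table again later).
df = spark.read.format("delta").load("/mnt/external_data/")
df.show()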