Data Lineage in Unity Catalog

Overview

Data lineage in Unity Catalog automatically tracks the flow and transformations of data across your organization. It captures relationships between tables, columns, and processes, enabling impact analysis, compliance audits, and debugging.

Types of Lineage

Automatic Lineage Capture

Unity Catalog automatically captures lineage from:

No additional configuration or instrumentation is required — lineage is captured transparently.

Viewing Lineage in the UI

In the Databricks workspace:

  1. Navigate to Catalog (left sidebar)
  2. Expand a catalog and schema, then click on a table name
  3. Click the Lineage tab at the top
  4. The lineage graph shows upstream (sources) and downstream (consumers) tables
  5. Hover over nodes to see table names; click to navigate to that table

Querying Lineage via System Tables

For programmatic access to lineage data, query the system tables in your metastore.

Table-Level Lineage

-- View upstream dependencies (which tables feed into my_table)
SELECT
  upstream_table_id,
  upstream_schema_id,
  upstream_catalog_id,
  downstream_table_id,
  creation_time
FROM system.access.table_lineage
WHERE downstream_table_full_name = 'finance_prod.transactions.orders'
ORDER BY creation_time DESC;

Column-Level Lineage

-- View column-level dependencies
SELECT
  upstream_catalog,
  upstream_schema,
  upstream_table,
  upstream_column,
  downstream_catalog,
  downstream_schema,
  downstream_table,
  downstream_column,
  created_at
FROM system.access.column_lineage
WHERE downstream_table_full_name = 'finance_prod.transactions.order_summary'
ORDER BY created_at DESC;

Find All Downstream Consumers

-- Identify all tables that depend on a specific source table
SELECT DISTINCT downstream_table_full_name
FROM system.access.table_lineage
WHERE upstream_table_full_name = 'finance_prod.transactions.orders';

Use Cases

Impact Analysis

When modifying a table schema or transforming data, query lineage to identify all downstream tables that will be affected. Prioritize testing and communication accordingly.

-- "If I change the orders table, what else breaks?"
WITH RECURSIVE impacted AS (
  SELECT downstream_table_full_name
  FROM system.access.table_lineage
  WHERE upstream_table_full_name = 'finance_prod.transactions.orders'
)
SELECT * FROM impacted;

Compliance and Audit Trails

For regulatory requirements (GDPR, CCPA, SOX), lineage proves how sensitive data flows through the organization. Use lineage to demonstrate that only authorized systems access protected data.

Data Quality Debugging

When downstream reports show unexpected values, trace lineage backward to identify where corrupted or incorrect data originated. Narrow down which transformation step introduced the issue.

Catalog Discovery

New team members can explore related tables by following lineage chains, understanding data transformations and dependencies without manual documentation.

Lineage Limitations and Considerations