Liquid Clustering in Delta Lake

Liquid Clustering is Delta Lake's modern data layout strategy — the successor to Hive-style partitioning and to Z-Ordering. It addresses two longstanding pain points in Lakehouse engineering: partitioning's brittleness when access patterns change, and Z-Order's expensive, all-or-nothing rewrites. Liquid Clustering is incremental, has no fixed boundaries, and lets you change cluster columns without rebuilding the table.

For most new Delta tables, Liquid Clustering is the right default. For most existing tables that suffer from small-file problems or partition skew, it is the right migration target.


1. Why Hive-Style Partitioning Fails

Hive-style partitioning physically subdivides a table into directories by the values of one or more columns: /orders/year=2025/month=04/day=22/. This works beautifully when queries always filter on the partition columns and the cardinality is moderate. It collapses in three failure modes that show up at scale:

  1. High cardinality. Every distinct value combination becomes a directory, and each directory holds at least one file, so small files and listing overhead explode.
  2. Skew. One hot value (a large customer, a busy region) produces a giant partition next to thousands of near-empty ones, and no OPTIMIZE can rebalance across a directory boundary.
  3. Changing access patterns. Queries that do not filter on the partition columns scan everything, and changing the partition columns requires rewriting the entire table.

Once partition cardinality goes above a few thousand distinct values, or once any column with skew is used as a partition, performance degrades. The fix is usually to stop partitioning and let the data layout be managed automatically.
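A quick back-of-envelope shows how the cardinality failure mode compounds. The numbers below are hypothetical, chosen only to illustrate the multiplicative blowup:

```python
# Illustrative arithmetic: partition (directory) count grows multiplicatively.
days = 3 * 365          # three years of daily partitions
regions = 50            # moderate-cardinality second partition column
customers = 100_000     # high-cardinality column someone adds "for speed"

date_only = days
date_region = days * regions
date_customer = days * customers

print(f"date only:       {date_only:>12,} partitions")
print(f"date x region:   {date_region:>12,} partitions")
print(f"date x customer: {date_customer:>12,} partitions")
# Each partition holds at least one file, so the last layout guarantees
# roughly 100M tiny files regardless of how much data actually exists.
```

The same column that destroys a partition layout (customer_id here) is a perfectly safe Liquid cluster key, which is the core of the argument below.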

2. Z-Order Limitations

Z-Order was Delta's first-generation answer to "I want fast filters on more than one column without partitioning by all of them." It uses a space-filling curve to colocate rows by multiple columns within each file, then relies on Delta's data-skipping min/max statistics to prune files at query time.
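The space-filling-curve idea can be sketched with Morton (Z-order) codes: interleave the bits of two column values so that rows close in both dimensions get nearby sort keys. This is an illustration of the concept, not Delta's internal implementation:

```python
def morton_2d(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of x and y into a single Z-order sort key."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)       # x bits land at even positions
        z |= ((y >> i) & 1) << (2 * i + 1)   # y bits land at odd positions
    return z

# Rows close in both columns get close keys, so sorting by the key
# colocates them in the same files and keeps min/max stats tight.
rows = [(200, 7), (3, 6), (201, 7), (3, 5)]
rows.sort(key=lambda r: morton_2d(*r))
```

After the sort, the two `(3, …)` rows and the two `(20x, 7)` rows end up adjacent, which is exactly the colocation that makes min/max file pruning effective on either column.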

Two structural limitations:

  1. Not incremental. OPTIMIZE ... ZORDER BY rewrites every file it touches on each run, and newly written data is unclustered until the next run, so clustering quality decays continuously between runs.
  2. No persistent layout declaration. The Z-Order columns are an argument to the OPTIMIZE command, not a property of the table, so every maintenance job must repeat them, and changing them means another full rewrite of the affected data.

Z-Order is still useful in narrow situations (one-shot historical optimization on a static table) but it is no longer the recommended pattern for evolving tables.

3. What Liquid Clustering Is

Liquid Clustering replaces both partitioning and Z-Order with a single mechanism. The cluster columns are stored in table metadata. Files are grouped into clusters by those columns, and Delta maintains the layout incrementally as data is written or compacted — new files are clustered in place, and only files that need rebalancing are touched. There are no directory boundaries, no skew traps, and no rewrite of the entire table when the layout drifts.

Properties to internalize:

  - The cluster columns are table metadata, not a directory structure; nothing breaks when they change.
  - Maintenance is incremental: OPTIMIZE rewrites only the files that need rebalancing, never the whole table.
  - High-cardinality and skewed columns are safe cluster keys; there are no per-value directories to explode.
  - Clustering works with row-level concurrency, so concurrent writers conflict less than under file-level partition layouts.

4. Enabling Liquid Clustering

On a New Table

Use CLUSTER BY in CREATE TABLE. Do not also use PARTITIONED BY — the two are mutually exclusive on the same table.


CREATE TABLE main.sales.orders (
  order_id        STRING NOT NULL,
  customer_id     STRING NOT NULL,
  order_ts        TIMESTAMP NOT NULL,
  region          STRING,
  product_sku     STRING,
  amount_usd      DECIMAL(18,2),
  status          STRING
)
CLUSTER BY (customer_id, order_ts);

Picking customer_id first and order_ts second optimizes for the common access pattern "give me everything for customer X in the last 30 days," which is the dominant pattern in transactional analytics.

On an Existing Table

Convert in two steps: declare cluster columns, then run OPTIMIZE to migrate the data layout. The conversion is online — readers and writers continue to operate.


-- Cannot apply CLUSTER BY directly to a partitioned table without first dropping the partition spec.
-- For an unpartitioned table, you can simply alter:
ALTER TABLE main.sales.orders
  CLUSTER BY (customer_id, order_ts);

-- Migrate existing files to the new layout incrementally
OPTIMIZE main.sales.orders;

To change cluster columns later, alter the table again. New writes immediately use the new columns; subsequent OPTIMIZE runs migrate older files.


ALTER TABLE main.sales.orders
  CLUSTER BY (region, order_ts);

OPTIMIZE main.sales.orders;

To stop clustering entirely (rare):


ALTER TABLE main.sales.orders CLUSTER BY NONE;

Python Equivalents


# Create a clustered table from a DataFrame using DataFrameWriterV2
(df.writeTo("main.sales.orders")
   .using("delta")
   .clusterBy("customer_id", "order_ts")
   .create())

# Or via the spark.sql shorthand
spark.sql("""
  CREATE TABLE main.sales.orders_v2 (
    order_id STRING, customer_id STRING, order_ts TIMESTAMP, amount_usd DECIMAL(18,2)
  )
  CLUSTER BY (customer_id, order_ts)
""")

# Run OPTIMIZE
spark.sql("OPTIMIZE main.sales.orders")

5. Choosing Cluster Columns

The rule of thumb: pick the columns that appear in WHERE clauses or join keys for the queries you care about. Within that set, prefer:

  - Columns filtered in the most frequent or highest-value queries, listed most-selective first.
  - High-cardinality columns such as IDs and timestamps; these are exactly the columns partitioning cannot handle.
  - Columns that are filtered together, so one layout serves the combined predicate.

Avoid:

  - Columns that rarely appear in predicates; they dilute the layout without speeding anything up.
  - More than about four columns; each added dimension weakens colocation on all the others.
  - Very low-cardinality columns (booleans, tiny enums) as the leading key; they offer little file skipping on their own.

6. Running OPTIMIZE and Auto-Clustering

Liquid Clustering is not "set and forget" — the layout improves only when data is rewritten. There are three ways to drive that:

  1. Manual OPTIMIZE. Schedule it nightly or after large batch loads. Cheap, because it only touches files that need rebalancing.
  2. Predictive Optimization. A workspace-level setting (Unity Catalog managed tables) that automatically schedules OPTIMIZE and VACUUM based on observed query patterns and write activity. Recommended.
  3. Auto-clustering during writes. Recent Delta versions cluster newly written files in place so the layout stays close to optimal even without an explicit OPTIMIZE.

-- Enable Predictive Optimization at the schema level (UC managed tables)
ALTER SCHEMA main.sales ENABLE PREDICTIVE OPTIMIZATION;

-- Or schedule via a job
OPTIMIZE main.sales.orders;

A typical before/after on a 2 TB orders table clustered by (customer_id, order_ts):


-- Before clustering: full scan
SELECT SUM(amount_usd)
FROM main.sales.orders
WHERE customer_id = 'C-7841'
  AND order_ts >= current_date() - INTERVAL 30 DAYS;
-- 2 TB scanned, ~45 s

-- After CLUSTER BY (customer_id, order_ts) + OPTIMIZE
-- Same query: 1.4 GB scanned, ~1.2 s
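The mechanism behind the scan reduction is file-level data skipping: after clustering, each file's min/max statistics for customer_id cover a narrow range, so a point filter prunes almost everything. A toy model of the pruning decision, with hypothetical per-file stats rather than anything read from a real Delta log:

```python
# Each file exposes (min, max) stats per column, as recorded in the Delta log.
# A file can be skipped when the predicate cannot match its value range.
files_clustered = [("C-0000", "C-2499"), ("C-2500", "C-4999"),
                   ("C-5000", "C-7499"), ("C-7500", "C-9999")]
files_unclustered = [("C-0000", "C-9999")] * 4  # every file spans everything

def files_to_scan(files, key):
    """Keep only files whose min/max range could contain the key."""
    return [f for f in files if f[0] <= key <= f[1]]

print(len(files_to_scan(files_clustered, "C-7841")))    # 1 of 4 files scanned
print(len(files_to_scan(files_unclustered, "C-7841")))  # 4 of 4 files scanned
```

Before clustering, every file's range spans the whole key space and nothing can be skipped; after clustering, the ranges are disjoint and the planner reads one file.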

7. Partitioning vs Z-Order vs Liquid

| Property | Partitioning | Z-Order | Liquid Clustering |
|---|---|---|---|
| Layout boundaries | Hard directory boundaries | Within-file sort, no dir boundaries | Soft cluster groups, no dir boundaries |
| Cardinality limit | Low (thousands) | High | High |
| Skew handling | Poor | Better | Good |
| Incremental rewrite | N/A | No (full rewrite per OPTIMIZE) | Yes |
| Change layout columns | Full table rewrite | Full table rewrite | ALTER + incremental migrate |
| Concurrency | File-level conflicts | File-level conflicts | Row-level concurrency |
| Coexists with the others | Yes (with Z-Order) | Yes (with partitioning) | No (mutually exclusive) |

8. When Partitioning Still Wins

Liquid Clustering is the new default, but partitioning has not been deleted from the toolbox. Two scenarios where partitioning is still the right choice:

  1. Partition-aligned lifecycle operations: per-partition retention policies, jobs that process one date at a time, or bulk deletes that align with partition values and drop whole files.
  2. Very large append-only tables where virtually every query filters on a single low-cardinality column (typically a date): the directory layout gives the planner free file skipping at no metadata cost.

Outside those cases, default to Liquid Clustering on new tables and migrate existing high-cardinality or skewed partitioned tables to it.


Common Interview Questions:

What is Liquid Clustering and how is it different from partitioning?

Liquid Clustering is a Delta feature that physically organizes table data by chosen clustering keys without creating directory-level partitions. Unlike Hive-style partitioning, it adapts to high-cardinality keys without producing tiny files, supports changing the clustering keys after the table is in production, and avoids the small-files problem that kills over-partitioned tables. It is the right default for most new tables on modern Databricks.

Liquid Clustering vs Z-Order — when does each win?

Z-Order is a clustering operation applied periodically by OPTIMIZE ... ZORDER BY on a partitioned table; clustering quality decays between runs and the operation rewrites entire partitions. Liquid Clustering is incremental — new writes are placed near existing clusters automatically — and there is no partition column, so high-cardinality keys are safe. Liquid wins on continuously-loaded tables, mutable tables, and high-cardinality keys; Z-Order is still acceptable on stable date-partitioned append-only tables built before Liquid existed.
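The operational difference is visible in the commands themselves, sketched below with hypothetical table names:

```sql
-- Z-Order: the layout is an argument to the OPTIMIZE command,
-- re-stated and fully re-applied on every maintenance run
OPTIMIZE main.sales.orders_legacy ZORDER BY (customer_id, order_ts);

-- Liquid: the layout is a property of the table itself,
-- maintained incrementally by plain OPTIMIZE from then on
ALTER TABLE main.sales.orders CLUSTER BY (customer_id, order_ts);
OPTIMIZE main.sales.orders;
```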

When should you still partition a Delta table instead of using Liquid Clustering?

Two cases. First, when you genuinely need partition-aligned operations: dropping a year of data with a DELETE on the partition column (Delta has no Hive-style DROP PARTITION, but a partition-aligned DELETE is cheap because it removes whole files), running a job that processes one date at a time, or applying differential retention policies per partition. Second, very large append-only tables with a single low-cardinality date filter on every query: directory partitioning gives the planner free file skipping with zero metadata cost. Outside these, Liquid is the better default; partitioning becomes a footgun the moment cardinality exceeds a few thousand values per partition column.

How many clustering keys should you specify and how do you pick them?

Up to four keys, ordered by selectivity in your query workload — the columns that appear most often in WHERE and JOIN predicates. Picking too many dilutes the clustering benefit because the data has to be organized in too many dimensions; picking the wrong ones means OPTIMIZE rewrites the table without speeding up queries. Use the query history in Databricks SQL to find the actual filter columns rather than guessing — it is almost always 1-2 columns plus a date.
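The "find the actual filter columns" step can be grounded with a simple tally. In practice you would pull predicates from the Databricks query history; the regex and sample queries below are a hypothetical toy version of that analysis:

```python
import re
from collections import Counter

# Hypothetical sample of workload queries (stand-in for query history)
queries = [
    "SELECT * FROM orders WHERE customer_id = 'C-1' AND order_ts > '2025-01-01'",
    "SELECT SUM(amount_usd) FROM orders WHERE customer_id = 'C-2'",
    "SELECT * FROM orders WHERE region = 'EU' AND order_ts > '2025-03-01'",
]

# Naive tally: identifiers appearing to the left of a comparison operator
pattern = re.compile(r"(\w+)\s*(?:>=|<=|=|>|<)")
counts = Counter(col for q in queries for col in pattern.findall(q))
print(counts.most_common(2))  # top filter columns are cluster-key candidates
```

Even this crude count surfaces the 1-2 columns plus a timestamp that dominate the workload, which is usually the whole answer.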

What does it cost to change the clustering keys on an existing Liquid table?

The ALTER TABLE ... CLUSTER BY (...) statement itself is metadata-only and instantaneous — no data is rewritten. New writes start clustering on the new keys immediately. Existing files keep their old organization until the next OPTIMIZE, which incrementally re-clusters them. That is the headline win over partitioning: re-partitioning a multi-terabyte table requires a full rewrite and a maintenance window, while changing Liquid keys is a single statement that re-converges over time.

How do you tell whether Liquid Clustering is actually helping a query?

Look at the Spark UI's scan node for the table — specifically the "files read" and "bytes read" vs "files in table" metrics. If a filter on a clustering key reads only a few percent of files, clustering is working; if it reads most files, either the key is wrong or OPTIMIZE has not run recently. Also check the explain plan for "PartitionFilters" / "DataFilters" being pushed down. Run DESCRIBE DETAIL to see the current clustering columns and the last optimization timestamp; if the table has had heavy writes since the last OPTIMIZE, schedule one.
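The metadata checks mentioned above, in SQL (reusing the table name from earlier sections):

```sql
-- Current clustering columns, file count, and table size
DESCRIBE DETAIL main.sales.orders;

-- Per-operation history: look for recent OPTIMIZE entries
-- relative to the volume of writes since
DESCRIBE HISTORY main.sales.orders;
```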

