Workspace Isolation & PrivateLink

By default a Databricks workspace exposes its UI and REST API on the public internet, and worker nodes phone home to the Databricks-hosted control plane through a public NAT gateway. For regulated workloads — PCI, HIPAA, FedRAMP — you want every byte to stay on private network paths. This page covers the four levers that get you there: secure cluster connectivity, AWS PrivateLink (front-end and back-end), customer-managed VPC injection, and IP access lists.


1. Traffic Flow Overview

Databricks runs in a split-plane model: the control plane (Web UI, REST API, job scheduler, cluster manager) lives in Databricks' AWS account, while the data plane (your clusters, your data) lives in your AWS account. PrivateLink pins both directions of that traffic to private VPC endpoints instead of routes over the public internet.

┌────────────────────────────────────────────────────────────────────────────┐
│             DATABRICKS CONTROL PLANE (Databricks AWS Account)              │
│         Web App  |  REST API  |  Job Scheduler  |  Cluster Manager         │
└────────────────────────────────────────────────────────────────────────────┘
                                      ▲                  │
                                      │ Front-End PL     │ Back-End PL
                                      │ (UI / API)       │ (SCC reverse tunnel)
                                      │                  ▼
┌────────────────────────────────────────────────────────────────────────────┐
│                    VPC ENDPOINTS (Customer AWS Account)                    │
│       com.amazonaws.vpce.<region>.vpce-svc-XXXX  (front + back-end)        │
└────────────────────────────────────────────────────────────────────────────┘
                                      │
                                      ▼
┌────────────────────────────────────────────────────────────────────────────┐
│                     CUSTOMER-MANAGED VPC  (Data Plane)                     │
│          Private Subnets  |  No public IPs (NPIP)  |  NAT gateway          │
│           Cluster nodes  |  Photon  |  DBR runtime  |  EBS (CMK)           │
└────────────────────────────────────────────────────────────────────────────┘
                                      │
                                      ▼
┌────────────────────────────────────────────────────────────────────────────┐
│                           CUSTOMER DATA SOURCES                            │
│         S3 (Gateway VPCE)  |  Glue  |  Kinesis  |  RDS / Redshift          │
└────────────────────────────────────────────────────────────────────────────┘

2. Secure Cluster Connectivity (NPIP)

Secure Cluster Connectivity (SCC), also called No Public IP (NPIP), removes public IPs from cluster nodes. Instead of the control plane connecting in to clusters, each cluster opens an outbound TLS tunnel to the control plane over port 443. The benefits:

- No public IP addresses on worker nodes, so there is no internet-reachable surface to attack.
- No inbound security-group rules: every control-plane interaction rides the cluster-initiated tunnel.
- An egress-only firewall posture for the entire data-plane VPC.

SCC is enabled at workspace creation and cannot be turned on retroactively without a workspace migration. New workspaces should always have it on.
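The NPIP invariant ("no cluster node carries a public IP") can be audited against EC2 instance records. The sketch below assumes the records were already fetched (e.g. with boto3's `describe_instances` or `aws ec2 describe-instances`); the field names follow the EC2 API, but the sample data and helper name are illustrative:

```python
# Sketch: audit the NPIP invariant over EC2 instance records.
# Fetching is assumed done elsewhere; sample data mimics the EC2 API shape.

def npip_violations(instances):
    """Return IDs of instances that carry a public IP (should be empty under SCC)."""
    return [
        inst["InstanceId"]
        for inst in instances
        if inst.get("PublicIpAddress")  # absent on NPIP-compliant nodes
    ]

sample = [
    {"InstanceId": "i-aaa", "PrivateIpAddress": "10.0.1.10"},
    {"InstanceId": "i-bbb", "PrivateIpAddress": "10.0.1.11", "PublicIpAddress": "54.1.2.3"},
]

print(npip_violations(sample))  # → ['i-bbb']
```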

3. AWS PrivateLink — Front-End and Back-End

Front-End PrivateLink (Users → Workspace UI / API)

Front-end PrivateLink places a VPC endpoint in your network for the Databricks workspace web application and REST API service. Users on your corporate network (via Direct Connect or Site-to-Site VPN) reach the workspace through that private endpoint: the workspace URL resolves to a private IP. Combined with Private Access Settings, you can require that the workspace be reachable only via PrivateLink and reject any request that arrives over the public internet.
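The Private Access Settings decision reduces to a small predicate: when public access is disabled, only requests arriving through an allowed VPC endpoint are admitted. A hypothetical sketch — `public_access_enabled` and the allowed-endpoint set mirror the Account API's field names, but the request model here is invented for illustration:

```python
# Sketch of front-end admission under Private Access Settings.
# The (source, vpce_id) request model is an illustrative simplification.

def admit(public_access_enabled, allowed_vpce_ids, source, vpce_id=None):
    if source == "vpce":
        # PrivateLink path: the endpoint must be registered and allowed
        return vpce_id in allowed_vpce_ids
    # Public-internet path: admitted only if public access is still enabled
    return public_access_enabled

allowed = {"vpce-0123"}
print(admit(False, allowed, "vpce", "vpce-0123"))  # True  - private path
print(admit(False, allowed, "public"))             # False - public path rejected
```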

Back-End PrivateLink (Data Plane → Control Plane)

Two VPC endpoints in your data-plane VPC: one for the SCC relay (cluster-to-control-plane tunnel), one for the workspace REST API (cluster init, library install, table commit). With back-end PrivateLink in place, the worker nodes never traverse a NAT gateway to reach Databricks — every control-plane RPC stays inside the AWS backbone.

Configuration shape

# 1. Create the interface VPC endpoints (one per Databricks endpoint service)
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-0abc \
  --service-name com.amazonaws.vpce.us-east-1.vpce-svc-xxxx \
  --vpc-endpoint-type Interface \
  --subnet-ids subnet-priv1 subnet-priv2 \
  --security-group-ids sg-databricks-pl

# 2. Register each endpoint with the Databricks Account API
databricks account vpc-endpoints create \
  --aws-vpc-endpoint-id vpce-0123 \
  --region us-east-1

# 3. Reference the registered endpoint IDs in the workspace's network
#    configuration (back-end) and private access settings (front-end)
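The registered endpoint IDs are then wired into the workspace's network configuration, where placement determines the use case: IDs under `rest_api` serve workspace REST calls and IDs under `dataplane_relay` serve the SCC tunnel. A sketch of the payload shape — field names follow the Account API's networks object, all IDs are placeholders (in practice these are the Databricks registration IDs returned at step 2, not raw AWS `vpce-` IDs):

```python
# Sketch: network configuration payload for the Databricks Account API
# (POST /api/2.0/accounts/<account-id>/networks). All IDs are placeholders.

network_config = {
    "network_name": "prod-dataplane",
    "vpc_id": "vpc-0abc",
    "subnet_ids": ["subnet-priv1", "subnet-priv2"],
    "security_group_ids": ["sg-databricks-pl"],
    "vpc_endpoints": {
        "rest_api": ["vpce-rest-0123"],         # back-end REST API endpoint
        "dataplane_relay": ["vpce-relay-0456"], # back-end SCC relay endpoint
    },
}

# Both back-end use cases must be present for a fully PrivateLink-routed workspace.
assert set(network_config["vpc_endpoints"]) == {"rest_api", "dataplane_relay"}
```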

4. Customer-Managed VPC (VPC Injection)

By default Databricks creates and owns the workspace VPC. With VPC injection you supply your own VPC and Databricks runs clusters inside it. This unlocks:

- Tighter egress control: route cluster traffic through your own NAT gateway or egress firewall.
- Consolidated networking: reuse existing subnets, route tables, and peering or Transit Gateway attachments.
- A smaller permission footprint: the cross-account role no longer needs rights to create VPCs.

Requirements: at least two private subnets in different AZs, a NAT gateway (or PrivateLink-only egress), and security groups that allow worker-to-worker traffic on all ports.
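These requirements can be sanity-checked offline with the stdlib `ipaddress` module. The /17–/26 bound below is Databricks' documented subnet netmask limit; the subnet records and helper name are illustrative:

```python
import ipaddress

# Sketch: offline check of customer-managed VPC subnet rules -
# at least two subnets, distinct AZs, each with a /17-/26 netmask.

def validate_subnets(subnets):
    errors = []
    if len(subnets) < 2:
        errors.append("need at least two subnets")
    if len({s["az"] for s in subnets}) < 2:
        errors.append("subnets must span at least two AZs")
    for s in subnets:
        prefix = ipaddress.ip_network(s["cidr"]).prefixlen
        if not 17 <= prefix <= 26:
            errors.append(f"{s['cidr']}: netmask /{prefix} outside /17-/26")
    return errors

subnets = [
    {"cidr": "10.0.0.0/24", "az": "us-east-1a"},
    {"cidr": "10.0.1.0/24", "az": "us-east-1b"},
]
print(validate_subnets(subnets))  # → [] (all checks pass)
```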

5. IP Access Lists

Even with front-end PrivateLink, you can layer an IP access list on the workspace REST API as defense in depth. Allow-list your corporate egress CIDRs and your CI runner pool; deny everything else.

databricks ip-access-lists create \
  --label "Corp VPN" \
  --list-type ALLOW \
  --ip-addresses 203.0.113.0/24 198.51.100.42/32

IP ACLs apply to the Web UI and the REST API. They do not apply to the SCC relay (which already requires the workspace's own AWS-signed identity) or to PrivateLink endpoints (which use endpoint policies instead).
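The allow-list semantics are plain CIDR containment, which the stdlib `ipaddress` module models directly. A minimal sketch (the CIDRs mirror the CLI example above; the helper is illustrative):

```python
import ipaddress

# Sketch: an ALLOW-type IP access list is simple CIDR containment.
# Anything not matched by an allow entry is denied.

ALLOW = [ipaddress.ip_network(c) for c in ("203.0.113.0/24", "198.51.100.42/32")]

def allowed(ip: str) -> bool:
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in ALLOW)

print(allowed("203.0.113.77"))   # True  - inside the corp VPN /24
print(allowed("198.51.100.43"))  # False - one address off the /32 pin
```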

6. Hardening Checklist

- Secure cluster connectivity (NPIP) enabled at workspace creation
- Back-end PrivateLink: VPC endpoints for both the SCC relay and the REST API
- Front-end PrivateLink with Private Access Settings rejecting public access
- Customer-managed VPC: private subnets in at least two AZs, controlled egress
- S3 gateway endpoint (plus interface endpoints for Glue and Kinesis) so data access stays inside AWS
- IP access lists allow-listing corporate egress CIDRs and CI runner pools
