Audit Logs & Compliance

Every privileged action in a Databricks workspace — from a successful login, to a cluster start, to a GRANT issued in Unity Catalog — emits an audit event. Those events flow through two complementary surfaces: audit log delivery (JSON files written to your cloud storage bucket) and system tables (Delta tables in the system catalog you can query with SQL). Together they support compliance frameworks (HIPAA, FedRAMP, PCI-DSS) and threat detection in your SIEM.


1. Audit Pipeline

┌──────────────────────────────────────────────────────────────────────────────────┐
│                                      EVENTS                                      │
│     Login  |  Cluster create  |  Notebook run  |  Table read/write  |  Grant     │
└──────────────────────────────────────────────────────────────────────────────────┘
                                         │
                                         ▼
┌──────────────────────────────────────────────────────────────────────────────────┐
│                                AUDIT LOG DELIVERY                                │
│            Workspace verbose audit  +  Account-level diagnostic logs             │
│         Delivered to S3 / ADLS / GCS bucket (JSON, partitioned by date)          │
└──────────────────────────────────────────────────────────────────────────────────┘
                                         │
                                         ▼
┌──────────────────────────────────────────────────────────────────────────────────┐
│                          SYSTEM TABLES (Unity Catalog)                           │
│      system.access.audit  |  system.access.table_lineage  |  column_lineage      │
└──────────────────────────────────────────────────────────────────────────────────┘
                                         │
                                         ▼
┌──────────────────────────────────────────────────────────────────────────────────┐
│                                 SIEM / DETECTION                                 │
│     Splunk  |  Datadog  |  Sentinel  |  Chronicle  |  Custom Lakeview alerts     │
└──────────────────────────────────────────────────────────────────────────────────┘

2. Audit Log Delivery

Audit log delivery is configured at the account level (not per workspace) and writes JSON files to a customer-owned bucket. Two flavors: workspace-level events (logins, cluster lifecycle, notebook and job activity) and account-level events (account console logins and account-admin changes).

Enable verbose audit logs at the workspace level to additionally capture per-command events for notebooks and SQL warehouses (every statement and its full text):

# enable verbose audit
databricks workspace-conf set-status --json '{"enableVerboseAuditLogs": "true"}'
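The same setting can be flipped through the workspace-conf REST endpoint. A minimal sketch, assuming a PAT in hand; the host and token values are placeholders:

```python
import json
import urllib.request


def build_verbose_audit_request(host: str, token: str) -> urllib.request.Request:
    """Build the PATCH request that enables verbose audit logs.

    Targets the /api/2.0/workspace-conf endpoint; host/token are
    placeholders for your workspace URL and a personal access token.
    """
    body = json.dumps({"enableVerboseAuditLogs": "true"}).encode()
    return urllib.request.Request(
        url=f"{host}/api/2.0/workspace-conf",
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    req = build_verbose_audit_request("https://example.cloud.databricks.com", "dapi-...")
    # urllib.request.urlopen(req)  # uncomment to actually send the request
```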

Path layout in the destination bucket:

s3://<your-bucket>/<prefix>/workspaceId=<wsid>/date=YYYY-MM-DD/auditlogs_<ts>.json
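When post-processing delivered files outside of Databricks, the workspace ID and date partition can be recovered from the path itself. A small hypothetical helper (bucket and prefix names in the usage example are made up):

```python
import re
from datetime import date

# Matches the delivery layout shown above:
#   .../workspaceId=<wsid>/date=YYYY-MM-DD/auditlogs_<ts>.json
PATH_RE = re.compile(
    r"workspaceId=(?P<wsid>\d+)/date=(?P<date>\d{4}-\d{2}-\d{2})/"
    r"auditlogs_(?P<ts>[^/]+)\.json$"
)


def parse_audit_path(path: str) -> dict:
    """Extract workspace ID, partition date, and timestamp suffix from a path."""
    m = PATH_RE.search(path)
    if not m:
        raise ValueError(f"not an audit log path: {path}")
    return {
        "workspace_id": int(m.group("wsid")),
        "date": date.fromisoformat(m.group("date")),
        "ts": m.group("ts"),
    }
```

For example, parse_audit_path("s3://my-bucket/audit/workspaceId=1234567890/date=2024-05-01/auditlogs_1714600000.json") yields the workspace ID 1234567890 and the partition date 2024-05-01.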

3. System Tables

System tables are Delta tables in the system catalog, automatically populated by Databricks. The most useful for security work are system.access.audit (one row per audit event), system.access.table_lineage, and system.access.column_lineage.

Enable a system schema (one-time, per metastore):

databricks system-schemas enable <metastore-id> access

4. Detection Query Recipes

Failed-login spikes (last 24h)

SELECT user_identity.email AS user,
       source_ip_address,
       COUNT(*) AS failures
FROM   system.access.audit
WHERE  service_name = 'accounts'
  AND  action_name  = 'login'
  AND  response.status_code >= 400
  AND  event_time > current_timestamp() - INTERVAL 24 HOURS
GROUP  BY user_identity.email, source_ip_address
HAVING failures > 5
ORDER  BY failures DESC;
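The same count-and-threshold logic can be sketched in plain Python, which is handy for unit-testing detection thresholds offline. The event dicts below are an assumed shape for testing, not the system-table schema itself:

```python
from collections import Counter


def failed_login_spikes(events, threshold=5):
    """Count login failures per (user, source IP) pair and return the
    pairs whose failure count exceeds the threshold, busiest first.
    Mirrors the GROUP BY / HAVING clause of the SQL recipe above."""
    counts = Counter(
        (e["user"], e["source_ip"])
        for e in events
        if e["action"] == "login" and e["status_code"] >= 400
    )
    return sorted(
        ((user, ip, n) for (user, ip), n in counts.items() if n > threshold),
        key=lambda t: -t[2],
    )
```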

Sensitive-table reads outside business hours

SELECT user_identity.email,
       request_params.full_name_arg AS table_name,
       event_time
FROM   system.access.audit
WHERE  service_name = 'unityCatalog'
  AND  action_name  = 'getTable'
  AND  request_params.full_name_arg LIKE 'finance.pii.%'
  AND  (HOUR(event_time) < 6 OR HOUR(event_time) > 22)
ORDER  BY event_time DESC;
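The HOUR() predicate above is easy to get subtly wrong at the boundaries, so here it is isolated as a testable Python sketch. The 06:00/22:00 window and the assumption that timestamps are workspace-local are both illustrative; adjust for your timezone before comparing:

```python
from datetime import datetime


def outside_business_hours(ts: datetime, open_hour: int = 6, close_hour: int = 22) -> bool:
    """True when the event falls before open_hour or after close_hour,
    matching HOUR(event_time) < 6 OR HOUR(event_time) > 22 in the SQL."""
    return ts.hour < open_hour or ts.hour > close_hour
```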

Unity Catalog grant changes (last 7 days)

SELECT event_time,
       user_identity.email AS granter,
       request_params
FROM   system.access.audit
WHERE  service_name = 'unityCatalog'
  AND  action_name IN ('updatePermissions','updateSharePermissions')
  AND  event_time > current_timestamp() - INTERVAL 7 DAYS
ORDER  BY event_time DESC;

Personal access token creation

SELECT event_time, user_identity.email, source_ip_address, request_params
FROM   system.access.audit
WHERE  action_name = 'createToken'
ORDER  BY event_time DESC;
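A common refinement is to alert only on tokens minted from outside known corporate networks. A hedged sketch of that post-filter; the event shape and the CIDR allowlist are illustrative:

```python
import ipaddress


def tokens_from_unexpected_networks(events, allowed_cidrs):
    """Return createToken events whose source IP is outside every
    allowlisted CIDR range (e.g. corporate egress blocks)."""
    nets = [ipaddress.ip_network(c) for c in allowed_cidrs]
    return [
        e
        for e in events
        if e["action"] == "createToken"
        and not any(ipaddress.ip_address(e["source_ip"]) in n for n in nets)
    ]
```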

5. Compliance Frameworks

Databricks publishes attestations for the major frameworks; the audit + system-table pipeline is the evidence trail.

6. Retention & Lifecycle

The audit system table retains rows only for a bounded window (typically 365 days), and delivered JSON files persist only as long as your bucket lifecycle policy keeps them; for long-term retention, stream the delivered files into your own Delta table with Auto Loader:
-- One-time: stream audit log delivery into a Delta table
CREATE OR REFRESH STREAMING TABLE audit_archive
AS SELECT *
   FROM cloud_files('s3://my-audit-bucket/auditlogs/',
                    'json',
                    map('cloudFiles.schemaLocation','s3://my-audit-bucket/_schema/'));
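Lifecycle cleanup on the archive side usually means dropping date partitions that age out of the compliance window. A small sketch of that selection, with a 365-day default chosen purely as an example policy:

```python
from datetime import date, timedelta


def partitions_past_retention(partition_dates, today, retention_days=365):
    """Given the date= partition values present in the archive, return
    (sorted, oldest first) the ones older than the retention window."""
    cutoff = today - timedelta(days=retention_days)
    return sorted(d for d in partition_dates if d < cutoff)
```

The returned dates can then drive per-partition DELETEs against the archive table or lifecycle rules on the bucket.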
