Databricks Genie and AI/BI Dashboards

Genie is Databricks' text-to-SQL chat experience: a business user asks a question in natural language, Genie converts it to SQL against tables in Unity Catalog, executes the query on a SQL Warehouse, and returns the answer with the generated SQL visible for inspection. AI/BI Dashboards (the GA evolution of Lakeview) are the dashboarding side of the same product family — AI-native dashboards built on the same UC-aware semantic layer, with embedded Genie chat as a first-class citizen.

Together they form Databricks' answer to self-service BI: a curated set of governed tables, a natural-language interface for exploration, and dashboards for the canonical questions.


1. What Genie Is

Genie sits over a curated Genie Space — a scoped collection of Unity Catalog tables and views, plus the metadata that grounds the model: column descriptions, business glossary terms, sample questions, and certified SQL examples. When a user types a question, Genie:

  1. Selects relevant tables from the Space using the metadata.
  2. Generates SQL using a foundation model conditioned on the schema and the curated examples.
  3. Executes the SQL on the Space's SQL Warehouse using the questioner's identity (so UC row/column policies apply).
  4. Returns the result as a table or chart, alongside the generated SQL for the user to inspect, edit, and re-run.

The "show the SQL" property is what makes Genie credible for analysts: the model is not a black box, it's an SQL author whose work you can audit and correct on the spot. Corrections that you mark as certified get fed back into the grounding context for future questions.

2. Creating a Genie Space

The setup happens in the workspace UI under GenieNew Space. The conceptual fields:

Permissions are standard Unity Catalog: a user can use the Space only if they have SELECT on the underlying tables. Row filters and column masks are honored at execution time, so two users asking the same question may legitimately get different results.

3. Grounding: Instructions, Examples, Certified Queries

The single biggest lever on Genie quality is the metadata you give it. Three things matter:

Rich Column Descriptions in Unity Catalog

Genie reads UC column comments. Tables with sparse or generic comments (id, amount, status) produce ambiguous SQL. Invest in describing every column the model will see:


ALTER TABLE main.sales.fct_orders
  ALTER COLUMN order_id     COMMENT 'Surrogate primary key generated at ingest',
  ALTER COLUMN customer_id  COMMENT 'Foreign key to dim_customer.customer_id; natural business key',
  ALTER COLUMN order_ts     COMMENT 'Order placement time in UTC; use this column for any time-based filter',
  ALTER COLUMN net_revenue  COMMENT 'Order total after discounts and refunds, in USD; the canonical revenue column',
  ALTER COLUMN status       COMMENT 'One of: PLACED, SHIPPED, DELIVERED, CANCELLED, RETURNED';

ALTER TABLE main.sales.fct_orders
  SET TBLPROPERTIES (
    'comment' = 'One row per customer order. Use this table for revenue, AOV, and order-volume questions.'
  );

Space-Level Instructions

Use the Instructions field to encode business semantics that aren't visible in the schema:


- The canonical revenue column is `net_revenue` in `main.sales.fct_orders`. Do not use `gross_revenue`.
- "Active customer" means a customer with an order in the last 90 days. Always join to fct_orders and filter on order_ts.
- Treat status='CANCELLED' and status='RETURNED' as excluded from any "completed sale" question unless the user explicitly asks for them.
- All time columns are UTC. When the user says "today" or "this week", use UTC, not local time.
- Currency is always USD. Do not attempt currency conversion.

Certified Queries

A certified query is a verified (question, SQL, optional chart) triple. They are the highest-leverage grounding signal — they teach the model the joins, filters, and idioms that match how this business actually thinks.


-- Certified query: "What is daily revenue for the last 30 days?"
SELECT
  DATE_TRUNC('day', order_ts) AS order_date,
  SUM(net_revenue)            AS daily_revenue
FROM main.sales.fct_orders
WHERE order_ts >= current_timestamp() - INTERVAL 30 DAYS
  AND status NOT IN ('CANCELLED', 'RETURNED')
GROUP BY 1
ORDER BY 1;

-- Certified query: "Top 10 customers by revenue this quarter"
SELECT
  c.customer_id,
  c.customer_name,
  SUM(o.net_revenue) AS revenue
FROM main.sales.fct_orders o
JOIN main.sales.dim_customer c USING (customer_id)
WHERE o.order_ts >= DATE_TRUNC('quarter', current_date())
  AND o.status NOT IN ('CANCELLED', 'RETURNED')
GROUP BY 1, 2
ORDER BY revenue DESC
LIMIT 10;

Five to ten well-chosen certified queries per Space cover a surprising fraction of real-world questions. The pattern: ship the Space, watch what users ask in the first two weeks, and certify the answers to the most common questions.

4. AI/BI Dashboards

AI/BI Dashboards are the dashboard product in the same family as Genie — the rebrand and GA of Lakeview Dashboards. Each dashboard is composed of tiles backed by SQL queries against Unity Catalog tables, using the same SQL Warehouses as Genie. Two design ideas distinguish them from older Databricks SQL dashboards:

A dashboard is itself a UC object — serializable to YAML, manageable through Asset Bundles, and grantable through standard GRANT statements. The same dashboard definition can be promoted dev → staging → prod via a bundle target override.


# Sketch of a dashboard resource in a Databricks Asset Bundle
resources:
  dashboards:
    sales_overview:
      display_name: Sales Overview ${bundle.target}
      file_path: ./dashboards/sales_overview.lvdash.json
      warehouse_id: ${var.warehouse_id}
      parent_path: /Shared/dashboards

5. Practical Tips for Reliable Genie

Genie quality is a function of how disciplined you are about scope and metadata. The patterns that consistently work:

6. Genie vs Code Interpreter vs Cortex Analyst

Three products in adjacent territory, optimized for different deployment contexts:

Product What It Does Where It Wins
Databricks Genie Text-to-SQL over UC-governed tables, executed on a SQL Warehouse with the user's identity. Lakehouse-native shops where the curated tables already live in UC and governance is the priority.
OpenAI Code Interpreter General-purpose Python sandbox — user uploads a CSV/XLSX and the model writes pandas/matplotlib to answer. Ad-hoc analysis on files that are not in any warehouse; one-off exploratory work, no data governance involved.
Snowflake Cortex Analyst Text-to-SQL over Snowflake tables, grounded by a YAML semantic model file. Snowflake-native shops; teams comfortable maintaining a hand-authored semantic model file in git.

Genie and Cortex Analyst are the closest pair: both are text-to-SQL bound to a governed warehouse. The main difference is grounding ergonomics — Genie leans on UC metadata plus a UI for instructions and certified queries, while Cortex Analyst leans on a YAML semantic model. Pick the one that matches the warehouse you're already on; the underlying capability gap is small.

Code Interpreter is solving a different problem: arbitrary-files exploration without governance. It is complementary to Genie, not competitive — you'd reach for Code Interpreter when there is no warehouse table to query in the first place.


↑ Back to Top