Snowflake Cortex Search

Cortex Search is Snowflake's managed hybrid-search service: a serverless retrieval engine that auto-embeds and indexes a text column from a Snowflake table, blends vector similarity with lexical (BM25-style) scoring out of the box, and returns ranked rows with metadata. It is the retrieval primitive Snowflake expects you to use under Cortex Agents and inside any RAG pipeline that lives in the warehouse.

The pitch is the same shape as Mosaic AI Vector Search on the Databricks side: instead of standing up a separate vector database, exporting embeddings, and writing a sync job, you point a service at a Snowflake table, declare which column is searchable, and Snowflake handles embedding, indexing, refresh, and serving. The data never leaves the account, and existing row access policies and masking policies flow through to search results.


1. What Cortex Search Is

Cortex Search is a Snowflake-native object — created with CREATE CORTEX SEARCH SERVICE, owned by a role, governed by grants — that wraps a SELECT statement and exposes a hybrid retrieval API over the resulting rows.

2. Creating a Service with SQL DDL

The DDL is one statement. You declare the searchable column (ON), the columns you want available for filtering (ATTRIBUTES), the warehouse used for refresh, the target lag (how stale results are allowed to get), and the embedding model. Any column in the source SELECT can be returned with a hit; ATTRIBUTES specifically marks the ones you can filter on.


CREATE OR REPLACE CORTEX SEARCH SERVICE support_kb_search
  ON                doc_text
  ATTRIBUTES        doc_id, product_sku, category, updated_at, region
  WAREHOUSE       = SEARCH_REFRESH_WH
  TARGET_LAG      = '1 hour'
  EMBEDDING_MODEL = 'snowflake-arctic-embed-m-v2.0'
  AS (
    SELECT
      doc_id,
      product_sku,
      category,
      region,
      updated_at,
      doc_text
    FROM ANALYTICS.SUPPORT.KB_ARTICLES
    WHERE is_published = TRUE
  );

A few mechanics worth calling out:

- ON takes exactly one text column. To make multiple fields searchable, concatenate them into a single column in the source SELECT.
- Only columns listed under ATTRIBUTES can be filtered on; any column in the SELECT can be requested back with a hit.
- TARGET_LAG is a cost dial: a tighter lag means refresh work runs more often on the named warehouse.
- CREATE OR REPLACE drops and rebuilds the service, re-embedding the full corpus; prefer ALTER for settings changes on a live service.

3. Querying: SQL and REST

There are two entry points. SQL via SNOWFLAKE.CORTEX.SEARCH_PREVIEW is convenient for ad hoc work and for joining results to other tables. The REST endpoint is what an application actually calls in production.

SQL


SELECT
  PARSE_JSON(
    SNOWFLAKE.CORTEX.SEARCH_PREVIEW(
      'support_kb_search',
      OBJECT_CONSTRUCT(
        'query',   'how do I reset the device to factory settings',
        'columns', ARRAY_CONSTRUCT('doc_id', 'product_sku', 'category', 'doc_text'),
        'limit',   5
      )::VARCHAR
    )
  ) AS results;

The result is a JSON object containing a results array of hits, each with the requested attribute columns plus an internal score. To unnest into rows:


WITH raw AS (
  SELECT PARSE_JSON(
    SNOWFLAKE.CORTEX.SEARCH_PREVIEW(
      'support_kb_search',
      OBJECT_CONSTRUCT(
        'query',   'factory reset',
        'columns', ARRAY_CONSTRUCT('doc_id', 'product_sku', 'doc_text'),
        'limit',   10
      )::VARCHAR
    )
  ) AS j
)
SELECT
  hit.value:doc_id::STRING      AS doc_id,
  hit.value:product_sku::STRING AS product_sku,
  hit.value:doc_text::STRING    AS doc_text
FROM raw, LATERAL FLATTEN(input => raw.j:results) AS hit;
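The same unnesting can be done client-side when you call the service over REST. A minimal sketch, assuming the response shape shown above (a results array of objects keyed by the requested columns):

```python
# Flatten a Cortex Search response into plain row dicts.
# Assumes the JSON shape described above: {"results": [{...}, ...]}.
def flatten_results(response: dict, columns: list[str]) -> list[dict]:
    rows = []
    for hit in response.get("results", []):
        # keep only the columns the caller asked for, tolerating missing keys
        rows.append({col: hit.get(col) for col in columns})
    return rows

sample = {
    "results": [
        {"doc_id": "KB-001", "product_sku": "SKU-100", "doc_text": "Hold the power button..."},
        {"doc_id": "KB-007", "product_sku": "SKU-220", "doc_text": "Factory reset clears..."},
    ]
}
rows = flatten_results(sample, ["doc_id", "doc_text"])
```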

REST / Python

The REST endpoint is the production interface. The Snowflake Python connector wraps it; below is the explicit HTTP version so the wire shape is clear.


import os, requests, snowflake.connector

conn = snowflake.connector.connect(
    account="abc12345",
    user="rag_app",
    authenticator="OAUTH",
    token=os.environ["SNOWFLAKE_OAUTH_TOKEN"],
    role="RAG_APP_ROLE",
    warehouse="RAG_QUERY_WH",
)

def search(query: str, k: int = 10, filters: dict | None = None) -> list[dict]:
    payload = {
        "query":   query,
        "columns": ["doc_id", "product_sku", "category", "doc_text"],
        "limit":   k,
    }
    if filters is not None:
        payload["filter"] = filters

    resp = requests.post(
        f"https://{conn.host}/api/v2/databases/ANALYTICS/schemas/SUPPORT"
        f"/cortex-search-services/support_kb_search:query",
        headers={
            # reuse the OAuth token the connection was opened with; the
            # REST API accepts it directly as a bearer token
            "Authorization": f"Bearer {os.environ['SNOWFLAKE_OAUTH_TOKEN']}",
            "X-Snowflake-Authorization-Token-Type": "OAUTH",
            "Content-Type": "application/json",
        },
        json=payload,
        timeout=15,
    )
    resp.raise_for_status()
    return resp.json()["results"]

hits = search("factory reset stuck on splash screen", k=5)
for h in hits:
    print(f"{h['doc_id']:12s}  {h['product_sku']}  {h['doc_text'][:80]}")
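Downstream, the hits usually get stitched into the model prompt with citation markers. A minimal sketch; the [doc_id] citation convention here is an assumption, not something Cortex mandates:

```python
def build_context(hits: list[dict], max_chars: int = 4000) -> str:
    """Concatenate hits into a prompt context block, each prefixed with a
    [doc_id] marker the LLM can cite; stop before exceeding max_chars."""
    parts: list[str] = []
    used = 0
    for h in hits:
        snippet = f"[{h['doc_id']}] {h['doc_text']}"
        if used + len(snippet) > max_chars:
            break
        parts.append(snippet)
        used += len(snippet)
    return "\n\n".join(parts)

ctx = build_context([
    {"doc_id": "KB-001", "doc_text": "Hold power for 10 seconds."},
    {"doc_id": "KB-007", "doc_text": "A factory reset clears all settings."},
])
```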

Attribute Filters and Boost Columns

Filters prune candidates during retrieval rather than post-filtering, so recall stays high even when the filter is selective. The filter language is JSON with operators @eq, @contains, @gte, @lte, @and, @or, @not.


hits = search(
    query="warranty replacement policy",
    k=10,
    filters={
        "@and": [
            {"@eq":      {"category": "support"}},
            {"@contains":{"product_sku": "SKU-100"}},
            {"@gte":     {"updated_at": "2025-01-01"}},
        ]
    },
)
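A small helper keeps that operator tree out of call sites. A sketch covering only the operators listed above; the helper names are mine, not part of the API:

```python
# Build Cortex Search filter JSON from simple Python conditions.
# eq/gte/lte/contains mirror the @-operators; all_of combines with @and.
def eq(col, val):        return {"@eq": {col: val}}
def gte(col, val):       return {"@gte": {col: val}}
def lte(col, val):       return {"@lte": {col: val}}
def contains(col, val):  return {"@contains": {col: val}}
def all_of(*conds):      return {"@and": list(conds)}

f = all_of(
    eq("category", "support"),
    gte("updated_at", "2025-01-01"),
)
```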

Boost columns let you nudge ranking based on a numeric or recency signal — useful for surfacing newer revisions of an article, or articles attached to higher-tier products. The boosted column has to be part of the service (include it in the source SELECT and ATTRIBUTES at create time); the boost itself is supplied in the query payload.


CREATE OR REPLACE CORTEX SEARCH SERVICE support_kb_search
  ON                doc_text
  ATTRIBUTES        doc_id, product_sku, category, updated_at, region, popularity_score
  WAREHOUSE       = SEARCH_REFRESH_WH
  TARGET_LAG      = '1 hour'
  EMBEDDING_MODEL = 'snowflake-arctic-embed-m-v2.0'
  AS (
    SELECT *, recent_views AS popularity_score
    FROM ANALYTICS.SUPPORT.KB_ARTICLES
    WHERE is_published
  );

(search_with_boost stands for a thin wrapper in the spirit of search() above that forwards a boost list in the query payload; the exact payload key for boosts has shifted across API versions, so check the current docs.)

hits = search_with_boost(
    query="firmware update fails",
    k=10,
    boosts=[
        {"column": "updated_at",       "weight": 0.3},   # prefer recent
        {"column": "popularity_score", "weight": 0.2},   # prefer often-viewed
    ],
)

4. Embedding Model Choice

The default embedding model is snowflake-arctic-embed-m-v2.0 — Snowflake's in-house dense encoder, multilingual as of the v2 line, trained for retrieval. Other options exposed via EMBEDDING_MODEL include the larger Arctic Embed L variant, the original e5-base-v2, and (in some regions) voyage-multilingual-2. Newer model strings are added periodically; check the docs for the catalog in your region.

The choice is mostly irrelevant for the first version of an application. Ship with Arctic Embed M, measure recall on a held-out evaluation set, and switch only when you can show a concrete win.
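"Measure recall on a held-out evaluation set" is mechanical once you have (query, relevant doc_ids) pairs. A sketch, where retrieve is any function returning ranked doc_ids, e.g. a wrapper over the search service; the toy in-memory retriever below is a stand-in:

```python
def recall_at_k(eval_set, retrieve, k: int = 10) -> float:
    """eval_set: list of (query, set-of-relevant-doc_ids) pairs.
    retrieve: fn(query, k) -> ranked list of doc_ids.
    Returns mean fraction of relevant docs found in the top k."""
    total = 0.0
    for query, relevant in eval_set:
        retrieved = set(retrieve(query, k))
        total += len(retrieved & relevant) / len(relevant)
    return total / len(eval_set)

# toy retriever standing in for the search service
fake_index = {"reset": ["KB-001", "KB-002"], "warranty": ["KB-009"]}
retrieve = lambda q, k: fake_index.get(q, [])[:k]

score = recall_at_k(
    [("reset", {"KB-001"}), ("warranty", {"KB-009", "KB-010"})],
    retrieve,
)
```

Run the same eval set against each candidate EMBEDDING_MODEL and compare the scores before switching.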

5. Refresh, Change Data Capture, Freshness Lag

A Cortex Search service is logically a materialized view: the source SELECT is re-evaluated and changes are reflected in the index on a schedule chosen to honor TARGET_LAG. Snowflake uses change tracking on the underlying tables when possible to refresh incrementally rather than re-embedding the full corpus.


SHOW CORTEX SEARCH SERVICES IN SCHEMA ANALYTICS.SUPPORT;

ALTER CORTEX SEARCH SERVICE support_kb_search
  SET TARGET_LAG = '15 minutes';

ALTER CORTEX SEARCH SERVICE support_kb_search REFRESH;

6. As the Retrieval Layer for Cortex Agents

Cortex Agents accept Cortex Search services as registered tools. The agent decides whether to retrieve, what to retrieve, and how many hits to ask for; results come back to the agent's context with citations the LLM is instructed to surface in the final answer. Wiring is one block in the agent definition:


{
  "tools": [
    {
      "tool_spec": {
        "type": "cortex_search",
        "name": "kb",
        "description": "Search the support knowledge base."
      }
    }
  ],
  "tool_resources": {
    "kb": {
      "name": "ANALYTICS.SUPPORT.support_kb_search",
      "max_results": 6,
      "id_column": "doc_id",
      "title_column": "doc_id"
    }
  }
}

The same service can be reused by multiple agents and by direct REST callers. That reuse is a real advantage over per-application vector stores: one indexed corpus, one refresh job, one set of grants.

7. Cortex Search vs Mosaic Vector Search vs pgvector vs Pinecone

| Option | Best For | Trade-offs |
|---|---|---|
| Snowflake Cortex Search | Corpus already in Snowflake; want hybrid by default and zero data egress. | Locked to Snowflake; embedding model choice constrained to the Cortex catalog (or BYO column). |
| Mosaic AI Vector Search | Corpus in Delta tables under Unity Catalog; want UC lineage. | Locked to Databricks; endpoint billed by uptime. |
| pgvector / pgvectorscale | Existing Postgres app; vectors live next to transactional rows. | You own index tuning, autovacuum, replicas. Hybrid search requires extra extensions. |
| Pinecone / Weaviate / Qdrant | Cross-warehouse, multi-cloud, very large or shared corpora. | Separate auth, separate sync, separate bill; corpus has to leave the warehouse. |

The deciding question is where the source-of-truth text lives. If it is a Snowflake table that already has row access policies and masking in place, Cortex Search is the lowest-friction option because there is no second system to govern. If you are running in Databricks, Mosaic AI Vector Search is the equivalent answer. Postgres-resident text under a few million documents is best served by pgvector. Anything that needs to be queried from outside the warehouse perimeter is the case for a SaaS vector DB.

8. Interview Q&A

Q: Why is hybrid search the default in Cortex Search instead of pure vector?

Pure dense vector retrieval is bad at rare proper nouns, error codes, SKUs, and any token whose meaning is encoded in its surface form rather than its semantics. Embeddings smear "E-4471" into a fuzzy neighborhood of similar-looking strings. Lexical scoring catches exact-match cases the embedding misses; vector scoring catches semantic paraphrases the lexical scorer misses. Reciprocal rank fusion combines the two without requiring tuning. In practice hybrid is the right default for almost every enterprise corpus.
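Reciprocal rank fusion itself is a few lines. A sketch of the standard formula, score(d) = sum over rankers of 1/(k + rank); Snowflake's exact fusion constants are not public, so treat this as illustrative:

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of doc ids. Each list contributes
    1 / (k + rank) per document; highest fused score ranks first."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

lexical = ["E-4471-FAQ", "KB-002", "KB-009"]   # exact error-code match wins lexically
vector  = ["KB-002", "KB-015", "E-4471-FAQ"]   # paraphrase neighbors win semantically
fused = rrf([lexical, vector])
```

Note that a document ranked decently by both lists (KB-002) beats one ranked first by only one list, which is exactly the behavior you want from a hybrid default.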

Q: A Cortex Search service is returning stale results. What are the likely causes?

Three things to check. First, the refresh warehouse — if it is suspended or undersized, refresh jobs fall behind TARGET_LAG. Second, change tracking on the source tables — if it is off, refresh falls back to full rebuild, which can blow past the lag window for large corpora. Third, the source SELECT itself — a join to a slowly updated dimension can cap the freshness of the entire service. SHOW CORTEX SEARCH SERVICES reports actual lag.

Q: How do you let a Cortex Search service serve different result sets to different users?

Put a row access policy on the underlying source table that filters by the calling user's role. The policy applies at query time when the search runs, so each user sees only the rows they would have been allowed to SELECT directly. There is no need to build per-tenant search services or partition the index by user — one service plus one policy handles multi-tenant isolation correctly.

Q: When would you choose BYO embeddings over the managed Arctic Embed default?

When you need an embedding model the Cortex catalog doesn't ship. Concrete cases: multilingual workloads where a specific multilingual encoder outperforms Arctic Embed; code search where a code-specific encoder like Jina Code or Voyage Code is materially better; image embeddings via CLIP for cross-modal retrieval; or when you are migrating from an existing pipeline and need bit-for-bit parity with the embeddings you already produced. Default to managed; switch on evidence.

Q: How do you tune Cortex Search for a corpus where users frequently search by SKU or part number?

Three levers. First, ensure the SKU is in the embedded text column (or concatenated into it) so lexical scoring can match it directly. Second, expose the SKU as an ATTRIBUTE so applications can apply an exact-match filter when they detect a SKU pattern in the query — much more precise than relying on retrieval alone. Third, add a boost column for product popularity or revision recency so newer SKU revisions outrank older ones when both match.
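The "detect a SKU pattern and switch to an exact filter" step from the second lever might look like this; the SKU-\d+ regex is specific to the example corpus, an assumption:

```python
import re

# pattern for this example corpus's SKU format, e.g. "SKU-220"
SKU_RE = re.compile(r"\b(SKU-\d+)\b", re.IGNORECASE)

def query_plan(user_query: str) -> dict:
    """If the query embeds a SKU, return a payload with an exact-match
    filter on product_sku; otherwise a plain retrieval payload."""
    payload = {"query": user_query, "limit": 10}
    m = SKU_RE.search(user_query)
    if m:
        payload["filter"] = {"@eq": {"product_sku": m.group(1).upper()}}
    return payload

plan = query_plan("firmware update fails on sku-220")
```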

Q: How does Cortex Search relate to Cortex Agents?

Cortex Search is the retrieval primitive; Cortex Agents is the orchestration framework that decides when to use it. An agent registers one or more search services as tools and one or more Analyst semantic models as tools, then the agent runtime composes the answer — possibly retrieving from search, possibly running SQL through Analyst, possibly both — and returns a response with tool-use traces and citations. You can use Cortex Search standalone for raw retrieval, but agents are how you assemble it into a multi-step assistant.

