Homomorphic Embedding Search

In a conventional RAG pipeline, the user's query is embedded on the application server and then sent in plaintext (as a vector of floats) to the vector database host, which computes similarity against indexed embeddings. For most workloads this is fine; for the most sensitive matters it is not — the query vector itself can reveal what an attorney is researching, and the vector-DB operator sits in the trust boundary.

Homomorphic encryption (HE) allows the vector database to perform similarity search on encrypted query vectors without ever decrypting them. The host returns encrypted scores; only the client can decrypt. Libraries such as Microsoft SEAL, OpenFHE, and TenSEAL (SEAL with a PyTorch-friendly wrapper) implement the CKKS scheme that supports approximate arithmetic on vectors of real numbers — exactly what cosine similarity needs.



1. When to Use HE Search

HE search earns its cost when the vector database runs on infrastructure you do not fully trust: a managed cloud service or any third-party operator that sits outside your trust boundary and would otherwise see query vectors in plaintext. Conversely, if the vector DB is on-prem inside the same TEE that runs inference (see confidential computing), you may not need HE at all; the cheaper defense is to keep plaintext vectors inside the attested boundary.


2. CKKS in One Paragraph

CKKS (Cheon–Kim–Kim–Song) encodes vectors of real numbers into polynomial ciphertexts that support addition, multiplication, and rotation. The scheme is leveled (somewhat homomorphic): each multiplication consumes "noise budget", and after a fixed number of operations the ciphertext must be bootstrapped (expensive) or the circuit must be shallow enough to stay within budget. Arithmetic is also approximate; decrypted values carry small numerical error, which is harmless for similarity ranking. A cosine-similarity dot product is shallow (one ciphertext-plaintext multiplication plus a sum via rotations), so CKKS handles it well.
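The rotate-and-sum pattern behind that dot product can be illustrated in plaintext with NumPy: each `np.roll` stands in for a Galois rotation of the ciphertext slots, and log2(n) rotate-and-add steps collect the elementwise products into slot 0. This is a sketch of the access pattern only, not encrypted code; `rotate_and_sum_dot` is an illustrative helper, not a TenSEAL API.

```python
import numpy as np

def rotate_and_sum_dot(q: np.ndarray, d: np.ndarray) -> float:
    """Dot product via the rotate-and-add pattern CKKS uses on slots."""
    n = len(q)
    assert n & (n - 1) == 0, "slot count must be a power of two"
    acc = q * d                   # one ciphertext-plaintext multiplication
    shift = n // 2
    while shift >= 1:             # log2(n) rotations and additions
        acc = acc + np.roll(acc, shift)
        shift //= 2
    return float(acc[0])          # slot 0 now holds the full sum

print(rotate_and_sum_dot(np.arange(8, dtype=float), np.ones(8)))  # 28.0
```

Only one multiplication is consumed regardless of vector length; the rotations and additions are cheap on the noise budget, which is why the circuit stays shallow.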


3. Example: Encrypted Cosine Similarity with TenSEAL

import tenseal as ts
import numpy as np


def make_context() -> ts.Context:
    ctx = ts.context(
        scheme=ts.SCHEME_TYPE.CKKS,
        poly_modulus_degree=8192,
        coeff_mod_bit_sizes=[60, 40, 40, 60],
    )
    ctx.global_scale = 2 ** 40
    ctx.generate_galois_keys()   # enables rotations for summation
    return ctx


def l2_normalize(v: np.ndarray) -> np.ndarray:
    n = np.linalg.norm(v)
    return v / n if n > 0 else v


# --- Client side: encrypt query vector ---
client_ctx = make_context()
query = l2_normalize(np.random.randn(768).astype(np.float64))
enc_query = ts.ckks_vector(client_ctx, query.tolist())  # lists are the safest input type

# Serialize public-only context for the server.
public_ctx_bytes = client_ctx.serialize(save_secret_key=False)
enc_query_bytes  = enc_query.serialize()


# --- Server side: score against indexed (pre-normalized) document vectors ---
server_ctx = ts.context_from(public_ctx_bytes)
enc_q = ts.ckks_vector_from(server_ctx, enc_query_bytes)

doc_vectors = [l2_normalize(np.random.randn(768)) for _ in range(1000)]
enc_scores = [(i, enc_q.dot(d.tolist())) for i, d in enumerate(doc_vectors)]  # plaintext doc, encrypted query

# Return all encrypted scores (the server cannot rank ciphertexts;
# the client sorts after decryption).
enc_score_bytes = [(i, s.serialize()) for i, s in enc_scores]


# --- Client side: decrypt and rank ---
scores = [(i, ts.ckks_vector_from(client_ctx, b).decrypt()[0])
          for i, b in enc_score_bytes]
top_k = sorted(scores, key=lambda x: -x[1])[:10]

Note the asymmetry: document vectors stay plaintext on the server; only the query is encrypted. This is the common configuration — documents are bulk-loaded under a different trust model (often via a secure pipeline into the index), while queries are the high-sensitivity signal. Encrypting both sides is possible but multiplies cost.
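A note on response size: as written, the server ships one ciphertext per document. TenSEAL's CKKS vectors also support multiplication by a plaintext matrix (`mm`), so the server could instead compute all scores as a single encrypted matrix-vector product and return one packed ciphertext, provided the score count fits in the available slots (poly_modulus_degree / 2 = 4096 here, so 1000 scores fit). The algebra is just a reshaping of the same dot products, sketched here in plaintext NumPy; the encrypted `mm` variant is an assumption left unverified here.

```python
import numpy as np

rng = np.random.default_rng(0)
q = rng.standard_normal(768)            # the (would-be encrypted) query
D = rng.standard_normal((1000, 768))    # plaintext document matrix

# One dot product per document: 1000 separate results, hence 1000 ciphertexts.
per_doc = np.array([d @ q for d in D])

# One matrix-vector product: the same 1000 scores packed into one vector,
# which under CKKS would be a single returned ciphertext.
packed = D @ q

assert np.allclose(per_doc, packed)
```

The trade-off is server-side cost (the matrix product consumes rotations and one multiplicative level inside the ciphertext) against a large cut in response bandwidth.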


4. Performance & Cost


5. Alternatives: PIR, Enclaves, Split Inference


6. Limits and Honest Caveats

