Knowledge Bases for Amazon Bedrock

Knowledge Bases for Amazon Bedrock is a managed retrieval-augmented generation (RAG) service. You point it at a data source (S3, a website, a SaaS connector), pick an embedding model and a vector store, and Bedrock handles ingestion, chunking, embedding, indexing, retrieval, citation tracking, and grounded generation. The result is a single API — retrieve for raw chunks, retrieve_and_generate for a fully-grounded answer — that replaces a meaningful slice of custom RAG plumbing.


1. Architecture Overview

A Knowledge Base is a thin orchestrator over four pieces: a data source, a parsing and chunking pipeline, an embedding model, and a vector store.

An ingestion job walks the data source, parses, chunks, embeds, and writes to the vector store. After ingestion, queries hit the vector store and (optionally) the FM for generation.


2. Supported Data Sources

Knowledge Bases can ingest from S3 buckets, a web crawler for public websites, and SaaS connectors (Confluence, Salesforce, SharePoint). S3 is the most common source and the one this article covers in depth.


2.1 S3 Metadata Sidecars

Attach metadata to a chunk to enable filtered retrieval (e.g. only "year=2026" docs). Drop a JSON file next to each source file:


{
  "metadataAttributes": {
    "year":       { "value": { "type": "NUMBER", "numberValue": 2026 } },
    "department": { "value": { "type": "STRING", "stringValue": "HR" } },
    "tags":       { "value": { "type": "STRING_LIST", "stringListValue": ["policy", "leave"] } }
  }
}
  

Filename convention: if the source is policies/2026-leave.pdf, the sidecar is policies/2026-leave.pdf.metadata.json.
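
Generating sidecars by hand is error-prone; a small helper makes it mechanical. This is a sketch, assuming the STRING / NUMBER / BOOLEAN / STRING_LIST attribute types cover your needs; sidecar_key implements the filename convention above, and the commented put_object call shows hypothetical usage:

```python
import json

def sidecar_key(source_key: str) -> str:
    """Sidecar path per the convention above: append '.metadata.json'."""
    return source_key + ".metadata.json"

def sidecar_body(attrs: dict) -> str:
    """Wrap plain Python values in the typed metadataAttributes shape."""
    wrapped = {}
    for key, value in attrs.items():
        if isinstance(value, bool):  # check bool before int: bool is an int subclass
            typed = {"type": "BOOLEAN", "booleanValue": value}
        elif isinstance(value, (int, float)):
            typed = {"type": "NUMBER", "numberValue": value}
        elif isinstance(value, list):
            typed = {"type": "STRING_LIST", "stringListValue": value}
        else:
            typed = {"type": "STRING", "stringValue": value}
        wrapped[key] = {"value": typed}
    return json.dumps({"metadataAttributes": wrapped}, indent=2)

# s3.put_object(Bucket="company-hr-docs",
#               Key=sidecar_key("policies/2026-leave.pdf"),
#               Body=sidecar_body({"year": 2026, "department": "HR"}))
```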


3. Supported Vector Stores

Supported stores include Amazon OpenSearch Serverless, Aurora PostgreSQL (pgvector), Pinecone, Redis Enterprise Cloud, and MongoDB Atlas. For each store you must pre-create the collection/database and pass field-mapping hints (vector field, text field, metadata field) when creating the KB.


4. Chunking Strategies
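
The strategy is chosen per data source via vectorIngestionConfiguration. Here is a sketch of the chunkingConfiguration shapes the boto3 API accepts; the HIERARCHICAL and FIXED_SIZE forms match the examples later in this article, while the SEMANTIC parameters are my best understanding and worth checking against the current API reference:

```python
# Fixed-size: uniform chunks; a reasonable default for homogeneous prose.
fixed_size = {
    "chunkingStrategy": "FIXED_SIZE",
    "fixedSizeChunkingConfiguration": {"maxTokens": 500, "overlapPercentage": 15},
}

# Hierarchical: retrieval matches small child chunks but returns their larger
# parent chunk, trading a bigger index for better answer context.
hierarchical = {
    "chunkingStrategy": "HIERARCHICAL",
    "hierarchicalChunkingConfiguration": {
        "levelConfigurations": [{"maxTokens": 1500}, {"maxTokens": 300}],
        "overlapTokens": 60,
    },
}

# Semantic: split where embedding similarity between sentences drops, so chunk
# boundaries follow topic shifts instead of token counts.
semantic = {
    "chunkingStrategy": "SEMANTIC",
    "semanticChunkingConfiguration": {
        "maxTokens": 300,
        "bufferSize": 1,
        "breakpointPercentileThreshold": 95,
    },
}

# None: one chunk per file, for corpora that are already pre-chunked.
no_chunking = {"chunkingStrategy": "NONE"}
```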


5. Embedding Model Choice

Pick the embedding model up front and treat it as immutable — switching models means re-embedding and reindexing the entire corpus. Smaller dimensions (Titan v2 supports 256d alongside 512d and 1024d) cut vector storage roughly 4x versus 1024d and speed up search, at a small recall penalty; worth measuring on your data.
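
The storage side of that claim is simple arithmetic: FLOAT32 vectors cost 4 bytes per dimension, so raw index size scales linearly with dimensionality. A quick back-of-envelope, with an illustrative corpus size:

```python
def index_bytes(num_chunks: int, dims: int, bytes_per_dim: int = 4) -> int:
    """Raw vector storage only (FLOAT32 = 4 bytes/dim); index overhead excluded."""
    return num_chunks * dims * bytes_per_dim

corpus = 1_000_000  # chunks; illustrative
print(index_bytes(corpus, 1024))  # 4096000000 bytes, ~4.1 GB
print(index_bytes(corpus, 256))   # 1024000000 bytes, ~1.0 GB
```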


6. Create a Knowledge Base with boto3


import boto3

agent = boto3.client("bedrock-agent", region_name="us-west-2")

kb = agent.create_knowledge_base(
    name="hr-policies",
    description="Internal HR policy documents (US, EMEA, APAC).",
    roleArn="arn:aws:iam::111111111111:role/BedrockKBRole",
    knowledgeBaseConfiguration={
        "type": "VECTOR",
        "vectorKnowledgeBaseConfiguration": {
            "embeddingModelArn": "arn:aws:bedrock:us-west-2::foundation-model/amazon.titan-embed-text-v2:0",
            "embeddingModelConfiguration": {
                "bedrockEmbeddingModelConfiguration": {
                    "dimensions": 1024,
                    "embeddingDataType": "FLOAT32",
                }
            },
        },
    },
    storageConfiguration={
        "type": "OPENSEARCH_SERVERLESS",
        "opensearchServerlessConfiguration": {
            "collectionArn": "arn:aws:aoss:us-west-2:111111111111:collection/abc123",
            "vectorIndexName": "hr-policies-idx",
            "fieldMapping": {
                "vectorField":   "embedding",
                "textField":     "text",
                "metadataField": "metadata",
            },
        },
    },
)

kb_id = kb["knowledgeBase"]["knowledgeBaseId"]
print("KB:", kb_id)

ds = agent.create_data_source(
    knowledgeBaseId=kb_id,
    name="hr-policies-s3",
    dataSourceConfiguration={
        "type": "S3",
        "s3Configuration": {
            "bucketArn":               "arn:aws:s3:::company-hr-docs",
            "inclusionPrefixes":       ["policies/"],
            "bucketOwnerAccountId":    "111111111111",
        },
    },
    vectorIngestionConfiguration={
        "chunkingConfiguration": {
            "chunkingStrategy": "HIERARCHICAL",
            "hierarchicalChunkingConfiguration": {
                "levelConfigurations": [
                    {"maxTokens": 1500},  # parent
                    {"maxTokens": 300},   # child
                ],
                "overlapTokens": 60,
            },
        },
    },
)
ds_id = ds["dataSource"]["dataSourceId"]
print("DS:", ds_id)
  


7. Run an Ingestion Job

Ingestion jobs are async. Trigger one whenever the data source changes; Bedrock detects added/modified/deleted files and updates only the affected chunks (incremental sync).


import time

job = agent.start_ingestion_job(knowledgeBaseId=kb_id, dataSourceId=ds_id)
job_id = job["ingestionJob"]["ingestionJobId"]

while True:
    status = agent.get_ingestion_job(
        knowledgeBaseId=kb_id, dataSourceId=ds_id, ingestionJobId=job_id,
    )["ingestionJob"]
    state = status["status"]
    print(state, status.get("statistics", {}))
    if state in ("COMPLETE", "FAILED", "STOPPED"):
        break
    time.sleep(10)
  

The statistics block reports documents scanned, indexed, modified, deleted, and failed — log it to CloudWatch as your ingestion SLO.
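
One way to do that logging is to flatten the statistics dict into put_metric_data entries. This is a sketch with a hypothetical namespace; the counter key names (e.g. numberOfDocumentsScanned) come from the get_ingestion_job response and should be verified against the live response shape. The helper is pure, so the AWS call is left commented:

```python
def stats_to_metric_data(stats: dict, kb_id: str) -> list:
    """Flatten an ingestionJob 'statistics' dict into put_metric_data entries."""
    return [
        {
            "MetricName": name,
            "Value": float(value),
            "Unit": "Count",
            "Dimensions": [{"Name": "KnowledgeBaseId", "Value": kb_id}],
        }
        for name, value in sorted(stats.items())
    ]

# cloudwatch = boto3.client("cloudwatch")
# cloudwatch.put_metric_data(
#     Namespace="KnowledgeBases/Ingestion",  # hypothetical namespace
#     MetricData=stats_to_metric_data(status["statistics"], kb_id),
# )
```

Alarming on the failed-documents metric gives you an early signal that parsing or permissions broke upstream.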

Trigger ingestion automatically by wiring an S3 EventBridge rule on Object Created events to a Lambda that calls start_ingestion_job.
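
A minimal handler sketch for that wiring. It assumes the default EventBridge S3 "Object Created" event shape (bucket and object key under detail) and hypothetical KB_ID / DS_ID environment variables; the boto3 call is left commented so the filtering logic stands alone. Only one sync can run per data source at a time, so production code should debounce bursts of object events:

```python
import os

KB_ID = os.environ.get("KB_ID", "kb-placeholder")  # hypothetical
DS_ID = os.environ.get("DS_ID", "ds-placeholder")  # hypothetical

def handler(event, context):
    """Start an incremental sync when S3 reports a changed object."""
    key = event.get("detail", {}).get("object", {}).get("key", "")
    if not key.startswith("policies/"):  # ignore objects outside the KB prefix
        return {"started": False, "key": key}
    # agent = boto3.client("bedrock-agent")
    # agent.start_ingestion_job(knowledgeBaseId=KB_ID, dataSourceId=DS_ID)
    return {"started": True, "key": key}
```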


8. Retrieve and Retrieve-and-Generate

8.1 retrieve — raw chunks only

Use this when you want to do your own prompting, rerank with a different model, or display raw search results.


runtime = boto3.client("bedrock-agent-runtime", region_name="us-west-2")

resp = runtime.retrieve(
    knowledgeBaseId=kb_id,
    retrievalQuery={"text": "How many weeks of parental leave do EMEA employees get?"},
    retrievalConfiguration={
        "vectorSearchConfiguration": {
            "numberOfResults": 5,
            "overrideSearchType": "HYBRID",  # SEMANTIC | HYBRID
            "filter": {
                "andAll": [
                    {"equals":      {"key": "department", "value": "HR"}},
                    {"greaterThan": {"key": "year",       "value": 2024}},
                ]
            },
        }
    },
)

for r in resp["retrievalResults"]:
    print(round(r["score"], 3), r["location"], r["content"]["text"][:120])
  

8.2 retrieve_and_generate — grounded answer in one call


resp = runtime.retrieve_and_generate(
    input={"text": "How many weeks of parental leave do EMEA employees get?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": kb_id,
            "modelArn": "arn:aws:bedrock:us-west-2::foundation-model/anthropic.claude-opus-4-7",
            "retrievalConfiguration": {
                "vectorSearchConfiguration": {"numberOfResults": 8, "overrideSearchType": "HYBRID"}
            },
            "generationConfiguration": {
                "inferenceConfig": {"textInferenceConfig": {
                    "temperature": 0.0, "maxTokens": 600,
                }},
                "promptTemplate": {"textPromptTemplate": (
                    "You are an HR assistant. Answer using ONLY the search results below. "
                    "If the answer is not present, say 'I don't have that policy on file.'\n\n"
                    "$search_results$\n\nQuestion: $query$"
                )},
            },
        },
    },
)
print(resp["output"]["text"])
  

8.3 Multi-turn Sessions

Pass sessionId from one call into the next so the KB carries chat context (it rewrites follow-up questions like "what about APAC?" into standalone queries before retrieving).


session_id = resp["sessionId"]
followup = runtime.retrieve_and_generate(
    input={"text": "What about APAC?"},
    sessionId=session_id,
    # resp_config is the same retrieveAndGenerateConfiguration dict passed in 8.2
    retrieveAndGenerateConfiguration=resp_config,
)
print(followup["output"]["text"])
  


9. Citations and Grounding

Every retrieve_and_generate response includes a citations array that maps spans of the generated text to specific retrieved chunks. Surface these in the UI to let users verify the answer.


text = resp["output"]["text"]

for cite in resp.get("citations", []):
    span = cite["generatedResponsePart"]["textResponsePart"]["span"]
    quoted = text[span["start"]:span["end"] + 1]
    print(f"---\nCLAIM: {quoted}")
    for ref in cite["retrievedReferences"]:
        loc = ref["location"]
        kind = loc["type"]
        if kind == "S3":
            print(f"  source: {loc['s3Location']['uri']}")
        elif kind == "WEB":
            print(f"  source: {loc['webLocation']['url']}")
        print(f"  chunk:  {ref['content']['text'][:160]}...")
  

Citations are also the raw material for hallucination guardrails — wire them into a contextual-grounding guardrail (see Bedrock Guardrails) to block answers that drift from the cited context.
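
The wiring itself is one extra block in the generationConfiguration from 8.2. A sketch with a placeholder guardrail ID; the field names reflect my understanding of the current API and should be verified before relying on them:

```python
generation_config = {
    "guardrailConfiguration": {
        "guardrailId": "gr0abcd1234",  # placeholder: your contextual-grounding guardrail
        "guardrailVersion": "1",
    },
    "inferenceConfig": {"textInferenceConfig": {"temperature": 0.0, "maxTokens": 600}},
}
```

Pass this dict as the generationConfiguration and the guardrail evaluates each answer against the retrieved context before it is returned.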


10. Advanced Parsing with FM-as-Parser

Default parsing extracts plain text — fine for prose but loses structure in slide decks, tables, and financial PDFs. Enable advanced parsing to use a foundation model to interpret each page as Markdown, preserving tables, headings, and figure captions.


agent.create_data_source(
    knowledgeBaseId=kb_id,
    name="financial-reports-s3",
    dataSourceConfiguration={"type": "S3", "s3Configuration": {
        "bucketArn": "arn:aws:s3:::company-finance-docs",
    }},
    vectorIngestionConfiguration={
        "parsingConfiguration": {
            "parsingStrategy": "BEDROCK_FOUNDATION_MODEL",
            "bedrockFoundationModelConfiguration": {
                "modelArn": "arn:aws:bedrock:us-west-2::foundation-model/anthropic.claude-opus-4-7",
                "parsingPrompt": {"parsingPromptText": (
                    "Convert each page to Markdown. Preserve tables as GitHub-flavored "
                    "Markdown tables. Render figures as '![figure: ]'."
                )},
            },
        },
        "chunkingConfiguration": {
            "chunkingStrategy": "FIXED_SIZE",
            "fixedSizeChunkingConfiguration": {"maxTokens": 500, "overlapPercentage": 15},
        },
    },
)
  

FM parsing costs more (one model call per page) and slows ingestion materially. Reserve it for documents where layout actually carries meaning — annual reports, scientific papers, regulatory filings.


11. When to Use a KB vs Roll Your Own

Knowledge Bases for Bedrock collapse most of the RAG plumbing into a managed service. The trade-off — as always with managed services — is lower flexibility on the retrieval pipeline. Start with the KB; reach for custom RAG only when an evaluation actually fails because of it.


12. Operational Tips


13. Cost Components

