Amazon Bedrock

Amazon Bedrock is a fully managed service that makes foundation models (FMs) from leading AI companies available through a single API. It lets you build and scale generative AI applications without managing infrastructure, choosing among models from Anthropic (Claude), Meta (Llama), Mistral, Cohere, AI21 Labs, Stability AI, and Amazon's own Titan and Nova families.


Key Features:

- A single, model-agnostic Converse API: swap the modelId without rewriting client code
- Streaming responses (ConverseStream) for responsive chat UIs
- Tool use (function calling), the building block for agents
- Multimodal input: images alongside text
- Embeddings models for semantic search and RAG
- Knowledge Bases for managed RAG over documents in S3
- Guardrails for denied topics, PII redaction, and prompt-injection filtering
- AWS-native governance: IAM, VPC endpoints, KMS encryption, CloudTrail logging

Common Use Cases:

- Chat assistants, summarization, and content drafting
- Document and image understanding (invoices, charts, visual QA)
- Retrieval-augmented generation over enterprise documents
- Agents that call internal systems through tool use
- Translation and SQL generation
- Semantic search built on embeddings

A Note on Model Availability:

Bedrock does not host OpenAI's proprietary models (GPT-4, GPT-4o, o1); those are available only through OpenAI's own API or Azure OpenAI Service. The examples below use models that are available on Bedrock — Claude (Anthropic), Llama (Meta), Mistral, Cohere Command, and Amazon Titan/Nova — and a final comparison shows the equivalent OpenAI call.


Examples

1. Claude via the Converse API (model-agnostic, recommended)

The Converse API normalizes messages across providers so the same client code works for Claude, Llama, Mistral, Cohere, and Nova — swap the modelId without rewriting the call.


import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-west-2")

response = bedrock.converse(
    modelId="anthropic.claude-opus-4-7",
    messages=[
        {"role": "user", "content": [{"text": "Summarize Q3 sales trends in 3 bullets."}]}
    ],
    system=[{"text": "You are a concise financial analyst."}],
    inferenceConfig={"maxTokens": 512, "temperature": 0.2},
)

print(response["output"]["message"]["content"][0]["text"])
  


2. Claude Streaming with ConverseStream

For chat UIs, stream tokens as they're generated instead of waiting for the full response.


import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-west-2")

stream = bedrock.converse_stream(
    modelId="anthropic.claude-opus-4-7",
    messages=[{"role": "user", "content": [{"text": "Explain CAP theorem to a new engineer."}]}],
    inferenceConfig={"maxTokens": 800},
)

for event in stream["stream"]:
    if "contentBlockDelta" in event:
        delta = event["contentBlockDelta"]["delta"]
        if "text" in delta:
            print(delta["text"], end="", flush=True)
    elif "messageStop" in event:
        print()  # final newline
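If you also need the complete text after streaming (for logging or caching), accumulate the deltas as they arrive. A small sketch over the same event shapes (collect_stream is a hypothetical helper):

```python
def collect_stream(events) -> str:
    """Accumulate text deltas from a ConverseStream event sequence into one string."""
    parts = []
    for event in events:
        if "contentBlockDelta" in event:
            delta = event["contentBlockDelta"]["delta"]
            if "text" in delta:
                parts.append(delta["text"])
    return "".join(parts)
```

In the example above you would pass stream["stream"] to collect_stream, printing each delta as it is appended if you still want live output.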
  


3. Claude Tool Use (Function Calling)

Claude can decide to call a tool; your code runs it and feeds the result back. This is the building block for agents.


import boto3, json

bedrock = boto3.client("bedrock-runtime", region_name="us-west-2")

tools = [{
    "toolSpec": {
        "name": "get_order_status",
        "description": "Look up the shipping status of a customer order by ID.",
        "inputSchema": {"json": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        }},
    }
}]

messages = [{"role": "user", "content": [{"text": "Where is order A-482?"}]}]

resp = bedrock.converse(
    modelId="anthropic.claude-opus-4-7",
    messages=messages,
    toolConfig={"tools": tools},
)

# If Claude asked to use the tool, run it and return the result
out = resp["output"]["message"]
for block in out["content"]:
    if "toolUse" in block:
        tool_use = block["toolUse"]
        # Pretend this calls your real order system
        tool_result = {"order_id": tool_use["input"]["order_id"], "status": "In transit, ETA Fri"}

        messages.append(out)
        messages.append({"role": "user", "content": [{
            "toolResult": {
                "toolUseId": tool_use["toolUseId"],
                "content": [{"json": tool_result}],
            }
        }]})

        final = bedrock.converse(
            modelId="anthropic.claude-opus-4-7",
            messages=messages,
            toolConfig={"tools": tools},
        )
        print(final["output"]["message"]["content"][0]["text"])
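With more than one tool, the inline branch becomes a dispatch table keyed by tool name. A generic sketch (the registry and run_tool names are illustrative, not part of the SDK):

```python
# Map tool names to plain Python callables; keys must match the toolSpec names.
TOOL_REGISTRY = {
    "get_order_status": lambda order_id: {"order_id": order_id, "status": "In transit, ETA Fri"},
}

def run_tool(tool_use: dict) -> dict:
    """Execute the tool the model requested and build the toolResult content block."""
    handler = TOOL_REGISTRY[tool_use["name"]]
    result = handler(**tool_use["input"])
    return {
        "toolResult": {
            "toolUseId": tool_use["toolUseId"],
            "content": [{"json": result}],
        }
    }
```

The returned block is what you append (inside a user-role message) before calling converse again, exactly as in the example above.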
  


4. Claude with Vision (multimodal)

Send an image alongside text — useful for document understanding, chart reading, or visual QA.


import boto3, base64

bedrock = boto3.client("bedrock-runtime", region_name="us-west-2")

with open("invoice.png", "rb") as f:
    image_bytes = f.read()

resp = bedrock.converse(
    modelId="anthropic.claude-opus-4-7",
    messages=[{"role": "user", "content": [
        {"image": {"format": "png", "source": {"bytes": image_bytes}}},
        {"text": "Extract vendor, invoice number, and total amount as JSON."},
    ]}],
    inferenceConfig={"maxTokens": 400, "temperature": 0},
)

print(resp["output"]["message"]["content"][0]["text"])
  


5. Meta Llama 3 on Bedrock

Same Converse API, different modelId. Llama is often chosen for cost-sensitive workloads or when you want an open-weights lineage.


import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-west-2")

resp = bedrock.converse(
    modelId="meta.llama3-70b-instruct-v1:0",
    messages=[{"role": "user", "content": [{"text": "Write a SQL query that finds the top 5 customers by revenue in 2025."}]}],
    inferenceConfig={"maxTokens": 400, "temperature": 0.1},
)
print(resp["output"]["message"]["content"][0]["text"])
  


6. Mistral Large on Bedrock


import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-west-2")

resp = bedrock.converse(
    modelId="mistral.mistral-large-2407-v1:0",
    messages=[{"role": "user", "content": [{"text": "Translate to formal French: 'The meeting has been rescheduled to Thursday.'"}]}],
    inferenceConfig={"maxTokens": 200, "temperature": 0.3},
)
print(resp["output"]["message"]["content"][0]["text"])
  


7. Cohere Command R+ on Bedrock

Cohere Command is optimized for enterprise RAG and tool use.


import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-west-2")

resp = bedrock.converse(
    modelId="cohere.command-r-plus-v1:0",
    messages=[{"role": "user", "content": [{"text": "Draft a polite follow-up email for an unpaid invoice #A-482."}]}],
    inferenceConfig={"maxTokens": 400, "temperature": 0.5},
)
print(resp["output"]["message"]["content"][0]["text"])
  


8. Amazon Titan Embeddings (for semantic search / RAG)

Embeddings models don't use Converse — use invoke_model directly. Store the resulting vectors in OpenSearch, pgvector, or Bedrock Knowledge Bases.


import boto3, json

bedrock = boto3.client("bedrock-runtime", region_name="us-west-2")

def embed(text: str) -> list[float]:
    resp = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=json.dumps({"inputText": text, "dimensions": 1024, "normalize": True}),
    )
    return json.loads(resp["body"].read())["embedding"]

vec = embed("AWS Bedrock provides foundation models through a single API.")
print(len(vec), vec[:5])  # 1024  [0.0142, -0.0356, ...]
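Once documents are embedded, retrieval is nearest-neighbor search over the vectors. A minimal in-memory sketch using cosine similarity (corpus vectors are assumed precomputed with embed(); a real vector store replaces this loop at scale):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(query_vec: list[float], corpus: dict[str, list[float]], k: int = 3):
    """Rank documents by cosine similarity to the query vector."""
    ranked = sorted(corpus.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return ranked[:k]
```

Since Titan v2 with normalize=True returns unit-length vectors, cosine similarity here reduces to a plain dot product.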
  


9. Amazon Nova Pro (Amazon's in-house multimodal model)


import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-west-2")

resp = bedrock.converse(
    modelId="amazon.nova-pro-v1:0",
    messages=[{"role": "user", "content": [{"text": "List three cost-optimization ideas for a serverless API on AWS."}]}],
    inferenceConfig={"maxTokens": 500, "temperature": 0.3},
)
print(resp["output"]["message"]["content"][0]["text"])
  


10. Retrieval-Augmented Generation via Bedrock Knowledge Bases

Knowledge Bases handle chunking, embedding, and vector retrieval against your documents in S3. retrieve_and_generate does the retrieval + generation in a single call.


import boto3

agents = boto3.client("bedrock-agent-runtime", region_name="us-west-2")

resp = agents.retrieve_and_generate(
    input={"text": "What is our 2026 parental-leave policy?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KB1234ABCD",
            "modelArn": "arn:aws:bedrock:us-west-2::foundation-model/anthropic.claude-opus-4-7",
        },
    },
)

print(resp["output"]["text"])
for citation in resp.get("citations", []):
    for ref in citation.get("retrievedReferences", []):
        print(" -", ref["location"])
  


11. Applying a Guardrail to Any Model

Guardrails apply denied-topic, PII-redaction, profanity, and prompt-injection filters — configured once and attached to any model invocation.


import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-west-2")

resp = bedrock.converse(
    modelId="anthropic.claude-opus-4-7",
    messages=[{"role": "user", "content": [{"text": "My SSN is 123-45-6789, can you help with my account?"}]}],
    guardrailConfig={
        "guardrailIdentifier": "gr-pii-strict",
        "guardrailVersion": "3",
        "trace": "enabled",
    },
)

print(resp["output"]["message"]["content"][0]["text"])
print("Action:", resp.get("stopReason"))  # 'guardrail_intervened' when a rule triggers
  


12. Comparison: The Same Task Against OpenAI (Not on Bedrock)

For reference — if you need GPT-4o or o1, call OpenAI or Azure OpenAI directly. Note the different SDK and message shape.


from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from env

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a concise financial analyst."},
        {"role": "user",   "content": "Summarize Q3 sales trends in 3 bullets."},
    ],
    max_tokens=512,
    temperature=0.2,
)

print(resp.choices[0].message.content)
  

Many teams run a multi-provider stack: Bedrock for Claude/Llama/Mistral/Titan inside the AWS boundary (VPC, IAM, KMS, CloudTrail), and OpenAI/Azure OpenAI for GPT when a specific capability is needed. Libraries like LiteLLM or LangChain abstract the two behind a shared interface.
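A hand-rolled version of that shared interface is just dispatch on a provider prefix. A sketch with pluggable backends (the chat function and backend names are illustrative; LiteLLM and LangChain provide production-grade versions of this idea):

```python
def chat(model: str, prompt: str, backends: dict) -> str:
    """Route a prompt to the right provider based on a 'provider/model' prefix."""
    provider, _, model_id = model.partition("/")
    return backends[provider](model_id, prompt)

# backends maps provider names to callables, e.g.:
# backends = {
#     "bedrock": lambda mid, p: ask_bedrock(mid, p),  # the Converse call from example 1
#     "openai":  lambda mid, p: ask_openai(mid, p),   # the chat.completions call from example 12
# }
```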


When to Choose Bedrock vs. SageMaker:

Choose Bedrock when you want serverless, pay-per-token access to pre-trained foundation models through an API, with RAG, agents, and guardrails handled by the service. Choose SageMaker when you need to train or fine-tune your own models, bring custom containers, or control the hosting infrastructure (instance types, scaling, endpoints). The two are complementary: many teams prototype on Bedrock and move to SageMaker only when they outgrow a managed API.

Amazon Bedrock is the primary AWS entry point for generative AI — it collapses model selection, RAG, agents, and safety into a single service so teams can focus on the application rather than the ML platform.