Amazon Bedrock

Amazon Bedrock is a fully managed service that makes foundation models (FMs) from leading AI companies available through a single API. It lets you build and scale generative AI applications without managing infrastructure, choosing among models from Anthropic (Claude), Meta (Llama), Mistral, Cohere, AI21 Labs, Stability AI, and Amazon's own Titan and Nova families.


Key Features:

- A single, model-agnostic Converse API: swap the modelId without rewriting client code
- Streaming responses (ConverseStream) for responsive chat UIs
- Tool use (function calling), the building block for agents
- Multimodal input: images alongside text
- Embeddings models for semantic search and RAG
- Knowledge Bases for managed RAG over documents in S3
- Guardrails for denied topics, PII redaction, and prompt-injection filtering
- AWS-native governance: IAM, VPC endpoints, KMS encryption, CloudTrail logging

Common Use Cases:

- Chat assistants, summarization, and content drafting
- Document and image understanding (invoices, charts, visual QA)
- Retrieval-augmented generation over enterprise documents
- Agents that call internal systems through tool use
- Translation and SQL generation
- Semantic search built on embeddings

A Note on Model Availability:

Bedrock does not host OpenAI's proprietary models (GPT-4, GPT-4o, o1); those are available only through OpenAI's own API or Azure OpenAI Service. The examples below use models that are available on Bedrock — Claude (Anthropic), Llama (Meta), Mistral, Cohere Command, and Amazon Titan/Nova — and a final comparison shows the equivalent OpenAI call.


Examples

1. Claude via the Converse API (model-agnostic, recommended)

The Converse API normalizes messages across providers so the same client code works for Claude, Llama, Mistral, Cohere, and Nova — swap the modelId without rewriting the call.


import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-west-2")

response = bedrock.converse(
    modelId="anthropic.claude-opus-4-7",
    messages=[
        {"role": "user", "content": [{"text": "Summarize Q3 sales trends in 3 bullets."}]}
    ],
    system=[{"text": "You are a concise financial analyst."}],
    inferenceConfig={"maxTokens": 512, "temperature": 0.2},
)

print(response["output"]["message"]["content"][0]["text"])
  


2. Claude Streaming with ConverseStream

For chat UIs, stream tokens as they're generated instead of waiting for the full response.


import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-west-2")

stream = bedrock.converse_stream(
    modelId="anthropic.claude-opus-4-7",
    messages=[{"role": "user", "content": [{"text": "Explain CAP theorem to a new engineer."}]}],
    inferenceConfig={"maxTokens": 800},
)

for event in stream["stream"]:
    if "contentBlockDelta" in event:
        delta = event["contentBlockDelta"]["delta"]
        if "text" in delta:
            print(delta["text"], end="", flush=True)
    elif "messageStop" in event:
        print()  # final newline
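If you also need the complete text after streaming (for logging or caching), accumulate the deltas as they arrive. A small sketch over the same event shapes (collect_stream is a hypothetical helper):

```python
def collect_stream(events) -> str:
    """Accumulate text deltas from a ConverseStream event sequence into one string."""
    parts = []
    for event in events:
        if "contentBlockDelta" in event:
            delta = event["contentBlockDelta"]["delta"]
            if "text" in delta:
                parts.append(delta["text"])
    return "".join(parts)
```

In the example above you would pass stream["stream"] to collect_stream, printing each delta as it is appended if you still want live output.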
  


3. Claude Tool Use (Function Calling)

Claude can decide to call a tool; your code runs it and feeds the result back. This is the building block for agents.


import boto3, json

bedrock = boto3.client("bedrock-runtime", region_name="us-west-2")

tools = [{
    "toolSpec": {
        "name": "get_order_status",
        "description": "Look up the shipping status of a customer order by ID.",
        "inputSchema": {"json": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        }},
    }
}]

messages = [{"role": "user", "content": [{"text": "Where is order A-482?"}]}]

resp = bedrock.converse(
    modelId="anthropic.claude-opus-4-7",
    messages=messages,
    toolConfig={"tools": tools},
)

# If Claude asked to use the tool, run it and return the result
out = resp["output"]["message"]
for block in out["content"]:
    if "toolUse" in block:
        tool_use = block["toolUse"]
        # Pretend this calls your real order system
        tool_result = {"order_id": tool_use["input"]["order_id"], "status": "In transit, ETA Fri"}

        messages.append(out)
        messages.append({"role": "user", "content": [{
            "toolResult": {
                "toolUseId": tool_use["toolUseId"],
                "content": [{"json": tool_result}],
            }
        }]})

        final = bedrock.converse(
            modelId="anthropic.claude-opus-4-7",
            messages=messages,
            toolConfig={"tools": tools},
        )
        print(final["output"]["message"]["content"][0]["text"])
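With more than one tool, the inline branch becomes a dispatch table keyed by tool name. A generic sketch (the registry and run_tool names are illustrative, not part of the SDK):

```python
# Map tool names to plain Python callables; keys must match the toolSpec names.
TOOL_REGISTRY = {
    "get_order_status": lambda order_id: {"order_id": order_id, "status": "In transit, ETA Fri"},
}

def run_tool(tool_use: dict) -> dict:
    """Execute the tool the model requested and build the toolResult content block."""
    handler = TOOL_REGISTRY[tool_use["name"]]
    result = handler(**tool_use["input"])
    return {
        "toolResult": {
            "toolUseId": tool_use["toolUseId"],
            "content": [{"json": result}],
        }
    }
```

The returned block is what you append (inside a user-role message) before calling converse again, exactly as in the example above.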
  


4. Claude with Vision (multimodal)

Send an image alongside text — useful for document understanding, chart reading, or visual QA.


import boto3, base64

bedrock = boto3.client("bedrock-runtime", region_name="us-west-2")

with open("invoice.png", "rb") as f:
    image_bytes = f.read()

resp = bedrock.converse(
    modelId="anthropic.claude-opus-4-7",
    messages=[{"role": "user", "content": [
        {"image": {"format": "png", "source": {"bytes": image_bytes}}},
        {"text": "Extract vendor, invoice number, and total amount as JSON."},
    ]}],
    inferenceConfig={"maxTokens": 400, "temperature": 0},
)

print(resp["output"]["message"]["content"][0]["text"])
  


5. Meta Llama 3 on Bedrock

Same Converse API, different modelId. Llama is often chosen for cost-sensitive workloads or when you want an open-weights lineage.


import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-west-2")

resp = bedrock.converse(
    modelId="meta.llama3-70b-instruct-v1:0",
    messages=[{"role": "user", "content": [{"text": "Write a SQL query that finds the top 5 customers by revenue in 2025."}]}],
    inferenceConfig={"maxTokens": 400, "temperature": 0.1},
)
print(resp["output"]["message"]["content"][0]["text"])
  


6. Mistral Large on Bedrock


import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-west-2")

resp = bedrock.converse(
    modelId="mistral.mistral-large-2407-v1:0",
    messages=[{"role": "user", "content": [{"text": "Translate to formal French: 'The meeting has been rescheduled to Thursday.'"}]}],
    inferenceConfig={"maxTokens": 200, "temperature": 0.3},
)
print(resp["output"]["message"]["content"][0]["text"])
  


7. Cohere Command R+ on Bedrock

Cohere Command is optimized for enterprise RAG and tool use.


import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-west-2")

resp = bedrock.converse(
    modelId="cohere.command-r-plus-v1:0",
    messages=[{"role": "user", "content": [{"text": "Draft a polite follow-up email for an unpaid invoice #A-482."}]}],
    inferenceConfig={"maxTokens": 400, "temperature": 0.5},
)
print(resp["output"]["message"]["content"][0]["text"])
  


8. Amazon Titan Embeddings (for semantic search / RAG)

Embeddings models don't use Converse — use invoke_model directly. Store the resulting vectors in OpenSearch, pgvector, or Bedrock Knowledge Bases.


import boto3, json

bedrock = boto3.client("bedrock-runtime", region_name="us-west-2")

def embed(text: str) -> list[float]:
    resp = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=json.dumps({"inputText": text, "dimensions": 1024, "normalize": True}),
    )
    return json.loads(resp["body"].read())["embedding"]

vec = embed("AWS Bedrock provides foundation models through a single API.")
print(len(vec), vec[:5])  # 1024  [0.0142, -0.0356, ...]
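Once documents are embedded, retrieval is nearest-neighbor search over the vectors. A minimal in-memory sketch using cosine similarity (corpus vectors are assumed precomputed with embed(); a real vector store replaces this loop at scale):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(query_vec: list[float], corpus: dict[str, list[float]], k: int = 3):
    """Rank documents by cosine similarity to the query vector."""
    ranked = sorted(corpus.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return ranked[:k]
```

Since Titan v2 with normalize=True returns unit-length vectors, cosine similarity here reduces to a plain dot product.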
  


9. Amazon Nova Pro (Amazon's in-house multimodal model)


import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-west-2")

resp = bedrock.converse(
    modelId="amazon.nova-pro-v1:0",
    messages=[{"role": "user", "content": [{"text": "List three cost-optimization ideas for a serverless API on AWS."}]}],
    inferenceConfig={"maxTokens": 500, "temperature": 0.3},
)
print(resp["output"]["message"]["content"][0]["text"])
  


10. Retrieval-Augmented Generation via Bedrock Knowledge Bases

Knowledge Bases handle chunking, embedding, and vector retrieval against your documents in S3. retrieve_and_generate does the retrieval + generation in a single call.


import boto3

agents = boto3.client("bedrock-agent-runtime", region_name="us-west-2")

resp = agents.retrieve_and_generate(
    input={"text": "What is our 2026 parental-leave policy?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KB1234ABCD",
            "modelArn": "arn:aws:bedrock:us-west-2::foundation-model/anthropic.claude-opus-4-7",
        },
    },
)

print(resp["output"]["text"])
for citation in resp.get("citations", []):
    for ref in citation.get("retrievedReferences", []):
        print(" -", ref["location"])
  


11. Applying a Guardrail to Any Model

Guardrails apply denied-topic, PII-redaction, profanity, and prompt-injection filters — configured once and attached to any model invocation.


import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-west-2")

resp = bedrock.converse(
    modelId="anthropic.claude-opus-4-7",
    messages=[{"role": "user", "content": [{"text": "My SSN is 123-45-6789, can you help with my account?"}]}],
    guardrailConfig={
        "guardrailIdentifier": "gr-pii-strict",
        "guardrailVersion": "3",
        "trace": "enabled",
    },
)

print(resp["output"]["message"]["content"][0]["text"])
print("Action:", resp.get("stopReason"))  # 'guardrail_intervened' when a rule triggers
  


12. Comparison: The Same Task Against OpenAI (Not on Bedrock)

For reference — if you need GPT-4o or o1, call OpenAI or Azure OpenAI directly. Note the different SDK and message shape.


from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from env

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a concise financial analyst."},
        {"role": "user",   "content": "Summarize Q3 sales trends in 3 bullets."},
    ],
    max_tokens=512,
    temperature=0.2,
)

print(resp.choices[0].message.content)
  

Many teams run a multi-provider stack: Bedrock for Claude/Llama/Mistral/Titan inside the AWS boundary (VPC, IAM, KMS, CloudTrail), and OpenAI/Azure OpenAI for GPT when a specific capability is needed. Libraries like LiteLLM or LangChain abstract the two behind a shared interface.
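A hand-rolled version of that shared interface is just dispatch on a provider prefix. A sketch with pluggable backends (the chat function and backend names are illustrative; LiteLLM and LangChain provide production-grade versions of this idea):

```python
def chat(model: str, prompt: str, backends: dict) -> str:
    """Route a prompt to the right provider based on a 'provider/model' prefix."""
    provider, _, model_id = model.partition("/")
    return backends[provider](model_id, prompt)

# backends maps provider names to callables, e.g.:
# backends = {
#     "bedrock": lambda mid, p: ask_bedrock(mid, p),  # the Converse call from example 1
#     "openai":  lambda mid, p: ask_openai(mid, p),   # the chat.completions call from example 12
# }
```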


When to Choose Bedrock vs. SageMaker:

Choose Bedrock when you want serverless, pay-per-token access to pre-trained foundation models through an API, with RAG, agents, and guardrails handled by the service. Choose SageMaker when you need to train or fine-tune your own models, bring custom containers, or control the hosting infrastructure (instance types, scaling, endpoints). The two are complementary: many teams prototype on Bedrock and move to SageMaker only when they outgrow a managed API.

Amazon Bedrock is the primary AWS entry point for generative AI — it collapses model selection, RAG, agents, and safety into a single service so teams can focus on the application rather than the ML platform.