Guardrails for Amazon Bedrock

Guardrails for Amazon Bedrock is a policy layer that screens both the user prompt and the model completion against rules you define: denied topics, content categories, banned words, sensitive information, and contextual grounding for RAG. A guardrail can be attached to any Bedrock model invocation, or run standalone via ApplyGuardrail against any text, including completions from non-Bedrock models such as OpenAI's. The point is to define "what the assistant must never do" once, centrally, instead of re-implementing it in every prompt.


1. Filter Types

A guardrail bundles five filter families. Each is independently configurable; you can mix and match. All filters apply to both prompts (input) and responses (output) by default — disable per direction when it makes sense.

1.1 Denied Topics

Block entire conversational topics defined in natural language plus example phrases. The classifier is a small dedicated model — far more flexible than a regex.


"topicPolicyConfig": {
    "topicsConfig": [
        {
            "name":       "Investment Advice",
            "definition": "Personalized recommendations on what securities, funds, or "
                          "crypto assets a user should buy, sell, or hold.",
            "examples":   [
                "Should I sell my AAPL shares?",
                "What's a good ETF to buy right now?",
                "Is bitcoin a good long-term hold?",
            ],
            "type": "DENY",
        }
    ]
}
  

1.2 Content Filters

Six harm categories (HATE, INSULTS, SEXUAL, VIOLENCE, MISCONDUCT, PROMPT_ATTACK), each with a strength dial: NONE, LOW, MEDIUM, or HIGH. Stronger settings catch more borderline cases at the cost of more false positives. PROMPT_ATTACK screens user input for jailbreaks and prompt injection, so its output strength is left at NONE.


"contentPolicyConfig": {
    "filtersConfig": [
        {"type": "HATE",          "inputStrength": "HIGH",   "outputStrength": "HIGH"},
        {"type": "INSULTS",       "inputStrength": "MEDIUM", "outputStrength": "MEDIUM"},
        {"type": "SEXUAL",        "inputStrength": "HIGH",   "outputStrength": "HIGH"},
        {"type": "VIOLENCE",      "inputStrength": "HIGH",   "outputStrength": "HIGH"},
        {"type": "MISCONDUCT",    "inputStrength": "HIGH",   "outputStrength": "HIGH"},
        {"type": "PROMPT_ATTACK", "inputStrength": "HIGH",   "outputStrength": "NONE"},
    ]
}
  

1.3 Word Filters

Two layers: a managed Profanity list and a custom word list. The custom list is the right place for brand-safety terms (competitor names, internal codenames, product names you don't want hallucinated).


"wordPolicyConfig": {
    "wordsConfig":           [{"text": "ProjectAtlas"}, {"text": "CompetitorCorp"}],
    "managedWordListsConfig": [{"type": "PROFANITY"}],
}
  

1.4 Sensitive-Info / PII Filters

Detects 30+ entity types (SSN, credit-card, phone, email, names, addresses, IP addresses, plus AWS-specific ones like access keys). Each entity type is assigned an action: BLOCK (reject the request or response outright) or ANONYMIZE (replace the matched value with a placeholder so the rest of the message survives). Custom regex patterns extend the same mechanism to domain-specific identifiers.


"sensitiveInformationPolicyConfig": {
    "piiEntitiesConfig": [
        {"type": "US_SOCIAL_SECURITY_NUMBER", "action": "BLOCK"},
        {"type": "CREDIT_DEBIT_CARD_NUMBER",  "action": "BLOCK"},
        {"type": "EMAIL",                     "action": "ANONYMIZE"},
        {"type": "PHONE",                     "action": "ANONYMIZE"},
        {"type": "NAME",                      "action": "ANONYMIZE"},
        {"type": "AWS_ACCESS_KEY",            "action": "BLOCK"},
    ],
    "regexesConfig": [{
        "name":        "InternalTicketId",
        "description": "Internal ticket identifiers like TKT-12345.",
        "pattern":     "TKT-\\d{4,8}",
        "action":      "ANONYMIZE",
    }],
}
  

Use ANONYMIZE when the model still needs the structure of the message but not the literal value (e.g. "send an email to {EMAIL}" still parses). Use BLOCK for things that should never reach the model at all.
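When ANONYMIZE fires, the guardrail replaces each match with a placeholder named after the entity type (e.g. {EMAIL}, {PHONE}), so downstream code can still detect where a value used to be. A minimal sketch of post-processing such output (`masked_entities` is an illustrative helper, not part of any SDK):

```python
import re

# Anonymized values come back as {ENTITY_TYPE} placeholders.
MASK = re.compile(r"\{([A-Z_]+)\}")

def masked_entities(text: str) -> list[str]:
    """List the entity types that were anonymized in a guardrail output."""
    return MASK.findall(text)

anonymized = "Contact {NAME} at {EMAIL} or {PHONE} today."
print(masked_entities(anonymized))  # ['NAME', 'EMAIL', 'PHONE']
```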

1.5 Contextual Grounding (RAG Hallucination Filter)

For RAG flows: after generation, the guardrail scores how well the answer is grounded in the retrieved context and how relevant it is to the user query. Below either threshold, the response is blocked.


"contextualGroundingPolicyConfig": {
    "filtersConfig": [
        {"type": "GROUNDING", "threshold": 0.75},  # answer must be supported by context
        {"type": "RELEVANCE", "threshold": 0.70},  # answer must address the query
    ]
}
  

To use this, pass the retrieved context to the Converse call as guardContent blocks; the guardrail compares the response against those blocks. Pair this with Knowledge Bases for end-to-end hallucination control.
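Assembling that message shape by hand is easy to get wrong, so a small builder helps. A sketch (`grounded_message` is a hypothetical convenience wrapper; the block structure matches the Converse guardContent shape used in section 3.1):

```python
def grounded_message(question: str, sources: list[str]) -> dict:
    """Build a Converse user message whose retrieved passages are tagged
    as grounding sources for the contextual-grounding filter."""
    content = [
        {"guardContent": {"text": {"text": s, "qualifiers": ["grounding_source"]}}}
        for s in sources
    ]
    content.append({"text": question})  # the actual user query goes last, untagged
    return {"role": "user", "content": content}

msg = grounded_message(
    "How many weeks of parental leave do EMEA employees get?",
    ["Source: 2026 leave policy. EMEA staff receive 18 weeks of paid parental leave."],
)
```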


2. Creating a Guardrail


import boto3

bedrock = boto3.client("bedrock", region_name="us-west-2")

resp = bedrock.create_guardrail(
    name="support-bot-guardrail",
    description="Default guardrail for the customer-support assistant.",
    blockedInputMessaging  ="I can't help with that request.",
    blockedOutputsMessaging="I can't share that information.",
    topicPolicyConfig={
        "topicsConfig": [{
            "name": "Investment Advice",
            "definition": "Personalized recommendations on securities, funds, or crypto.",
            "examples": ["Should I buy AAPL?", "Is BTC a good long hold?"],
            "type": "DENY",
        }]
    },
    contentPolicyConfig={"filtersConfig": [
        {"type": "HATE",          "inputStrength": "HIGH",   "outputStrength": "HIGH"},
        {"type": "VIOLENCE",      "inputStrength": "HIGH",   "outputStrength": "HIGH"},
        {"type": "PROMPT_ATTACK", "inputStrength": "HIGH",   "outputStrength": "NONE"},
    ]},
    sensitiveInformationPolicyConfig={"piiEntitiesConfig": [
        {"type": "US_SOCIAL_SECURITY_NUMBER", "action": "BLOCK"},
        {"type": "CREDIT_DEBIT_CARD_NUMBER",  "action": "BLOCK"},
        {"type": "EMAIL",                     "action": "ANONYMIZE"},
    ]},
    contextualGroundingPolicyConfig={"filtersConfig": [
        {"type": "GROUNDING", "threshold": 0.75},
        {"type": "RELEVANCE", "threshold": 0.70},
    ]},
)

guardrail_id = resp["guardrailId"]
print("Created", guardrail_id, "version", resp["version"])
  


3. Attaching to Converse / InvokeModel

Pass guardrailConfig to any Bedrock runtime call. The guardrail runs on the input first; if blocked, the model is never called. It runs again on the output before returning.


runtime = boto3.client("bedrock-runtime", region_name="us-west-2")

resp = runtime.converse(
    modelId="anthropic.claude-opus-4-7",
    messages=[{"role": "user", "content": [{"text": "My SSN is 123-45-6789, can you help with my account?"}]}],
    guardrailConfig={
        "guardrailIdentifier": "gr-pii-strict",
        "guardrailVersion":    "3",
        "trace":               "enabled",
    },
)

print("Stop:", resp["stopReason"])  # 'guardrail_intervened' on a block
print(resp["output"]["message"]["content"][0]["text"])
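
One way to branch on the result (a sketch: on a block, the response text is the guardrail's canned blockedInputMessaging / blockedOutputsMessaging string; the "[blocked]" prefix is purely illustrative):

```python
def final_text(resp: dict) -> str:
    """Extract the reply, flagging guardrail interventions."""
    text = resp["output"]["message"]["content"][0]["text"]
    if resp["stopReason"] == "guardrail_intervened":
        return "[blocked] " + text  # text is the canned guardrail message here
    return text

blocked = {
    "stopReason": "guardrail_intervened",
    "output": {"message": {"content": [{"text": "I can't help with that request."}]}},
}
print(final_text(blocked))  # [blocked] I can't help with that request.
```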
  

3.1 Tagging Specific Content for Grounding Checks

Wrap the retrieved context in guardContent blocks so the contextual-grounding filter knows what the answer is supposed to be grounded in:


runtime.converse(
    modelId="anthropic.claude-opus-4-7",
    messages=[{"role": "user", "content": [
        {"guardContent": {"text": {
            "text": "Source: 2026 leave policy. EMEA staff receive 18 weeks of paid parental leave.",
            "qualifiers": ["grounding_source"],
        }}},
        {"text": "How many weeks of parental leave do EMEA employees get?"},
    ]}],
    guardrailConfig={
        "guardrailIdentifier": guardrail_id,
        "guardrailVersion":    "DRAFT",
        "trace":               "enabled",
    },
)
  


4. Standalone Invocation with ApplyGuardrail

ApplyGuardrail runs a guardrail against arbitrary text without invoking a model. Use it to screen completions from non-Bedrock models (OpenAI, Azure, a self-hosted Llama), to validate user-generated content before storage, or to gate any text crossing a trust boundary.


from openai import OpenAI

openai = OpenAI()
runtime = boto3.client("bedrock-runtime", region_name="us-west-2")

# 1. Screen the user prompt against the guardrail
prompt = "Tell me how to bypass my company's expense-policy approvals."

input_check = runtime.apply_guardrail(
    guardrailIdentifier=guardrail_id,
    guardrailVersion="3",
    source="INPUT",
    content=[{"text": {"text": prompt}}],
)

if input_check["action"] == "GUARDRAIL_INTERVENED":
    print("Blocked at input:", input_check["assessments"])
    raise SystemExit

# 2. Call OpenAI (or any non-Bedrock model)
gpt = openai.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
).choices[0].message.content

# 3. Screen the model output before returning to the user
output_check = runtime.apply_guardrail(
    guardrailIdentifier=guardrail_id,
    guardrailVersion="3",
    source="OUTPUT",
    content=[{"text": {"text": gpt}}],
)

if output_check["action"] == "GUARDRAIL_INTERVENED":
    final = output_check["outputs"][0]["text"]  # the blockedOutputsMessaging string
else:
    final = gpt

print(final)
  

This is the building block for "BYO model + AWS-native safety": the model runs anywhere; the guardrail runs in your AWS account with a complete CloudTrail audit log of every intervention.


5. Reading the Intervention Trace

With trace: "enabled", the response includes assessments showing exactly which filter triggered and at what confidence. Log these for tuning and incident review.


trace = resp.get("trace", {}).get("guardrail", {})

# inputAssessment maps guardrail ID -> assessment; outputAssessments maps
# guardrail ID -> a list of assessments (one per screened output).
assessments = list(trace.get("inputAssessment", {}).values())
for asms in trace.get("outputAssessments", {}).values():
    assessments.extend(asms)

for asm in assessments:
        # Topic policy
        for t in asm.get("topicPolicy", {}).get("topics", []):
            print(f"TOPIC {t['name']}: {t['action']}")
        # Content policy
        for f in asm.get("contentPolicy", {}).get("filters", []):
            print(f"CONTENT {f['type']}: {f['action']} confidence={f['confidence']}")
        # PII
        for p in asm.get("sensitiveInformationPolicy", {}).get("piiEntities", []):
            print(f"PII {p['type']}: {p['action']} match='{p['match']}'")
        # Grounding
        for g in asm.get("contextualGroundingPolicy", {}).get("filters", []):
            print(f"GROUNDING {g['type']}: {g['action']} score={g['score']} threshold={g['threshold']}")
  


6. Versions, Aliases & Drafts

Guardrails are versioned the same way as Lambda: a DRAFT you can mutate, plus immutable numbered versions you can pin in production. Always pin a numeric version in production callers — never reference DRAFT from a live application, or a guardrail-author's edit can ship to prod accidentally.


# Edit the draft (UpdateGuardrail replaces the whole configuration, so resend
# every policy block you want to keep, not just the one you changed)
bedrock.update_guardrail(
    guardrailIdentifier=guardrail_id,
    name="support-bot-guardrail",
    # ...full policy configs go here...
)

# Cut a new immutable version
v = bedrock.create_guardrail_version(
    guardrailIdentifier=guardrail_id,
    description="Tightened HATE input strength to HIGH after 2026-Q2 incident review.",
)
print("Pin this in callers:", v["version"])
  


7. Comparison: Azure Content Safety, OpenAI Moderation

If you're multi-cloud, the practical pattern is: pick one guardrail surface as the canonical one (whichever cloud hosts most of your inference), then apply it to all model output via the standalone endpoints (ApplyGuardrail, Azure Content Safety REST). That way one team owns "the policy" and every channel enforces the same rules.
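
That pattern reduces to one choke point. A sketch, assuming a boto3 bedrock-runtime client (or any stub exposing the same apply_guardrail signature); `screen` is a hypothetical helper, not an SDK call:

```python
def screen(runtime, guardrail_id: str, version: str, text: str, source: str) -> tuple[bool, str]:
    """Run text through the canonical guardrail.

    Returns (allowed, text_to_use); on an intervention, text_to_use is the
    guardrail's replacement output rather than the original text.
    """
    resp = runtime.apply_guardrail(
        guardrailIdentifier=guardrail_id,
        guardrailVersion=version,
        source=source,  # "INPUT" for prompts, "OUTPUT" for completions
        content=[{"text": {"text": text}}],
    )
    if resp["action"] == "GUARDRAIL_INTERVENED":
        return False, resp["outputs"][0]["text"]
    return True, text
```

Every channel (Bedrock, OpenAI, self-hosted) then calls screen(...) on its output before returning anything to the user, so a single team owns the policy definition.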


8. Operational Tips


9. Quotas & Limits to Watch

