Guardrails for Amazon Bedrock

Guardrails for Amazon Bedrock is a policy layer that screens both the user prompt and the model completion against rules you define: denied topics, content categories, banned words, sensitive information, and contextual grounding for RAG. A guardrail can be attached to any Bedrock model invocation, or run standalone via ApplyGuardrail against any text, including completions from non-Bedrock models such as OpenAI's. The point is to define "what the assistant must never do" once, centrally, instead of re-implementing it in every prompt.


1. Filter Types

A guardrail bundles five filter families. Each is independently configurable; you can mix and match. All filters apply to both prompts (input) and responses (output) by default — disable per direction when it makes sense.

1.1 Denied Topics

Block entire conversational topics defined in natural language plus example phrases. The classifier is a small dedicated model — far more flexible than a regex.


"topicPolicyConfig": {
    "topicsConfig": [
        {
            "name":       "Investment Advice",
            "definition": "Personalized recommendations on what securities, funds, or "
                          "crypto assets a user should buy, sell, or hold.",
            "examples":   [
                "Should I sell my AAPL shares?",
                "What's a good ETF to buy right now?",
                "Is bitcoin a good long-term hold?",
            ],
            "type": "DENY",
        }
    ]
}
  

1.2 Content Filters

Six harm categories (HATE, INSULTS, SEXUAL, VIOLENCE, MISCONDUCT, PROMPT_ATTACK), each with a strength dial: NONE, LOW, MEDIUM, or HIGH. Stronger settings catch more borderline cases at the cost of more false positives. PROMPT_ATTACK screens user input for jailbreaks and prompt injection, so its output strength is left at NONE.


"contentPolicyConfig": {
    "filtersConfig": [
        {"type": "HATE",          "inputStrength": "HIGH",   "outputStrength": "HIGH"},
        {"type": "INSULTS",       "inputStrength": "MEDIUM", "outputStrength": "MEDIUM"},
        {"type": "SEXUAL",        "inputStrength": "HIGH",   "outputStrength": "HIGH"},
        {"type": "VIOLENCE",      "inputStrength": "HIGH",   "outputStrength": "HIGH"},
        {"type": "MISCONDUCT",    "inputStrength": "HIGH",   "outputStrength": "HIGH"},
        {"type": "PROMPT_ATTACK", "inputStrength": "HIGH",   "outputStrength": "NONE"},
    ]
}
  

1.3 Word Filters

Two layers: a managed Profanity list and a custom word list. The custom list is the right place for brand-safety terms (competitor names, internal codenames, product names you don't want hallucinated).


"wordPolicyConfig": {
    "wordsConfig":           [{"text": "ProjectAtlas"}, {"text": "CompetitorCorp"}],
    "managedWordListsConfig": [{"type": "PROFANITY"}],
}
  

1.4 Sensitive-Info / PII Filters

Detects 30+ entity types (SSN, credit-card, phone, email, names, addresses, IP addresses, plus AWS-specific ones like access keys). Each entity type is assigned an action: BLOCK (reject the request or response outright) or ANONYMIZE (replace the matched value with a placeholder so the rest of the message survives). Custom regex patterns extend the same mechanism to domain-specific identifiers.


"sensitiveInformationPolicyConfig": {
    "piiEntitiesConfig": [
        {"type": "US_SOCIAL_SECURITY_NUMBER", "action": "BLOCK"},
        {"type": "CREDIT_DEBIT_CARD_NUMBER",  "action": "BLOCK"},
        {"type": "EMAIL",                     "action": "ANONYMIZE"},
        {"type": "PHONE",                     "action": "ANONYMIZE"},
        {"type": "NAME",                      "action": "ANONYMIZE"},
        {"type": "AWS_ACCESS_KEY",            "action": "BLOCK"},
    ],
    "regexesConfig": [{
        "name":        "InternalTicketId",
        "description": "Internal ticket identifiers like TKT-12345.",
        "pattern":     "TKT-\\d{4,8}",
        "action":      "ANONYMIZE",
    }],
}
  

Use ANONYMIZE when the model still needs the structure of the message but not the literal value (e.g. "send an email to {EMAIL}" still parses). Use BLOCK for things that should never reach the model at all.
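When ANONYMIZE fires, the guardrail replaces each match with a placeholder named after the entity type (e.g. {EMAIL}, {PHONE}), so downstream code can still detect where a value used to be. A minimal sketch of post-processing such output (`masked_entities` is an illustrative helper, not part of any SDK):

```python
import re

# Anonymized values come back as {ENTITY_TYPE} placeholders.
MASK = re.compile(r"\{([A-Z_]+)\}")

def masked_entities(text: str) -> list[str]:
    """List the entity types that were anonymized in a guardrail output."""
    return MASK.findall(text)

anonymized = "Contact {NAME} at {EMAIL} or {PHONE} today."
print(masked_entities(anonymized))  # ['NAME', 'EMAIL', 'PHONE']
```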

1.5 Contextual Grounding (RAG Hallucination Filter)

For RAG flows: after generation, the guardrail scores how well the answer is grounded in the retrieved context and how relevant it is to the user query. Below either threshold, the response is blocked.


"contextualGroundingPolicyConfig": {
    "filtersConfig": [
        {"type": "GROUNDING", "threshold": 0.75},  # answer must be supported by context
        {"type": "RELEVANCE", "threshold": 0.70},  # answer must address the query
    ]
}
  

To use this, pass the retrieved context to the Converse call as guardContent blocks; the guardrail compares the response against those blocks. Pair this with Knowledge Bases for end-to-end hallucination control.
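Assembling that message shape by hand is easy to get wrong, so a small builder helps. A sketch (`grounded_message` is a hypothetical convenience wrapper; the block structure matches the Converse guardContent shape used in section 3.1):

```python
def grounded_message(question: str, sources: list[str]) -> dict:
    """Build a Converse user message whose retrieved passages are tagged
    as grounding sources for the contextual-grounding filter."""
    content = [
        {"guardContent": {"text": {"text": s, "qualifiers": ["grounding_source"]}}}
        for s in sources
    ]
    content.append({"text": question})  # the actual user query goes last, untagged
    return {"role": "user", "content": content}

msg = grounded_message(
    "How many weeks of parental leave do EMEA employees get?",
    ["Source: 2026 leave policy. EMEA staff receive 18 weeks of paid parental leave."],
)
```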


2. Creating a Guardrail


import boto3

bedrock = boto3.client("bedrock", region_name="us-west-2")

resp = bedrock.create_guardrail(
    name="support-bot-guardrail",
    description="Default guardrail for the customer-support assistant.",
    blockedInputMessaging  ="I can't help with that request.",
    blockedOutputsMessaging="I can't share that information.",
    topicPolicyConfig={
        "topicsConfig": [{
            "name": "Investment Advice",
            "definition": "Personalized recommendations on securities, funds, or crypto.",
            "examples": ["Should I buy AAPL?", "Is BTC a good long hold?"],
            "type": "DENY",
        }]
    },
    contentPolicyConfig={"filtersConfig": [
        {"type": "HATE",          "inputStrength": "HIGH",   "outputStrength": "HIGH"},
        {"type": "VIOLENCE",      "inputStrength": "HIGH",   "outputStrength": "HIGH"},
        {"type": "PROMPT_ATTACK", "inputStrength": "HIGH",   "outputStrength": "NONE"},
    ]},
    sensitiveInformationPolicyConfig={"piiEntitiesConfig": [
        {"type": "US_SOCIAL_SECURITY_NUMBER", "action": "BLOCK"},
        {"type": "CREDIT_DEBIT_CARD_NUMBER",  "action": "BLOCK"},
        {"type": "EMAIL",                     "action": "ANONYMIZE"},
    ]},
    contextualGroundingPolicyConfig={"filtersConfig": [
        {"type": "GROUNDING", "threshold": 0.75},
        {"type": "RELEVANCE", "threshold": 0.70},
    ]},
)

guardrail_id = resp["guardrailId"]
print("Created", guardrail_id, "version", resp["version"])
  


3. Attaching to Converse / InvokeModel

Pass guardrailConfig to any Bedrock runtime call. The guardrail runs on the input first; if blocked, the model is never called. It runs again on the output before returning.


runtime = boto3.client("bedrock-runtime", region_name="us-west-2")

resp = runtime.converse(
    modelId="anthropic.claude-opus-4-7",
    messages=[{"role": "user", "content": [{"text": "My SSN is 123-45-6789, can you help with my account?"}]}],
    guardrailConfig={
        "guardrailIdentifier": "gr-pii-strict",
        "guardrailVersion":    "3",
        "trace":               "enabled",
    },
)

print("Stop:", resp["stopReason"])  # 'guardrail_intervened' on a block
print(resp["output"]["message"]["content"][0]["text"])
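
One way to branch on the result (a sketch: on a block, the response text is the guardrail's canned blockedInputMessaging / blockedOutputsMessaging string; the "[blocked]" prefix is purely illustrative):

```python
def final_text(resp: dict) -> str:
    """Extract the reply, flagging guardrail interventions."""
    text = resp["output"]["message"]["content"][0]["text"]
    if resp["stopReason"] == "guardrail_intervened":
        return "[blocked] " + text  # text is the canned guardrail message here
    return text

blocked = {
    "stopReason": "guardrail_intervened",
    "output": {"message": {"content": [{"text": "I can't help with that request."}]}},
}
print(final_text(blocked))  # [blocked] I can't help with that request.
```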
  

3.1 Tagging Specific Content for Grounding Checks

Wrap the retrieved context in guardContent blocks so the contextual-grounding filter knows what the answer is supposed to be grounded in:


runtime.converse(
    modelId="anthropic.claude-opus-4-7",
    messages=[{"role": "user", "content": [
        {"guardContent": {"text": {
            "text": "Source: 2026 leave policy. EMEA staff receive 18 weeks of paid parental leave.",
            "qualifiers": ["grounding_source"],
        }}},
        {"text": "How many weeks of parental leave do EMEA employees get?"},
    ]}],
    guardrailConfig={
        "guardrailIdentifier": guardrail_id,
        "guardrailVersion":    "DRAFT",
        "trace":               "enabled",
    },
)
  


4. Standalone Invocation with ApplyGuardrail

ApplyGuardrail runs a guardrail against arbitrary text without invoking a model. Use it to screen completions from non-Bedrock models (OpenAI, Azure, a self-hosted Llama), to validate user-generated content before storage, or to gate any text crossing a trust boundary.


from openai import OpenAI

openai = OpenAI()
runtime = boto3.client("bedrock-runtime", region_name="us-west-2")

# 1. Screen the user prompt against the guardrail
prompt = "Tell me how to bypass my company's expense-policy approvals."

input_check = runtime.apply_guardrail(
    guardrailIdentifier=guardrail_id,
    guardrailVersion="3",
    source="INPUT",
    content=[{"text": {"text": prompt}}],
)

if input_check["action"] == "GUARDRAIL_INTERVENED":
    print("Blocked at input:", input_check["assessments"])
    raise SystemExit

# 2. Call OpenAI (or any non-Bedrock model)
gpt = openai.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
).choices[0].message.content

# 3. Screen the model output before returning to the user
output_check = runtime.apply_guardrail(
    guardrailIdentifier=guardrail_id,
    guardrailVersion="3",
    source="OUTPUT",
    content=[{"text": {"text": gpt}}],
)

if output_check["action"] == "GUARDRAIL_INTERVENED":
    final = output_check["outputs"][0]["text"]  # the blockedOutputsMessaging string
else:
    final = gpt

print(final)
  

This is the building block for "BYO model + AWS-native safety": the model runs anywhere; the guardrail runs in your AWS account with a complete CloudTrail audit log of every intervention.


5. Reading the Intervention Trace

With trace: "enabled", the response includes assessments showing exactly which filter triggered and at what confidence. Log these for tuning and incident review.


trace = resp.get("trace", {}).get("guardrail", {})

# inputAssessment maps guardrail ID -> assessment; outputAssessments maps
# guardrail ID -> a list of assessments (one per screened output).
assessments = list(trace.get("inputAssessment", {}).values())
for asms in trace.get("outputAssessments", {}).values():
    assessments.extend(asms)

for asm in assessments:
        # Topic policy
        for t in asm.get("topicPolicy", {}).get("topics", []):
            print(f"TOPIC {t['name']}: {t['action']}")
        # Content policy
        for f in asm.get("contentPolicy", {}).get("filters", []):
            print(f"CONTENT {f['type']}: {f['action']} confidence={f['confidence']}")
        # PII
        for p in asm.get("sensitiveInformationPolicy", {}).get("piiEntities", []):
            print(f"PII {p['type']}: {p['action']} match='{p['match']}'")
        # Grounding
        for g in asm.get("contextualGroundingPolicy", {}).get("filters", []):
            print(f"GROUNDING {g['type']}: {g['action']} score={g['score']} threshold={g['threshold']}")
  


6. Versions, Aliases & Drafts

Guardrails are versioned the same way as Lambda: a DRAFT you can mutate, plus immutable numbered versions you can pin in production. Always pin a numeric version in production callers — never reference DRAFT from a live application, or a guardrail-author's edit can ship to prod accidentally.


# Edit the draft (UpdateGuardrail replaces the whole configuration, so resend
# every policy block you want to keep, not just the one you changed)
bedrock.update_guardrail(
    guardrailIdentifier=guardrail_id,
    name="support-bot-guardrail",
    # ...full policy configs go here...
)

# Cut a new immutable version
v = bedrock.create_guardrail_version(
    guardrailIdentifier=guardrail_id,
    description="Tightened HATE input strength to HIGH after 2026-Q2 incident review.",
)
print("Pin this in callers:", v["version"])
  


7. Comparison: Azure Content Safety, OpenAI Moderation

If you're multi-cloud, the practical pattern is: pick one guardrail surface as the canonical one (whichever cloud hosts most of your inference), then apply it to all model output via the standalone endpoints (ApplyGuardrail, Azure Content Safety REST). That way one team owns "the policy" and every channel enforces the same rules.
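
That pattern reduces to one choke point. A sketch, assuming a boto3 bedrock-runtime client (or any stub exposing the same apply_guardrail signature); `screen` is a hypothetical helper, not an SDK call:

```python
def screen(runtime, guardrail_id: str, version: str, text: str, source: str) -> tuple[bool, str]:
    """Run text through the canonical guardrail.

    Returns (allowed, text_to_use); on an intervention, text_to_use is the
    guardrail's replacement output rather than the original text.
    """
    resp = runtime.apply_guardrail(
        guardrailIdentifier=guardrail_id,
        guardrailVersion=version,
        source=source,  # "INPUT" for prompts, "OUTPUT" for completions
        content=[{"text": {"text": text}}],
    )
    if resp["action"] == "GUARDRAIL_INTERVENED":
        return False, resp["outputs"][0]["text"]
    return True, text
```

Every channel (Bedrock, OpenAI, self-hosted) then calls screen(...) on its output before returning anything to the user, so a single team owns the policy definition.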


8. Operational Tips


9. Quotas & Limits to Watch

