# Module 8: Guardrails
Guardrails define what the agent is allowed to do and, more importantly, what it's not.
## Why You Need Them
An AI agent with tool access can take real actions: send emails, update databases, approve workflows. Without guardrails, a prompt injection attack, an unexpected input, or a model reasoning error could lead to actions your business never intended.
Guardrails aren't about limiting the AI's intelligence. They're about defining the playing field. The agent can be as smart as it wants inside the boundaries.
## Types of Guardrails
### Input Guardrails
Filter what the agent receives before it starts processing.
- Content filtering - Block harmful, offensive, or irrelevant inputs
- PII detection - Identify and redact personally identifiable information before processing
- Prompt injection detection - Catch attempts to override the agent's instructions
- Input validation - Reject malformed or suspicious inputs
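As a rough illustration, here is a minimal application-level input check in Python. The injection phrases and the SSN regex are placeholder heuristics, not this course's implementation; a production system would use a trained classifier or a managed detection service for both checks.

```python
import re

# Placeholder heuristics: phrases commonly seen in prompt injection attempts
# and a regex for US Social Security numbers (illustrative only).
INJECTION_PATTERNS = [
    "ignore previous instructions",
    "disregard your system prompt",
    "you are now",
]
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def check_input(user_input: str) -> tuple[bool, str]:
    """Return (allowed, sanitized_input) for a single user message."""
    lowered = user_input.lower()
    if any(pattern in lowered for pattern in INJECTION_PATTERNS):
        return False, "Input rejected: possible prompt injection."
    # Redact PII before the text ever reaches the model
    sanitized = SSN_PATTERN.sub("[REDACTED-SSN]", user_input)
    return True, sanitized
```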
### Output Guardrails
Filter what the agent produces before it reaches the user or takes action.
- Denied topics - Prevent the agent from discussing topics outside its scope
- Fact checking - Verify claims against known data (grounding)
- Format validation - Ensure output matches expected schema
- Toxicity detection - Filter harmful or inappropriate language
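Here is a small sketch of output-side validation using Pydantic for format checking plus a denied-phrase list. The schema fields and phrases are illustrative assumptions, not the actual contract-analysis output format.

```python
from pydantic import BaseModel, ValidationError

# Illustrative response schema -- field names are assumptions, not the
# course's actual contract-analysis format.
class ContractAnalysis(BaseModel):
    risk_level: str
    summary: str
    recommended_action: str

DENIED_PHRASES = ["legal advice", "you should sue"]  # illustrative only


def check_output(raw_json: str) -> ContractAnalysis | None:
    """Validate the agent's response against the schema and denied topics."""
    try:
        analysis = ContractAnalysis.model_validate_json(raw_json)
    except ValidationError:
        return None  # malformed output: retry or escalate
    text = (analysis.summary + " " + analysis.recommended_action).lower()
    if any(phrase in text for phrase in DENIED_PHRASES):
        return None  # denied topic detected: block before it reaches the user
    return analysis
```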
### Action Guardrails
Control what the agent is allowed to do in the real world.
- Approval thresholds - Contracts above $X require human approval
- Read-only mode - Agent can analyze but not modify data
- Rate limiting - Maximum number of actions per time period
- Scope restrictions - Agent can only access data for its assigned client/project
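A compact sketch of what action-level checks can look like in orchestrator code. The class name, limits, and client-ID scoping are assumptions for illustration, not a prescribed design.

```python
import time

class ActionGuardrail:
    """Illustrative action-level checks: scope restriction, read-only mode,
    and a simple sliding-window rate limit."""

    def __init__(self, allowed_client_id: str, read_only: bool = False,
                 max_actions_per_minute: int = 10):
        self.allowed_client_id = allowed_client_id
        self.read_only = read_only
        self.max_actions_per_minute = max_actions_per_minute
        self._timestamps: list[float] = []

    def allow(self, client_id: str, is_write: bool) -> bool:
        if client_id != self.allowed_client_id:
            return False                      # scope restriction
        if self.read_only and is_write:
            return False                      # read-only mode
        now = time.time()
        self._timestamps = [t for t in self._timestamps if now - t < 60]
        if len(self._timestamps) >= self.max_actions_per_minute:
            return False                      # rate limit exceeded
        self._timestamps.append(now)
        return True
```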
## In Our Contract Workflow
| Guardrail | Type | Rule |
|---|---|---|
| No contract approval over $500K | Action | Route to human if value exceeds threshold |
| No document modification | Action | Agent has read-only access to original contract |
| Client data isolation | Action | Agent cannot access other clients' contracts |
| Policy minimum enforcement | Output | Cannot recommend accepting below-minimum terms |
| PII redaction in reports | Output | Social security numbers, bank accounts redacted |
| Denied topics | Output | Agent cannot provide legal advice, only analysis |
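Two of the table's rules, sketched as application code: the approval threshold comes straight from the table, while the policy-minimum fields are illustrative assumptions about what "below-minimum terms" might mean.

```python
APPROVAL_THRESHOLD_USD = 500_000  # rule from the table above

# Illustrative policy minimums -- field names and values are assumptions
POLICY_MINIMUMS = {"payment_days_max": 60, "liability_cap_min_usd": 100_000}


def route_contract(contract_value_usd: float) -> str:
    """Action guardrail: send high-value contracts to a human approver."""
    if contract_value_usd > APPROVAL_THRESHOLD_USD:
        return "human_review"
    return "agent_path"


def violates_policy_minimums(terms: dict) -> bool:
    """Output guardrail: block recommendations that fall below policy minimums."""
    return (terms.get("payment_days", 0) > POLICY_MINIMUMS["payment_days_max"]
            or terms.get("liability_cap_usd", 0) < POLICY_MINIMUMS["liability_cap_min_usd"])
```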
## AWS Bedrock Guardrails
Bedrock provides built-in guardrail configuration:
- Content filters - Adjustable thresholds for hate, insult, sexual, violence, misconduct
- Denied topics - Define topics the model should refuse to discuss
- Word filters - Block specific words or phrases in output
- PII filters - Automatically detect and redact sensitive data
- Contextual grounding - Check responses against provided context for accuracy
These run at the API level, so they apply regardless of what prompt the agent receives: even a prompt injection that subverts the agent's instructions still has to pass through the guardrail filters on both input and output.
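To make the platform-level layer concrete, here is a minimal boto3 sketch that attaches an existing guardrail to a Converse API call. The model ID, guardrail identifier, and version are placeholders for values from your own account.

```python
import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock_runtime.converse(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # placeholder model ID
    messages=[{"role": "user", "content": [{"text": "Summarize this contract..."}]}],
    guardrailConfig={
        "guardrailIdentifier": "your-guardrail-id",  # placeholder
        "guardrailVersion": "1",
    },
)

if response["stopReason"] == "guardrail_intervened":
    print("Blocked by Bedrock Guardrails")
else:
    print(response["output"]["message"]["content"][0]["text"])
```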
## Layered Approach
No single guardrail catches everything. Use layers:
- Bedrock Guardrails - Platform-level content and PII filtering
- Application guardrails - Business rules enforced in your orchestrator code
- Output validation - Schema validation and grounding checks on every response
- Human review - Final check for high-stakes decisions (Module 9)
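Putting the layers together, here is a sketch of how an orchestrator might chain them, reusing the helper functions from the earlier sketches. `call_bedrock_with_guardrail` and `extract_value` are hypothetical placeholders for your own integration code; each layer can short-circuit the request before the next one runs.

```python
def handle_request(user_input: str) -> str:
    """Illustrative orchestration: each guardrail layer can stop the request."""
    allowed, sanitized = check_input(user_input)            # application input guardrail
    if not allowed:
        return sanitized

    raw_response = call_bedrock_with_guardrail(sanitized)   # hypothetical wrapper around the Converse sketch
    if raw_response is None:
        return "Request blocked by platform guardrails."

    analysis = check_output(raw_response)                    # schema + denied-topic validation
    if analysis is None:
        return "Response failed output validation."

    if route_contract(extract_value(analysis)) == "human_review":  # extract_value is a hypothetical helper
        return "Escalated for human approval."

    return analysis.summary
```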
## What's Next
Some decisions need a human in the loop even after all guardrails pass. In Module 9: Human-in-the-Loop, we cover how to build approval workflows that pause the agent for human judgment.
## Guardrails Lab
Configure Bedrock Guardrails with content filters, denied topics, and PII detection, then build custom application-level guardrails for contract processing.