
Module 8: Guardrails

Guardrails define what the agent is allowed to do and, more importantly, what it is not allowed to do.

Why You Need Them

An AI agent with tool access can take real actions: send emails, update databases, approve workflows. Without guardrails, a prompt injection attack, an unexpected input, or a model reasoning error could lead to actions your business never intended.

Guardrails aren't about limiting the AI's intelligence. They're about defining the playing field. The agent can be as smart as it wants inside the boundaries.

Types of Guardrails

Input Guardrails

Filter what the agent receives before it starts processing.

  • Content filtering - Block harmful, offensive, or irrelevant inputs
  • PII detection - Identify and redact personally identifiable information before processing
  • Prompt injection detection - Catch attempts to override the agent's instructions
  • Input validation - Reject malformed or suspicious inputs
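The checks above can be sketched as a single pre-processing gate. This is a minimal illustration, not a production filter: the injection patterns and the SSN regex are assumptions for the example, and a real system would use a trained classifier or a managed detection service rather than a handful of regexes.

```python
import re

# Illustrative heuristics only; real injection detection needs far more than this.
INJECTION_PATTERNS = [
    r"ignore (all )?previous instructions",
    r"you are now",
    r"reveal your system prompt",
]
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # US SSN-shaped strings

def check_input(text: str) -> dict:
    """Run input guardrails: injection heuristics, then PII redaction."""
    flagged = any(re.search(p, text, re.IGNORECASE) for p in INJECTION_PATTERNS)
    redacted = SSN_PATTERN.sub("[REDACTED-SSN]", text)
    return {"allowed": not flagged, "text": redacted}
```

Note that redaction still runs even when the input is blocked, so anything you log about a rejected request is already scrubbed.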

Output Guardrails

Filter what the agent produces before it reaches the user or takes action.

  • Denied topics - Prevent the agent from discussing topics outside its scope
  • Fact checking - Verify claims against known data (grounding)
  • Format validation - Ensure output matches expected schema
  • Toxicity detection - Filter harmful or inappropriate language
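Format validation is the easiest of these to enforce in code: check the agent's structured response against an expected schema before acting on it. A minimal sketch, assuming a contract-analysis output with `summary` and `risk_level` fields (both names are illustrative, not from any real API):

```python
# Expected shape of the agent's structured output (assumed for this example).
REQUIRED_FIELDS = {"summary": str, "risk_level": str}
ALLOWED_RISK_LEVELS = {"low", "medium", "high"}

def validate_output(payload: dict) -> list:
    """Return a list of guardrail violations; an empty list means the output passes."""
    errors = []
    for field, ftype in REQUIRED_FIELDS.items():
        if not isinstance(payload.get(field), ftype):
            errors.append(f"missing or mistyped field: {field}")
    if payload.get("risk_level") not in ALLOWED_RISK_LEVELS:
        errors.append("risk_level outside allowed values")
    return errors
```

Returning a list of violations rather than a boolean lets the orchestrator decide whether to retry the model, repair the output, or escalate.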

Action Guardrails

Control what the agent is allowed to do in the real world.

  • Approval thresholds - Contracts above $X require human approval
  • Read-only mode - Agent can analyze but not modify data
  • Rate limiting - Maximum number of actions per time period
  • Scope restrictions - Agent can only access data for its assigned client/project
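Two of these, approval thresholds and rate limiting, fit in one small gate in front of the agent's tool calls. A sketch under assumed names (the `$500K` threshold matches the workflow below; `ActionGuard` and its return strings are invented for illustration):

```python
import time
from collections import deque

APPROVAL_THRESHOLD = 500_000  # contracts above this value go to a human

class ActionGuard:
    """Action guardrails: value threshold plus a sliding-window rate limit."""

    def __init__(self, max_actions: int, window_seconds: float):
        self.max_actions = max_actions
        self.window = window_seconds
        self.timestamps = deque()  # monotonic times of recent allowed actions

    def authorize(self, action: str, contract_value: float) -> str:
        now = time.monotonic()
        # Drop timestamps that have aged out of the rate-limit window.
        while self.timestamps and now - self.timestamps[0] > self.window:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.max_actions:
            return "deny: rate limit exceeded"
        if action == "approve" and contract_value > APPROVAL_THRESHOLD:
            return "escalate: requires human approval"
        self.timestamps.append(now)
        return "allow"
```

Escalated actions deliberately do not count against the rate limit, since the human reviewer, not the agent, executes them.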

In Our Contract Workflow

| Guardrail | Type | Rule |
| --- | --- | --- |
| No contract approval over $500K | Action | Route to human if value exceeds threshold |
| No document modification | Action | Agent has read-only access to original contract |
| Client data isolation | Action | Agent cannot access other clients' contracts |
| Policy minimum enforcement | Output | Cannot recommend accepting below-minimum terms |
| PII redaction in reports | Output | Social security numbers, bank accounts redacted |
| Denied topics | Output | Agent cannot provide legal advice, only analysis |
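The three action-type rules can be enforced in a single orchestrator check. A minimal sketch; the action fields (`client_id`, `type`, `value`) and the returned strings are assumed names for this course's workflow, not part of any framework:

```python
def check_contract_action(agent_client_id: str, action: dict) -> str:
    """Apply the workflow's action guardrails; returns allow/deny/escalate."""
    # Client data isolation: the agent only touches its assigned client.
    if action["client_id"] != agent_client_id:
        return "deny: client data isolation"
    # Read-only access to the original contract.
    if action["type"] == "modify_document":
        return "deny: read-only access"
    # Approval threshold: large contracts route to a human.
    if action["type"] == "approve" and action["value"] > 500_000:
        return "escalate: human approval required"
    return "allow"
```

Ordering matters: isolation is checked first so a cross-client request is rejected before any other rule even looks at it.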

AWS Bedrock Guardrails

Bedrock provides built-in guardrail configuration:

  • Content filters - Adjustable thresholds for hate, insult, sexual, violence, misconduct
  • Denied topics - Define topics the model should refuse to discuss
  • Word filters - Block specific words or phrases in output
  • PII filters - Automatically detect and redact sensitive data
  • Contextual grounding - Check responses against provided context for accuracy

These filters run at the API level, so they apply no matter what prompt reaches the model. Even if a prompt injection rewrites the agent's instructions, the request and response still pass through the Bedrock guardrail checks.
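Attaching a guardrail is a matter of referencing it in the request. A sketch of building a `converse` request for the Bedrock Runtime API with boto3; the guardrail ID, version, and model ID are placeholders, and this assumes the guardrail was already created in the Bedrock console:

```python
def build_converse_request(model_id: str, user_text: str,
                           guardrail_id: str, guardrail_version: str) -> dict:
    """Build a Bedrock Converse request with a guardrail attached."""
    return {
        "modelId": model_id,
        "messages": [{"role": "user", "content": [{"text": user_text}]}],
        "guardrailConfig": {
            "guardrailIdentifier": guardrail_id,
            "guardrailVersion": guardrail_version,
            "trace": "enabled",  # include intervention details in the response
        },
    }

# Usage (requires AWS credentials; not executed here):
# import boto3
# client = boto3.client("bedrock-runtime")
# response = client.converse(**build_converse_request(
#     "anthropic.claude-3-5-sonnet-20240620-v1:0",
#     "Summarize this contract", "gr-abc123", "1"))
```

With `trace` enabled, a blocked request comes back with details of which filter intervened, which is useful for auditing.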

Layered Approach

No single guardrail catches everything. Use layers:

  1. Bedrock Guardrails - Platform-level content and PII filtering
  2. Application guardrails - Business rules enforced in your orchestrator code
  3. Output validation - Schema validation and grounding checks on every response
  4. Human review - Final check for high-stakes decisions (Module 9)
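One way to wire the layers together is a simple pipeline where each layer is a callable and the first failure stops processing. A sketch with assumed names (`GuardrailViolation`, the `(ok, reason)` convention); the layer names mirror the list above but the checks themselves are stand-ins:

```python
class GuardrailViolation(Exception):
    """Raised when any guardrail layer rejects the payload."""

def run_layers(payload, layers):
    """Run guardrail layers in order; the first failure aborts the pipeline."""
    for name, check in layers:
        ok, reason = check(payload)  # each check returns (passed, reason)
        if not ok:
            raise GuardrailViolation(f"{name}: {reason}")
    return payload
```

Ordering the layers cheapest-first (platform filters before grounding checks, grounding before human review) keeps latency low on the common path.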

What's Next

Some decisions need a human in the loop even after all guardrails pass. In Module 9: Human-in-the-Loop, we cover how to build approval workflows that pause the agent for human judgment.


Guardrails Lab

Configure Bedrock Guardrails with content filters, denied topics, PII detection, and build custom application-level guardrails for contract processing.