
Module 8: Guardrails

Guardrails define what the agent is allowed to do and, more importantly, what it is not allowed to do.

Why You Need Them

An AI agent with tool access can take real actions: send emails, update databases, approve workflows. Without guardrails, a prompt injection attack, an unexpected input, or a model reasoning error could lead to actions your business never intended.

Guardrails aren't about limiting the AI's intelligence. They're about defining the playing field. The agent can be as smart as it wants inside the boundaries.

Types of Guardrails

Input Guardrails

Filter what the agent receives before it starts processing.

  • Content filtering - Block harmful, offensive, or irrelevant inputs
  • PII detection - Identify and redact personally identifiable information before processing
  • Prompt injection detection - Catch attempts to override the agent's instructions
  • Input validation - Reject malformed or suspicious inputs
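The checks above can be sketched as a single pre-processing gate. This is a minimal illustration, not a production filter: the injection patterns and the SSN regex are assumptions for the example, and a real system would use a trained classifier or a managed detection service rather than a handful of regexes.

```python
import re

# Illustrative heuristics only; real injection detection needs far more than this.
INJECTION_PATTERNS = [
    r"ignore (all )?previous instructions",
    r"you are now",
    r"reveal your system prompt",
]
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # US SSN-shaped strings

def check_input(text: str) -> dict:
    """Run input guardrails: injection heuristics, then PII redaction."""
    flagged = any(re.search(p, text, re.IGNORECASE) for p in INJECTION_PATTERNS)
    redacted = SSN_PATTERN.sub("[REDACTED-SSN]", text)
    return {"allowed": not flagged, "text": redacted}
```

Note that redaction still runs even when the input is blocked, so anything you log about a rejected request is already scrubbed.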

Output Guardrails

Filter what the agent produces before it reaches the user or takes action.

  • Denied topics - Prevent the agent from discussing topics outside its scope
  • Fact checking - Verify claims against known data (grounding)
  • Format validation - Ensure output matches expected schema
  • Toxicity detection - Filter harmful or inappropriate language
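Format validation is the easiest of these to enforce in code: check the agent's structured response against an expected schema before acting on it. A minimal sketch, assuming a contract-analysis output with `summary` and `risk_level` fields (both names are illustrative, not from any real API):

```python
# Expected shape of the agent's structured output (assumed for this example).
REQUIRED_FIELDS = {"summary": str, "risk_level": str}
ALLOWED_RISK_LEVELS = {"low", "medium", "high"}

def validate_output(payload: dict) -> list:
    """Return a list of guardrail violations; an empty list means the output passes."""
    errors = []
    for field, ftype in REQUIRED_FIELDS.items():
        if not isinstance(payload.get(field), ftype):
            errors.append(f"missing or mistyped field: {field}")
    if payload.get("risk_level") not in ALLOWED_RISK_LEVELS:
        errors.append("risk_level outside allowed values")
    return errors
```

Returning a list of violations rather than a boolean lets the orchestrator decide whether to retry the model, repair the output, or escalate.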

Action Guardrails

Control what the agent is allowed to do in the real world.

  • Approval thresholds - Contracts above $X require human approval
  • Read-only mode - Agent can analyze but not modify data
  • Rate limiting - Maximum number of actions per time period
  • Scope restrictions - Agent can only access data for its assigned client/project
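Two of these, approval thresholds and rate limiting, fit in one small gate in front of the agent's tool calls. A sketch under assumed names (the `$500K` threshold matches the workflow below; `ActionGuard` and its return strings are invented for illustration):

```python
import time
from collections import deque

APPROVAL_THRESHOLD = 500_000  # contracts above this value go to a human

class ActionGuard:
    """Action guardrails: value threshold plus a sliding-window rate limit."""

    def __init__(self, max_actions: int, window_seconds: float):
        self.max_actions = max_actions
        self.window = window_seconds
        self.timestamps = deque()  # monotonic times of recent allowed actions

    def authorize(self, action: str, contract_value: float) -> str:
        now = time.monotonic()
        # Drop timestamps that have aged out of the rate-limit window.
        while self.timestamps and now - self.timestamps[0] > self.window:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.max_actions:
            return "deny: rate limit exceeded"
        if action == "approve" and contract_value > APPROVAL_THRESHOLD:
            return "escalate: requires human approval"
        self.timestamps.append(now)
        return "allow"
```

Escalated actions deliberately do not count against the rate limit, since the human reviewer, not the agent, executes them.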

In Our Contract Workflow

| Guardrail | Type | Rule |
| --- | --- | --- |
| No contract approval over $500K | Action | Route to human if value exceeds threshold |
| No document modification | Action | Agent has read-only access to original contract |
| Client data isolation | Action | Agent cannot access other clients' contracts |
| Policy minimum enforcement | Output | Cannot recommend accepting below-minimum terms |
| PII redaction in reports | Output | Social security numbers, bank accounts redacted |
| Denied topics | Output | Agent cannot provide legal advice, only analysis |
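The three action-type rules can be enforced in a single orchestrator check. A minimal sketch; the action fields (`client_id`, `type`, `value`) and the returned strings are assumed names for this course's workflow, not part of any framework:

```python
def check_contract_action(agent_client_id: str, action: dict) -> str:
    """Apply the workflow's action guardrails; returns allow/deny/escalate."""
    # Client data isolation: the agent only touches its assigned client.
    if action["client_id"] != agent_client_id:
        return "deny: client data isolation"
    # Read-only access to the original contract.
    if action["type"] == "modify_document":
        return "deny: read-only access"
    # Approval threshold: large contracts route to a human.
    if action["type"] == "approve" and action["value"] > 500_000:
        return "escalate: human approval required"
    return "allow"
```

Ordering matters: isolation is checked first so a cross-client request is rejected before any other rule even looks at it.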

AWS Bedrock Guardrails

Bedrock provides built-in guardrail configuration:

  • Content filters - Adjustable thresholds for hate, insult, sexual, violence, misconduct
  • Denied topics - Define topics the model should refuse to discuss
  • Word filters - Block specific words or phrases in output
  • PII filters - Automatically detect and redact sensitive data
  • Contextual grounding - Check responses against provided context for accuracy

These filters run at the API level, so they apply no matter what prompt reaches the model. Even if a prompt injection rewrites the agent's instructions, the request and response still pass through the Bedrock guardrail checks.
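Attaching a guardrail is a matter of referencing it in the request. A sketch of building a `converse` request for the Bedrock Runtime API with boto3; the guardrail ID, version, and model ID are placeholders, and this assumes the guardrail was already created in the Bedrock console:

```python
def build_converse_request(model_id: str, user_text: str,
                           guardrail_id: str, guardrail_version: str) -> dict:
    """Build a Bedrock Converse request with a guardrail attached."""
    return {
        "modelId": model_id,
        "messages": [{"role": "user", "content": [{"text": user_text}]}],
        "guardrailConfig": {
            "guardrailIdentifier": guardrail_id,
            "guardrailVersion": guardrail_version,
            "trace": "enabled",  # include intervention details in the response
        },
    }

# Usage (requires AWS credentials; not executed here):
# import boto3
# client = boto3.client("bedrock-runtime")
# response = client.converse(**build_converse_request(
#     "anthropic.claude-3-5-sonnet-20240620-v1:0",
#     "Summarize this contract", "gr-abc123", "1"))
```

With `trace` enabled, a blocked request comes back with details of which filter intervened, which is useful for auditing.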

Layered Approach

No single guardrail catches everything. Use layers:

  1. Bedrock Guardrails - Platform-level content and PII filtering
  2. Application guardrails - Business rules enforced in your orchestrator code
  3. Output validation - Schema validation and grounding checks on every response
  4. Human review - Final check for high-stakes decisions (Module 9)
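One way to wire the layers together is a simple pipeline where each layer is a callable and the first failure stops processing. A sketch with assumed names (`GuardrailViolation`, the `(ok, reason)` convention); the layer names mirror the list above but the checks themselves are stand-ins:

```python
class GuardrailViolation(Exception):
    """Raised when any guardrail layer rejects the payload."""

def run_layers(payload, layers):
    """Run guardrail layers in order; the first failure aborts the pipeline."""
    for name, check in layers:
        ok, reason = check(payload)  # each check returns (passed, reason)
        if not ok:
            raise GuardrailViolation(f"{name}: {reason}")
    return payload
```

Ordering the layers cheapest-first (platform filters before grounding checks, grounding before human review) keeps latency low on the common path.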

What's Next

Some decisions need a human in the loop even after all guardrails pass. In Module 9: Human-in-the-Loop, we cover how to build approval workflows that pause the agent for human judgment.


Guardrails Lab

Configure Bedrock Guardrails with content filters, denied topics, PII detection, and build custom application-level guardrails for contract processing.