Module 6: RAG (Retrieval Augmented Generation)
An AI model knows what it learned during training. It doesn't know your company's policies, your contract templates, or what happened with this vendor last quarter. RAG bridges that gap.
The Concept
RAG is a two-step process:
- Retrieve relevant information from your data stores based on the current query
- Generate a response using that retrieved information as context
The model doesn't need to be retrained on your data. You give it the relevant context at query time, and it reasons over it.
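The two steps above can be sketched in a few lines. This is a minimal illustration, not a specific library's API: `retrieve` uses naive keyword overlap as a stand-in for vector search, and `generate` only builds the prompt a real implementation would send to the model.

```python
def retrieve(query: str, store: dict[str, str], top_k: int = 3) -> list[str]:
    """Step 1: pull the documents most relevant to the query.
    Toy scoring by keyword overlap; production systems use vector search."""
    scored = [
        (sum(word in text.lower() for word in query.lower().split()), text)
        for text in store.values()
    ]
    scored.sort(reverse=True)
    return [text for score, text in scored[:top_k] if score > 0]

def generate(query: str, context: list[str]) -> str:
    """Step 2: inject the retrieved chunks into the prompt as context.
    A real implementation would send this prompt to the model."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using ONLY this context:\n{joined}\n\nQuestion: {query}"

store = {"policy-1": "Minimum vendor liability cap is $100,000 for Tier 1 vendors."}
prompt = generate("What is the liability cap?", retrieve("liability cap", store))
```

The key point: the model's weights never change. Only the prompt carries the knowledge.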
Why Not Just Fine-Tune?
Fine-tuning bakes knowledge into the model's weights. It's expensive, slow, and static. When your company policy changes, you'd need to fine-tune again.
RAG keeps the knowledge external. Update the policy document in your store, and the next query picks up the new version automatically. No retraining, no redeployment.
| | Fine-Tuning | RAG |
|---|---|---|
| Knowledge update | Retrain the model | Update the document |
| Cost | Expensive (GPU hours) | Cheap (embedding + storage) |
| Freshness | Stale until retrained | Always current |
| Traceability | Can't point to source | Can cite exact source |
| Best for | Behavior/style changes | Knowledge/fact retrieval |
In Our Contract Workflow
The compliance agent needs to compare contract clauses against your company's approved templates. Those templates live in a document store, not in Claude's training data.
Contract clause: "Vendor liability capped at $50,000"
↓
RAG retrieves your policy:
"Minimum vendor liability cap: $100,000
for Tier 1 vendors, $50,000 for Tier 2"
↓
Agent reasons: "This vendor is Tier 1.
$50,000 cap is below our $100,000 minimum.
Flag for review."
Without RAG, the agent would either hallucinate a policy or give a generic answer. With RAG, it applies your specific standards.
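The agent's reasoning step above reduces to a simple check once the policy has been retrieved. A hedged sketch, where the tier thresholds and field names are illustrative stand-ins for whatever your retrieved policy document actually contains:

```python
# Policy values as retrieved via RAG; the dict shape is an assumption,
# not a real policy schema.
POLICY_MIN_CAP = {"Tier 1": 100_000, "Tier 2": 50_000}

def check_liability_clause(vendor_tier: str, cap_usd: int) -> str:
    """Compare a contract's liability cap against the retrieved minimum."""
    minimum = POLICY_MIN_CAP[vendor_tier]
    if cap_usd < minimum:
        return f"FLAG: ${cap_usd:,} cap is below the ${minimum:,} minimum for {vendor_tier}."
    return "OK"

result = check_liability_clause("Tier 1", 50_000)
```

In practice the model performs this comparison itself from the injected context; the point is that the decision is grounded in retrieved numbers, not in training data.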
The RAG Pipeline
| Step | What Happens |
|---|---|
| Embed | Convert your documents into vector embeddings and store them |
| Query | Convert the agent's question into an embedding |
| Search | Find the most similar document chunks by vector distance |
| Filter | Remove irrelevant results, apply access controls |
| Inject | Add the retrieved chunks into the agent's prompt as context |
| Generate | The model reasons over the retrieved context and produces an answer |
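The six steps can be traced end-to-end with a toy bag-of-words embedding and cosine similarity. A real pipeline would call an embedding model (such as one served via Bedrock) and a vector store; everything here is a self-contained stand-in:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Embed: toy bag-of-words vectorizer, standing in for a real model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Search metric: cosine similarity between sparse vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

docs = [
    {"text": "Minimum vendor liability cap: $100,000 for Tier 1", "year": 2026},
    {"text": "Old policy: liability cap $25,000", "year": 2019},
]
index = [(embed(d["text"]), d) for d in docs]                 # Embed + store
query_vec = embed("vendor liability cap minimum")             # Query
ranked = sorted(index, key=lambda p: cosine(query_vec, p[0]), reverse=True)  # Search
current = [d for _, d in ranked if d["year"] >= 2026]         # Filter (metadata)
prompt = "Context:\n" + "\n".join(d["text"] for d in current) # Inject
# Generate: send `prompt` plus the question to the model.
```

Note how the Filter step drops the 2019 document even though it scored reasonably well; this is the metadata filtering discussed under Common Mistakes.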
Chunking Strategy
How you split your documents matters. Too large and the chunks contain irrelevant information that dilutes the useful parts. Too small and you lose context.
For contracts and policy documents, we've found that splitting by logical section (each clause as a chunk) works better than fixed-size splitting. A 500-token chunk that cuts a clause in half is worse than a 300-token chunk that contains the complete clause.
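Section-based chunking can be as simple as splitting on clause headings. A sketch, assuming contracts use "Section N." headings at the start of a line; the regex would need adjusting for your actual document format:

```python
import re

def chunk_by_clause(contract: str) -> list[str]:
    """Split on 'Section N.' headings so each chunk is one complete clause.
    The heading pattern is an assumption about the contract format."""
    parts = re.split(r"(?m)^(?=Section \d+\.)", contract)
    return [p.strip() for p in parts if p.strip()]

contract = """Section 1. Liability. Vendor liability capped at $50,000.
Section 2. Termination. Either party may terminate with 30 days notice."""
chunks = chunk_by_clause(contract)
```

Each chunk now carries a complete clause, so a retrieved chunk never cuts a liability cap off from the sentence that qualifies it.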
Common Mistakes
- No metadata filtering - Retrieving policy documents from 2019 when 2026 versions exist. Always filter by recency, document type, or relevance.
- Too many chunks - Stuffing 20 retrieved chunks into the prompt overwhelms the model. 3 to 5 highly relevant chunks beats 20 loosely related ones.
- No source tracking - If the agent cites a policy, you need to verify it actually came from the retrieved document and not from the model's imagination. This is where grounding (Module 7) comes in.
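The first two mistakes have a shared fix: filter retrieved hits by metadata, then cap the count before injection. A minimal sketch, where the hit dict shape (`year`, `score`, `text`) is illustrative rather than any particular vector store's response format:

```python
def select_chunks(hits: list[dict], min_year: int = 2026, top_k: int = 5) -> list[dict]:
    """Keep only current documents, then cap at a small top-k
    so the prompt isn't stuffed with loosely related chunks."""
    current = [h for h in hits if h["year"] >= min_year]   # metadata filter
    current.sort(key=lambda h: h["score"], reverse=True)   # best matches first
    return current[:top_k]

hits = [
    {"text": "2019 policy", "year": 2019, "score": 0.91},
    {"text": "2026 policy", "year": 2026, "score": 0.88},
]
kept = select_chunks(hits)
```

Note that the stale 2019 document scored *higher* on similarity; without the metadata filter it would have won.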
What's Next
RAG gives the agent access to your data. But how do you ensure the agent actually uses that data instead of making things up? In Module 7: Grounding, we cover how to connect agent outputs to verified sources.
RAG Implementation Lab
Build a production RAG pipeline with OpenSearch Serverless, Bedrock embeddings, metadata filtering, and hybrid search for contract analysis.
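As a preview of the lab, a hybrid search request combines a lexical `match` clause with a k-NN vector clause in one query body. This sketch builds an OpenSearch-style query dict; the field names (`text`, `embedding`) and the query vector are placeholders, and in OpenSearch, hybrid queries additionally require a search pipeline configured for score normalization:

```python
def hybrid_query(question: str, vector: list[float], k: int = 5) -> dict:
    """Build a hybrid (lexical + vector) search body.
    Field names are assumptions about the index mapping."""
    return {
        "size": k,
        "query": {
            "hybrid": {
                "queries": [
                    {"match": {"text": question}},                        # lexical
                    {"knn": {"embedding": {"vector": vector, "k": k}}},   # semantic
                ]
            }
        },
    }

body = hybrid_query("vendor liability cap", [0.1, 0.2, 0.3])
```

Lexical search catches exact terms like "$50,000" that embeddings can blur; vector search catches paraphrases. Hybrid search gives you both.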