Module 6: RAG (Retrieval Augmented Generation)

An AI model knows what it learned during training. It doesn't know your company's policies, your contract templates, or what happened with this vendor last quarter. RAG bridges that gap.

The Concept

RAG is a two-step process:

  1. Retrieve relevant information from your data stores based on the current query
  2. Generate a response using that retrieved information as context

The model doesn't need to be retrained on your data. You give it the relevant context at query time, and it reasons over it.
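
Here is a minimal sketch of that two-step loop in Python. The `store.search` interface is a hypothetical stand-in for whatever retrieval backend you use, and the model ID is illustrative; the client call itself is the standard Anthropic Messages API.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def answer(query: str, store) -> str:
    # Step 1: Retrieve relevant chunks from your data store
    # (`store.search` is a hypothetical interface; swap in your backend)
    chunks = store.search(query, top_k=3)
    context = "\n\n".join(chunk["text"] for chunk in chunks)

    # Step 2: Generate a response with the retrieved text as context
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # illustrative model ID
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": f"Context:\n{context}\n\nQuestion: {query}",
        }],
    )
    return response.content[0].text
```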

Why Not Just Fine-Tune?

Fine-tuning bakes knowledge into the model's weights. It's expensive, slow, and static. When your company policy changes, you'd need to fine-tune again.

RAG keeps the knowledge external. Update the policy document in your store, and the next query picks up the new version automatically. No retraining, no redeployment.

| | Fine-Tuning | RAG |
|---|---|---|
| Knowledge update | Retrain the model | Update the document |
| Cost | Expensive (GPU hours) | Cheap (embedding + storage) |
| Freshness | Stale until retrained | Always current |
| Traceability | Can't point to source | Can cite exact source |
| Best for | Behavior/style changes | Knowledge/fact retrieval |

In Our Contract Workflow

The compliance agent needs to compare contract clauses against your company's approved templates. Those templates live in a document store, not in Claude's training data.

Contract clause: "Vendor liability capped at $50,000"
↓
RAG retrieves your policy:
"Minimum vendor liability cap: $100,000
for Tier 1 vendors, $50,000 for Tier 2"
↓
Agent reasons: "This vendor is Tier 1.
$50,000 cap is below our $100,000 minimum.
Flag for review."

Without RAG, the agent would either hallucinate a policy or give a generic answer. With RAG, it applies your specific standards.
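
To make the injection step concrete, here is one way to assemble the prompt once the policy chunks are retrieved. The instruction wording is our own sketch, not a fixed template:

```python
def build_compliance_prompt(clause: str, policy_chunks: list[str]) -> str:
    """Inject retrieved policy text next to the clause under review."""
    policy_block = "\n\n".join(policy_chunks)
    return (
        "You are a contract compliance reviewer.\n\n"
        f"Company policy (retrieved):\n{policy_block}\n\n"
        f"Contract clause under review:\n{clause}\n\n"
        "Compare the clause against the policy. If it falls below a policy "
        "threshold, answer FLAG and name the threshold; otherwise answer PASS."
    )

# The clause and policy from the example above:
prompt = build_compliance_prompt(
    "Vendor liability capped at $50,000",
    ["Minimum vendor liability cap: $100,000 for Tier 1 vendors, "
     "$50,000 for Tier 2"],
)
```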

The RAG Pipeline

| Step | What Happens |
|---|---|
| Embed | Convert your documents into vector embeddings and store them |
| Query | Convert the agent's question into an embedding |
| Search | Find the most similar document chunks by vector distance |
| Filter | Remove irrelevant results, apply access controls |
| Inject | Add the retrieved chunks into the agent's prompt as context |
| Generate | The model reasons over the retrieved context and produces an answer |
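
A compressed sketch of the Embed, Query, and Search steps, using a Bedrock embedding model and an in-memory index with toy documents. In production you would persist vectors in a store like OpenSearch; the Titan model ID is one published option, so check availability in your region:

```python
import json

import boto3
import numpy as np

bedrock = boto3.client("bedrock-runtime")  # assumes AWS credentials/region set

def embed(text: str) -> np.ndarray:
    # Embed / Query steps: text -> vector via a Bedrock embedding model
    resp = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",  # one option; verify in your region
        contentType="application/json",
        body=json.dumps({"inputText": text}),
    )
    return np.array(json.loads(resp["body"].read())["embedding"])

# Embed: index document chunks once (toy data; persist vectors in production)
chunks = [
    "Minimum vendor liability cap: $100,000 for Tier 1 vendors, $50,000 for Tier 2.",
    "All Tier 1 contracts require a 30-day termination-for-convenience clause.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

def search(query: str, top_k: int = 3) -> list[str]:
    # Search step: rank chunks by cosine similarity to the query embedding
    q = embed(query)
    scored = [
        (float(np.dot(q, v)) / (np.linalg.norm(q) * np.linalg.norm(v)), chunk)
        for chunk, v in index
    ]
    scored.sort(reverse=True)
    return [chunk for _, chunk in scored[:top_k]]
```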

Chunking Strategy

How you split your documents matters. Too large and the chunks contain irrelevant information that dilutes the useful parts. Too small and you lose context.

For contracts and policy documents, we've found that splitting by logical section (each clause as a chunk) works better than fixed-size splitting. A 500-token chunk that cuts a clause in half is worse than a 300-token chunk that contains the complete clause.
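
A sketch of that section-based splitting, assuming clauses are introduced by numbered headings like "4." or "4.2". Real contracts vary in numbering style, so treat the pattern as a starting point, not a universal rule:

```python
import re

def chunk_by_clause(contract_text: str) -> list[str]:
    # Split wherever a new line begins with a clause number like "4." or
    # "4.2", keeping each heading attached to its clause body.
    parts = re.split(r"\n(?=\d+(?:\.\d+)*\.?\s)", contract_text)
    return [part.strip() for part in parts if part.strip()]
```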

Common Mistakes

  • No metadata filtering - Retrieving policy documents from 2019 when 2026 versions exist. Always filter by recency, document type, or relevance.
  • Too many chunks - Stuffing 20 retrieved chunks into the prompt overwhelms the model. Three to five highly relevant chunks beat 20 loosely related ones (see the sketch after this list).
  • No source tracking - If the agent cites a policy, you need to verify it actually came from the retrieved document and not from the model's imagination. This is where grounding (Module 7) comes in.
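
Here is what the filtering and chunk-cap fixes can look like in code. The metadata schema (`doc_type`, `superseded`, `source_id`) is an assumption; adapt the field names to your store:

```python
def filter_hits(hits: list[dict], top_k: int = 5) -> list[dict]:
    # Keep only current policy documents (metadata fields are assumed;
    # match them to your store's schema).
    current = [
        h for h in hits
        if h["metadata"]["doc_type"] == "policy"
        and not h["metadata"]["superseded"]
    ]
    # Cap the context: a few highly relevant chunks beat many loose ones.
    current.sort(key=lambda h: h["score"], reverse=True)
    top = current[:top_k]
    # Keep source IDs attached so citations can be verified later (Module 7).
    return [
        {"text": h["text"], "source_id": h["metadata"]["source_id"]}
        for h in top
    ]
```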

What's Next

RAG gives the agent access to your data. But how do you ensure the agent actually uses that data instead of making things up? In Module 7: Grounding, we cover how to connect agent outputs to verified sources.

Premium: RAG Implementation Lab

Build a production RAG pipeline with OpenSearch Serverless, Bedrock embeddings, metadata filtering, and hybrid search for contract analysis.