Module 6: RAG (Retrieval Augmented Generation)

An AI model knows what it learned during training. It doesn't know your company's policies, your contract templates, or what happened with this vendor last quarter. RAG bridges that gap.

The Concept

RAG is a two-step process:

  1. Retrieve relevant information from your data stores based on the current query
  2. Generate a response using that retrieved information as context

The model doesn't need to be retrained on your data. You give it the relevant context at query time, and it reasons over it.
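
Here is a minimal sketch of that two-step loop in Python. The `store.search` interface is a hypothetical stand-in for whatever retrieval backend you use, and the model ID is illustrative; the client call itself is the standard Anthropic Messages API.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def answer(query: str, store) -> str:
    # Step 1: Retrieve relevant chunks from your data store
    # (`store.search` is a hypothetical interface; swap in your backend)
    chunks = store.search(query, top_k=3)
    context = "\n\n".join(chunk["text"] for chunk in chunks)

    # Step 2: Generate a response with the retrieved text as context
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # illustrative model ID
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": f"Context:\n{context}\n\nQuestion: {query}",
        }],
    )
    return response.content[0].text
```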

Why Not Just Fine-Tune?

Fine-tuning bakes knowledge into the model's weights. It's expensive, slow, and static. When your company policy changes, you'd need to fine-tune again.

RAG keeps the knowledge external. Update the policy document in your store, and the next query picks up the new version automatically. No retraining, no redeployment.

| | Fine-Tuning | RAG |
|---|---|---|
| Knowledge update | Retrain the model | Update the document |
| Cost | Expensive (GPU hours) | Cheap (embedding + storage) |
| Freshness | Stale until retrained | Always current |
| Traceability | Can't point to source | Can cite exact source |
| Best for | Behavior/style changes | Knowledge/fact retrieval |

In Our Contract Workflow

The compliance agent needs to compare contract clauses against your company's approved templates. Those templates live in a document store, not in Claude's training data.

Contract clause: "Vendor liability capped at $50,000"
↓
RAG retrieves your policy:
"Minimum vendor liability cap: $100,000
for Tier 1 vendors, $50,000 for Tier 2"
↓
Agent reasons: "This vendor is Tier 1.
$50,000 cap is below our $100,000 minimum.
Flag for review."

Without RAG, the agent would either hallucinate a policy or give a generic answer. With RAG, it applies your specific standards.
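
To make the injection step concrete, here is one way to assemble the prompt once the policy chunks are retrieved. The instruction wording is our own sketch, not a fixed template:

```python
def build_compliance_prompt(clause: str, policy_chunks: list[str]) -> str:
    """Inject retrieved policy text next to the clause under review."""
    policy_block = "\n\n".join(policy_chunks)
    return (
        "You are a contract compliance reviewer.\n\n"
        f"Company policy (retrieved):\n{policy_block}\n\n"
        f"Contract clause under review:\n{clause}\n\n"
        "Compare the clause against the policy. If it falls below a policy "
        "threshold, answer FLAG and name the threshold; otherwise answer PASS."
    )

# The clause and policy from the example above:
prompt = build_compliance_prompt(
    "Vendor liability capped at $50,000",
    ["Minimum vendor liability cap: $100,000 for Tier 1 vendors, "
     "$50,000 for Tier 2"],
)
```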

The RAG Pipeline

| Step | What Happens |
|---|---|
| Embed | Convert your documents into vector embeddings and store them |
| Query | Convert the agent's question into an embedding |
| Search | Find the most similar document chunks by vector distance |
| Filter | Remove irrelevant results, apply access controls |
| Inject | Add the retrieved chunks into the agent's prompt as context |
| Generate | The model reasons over the retrieved context and produces an answer |
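
A compressed sketch of the Embed, Query, and Search steps, using a Bedrock embedding model and an in-memory index with toy documents. In production you would persist vectors in a store like OpenSearch; the Titan model ID is one published option, so check availability in your region:

```python
import json

import boto3
import numpy as np

bedrock = boto3.client("bedrock-runtime")  # assumes AWS credentials/region set

def embed(text: str) -> np.ndarray:
    # Embed / Query steps: text -> vector via a Bedrock embedding model
    resp = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",  # one option; verify in your region
        contentType="application/json",
        body=json.dumps({"inputText": text}),
    )
    return np.array(json.loads(resp["body"].read())["embedding"])

# Embed: index document chunks once (toy data; persist vectors in production)
chunks = [
    "Minimum vendor liability cap: $100,000 for Tier 1 vendors, $50,000 for Tier 2.",
    "All Tier 1 contracts require a 30-day termination-for-convenience clause.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

def search(query: str, top_k: int = 3) -> list[str]:
    # Search step: rank chunks by cosine similarity to the query embedding
    q = embed(query)
    scored = [
        (float(np.dot(q, v)) / (np.linalg.norm(q) * np.linalg.norm(v)), chunk)
        for chunk, v in index
    ]
    scored.sort(reverse=True)
    return [chunk for _, chunk in scored[:top_k]]
```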

Chunking Strategy

How you split your documents matters. Too large and the chunks contain irrelevant information that dilutes the useful parts. Too small and you lose context.

For contracts and policy documents, we've found that splitting by logical section (each clause as a chunk) works better than fixed-size splitting. A 500-token chunk that cuts a clause in half is worse than a 300-token chunk that contains the complete clause.
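
A sketch of that section-based splitting, assuming clauses are introduced by numbered headings like "4." or "4.2". Real contracts vary in numbering style, so treat the pattern as a starting point, not a universal rule:

```python
import re

def chunk_by_clause(contract_text: str) -> list[str]:
    # Split wherever a new line begins with a clause number like "4." or
    # "4.2", keeping each heading attached to its clause body.
    parts = re.split(r"\n(?=\d+(?:\.\d+)*\.?\s)", contract_text)
    return [part.strip() for part in parts if part.strip()]
```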

Common Mistakes

  • No metadata filtering - Retrieving policy documents from 2019 when 2026 versions exist. Always filter by recency, document type, or relevance.
  • Too many chunks - Stuffing 20 retrieved chunks into the prompt overwhelms the model. Three to five highly relevant chunks beat 20 loosely related ones (see the sketch after this list).
  • No source tracking - If the agent cites a policy, you need to verify it actually came from the retrieved document and not from the model's imagination. This is where grounding (Module 7) comes in.
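
Here is what the filtering and chunk-cap fixes can look like in code. The metadata schema (`doc_type`, `superseded`, `source_id`) is an assumption; adapt the field names to your store:

```python
def filter_hits(hits: list[dict], top_k: int = 5) -> list[dict]:
    # Keep only current policy documents (metadata fields are assumed;
    # match them to your store's schema).
    current = [
        h for h in hits
        if h["metadata"]["doc_type"] == "policy"
        and not h["metadata"]["superseded"]
    ]
    # Cap the context: a few highly relevant chunks beat many loose ones.
    current.sort(key=lambda h: h["score"], reverse=True)
    top = current[:top_k]
    # Keep source IDs attached so citations can be verified later (Module 7).
    return [
        {"text": h["text"], "source_id": h["metadata"]["source_id"]}
        for h in top
    ]
```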

What's Next

RAG gives the agent access to your data. But how do you ensure the agent actually uses that data instead of making things up? In Module 7: Grounding, we cover how to connect agent outputs to verified sources.

Premium: RAG Implementation Lab

Build a production RAG pipeline with OpenSearch Serverless, Bedrock embeddings, metadata filtering, and hybrid search for contract analysis.