Claude Model Selection Guide
Choosing the right Claude model is the highest-impact optimization decision you will make. The wrong model wastes money on simple tasks or produces poor results on complex ones. This guide gives you a practical framework for model selection based on real production workloads running on Amazon Bedrock.
Claude Model Family Comparison
| Feature | Claude Haiku 4.5 | Claude Sonnet 4.6 | Claude Opus 4.6 |
|---|---|---|---|
| Speed | ~150-180 tok/s | ~80-120 tok/s | ~40-70 tok/s |
| Input Cost | $0.80 / 1M tokens | $3.00 / 1M tokens | $15.00 / 1M tokens |
| Output Cost | $4.00 / 1M tokens | $15.00 / 1M tokens | $75.00 / 1M tokens |
| Context Window | 200K tokens | 200K tokens | 200K tokens |
| Extended Thinking | No | Yes | Yes |
| Vision | Yes | Yes | Yes |
| Best For | High-volume, simple tasks | General production work | Complex reasoning |
Pricing as of early 2026. Check the AWS Bedrock pricing page for current rates.
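To make the pricing differences concrete, here is a minimal cost estimator built from the per-million-token rates in the table above. The model keys are illustrative labels, not Bedrock model IDs.

```python
# Per-million-token prices from the comparison table above (USD).
PRICING = {
    "haiku-4.5":  {"input": 0.80,  "output": 4.00},
    "sonnet-4.6": {"input": 3.00,  "output": 15.00},
    "opus-4.6":   {"input": 15.00, "output": 75.00},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Approximate USD cost of a single request."""
    p = PRICING[model]
    return (input_tokens / 1_000_000) * p["input"] + (output_tokens / 1_000_000) * p["output"]

# A request with 2,000 input tokens and 500 output tokens:
print(estimate_cost("haiku-4.5", 2_000, 500))  # ~$0.0036
print(estimate_cost("opus-4.6", 2_000, 500))   # ~$0.0675
```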
When to Use Each Model
Claude Haiku 4.5 -- The Workhorse
Use Haiku when speed and cost matter more than nuanced reasoning:
- Text classification (sentiment, intent, category)
- Entity extraction from structured documents
- Simple Q&A over retrieved context (RAG responses)
- Content moderation and safety filtering
- Data transformation (reformatting, summarizing short text)
- High-volume pipelines where you process thousands of items
The classify_ticket() function uses Haiku with a prefilled assistant response to classify support tickets into categories with minimal tokens and maximum speed.
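The full implementation is not reproduced here, but a minimal sketch of what classify_ticket() might look like with the Bedrock Converse API follows. The model ID and category list are placeholders; substitute the current Haiku model ID (or inference profile ARN) from your Bedrock console.

```python
import boto3

bedrock = boto3.client("bedrock-runtime")

# Placeholder -- replace with the current Haiku model ID or inference profile ARN.
HAIKU_MODEL_ID = "<haiku-model-id>"

CATEGORIES = ["billing", "bug", "feature_request", "account", "other"]

def classify_ticket(ticket_text: str) -> str:
    """Classify a support ticket into one category using a prefilled assistant turn."""
    response = bedrock.converse(
        modelId=HAIKU_MODEL_ID,
        messages=[
            {"role": "user", "content": [{"text": (
                f"Classify this support ticket into exactly one of: {', '.join(CATEGORIES)}.\n\n"
                f"Ticket: {ticket_text}"
            )}]},
            # Prefilling the assistant turn constrains the model to emit just the label.
            {"role": "assistant", "content": [{"text": "Category:"}]},
        ],
        inferenceConfig={"maxTokens": 10, "temperature": 0},
    )
    return response["output"]["message"]["content"][0]["text"].strip()
```

Keeping maxTokens small and temperature at 0 is what makes this pattern cheap and fast: the model only completes the prefilled label rather than generating an explanation.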
Claude Sonnet 4.6 -- The All-Rounder
Sonnet is the default choice for most production applications:
- Code generation and review
- Complex document analysis (contracts, reports, technical docs)
- Multi-step reasoning that does not need extended thinking
- Creative writing and content generation
- Tool use and function calling workflows
- Chatbots where quality matters
Claude Opus 4.6 -- The Heavy Lifter
Reserve Opus for tasks where quality justifies the 5x cost over Sonnet:
- Complex multi-step reasoning with extended thinking enabled
- Research synthesis across many sources
- Nuanced analysis where subtle distinctions matter
- Agentic workflows with complex tool orchestration
- Code architecture decisions and system design
Context Window Differences and Impact
All three models share a 200K token context window, but how you use that window varies:
| Scenario | Recommended Model | Why |
|---|---|---|
| Short context (under 4K tokens) | Haiku | Fast processing, low cost |
| Medium context (4K-50K tokens) | Sonnet | Good balance of comprehension and speed |
| Long context (50K-200K tokens) | Sonnet or Opus | Better at maintaining coherence across long documents |
Long-context requests have higher latency regardless of model. A 200K-token input takes meaningfully longer to process than a 4K-token input, even though input processing is parallelized.
The select_model() function estimates token count and routes to the appropriate model tier, automatically upgrading simple tasks to Sonnet when the context exceeds 50K tokens.
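A stripped-down sketch of that routing logic is shown below. The model IDs and task-type sets are placeholders, and the token estimate uses a rough characters-per-token heuristic rather than a real tokenizer.

```python
# Placeholder model IDs -- substitute current Bedrock model IDs or inference profile ARNs.
HAIKU, SONNET, OPUS = "<haiku-id>", "<sonnet-id>", "<opus-id>"

SIMPLE_TASKS = {"classify", "extract", "moderate", "transform"}
COMPLEX_TASKS = {"research", "architecture", "agentic"}

def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    return len(text) // 4

def select_model(task_type: str, context: str) -> str:
    """Route to a model tier by task type, upgrading when the context is long."""
    tokens = estimate_tokens(context)
    if task_type in COMPLEX_TASKS:
        return OPUS
    if task_type in SIMPLE_TASKS:
        # Simple tasks normally run on Haiku, but documents over ~50K tokens
        # benefit from Sonnet's long-context coherence.
        return SONNET if tokens > 50_000 else HAIKU
    return SONNET  # default tier for everything else
```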
Model Routing Pattern
The most cost-effective production architecture routes requests to different models based on task complexity. This pattern can reduce costs by 60-80% compared to sending everything to Sonnet.
The ClaudeRouter class maps task types (classify, extract, summarize, generate_code, research, etc.) to the optimal model tier and invokes Bedrock accordingly, defaulting to Sonnet for unrecognized tasks.
Model Routing Implementation
Get the complete ClaudeRouter class with task-based model selection, fallback chains, and cost-optimized routing logic.
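The complete class adds fallback chains and cost tracking; a simplified sketch of the core routing logic, with placeholder model IDs, might look like this:

```python
import boto3

class ClaudeRouter:
    """Routes requests to a Claude model tier based on task type (simplified sketch)."""

    # Placeholder model IDs -- replace with current Bedrock model IDs or inference profile ARNs.
    MODEL_IDS = {
        "haiku": "<haiku-id>",
        "sonnet": "<sonnet-id>",
        "opus": "<opus-id>",
    }

    TASK_TIERS = {
        "classify": "haiku",
        "extract": "haiku",
        "summarize": "haiku",
        "generate_code": "sonnet",
        "analyze_document": "sonnet",
        "research": "opus",
        "architecture": "opus",
    }

    def __init__(self, region: str = "us-east-1"):
        self.client = boto3.client("bedrock-runtime", region_name=region)

    def invoke(self, task_type: str, prompt: str, max_tokens: int = 1024) -> str:
        # Unrecognized task types fall back to Sonnet, the general-purpose tier.
        tier = self.TASK_TIERS.get(task_type, "sonnet")
        response = self.client.converse(
            modelId=self.MODEL_IDS[tier],
            messages=[{"role": "user", "content": [{"text": prompt}]}],
            inferenceConfig={"maxTokens": max_tokens},
        )
        return response["output"]["message"]["content"][0]["text"]
```

With this shape, `router.invoke("classify", ticket)` lands on Haiku while `router.invoke("research", brief)` lands on Opus, which is where the 60-80% savings over an all-Sonnet setup comes from.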
Flashcards
What is the approximate cost difference between Haiku and Opus for output tokens?
Opus output tokens cost about 19x more than Haiku ($75 vs $4 per million tokens). Input tokens show the same ratio: Opus is about 19x more expensive ($15 vs $0.80 per million).
Start with Haiku for everything, then upgrade only the tasks where quality is insufficient. Most teams find that 60-70% of their requests work perfectly fine with Haiku, saving significant cost and latency.