Claude Model Selection Guide
Choosing the right Claude model is the highest-impact optimization decision you will make. The wrong model wastes money on simple tasks or produces poor results on complex ones. This guide gives you a practical framework for model selection based on real production workloads running on Amazon Bedrock.
Claude Model Family Comparison
| Feature | Claude Haiku 4.5 | Claude Sonnet 4.6 | Claude Opus 4.6 |
|---|---|---|---|
| Speed | ~150-180 tok/s | ~80-120 tok/s | ~40-70 tok/s |
| Input Cost | $0.80 / 1M tokens | $3.00 / 1M tokens | $15.00 / 1M tokens |
| Output Cost | $4.00 / 1M tokens | $15.00 / 1M tokens | $75.00 / 1M tokens |
| Context Window | 200K tokens | 200K tokens | 200K tokens |
| Extended Thinking | No | Yes | Yes |
| Vision | Yes | Yes | Yes |
| Best For | High-volume, simple tasks | General production work | Complex reasoning |
Pricing as of early 2026. Check the AWS Bedrock pricing page for current rates.
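To make the pricing differences concrete, here is a minimal cost estimator built from the per-million-token rates in the table above. The model keys are illustrative labels, not Bedrock model IDs.

```python
# Per-million-token prices from the comparison table above (USD).
PRICING = {
    "haiku-4.5":  {"input": 0.80,  "output": 4.00},
    "sonnet-4.6": {"input": 3.00,  "output": 15.00},
    "opus-4.6":   {"input": 15.00, "output": 75.00},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Approximate USD cost of a single request."""
    p = PRICING[model]
    return (input_tokens / 1_000_000) * p["input"] + (output_tokens / 1_000_000) * p["output"]

# A request with 2,000 input tokens and 500 output tokens:
print(estimate_cost("haiku-4.5", 2_000, 500))  # ~$0.0036
print(estimate_cost("opus-4.6", 2_000, 500))   # ~$0.0675
```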
When to Use Each Model
Claude Haiku 4.5 -- The Workhorse
Use Haiku when speed and cost matter more than nuanced reasoning:
- Text classification (sentiment, intent, category)
- Entity extraction from structured documents
- Simple Q&A over retrieved context (RAG responses)
- Content moderation and safety filtering
- Data transformation (reformatting, summarizing short text)
- High-volume pipelines where you process thousands of items
The classify_ticket() function uses Haiku with a prefilled assistant response to classify support tickets into categories with minimal tokens and maximum speed.
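The full implementation is not reproduced here, but a minimal sketch of what classify_ticket() might look like with the Bedrock Converse API follows. The model ID and category list are placeholders; substitute the current Haiku model ID (or inference profile ARN) from your Bedrock console.

```python
import boto3

bedrock = boto3.client("bedrock-runtime")

# Placeholder -- replace with the current Haiku model ID or inference profile ARN.
HAIKU_MODEL_ID = "<haiku-model-id>"

CATEGORIES = ["billing", "bug", "feature_request", "account", "other"]

def classify_ticket(ticket_text: str) -> str:
    """Classify a support ticket into one category using a prefilled assistant turn."""
    response = bedrock.converse(
        modelId=HAIKU_MODEL_ID,
        messages=[
            {"role": "user", "content": [{"text": (
                f"Classify this support ticket into exactly one of: {', '.join(CATEGORIES)}.\n\n"
                f"Ticket: {ticket_text}"
            )}]},
            # Prefilling the assistant turn constrains the model to emit just the label.
            {"role": "assistant", "content": [{"text": "Category:"}]},
        ],
        inferenceConfig={"maxTokens": 10, "temperature": 0},
    )
    return response["output"]["message"]["content"][0]["text"].strip()
```

Keeping maxTokens small and temperature at 0 is what makes this pattern cheap and fast: the model only completes the prefilled label rather than generating an explanation.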
Claude Sonnet 4.6 -- The All-Rounder
Sonnet is the default choice for most production applications:
- Code generation and review
- Complex document analysis (contracts, reports, technical docs)
- Multi-step reasoning that does not need extended thinking
- Creative writing and content generation
- Tool use and function calling workflows
- Chatbots where quality matters
Claude Opus 4.6 -- The Heavy Lifter
Reserve Opus for tasks where quality justifies the 5x cost over Sonnet:
- Complex multi-step reasoning with extended thinking enabled
- Research synthesis across many sources
- Nuanced analysis where subtle distinctions matter
- Agentic workflows with complex tool orchestration
- Code architecture decisions and system design
Context Window Differences and Impact
All three models share a 200K token context window, but how you use that window varies:
| Scenario | Recommended Model | Why |
|---|---|---|
| Short context (under 4K tokens) | Haiku | Fast processing, low cost |
| Medium context (4K-50K tokens) | Sonnet | Good balance of comprehension and speed |
| Long context (50K-200K tokens) | Sonnet or Opus | Better at maintaining coherence across long documents |
Long-context requests have higher latency regardless of model. A 200K-token input takes meaningfully longer to process than a 4K-token input, even though input processing is parallelized.
The select_model() function estimates token count and routes to the appropriate model tier, automatically upgrading simple tasks to Sonnet when the context exceeds 50K tokens.
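A stripped-down sketch of that routing logic is shown below. The model IDs and task-type sets are placeholders, and the token estimate uses a rough characters-per-token heuristic rather than a real tokenizer.

```python
# Placeholder model IDs -- substitute current Bedrock model IDs or inference profile ARNs.
HAIKU, SONNET, OPUS = "<haiku-id>", "<sonnet-id>", "<opus-id>"

SIMPLE_TASKS = {"classify", "extract", "moderate", "transform"}
COMPLEX_TASKS = {"research", "architecture", "agentic"}

def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    return len(text) // 4

def select_model(task_type: str, context: str) -> str:
    """Route to a model tier by task type, upgrading when the context is long."""
    tokens = estimate_tokens(context)
    if task_type in COMPLEX_TASKS:
        return OPUS
    if task_type in SIMPLE_TASKS:
        # Simple tasks normally run on Haiku, but documents over ~50K tokens
        # benefit from Sonnet's long-context coherence.
        return SONNET if tokens > 50_000 else HAIKU
    return SONNET  # default tier for everything else
```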
Model Routing Pattern
The most cost-effective production architecture routes requests to different models based on task complexity. This pattern can reduce costs by 60-80% compared to sending everything to Sonnet.
The ClaudeRouter class maps task types (classify, extract, summarize, generate_code, research, etc.) to the optimal model tier and invokes Bedrock accordingly, defaulting to Sonnet for unrecognized tasks.
Model Routing Implementation
Get the complete ClaudeRouter class with task-based model selection, fallback chains, and cost-optimized routing logic.
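The complete class adds fallback chains and cost tracking; a simplified sketch of the core routing logic, with placeholder model IDs, might look like this:

```python
import boto3

class ClaudeRouter:
    """Routes requests to a Claude model tier based on task type (simplified sketch)."""

    # Placeholder model IDs -- replace with current Bedrock model IDs or inference profile ARNs.
    MODEL_IDS = {
        "haiku": "<haiku-id>",
        "sonnet": "<sonnet-id>",
        "opus": "<opus-id>",
    }

    TASK_TIERS = {
        "classify": "haiku",
        "extract": "haiku",
        "summarize": "haiku",
        "generate_code": "sonnet",
        "analyze_document": "sonnet",
        "research": "opus",
        "architecture": "opus",
    }

    def __init__(self, region: str = "us-east-1"):
        self.client = boto3.client("bedrock-runtime", region_name=region)

    def invoke(self, task_type: str, prompt: str, max_tokens: int = 1024) -> str:
        # Unrecognized task types fall back to Sonnet, the general-purpose tier.
        tier = self.TASK_TIERS.get(task_type, "sonnet")
        response = self.client.converse(
            modelId=self.MODEL_IDS[tier],
            messages=[{"role": "user", "content": [{"text": prompt}]}],
            inferenceConfig={"maxTokens": max_tokens},
        )
        return response["output"]["message"]["content"][0]["text"]
```

With this shape, `router.invoke("classify", ticket)` lands on Haiku while `router.invoke("research", brief)` lands on Opus, which is where the 60-80% savings over an all-Sonnet setup comes from.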
Flashcards
What is the approximate cost difference between Haiku and Opus for output tokens?
Opus output tokens cost about 19x more than Haiku ($75 vs $4 per million tokens). Input tokens show the same ratio: Opus is about 19x more expensive ($15 vs $0.80 per million).
Start with Haiku for everything, then upgrade only the tasks where quality is insufficient. Most teams find that 60-70% of their requests work perfectly fine with Haiku, saving significant cost and latency.