
Claude Model Selection Guide

Choosing the right Claude model is the highest-impact optimization decision you will make. The wrong model wastes money on simple tasks or produces poor results on complex ones. This guide gives you a practical framework for model selection based on real production workloads running on Amazon Bedrock.

Claude Model Family Comparison

| Feature | Claude Haiku 4.5 | Claude Sonnet 4.6 | Claude Opus 4.6 |
|---|---|---|---|
| Speed | ~150-180 tok/s | ~80-120 tok/s | ~40-70 tok/s |
| Input Cost | $0.80 / 1M tokens | $3.00 / 1M tokens | $15.00 / 1M tokens |
| Output Cost | $4.00 / 1M tokens | $15.00 / 1M tokens | $75.00 / 1M tokens |
| Context Window | 200K tokens | 200K tokens | 200K tokens |
| Extended Thinking | No | Yes | Yes |
| Vision | Yes | Yes | Yes |
| Best For | High-volume, simple tasks | General production work | Complex reasoning |

Pricing as of early 2026. Check AWS Bedrock pricing page for current rates.
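To make these rates concrete, a small helper can estimate the dollar cost of a single request from the per-million-token prices in the table above. This is a back-of-the-envelope sketch using the early-2026 numbers; verify current rates on the AWS Bedrock pricing page before relying on them.

```python
# Per-million-token prices from the comparison table above (early 2026).
# Verify against the AWS Bedrock pricing page before relying on these numbers.
PRICES = {
    "haiku-4.5":  {"input": 0.80,  "output": 4.00},
    "sonnet-4.6": {"input": 3.00,  "output": 15.00},
    "opus-4.6":   {"input": 15.00, "output": 75.00},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single request."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# A typical RAG request (10K tokens in, 1K tokens out):
# Haiku: (10_000 * 0.80 + 1_000 * 4.00) / 1e6  = $0.012
# Opus:  (10_000 * 15.00 + 1_000 * 75.00) / 1e6 = $0.225
```

At that request shape, the same workload costs roughly 19x more on Opus than on Haiku, which is why routing matters at volume.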

When to Use Each Model

Claude Haiku 4.5 -- The Workhorse

Use Haiku when speed and cost matter more than nuanced reasoning:

  • Text classification (sentiment, intent, category)
  • Entity extraction from structured documents
  • Simple Q&A over retrieved context (RAG responses)
  • Content moderation and safety filtering
  • Data transformation (reformatting, summarizing short text)
  • High-volume pipelines where you process thousands of items

The classify_ticket() function uses Haiku with a prefilled assistant response to classify support tickets into categories with minimal tokens and maximum speed.
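The full classify_ticket() implementation is part of the premium content, but the prefill technique it relies on can be sketched in a few lines. The idea: end the message list with a partial assistant turn ("Category:") so Claude continues from that point and emits only the category word. The model ID and category labels below are illustrative placeholders, and the payload follows the Anthropic messages format used by Bedrock's invoke_model API.

```python
import json

CATEGORIES = ["billing", "technical", "account", "other"]  # example labels

def build_classify_request(ticket_text: str) -> str:
    """Build an invoke_model body that prefills the assistant turn so the
    completion is just the category name -- minimal tokens, maximum speed."""
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 5,  # a single category word is all we need back
        "messages": [
            {"role": "user", "content": (
                f"Classify this support ticket into one of {CATEGORIES}.\n\n"
                f"Ticket: {ticket_text}")},
            # Prefilled assistant turn: Claude continues from here.
            {"role": "assistant", "content": "Category:"},
        ],
    })

# body = build_classify_request("I was charged twice this month")
# client.invoke_model(modelId="anthropic.claude-haiku-4-5-v1:0", body=body)
# (model ID is a placeholder; check your region's Bedrock model catalog)
```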

Premium

Model Routing Implementation

Get the complete ClaudeRouter class with task-based model selection, fallback chains, and cost-optimized routing logic.

Claude Sonnet 4.6 -- The All-Rounder

Sonnet is the default choice for most production applications:

  • Code generation and review
  • Complex document analysis (contracts, reports, technical docs)
  • Multi-step reasoning that does not need extended thinking
  • Creative writing and content generation
  • Tool use and function calling workflows
  • Chatbots where quality matters

Claude Opus 4.6 -- The Heavy Lifter

Reserve Opus for tasks where quality justifies the 5x cost over Sonnet:

  • Complex multi-step reasoning with extended thinking enabled
  • Research synthesis across many sources
  • Nuanced analysis where subtle distinctions matter
  • Agentic workflows with complex tool orchestration
  • Code architecture decisions and system design
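For the first bullet, extended thinking is opted into per request. A minimal sketch of the request body, assuming the `thinking` block from Anthropic's messages API (confirm support for your model and region in the Bedrock docs); note that max_tokens must exceed the thinking budget, since thinking tokens count against it:

```python
import json

def build_opus_request(prompt: str, thinking_budget: int = 4096) -> str:
    """Build an invoke_model body with extended thinking enabled.

    Assumes the Anthropic messages-API `thinking` parameter; the budget
    caps how many tokens Claude may spend reasoning before answering.
    """
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        # max_tokens must be larger than the thinking budget.
        "max_tokens": thinking_budget + 2000,
        "thinking": {"type": "enabled", "budget_tokens": thinking_budget},
        "messages": [{"role": "user", "content": prompt}],
    })
```

Because thinking tokens are billed as output tokens at Opus rates, a large budget can dominate the cost of the request, so size it to the problem.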

Context Window Differences and Impact

All three models share a 200K token context window, but how you use that window varies:

| Scenario | Recommended Model | Why |
|---|---|---|
| Short context (under 4K tokens) | Haiku | Fast processing, low cost |
| Medium context (4K-50K tokens) | Sonnet | Good balance of comprehension and speed |
| Long context (50K-200K tokens) | Sonnet or Opus | Better at maintaining coherence across long documents |

Long-context requests have higher latency regardless of model. A 200K-token input takes meaningfully longer to process than a 4K-token input, even though input processing is parallelized.

The select_model() function estimates token count and routes to the appropriate model tier, automatically upgrading simple tasks to Sonnet when the context exceeds 50K tokens.
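The full select_model() lives in the premium content; a plausible sketch of its routing logic, assuming the common ~4-characters-per-token estimate and the 50K upgrade threshold described above (tier names here are illustrative, not Bedrock model IDs):

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic: about 4 characters per token for English text."""
    return len(text) // 4

def select_model(prompt: str, task: str = "general") -> str:
    """Route to a model tier by task complexity and context size."""
    tokens = estimate_tokens(prompt)
    # Simple tasks get Haiku -- unless the context is long enough that
    # coherence matters more than speed, in which case upgrade to Sonnet.
    if task in {"classify", "extract"} and tokens <= 50_000:
        return "haiku"
    if task in {"research", "architecture"}:
        return "opus"
    return "sonnet"
```

For example, a short classification prompt routes to Haiku, while the same task over a 300K-character document (~75K estimated tokens) is upgraded to Sonnet.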


Model Routing Pattern

The most cost-effective production architecture routes requests to different models based on task complexity. This pattern can reduce costs by 60-80% compared to sending everything to Sonnet.

The ClaudeRouter class maps task types (classify, extract, summarize, generate_code, research, etc.) to the optimal model tier and invokes Bedrock accordingly, defaulting to Sonnet for unrecognized tasks.
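The complete class is premium content, but its core dispatch can be sketched as a dictionary lookup with a Sonnet default. The task names mirror those listed above; the model IDs are placeholders to replace with the real identifiers from your region's Bedrock model catalog:

```python
class ClaudeRouter:
    """Sketch of task-based routing. The full premium version adds
    fallback chains and cost-optimized routing on top of this lookup."""

    TASK_MODEL_MAP = {
        "classify": "haiku", "extract": "haiku", "moderate": "haiku",
        "summarize": "sonnet", "generate_code": "sonnet", "chat": "sonnet",
        "research": "opus", "architecture": "opus",
    }
    MODEL_IDS = {  # placeholder IDs -- look up real ones in the Bedrock console
        "haiku": "anthropic.claude-haiku-4-5-v1:0",
        "sonnet": "anthropic.claude-sonnet-4-6-v1:0",
        "opus": "anthropic.claude-opus-4-6-v1:0",
    }

    def route(self, task: str) -> str:
        # Unrecognized task types default to Sonnet, the all-rounder.
        tier = self.TASK_MODEL_MAP.get(task, "sonnet")
        return self.MODEL_IDS[tier]
```

In production, route() would feed the chosen model ID into a bedrock-runtime invoke_model call; keeping the mapping in data rather than code makes it easy to retune tiers as pricing or quality thresholds change.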


Flashcards

Question: What is the approximate cost difference between Haiku and Opus for output tokens?

Answer: Opus output tokens cost about 19x more than Haiku ($75 vs $4 per million tokens). Input tokens show the same ratio: Opus is about 19x more expensive ($15 vs $0.80 per million).

Key Insight

Start with Haiku for everything, then upgrade only the tasks where quality is insufficient. Most teams find that 60-70% of their requests work perfectly fine with Haiku, saving significant cost and latency.