Compute and Containers

While SageMaker abstracts most infrastructure decisions, understanding the underlying compute options is essential for optimizing cost and performance. This section covers EC2 instance types for ML, container workflows, and strategies for reducing training costs.

Overview

| Service | What It Does | When to Use |
| --- | --- | --- |
| Amazon EC2 | Virtual servers with GPU options (P3, P4, G4) for ML training and inference | Custom ML environments needing full infrastructure control |
| Amazon ECR | Docker container registry | Store custom Docker images for SageMaker training/inference (BYOC pattern) |
| Amazon ECS / EKS | Container orchestration (ECS = AWS-native, EKS = Kubernetes) | Run containerized ML workloads outside SageMaker |
| AWS Fargate | Serverless compute for containers | Run containers without managing servers |
| AWS Batch | Managed batch computing with scheduling | Large-scale batch processing and HPC; supports GPUs and Spot Instances |
| Deep Learning AMIs (DLAMI) | Pre-configured EC2 AMIs with ML frameworks (TensorFlow, PyTorch, MXNet) | Quick-start ML development on EC2 with pre-installed CUDA, cuDNN, and frameworks |

EC2 Instance Types for ML

| Instance Family | GPU / Accelerator | Best For |
| --- | --- | --- |
| P3 | NVIDIA V100 | Training (general deep learning) |
| P4 | NVIDIA A100 | Training (large-scale, latest generation) |
| G4 | NVIDIA T4 | Inference (cost-effective GPU inference) |
| Inf1 | AWS Inferentia | Inference (custom AWS ML chip, best price-performance) |
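
The table above can be encoded as a small lookup helper for quick reference. This is purely illustrative: the dictionary and function names are my own, not an AWS API.

```python
# Hypothetical lookup encoding the instance-family table above.
# The dict and function are illustrative, not part of any AWS SDK.
GPU_BY_FAMILY = {
    "P3": ("NVIDIA V100", "training"),
    "P4": ("NVIDIA A100", "training"),
    "G4": ("NVIDIA T4", "inference"),
    "Inf1": ("AWS Inferentia", "inference"),
}

def families_for(workload: str) -> list:
    """Return instance families suited to 'training' or 'inference'."""
    return [fam for fam, (_, use) in GPU_BY_FAMILY.items() if use == workload]

print(families_for("inference"))  # ['G4', 'Inf1']
```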

The BYOC Pattern (Bring Your Own Container)

When SageMaker's built-in algorithms or pre-built framework containers do not meet your needs, use the BYOC pattern:

  1. Build a Docker image with your custom algorithm or framework
  2. Push the image to Amazon ECR
  3. Reference the ECR image URI in your SageMaker Training Job or Endpoint configuration

This pattern gives you full control over the training and inference environment while still leveraging SageMaker's managed infrastructure.
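
Step 3 can be sketched with the `CreateTrainingJob` request shape from the SageMaker API. The account ID, image name, role ARN, and bucket below are placeholders; the request dict is only assembled here, not sent (submitting it would be `boto3.client("sagemaker").create_training_job(**request)`).

```python
# Sketch of referencing a custom ECR image (step 3 of BYOC) in a
# SageMaker training job. All identifiers below are placeholders.
account, region = "123456789012", "us-east-1"
image_uri = f"{account}.dkr.ecr.{region}.amazonaws.com/my-algo:latest"  # pushed in step 2

request = {
    "TrainingJobName": "byoc-demo-job",
    "AlgorithmSpecification": {
        "TrainingImage": image_uri,       # the BYOC hook: your ECR image URI
        "TrainingInputMode": "File",
    },
    "RoleArn": f"arn:aws:iam::{account}:role/SageMakerExecutionRole",
    "OutputDataConfig": {"S3OutputPath": "s3://my-bucket/output/"},
    "ResourceConfig": {
        "InstanceType": "ml.p3.2xlarge",  # V100; see the table above
        "InstanceCount": 1,
        "VolumeSizeInGB": 50,
    },
    "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
}
```

The same `TrainingImage` field is how an Endpoint's model container is pointed at a custom ECR image on the inference side.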

Training Cost Optimization

| Strategy | How It Helps |
| --- | --- |
| Spot Instances + Checkpointing | Save up to 90% versus On-Demand. SageMaker Managed Spot Training handles interruptions; checkpointing saves progress so training resumes instead of restarting |
| Pipe / FastFile mode | Streams data from S3 during training instead of downloading it up front: faster startup, lower storage needs |
| Training Compiler | Optimizes deep learning computation graphs for PyTorch and TensorFlow; up to 50% faster training without code changes |
| Elastic Inference | Attaches a fractional GPU to a CPU instance for inference, right-sizing GPU allocation when a full GPU would sit underutilized. Note: AWS has since deprecated Elastic Inference in favor of Inferentia (Inf1) |
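
The Spot + checkpointing row maps onto three fields of the `CreateTrainingJob` API. A minimal sketch, assuming a placeholder bucket; the dict is assembled but not sent:

```python
# Managed Spot Training fields on a CreateTrainingJob request.
# The S3 bucket is a placeholder; merge these into a full request
# before calling create_training_job.
spot_fields = {
    "EnableManagedSpotTraining": True,
    # MaxWaitTimeInSeconds must be >= MaxRuntimeInSeconds; the gap is
    # how long SageMaker may wait for Spot capacity before giving up.
    "StoppingCondition": {
        "MaxRuntimeInSeconds": 3600,
        "MaxWaitTimeInSeconds": 7200,
    },
    # Checkpoints written to LocalPath are synced to S3, so an
    # interrupted job resumes instead of restarting from scratch.
    "CheckpointConfig": {
        "S3Uri": "s3://my-bucket/checkpoints/",
        "LocalPath": "/opt/ml/checkpoints",
    },
}
```

The checkpointing half is still the training script's job: it must periodically save state to `LocalPath` and reload from it on startup if files are present.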

When to Use

Use SageMaker's managed training infrastructure for most ML workloads; it handles provisioning, scaling, and cleanup automatically. Drop down to raw EC2 or AWS Batch when you need full control over the environment, integration with existing Hadoop/Spark clusters, or HPC-style batch processing. Use the BYOC pattern with ECR when you need custom frameworks or algorithms inside SageMaker.

Flashcards

Question

What is the BYOC pattern in SageMaker?

Answer

Bring Your Own Container: Build a Docker image → Push to ECR → Reference in SageMaker. Used when built-in algorithms or pre-built framework containers don't meet your needs.

Key Insight

For GPU-based inference, evaluate whether a full GPU is actually needed. If GPU utilization is low, AWS Inferentia (Inf1 instances), or the now-deprecated Elastic Inference (a fractional GPU attached to a CPU instance), can deliver significant cost savings compared to P3/P4 instances.