Compute and Containers

While SageMaker abstracts most infrastructure decisions, understanding the underlying compute options is essential for optimizing cost and performance. This section covers EC2 instance types for ML, container workflows, and strategies for reducing training costs.

Overview

| Service | What It Does | When to Use |
| --- | --- | --- |
| Amazon EC2 | Virtual servers with GPU options (P3, P4, G4) for ML training and inference | Custom ML environments needing full infrastructure control |
| Amazon ECR | Docker container registry | Store custom Docker images for SageMaker training/inference (BYOC pattern) |
| Amazon ECS / EKS | Container orchestration (ECS = AWS-native, EKS = Kubernetes) | Run containerized ML workloads outside SageMaker |
| AWS Fargate | Serverless compute for containers | Run containers without managing servers |
| AWS Batch | Managed batch computing with scheduling | Large-scale batch processing and HPC; supports GPUs and Spot Instances |
| Deep Learning AMIs (DLAMI) | Pre-configured EC2 AMIs with ML frameworks (TensorFlow, PyTorch, MXNet) | Quick-start ML development on EC2 with pre-installed CUDA, cuDNN, and frameworks |

EC2 Instance Types for ML

| Instance Family | GPU / Accelerator | Best For |
| --- | --- | --- |
| P3 | NVIDIA V100 | Training (general deep learning) |
| P4 | NVIDIA A100 | Training (large-scale, latest generation) |
| G4 | NVIDIA T4 | Inference (cost-effective GPU inference) |
| Inf1 | AWS Inferentia | Inference (custom AWS ML chip, best price-performance) |
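
The table above can be encoded as a small lookup helper for quick reference. This is purely illustrative: the dictionary and function names are my own, not an AWS API.

```python
# Hypothetical lookup encoding the instance-family table above.
# The dict and function are illustrative, not part of any AWS SDK.
GPU_BY_FAMILY = {
    "P3": ("NVIDIA V100", "training"),
    "P4": ("NVIDIA A100", "training"),
    "G4": ("NVIDIA T4", "inference"),
    "Inf1": ("AWS Inferentia", "inference"),
}

def families_for(workload: str) -> list:
    """Return instance families suited to 'training' or 'inference'."""
    return [fam for fam, (_, use) in GPU_BY_FAMILY.items() if use == workload]

print(families_for("inference"))  # ['G4', 'Inf1']
```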

The BYOC Pattern (Bring Your Own Container)

When SageMaker's built-in algorithms or pre-built framework containers do not meet your needs, use the BYOC pattern:

  1. Build a Docker image with your custom algorithm or framework
  2. Push the image to Amazon ECR
  3. Reference the ECR image URI in your SageMaker Training Job or Endpoint configuration

This pattern gives you full control over the training and inference environment while still leveraging SageMaker's managed infrastructure.
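
Step 3 can be sketched with the `CreateTrainingJob` request shape from the SageMaker API. The account ID, image name, role ARN, and bucket below are placeholders; the request dict is only assembled here, not sent (submitting it would be `boto3.client("sagemaker").create_training_job(**request)`).

```python
# Sketch of referencing a custom ECR image (step 3 of BYOC) in a
# SageMaker training job. All identifiers below are placeholders.
account, region = "123456789012", "us-east-1"
image_uri = f"{account}.dkr.ecr.{region}.amazonaws.com/my-algo:latest"  # pushed in step 2

request = {
    "TrainingJobName": "byoc-demo-job",
    "AlgorithmSpecification": {
        "TrainingImage": image_uri,       # the BYOC hook: your ECR image URI
        "TrainingInputMode": "File",
    },
    "RoleArn": f"arn:aws:iam::{account}:role/SageMakerExecutionRole",
    "OutputDataConfig": {"S3OutputPath": "s3://my-bucket/output/"},
    "ResourceConfig": {
        "InstanceType": "ml.p3.2xlarge",  # V100; see the table above
        "InstanceCount": 1,
        "VolumeSizeInGB": 50,
    },
    "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
}
```

The same `TrainingImage` field is how an Endpoint's model container is pointed at a custom ECR image on the inference side.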

Training Cost Optimization

| Strategy | How It Helps |
| --- | --- |
| Spot Instances + Checkpointing | Save up to 90% versus On-Demand. SageMaker Managed Spot Training handles interruptions; checkpointing saves progress so training resumes instead of restarting |
| Pipe / FastFile mode | Streams data from S3 during training instead of downloading it up front: faster startup, lower storage needs |
| Training Compiler | Optimizes deep learning computation graphs for PyTorch and TensorFlow; up to 50% faster training without code changes |
| Elastic Inference | Attaches a fractional GPU to a CPU instance for inference, right-sizing GPU allocation when a full GPU would sit underutilized. Note: AWS has since deprecated Elastic Inference in favor of Inferentia (Inf1) |
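
The Spot + checkpointing row maps onto three fields of the `CreateTrainingJob` API. A minimal sketch, assuming a placeholder bucket; the dict is assembled but not sent:

```python
# Managed Spot Training fields on a CreateTrainingJob request.
# The S3 bucket is a placeholder; merge these into a full request
# before calling create_training_job.
spot_fields = {
    "EnableManagedSpotTraining": True,
    # MaxWaitTimeInSeconds must be >= MaxRuntimeInSeconds; the gap is
    # how long SageMaker may wait for Spot capacity before giving up.
    "StoppingCondition": {
        "MaxRuntimeInSeconds": 3600,
        "MaxWaitTimeInSeconds": 7200,
    },
    # Checkpoints written to LocalPath are synced to S3, so an
    # interrupted job resumes instead of restarting from scratch.
    "CheckpointConfig": {
        "S3Uri": "s3://my-bucket/checkpoints/",
        "LocalPath": "/opt/ml/checkpoints",
    },
}
```

The checkpointing half is still the training script's job: it must periodically save state to `LocalPath` and reload from it on startup if files are present.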

When to Use

Use SageMaker's managed training infrastructure for most ML workloads; it handles provisioning, scaling, and cleanup automatically. Drop down to raw EC2 or AWS Batch when you need full control over the environment, integration with existing Hadoop/Spark clusters, or HPC-style batch processing. Use the BYOC pattern with ECR when you need custom frameworks or algorithms inside SageMaker.

Flashcards

Question

What is the BYOC pattern in SageMaker?

Answer

Bring Your Own Container: Build a Docker image → Push to ECR → Reference in SageMaker. Used when built-in algorithms or pre-built framework containers don't meet your needs.

Key Insight

For GPU-based inference, evaluate whether a full GPU is actually needed. If GPU utilization is low, AWS Inferentia (Inf1 instances), or the now-deprecated Elastic Inference (a fractional GPU attached to a CPU instance), can deliver significant cost savings compared to P3/P4 instances.