Security and Governance
ML workloads handle sensitive data — training datasets, model parameters, and inference inputs/outputs all need protection. AWS provides a layered security model covering identity, encryption, network isolation, and auditing that applies to every stage of the ML lifecycle.
Overview
| Service | What It Does | When to Use |
|---|---|---|
| AWS IAM | Identity and access management — users, groups, roles, policies | Control who can access what. SageMaker uses execution roles (not access keys) for all components |
| AWS KMS | Key Management Service: create and manage encryption keys | Encrypt data at rest in S3, EBS, and SageMaker storage volumes. Inter-container traffic encryption is a separate training job setting, not a KMS feature |
| Amazon VPC | Virtual Private Cloud — network isolation | Isolate SageMaker resources from the public internet |
| AWS CloudTrail | Logs all API calls across your AWS account | Auditing, compliance, security investigation |
| AWS Secrets Manager | Store and rotate secrets (database credentials, API keys) | Securely store credentials used by SageMaker notebooks to connect to Redshift/RDS |
SageMaker Security Architecture
Identity and Access
SageMaker uses IAM execution roles as the primary access mechanism — never access keys. The execution role is attached to notebooks, training jobs, and endpoints, granting them permissions to access S3, ECR, CloudWatch, and other services.
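As a minimal sketch of what an execution role involves, the snippet below builds the two policy documents such a role typically needs: a trust policy that lets the SageMaker service assume the role, and a permissions policy scoped to a single S3 prefix. The function names, the bucket name, and the prefix are hypothetical placeholders, and a real role would usually need further permissions (ECR, CloudWatch Logs) that are omitted here.

```python
import json


def sagemaker_trust_policy():
    """Trust policy that allows the SageMaker service to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "sagemaker.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def s3_access_policy(bucket, prefix):
    """Permissions policy limited to one bucket prefix (names are placeholders)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Read and write objects only under the given prefix.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
            {
                # Listing is granted on the bucket itself, not on objects.
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(sagemaker_trust_policy(), indent=2))
    print(json.dumps(s3_access_policy("example-ml-bucket", "training-data"), indent=2))
```

The trust policy is what makes access keys unnecessary: SageMaker assumes the role and receives temporary credentials on behalf of the notebook, training job, or endpoint.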
Encryption
| Layer | How to Encrypt |
|---|---|
| Data at rest (S3) | KMS Customer Managed Key (CMK) on the S3 bucket |
| Data at rest (EBS) | KMS encryption on training instance volumes |
| Data in transit | TLS by default. Enable inter-container encryption for distributed training |
| Model artifacts | KMS CMK applied when writing to S3 |
KMS vs. CloudHSM: KMS is fully managed — AWS maintains the root of trust and logs all key usage. CloudHSM gives you dedicated hardware security modules where you maintain the root of trust.
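The encryption layers in the table above map onto a handful of fields in a SageMaker `CreateTrainingJob` request. The sketch below only assembles the request dictionary and does not call AWS. The helper name, the instance type, the volume size, and the runtime limit are illustrative assumptions, not recommendations.

```python
def encrypted_training_job_request(job_name, role_arn, image_uri,
                                   input_s3_uri, output_s3_uri,
                                   kms_key_id, distributed=False):
    """Build CreateTrainingJob parameters with encryption settings applied."""
    return {
        "TrainingJobName": job_name,
        "RoleArn": role_arn,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            "TrainingInputMode": "File",
        },
        "InputDataConfig": [
            {
                "ChannelName": "train",
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": input_s3_uri,
                        "S3DataDistributionType": "FullyReplicated",
                    }
                },
            }
        ],
        # Model artifacts written to S3 are encrypted with the KMS key.
        "OutputDataConfig": {"S3OutputPath": output_s3_uri, "KmsKeyId": kms_key_id},
        "ResourceConfig": {
            "InstanceType": "ml.m5.xlarge",  # assumed instance type
            "InstanceCount": 2 if distributed else 1,
            "VolumeSizeInGB": 50,  # assumed volume size
            # Encrypts the storage volume attached to each training instance.
            "VolumeKmsKeyId": kms_key_id,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
        # Only meaningful when several containers exchange data during training.
        "EnableInterContainerTrafficEncryption": distributed,
    }
```

With valid values, the resulting dictionary could be passed to `boto3.client("sagemaker").create_training_job(**request)`. Inter-container encryption is tied to the `distributed` flag here because it adds overhead that single-instance jobs do not need.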
Network Isolation
The standard pattern for running SageMaker without internet access:
- Place SageMaker resources in a VPC with private subnets
- Add an S3 Gateway Endpoint for S3 access without internet
- Add VPC Interface Endpoints (PrivateLink) for SageMaker API access
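- Optionally set `enable_network_isolation=True` (the SageMaker Python SDK parameter; `EnableNetworkIsolation` in the API) in the training job configuration to block all outbound network calls from the training container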
This pattern ensures training data and model artifacts never traverse the public internet.
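The steps above can be sketched as request parameters for the EC2 `CreateVpcEndpoint` API plus the network fields of a training job. This is an illustration under assumptions: the helper names are hypothetical, the IDs are supplied by the caller, and only the SageMaker API and runtime interface endpoints are shown. Other services a workload touches (for example CloudWatch Logs or ECR) would need their own endpoints.

```python
def private_endpoint_requests(region, vpc_id, route_table_ids,
                              subnet_ids, security_group_ids):
    """Build CreateVpcEndpoint parameters for S3 and the SageMaker APIs."""
    # Gateway endpoint: S3 traffic is routed through the VPC route tables.
    s3_gateway = {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.s3",
        "RouteTableIds": route_table_ids,
    }
    # Interface endpoints (PrivateLink): network interfaces in private subnets.
    interface_endpoints = [
        {
            "VpcEndpointType": "Interface",
            "VpcId": vpc_id,
            "ServiceName": f"com.amazonaws.{region}.{service}",
            "SubnetIds": subnet_ids,
            "SecurityGroupIds": security_group_ids,
            "PrivateDnsEnabled": True,
        }
        for service in ("sagemaker.api", "sagemaker.runtime")
    ]
    return [s3_gateway] + interface_endpoints


def isolated_network_settings(subnet_ids, security_group_ids):
    """Network fields to merge into a CreateTrainingJob request."""
    return {
        "VpcConfig": {"Subnets": subnet_ids, "SecurityGroupIds": security_group_ids},
        # Blocks outbound network calls from the training container.
        "EnableNetworkIsolation": True,
    }
```

Each dictionary returned by `private_endpoint_requests` could be passed to `boto3.client("ec2").create_vpc_endpoint(**params)`. The training job fields are kept in a separate helper because network isolation is optional in the pattern, while the endpoints are not.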
Security Pattern Summary
| Requirement | Solution |
|---|---|
| Control SageMaker access | IAM execution roles |
| No internet access for SageMaker | VPC + S3 Gateway Endpoint + VPC Interface Endpoints |
| Encrypt training data at rest | KMS CMK on S3 + enable volume encryption |
| Encrypt data between training containers | Enable inter-container encryption |
| Full network isolation | `enable_network_isolation=True` in the training config (SageMaker Python SDK; `EnableNetworkIsolation` in the API) |
| Audit all API calls | CloudTrail |
| Secure database credentials | Secrets Manager |
| Column-level data access | Lake Formation (covered in Analytics section) |
When to Use
Security is not optional — every production ML workload should have IAM roles, KMS encryption, and VPC isolation configured. CloudTrail should always be enabled for auditing. Use Secrets Manager whenever notebooks or jobs need database credentials.
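For the Secrets Manager case, a notebook or job would typically fetch credentials at runtime instead of hard-coding them. The sketch below assumes the secret is stored as a JSON string. The secret name, its keys, and the stub client are hypothetical, and the stub exists only so the example runs without AWS access.

```python
import json


def load_db_credentials(secrets_client, secret_id):
    """Fetch a secret and parse it as JSON (assumes a JSON-formatted secret)."""
    response = secrets_client.get_secret_value(SecretId=secret_id)
    return json.loads(response["SecretString"])


class StubSecretsClient:
    """Stand-in for boto3.client("secretsmanager"), for offline demonstration."""

    def __init__(self, secrets):
        self._secrets = secrets

    def get_secret_value(self, SecretId):
        # Mirrors the shape of the real response for the field used above.
        return {"Name": SecretId, "SecretString": self._secrets[SecretId]}


if __name__ == "__main__":
    stub = StubSecretsClient(
        {"example/redshift": json.dumps({"username": "analyst", "password": "placeholder"})}
    )
    creds = load_db_credentials(stub, "example/redshift")
    print(sorted(creds))
```

In a real notebook the stub would be replaced with `boto3.client("secretsmanager")`, and the execution role would need `secretsmanager:GetSecretValue` on that secret.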
Flashcards
How does SageMaker handle authentication — access keys or IAM roles?
IAM execution roles, never access keys. Every SageMaker component (notebook, training job, endpoint) has an execution role that grants permissions to AWS services.
The VPC + S3 Gateway Endpoint + VPC Interface Endpoints pattern is the standard approach for running SageMaker in a private network. The S3 Gateway Endpoint is free and provides access to S3 without internet. VPC Interface Endpoints (PrivateLink) provide access to the SageMaker API itself.