Amazon SageMaker

Amazon SageMaker is AWS's end-to-end machine learning platform. It provides the tools to label data, build and train models, tune hyperparameters, deploy to managed endpoints, and monitor models in production — all without provisioning or managing the underlying GPU/CPU infrastructure directly.


Key Components:

- SageMaker Studio: a web-based IDE covering the full ML workflow
- Training jobs: managed, on-demand training on CPU/GPU instances
- Automatic model tuning: hyperparameter search across training jobs
- Deployment options: real-time endpoints, serverless inference, and batch transform
- SageMaker Pipelines: workflow orchestration for repeatable ML pipelines
- Model Monitor: data-quality and drift monitoring for deployed models
- Ground Truth: data labeling with human annotation workflows

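Training is typically launched through the SageMaker Python SDK. As a minimal sketch (the entry-point script, bucket paths, and hyperparameter values are hypothetical, and the estimator calls are commented out because they require AWS credentials and a real execution role):

```python
# Sketch: configuring a SageMaker training job with the Python SDK.
# Hyperparameters are passed to the training script as command-line arguments.

def training_config(instance_type="ml.m5.xlarge", epochs=3):
    # Hypothetical configuration; values here are illustrative, not prescriptive.
    return {
        "instance_type": instance_type,
        "hyperparameters": {"epochs": epochs, "lr": 5e-5},
    }

# from sagemaker.pytorch import PyTorch
# import sagemaker
#
# cfg = training_config()
# estimator = PyTorch(
#     entry_point="train.py",              # hypothetical training script
#     role=sagemaker.get_execution_role(),
#     framework_version="2.1",
#     py_version="py310",
#     instance_count=1,
#     instance_type=cfg["instance_type"],
#     hyperparameters=cfg["hyperparameters"],
# )
# estimator.fit({"train": "s3://my-bucket/datasets/train/"})  # hypothetical bucket
```

The `fit` call provisions the instances, runs the script, uploads the resulting model artifact to S3, and tears the instances down, so you pay only for the training time.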
Common Use Cases:

- Fine-tuning and hosting custom deep learning models (e.g. Hugging Face transformers)
- Training classical ML models with built-in algorithms such as XGBoost
- Batch scoring of large datasets with batch transform
- Low-latency real-time inference behind autoscaling endpoints

Example: Deploy a Hugging Face Model to a SageMaker Endpoint


from sagemaker.huggingface import HuggingFaceModel
import sagemaker

# IAM role granting SageMaker access to the model artifact in S3
role = sagemaker.get_execution_role()

model = HuggingFaceModel(
    model_data="s3://my-bucket/models/distilbert.tar.gz",  # packaged model artifact
    role=role,
    transformers_version="4.37",  # versions must match a supported container image
    pytorch_version="2.1",
    py_version="py310",
)

# Provision a managed real-time endpoint on a single GPU instance
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.xlarge",
    endpoint_name="distilbert-sentiment",
)

print(predictor.predict({"inputs": "SageMaker makes model deployment straightforward."}))
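Once deployed, the endpoint can also be invoked from any application via the low-level runtime API, without the SageMaker SDK. A minimal sketch using boto3 (the network call is commented out since it needs AWS credentials; the payload shape assumes the default Hugging Face inference container's JSON handler):

```python
import json

def build_payload(text):
    # JSON body expected by the Hugging Face inference container
    # (assumption: default text-classification handler).
    return json.dumps({"inputs": text})

# import boto3
# runtime = boto3.client("sagemaker-runtime")
# response = runtime.invoke_endpoint(
#     EndpointName="distilbert-sentiment",
#     ContentType="application/json",
#     Body=build_payload("Great product!"),
# )
# result = json.loads(response["Body"].read())
```

Remember to call `predictor.delete_endpoint()` (or delete the endpoint in the console) when finished, since a running endpoint bills per instance-hour.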


SageMaker vs. Bedrock:

Bedrock is the managed-API path for consuming foundation models; SageMaker is the full ML platform for teams that need to train, host, and operate their own models. Many production architectures combine both — Bedrock for generic text/embedding tasks and SageMaker for custom models and specialized inference.
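For contrast with the SageMaker deployment above, consuming a foundation model through Bedrock is a single API call against a shared endpoint, with no infrastructure to deploy. A minimal sketch using boto3 (the model ID and the Titan request schema are assumptions to be checked against current Bedrock docs, and the call itself is commented out since it requires AWS credentials and model access):

```python
import json

def titan_request(prompt, max_tokens=256):
    # Request body for Amazon Titan text models
    # (assumption: schema per the Bedrock Titan text documentation).
    return json.dumps({
        "inputText": prompt,
        "textGenerationConfig": {"maxTokenCount": max_tokens, "temperature": 0.2},
    })

# import boto3
# bedrock = boto3.client("bedrock-runtime")
# resp = bedrock.invoke_model(
#     modelId="amazon.titan-text-express-v1",  # assumed model ID
#     contentType="application/json",
#     accept="application/json",
#     body=titan_request("Summarize the benefits of managed ML platforms."),
# )
# print(json.loads(resp["body"].read()))
```

The trade-off is control versus convenience: Bedrock offers no access to weights or training, while SageMaker gives full control over the model lifecycle at the cost of operating the endpoints yourself.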