Amazon Redshift Architecture

Amazon Redshift is a fully managed, massively parallel processing (MPP) cloud data warehouse. It is designed for analytical workloads such as reporting, dashboards, and large-scale SQL queries over structured data. This tutorial explains the main components of Redshift architecture and how they work together.

High-Level Architecture Overview

The diagram below summarizes the high-level Amazon Redshift architecture.

Figure: High-level Amazon Redshift architecture, showing client applications, the leader node, compute nodes, and storage.

Cluster Components

1. Leader Node

2. Compute Nodes

Storage and Data Lake Integration

1. Managed Storage (RA3)

2. Redshift Spectrum: Querying Data in S3

Redshift Spectrum lets you run SQL queries against structured and semi-structured data files in Amazon S3, exposed as external tables through an external data catalog, without loading the data into local Redshift tables.

Figure: Redshift Spectrum architecture, showing a Redshift cluster querying data stored in Amazon S3 via an external data catalog.
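To make the Spectrum setup more concrete, here is a minimal sketch that only builds the SQL text for registering an external schema and querying it. It assumes the external data catalog is the AWS Glue Data Catalog. The schema, catalog database, table, and IAM role names are hypothetical placeholders, and the helper function is not part of any AWS library.

```python
def spectrum_statements(schema, catalog_db, iam_role_arn, table):
    """Build SQL for a Spectrum external schema and a sample query.

    All argument values are placeholders supplied by the caller; nothing
    here connects to Redshift or validates that the objects exist.
    """
    create_schema = (
        f"CREATE EXTERNAL SCHEMA IF NOT EXISTS {schema}\n"
        f"FROM DATA CATALOG\n"
        f"DATABASE '{catalog_db}'\n"
        f"IAM_ROLE '{iam_role_arn}';"
    )
    # External tables are queried like local ones, qualified by the schema.
    sample_query = f"SELECT COUNT(*) FROM {schema}.{table};"
    return [create_schema, sample_query]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for statement in spectrum_statements(
        schema="spectrum_demo",
        catalog_db="demo_catalog_db",
        iam_role_arn="arn:aws:iam::123456789012:role/DemoSpectrumRole",
        table="events",
    ):
        print(statement, end="\n\n")
```

The statements would then be run through whatever SQL client you already use with the cluster. The data itself stays in S3, and only the table metadata lives in the catalog.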

Data Sharing Between Redshift Clusters

Amazon Redshift enables secure data sharing across clusters within the same AWS account or across accounts. This is useful for sharing curated data with different business units, teams, or workloads without copying data.

Figure: Redshift data sharing, showing a producer cluster sharing data with one or more consumer clusters.
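As a rough illustration of the producer and consumer roles, the sketch below builds the SQL statements typically used for a same-account datashare. The datashare, schema, table, and database names and the namespace identifiers are hypothetical placeholders. Cross-account sharing involves additional authorization steps that are not shown.

```python
def producer_statements(share, schema, table, consumer_namespace):
    """SQL run on the producer cluster to create and grant a datashare."""
    return [
        f"CREATE DATASHARE {share};",
        f"ALTER DATASHARE {share} ADD SCHEMA {schema};",
        f"ALTER DATASHARE {share} ADD TABLE {schema}.{table};",
        f"GRANT USAGE ON DATASHARE {share} TO NAMESPACE '{consumer_namespace}';",
    ]


def consumer_statements(local_db, share, producer_namespace):
    """SQL run on the consumer cluster to expose the share as a database."""
    return [
        f"CREATE DATABASE {local_db} FROM DATASHARE {share} "
        f"OF NAMESPACE '{producer_namespace}';",
    ]


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    for statement in producer_statements(
        "sales_share", "public", "orders", "consumer-namespace-guid"
    ):
        print(statement)
    for statement in consumer_statements(
        "sales_shared_db", "sales_share", "producer-namespace-guid"
    ):
        print(statement)
```

The consumer reads the shared tables in place. No rows are copied between clusters, which matches the "without copying data" point above.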

Workload Management and Scaling

1. Workload Management (WLM)

2. Concurrency Scaling and Elastic Resize

Security and Networking

1. Network Isolation

2. Encryption and Access Control

Typical Query Lifecycle

  1. A client application sends a SQL query to the Redshift endpoint.
  2. The leader node parses, optimizes, and generates a distributed execution plan.
  3. The leader node dispatches work to slices on the compute nodes.
  4. Compute nodes scan columnar data from local or managed storage and optionally read external data from S3 via Spectrum.
  5. Compute nodes aggregate intermediate results and send them back to the leader node.
  6. The leader node performs the final aggregation or sorting and returns the result set to the client.
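
The lifecycle above follows a scatter-gather pattern, which can be sketched as a toy model in plain Python. This is not how Redshift is implemented. The sample rows, the round-robin distribution, and all function names are assumptions chosen for illustration, and threads stand in for slices on compute nodes.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample rows: (region, amount).
ROWS = [("eu", 10), ("us", 5), ("eu", 7), ("ap", 3), ("us", 8), ("ap", 1)]


def distribute(rows, num_slices):
    """Spread rows round-robin across slices (a stand-in for data distribution)."""
    slices = [[] for _ in range(num_slices)]
    for i, row in enumerate(rows):
        slices[i % num_slices].append(row)
    return slices


def slice_partial_sum(slice_rows):
    """Steps 4-5: a slice scans its own rows and computes a partial SUM per region."""
    partial = defaultdict(int)
    for region, amount in slice_rows:
        partial[region] += amount
    return dict(partial)


def leader_run_query(rows, num_slices=3):
    """Steps 3 and 6: dispatch work to slices, then merge and sort the partials."""
    slices = distribute(rows, num_slices)
    with ThreadPoolExecutor(max_workers=num_slices) as pool:
        partials = list(pool.map(slice_partial_sum, slices))
    final = defaultdict(int)
    for partial in partials:
        for region, total in partial.items():
            final[region] += total
    return sorted(final.items())


if __name__ == "__main__":
    # Roughly: SELECT region, SUM(amount) ... GROUP BY region ORDER BY region
    print(leader_run_query(ROWS))  # [('ap', 4), ('eu', 17), ('us', 13)]
```

The result does not depend on the number of slices, because each partial sum is merged again on the leader. That is the property that lets the work be spread across compute nodes.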

Putting It All Together

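In practice, a Redshift-based analytics platform typically includes an ingestion pipeline that lands data in Amazon S3, a Redshift cluster with a leader node and compute nodes, managed storage and Redshift Spectrum for data kept in S3, data sharing between clusters, workload management and scaling, and security controls for networking, encryption, and access.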

Use the diagrams above together with the section explanations as a visual tutorial for understanding how Amazon Redshift is structured and how data flows through the system.

Data Ingestion Pipeline

The Redshift ingestion pipeline represents the flow of data from operational systems and external data sources into Amazon Redshift. This process typically involves extraction from sources, transformation into optimized formats, and loading into Amazon S3 before final ingestion into Redshift.

Figure: Redshift ingestion pipeline, showing data sources flowing through ETL into S3 and then into Amazon Redshift.
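The final step, moving files from S3 into a Redshift table, is commonly done with the COPY command. The sketch below only assembles the statement text. It assumes the transformed files are Parquet, and the table name, S3 prefix, and IAM role are hypothetical placeholders.

```python
def build_copy_statement(table, s3_prefix, iam_role_arn, file_format="PARQUET"):
    """Return a COPY statement that loads files under an S3 prefix into a table.

    The caller supplies every identifier; this helper does not contact AWS.
    """
    if not s3_prefix.startswith("s3://"):
        raise ValueError("s3_prefix must start with 's3://'")
    return (
        f"COPY {table}\n"
        f"FROM '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"FORMAT AS {file_format};"
    )


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(
        build_copy_statement(
            table="analytics.orders",
            s3_prefix="s3://demo-bucket/curated/orders/",
            iam_role_arn="arn:aws:iam::123456789012:role/DemoCopyRole",
        )
    )
```

Pointing COPY at a prefix rather than a single file lets the cluster load multiple files in parallel, which is one reason the pipeline stages data in S3 first.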

End-to-End Architecture

The end-to-end Redshift architecture shows the complete lifecycle of data: initial collection, ingestion and transformation, storage in Amazon Redshift, and finally consumption by analytics and BI tools.

Figure: Redshift end-to-end architecture, showing data sources, ingestion, AWS Glue transformations, the Redshift warehouse, and analytics outputs.