Flask — Production Deployment

1. Overview

Flask's built-in flask run / app.run() server is the Werkzeug development server. It is not built to handle concurrent production traffic, provides no process supervision or worker recycling, and explicitly prints a warning telling you not to use it in production. A real production deployment splits those responsibilities across dedicated layers:

A typical production stack looks like: client → ALB (TLS) → Nginx → Gunicorn (unix socket) → Flask, with Postgres and Redis behind the app, everything containerized and scheduled by Kubernetes.

2. WSGI Servers

Choice of WSGI server is largely driven by workload shape (CPU-bound vs I/O-bound) and whether you need async.

Gunicorn: pre-fork worker model (sync / gthread / gevent / eventlet). Best for general-purpose Flask/Django on Linux. De facto Python standard; simple config; excellent signal handling.

uWSGI: pre-fork plus threads, with emperor mode for multi-app hosting and tight resource caps. Very feature-rich (200+ options), but a steep learning curve, and development pace has slowed.

Waitress: single process with a thread pool. Best for Windows deployments and simple internal tools. Pure Python, no C dependencies, cross-platform; lower throughput than Gunicorn.

Hypercorn: ASGI server (asyncio / trio / uvloop) with HTTP/2, WebSockets, and experimental HTTP/3. Note that Flask's async def views still run under WSGI servers; choose an ASGI stack only when you need ASGI-level features.

mod_wsgi: embedded in Apache httpd. Couples the app to the Apache lifecycle; rarely the right choice for new deployments outside legacy Apache shops.

Rule of thumb: start with Gunicorn + gthread workers. Switch to gevent only if profiling shows I/O-bound bottlenecks (lots of downstream HTTP calls, slow DB queries). Reach for Hypercorn only if the codebase is genuinely async-native.

3. Gunicorn Configuration

Drop a gunicorn.conf.py next to your app. Keeping it as Python (rather than CLI flags) makes it version-controllable and lets you compute values at startup.

# gunicorn.conf.py
import multiprocessing
import os

# Server socket
bind = os.environ.get("GUNICORN_BIND", "unix:/run/gunicorn/app.sock")
backlog = 2048

# Worker processes
# Rule of thumb: (2 x $num_cores) + 1 for sync / gthread workers.
workers = int(os.environ.get(
    "GUNICORN_WORKERS",
    (multiprocessing.cpu_count() * 2) + 1,
))
worker_class = os.environ.get("GUNICORN_WORKER_CLASS", "gthread")
threads = int(os.environ.get("GUNICORN_THREADS", 4))
worker_connections = 1000

# Timeouts
timeout = 30          # kill workers that block for >30s
graceful_timeout = 30  # drain window on SIGTERM
keepalive = 5         # behind nginx this is fine; bump to 75 behind an ALB

# Recycle workers to contain slow memory leaks / fragmentation
max_requests = 1000
max_requests_jitter = 100

# Load the app before forking workers so shared code lives in
# copy-on-write memory (saves RAM with many workers).
preload_app = True

# Logging — send everything to stdout/stderr; let the container
# runtime / systemd ship logs to the aggregator.
accesslog = "-"
errorlog = "-"
loglevel = os.environ.get("GUNICORN_LOGLEVEL", "info")
access_log_format = (
    '%(h)s %(l)s %(u)s %(t)s "%(r)s" %(s)s %(b)s '
    '"%(f)s" "%(a)s" %(L)s %({x-request-id}i)s'
)

# Process naming (easier to spot in `ps` / `top`)
proc_name = "flask-app"

# Lifecycle hooks
def post_fork(server, worker):
    server.log.info("Worker spawned (pid: %s)", worker.pid)

def worker_int(worker):
    worker.log.info("Worker received INT or QUIT signal")

def on_exit(server):
    server.log.info("Shutting down master")

Worker class guidance: gthread covers most request/response workloads; gevent pays off only when requests spend most of their time waiting on I/O, and it requires monkey-patching plus gevent-compatible libraries; plain sync workers are the simplest but tie up a whole process per request.

Start Gunicorn with:

gunicorn --config gunicorn.conf.py "myapp:create_app()"

4. Nginx as Reverse Proxy

Nginx terminates TLS, buffers slow clients so Gunicorn workers don't stall, serves static assets directly, and adds security headers. The Flask app never faces the public internet.

# /etc/nginx/sites-available/flask-app
upstream flask_app {
    server unix:/run/gunicorn/app.sock fail_timeout=0;
    keepalive 32;
}

# HTTP → HTTPS redirect
server {
    listen 80;
    listen [::]:80;
    server_name api.example.com;
    return 301 https://$host$request_uri;
}

server {
    listen 443 ssl;
    listen [::]:443 ssl;
    http2 on;   # nginx 1.25+; replaces the deprecated "listen ... http2" flag
    server_name api.example.com;

    # TLS
    ssl_certificate     /etc/letsencrypt/live/api.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/api.example.com/privkey.pem;
    ssl_protocols       TLSv1.2 TLSv1.3;
    ssl_ciphers         ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256;
    ssl_prefer_server_ciphers on;
    ssl_session_cache   shared:SSL:10m;
    ssl_session_timeout 10m;

    # Request size — tune per endpoint; default low for safety
    client_max_body_size 10m;
    client_body_timeout  30s;
    client_header_timeout 10s;
    send_timeout         30s;

    # Buffers (protect upstream from slow clients / header floods)
    client_body_buffer_size    128k;
    client_header_buffer_size  4k;
    large_client_header_buffers 4 16k;

    # Gzip (brotli is better if module available)
    gzip             on;
    gzip_vary        on;
    gzip_min_length  1024;
    gzip_proxied     any;
    gzip_comp_level  6;
    gzip_types       text/plain text/css application/json application/javascript
                     text/xml application/xml application/xml+rss text/javascript;

    # Security headers
    add_header Strict-Transport-Security "max-age=63072000; includeSubDomains; preload" always;
    add_header X-Frame-Options            "DENY" always;
    add_header X-Content-Type-Options     "nosniff" always;
    add_header Referrer-Policy            "strict-origin-when-cross-origin" always;
    add_header Permissions-Policy         "geolocation=(), microphone=(), camera=()" always;
    add_header Content-Security-Policy    "default-src 'self'; frame-ancestors 'none'" always;

    # Static files served directly by nginx
    location /static/ {
        alias /srv/flask-app/static/;
        expires 30d;
        add_header Cache-Control "public, immutable";
        access_log off;
    }

    # Health check — do not log to keep access log clean
    location = /healthz {
        proxy_pass http://flask_app;
        access_log off;
    }

    # Everything else → Gunicorn
    location / {
        proxy_pass         http://flask_app;
        proxy_http_version 1.1;
        proxy_set_header   Host              $host;
        proxy_set_header   X-Real-IP         $remote_addr;
        proxy_set_header   X-Forwarded-For   $proxy_add_x_forwarded_for;
        proxy_set_header   X-Forwarded-Proto $scheme;
        proxy_set_header   X-Request-ID      $request_id;
        proxy_set_header   Connection        "";

        proxy_connect_timeout 5s;
        proxy_send_timeout    30s;
        proxy_read_timeout    30s;

        proxy_buffering       on;
        proxy_buffer_size     8k;
        proxy_buffers         8 16k;
        proxy_busy_buffers_size 32k;

        proxy_redirect        off;
    }
}

Behind Nginx, remember to enable ProxyFix in Flask so request.remote_addr reflects X-Forwarded-For:

from werkzeug.middleware.proxy_fix import ProxyFix
app.wsgi_app = ProxyFix(app.wsgi_app, x_for=1, x_proto=1, x_host=1, x_prefix=1)

5. Docker

Multi-stage builds keep the runtime image small and free of compilers. Run as a non-root user, add a HEALTHCHECK, and order layers for cache hits.

# Dockerfile
# ---- Stage 1: builder ------------------------------------------------
FROM python:3.11-slim AS builder

ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1 \
    PIP_NO_CACHE_DIR=1 \
    PIP_DISABLE_PIP_VERSION_CHECK=1

RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential gcc libpq-dev \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /build
COPY requirements.txt .
RUN pip install --prefix=/install -r requirements.txt

# ---- Stage 2: runtime ------------------------------------------------
FROM python:3.11-slim AS runtime

ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1 \
    PATH="/install/bin:$PATH" \
    PYTHONPATH="/install/lib/python3.11/site-packages"

RUN apt-get update && apt-get install -y --no-install-recommends \
        libpq5 curl \
    && rm -rf /var/lib/apt/lists/* \
    && groupadd --system app && useradd --system --gid app --home /app app

COPY --from=builder /install /install

WORKDIR /app
COPY --chown=app:app . .

USER app
EXPOSE 8000

HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
    CMD curl -fsS http://127.0.0.1:8000/healthz || exit 1

CMD ["gunicorn", "--config", "gunicorn.conf.py", "myapp:create_app()"]

Companion .dockerignore — cuts build context and prevents secrets from leaking into layers:

.git
.gitignore
.env
.env.*
.venv
__pycache__
*.pyc
*.pyo
.pytest_cache
.mypy_cache
.coverage
htmlcov/
tests/
docs/
*.md
Dockerfile
docker-compose*.yml
.github/
.vscode/
.idea/

6. Docker Compose

Compose file for dev/staging parity — identical image, real Nginx, real Postgres, real Redis. Good enough to catch 90% of "works on my laptop" issues.

# docker-compose.yml
version: "3.9"

services:
  app:
    build: .
    image: flask-app:local
    environment:
      FLASK_ENV: production
      DATABASE_URL: postgresql://app:app@postgres:5432/app
      REDIS_URL: redis://redis:6379/0
      SECRET_KEY: ${SECRET_KEY:?SECRET_KEY required}
      GUNICORN_BIND: "0.0.0.0:8000"
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
    expose:
      - "8000"
    restart: unless-stopped

  nginx:
    image: nginx:1.27-alpine
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./deploy/nginx.conf:/etc/nginx/conf.d/default.conf:ro
      - ./deploy/certs:/etc/letsencrypt:ro
      - static:/srv/flask-app/static:ro
    depends_on:
      - app
    restart: unless-stopped

  postgres:
    image: postgres:16-alpine
    environment:
      POSTGRES_USER: app
      POSTGRES_PASSWORD: app
      POSTGRES_DB: app
    volumes:
      - pgdata:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U app"]
      interval: 5s
      timeout: 3s
      retries: 10
    restart: unless-stopped

  redis:
    image: redis:7-alpine
    command: ["redis-server", "--save", "60", "1", "--loglevel", "warning"]
    volumes:
      - redisdata:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 3s
      retries: 10
    restart: unless-stopped

volumes:
  pgdata:
  redisdata:
  static:

7. Kubernetes

For anything running at real scale, Kubernetes is the target. Key objects: a Deployment with rolling update + probes, a Service, an Ingress, and a HorizontalPodAutoscaler.

# k8s/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flask-app
  labels: { app: flask-app }
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0     # zero-downtime; always keep 3 healthy
  selector:
    matchLabels: { app: flask-app }
  template:
    metadata:
      labels: { app: flask-app }
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port:   "8000"
        prometheus.io/path:   "/metrics"
    spec:
      terminationGracePeriodSeconds: 45   # > gunicorn graceful_timeout
      containers:
        - name: app
          image: ghcr.io/acme/flask-app:1.42.0
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 8000
              name: http
          env:
            - name: GUNICORN_BIND
              value: "0.0.0.0:8000"
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef: { name: flask-app-secrets, key: database_url }
            - name: SECRET_KEY
              valueFrom:
                secretKeyRef: { name: flask-app-secrets, key: secret_key }
          resources:
            requests: { cpu: "250m", memory: "256Mi" }
            limits:   { cpu: "1",    memory: "512Mi" }
          readinessProbe:
            httpGet: { path: /healthz/ready, port: http }
            initialDelaySeconds: 5
            periodSeconds: 5
            failureThreshold: 3
          livenessProbe:
            httpGet: { path: /healthz/live, port: http }
            initialDelaySeconds: 15
            periodSeconds: 20
            failureThreshold: 3
          lifecycle:
            preStop:
              exec:
                # Let the service endpoints controller remove this pod
                # from rotation before gunicorn starts shutting down.
                command: ["sh", "-c", "sleep 10"]
---
apiVersion: v1
kind: Service
metadata:
  name: flask-app
spec:
  selector: { app: flask-app }
  ports:
    - port: 80
      targetPort: http
      name: http
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: flask-app
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/proxy-body-size: "10m"
spec:
  ingressClassName: nginx
  tls:
    - hosts: [ api.example.com ]
      secretName: flask-app-tls
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: flask-app
                port: { name: http }
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: flask-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: flask-app
  minReplicas: 3
  maxReplicas: 30
  metrics:
    - type: Resource
      resource:
        name: cpu
        target: { type: Utilization, averageUtilization: 70 }
    - type: Resource
      resource:
        name: memory
        target: { type: Utilization, averageUtilization: 80 }
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 25
          periodSeconds: 60

Two probes with different semantics matter: /healthz/live returns 200 while the process is alive (used to restart deadlocked pods), and /healthz/ready returns 200 only when DB + Redis + downstream deps are reachable (used to gate traffic).

8. Configuration Management

Follow the 12-factor rule: all config via environment. Flask config classes keep per-environment defaults versioned; secrets stay out of git.

# config.py
import os

class BaseConfig:
    SECRET_KEY = os.environ["SECRET_KEY"]           # required
    SQLALCHEMY_DATABASE_URI = os.environ["DATABASE_URL"]
    SQLALCHEMY_TRACK_MODIFICATIONS = False
    SQLALCHEMY_ENGINE_OPTIONS = {
        "pool_size": 10,
        "max_overflow": 20,
        "pool_pre_ping": True,
        "pool_recycle": 1800,
    }
    REDIS_URL = os.environ["REDIS_URL"]
    SESSION_COOKIE_SECURE = True
    SESSION_COOKIE_HTTPONLY = True
    SESSION_COOKIE_SAMESITE = "Lax"
    PREFERRED_URL_SCHEME = "https"

class DevConfig(BaseConfig):
    DEBUG = True
    SESSION_COOKIE_SECURE = False   # http on localhost

class ProdConfig(BaseConfig):
    DEBUG = False
    TESTING = False

def load(app):
    env = os.environ.get("FLASK_ENV", "production")
    app.config.from_object({"development": DevConfig,
                            "production":  ProdConfig}[env])

Secrets precedence (prod):

  1. AWS Secrets Manager / HashiCorp Vault / GCP Secret Manager — fetched at pod start and projected as env vars (or via CSI driver).
  2. Kubernetes Secret objects — encrypted at rest with KMS.
  3. Never: .env files in the image, hard-coded constants, or secrets in Git.

For local dev only: python-dotenv loads .env, which is listed in both .gitignore and .dockerignore.

9. Logging

Containers log to stdout/stderr as a single JSON object per line. The platform (Docker, Kubernetes, ECS) ships them to CloudWatch / ELK / Loki. Correlation IDs let you stitch a single request across Nginx, Flask, and downstream services.

# logging_setup.py
import logging
import os
import time
import uuid
from flask import g, request
from pythonjsonlogger import jsonlogger

def configure_logging(app):
    handler = logging.StreamHandler()
    fmt = jsonlogger.JsonFormatter(
        "%(asctime)s %(levelname)s %(name)s %(message)s "
        "%(request_id)s %(user_id)s %(path)s %(status)s %(duration_ms)s",
        rename_fields={"asctime": "ts", "levelname": "level"},
    )
    handler.setFormatter(fmt)

    root = logging.getLogger()
    root.handlers = [handler]
    root.setLevel(os.environ.get("LOG_LEVEL", "INFO"))

    # Quiet noisy libs
    logging.getLogger("urllib3").setLevel(logging.WARNING)
    logging.getLogger("botocore").setLevel(logging.WARNING)

    @app.before_request
    def _req_start():
        g.request_id = request.headers.get("X-Request-ID", str(uuid.uuid4()))
        g._t0 = time.monotonic()

    @app.after_request
    def _req_end(resp):
        dur_ms = int((time.monotonic() - g._t0) * 1000)
        app.logger.info(
            "request",
            extra={
                "request_id": g.request_id,
                "user_id": getattr(g, "user_id", None),
                "path": request.path,
                "method": request.method,
                "status": resp.status_code,
                "duration_ms": dur_ms,
            },
        )
        resp.headers["X-Request-ID"] = g.request_id
        return resp

10. Observability

Three pillars, three tools: metrics (Prometheus), traces (OpenTelemetry), errors (Sentry):

import os

from prometheus_flask_exporter import PrometheusMetrics
from opentelemetry.instrumentation.flask import FlaskInstrumentor
from opentelemetry.instrumentation.sqlalchemy import SQLAlchemyInstrumentor
import sentry_sdk
from sentry_sdk.integrations.flask import FlaskIntegration

def configure_observability(app):
    metrics = PrometheusMetrics(app, group_by="endpoint")
    metrics.info("app_info", "Flask app", version=os.environ.get("APP_VERSION", "dev"))

    FlaskInstrumentor().instrument_app(app)
    SQLAlchemyInstrumentor().instrument(engine=app.extensions["sqlalchemy"].engine)

    sentry_sdk.init(
        dsn=os.environ.get("SENTRY_DSN"),
        integrations=[FlaskIntegration()],
        traces_sample_rate=0.05,
        profiles_sample_rate=0.01,
        environment=os.environ.get("FLASK_ENV", "production"),
        release=os.environ.get("APP_VERSION"),
    )

11. Security Hardening

Beyond the Nginx headers in section 4: run containers as a non-root user (the Dockerfile above already does), pin dependencies and audit them in CI (pip-audit or safety), rate-limit abusive clients (Flask-Limiter with a Redis backend), cap request payload sizes at both Nginx and the app, and never ship with DEBUG enabled: the interactive Werkzeug debugger amounts to remote code execution.

12. Database Migrations

Flask-Migrate wraps Alembic. The hard part is not the tool — it's making migrations safe to run while the old version of the app is still serving traffic.

Expand-contract (three deploys per breaking change):

  1. Expand — add the new column/table/index (nullable=True, no default for large tables — fill via backfill job). Deploy app that writes to both old and new.
  2. Migrate — backfill historical rows; flip reads to the new column.
  3. Contract — remove the old column once no code references it and you've held on the previous step long enough to roll back safely.

For Postgres: use CREATE INDEX CONCURRENTLY, avoid ALTER TABLE ... ADD COLUMN NOT NULL DEFAULT on large tables (pre-14 rewrites the whole table), and set lock_timeout on migration sessions so a stuck migration doesn't freeze production.
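As a concrete sketch of the expand step, these are the statements a migration might emit when adding an indexed column to a busy table (table name, column name, and timeout value are illustrative):

```python
def expand_step_sql(table: str, column: str) -> list[str]:
    """Ordered statements for a safe 'expand' migration on a hot Postgres table."""
    return [
        # Fail fast instead of queueing behind long-running transactions.
        "SET lock_timeout = '5s'",
        # Nullable, no default: a metadata-only change, no table rewrite.
        f"ALTER TABLE {table} ADD COLUMN {column} text",
        # CONCURRENTLY avoids a write-blocking lock, but cannot run
        # inside a transaction block.
        f"CREATE INDEX CONCURRENTLY ix_{table}_{column} ON {table} ({column})",
    ]


statements = expand_step_sql("orders", "customer_ref")
```

Because CREATE INDEX CONCURRENTLY refuses to run in a transaction, an Alembic migration has to execute that statement in an autocommit block rather than the default transactional context.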

13. Zero-Downtime Deploys

Gunicorn handles SIGTERM by closing the listening socket, stopping acceptance of new requests, and giving workers graceful_timeout seconds to finish in-flight work. Getting this right in Kubernetes means coordinating three timers: the preStop sleep (so the endpoints controller removes the pod from rotation before shutdown begins), Gunicorn's graceful_timeout (the in-flight drain window), and terminationGracePeriodSeconds (which must exceed the sum of the other two; the manifest above allows 10s + 30s within a 45s grace period).

14. Scaling Strategies

Vertical — more CPU / RAM per pod, more Gunicorn workers. Hits diminishing returns: the GIL, DB connection pool exhaustion, and NUMA effects all cap per-pod throughput. Rule of thumb: 2–4 CPU per pod, then scale out.

Horizontal — more pods behind the Service. Scales linearly until the database becomes the bottleneck. Plan for this: read replicas, connection pooler (PgBouncer in transaction mode), cache-aside with Redis, materialized views for heavy reads.

Sessions — do not use Flask's default client-side signed cookie for sessions of any real size. Move to server-side sessions backed by Redis via Flask-Session; pods become stateless and any pod can handle any request (no sticky sessions required).

import os
from datetime import timedelta

import redis
from flask_session import Session
app.config.update(
    SESSION_TYPE="redis",
    SESSION_REDIS=redis.from_url(os.environ["REDIS_URL"]),
    SESSION_USE_SIGNER=True,
    SESSION_KEY_PREFIX="sess:",
    PERMANENT_SESSION_LIFETIME=timedelta(hours=8),
)
Session(app)

15. Cost / Performance Tuning

A well-tuned Flask pod on 1 vCPU / 512MB with gthread workers can comfortably handle 200–500 req/s at sub-50ms p95 for DB-backed JSON endpoints. When you need more, scale horizontally first — it is almost always cheaper than chasing micro-optimizations.