Flask — Overview

1. Introduction

Flask is a Python micro-framework for building WSGI web applications. It was created by Armin Ronacher in 2010 as an April Fools' joke that turned into one of the most widely deployed Python web frameworks in production. Flask is "micro" in the sense that it ships with a minimal core — request routing, a templating engine, and a development server — and leaves persistence, authentication, migrations, forms, and admin interfaces to external extensions or the application author.

The two load-bearing dependencies are Werkzeug, which supplies the WSGI utilities, routing, and request/response objects, and Jinja2, the templating engine; both are maintained alongside Flask under the Pallets project.

Flask's design philosophy is explicit over implicit: no ORM is imposed, no project layout is enforced, and there is no built-in admin. This makes it a common default for ML model serving, internal tools, and small-to-medium HTTP APIs where the cost of a full framework is not justified.

2. WSGI Fundamentals

Flask is a synchronous WSGI framework. WSGI (Web Server Gateway Interface) is specified in PEP 3333 and defines the contract between a Python web application and an HTTP server. The contract is deliberately simple: an application is any callable that accepts two arguments — environ (a dict of CGI-style request variables) and start_response (a callable used to emit the status line and headers) — and returns an iterable of bytes representing the response body.

def application(environ, start_response):
    status = "200 OK"
    headers = [("Content-Type", "text/plain; charset=utf-8")]
    start_response(status, headers)
    return [b"hello from raw WSGI"]

Flask's Flask object is a WSGI application — calling app(environ, start_response) dispatches through Werkzeug's routing, invokes the matched view function, converts its return value into a Response, and serialises the result back through start_response. Any WSGI-compatible server — gunicorn, uWSGI, waitress, mod_wsgi — can host a Flask app without modification. In production, Flask is typically served by gunicorn behind nginx, with multiple sync workers to amortise the GIL.
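
Because the WSGI contract is just a callable convention, it can be exercised without a server at all. The sketch below invokes the raw application from above directly, using the standard library's wsgiref to build a minimal environ; the captured dict is scaffolding for this demo, not part of the spec:

```python
from wsgiref.util import setup_testing_defaults

def application(environ, start_response):
    # Same raw WSGI app as above.
    status = "200 OK"
    headers = [("Content-Type", "text/plain; charset=utf-8")]
    start_response(status, headers)
    return [b"hello from raw WSGI"]

# Build a minimal CGI-style environ, as a WSGI server would.
environ = {}
setup_testing_defaults(environ)

captured = {}
def start_response(status, headers):
    # A real server would emit the status line and headers here;
    # we just record them.
    captured["status"] = status
    captured["headers"] = dict(headers)

body = b"".join(application(environ, start_response))
print(captured["status"])   # 200 OK
print(body.decode())        # hello from raw WSGI
```

This is exactly what gunicorn does per request, minus socket handling: build environ from the HTTP request, call the app, write out whatever start_response and the returned iterable produced.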

3. Request/Response Lifecycle

When a request reaches the Flask application object, it flows through a well-defined sequence:

  1. WSGI entry — the server calls app(environ, start_response). Flask wraps environ in a Werkzeug Request object.
  2. Context push — Flask pushes an application context and a request context onto thread-local stacks, exposing current_app, g, request, and session as context-local proxies.
  3. URL matching — the Werkzeug MapAdapter matches the path + method against the registered url_map, producing an endpoint name and view arguments.
  4. before_request hooks — any functions registered with @app.before_request run in registration order. If one returns a non-None value, the view is skipped and that value becomes the response.
  5. View dispatch — the view function runs with the matched arguments. Its return value (string, dict, tuple, or Response) is normalised into a Response object via make_response().
  6. after_request hooks — each @app.after_request function receives the Response and may mutate or replace it (add headers, log, etc.). They run even when the view raised, provided the exception was handled by a registered error handler.
  7. teardown_request — always runs, including on unhandled exceptions; used for closing DB sessions or releasing resources.
  8. WSGI return — the Response is called as a WSGI application itself, invoking start_response and yielding body bytes.

A minimal app instrumenting these hooks:

from flask import Flask, request, jsonify, g
import time

app = Flask(__name__)

@app.before_request
def start_timer():
    g.t0 = time.perf_counter()

@app.after_request
def log_latency(response):
    dt_ms = (time.perf_counter() - g.t0) * 1000
    app.logger.info("%s %s -> %d in %.1fms",
                    request.method, request.path, response.status_code, dt_ms)
    response.headers["X-Response-Time-ms"] = f"{dt_ms:.1f}"
    return response

@app.post("/predict")
def predict():
    payload = request.get_json(force=True)
    # model.predict(...) would go here
    return jsonify(score=0.873, label="positive")
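
The normalisation in step 5 can also be observed directly. A short sketch, assuming Flask is installed; the example values are illustrative:

```python
from flask import Flask, make_response

app = Flask(__name__)

# make_response() needs an active context to know which app's
# settings (JSON provider, etc.) apply.
with app.test_request_context("/"):
    r1 = make_response({"ok": True})      # dict  -> JSON response
    r2 = make_response(("created", 201))  # tuple -> body + explicit status

print(r1.mimetype)      # application/json
print(r2.status_code)   # 201
```

The same rules apply to view return values, which is why a view can return a plain dict and still produce a well-formed JSON response.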

4. Core Components

Flask's moving parts are few. The Flask application object is the central registry for routes, configuration, and extensions, and is itself the WSGI callable. Werkzeug supplies the Request and Response objects and the url_map used for routing. The context-locals (current_app, g, request, and session) are valid only while the corresponding context is pushed, as described in section 3. Blueprints group related routes so larger applications can be split across modules, and Jinja2 handles server-side templating.

5. Flask vs FastAPI vs Django

All three are mature Python web frameworks but target different problems. The comparison below reflects real production trade-offs, not marketing positioning.

  Paradigm
    Flask:   sync WSGI (partial async support since 2.0)
    FastAPI: async-first ASGI; sync also supported
    Django:  sync WSGI, with native ASGI since 3.0
  Philosophy
    Flask:   micro; bring-your-own components
    FastAPI: micro; built on Starlette and Pydantic
    Django:  batteries included; ORM, admin, auth, migrations
  Typing / validation
    Flask:   none built in (use marshmallow / Flask-Smorest)
    FastAPI: Pydantic models are native; runtime validation comes for free
    Django:  added via form and serializer frameworks (DRF)
  OpenAPI / docs
    Flask:   via extensions (Flask-Smorest, apispec)
    FastAPI: auto-generated from type hints
    Django:  via DRF + drf-spectacular
  Throughput (sync I/O)
    Flask:   good with gunicorn and many workers
    FastAPI: excellent under async I/O; sync performance is similar to Flask
    Django:  good, with some overhead from middleware and the ORM
  ORM
    Flask:   none; SQLAlchemy via Flask-SQLAlchemy
    FastAPI: none; SQLAlchemy / SQLModel / Tortoise
    Django:  the Django ORM (tightly coupled, opinionated)
  Templating
    Flask:   Jinja2
    FastAPI: Jinja2 (optional; API-first)
    Django:  Django Templates (or Jinja2)
  Best for
    Flask:   ML serving, small APIs, legacy/integration glue
    FastAPI: high-concurrency APIs, typed microservices
    Django:  CMS, CRUD-heavy apps with admin, server-rendered sites
  Learning curve
    Flask:   low
    FastAPI: low to moderate (type hints, async)
    Django:  moderate to high (framework conventions)

Honest take: for a greenfield typed JSON API in 2026, FastAPI is the default. Flask remains strong where sync code, simple deployment, and the extension ecosystem matter more than async throughput. Django wins when you need the admin, auth, and ORM on day one.

6. When to Choose Flask

Flask is a strong default when the service is a small-to-medium synchronous HTTP API or an ML model behind a /predict endpoint, when deployment simplicity (gunicorn behind nginx) matters more than async throughput, and when you would rather choose your own ORM, validation, and auth than inherit a framework's. It is a weaker fit for high-fan-out async I/O (reach for FastAPI) or admin-heavy CRUD applications (reach for Django).

7. Ecosystem

Flask's "micro" core is viable in production only because of a mature extension ecosystem. A typical production set, with install and run commands:

# Typical production install for a Flask API
pip install "flask>=3.0" gunicorn \
    flask-sqlalchemy flask-migrate \
    flask-jwt-extended flask-cors flask-smorest \
    psycopg2-binary

# Run behind gunicorn with 4 sync workers, 2 threads each
gunicorn -w 4 --threads 2 -b 0.0.0.0:8000 "app:create_app()"

8. Limitations

The main points to weigh: the sync worker model means each worker thread handles one request at a time, so slow upstream I/O ties up capacity, and the async views added in 2.0 run inside a per-request event loop without changing that concurrency model. There is no built-in request validation, typing, or OpenAPI generation; those arrive via extensions of varying maturity. Context-locals are convenient inside a request but complicate background tasks and any code that outlives the request. And the micro philosophy means every project re-decides its own structure, persistence, and auth.

None of these limitations makes Flask the wrong choice; they mark out where it is and is not the right tool. For an ML-engineer workflow of "load a model, expose /predict, deploy with gunicorn behind nginx, done," Flask is still one of the most operationally predictable choices in the Python ecosystem.