Flask — Overview

1. Introduction

Flask is a Python micro-framework for building WSGI web applications. It was created by Armin Ronacher in 2010 as an April Fools' joke that turned into one of the most widely deployed Python web frameworks in production. Flask is "micro" in the sense that it ships with a minimal core — request routing, a templating engine, and a development server — and leaves persistence, authentication, migrations, forms, and admin interfaces to external extensions or the application author.

The two load-bearing dependencies are Werkzeug, which supplies the WSGI utilities, routing, and request/response objects, and Jinja2, the templating engine; both are maintained alongside Flask under the Pallets project.

Flask's design philosophy is explicit over implicit: no ORM is imposed, no project layout is enforced, and there is no built-in admin. This makes it a common default for ML model serving, internal tools, and small-to-medium HTTP APIs where the cost of a full framework is not justified.

2. WSGI Fundamentals

Flask is a synchronous WSGI framework. WSGI (Web Server Gateway Interface) is specified in PEP 3333 and defines the contract between a Python web application and an HTTP server. The contract is deliberately simple: an application is any callable that accepts two arguments — environ (a dict of CGI-style request variables) and start_response (a callable used to emit the status line and headers) — and returns an iterable of bytes representing the response body.

def application(environ, start_response):
    status = "200 OK"
    headers = [("Content-Type", "text/plain; charset=utf-8")]
    start_response(status, headers)
    return [b"hello from raw WSGI"]

Flask's Flask object is a WSGI application — calling app(environ, start_response) dispatches through Werkzeug's routing, invokes the matched view function, converts its return value into a Response, and serialises the result back through start_response. Any WSGI-compatible server — gunicorn, uWSGI, waitress, mod_wsgi — can host a Flask app without modification. In production, Flask is typically served by gunicorn behind nginx, with multiple sync workers to amortise the GIL.
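
Because the WSGI contract is just a callable convention, it can be exercised without a server at all. The sketch below invokes the raw application from above directly, using the standard library's wsgiref to build a minimal environ; the captured dict is scaffolding for this demo, not part of the spec:

```python
from wsgiref.util import setup_testing_defaults

def application(environ, start_response):
    # Same raw WSGI app as above.
    status = "200 OK"
    headers = [("Content-Type", "text/plain; charset=utf-8")]
    start_response(status, headers)
    return [b"hello from raw WSGI"]

# Build a minimal CGI-style environ, as a WSGI server would.
environ = {}
setup_testing_defaults(environ)

captured = {}
def start_response(status, headers):
    # A real server would emit the status line and headers here;
    # we just record them.
    captured["status"] = status
    captured["headers"] = dict(headers)

body = b"".join(application(environ, start_response))
print(captured["status"])   # 200 OK
print(body.decode())        # hello from raw WSGI
```

This is exactly what gunicorn does per request, minus socket handling: build environ from the HTTP request, call the app, write out whatever start_response and the returned iterable produced.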

3. Request/Response Lifecycle

When a request reaches the Flask application object, it flows through a well-defined sequence:

  1. WSGI entry — the server calls app(environ, start_response). Flask wraps environ in a Werkzeug Request object.
  2. Context push — Flask pushes an application context and a request context onto thread-local stacks, exposing current_app, g, request, and session as context-local proxies.
  3. URL matching — the Werkzeug MapAdapter matches the path + method against the registered url_map, producing an endpoint name and view arguments.
  4. before_request hooks — any functions registered with @app.before_request run in registration order. If one returns a non-None value, the view is skipped and that value becomes the response.
  5. View dispatch — the view function runs with the matched arguments. Its return value (string, dict, tuple, or Response) is normalised into a Response object via make_response().
  6. after_request hooks — each @app.after_request function receives the Response and may mutate or replace it (add headers, log, etc.). They run even when the view raised, provided the exception was handled by a registered error handler.
  7. teardown_request — always runs, including on unhandled exceptions; used for closing DB sessions or releasing resources.
  8. WSGI return — the Response is called as a WSGI application itself, invoking start_response and yielding body bytes.

A minimal app instrumenting these hooks:

from flask import Flask, request, jsonify, g
import time

app = Flask(__name__)

@app.before_request
def start_timer():
    g.t0 = time.perf_counter()

@app.after_request
def log_latency(response):
    dt_ms = (time.perf_counter() - g.t0) * 1000
    app.logger.info("%s %s -> %d in %.1fms",
                    request.method, request.path, response.status_code, dt_ms)
    response.headers["X-Response-Time-ms"] = f"{dt_ms:.1f}"
    return response

@app.post("/predict")
def predict():
    payload = request.get_json(force=True)
    # model.predict(...) would go here
    return jsonify(score=0.873, label="positive")
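
The normalisation in step 5 can also be observed directly. A short sketch, assuming Flask is installed; the example values are illustrative:

```python
from flask import Flask, make_response

app = Flask(__name__)

# make_response() needs an active context to know which app's
# settings (JSON provider, etc.) apply.
with app.test_request_context("/"):
    r1 = make_response({"ok": True})      # dict  -> JSON response
    r2 = make_response(("created", 201))  # tuple -> body + explicit status

print(r1.mimetype)      # application/json
print(r2.status_code)   # 201
```

The same rules apply to view return values, which is why a view can return a plain dict and still produce a well-formed JSON response.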

4. Core Components

Flask's moving parts are few. The Flask application object is the central registry for routes, configuration, and extensions, and is itself the WSGI callable. Werkzeug supplies the Request and Response objects and the url_map used for routing. The context-locals (current_app, g, request, and session) are valid only while the corresponding context is pushed, as described in section 3. Blueprints group related routes so larger applications can be split across modules, and Jinja2 handles server-side templating.

5. Flask vs FastAPI vs Django

All three are mature Python web frameworks but target different problems. The comparison below reflects real production trade-offs, not marketing positioning.

  Paradigm
    Flask:   sync WSGI (partial async support since 2.0)
    FastAPI: async-first ASGI; sync also supported
    Django:  sync WSGI, with native ASGI since 3.0
  Philosophy
    Flask:   micro; bring-your-own components
    FastAPI: micro; built on Starlette and Pydantic
    Django:  batteries included; ORM, admin, auth, migrations
  Typing / validation
    Flask:   none built in (use marshmallow / Flask-Smorest)
    FastAPI: Pydantic models are native; runtime validation comes for free
    Django:  added via form and serializer frameworks (DRF)
  OpenAPI / docs
    Flask:   via extensions (Flask-Smorest, apispec)
    FastAPI: auto-generated from type hints
    Django:  via DRF + drf-spectacular
  Throughput (sync I/O)
    Flask:   good with gunicorn and many workers
    FastAPI: excellent under async I/O; sync performance is similar to Flask
    Django:  good, with some overhead from middleware and the ORM
  ORM
    Flask:   none; SQLAlchemy via Flask-SQLAlchemy
    FastAPI: none; SQLAlchemy / SQLModel / Tortoise
    Django:  the Django ORM (tightly coupled, opinionated)
  Templating
    Flask:   Jinja2
    FastAPI: Jinja2 (optional; API-first)
    Django:  Django Templates (or Jinja2)
  Best for
    Flask:   ML serving, small APIs, legacy/integration glue
    FastAPI: high-concurrency APIs, typed microservices
    Django:  CMS, CRUD-heavy apps with admin, server-rendered sites
  Learning curve
    Flask:   low
    FastAPI: low to moderate (type hints, async)
    Django:  moderate to high (framework conventions)

Honest take: for a greenfield typed JSON API in 2026, FastAPI is the default. Flask remains strong where sync code, simple deployment, and the extension ecosystem matter more than async throughput. Django wins when you need the admin, auth, and ORM on day one.

6. When to Choose Flask

Flask is a strong default when the service is a small-to-medium synchronous HTTP API or an ML model behind a /predict endpoint, when deployment simplicity (gunicorn behind nginx) matters more than async throughput, and when you would rather choose your own ORM, validation, and auth than inherit a framework's. It is a weaker fit for high-fan-out async I/O (reach for FastAPI) or admin-heavy CRUD applications (reach for Django).

7. Ecosystem

Flask's "micro" core is viable in production only because of a mature extension ecosystem. A typical production set, with install and run commands:

# Typical production install for a Flask API
pip install "flask>=3.0" gunicorn \
    flask-sqlalchemy flask-migrate \
    flask-jwt-extended flask-cors flask-smorest \
    psycopg2-binary

# Run behind gunicorn with 4 sync workers, 2 threads each
gunicorn -w 4 --threads 2 -b 0.0.0.0:8000 "app:create_app()"

8. Limitations

The main points to weigh: the sync worker model means each worker thread handles one request at a time, so slow upstream I/O ties up capacity, and the async views added in 2.0 run inside a per-request event loop without changing that concurrency model. There is no built-in request validation, typing, or OpenAPI generation; those arrive via extensions of varying maturity. Context-locals are convenient inside a request but complicate background tasks and any code that outlives the request. And the micro philosophy means every project re-decides its own structure, persistence, and auth.

None of these limitations makes Flask the wrong choice; they mark out where it is and is not the right tool. For an ML-engineer workflow of "load a model, expose /predict, deploy with gunicorn behind nginx, done," Flask is still one of the most operationally predictable choices in the Python ecosystem.