LLM Engineering

Production patterns for building LLM-backed systems in 2026 — agents that actually work, structured output that actually parses, retrieval that actually retrieves, and inference stacks that actually scale. The pages below are notes from shipping these patterns in real systems, not vendor pitches.

The cluster covers six concrete areas: agent loops and the Model Context Protocol; reliable function calling and structured output across providers; evaluating RAG without fooling yourself; hybrid search and reranking; self-hosting open-weights models with vLLM; and orchestration frameworks (LangGraph, DSPy) — including when none of them are needed.
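As a taste of the second area, here is a minimal sketch of defensive structured-output parsing: extract the JSON a model actually emitted (often wrapped in markdown fences and prose), then validate it before anything downstream touches it. The function name and keys are illustrative, not from any of the linked pages.

```python
import json
import re

def parse_structured_output(raw: str, required_keys: set[str]) -> dict:
    """Extract and validate a JSON object from raw model text.

    Models routinely wrap JSON in ```json fences or surround it with
    prose; stripping that before json.loads is the difference between
    output that "actually parses" and a brittle pipeline.
    """
    # Prefer a fenced JSON block if one is present (```json { ... } ```).
    match = re.search(r"```(?:json)?\s*(\{.*\})\s*```", raw, re.DOTALL)
    candidate = match.group(1) if match else raw.strip()
    data = json.loads(candidate)  # raises on malformed JSON -> caller can retry
    missing = required_keys - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data

# Typical messy model reply: fenced JSON with surrounding prose.
reply = 'Here you go:\n```json\n{"name": "vllm", "gpu_count": 2}\n```'
print(parse_structured_output(reply, {"name", "gpu_count"}))
```

In production you would usually hand the raised exception back to the model as a repair prompt rather than failing the request outright; the cluster's structured-output page covers that loop across providers.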


Pages in this Cluster


Companion pages on the rest of the site: Amazon Bedrock, RAG, Vector Databases, Hugging Face.