Building Effective Agents
Task → Workflow → Agent ladder. Five workflow patterns. Start simple; add complexity only when it earns its keep.
Yan's 7-pattern map
The 2×2 — data ↔ user × defensive ↔ offensive — that organizes the whole applied-LLM surface.
Interwoven workflow
Middle path between utopia and skepticism. Three-phase model: deterministic → stochastic → interwoven.
Harness > model
Once frontier models are within 10% of each other, the harness — loop, tools, memory, sandbox — is the differentiator.
Context engineering
Write / Select / Compress / Isolate — Martin's taxonomy for filling the context window with the right tokens.
Context rot
Four failure modes: Poisoning, Distraction, Confusion, Clash. Fixes: Pruning, Summarization, Offloading.
Caching · thinking · computer use
Prompt cache (highest-ROI quick win), extended thinking, computer use. Distinct from Yan's semantic caching.
Plan mode
Plan = design document. Never let an agent write code until the plan is approved.
Spec-driven coding
Humans write structured specs/intent docs; agents fill them in. Spec becomes the eval.
Coding-agent loops
Edit → test → read output → decide. Critic loops add verification when tests are weak.
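A minimal sketch of that loop, assuming the agent's edit step and the test runner are supplied as plain callables. The names `agent_loop`, `edit`, `run_tests`, and `max_iters` are hypothetical and only illustrate the control flow; they do not come from any particular agent framework.

```python
from typing import Callable, Tuple


def agent_loop(
    edit: Callable[[str], None],
    run_tests: Callable[[], Tuple[bool, str]],
    max_iters: int = 5,
) -> Tuple[bool, int]:
    """Edit -> test -> read output -> decide, until tests pass or the budget runs out.

    Returns (passed, iterations_used).
    """
    feedback = ""  # test output from the previous iteration; empty on the first pass
    for iteration in range(1, max_iters + 1):
        edit(feedback)                # edit: propose a change informed by the last output
        passed, output = run_tests()  # test: run the suite
        feedback = output             # read output: keep it for the next edit
        if passed:                    # decide: stop on green, otherwise go around again
            return True, iteration
    return False, max_iters
```

The iteration cap is a design choice in this sketch: it keeps a loop with weak or flaky tests from running indefinitely.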
Evals & error analysis
Spend 60–80% of dev time on error analysis. Match metric to task. LLM-as-judge with four bias mitigations.
SWE-bench Verified
Frontier scores + the harness-delta finding (scaffolding moves the same model 5–15 points on SWE-bench Pro).
Feedback flywheel
Capture explicit + implicit signals. They become the next eval set and the next fine-tune corpus.
Skills · Hooks · Subagents
Capability bundles (auto-loaded), deterministic lifecycle guards, isolated-context specialists.
Multi-agent debate
+90% on internal evals (Anthropic) vs fragile-by-default (Cognition). Resolution: reads fan out, writes single-threaded.
Reads fan out
Read with many, write with one. Best one-line guidance for clients designing their first multi-agent system.
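One way to picture "read with many, write with one" is a parallel gather step followed by a single writer. The sketch below assumes the readers are side-effect-free callables; `fan_out_then_write`, `read`, and `write` are hypothetical names used for illustration, not part of any named system.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def fan_out_then_write(
    questions: Iterable[str],
    read: Callable[[str], str],
    write: Callable[[List[str]], str],
    max_workers: int = 4,
) -> str:
    """Run read-only lookups in parallel, then hand all findings to one writer."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Readers only gather information; Executor.map keeps results in input order.
        findings = list(pool.map(read, questions))
    # A single writer sees every finding, so no two workers make conflicting edits.
    return write(findings)
```

For example, `fan_out_then_write(["a", "b"], str.upper, " | ".join)` gathers two findings in parallel and joins them in one place.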
Model Context Protocol
Typed, auditable agent-to-product interface. ~78% enterprise adoption, ~9,400 public servers.
Verifier / critic
Pair a generator with a separate model whose job is to find faults. Used by CodeRabbit, Greptile, Graphite.
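The pairing can be sketched as a loop in which the generator revises until the critic reports no faults. This is an illustration under stated assumptions, not how the tools named above work internally: `generate_with_critic`, `generate`, `critique`, and `max_rounds` are hypothetical names, and in practice each callable would wrap a separate model call.

```python
from typing import Callable, List, Tuple


def generate_with_critic(
    generate: Callable[[List[str]], str],
    critique: Callable[[str], List[str]],
    max_rounds: int = 3,
) -> Tuple[str, List[str]]:
    """Alternate generator and critic; return (last draft, outstanding faults).

    An empty fault list means the critic accepted the draft.
    """
    faults: List[str] = []
    draft = ""
    for _ in range(max_rounds):
        draft = generate(faults)  # the generator sees the critic's last objections
        faults = critique(draft)  # the critic's only job is to find faults
        if not faults:
            break
    return draft, faults
```

Returning the outstanding faults, rather than raising, lets the caller decide whether an unaccepted draft should be escalated to a human.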
Guardrails
Validate inputs and outputs — distinct from Hooks. Four layers: structural → syntactic → semantic → safety.
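The four layers can be sketched as ordered checks that stop at the first failure. The source does not define what each layer checks, so the interpretation here is an assumption: structural means the payload parses as a JSON object, syntactic means the fields have the expected types, semantic means the values are meaningful, and safety means no blocked terms appear. The `answer` field and the blocklist are hypothetical.

```python
import json
from typing import Callable, List, Optional, Tuple

# Each check returns an error message, or None when the payload passes.
Check = Callable[[str], Optional[str]]


def structural(raw: str) -> Optional[str]:
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return "not valid JSON"
    return None if isinstance(obj, dict) else "top level must be an object"


def syntactic(raw: str) -> Optional[str]:
    answer = json.loads(raw).get("answer")
    return None if isinstance(answer, str) else "'answer' must be a string"


def semantic(raw: str) -> Optional[str]:
    return None if json.loads(raw)["answer"].strip() else "'answer' is empty"


def safety(raw: str) -> Optional[str]:
    text = json.loads(raw)["answer"].lower()
    blocked = ("password", "api_key")  # hypothetical blocklist, for illustration only
    hits = [term for term in blocked if term in text]
    return f"blocked terms: {hits}" if hits else None


# Order matters: later layers assume the earlier ones have already passed.
LAYERS: List[Tuple[str, Check]] = [
    ("structural", structural),
    ("syntactic", syntactic),
    ("semantic", semantic),
    ("safety", safety),
]


def run_guardrails(raw: str) -> Optional[Tuple[str, str]]:
    """Return (layer, error) for the first failing layer, or None if all pass."""
    for name, check in LAYERS:
        error = check(raw)
        if error is not None:
            return name, error
    return None
```

Running the cheap structural check first means the more expensive semantic and safety checks only ever see well-formed input.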
Defensive UX
Five principles to anticipate ML imperfection. Engineering clients under-invest because their users are other engineers.
Jaggedness
Frontier models pass PhD exams but lack taste, judgment, reliability. The vocabulary for the SWE-bench/production gap.
Viral CLAUDE.md
The 110k-star file. Four rules + MEMORY.md / ERRORS.md. Useful boilerplate; ignore the unverified accuracy claim.