1. From vibe coding to agentic engineering
Karpathy's 2026 Sequoia framing is the executive-friendly narrative: discipline replaces vibes, and the new craft is guiding agents. He named December 2025 as his personal inflection point, when he went from writing 80% of his code to delegating 80% of it.
For execs caught between "fully autonomous" pitches on one side and anti-AI skepticism on the other, pair the Karpathy frame with Zed's middle-path framing: the work is integration, not replacement, and most engineering orgs are already in the interwoven state.
→ "Vibe coding is passé" (The New Stack) · People · Karpathy · Patterns · Interwoven workflow
2. Context engineering is the new prompt engineering
Most agent failures are context failures. Karpathy, Lance Martin, and Drew Breunig converged on this in mid-2025. Breunig's taxonomy of failure (poisoning, distraction, confusion, clash) and Martin's fixes (write, select, compress, isolate) together form the diagnostic frame we use in every engagement.
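As a rough illustration of two of the four fixes, the sketch below shows "select" and "compress" applied to a pile of candidate notes before they reach the model. Everything here is hypothetical: the `Note` type, the word-overlap scoring, and the truncation-based `compress` are stand-ins for whatever retrieval and summarization a real harness would use. "Write" and "isolate" are not shown.

```python
from dataclasses import dataclass


@dataclass
class Note:
    source: str  # where the note came from (file, ticket, chat); hypothetical field
    text: str


def select(notes, query, k=2):
    """Select: keep only the k notes sharing the most words with the query."""
    query_words = set(query.lower().split())

    def overlap(note):
        return len(query_words & set(note.text.lower().split()))

    return sorted(notes, key=overlap, reverse=True)[:k]


def compress(text, max_words=12):
    """Compress: crude truncation standing in for a real summarizer."""
    words = text.split()
    if len(words) <= max_words:
        return text
    return " ".join(words[:max_words]) + " ..."


def build_context(notes, query, k=2, max_words=12):
    """Assemble the context lines that would actually be sent to the model."""
    return [f"[{n.source}] {compress(n.text, max_words)}" for n in select(notes, query, k)]


notes = [
    Note("runbook", "retry logic for the payment webhook handler"),
    Note("chat", "office lunch menu for friday"),
    Note("design-doc", "payment webhook signature validation steps"),
]
context = build_context(notes, "fix payment webhook retry")
```

The irrelevant note never enters the context, which is the point: distraction and confusion are avoided by leaving material out, not by asking the model to ignore it.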
→ Patterns · Context engineering · Patterns · Context rot
3. The harness matters more than the model
Once frontier models are within ~10% of each other on SWE-bench, the differentiator is the harness: loop, tool surface, memory, sandbox, validation. SWE-bench Pro data shows that switching scaffolding moves the same model 5–15 points. Sutskever's Nov 2025 framing of an "end of the scaling era" reinforces this: gains now come from harness, context, and evals, not raw model size. His term for the gap between benchmark and production performance is "jaggedness": models are superhuman at test-taking but unreliable in practice.
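To make the harness components concrete, here is a minimal sketch of a bounded loop with a tool registry, a running memory, and a validation gate. It is an illustration under stated assumptions, not any vendor's scaffold: `run_harness`, `scripted_model`, `TOOLS`, and `looks_like_version` are hypothetical names, the "model" is a scripted stub instead of an LLM call, and sandboxing is omitted entirely.

```python
def run_harness(model, tools, validate, task, max_steps=5):
    """Run a bounded agent loop; return (answer or None, memory transcript)."""
    memory = [f"task: {task}"]  # memory: transcript carried between steps
    for _ in range(max_steps):  # loop: always bounded
        action = model(memory)  # expects {"tool": ..., "arg": ...} or {"final": ...}
        if "final" in action:
            if validate(action["final"]):  # validation: gate before returning
                return action["final"], memory
            memory.append("validation failed; try again")
            continue
        tool = tools.get(action["tool"])  # tool surface: registered tools only
        if tool is None:
            memory.append(f"unknown tool: {action['tool']}")
            continue
        memory.append(f"{action['tool']} -> {tool(action['arg'])}")
    return None, memory  # step budget exhausted


def scripted_model(memory):
    """Hypothetical stand-in for an LLM: look something up, then answer."""
    if not any(entry.startswith("lookup ->") for entry in memory):
        return {"tool": "lookup", "arg": "version"}
    return {"final": memory[-1].split("-> ")[1]}


TOOLS = {"lookup": lambda key: {"version": "1.2.3"}.get(key, "not found")}


def looks_like_version(answer):
    """Toy validator: accept answers shaped like x.y.z."""
    return answer.count(".") == 2


result, transcript = run_harness(scripted_model, TOOLS, looks_like_version, "find version")
```

Swapping any one piece (a stricter validator, a different tool set, a smaller step budget) changes the outcome while the model stays fixed, which is the sense in which scaffolding can move scores.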
→ Patterns · Harness thesis · Patterns · Jaggedness · People · Sutskever
4. Don't build agents for everything. Don't build multi-agent systems unless the work is genuinely parallel
Anthropic's workflow ladder gives clients a defensible "no" to over-engineered designs. Cognition's "reads fan out, writes stay single-threaded" rule resolves the multi-agent debate in practice.
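One way to picture the "reads fan out, writes stay single-threaded" rule is the sketch below, which assumes read-only workers running in a thread pool and a single writer applying every change in a fixed order. The function names, the word-count "analysis", and the TODO edit are hypothetical placeholders for real sub-agent work.

```python
from concurrent.futures import ThreadPoolExecutor


def read_only_worker(item):
    """Stand-in for a read-only sub-agent: inspects a file, changes nothing."""
    name, text = item
    return name, len(text.split())


def fan_out_reads(files, workers=4):
    """Reads fan out: each file is analyzed concurrently and independently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(read_only_worker, files.items()))


def single_writer(files, findings, threshold):
    """Writes stay single-threaded: one writer applies edits in a fixed order."""
    updated = dict(files)
    for name in sorted(findings):
        if findings[name] > threshold:
            updated[name] = files[name] + "\n# TODO: split this file"
    return updated


files = {"a.py": "x = 1 + 2", "b.py": "pass"}
findings = fan_out_reads(files)
updated = single_writer(files, findings, threshold=2)
```

Because only one code path mutates state, there are no conflicting edits to reconcile, and the parallelism is confined to work that cannot conflict.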
→ Patterns · Building Effective Agents · Patterns · Multi-agent debate
5. Evals first, autonomy second
Hamel Husain's thesis, validated everywhere we look: 60–80% of dev time should be error analysis; products that ship without evals fail. This is the prerequisite to every loop — and the most under-sold work in the field.
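Error analysis can start very small. The sketch below assumes each trace has already been reviewed by a person and labeled with a pass/fail flag and a free-form failure mode; the field names `passed` and `failure_mode` and the example labels are hypothetical. It only tallies which failure modes dominate, so that effort goes to the largest bucket first.

```python
from collections import Counter


def error_analysis(traces):
    """Return (pass rate, failure modes ordered from most to least common)."""
    if not traces:
        return 0.0, []
    passed = sum(1 for t in traces if t["passed"])
    failure_modes = Counter(t["failure_mode"] for t in traces if not t["passed"])
    return passed / len(traces), failure_modes.most_common()


# Hand-labeled traces; in practice these come from reviewing real agent runs.
traces = [
    {"passed": True, "failure_mode": None},
    {"passed": False, "failure_mode": "wrong_tool"},
    {"passed": False, "failure_mode": "stale_context"},
    {"passed": False, "failure_mode": "wrong_tool"},
    {"passed": True, "failure_mode": None},
]
pass_rate, ranked = error_analysis(traces)
```

The ranked list is the deliverable: it says what to fix next, which a single aggregate score does not.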
→ Patterns · Evals · People · Hamel Husain
Time-to-merge: −30 to −60% on bounded ticket types (bumps, lint, typos, simple endpoints)
PR review throughput: +50 to +200% with critic-style review agent paired
Agent-authored code: Junior–mid; senior review still required (Ronacher, Hashimoto)
Costs without discipline: 3–10×; cache hygiene + batch routing + seat plans