Top-level synthesis

Agentic coding — state of the art

Internal position for the consulting practice. Five anchor ideas, expected outcomes, house principles, risk frame, tooling stance. Detail in Patterns, People, Harnesses, Production.

Snapshot: May 2026 · Source: RESEARCH.md

01 The five ideas that anchor our pitch

1. From vibe coding to agentic engineering

Karpathy's 2026 Sequoia framing is the executive-friendly narrative: discipline replaces vibes, the new craft is guiding agents. He named December 2025 his personal inflection point — from writing 80% of his code to delegating 80%.

For execs who are being pushed toward "fully autonomous" pitches on one side or anti-AI skepticism on the other, pair the Karpathy frame with Zed's middle-path framing: the work is integration, not replacement, and most engineering orgs are already in the interwoven state.

"Vibe coding is passé" (The New Stack) · People · Karpathy · Patterns · Interwoven workflow

2. Context engineering is the new prompt engineering

Most agent failures are context failures. Karpathy / Lance Martin / Drew Breunig converged on this in mid-2025. The taxonomy of failure (poisoning, distraction, confusion, clash) and fixes (write, select, compress, isolate) is the diagnostic frame we use in every engagement.

Patterns · Context engineering · Patterns · Context rot
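As a rough illustration of two of the four fixes (select and compress), the sketch below filters context items by relevance and then trims whatever overflows a token budget. The `ContextItem` type, the relevance scores, the `min_relevance` cutoff, and the word-count tokenizer are all assumptions made for the example; a real harness would use its own retriever, tokenizer, and summarizer rather than naive truncation.

```python
from dataclasses import dataclass


@dataclass
class ContextItem:
    name: str
    text: str
    relevance: float  # hypothetical score in [0, 1], e.g. from a retriever


def build_context(items, budget, min_relevance=0.3):
    """Select relevant items, then truncate whatever overflows the budget.

    Tokens are approximated as whitespace-separated words (an assumption).
    """
    # select: drop items below the relevance cutoff, most relevant first
    selected = [i for i in items if i.relevance >= min_relevance]
    selected.sort(key=lambda i: i.relevance, reverse=True)

    out, used = [], 0
    for item in selected:
        remaining = budget - used
        if remaining <= 0:
            break
        words = item.text.split()
        if len(words) > remaining:
            # compress: naive truncation stands in for summarization
            words = words[:remaining]
        out.append(ContextItem(item.name, " ".join(words), item.relevance))
        used += len(words)
    return out


items = [
    ContextItem("ticket", "fix the login redirect bug", 0.9),
    ContextItem("old_chat", "unrelated lunch plans", 0.1),
    ContextItem("auth_module", "one two three four five six seven eight nine ten", 0.5),
]
context = build_context(items, budget=8)
```

Here the low-relevance item is dropped outright and the second item is cut to fit the remaining budget, which is the kind of decision a context audit makes explicit.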

3. The harness matters more than the model

Once frontier models are within ~10% of one another on SWE-bench, the differentiator is the harness: loop, tool surface, memory, sandbox, validation. SWE-bench Pro data shows that switching scaffolding moves the same model 5–15 points. Sutskever's Nov 2025 framing of an "end of the scaling era" reinforces this: gains now come from harness, context, and evals, not raw model size. His term for the gap between benchmark and production performance is "jaggedness": models are superhuman at test-taking, unreliable in practice.

Patterns · Harness thesis · Patterns · Jaggedness · People · Sutskever
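To make the harness components concrete, here is a minimal sketch of a loop with a restricted tool surface, a transcript as memory, and a validation step before an answer is accepted. Everything in it is hypothetical: the action dictionary shape, `run_harness`, and the scripted stand-in for a model are invented for illustration, and sandboxing is left out entirely.

```python
from typing import Callable


def run_harness(model: Callable[[list], dict], tools: dict,
                validate: Callable[[str], bool], max_steps: int = 5):
    """Drive a model through a bounded loop; return (answer, transcript)."""
    memory = []  # memory: the transcript the model sees on every turn
    for _ in range(max_steps):  # loop: bounded number of steps
        action = model(memory)
        if action["type"] == "tool":
            name = action["name"]
            if name not in tools:  # tool surface: only registered tools run
                memory.append(f"error: unknown tool {name}")
                continue
            memory.append(f"{name} -> {tools[name](action['arg'])}")
        elif action["type"] == "final":
            if validate(action["answer"]):  # validation gates the result
                return action["answer"], memory
            memory.append("validation failed, try again")
    return None, memory


def scripted_model(memory):
    # Stand-in for a real model call: reads a file, answers wrongly once,
    # then corrects itself after seeing the validation failure.
    if not memory:
        return {"type": "tool", "name": "read_file", "arg": "README"}
    if memory[-1].startswith("read_file"):
        return {"type": "final", "answer": "wrong"}
    return {"type": "final", "answer": "ok"}


answer, transcript = run_harness(
    scripted_model,
    tools={"read_file": lambda path: "contents"},
    validate=lambda a: a == "ok",
)
```

The same scripted model produces a different outcome if the validation step is removed or the step budget is cut to two, which is the point of the harness thesis: the scaffolding, not the model, decides whether the wrong first answer ships.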

4. Don't build agents for everything. Don't build multi-agent unless work is genuinely parallel

Anthropic's workflow ladder gives clients a defensible "no" to over-engineered designs. Cognition's "reads fan out, writes stay single-threaded" rule resolves the multi-agent debate in practice.

Patterns · Building Effective Agents · Patterns · Multi-agent debate
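One way to picture the "reads fan out, writes stay single-threaded" rule is the sketch below: read-only workers run in parallel over a snapshot of the repository, and a single writer applies the resulting edits in sequence. The in-memory `repo` dictionary, `read_agent`, and the trivial "edit" are placeholders invented for the example, not Cognition's implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def read_agent(path, repo):
    # Read-only subagent: inspects one file and returns a finding.
    # It never mutates shared state.
    return path, len(repo[path].splitlines())


def fan_out_then_write(repo, paths):
    # Reads fan out: independent, side-effect-free work runs in parallel.
    with ThreadPoolExecutor(max_workers=4) as pool:
        findings = list(pool.map(lambda p: read_agent(p, repo), paths))

    # Writes stay single-threaded: one writer applies edits in order,
    # so no two agents can produce conflicting changes.
    edited = dict(repo)
    for path, n_lines in findings:
        edited[path] = f"# {n_lines} lines reviewed\n" + repo[path]
    return edited


repo = {"a.py": "x = 1\ny = 2", "b.py": "z = 3"}
result = fan_out_then_write(repo, ["a.py", "b.py"])
```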

5. Evals first, autonomy second

Hamel Husain's thesis, validated everywhere we look: 60–80% of dev time should be error analysis; products that ship without evals fail. This is the prerequisite to every loop — and the most under-sold work in the field.

Patterns · Evals · People · Hamel Husain
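A minimal sketch of what "evals first" can mean operationally: an autonomous loop is only enabled once an eval suite exists and clears a pass-rate bar. The case format (a prompt plus a check function), the function names, and the 0.9 threshold are assumptions chosen for the example, not figures from the source.

```python
def pass_rate(cases, agent):
    """Fraction of eval cases whose check accepts the agent's output."""
    passed = sum(1 for prompt, check in cases if check(agent(prompt)))
    return passed / len(cases)


def autonomy_allowed(cases, agent, threshold=0.9):
    # No eval suite means no autonomy, regardless of how good the demo looked.
    if not cases:
        return False
    return pass_rate(cases, agent) >= threshold


# Toy agent and suite: the third case fails on purpose.
toy_agent = str.upper
cases = [
    ("a", lambda out: out == "A"),
    ("b", lambda out: out == "B"),
    ("c", lambda out: out == "c"),
]
gate_open = autonomy_allowed(cases, toy_agent)
```

The failing cases are the input to error analysis; the gate itself is the small part.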

02 What clients should expect (honest numbers)

Time-to-merge: −30 to −60% on bounded ticket types (version bumps, lint, typos, simple endpoints)
PR review throughput: +50 to +200% with a critic-style review agent paired
Agent-authored code quality: junior to mid level; senior review still required (Ronacher, Hashimoto)
Costs without discipline: 3–10× higher; the levers are cache hygiene, batch routing, and seat plans
Refuse to repeat "X% of PRs are agent-authored" without normalization by complexity (lines × files × test churn). Vendor numbers are not comparable.
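A small worked example of why the normalization matters, using the lines × files × test churn product named above. The `PR` record, the sample numbers, and the choice to floor each factor at 1 (so a PR with zero test churn does not score zero) are assumptions made for the sketch.

```python
from dataclasses import dataclass


@dataclass
class PR:
    agent_authored: bool
    lines: int       # lines changed
    files: int       # files touched
    test_churn: int  # test lines changed


def complexity(pr):
    # lines x files x test churn, each floored at 1 (an assumption)
    # so that a missing factor does not zero out the whole score.
    return max(pr.lines, 1) * max(pr.files, 1) * max(pr.test_churn, 1)


def agent_share(prs):
    """Return (raw share of PRs, complexity-weighted share) for agent PRs."""
    raw = sum(pr.agent_authored for pr in prs) / len(prs)
    total = sum(complexity(pr) for pr in prs)
    weighted = sum(complexity(pr) for pr in prs if pr.agent_authored) / total
    return raw, weighted


prs = [
    PR(True, lines=10, files=1, test_churn=0),     # typo fix
    PR(True, lines=5, files=1, test_churn=0),      # lint
    PR(True, lines=5, files=1, test_churn=1),      # version bump
    PR(False, lines=100, files=4, test_churn=20),  # feature work
]
raw, weighted = agent_share(prs)
```

In this made-up sample, agents author 75% of PRs by count but roughly 0.25% by complexity-weighted share, which is why the two figures should never be quoted interchangeably.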

03 House-style design principles

01 · PLAN MODE · Plan before code
Never let an agent write code until the plan is approved. (Cherny)
02 · PLATFORM · Harness > model
Standardize on one gateway; avoid raw per-token billing where seat plans exist.
03 · EXTENSIBILITY · Skills, Hooks, Subagents
Capability bundles, deterministic guards, bounded specialists.
04 · INTEGRATION · MCP, with guardrails
SSO, audit, read-only default, scoped tokens, supply-chain review.
05 · RIGOR · Evals as prerequisite
Working eval suite before any autonomous loop ships.
06 · DIAGNOSTICS · Context, not prompts
Audit via write / select / compress / isolate.
07 · ARCHITECTURE · Single-threaded writes
Multi-agent only when parallel and conflict-free.
08 · TRUST · Read before write
Especially incident/on-call and prod-data access.
Eight principles we install in every engagement. The first four are platform; the next four are practice.
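Several of these principles (plan before code, read-only default, scoped access, read before write) can be enforced by a deterministic guard that runs before every tool call. The sketch below shows one possible shape; the tool names, the `Session` fields, and the path-prefix notion of scope are hypothetical and not tied to any particular harness or to the MCP specification.

```python
from dataclasses import dataclass, field

# Hypothetical names for tools that mutate state; everything else is a read.
WRITE_TOOLS = {"edit_file", "run_migration"}


@dataclass
class Session:
    plan_approved: bool = False                     # plan before code
    write_scopes: set = field(default_factory=set)  # empty = read-only default


def guard(session, tool, target):
    """Return (allowed, reason) for a proposed tool call."""
    if tool not in WRITE_TOOLS:
        return True, "read allowed"
    if not session.plan_approved:
        return False, "blocked: plan not approved"
    if not any(target.startswith(scope) for scope in session.write_scopes):
        return False, "blocked: target outside granted write scope"
    return True, "write allowed"


session = Session()
before = guard(session, "edit_file", "src/app.py")

session.plan_approved = True
session.write_scopes.add("src/")
after = guard(session, "edit_file", "src/app.py")
outside = guard(session, "run_migration", "prod/db")
```

Because the guard is plain code rather than a prompt instruction, its behavior is the same on every call and can be covered by ordinary unit tests.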

04 Risk slides (use directly with clients)

Real, dated, primary-sourced incidents. Full dossier in Production · Incidents.

05 Tooling stance (defaults)

Client profile | Recommended core
Modern web / TS shop | Claude Code + Cursor in parallel
Monorepo / large codebase | Sourcegraph Amp + Claude Code
OSS startup, cost-sensitive | Aider + OpenRouter / LiteLLM
Regulated enterprise / JetBrains shop | JetBrains AI / Junie + gateway
Self-hosting requirement | OpenHands or Continue.dev + gateway
Ticket-driven product team | Devin or Copilot agent for issue→PR + Claude Code
Always | Pair a high-recall and a low-noise code-review agent

Detail and pricing → Harnesses

06 Top-10 reading list

  1. Anthropic, "Building Effective Agents"
  2. Karpathy, Sequoia AI Ascent 2026 talk on agentic engineering
  3. Cherny, Latent Space + Lenny's
  4. Ronacher, "Agentic Coding Recommendations" + "A Year Of Vibes"
  5. Hashimoto, Zed blog conversation
  6. Lance Martin, "Context Engineering for Agents"
  7. Drew Breunig, "How Long Contexts Fail" + "How to Fix Your Context"
  8. Cognition, "Don't Build Multi-Agents" + "What's Actually Working"
  9. Anthropic, "How We Built Our Multi-Agent Research System"
  10. Hamel Husain, "LLM Evals: Everything You Need to Know"

07 Open research gaps to chase next