Jaggedness — Patterns

← All patterns Source: research/patterns.md

Definition. Frontier models pass the hardest PhD-level exams but lack the taste, judgment, and reliability of a human expert in the same domain. RL on benchmark distributions creates models that are "superhuman at test-taking" but unreliable in practice.

Why now. Sutskever also argues the era of pure scaling is over — another 100× would help but not transform. Returns now come from new learning paradigms, not raw compute.

Jaggedness. Frontier models spike past human experts on benchmarks but crater on neighbouring tasks the benchmark didn't cover. This is why agent-authored code remains junior-to-mid.

Ilya Sutskever · Dwarkesh Podcast (Nov 25 2025) · People · Sutskever

Use jaggedness as the vocabulary when execs are dazzled by SWE-bench scores. Pairs with the SWE-bench Pro finding that scaffolding moves the same model 5–15 points.