Source: research/patterns.md

SWE-bench Verified, May 2026:

- Claude Mythos Preview: 93.9%
- Claude Opus 4.7 (Adaptive): 87.6%
- GPT-5.3 Codex: 85.0%
- OpenHands + Claude 4 (OSS top): 72.0%
- Average across 83 models: 63.4%

On SWE-bench Pro, scaffolding alone moves the same model 5–15 points. Context retrieval is the bottleneck, not raw capability.

SWE-bench Verified

Don't let clients pick by SWE-bench score alone. Cite the 5–15 point harness delta; it reinforces the "invest in harness, swap models" thesis.
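
As a rough illustration of why score alone is a weak selector, the sketch below compares the gaps between adjacent entries in the scores listed above against the 5–15 point harness delta. It assumes, which the source does not state, that a swing measured on SWE-bench Pro is a fair yardstick for gaps on SWE-bench Verified. The function names and the three labels are hypothetical and exist only for this example.

```python
# Scores as listed above (SWE-bench Verified, May 2026), in percent.
SCORES = {
    "Claude Mythos Preview": 93.9,
    "Claude Opus 4.7 (Adaptive)": 87.6,
    "GPT-5.3 Codex": 85.0,
    "OpenHands + Claude 4 (OSS top)": 72.0,
}

# Harness delta range quoted above for SWE-bench Pro, in points.
# Assumption: a swing of similar size is a fair yardstick for these gaps.
HARNESS_DELTA_MIN = 5.0
HARNESS_DELTA_MAX = 15.0


def classify_gap(gap: float) -> str:
    """Label a score gap relative to the quoted harness delta range."""
    if gap < HARNESS_DELTA_MIN:
        return "smaller than the harness delta"
    if gap <= HARNESS_DELTA_MAX:
        return "inside the harness delta range"
    return "larger than the harness delta"


def compare_adjacent(scores: dict[str, float]) -> list[tuple[str, str, float, str]]:
    """Rank entries by score and classify the gap between each neighbouring pair."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    results = []
    for (upper_name, upper), (lower_name, lower) in zip(ranked, ranked[1:]):
        gap = round(upper - lower, 1)
        results.append((upper_name, lower_name, gap, classify_gap(gap)))
    return results


if __name__ == "__main__":
    for upper_name, lower_name, gap, label in compare_adjacent(SCORES):
        print(f"{upper_name} vs {lower_name}: {gap} points, {label}")
```

Under that assumption, none of the adjacent gaps in the list exceeds 15 points, so a harness change of the quoted size could in principle close any of them. That is the arithmetic behind citing the harness delta alongside the leaderboard.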