Caching · thinking · compute

Source: research/patterns.md

Prompt cache — ~5 min TTL; massively reduces cost on stable prefixes. Cache-hit rate is a top-tier observability metric.
Extended thinking — model spends additional tokens reasoning before output.
Computer use — GUI-controlling tool surface; still niche for dev but relevant for QA loops and legacy-app automation.
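Since cache-hit rate is called out as a key metric, here is a minimal sketch of one way to compute it from per-request token counts. The `Usage` record and its field names are hypothetical; map them to whatever usage fields your provider actually reports.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Per-request token counts (hypothetical field names)."""
    uncached_input_tokens: int  # input tokens processed at full price
    cache_read_tokens: int      # input tokens served from the prompt cache
    cache_write_tokens: int     # input tokens written to the cache on this request


def cache_hit_rate(usages: list[Usage]) -> float:
    """Fraction of all input tokens that were served from the cache."""
    read = sum(u.cache_read_tokens for u in usages)
    total = sum(
        u.uncached_input_tokens + u.cache_read_tokens + u.cache_write_tokens
        for u in usages
    )
    # Guard against an empty log so the metric is always defined.
    return read / total if total else 0.0
```

Measuring the rate in tokens rather than in requests is a design choice: it weights each request by how much of its prefix was actually reused, which tracks cost more closely than a simple hit/miss count.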

Prompt caching is distinct from semantic caching. Yan's Caching pattern covers GPTCache-style semantic caching, which matches requests by embedding similarity. Generally avoid it in agent loops, where the risk of a silent false match is high: two requests can embed close together yet need different answers. Safe applications are narrow: pre-computed summaries keyed by item ID, or constrained input combinations a human can verify.
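The narrow, safe case of caching pre-computed summaries against item IDs can be sketched as an exact-match lookup. This is an illustration under assumptions, not a reference implementation: `SummaryCache` and the `summarize` callable are hypothetical names, and `summarize` stands in for whatever model call produces a summary for one item.

```python
from typing import Callable


class SummaryCache:
    """Exact-match cache keyed by item ID, not by embedding similarity."""

    def __init__(self, summarize: Callable[[str], str]) -> None:
        self._summarize = summarize  # hypothetical: produces a summary for one item ID
        self._store: dict[str, str] = {}
        self.misses = 0

    def get(self, item_id: str) -> str:
        # A lookup either hits the exact ID or misses; there is no
        # "close enough" match that could silently return the wrong summary.
        if item_id not in self._store:
            self.misses += 1
            self._store[item_id] = self._summarize(item_id)
        return self._store[item_id]
```

Because the key is an identifier rather than free text, a human can verify any entry by looking up the item it claims to describe.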

Auditing prompt-cache utilization is one of the highest-ROI quick wins: many teams spend 3–5× more than they need to. Use semantic caching only against item IDs, never against free-text queries inside an agent loop.
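A rough way to size the audit is to compare input cost with and without cache hits on the stable prefix. The sketch below is back-of-the-envelope arithmetic only. The function names are hypothetical, and the default multipliers (cached reads at 0.1× and cache writes at 1.25× the base input price) are illustrative assumptions, so substitute your provider's actual pricing.

```python
def cached_input_cost(
    prefix_tokens: int,
    suffix_tokens: int,
    requests: int,
    hit_rate: float,
    read_mult: float = 0.1,    # assumed price multiplier for cached reads
    write_mult: float = 1.25,  # assumed price multiplier for cache writes
) -> float:
    """Input cost in units of one base-priced input token."""
    hits = requests * hit_rate
    misses = requests - hits
    # On a hit the prefix is read from cache; on a miss it is rewritten.
    prefix_cost = prefix_tokens * (hits * read_mult + misses * write_mult)
    # The changing suffix is always billed at the base price.
    suffix_cost = suffix_tokens * requests
    return prefix_cost + suffix_cost


def uncached_input_cost(prefix_tokens: int, suffix_tokens: int, requests: int) -> float:
    """Input cost when every request pays full price for every token."""
    return float((prefix_tokens + suffix_tokens) * requests)
```

For example, with a 9,000-token stable prefix, a 1,000-token suffix, 100 requests, and a 90% hit rate, `cached_input_cost` gives 293,500 token-units against 1,000,000 uncached, so the uncached run costs about 3.4 times as much under these assumed multipliers.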