- Prompt cache — ~5 min TTL; massively reduces cost on stable prefixes. Cache-hit rate is a top-tier observability metric.
- Extended thinking — model spends additional tokens reasoning before output.
- Computer use — GUI-controlling tool surface; still niche for dev but relevant for QA loops and legacy-app automation.
Prompt caching is distinct from semantic caching. Yan's Caching pattern covers GPTCache-style semantic caching (matching requests by embedding similarity). Generally avoid it in agent loops, where the risk of a silent false match is high. Safe applications are narrow: pre-computed summaries keyed by item ID, or constrained input combinations a human can verify.
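The narrow "safe" case can be sketched as an exact-match cache keyed by item ID rather than by embedding similarity. This is a minimal illustration, not a reference to any particular library: `SummaryCache`, `summary_for`, and the `generate` callback are hypothetical names introduced here.

```python
from typing import Callable, Dict, Optional


class SummaryCache:
    """Exact-match cache for pre-computed summaries, keyed by item ID (hypothetical helper)."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}

    def put(self, item_id: str, summary: str) -> None:
        self._store[item_id] = summary

    def get(self, item_id: str) -> Optional[str]:
        # Exact dictionary lookup: a miss returns None, never a "similar" entry.
        return self._store.get(item_id)


def summary_for(
    item_id: str,
    cache: SummaryCache,
    generate: Callable[[str], str],
) -> str:
    """Return the cached summary for item_id, generating and storing it on a miss."""
    cached = cache.get(item_id)
    if cached is not None:
        return cached
    summary = generate(item_id)  # assumed to wrap the expensive model call
    cache.put(item_id, summary)
    return summary
```

Because the key is an opaque ID, a lookup either hits the exact item or misses. There is no similarity threshold to tune, so a near-duplicate ID such as `sku-10` cannot silently return the summary stored for `sku-1`.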
Auditing prompt-cache utilization is one of the highest-ROI quick wins; many teams spend 3–5× more on input tokens than they need to. Recommend semantic caching only against item IDs, never against free-text queries inside an agent loop.
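A prompt-cache audit can start from logged per-request usage records. The sketch below assumes each record is a dict with `input_tokens`, `cache_creation_input_tokens`, and `cache_read_input_tokens` fields, modeled on the usage object of Anthropic's Messages API. If your provider or logging layer uses different names, adjust accordingly. `cache_hit_rate` is a hypothetical helper, and "hit rate" is defined here as the share of all input tokens served from cache.

```python
from typing import Iterable, Mapping, Optional


def cache_hit_rate(usages: Iterable[Mapping[str, Optional[int]]]) -> float:
    """Fraction of input tokens that were read from the prompt cache.

    Field names are an assumption (see lead-in); missing or None values count as 0.
    """
    read = written = uncached = 0
    for usage in usages:
        read += usage.get("cache_read_input_tokens") or 0
        written += usage.get("cache_creation_input_tokens") or 0
        uncached += usage.get("input_tokens") or 0
    total = read + written + uncached
    # With no input tokens logged, report 0.0 rather than dividing by zero.
    return read / total if total else 0.0


# Illustrative records: the first request writes the prefix, the second reads it.
example = [
    {"input_tokens": 100, "cache_creation_input_tokens": 900, "cache_read_input_tokens": 0},
    {"input_tokens": 100, "cache_creation_input_tokens": 0, "cache_read_input_tokens": 900},
]
rate = cache_hit_rate(example)
```

A persistently low rate on a workload with a long, stable system prompt usually points to a prefix that changes between calls or to gaps between requests that exceed the cache TTL.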