Capturing explicit and implicit signals from users to feed evals, error analysis, fine-tuning data, and reward models.
Signals worth capturing in agentic-coding deployments
- PR-level: merge vs discard · human edit distance · time-to-merge · reverts within 7 days.
- Review agent: comment resolved vs dismissed · human edit on agent comment · fraction of PRs where humans added comments the agent missed.
- Plan mode: plan accepted as-is vs revised vs rejected — direct proxy for plan quality.
- Eng-bot: thread closed vs escalated · follow-up question rate.
Closing the loop. These signals don't just measure — they become the next eval set and the next fine-tune corpus. Without instrumentation up front, the data is gone.
Build the instrumentation as part of the pilot, not after. Cheap version (week 1): structured logs of every agent decision + outcome, tagged with run-id, written to a single table.