Definition. Validating LLM outputs for syntactic correctness, factuality, safety, and structural conformance — and validating inputs for adversarial content. Distinct from Claude Code Hooks, which fire on tool calls. Hooks block actions; guardrails validate text.
Four layers in order of preference
- Structural guidance — constrain generation to a valid format (Microsoft Guidance, OpenAI structured outputs, JSON-mode). Beats post-hoc validation.
- Syntactic guardrails — post-validate: valid JSON, parseable SQL, value in allowed range, diff applies cleanly, tests pass.
- Semantic guardrails — does the output match the source / pass a fact-check? Often a second LLM. ↔ verifier pattern.
- Safety guardrails — bad-word lists for easy cases; LLM evaluator for nuanced (toxicity, PII, prompt-injection echoes). Apply to inputs too — see incident dossier.
Yan's Guardrails pattern · NeMo-Guardrails · Microsoft Guidance
Sell as a single audit deliverable: "your agent's defense-in-depth map, with the gaps named." Separate hooks from guardrails on the whiteboard; both belong in the same diagram in the order shown.