Guardrails — Patterns

Source: research/patterns.md

Definition. Guardrails validate LLM outputs for syntactic correctness, factuality, safety, and structural conformance, and they validate inputs for adversarial content. They are distinct from Claude Code Hooks, which fire on tool calls: hooks block actions; guardrails validate text.

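Guardrails ≠ Hooks. Hooks are deterministic and fire on tool actions. Guardrails fire on text and are often probabilistic: the semantic and safety layers usually rely on an LLM evaluator, although syntactic checks are deterministic. Most clients conflate the two.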
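
To make the distinction concrete, here is a minimal Python sketch under stated assumptions. `ToolCall`, `pre_tool_hook`, and `output_guardrail` are hypothetical names; they do not model Claude Code's actual hook configuration. `judge` stands in for whatever classifier or LLM evaluator scores the text.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToolCall:
    # Hypothetical shape of a tool invocation the agent wants to make.
    tool: str
    argument: str


# Hook: a deterministic rule evaluated on an action before it runs.
BLOCKED_COMMAND_PREFIXES = ("rm -rf", "git push --force")


def pre_tool_hook(call: ToolCall) -> bool:
    """Return True to allow the tool call, False to block it."""
    if call.tool == "bash":
        # str.startswith accepts a tuple, so any listed prefix blocks the call.
        return not call.argument.strip().startswith(BLOCKED_COMMAND_PREFIXES)
    return True


# Guardrail: a check on generated text. The score comes from a judge that may be
# probabilistic (e.g. an LLM evaluator), so the verdict is a threshold decision.
def output_guardrail(
    text: str, judge: Callable[[str], float], threshold: float = 0.5
) -> bool:
    """Return True if the judge's risk score for `text` is below the threshold."""
    return judge(text) < threshold
```

The hook always gives the same answer for the same tool call; the guardrail's answer is only as stable as the judge behind it.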

Four layers in order of preference

Structural guidance: constrain generation to a valid format (Microsoft Guidance, OpenAI structured outputs, JSON mode). Preferable to post-hoc validation, because the format is enforced during decoding rather than checked afterwards.
Syntactic guardrails: post-validate that the JSON is valid, the SQL parses, values fall in the allowed range, the diff applies cleanly, and the tests pass.
Semantic guardrails: does the output match the source or pass a fact-check? Often implemented with a second LLM. ↔ verifier pattern.
Safety guardrails: bad-word lists for easy cases; an LLM evaluator for nuanced ones (toxicity, PII, prompt-injection echoes). Apply them to inputs too; see incident dossier.
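
Structural guidance works by restricting which tokens may be emitted at each step, instead of checking the result afterwards. The toy sketch below assumes a word-piece vocabulary and a hypothetical `score_next` function in place of a real model's logits; it masks every token that cannot extend the prefix toward one of the allowed values, so the output is valid by construction. Libraries such as Microsoft Guidance apply the same idea to real tokenizers and grammars; this sketch does not use their APIs.

```python
from typing import Callable, Sequence


def constrained_choice(
    allowed: Sequence[str],
    vocab: Sequence[str],
    score_next: Callable[[str, str], float],
    max_steps: int = 20,
) -> str:
    """Greedily build a string, emitting only tokens that keep the prefix
    consistent with at least one allowed value (an enum-style constraint).
    Assumes every vocabulary token is non-empty."""
    out = ""
    for _ in range(max_steps):
        if out in allowed:
            # Stop at the first complete allowed value (toy simplification).
            return out
        # Mask: keep only tokens for which out + tok is a prefix of an allowed value.
        candidates = [
            tok for tok in vocab
            if any(value.startswith(out + tok) for value in allowed)
        ]
        if not candidates:
            raise ValueError(f"no valid continuation from {out!r}")
        # Among unmasked tokens, pick the one the stand-in model scores highest.
        out += max(candidates, key=lambda tok: score_next(out, tok))
    raise ValueError("max_steps exceeded")


vocab = ["po", "neg", "sit", "ive", "ative", "!"]
allowed = ["positive", "negative"]
# Even a "model" that always prefers "!" cannot escape the constraint.
result = constrained_choice(allowed, vocab, lambda prefix, tok: 1.0 if tok == "!" else 0.0)
```

Because invalid tokens are masked before selection, no post-hoc format check is needed for this output; the later layers still apply to its content.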
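
The post-hoc layers can be chained in the order listed. This is a minimal Python sketch under several assumptions: the model is expected to return JSON with hypothetical `answer` and `confidence` fields, `BLOCKLIST` is a placeholder, and `semantic_check` is a hypothetical callable standing in for a second-LLM verifier. The safety check runs on the input before generation as well as on the output.

```python
import json
from typing import Callable, Optional

# Placeholder blocklist for the easy cases; nuanced cases need an evaluator.
BLOCKLIST = {"ignore previous instructions", "ssn:"}


def safety_check(text: str) -> Optional[str]:
    """Return a reason string if the text trips the blocklist, else None."""
    lowered = text.lower()
    for phrase in BLOCKLIST:
        if phrase in lowered:
            return f"blocked phrase: {phrase!r}"
    return None


def syntactic_check(raw: str) -> tuple[Optional[dict], Optional[str]]:
    """Parse JSON, then validate the (assumed) shape and value range."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, f"invalid JSON: {exc.msg}"
    if not isinstance(data, dict) or not isinstance(data.get("answer"), str):
        return None, "missing string field 'answer'"
    conf = data.get("confidence")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(conf, bool) or not isinstance(conf, (int, float)) or not 0.0 <= conf <= 1.0:
        return None, "'confidence' must be a number in [0, 1]"
    return data, None


def run_guardrails(
    user_input: str,
    generate: Callable[[str], str],
    semantic_check: Callable[[str, dict], bool],
) -> dict:
    """Input safety, then generation, then syntactic, semantic, and output safety checks."""
    if (reason := safety_check(user_input)):
        return {"ok": False, "stage": "input_safety", "reason": reason}
    raw = generate(user_input)
    data, reason = syntactic_check(raw)
    if reason:
        return {"ok": False, "stage": "syntactic", "reason": reason}
    # Here the "source" is the user input; with retrieval it would be the retrieved context.
    if not semantic_check(user_input, data):
        return {"ok": False, "stage": "semantic", "reason": "verifier rejected output"}
    if (reason := safety_check(data["answer"])):
        return {"ok": False, "stage": "output_safety", "reason": reason}
    return {"ok": True, "data": data}
```

Returning the failing stage, rather than a bare boolean, makes it easy to log which layer caught what, which feeds directly into the gap analysis.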

Eugene Yan's Guardrails pattern · NeMo-Guardrails · Microsoft Guidance

Sell this as a single audit deliverable: "your agent's defense-in-depth map, with the gaps named." Keep hooks and guardrails separate on the whiteboard, but draw both on the same diagram, with the guardrail layers in the order listed above.
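
One way to produce the "gaps named" part of that deliverable is to inventory each existing control by layer and surface (input, output, tool call) and list the cells nobody covers. A minimal sketch; the `Control` record, the layer and surface names, and the `APPLICABLE` cell list are hypothetical choices for illustration.

```python
from dataclasses import dataclass

# Cells worth auditing, with guardrail layers in order of preference and hooks
# on the same map. Which cells apply is an assumption of this sketch.
APPLICABLE = [
    ("structural", "output"),
    ("syntactic", "output"),
    ("semantic", "output"),
    ("safety", "input"),
    ("safety", "output"),
    ("hook", "tool_call"),
]


@dataclass(frozen=True)
class Control:
    name: str     # what the client actually runs
    layer: str    # one of the layer names above
    surface: str  # "input", "output", or "tool_call"


def find_gaps(controls: list[Control]) -> list[tuple[str, str]]:
    """Return applicable (layer, surface) cells with no control, in map order."""
    covered = {(c.layer, c.surface) for c in controls}
    return [cell for cell in APPLICABLE if cell not in covered]


inventory = [
    Control("JSON mode", "structural", "output"),
    Control("schema validation", "syntactic", "output"),
    Control("output moderation", "safety", "output"),
    Control("bash deny-list hook", "hook", "tool_call"),
]
gaps = find_gaps(inventory)
```

For this example inventory, the named gaps are the missing semantic check on outputs and the missing safety check on inputs.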