Pattern · Discipline

Guardrails

Yan · NeMo · MS Guidance

Definition. Validating LLM outputs for syntactic correctness, factuality, safety, and structural conformance — and validating inputs for adversarial content. Distinct from Claude Code Hooks, which fire on tool calls. Hooks block actions; guardrails validate text.

[Diagram: INPUT GUARDRAIL (probabilistic; validates text) → HOOK (deterministic; block / allow) → TOOL (side effect; writes / reads) → OUTPUT GUARDRAIL (probabilistic; validate / reject)]
Defense-in-depth: both layers belong in the same diagram, in this order.
Guardrails ≠ Hooks. Hooks are deterministic and fire on tool actions. Guardrails are probabilistic and fire on text. Most clients conflate them.
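
A minimal sketch of that ordering, assuming a single tool call per turn. Every name here (`Verdict`, `hook`, `make_guardrail`, `toy_injection_score`, `run_turn`, the `BLOCKED_TOOLS` set) is hypothetical and not taken from Claude Code, NeMo-Guardrails, or Guidance. The toy scoring function stands in for a real classifier or LLM evaluator, which is where the probabilistic behavior would come from.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    allowed: bool
    reason: str = ""


# Hypothetical deterministic hook: an exact rule over the tool action.
BLOCKED_TOOLS = {"delete_file"}


def hook(tool_name: str) -> Verdict:
    if tool_name in BLOCKED_TOOLS:
        return Verdict(False, f"tool '{tool_name}' is blocked")
    return Verdict(True)


# Hypothetical guardrail: a risk score compared against a threshold.
def make_guardrail(score_fn: Callable[[str], float],
                   threshold: float) -> Callable[[str], Verdict]:
    def guardrail(text: str) -> Verdict:
        score = score_fn(text)
        if score >= threshold:
            return Verdict(False, f"risk score {score:.2f} >= {threshold}")
        return Verdict(True)
    return guardrail


def toy_injection_score(text: str) -> float:
    # Placeholder for a classifier or LLM evaluator (assumption).
    return 0.9 if "ignore previous instructions" in text.lower() else 0.1


def run_turn(user_text: str, tool_name: str,
             tool_fn: Callable[[str], str],
             input_guard: Callable[[str], Verdict],
             output_guard: Callable[[str], Verdict]) -> str:
    # 1. Input guardrail validates text before anything else runs.
    verdict = input_guard(user_text)
    if not verdict.allowed:
        return f"rejected input: {verdict.reason}"
    # 2. Hook decides whether the tool action may happen at all.
    verdict = hook(tool_name)
    if not verdict.allowed:
        return f"blocked action: {verdict.reason}"
    # 3. Tool runs (this is where side effects would occur).
    output = tool_fn(user_text)
    # 4. Output guardrail validates the resulting text.
    verdict = output_guard(output)
    if not verdict.allowed:
        return f"rejected output: {verdict.reason}"
    return output
```

The hook inspects only the action (the tool name) and the guardrails inspect only text, so the two checks stay separable on the whiteboard.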

Four layers in order of preference

  1. Structural guidance — constrain generation to a valid format (Microsoft Guidance, OpenAI structured outputs, JSON-mode). Beats post-hoc validation.
  2. Syntactic guardrails — post-validate: valid JSON, parseable SQL, value in allowed range, diff applies cleanly, tests pass.
  3. Semantic guardrails — does the output match the source / pass a fact-check? Often a second LLM. ↔ verifier pattern.
  4. Safety guardrails — bad-word lists for easy cases; an LLM evaluator for nuanced cases (toxicity, PII, prompt-injection echoes). Apply to inputs too — see incident dossier.
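
As an illustration of layer 2, here is a minimal sketch of a syntactic guardrail that post-validates a model response: valid JSON first, then a value in an allowed range. The function name `validate_rating`, the `rating` field, and the 1 to 5 range are assumptions made up for the example, not part of any of the libraries named above.

```python
import json
from typing import Tuple


def validate_rating(raw: str, lo: int = 1, hi: int = 5) -> Tuple[bool, str]:
    """Return (ok, reason) for a raw model response expected to be
    a JSON object like {"rating": 3}."""
    # Check 1: the text parses as JSON at all.
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return False, f"invalid JSON: {exc.msg}"
    # Check 2: the top-level value has the expected shape.
    if not isinstance(data, dict):
        return False, "expected a JSON object"
    rating = data.get("rating")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(rating, int) or isinstance(rating, bool):
        return False, "'rating' must be an integer"
    # Check 3: the value falls in the allowed range.
    if not lo <= rating <= hi:
        return False, f"'rating' {rating} outside [{lo}, {hi}]"
    return True, "ok"
```

A caller would typically retry or fall back when `ok` is false, and pass the reason back to the model or to a log. Checks like these are deterministic, which makes them cheap to run before any semantic or safety evaluator.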

Yan's Guardrails pattern · NeMo-Guardrails · Microsoft Guidance

Sell as a single audit deliverable: "your agent's defense-in-depth map, with the gaps named." Separate hooks from guardrails on the whiteboard; both belong in the same diagram in the order shown.