LLM Reliability Plan
- LLM output must be evidence-based or abstain.
- Structured JSON responses keep output machine-parseable and predictable for downstream consumers.
- Hybrid retrieval (vector + keyword) improves both recall and precision.
- Evaluation is continuous: a golden set, regression tests, and human review.
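The hybrid retrieval step can be sketched with reciprocal rank fusion, one common way to merge a vector result list with a keyword (e.g. BM25) result list. The document IDs and the k=60 constant below are illustrative assumptions, not the production configuration:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of doc IDs into one ranking.

    Each document scores the sum of 1 / (k + rank) over every list it
    appears in; k=60 is a commonly used default that damps the effect
    of top-rank outliers.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical hit lists from a vector index and a keyword index.
vector_hits = ["doc_a", "doc_b", "doc_c"]
keyword_hits = ["doc_b", "doc_d", "doc_a"]
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

Documents found by both retrievers (here `doc_a` and `doc_b`) rise to the top, which is the recall-plus-precision effect the plan relies on.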
Grounding Rules
- Every factual claim needs a citation.
- If no evidence is found, respond with a safe abstention.
- When structured and retrieved sources conflict, structured data takes precedence over unstructured retrieval.
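The first two rules can be sketched as a gate between generation and response: answer only when every extracted claim has supporting evidence, otherwise return the safe abstention. The claim and evidence record shapes below are hypothetical, not a fixed interface:

```python
SAFE_ABSTENTION = "I can't answer that from the available documentation."

def ground_or_abstain(claims, evidence):
    """Gate an answer on evidence coverage.

    claims:   [{"id": ..., "text": ...}] extracted from the draft answer.
    evidence: [{"claim_id": ..., "source_id": ..., "snippet": ...}] from retrieval.
    Returns the answer with citations, or the abstention if any claim
    lacks support.
    """
    citations = []
    for claim in claims:
        support = [e for e in evidence if e["claim_id"] == claim["id"]]
        if not support:
            # One unsupported claim is enough to abstain on the whole answer.
            return {"answer": SAFE_ABSTENTION, "citations": []}
        citations.extend(support)
    return {"answer": " ".join(c["text"] for c in claims), "citations": citations}
```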
Decision
We treat abstention as a product feature, not an error path.
Reliability Pipeline
Structured Output Schema
- answer: concise response.
- citations: list of evidence references with snippets.
- confidence: low | medium | high.
- follow_ups: suggested clarifying questions.
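A minimal sketch of the schema above as validated Python objects. The concrete field types and the `Citation` shape are assumptions beyond the bullet list; the point is that malformed model output is rejected before it can break the UI:

```python
import json
from dataclasses import dataclass, field

ALLOWED_CONFIDENCE = {"low", "medium", "high"}

@dataclass
class Citation:
    source_id: str  # hypothetical field name; any stable evidence reference works
    snippet: str

@dataclass
class Answer:
    answer: str
    citations: list
    confidence: str
    follow_ups: list = field(default_factory=list)

    def __post_init__(self):
        # Enforce the low | medium | high enum at parse time.
        if self.confidence not in ALLOWED_CONFIDENCE:
            raise ValueError(f"confidence must be one of {sorted(ALLOWED_CONFIDENCE)}")

def parse_model_output(raw: str) -> Answer:
    """Parse raw model JSON; raises instead of passing bad data downstream."""
    data = json.loads(raw)
    data["citations"] = [Citation(**c) for c in data.get("citations", [])]
    return Answer(**data)
```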
Tradeoff
Schema enforcement reduces model flexibility but prevents UI breakage.
Refusal & Escalation
- Safety-critical requests require explicit evidence.
- If risk is high and evidence is weak, escalate to a checklist-driven workflow or a human supervisor.
- Unsupported requests receive an explicit out-of-scope response.
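The three rules above can be condensed into a small routing function. The risk and evidence labels are illustrative assumptions, not the production policy; the ordering (scope check, then evidence, then risk) is the load-bearing part:

```python
def route(in_scope: bool, risk: str, evidence: str) -> str:
    """Map a request to an action per the refusal rules.

    risk:     "low" | "high"
    evidence: "none" | "weak" | "strong"
    """
    if not in_scope:
        return "out_of_scope"  # explicit out-of-scope response
    if evidence == "none":
        return "abstain"       # safe abstention, never a guess
    if risk == "high" and evidence != "strong":
        return "escalate"      # checklist or human supervisor
    return "answer"
```

A safety-critical (high-risk) request thus only receives a direct answer when evidence is explicitly strong, matching the rule that ungrounded advice in those workflows is unacceptable.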
Risk
Any ungrounded advice in electrical, gas, or refrigerant workflows is unacceptable.
Evaluation Plan
- Golden set: curated questions with expected citations.
- Regression tests: run on every prompt or retrieval change.
- Human review loop: weekly sampling of production answers.
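The golden-set check can be sketched as a runner that flags any question whose answer misses an expected citation. The `answer_fn` interface and record shapes are assumptions for illustration, not the actual harness:

```python
def run_golden_set(answer_fn, golden_set):
    """Return the questions whose answers miss an expected citation.

    golden_set: [{"question": str, "expected_citations": set of source ids}]
    answer_fn:  the system under test; assumed to return
                {"citations": [{"source_id": ...}, ...], ...}
    """
    failures = []
    for case in golden_set:
        result = answer_fn(case["question"])
        cited = {c["source_id"] for c in result["citations"]}
        # Fail the case unless every expected source was actually cited.
        if not case["expected_citations"] <= cited:
            failures.append(case["question"])
    return failures
```

Running this on every prompt or retrieval change gives the regression signal described above: an empty failure list gates the deploy.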