LLM Reliability Plan

Every LLM answer must be grounded in retrieved evidence; if it cannot be, the model abstains.
Structured JSON responses keep the output shape predictable and machine-parseable.
Hybrid retrieval (vector + keyword) improves recall and precision.
Evaluation is continuous: golden set, regressions, and human review.
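
One common way to combine a vector ranking with a keyword ranking is reciprocal rank fusion (RRF). The plan does not say which fusion method is used, so the sketch below is only an illustration under that assumption. The function name, the document ids, and the constant k = 60 are all hypothetical choices.

```python
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge several ranked lists of document ids into one ranking.

    Each document scores 1 / (k + rank) per list it appears in, so documents
    ranked well by both retrievers rise to the top. k = 60 is an assumed
    default, not a value taken from the plan.
    """
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties broken by id so the order is stable.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from the two retrievers.
vector_hits = ["a", "b", "c"]
keyword_hits = ["b", "d", "a"]
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

Documents found by both retrievers ("a" and "b" here) outrank those found by only one, which is the behavior hybrid retrieval is meant to provide.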

Grounding Rules

  • Every factual claim needs a citation.
  • If no evidence is found, respond with a safe abstention.
  • Structured data overrides unstructured retrieval.

Decision
We treat abstention as a product feature, not an error path.
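
The grounding rules above can be sketched as a small gate that runs before an answer is returned. This is a minimal illustration, not the actual implementation: the class names, the `structured` flag, and the abstention wording are assumptions, and "structured data overrides unstructured retrieval" is interpreted here as "cite only structured evidence when any exists".

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical wording; the plan does not specify the abstention message.
ABSTAIN_MESSAGE = "I could not find supporting evidence for this question."


@dataclass
class Evidence:
    source_id: str
    snippet: str
    structured: bool = False  # True when the item comes from structured data


@dataclass
class Answer:
    text: str
    citations: List[Evidence] = field(default_factory=list)
    abstained: bool = False


def ground(draft_text: str, evidence: List[Evidence]) -> Answer:
    """Apply the grounding rules to a drafted answer."""
    # Rule: no evidence means a safe abstention, returned as a normal result.
    if not evidence:
        return Answer(text=ABSTAIN_MESSAGE, abstained=True)
    # Rule: structured data overrides unstructured retrieval (assumed reading).
    structured = [e for e in evidence if e.structured]
    chosen = structured if structured else evidence
    # Rule: the answer always carries the citations that support it.
    return Answer(text=draft_text, citations=chosen)
```

Abstention comes back as an ordinary `Answer` rather than an exception, which matches the decision to treat it as a product feature.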

Reliability Pipeline

Structured Output Schema

  • answer: concise response.
  • citations: list of evidence references with snippets.
  • confidence: low | medium | high.
  • follow_ups: suggested clarifying questions.

Tradeoff
Schema enforcement reduces model flexibility but prevents UI breakage.
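
A validator for the four fields listed above might look like the following sketch, using only the standard library. The field names come from the schema; the citation keys `source_id` and `snippet` are assumptions, since the plan only says citations carry "evidence references with snippets".

```python
import json

ALLOWED_CONFIDENCE = {"low", "medium", "high"}
REQUIRED_FIELDS = {
    "answer": str,
    "citations": list,
    "confidence": str,
    "follow_ups": list,
}


def validate_response(raw: str) -> dict:
    """Parse raw model output and check it against the schema.

    Raises ValueError on any violation so the caller can retry or abstain
    instead of passing a malformed payload to the UI.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("top-level value must be an object")
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in data:
            raise ValueError(f"missing field: {name}")
        if not isinstance(data[name], expected_type):
            raise ValueError(f"field {name} must be {expected_type.__name__}")
    if data["confidence"] not in ALLOWED_CONFIDENCE:
        raise ValueError("confidence must be low, medium, or high")
    for citation in data["citations"]:
        # Assumed citation shape: {"source_id": ..., "snippet": ...}
        if not isinstance(citation, dict) or not {"source_id", "snippet"} <= citation.keys():
            raise ValueError("each citation needs source_id and snippet")
    return data
```

Rejecting the payload at this boundary is what keeps the UI from rendering a partially formed answer.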

Refusal & Escalation

  • Safety-critical requests require explicit evidence.
  • If high risk and evidence is weak, escalate to a checklist or supervisor.
  • Unsupported requests receive an explicit out-of-scope response.

Risk
Any ungrounded advice in electrical, gas, or refrigerant workflows is unacceptable.
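
The refusal and escalation rules can be read as a small routing function. The sketch below is one possible reading: the `Action` names, the `in_scope` flag, and the `min_evidence` threshold used to define "weak" evidence are all assumptions. The high-risk domains are taken from the risk note above.

```python
from enum import Enum


class Action(Enum):
    ANSWER = "answer"
    ABSTAIN = "abstain"
    ESCALATE = "escalate"          # hand off to a checklist or supervisor
    OUT_OF_SCOPE = "out_of_scope"


HIGH_RISK_DOMAINS = {"electrical", "gas", "refrigerant"}


def route(domain: str, in_scope: bool, evidence_count: int, min_evidence: int = 2) -> Action:
    """Decide how to handle a request before any answer is generated.

    min_evidence is an assumed threshold for "weak" evidence in
    high-risk domains; the plan does not define one.
    """
    if not in_scope:
        return Action.OUT_OF_SCOPE
    if domain in HIGH_RISK_DOMAINS and evidence_count < min_evidence:
        return Action.ESCALATE
    if evidence_count == 0:
        return Action.ABSTAIN
    return Action.ANSWER
```

The high-risk check runs before the generic abstention check, so a gas or electrical question with no evidence is escalated rather than simply declined.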

Evaluation Plan

  • Golden set: curated questions with expected citations.
  • Regression tests: run on every prompt or retrieval change.
  • Human review loop: weekly sampling of production answers.
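
A golden-set regression check could be as small as the sketch below. It assumes each golden case stores the set of citation ids an answer is expected to include, and that the system under test is wrapped in a hypothetical `answer_fn` that returns the ids it actually cited.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass
class GoldenCase:
    question: str
    expected_citations: Set[str]


def run_regression(cases: List[GoldenCase], answer_fn: Callable[[str], Set[str]]) -> List[str]:
    """Return the questions whose answers miss an expected citation."""
    failures = []
    for case in cases:
        cited = answer_fn(case.question)
        # A case passes only if every expected citation is present.
        if not case.expected_citations <= cited:
            failures.append(case.question)
    return failures
```

Running this on every prompt or retrieval change and failing the build when the returned list is non-empty would make citation regressions visible before release.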