LLM Reliability Plan
- LLM output must be evidence-based or abstain.
- Structured JSON responses keep output machine-parseable and predictable for downstream consumers.
- Hybrid retrieval (vector + keyword) improves both recall and precision.
- Evaluation is continuous: a golden set, regression tests, and human review.
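The hybrid retrieval step can be sketched with reciprocal rank fusion, one common way to merge a vector result list with a keyword (e.g. BM25) result list. The document IDs and the k=60 constant below are illustrative assumptions, not the production configuration:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of doc IDs into one ranking.

    Each document scores the sum of 1 / (k + rank) over every list it
    appears in; k=60 is a commonly used default that damps the effect
    of top-rank outliers.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical hit lists from a vector index and a keyword index.
vector_hits = ["doc_a", "doc_b", "doc_c"]
keyword_hits = ["doc_b", "doc_d", "doc_a"]
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

Documents found by both retrievers (here `doc_a` and `doc_b`) rise to the top, which is the recall-plus-precision effect the plan relies on.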
Grounding Rules
- Every factual claim needs a citation.
- If no evidence is found, respond with a safe abstention.
- When structured and retrieved sources conflict, structured data takes precedence over unstructured retrieval.
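The first two rules can be sketched as a gate between generation and response: answer only when every extracted claim has supporting evidence, otherwise return the safe abstention. The claim and evidence record shapes below are hypothetical, not a fixed interface:

```python
SAFE_ABSTENTION = "I can't answer that from the available documentation."

def ground_or_abstain(claims, evidence):
    """Gate an answer on evidence coverage.

    claims:   [{"id": ..., "text": ...}] extracted from the draft answer.
    evidence: [{"claim_id": ..., "source_id": ..., "snippet": ...}] from retrieval.
    Returns the answer with citations, or the abstention if any claim
    lacks support.
    """
    citations = []
    for claim in claims:
        support = [e for e in evidence if e["claim_id"] == claim["id"]]
        if not support:
            # One unsupported claim is enough to abstain on the whole answer.
            return {"answer": SAFE_ABSTENTION, "citations": []}
        citations.extend(support)
    return {"answer": " ".join(c["text"] for c in claims), "citations": citations}
```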
Decision
We treat abstention as a product feature, not an error path.
Reliability Pipeline
Structured Output Schema
- answer: concise response.
- citations: list of evidence references with snippets.
- confidence: low | medium | high.
- follow_ups: suggested clarifying questions.
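A minimal sketch of the schema above as validated Python objects. The concrete field types and the `Citation` shape are assumptions beyond the bullet list; the point is that malformed model output is rejected before it can break the UI:

```python
import json
from dataclasses import dataclass, field

ALLOWED_CONFIDENCE = {"low", "medium", "high"}

@dataclass
class Citation:
    source_id: str  # hypothetical field name; any stable evidence reference works
    snippet: str

@dataclass
class Answer:
    answer: str
    citations: list
    confidence: str
    follow_ups: list = field(default_factory=list)

    def __post_init__(self):
        # Enforce the low | medium | high enum at parse time.
        if self.confidence not in ALLOWED_CONFIDENCE:
            raise ValueError(f"confidence must be one of {sorted(ALLOWED_CONFIDENCE)}")

def parse_model_output(raw: str) -> Answer:
    """Parse raw model JSON; raises instead of passing bad data downstream."""
    data = json.loads(raw)
    data["citations"] = [Citation(**c) for c in data.get("citations", [])]
    return Answer(**data)
```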
Tradeoff
Schema enforcement reduces model flexibility but prevents UI breakage.
Refusal & Escalation
- Safety-critical requests require explicit evidence.
- If risk is high and evidence is weak, escalate to a checklist-driven workflow or a human supervisor.
- Unsupported requests receive an explicit out-of-scope response.
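The three rules above can be condensed into a small routing function. The risk and evidence labels are illustrative assumptions, not the production policy; the ordering (scope check, then evidence, then risk) is the load-bearing part:

```python
def route(in_scope: bool, risk: str, evidence: str) -> str:
    """Map a request to an action per the refusal rules.

    risk:     "low" | "high"
    evidence: "none" | "weak" | "strong"
    """
    if not in_scope:
        return "out_of_scope"  # explicit out-of-scope response
    if evidence == "none":
        return "abstain"       # safe abstention, never a guess
    if risk == "high" and evidence != "strong":
        return "escalate"      # checklist or human supervisor
    return "answer"
```

A safety-critical (high-risk) request thus only receives a direct answer when evidence is explicitly strong, matching the rule that ungrounded advice in those workflows is unacceptable.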
Risk
Any ungrounded advice in electrical, gas, or refrigerant workflows is unacceptable.
Evaluation Plan
- Golden set: curated questions with expected citations.
- Regression tests: run on every prompt or retrieval change.
- Human review loop: weekly sampling of production answers.
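The golden-set check can be sketched as a runner that flags any question whose answer misses an expected citation. The `answer_fn` interface and record shapes are assumptions for illustration, not the actual harness:

```python
def run_golden_set(answer_fn, golden_set):
    """Return the questions whose answers miss an expected citation.

    golden_set: [{"question": str, "expected_citations": set of source ids}]
    answer_fn:  the system under test; assumed to return
                {"citations": [{"source_id": ...}, ...], ...}
    """
    failures = []
    for case in golden_set:
        result = answer_fn(case["question"])
        cited = {c["source_id"] for c in result["citations"]}
        # Fail the case unless every expected source was actually cited.
        if not case["expected_citations"] <= cited:
            failures.append(case["question"])
    return failures
```

Running this on every prompt or retrieval change gives the regression signal described above: an empty failure list gates the deploy.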