Architecture
Cloudflare Workers orchestrate AI requests at the edge.
D1 holds structured job context; Vectorize + R2 serve retrieval.
AI Gateway handles model routing, caching, and logging.
Job-scoped context is enforced at every layer.
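The four services above map onto Worker bindings. A minimal sketch of the environment shape, assuming binding names that are illustrative only (real types come from @cloudflare/workers-types; stubs keep the sketch self-contained):

```typescript
// Stub types standing in for @cloudflare/workers-types; assumptions, not the real API.
type D1Database = unknown;
type VectorizeIndex = unknown;
type R2Bucket = unknown;
type Ai = unknown;

// Hypothetical Worker bindings for this stack; binding names are assumptions.
interface Env {
  DB: D1Database;            // D1: structured job context (jobs, clients, properties, equipment)
  JOB_INDEX: VectorizeIndex; // Vectorize: embeddings over notes and event history
  EVIDENCE: R2Bucket;        // R2: raw documents backing retrieved chunks
  AI: Ai;                    // model inference, fronted by AI Gateway for routing/caching/logging
}
```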
System Context
Decision
A Cloudflare-first architecture was chosen for low global latency, tightly integrated services, and predictable cost.
Data Flow (Ingestion → Retrieval → Response)
- Structured context is pulled from D1 (job, client, property, equipment).
- Semantic retrieval queries Vectorize for notes and event history.
- Prompt assembly combines structured facts + retrieved evidence.
- Model inference runs via AI Gateway with logging and caching.
- Post-processing enforces schema and citations, then logs an audit trail.
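The prompt-assembly step above can be sketched as a pure function. Field names and the prompt layout are assumptions, not the real schema; the key idea is that each evidence chunk keeps its ID so post-processing can verify the model's citations:

```typescript
// Illustrative shapes; the real D1 schema and Vectorize metadata will differ.
interface JobFacts { jobId: string; client: string; property: string; equipment: string[] }
interface Evidence { id: string; text: string; score: number }

// Combine structured facts (from D1) with retrieved evidence (from Vectorize)
// into one grounded prompt. Evidence is ordered by relevance and tagged with
// its ID so citations in the output can be checked against real chunks.
function assemblePrompt(facts: JobFacts, evidence: Evidence[], question: string): string {
  const factBlock = [
    `Job: ${facts.jobId}`,
    `Client: ${facts.client}`,
    `Property: ${facts.property}`,
    `Equipment: ${facts.equipment.join(", ")}`,
  ].join("\n");
  const evidenceBlock = [...evidence]
    .sort((a, b) => b.score - a.score)    // highest-relevance first
    .map((e) => `[${e.id}] ${e.text}`)
    .join("\n");
  return `FACTS:\n${factBlock}\n\nEVIDENCE:\n${evidenceBlock}\n\n` +
         `QUESTION: ${question}\nAnswer using only the evidence above and cite evidence IDs.`;
}
```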
Tradeoff
We do not precompute everything: assembling context on demand keeps it fresh but adds per-request latency.
Tenancy Model
- Each request carries a tenant-scoped JWT.
- Retrieval filters include tenant_id and job_id.
- Audit logs store tenant context, evidence IDs, and output hashes.
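The audit-log entry described above can be sketched as follows. Field names are assumptions drawn from the list; the one deliberate choice shown is hashing the output rather than storing it raw, so the log proves what was emitted without duplicating tenant data:

```typescript
import { createHash } from "node:crypto";

// Hypothetical audit record; field names are assumptions based on the list above.
interface AuditEntry {
  tenantId: string;
  jobId: string;
  evidenceIds: string[]; // which retrieved chunks grounded the answer
  outputHash: string;    // SHA-256 of the model output, not the output itself
  ts: string;            // ISO-8601 timestamp
}

function auditEntry(tenantId: string, jobId: string, evidenceIds: string[], output: string): AuditEntry {
  return {
    tenantId,
    jobId,
    evidenceIds,
    outputHash: createHash("sha256").update(output).digest("hex"),
    ts: new Date().toISOString(),
  };
}
```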
Risk
Any missing tenant filter is a data isolation failure; these checks are non-negotiable.
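One way to make the check non-negotiable is to fail closed: retrieval can only receive a filter built by a guard that throws when scope is missing, so an unscoped query cannot be constructed by accident. A sketch, assuming JWT claim names `tenant_id` and `job_id` and a Vectorize-style metadata filter:

```typescript
// Claim names are assumptions; adapt to the real JWT payload.
interface TenantClaims { tenant_id?: string; job_id?: string }

// Build the mandatory retrieval filter from verified JWT claims.
// Throws instead of returning a partial filter: there is no unscoped fallback.
function tenantFilter(claims: TenantClaims): { tenant_id: string; job_id: string } {
  if (!claims.tenant_id || !claims.job_id) {
    throw new Error("unscoped retrieval rejected: tenant_id and job_id are required");
  }
  return { tenant_id: claims.tenant_id, job_id: claims.job_id };
}

// Hypothetical call site (Vectorize accepts a metadata `filter` option on query):
// const matches = await env.JOB_INDEX.query(embedding, { topK: 5, filter: tenantFilter(claims) });
```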