Architecture

Cloudflare Workers orchestrate AI requests at the edge.
D1 holds structured job context; Vectorize + R2 serve retrieval.
AI Gateway handles model routing, caching, and logging.
Job-scoped context is enforced at every layer.
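The components above map onto Worker bindings. A minimal sketch of the environment shape with a runtime guard; the binding names (`DB`, `VECTOR_INDEX`, `BUCKET`, `AI_GATEWAY_URL`) are assumptions, not the real configuration:

```typescript
// Hypothetical binding names; the real ones live in wrangler config.
interface Env {
  DB: unknown;            // D1 database: structured job context
  VECTOR_INDEX: unknown;  // Vectorize index: semantic retrieval
  BUCKET: unknown;        // R2 bucket: documents backing retrieval
  AI_GATEWAY_URL: string; // AI Gateway endpoint for model inference
}

// Fail fast at startup if a binding is missing, rather than
// erroring partway through a request.
function assertBindings(env: Partial<Env>): asserts env is Env {
  for (const key of ["DB", "VECTOR_INDEX", "BUCKET", "AI_GATEWAY_URL"]) {
    if (env[key as keyof Env] == null) {
      throw new Error(`Missing binding: ${key}`);
    }
  }
}
```

A guard like this turns a misconfigured deployment into an immediate, named error instead of a confusing mid-request failure.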

System Context

Decision
Cloudflare-first architecture is chosen for global latency, integrated services, and cost predictability.

Data Flow (Ingestion → Retrieval → Response)

  1. Structured context pulled from D1 (job, client, property, equipment).
  2. Semantic retrieval queries Vectorize for notes and event history.
  3. Prompt assembly combines structured facts + retrieved evidence.
  4. Model inference runs via AI Gateway with logging and caching.
  5. Post-processing enforces schema and citations, then logs an audit trail.

Tradeoff
We do not precompute context: assembling it on demand keeps it fresh but adds per-request latency.
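Steps 1–3 can be sketched as a pure assembly function. The record shapes, field names, and prompt layout here are illustrative assumptions, not the production schema:

```typescript
// Illustrative shapes; real records come from D1 and Vectorize.
interface JobFacts {
  jobId: string;
  client: string;
  property: string;
  equipment: string[];
}
interface Evidence { id: string; text: string; score: number; }

// Combine structured facts with retrieved evidence into one prompt,
// tagging each snippet with its ID so the model can cite it.
function assemblePrompt(facts: JobFacts, evidence: Evidence[]): string {
  const header = [
    `Job: ${facts.jobId}`,
    `Client: ${facts.client}`,
    `Property: ${facts.property}`,
    `Equipment: ${facts.equipment.join(", ")}`,
  ].join("\n");
  const cited = [...evidence]
    .sort((a, b) => b.score - a.score) // highest-relevance first
    .map((e) => `[${e.id}] ${e.text}`)
    .join("\n");
  return `${header}\n\nEvidence:\n${cited}\n\nCite evidence IDs in your answer.`;
}
```

Keeping assembly pure (no I/O) makes step 5's schema and citation checks easy to unit-test against known inputs.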

Tenancy Model

  • Each request carries a tenant-scoped JWT.
  • Retrieval filters include tenant_id and job_id.
  • Audit logs store tenant context, evidence IDs, and output hashes.

Risk
Any missing tenant filter is a data isolation failure; these checks are non-negotiable.
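The retrieval-filter rule above can be enforced in one central place. A sketch, assuming metadata-style filters on the vector query; the claim names and filter shape are assumptions:

```typescript
// Claims decoded from the tenant-scoped JWT (shape assumed).
interface Claims { tenant_id?: string; job_id?: string; }

// Build the metadata filter for every retrieval query. Throwing on a
// missing claim turns a silent isolation failure into a hard error.
function buildRetrievalFilter(
  claims: Claims,
): { tenant_id: string; job_id: string } {
  if (!claims.tenant_id || !claims.job_id) {
    throw new Error(
      "Refusing unscoped retrieval: tenant_id and job_id are required",
    );
  }
  return { tenant_id: claims.tenant_id, job_id: claims.job_id };
}
```

Routing all retrieval through a single builder like this means "missing tenant filter" becomes a code path that cannot be reached, rather than a convention each call site must remember.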