Architecture
Cloudflare Workers orchestrate AI requests at the edge.
D1 holds structured job context; Vectorize + R2 serve retrieval.
AI Gateway handles model routing, caching, and logging.
Job-scoped context is enforced at every layer.
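The four services above map onto Worker bindings. A minimal sketch of the environment shape, assuming binding names that are illustrative only (real types come from @cloudflare/workers-types; stubs keep the sketch self-contained):

```typescript
// Stub types standing in for @cloudflare/workers-types; assumptions, not the real API.
type D1Database = unknown;
type VectorizeIndex = unknown;
type R2Bucket = unknown;
type Ai = unknown;

// Hypothetical Worker bindings for this stack; binding names are assumptions.
interface Env {
  DB: D1Database;            // D1: structured job context (jobs, clients, properties, equipment)
  JOB_INDEX: VectorizeIndex; // Vectorize: embeddings over notes and event history
  EVIDENCE: R2Bucket;        // R2: raw documents backing retrieved chunks
  AI: Ai;                    // model inference, fronted by AI Gateway for routing/caching/logging
}
```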
System Context
Decision
A Cloudflare-first architecture was chosen for low global latency, tightly integrated services, and predictable cost.
Data Flow (Ingestion → Retrieval → Response)
- Structured context is pulled from D1 (job, client, property, equipment).
- Semantic retrieval queries Vectorize for notes and event history.
- Prompt assembly combines structured facts + retrieved evidence.
- Model inference runs via AI Gateway with logging and caching.
- Post-processing enforces schema and citations, then logs an audit trail.
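The prompt-assembly step above can be sketched as a pure function. Field names and the prompt layout are assumptions, not the real schema; the key idea is that each evidence chunk keeps its ID so post-processing can verify the model's citations:

```typescript
// Illustrative shapes; the real D1 schema and Vectorize metadata will differ.
interface JobFacts { jobId: string; client: string; property: string; equipment: string[] }
interface Evidence { id: string; text: string; score: number }

// Combine structured facts (from D1) with retrieved evidence (from Vectorize)
// into one grounded prompt. Evidence is ordered by relevance and tagged with
// its ID so citations in the output can be checked against real chunks.
function assemblePrompt(facts: JobFacts, evidence: Evidence[], question: string): string {
  const factBlock = [
    `Job: ${facts.jobId}`,
    `Client: ${facts.client}`,
    `Property: ${facts.property}`,
    `Equipment: ${facts.equipment.join(", ")}`,
  ].join("\n");
  const evidenceBlock = [...evidence]
    .sort((a, b) => b.score - a.score)    // highest-relevance first
    .map((e) => `[${e.id}] ${e.text}`)
    .join("\n");
  return `FACTS:\n${factBlock}\n\nEVIDENCE:\n${evidenceBlock}\n\n` +
         `QUESTION: ${question}\nAnswer using only the evidence above and cite evidence IDs.`;
}
```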
Tradeoff
We do not precompute everything: assembling context on demand keeps it fresh but adds per-request latency.
Tenancy Model
- Each request carries a tenant-scoped JWT.
- Retrieval filters include tenant_id and job_id.
- Audit logs store tenant context, evidence IDs, and output hashes.
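The audit-log entry described above can be sketched as follows. Field names are assumptions drawn from the list; the one deliberate choice shown is hashing the output rather than storing it raw, so the log proves what was emitted without duplicating tenant data:

```typescript
import { createHash } from "node:crypto";

// Hypothetical audit record; field names are assumptions based on the list above.
interface AuditEntry {
  tenantId: string;
  jobId: string;
  evidenceIds: string[]; // which retrieved chunks grounded the answer
  outputHash: string;    // SHA-256 of the model output, not the output itself
  ts: string;            // ISO-8601 timestamp
}

function auditEntry(tenantId: string, jobId: string, evidenceIds: string[], output: string): AuditEntry {
  return {
    tenantId,
    jobId,
    evidenceIds,
    outputHash: createHash("sha256").update(output).digest("hex"),
    ts: new Date().toISOString(),
  };
}
```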
Risk
Any missing tenant filter is a data isolation failure; these checks are non-negotiable.
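One way to make the check non-negotiable is to fail closed: retrieval can only receive a filter built by a guard that throws when scope is missing, so an unscoped query cannot be constructed by accident. A sketch, assuming JWT claim names `tenant_id` and `job_id` and a Vectorize-style metadata filter:

```typescript
// Claim names are assumptions; adapt to the real JWT payload.
interface TenantClaims { tenant_id?: string; job_id?: string }

// Build the mandatory retrieval filter from verified JWT claims.
// Throws instead of returning a partial filter: there is no unscoped fallback.
function tenantFilter(claims: TenantClaims): { tenant_id: string; job_id: string } {
  if (!claims.tenant_id || !claims.job_id) {
    throw new Error("unscoped retrieval rejected: tenant_id and job_id are required");
  }
  return { tenant_id: claims.tenant_id, job_id: claims.job_id };
}

// Hypothetical call site (Vectorize accepts a metadata `filter` option on query):
// const matches = await env.JOB_INDEX.query(embedding, { topK: 5, filter: tenantFilter(claims) });
```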