MediFlow
At the end of a clinical day, the Operations Copilot proposes tomorrow morning's vitals review. The clinician approves it, edits it, or declines it. AI proposes; humans commit.
The problem
Most clinical workflow software falls into one of two failure modes. Either it is an old hospital IT product that treats the clinician as a data-entry clerk: a grid of forms and dropdowns optimized for billing, not for the clinician's actual decision flow. Or it is a modern SaaS product shaped like a B2B analytics dashboard: busy cards, equal-weight grids, generic blue chrome, the same UI vocabulary as a CRM. Neither lets a clinician walk into the workspace mid-day and instantly know who is next, what is urgent, what needs review.
Adding AI to either of those products without rethinking them makes it worse. An autonomous agent inside a data-entry product becomes a faster way to enter the wrong data. An autonomous agent inside a SaaS dashboard becomes a confident-sounding summary of busy cards. In a clinical context, an AI primitive that can act without confirmation is the wrong primitive even when it is right ninety-nine percent of the time. The one percent is the case that defines the product.
The right primitive is a copilot that proposes and a clinician that commits. The case study from here is about what that takes: under regulated healthcare constraints, in a four-month build, on top of a four-layer architecture that an agent-authored codebase cannot quietly violate.
The thesis
MediFlow is a clinical workspace for clinicians moving through a full day of patient care. It is built on Appwrite as the entire backend, on a strict four-layer Clean Architecture in src/, with an Operations Copilot — the safe-AI surface — sitting behind a server-side, provider-neutral runtime boundary.
The brand brief reads: Calm. Assured. Humane. That is not the standard SaaS positioning language; it reads closer to a defense-grade instrument spec. The product target is composure first, confidence second, urgency only when a real patient demands it. The Operations Copilot inherits that voice. It proposes; it does not act alone.
Six postures hold the safety claim together. Each one is a stance MediFlow takes about how the Operations Copilot, the workspace, and the codebase relate. None of them is optional; together they explain why "safe Operations Copilot" is a defensible phrase and not a marketing one.
- PHI-minimized by default — what enters the model, what does not.
- Writes need approval — AI proposes; humans commit.
- Server-side, provider-neutral runtime — runtime control over demo magic.
- Scope-driven authz — org boundaries are not decorative.
- Clean architecture enforcement — boundaries that survive agent-generated code.
- Agent-assisted development, human-reviewed merge — speed bounded by review fidelity.
The rest of this case study is the postures, in order, with the artifacts behind each one and the edges where each one currently leaks.
- Commits since first build: 533
- PRs merged: 104
- ADRs in the design trail: 6
- Superpowers plans: 16
Posture 1 — PHI-minimized by default
Stance
What enters the model is minimized by default; protected health information does not cross the agent-runtime boundary unless an explicit gate opens it.
Why we hold it
A clinical assistant whose default is "ingest everything" cannot be safe by inspection. The shape of the agent's input contract becomes the safety property. Defaulting the other way means a reviewer can read what the prompt assembler is allowed to attach and see the redaction in the code, not in policy.
In code
ADR-0004, the Operations Copilot ADR dated 2026-04-28, codifies the posture. The runtime exposes an ai_phi_enabled flag as the gate. Server-side prompt assembly composes the minimum-necessary input — patient identifiers reduced to surrogate IDs, free-text notes excluded by default, structured fields whitelisted rather than blacklisted. OpenAI's live web search is excluded from PHI workflows. Audit log entries record Operations Copilot actions without copying PHI into the log line.
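A minimal sketch of what allowlist-based prompt assembly can look like. The record shape, the field names, the surrogate mapper, and the choice to attach free-text notes when the gate is open are all assumptions made for illustration; only the ai_phi_enabled gate, the surrogate IDs, and the allowlist idea come from ADR-0004 as described above.

```typescript
// Hypothetical record shape; field names are illustrative, not MediFlow's schema.
interface PatientRecord {
  patientId: string;
  fullName: string;
  dateOfBirth: string;
  freeTextNotes: string;
  heartRate: number;
  systolic: number;
  diastolic: number;
}

// Structured fields the assembler may attach. Anything not listed is dropped.
const ALLOWED_FIELDS = ["heartRate", "systolic", "diastolic"] as const;
type AllowedField = (typeof ALLOWED_FIELDS)[number];

interface PromptInput {
  surrogateId: string;
  fields: Partial<Record<AllowedField, number>>;
  notes?: string; // attached only when the gate is open (assumed behavior)
}

interface AssemblyOptions {
  aiPhiEnabled: boolean; // mirrors the ai_phi_enabled gate
  surrogateFor: (patientId: string) => string; // hypothetical ID mapper
}

// Hands out stable, meaningless IDs in first-seen order.
function makeSurrogateMapper(): (patientId: string) => string {
  const seen = new Map<string, string>();
  return (patientId) => {
    let surrogate = seen.get(patientId);
    if (surrogate === undefined) {
      surrogate = `pt-${seen.size + 1}`;
      seen.set(patientId, surrogate);
    }
    return surrogate;
  };
}

function assemblePromptInput(record: PatientRecord, opts: AssemblyOptions): PromptInput {
  const fields: Partial<Record<AllowedField, number>> = {};
  for (const key of ALLOWED_FIELDS) {
    fields[key] = record[key]; // copy allowlisted fields only
  }
  const input: PromptInput = { surrogateId: opts.surrogateFor(record.patientId), fields };
  if (opts.aiPhiEnabled) {
    input.notes = record.freeTextNotes; // crosses the boundary only behind the gate
  }
  return input;
}
```

The point of the shape is reviewability: a reader can see from ALLOWED_FIELDS alone what the model is allowed to receive, without tracing call sites.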
Where it leaks
ADR-0004 documents ai_phi_enabled as a required gate but leaves its shape — global per deployment, per organization, or per user — implementation-deferred. Until the shape lands, the gate is policy-by-convention rather than enforced-by-data. A redaction-test audit cadence (sample prompts before they reach the model, confirm no PHI surfaces) is on the work list, not in place.
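One way the redaction-test audit could work, sketched under assumptions: sample assembled prompts before they reach the model and check each against the known sensitive values of the records that produced them. The function and type names are hypothetical, and plain substring matching is a deliberately simple stand-in for whatever detection the real audit would use.

```typescript
// A value that must never appear in a prompt for a sampled record (hypothetical shape).
interface SensitiveValue {
  label: string; // which field the value came from, e.g. "fullName"
  value: string;
}

interface RedactionFinding {
  promptIndex: number;
  label: string; // the finding names the field, never the leaked value itself
}

// Case-insensitive substring scan of each sampled prompt.
function findPhiLeaks(prompts: string[], sensitive: SensitiveValue[]): RedactionFinding[] {
  const findings: RedactionFinding[] = [];
  prompts.forEach((prompt, promptIndex) => {
    const haystack = prompt.toLowerCase();
    for (const { label, value } of sensitive) {
      if (value.length > 0 && haystack.includes(value.toLowerCase())) {
        findings.push({ promptIndex, label });
      }
    }
  });
  return findings;
}
```

Findings carry the field label rather than the matched text, so the audit output itself stays free of PHI, in line with the audit-log stance above.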
Posture 2 — Writes need approval
Stance
The Operations Copilot proposes; a human commits. The agent has no path to a write that bypasses an explicit approval.
Why we hold it
An AI that can act without confirmation in a clinical context is the wrong primitive even when it is correct most of the time. The case that defines the product is the case where the agent was confident and wrong. The cost of one such case crossing the boundary unobserved is structurally higher than the cost of asking a clinician for a click.
In code
The function fleet (patients-*, appointments-*, vitals-*, documents-*) is the canonical mutation boundary. Every patient or document write executes inside an Appwrite Function with declared scopes. The Operations Copilot runs as its own function — ai-operations-copilot — and emits proposals as structured payloads, not as mutations. The approval UI surfaces the proposed mutation in human-readable form with the affected fields highlighted. The user-context call to the mutation function is what actually writes. The copilot's runtime has no direct mutation path; it does not hold the credentials that would allow one.
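The propose-then-commit flow can be sketched as follows. Every type and function name here is hypothetical; the sketch only illustrates the property described above, that a proposal is inert data and the write happens in a separate call made on the user's behalf.

```typescript
// Hypothetical proposal payload: data describing a change, never the change itself.
interface FieldChange {
  field: string;
  from: string;
  to: string;
}

interface Proposal {
  proposalId: string;
  targetFunction: string; // a mutation function name (illustrative)
  targetId: string;
  changes: FieldChange[];
}

// The three outcomes a clinician can choose: approve, edit, or decline.
type Decision =
  | { kind: "approve" }
  | { kind: "edit"; changes: FieldChange[] }
  | { kind: "decline" };

// Stands in for the user-context call to the mutation function.
type MutationCall = (targetFunction: string, targetId: string, changes: FieldChange[]) => void;

// Returns true when a write was issued. The copilot never receives callAsUser.
function resolveProposal(proposal: Proposal, decision: Decision, callAsUser: MutationCall): boolean {
  if (decision.kind === "decline") {
    return false; // nothing is written
  }
  const changes = decision.kind === "edit" ? decision.changes : proposal.changes;
  callAsUser(proposal.targetFunction, proposal.targetId, changes);
  return true;
}
```

Because the write capability is a parameter supplied by the approval path, code that only produces Proposal values has nothing it could call to mutate state.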
Where it leaks
The approval gate is a UX surface. A clinician under pressure who clicks through proposals without reading them turns the gate ceremonial. The mitigation lives in how the approval UI presents the proposed change (affected fields highlighted, the original value adjacent), not in the architectural claim. That UX-side hardening still has to be verified for each proposal type.
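The presentation rule (affected fields only, original value next to the proposed one) can be sketched as a small diff step that feeds the approval UI. The names and the flat field map are assumptions for illustration.

```typescript
type FieldValue = string | number | null;
type FieldValues = Record<string, FieldValue>;

interface ReviewRow {
  field: string;
  original: FieldValue; // shown adjacent to the proposed value
  proposed: FieldValue;
}

// Emits one row per field the proposal would actually change, in a stable order.
function buildReviewRows(current: FieldValues, proposed: FieldValues): ReviewRow[] {
  const rows: ReviewRow[] = [];
  for (const field of Object.keys(proposed).sort()) {
    const original = current[field] ?? null; // a field absent today is shown as null
    const next = proposed[field] ?? null;
    if (original !== next) {
      rows.push({ field, original, proposed: next });
    }
  }
  return rows;
}
```

Unchanged fields are left out on purpose: the fewer rows a clinician has to scan, the less likely the approval click becomes a reflex.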
Posture 3 — Server-side, provider-neutral runtime
Stance
AI calls run server-side only. The application code talks to a MediFlow-owned interface, not to OpenAI directly. Provider choice is a runtime detail, not an architectural commitment.
Why we hold it
Browser-side AI calls leak credentials, fail to centralize logging, and tie the product to a single vendor's client library. A provider-neutral interface keeps the door open to swap the implementation — Anthropic, Vertex AI, an on-prem inference endpoint — without touching the application layer. In a regulated domain, the option to move providers is a compliance posture, not a fantasy.
In code
The ai-operations-copilot Appwrite Function is the runtime. The current implementation uses OpenAI Agents SDK 0.8.5, but the application layer imports a MediFlow-owned interface — it does not import OpenAI types. The interface defines what a copilot can do; an adapter binds it to a specific vendor. Zod contracts at the boundary check what the agent receives and emits.
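A minimal sketch of the interface-plus-adapter shape, under assumptions. The interface, the payload types, and the stub adapter are hypothetical names, and the hand-rolled guard stands in for a Zod schema only so the example has no dependencies.

```typescript
// MediFlow-owned contract (names hypothetical): no vendor types appear here.
interface CopilotRequest {
  surrogateId: string;
  task: string;
}

interface CopilotProposal {
  summary: string;
  changes: { field: string; to: string }[];
}

interface CopilotProvider {
  propose(request: CopilotRequest): Promise<unknown>; // raw vendor output, untrusted
}

// Boundary check on what the agent emits. In the real runtime this is a Zod contract.
function parseProposal(raw: unknown): CopilotProposal {
  if (typeof raw !== "object" || raw === null) throw new Error("proposal must be an object");
  const candidate = raw as { summary?: unknown; changes?: unknown };
  if (typeof candidate.summary !== "string") throw new Error("summary must be a string");
  if (!Array.isArray(candidate.changes)) throw new Error("changes must be an array");
  const changes = candidate.changes.map((item: unknown) => {
    if (typeof item !== "object" || item === null) throw new Error("change must be an object");
    const change = item as { field?: unknown; to?: unknown };
    if (typeof change.field !== "string" || typeof change.to !== "string") {
      throw new Error("change needs string 'field' and 'to'");
    }
    return { field: change.field, to: change.to };
  });
  return { summary: candidate.summary, changes };
}

// Application code depends on the interface only; the adapter is injected.
async function requestProposal(provider: CopilotProvider, request: CopilotRequest): Promise<CopilotProposal> {
  return parseProposal(await provider.propose(request));
}

// A stub adapter (hypothetical) showing that nothing above names a vendor.
class StubProvider implements CopilotProvider {
  async propose(request: CopilotRequest): Promise<unknown> {
    return { summary: `Review vitals for ${request.surrogateId}`, changes: [] };
  }
}
```

A vendor-backed adapter would implement the same CopilotProvider interface, and requestProposal would not change.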
The case for staying on Appwrite rather than moving the AI surface to a separate Vercel deployment is captured in migrationplantovercel.md in the MediFlow repo. The investigation looked at whether splitting AI off would simplify the runtime. The decision went against the split in order to keep the backend control surface (functions, auth, storage, audit) under one vendor. In a regulated context, splitting that surface across providers compounds the BAA-aware design work without adding capability.
Where it leaks
Provider-neutral is a discipline today, not a verified property. Until a second adapter exists in the tree, even a stub used only in tests, "the interface is portable" remains a claim. The work to land a second adapter is queued.
- Appwrite Functions: Server-side runtime for every AI call. The mutation boundary stays on the function side.
- OpenAI Agents SDK: Current copilot implementation behind a MediFlow-owned provider-neutral interface. Swappable without touching the application layer.
- Provider-neutral interface: MediFlow code does not import OpenAI types. The interface defines what a copilot can do; the adapter binds it to a vendor.
- Zod: Contracts at the runtime boundary. What the agent receives and emits is shape-checked.
Posture 4 — Scope-driven authz
Stance
Every mutation runs as an Appwrite Function with declared scopes. The scope declaration is the authorization model — there is no application-layer "permission check" layered above it.
Why we hold it
Layering custom permission checks on top of a scoped backend invites drift. The application says yes, the backend says yes by default, and a missed scope declaration becomes a quiet escalation. Treating scope as the authoritative primitive forces every authz change to land in one place — the function definition itself.
In code
scripts/deploy-appwrite-function.mjs declares the scopes per function. ADR-0003, dated 2026-03-27, documents the function-scopes decision alongside the org-context-truthfulness work that landed with it. authStore.ts now falls back to teams.listMemberships() when the embedded membership data from teams.list() is missing, preventing the auth store from silently downgrading a clinician with active permissions to readonly. The probe-type discipline (ADR-0003, decision #5) is the matching debugging rule: server-key probes and browser/user-context probes are not equivalent, and any debugging session against a function under regulated constraints must show both.
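The membership fallback can be sketched like this. The client interface below is a hypothetical, trimmed-down stand-in and not the Appwrite SDK's own types; it only mirrors the two calls named above. The function name and the role strings are likewise illustrative.

```typescript
// Hypothetical minimal shapes; the real SDK models carry more fields.
interface MembershipLike {
  userId: string;
  roles: string[];
}

interface TeamLike {
  id: string;
  memberships?: MembershipLike[]; // embedded data may be absent
}

interface TeamsClientLike {
  list(): Promise<{ teams: TeamLike[] }>;
  listMemberships(teamId: string): Promise<{ memberships: MembershipLike[] }>;
}

// Resolves a user's roles in a team without treating missing data as "no roles".
async function resolveRoles(client: TeamsClientLike, teamId: string, userId: string): Promise<string[]> {
  const { teams } = await client.list();
  const team = teams.find((t) => t.id === teamId);
  if (team === undefined) {
    return []; // the user is not in this team at all
  }
  // Missing embedded data means "unknown", so ask explicitly instead of downgrading.
  const memberships = team.memberships ?? (await client.listMemberships(teamId)).memberships;
  const mine = memberships.find((m) => m.userId === userId);
  return mine === undefined ? [] : mine.roles;
}
```

The distinction that matters is between an empty answer and no answer: only the former justifies rendering a clinician as readonly.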
Where it leaks
WorkspaceShell.tsx still renders hardcoded organization choices and names. ADR-0003 names this as UI debt. The fix is queued, not landed. Until it ships, the rendered org context in the shell cannot be used as evidence of the backend's view of org context — they can disagree.
Posture 5 — Clean architecture enforcement
Stance
Four layers, ESLint-enforced. The boundaries are load-bearing, not aspirational.
Why we hold it
An agent-assisted codebase produces more code per week than a human-paced one. Boundaries that depend on developer judgment do not survive that throughput. Boundaries enforced by lint, where a violation breaks the build, do. The cost of writing one ESLint rule is paid once; the cost of policing roughly 500 source files by code review is paid every day.
In code
ADR-0001 covers the architecture decision. The source tree is src/{domain,application,interface,infrastructure} with ESLint boundary rules on src/**. Approximate file counts at the snapshot: 10 domain, 85 application, 340 interface, 69 infrastructure. The composition root — the single allowed violation of the dependency direction — lives at src/infrastructure/bootstrap/. The Phase 0 through Phase 3 migration is narrated in commit subjects (the prefix is greppable in git log); the cutover commit is 225279a.
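The dependency rule that the lint configuration enforces can be illustrated as a plain function. This is a sketch, not the project's ESLint setup: the allowed-import table assumes the usual inward-only Clean Architecture direction, which the text above does not spell out, and the function names are hypothetical.

```typescript
type Layer = "domain" | "application" | "interface" | "infrastructure";

// Assumed dependency direction (inward only). The real rule set lives in the ESLint config.
const ALLOWED_IMPORTS: Record<Layer, readonly Layer[]> = {
  domain: [],
  application: ["domain"],
  interface: ["application", "domain"],
  infrastructure: ["application", "domain"],
};

// The composition root is the single sanctioned exception to the direction.
const COMPOSITION_ROOT = "src/infrastructure/bootstrap/";

function layerOf(path: string): Layer | null {
  const match = /^src\/(domain|application|interface|infrastructure)\//.exec(path);
  return match ? (match[1] as Layer) : null;
}

function isImportAllowed(fromPath: string, toPath: string): boolean {
  const from = layerOf(fromPath);
  const to = layerOf(toPath);
  if (from === null || to === null) return true; // outside the layered tree
  if (from === to) return true; // imports within a layer are fine
  if (fromPath.startsWith(COMPOSITION_ROOT)) return true; // wiring may reach any layer
  return ALLOWED_IMPORTS[from].includes(to);
}
```

For example, under these assumed rules `isImportAllowed("src/domain/patient.ts", "src/application/usecase.ts")` is false, while the same import from a file under src/infrastructure/bootstrap/ is allowed.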
Where it leaks
Phase 4 patient-mutation stabilization is in progress, not complete. Patient and document workflows were verified green on live Appwrite as of 2026-03-31. Non-patient and non-document workflows are still being verified one vertical slice at a time. The discipline forbidding broad repo-wide rewrites during stabilization is named explicitly in the repo's context.md and is the reason this case study quotes a snapshot rather than a final state.
Posture 6 — Agent-assisted development, human-reviewed merge
Stance
Most code in MediFlow is authored by Codex agents. Every pull request is reviewed and merged by a human. The Operations Copilot is one safe-AI surface inside the product; the agent-authored codebase is the other one, and it has the same posture: AI proposes; humans commit.
Why we hold it
Agent-driven contribution scales the work, not the review. A clinical codebase needs a human signature at the merge boundary. The discipline lives in the human PR review, not in the agent's confidence at proposal time. Skipping that boundary turns agent throughput into a liability.
In code
533 commits since 2026-01-24; latest 8d865f9. 104 PRs merged. Most feature branches are named codex/.... Merges are attributed to the Mujadarah account, the human reviewer. 16 Superpowers plans under docs/superpowers/plans/, most tied to merged PRs and authored in advance of the work. Worktree parallelism via .worktrees/ enables three or more independent UI surfaces to evolve concurrently. AGENTS.md (203 KB) and CLAUDE.md (38 KB) at the repo root capture the operational discipline that bounds the agents.
Where it leaks
The bottleneck shifts to review fidelity. 104 PRs in four months is roughly one to three merges per active day; sustained review at that cadence is the constraint that matters. Review-throughput against PR-creation rate is the metric to watch, not raw commit count. There is no automated check that catches a rubber-stamp merge; the discipline is human.
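One rough way to watch review throughput against PR-creation rate is a running count of opened-but-unmerged PRs per week. The sketch below is hypothetical, and it measures pace only: it says nothing about whether each review was careful.

```typescript
// Weekly counts of PRs opened and merged (shape is hypothetical).
interface WeekCounts {
  opened: number;
  merged: number;
}

// Running backlog of unmerged PRs at the end of each week, floored at zero.
function reviewBacklog(weeks: WeekCounts[]): number[] {
  let backlog = 0;
  return weeks.map((week) => {
    backlog = Math.max(0, backlog + week.opened - week.merged);
    return backlog;
  });
}
```

A backlog that grows week over week suggests creation is outpacing review; a backlog pinned at zero under high volume is the pattern worth a closer look for rubber-stamping.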
Trade-offs and open edges
The per-posture "Where it leaks" notes are the honest edges for each posture. This section is the cross-cutting list — debt and deferrals that do not fit cleanly into one posture, and the things the case study deliberately does not claim.
- WorkspaceShell.tsx renders hardcoded organization choices and names. Named as UI debt in ADR-0003. Queued for the fix, not yet landed.
- ADR-0004's ai_phi_enabled flag is documented as required but its shape (global per deployment, per organization, or per user) is implementation-deferred. The gate exists in name. Its enforcement shape is the next decision.
- Phase 4 patient-mutation stabilization is in progress. Patient and document workflows are verified green on live Appwrite as of 2026-03-31. Non-patient and non-document workflows are verified one vertical slice at a time, per the explicit discipline in the repo's context.md.
- The 241 KB of AGENTS.md and CLAUDE.md at the repo root capture operational discipline this case study does not summarize. The deeper read lives in the wiki dossier, not here.
- "Provider-neutral runtime" is a discipline today, not a verified property. Until a second adapter exists in the tree, the portability claim is a posture, not a test.
What the case study does not claim, and what would be needed to claim it for a production-grade healthcare deployment: signed Business Associate Agreements with downstream model and infrastructure providers; a redaction-test audit cadence with automated sampling; scope-coverage tests that enumerate every function and confirm its declared scopes match its actual capability; formal external review of the agent-runtime boundary; an incident-response plan for AI-mediated proposals that get approved despite being wrong.
This is a stabilization snapshot, not a launch announcement. The postures and the numbers are what the project can defend today, on a four-month-old codebase that has decided early what kind of product it is and how it intends to stay safe.