How Classified builds AI operations, written as transferable principles you can apply to your own work. Everything here is drawn from real systems running in production on Zo: an intelligence platform processing live market data around the clock, a voice assistant that spawns and supervises background agents, an operational dashboard that ties a whole working life together, and a content-production terminal. The names and business details do not matter. The patterns do.
The core idea: build operations, not chatbots
A chatbot answers questions. An operation does work on a schedule, holds state, survives restarts, alerts you when something breaks, and gets more valuable every month because its knowledge accumulates.
The shift in thinking: stop asking "what can I ask the AI?" and start asking "what job can I permanently delegate?" A delegated job needs what any employee needs: a briefing, tools scoped to the role, a place to write down what it learned, and someone checking its work. The rest of this guide is how we provide each of those.
Principle 1: Files are the memory. Conversations are disposable.
The single highest-leverage habit. Chat history evaporates and does not scale. Files persist, and every future conversation can read them.
In practice, every project we run has one briefing file at its root. Ours is called AGENTS.md; the name is irrelevant. It answers, in a page or two:
- What is this? One sentence of identity.
- How does it run? Service names, ports, and the exact copy-pasteable commands to build and restart it.
- Where is everything? A map of the key paths and what each does.
- What will bite you? The gotchas, stated specifically.
- What is happening now? Current branch, current focus, known issues.
Any agent, or any human, reads that file first and is productive in minutes instead of re-deriving the project from scratch. When a durable fact changes (a port, a command, a new gotcha), the briefing is updated in the same piece of work. Briefings are orientation, not changelogs: when something is wrong, fix the line, do not append a correction.
The test of a good briefing: could a fresh agent with no history do useful work after reading only that file? If not, the briefing is missing something.
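A small check that runs alongside the project helps keep the briefing complete. The sketch below assumes the briefing uses one markdown heading per question. The heading names and the function missingSections are hypothetical, so use whatever headings your file actually has. It reports which of the five questions are missing or have an empty section. It cannot judge whether a section is good, only whether it exists.

```typescript
// Hypothetical section headings, one per question the briefing must answer.
const REQUIRED_SECTIONS = [
  "What is this?",
  "How does it run?",
  "Where is everything?",
  "What will bite you?",
  "What is happening now?",
] as const;

// Returns the required sections that are absent or have no body text.
// Assumes each section starts with a markdown heading line ("#", "##", ...).
function missingSections(briefing: string): string[] {
  const bodies = new Map<string, string>();
  let current: string | null = null;
  for (const line of briefing.split("\n")) {
    const heading = line.match(/^#+\s+(.*?)\s*$/);
    if (heading) {
      current = heading[1];
      bodies.set(current, "");
    } else if (current !== null) {
      // Accumulate non-heading text under the most recent heading.
      bodies.set(current, bodies.get(current)! + line.trim());
    }
  }
  return REQUIRED_SECTIONS.filter((name) => (bodies.get(name) ?? "").length === 0);
}
```

Running this in CI or before each agent session turns "the briefing is missing something" from a judgment call into a visible failure. The fresh-agent test is still the real measure.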
Principle 2: Separate knowledge from behavior
An agent is composed at the moment you spawn it from two ingredients:
- Project context: the what. Comes from the briefing file plus the code itself, pulled on demand.
- Role: the how. A small behavior preset layered on top: research (read and cite, change nothing), code (implement, verify before claiming done), review (read-only and skeptical), content (brand voice, no codebase access).
"Code on project X" is X's briefing plus the coding role. "Review project X" is the same briefing plus the review role. You do not need a zoo of bespoke agents. You need good briefings and a handful of roles.
This is why long threads never have to "remember" anything. Each job starts clean, reads the briefing, does the work, and writes durable changes back to files. Knowledge is expensive to build, so we store it. Behavior is cheap, so we swap it.
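The composition can be sketched as a minimal function, assuming that spawning an agent takes a system prompt plus a couple of permission flags. The four role names come from the text. The ROLES table, the composeAgent function, and the prompt wording are hypothetical. The point is structural: knowledge comes from the project's briefing, behavior comes from a small fixed set of presets, and nothing from earlier conversations is carried over.

```typescript
type RoleName = "research" | "code" | "review" | "content";

interface Role {
  behavior: string;        // how the agent should work
  canEdit: boolean;        // may it change files?
  codebaseAccess: boolean; // may it read the project's code?
}

// Hypothetical role presets: small, reusable, and independent of any project.
const ROLES: Record<RoleName, Role> = {
  research: { behavior: "Read and cite sources. Change nothing.", canEdit: false, codebaseAccess: true },
  code: { behavior: "Implement the change. Verify before claiming done.", canEdit: true, codebaseAccess: true },
  review: { behavior: "Read-only. Be skeptical and back every finding with evidence.", canEdit: false, codebaseAccess: true },
  content: { behavior: "Write in the brand voice.", canEdit: false, codebaseAccess: false },
};

interface AgentSpec {
  systemPrompt: string;
  canEdit: boolean;
  codebaseAccess: boolean;
}

// A fresh agent = project briefing (knowledge) + role preset (behavior).
// Nothing from earlier conversations is included.
function composeAgent(briefing: string, role: RoleName): AgentSpec {
  const preset = ROLES[role];
  return {
    systemPrompt: `Role: ${role}. ${preset.behavior}\n\nProject briefing:\n${briefing}`,
    canEdit: preset.canEdit,
    codebaseAccess: preset.codebaseAccess,
  };
}
```

Adding a project means writing one briefing, and adding a behavior means adding one entry to ROLES. The two grow independently instead of multiplying into bespoke agents.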
Principle 3: Boring stacks, few moving parts
Every production system we run sits on deliberately unexciting foundations: a mainstream web framework or a single TypeScript server, SQLite or Postgres for data, plain processes under a process supervisor. No exotic infrastructure.
The reasoning is operational, not aesthetic. You, plus an AI, are the entire engineering team. Every additional moving part is something that can break at 2am, and an AI debugs boring, well-documented technology far better than clever architecture. SQLite in particular is criminally underrated for this: one of our systems runs its entire live product on a single SQLite file holding gigabytes of data, with one writer and a simple backup script. The discipline it forces (exactly one process writes to the database) is a feature, not a limitation. The one time we accidentally ran two writers against that file, lock contention made the whole product feel broken. The constraint was the architecture telling us something.
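Inside the one writer process, the same rule can be enforced by sending every write through a single queue that runs jobs strictly in order. This is a sketch that assumes your database layer exposes writes as async functions. It implies no particular SQLite library, and the class name SingleWriter is hypothetical. Across processes, the equivalent rule is simply that only the worker opens the database for writing.

```typescript
type WriteJob<T> = () => Promise<T>;

// Serializes all writes: each job starts only after the previous one settles,
// so there is never more than one write in flight.
class SingleWriter {
  private tail: Promise<unknown> = Promise.resolve();

  enqueue<T>(job: WriteJob<T>): Promise<T> {
    const result = this.tail.then(job);
    // Keep the chain alive even if a job fails; the caller still sees the error.
    this.tail = result.catch(() => undefined);
    return result;
  }
}
```

A failed write rejects only its own promise. Later writes still run in order, so one bad job cannot wedge the queue.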
Corollary: match what is there. When you or an agent extends an existing system, follow its existing style and stack choices even when you would have chosen differently. "Modernizing" working code is how systems rot.
Principle 4: Know exactly how everything builds and restarts
This sounds trivial, but it is the difference between calm operations and chaos.
Every long-running thing we operate runs under a process supervisor, and the briefing records the exact build-and-restart command for it, copy-pasteable, no thinking required. Two hard rules fall out of running things this way:
- Never kill a supervised process directly. The supervisor respawns it instantly and now two processes fight over one port. Always restart through the supervisor's own controls.
- Know your build artifacts. One of our systems has two: the web app and a separately bundled background worker. Rebuilding one and not the other produced a class of bug that looked like data corruption and was actually just a stale bundle. The fix was not cleverness. It was writing the full two-step build command into the briefing so it can never be half-done again.
If you cannot state, from memory or from a file, exactly how your system gets from "code changed" to "new code serving traffic," stop and write it down before you change anything else.
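Writing the command down is the real fix. A script simply makes it impossible to run only half of it. The sketch below assumes a web app and a worker with separate build commands and a supervisor that can restart both. Every command string is a placeholder, and supervisorctl (from supervisord) stands in for whatever supervisor you actually use. The script stops at the first failed step, so a broken build never reaches the restart, and it restarts through the supervisor instead of killing processes.

```typescript
import { spawnSync } from "node:child_process";

// Placeholder commands: replace them with the exact ones from your briefing.
const STEPS: { name: string; cmd: string }[] = [
  { name: "build web app", cmd: "npm run build:web" },
  { name: "build worker bundle", cmd: "npm run build:worker" },
  { name: "restart via supervisor", cmd: "supervisorctl restart web worker" },
];

type Runner = (cmd: string) => boolean;

// Default runner: executes through the shell and reports success by exit code.
const shellRunner: Runner = (cmd) =>
  spawnSync(cmd, { shell: true, stdio: "inherit" }).status === 0;

// Runs every step in order and stops at the first failure, so a half-built
// release is never restarted. Returns the name of the failed step, or null.
function deploy(run: Runner = shellRunner): string | null {
  for (const step of STEPS) {
    if (!run(step.cmd)) return step.name;
  }
  return null;
}
```

The runner is injectable so you can dry-run the sequence or test it without touching the real services.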
Principle 5: Guardrails are architecture, not instructions
When an AI can act on the world, "I told it to be careful" is not a safety model. We build guardrails as structure:
- Scoped sub-agents. When our assistant spawns background workers, each one gets a scope (research, write, admin) with an explicit tool blocklist, and its actions are audited after the run against that scope. The scope is a boundary, not a hint.
- Approval gates. Risky actions (sending messages, spending money, deleting things) go into an approval queue for a human instead of executing directly.
- Action logs. Everything an agent does gets logged. When something goes wrong, you reconstruct what happened from the log instead of guessing.
- Kill switches. Any integration that touches the outside world gets an environment flag that turns it off without a deploy. We have used ours. When a delivery integration started wedging a critical pipeline, one flag flip isolated it while everything else kept running.
- Depth limits. Our sub-agents cannot spawn their own sub-agents. Recursive delegation is how you wake up to a swarm.
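Several of these guardrails can live in one chokepoint that every tool call passes through. The sketch below assumes a tool-calling loop you control. The scope names come from the text, but the tool names, the guard function, and the in-memory approval queue and log are hypothetical stand-ins for real storage. This version checks the scope before each call runs. The after-the-run audit described above would read the same log.

```typescript
type Scope = "research" | "write" | "admin";
type Decision = "allow" | "deny" | "queued_for_approval";

interface ToolCall {
  tool: string;
  args: Record<string, unknown>;
}

interface AgentContext {
  scope: Scope;
  depth: number; // 0 = the assistant itself, 1 = a sub-agent it spawned
}

// Hypothetical per-scope tool blocklists, checked in code before any tool runs.
const BLOCKLIST: Record<Scope, Set<string>> = {
  research: new Set(["write_file", "delete", "send_message", "spend"]),
  write: new Set(["delete", "send_message", "spend"]),
  admin: new Set<string>(),
};

// Risky actions wait for a human regardless of scope.
const NEEDS_APPROVAL = new Set(["delete", "send_message", "spend"]);

const actionLog: { at: string; ctx: AgentContext; call: ToolCall; decision: Decision }[] = [];
const approvalQueue: ToolCall[] = [];

function guard(ctx: AgentContext, call: ToolCall): Decision {
  let decision: Decision = "allow";
  if (BLOCKLIST[ctx.scope].has(call.tool)) {
    decision = "deny"; // the scope is a boundary, not a hint
  } else if (call.tool === "spawn_agent" && ctx.depth >= 1) {
    decision = "deny"; // depth limit: sub-agents cannot spawn sub-agents
  } else if (NEEDS_APPROVAL.has(call.tool)) {
    approvalQueue.push(call); // approval gate: a human decides later
    decision = "queued_for_approval";
  }
  // Action log: every attempt is recorded, including denied ones.
  actionLog.push({ at: new Date().toISOString(), ctx, call, decision });
  return decision;
}
```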
The pattern behind all five: assume the agent will eventually do the wrong thing, and make the wrong thing either impossible or recoverable. There is a full guide on applying this to public-facing AI: Personas, Scopes, and Guardrails.
Principle 6: External dependencies are fallbacks, never load-bearing
Third-party APIs disappear, rate-limit you, change pricing, and end their free trials. The rule we build by: an external data source may enrich a feature, but the feature must survive that source going away.
One of our platforms layers third-party enrichment on top of its own collected data. Every one of those integrations is wrapped so that when it fails or vanishes, the product degrades gracefully to its own data instead of breaking. Before you build on any external API, ask: what happens to this feature the day this API says no? If the answer is "the feature dies," redesign it before you ship it.
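Here is a minimal sketch of such a wrapper, combined with the kill-switch flag from Principle 5. It assumes enrichment is an async call that returns extra fields to merge onto your own record. The flag name ENRICHMENT_DISABLED, the default timeout, and the function names are hypothetical. The wrapper always returns the record built from your own data, so the worst case is a less detailed answer, never a broken one.

```typescript
// Races a promise against a timeout so a hung API cannot stall the feature.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
    p.then(
      (v) => { clearTimeout(timer); resolve(v); },
      (e) => { clearTimeout(timer); reject(e); },
    );
  });
}

// Own data is the foundation; third-party enrichment is optional on top.
async function withFallback<T extends object>(
  own: T,
  enrich: () => Promise<Partial<T>>,
  env: Record<string, string | undefined> = process.env,
  timeoutMs = 2000,
): Promise<T & { enriched: boolean }> {
  // Hypothetical kill switch: ENRICHMENT_DISABLED=1 skips the call without a deploy.
  if (env.ENRICHMENT_DISABLED === "1") return { ...own, enriched: false };
  try {
    const extra = await withTimeout(enrich(), timeoutMs);
    return { ...own, ...extra, enriched: true };
  } catch {
    // The source failed, rate-limited us, or vanished: degrade to our own data.
    return { ...own, enriched: false };
  }
}
```

The enriched flag lets the UI or API tell callers which kind of answer they got, instead of silently presenting partial data as complete.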
Principle 7: Verified beats shipped
Our internal tracker has separate statuses for "shipped" and "verified," and they are never collapsed. Shipped means the code is deployed. Verified means someone confirmed the behavior is real. The gap between those two is where embarrassments live.
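Code can keep the two statuses apart. This is a minimal sketch that assumes a tracker where each item moves forward one status at a time. The statuses other than "shipped" and "verified", and the advance function, are hypothetical. An item cannot skip "shipped", and "verified" requires a note describing the evidence, so no one can grant it without recording what was checked.

```typescript
type Status = "open" | "in_progress" | "shipped" | "verified";

interface Item {
  title: string;
  status: Status;
  evidence?: string; // what was observed to confirm the behavior is real
}

const ORDER: Status[] = ["open", "in_progress", "shipped", "verified"];

// Moves an item exactly one step forward. Refuses to skip "shipped", and
// refuses to mark anything verified without a description of the evidence.
function advance(item: Item, next: Status, evidence?: string): Item {
  if (ORDER.indexOf(next) !== ORDER.indexOf(item.status) + 1) {
    throw new Error(`cannot move from ${item.status} to ${next}`);
  }
  if (next === "verified" && !evidence?.trim()) {
    throw new Error("verified requires evidence");
  }
  return { ...item, status: next, evidence: evidence ?? item.evidence };
}
```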
The same discipline applies to debugging. The rule we hold agents (and ourselves) to: reproduce and instrument before fixing. Confirm the root cause with evidence, then fix it once. Guess-and-iterate debugging feels fast and is slow; it burns trust, time, and money, and it frequently "fixes" a symptom while the cause keeps smoldering.
Verification also has to keep happening when nobody is looking, so anything that matters gets a watchdog. Our always-on systems have dead-man monitors that alert from outside the machine if a heartbeat goes stale, because a monitor that lives on the box it monitors dies with the box.
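The core of a dead-man monitor is a staleness check that runs somewhere other than the machine being monitored. A minimal sketch, assuming the monitored system reports a heartbeat timestamp that the off-box checker can read. The field names and thresholds are hypothetical. What matters is that silence triggers the alert, not an error message, because a dead box cannot report its own failure.

```typescript
interface Heartbeat {
  name: string;       // e.g. "worker" or "data-freshness"
  lastSeenMs: number; // epoch milliseconds of the last heartbeat received
  maxAgeMs: number;   // how long silence is tolerated before alerting
}

// Returns the names of every heartbeat that has gone stale at time nowMs.
// Meant to run on a separate machine so it still fires if the monitored box is down.
function staleHeartbeats(beats: Heartbeat[], nowMs: number = Date.now()): string[] {
  return beats.filter((b) => nowMs - b.lastSeenMs > b.maxAgeMs).map((b) => b.name);
}
```

A scheduler on the other machine would call this every minute and alert for each name it returns. Sending the alert is left out because it depends on your alerting channel.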
Principle 8: Write the gotcha down where the next reader will trip
Every painful lesson becomes one specific line in the project's briefing: which file is misleading, which process must never be killed, which build step cannot be skipped, which folder looks canonical but is not. Specific beats general: "the entry point is server.ts, not index.ts, ignore the README" saves an hour; "be careful with the codebase" saves nothing.
This is the compounding loop that makes the whole method work. Mistakes become documentation, documentation becomes context, context makes every future agent and every future you faster. Systems built this way get easier to operate over time. Systems built on chat history get harder.
A worked example, end to end
Here is the shape of one real build, sanitized, so you can see the principles together.
The system: a real-time intelligence platform. It ingests live data from public market APIs continuously, stores everything in a single SQLite database, computes signals over it, and exposes the results three ways: a web dashboard for humans, a REST API for programs, and an MCP server so other AI agents can use it as a tool.
- Two processes, one writer (Principles 3 and 4): a web server, and one background worker that owns all writes to the database. They are separate build artifacts under a supervisor, and the exact build-both-then-restart command lives in the briefing.
- A briefing file (Principle 1) that opens with the build rules, because those are what bite newcomers, then maps the codebase so agents grep instead of reading everything.
- Guardrails (Principle 5): every API route is auth-gated by default with explicit public exceptions, server-to-server calls use an internal token, and outbound integrations have kill-switch flags.
- Fallback posture (Principle 6): its own collected data is the foundation; third-party enrichment is layered on top and the product survives any of it disappearing.
- Watchdogs (Principle 7): worker heartbeat, data-freshness checks, and an off-box alert if any of them go stale, plus a daily off-box backup of the database.
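"Auth-gated by default with explicit public exceptions" means the public allowlist is the only thing you can forget, and forgetting it fails closed. A minimal sketch of that check, assuming a framework-agnostic request shape. The route paths, header names, and secret fields are hypothetical. Token comparison uses Node's crypto.timingSafeEqual, which requires equal-length inputs, so the length is checked first.

```typescript
import { timingSafeEqual } from "node:crypto";

// Explicit public exceptions. Every other route requires authentication.
const PUBLIC_ROUTES = new Set(["/health", "/api/status"]);

interface Req {
  path: string;
  headers: Record<string, string | undefined>;
}

function tokenMatches(given: string | undefined, expected: string | undefined): boolean {
  if (!given || !expected) return false;
  const a = Buffer.from(given);
  const b = Buffer.from(expected);
  // timingSafeEqual throws on a length mismatch, so compare lengths first.
  return a.length === b.length && timingSafeEqual(a, b);
}

type Access = "public" | "internal" | "user" | "denied";

// Fails closed: a route nobody remembered to classify is protected, not open.
function authorize(req: Req, secrets: { internalToken?: string; apiKeys: Set<string> }): Access {
  if (PUBLIC_ROUTES.has(req.path)) return "public";
  // Server-to-server calls carry the internal token.
  if (tokenMatches(req.headers["x-internal-token"], secrets.internalToken)) return "internal";
  const key = req.headers["authorization"]?.replace(/^Bearer\s+/i, "");
  if (Array.from(secrets.apiKeys).some((k) => tokenMatches(key, k))) return "user";
  return "denied";
}
```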
Nothing in that list is exotic. That is the point. The method is mostly the discipline of doing unglamorous things every time: write the briefing, scope the agent, know the build, verify the claim, write down the lesson.
Where to start
Do not start with all eight principles. Start with two:
- Give your current project a one-page briefing file and keep it current.
- Pick the job you delegate most often and write down exactly how it should be done, what the agent may touch, and how you will know it worked.
The rest of the method grows from the habit those two create: treating your AI like staff that deserves a real briefing, real boundaries, and real review.