AI Agent Orchestration in Practice
I Wanted to Build It Myself
Running an algorithm study group came with recurring frustrations. Problem assignments, submission checks, code review requests: every week, the same tasks done by hand. As the group grew, the management overhead grew with it, and the thought naturally came: "It'd be great if there were a platform to automate all this."
Around that time, I came across the concept of AI agent orchestration. The idea of assigning different roles to multiple AI agents and having them collaborate as a single system. The more I read about it, the more curious I became: "What would happen if I applied this to building an actual service?"
I couldn't just sit around thinking about it. I decided to try it myself: build a study platform on a microservice architecture (MSA) solo, while running AI agents like team members.
The Walls You Hit When Building Alone
Once development started, the walls appeared quickly. Not because I couldn't write code, but because it's hard to maintain both productivity and consistency at the same time when you're alone.
The Limits of Productivity
Working across 6 microservices means heavy context-switching costs. In the morning, you're designing a Saga pattern; in the afternoon, you switch to frontend SSE integration; in the evening, you're editing k8s manifests. When you come back to the Saga the next day, the reasoning behind yesterday's decisions has already faded. Time spent re-reading code from scratch piles up, and actual progress slows down.
The Absence of Consistency
When you develop alone, your standards start to wobble. "Just hardcode it this once," "I'll write tests later." With no one to stop you, it's easy to give in to temptation. Commit conventions, file structure, error handling patterns — they drift a little bit each day. With 6 services, inconsistencies multiply sixfold.
These two issues were ultimately pointing at the same problem. A single person's head can't hold the full system's context and standards at the same time.
First Attempt: 8 Agents
I built an agent orchestration system based on Claude Code. I started with 8 agents.
Oracle (Arbiter), Conductor (Orchestrator), Gatekeeper (Guardian), Librarian (Records Keeper), Architect (Infrastructure Designer), Postman (Delivery Agent), Curator (Problem Setter), Herald (Messenger). The structure mapped real development team roles onto AI agents.
At this stage, I gave each agent a full persona all at once. Role description, behavioral rules, code conventions, security principles — everything crammed into a single agent's prompt. I expected that giving them an identity like "this is who you are" would make them figure out the rest on their own.
How to Design the Workflow
Creating the agents was the easy part. The real problem was deciding in what order they should work and how they should collaborate. Initially, I called agents as needed, but the limitations showed up quickly.
The agents couldn't maintain context. Calling the same agent twice meant it didn't remember decisions from the first call. Work ordering between agents also got tangled. If Conductor modified the API before Librarian changed the schema, types wouldn't match.
From Personas to Rules
The bigger problem was consistency. Giving each of the 8 agents a full persona caused shared rules to subtly diverge. Conductor would keep functions under 20 lines while Herald would produce 30-line functions. Commit message formats also varied slightly between agents.
In the end, I extracted the shared rules into a single file. persona-base.md — a shared behavioral rules file that every agent must read before starting work. Code conventions, security principles, reporting structure, escalation paths — all consolidated here. Each agent's individual prompt was left with only domain-specific knowledge.
There was a context optimization benefit too. With shorter agent prompts, they followed core instructions better. And with shared rules in one place, updates only needed to be made once.
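As a rough illustration of the split between shared rules and domain knowledge, the sketch below composes each agent's prompt from one shared rules text plus a short domain-specific part. The rule text, the domain prompts, and the function names are placeholders invented for this example, not the actual contents of persona-base.md.

```python
from pathlib import Path
from typing import Dict


def load_shared_rules(path: Path) -> str:
    """Read the shared rules file (e.g. persona-base.md) once."""
    return path.read_text(encoding="utf-8").strip()


def build_agent_prompt(shared_rules: str, domain_prompt: str) -> str:
    """Put the shared rules first, then the agent's domain-specific part."""
    return f"{shared_rules}\n\n{domain_prompt.strip()}\n"


# Placeholder rule text; in practice this would come from load_shared_rules().
SHARED_RULES = "Keep functions under 20 lines.\nFollow the commit message format."

# Hypothetical domain prompts, kept short on purpose.
DOMAIN_PROMPTS: Dict[str, str] = {
    "Conductor": "You own Saga orchestration.",
    "Herald": "You own frontend pages and SSE integration.",
}

prompts = {
    name: build_agent_prompt(SHARED_RULES, domain)
    for name, domain in DOMAIN_PROMPTS.items()
}
```

Because every prompt is built from the same SHARED_RULES value, a rule change is made in one place and reaches every agent on the next build.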
From 8 to 12
Running the 8-agent system, gaps started to appear.
Documentation was piling up with no agent to manage it. Sprint ADRs, MEMORY.md, prompt change history — Librarian was handling these on the side, but DB schema management and documentation management are fundamentally different tasks. I split out Scribe (Chronicler).
Adding the AI analysis feature made Sensei (Analyst) necessary. Claude API calls, Circuit Breaker patterns, response parsing — Herald handling both frontend and AI was too broad a scope.
As the frontend grew, a similar problem emerged. Herald was responsible for both page development and the design system, but separating component specs from page logic made more sense. Palette (Designer) took over the design system.
Finally, Scout (Recon). There was no role to verify agents' output from the user's perspective. I needed an agent to catch cases where a feature worked but the UX felt awkward.
Echelon 1: Mission Critical
If the service breaks, everything stops. Uses the most powerful model (Opus).
- Oracle: Planning decisions, agent coordination
- Conductor: Saga orchestration
- Gatekeeper: Auth, JWT, OAuth, security
- Librarian: DB schema, migrations
Echelon 2: Core
Infrastructure foundation and core business logic.
- Architect: k8s, CI/CD, monitoring
- Postman: GitHub integration, MQ consumer
- Curator: Problem management, deadlines, stats
- Scribe: Docs, memory, prompts
Echelon 3: Enhancement
Roles that enhance service value. Work begins after Echelon 1 and 2 are stable.
- Sensei: AI analysis, Claude API
- Herald: Frontend, SSE, UX
- Palette: Design system, accessibility
- Scout: User-perspective testing
I (the PM) only talk to Oracle. Oracle analyzes tasks and delegates to the appropriate agent, then consolidates and reports the results. Agents don't communicate directly with each other — when conflicts arise, Oracle mediates.
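The hub-and-spoke shape described here can be sketched in a few lines of Python. This is a toy model under loud assumptions: the real Oracle is an AI agent that analyzes tasks, whereas this sketch uses a hypothetical keyword table, and the agents are stub functions that return canned strings.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# An agent is modeled as a plain function: task text in, result text out.
Agent = Callable[[str], str]


@dataclass
class Oracle:
    agents: Dict[str, Agent]
    routes: Dict[str, str]  # hypothetical keyword -> agent name table

    def pick_agent(self, task: str) -> str:
        """Choose the agent whose keyword appears in the task."""
        lowered = task.lower()
        for keyword, agent_name in self.routes.items():
            if keyword in lowered:
                return agent_name
        raise LookupError(f"no agent registered for task: {task!r}")

    def handle(self, task: str) -> str:
        """Delegate the task, then report back in one consolidated line."""
        agent_name = self.pick_agent(task)
        result = self.agents[agent_name](task)
        return f"[{agent_name}] {result}"


oracle = Oracle(
    agents={
        "Librarian": lambda task: "schema change planned",
        "Gatekeeper": lambda task: "auth rules checked",
    },
    routes={"schema": "Librarian", "migration": "Librarian", "jwt": "Gatekeeper"},
)

report = oracle.handle("Add a migration for the problems table")
```

The point of the shape is that agents never hold references to each other: every request and every result passes through the single Oracle object.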
Moments I Felt It in Practice
The DB Disaster Librarian Prevented
I was changing the schema for the Problem service. I tried to rename a column, but Librarian's rules blocked it.
Expand-Contract pattern enforced. Column deletion/rename must use a 3-phase deployment. Always assume a situation where old and new versions coexist during Rolling Updates.
Instead of a simple rename, I proceeded in three phases: (1) add a new column, (2) copy data + switch application code, (3) drop the old column. It felt tedious, but the change was completed with zero downtime in production. If I'd been on my own, I would've just thought "a rename will be fine" and moved on.
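The three phases can be sketched against an in-memory SQLite database. The table name, the column names (name renamed to title), and the sample row are made up for illustration; the real service's schema and database engine are not shown in this post. In production each phase would ship as its own deployment, with application code switched over between phases 2 and 3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE problems (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO problems (name) VALUES ('Two Sum')")
conn.commit()


def columns(db: sqlite3.Connection) -> list:
    """Return the current column names of the problems table."""
    return [row[1] for row in db.execute("PRAGMA table_info(problems)")]


# Phase 1 (expand): add the new column. Old code that reads `name` still works.
conn.execute("ALTER TABLE problems ADD COLUMN title TEXT")
conn.commit()
after_expand = columns(conn)

# Phase 2 (migrate): copy data into the new column. Application code is
# switched to `title` at this point, while `name` still exists for old pods.
conn.execute("UPDATE problems SET title = name WHERE title IS NULL")
conn.commit()

# Phase 3 (contract): drop the old column once nothing reads it anymore.
if sqlite3.sqlite_version_info >= (3, 35, 0):
    conn.execute("ALTER TABLE problems DROP COLUMN name")
else:
    # Older SQLite has no DROP COLUMN, so rebuild the table instead.
    conn.executescript(
        """
        CREATE TABLE problems_new (id INTEGER PRIMARY KEY, title TEXT);
        INSERT INTO problems_new (id, title) SELECT id, title FROM problems;
        DROP TABLE problems;
        ALTER TABLE problems_new RENAME TO problems;
        """
    )
conn.commit()

after_contract = columns(conn)
titles = [row[0] for row in conn.execute("SELECT title FROM problems")]
```

After phase 1 both columns exist side by side, which is what lets old and new application versions coexist during a rolling update.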
The Security Remnants Gatekeeper Caught
While refactoring authentication logic, there was leftover code storing tokens in localStorage. Even after switching to httpOnly Cookies, remnants of the old approach were hiding throughout the codebase. Gatekeeper's rules caught them.
SSE authentication: EventSource(url, { withCredentials: true }). localStorage tokens cannot be used in an httpOnly Cookie environment. Sensitive data strictly prohibited: raw JWT, X-Internal-Key, OAuth tokens.
A human might think "I already fixed that, so it's probably fine" and move on. An agent inspects everything, every time.
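The "inspect everything, every time" part is easy to mechanize. Below is a minimal sketch of such a check in Python: a regex scan for localStorage calls whose key mentions a token. The pattern, the function name, and the sample source are assumptions for illustration; Gatekeeper itself works from prompt rules, not from this script.

```python
import re
from typing import List, Tuple

# Flags localStorage.setItem/getItem calls whose key contains "token".
TOKEN_STORAGE_PATTERN = re.compile(
    r"localStorage\.(?:setItem|getItem)\(\s*['\"][^'\"]*token[^'\"]*['\"]",
    re.IGNORECASE,
)


def find_token_storage(source: str) -> List[Tuple[int, str]]:
    """Return (line number, line text) for every suspicious line."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if TOKEN_STORAGE_PATTERN.search(line):
            hits.append((lineno, line.strip()))
    return hits


# Hypothetical frontend snippet with one leftover from the old approach.
SAMPLE = '''\
const es = new EventSource(url, { withCredentials: true });
localStorage.setItem("accessToken", jwt);
const theme = localStorage.getItem("theme");
'''

hits = find_token_storage(SAMPLE)
```

A check like this only catches the literal pattern it knows about, so it complements a review rather than replacing one.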
Context That Never Disappears
Over 67 sprints, there were countless moments of "Why did we decide this before?" Because every decision was recorded in the MEMORY.md and Sprint ADRs that Scribe manages, I could find decisions from three months ago in five seconds.
The scariest thing about developing solo isn't features not working — it's forgetting why you built things a certain way. When context disappears, you repeat the same mistakes or agonize over alternatives you've already evaluated. The agent system solved this problem structurally.
Trial and Error in Workflow Design
Even with agents, if you get the calling order wrong, it's useless. Early on, I called whichever agent I needed on the spot, but as work ordering got tangled, one agent's output would break another agent's assumptions — over and over.
Eventually, I introduced an echelon-based execution order. Echelon 1 (Conductor, Gatekeeper, Librarian) locks down infrastructure and safety first, Echelon 2 (Architect, Postman, Curator, Scribe) implements features, and Echelon 3 (Sensei, Herald, Palette, Scout) wraps things up. Following this order significantly reduced dependency conflicts.
- Echelon 1: Infrastructure · safety nets
- Echelon 2: Feature implementation · business logic
- Echelon 3: UX · analysis · wrap-up
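The echelon ordering above can be expressed as a small sort. In this Python sketch the echelon membership comes from the list in this section, while the ordering of agents inside one echelon simply follows that listing and is an assumption; the function name is also made up for the example.

```python
from typing import Dict, List, Set

# Echelon membership as listed above.
ECHELONS: Dict[int, List[str]] = {
    1: ["Conductor", "Gatekeeper", "Librarian"],
    2: ["Architect", "Postman", "Curator", "Scribe"],
    3: ["Sensei", "Herald", "Palette", "Scout"],
}


def execution_order(requested: Set[str]) -> List[str]:
    """Order the requested agents so lower echelons always run first."""
    known = {agent for agents in ECHELONS.values() for agent in agents}
    unknown = requested - known
    if unknown:
        raise ValueError(f"unknown agents: {sorted(unknown)}")
    order: List[str] = []
    for level in sorted(ECHELONS):
        order.extend(agent for agent in ECHELONS[level] if agent in requested)
    return order


plan = execution_order({"Herald", "Librarian", "Postman"})
```

With this in place, a task that touches the schema, the MQ consumer, and a frontend page always starts with Librarian and ends with Herald, no matter which agent was requested first.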
The Direction Ahead
Running the agent system, there's a question I keep coming back to. How do you maintain quality while controlling AI and boosting productivity?
Tighten control, and productivity drops. Reviewing every agent's output manually is no different from doing it all yourself. Loosen the reins, and quality wavers. The moment will come when an agent's independent decision clashes with the overall architecture.
The balance I've found so far is this. Define clear role boundaries, manage shared rules from a single source, and keep escalation paths open. Agents make autonomous decisions within their domain, but when they step outside it, they must go through Oracle. Rules are managed as SSoT (Single Source of Truth) to prevent inconsistencies, and decisions that are too difficult get escalated to a human.
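Written down as code, the policy is a three-way decision. The Python sketch below is one possible reading of it: the domain table is a partial, hypothetical example, and the too_difficult flag stands in for a judgment call that in reality is not a boolean.

```python
from enum import Enum
from typing import Dict, Set


class Route(Enum):
    SELF = "decide autonomously"
    ORACLE = "go through Oracle"
    HUMAN = "escalate to a human"


# Hypothetical, partial domain table for two agents.
DOMAINS: Dict[str, Set[str]] = {
    "Librarian": {"schema", "migration"},
    "Gatekeeper": {"auth", "jwt", "oauth"},
}


def route_decision(agent: str, topic: str, too_difficult: bool = False) -> Route:
    """Apply the policy: human for hard calls, self inside the domain, Oracle otherwise."""
    if too_difficult:
        return Route.HUMAN
    if topic in DOMAINS.get(agent, set()):
        return Route.SELF
    return Route.ORACLE
```

For example, route_decision("Librarian", "schema") stays with the agent, while route_decision("Librarian", "jwt") goes through Oracle because JWT handling sits in another agent's domain.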
Whether this is the perfect answer, I still don't know. With each sprint, new problems surface, and I keep tweaking the system bit by bit. One thing is certain — this kind of thinking is something I never would have done on my own.
The Difference Between Doing It and Not
Reading about AI agent orchestration and actually doing it are completely different experiences. On paper, it boils down to "divide roles and create rules." In reality, the system was forged through dozens of incidents — workflows getting tangled, context being lost, rules falling out of sync.
If I hadn't tried it, I'd have stayed at the surface-level understanding that "AI writes code for you." Because I did try, I learned things. What to delegate to agents and what to do yourself. Where to draw the line between control and autonomy. How to combine one person's judgment with twelve agents' execution power.
In the end, the important thing was just starting.