MSA Design: Humans Decide, AI Executes

Tags: ai-dev, architecture, msa, kubernetes

Why MSA?

When designing AlgoSu, the first question wasn't about architecture patterns. It was how to delegate work to AI agents.

With a monolith, a single agent would need to understand the entire codebase. Fixing auth logic would require knowing submission logic, understanding the DB schema, and more. I'd already learned from experience that the wider an agent's context gets, the less accurate it becomes.

What if, instead, we split services into small units? The narrower the domain an AI manages, the clearer its role becomes, and the more accurate its decisions within a single service. Gatekeeper handles only the Gateway, Librarian handles only DB schemas, Conductor handles only Submission. Aligning agent responsibility boundaries with service boundaries — that was the core reason for choosing MSA.

Of course, MSA is complex. Inter-service communication, distributed transactions, deployment pipelines — problems you'd never have to worry about in a monolith. But if AI agents can share that complexity, the tradeoff holds up.

How We Split the Services

The criteria for splitting services were simple. If data ownership differs, split the service.

Identity manages user data, Submission manages code submissions, Problem manages problems — each owns its own database, and none directly accesses another service's DB. We applied the Database per Service principle from day one, because we knew that splitting later would be migration hell.

Async tasks were extracted into separate workers. GitHub pushes and AI analysis are long-running operations — we couldn't make users wait. We made GitHub Worker and AI Analysis into independent services, connected via RabbitMQ.

The Gateway is the sole external entry point. It handles auth, routing, rate limiting, and SSE streaming. All other services communicate only within the cluster.

Deployment: k3s on OCI ARM (24GB · 4 OCPU), Cloudflare Tunnel → 7 services

Edge: browser entry point
  Frontend (:3001), Next.js 15 · App Router
    Tailwind, shadcn/ui, SSE subscription

Gateway: API Gateway (sole external entry path)
  Gateway (:3000), NestJS
    OAuth + JWT, X-Internal-Key issuance, Rate Limit, SSE

Backend Services (Database per Service)
  Identity (:3004), NestJS · identity_db
    Users / Studies / Notifications / Share links
  Submission (:3003), NestJS · submission_db
    Saga Orchestrator · Code review
  Problem (:3002), NestJS · problem_db
    Problem CRUD · Deadline management

Async Workers (RabbitMQ queue consumers)
  GitHub Worker (:9100), Node.js · prefetch=2
    submission.github_push queue → GitHub Push
  AI Analysis (:8000), FastAPI · Circuit Breaker
    submission.ai_analysis queue → Claude API

External Dependencies
  Claude API, claude-haiku-4-5-20251001
    MAX_TOKENS=8192, JSON 4-step fallback parsing

In this structure, the division of AI agent responsibilities falls into place naturally. Gatekeeper is responsible only for Gateway auth and security, Librarian manages only each service's DB schema, and Conductor handles only Submission's Saga logic. Since service boundaries are agent responsibility boundaries, the confusion of "whose job is this?" disappears.

How We Chose Communication Patterns

Once services are split, you need to decide how they talk to each other. We picked from four patterns based on use case.

  1. Sync HTTP
    Immediate response (Gateway → services)
  2. RabbitMQ Async
    Long-running tasks (GitHub Push, AI)
  3. Redis Pub/Sub
    Real-time event propagation
  4. SSE
    Browser real-time streaming

Synchronous HTTP is used when an immediate response is needed. Calls from the Gateway to Identity, Submission, and Problem fall here. Internal calls carry an X-Internal-Key header to block external access. The key validation logic was designed by Gatekeeper — it even applied crypto.timingSafeEqual on its own to prevent timing attacks.
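To make the timing-attack point concrete, here is a minimal sketch of what such a check could look like. The function name and the idea of hashing both values first are my assumptions for illustration, not AlgoSu's actual code. The hashing is there because crypto.timingSafeEqual throws when its two inputs differ in length, so comparing fixed-size digests sidesteps that.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Hypothetical helper: compares a presented X-Internal-Key value against the
// expected key without leaking how many leading bytes matched.
function isValidInternalKey(presented: string | undefined, expected: string): boolean {
  if (!presented) return false;
  // SHA-256 gives both buffers the same length (32 bytes); timingSafeEqual
  // throws a RangeError if the lengths differ.
  const a = createHash("sha256").update(presented).digest();
  const b = createHash("sha256").update(expected).digest();
  return timingSafeEqual(a, b);
}
```

A plain === comparison would also return the right answer, but it can return early on the first mismatching character, which is the signal a timing attack measures.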

RabbitMQ async messaging is used for long-running tasks. If we processed GitHub pushes and AI analysis synchronously after submission, users would wait over 30 seconds. With MQ-based async processing, we can respond immediately after submission. GitHub Worker's prefetch=2 was a setting proposed by Architect — a concurrency limit that accounts for GitHub API rate limits and OCI Free Tier resources.
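RabbitMQ's prefetch count caps how many unacknowledged messages a consumer holds at once, so prefetch=2 means at most two pushes are in flight per worker. The sketch below is an in-memory analogue of that limit, not RabbitMQ client code. runWithLimit and the job shape are made up for illustration.

```typescript
// In-memory analogue of prefetch=N: at most `limit` jobs are in flight at once.
async function runWithLimit<T>(jobs: Array<() => Promise<T>>, limit: number): Promise<T[]> {
  const results: T[] = new Array(jobs.length);
  let next = 0;

  // Each worker keeps pulling the next job index until none remain.
  async function worker(): Promise<void> {
    while (next < jobs.length) {
      const i = next++;
      results[i] = await jobs[i]();
    }
  }

  const workers = Array.from({ length: Math.min(limit, jobs.length) }, () => worker());
  await Promise.all(workers);
  return results;
}
```

With limit set to 2, a third job starts only after one of the first two finishes. That is the same back-pressure the broker applies when it withholds further deliveries until a message is acknowledged.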

Redis Pub/Sub is used for real-time event propagation between services. Every time a submission status changes, a message is published to the submission:status:{id} channel, and the Gateway's SSE Controller subscribes to it.

SSE is the final leg that streams data in real-time to the browser. Max connection time of 5 minutes, 30-second heartbeat, ownership verification — these safeguards were built through collaboration between Gatekeeper and Conductor. One notable issue was that creating a Redis subscriber per SSE connection would exhaust the connection pool. Solving this with a shared subscriber pattern was Conductor's call.
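The shared subscriber idea can be sketched roughly as follows: one upstream subscription per channel, fanned out to however many SSE connections are listening. The Upstream interface and class name are hypothetical stand-ins. A real Redis client would need a thin adapter to fit this shape.

```typescript
type Listener = (message: string) => void;

// Minimal stand-in for the single Redis connection used for subscriptions.
// Hypothetical interface, not a real client API.
interface Upstream {
  subscribe(channel: string, onMessage: Listener): void;
  unsubscribe(channel: string): void;
}

// One upstream subscription per channel, shared by any number of SSE connections.
class SharedSubscriber {
  private listeners = new Map<string, Set<Listener>>();
  private upstream: Upstream;

  constructor(upstream: Upstream) {
    this.upstream = upstream;
  }

  // Registers a listener and returns a function that removes it again.
  add(channel: string, listener: Listener): () => void {
    let set = this.listeners.get(channel);
    if (!set) {
      const created = new Set<Listener>();
      set = created;
      this.listeners.set(channel, created);
      // First listener on this channel: subscribe upstream exactly once.
      this.upstream.subscribe(channel, (msg) => created.forEach((l) => l(msg)));
    }
    set.add(listener);

    return () => {
      const current = this.listeners.get(channel);
      if (!current || !current.delete(listener)) return;
      if (current.size === 0) {
        // Last listener gone: release the upstream subscription.
        this.listeners.delete(channel);
        this.upstream.unsubscribe(channel);
      }
    };
  }
}
```

Each SSE connection would call add() when it opens and the returned function when it closes, so the number of Redis connections stays constant no matter how many browsers are watching.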

The choice of each communication pattern was made by a human. "GitHub push after submission must be async" — that's an architectural decision. But the detailed design that carries out that decision was filled in by AI agents.

The Journey of a Single Submission

Let's trace what actually happens with a single code submission to see this architecture in action.

When a user clicks "Submit," the Gateway validates the JWT, and the Submission Service's Saga Orchestrator manages the entire flow.

[Figure: Saga state transitions, including failure branches]

In this Saga design, what humans decided and what AI executed are clearly separated.

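What humans decided: The order of saving to DB first, then publishing to MQ. If the service restarts between the two steps, the MQ message is lost but the record remains in the DB, so the Saga can be picked up again and the message republished. The reverse order could leave a message in the queue with no record behind it. This ordering is an architectural principle, not something AI should decide on its own.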
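Why the order matters is easier to see in code. This is a simplified in-memory sketch, not the actual Submission service: the db map and the publish callback are hypothetical stand-ins for the database and the MQ publisher, and it assumes DB_SAVED is followed by GITHUB_QUEUED.

```typescript
type Submission = { id: string; sagaStep: "DB_SAVED" | "GITHUB_QUEUED" };

// Hypothetical in-memory stand-in for the submission table.
const db = new Map<string, Submission>();

function submit(id: string, publish: (id: string) => void): void {
  // 1. Persist first: from here on the submission survives a crash.
  db.set(id, { id, sagaStep: "DB_SAVED" });
  // 2. Publish second: if this throws, the row above is still there.
  publish(id);
  db.set(id, { id, sagaStep: "GITHUB_QUEUED" });
}

// Recovery pass: anything still at DB_SAVED never reached the queue, so republish it.
function recover(publish: (id: string) => void): string[] {
  const resumed: string[] = [];
  Array.from(db.values()).forEach((s) => {
    if (s.sagaStep !== "DB_SAVED") return;
    publish(s.id);
    db.set(s.id, { id: s.id, sagaStep: "GITHUB_QUEUED" });
    resumed.push(s.id);
  });
  return resumed;
}
```

If publishing fails, the row stays at DB_SAVED and recover() can find it later. With the opposite order, a crash after publishing would leave a queued message pointing at a submission that was never stored.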

What AI (Conductor) executed: The implementation of optimistic locking to prevent backward transitions. Every state transition includes a WHERE sagaStep = currentStep condition, so a transition that has already been applied, or a stale one arriving late, matches no row and is skipped. Conductor also designed the per-step timeout and retry logic: 5 minutes for DB_SAVED, 15 minutes for GITHUB_QUEUED, 30 minutes for AI_QUEUED. It even built recovery logic that automatically resumes incomplete Sagas within 1 hour of a service restart.
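Here is a rough in-memory sketch of that guarded transition and the per-step timeouts. The step names and timeout values come from the description above. DONE as a terminal step, the table name in the comment, and the function names are assumptions for illustration.

```typescript
type SagaStep = "DB_SAVED" | "GITHUB_QUEUED" | "AI_QUEUED" | "DONE";

// Per-step timeouts in minutes. DONE is an assumed terminal step with no timeout.
const STEP_TIMEOUT_MIN: Record<string, number> = {
  DB_SAVED: 5,
  GITHUB_QUEUED: 15,
  AI_QUEUED: 30,
};

const rows = new Map<string, { sagaStep: SagaStep; updatedAt: number }>();

// In-memory equivalent of (hypothetical table name):
//   UPDATE submission SET sagaStep = :next WHERE id = :id AND sagaStep = :current
// Returns true only if the row was still at `current`, i.e. one row was affected.
function transition(id: string, current: SagaStep, next: SagaStep, now: number): boolean {
  const row = rows.get(id);
  if (!row || row.sagaStep !== current) return false;
  rows.set(id, { sagaStep: next, updatedAt: now });
  return true;
}

// A Saga counts as stuck once it has sat in a step longer than that step's timeout.
function isTimedOut(id: string, now: number): boolean {
  const row = rows.get(id);
  if (!row) return false;
  const limit = STEP_TIMEOUT_MIN[row.sagaStep];
  return limit !== undefined && now - row.updatedAt > limit * 60_000;
}
```

The caller has to check the boolean result. A false return means another worker already moved the Saga, so the current attempt should stop rather than redo the work.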

The human established the principle: "data must never be lost, even on failure." The AI built the concrete mechanisms to uphold that principle.

Auth, Security, Deployment — Who Decided What?

The remaining architectural decisions follow the same pattern. Humans set the direction; AI executes.

Auth: Using httpOnly cookies rather than localStorage for auth was a human decision, made to defend against XSS attacks. The implementation of the OAuth flow, the automatic JWT renewal logic (TokenRefreshInterceptor, 5 minutes before expiry), and pinning the algorithm to HS256 were details executed by Gatekeeper following security principles.
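Two of those details are small enough to sketch: the "5 minutes before expiry" check and the HS256 pin. This is not the TokenRefreshInterceptor itself. The function names are mine, and the verifier only checks the algorithm and signature, not the claims.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

const REFRESH_WINDOW_MS = 5 * 60 * 1000; // renew 5 minutes before expiry

// True once the token is inside the refresh window. `expSeconds` is the JWT `exp` claim.
function shouldRefresh(expSeconds: number, nowMs: number): boolean {
  return expSeconds * 1000 - nowMs <= REFRESH_WINDOW_MS;
}

// Verifies an HS256 signature and rejects any other `alg`, including "none".
function verifyHs256(token: string, secret: string): boolean {
  const parts = token.split(".");
  if (parts.length !== 3) return false;
  const [header, payload, signature] = parts;

  let alg: unknown;
  try {
    alg = JSON.parse(Buffer.from(header, "base64url").toString("utf8")).alg;
  } catch (e) {
    return false; // header is not valid JSON
  }
  // The algorithm is pinned here, never chosen by the token.
  if (alg !== "HS256") return false;

  const expected = createHmac("sha256", secret).update(`${header}.${payload}`).digest();
  const given = Buffer.from(signature, "base64url");
  return given.length === expected.length && timingSafeEqual(given, expected);
}
```

Pinning matters because a verifier that trusts the header's alg field can be talked into skipping or downgrading verification. In practice a maintained JWT library with an explicit algorithms allowlist does the same job.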

Security: "All containers run as non-root, and privilege escalation is blocked" was a principle set by a human. Architect applied it consistently across every k8s manifest: readOnlyRootFilesystem, capabilities.drop: ALL, and emptyDir mounts only for the paths that need to be writable. When one person applies identical security settings across 6 services by hand, it is easy to miss one. The default-deny NetworkPolicy, with only required communication whitelisted, follows the same logic.
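One way to picture "without missing a single one" is as a check that every container's securityContext carries the same hardening fields. The sketch below is illustrative only and not part of the AlgoSu repo. The field names are the standard Kubernetes ones. The checker function is hypothetical.

```typescript
// Subset of the Kubernetes container securityContext fields discussed here.
type SecurityContext = {
  runAsNonRoot?: boolean;
  allowPrivilegeEscalation?: boolean;
  readOnlyRootFilesystem?: boolean;
  capabilities?: { drop?: string[] };
};

// Returns the names of the hardening rules a container's securityContext fails.
function securityViolations(ctx: SecurityContext | undefined): string[] {
  const c = ctx ?? {};
  const problems: string[] = [];
  if (c.runAsNonRoot !== true) problems.push("runAsNonRoot");
  if (c.allowPrivilegeEscalation !== false) problems.push("allowPrivilegeEscalation");
  if (c.readOnlyRootFilesystem !== true) problems.push("readOnlyRootFilesystem");
  const dropped = (c.capabilities && c.capabilities.drop) || [];
  if (dropped.indexOf("ALL") === -1) problems.push("capabilities.drop");
  return problems;
}
```

Running a check like this over each service's parsed manifest would flag the one Deployment where a field was forgotten, which is the mistake that is hard to spot by eye.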

Deployment: Choosing GitOps was a human decision. When you push to main, GitHub Actions builds ARM aarch64 images, pushes them to GHCR with main-{git-sha} tags, and ArgoCD auto-deploys to the k3s cluster. Never using latest tags, guaranteeing zero-downtime deployment with maxUnavailable: 0 in RollingUpdate, running DB migrations via initContainer before the app starts — these decisions were designed by Architect following infrastructure principles.

Resource allocation on OCI ARM Free Tier follows the same pattern. Within the constraints of 24GB RAM and 4 OCPU, Architect determined how to distribute request/limit for each service. AI Analysis is the only service with a 2Gi memory limit — a decision based on measured data showing that parsing Claude API responses requires significant memory.

The Boundaries of Design

The most important thing I discovered while designing MSA with AI was boundaries.

There are design decisions you can delegate to AI, and there are ones humans must make. "Split databases per service," "handle external API calls asynchronously," "use httpOnly cookies for auth" — these architectural directions require understanding business context and tradeoffs. That's still too broad a domain for AI to judge on its own.

On the other hand, the execution after the direction is set — implementing optimistic locking, consistently applying security settings, tuning timeout values, detailed communication pattern design — is something AI does better than humans. It doesn't miss anything, stays consistent, and applies the same principles across 6 services uniformly.

So was choosing MSA the right call? We split services to make AI agent roles clear, and in practice, management became easier as agent responsibility boundaries aligned with service boundaries. Of course, the inherent complexity of MSA remains. But with AI sharing that complexity, I was able to maintain 6 services at a consistent quality level, even working solo.

There's no single right answer in architecture. But one thing is certain: when "developing with AI" becomes a premise, the architecture that fits that premise changes.