ADR-002: Outbox Pattern Adoption Review
- Status: Deferred
- Date: 2026-03-08
- Decision maker: Oracle
Context
The Submission service stores a submission in the DB and then publishes messages to RabbitMQ, one step after the other. These operations are not atomic: if MQ publication fails after the DB commit, the data can become inconsistent.
Current flow:
1. Save Submission to DB (commit)
2. Publish analysis request message to RabbitMQ
3. Publish push request message to GitHub Worker
Risk scenarios:
- Step 1 succeeds, step 2 fails: the Submission is saved but AI analysis never starts
- Step 1 succeeds, step 3 fails: the Submission is saved but the GitHub push never executes
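The failure window can be illustrated with a small simulation. This is a minimal sketch, not the service's actual code: `FakeBroker`, `submit`, and the routing keys `analysis.request` and `github.push` are hypothetical names, the DB is a plain dict, and the sketch assumes step 3 is still attempted when step 2 fails.

```python
class PublishError(Exception):
    """Raised when the simulated broker rejects a message."""


class FakeBroker:
    """In-memory stand-in for RabbitMQ that fails on selected routing keys."""

    def __init__(self, failing_keys=()):
        self.failing_keys = set(failing_keys)
        self.published = []

    def publish(self, routing_key, payload):
        if routing_key in self.failing_keys:
            raise PublishError(routing_key)
        self.published.append((routing_key, payload))


def submit(db, broker, submission_id):
    """Commit first, then publish two messages, with no shared transaction."""
    db[submission_id] = "SAVED"  # step 1: DB commit (simulated by a dict write)
    failed = []
    for key in ("analysis.request", "github.push"):  # steps 2 and 3
        try:
            broker.publish(key, {"submission_id": submission_id})
        except PublishError:
            failed.append(key)  # the row is already committed; nothing rolls it back
    return failed


db = {}
broker = FakeBroker(failing_keys={"analysis.request"})
failed = submit(db, broker, "sub-1")
```

Here the submission stays saved while the analysis request is never delivered, which corresponds to the first risk scenario.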
Current Compensating Controls (Applied in Sprint 43 W2)
- Optimistic lock: Prevents Lost Update via version column on Saga state changes
- Timeout resume: cron detects incomplete Sagas within a time window and restarts them
- Idempotency check: Each worker detects duplicate messages and prevents reprocessing (idempotency key)
Together, these controls provide eventual consistency even when a message is lost: timeout resume restarts the stalled Saga, and the idempotency check absorbs any duplicate message that results.
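The optimistic lock and the timeout scan could look roughly like the following. This is a sketch under assumptions: the `saga` table, its columns, the state names, and the 300-second timeout are all hypothetical, and SQLite stands in for the real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE saga (id TEXT PRIMARY KEY, state TEXT, version INTEGER, updated_at REAL)"
)
conn.execute("INSERT INTO saga VALUES ('sub-1', 'DB_SAVED', 1, 0.0)")
conn.commit()


def advance(conn, saga_id, expected_version, new_state, now):
    """Optimistic lock: the UPDATE matches only if the version is unchanged."""
    cur = conn.execute(
        "UPDATE saga SET state = ?, version = version + 1, updated_at = ? "
        "WHERE id = ? AND version = ?",
        (new_state, now, saga_id, expected_version),
    )
    conn.commit()
    return cur.rowcount == 1  # False means another writer got there first


def find_stalled(conn, now, timeout_seconds):
    """Timeout resume: list unfinished Sagas whose last update is too old."""
    rows = conn.execute(
        "SELECT id FROM saga WHERE state != 'DONE' AND updated_at < ?",
        (now - timeout_seconds,),
    ).fetchall()
    return [row[0] for row in rows]


first = advance(conn, "sub-1", 1, "ANALYSIS_REQUESTED", now=10.0)
stale = advance(conn, "sub-1", 1, "GITHUB_PUSHED", now=11.0)  # reuses old version
stalled = find_stalled(conn, now=400.0, timeout_seconds=300)
```

The second `advance` call carries a stale version and changes nothing, which is how a Lost Update is rejected. A cron job would then re-publish the pending message for each id returned by `find_stalled`.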
Options Under Review
Option A: Outbox Table + Polling Publisher
Store outgoing messages in an Outbox table within the same DB transaction as the Submission; a separate Polling Publisher periodically reads the Outbox and publishes to MQ.
Pros:
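- Records the DB write and the outgoing message atomically, in a single transaction
- Eliminates message loss from a failed publish after commit (delivery becomes at-least-once, so workers must remain idempotent)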
Cons:
- Additional operational overhead of Outbox table + Polling service
- Increased processing latency due to polling delay
- Additional resource consumption on single OCI ARM instance
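Option A can be sketched as follows. This is an illustration only, assuming hypothetical table names (`submission`, `outbox`), hypothetical routing keys, and SQLite in place of PostgreSQL; the `publish` callback stands in for the RabbitMQ client.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE submission (id TEXT PRIMARY KEY, body TEXT);
    CREATE TABLE outbox (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        routing_key TEXT NOT NULL,
        payload TEXT NOT NULL,
        published INTEGER NOT NULL DEFAULT 0
    );
    """
)


def save_submission(conn, submission_id, body):
    """Write the submission and both outgoing messages in one transaction."""
    with conn:  # commits on success, rolls back if any statement raises
        conn.execute("INSERT INTO submission VALUES (?, ?)", (submission_id, body))
        for key in ("analysis.request", "github.push"):
            conn.execute(
                "INSERT INTO outbox (routing_key, payload) VALUES (?, ?)",
                (key, json.dumps({"submission_id": submission_id})),
            )


def poll_once(conn, publish, batch_size=10):
    """One polling pass: publish pending rows in order, marking each on success."""
    rows = conn.execute(
        "SELECT id, routing_key, payload FROM outbox "
        "WHERE published = 0 ORDER BY id LIMIT ?",
        (batch_size,),
    ).fetchall()
    sent = 0
    for row_id, key, payload in rows:
        publish(key, json.loads(payload))  # if this raises, the row stays pending
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
        sent += 1
    return sent


delivered = []
save_submission(conn, "sub-1", "print('hi')")
sent = poll_once(conn, lambda key, payload: delivered.append((key, payload)))
```

If the process crashes between `publish` and the `UPDATE`, the row is published again on the next pass. The pattern therefore gives at-least-once delivery and still relies on the workers' idempotency check.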
Option B: CDC (Change Data Capture, Debezium, etc.)
Captures changes from the PostgreSQL WAL and automatically publishes them as events to MQ.
Pros:
- Minimal application code changes
- Near-real-time event propagation
Cons:
- Requires Debezium + Kafka Connect infrastructure (resource-heavy)
- Operationally unrealistic on OCI ARM 4 OCPU / 24GB environment
- Significantly increases operational complexity
Option C: Keep Current + Compensate (Currently Adopted)
Retain the optimistic lock + timeout resume + idempotency check applied in Sprint 43 W2.
Pros:
- No additional infrastructure needed
- Sufficient stability at current traffic levels
- Minimal operational complexity
Cons:
- Message loss remains theoretically possible (recoverable via timeout resume)
- Timeout resume load may increase under traffic spikes
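Because timeout resume may re-send a message that was in fact delivered, Option C depends on the workers' idempotency check. A minimal sketch of such a check, assuming a hypothetical `processed` table and key format, with SQLite standing in for the real store:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed (idempotency_key TEXT PRIMARY KEY)")


def handle_once(conn, idempotency_key, handler):
    """Run handler only if the key is new; return True if the handler ran."""
    with conn:  # commits on success, rolls back if handler raises
        cur = conn.execute(
            "INSERT OR IGNORE INTO processed VALUES (?)", (idempotency_key,)
        )
        if cur.rowcount == 0:
            return False  # duplicate delivery: key was already recorded
        handler()  # a failure here rolls the key back, so a retry can run
    return True


calls = []
ran_first = handle_once(conn, "sub-1:analysis", lambda: calls.append("analyze"))
ran_again = handle_once(conn, "sub-1:analysis", lambda: calls.append("analyze"))
```

Recording the key and running the handler in the same transaction keeps a failed attempt retryable. That only holds when the handler's side effects live in the same database, which is an assumption of this sketch.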
Decision
Option C (keep current + compensate) is adopted; adoption of the Outbox Pattern is deferred.
Rationale:
- Resources are insufficient to operate additional Outbox Polling or CDC infrastructure on a single OCI ARM instance (4 OCPU / 24GB).
- At current traffic levels (study group scale, dozens of concurrent users), the actual probability of a failure caused by DB-MQ non-atomicity is extremely low.
- The timeout resume mechanism automatically recovers lost messages, ensuring eventual consistency.
Re-evaluation Triggers
Re-evaluate introduction of Option A (Outbox Pattern) if any of the following conditions are met:
- 500+ concurrent users or 1,000+ daily submissions
- 10+ duplicate processing events per month caused by timeout resume
- Infrastructure scale-up (e.g., multi-node migration) providing resource headroom
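The triggers above could be checked mechanically against operational metrics. The following is a sketch only: the `Metrics` fields and the `triggered` function are hypothetical names, and how each metric is collected is left open.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    concurrent_users: int
    daily_submissions: int
    monthly_duplicate_events: int  # duplicates caused by timeout resume
    has_resource_headroom: bool    # e.g. after a multi-node migration


def triggered(m):
    """Return the names of the re-evaluation triggers that currently fire."""
    reasons = []
    if m.concurrent_users >= 500 or m.daily_submissions >= 1000:
        reasons.append("traffic")
    if m.monthly_duplicate_events >= 10:
        reasons.append("duplicates")
    if m.has_resource_headroom:
        reasons.append("headroom")
    return reasons


current = triggered(Metrics(40, 120, 1, False))
grown = triggered(Metrics(40, 1500, 12, False))
```

Any non-empty result means Option A should be re-evaluated.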