CI Improvement — Full Expansion + AI Coverage Integration + Measurement
Sprint 104 — CI Improvement: Full Expansion + AI Coverage Integration + Measurement
Background
Continuing from Sprint 103's "pilot then expand" decision, this sprint's core tasks were expanding the setup-node-service composite action from github-worker only to all Node services, and integrating ai-analysis (FastAPI/pytest) into the global coverage-gate. Simultaneously, the Oracle __AGENT_DONE__ marker bug carried over from Sprint 102 was resolved through Oracle self-fix, restoring agent chain automation reliability.
This sprint also completes the expansion phase of Channeltalk CI refactoring Principle ⑤ (validate in a small service first, then expand).
Goals
- Expand
setup-node-servicecomposite action to all Node services (quality-nestjs / audit-npm / test-node matrix integration) - Integrate ai-analysis (Python/pytest) coverage into the global coverage-gate
- Restore automation reliability by fixing the Oracle agent chain
__AGENT_DONE__marker missing bug - Obtain Sprint 103 pilot actual measurement data (github-worker pre/post timing comparison)
Work Summary
| Task | Agent | Output |
|---|---|---|
| Composite action full-service expansion | Architect | PR #113 (00650bf) |
| ai-analysis coverage-gate integration | Architect | PR #113 (00650bf) |
| Oracle runner trap fix | Oracle (direct) | ~/.claude/oracle/bin/oracle-spawn.sh |
| github-worker timing measurement | Sensei | ✅ Sprint 105 retrospective completion |
| ADR update | Scribe | This document + sprint-103.md retrospective |
Changes
1. Composite Action Expansion (ci.yml)
- Removed
matrix.service != 'github-worker'condition from quality-nestjs / audit-npm / test-node 3 jobs - Inline 3 steps (Setup Node + Cache + Install) × 3 jobs = 67 lines deleted
- All Node services (gateway, identity, submission, problem, github-worker) now go through composite
- audit-npm retains
install-command: 'npm ci --ignore-scripts'(security scan policy)
2. AI Coverage-Gate Integration (ci.yml)
test-ai-analysis: added--cov-report=lcov:coverage/lcov.infoto pytest- artifact path expanded to multi-line (xml + lcov)
test-ai-analysisadded tocoverage-gateneeds, if condition expandedscripts/check-coverage.mjs: no changes required (recursive discovery automatically recognizes ai-analysis)
3. Oracle Runner __AGENT_DONE__ Marker (Local Infrastructure)
~/.claude/oracle/bin/oracle-spawn.sh:127-149runner heredoc reconstructedset -euo pipefail→set -uo pipefail(errexit removed)cleanup()function +trap cleanup EXIT: guarantees marker echo + lock removal + reap + dispatch on any exit path- Covers all 4 failure modes: tee pipe SIGPIPE / empty output / tmux detached
- Reference memory:
memory/issue-oracle-pipeline-agent-done-marker.md
4. Measurement Attempt (Sensei)
Sensei attempted Pre/Post sample collection via gh run list + jobs API. Pre 4-run average (Quality 22.2s / Audit 19.8s / Test 19.2s) successfully obtained, but both PR #111/#113 modified only workflow files — detect-changes skipped github-worker → Post sample count: 0. No runs observed where composite action executed through actual install/test path. Full report: ~/.claude/oracle/inbox/sensei-sprint-104-timing.md
Sprint 105 [B] retrospective actual measurement result (task-20260421-sp105-b3-finalize):
| Job | Pre-change avg (n=4) | Post-change avg (n=3)² | Difference | Practical verdict |
|---|---|---|---|---|
| Quality — github-worker | 22.2s (σ 5.8s) | 22.3s (σ 2.5s) | +0.1s (+0.4%) | ✅ No change |
| Audit — github-worker | 19.8s (σ 3.7s) | 18.0s (σ 3.0s) | -1.8s (-8.9%) | ✅ No change |
| Test GitHub Worker | 19.2s (σ 1.9s) | 20.0s (σ 1.0s) | +0.8s (+3.9%) | ✅ No change |
² Post n=3: 2 natural runs (run 24702740418·24702828670) + 1 synthetic (run 24703075569 rebuild_all). Detail: ~/.claude/oracle/inbox/sensei-sprint-105-timing.md
Lessons Learned
1. CI infrastructure PRs cannot generate their own actual measurement data due to detect-changes skip
Both PR #111 (Sprint 103) and PR #113 (Sprint 104) modified only .github/workflows/ci.yml. detect-changes did not detect services/github-worker/** changes — entire matrix skipped. Response: Standardize workflow_dispatch(rebuild_all=true) once alongside composite/pipeline change PRs, or use a dummy touch commit. Deferred to Sprint 105.
2. "Pilot then expand" decisions are justified by code duplication removal alone
67-line deletion (25% compression) secures maintainability. Actual performance measurement is a follow-up validation, not the basis for the expansion decision. Reconfirmed that Sprint 103's original plan ("measure from 2nd run after merge") was an asynchronous structure.
3. trap EXIT covers all 4 shell runner failure modes in one guard
set -e + tee pipeline fails to execute subsequent cleanup if even one of normal exit / claude failure / SIGPIPE / signal receipt is missing. trap EXIT intercepts all exit paths — single guard handles all 4 cases. Borrowable for other dispatch system designs.
4. Script-zero-change is the best approach for AI coverage integration
scripts/check-coverage.mjs already has recursive discovery structure implying extensibility. Simply adding the lcov reporter to pytest enables automatic recognition. Maintains the simplicity of handling Python/Node coverage in a single script. Sprint 103 lesson "test 0-artifact scenario for infrastructure PRs" is also applicable here.
Carried Over (Sprint 105)
- github-worker actual measurement completion — re-measurement trigger: when 5 real github-worker change PRs are available
workflow_dispatch(rebuild_all=true)guardrail introduction (Sensei recommendation, Sprint 104 Lesson 1 response)- L2 cache layer (build output) — scope definition needed
- Frontend build optimization — scope definition needed
- Global coverage threshold upgrade to 70% — after actual measurement data obtained
- commitlint scope-enum management automation (pre-commit hook consideration)
References
- Sprint 103 ADR:
docs/adr/sprints/sprint-103.md - PR #113: composite expansion + AI coverage (
00650bf) - Channeltalk CI refactoring: https://channel.io/ko/team/blog/articles/backend-ci-refactoring-73fca77d
- Sensei measurement report:
~/.claude/oracle/inbox/sensei-sprint-104-timing.md - Oracle issue memory:
~/.claude/projects/-Users-leokim-Desktop-leo-kim-AlgoSu/memory/issue-oracle-pipeline-agent-done-marker.md