Session Lifetime Mismatch Bug Fix + SessionPolicy Modularization (Feedback-Based)
Sprint 71: Session Lifetime Mismatch Bug Fix + SessionPolicy Modularization (Feedback-Based)
Context
User feedback: "Sessions are too short. It's annoying to have to log in again when I come back from doing something else."
The planned JWT lifetime was 2 hours, with tokens refreshing while the user was active — sliding session was the intended behavior. However, diagnosis revealed a dual mismatch bug across 4 layers:
- Production env
JWT_EXPIRES_IN=7d— Mismatch with planned value2h. Legacy value was embedded in Sealed Secret. cookie.util.ts'sCOOKIE_MAX_AGE_SECONDS=60*60hardcoded — Browser deletes cookie after 1 hour regardless of JWT TTL.token-refresh.interceptor.ts'sREFRESH_THRESHOLD_SECONDS=5*60— Refresh only within 5 minutes of expiry → sliding effectively doesn't work even while user is active.useSessionKeepAlive.ts'sSESSION_TIMEOUT_MS=65*60*1000— Frontend forces logout at 65 minutes, conflicting with the planned 2h JWT.
In other words, fixing any single layer would cause another layer to terminate the session. The state was 4 separate sources of truth (SSoT).
The initial diagnosis derived a foundation plan (D2~D4) for converting cookie.util.ts to JWT exp SSoT and hardcoding threshold/frontend timeout fixes, but PM feedback ("don't hardcode sessions, modularize") was accepted and Oracle extended it to an alpha design (new SessionPolicyModule + GET /auth/session-policy public endpoint). Unified the single source of truth to server env and restructured so both server and client read derived policy values.
Decisions
D1: JWT_EXPIRES_IN 7d → 2h
- Context: Legacy
7dembedded in production Sealed Secret conflicted with planned value (2h). Token lifetime policy — not even a secret — was locked inside Sealed Secret, making changes unnecessarily difficult. - Choice: Add
env: JWT_EXPIRES_IN: "2h"block to Deployment in aether-gitopsalgosu/base/gateway.yaml, overriding envFrom (Sealed Secret) value using Kubernetes env override rule. Deploymentenv:takes precedence overenvFrom:, allowing policy value changes without re-sealing. - Alternatives: Re-seal Sealed Secret — requires
kubesealand restoring original plaintext.env. Token lifetime is not a secret so plaintext manifest management is structurally reasonable. Rejected. - Code Paths:
aether-gitops:algosu/base/gateway.yaml(commitbeb7f7d)
D2: Convert Cookie maxAge to JWT exp claim SSoT
- Context:
COOKIE_MAX_AGE_SECONDS=60*60hardcoded value deletes browser cookie after 1 hour regardless of JWT TTL. Cookie TTL and JWT TTL were in a dual source of truth state. - Choice: In
cookie.util.ts, extractexpclaim withjsonwebtoken.decodeand dynamically calculatemaxAge = (exp * 1000) - Date.now(). Defensive fallback1h+ JSON structured log (event: 'cookie_maxage_fallback') for missing exp/decode failure/already expired cases. Cookie maxAge auto-syncs whenever a new token is issued by sliding refresh. - Alternatives: Parse
JWT_EXPIRES_INwithConfigService(mslibrary needed) — adds new dependency + maintains dual path ignoring JWT's ownexp. Using JWT exp as SSoT is structurally safer. - Code Paths:
services/gateway/src/auth/cookie.util.ts,services/gateway/src/auth/cookie.util.spec.ts
D3: Sliding refresh threshold 5 minutes → 60 minutes
- Context: Existing
REFRESH_THRESHOLD_SECONDS=5*60(5 minutes) practically fails to implement the "reset session if user is active" policy. With 2h TTL, no refresh occurs for the first 1h 55m even when requests come in, so a user returning after 55 minutes of inactivity immediately gets logged out due to insufficient remaining time. - Choice: Extend
REFRESH_THRESHOLD_SECONDSto60*60(1 hour, 50% of TTL). From halfway through the 2h TTL, sliding refresh occurs per request → effectively infinite sliding for active users. - Alternatives: Refresh on every request — JWT re-signing CPU overhead + Set-Cookie header on every response increases response size. 50% threshold is performance/UX tradeoff.
- Code Paths:
services/gateway/src/auth/token-refresh.interceptor.ts,services/gateway/src/auth/token-refresh.interceptor.spec.ts - Note: → Re-implemented with SessionPolicyService injection in 71-1R. Hardcoded constant removed.
D4: Frontend SESSION_TIMEOUT_MS 65 minutes → 125 minutes
- Context: Frontend heartbeat forces logout judgment at 65 minutes → conflicts with planned 2h JWT (frontend terminates session before server).
- Choice: Extend
SESSION_TIMEOUT_MS = 125 * 60 * 1000(2h + 5 minute buffer). With heartbeat at 10-minute intervals, 12+ sliding refresh opportunities provided. 5-minute buffer prevents race where frontend timer expires immediately after server sliding refresh. - Alternatives: Rely only on server 401 response and remove client timeout — would lose detection UX when offline/network disconnected (session quietly terminates without last-activity modal). Keeping heartbeat is safer.
- Code Paths:
frontend/src/hooks/useSessionKeepAlive.ts - Note: → Converted to server policy fetch in 71-2R, keeping only
DEFAULT_SESSION_POLICYfallback. Hardcoded constant removed.
D5: Keep demo user 2h hardcoding
- Context:
'2h'hardcoding exists inoauth.service.ts:466for demo user JWT issuance, which coincidentally matches Sprint 71 planned value (2h). - Choice: Maintain for this sprint. Unifying to reuse
JWT_EXPIRES_INenv will be a separate refactoring task. - Alternatives: Unify immediately — demo account-specific policy (e.g., demo is 30 minutes) may be required in the future, so conservatively keep separate.
- Code Paths:
services/gateway/src/auth/oauth.service.ts:466(no change in this sprint) - Note: → Separated to
JWT_DEMO_EXPIRES_INenv in 71-1R with SessionPolicyService parsing. Extended to structure allowing separate policy (e.g., 30 minutes) for demo.
D6: SessionPolicyModule introduction + public endpoint (alpha design)
- Context: The initial plan (D2~D4) was merely replacing hardcoded constants with different values. The risk of synchronization failure across server (env), Gateway code constants, and Frontend constants remained. PM feedback ("don't hardcode sessions, modularize") prompted Oracle to redesign. Fundamental treatment requires an architecture where all policy values are derived from a single server SSoT.
- Choice: Create new
SessionPolicyModule(service + controller + spec) in Gateway. Parse 5 env vars (JWT_EXPIRES_IN,JWT_DEMO_EXPIRES_IN,SESSION_REFRESH_THRESHOLD,SESSION_HEARTBEAT_INTERVAL,SESSION_TIMEOUT_BUFFER) in milliseconds as single SSoT. Built-in duration parser (Nh|Nm|Ns|Nms|Nd) to avoid directmspackage dependency (maintain transitive-only status). Provide{ accessTokenTtlMs, heartbeatIntervalMs, sessionTimeoutMs, refreshThresholdMs }DTO to client via public endpointGET /auth/session-policy. Convert all consumers (JwtModule, OAuthService, TokenRefreshInterceptor) to inject SessionPolicyService. - Alternatives:
NEXT_PUBLIC_*build-time injection — Sprint 65 G1 (NEXT_PUBLIC_build-time inline risk) recurrence concern. Cannot separate build-time from runtime values.- Constants file (
session-policy.constants.ts) — no env integration, not a fundamental solution. Rejected. - Direct
mspackage dependency — currently transitive-only, relying on hoisting. Harder to fail-fast on missing parser, so self-parser is safer.
- Code Paths:
services/gateway/src/auth/session-policy/session-policy.module.tsservices/gateway/src/auth/session-policy/session-policy.service.tsservices/gateway/src/auth/session-policy/session-policy.controller.tsservices/gateway/src/auth/session-policy/session-policy.service.spec.tsservices/gateway/src/auth/auth.module.ts(JwtModule.registerAsync)services/gateway/src/auth/oauth/oauth.module.ts,oauth.service.ts,oauth.service.spec.tsservices/gateway/src/auth/token-refresh.interceptor.ts,token-refresh.interceptor.spec.tsservices/gateway/src/app.module.ts(JwtMiddleware exclude —/auth/session-policypublic)frontend/src/lib/session-policy.ts(new)frontend/src/lib/__tests__/session-policy.test.ts(new)frontend/src/contexts/AuthContext.tsx(sessionPolicy state + app init fetch)frontend/src/hooks/useSessionKeepAlive.ts(remove hardcoding, receive policy prop)aether-gitops:algosu/base/gateway.yaml(5 env vars, commitsbeb7f7d+9c6968f)
Patterns
P1: JWT exp claim as SSoT for downstream state
- Where:
services/gateway/src/auth/cookie.util.ts - When to Reuse: When there's a JWT token and supplementary state tied to it (Cookie maxAge, frontend session timer, cache TTL, reconnect timer, etc.). Using
jsonwebtoken.decode(token).expas SSoT instead of separate constants/env means changing the TTL policy only requires updating one place (token issuance point) and derived state auto-syncs. Defensive fallback + structured log for derived calculation failures is essential.
P2: Server → client policy value propagation pipeline
- Where:
services/gateway/src/auth/session-policy/,frontend/src/lib/session-policy.ts - When to Reuse: When policy values controlled by server env (session, rate limit, feature flags, etc.) also need to be known by the client. Place Policy Service on server to parse env as SSoT, expose DTO via public
GETendpoint, client fetches once at app boot +DEFAULT_*fallback for boot stability. Runtime propagation method that avoidsNEXT_PUBLIC_*build-time injection. Implement parser in-house rather than using external package for fail-fast guarantee.
Gotchas
G1: Hardcoded constants bypassing env vars create duplicate SSoT
- Symptom: No matter how
JWT_EXPIRES_INenv is changed to2h/7d/any value, browser always deletes cookie after exactly 1 hour. User reproduces: "I set it to 2 hours but got logged out after 1 hour." - Root Cause:
COOKIE_MAX_AGE_SECONDS = 60 * 60hardcoded constant incookie.util.tsmakes no reference toJWT_EXPIRES_INenv. Token lifetime and cookie lifetime are in a structurally separated dual SSoT state. - Fix: Elevate the source of derived state from env to the token itself (JWT
expclaim) to unify SSoT. - Lesson: Don't place constants with "coincidentally same meaning" near policy values controlled by env vars. If they share the same semantic, they must reference the same source of truth. Any point where changing one env var doesn't cascade to related state is the seed of the next bug.
G2: Non-secret policy values inside Sealed Secret
- Symptom: Token lifetime policy like
JWT_EXPIRES_INstored encrypted in Sealed Secret, requiringkubeseal+ original plaintext.envrestoration procedure to modify. Unnecessarily high change barrier. - Root Cause: Initial env var configuration sealed entire
.envinto Sealed Secret. Bundled without distinguishing secrets from non-secrets. - Fix (this sprint): Use Kubernetes rule where Deployment
env:block overridesenvFrom:(Sealed Secret) to change policy value without re-sealing. - Lesson: Manage non-secret policy values (TTL, feature flags, thresholds) with ConfigMap or Deployment
env:. Use Sealed Secret only for true secrets (JWT signing key, DB password, API token). At initial design stage, use "would exposure of this value be a security problem?" as the separation criterion.
G3: Recurring structure of hardcoded policy values
- Symptom: When constants with the same meaning as env-controlled policies (e.g., JWT TTL) — like
REFRESH_THRESHOLD_SECONDS,SESSION_TIMEOUT_MS, demo'2h'literal — are scattered in code, changing just env creates "silent mismatch" bugs where behavior doesn't change. Even the Sprint 71 initial plan (D2~D4) only replaced constant values, leaving recurrence risk. - Root Cause: "Constant" as a mental habit omits env integration. When config injection path (Config/Policy Service) is not established at initial implementation, developers start with local constants and leave them as-is.
- Fix: Unify all points expressing policy values to "policy service injection". When constant literals (number × unit form) appear with
*policy*,*session*,*ttl*,*timeout*etc., raise alert at review. Sprint 71 alpha design (D6) introduced SessionPolicyModule for structural cleanup. - Lesson: When adding env vars, "search consumers → grep similar-meaning constants → list conversion scope in PR" checklist needed. Replacing constants instead of converting to injection structure at once allows same bug to recur.
Metrics
- Commits (AlgoSu): Pending aggregation (Oracle batch commit planned)
- Commits (aether-gitops): 2
beb7f7d— gateway.yamlJWT_EXPIRES_INenv override9c6968f— SessionPolicy env 4 additions (JWT_DEMO_EXPIRES_IN,SESSION_REFRESH_THRESHOLD,SESSION_HEARTBEAT_INTERVAL,SESSION_TIMEOUT_BUFFER)
- Files changed (AlgoSu): Total 19 files
- Gateway (14 files):
- New 4:
session-policy/session-policy.module.ts,session-policy.service.ts,session-policy.controller.ts,session-policy.service.spec.ts - 71-1 original 2:
auth/cookie.util.ts,auth/cookie.util.spec.ts - Modified 8:
auth/auth.module.ts,auth/oauth/oauth.module.ts,auth/oauth.service.ts,auth/oauth.service.spec.ts,auth/token-refresh.interceptor.ts,auth/token-refresh.interceptor.spec.ts,app.module.ts(JwtMiddleware exclude),main.ts(if needed)
- New 4:
- Frontend (4 files):
- New 2:
src/lib/session-policy.ts,src/lib/__tests__/session-policy.test.ts - Modified 2:
src/hooks/useSessionKeepAlive.ts,src/contexts/AuthContext.tsx
- New 2:
- Documentation (1 file):
docs/adr/sprints/sprint-71.md(this file)
- Gateway (14 files):
- Files changed (aether-gitops): 1 (
algosu/base/gateway.yaml) - Tests: Gateway 47 suites / 688 tests pass (8 new session-policy), Frontend 110 suites / 1135 tests pass (16 new session-policy)
- New external dependencies: None (
mspackage remains existing transitive state, self-implemented duration parser.jsonwebtokenalready in use in Gateway)
Follow-up Recommendations (outside Sprint 71 scope)
Add to MEMORY.md follow-up section:
- aether-gitops
algosu/base/sealed-gateway-secrets.yamlorphan file cleanup — Found by Herald 71-3. Sealed Secret manifest not referenced anywhere. - Remove
JWT_EXPIRES_INkey from Sealed Secret — kubeseal re-sealing task. Currently neutralized by Deployment env override, but fundamentally non-secret keys should be excluded from Sealed Secret. - Integrate
cookie.util.tsfallback logging withStructuredLoggerService— Currently usingconsole.warn+ JSON payload direct output for lightweight logging. Refactoring needed for alignment with production log pipeline. - Formal Refresh Token + Redis introduction review (long-term) — Current sliding refresh re-signs single JWT. Long-term task for Refresh Token separation + server-side session revoke capability.
- Reuse
SessionPolicyModulepattern for other policy values — Review applying same pipeline (env → PolicyService → public GET DTO → client fetch) to policy values currently scattered as code constants: rate limit, circuit breaker thresholds, upload size limits, etc. GET /auth/session-policyresponse caching (SWR/ETag) — long-term — Currently sufficient with one fetch at app boot, but if policy becomes runtime-changeable (e.g., admin UI adjusts TTL), need SWR/ETag-based refresh strategy.JWT_EXPIRES_INSealed Secret legacy value (7d) cleanup — Currently hidden by Deployment env override, but recommend fundamental removal via kubeseal re-sealing. Perform alongside rebuilding sealing procedure.