Engineering roadmap — future work

Forward-looking engineering backlog consolidated at handover. Sourced from runbooks, ADRs, the security register, performance-test results, the docs portal audit, and project memory. Every item has a file:line reference or memory citation; severity is tagged so the customer team can sequence the work.

Distinct from:

- business/roadmap.md — customer-adoption phases (commercial milestones).
- delivery/phased-rollout.md — week-by-week rollout timeline.
- engineering/dependencies.md — third-party SDK / vendor dependency map (sibling agent's deliverable).

Open security follow-ups live in security/internal/findings.md. This doc references but does not duplicate the SR-NNN register.

Section 1 — Executive summary

Priority	Items	What's in this bucket
Critical	4	Production blockers — load failure, OOM-by-default, `rt` single-instance ceiling, withdraw-lock unit bug.
High	9	Functional gaps that will fail under production traffic, admin-fe stack, and known security follow-ups already triaged.
Medium	12	Performance v2, observability gaps, ADR-required design decisions, doc-portal `{{TBD}}` burndown.
Low	9	Nice-to-have refactors, abstraction polish, post-launch hardening.
Total	34	Across bugs, perf, docs, architecture, features, tech debt.

Top 3 critical items (by Section 2 ordering):

bcrypt-cost p95 collapse — sign-in p95 jumps 15 ms → 1.09 s between 200 and 1000 VU; lower the cost factor (effort: S).
Redis maxmemory unset — local stack and any deployed environment that didn't override REDIS_ARGS will grow until container OOM (effort: S).
socket.io single-instance ceiling — per-instance clientSockets Map blocks rt horizontal scaling; install @socket.io/redis-adapter (effort: M).

All three are required before the perf-test envelope is re-run at 10k VU.

Section 2 — Known bugs (with locations)

Surfaced during build, perf-test, security review, or doc audit; not yet fixed. Severity reflects production-impact, not CVSS — combine with the CVSS in security/internal/findings.md when prioritizing security work.

How to read this table: severity is "what happens if this hits us in production" not "how bad is the vulnerability in isolation". A Critical row should block GA; a High row should be addressed in the first quarter post-handover; Medium rows fit Q3 and beyond.

#	Severity	Title	Location	Discovered by	Recommended fix
1	Critical	bcrypt cost too high under load — sign-in p95 = 1.09 s @ 1000 VU (70× degradation vs. 200 VU baseline)	`apps/api/src/.../user.service.ts` (auth)	Perf test 2026-04-25, see `performance-test-report-results.md` §"Stage 2"	Lower bcrypt cost from 10 → 8 (still ≥10⁵ guesses to break); 4× CPU reduction; expected p95 drop to ~150 ms at 1000 VU
2	Critical	Redis `maxmemory` unset → unbounded growth → host OOM under load	`docker-compose.yml` `REDIS_ARGS`; runbook trace at `runbooks/redis-memory-pressure.md:23`, `:138`, `:162`	Runbook authoring, Apr 2026	Set `--maxmemory <N>gb --maxmemory-policy allkeys-lru` in compose; author production-sizing ADR
3	Critical	socket.io has no Redis adapter → `rt` cannot horizontally scale; per-instance `clientSockets` Map drops cross-instance emits	`apps/rt/src/.../client.gateway.ts`; runbook `runbooks/ws-adapter-scale-out.md`; register row SR-030	Runbook authoring + perf test	Install `@socket.io/redis-adapter`, wire shared Redis pub/sub, migrate per-IP counters to Redis
4	Critical	`lockWithdrawOnClaimHours` unit confusion — field named "Hours" but math is `* 60 * 1000` (= minutes) → 24-hour lock locks for 24 minutes	`ebit-api/apps/api/src/promo/utils/code.utils.ts:50`; see `features/bonuses-and-promos.md:203-209`	Flow-doc audit	Confirm intended semantic with product, then either rename column or fix multiplier to `* 60 * 60 * 1000`
5	High	Captcha is single-provider (Google reCAPTCHA), URL hard-coded	`ebit-api/apps/api/src/captcha/google/recaptcha.service.ts:51`; runbook `runbooks/captcha-break-glass.md:75`	Runbook authoring	Introduce `CaptchaProvider` interface + `CAPTCHA_PROVIDER` env; add hCaptcha or Cloudflare Turnstile as backup. New ADR required.
6	High	No `DISABLE_CAPTCHA` env break-glass; only `NODE_ENV=local` short-circuits	`apps/api/src/captcha/...`; runbook `runbooks/captcha-break-glass.md:90`	Runbook authoring	5-line code change: `if (process.env.DISABLE_CAPTCHA === 'true') return;`
7	High	Doppler has stale `GEETEST_*` keys but the live code path is reCAPTCHA — env drift	Doppler `dev_perf` config; see `doppler-perf-audit.md`	Doppler audit, Apr 2026	Remove stale keys from Doppler; add CI check that env vars referenced in code exist and unreferenced keys are pruned
8	High	Prisma `connection_limit` defaults to 10 → pool exhaustion at 1000+ VU	Prisma client config; see `performance-test-report-results.md:71`	Perf test, Apr 2026	Set `connection_limit=50` in `DATABASE_URL`; document in production-sizing ADR
9	High	ebit-admin-fe — 4 stacked auth/observability bugs blocking cross-service admin tracing (cookie-name mismatch, `disable_set_cookies_and_mask_tokens=false`, silent middleware fall-through, missing `@vercel/otel`)	`ebit-admin-fe/src/middleware.ts:68-90`; `ebit-admin-fe/src/utils/cookies.ts:52-72`; `ebit-admin-fe/src/instrumentation.ts`; register row SR-025	E2E test #12 + flow-doc work; memory: `project_admin_fe_auth_bugs.md`	Three-pronged fix in admin-fe repo: (1) align cookie names + flip the body-token feature flag; (2) uncomment `leaveFromAccount` in middleware catch; (3) mirror `ebit-fe` instrumentation with `@vercel/otel` + Sentry-DSN gate
10	High	`ebit-bj` app is orphaned — port 4002 ships but no in-repo FE points at it; dropbet UX flows entirely through ebit-api's `/casino/games/house/blackjack/*`	`ebit-api/apps/bj/`; memory: `project_ebit_bj_orphan.md`; ADR-0011 §"Monorepo escape hatch"	Task #33 scoping	Disposition decision required: (a) delete `apps/bj/` outright, (b) rewire dropbet through it via proxy, or (c) document as intentionally-dormant with ADR. Image-builds today; cost not free.
11	High	Bet-place idle p95 ≈ 108 ms — already exceeds 100 ms SLO at 1 VU; root cause is the synchronous Prisma transaction wrapping BullMQ enqueue + Redis pub/sub RPC	See `performance-test-report-results.md:89`	Pre-existing baseline	Profile with `tests-perf/deep-metrics/flame-cpu.sh`; consider moving the pub/sub RPC outside the transaction or replacing with direct call
12	High	Cross-service trace propagation gap — `@ExternalControllerClient` Redis pub/sub transport doesn't carry W3C `traceparent` → speed-roulette and any future cross-app calls show as orphan roots	`ebit-api/libs/gateway/src/ms-controller/`; ADR-0005; memory: `project_otel_microservice_transport_gap.md`; register row SR-026 (accepted)	OTel coverage audit	Custom Nest microservice interceptor that serializes the active OTel context into the message envelope; re-extract on callee. Currently accepted as observability gap.
13	High	Speed-roulette per-job timeout absent — `concurrency: 1` queue can deadlock if a job exhausts retries without re-adding follow-up	`ebit-api/apps/speed-roulette/.../roulette-state.processor.ts:23`, `:147-160`; runbook `runbooks/speed-roulette-deadlock.md:186`; register row SR-024	Runbook authoring	Add `Job.opts.timeout` (e.g. `lockDuration: 30_000`) + watchdog cron that re-enqueues bootstrap when the queue is stale. ADR-required.
14	Medium	Reset-token JWT secret reused for email-verification — leak compromises both flows	`apps/api/src/.../user.service.ts:858, 893`; register row SR-011	Security audit	Separate secrets per token-type; add DB-side one-shot token table with `consumed_at` (also closes SR-012)
15	Medium	BullMQ broadcast queue depth has no Grafana alert — operators have no early warning when `rt` is back-pressuring	Grafana provisioning under `observability/grafana/...`; runbook `runbooks/ws-adapter-scale-out.md:180`	Runbook authoring	File alert: `bull:*-broadcast:wait > 100 for > 60s`

Cross-reference: every "High" + "Critical" entry above maps to either an SR-NNN row in security/internal/findings.md or a {{TBD}} slot in a runbook — see Section 4 for the docs-portal correspondence.
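
The unit confusion in Bug #4 is easiest to see as arithmetic. The sketch below is illustrative only: the function names are hypothetical and the real logic in `code.utils.ts` is not reproduced here. It contrasts the reported multiplier (`* 60 * 1000`) with the candidate fix (`* 60 * 60 * 1000`), assuming product confirms the field really means hours.

```typescript
// Hypothetical helpers for illustration; not the code in code.utils.ts.
const MS_PER_MINUTE = 60 * 1000;
const MS_PER_HOUR = 60 * MS_PER_MINUTE;

// Mirrors the reported multiplier (`* 60 * 1000`): the value is treated as minutes.
function lockDurationMsAsReported(lockWithdrawOnClaimHours: number): number {
  return lockWithdrawOnClaimHours * MS_PER_MINUTE;
}

// Candidate fix, valid only if the field is confirmed to mean hours.
function lockDurationMsFixed(lockWithdrawOnClaimHours: number): number {
  return lockWithdrawOnClaimHours * MS_PER_HOUR;
}

// A configured value of 24 yields 24 minutes today and 24 hours after the fix.
const reportedMinutes = lockDurationMsAsReported(24) / MS_PER_MINUTE;
const fixedHours = lockDurationMsFixed(24) / MS_PER_HOUR;
```

If product instead confirms that minutes were intended, the multiplier stays and only the field name changes, which is why the recommended fix starts with the semantic check.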

Section 3 — Performance bottlenecks (post-test backlog)

Direct extract of the recommendations table in performance-test-report-results.md §"Recommended remediations". Order = expected impact at 1000 VU.

Context: the most recent perf run (2026-04-25) measured a sign-in p95 of 15 ms at 200 VU and 1.09 s at 1000 VU on a c7g.4xlarge SUT (16 vCPU, 32 GB). Bet-place was not exercised under load because sign-in saturated first; bet-place idle p95 at 1 VU is already 108 ms (over the 100 ms SLO). Both observations point to the items in this section.

#	Action	Effort	Expected effect	Cross-ref
1	Lower bcrypt cost from default 10 → 8 (still ≥10⁵ guesses to break)	S	4× CPU reduction; p95 likely drops to ~150 ms at 1000 VU	Bug #1 in §2
2	Increase Prisma `connection_limit=50` (currently default 10)	S	Removes pool wait contention	Bug #8 in §2
3	Add ebit-api horizontal scaling (single-instance today) — autoscaling group + ALB + sticky sessions	M	Linear capacity growth	Section 5 — new ADR required
4	Wire `@socket.io/redis-adapter` for `rt` horizontal scaling — currently per-instance `clientSockets` Map	M	Multi-replica `rt` works without dropping emits	Bug #3 in §2; SR-030
5	Profile with `tests-perf/deep-metrics/flame-cpu.sh` to confirm bcrypt is THE bottleneck (not just the suspected one)	S	Confirms or refutes hypothesis #1 — if wrong, redirects effort	`perf-trace-coverage-audit.md`
6	Bet-place 100 ms SLO recovery — break the synchronous transaction-wraps-RPC anti-pattern	L	Restores headroom for bet-place (currently >SLO at 1 VU)	Bug #11 in §2
7	Re-run stepped-ramp 1k → 10k VU after #1–#5 land; capture clean numbers; soak 1h sustained	M	Validates the full perf-test envelope; was deferred this run	`performance-test-report-results.md:149`
8	Tail-sampling rate review — current 10% probabilistic; revisit if dashboards show exemplar-density gaps	S	Maintains forensic coverage without overflowing 50 GB Badger budget	ADR-0012 §"Revisit triggers"

The perf-test rig (Terraform, terraform/perf/) is destroyed at teardown after each run. Re-applying it for one pass of roughly 3.5 hr at $1.65/hr costs about $5.78.
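
The "4× CPU reduction" in action #1 follows from bcrypt's cost parameter being a base-2 logarithm of the work per hash. The sketch below only reproduces that arithmetic and the observed sign-in degradation ratio; the function name is hypothetical, and the ~150 ms p95 target in the table is a projection from the perf report that this calculation does not derive.

```typescript
// bcrypt's cost parameter is log2 of the work per hash, so work scales as 2^cost.
function bcryptRelativeWork(fromCost: number, toCost: number): number {
  return Math.pow(2, fromCost - toCost);
}

// Moving from cost 10 to cost 8 divides per-hash CPU work by 2^(10 - 8).
const cpuReduction = bcryptRelativeWork(10, 8);

// Observed sign-in p95: 15 ms at 200 VU vs. 1.09 s at 1000 VU (the "70×" row in §2).
const observedDegradation = 1090 / 15;
```

Because the measured degradation (roughly 73×) is far larger than the 4× work reduction, the flame-graph profiling in action #5 remains the check that bcrypt is the dominant cost rather than one of several.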

Section 4 — Documentation gaps (38 TBDs from PORTAL-AUDIT)

PORTAL-AUDIT.md v3 §3 categorized 38 engineering-fillable {{TBD}} slots. They are surfaced by the CI scanner and are correctly marked — burn-down is normal-cadence work, not a customer-share blocker.

Distribution per the audit: 8 each in handover/ (customer) and versions/ (engineering); 7 each in security/ and engineering/; 6 in runbooks/; 5 in recipes/; 4 in delivery/ (customer); 3 in api/; 2 in incidents/ (customer). The customer-fillable share is real but not engineering's domain to close.

4a. Engineering-fillable (~38 slots) — ours to close

Highest-leverage to close first (mirrors the bugs in §2):

Production CDN/WAF runbook (runbooks/ws-adapter-scale-out.md:159 + sibling references) — depends on customer-team CDN choice. Engineering authors the procedure; customer fills the vendor.
Redis sizing tuning (runbooks/redis-memory-pressure.md:23, 138) — once production stack is finalized.
Captcha provider abstraction ADR (runbooks/captcha-break-glass.md:75).
socket.io Redis adapter ADR (runbooks/ws-adapter-scale-out.md:137).
Postgres PITR procedure (runbooks/db-down.md:127).
Production Postgres replication setup (runbooks/db-down.md:187).
Production Postgres connection-limit sizing (runbooks/db-high-load.md:155).
Speed-roulette job-timeout policy ADR (runbooks/speed-roulette-deadlock.md:186).
BullMQ broadcast-queue Grafana alert (runbooks/ws-adapter-scale-out.md:180).
FE-side reCAPTCHA token caching policy (runbooks/captcha-break-glass.md:114).

The remaining ~28 are scattered across engineering/api.md, security back-references, and recipes/-pending-features. Each is a one-paragraph fill once the corresponding decision lands.

4b. Customer-team-fillable (~14 slots) — not engineering's domain

Per PORTAL-AUDIT.md:182: contact info, PagerDuty schedule names, Slack channel names, video bridge URLs, contractual SLA wording, vendor-NDA pre-cleared phrasing. Expected empty at handover.

4c. Naturally future-state — fills with time

Post-launch incident records (none yet); SLO actuals after first week of prod traffic; recipes for features not yet shipped (mobile companion, additional locales).

4d. Structural debt

Mermaid corpus is clean as of MERMAID-AUDIT.md: 81/81 blocks parse. No structural debt there. Open delivery-track anchor links (21 broken anchors per PORTAL-AUDIT.md:127-138) are a delivery-doc fix, not engineering's.

Section 5 — Architectural intent items

ADRs that explicitly flagged "may revisit if X" triggers, plus new ADRs that need to be authored to close {{TBD}} slots.

5a. Existing ADRs with revisit triggers

ADR	Trigger	Likelihood	If triggered
ADR-0011 §"Revisit triggers"	`rt` becomes the bottleneck and wants a different runtime; backend team grows past ~5; second backend stack (Python/Go) needs to coexist	Medium — `rt` scaling is the §3 item #4 question	Split `rt` into its own repo; `libs/` stay clean (no Nest-specific assumptions in shared types)
ADR-0003 §"Future Fast Track decision"	Product ships Fast Track	Low-medium — depends on commercial commitment	Set `disabled = false` at `fast-track.rmq.module.ts:8`; populate Doppler `FASTTRACK_JWT_*`; verify 11 call sites; supersede this ADR
ADR-0009 §"Considered alternatives"	Badger sweats under high-cardinality writes	Low — perf-test held within budget	Migrate to Tempo single-binary on local backend (documented fallback); accept TraceQL learning curve
ADR-0012 §"Revisit triggers"	Probabilistic 10% causes exemplar-density gaps; forensic budget grows; compliance forces 100% retention	Low	Move to higher rate; revisit when EBS budget grows

5b. New ADRs required (not yet authored)

Topic	Driver	Sponsoring source
Redis memory cap policy (`maxmemory` + eviction strategy)	Bug #2; runbook gap	`runbooks/redis-memory-pressure.md:162`
socket.io Redis adapter + per-IP counter migration	Bug #3; SR-030	`runbooks/ws-adapter-scale-out.md:137`
Captcha provider abstraction (interface + at least one backup)	Bug #5; runbook gap	`runbooks/captcha-break-glass.md:75`
Speed-roulette job-timeout policy	Bug #13; SR-024	`runbooks/speed-roulette-deadlock.md:186`
Production Postgres sizing — pool, replication, PITR	Bug #8 + runbook gaps	`runbooks/db-*.md`
ebit-api horizontal-scaling topology — ASG + ALB + session affinity	Perf #3	`performance-test-report-results.md:80`
`ebit-bj` orphan disposition	Bug #10; ADR-0011 escape hatch	Memory: `project_ebit_bj_orphan.md`
OTel context propagation across `@ExternalControllerClient`	Bug #12; SR-026	ADR-0005, memory: `project_otel_microservice_transport_gap.md`

Each new ADR should follow the format under adr/README.md (Context / Decision / Considered alternatives / Consequences / Revisit triggers / References). Sponsoring source means: which runbook or report is currently embedding the {{TBD}} slot the ADR closes.
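
As input to the captcha provider abstraction ADR, the following is a minimal sketch of how Bug #5 (`CaptchaProvider` interface plus `CAPTCHA_PROVIDER` env) and Bug #6 (`DISABLE_CAPTCHA` break-glass) could fit together. Everything beyond those three names is an assumption: the stub providers, the `Env` type, and the function names are hypothetical, and real providers would call the vendor's verification endpoint instead of comparing strings.

```typescript
// Hypothetical interface for the abstraction proposed in Bug #5.
interface CaptchaProvider {
  readonly name: string;
  verify(token: string): Promise<boolean>;
}

// Env is passed in explicitly so the sketch does not depend on process.env.
type Env = Record<string, string | undefined>;

// Stub providers stand in for real vendor HTTP calls (assumption for illustration).
function stubProvider(name: string, acceptedToken: string): CaptchaProvider {
  return { name, verify: async (token: string) => token === acceptedToken };
}

const providers: Record<string, CaptchaProvider> = {
  recaptcha: stubProvider("recaptcha", "recaptcha-ok"),
  turnstile: stubProvider("turnstile", "turnstile-ok"),
};

// Selects the provider named by CAPTCHA_PROVIDER, defaulting to the current vendor.
function selectProvider(env: Env): CaptchaProvider {
  const key = env["CAPTCHA_PROVIDER"] ?? "recaptcha";
  if (!Object.prototype.hasOwnProperty.call(providers, key)) {
    throw new Error("Unknown CAPTCHA_PROVIDER: " + key);
  }
  return providers[key];
}

// Break-glass from Bug #6: DISABLE_CAPTCHA=true skips verification entirely.
async function verifyCaptcha(env: Env, token: string): Promise<boolean> {
  if (env["DISABLE_CAPTCHA"] === "true") return true;
  return selectProvider(env).verify(token);
}
```

Failing fast on an unknown provider name, rather than silently falling back, keeps a mistyped env value from disabling the backup path unnoticed; whether that is the desired behaviour is a decision for the ADR.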

Section 6 — Feature roadmap (TBD with product team)

Product / commercial wishlist items surfaced in code, recipes, or memories — not yet specified. Each carries {{TBD with product}} until the customer team scopes.

Feature	Status	Effort	Prereqs	Customer-team input needed
Mobile-app companion (web-only today)	Proposed	L	API contract finalization; auth-flow review (cookie-domain semantics differ on native)	`{{TBD with product}}` — target platforms (iOS/Android), distribution model
Multi-currency improvements	Proposed	M	Resolve SR-036 (request-time FX vs row-stamped), SR-037 (TETH/ETH ambiguity)	`{{TBD with product}}` — supported currency list, FX-source vendor
Provably-fair RNG audit-ability	Proposed	M	Re-enable `JwtGuard` per SR-001; close SR-008 fairness-seed race	`{{TBD with product}}` — public-audit interface scope
Affiliate v2 with tier improvements	Proposed	M	`{{TBD with product}}`	Tier definition, commission curves
VIP program enhancements	Proposed	M	`{{TBD with product}}`	VIP-level criteria, perks
Live-chat integration depth (currently Intercom embed only)	Proposed	S–M	`{{TBD with product}}`	Vendor choice; depth (chat only vs co-browse)
Sportsbook integration completeness (PM8 partial)	Scoped (partial)	L	Resolve SR-033 (sportsbook bets hidden by hard-coded filter at `bet.repository.ts:280`)	`{{TBD with product}}` — odds provider, settlement flow
Additional locales (currently `en` + `de`)	Proposed	S per locale	`next-intl` is wired; copy + translation review	`{{TBD with product}}` — target market list
Withdrawal flow depth (block per `lockWithdrawOnClaimHours`)	In-flight	S	Bug #4 fix lands	None — engineering-driven once unit bug closed

{{TBD with product}} markers in the table above: 8.

6a. Feature scoping notes

Mobile-app companion. Cookie-based auth from ebit-api does not transfer cleanly to native shells (SameSite=Lax semantics differ across iOS WebView and React Native fetch). A dedicated /auth/mobile/* endpoint set with bearer-token issuance is the typical pattern; coordinate with the auth team before scoping. Out-of-band: app-store review timeline (~2 weeks Apple, 1 week Google) is a hard dependency on launch date, not on engineering capacity.
Provably-fair RNG audit-ability. Today seeds are exposed via /bets/house-games/info/<betId> (currently unguarded — SR-001). The fix lands authentication; the feature is a public endpoint or downloadable proof bundle that lets a third-party reproduce the RNG. Two designs: (a) per-bet downloadable JSON; (b) Merkle-tree commitment posted on-chain. (b) is materially more work and depends on {{TBD with product}} re: chain choice.
Sportsbook integration completeness. PM8 partial scope already in code; SR-033 documents that bet.repository.ts:280 filters out sportsbook bets — closing that filter is a 1-line change, but the surrounding settlement flow + odds pipeline are not yet wired.

Section 7 — Quarterly milestones

Placeholder buckets. Dates {{TBD assign during handover roadmap meeting}}. Sequencing reflects the dependency chain — Q1 unblocks Q2; Q3 builds on stable Q2 foundation.

Q1 — Critical bug fixes + production sizing

Date: {{TBD}}

- Bug #1 — bcrypt cost reduction (S)
- Bug #2 — Redis maxmemory policy (S) + new ADR
- Bug #3 — @socket.io/redis-adapter installed + ADR (M)
- Bug #4 — lockWithdrawOnClaimHours unit fix (S)
- Bug #8 — Prisma connection_limit raised (S)
- Captcha provider abstraction (Bug #5) + DISABLE_CAPTCHA env (Bug #6) (M)
- ADR backlog from §5b items 1–4

Q2 — Performance v2 + horizontal scaling

Date: {{TBD}}

- Re-run stepped-ramp 1k → 10k VU on c7g.4xlarge (perf #7)
- Add ebit-api horizontal scaling — ASG + ALB (perf #3) + ADR
- Bet-place 100 ms SLO recovery (Bug #11) — break the sync-RPC-in-transaction anti-pattern
- Critical/High security findings: SR-001, SR-002, SR-003, SR-004, SR-009, SR-010, SR-011, SR-012, SR-018, SR-025 (10 items per security/internal/findings.md due Q2)
- admin-fe four-bug stack closed (Bug #9)

Q3 — New features + observability close-out

Date: {{TBD}}

- Section 6 features per product priority
- Cross-service trace propagation (Bug #12 / ADR-0005) — Nest microservice interceptor
- Speed-roulette job-timeout policy (Bug #13) + ADR
- Remaining medium security findings due Q3 (SR-007, SR-008, SR-014–SR-024 per register)

Q4 — Tech debt sweep

Date: {{TBD}}

- Section 8 items
- Monorepo split if rt scaling demands it (ADR-0011 escape hatch)
- Soak/forensic re-run; SLO actuals captured for {{TBD}} post-launch slots from §4c
- Low-severity register burndown (SR-031, SR-037, SR-039, SR-043, SR-045, SR-048, SR-049)

Section 8 — Tech debt ledger

Specific patterns flagged for refactor. Distinct from §2 bugs in that nothing is broken — these are friction points that compound over time.

#	Item	Location	Friction note	Recommendation
1	Payment-provider abstraction is convention-only — no `PaymentProviderInterface`, every provider hand-wired	`recipes/add-payment-provider.md:6, 130`; `apps/api/src/payment/...` central wiring	Medium	Introduce strategy-pattern interface; auto-discover via `@Inject` token
2	KYC abstraction is vendor-specific (Sumsub-namespaced), not strategy-pattern — plan to rewrite, not swap	`recipes/swap-kyc-provider.md:6, 71`; `apps/api/src/kyc/sumsub/...`	Very high	Strategy pattern under `KycProviderInterface` + `@Inject('KYC_PROVIDER')` token; gated by `KYC_PROVIDER_NAME` env
3	OTel transport gap on `@ExternalControllerClient` — orphan trace roots across Redis pub/sub RPC	`libs/gateway/src/ms-controller/`; ADR-0005; memory: `project_otel_microservice_transport_gap.md`	High	Custom Nest microservice interceptor that serializes the active OTel context
4	`ebit-bj` app orphan — port 4002 image-builds but receives zero traffic	`apps/bj/`; memory: `project_ebit_bj_orphan.md`	High	Disposition decision (delete / rewire / document) — see §5b
5	RabbitMQ stub — broker boots but receives zero traffic (Fast Track stub at `disabled = true`)	`apps/api/src/fast-track/rabbitmq/fast-track.rmq.module.ts:8`; ADR-0003	Low	If Fast Track ships, follow ADR-0003 §"Future Fast Track decision"; if dead, follow §"If product decides Fast Track is permanently dead" — remove broker, 11 call sites, env, ADR
6	Per-instance presence map (`clientSockets` Map) breaks at >1 `rt` replica	`apps/rt/src/.../client.gateway.ts`; SR-030	High	Redis-backed presence + socket.io adapter (see Bug #3)
7	Duplicate-email race at sign-up returns 500 instead of 400 — bots can fingerprint live users	`apps/api/src/.../auth.service.ts:67-86`; SR-020	Medium	Catch P2002, return 400 `EMAIL_TAKEN`; pre-check inside transaction
8	`O(n_sockets)` balance push iterates `clientSockets.forEach` per `BalanceUpdated`	`apps/rt/src/.../client.gateway.ts:306-315`; SR-017	Medium	Per-user socket.io rooms: `this.server.to('user:'+id).emit(...)`
9	Bet `status` index missing — power-user list degraded under traffic	Prisma `Bet` schema; SR-032	Low	Add covering index on `status`
10	`LeaderboardQueueProducer` has zero call sites	`apps/api/src/leaderboard/...`; SR-039	Low	Delete or wire — confirm intent first
11	`RACE_ENABLED` per-handler inline guard, easy to forget	`apps/api/src/leaderboard/...`; SR-038	Low	Centralize behind a feature-flag service
12	In-process Map cache 60s — api vs bo serve up to 60 s stale	SR-041 (accepted)	Low	Document staleness budget; revisit if observed in user-facing report
13	`usdAmount` request-time FX vs row-stamped	SR-036	Low	Stamp FX rate on row at insert; reconcile any historical drift
14	EvoLogger / winston coexistence with nestjs-pino	ADR-0001 §"Considered options" #3; memory: `project_evologger_trace_correlation.md`	Low	Possible future state: drop EvoLogger, use pino everywhere; not worth the effort today per ADR-0001
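
Ledger item #3 (and Bug #12) recommends serializing the active trace context into the message envelope and re-extracting it on the callee. A minimal sketch of that round trip is below, assuming a hypothetical `Envelope` shape with a `headers` bag; the actual pub/sub message format used by `@ExternalControllerClient` is not shown in this doc. The `traceparent` layout (version, 32-hex trace id, 16-hex parent id, flags) follows the W3C Trace Context format.

```typescript
// W3C traceparent: version-traceId-parentId-flags, all lowercase hex.
const TRACEPARENT_RE = /^[0-9a-f]{2}-[0-9a-f]{32}-[0-9a-f]{16}-[0-9a-f]{2}$/;

// Hypothetical envelope; the real message shape is an assumption.
interface Envelope<T> {
  payload: T;
  headers: Record<string, string>;
}

// Caller side: copy a valid traceparent into the outgoing envelope.
function injectTrace<T>(payload: T, traceparent: string | undefined): Envelope<T> {
  const headers: Record<string, string> = {};
  if (traceparent !== undefined && TRACEPARENT_RE.test(traceparent)) {
    headers["traceparent"] = traceparent;
  }
  return { payload: payload, headers: headers };
}

// Callee side: recover the ids, or return undefined so a new root span is started.
function extractTrace(
  envelope: Envelope<unknown>
): { traceId: string; parentSpanId: string } | undefined {
  const value = envelope.headers["traceparent"];
  if (value === undefined || !TRACEPARENT_RE.test(value)) return undefined;
  // Fixed offsets: "vv-" is 3 chars, trace id is 32 chars, then "-", then 16 chars.
  return { traceId: value.slice(3, 35), parentSpanId: value.slice(36, 52) };
}
```

In a real interceptor the header value would most likely come from the OpenTelemetry API's propagation inject and extract calls rather than from hand-rolled parsing; the sketch only shows what has to survive the Redis hop.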

8a. Friction-map summary (from `recipes/integration-cookbook.md`)

The integration cookbook's friction map already classifies abstraction-debt items:

Recipe 1 (add-payment-provider) — medium friction — convention-only abstraction (#1 above).
Recipe 6 (swap-kyc-provider) — very high friction — vendor-specific, plan to rewrite (#2 above).

Other recipes pass the friction filter cleanly (S/M effort, low surprise).
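
Returning to ledger item #8, the difference between the current `clientSockets.forEach` scan and per-user rooms can be modelled without socket.io. The sketch below is an in-memory stand-in under stated assumptions: `Socket`, `RoomIndex`, and `emitByScan` are hypothetical names, and the real change would use socket.io's own rooms as the recommendation describes.

```typescript
// Hypothetical in-memory model; real code would rely on socket.io rooms.
type Socket = { id: string; userId: string; received: string[] };

class RoomIndex {
  private readonly rooms: Record<string, Socket[]> = {};

  // Stand-in for joining the room 'user:' + userId at connect time.
  join(socket: Socket): void {
    const room = "user:" + socket.userId;
    const members = this.rooms[room] ?? [];
    members.push(socket);
    this.rooms[room] = members;
  }

  // Visits only the target user's sockets; returns how many were touched.
  emitToUser(userId: string, event: string): number {
    const members = this.rooms["user:" + userId] ?? [];
    members.forEach((s) => s.received.push(event));
    return members.length;
  }
}

// Current pattern per the ledger: scan every connected socket for each event.
function emitByScan(all: Socket[], userId: string, event: string): number {
  let visited = 0;
  for (const s of all) {
    visited++;
    if (s.userId === userId) s.received.push(event);
  }
  return visited;
}
```

With 1,000 connected sockets and one socket per user, the scan visits 1,000 entries per `BalanceUpdated` while the room lookup visits one, which is the cost the ledger row describes.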

Section 9 — How to update this doc

9a. Quarterly review

Owner: engineering lead.
Cadence: end of each quarter, before the next quarter's planning meeting.
Inputs: every new runbook authored that quarter, every ADR written/amended, every entry in security/internal/findings.md, the most recent perf-test report.
Output: each Section's tables get a delta paragraph; closed items move to the History annex at the end of this doc.

9b. New finding → backport

When a runbook, ADR, or security audit surfaces a new follow-up:

Add a row to the relevant Section (§2 bugs, §3 perf, §4 docs, §5 ADRs, §8 debt).
Use a stable ID per row so cross-doc references survive. Convention: RFW-NNN (roadmap-future-work, sequential), parallel to SR-NNN.
Cite the source as file:line, or as an explicit memory reference (memory: <name>.md).
If the finding maps to an SR-NNN row, link both ways.
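
Allocating the next sequential ID for the steps above can be sketched as follows. The `RFW-` prefix comes from the convention; the three-digit zero padding is an assumption made to mirror SR-NNN, and `nextRfwId` is a hypothetical helper, not an existing tool.

```typescript
// Matches RFW-NNN; at least three digits is an assumption mirroring SR-NNN.
const RFW_ID = /^RFW-(\d{3,})$/;

// Returns the next sequential ID, ignoring anything that is not an RFW ID.
function nextRfwId(existing: string[]): string {
  let max = 0;
  for (const id of existing) {
    const match = RFW_ID.exec(id);
    if (match !== null) max = Math.max(max, Number(match[1]));
  }
  const digits = String(max + 1);
  // Left-pad to three digits; longer numbers pass through unchanged.
  const padded = digits.length >= 3 ? digits : ("00" + digits).slice(-3);
  return "RFW-" + padded;
}
```

Taking the maximum rather than the count means IDs are never reused after a row moves to the History annex, which is what keeps cross-doc references stable.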

9c. Aggregator script

Author tools/docs/refresh-roadmap.sh:

Greps {{TBD}} markers in docs/runbooks/ + docs/adr/ + docs/engineering/.
Diffs against the previous run's snapshot (committed under tools/docs/.roadmap-tbd-snapshot).
Flags new TBDs as candidates for §4 burndown.
Runs in CI alongside the existing link-check / mdlint / terminology / TBD-detector workflow (tools/docs/).

Until that script exists, the manual incantation is:

grep -rEn '\{\{TBD' docs/runbooks/ docs/adr/ docs/engineering/ \
  | grep -v 'roadmap-future-work.md' \
  | sort -u
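
The snapshot-diff step that the script is meant to add on top of this grep could look roughly like the sketch below. It is a model of the logic only, operating on in-memory file contents: the type and function names are hypothetical, and the snapshot key format (file path, a tab, then the marker line) is an assumption, since the format of `.roadmap-tbd-snapshot` is not yet defined.

```typescript
// One hit per marker line, mirroring what `grep -n` would report.
type TbdHit = { file: string; line: number; text: string };

// Scans in-memory contents (path -> file text); a real script would read from disk.
function findTbds(files: Record<string, string>): TbdHit[] {
  const hits: TbdHit[] = [];
  for (const file of Object.keys(files).sort()) {
    // Approximates the `grep -v` step by skipping this roadmap doc by path.
    if (file.indexOf("roadmap-future-work.md") !== -1) continue;
    const lines = files[file].split("\n");
    lines.forEach((text, index) => {
      if (text.indexOf("{{TBD") !== -1) {
        hits.push({ file: file, line: index + 1, text: text.trim() });
      }
    });
  }
  return hits;
}

// Assumed snapshot format: one "file<TAB>marker text" key per entry.
function hitKey(hit: TbdHit): string {
  return hit.file + "\t" + hit.text;
}

// Returns hits absent from the previous snapshot: candidates for §4 burndown.
function newSince(snapshotKeys: string[], current: TbdHit[]): TbdHit[] {
  return current.filter((hit) => snapshotKeys.indexOf(hitKey(hit)) === -1);
}
```

Keying on file and marker text rather than line number is a deliberate choice in this sketch: an unrelated edit that shifts a marker down a few lines should not be flagged as a new TBD.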

Cross-references

business/roadmap.md — customer-adoption phases (commercial milestones).
delivery/phased-rollout.md — week-by-week rollout timeline.
engineering/dependencies.md — third-party SDK / vendor dependency map (sibling agent's deliverable; create if not yet authored).
security/internal/findings.md — full SR-NNN security register.
performance-test-report-results.md — perf-test bottlenecks identified.
PORTAL-AUDIT.md §3 — engineering-fillable {{TBD}} categorization.
MERMAID-AUDIT.md — diagram corpus health (clean as of 2026-04-25).
runbooks/ — every operational runbook with embedded {{TBD}} markers.
adr/ — architecture decision records with explicit revisit triggers.

History annex

Append-only log of items closed since handover. Format: {{TBD: date}} — RFW-NNN — short description — closing PR / commit.

{{TBD}} — no entries yet; first quarterly review will populate this section.

When an item is closed, move its row from the live Section to this annex with the closure date and link to the PR or commit. Do not delete the row — preserving history is what makes the doc usable across a year of operations.