
Engineering Track — README

Audience: Developers, SREs, on-call engineers.

Goal: Get from "I just got handed an Evospin pager" to "I can find the right code, the right trace, and the right runbook" in under an hour of reading.

Companion tracks: ../business/ (leadership, sales) · ../delivery/ (PM, integration partners) · ../handover/ (onboarding + incident).

Quick references: Glossary — terminology · FAQ — common questions · Service catalog — operator inventory · Portal audit — verification report · Inventory — every doc in the portal.

This track is an index/aggregator, not a rewrite. Every page summarises a topic and links into the canonical source — usually a directory under docs/ or a config file under observability/, terraform/, or one of the three repos. When two pages contradict, the linked source wins.

If you notice doc drift or a broken assumption while onboarding or debugging, add it to ../audits/ONBOARDING-REVIEW.md (or open an issue) so it’s tracked and can be burned down.


Reading order

If you only have an hour, read these in order:

  1. stack.md — what's running, which versions, why those choices.
  2. architecture.md — how the pieces talk; pointer to ../architecture/ and ../architecture.md.
  3. flows.md — catalogue of the 15 narrative flow docs in ../flows/.
  4. api.md — when to read live Swagger vs the static reference vs Postman.
  5. observability.md — OTel pipeline, dashboards, log correlation, audits.
  6. runbooks.md — incident → runbook map (read this when paged).

For deep-dive readers, continue with:

  1. env.md — env var reference + Doppler workspace layout.
  2. data-model.md — Prisma schemas + ER diagrams.
  3. adr.md — architecture decision records.
  4. performance.md — k6 / Playwright methodology + perf audits.
  5. tooling.md — daily-driver dev tools.

Pages in this track

| # | Page | Read time | Summary |
|---|------|-----------|---------|
| 1 | stack.md | ~7 min | Inventory: NestJS 11, Next.js 14 (player) + Vite/React 19 (admin), Prisma 7, Postgres 13, Redis (cache + bot), BullMQ, RabbitMQ (stubbed), OTel, Jaeger v2 + Badger, Doppler, Terraform. Why each, alternatives rejected, ADR pointers. |
| 2 | architecture.md | ~5 min | Service map, repos-and-libraries layout, cross-cutting concerns, known structural debt (orphan ebit-bj, microservice trace gap). |
| 3 | flows.md | ~4 min | Catalogue of 15 flow docs grouped by user / admin / system flow. |
| 4 | api.md | ~3 min | Decision tree for "where do I look": live Swagger vs api-reference/ vs api/ (Postman) vs CHANGELOG. |
| 5 | observability.md | ~7 min | OTel collector pipeline, spanmetrics, Grafana dashboards, log correlation, tail sampling, trace coverage audit, PromQL audit. |
| 6 | runbooks.md | ~4 min | Incident → runbook → escalation map. Existing runbooks plus three proposed (DB load, Redis pressure, queue back-pressure). |
| 7 | env.md | ~4 min | Pointer to env-reference.md + Doppler workspace structure (per-repo project, dev_perf config, service tokens). |
| 8 | data-model.md | ~3 min | Prisma split-schema layout, six ER diagrams, conventions. |
| 9 | adr.md | ~4 min | Eight existing ADRs, status per ADR, three proposed ADR stubs. |
| 10 | performance.md | ~5 min | All perf docs: methodology, run checklist, trace coverage audit, PromQL audit, Doppler audit, Jaeger storage research, k6/Playwright assets. |
| 11 | tooling.md | ~6 min | Per-tool: what it is, where its config lives, daily commands. |

Total read time ≈ 52 minutes for the full track (the sum of the per-page estimates).


These are the directories the engineering track aggregates. Bookmark them.


Engineering-internal sub-track

Anything that should not be customer-visible (full security findings, infra credential walk-throughs, customer-incident post-mortems) lives under engineering/internal/. The security-classifier sibling agent will populate that subtree; today it is empty. The reading entry point will be engineering/internal/README.md once it is created. Until then, the unredacted security register is at ../security-register.md (the security-track agent will split it into customer-safe and internal layers).


What you'll know after reading

  • [ ] Which repo / app / service to grep when investigating a bug.
  • [ ] Which Jaeger / Grafana / Loki query answers each class of question.
  • [ ] Which env vars to set for local dev vs CI vs perf vs production.
  • [ ] Where the known trace-propagation gaps are (microservice transport, WS, BullMQ, EvoLogger→Loki).
  • [ ] Which ADRs settle questions you might otherwise re-litigate ("why pino + winston?", "why split Prisma schema?").
  • [ ] How to run a perf test from k6 profile selection through Grafana panel reading.

What you won't find here

  • Step-by-step onboarding day-1 walkthrough — that's ../onboarding/day-one.md.
  • Commercial / SLA commitments — those are in ../business/nfr-sla.md.
  • Customer-incident post-mortems — those are internal-only, will live under engineering/internal/.

Conventions

This track follows ../STYLE.md. Highlights:

  • Every code reference uses file_path:line_number (relative to repo root) so symbols are greppable.
  • Mermaid diagrams come with a fallback table.
  • {{TBD}} markers identify gaps; Doc CI scans for them.
  • Customer-safe by default. Internal-only sections live under engineering/internal/ with a banner.
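
Two of the conventions above (file_path:line_number references and {{TBD}} markers) lend themselves to mechanical checks. The TypeScript below is a minimal sketch of such checks, not the actual Doc CI, whose implementation this README does not describe. The names parseCodeRef and findTbdMarkers, the choice to reject absolute paths, and the example path in the usage note are all assumptions made for illustration.

```typescript
// Illustrative sketch only: these helpers are hypothetical and are not
// taken from the repo's Doc CI.

/** A parsed `file_path:line_number` code reference. */
interface CodeRef {
  filePath: string; // path relative to the repo root
  line: number;     // 1-based line number
}

/** One `{{TBD}}` marker found in a document. */
interface TbdHit {
  line: number;   // 1-based line number within the document
  column: number; // 1-based column of the opening braces
}

const TBD_MARKER = "{{TBD}}";

/**
 * Parse a reference of the form "path/to/file.ts:42".
 * Returns null when the text does not end in ":<positive integer>"
 * or when the path is absolute (assumed to violate "relative to repo root").
 */
function parseCodeRef(ref: string): CodeRef | null {
  const match = /^(.+):(\d+)$/.exec(ref.trim());
  if (match === null) return null;

  const filePath = match[1] ?? "";
  const line = Number(match[2] ?? "0");

  if (filePath.startsWith("/")) return null;
  if (line < 1) return null;
  return { filePath, line };
}

/** List every `{{TBD}}` marker in a document, in reading order. */
function findTbdMarkers(text: string): TbdHit[] {
  const hits: TbdHit[] = [];
  const lines = text.split(/\r?\n/);

  lines.forEach((content, index) => {
    let from = content.indexOf(TBD_MARKER);
    while (from !== -1) {
      hits.push({ line: index + 1, column: from + 1 });
      from = content.indexOf(TBD_MARKER, from + 1);
    }
  });
  return hits;
}
```

For example, parseCodeRef("apps/api/src/main.ts:42") (a made-up path) yields the file path and line 42, while a reference without a line number yields null. A CI step could fail the build when findTbdMarkers returns a non-empty list for a page that is meant to be complete.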