Service Catalog¶

CMDB-style master inventory of every service, datastore, and external dependency in Evospin / dropbet. It is the single lookup surface for incident response: find the service in §"Application services" or §"Find by symptom", read its card, then jump to its runbook, dashboard, and owner.

Audience: Tier 1 / Tier 2 on-call, customer-team SRE, and any new hire who needs the "what's in this thing?" map. Reading order: skim §"Find by symptom" first, then drill into the service card. The cards do not duplicate runbook or dashboard content; they link to it.

Cross-links: architecture/service-map.md for the architectural view (with Mermaid C4 diagrams), external-services.md for the protocol-level / fallback details on third-parties.

Reading the cards¶

Every service card has the same shape. Fields that don't apply are explicitly marked n/a; fields the customer team owns are marked {{TBD: customer-team}}; fields engineering owns are marked {{TBD: engineering}}.

- Type: docker / NestJS app / Postgres / external SaaS / k8s / EC2
- Repo + source path: where the code lives
- Port + URL: local + perf staging
- Health endpoint: HTTP path or TCP probe
- Owner: customer team role responsible (Tier 1/2/3)
- SLA target: from business/nfr-sla.md
- Dashboard / Runbook / API ref / Logs / Traces / Doppler config
- Dependencies / Dependents
- Critical alerts / Notes
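
The card shape above maps naturally onto a typed record. The sketch below is illustrative only: the `ServiceCard` interface, its field names, and the `openPlaceholders` helper are assumptions made for this example and do not exist in any repo. The sample values are copied from the `ebit-api` card.

```typescript
// Hypothetical model of a service card; only a subset of the fields is shown.
interface ServiceCard {
  name: string;
  type: string;
  source: string;         // "n/a" when the field does not apply
  port: number | null;    // null when the service exposes no port
  healthEndpoint: string;
  owner: string;
  dependencies: string[];
  dependents: string[];
}

// Values copied from the ebit-api card in this catalog.
const ebitApi: ServiceCard = {
  name: "ebit-api",
  type: "NestJS application",
  source: "ebit-api/apps/api/",
  port: 4000,
  healthEndpoint: "/health",
  owner: "customer-team Tier 2 senior on-call · {{TBD: customer-team}}",
  dependencies: ["ebit-db", "ebit-redis", "ebit-redis-bot", "otel-collector"],
  dependents: ["ebit-fe", "ebit-admin-fe", "ebit-rt"],
};

// Returns the names of string fields that still carry a {{TBD: ...}} placeholder.
function openPlaceholders(card: ServiceCard): string[] {
  const keys = Object.keys(card) as (keyof ServiceCard)[];
  return keys.filter((key) => {
    const value = card[key];
    return typeof value === "string" && value.indexOf("{{TBD") !== -1;
  });
}
```

Run over every card, a helper like this would list which fields are still waiting on the customer team or engineering.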

1. Application services (5 NestJS apps)¶

`ebit-api` — main REST API¶

Type: NestJS application
Repo / source: ebit-api/apps/api/
Port: 4000 (host); 4000 (container internal)
URL (local): http://localhost:4000/swagger
URL (perf staging): <SUT_IP>:4000 — see terraform/perf/.outputs.env after terraform apply
Health endpoint: /health (TCP probe in compose; full check via Swagger reachability)
Owner: customer-team Tier 2 senior on-call · {{TBD: customer-team}}
SLA target: p95 sign-in < 200 ms, p95 bet-place < 200 ms, p95 balance < 100 ms — see business/nfr-sla.md
Dashboard: Grafana → service-overview + ebit-perf-test (/observability/grafana/provisioning/dashboards/)
Runbooks: runbooks/db-high-load.md · runbooks/db-down.md · runbooks/bullmq-job-stuck.md · general triage in handover/oncall-runbook.md
API reference: api-reference/api.md
Logs: Loki {service_name="ebit-api"}
Traces: Jaeger → service ebit-api
Doppler config: ebit-api/dev_perf (and per-env analogs)
Dependencies: Postgres (ebit-db) · Redis cache (ebit-redis) · Redis bot (ebit-redis-bot) · OTel collector
Dependents: ebit-fe (SSR + browser fetches) · ebit-admin-fe (SSR) · ebit-rt (token validation) · BullMQ producers across the fleet
Critical alerts: error rate > 5% for 5 min · p95 > 2× SLA · pg_stat_activity.idle_in_transaction > 5 — wired in {{TBD: engineering — alert routing}}
Notes: Manual tracer.startActiveSpan wraps in AuthService.login and UserService.authenticate (see adr/0007-evologger-kept-not-migrated.md). Highest blast radius: every player flow passes through this service.
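
As a rough illustration of the two threshold alerts on this card, the sketch below encodes the p95 targets listed under SLA target together with the "p95 > 2× SLA" and "error rate > 5%" rules. The function and constant names are hypothetical, and the "for 5 min" window is not modelled. The actual alert wiring is still marked TBD above.

```typescript
// p95 SLA targets in milliseconds, copied from the ebit-api card.
const SLA_P95_MS = { signIn: 200, betPlace: 200, balance: 100 };
type Flow = keyof typeof SLA_P95_MS;

// True when the observed p95 exceeds twice the SLA target for that flow.
function breachesLatencyAlert(flow: Flow, observedP95Ms: number): boolean {
  return observedP95Ms > 2 * SLA_P95_MS[flow];
}

// True when more than 5% of requests in the sample failed.
// An empty sample never breaches.
function breachesErrorRateAlert(errors: number, total: number): boolean {
  if (total === 0) return false;
  return errors / total > 0.05;
}
```

Both comparisons are strict, so a p95 of exactly twice the target, or an error rate of exactly 5%, does not trip the check.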

`ebit-rt` — realtime / websocket gateway¶

Type: NestJS application (socket.io, websocket-only — polling disabled)
Repo / source: ebit-api/apps/rt/
Port: 4001 (host) — namespace /events
Health endpoint: TCP probe on 4001
Owner: customer-team Tier 2 senior on-call · {{TBD: customer-team}}
SLA target: connection success > 99%, p95 handshake < 200 ms — see business/nfr-sla.md
Dashboard: service-overview (rt panel) + browser RUM
Runbooks: runbooks/ws-adapter-scale-out.md
API reference: api-reference/rt-events.md
Logs / Traces: Loki {service_name="ebit-rt"} · Jaeger → ebit-rt
Doppler config: same as ebit-api
Dependencies: Postgres, Redis cache, ebit-api (token validation hop)
Dependents: ebit-fe (player live updates)
Critical alerts: throttler ban count > 0/sec for 1 min · CPU > 80% for 2 min · concurrent connections > measured ceiling
Notes: Single replica today — @socket.io/redis-adapter not installed. Horizontal scale-out blocked; see runbooks/ws-adapter-scale-out.md §Cause and AF-3 in architecture.md.
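
A client connecting to this gateway has to match the websocket-only transport and the `/events` namespace. A minimal sketch, assuming a plain `http` scheme and a caller-supplied host (both assumptions); `rtConnection` is a hypothetical helper whose result is meant to be passed to socket.io-client's `io(url, options)`.

```typescript
interface RtConnection {
  url: string;
  options: { transports: string[] };
}

// Builds the connection settings for ebit-rt.
// Port 4001 and namespace /events come from the card above;
// the http scheme is an assumption for local use.
function rtConnection(host: string): RtConnection {
  return {
    url: "http://" + host + ":4001/events",
    // Polling is disabled on the server, so only offer the websocket transport.
    options: { transports: ["websocket"] },
  };
}

// Intended use with socket.io-client (not imported here):
//   const conn = rtConnection("localhost");
//   const socket = io(conn.url, conn.options);
```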

`ebit-bj` — blackjack server (orphaned)¶

Type: NestJS application
Repo / source: ebit-api/apps/bj/
Port: 4002
Owner: engineering — currently no production callers · {{TBD: engineering — disposition decision}}
Status: Orphaned. Has its own session-token scheme and EVO-Games wallet RPC. The dropbet client exclusively hits ebit-api's /casino/games/house/blackjack/* instead — bj is never called from the in-repo FE or api. AF-4 in architecture.md. Documented for completeness; do not page on its alerts.
Dashboard / Runbooks / API ref: service-overview (bj panel — usually flat) · no dedicated runbook (general triage in handover/oncall-runbook.md) · no Swagger
Logs / Traces: Loki {service_name="ebit-bj"} · Jaeger → ebit-bj (orphan trace roots — traceparent doesn't propagate over Redis pub/sub; see adr/0005-no-traceparent-on-redis-rpc.md)
Dependencies: Postgres, Redis cache
Dependents: none in-repo

`ebit-bo` — back-office API¶

Type: NestJS application (separate Swagger, separate route tree)
Repo / source: ebit-api/apps/bo/
Port: 4003
URL (local): http://localhost:4003/swagger
Health endpoint: TCP probe on 4003
Owner: customer-team Tier 2 (admin operations) · {{TBD: customer-team}}
SLA target: best-effort; not on the player-facing critical path
Dashboard: service-overview (bo panel)
API reference: api-reference/bo.md
Logs / Traces: Loki {service_name="ebit-bo"} · Jaeger → ebit-bo
Dependencies: Postgres, Redis cache
Dependents: internal ops tooling only · note: ebit-admin-fe does not call bo directly today; it routes through api (see flows/admin-bets.md).

`ebit-speed-roulette` — multiplayer roulette state machine¶

Type: NestJS application (BullMQ-driven state queue, concurrency: 1)
Repo / source: ebit-api/apps/speed-roulette/
Port: 4004
Owner: customer-team Tier 2 (multiplayer game-state) · {{TBD: customer-team}}
SLA target: round duration ≈ 27 s end-to-end (no stall > 90 s); reveal-secret correctness 100%
Dashboard: service-overview + bullmq (queue depth panel for speed-roulette:state)
Runbooks: runbooks/speed-roulette-deadlock.md
Logs / Traces: Loki {service_name="ebit-speed-roulette"} · Jaeger → ebit-speed-roulette (orphan roots same as ebit-bj)
Dependencies: Postgres, Redis cache (BullMQ), EOS blockchain (block source for RNG)
Dependents: ebit-api proxies player calls; ebit-rt pushes SpeedRouletteStateUpdate to clients
Notes: Single replica by design. concurrency: 1 is the correctness gate. Lag at the EOS provider can tip a round into a stall.
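
The SLA numbers on this card (a round of roughly 27 s, no stall beyond 90 s) can be turned into a simple classification. The sketch below is an assumption-laden illustration: `classifyRound` and the intermediate "slow" tier are invented for this example, and the input is taken to be the time since the last observed state transition.

```typescript
type RoundHealth = "ok" | "slow" | "stalled";

const EXPECTED_ROUND_MS = 27000; // approximate end-to-end round duration from the card
const STALL_LIMIT_MS = 90000;    // stall threshold from the card

// Classifies a round by how long it has gone without a state transition.
// The "slow" tier is an assumption; the card only defines the 90 s stall limit.
function classifyRound(msSinceLastTransition: number): RoundHealth {
  if (msSinceLastTransition > STALL_LIMIT_MS) return "stalled";
  if (msSinceLastTransition > EXPECTED_ROUND_MS) return "slow";
  return "ok";
}
```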

2. Frontend services (2 Next.js apps)¶

`ebit-fe` — dropbet (player site)¶

Type: Next.js 14 App Router (pnpm)
Repo / source: ebit-fe/
Port: 3000
URL (local): http://localhost:3000
Owner: customer-team Tier 1 first-line · {{TBD: customer-team}}
SLA target: p95 page TTFB < 400 ms, p95 LCP < 2.5 s — see business/nfr-sla.md
Dashboard: browser-rum (/observability/grafana/provisioning/dashboards/browser-rum.json)
Runbooks: page through handover/oncall-runbook.md §3 — no FE-specific runbook today · {{TBD: engineering — author runbooks/fe-build-failure.md}}
Logs / Traces: browser RUM via @vercel/otel; SSR logs via Loki {service_name="ebit-fe"}
Dependencies: ebit-api (REST + WS via ebit-rt)
Notes: i18n via next-intl; SVG handling via @svgr/webpack. Connects to ebit-rt via socket.io-client (websocket transport only).

`ebit-admin-fe` — internal admin panel¶

Type: Next.js 14 + NextUI + Ant Design charts (pnpm)
Repo / source: ebit-admin-fe/
Port: 3001
Owner: customer-team Tier 2 (admin ops) · {{TBD: customer-team}}
Status: Sign-in flow has 4 known integration bugs — cookie-name mismatch, missing OTel, no propagateContextUrls, hardcoded API host. Use Swagger directly for admin operations until fixed. See flows/admin-sign-in.md and onboarding/day-one.md §9.
Dashboard: service-overview (admin-fe panel)
Logs / Traces: Loki {service_name="ebit-admin-fe"}; tracing currently broken — see Status above
Dependencies: ebit-api (admin endpoints)
Operator reference: 22-screen guide at admin/README.md — every screen cites its admin-fe route + the matching apps/api/src/**/admin*.controller.ts file:line

3. Data services (3)¶

`ebit-db` — Postgres 13¶

Type: Postgres 13-bullseye in compose; managed instance in production · {{TBD: customer-team — production instance type}}
Port: 5555 (host) → 5432 (container)
Owner: customer-team Tier 2 (DB) · {{TBD: customer-team}}
Dashboard: prisma-postgres
Runbooks: runbooks/db-high-load.md · runbooks/db-down.md · runbooks/login-fails-bcrypt.md
Logs: docker logs only (not in Loki by default)
Dependencies: none (root datastore)
Dependents: every NestJS app
Notes: split Prisma schema (api, blackjack, speed_roulette) — see adr/0006-split-prisma-schema.md. No replication in compose; production replication shape is {{TBD: engineering}}.

`ebit-redis` — cache Redis (`:6379`)¶

Type: redis/redis-stack:latest (stdlib + RedisJSON + RediSearch)
Port: 6379 (host); password: cache
Owner: customer-team Tier 2 (DB) · {{TBD: customer-team}}
Dashboard: redis
Runbooks: runbooks/redis-memory-pressure.md · runbooks/bullmq-job-stuck.md
Dependents: every NestJS app · all production BullMQ queues except bot queues
Notes: No maxmemory configured in docker-compose.yml. Production sizing is {{TBD: engineering — see redis-memory-pressure.md §Prevention}}.

`ebit-redis-bot` — bot Redis (`:6380`)¶

Type: same as cache; separate instance to isolate bot-driven load
Port: 6380 (host); password: bot
Owner: customer-team Tier 2 · {{TBD: customer-team}}
Dashboard: redis (filter by instance)
Runbooks: same patterns as cache Redis
Dependents: bot-related BullMQ queues only (bots-bet, bots-session-scheduler, bots-start-session, challenges)
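
The split between the two Redis instances can be expressed as a small routing rule: the four bot-related queues named on this card live on `ebit-redis-bot`, everything else on `ebit-redis`. The helper below is a hypothetical sketch of that rule, not code from the repo; the authoritative per-queue map is the one in runbooks/bullmq-job-stuck.md §1.

```typescript
// Bot-related queue names, copied from the ebit-redis-bot card.
const BOT_QUEUES = [
  "bots-bet",
  "bots-session-scheduler",
  "bots-start-session",
  "challenges",
];

interface RedisTarget {
  service: string;
  port: number;
}

// Returns the Redis instance expected to hold a queue's state.
function redisForQueue(queue: string): RedisTarget {
  return BOT_QUEUES.indexOf(queue) !== -1
    ? { service: "ebit-redis-bot", port: 6380 }
    : { service: "ebit-redis", port: 6379 };
}
```

During triage this answers "which Redis do I look in?" before opening the runbook.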

4. Async / messaging (2)¶

BullMQ — Redis-backed job queues¶

Type: in-process queue library (no daemon); state lives in ebit-redis and ebit-redis-bot
Repo / source: ebit-api/apps/*/queue/ and ebit-api/apps/*/bull/
Owner: customer-team Tier 2 · {{TBD: customer-team}}
Dashboard: bullmq — depth per queue
Runbooks: runbooks/bullmq-job-stuck.md
Notes: 13 queues total. All production async work rides BullMQ — see adr/0003-bullmq-not-rabbitmq.md. Per-queue → Redis-instance map in runbooks/bullmq-job-stuck.md §1.

`ebit-rabbitmq` — stubbed broker (vhost `ft`)¶

Type: RabbitMQ in compose; zero traffic — wired only to apps/api/src/fast-track/rabbitmq/fast-track.rmq.module.ts which returns a stubbed no-op (disabled = true)
Port: 5672 (AMQP), 15672 (UI — user rabbitmq / pass rabbitmq, vhost ft)
Owner: engineering — disposition pending · {{TBD: engineering}}
Status: Container runs and passes healthchecks but is unused. Don't page on it.
Notes: When debugging a stalled queued job, look in Redis (KEYS bull:*), not RabbitMQ — see /CLAUDE.md "Async queues — BullMQ, not RabbitMQ".
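
Once the `bull:*` keys have been listed, grouping them by queue shows where state is piling up. The sketch below assumes the default BullMQ key layout `bull:<queue>:<suffix>` and treats the last segment as the suffix; `countKeysByQueue` is a hypothetical helper, and keys with deeper suffixes would be attributed to the wrong queue name. Note that `KEYS` walks the whole keyspace and blocks Redis while it runs, so on a loaded instance `SCAN` with `MATCH bull:*` is the gentler way to collect the input.

```typescript
// Counts Redis keys per BullMQ queue, given key names such as "bull:bots-bet:wait".
// Assumes the shape <prefix>:<queue>:<suffix>; other keys are ignored.
function countKeysByQueue(
  keys: string[],
  prefix: string = "bull"
): { [queue: string]: number } {
  const counts: { [queue: string]: number } = {};
  keys.forEach((key) => {
    const parts = key.split(":");
    if (parts.length < 3 || parts[0] !== prefix) return;
    // Everything between the prefix and the last segment is the queue name,
    // so queue names that contain a colon stay intact.
    const queue = parts.slice(1, -1).join(":");
    counts[queue] = (counts[queue] || 0) + 1;
  });
  return counts;
}
```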

5. Observability (5)¶

`otel-collector`¶

Type: OpenTelemetry Collector (OTLP gateway)
Port: 4317 (gRPC), 4318 (HTTP), 13133 (health)
Owner: customer-team Tier 2 (observability) · {{TBD: customer-team}}
Runbooks: runbooks/trace-missing.md · runbooks/loki-no-logs.md
Config: /observability/otel-collector.yml
Dependents: every NestJS app + browser RUM exporter
Notes: spanmetrics connector derives RED metrics from spans — see adr/0002-spanmetrics-over-prisma-metrics.md.

`jaeger` (v2 + Badger)¶

Type: Jaeger v2 with Badger storage backend
Port: 16686 (UI), 4317 (OTLP)
Owner: customer-team Tier 2 · {{TBD: customer-team}}
Config: /observability/jaeger-v2-config.yaml
Notes: storage decisions documented in audits/jaeger-storage-research.md. Tail-sampling is {{TBD: engineering — ADR pending}}.

`prometheus`¶

Type: Prometheus TSDB
Port: 9090
Config: /observability/prometheus.yml
Owner: customer-team Tier 2 · {{TBD: customer-team}}
Notes: scrapes otel-collector for spanmetrics-derived RED.

`loki`¶

Type: Grafana Loki (log aggregator)
Port: 3100
Config: /observability/loki.yml
Owner: customer-team Tier 2 · {{TBD: customer-team}}
Runbooks: runbooks/loki-no-logs.md
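
Every card in this catalog gives its Loki selector in the form `{service_name="..."}`. A small, hypothetical helper for building that selector, optionally with a LogQL line filter (`|=`), might look like this; the function name and the escaping approach are assumptions for illustration.

```typescript
// Builds a LogQL query for one service, optionally filtered to lines containing a string.
function logqlForService(service: string, contains?: string): string {
  const selector = '{service_name="' + service + '"}';
  if (contains === undefined) return selector;
  // Escape backslashes and double quotes so the filter stays a valid quoted string.
  const escaped = contains.replace(/\\/g, "\\\\").replace(/"/g, '\\"');
  return selector + ' |= "' + escaped + '"';
}
```

For example, `logqlForService("ebit-rt", "error")` yields a query that can be pasted into Grafana Explore.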

`grafana`¶

Type: Grafana (dashboards + Explore)
Port: 3003 (local login: admin / grafana)
Dashboards: /observability/grafana/provisioning/dashboards/ — 8 provisioned (service-overview, perf-test, perf-system, logs-trace-pivot, prisma-postgres, bullmq, redis, browser-rum)
Owner: customer-team Tier 2 · {{TBD: customer-team}}

6. Performance-test infrastructure (3, transient)¶

These exist only when a perf run is active. Provisioned via /terraform/perf/; destroyed via terraform destroy after capture.

| Service | Instance | Role |
| --- | --- | --- |
| SUT (System Under Test) | EC2 `c7g.4xlarge` | Runs the full Evospin stack from ECR (api / rt / bj / bo / speed-roulette + both FEs + Postgres + 2× Redis + RabbitMQ) |
| Monitoring | EC2 `c7g.xlarge` | OTel Collector + Prometheus + Grafana + Loki + Jaeger + node-exporter |
| Loadgen | EC2 `c7g.4xlarge` | k6 v0.56 + Node 22 + pnpm 9.11 + Playwright (100 concurrent Chromium) + node_exporter |

Owner: customer-team Tier 2 (perf) · {{TBD: customer-team}}
Dashboards: monitoring host serves the same Grafana provisioning as local
Runbooks: perf-run-checklist.md · performance-testing.md (methodology) · performance-test-report.md (last run)
Doppler config: dev_perf (separate from local)

7. Third-party / external dependencies¶

These are services we consume but don't operate. The card is shorter — what we use it for, fallback if unavailable, contract owner.

| Service | What we use it for | Fallback if unavailable | Owner |
| --- | --- | --- | --- |
| Doppler (workspace `ebit`) | Secrets distribution for runtime + CI | `.env` files (local only); production has no fallback — outage = no-deploy | customer-team {{TBD}} |
| Google reCAPTCHA v3 | Sign-up + sign-in + forgot-password gating | `runbooks/captcha-break-glass.md` — currently no backup provider configured (engineering follow-up) | customer-team {{TBD}} |
| Sumsub | KYC verification | Manual review queue | customer-team {{TBD}} |
| CCPAYMENT | Crypto payments processor | Alternative provider via {{TBD: engineering — payments-abstraction layer}} | customer-team {{TBD}} |
| NowPayments | Crypto payments processor (secondary) | CCPAYMENT primary | customer-team {{TBD}} |
| Softswiss | Slots provider | Other slot providers in catalog (`apps/api/src/casino/slots/providers/`) | customer-team {{TBD}} |
| PM8 | Slots provider | Same | customer-team {{TBD}} |
| MaxMind GeoIP | Country gating + restricted-country list | Block all on lookup failure (fail-secure) | customer-team {{TBD}} |
| CoinGecko | Exchange rates for `ExchangeRatesService.toUsd()` | Cached rates degrade quietly; alert on stale rate | customer-team {{TBD}} |
| SendGrid | Transactional email (verification, password reset, marketing) | Local mode bypasses entirely (`isLocal`); production has no fallback today — `{{TBD: engineering — second-provider abstraction}}` | customer-team {{TBD}} |
| Sentry | Error tracking + perf monitoring | Errors still log to Loki/stdout — Sentry is observability, not critical-path | customer-team {{TBD}} |
| EOS blockchain (public nodes) | RNG block-source for speed-roulette `WAITING_BLOCK` state | Round stalls in `WAITING_BLOCK` — see `runbooks/speed-roulette-deadlock.md` | engineering {{TBD}} |
| EVO wallet RPC (Skindeck) | Skin-deposit settlement | Deposits queue and retry; player sees pending state | customer-team {{TBD}} |

For protocol-level details (auth headers, rate limits, observed failure modes) see external-services.md.
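
The MaxMind row's fallback, "block all on lookup failure (fail-secure)", is worth spelling out because it inverts the usual default. The sketch below is a hypothetical illustration of that policy: `isCountryAllowed` is not a function from the repo, a failed lookup is represented as `null`, and the restricted list is supplied by the caller.

```typescript
// Fail-secure country gate: a failed GeoIP lookup (null) is treated as blocked.
function isCountryAllowed(country: string | null, restricted: string[]): boolean {
  if (country === null) return false; // lookup failed, so deny rather than allow
  return restricted.indexOf(country.toUpperCase()) === -1;
}
```

The design choice is that a GeoIP outage degrades into refused traffic rather than into traffic admitted from restricted countries.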

8. Dependency graph¶

Internal topology (clients · frontends · apps · datastores · observability) is covered in architecture/service-map.md, split into player path + admin/ops path. Read that first.

The diagram below covers what service-map.md deliberately omits — third-party integrations outbound from ebit-api, since each is an external dependency with its own runbook concerns.

flowchart LR
    ebit_api["ebit-api :4000"]

    subgraph fairness["Game fairness"]
        eos(("EOS blockchain<br/>JSON-RPC"))
    end

    subgraph auth_msg["Auth + messaging"]
        recaptcha(("reCAPTCHA<br/>verify token"))
        sendgrid(("SendGrid<br/>SMTP"))
    end

    subgraph kyc_pay["KYC + payment"]
        sumsub(("Sumsub<br/>KYC"))
        ccp(("CCPAYMENT<br/>crypto deposits"))
    end

    subgraph obs["Observability (non-local)"]
        sentry(("Sentry<br/>errors"))
    end

    ebit_api -- "JSON-RPC" --> eos
    ebit_api -- "verify"   --> recaptcha
    ebit_api -- "send"     --> sendgrid
    ebit_api -- "KYC API"  --> sumsub
    ebit_api -- "deposit"  --> ccp
    ebit_api -- "errors"   --> sentry

Speed-roulette is the only Nest app besides ebit-api reaching out to a third party (EOS). All others terminate inside the Nest monorepo or hit shared datastores — see architecture/service-map.md for the internal topology.

Edge convention: solid = active in production, dashed = stubbed or no callers. The ebit-bj and RabbitMQ edges are dashed for that reason.

| Node count | Edge count |
| --- | --- |
| 21 | 27 |

9. Find by symptom¶

The fastest path during an incident — symptom → likely service(s) → start here.

| Symptom | Likely service(s) | Open first |
| --- | --- | --- |
| Bet placement slow | `ebit-api` + `ebit-db` | `runbooks/db-high-load.md` |
| Bet placement returns 5xx | `ebit-api` (Prisma transaction) + BullMQ | `runbooks/bullmq-job-stuck.md`, then `runbooks/db-down.md` |
| Live game state stuck (>90 s) | `ebit-speed-roulette` (or `ebit-bj`, but bj has no callers) | `runbooks/speed-roulette-deadlock.md` |
| Real-time updates not pushed | `ebit-rt` + `ebit-redis` (cache) | `runbooks/ws-adapter-scale-out.md` — single-replica today |
| Sign-in failing for many users | `ebit-api` + Google reCAPTCHA + Sumsub | `runbooks/captcha-break-glass.md`, then `runbooks/login-fails-bcrypt.md` |
| Trace gap mid-request | OTel Collector + Redis pub/sub transport | `runbooks/trace-missing.md` — known gap on bj/speed-roulette per `adr/0005-no-traceparent-on-redis-rpc.md` |
| Logs missing for a service | OTel Collector + Loki | `runbooks/loki-no-logs.md` |
| Wallet balance wrong (one user) | `ebit-api` + `ebit-db` | `flows/dropbet-wallet.md` — check SF-013 (no overdraft guard on `toVault`) |
| Wallet balance wrong (many users) | `ebit-api` + BullMQ `bet_settled_queue` | `handover/oncall-runbook.md` — P0; promote and page |
| Admin can't ban a user | `ebit-api` + `ebit-bo` | `flows/admin-user-mgmt.md` |
| Redis OOM / eviction spike | `ebit-redis` (cache) + BullMQ retention | `runbooks/redis-memory-pressure.md` |
| WS storm / bans climbing | `ebit-rt` (single-replica ceiling) | `runbooks/ws-adapter-scale-out.md` |
| 2FA / MFA reset for an admin | `ebit-api` (auth) + Postgres | `runbooks/2fa-unknown-secret.md` |
| Email not delivered | SendGrid + `ebit-api` | `external-services.md` §SendGrid (no production fallback today) |
| Captcha fails on real traffic | Google reCAPTCHA upstream | `runbooks/captcha-break-glass.md` |
| KYC verification stuck | Sumsub + `ebit-api` | `external-services.md` §Sumsub |
| Crypto deposit not credited | CCPAYMENT or NowPayments + BullMQ `SKINDECK_DEPOSIT` | `runbooks/bullmq-job-stuck.md` |
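
The symptom table can also be read in reverse: given a service that is alerting, which documents should be opened? The sketch below shows one way to model that, using three rows copied from the table. The `SymptomEntry` type and `runbooksForService` helper are hypothetical and exist only for this illustration.

```typescript
interface SymptomEntry {
  symptom: string;
  services: string[];
  openFirst: string[]; // in the order the table says to open them
}

// A subset of the symptom table, copied from this catalog.
const SYMPTOMS: SymptomEntry[] = [
  {
    symptom: "Bet placement slow",
    services: ["ebit-api", "ebit-db"],
    openFirst: ["runbooks/db-high-load.md"],
  },
  {
    symptom: "Bet placement returns 5xx",
    services: ["ebit-api", "BullMQ"],
    openFirst: ["runbooks/bullmq-job-stuck.md", "runbooks/db-down.md"],
  },
  {
    symptom: "Redis OOM / eviction spike",
    services: ["ebit-redis", "BullMQ"],
    openFirst: ["runbooks/redis-memory-pressure.md"],
  },
];

// Collects the distinct documents linked from every symptom that names the service,
// preserving table order.
function runbooksForService(service: string): string[] {
  const seen: string[] = [];
  SYMPTOMS.forEach((row) => {
    if (row.services.indexOf(service) === -1) return;
    row.openFirst.forEach((doc) => {
      if (seen.indexOf(doc) === -1) seen.push(doc);
    });
  });
  return seen;
}
```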

10. Coverage gaps (where this catalog is thinnest)¶

The three services with the least operational documentation today — engineering-team follow-ups:

- ebit-fe — no FE-specific runbook for build / SSR / hydration failures. Browser RUM dashboard exists; runbook is {{TBD: engineering — author runbooks/fe-build-failure.md and runbooks/fe-hydration-mismatch.md}}.
- ebit-bj — orphaned, but the disposition decision (delete? rewire? keep as backup?) hasn't been made. {{TBD: engineering — file ADR for bj disposition}}.
- External payment providers (CCPAYMENT, NowPayments) — no break-glass runbook for "payment processor down". The pattern from runbooks/captcha-break-glass.md applies. {{TBD: engineering — author runbooks/payments-provider-down.md}}.

These three are tracked in the engineering follow-up backlog (task #35 in the doc-portal task list).
