
Service Catalog

CMDB-style master inventory of every service, datastore, and external dependency in Evospin / dropbet. One look-up surface for incident response: find the service in §"Application services" or §"Find by symptom", read its card, jump to its runbook + dashboard + owner.

Audience: Tier 1 / Tier 2 on-call, customer-team SRE, and any new hire who needs the "what's in this thing?" map. Reading order: skim §"Find by symptom" first, then drill into the service card. The cards do not duplicate runbook or dashboard content — they link to it.

Cross-links: architecture/service-map.md for the architectural view (with Mermaid C4 diagrams), external-services.md for the protocol-level / fallback details on third parties.


Reading the cards

Every service card has the same shape. Fields that don't apply are explicitly marked n/a; fields the customer team owns are marked {{TBD: customer-team}}; fields engineering owns are marked {{TBD: engineering}}.

- Type: docker / NestJS app / Postgres / external SaaS / k8s / EC2
- Repo + source path: where the code lives
- Port + URL: local + perf staging
- Health endpoint: HTTP path or TCP probe
- Owner: customer team role responsible (Tier 1/2/3)
- SLA target: from business/nfr-sla.md
- Dashboard / Runbook / API ref / Logs / Traces / Doppler config
- Dependencies / Dependents
- Critical alerts / Notes
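
The card shape above can be pictured as a small record type. The sketch below is illustrative only: the `ServiceCard` interface, its field names, and the `unresolvedFields` helper are hypothetical and not part of the repo. It shows one way to list which string fields of a card still carry a `{{TBD…}}` placeholder.

```typescript
// Hypothetical record for one catalog card; field names mirror the list above.
interface ServiceCard {
  name: string;
  type: string;            // e.g. "NestJS application", "Postgres", "external SaaS"
  repoPath: string;
  port: string;
  healthEndpoint: string;  // HTTP path, TCP probe, or "n/a"
  owner: string;
  slaTarget: string;
  dependencies: string[];
  dependents: string[];
}

// Matches the placeholder forms used in this catalog, e.g.
// {{TBD}}, {{TBD: customer-team}}, {{TBD: engineering}} (optionally with a note).
const TBD_PATTERN = /\{\{TBD(: [^}]*)?\}\}/;

/** Returns the names of string fields that still hold a TBD placeholder. */
function unresolvedFields(card: ServiceCard): string[] {
  const keys = Object.keys(card) as (keyof ServiceCard)[];
  return keys.filter((key) => {
    const value = card[key];
    return typeof value === "string" && TBD_PATTERN.test(value);
  });
}

// Example values copied from the ebit-api card in this catalog.
const ebitApiCard: ServiceCard = {
  name: "ebit-api",
  type: "NestJS application",
  repoPath: "ebit-api/apps/api/",
  port: "4000",
  healthEndpoint: "/health",
  owner: "customer-team Tier 2 senior on-call · {{TBD: customer-team}}",
  slaTarget: "see business/nfr-sla.md",
  dependencies: ["ebit-db", "ebit-redis", "ebit-redis-bot", "OTel collector"],
  dependents: ["ebit-fe", "ebit-admin-fe", "ebit-rt"],
};

const openFields = unresolvedFields(ebitApiCard); // ["owner"]
```

A check like this could run in CI to report how many ownership fields are still open, but that wiring is an assumption rather than something the catalog describes.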

1. Application services (5 NestJS apps)

ebit-api — main REST API

  • Type: NestJS application
  • Repo / source: ebit-api/apps/api/
  • Port: 4000 (host); 4000 (container internal)
  • URL (local): http://localhost:4000/swagger
  • URL (perf staging): <SUT_IP>:4000 — see terraform/perf/.outputs.env after terraform apply
  • Health endpoint: /health (TCP probe in compose; full check via Swagger reachability)
  • Owner: customer-team Tier 2 senior on-call · {{TBD: customer-team}}
  • SLA target: p95 sign-in < 200 ms, p95 bet-place < 200 ms, p95 balance < 100 ms — see business/nfr-sla.md
  • Dashboard: Grafana → service-overview + ebit-perf-test (/observability/grafana/provisioning/dashboards/)
  • Runbooks: runbooks/db-high-load.md · runbooks/db-down.md · runbooks/bullmq-job-stuck.md · general triage in handover/oncall-runbook.md
  • API reference: api-reference/api.md
  • Logs: Loki {service_name="ebit-api"}
  • Traces: Jaeger → service ebit-api
  • Doppler config: ebit-api/dev_perf (and per-env analogs)
  • Dependencies: Postgres (ebit-db) · Redis cache (ebit-redis) · Redis bot (ebit-redis-bot) · OTel collector
  • Dependents: ebit-fe (SSR + browser fetches) · ebit-admin-fe (SSR) · ebit-rt (token validation) · BullMQ producers across the fleet
  • Critical alerts: error rate > 5% for 5 min · p95 > 2× SLA · pg_stat_activity.idle_in_transaction > 5 — wired in {{TBD: engineering — alert routing}}
  • Notes: Manual tracer.startActiveSpan wraps in AuthService.login and UserService.authenticate (see adr/0007-evologger-kept-not-migrated.md). Highest blast radius — every player flow passes through this service.
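
The critical alerts on this card share a "threshold held for a duration" shape (for example, error rate > 5% for 5 min). Alert routing is still marked TBD, so the sketch below does not describe the real rules. It is a minimal illustration, with hypothetical names (`Sample`, `sustainedAbove`), of how a sustained-breach condition can be evaluated over timestamped samples, similar in spirit to the `for` clause of a Prometheus alerting rule.

```typescript
// One measurement of a metric at a point in time (hypothetical shape).
interface Sample {
  timestampMs: number;
  value: number;
}

/**
 * Returns true when `value > threshold` has held without interruption for at
 * least `forMs`, judged from the most recent samples. Samples must be sorted
 * oldest first.
 */
function sustainedAbove(samples: Sample[], threshold: number, forMs: number): boolean {
  if (samples.length === 0) return false;
  const latest = samples[samples.length - 1];
  let streakStart: number | null = null;
  // Walk backwards from the newest sample to find where the current breach began.
  for (let i = samples.length - 1; i >= 0; i--) {
    if (samples[i].value > threshold) {
      streakStart = samples[i].timestampMs;
    } else {
      break;
    }
  }
  return streakStart !== null && latest.timestampMs - streakStart >= forMs;
}

// Error rate sampled once a minute for five minutes, always above 5%.
const errorRate: Sample[] = [0, 1, 2, 3, 4, 5].map((minute) => ({
  timestampMs: minute * 60000,
  value: 0.06,
}));

const errorRateFiring = sustainedAbove(errorRate, 0.05, 5 * 60000); // true
```

The same helper covers the "p95 > 2× SLA" condition by passing twice the SLA value as the threshold, for instance 400 ms for the 200 ms sign-in target.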

ebit-rt — realtime / websocket gateway

  • Type: NestJS application (socket.io, websocket-only — polling disabled)
  • Repo / source: ebit-api/apps/rt/
  • Port: 4001 (host) — namespace /events
  • Health endpoint: TCP probe on 4001
  • Owner: customer-team Tier 2 senior on-call · {{TBD: customer-team}}
  • SLA target: connection success > 99%, p95 handshake < 200 ms — see business/nfr-sla.md
  • Dashboard: service-overview (rt panel) + browser RUM
  • Runbooks: runbooks/ws-adapter-scale-out.md
  • API reference: api-reference/rt-events.md
  • Logs / Traces: Loki {service_name="ebit-rt"} · Jaeger → ebit-rt
  • Doppler config: same as ebit-api
  • Dependencies: Postgres, Redis cache, ebit-api (token validation hop)
  • Dependents: ebit-fe (player live updates)
  • Critical alerts: throttler ban count > 0/sec for 1 min · CPU > 80% for 2 min · concurrent connections > measured ceiling
  • Notes: Single replica today (@socket.io/redis-adapter not installed). Horizontal scale-out blocked; see runbooks/ws-adapter-scale-out.md §Cause and AF-3 in architecture.md.

ebit-bj — blackjack server (orphaned)

  • Type: NestJS application
  • Repo / source: ebit-api/apps/bj/
  • Port: 4002
  • Owner: engineering — currently no production callers · {{TBD: engineering — disposition decision}}
  • Status: Orphaned. Has its own session-token scheme and EVO-Games wallet RPC. The dropbet client exclusively hits ebit-api's /casino/games/house/blackjack/* instead — bj is never called from the in-repo FE or api. AF-4 in architecture.md. Documented for completeness; do not page on its alerts.
  • Dashboard / Runbooks / API ref: service-overview (bj panel — usually flat) · no dedicated runbook (general triage in handover/oncall-runbook.md) · no Swagger
  • Logs / Traces: Loki {service_name="ebit-bj"} · Jaeger → ebit-bj (orphan trace roots — traceparent doesn't propagate over Redis pub/sub; see adr/0005-no-traceparent-on-redis-rpc.md)
  • Dependencies: Postgres, Redis cache
  • Dependents: none in-repo

ebit-bo — back-office API

  • Type: NestJS application (separate Swagger, separate route tree)
  • Repo / source: ebit-api/apps/bo/
  • Port: 4003
  • URL (local): http://localhost:4003/swagger
  • Health endpoint: TCP probe on 4003
  • Owner: customer-team Tier 2 (admin operations) · {{TBD: customer-team}}
  • SLA target: best-effort; not on the player-facing critical path
  • Dashboard: service-overview (bo panel)
  • API reference: api-reference/bo.md
  • Logs / Traces: Loki {service_name="ebit-bo"} · Jaeger → ebit-bo
  • Dependencies: Postgres, Redis cache
  • Dependents: internal ops tooling only · note: ebit-admin-fe does not call bo directly today; it routes through api (see flows/admin-bets.md).

ebit-speed-roulette — multiplayer roulette state machine

  • Type: NestJS application (BullMQ-driven state queue, concurrency: 1)
  • Repo / source: ebit-api/apps/speed-roulette/
  • Port: 4004
  • Owner: customer-team Tier 2 (multiplayer game-state) · {{TBD: customer-team}}
  • SLA target: round duration ≈ 27 s end-to-end (no stall > 90 s); reveal-secret correctness 100%
  • Dashboard: service-overview + bullmq (queue depth panel for speed-roulette:state)
  • Runbooks: runbooks/speed-roulette-deadlock.md
  • Logs / Traces: Loki {service_name="ebit-speed-roulette"} · Jaeger → ebit-speed-roulette (orphan roots same as ebit-bj)
  • Dependencies: Postgres, Redis cache (BullMQ), EOS blockchain (block source for RNG)
  • Dependents: ebit-api proxies player calls; ebit-rt pushes SpeedRouletteStateUpdate to clients
  • Notes: Single replica by design — concurrency: 1 is the correctness gate. Lag at the EOS provider can tip a round into a stall.
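
The SLA figures on this card (round ≈ 27 s, no stall > 90 s) lend themselves to a simple health classification. The sketch below is an assumption-laden illustration, not the service's own logic: `classifyRound`, the `RoundHealth` labels, and the use of 27 s as a "slow" boundary are all hypothetical. Only the 90 s stall limit comes directly from the card.

```typescript
type RoundHealth = "ok" | "slow" | "stalled";

const EXPECTED_ROUND_MS = 27000; // round duration of roughly 27 s, per the card
const STALL_LIMIT_MS = 90000;    // "no stall > 90 s", per the card

/**
 * Classifies a round by how long ago its state last changed.
 * The "slow" band between 27 s and 90 s is an assumed early-warning zone.
 */
function classifyRound(lastStateChangeMs: number, nowMs: number): RoundHealth {
  const elapsed = nowMs - lastStateChangeMs;
  if (elapsed > STALL_LIMIT_MS) return "stalled";
  if (elapsed > EXPECTED_ROUND_MS) return "slow";
  return "ok";
}

const healthy = classifyRound(0, 10000);  // "ok"
const lagging = classifyRound(0, 45000);  // "slow"
const stuck = classifyRound(0, 120000);   // "stalled"
```

A "stalled" result is the point at which runbooks/speed-roulette-deadlock.md becomes the relevant next step.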

2. Frontend services (2 Next.js apps)

ebit-fe — dropbet (player site)

  • Type: Next.js 14 App Router (pnpm)
  • Repo / source: ebit-fe/
  • Port: 3000
  • URL (local): http://localhost:3000
  • Owner: customer-team Tier 1 first-line · {{TBD: customer-team}}
  • SLA target: p95 page TTFB < 400 ms, p95 LCP < 2.5 s — see business/nfr-sla.md
  • Dashboard: browser-rum (/observability/grafana/provisioning/dashboards/browser-rum.json)
  • Runbooks: page through handover/oncall-runbook.md §3 — no FE-specific runbook today · {{TBD: engineering — author runbooks/fe-build-failure.md}}
  • Logs / Traces: browser RUM via @vercel/otel; SSR logs via Loki {service_name="ebit-fe"}
  • Dependencies: ebit-api (REST + WS via ebit-rt)
  • Notes: i18n via next-intl; SVG handling via @svgr/webpack. Connects to ebit-rt via socket.io-client (websocket transport only).

ebit-admin-fe — internal admin panel

  • Type: Next.js 14 + NextUI + Ant Design charts (pnpm)
  • Repo / source: ebit-admin-fe/
  • Port: 3001
  • Owner: customer-team Tier 2 (admin ops) · {{TBD: customer-team}}
  • Status: Sign-in flow has 4 known integration bugs — cookie-name mismatch, missing OTel, no propagateContextUrls, hardcoded API host. Use Swagger directly for admin operations until fixed. See flows/admin-sign-in.md and onboarding/day-one.md §9.
  • Dashboard: service-overview (admin-fe panel)
  • Logs / Traces: Loki {service_name="ebit-admin-fe"}; tracing currently broken — see Status above
  • Dependencies: ebit-api (admin endpoints)
  • Operator reference: 22-screen guide at admin/README.md — every screen cites its admin-fe route + the matching apps/api/src/**/admin*.controller.ts file:line

3. Data services (3)

ebit-db — Postgres 13

  • Type: Postgres 13-bullseye in compose; managed instance in production · {{TBD: customer-team — production instance type}}
  • Port: 5555 (host) → 5432 (container)
  • Owner: customer-team Tier 2 (DB) · {{TBD: customer-team}}
  • Dashboard: prisma-postgres
  • Runbooks: runbooks/db-high-load.md · runbooks/db-down.md · runbooks/login-fails-bcrypt.md
  • Logs: docker logs only (not in Loki by default)
  • Dependencies: none (root datastore)
  • Dependents: every NestJS app
  • Notes: split Prisma schema (api, blackjack, speed_roulette) — see adr/0006-split-prisma-schema.md. No replication in compose; production replication shape is {{TBD: engineering}}.

ebit-redis — cache Redis (:6379)

  • Type: redis/redis-stack:latest (stdlib + RedisJSON + RediSearch)
  • Port: 6379 (host) — password cache
  • Owner: customer-team Tier 2 (DB) · {{TBD: customer-team}}
  • Dashboard: redis
  • Runbooks: runbooks/redis-memory-pressure.md · runbooks/bullmq-job-stuck.md
  • Dependents: every NestJS app · all production BullMQ queues except bot queues
  • Notes: No maxmemory configured in docker-compose.yml. Production sizing is {{TBD: engineering — see redis-memory-pressure.md §Prevention}}.

ebit-redis-bot — bot Redis (:6380)

  • Type: same as cache; separate instance to isolate bot-driven load
  • Port: 6380 (host) — password bot
  • Owner: customer-team Tier 2 · {{TBD: customer-team}}
  • Dashboard: redis (filter by instance)
  • Runbooks: same patterns as cache Redis
  • Dependents: bot-related BullMQ queues only (bots-bet, bots-session-scheduler, bots-start-session, challenges)

4. Async / messaging (2)

BullMQ — Redis-backed job queues

  • Type: in-process queue library (no daemon); state lives in ebit-redis and ebit-redis-bot
  • Repo / source: ebit-api/apps/*/queue/ and ebit-api/apps/*/bull/
  • Owner: customer-team Tier 2 · {{TBD: customer-team}}
  • Dashboard: bullmq — depth per queue
  • Runbooks: runbooks/bullmq-job-stuck.md
  • Notes: 13 queues total. All production async work rides BullMQ — see adr/0003-bullmq-not-rabbitmq.md. Per-queue → Redis-instance map in runbooks/bullmq-job-stuck.md §1.
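
Because queue state is split across two Redis instances, the first triage question is usually "which Redis holds this queue?". The sketch below encodes only what the data-service cards in this catalog state: the four bot-related queues live on ebit-redis-bot (:6380) and everything else on ebit-redis (:6379). The helper names are hypothetical, and the authoritative per-queue map remains runbooks/bullmq-job-stuck.md §1. The key pattern assumes BullMQ's default `bull` prefix.

```typescript
// Bot-related queues listed on the ebit-redis-bot card.
const BOT_QUEUES: string[] = [
  "bots-bet",
  "bots-session-scheduler",
  "bots-start-session",
  "challenges",
];

interface RedisTarget {
  instance: "ebit-redis" | "ebit-redis-bot";
  port: number; // host port from the data-service cards
}

/** Picks the Redis instance that should hold a given queue's keys. */
function redisFor(queueName: string): RedisTarget {
  return BOT_QUEUES.indexOf(queueName) !== -1
    ? { instance: "ebit-redis-bot", port: 6380 }
    : { instance: "ebit-redis", port: 6379 };
}

/** Builds a match pattern for a queue's keys, assuming the default "bull" prefix. */
function bullKeyPattern(queueName: string, prefix: string = "bull"): string {
  return `${prefix}:${queueName}:*`;
}

const botTarget = redisFor("bots-bet");          // ebit-redis-bot, port 6380
const botPattern = bullKeyPattern("bots-bet");   // "bull:bots-bet:*"
```

On a busy instance, feeding such a pattern to `SCAN` with `MATCH` is gentler than `KEYS`, since `KEYS` blocks the server while it walks the whole keyspace.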

ebit-rabbitmq — stubbed broker (vhost ft)

  • Type: RabbitMQ in compose; zero traffic — wired only to apps/api/src/fast-track/rabbitmq/fast-track.rmq.module.ts which returns a stubbed no-op (disabled = true)
  • Port: 5672 (AMQP), 15672 (UI — user rabbitmq / pass rabbitmq, vhost ft)
  • Owner: engineering — disposition pending · {{TBD: engineering}}
  • Status: Container runs and passes healthchecks but is unused. Don't page on it.
  • Notes: When debugging a stalled queued job, look in Redis (KEYS bull:*), not RabbitMQ — see /CLAUDE.md "Async queues — BullMQ, not RabbitMQ".

5. Observability (5)

otel-collector

jaeger (v2 + Badger)

prometheus

  • Type: Prometheus TSDB
  • Port: 9090
  • Config: /observability/prometheus.yml
  • Owner: customer-team Tier 2 · {{TBD: customer-team}}
  • Notes: scrapes otel-collector for spanmetrics-derived RED.

loki

grafana

  • Type: Grafana (dashboards + Explore)
  • Port: 3003 (local login: admin / grafana)
  • Dashboards: /observability/grafana/provisioning/dashboards/ — 8 provisioned (service-overview, perf-test, perf-system, logs-trace-pivot, prisma-postgres, bullmq, redis, browser-rum)
  • Owner: customer-team Tier 2 · {{TBD: customer-team}}

6. Performance-test infrastructure (3, transient)

These exist only when a perf run is active. Provisioned via /terraform/perf/; destroyed via terraform destroy after capture.

| Service | Instance | Role |
| --- | --- | --- |
| SUT (System Under Test) | EC2 c7g.4xlarge | Runs the full Evospin stack from ECR (api / rt / bj / bo / speed-roulette + both FEs + Postgres + 2× Redis + RabbitMQ) |
| Monitoring | EC2 c7g.xlarge | OTel Collector + Prometheus + Grafana + Loki + Jaeger + node-exporter |
| Loadgen | EC2 c7g.4xlarge | k6 v0.56 + Node 22 + pnpm 9.11 + Playwright (100 concurrent Chromium) + node_exporter |

7. Third-party / external dependencies

These are services we consume but don't operate. The card is shorter — what we use it for, fallback if unavailable, contract owner.

| Service | What we use it for | Fallback if unavailable | Owner |
| --- | --- | --- | --- |
| Doppler (workspace ebit) | Secrets distribution for runtime + CI | .env files (local only); production has no fallback — outage = no-deploy | customer-team {{TBD}} |
| Google reCAPTCHA v3 | Sign-up + sign-in + forgot-password gating | runbooks/captcha-break-glass.md — currently no backup provider configured (engineering follow-up) | customer-team {{TBD}} |
| Sumsub | KYC verification | Manual review queue | customer-team {{TBD}} |
| CCPAYMENT | Crypto payments processor | Alternative provider via {{TBD: engineering — payments-abstraction layer}} | customer-team {{TBD}} |
| NowPayments | Crypto payments processor (secondary) | CCPAYMENT primary | customer-team {{TBD}} |
| Softswiss | Slots provider | Other slot providers in catalog (apps/api/src/casino/slots/providers/) | customer-team {{TBD}} |
| PM8 | Slots provider | Same | customer-team {{TBD}} |
| MaxMind GeoIP | Country gating + restricted-country list | Block all on lookup failure (fail-secure) | customer-team {{TBD}} |
| CoinGecko | Exchange rates for ExchangeRatesService.toUsd() | Cached rates degrade quietly; alert on stale rate | customer-team {{TBD}} |
| SendGrid | Transactional email (verification, password reset, marketing) | Local mode bypasses entirely (isLocal); production has no fallback today — {{TBD: engineering — second-provider abstraction}} | customer-team {{TBD}} |
| Sentry | Error tracking + perf monitoring | Errors still log to Loki/stdout — Sentry is observability, not critical-path | customer-team {{TBD}} |
| EOS blockchain (public nodes) | RNG block-source for speed-roulette WAITING_BLOCK state | Round stalls in WAITING_BLOCK — see runbooks/speed-roulette-deadlock.md | engineering {{TBD}} |
| EVO wallet RPC (Skindeck) | Skin-deposit settlement | Deposits queue and retry; player sees pending state | customer-team {{TBD}} |

For protocol-level details (auth headers, rate limits, observed failure modes) see external-services.md.
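
The MaxMind GeoIP row describes a fail-secure fallback: when the lookup itself fails, the request is blocked rather than allowed. The sketch below illustrates that pattern in isolation. It does not call MaxMind or reflect the repo's actual gating code. `LookupResult`, `isCountryAllowed`, and the placeholder codes `XX` / `ZZ` are hypothetical.

```typescript
// Outcome of a GeoIP lookup, modelled as a tagged union (hypothetical shape).
type LookupResult =
  | { status: "ok"; countryCode: string }
  | { status: "error"; reason: string };

// Placeholder country codes; the real restricted-country list lives elsewhere.
const RESTRICTED_COUNTRIES: string[] = ["XX", "ZZ"];

/**
 * Fail-secure gate: a failed lookup is treated the same as a restricted
 * country, so an outage of the GeoIP source cannot open access by accident.
 */
function isCountryAllowed(lookup: LookupResult, restricted: string[]): boolean {
  if (lookup.status === "error") {
    return false; // block on lookup failure
  }
  return restricted.indexOf(lookup.countryCode) === -1;
}

const allowed = isCountryAllowed({ status: "ok", countryCode: "DE" }, RESTRICTED_COUNTRIES);        // true
const restricted = isCountryAllowed({ status: "ok", countryCode: "XX" }, RESTRICTED_COUNTRIES);     // false
const lookupFailed = isCountryAllowed({ status: "error", reason: "timeout" }, RESTRICTED_COUNTRIES); // false
```

The trade-off is availability: during a GeoIP outage every player is blocked, which is why this row belongs on the incident-response radar.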


8. Dependency graph

Internal topology (clients · frontends · apps · datastores · observability) is covered in architecture/service-map.md, split into player path + admin/ops path. Read that first.

The diagram below covers what service-map.md deliberately omits — third-party integrations outbound from ebit-api, since each is an external dependency with its own runbook concerns.

flowchart LR
    ebit_api["ebit-api :4000"]

    subgraph fairness["Game fairness"]
        eos(("EOS blockchain<br/>JSON-RPC"))
    end

    subgraph auth_msg["Auth + messaging"]
        recaptcha(("reCAPTCHA<br/>verify token"))
        sendgrid(("SendGrid<br/>SMTP"))
    end

    subgraph kyc_pay["KYC + payment"]
        sumsub(("Sumsub<br/>KYC"))
        ccp(("CCPAYMENT<br/>crypto deposits"))
    end

    subgraph obs["Observability (non-local)"]
        sentry(("Sentry<br/>errors"))
    end

    ebit_api -- "JSON-RPC" --> eos
    ebit_api -- "verify"   --> recaptcha
    ebit_api -- "send"     --> sendgrid
    ebit_api -- "KYC API"  --> sumsub
    ebit_api -- "deposit"  --> ccp
    ebit_api -- "errors"   --> sentry

Speed-roulette is the only Nest app besides ebit-api reaching out to a third party (EOS). All others terminate inside the Nest monorepo or hit shared datastores — see architecture/service-map.md for the internal topology.

Edge convention: solid = active in production, dashed = stubbed or no callers. The ebit-bj and RabbitMQ edges are dashed for that reason.

| Node count | Edge count |
| --- | --- |
| 21 | 27 |

9. Find by symptom

The fastest path during an incident — symptom → likely service(s) → start here.

| Symptom | Likely service(s) | Open first |
| --- | --- | --- |
| Bet placement slow | ebit-api + ebit-db | runbooks/db-high-load.md |
| Bet placement returns 5xx | ebit-api (Prisma transaction) + BullMQ | runbooks/bullmq-job-stuck.md, then runbooks/db-down.md |
| Live game state stuck (>90 s) | ebit-speed-roulette (or ebit-bj, but bj has no callers) | runbooks/speed-roulette-deadlock.md |
| Real-time updates not pushed | ebit-rt + ebit-redis (cache) | runbooks/ws-adapter-scale-out.md — single-replica today |
| Sign-in failing for many users | ebit-api + Google reCAPTCHA + Sumsub | runbooks/captcha-break-glass.md, then runbooks/login-fails-bcrypt.md |
| Trace gap mid-request | OTel Collector + Redis pub/sub transport | runbooks/trace-missing.md — known gap on bj/speed-roulette per adr/0005-no-traceparent-on-redis-rpc.md |
| Logs missing for a service | OTel Collector + Loki | runbooks/loki-no-logs.md |
| Wallet balance wrong (one user) | ebit-api + ebit-db | flows/dropbet-wallet.md — check SF-013 (no overdraft guard on toVault) |
| Wallet balance wrong (many users) | ebit-api + BullMQ bet_settled_queue | handover/oncall-runbook.md — P0; promote and page |
| Admin can't ban a user | ebit-api + ebit-bo | flows/admin-user-mgmt.md |
| Redis OOM / eviction spike | ebit-redis (cache) + BullMQ retention | runbooks/redis-memory-pressure.md |
| WS storm / bans climbing | ebit-rt (single-replica ceiling) | runbooks/ws-adapter-scale-out.md |
| 2FA / MFA reset for an admin | ebit-api (auth) + Postgres | runbooks/2fa-unknown-secret.md |
| Email not delivered | SendGrid + ebit-api | external-services.md §SendGrid (no production fallback today) |
| Captcha fails on real traffic | Google reCAPTCHA upstream | runbooks/captcha-break-glass.md |
| KYC verification stuck | Sumsub + ebit-api | external-services.md §Sumsub |
| Crypto deposit not credited | CCPAYMENT or NowPayments + BullMQ SKINDECK_DEPOSIT | runbooks/bullmq-job-stuck.md |
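
The symptom table above is effectively a lookup from a free-text symptom to a starting document. As a rough sketch of how it could be made searchable (for example from a chat-ops command), the snippet below holds a handful of rows copied from the table and does a case-insensitive substring match. The `SymptomEntry` type and `findBySymptom` function are hypothetical; no such tool is described in this catalog.

```typescript
interface SymptomEntry {
  symptom: string;
  services: string;
  openFirst: string;
}

// A subset of rows copied from the table above.
const SYMPTOMS: SymptomEntry[] = [
  { symptom: "Bet placement slow", services: "ebit-api + ebit-db", openFirst: "runbooks/db-high-load.md" },
  { symptom: "Wallet balance wrong (one user)", services: "ebit-api + ebit-db", openFirst: "flows/dropbet-wallet.md" },
  { symptom: "Wallet balance wrong (many users)", services: "ebit-api + BullMQ bet_settled_queue", openFirst: "handover/oncall-runbook.md" },
  { symptom: "Redis OOM / eviction spike", services: "ebit-redis (cache) + BullMQ retention", openFirst: "runbooks/redis-memory-pressure.md" },
  { symptom: "Captcha fails on real traffic", services: "Google reCAPTCHA upstream", openFirst: "runbooks/captcha-break-glass.md" },
];

/** Returns every entry whose symptom text contains the query, ignoring case. */
function findBySymptom(query: string, entries: SymptomEntry[] = SYMPTOMS): SymptomEntry[] {
  const needle = query.trim().toLowerCase();
  if (needle === "") return [];
  return entries.filter((entry) => entry.symptom.toLowerCase().indexOf(needle) !== -1);
}

const walletHits = findBySymptom("wallet");   // both "Wallet balance wrong" rows
const captchaHits = findBySymptom("CAPTCHA"); // the captcha row
```

Returning all matches rather than the first one matters for cases like "wallet", where the one-user and many-users rows point to different documents and different severities.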

10. Coverage gaps (where this catalog is thinnest)

The three services with the least operational documentation today — engineering-team follow-ups:

  1. ebit-fe — no FE-specific runbook for build / SSR / hydration failures. Browser RUM dashboard exists; runbook is {{TBD: engineering — author runbooks/fe-build-failure.md and runbooks/fe-hydration-mismatch.md}}.
  2. ebit-bj — orphaned, but disposition decision (delete? rewire? keep as backup?) hasn't been made. {{TBD: engineering — file ADR for bj disposition}}.
  3. External payment providers (CCPAYMENT, NowPayments) — no break-glass runbook for "payment processor down". The pattern from runbooks/captcha-break-glass.md applies. {{TBD: engineering — author runbooks/payments-provider-down.md}}.

These three are tracked in the engineering follow-up backlog (task #35 in the doc-portal task list).


See also