Frequently asked questions

Cross-track FAQ. The answers are deliberately short — they tell you where to find the full answer, not the full answer itself. If a question's link target hasn't been written yet, the answer here says so.

Grouped by audience. Anchor links are quoted from each track README.

  • Business — what is Evospin, what does it do, what does it cost, what's the SLA
  • Delivery — what does the customer team need in place, how long does onboarding take
  • Engineering — why this stack, how do I add X, why is Y the way it is
  • Handover & operations — what to do at 3 AM, where the runbooks live
  • Cross-cutting — versioning, doc CI, sources of truth

Business

What is Evospin and how is it different from white-label casino platforms? (Business)

Unlike turnkey white-label products that ship as a closed black box, Evospin is delivered as source: customers can extend it (new payment provider, new game, new locale, new currency) without depending on the vendor for every change.

Full positioning + elevator pitch + stack: business/value-proposition.md. Per-component stack inventory: engineering/stack.md.

What's our SLA target and how do we measure it? (Business)

SLA, SLOs, and the underlying NFRs are documented in business/nfr-sla.md. Headline targets: p95 sign-in < 200 ms, p95 game-bet < 100 ms (per-game), error rate < 0.5 %, monthly availability ≥ 99.9 %. Measurement is via the ebit-perf-test Grafana dashboard, sourced from spanmetrics-derived metrics (engineering/observability-runbook.md §1).
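
As a rough illustration of what the headline targets imply, the sketch below converts a monthly availability percentage into a downtime budget and computes a nearest-rank percentile from raw latency samples. The function names and the 30-day month are assumptions made for illustration; the authoritative measurement remains the Grafana dashboard, not this arithmetic.

```typescript
// Hypothetical helpers, not part of the Evospin codebase.

/** Minutes of allowed downtime for an availability target over `days` days. */
function downtimeBudgetMinutes(availabilityPct: number, days: number = 30): number {
  const totalMinutes = days * 24 * 60;
  return totalMinutes * (1 - availabilityPct / 100);
}

/** Nearest-rank percentile (p in (0, 100]) of a non-empty sample set. */
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based rank
  return sorted[Math.max(rank, 1) - 1];
}
```

Under these assumptions, a 99.9 % target over a 30-day month leaves roughly 43 minutes of downtime, which is a useful sanity check when reading the availability panel.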

Can we deploy on a different cloud provider? (Business)

Yes — the codebase is cloud-agnostic. The Terraform module set in terraform/perf/ is currently AWS-shaped (ECR, EBS, IMDSv2, c7g Graviton VMs) but the application stack (NestJS containers + Postgres + Redis + RabbitMQ + observability) runs on any container platform. Customer-cloud porting effort is documented in business/integration-options.md.

How long does customer onboarding typically take end-to-end? (Business)

The phased rollout is documented in delivery/phased-rollout.md — five phases over ~10 weeks (kickoff → bootstrap → bring-up → soft launch → general availability), depending on how many customisations the customer brings.

What's deprecated and what's not? (Business)

Nothing in v1.0 is deprecated — it's the initial baseline. Future deprecations land via api/changelog.md for API surface and versions/changelog.md for the documentation portal. Each release's versions/v<X.Y>/MIGRATION.md lists deprecations with sunset dates.


Delivery

What does the customer team need IN PLACE before kickoff? (Delivery)

The full prerequisite list is delivery/prerequisites.md. Headline: AWS account (or equivalent), Doppler workspace access, GitLab project access (for Unleash feature flags), payment provider sandbox creds, KYC vendor sandbox (Sumsub or equivalent), domain + DNS controller, customer-side legal/compliance officer signed off, copy reviewer named.

Who owns the Doppler workspace during and after handover? (Delivery)

Pre-handover: the platform team owns workspace ebit-devops and issues per-project service tokens to the customer environment. Post-handover: ownership transfers to the customer's ops team (or stays with platform under an SLA, depending on engagement model). See business/responsibilities.md for the responsibility matrix and ADR-0010 for the secrets architecture.

What's the rollback procedure if v1.0 fails customer UAT? (Delivery)

Documented under "Rollback" in delivery/launch-checklist.md. Short form: revert the deployment image tag to the previous release, then check whether any DB migrations landed since. Prisma migrations are forward-only, so if a schema change shipped, coordinate with the platform team for a downgrade migration. Document the rollback reason in incidents/ using the template.

Can we run the perf test without paying for the AWS infra? (Delivery)

Yes. The local docker compose stack in ebit-api/ runs the same images at smaller scale. Use it for smoke testing and component-level perf. The full stepped-ramp 1k→10k VU test requires the AWS-provisioned stack because it needs a separate loadgen VM (network back-pressure shapes the result). See performance-testing.md.

Can a customer bring their own KYC vendor? (Delivery)

Yes, but it's the highest-friction customisation. The current apps/api/src/kyc/ abstraction is vendor-specific (Sumsub-named methods, Sumsub-shaped DTOs) — a swap is effectively a rewrite. Quote 4+ weeks. Recipe: recipes/swap-kyc-provider.md.

Can we add a custom in-house game? (Delivery)

Yes. Recipe: recipes/add-game.md (~2–4 days backend + 2–4 days frontend). The abstraction is well-formed — adding a game is mostly copy-and-adapt from dice/.


Engineering

Why are there 5 NestJS apps? (Engineering)

Per-app deploy cadence + per-app resource shape + per-app failure isolation. For example, rt is throughput-bound, api is latency-bound, and speed-roulette is single-tenant by design (concurrency=1). A single process can't honour all three. Full rationale + considered alternatives in ADR-0011.

Why BullMQ over RabbitMQ? (Engineering)

All production async work rides BullMQ on the cache Redis. RabbitMQ is wired but stubbed (disabled = true) — kept around for a planned-but-deferred Fast Track CRM integration. Full inventory of async work, considered alternatives, and revisit triggers in ADR-0003.

Why Jaeger v2 + Badger over Tempo? (Engineering)

Jaeger v1 OOM'd at 19 GB on the dev VM (default in-memory backend, no cap) and has been EOL since 2025-12-31. Jaeger v2 with Badger on a 50 GB EBS volume, GOMEMLIMIT=1500MiB, and a 72 h TTL ends the OOM class. Tempo is the documented fallback. Full storage-budget math and rejected alternatives in ADR-0009.

Why Doppler over Vault / SSM Parameter Store? (Engineering)

Per-token scoping (a leaked perf-test token can't read prd), turn-key UX (no Raft cluster to operate), free at our scale. Vault rejected on ops cost; SSM rejected on per-token-scoping limitations; Secrets Manager rejected on cost. Full rationale in ADR-0010.

Why is the captcha pass bypass NODE_ENV=local only? (Engineering)

Defence in depth. The check at apps/api/src/captcha/google/recaptcha.service.ts:28 short-circuits only when both NODE_ENV === 'local' and token === 'pass'. A production container misconfigured with NODE_ENV=local would therefore accept the pass token and effectively disable captcha; that's a known posture-check item in security-register.md. Break-glass procedure (when you actively need to disable captcha in prod): runbooks/captcha-break-glass.md.
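
The two-condition guard described above can be sketched as follows. This is a reconstruction from the description, not a copy of recaptcha.service.ts; the function names and the deployTarget parameter are hypothetical.

```typescript
// Hypothetical reconstruction of the guard; the real check may differ in detail.
function isCaptchaBypassed(nodeEnv: string | undefined, token: string): boolean {
  // Both conditions must hold: a 'pass' token alone is never enough.
  return nodeEnv === 'local' && token === 'pass';
}

/**
 * Posture check (assumed shape): flag any non-local deployment that runs
 * with NODE_ENV=local, because the bypass token would be accepted there.
 */
function captchaPostureFinding(
  nodeEnv: string | undefined,
  deployTarget: string, // hypothetical label such as 'local', 'stg', 'prd'
): string | null {
  if (nodeEnv === 'local' && deployTarget !== 'local') {
    return `NODE_ENV=local on ${deployTarget}: captcha bypass token would be accepted`;
  }
  return null;
}
```

Keeping the guard a pure function of its two inputs makes the misconfiguration case easy to cover in a unit test.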

Why does ebit-bj show as orphan? (Engineering)

It ships a full blackjack implementation (with EVO-Games wallet RPC and its own session-token scheme) but no dropbet client reaches it — all blackjack traffic routes through apps/api's /casino/games/house/blackjack/*. The app is preserved (delete-vs-revive decision pending product). See memory project_ebit_bj_orphan and architecture.md AF-4.

How do I add a new house game? (Engineering)

recipes/add-game.md. Canonical example: apps/api/src/casino/house/dice/. Touches libs/games/src/house-games.config.ts:23 (HouseGameSlug), the new module folder under apps/api/src/casino/house/, the FE route at ebit-fe/src/app/[locale]/games/originals/<slug>/, and translations in messages/{en,de}.json.

How do I add a new API endpoint? (Engineering)

recipes/add-rest-endpoint.md. Canonical example: BetController at apps/api/src/bet/bet.controller.ts:12-51. Don't forget to re-run api/sync-postman.sh after merge so the OpenAPI spec, Markdown reference, and Postman collection stay in sync.

How do I emit a custom OTel span? (Engineering)

recipes/add-otel-span.md. Use trace.getTracer('<scope>').startActiveSpan('<name>', async (span) => { ... span.end(); }). Set attributes via span.setAttribute(key, value). Keep span names stable across deploys (they become metric labels via spanmetrics).
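
A minimal sketch of the pattern above, wrapped so that span.end() always runs. To keep the block self-contained, SpanLike and TracerLike are locally declared interfaces shaped after the parts of the @opentelemetry/api Span and Tracer that the recipe uses; they and the withSpan helper are assumptions for illustration, not code from the repo.

```typescript
// Local structural stand-ins for the @opentelemetry/api types (assumption).
interface SpanLike {
  setAttribute(key: string, value: string | number | boolean): unknown;
  recordException(error: Error): void;
  end(): void;
}
interface TracerLike {
  startActiveSpan<T>(name: string, fn: (span: SpanLike) => T): T;
}

// Hypothetical helper: runs `work` inside an active span and always ends it.
async function withSpan<T>(
  tracer: TracerLike,
  name: string, // keep this a stable literal; it becomes a metric label
  attributes: Record<string, string | number | boolean>,
  work: (span: SpanLike) => Promise<T>,
): Promise<T> {
  return tracer.startActiveSpan(name, async (span) => {
    try {
      for (const key of Object.keys(attributes)) {
        span.setAttribute(key, attributes[key]);
      }
      return await work(span);
    } catch (err) {
      if (err instanceof Error) span.recordException(err);
      throw err; // the caller still sees the original failure
    } finally {
      span.end(); // runs on success and on failure
    }
  });
}
```

Putting the end() call in a finally block avoids the most common mistake with startActiveSpan, which is leaking a span when the callback throws.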

How do I debug a slow request end-to-end? (Engineering)

engineering/observability-runbook.md §2. Short form: open Grafana → set the time window → click the latency exemplar dot → Jaeger trace opens → identify the slowest span → pivot to Loki for logs by trace_id.

Where do logs and traces correlate? (Engineering)

Every log record carries trace_id / span_id / trace_flags matching the active OTel span. Pivot Jaeger → Loki by {service_name="ebit-api"} |= "<trace_id>". Pivot Loki → Jaeger via the provisioned derivedFields "View trace" link in Grafana Explore. Memory: project_evologger_trace_correlation. Operator how-to: engineering/observability-runbook.md §§3-4.
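
The Jaeger → Loki pivot can be scripted. The sketch below builds the LogQL query quoted above from a trace ID; the helper name is hypothetical, and the 32-character lowercase hex check assumes trace IDs are logged in the W3C trace-id form.

```typescript
// Hypothetical helper; only the selector and line filter come from the text above.
function lokiQueryForTrace(traceId: string, serviceName: string = 'ebit-api'): string {
  // W3C trace IDs are 32 lowercase hex characters. Rejecting anything else
  // also keeps stray quotes out of the LogQL string.
  if (!/^[0-9a-f]{32}$/.test(traceId)) {
    throw new Error(`not a valid trace id: ${traceId}`);
  }
  return `{service_name="${serviceName}"} |= "${traceId}"`;
}
```

The serviceName argument is not validated here, so pass only known service names rather than user input.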

Why does cross-service tracing break for speed-roulette? (Engineering)

@ExternalControllerClient rides Redis pub/sub, which doesn't propagate W3C traceparent. The callee starts an orphan trace. Documented limitation, not a bug. Workaround: correlate by user_id / bet_id in Loki. Decision: ADR-0005. Memory: project_otel_microservice_transport_gap.

Why are spanmetrics names unprefixed (calls_total, not traces_spanmetrics_calls_total)? (Engineering)

Convention adopted with ADR-0002. The connector emits calls_total and duration_milliseconds_bucket directly — old docs that reference the prefixed form are wrong. Cross-audit at audits/perf-promql-audit.md.
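
To make the naming concrete, the sketch below assembles two PromQL queries against the unprefixed metric names. The metric names come from the answer above; the label names (service_name, status_code) and the STATUS_CODE_ERROR value are the spanmetrics connector defaults as far as we know, and are an assumption about this deployment. The helper names are hypothetical.

```typescript
// Hypothetical query builders; verify label names against the live metrics.
function p95LatencyQuery(service: string, window: string = '5m'): string {
  return (
    'histogram_quantile(0.95, sum by (le) (' +
    `rate(duration_milliseconds_bucket{service_name="${service}"}[${window}])))`
  );
}

function errorRatioQuery(service: string, window: string = '5m'): string {
  const errors =
    `sum(rate(calls_total{service_name="${service}",` +
    `status_code="STATUS_CODE_ERROR"}[${window}]))`;
  const all = `sum(rate(calls_total{service_name="${service}"}[${window}]))`;
  return `${errors} / ${all}`;
}
```

If a dashboard panel still references traces_spanmetrics_calls_total, swapping in the unprefixed name as shown here is usually the whole fix.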

Why can't I find some traces in Jaeger? (Engineering)

Tail sampling drops ~90 % of OK traces. ERROR traces and traces with any span > 500 ms are kept at 100 %; the rest are sampled at 10 %. If your search by min-duration=0 returns fewer hits than expected, this is why. Decision: ADR-0012. Operator note: engineering/observability-runbook.md §8 pitfall #4.
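
The keep/drop rule above can be modelled in a few lines. This is only a model of the policy as described here; the real decision is made by the collector's tail-sampling configuration, and the SpanSummary type and keepTrace name are hypothetical.

```typescript
// Hypothetical model of the sampling policy, for reasoning about missing traces.
interface SpanSummary {
  durationMs: number;
  isError: boolean;
}

function keepTrace(
  spans: SpanSummary[],
  random: () => number = Math.random, // injectable for deterministic tests
): boolean {
  const hasError = spans.some((s) => s.isError);
  const hasSlowSpan = spans.some((s) => s.durationMs > 500);
  if (hasError || hasSlowSpan) return true; // always kept
  return random() < 0.1; // everything else: 10 % probabilistic sample
}
```

A fast, successful request therefore has roughly a one-in-ten chance of appearing in Jaeger, which is why reproducing an issue several times is often needed before a trace shows up.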

Why is Husky's prepare hook skipped in production builds? (Engineering)

npm install skips devDependencies under NODE_ENV=production, and Husky is a devDep — so the prepare script (which runs husky install) is a no-op. This is intentional (production images don't need git hooks). Gotcha: a developer running npm i --omit=dev locally will surface the same skip and break the hook chain.


Handover & operations

What's the first thing to do at 3 AM for a P1 alert? (Handover)

handover/oncall-runbook.md §"First response". Acknowledge the page, open the incident channel, identify the failing service from the alert, run the relevant per-service runbook from runbooks/. Don't escalate before triaging — escalation matrix at handover/escalation-matrix.md.

Where do I find a runbook for X? (Handover)

runbooks/README.md. Each runbook is keyed by symptom (postgres-connection-exhaustion, redis-memory-pressure, captcha-break-glass, websocket-fan-out-saturation, etc.). Search by symptom keyword first; if no match, the closest runbook + an incidents/ postmortem is usually the path.

How do I write a postmortem? (Handover)

incidents/0000-template.md. Fill it in within 5 working days of incident close. Sections: timeline, customer impact, contributing factors, what worked, what didn't, action items. Cross-link from the relevant runbook so the next on-call benefits.

Can I disable captcha in an emergency? (Handover)

Yes, with caveats. runbooks/captcha-break-glass.md. The procedure is not "set NODE_ENV=local" (that's a posture-check finding) — the right path is to flip the captcha-disable feature flag in GitLab Unleash, audit-log the action, and re-enable as soon as the underlying issue is resolved.

How do I add a new on-call team member? (Handover)

handover/escalation-matrix.md. It covers adding the new member to the rotation, the contact channel (PagerDuty / Slack), and the runbook awareness checklist. Coordinate with the customer's ops team if the engagement model includes shared on-call.

Why does Redis OOM and what do I do about it? (Handover)

runbooks/redis-memory-pressure.md. Root cause: maxmemory is not configured, so Redis grows until it hits the host's RAM ceiling. An engineering follow-up is tracked. Mitigation in the runbook: identify the largest keys via redis-cli --bigkeys, drop expired data, set a temporary maxmemory-policy. Permanent fix: configure maxmemory per-instance in the deploy config.
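
One way to confirm the root cause on a given instance is to inspect the output of the Redis INFO memory command, where a maxmemory value of 0 means no limit is set. The parser below is a hypothetical sketch that works on the raw text of that command; it is not taken from the runbook.

```typescript
// Hypothetical helper. INFO output is a list of `field:value` lines, with
// section headers that start with '#'.
interface RedisMemoryStatus {
  usedBytes: number;
  maxBytes: number; // 0 means no limit is configured
  unbounded: boolean;
}

function parseRedisMemoryInfo(info: string): RedisMemoryStatus {
  const fields: Record<string, string> = {};
  for (const rawLine of info.split('\n')) {
    const line = rawLine.trim(); // also strips the trailing '\r'
    if (line === '' || line.charAt(0) === '#') continue; // skip headers
    const sep = line.indexOf(':');
    if (sep > 0) fields[line.slice(0, sep)] = line.slice(sep + 1);
  }
  const usedBytes = Number(fields['used_memory'] || '0');
  const maxBytes = Number(fields['maxmemory'] || '0');
  return { usedBytes, maxBytes, unbounded: maxBytes === 0 };
}
```

An unbounded result on a production instance is the condition the permanent fix is meant to remove.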

How do I refresh the API source-of-truth chain? (Handover)

api/sync-postman.sh. Pulls live Swagger from :4000/swagger.json and :4003/swagger.json, regenerates Postman collections, writes a changelog-draft-<date>.diff, prints an endpoint summary. Then write the api/changelog.md entry by hand.
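
For a sense of what an endpoint summary and changelog draft involve, the sketch below lists the operations in an OpenAPI document and diffs two snapshots. It is an illustration only: api/sync-postman.sh is a shell script whose internals are not shown here, and the OpenApiLike type and both function names are hypothetical.

```typescript
// Hypothetical sketch; the real script may work differently.
interface OpenApiLike {
  paths?: Record<string, Record<string, unknown>>;
}

const HTTP_METHODS = ['get', 'put', 'post', 'delete', 'options', 'head', 'patch', 'trace'];

function listEndpoints(spec: OpenApiLike): string[] {
  const out: string[] = [];
  const paths = spec.paths || {};
  for (const path of Object.keys(paths)) {
    for (const method of Object.keys(paths[path])) {
      // Path items can also hold non-operation keys such as `parameters`.
      if (HTTP_METHODS.indexOf(method.toLowerCase()) !== -1) {
        out.push(`${method.toUpperCase()} ${path}`);
      }
    }
  }
  return out.sort();
}

/** Endpoints added to and removed from `next` relative to `prev`. */
function diffEndpoints(
  prev: OpenApiLike,
  next: OpenApiLike,
): { added: string[]; removed: string[] } {
  const before = listEndpoints(prev);
  const after = listEndpoints(next);
  return {
    added: after.filter((e) => before.indexOf(e) === -1),
    removed: before.filter((e) => after.indexOf(e) === -1),
  };
}
```

The added and removed lists are a reasonable starting point for the hand-written api/changelog.md entry, though they do not capture changes to request or response schemas.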


Cross-cutting

Where is the source of truth for X? (Cross-cutting)

| Domain | Source of truth |
|---|---|
| API surface | Live Swagger at :4000/swagger / :4003/swagger → snapshotted to api-reference/openapi/ |
| Domain terminology | glossary.md |
| Architecture decisions | adr/ |
| Release history | api/changelog.md (API) + versions/changelog.md (docs) |
| Style and voice | STYLE.md |
| Env vars across all repos | env-reference.md |
| Feature flags | GitLab Unleash project (per FEATURE_FLAGS_API_URL) |
| Secrets | Doppler workspace ebit-devops |
| Runbooks | runbooks/ |
| Customer customisation recipes | recipes/ (internal recipes) + recipes/integration-cookbook.md (customer cookbook) |

How do I version the docs? (Cross-cutting)

versions/README.md. The working copy at docs/ tracks master; releases are pinned via annotated git tags (v1.0, v1.1, …) — cite externally as https://github.com/evo-spin/ebit-docs/blob/v1.0/.... Cut a release with git tag -a vX.Y && git push origin vX.Y after appending an entry to versions/changelog.md. The doc version is independent of the API version and the product version — bumping one does not bump the others. Authoring workflow: versions/CONTRIBUTING.md.

How do I run the doc-CI checks locally? (Cross-cutting)

tools/docs/ (forward-pointer at ci/). Run all checks: ./tools/docs/run-all.sh docs/. Individual checks: find-tbd.sh (scan {{TBD}}), link checker, terminology lint, draft scanner. The same checks run in CI via .github/workflows/docs-ci.yml on every PR.

What does {{TBD}} mean and is it OK to ship one? (Cross-cutting)

{{TBD}} marks a known gap — a fact that's not yet known but the slot for it exists. Yes, it's acceptable to ship. What's not acceptable is silently leaving a wrong fact in place because a {{TBD}} would look unfinished. The CI scanner reports {{TBD}} count + locations release-over-release. See versions/CONTRIBUTING.md §"Marking in-flight content".
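
The count-and-locations report can be pictured with the sketch below, which finds every {{TBD}} marker in a text and returns its 1-based line and column. It is a hypothetical illustration; find-tbd.sh is a shell script and may behave differently.

```typescript
// Hypothetical scanner, for illustration only.
interface TbdHit {
  line: number; // 1-based
  column: number; // 1-based
}

function findTbdMarkers(text: string): TbdHit[] {
  const marker = '{{TBD}}';
  const hits: TbdHit[] = [];
  const lines = text.split('\n');
  for (let i = 0; i < lines.length; i++) {
    let from = 0;
    for (;;) {
      const at = lines[i].indexOf(marker, from);
      if (at === -1) break;
      hits.push({ line: i + 1, column: at + 1 });
      from = at + marker.length; // continue after this marker
    }
  }
  return hits;
}
```

Comparing the length of the returned list between two releases gives the release-over-release trend mentioned above.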

Where do I look for "Quick answers" in each track? (Cross-cutting)

Each track README links back here for the cross-track FAQ. The portal entry point is README.md.

How does v1.0 handle deprecations? (Cross-cutting)

It doesn't — v1.0 is the initial release, nothing is deprecated. When a future release deprecates something, the entry under that version in versions/changelog.md lists the deprecated path + replacement + sunset date.

Why is the ebit-api README in Russian? (Cross-cutting)

Historical artefact — the project was bootstrapped by a Russian-speaking team. It's a development-only document, not customer-facing. Translate mentally when quoting; don't conflate with player-facing locales (which are en + de per add-locale.md).

Are these docs customer-shareable? (Cross-cutting)

Most. The split is documented in versions/CONTRIBUTING.md §"Customer-facing review gate". Customer-shareable: business/, delivery/, parts of handover/, and the sanitized layer of security/. Internal-only: engineering/, runbooks/, recipes/ (internal), adr/, data-model/, architecture/, flows/, the internal/ layer of security/, security-findings/.