Two-Week Onboarding Curriculum

This plan assumes you have completed the Day One runbook and have a healthy stack running locally. Each day below is scoped to 4-6 hours; adjust the pace to your background.

Week 1 — Foundations

Read:
- docs/flows/dropbet-sign-in.md: cookie flow, 2FA, JwtGuard, session service
- docs/flows/dropbet-sign-up.md: registration, reCAPTCHA, email verification

Exercise:
1. Sign in on http://localhost:3000 with local@example.com / password.
2. Open browser DevTools → Application → Cookies. Identify access_token, refresh_token, and socket_token. Note their path, domain, and HttpOnly flag.
3. Copy the access_token value and decode it at https://jwt.io. Decoding needs no secret; the signing secret in .env is only needed if you also want jwt.io to verify the signature. Find the sub (userId) and exp (expiry) claims.
4. Open Jaeger (http://localhost:16686), select ebit-api, and search for POST /auth/sign-in. Click the trace and identify the bcrypt.compare latency inside the handler span.
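If you would rather not paste a token into a website, step 3 can be done locally: a JWT is three base64url segments separated by dots, and the middle one is plain JSON. The sketch below assumes Node 16 or later (for the "base64url" Buffer encoding); decodeJwtPayload and secondsUntilExpiry are illustrative helper names, not part of the codebase. It only decodes the payload and does not verify the signature.

```typescript
// The claims this exercise cares about; any other claims are kept as unknown.
interface JwtClaims {
  sub?: string;
  exp?: number; // expiry, in seconds since the Unix epoch
  [claim: string]: unknown;
}

// Decode (but do NOT verify) the payload of a header.payload.signature JWT.
function decodeJwtPayload(token: string): JwtClaims {
  const parts = token.trim().split(".");
  if (parts.length !== 3) {
    throw new Error("expected three dot-separated segments");
  }
  // Buffer accepts the URL-safe alphabet and missing "=" padding.
  const json = Buffer.from(parts[1], "base64url").toString("utf8");
  return JSON.parse(json) as JwtClaims;
}

// Seconds left before expiry (negative once expired), or undefined without an exp claim.
function secondsUntilExpiry(claims: JwtClaims, nowMs: number = Date.now()): number | undefined {
  return claims.exp === undefined ? undefined : claims.exp - Math.floor(nowMs / 1000);
}

// Example use with the cookie value copied from DevTools:
//   const claims = decodeJwtPayload("<access_token value>");
//   console.log(claims.sub, claims.exp, secondsUntilExpiry(claims));
```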

Self-check:
1. Which three cookies does POST /auth/verify-2fa set, and where is the cookie-setting code? → access_token, refresh_token, and socket_token, set in apps/api/src/auth/cookies.ts:8-23.
2. What happens if you call POST /auth/sign-in with a non-existent email? How does the response differ from a wrong password? → SF-001: an unknown email returns USER_INVALID_CREDENTIALS before bcrypt runs; a known email returns USER_INVALID_PASSWORD after bcrypt. Both the distinct error codes and the response-time difference reveal whether an account exists (an enumeration and timing oracle).
3. Where is the lockout counter stored, and what is its bug? → Redis keys user_lockout:<email> and user_attempts:<email>. SF-002: the attempts counter is deleted when the lockout arms, so counting starts again from zero once the lockout TTL expires.
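One common way to close the SF-001 oracle is to run the password comparison on both paths and return a single error code. The sketch below is a minimal illustration, not the project's fix: it uses Node's built-in scrypt as a stand-in for bcrypt so it runs without dependencies, and the signIn function, the in-memory user map, and the dummy hash are hypothetical.

```typescript
import { randomBytes, scryptSync, timingSafeEqual } from "node:crypto";

interface StoredUser {
  email: string;
  salt: Buffer;
  hash: Buffer;
}

// Stand-in for bcrypt.hash: deliberately expensive key derivation.
function hashPassword(password: string, salt: Buffer): Buffer {
  return scryptSync(password, salt, 32);
}

// Hashed once at startup so unknown emails pay the same hashing cost as known ones.
const DUMMY_SALT = randomBytes(16);
const DUMMY_HASH = hashPassword("not-a-real-password", DUMMY_SALT);

type SignInResult = { ok: true; email: string } | { ok: false; code: "USER_INVALID_CREDENTIALS" };

function signIn(users: Map<string, StoredUser>, email: string, password: string): SignInResult {
  const user = users.get(email);
  // Always run the expensive comparison, against the dummy record when the user is unknown.
  const candidate = hashPassword(password, user ? user.salt : DUMMY_SALT);
  const matches = timingSafeEqual(candidate, user ? user.hash : DUMMY_HASH);
  if (user && matches) {
    return { ok: true, email };
  }
  // One error code for both "no such user" and "wrong password".
  return { ok: false, code: "USER_INVALID_CREDENTIALS" };
}
```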

Day 2: The bet pipeline

Read:
- docs/flows/dropbet-bet-place.md: the shared pipeline (lock → balance deduction → RNG → settlement → BullMQ side-effects)
- docs/flows/dropbet-house-game.md: dice and limbo specifics, provably-fair seed mechanics

Exercise:
1. Place a dice bet on http://localhost:3000, then place a limbo bet.
2. In Jaeger, find both traces. Compare the span waterfalls and identify the shared pipeline spans (lock, balance, bet INSERT) vs. the game-specific spans (dice RNG vs. limbo multiplier calculation).
3. Call GET /provably-fair/seed via Swagger (authenticate with your access_token). Note the clientSeed, hashedServerSeed, and nonce. Place another bet, then call the endpoint again and confirm that the nonce incremented.
4. Verify fairness: the hashed server seed shown before the bet must equal SHA256 of the server seed revealed after the bet settles (check payload.gameHashedServerSeed in the bet detail response).
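Step 4 can be scripted. The sketch assumes the published hash is the lowercase hex SHA-256 of the server seed string encoded as UTF-8; if the service hashes decoded bytes instead, the input must change accordingly. sha256Hex and verifyServerSeed are illustrative names.

```typescript
import { createHash } from "node:crypto";

// Hex-encoded SHA-256 of a UTF-8 string.
function sha256Hex(input: string): string {
  return createHash("sha256").update(input, "utf8").digest("hex");
}

// True if the seed revealed after settlement matches the hash committed before the bet.
function verifyServerSeed(revealedServerSeed: string, hashedServerSeed: string): boolean {
  return sha256Hex(revealedServerSeed) === hashedServerSeed.trim().toLowerCase();
}
```

Paste the revealed seed and the previously displayed hash into verifyServerSeed; a false result means the committed hash does not correspond to the seed used.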

Self-check:
1. What Redis key does @PlaceBetLock acquire, and what happens if two bets race? → A lock key per userId; the second bet blocks until the first releases. If the lock TTL expires mid-handler, the second bet can acquire the lock early and race on the fairness seed (SF-005).
2. Where is the insufficient-funds guard implemented? → WHERE amount >= betAmount in the Prisma UPDATE at apps/api/src/accounting/user-balance.repository.ts. There is no Postgres CHECK constraint behind it (SF-006).
3. What fires after a bet settles? → A bet_settled_queue BullMQ job: leaderboard UPSERT, rakeback calculation, affiliate notification, live-bets push, challenge progress. It is fire-and-forget (SF-007).
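The lock behavior behind SF-005 can be modeled without Redis. The sketch below mimics the usual Redis lock pattern (SET key token NX PX ttl to acquire, a token-checked delete to release) with an in-memory map and an injectable clock. The key format place_bet_lock:<userId> is hypothetical, and "blocking" in @PlaceBetLock presumably means retrying acquire until it succeeds; neither detail is confirmed here.

```typescript
// In-memory stand-in for a Redis lock with an injectable clock (milliseconds).
class FakeRedisLock {
  private readonly entries = new Map<string, { token: string; expiresAt: number }>();

  constructor(private readonly now: () => number) {}

  // Like SET key token NX PX ttlMs: succeeds only if the key is absent or already expired.
  acquire(key: string, token: string, ttlMs: number): boolean {
    const current = this.entries.get(key);
    if (current && current.expiresAt > this.now()) return false;
    this.entries.set(key, { token, expiresAt: this.now() + ttlMs });
    return true;
  }

  // Delete only if this caller still owns the lock (in Redis, usually a small Lua script).
  release(key: string, token: string): boolean {
    const current = this.entries.get(key);
    if (!current || current.token !== token || current.expiresAt <= this.now()) return false;
    this.entries.delete(key);
    return true;
  }
}

// Hypothetical key format; the real @PlaceBetLock key may differ.
const lockKeyFor = (userId: string): string => `place_bet_lock:${userId}`;
```

The test scenario shows the race: once the first handler overruns the TTL, the second bet acquires the lock while the first is still running, and only the token check stops the first handler from deleting the second one's lock.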

Day 3: Observability deep-dive

Read:
- docs/observability.md: OTel SDK bootstrap, pino vs EvoLogger, trace-to-log correlation, and the canonical observability wiring and dashboards

Exercise:
1. Pick a recent Jaeger trace for POST /casino/games/house/dice/bets and copy its traceID.
2. In Grafana Explore (http://localhost:3003 → Explore → Loki datasource), query:

{service_name="ebit-api"} |= "<your-traceID>"

You should see at least one pino log line. Click a line; its "View trace" button links back to Jaeger.
3. Now query EvoLogger records:

{source="docker_filelog"} |= "EvoLogger"

These records reach Loki via the Docker filelog receiver, not the OTel SDK.
4. Open the "Service Overview" dashboard in Grafana. Identify the request rate, error rate, and p95 latency for ebit-api.
5. Run a PromQL query in Prometheus (http://localhost:9090) to get the p95 latency per span name over a 5-minute window:

histogram_quantile(0.95, sum(rate(traces_spanmetrics_latency_bucket{service_name="ebit-api"}[5m])) by (le, span_name))
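histogram_quantile does not read a stored p95; it estimates one from cumulative bucket counts by linear interpolation inside the bucket where the target rank falls. The sketch below is a simplified re-implementation of that estimate for a single series (after the sum ... by (le, span_name) step has merged buckets), meant to show why p95 accuracy depends on the bucket boundaries. It skips edge cases such as q outside [0, 1] and non-monotonic buckets that Prometheus itself handles.

```typescript
// One cumulative bucket: number of observations <= le (le = Infinity for the +Inf bucket).
interface Bucket {
  le: number;
  count: number;
}

// Simplified version of Prometheus's histogram_quantile estimate for one series.
function histogramQuantile(q: number, buckets: Bucket[]): number {
  const sorted = [...buckets].sort((a, b) => a.le - b.le);
  const last = sorted[sorted.length - 1];
  if (q < 0 || q > 1 || sorted.length < 2 || last.le !== Infinity || last.count === 0) {
    return NaN; // Prometheus treats these cases specially; the sketch just gives up
  }
  const rank = q * last.count;
  const idx = sorted.findIndex((b) => b.count >= rank);
  if (idx === sorted.length - 1) {
    // The rank lands in the +Inf bucket: report the highest finite bound.
    return sorted[sorted.length - 2].le;
  }
  const upper = sorted[idx].le;
  // The first bucket is assumed to start at 0, which holds for positive latency bounds.
  const lower = idx === 0 ? 0 : sorted[idx - 1].le;
  const below = idx === 0 ? 0 : sorted[idx - 1].count;
  const inBucket = sorted[idx].count - below;
  if (inBucket === 0) return lower;
  // Assume observations are spread evenly inside the bucket and interpolate linearly.
  return lower + (upper - lower) * ((rank - below) / inBucket);
}
```

With bounds at 0.1 s, 0.5 s, and 1 s, every p95 between 0.5 s and 1 s is an interpolation across a 0.5 s-wide bucket, so finer buckets near the latency targets give more trustworthy percentiles.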

Self-check:
1. Why do pino and EvoLogger records take different paths to Loki? → Pino is bridged via @opentelemetry/instrumentation-pino → OTLP → collector → Loki. EvoLogger writes to winston → Docker stdout → filelog/docker receiver → collector → Loki.
2. Where is the OTel SDK initialized? → libs/shared/src/basic/pre/pre-otel.main.ts, imported first in every main.ts.
3. Why can't you trace a request from ebit-rt back through ebit-api in a single Jaeger trace? → The Nest Redis pub/sub transport (@ExternalControllerClient) does not propagate the W3C traceparent header, so callee spans appear as orphan root spans. See architecture.md AF-2.
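The header AF-2 refers to has a fixed shape defined by the W3C Trace Context specification: version, 32-hex-digit trace ID, 16-hex-digit parent span ID, and flags, separated by dashes. The sketch below parses and formats the version-00 layout and wraps a payload in a hypothetical TracedMessage envelope to show where a pub/sub transport could carry it. In practice you would let OpenTelemetry's propagation.inject and propagation.extract (from @opentelemetry/api) fill and read such a carrier rather than hand-rolling the format.

```typescript
interface TraceParent {
  traceId: string; // 32 lowercase hex chars
  parentId: string; // 16 lowercase hex chars: the caller's span ID
  sampled: boolean; // bit 0 of trace-flags
}

const TRACEPARENT_RE = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;
const ALL_ZEROS = /^0+$/;

function parseTraceParent(header: string): TraceParent | null {
  const match = TRACEPARENT_RE.exec(header.trim());
  if (!match) return null;
  const [, version, traceId, parentId, flags] = match;
  // Version ff and all-zero IDs are invalid per the spec.
  if (version === "ff" || ALL_ZEROS.test(traceId) || ALL_ZEROS.test(parentId)) return null;
  return { traceId, parentId, sampled: (parseInt(flags, 16) & 0x01) === 0x01 };
}

function formatTraceParent(tp: TraceParent): string {
  return `00-${tp.traceId}-${tp.parentId}-${tp.sampled ? "01" : "00"}`;
}

// Hypothetical envelope: the transport would have to send headers alongside the data.
interface TracedMessage<T> {
  headers: { traceparent?: string };
  data: T;
}

function wrapWithTrace<T>(data: T, tp: TraceParent | null): TracedMessage<T> {
  return { headers: tp ? { traceparent: formatTraceParent(tp) } : {}, data };
}
```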

Day 4: Add a read-only endpoint

Exercise: build GET /users/me/simple-stats

This exercise teaches the guard + controller + service + repository + DTO pattern.

Create a new method in apps/api/src/user/user.controller.ts that returns { totalBets, totalWagered, joinedAt } for the authenticated user. Use @UseGuards(JwtGuard) and @Get('me/simple-stats').
Implement the query in UserService: count the user's bets and sum their amounts from the bet table, filtering on the userId that the controller passes in from req.user.id.
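A minimal sketch of the service and DTO layers for this exercise, with a hypothetical StatsRepository so it runs without Nest or Prisma. In the real code the repository methods would be async Prisma calls (for example prisma.bet.aggregate with _count and _sum), and amounts are likely Prisma Decimal values rather than numbers; every name here is illustrative.

```typescript
// Controller side (shape only, not compiled here):
//   @UseGuards(JwtGuard)
//   @Get('me/simple-stats')
//   getSimpleStats(@Req() req) { return this.userService.getSimpleStats(req.user.id); }

interface BetAggregate {
  count: number;
  sumAmount: number | null; // SQL SUM over zero rows is NULL
}

interface SimpleStatsDto {
  totalBets: number;
  totalWagered: number;
  joinedAt: string; // ISO-8601
}

// Hypothetical repository boundary; synchronous here for brevity.
interface StatsRepository {
  // e.g. prisma.bet.aggregate({ where: { userId }, _count: { _all: true }, _sum: { amount: true } })
  aggregateBets(userId: string): BetAggregate;
  findJoinedAt(userId: string): Date | undefined;
}

class UserStatsService {
  constructor(private readonly repo: StatsRepository) {}

  getSimpleStats(userId: string): SimpleStatsDto {
    const joinedAt = this.repo.findJoinedAt(userId);
    if (!joinedAt) {
      throw new Error(`user ${userId} not found`); // a Nest service would throw NotFoundException
    }
    const agg = this.repo.aggregateBets(userId);
    return {
      totalBets: agg.count,
      totalWagered: agg.sumAmount ?? 0, // a user with no bets has wagered 0, not null
      joinedAt: joinedAt.toISOString(),
    };
  }
}
```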

Start the dev server on the host: cd ebit-api && npm run start:dev. Hit your endpoint with curl:

curl -s http://localhost:4000/users/me/simple-stats \
  -H "Cookie: access_token=<your-token>" | jq

Find the trace in Jaeger. You should see your controller method span + Prisma query spans.
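When satisfied, discard the changes: git checkout -- . reverts edits to tracked files, and any new files you created (git status lists them as untracked) must be deleted by hand. This is a learning exercise, not a shipped feature.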

Self-check:
1. What is the difference between @UseGuards(JwtGuard) at the method level vs. the global guard in app.module.ts? → The global guard runs on every route. Method-level @UseGuards adds guards that run in addition to, and after, the global ones.
2. Where does Prisma tracing get enabled? → The "tracing" entry in previewFeatures at libs/_prisma/src/schema/api.prisma:4, plus PrismaInstrumentation registered in pre-otel.main.ts.

Day 5: Prisma and the data model

Read:
- docs/flows/dropbet-bet-history.md: bet detail endpoints, cache strategy, JwtGuard gap
- docs/architecture.md section 5: split Prisma schema and core entities

The Redis keyspace cheat sheet lives at ../data-model/redis-keyspace.md.

Exercise:
1. Open the three schema files:
   - libs/_prisma/src/schema/api.prisma (65 models, public schema)
   - libs/_prisma/src/schema/blackjack.prisma (8 models, blackjack schema)
   - libs/_prisma/src/schema/speed_roulette.prisma (3 models, speed_roulette schema)
2. Find the Bet model in api.prisma. Note the @@unique([roundId, userId]) constraint; this is the double-settle backstop (SF-004).
3. Add a trivial column to the User model: onboardingComplete Boolean @default(false) @map("onboarding_complete").
4. Generate and apply the migration:

cd ebit-api
npm run db:migrate:dev -- --name add-onboarding-complete
npm run prisma:generate

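5. Verify the column exists: docker compose exec ebit-db psql -U ebit -c "\\d user" | grep onboarding.
6. Revert when done: git checkout -- libs/_prisma/ restores the schema but does not remove the newly generated migration directory, which is untracked (git status shows it). Delete that directory, then run npm run db:reset.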
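The @@unique([roundId, userId]) backstop turns a double settle into an error rather than a second row. Prisma reports that violation as a known request error with code P2002, so a settle path can recognize it and treat the duplicate as already settled. The sketch below checks the code structurally so it runs without @prisma/client; settleOnce and insertBet are hypothetical, and treating the duplicate as a no-op is only safe if the balance credit commits in the same transaction as the bet INSERT.

```typescript
// Prisma surfaces unique-constraint violations as PrismaClientKnownRequestError with code "P2002".
// The check is structural so this sketch runs without @prisma/client installed.
function isUniqueViolation(err: unknown): boolean {
  return typeof err === "object" && err !== null && (err as { code?: unknown }).code === "P2002";
}

type SettleOutcome = "settled" | "already-settled";

// insertBet is a hypothetical stand-in for the bet INSERT (e.g. prisma.bet.create(...)),
// synchronous here for brevity; the real call returns a Promise and would be awaited.
function settleOnce(insertBet: () => void): SettleOutcome {
  try {
    insertBet();
    return "settled";
  } catch (err) {
    // A duplicate (roundId, userId) means another path already settled this round.
    if (isUniqueViolation(err)) return "already-settled";
    throw err; // anything else (DB down, bad data) must still surface
  }
}
```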

Self-check:
1. Why are there three .prisma files instead of one? → The prismaSchemaFolder and multiSchema entries in previewFeatures enable splitting. Blackjack and speed-roulette have their own Postgres schemas to isolate game-specific tables.
2. Why must you use npm run db:migrate:dev instead of npx prisma migrate dev? → The npm scripts wrap Prisma with env-cmd -f .env so DATABASE_URL is set correctly. Running npx prisma directly misses the env file.
3. What does bet.controller.ts:31 comment out, and why is it a security finding? → JwtGuard is commented out on the bet-detail endpoint, so any caller who can guess a betId can read the full seed material (SF-008).

Week 1 wrap-up — Operational toolkit (run alongside days 1-5)

You're going to live in these tools during week 2 and beyond, so get hands-on with each one this week. None of them needs a full day; spread them across days 1-5 in 30-60-minute slots.

1. Reproduce the bet-place trace via the runnable demo

Walk through ../e2e-trace-demo.md end-to-end. It signs in with local@example.com, places a dice bet via the REST API directly (no UI), fetches the resulting trace from Jaeger by traceID, and pivots into Loki + Prometheus to confirm trace-correlated logs and spanmetrics latency buckets.

Verify you can answer:
- Which span shows the BullMQ enqueue, and why does it show as an EVALSHA against ioredis instead of as a separate broker hop? (Hint: BullMQ is Redis-backed; RabbitMQ is wired only to the stubbed Fast Track module. See ../adr/0003-bullmq-not-rabbitmq.md.)
- Where in the waterfall is the @PlaceBetLock mutex acquired and released? What's the lock key?
- Why is there no parent-child link between ebit-fe-browser's page-view trace and the bet-place server trace? (Hint: traceparent is forwarded only on the FE→API hop; browser RUM traces stand alone.)

2. Run a smoke perf test (50 VU, 1 min)

The smallest k6 profile is tests-perf/profiles/smoke.js — 50 virtual users, 1 minute, mixed journey (60% browse, 30% bet, 10% admin):

cd ~/ebit
BASE_URL=http://localhost:4000 \
TEST_EMAIL=local@example.com TEST_PASS=password \
ADMIN_EMAIL=admin@admin.com  ADMIN_PASS=admin \
k6 run tests-perf/profiles/smoke.js

Open the ebit-perf-test dashboard in Grafana in a second tab. You should see the smoke_signin_latency, smoke_bet_latency, and smoke_balance_latency Trend metrics. The thresholds are p95 < 200 ms for sign-in, p95 < 200 ms for bet, p95 < 100 ms for balance, and an http_req_failed rate below 1%. If a threshold breaks, the run exits non-zero; treat that as a real signal, not test flake.
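For reference, the profile described above maps onto k6's options object roughly as follows. This is a sketch of how such limits are usually expressed, not a copy of tests-perf/profiles/smoke.js, which may use scenarios or different metric wiring; pickJourney is a hypothetical helper for the 60/30/10 mix.

```typescript
// Thresholds as described on this page; k6 fails the run (non-zero exit) if any is breached.
const options = {
  vus: 50,
  duration: "1m",
  thresholds: {
    smoke_signin_latency: ["p(95)<200"], // custom Trend, milliseconds
    smoke_bet_latency: ["p(95)<200"],
    smoke_balance_latency: ["p(95)<100"],
    http_req_failed: ["rate<0.01"], // built-in Rate metric: under 1% failed requests
  },
};

type Journey = "browse" | "bet" | "admin";

// Map a uniform random number in [0, 1) onto the 60% / 30% / 10% journey mix.
function pickJourney(r: number): Journey {
  if (r < 0.6) return "browse";
  if (r < 0.9) return "bet";
  return "admin";
}

// In a k6 script these would be used roughly as:
//   export const options = { ... };
//   export default function () { const journey = pickJourney(Math.random()); /* run it */ }
```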

See ../performance-testing.md for methodology and ../performance-test-report.md for the latest baseline.

3. Get familiar with the 8 Grafana dashboards

Open Grafana at http://localhost:3003 (admin / grafana). Eight dashboards are provisioned under ../../observability/grafana/provisioning/dashboards/. Spend 10 minutes per dashboard with the smoke test running so panels have data; each smoke run lasts one minute, so re-run it as needed.

Dashboard	What it shows	When to open it
Service Overview	RED metrics per service (rate, errors, p95) from spanmetrics	Default landing page; first thing you check
ebit-perf-test	k6 custom metrics, threshold status, run-vs-baseline	During and after every perf run
perf-system	Host metrics from node_exporter	When latency rises with no application-level cause
Logs-Trace Pivot	Loki + derivedFields link from log to trace	When debugging a specific `trace_id`
Prisma + Postgres	Prisma query rate, slowest models, Postgres lock waits	When DB is suspect
BullMQ	Per-queue depth (waiting / active / failed / completed)	When async side-effects look stuck
Redis	Both Redis instances: ops/sec, memory, evictions	When cache or BullMQ acts up
Browser RUM	`ebit-fe-browser` traces: TTFB, LCP, route-load p95	For player-perceived latency complaints

By end of week 1 you should know which dashboard to open first for any given symptom.

4. Read all 14 runbooks

Read every file in ../runbooks/ — fourteen files, each 50-120 lines, all following the template: Symptom → Likely cause → Diagnosis → Fix → Prevention.

You'll author one runbook of your own in week 2 (../handover/oncall-readiness.md) — pick a candidate scenario as you read and add it to your notes.

The ../handover/oncall-runbook.md cross-references these by symptom; skim it now too so you know how to reach the right runbook during a real incident.

5. Postgres-down outage drill

Practice "something broke, what do I do" in a controlled environment:

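Start steady-state traffic in the background, with BASE_URL and the other variables set as in the smoke-test command above: k6 run tests-perf/profiles/smoke.js &. The run lasts one minute, so stop Postgres promptly (or re-run it), and expect its thresholds to fail.
Kill Postgres: docker compose stop ebit-db.
Observe: on the Service Overview dashboard, the ebit-api error rate spikes and p95 climbs. In Logs-Trace Pivot, search for "Can't reach database server". In Jaeger, find a recent failed trace and locate the failing prisma:client:operation span.
Recover: docker compose start ebit-db, then watch docker compose logs -f ebit-db for "database system is ready to accept connections". ebit-api typically resumes serving within 5-10 seconds.
Confirm that the error rate returns to baseline and that the BullMQ bet_settled_queue drains (workers retry failed jobs while the DB is down, within the queue's configured retry attempts).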
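Whether bet_settled_queue jobs survive the drill depends on their attempts and backoff settings, which this page does not state. The sketch below assumes BullMQ's built-in exponential backoff without jitter (delay * 2^(attemptsMade - 1)) and estimates whether a job that starts failing at the beginning of an outage still has an attempt left once Postgres is back. The attempt counts and delays in the test are made up.

```typescript
// Delay before each retry with BullMQ-style exponential backoff (no jitter):
// retry n waits baseDelayMs * 2^(n - 1), for n = 1 .. attempts - 1.
function exponentialBackoffDelays(attempts: number, baseDelayMs: number): number[] {
  const delays: number[] = [];
  for (let attemptsMade = 1; attemptsMade < attempts; attemptsMade++) {
    delays.push(baseDelayMs * 2 ** (attemptsMade - 1));
  }
  return delays;
}

// True if the final attempt happens after the outage ends, assuming the first attempt
// fails right as the outage starts and ignoring processing time.
function lastAttemptAfterOutage(attempts: number, baseDelayMs: number, outageMs: number): boolean {
  const lastAttemptAt = exponentialBackoffDelays(attempts, baseDelayMs).reduce((sum, d) => sum + d, 0);
  return lastAttemptAt > outageMs;
}
```

Under these assumptions, 5 attempts with a 1-second base delay place the last attempt 15 seconds after the first, so such a job would ride out a short drill but not a multi-minute outage.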

See the proper procedure in ../runbooks/db-down.md.

6. Review a bet-related PR

Goal: see how the team reasons about correctness and trace coverage before merging.

Live PR: ask your team lead for an open or recently merged bet-related PR. Sit with a senior engineer for ~30 minutes and read it together — focus on guard placement, span names, BullMQ enqueue placement, Prisma transaction boundaries.
Simulated (if no PR is open): git log --oneline --all -- ebit-api/apps/api/src/bet/ ebit-api/apps/api/src/casino/ | head -20, pick a recent bet-touching commit, read it as if it were a PR, then compare your notes against the merged commit's discussion or with a senior engineer over Slack.

End-of-week-1 self-check

By end of week 1 you should be able to answer without looking:

Which Grafana dashboard to open first for: high error rate, slow DB, stuck queue, browser-perceived slowness?
Which trace span shows the BullMQ enqueue, and which Redis instance backs it?
How do you navigate from a Loki log line to the parent Jaeger trace? (derivedFields link)

Week 2 — Depth

Day 6: BullMQ and async processing

Read:
- docs/flows/dropbet-speed-roulette.md: BullMQ state machine, concurrency=1, EOS dependency
- docs/flows/dropbet-leaderboard.md: leaderboard write path from bet_settled_queue

Exercise:
1. Inspect BullMQ queues in Redis:

docker compose exec ebit-redis redis-cli -a cache KEYS "bull:*" | sort | head -30

2. Check the bet_settled_queue specifically:

# wait is a Redis list; completed is a sorted set, so it needs ZCARD rather than LLEN
docker compose exec ebit-redis redis-cli -a cache LLEN "bull:bet_settled_queue:wait"
docker compose exec ebit-redis redis-cli -a cache ZCARD "bull:bet_settled_queue:completed"

3. Place a bet, then immediately check the counts again. The job normally moves from wait through active to completed within milliseconds, so you will most likely see only the completed count change (and only if the queue does not trim completed jobs via removeOnComplete).
4. Find the speed-roulette state queue: bull:speed_roulette_state_queue:*. Note the concurrency: 1 design; only one round processes at a time.
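The LLEN/ZCARD split follows from how BullMQ stores job states: wait and active are Redis lists, while completed, failed, and delayed are sorted sets (redis-cli TYPE <key> shows this on your instance). The helper below builds the right counting command for a state and assumes the default bull prefix; countCommand is an illustrative name. From Node, BullMQ's own Queue.getJobCounts() returns the same numbers without hand-built commands.

```typescript
// Redis type behind each BullMQ job-state key (verify with: redis-cli TYPE <key>).
const STATE_KEY_TYPE = {
  wait: "list",
  active: "list",
  completed: "zset",
  failed: "zset",
  delayed: "zset",
} as const;

type JobState = keyof typeof STATE_KEY_TYPE;

// Build the redis-cli counting command for one queue/state, assuming the default "bull" prefix.
function countCommand(queue: string, state: JobState, prefix: string = "bull"): string {
  const key = `${prefix}:${queue}:${state}`;
  return STATE_KEY_TYPE[state] === "list" ? `LLEN "${key}"` : `ZCARD "${key}"`;
}
```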

Self-check:
1. Why does Evospin use BullMQ instead of RabbitMQ for async work? → RabbitMQ is only wired to the stubbed Fast Track module (disabled=true). All production queues ride on BullMQ (Redis-backed) via @nestjs/bullmq.
2. What happens if Redis goes down during a bet_settled_queue job? → The job is lost after removeOnFail.age elapses, and its side-effects (leaderboard, rakeback, affiliate) silently never fire (SF-007).
3. What is the risk of concurrency: 1 on the speed-roulette state processor? → A job that exhausts its retries without adding a follow-up job deadlocks the queue, because startIfNotStarted only bootstraps on an empty queue.

Day 7: WebSocket and the real-time layer

Read:
- docs/flows/rt-websocket.md: socket.io connect, auth, rooms, server pushes, online-count inflation

Exercise:
1. Open http://localhost:3000 and sign in. Open browser DevTools → Network → WS tab. Find the socket.io connection to ws://localhost:4001/events?transport=websocket.
2. Watch for the AuthSuccess message and the subsequent UsersOnlineUpdated broadcasts (every 10 seconds).
3. In Jaeger, search for send /events spans from ebit-rt. Each server-to-client emit produces one span.
4. In an incognito window (a regular second tab shares your cookies and would authenticate), connect without signing in. Confirm that neither AuthSuccess nor AuthError fires: the socket stays connected but unauthenticated.

Self-check:
1. Where does the socket_token come from? → It is set as a cookie by POST /auth/verify-2fa (or POST /auth/sign-in for non-2FA users) alongside access_token and refresh_token, and extracted by extractSocketAuthToken at apps/rt/src/utils.ts:24.
2. Why does handleDisconnect not call zrem on ONLINE_USERS_KEY? → By design: the zset entry stays until the TTL sweep (ONLINE_USER_TTL_SECONDS), which smooths out short-lived probes and reconnects. The side effect is that a user appears "online" for up to the TTL after disconnecting.
3. What happens if ebit-rt is scaled to 2 replicas? → Per-user delivery breaks. clientSockets is a local Map, so a message.user-targeted emit on instance A misses sockets connected to instance B. The fix is @socket.io/redis-adapter, which is not installed (AF-3).
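Self-check 2 describes a presence set that is only trimmed by a TTL sweep. The in-memory model below assumes ONLINE_USERS_KEY is a sorted set scored by last-seen time in seconds, with the sweep implemented as ZREMRANGEBYSCORE over scores older than the TTL; the real scoring and sweep cadence may differ, and PresenceSet is a hypothetical name. It shows why a disconnected user still counts as online until a sweep runs past the TTL.

```typescript
// In-memory model of a presence sorted set: member = userId, score = last-seen time (seconds).
class PresenceSet {
  private readonly lastSeen = new Map<string, number>();

  // Like ZADD ONLINE_USERS_KEY <nowSec> <userId>, on connect or heartbeat.
  touch(userId: string, nowSec: number): void {
    this.lastSeen.set(userId, nowSec);
  }

  // Like ZREMRANGEBYSCORE ONLINE_USERS_KEY -inf (<nowSec - ttlSec>: drop entries older than the TTL.
  sweep(nowSec: number, ttlSec: number): void {
    for (const [userId, seenAt] of this.lastSeen) {
      if (seenAt < nowSec - ttlSec) this.lastSeen.delete(userId);
    }
  }

  // Like ZCARD ONLINE_USERS_KEY: the count that gets broadcast as "online".
  count(): number {
    return this.lastSeen.size;
  }
}
```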

Day 8: Wallet and balance operations

Read:
- docs/flows/dropbet-wallet.md: balance view, vault transfer, negative-balance bug
- docs/flows/dropbet-challenges.md: challenges, promos, the missing @Post decorator

Exercise:
1. Call GET /accounting/balances via Swagger with your access_token. Note the response structure: balances[], each with amount, vaultAmount, and usdAmount.
2. Transfer funds to the vault via POST /accounting/balances/to-vault with { "amount": 100, "currencyId": "DBC" }. Note the balance change.
3. Try transferring more than your remaining balance and observe SF-013: toVault has no overdraft guard, so the balance can go negative.
4. If your balance is corrupted, reset it with npm run db:seed.
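A guarded toVault makes the check and the write one atomic step. In SQL that is an UPDATE ... WHERE amount >= :x that reports zero affected rows on insufficient funds; with Prisma it would be an updateMany with where: { amount: { gte: x } } and data: { amount: { decrement: x }, vaultAmount: { increment: x } }, checking that count is 1. The in-memory sketch below shows the intended outcomes; the field names and result codes are hypothetical, and JavaScript's single thread stands in for the database's atomicity.

```typescript
interface Balance {
  amount: number;
  vaultAmount: number;
}

type ToVaultResult = "ok" | "invalid-amount" | "insufficient-funds";

// Same outcome as: UPDATE ... SET amount = amount - :x, vault_amount = vault_amount + :x
//                  WHERE id = :id AND amount >= :x   (0 rows updated => insufficient funds)
function toVault(balance: Balance, x: number): ToVaultResult {
  // Reject zero, negative, and non-finite amounts; a negative "transfer" would move funds out of the vault.
  if (!Number.isFinite(x) || x <= 0) return "invalid-amount";
  // Check and write with nothing in between, mirroring the single guarded UPDATE.
  if (balance.amount < x) return "insufficient-funds";
  balance.amount -= x;
  balance.vaultAmount += x;
  return "ok";
}
```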

Self-check:
1. Why is there no GET /accounting/transactions endpoint? → The transaction ledger is served only over the rt websocket via Private.TransactionFindMany (SF-015).
2. Where is usdAmount computed, and why can two sequential reads disagree? → ExchangeRatesService.toUsd runs per request, and rates refresh independently (SF-017).
3. Why does POST /promo/public/:code return 404? → The handler at promo.controller.ts:100 has guards and @ApiOperation but no @Post() HTTP-verb decorator, so Nest skips it during route discovery.

Day 9: The admin surface

Read:
- docs/flows/admin-sign-in.md: admin auth, 2FA, cookie-name mismatch
- docs/flows/admin-user-mgmt.md: ban/unban, notes, audit log
- docs/flows/admin-bets.md: bet listing, dead bo route, SuperAdmin MFA bypass

Exercise:
1. Call POST /auth/sign-in with { "email": "admin@admin.com", "password": "admin" } via curl or Swagger. This is the non-2FA admin, so you get tokens directly.
2. Use the access_token to call POST /admin/user/all with { "page": 1, "take": 5, "search": "local@example.com" }. Find the seeded dropbet user.
3. Call GET /admin/user/<id>/full/stats to see the user's wagering summary.
4. Call GET /admin/user/admin-audit?page=1&take=10&userId=<your-admin-id>. Note that userId filters by the admin actor, not the target user; this is a documented quirk.

Self-check:
1. What is AdminLoggerInterceptor, and which routes does it skip? → It wraps every non-GET /admin/* route (that is, the mutations); GET routes are skipped at line 27. It writes admin_action_log rows via a tap() after the handler resolves, and safeLog swallows insert errors.
2. Does banUser record who performed the ban? → No. user.service.ts:383 takes an admin parameter but doesn't persist it. The only "who banned whom" record is the AdminActionLog row written by the interceptor.
3. What does permission.guard.ts:24 do for SuperAdmin, and why is it a security finding? → It returns true immediately, before the MFA check at line 40, so a SuperAdmin without an mfaSecret (like the seeded admin@admin.com) bypasses MFA (SF-029).
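The SF-029 fix is mostly a matter of ordering: the MFA check must run before any role shortcut. The sketch below uses a hypothetical AdminPrincipal shape (role names, an mfaVerified session flag) rather than the real permission.guard.ts types.

```typescript
type AdminRole = "SuperAdmin" | "Admin";

// Hypothetical principal; the real guard reads roles and mfaSecret from the admin record.
interface AdminPrincipal {
  role: AdminRole;
  permissions: string[];
  mfaSecret?: string | null; // enrolled TOTP secret, if any
  mfaVerified: boolean; // whether this session passed the MFA challenge
}

// MFA first: no role, SuperAdmin included, skips it. SuperAdmin only bypasses the permission lookup.
function canActivate(admin: AdminPrincipal, requiredPermission: string): boolean {
  if (!admin.mfaSecret || !admin.mfaVerified) return false;
  if (admin.role === "SuperAdmin") return true;
  return admin.permissions.includes(requiredPermission);
}
```

With this ordering, the seeded admin@admin.com would need MFA enrolled before it could reach any guarded admin route.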

Day 10: Blackjack, security, and pairing

Read:
- docs/flows/dropbet-blackjack.md: init → action loop → settle, abandoned-hand fund lockup, orphan ebit-bj
- docs/flows/dropbet-password-reset.md: forgeable token, no one-use enforcement
- docs/weaknesses-register.md: aggregated known weaknesses (AF/SF/FM/WK)

Exercise:
1. Play a blackjack hand end-to-end via Swagger: POST /casino/games/house/blackjack/init → repeated POST /casino/games/house/blackjack/handleAction (hit/stand/double/split) → observe settlement.
2. Pick one security finding from the architecture doc's SF table that interests you. Read the flow doc it references and trace the issue back to the specific file and line. Write a one-paragraph summary of what you found and how you would fix it.
3. Pair with a teammate on a real ticket or review an open PR, using the flow docs as reference material during the review.

Self-check:
1. What happens if a player closes their browser mid-blackjack-hand? → The wager is locked indefinitely: no TTL or cron auto-resolves is_finished=false rounds. The player must return and call getActiveState to resume.
2. Why does ebit-bj exist if dropbet doesn't use it? → It's an orphaned app with its own session-token scheme and EVO-Games wallet RPC. The dropbet client exclusively hits ebit-api's /casino/games/house/blackjack/* (AF-4).
3. Is the password-reset token one-use? → No. The same token works for the entire TTL window (up to 1200 seconds); the E2E spec confirms two successful resets with one token.
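Self-check 3 and the "forgeable token" note in the reading both point at the same remedy: issue a random token, store only its hash with an expiry, and delete it on first use. The sketch below keeps the store in memory; in this stack it would more likely live in Redis, where GETDEL (Redis 6.2+) can fetch and delete the entry atomically. ResetTokenStore and its methods are hypothetical.

```typescript
import { createHash, randomBytes } from "node:crypto";

// Only a digest of each token is stored, so a leaked store does not leak usable tokens.
function tokenDigest(token: string): string {
  return createHash("sha256").update(token).digest("hex");
}

class ResetTokenStore {
  private readonly byDigest = new Map<string, { userId: string; expiresAtMs: number }>();

  issue(userId: string, ttlMs: number, nowMs: number): string {
    // 32 random bytes: unguessable and not derived from any user data.
    const token = randomBytes(32).toString("base64url");
    this.byDigest.set(tokenDigest(token), { userId, expiresAtMs: nowMs + ttlMs });
    return token;
  }

  // Returns the userId at most once; any later use of the same token fails.
  consume(token: string, nowMs: number): string | null {
    const key = tokenDigest(token);
    const entry = this.byDigest.get(key);
    this.byDigest.delete(key); // delete first, so expired or used tokens never linger
    if (!entry || entry.expiresAtMs <= nowMs) return null;
    return entry.userId;
  }
}
```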

After week 2

You have read all 15 flow docs, exercised every major subsystem, and understand the known weaknesses. Next steps:

Deep-dive into a module — pick the area closest to your first assigned work. Read the source alongside the flow doc.
Review the glossary — docs/glossary.md defines 50+ domain terms used across the codebase.
Read the ADRs — docs/adr/ documents the 8 key architectural decisions and the tradeoffs behind them.
Contribute — read CONTRIBUTING.md for the branch, commit, PR, and review conventions.