Launch Checklist

Final pre-GA sweep. Run this top-to-bottom in the week before Phase 8 (GA). Tick every box, capture the verification output, and file a customer-side ticket for any failure.

Conventions: each item carries a one-line verify with a concrete command, query, or file path. {{TBD}} indicates a value the customer must fill in before running.


Infrastructure

  • [ ] Terraform applied cleanly to production — verify: terraform -chdir=terraform/perf plan after apply prints No changes.
  • [ ] State stored in S3 backend with locking — verify: terraform/perf/versions.tf has backend "s3" with dynamodb_table set; per terraform/perf/README.md §state.
  • [ ] EBS volumes encrypted — verify: aws ec2 describe-volumes --filters Name=tag:Name,Values=ebit-* --query 'Volumes[].{Id:VolumeId,Encrypted:Encrypted}' --output table shows all Encrypted=true.
  • [ ] IAM least-privilege — verify: SUT instance profile carries only AmazonEC2ContainerRegistryReadOnly-equivalent permissions plus SSM read; no *:* wildcard statements.
  • [ ] No 0.0.0.0/0 ingress on app SGs — verify: aws ec2 describe-security-groups --filters Name=tag:Name,Values=ebit-* --query 'SecurityGroups[].IpPermissions[?contains(IpRanges[].CidrIp, `0.0.0.0/0`)]' returns empty.
  • [ ] Admin SSH CIDR locked to a customer-owned range — verify: terraform.tfvars sets admin_ssh_cidrs explicitly; not relying on data.http.my_ip.
  • [ ] Internal datastore ports not exposed externally — verify: Postgres 5432, Redis 6379/6380, RabbitMQ 5672 not in any public SG inbound rule.
  • [ ] Egress allows ECR + apt + OTel exporter only — verify: egress rules whitelist customer-required outbound; document any wildcard egress decision.
  • [ ] DNS records propagated globally — verify: dig +short {{TBD: dropbet.example.com}} resolves from at least 3 geographic checkers.
  • [ ] TLS certificate valid + auto-renewing — verify: ACM cert in ISSUED state; renewal automation tested or noted.
  • [ ] ALB / ingress in front of admin-fe (if applicable) — verify: not relying on network_mode: host for production admin-fe (see Risks #5).
  • [ ] Postgres pool sized for expected load — verify: max_connections ≥ 300 (or PgBouncer in transaction mode); per-app ?connection_limit=N set per Risks #15.
  • [ ] Postgres backups configured — verify: AWS RDS automated backup retention ≥ 7d, or equivalent for self-hosted Postgres.
  • [ ] Redis persistence policy decided — verify: cache Redis appendonly/save policy matches RPO requirement; bot Redis can be ephemeral.
  • [ ] OTel collector resource limits set — verify: observability/otel-collector.yml memory_limiter configured; container has CPU/memory limits in compose.
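
The security-group item above can also be checked offline against saved CLI output. The following is a minimal sketch, assuming the input is the JSON produced by aws ec2 describe-security-groups --output json. The function name find_open_ingress and the sample data are hypothetical, and the extra ::/0 (IPv6) check is an addition that the checklist itself does not name.

```python
# Sketch only: scans saved describe-security-groups JSON for world-open ingress.
OPEN_CIDRS = {"0.0.0.0/0", "::/0"}  # "::/0" is an added assumption, not a checklist item


def find_open_ingress(describe_output):
    """Return (group_id, from_port, to_port) for each inbound rule open to the world."""
    findings = []
    for group in describe_output.get("SecurityGroups", []):
        for perm in group.get("IpPermissions", []):
            cidrs = [r.get("CidrIp") for r in perm.get("IpRanges", [])]
            cidrs += [r.get("CidrIpv6") for r in perm.get("Ipv6Ranges", [])]
            if OPEN_CIDRS.intersection(cidrs):
                findings.append((group.get("GroupId"), perm.get("FromPort"), perm.get("ToPort")))
    return findings


# Hypothetical sample shaped like the AWS CLI JSON output.
sample = {
    "SecurityGroups": [
        {"GroupId": "sg-0aaa", "IpPermissions": [
            {"FromPort": 443, "ToPort": 443, "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}]},
        {"GroupId": "sg-0bbb", "IpPermissions": [
            {"FromPort": 5432, "ToPort": 5432, "IpRanges": [{"CidrIp": "10.0.0.0/16"}]}]},
    ]
}
print(find_open_ingress(sample))  # [('sg-0aaa', 443, 443)]
```

A non-empty result is a prompt for review, not an automatic failure: a public load balancer group may legitimately accept 443 from anywhere, whereas the checklist item targets the app security groups.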

Secrets

  • [ ] Doppler prd config audited end-to-end — verify: run docs/audits/doppler-perf-audit.md playbook against prd; commit the output diff.
  • [ ] No dev_* configs referenced from production paths — verify: aws ec2 describe-instances ... user-data and docker-compose.yml reference prd only.
  • [ ] NODE_ENV=production in prd — verify: doppler secrets get NODE_ENV --project ebit-api --config prd --plain returns production.
  • [ ] DEFAULT_LOG_LEVEL=info in prd — verify: same pattern as above.
  • [ ] DEBUG_LOGS_PRETTY=false in prd — verify: same pattern.
  • [ ] DISABLE_RATE_LIMMITING=false in prd — verify: same pattern.
  • [ ] FASTTRACK_JWT_* populated (stub or real) — verify: env validator does not block boot (Risks #1).
  • [ ] No ebit-personal Doppler tokens in prd — verify: doppler service-tokens list shows only customer-owned tokens for prd configs.
  • [ ] All vendor production keys in place — verify per vendor:
      • Sumsub: SUMSUB_APP_TOKEN, SUMSUB_SECRET_KEY
      • Payment provider: CCPAYMENT_* and/or NOWPAYMENTS_*
      • Captcha: RECAPTCHA_* or GEETEST_* (production keys, not sandbox)
      • Email: SENDGRID_API_KEY or SMTP credentials
      • Sentry: SENTRY_DSN for each app
  • [ ] JWT signing keys rotated for production — verify: JWT_SECRET, JWT_VERIFICATION_TOKEN_SECRET are not the .example.env defaults.
  • [ ] Admin default password changed — verify: ADMIN_DEFAULT_PASSWORD is not admin; the forced password change on first admin login is documented.
  • [ ] Webhook secret per provider — verify: payment / KYC webhook signature secret matches provider config and Doppler.
  • [ ] Secret rotation runbook drafted — verify: a runbook exists under docs/runbooks/ covering Doppler key rotation and post-rotation app reload.
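
Several of the items above reduce to comparing prd values against known-good or known-bad values, which can be done in one pass. The following is a minimal sketch, assuming the prd config has been exported to a key/value mapping (for example a JSON export of the Doppler config) and that the .example.env defaults are available as a second mapping. The function name audit_prd_secrets, the EXPECTED table, and the sample values are hypothetical; the expected values themselves come from the checklist items.

```python
# Sketch only: flags prd values that contradict the Secrets checklist.
EXPECTED = {
    "NODE_ENV": "production",
    "DEFAULT_LOG_LEVEL": "info",
    "DEBUG_LOGS_PRETTY": "false",
}
MUST_DIFFER_FROM_EXAMPLE = ("JWT_SECRET", "JWT_VERIFICATION_TOKEN_SECRET")


def audit_prd_secrets(secrets, example_defaults):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    for key, want in EXPECTED.items():
        got = secrets.get(key)
        if got != want:
            problems.append(f"{key}: expected {want!r}, got {got!r}")
    for key in MUST_DIFFER_FROM_EXAMPLE:
        value = secrets.get(key)
        if not value:
            problems.append(f"{key}: missing or empty")
        elif value == example_defaults.get(key):
            problems.append(f"{key}: still the .example.env default")
    if secrets.get("ADMIN_DEFAULT_PASSWORD") == "admin":
        problems.append("ADMIN_DEFAULT_PASSWORD: still 'admin'")
    return problems


# Hypothetical values for illustration only.
example_env = {"JWT_SECRET": "changeme", "JWT_VERIFICATION_TOKEN_SECRET": "changeme-too"}
prd = {
    "NODE_ENV": "production",
    "DEFAULT_LOG_LEVEL": "debug",
    "DEBUG_LOGS_PRETTY": "false",
    "JWT_SECRET": "changeme",
    "JWT_VERIFICATION_TOKEN_SECRET": "k3y-rotated",
    "ADMIN_DEFAULT_PASSWORD": "s0me-long-random-value",
}
for problem in audit_prd_secrets(prd, example_env):
    print(problem)
```

The sketch never prints the secret values it compares against .example.env, only the key names, so its output is safe to attach to the audit diff.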

Observability

  • [ ] Grafana alerts firing on synthetic breach — verify: temporarily inject a 500 on a non-critical route; alert reaches the on-call channel within configured threshold.
  • [ ] service-overview dashboard populated — verify: grafana_url shows non-zero traffic on every panel after a smoke bet.
  • [ ] bullmq dashboard populated — verify: bullmq_queue_jobs panels show non-zero gauges for at least one queue.
  • [ ] redis dashboard populated — verify: ioredis ops + latency panels split cache vs bot Redis.
  • [ ] prisma-postgres dashboard populated — verify: Prisma model/method heatmap + Postgres slow queries + pool saturation visible.
  • [ ] browser-rum dashboard populated — verify: Web Vitals p75/p95 from ebit-fe present (requires @vercel/otel browser export).
  • [ ] logs-trace-pivot derivedFields working — verify: query Loki, click "View trace" link, lands on the matching Jaeger trace.
  • [ ] Loki retention configured — verify: Loki config sets retention to match data retention policy (Risks #16).
  • [ ] Jaeger Badger TTL configured — verify: Jaeger storage TTL set to a finite value to bound disk growth.
  • [ ] Sentry DSNs live — verify: trigger a synthetic error in each app (ebit-api, ebit-fe, ebit-admin-fe); event lands in Sentry within 30 s.
  • [ ] Source-maps uploaded — verify: a synthetic JS error in ebit-fe shows minified-to-source mapping in Sentry.
  • [ ] Custom OTel attributes captured — verify: a recent trace contains user.id (where applicable) and bet.id for a bet-place flow.
  • [ ] Trace blind spots documented — verify: AF-2 trace propagation gap noted in customer wiki (Risks #4).

Performance

  • [ ] Perf test executed within last 7 days against prod-equivalent — verify: timestamped k6 output and Grafana snapshot in the perf report.
  • [ ] Perf report signed — verify: every {{TBD}} in performance-test-report.md replaced; signed by customer-sre.
  • [ ] Auto-abort thresholds verified — verify: synthetic 2x-budget breach was injected and k6 aborted as expected during a dry-run.
  • [ ] Dice-bet SLO decision live — verify: documented re-baseline (SLO=150 ms) OR optimization ticket merged (Risks #7).
  • [ ] Queue stability holds at peak load — verify: PromQL bullmq_queue_jobs{state="wait"} shows zero sustained positive slope during the 27-minute ramp.
  • [ ] Kernel tuning applied on load-gen — verify: sysctl snapshot matches the table in docs/performance-testing.md §9.
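
The queue-stability item asks for no sustained positive slope in the waiting-job gauge, which can be made concrete with a least-squares fit over the ramp window. The following is a minimal sketch, assuming the bullmq_queue_jobs{state="wait"} series has been exported as (unix timestamp, value) pairs. The function names and the 1 job/minute tolerance are hypothetical; the checklist does not state a numeric tolerance.

```python
# Sketch only: least-squares slope of a gauge over time, used as a growth check.
def least_squares_slope(samples):
    """Slope in units per second for samples given as [(unix_ts, value), ...]."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("all samples share one timestamp")
    return num / den


def queue_is_stable(samples, max_slope_per_min=1.0):
    """True when the fitted growth stays at or below the (assumed) tolerance."""
    return least_squares_slope(samples) * 60 <= max_slope_per_min


# Hypothetical series: one that oscillates, one that grows by 30 jobs per minute.
oscillating = [(0, 5), (60, 7), (120, 4), (180, 6)]
growing = [(0, 0), (60, 30), (120, 60), (180, 90)]
print(queue_is_stable(oscillating), queue_is_stable(growing))  # True False
```

A single fit over the whole ramp is a simplification: a queue that backs up and drains within the window can still fit a flat line, so the Grafana panel remains the primary evidence.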

Security

  • [ ] Security review completed — verify: signed review document attached; one row per Critical/High in docs/security-register.md.
  • [ ] Zero open Critical or High findings without signed mitigation — verify: register state is fixed or accepted-with-mitigation.
  • [ ] Dependency audit clean — verify: npm audit --production in ebit-api, and pnpm audit --production in ebit-fe and in ebit-admin-fe, each show zero high/critical vulnerabilities, or every remaining finding is documented.
  • [ ] No exposed admin endpoints — verify: curl -i https://{{TBD: api-host}}/admin/user/all without auth returns 401, not 200.
  • [ ] Captcha enforced on sign-up — verify: with the production captcha key, a sign-up request without x-captcha-token is rejected, and so is one that sends the bypass value 'pass' (NODE_ENV=production blocks the bypass per Risks #8).
  • [ ] Rate limiting active — verify: 30 rapid sign-in attempts to one email triggers the user-lockout cooldown (user_lockout:<email> key in cache Redis).
  • [ ] CORS configured — verify: cross-origin from a non-customer-owned domain is rejected on /auth/*.
  • [ ] CSP headers set — verify: curl -I https://{{TBD: dropbet-host}} shows content-security-policy with no unsafe-inline on scripts.
  • [ ] HSTS active — verify: response carries strict-transport-security with max-age ≥ 15552000 seconds (about 6 months).
  • [ ] Sensitive log fields redacted — verify: pino formatter strips password, accessToken, refreshToken; spot-check a sign-in failure log line.
  • [ ] 2FA enforced for admin SuperAdmin — verify: SF-029 mitigation in place — seeded admin@admin.com no longer bypasses MFA, or this account has been deleted from prd.
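
The CSP and HSTS items above both come down to parsing a response header value, so the pass/fail decision can be scripted once the headers are captured with curl -I. The following is a minimal sketch; the function names are hypothetical, six months is taken as 180 days (an assumption), and the CSP check looks only at script-src with a fallback to default-src.

```python
# Sketch only: evaluates captured header values against the Security checklist.
import re

SIX_MONTHS_SECONDS = 180 * 24 * 60 * 60  # 15552000; "6 months" read as 180 days


def hsts_max_age(header_value):
    """Return max-age in seconds from a Strict-Transport-Security value, or None."""
    match = re.search(r'max-age\s*=\s*"?(\d+)', header_value, flags=re.IGNORECASE)
    return int(match.group(1)) if match else None


def csp_scripts_allow_unsafe_inline(csp_value):
    """True when the effective script source list contains 'unsafe-inline'."""
    directives = {}
    for part in csp_value.split(";"):
        tokens = part.split()
        if tokens:
            # The first occurrence of a directive wins; later duplicates are ignored.
            directives.setdefault(tokens[0].lower(), [t.lower() for t in tokens[1:]])
    sources = directives.get("script-src", directives.get("default-src", []))
    return "'unsafe-inline'" in sources


hsts = "max-age=31536000; includeSubDomains"
csp = "default-src 'self'; script-src 'self' 'unsafe-inline'; style-src 'self'"
print(hsts_max_age(hsts) >= SIX_MONTHS_SECONDS)  # True
print(csp_scripts_allow_unsafe_inline(csp))      # True, so this policy would fail the item
```

The sketch does not inspect script-src-elem or script-src-attr, so a policy that uses those directives still needs a manual read.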

Operational

  • [ ] All runbooks reviewed by on-call team — verify: each file in docs/runbooks/ has been read and dry-run by the on-call team this calendar quarter.
  • [ ] On-call rotation set — verify: PagerDuty (or equivalent) rotation is documented; coverage matches the chosen SLO (24/7 vs business-hours).
  • [ ] Escalation matrix tested — verify: a test page reached L1 → L2 → L3 within the documented threshold; after-action attached.
  • [ ] Incident drill executed solo by customer team — verify: drill report exists; Evospin attended as observer only.
  • [ ] RCA template ready — verify: customer wiki has an RCA template; first incident slot reserved.
  • [ ] Backup restore drill executed — verify: a Postgres backup was restored to a scratch instance and a sanity query ran successfully within the last 30 days.
  • [ ] Disaster recovery plan documented — verify: customer ops doc covers Postgres + Redis + ECR loss scenarios; recovery time objectives stated.
  • [ ] Cost monitoring active — verify: AWS budget alerts on EC2, ECR, RDS, data egress; thresholds match expected spend.
  • [ ] Container resource limits set — verify: every service in docker-compose.yml (or production orchestrator config) has cpus and mem_limit set.
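
The container-limits item above can be checked mechanically once the compose file is loaded into a dictionary. The following is a minimal sketch, assuming docker-compose.yml has already been parsed by whatever YAML loader the team uses. The function name services_missing_limits and the sample services are hypothetical, and only the two keys named in the checklist (cpus and mem_limit) are checked, not the deploy.resources.limits form.

```python
# Sketch only: lists compose services that lack the limits named in the checklist.
REQUIRED_KEYS = ("cpus", "mem_limit")


def services_missing_limits(compose):
    """Map each service name to the required limit keys it does not set."""
    missing = {}
    for name, service in (compose.get("services") or {}).items():
        absent = [key for key in REQUIRED_KEYS if service.get(key) in (None, "")]
        if absent:
            missing[name] = absent
    return missing


# Hypothetical parsed compose content; service names are placeholders.
compose = {
    "services": {
        "api": {"image": "example/api", "cpus": 2.0, "mem_limit": "2g"},
        "worker": {"image": "example/worker", "cpus": 1.0},
        "collector": {"image": "example/collector"},
    }
}
print(services_missing_limits(compose))
# {'worker': ['mem_limit'], 'collector': ['cpus', 'mem_limit']}
```

An empty dictionary means every service sets both keys; it says nothing about whether the chosen values are adequate for the measured load.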

Compliance

  • [ ] Terms of Service live + linked from footer — verify: curl -s https://{{TBD: dropbet-host}}/terms returns 200 with the customer's legal text.
  • [ ] Privacy Policy live + linked from footer — verify: same pattern.
  • [ ] KYC flow tested with production credentials — verify: a test user completes the production Sumsub flow and the KYC level transitions correctly.
  • [ ] AML policy attached + signed — verify: customer compliance has signed AML policy referenced in the production wiki.
  • [ ] Self-exclusion / responsible-gaming surface live — verify: a logged-in user can self-exclude via the user settings flow; subsequent sign-in attempts are blocked.
  • [ ] Data retention policy implemented — verify: Loki retention matches policy; PII tables have documented retention; Sentry retention set.
  • [ ] GDPR / CCPA data subject access workflow — verify: SOP exists for handling subject access requests; admin endpoint to export user data tested.
  • [ ] Gaming license active for target jurisdiction — verify: license document on file; geo-IP ban list in country/ module reflects unauthorised jurisdictions (Risks #18).
  • [ ] Legal review of in-product copy — verify: customer legal has signed off on bonus terms, withdrawal terms, dispute resolution copy.

Customer-side

  • [ ] DNS pointed to production — verify: as Infrastructure §DNS above; resolves to production load balancer or SUT.
  • [ ] Brand assets in place — verify: logo, favicon, palette match the brand pack; OG image set for social sharing.
  • [ ] Support email configured — verify: support@{{TBD: customer-domain}} reaches the customer support inbox; auto-reply on.
  • [ ] In-product help points to customer's help center — verify: footer "Help" link goes to customer-owned URL, not Evospin defaults.
  • [ ] Onboarding email sequence configured — verify: post-sign-up email cadence (welcome, KYC reminder, first-deposit) tested end-to-end.
  • [ ] Marketing analytics wired — verify: customer's GTM / Segment / equivalent receives events from dropbet (registration, deposit, first bet).
  • [ ] Status page live — verify: customer-owned status page exists at https://status.{{TBD: customer-domain}}; default state is operational.
  • [ ] Pricing / fees schedule public — verify: deposit / withdraw fees and limits page lives at the public URL agreed in Discovery.
  • [ ] Public communication plan staged — verify: customer marketing has draft press release + customer email + in-app banner ready for GA day.

Final go / no-go

The customer PM, customer SRE, customer compliance, and Evospin delivery lead each sign here:

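| Role                | Name    | Signed | Date |
|---------------------|---------|--------|------|
| Customer PM         | {{TBD}} |        |      |
| Customer SRE        | {{TBD}} |        |      |
| Customer compliance | {{TBD}} |        |      |
| Evospin delivery    | {{TBD}} |        |      |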

If any item above is unchecked at sign-time, it goes in the post-launch backlog with an explicit owner and date. Items in the Critical category (zero open Critical security findings, license active, captcha enforced, error rate < 0.1% in pilot) are non-negotiable: failing any one of them rolls Phase 8 back to Phase 7.
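
The sign-off rule above can be sketched as a small routine over the checklist text. This is a minimal illustration, not the agreed process: the function name go_no_go is hypothetical, and matching Critical items by the keywords "critical", "license", and "captcha" is an assumption made here for brevity. The pilot error-rate criterion has no checkbox in this checklist, so it is not covered and must be confirmed separately.

```python
# Sketch only: splits unchecked items into blocking (Critical) and backlog.
import re

ITEM = re.compile(r"\[([ xX])\]\s*(.*)")
CRITICAL_KEYWORDS = ("critical", "license", "captcha")  # assumed mapping, see lead-in


def go_no_go(checklist_text):
    """Return the decision plus the unchecked items behind it."""
    blocking, backlog = [], []
    for line in checklist_text.splitlines():
        match = ITEM.search(line)
        if not match or match.group(1) != " ":
            continue  # not a checklist item, or already ticked
        # Keep only the item title: drop the verify clause and trailing punctuation.
        title = re.sub(r"\W+$", "", match.group(2).split("verify:")[0])
        is_critical = any(word in title.lower() for word in CRITICAL_KEYWORDS)
        (blocking if is_critical else backlog).append(title)
    return {
        "decision": "rollback" if blocking else "go",
        "blocking": blocking,
        "backlog": backlog,
    }


# Hypothetical excerpt of a partly ticked checklist.
excerpt = """
  • [x] Captcha enforced on sign-up
  • [ ] Status page live
  • [ ] Gaming license active for target jurisdiction
"""
print(go_no_go(excerpt)["decision"])  # rollback
```

Each backlog entry still needs the explicit owner and date that the rule requires; the sketch only identifies which items fall into which bucket.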