Launch Checklist
Final pre-GA sweep. Run this top-to-bottom in the week before Phase 8 — GA. Tick every box, capture the verification output, file a customer-side ticket for any failure.
Conventions: each item carries a one-line verify with a concrete command, query, or file path.
`{{TBD}}` indicates a value the customer must fill in before running.
Infrastructure
- [ ] Terraform applied cleanly to production — verify: `terraform -chdir=terraform/perf plan` after apply prints `No changes.`
- [ ] State stored in S3 backend with locking — verify: `terraform/perf/versions.tf` has `backend "s3"` with `dynamodb_table` set; per `terraform/perf/README.md` §state.
- [ ] EBS volumes encrypted — verify: `aws ec2 describe-volumes --filters Name=tag:Name,Values=ebit-* --query 'Volumes[].{Id:VolumeId,Encrypted:Encrypted}' --output table` shows all `Encrypted=true`.
- [ ] IAM least-privilege — verify: SUT instance profile carries only `AmazonECRReadOnly`-equivalent + SSM read; no `*:*`.
- [ ] No `0.0.0.0/0` ingress on app SGs — verify: `` aws ec2 describe-security-groups --filters Name=tag:Name,Values=ebit-* --query 'SecurityGroups[].IpPermissions[?contains(IpRanges[].CidrIp, `0.0.0.0/0`)]' `` returns empty.
- [ ] Admin SSH CIDR locked to a customer-owned range — verify: `terraform.tfvars` sets `admin_ssh_cidrs` explicitly; not relying on `data.http.my_ip`.
- [ ] Internal datastore ports not exposed externally — verify: Postgres 5432, Redis 6379/6380, RabbitMQ 5672 not in any public SG inbound rule.
- [ ] Egress allows ECR + apt + OTel exporter only — verify: egress rules whitelist customer-required outbound; document any wildcard egress decision.
- [ ] DNS records propagated globally — verify: `dig +short {{TBD: dropbet.example.com}}` resolves from at least 3 geographic checkers.
- [ ] TLS certificate valid + auto-renewing — verify: ACM cert in `ISSUED` state; renewal automation tested or noted.
- [ ] ALB / ingress in front of admin-fe (if applicable) — verify: not relying on `network_mode: host` for production admin-fe (see Risks #5).
- [ ] Postgres pool sized for expected load — verify: `max_connections` ≥ 300 (or PgBouncer in transaction mode); per-app `?connection_limit=N` set per Risks #15.
- [ ] Postgres backups configured — verify: AWS RDS automated backup retention ≥ 7d, or equivalent for self-hosted Postgres.
- [ ] Redis persistence policy decided — verify: cache Redis `appendonly`/`save` policy matches RPO requirement; bot Redis can be ephemeral.
- [ ] OTel collector resource limits set — verify: `observability/otel-collector.yml` `memory_limiter` configured; container has CPU/memory limits in compose.
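
The two security-group items above can also be checked offline against saved CLI output. The following is a minimal sketch, assuming the JSON printed by `aws ec2 describe-security-groups --output json` has been loaded into a dict. The function name `open_ingress_rules`, the `SAMPLE` data, and the group IDs are hypothetical; the port list is taken from the datastore item above.

```python
# Datastore ports named in the checklist: Postgres, Redis (two ports), RabbitMQ.
DATASTORE_PORTS = {5432, 6379, 6380, 5672}


def open_ingress_rules(describe_output: dict) -> list[dict]:
    """Return one record per ingress rule that is open to 0.0.0.0/0."""
    findings = []
    for group in describe_output.get("SecurityGroups", []):
        for perm in group.get("IpPermissions", []):
            cidrs = [r.get("CidrIp") for r in perm.get("IpRanges", [])]
            if "0.0.0.0/0" not in cidrs:
                continue
            # Rules with IpProtocol "-1" (all traffic) omit the port fields,
            # so treat a missing range as "every port".
            low = perm.get("FromPort", 0)
            high = perm.get("ToPort", 65535)
            findings.append({
                "group_id": group.get("GroupId"),
                "from_port": low,
                "to_port": high,
                "datastore_ports": sorted(
                    p for p in DATASTORE_PORTS if low <= p <= high
                ),
            })
    return findings


# Hypothetical sample in the shape of describe-security-groups output.
SAMPLE = {
    "SecurityGroups": [
        {
            "GroupId": "sg-0example1",
            "IpPermissions": [{
                "IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
                "IpRanges": [{"CidrIp": "10.0.0.0/16"}],
            }],
        },
        {
            "GroupId": "sg-0example2",
            "IpPermissions": [{
                "IpProtocol": "tcp", "FromPort": 5432, "ToPort": 5432,
                "IpRanges": [{"CidrIp": "0.0.0.0/0"}],
            }],
        },
    ]
}

for finding in open_ingress_rules(SAMPLE):
    print(finding)
```

An empty result corresponds to the "returns empty" outcome of the CLI query. This sketch inspects IPv4 ranges only; IPv6 ranges such as `::/0` would need a separate pass over `Ipv6Ranges`.
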
Secrets
- [ ] Doppler `prd` config audited end-to-end — verify: run `docs/audits/doppler-perf-audit.md` playbook against `prd`; commit the output diff.
- [ ] No `dev_*` configs referenced from production paths — verify: `aws ec2 describe-instances ...` user-data and `docker-compose.yml` reference `prd` only.
- [ ] `NODE_ENV=production` in `prd` — verify: `doppler secrets get NODE_ENV --project ebit-api --config prd --plain` returns `production`.
- [ ] `DEFAULT_LOG_LEVEL=info` in `prd` — verify: same pattern as above.
- [ ] `DEBUG_LOGS_PRETTY=false` in `prd` — verify: same pattern.
- [ ] `DISABLE_RATE_LIMMITING=false` in `prd` — verify: same pattern.
- [ ] `FASTTRACK_JWT_*` populated (stub or real) — verify: env validator does not block boot (Risks #1).
- [ ] No ebit-personal Doppler tokens in `prd` — verify: `doppler service-tokens list` shows only customer-owned tokens for `prd` configs.
- [ ] All vendor production keys in place — verify per vendor:
    - Sumsub: `SUMSUB_APP_TOKEN`, `SUMSUB_SECRET_KEY`
    - Payment provider: `CCPAYMENT_*` and/or `NOWPAYMENTS_*`
    - Captcha: `RECAPTCHA_*` or `GEETEST_*` (production keys, not sandbox)
    - Email: `SENDGRID_API_KEY` or SMTP credentials
    - Sentry: `SENTRY_DSN` for each app
- [ ] JWT signing keys rotated for production — verify: `JWT_SECRET`, `JWT_VERIFICATION_TOKEN_SECRET` are not the `.example.env` defaults.
- [ ] Admin default password changed — verify: `ADMIN_DEFAULT_PASSWORD` ≠ `admin`; first admin login forced password change documented.
- [ ] Webhook secret per provider — verify: payment / KYC webhook signature secret matches provider config and Doppler.
- [ ] Secret rotation runbook drafted — verify: a runbook exists under `docs/runbooks/` covering Doppler key rotation and post-rotation app reload.
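
The per-flag items above repeat the same comparison, so they can be batched. The following is a minimal sketch, assuming the `prd` config has already been exported to a flat key/value mapping (how that export is produced is left open). The function name `audit_prd_secrets` and the sample values are hypothetical; the keys and expected values are copied verbatim from the checklist, including the spelling of `DISABLE_RATE_LIMMITING`.

```python
# Expected values, copied from the checklist items above.
REQUIRED_VALUES = {
    "NODE_ENV": "production",
    "DEFAULT_LOG_LEVEL": "info",
    "DEBUG_LOGS_PRETTY": "false",
    "DISABLE_RATE_LIMMITING": "false",
}

# Keys that must be present and must not equal the listed insecure default.
FORBIDDEN_VALUES = {"ADMIN_DEFAULT_PASSWORD": "admin"}


def audit_prd_secrets(secrets: dict[str, str]) -> list[str]:
    """Return a list of human-readable problems; an empty list means a pass."""
    problems = []
    for key, expected in REQUIRED_VALUES.items():
        actual = secrets.get(key)
        if actual != expected:
            problems.append(f"{key}: expected {expected!r}, got {actual!r}")
    for key, bad in FORBIDDEN_VALUES.items():
        actual = secrets.get(key)
        if actual is None:
            problems.append(f"{key}: missing")
        elif actual == bad:
            # The value itself is deliberately not echoed.
            problems.append(f"{key}: still set to the default")
    return problems


# Hypothetical export of a passing config.
GOOD = {
    "NODE_ENV": "production",
    "DEFAULT_LOG_LEVEL": "info",
    "DEBUG_LOGS_PRETTY": "false",
    "DISABLE_RATE_LIMMITING": "false",
    "ADMIN_DEFAULT_PASSWORD": "a-long-random-value",
}

print(audit_prd_secrets(GOOD))
```

The sketch covers only the items with a fixed expected value. Vendor keys, JWT secrets, and webhook secrets still need the per-item checks listed above.
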
Observability
- [ ] Grafana alerts firing on synthetic breach — verify: temporarily inject a 500 on a non-critical route; alert reaches the on-call channel within configured threshold.
- [ ] `service-overview` dashboard populated — verify: `grafana_url` shows non-zero traffic on every panel after a smoke bet.
- [ ] `bullmq` dashboard populated — verify: `bullmq_queue_jobs` panels show non-zero gauges for at least one queue.
- [ ] `redis` dashboard populated — verify: ioredis ops + latency panels split cache vs bot Redis.
- [ ] `prisma-postgres` dashboard populated — verify: Prisma model/method heatmap + Postgres slow queries + pool saturation visible.
- [ ] `browser-rum` dashboard populated — verify: Web Vitals p75/p95 from ebit-fe present (requires `@vercel/otel` browser export).
- [ ] `logs-trace-pivot` derivedFields working — verify: query Loki, click "View trace" link, lands on the matching Jaeger trace.
- [ ] Loki retention configured — verify: Loki config sets retention to match data retention policy (Risks #16).
- [ ] Jaeger Badger TTL configured — verify: Jaeger storage TTL set to a finite value to bound disk growth.
- [ ] Sentry DSNs live — verify: trigger a synthetic error in each app (`ebit-api`, `ebit-fe`, `ebit-admin-fe`); event lands in Sentry within 30 s.
- [ ] Source-maps uploaded — verify: a synthetic JS error in `ebit-fe` shows minified-to-source mapping in Sentry.
- [ ] Custom OTel attributes captured — verify: a recent trace contains `user.id` (where applicable) and `bet.id` for a bet-place flow.
- [ ] Trace blind spots documented — verify: AF-2 trace propagation gap noted in customer wiki (Risks #4).
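
The custom-attribute item above can be checked mechanically once a trace has been exported. The following is a minimal sketch, assuming the trace is a dict with a `spans` list whose entries carry `tags` as `{"key": ..., "value": ...}` records, which is the shape of Jaeger's JSON export. The function name `missing_attributes` and the sample trace are hypothetical.

```python
def missing_attributes(trace: dict, required: set[str]) -> set[str]:
    """Return the required attribute keys that appear on no span in the trace."""
    seen = set()
    for span in trace.get("spans", []):
        for tag in span.get("tags", []):
            seen.add(tag.get("key"))
    return required - seen


# Hypothetical bet-place trace in which bet.id was never attached.
SAMPLE_TRACE = {
    "traceID": "abc123",
    "spans": [
        {
            "operationName": "POST /bet",
            "tags": [{"key": "user.id", "type": "string", "value": "u-1"}],
        },
        {
            "operationName": "prisma:query",
            "tags": [{"key": "db.system", "type": "string", "value": "postgresql"}],
        },
    ],
}

print(missing_attributes(SAMPLE_TRACE, {"user.id", "bet.id"}))
```

Only span-level tags are inspected here. Attributes recorded at the resource or process level would need an additional lookup.
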
Performance
- [ ] Perf test executed within last 7 days against prod-equivalent — verify: timestamped k6 output and Grafana snapshot in the perf report.
- [ ] Perf report signed — verify: every `{{TBD}}` in `performance-test-report.md` replaced; signed by `customer-sre`.
- [ ] Auto-abort thresholds verified — verify: synthetic 2x-budget breach was injected and k6 aborted as expected during a dry-run.
- [ ] Dice-bet SLO decision live — verify: documented re-baseline (SLO=150 ms) OR optimization ticket merged (Risks #7).
- [ ] Queue stability holds at peak load — verify: PromQL `bullmq_queue_jobs{state="wait"}` shows zero sustained positive slope during the 27-minute ramp.
- [ ] Kernel tuning applied on load-gen — verify: `sysctl` snapshot matches the table in `docs/performance-testing.md` §9.
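
"Zero sustained positive slope" in the queue-stability item is easier to sign off with a number attached. The following is a minimal sketch, assuming the `bullmq_queue_jobs{state="wait"}` series for the ramp has been exported as `(unix_seconds, value)` pairs. It fits an ordinary least-squares line and compares the slope to a tolerance. The function names, the sample series, and the default tolerance of 1 job per minute are all hypothetical; the customer should choose the tolerance.

```python
def least_squares_slope(samples: list[tuple[float, float]]) -> float:
    """Slope of an ordinary least-squares fit, in jobs per second."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    numerator = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    denominator = sum((t - mean_t) ** 2 for t, _ in samples)
    if denominator == 0:
        raise ValueError("timestamps must not all be equal")
    return numerator / denominator


def queue_is_stable(
    samples: list[tuple[float, float]],
    tolerance_jobs_per_min: float = 1.0,  # assumed threshold, not from the source
) -> bool:
    """True when the fitted backlog growth stays within the tolerance."""
    return least_squares_slope(samples) * 60 <= tolerance_jobs_per_min


# Hypothetical series sampled once a minute.
STEADY = [(0, 5), (60, 7), (120, 4), (180, 6), (240, 5)]
GROWING = [(0, 5), (60, 65), (120, 125), (180, 185), (240, 245)]

print(queue_is_stable(STEADY), queue_is_stable(GROWING))
```

A single fit over the whole ramp can hide a backlog that grows late in the run, so fitting over a sliding window may be a better match for "sustained".
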
Security
- [ ] Security review completed — verify: signed review document attached; one row per Critical/High in `docs/security-register.md`.
- [ ] Zero open Critical or High findings without signed mitigation — verify: register state is `fixed` or `accepted-with-mitigation`.
- [ ] Dependency audit clean — verify: `cd ebit-api && npm audit --production` and `cd ebit-fe && pnpm audit --production` and `cd ebit-admin-fe && pnpm audit --production` show zero high/critical vulnerabilities, or each is documented.
- [ ] No exposed admin endpoints — verify: `curl -i https://{{TBD: api-host}}/admin/user/all` without auth returns `401`, not `200`.
- [ ] Captcha enforced on sign-up — verify: with production captcha key, a sign-up request without `x-captcha-token` is rejected; with `'pass'` it is also rejected (`NODE_ENV=production` blocks the bypass per Risks #8).
- [ ] Rate limiting active — verify: 30 rapid sign-in attempts to one email trigger the user-lockout cooldown (`user_lockout:<email>` key in cache Redis).
- [ ] CORS configured — verify: a cross-origin request from a non-customer-owned domain is rejected on `/auth/*`.
- [ ] CSP headers set — verify: `curl -I https://{{TBD: dropbet-host}}` shows `content-security-policy` with no `unsafe-inline` on scripts.
- [ ] HSTS active — verify: response carries `strict-transport-security` with `max-age` ≥ 6 months.
- [ ] Sensitive log fields redacted — verify: pino formatter strips `password`, `accessToken`, `refreshToken`; spot-check a sign-in failure log line.
- [ ] 2FA enforced for admin SuperAdmin — verify: SF-029 mitigation in place — seeded `admin@admin.com` no longer bypasses MFA, or this account has been deleted from `prd`.
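
The CSP and HSTS items above come down to parsing two header values. The following is a minimal sketch, assuming the header values have been copied from the `curl -I` output. The function names are hypothetical, and "6 months" is read here as 180 days (15,552,000 seconds), which is an assumption about the intended threshold.

```python
import re

# Assumption: "6 months" is interpreted as 180 days.
SIX_MONTHS_SECONDS = 180 * 24 * 60 * 60


def hsts_max_age(header_value: str):
    """Return the max-age in seconds, or None if the directive is absent."""
    match = re.search(r'max-age\s*=\s*"?(\d+)"?', header_value, re.IGNORECASE)
    return int(match.group(1)) if match else None


def hsts_ok(header_value: str) -> bool:
    """True when max-age is present and at least six months."""
    age = hsts_max_age(header_value)
    return age is not None and age >= SIX_MONTHS_SECONDS


def csp_scripts_allow_inline(csp: str) -> bool:
    """True if the directive governing scripts lists 'unsafe-inline'."""
    directives = {}
    for part in csp.split(";"):
        tokens = part.split()
        if tokens:
            # Keep the first occurrence of each directive name.
            directives.setdefault(tokens[0].lower(), [t.lower() for t in tokens[1:]])
    # script-src falls back to default-src when it is not declared.
    sources = directives.get("script-src", directives.get("default-src", []))
    return "'unsafe-inline'" in sources


print(hsts_ok("max-age=31536000; includeSubDomains"))
print(csp_scripts_allow_inline("default-src 'self'; script-src 'self' 'unsafe-inline'"))
```

The CSP check looks at `script-src` and its `default-src` fallback only. Finer-grained directives such as `script-src-elem` are not considered in this sketch.
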
Operational
- [ ] All runbooks reviewed by on-call team — verify: each file in `docs/runbooks/` has been read and dry-run by the on-call team this calendar quarter.
- [ ] On-call rotation set — verify: PagerDuty (or equivalent) rotation document; coverage matches the chosen SLO (24/7 vs business-hours).
- [ ] Escalation matrix tested — verify: a test page reached L1 → L2 → L3 within the documented threshold; after-action attached.
- [ ] Incident drill executed solo by customer team — verify: drill report exists; Evospin observer-only.
- [ ] RCA template ready — verify: customer wiki has an RCA template; first incident slot reserved.
- [ ] Backup restore drill executed — verify: a Postgres backup was restored to a scratch instance and a sanity query ran successfully within the last 30 days.
- [ ] Disaster recovery plan documented — verify: customer ops doc covers Postgres + Redis + ECR loss scenarios; recovery time objectives stated.
- [ ] Cost monitoring active — verify: AWS budget alerts on EC2, ECR, RDS, data egress; thresholds match expected spend.
- [ ] Container resource limits set — verify: every service in `docker-compose.yml` (or production orchestrator config) has `cpus` and `mem_limit` set.
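
The container-limits item above can be checked without reading every service by eye. The following is a minimal sketch, assuming `docker-compose.yml` has already been parsed into a dict by a YAML loader of the reader's choice. The function name `services_missing_limits` and the sample service names are hypothetical; the keys `cpus` and `mem_limit` are the ones named in the checklist.

```python
REQUIRED_KEYS = ("cpus", "mem_limit")


def services_missing_limits(compose: dict) -> dict[str, list[str]]:
    """Map each service name to the limit keys it does not set."""
    missing = {}
    for name, service in compose.get("services", {}).items():
        absent = [key for key in REQUIRED_KEYS if key not in service]
        if absent:
            missing[name] = absent
    return missing


# Hypothetical parsed compose file.
SAMPLE_COMPOSE = {
    "services": {
        "api": {"image": "example/api", "cpus": "1.0", "mem_limit": "1g"},
        "worker": {"image": "example/worker", "cpus": "0.5"},
        "collector": {"image": "example/collector"},
    }
}

print(services_missing_limits(SAMPLE_COMPOSE))
```

Services that declare limits under `deploy.resources.limits` instead would be reported as missing here, so that form would need its own branch.
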
Compliance
- [ ] Terms of Service live + linked from footer — verify: `curl -s https://{{TBD: dropbet-host}}/terms` returns 200 with the customer's legal text.
- [ ] Privacy Policy live + linked from footer — verify: same pattern.
- [ ] KYC flow tested in production credentials — verify: a test user submits production Sumsub flow; level transitions correctly.
- [ ] AML policy attached + signed — verify: customer compliance has signed AML policy referenced in the production wiki.
- [ ] Self-exclusion / responsible-gaming surface live — verify: a logged-in user can self-exclude via the user settings flow; subsequent sign-in attempts are blocked.
- [ ] Data retention policy implemented — verify: Loki retention matches policy; PII tables have documented retention; Sentry retention set.
- [ ] GDPR / CCPA data subject access workflow — verify: SOP exists for handling subject access requests; admin endpoint to export user data tested.
- [ ] Gaming license active for target jurisdiction — verify: license document on file; geo-IP ban list in `country/` module reflects unauthorised jurisdictions (Risks #18).
- [ ] Legal review of in-product copy — verify: customer legal has signed off on bonus terms, withdrawal terms, dispute resolution copy.
Customer-side
- [ ] DNS pointed to production — verify: as Infrastructure §DNS above; resolves to production load balancer or SUT.
- [ ] Brand assets in place — verify: logo, favicon, palette match the brand pack; OG image set for social sharing.
- [ ] Support email configured — verify: `support@{{TBD: customer-domain}}` reaches the customer support inbox; auto-reply on.
- [ ] In-product help points to customer's help center — verify: footer "Help" link goes to customer-owned URL, not Evospin defaults.
- [ ] Onboarding email sequence configured — verify: post-sign-up email cadence (welcome, KYC reminder, first-deposit) tested end-to-end.
- [ ] Marketing analytics wired — verify: customer's GTM / Segment / equivalent receives events from dropbet (registration, deposit, first bet).
- [ ] Status page live — verify: customer-owned status page exists at `https://status.{{TBD: customer-domain}}`; default state is operational.
- [ ] Pricing / fees schedule public — verify: deposit / withdraw fees and limits page lives at the public URL agreed in Discovery.
- [ ] Public communication plan staged — verify: customer marketing has draft press release + customer email + in-app banner ready for GA day.
Final go / no-go
The customer PM, customer SRE, customer compliance, and Evospin delivery lead each sign here:
| Role | Name | Signed | Date |
|---|---|---|---|
| Customer PM | {{TBD}} | | |
| Customer SRE | {{TBD}} | | |
| Customer compliance | {{TBD}} | | |
| Evospin delivery | {{TBD}} | | |
If any item above is unchecked at sign-time, it goes in the post-launch backlog with an explicit owner and date. Items in the Critical category (zero open Critical security findings, license active, captcha enforced, error rate < 0.1% in pilot) are non-negotiable — failing one of those rolls Phase 8 back to Phase 7.
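
Building the post-launch backlog starts with knowing which boxes are still open at sign-time. The following is a minimal sketch, assuming this checklist is kept as Markdown with `- [ ]` and `- [x]` task items. The function name `unchecked_items` and the sample text are hypothetical.

```python
import re

# Matches a Markdown task item and captures its state and its text.
CHECKBOX = re.compile(r"^\s*- \[( |x|X)\] (.*)$")


def unchecked_items(markdown: str) -> list[str]:
    """Return the text of every task item that is still unticked."""
    items = []
    for line in markdown.splitlines():
        match = CHECKBOX.match(line)
        if match and match.group(1) == " ":
            items.append(match.group(2).strip())
    return items


# Hypothetical excerpt of a partly completed checklist.
SAMPLE = """Security
- [x] CSP headers set
- [ ] HSTS active
    - nested note
- [X] CORS configured
- [ ] Rate limiting active
"""

for item in unchecked_items(SAMPLE):
    print(item)
```

The sketch lists open items only. Deciding which of them fall in the Critical category, and therefore block GA, remains a manual call by the signatories.
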