Performance Test Report
Executive Summary
Evospin was load-tested under a stepped ramp from 50 to 10,000 concurrent virtual
users. {{TBD: 1-2 sentence verdict, e.g. "The platform sustained X concurrent
users within the per-endpoint p95 SLOs. First breach occurred at Y users, caused by
Z."}} The POST /casino/games/house/dice/bet endpoint is a pre-existing exception:
its baseline p95 of 108 ms at a single virtual user already exceeds its 100 ms target,
which is a code-level concern unrelated to load.
Verdict: {{TBD: GREEN / YELLOW / RED traffic-light}}
| Metric | Value |
|---|---|
| Peak concurrency sustained within SLO | {{TBD}} VUs |
| First SLO breach | {{TBD}} VUs on {{TBD: endpoint}} |
| Peak error rate observed | {{TBD}}% |
| Primary bottleneck | {{TBD}} |
Test Setup
| Parameter | Value |
|---|---|
| Hardware | {{TBD: instance type, vCPU, RAM — e.g. "1x t3.xlarge, 4 vCPU, 16 GB"}} |
| Commit | {{TBD: git SHA}} |
| Date | {{TBD: YYYY-MM-DD}} |
| Duration | 27 minutes (6 stages) |
| Load generator | k6 v{{TBD}}, co-located on same VM |
| Services exercised | api (:4000), rt (:4001) |
| Services not exercised | bj, bo, speed-roulette (no load scenarios targeted these) |
| External stubs active | reCAPTCHA bypass (x-captcha-token: pass), Fast Track disabled, EVO wallet stubbed |
Methodology
This test followed the methodology documented in docs/performance-testing.md. That document covers tooling rationale, SLO definitions, the stepped-ramp protocol, auto-abort thresholds, and kernel tuning. This report presents results only — it does not duplicate the methodology.
Test Environment Caveat
The load generator ran on the same VM as the services under test. This means:
- Observed latencies are pessimistic (k6 and services compete for CPU).
- Throughput ceilings are optimistic (k6 may become the bottleneck first).
- These numbers are directional. An authoritative rerun on dedicated
infrastructure (see terraform/perf/) is recommended before capacity decisions.
Per-Endpoint SLO Scorecard
| Endpoint | SLO (p95) | 1k VU | 2.5k VU | 5k VU | 7.5k VU | 10k VU | Met? |
|---|---|---|---|---|---|---|---|
| POST /auth/sign-in | 150 ms | {{TBD}} ms | {{TBD}} ms | {{TBD}} ms | {{TBD}} ms | {{TBD}} ms | {{TBD}} |
| POST /casino/games/house/dice/bet | 100 ms | {{TBD}} ms | {{TBD}} ms | {{TBD}} ms | {{TBD}} ms | {{TBD}} ms | {{TBD}} |
| GET /bets | 50 ms | {{TBD}} ms | {{TBD}} ms | {{TBD}} ms | {{TBD}} ms | {{TBD}} ms | {{TBD}} |
| GET /accounting/balances | 50 ms | {{TBD}} ms | {{TBD}} ms | {{TBD}} ms | {{TBD}} ms | {{TBD}} ms | {{TBD}} |
| rt WebSocket handshake | 200 ms | {{TBD}} ms | {{TBD}} ms | {{TBD}} ms | {{TBD}} ms | {{TBD}} ms | {{TBD}} |
How to fill this table
For each stage, capture:
```promql
# p95 latency per endpoint (adjust time range to the stage window)
histogram_quantile(0.95,
  sum by (le) (rate(duration_milliseconds_bucket{
    span_kind="SPAN_KIND_SERVER",
    span_name="POST /auth/sign-in",
    service_name="api"
  }[1m]))
)

# Error rate per endpoint
sum(rate(calls_total{
  span_kind="SPAN_KIND_SERVER",
  status_code="STATUS_CODE_ERROR",
  span_name="POST /auth/sign-in"
}[1m]))
/
sum(rate(calls_total{
  span_kind="SPAN_KIND_SERVER",
  span_name="POST /auth/sign-in"
}[1m]))
```
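To avoid hand-editing the endpoint name for every cell, the two queries can be templated and evaluated at the end of each stage window through the Prometheus instant-query endpoint (`/api/v1/query` with a `time` parameter). The following is a minimal Python sketch, not the project's tooling: the base URL and all function names are hypothetical, and only the metric and label names are taken from the queries in this section.

```python
import json
import math
import urllib.parse
import urllib.request
from typing import Optional

# Assumption: Prometheus address; adjust to the actual deployment.
PROMETHEUS_URL = "http://localhost:9090"


def p95_query(span_name: str, service_name: str = "api", window: str = "1m") -> str:
    """Build the p95 latency query for one endpoint."""
    selector = (
        'span_kind="SPAN_KIND_SERVER",'
        f'span_name="{span_name}",'
        f'service_name="{service_name}"'
    )
    return (
        "histogram_quantile(0.95, sum by (le) "
        f"(rate(duration_milliseconds_bucket{{{selector}}}[{window}])))"
    )


def error_rate_query(span_name: str, window: str = "1m") -> str:
    """Build the error-rate query (errored calls divided by all calls)."""
    base = f'span_kind="SPAN_KIND_SERVER",span_name="{span_name}"'
    errors = f'sum(rate(calls_total{{{base},status_code="STATUS_CODE_ERROR"}}[{window}]))'
    total = f"sum(rate(calls_total{{{base}}}[{window}]))"
    return f"{errors} / {total}"


def first_scalar(response: dict) -> Optional[float]:
    """Return the first sample of an instant-query response, or None if empty or NaN."""
    results = response.get("data", {}).get("result", [])
    if response.get("status") != "success" or not results:
        return None
    # Instant-vector samples have the shape [unix_timestamp, "value-as-string"].
    value = float(results[0]["value"][1])
    return None if math.isnan(value) else value


def instant_query(query: str, at_unix_time: float) -> Optional[float]:
    """Evaluate a query at a given time, for example the end of a stage window."""
    params = urllib.parse.urlencode({"query": query, "time": at_unix_time})
    url = f"{PROMETHEUS_URL}/api/v1/query?{params}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return first_scalar(json.load(resp))
```

A call such as `instant_query(p95_query("GET /bets"), stage_end)` then yields one table cell, where `stage_end` is a Unix timestamp you supply. Note that with the default `[1m]` window the result reflects only the last minute before that timestamp, so widen `window` if a cell should summarize the whole stage.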
Pre-Existing SLO Exception: Dice Bet
The POST /casino/games/house/dice/bet endpoint has a baseline p95 of
108 ms at 1 virtual user — already above the 100 ms SLO target before
any load is applied.
Evidence: Trace 8aaa902b3964af1d33dec7000bb36e02 (102 spans).
The latency is dominated by the Prisma transactional block in
PlaceBetService: user seed pop, game identity lookup, self-exclusion check,
balance update, two transaction creates, bet create, and nonce increment —
all within a single serializable transaction.
This is a code-level concern, not a load concern. Optimization opportunities include:
- Reducing the number of queries inside the transaction (batch where possible).
- Relaxing the isolation level if business rules allow.
- Caching immutable lookups (GameIdentity, UserSelfExclusion) outside the transaction.
Because this endpoint already exceeds its SLO at idle, it is expected to be the first to breach under load, and likely by the widest margin. Read its results against the 108 ms idle baseline.
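One way to read a scorecard row against an idle baseline is to report both the first stage that breaches the SLO and how much of an observed p95 exceeds the idle figure. The sketch below is a hedged illustration in Python: the function names are hypothetical, and subtracting percentiles is only a rough heuristic, because percentiles from different conditions do not add up exactly.

```python
from typing import Dict, Optional


def first_breach(slo_ms: float, p95_by_stage: Dict[int, float]) -> Optional[int]:
    """Return the lowest VU stage whose p95 exceeds the SLO, or None if all stages pass."""
    for vus in sorted(p95_by_stage):
        if p95_by_stage[vus] > slo_ms:
            return vus
    return None


def load_attributable_ms(p95_ms: float, idle_baseline_ms: float) -> float:
    """Rough estimate of the latency added by load on top of the idle baseline."""
    return max(0.0, p95_ms - idle_baseline_ms)


# The 108 ms idle figure comes from this report; the 5,000 VU value is a
# hypothetical placeholder, not a measurement.
dice_bet = {1: 108.0, 5000: 450.0}
breach_stage = first_breach(100.0, dice_bet)           # breaches already at 1 VU
added_by_load = load_attributable_ms(450.0, 108.0)     # 342.0 ms above idle
```

Reporting both numbers keeps the code-level problem (the idle breach) separate from whatever the ramp adds on top of it.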
Bottleneck Findings
Ordered by impact (highest first). Each finding includes evidence pointers for independent verification.
Bottleneck 1: {{TBD: title}}
- Symptom: {{TBD — e.g. "p95 for /dice/bet rose from 108 ms to 450 ms between 2.5k and 5k VUs"}}
- Stage of first appearance: {{TBD}} VUs
- Evidence:
- Grafana panel: {{TBD: panel name + screenshot path}}
- Jaeger trace: {{TBD: trace ID}} (sort by duration, pick slowest at breach moment)
- PromQL:
- Root cause: {{TBD — e.g. "Postgres connection pool exhausted. Default pool size is 10 per service; at 5k VUs the queue depth exceeded pool capacity, causing wait times."}}
- Remediation:
- {{TBD: option + effort — e.g. "Increase connection pool to 25 (small — env var change)"}}
- {{TBD: alternative — e.g. "Add PgBouncer connection pooler (medium — infra change)"}}
Bottleneck 2: {{TBD: title}}
- Symptom: {{TBD}}
- Stage of first appearance: {{TBD}} VUs
- Evidence:
- Grafana panel: {{TBD}}
- Jaeger trace: {{TBD}}
- PromQL:
- Root cause: {{TBD}}
- Remediation:
- {{TBD}}
Bottleneck 3: {{TBD: title}}
- Symptom: {{TBD}}
- Stage of first appearance: {{TBD}} VUs
- Evidence:
- Grafana panel: {{TBD}}
- Jaeger trace: {{TBD}}
- PromQL:
- Root cause: {{TBD}}
- Remediation:
- {{TBD}}
Expected bottleneck patterns to watch for
Use these PromQL queries during the ramp to identify which bottleneck class fires first:
| Pattern | Query | Indicates |
|---|---|---|
| Postgres pool saturation | histogram_quantile(0.95, sum by (le) (rate(duration_milliseconds_bucket{span_name="prisma:engine:db_query"}[1m]))) | DB query latency climbing = pool wait time |
| Redis throughput ceiling | sum(rate(calls_total{span_kind="SPAN_KIND_CLIENT",span_name=~"(?i)(get\|set\|evalsha\|del)"}[1m])) | Flat or declining rate = single-thread CPU bound (abbreviated filter; the dashboard uses the full command list) |
| BullMQ back-pressure | bullmq_queue_jobs{state="wait"} | Sustained positive slope = workers can't keep up |
| HTTP error spike | sum(rate(calls_total{span_kind="SPAN_KIND_SERVER",status_code="STATUS_CODE_ERROR"}[1m])) by (service_name) | 5xx rate climbing = upstream saturation |
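Several of these patterns hinge on a trend ("sustained positive slope", "flat or declining rate") rather than a single value. As a hedged illustration, the Python sketch below fits an ordinary least-squares slope to samples exported from one of the queries and labels the trend. The function names and the tolerance value are hypothetical and would need tuning per metric; PromQL's `deriv()` offers a comparable in-query estimate for gauges.

```python
from typing import Sequence


def slope_per_second(timestamps: Sequence[float], values: Sequence[float]) -> float:
    """Ordinary least-squares slope of values over time (units per second)."""
    n = len(timestamps)
    if n < 2 or n != len(values):
        raise ValueError("need at least two samples and equal-length inputs")
    mean_t = sum(timestamps) / n
    mean_v = sum(values) / n
    var_t = sum((t - mean_t) ** 2 for t in timestamps)
    if var_t == 0:
        raise ValueError("timestamps must not all be identical")
    cov = sum((t - mean_t) * (v - mean_v) for t, v in zip(timestamps, values))
    return cov / var_t


def classify_trend(slope: float, tolerance: float) -> str:
    """Label a slope as rising, falling, or flat within a chosen tolerance."""
    if slope > tolerance:
        return "rising"
    if slope < -tolerance:
        return "falling"
    return "flat"


# Hypothetical samples of a waiting-jobs gauge taken every 15 seconds.
times = [0.0, 15.0, 30.0, 45.0]
waiting_jobs = [10.0, 40.0, 70.0, 100.0]
trend = classify_trend(slope_per_second(times, waiting_jobs), tolerance=0.1)
```

Fitting over the whole stage window, rather than comparing two points, keeps a single noisy scrape from flipping the classification.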
Known Limitations
Carried from methodology doc:
- Co-located load generator competes for CPU/RAM — latencies pessimistic, ceilings optimistic.
- External services stubbed (reCAPTCHA, Fast Track, EVO wallet) — real-world latency and failure modes absent.
- Single Postgres instance (no read replicas).
- Single Redis instance (no cluster/sentinel).
- rt WebSocket lacks Redis adapter — per-instance socket capacity bounded.
- k6 WebSocket scenario measures transport, not full game logic.
Headroom Projection
{{TBD: Back-of-envelope projection. Example format:}}
If the top 3 bottlenecks were addressed:
| Fix | Expected effect | Confidence |
|---|---|---|
| {{TBD: e.g. "Increase Postgres pool to 25"}} | {{TBD: e.g. "Push pool-saturation ceiling from 5k to ~8k VUs"}} | Medium |
| {{TBD}} | {{TBD}} | {{TBD}} |
| {{TBD}} | {{TBD}} | {{TBD}} |
These are projections based on the bottleneck analysis, not measurements. Verification requires a rerun after implementing the fixes.
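For a pool-style bottleneck, one way to ground such a projection is Little's Law (concurrency = throughput x time in system): a pool of N connections, each held for W seconds per transaction, can serve at most N / W transactions per second. The Python sketch below applies that bound. It is a back-of-envelope aid, not a measurement; the function names, the 20 ms hold time, and the per-VU request rate are hypothetical inputs chosen only for illustration.

```python
def max_throughput_per_s(pool_size: int, hold_time_s: float) -> float:
    """Upper bound on transactions per second for a fixed-size pool (Little's Law)."""
    if pool_size <= 0 or hold_time_s <= 0:
        raise ValueError("pool_size and hold_time_s must be positive")
    return pool_size / hold_time_s


def vu_ceiling(pool_size: int, hold_time_s: float, tx_per_vu_per_s: float) -> float:
    """Virtual users the pool can serve before requests start to queue."""
    if tx_per_vu_per_s <= 0:
        raise ValueError("tx_per_vu_per_s must be positive")
    return max_throughput_per_s(pool_size, hold_time_s) / tx_per_vu_per_s


# Hypothetical inputs: each transaction holds a connection for 20 ms and
# each virtual user issues one such transaction every 10 seconds.
before = vu_ceiling(pool_size=10, hold_time_s=0.020, tx_per_vu_per_s=0.1)
after = vu_ceiling(pool_size=25, hold_time_s=0.020, tx_per_vu_per_s=0.1)
```

With these inputs the bound scales linearly with pool size, from roughly 5,000 to 12,500 VUs. Treat that as an upper limit: hold time usually grows under contention and another bottleneck may surface first, so a more conservative figure with a stated confidence level is the safer entry for the table.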
Recommended Next Steps
- {{TBD: e.g. "Optimize the dice-bet transaction to bring idle p95 below 100 ms before further load testing."}}
- {{TBD: e.g. "Rerun on dedicated infrastructure using terraform/perf/ for authoritative numbers."}}
- {{TBD: e.g. "Address Bottleneck 1 (pool saturation) and retest at 5k."}}
- {{TBD: e.g. "Add Redis adapter to rt service before testing >5k WebSocket connections."}}
Appendix
Reproduction
```bash
# Full reproduction from clean state
sudo docker compose up -d
k6 run --out experimental-prometheus-rw tests-perf/profiles/stepped-ramp.js

# In a separate terminal:
npx playwright test tests-perf/playwright-canary/
```
See docs/performance-testing.md for kernel tuning and detailed reproduction steps.
Raw results
- k6 summary output: {{TBD: path to results/k6-summary-YYYYMMDD.json}}
- Grafana snapshots: {{TBD: path or URL}}
- Jaeger trace IDs: listed inline in each bottleneck finding above
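Once the k6 summary file exists, its overall p95 can be cross-checked against the Prometheus-derived figures. The Python sketch below assumes the file was written either by k6's `--summary-export` flag (statistics stored directly under each metric) or by a `handleSummary()` callback that dumps its data object (statistics nested under `values`). How this project actually produces the file is not stated here, so both layouts are handled, and the function names are hypothetical.

```python
import json
from typing import Optional


def p95_from_summary(summary: dict, metric: str = "http_req_duration") -> Optional[float]:
    """Extract the p(95) value for one metric from a parsed k6 summary."""
    entry = summary.get("metrics", {}).get(metric)
    if entry is None:
        return None
    # handleSummary() data nests statistics under "values";
    # --summary-export output keeps them directly on the metric.
    stats = entry.get("values", entry)
    return stats.get("p(95)")


def load_summary(path: str) -> dict:
    """Read a k6 summary JSON file from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

The summary covers the whole run, so it complements rather than replaces the per-stage values in the scorecard.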
Canary UX timings (Playwright)
| Metric | Idle | Under 1k VU | Under 5k VU | Under 10k VU |
|---|---|---|---|---|
| LCP | {{TBD}} ms | {{TBD}} ms | {{TBD}} ms | {{TBD}} ms |
| FCP | {{TBD}} ms | {{TBD}} ms | {{TBD}} ms | {{TBD}} ms |
| Bet-place round-trip | {{TBD}} ms | {{TBD}} ms | {{TBD}} ms | {{TBD}} ms |