Performance Test Report

Executive Summary

Evospin was load-tested under a stepped ramp from 50 to 10,000 concurrent virtual users. {{TBD: 1-2 sentence verdict, e.g. "The platform sustained X concurrent users within the p95 < 100 ms SLO. First breach occurred at Y users, caused by Z."}} The `POST /casino/games/house/dice/bet` endpoint is a pre-existing exception: its baseline p95 of 108 ms at a single virtual user already exceeds the 100 ms target, which is a code-level concern unrelated to load.

Verdict: {{TBD: GREEN / YELLOW / RED traffic-light}}

Metric	Value
Peak concurrency sustained within SLO	{{TBD}} VUs
First SLO breach	{{TBD}} VUs on `{{TBD: endpoint}}`
Peak error rate observed	{{TBD}}%
Primary bottleneck	{{TBD}}

Test Setup

Parameter	Value
Hardware	{{TBD: instance type, vCPU, RAM — e.g. "1x t3.xlarge, 4 vCPU, 16 GB"}}
Commit	`{{TBD: git SHA}}`
Date	{{TBD: YYYY-MM-DD}}
Duration	27 minutes (6 stages)
Load generator	k6 v{{TBD}}, co-located on same VM
Services exercised	api (:4000), rt (:4001)
Services not exercised	bj, bo, speed-roulette (no load scenarios targeted these)
External stubs active	reCAPTCHA bypass (`x-captcha-token: pass`), Fast Track disabled, EVO wallet stubbed

Methodology

This test followed the methodology documented in docs/performance-testing.md. That document covers tooling rationale, SLO definitions, the stepped-ramp protocol, auto-abort thresholds, and kernel tuning. This report presents results only — it does not duplicate the methodology.

Test Environment Caveat

The load generator ran on the same VM as the services under test. This means:

Observed latencies are pessimistic (k6 and services compete for CPU).
Throughput ceilings are optimistic (k6 may become the bottleneck first, in which case the nominal VU count overstates the load actually delivered to the services).
These numbers are directional. An authoritative rerun on dedicated infrastructure (see terraform/perf/) is recommended before capacity decisions.

Per-Endpoint SLO Scorecard

Endpoint	SLO (p95)	1k VU	2.5k VU	5k VU	7.5k VU	10k VU	Met?
`POST /auth/sign-in`	150 ms	{{TBD}} ms	{{TBD}} ms	{{TBD}} ms	{{TBD}} ms	{{TBD}} ms	{{TBD}}
`POST /casino/games/house/dice/bet`	100 ms	{{TBD}} ms	{{TBD}} ms	{{TBD}} ms	{{TBD}} ms	{{TBD}} ms	{{TBD}}
`GET /bets`	50 ms	{{TBD}} ms	{{TBD}} ms	{{TBD}} ms	{{TBD}} ms	{{TBD}} ms	{{TBD}}
`GET /accounting/balances`	50 ms	{{TBD}} ms	{{TBD}} ms	{{TBD}} ms	{{TBD}} ms	{{TBD}} ms	{{TBD}}
rt WebSocket handshake	200 ms	{{TBD}} ms	{{TBD}} ms	{{TBD}} ms	{{TBD}} ms	{{TBD}} ms	{{TBD}}
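
The Met? column, and the "Peak concurrency sustained within SLO" and "First SLO breach" rows of the Executive Summary, follow mechanically from the per-stage p95 values. A minimal Python sketch, assuming the stage values have been transcribed by hand and that a p95 equal to the target counts as a breach (the summary states the SLO as p95 < target). The function name `summarize_endpoint` and the sample latencies are hypothetical and are not measurements from this run:

```python
from typing import Optional

# Stage sizes from the scorecard columns, in virtual users.
STAGES_VU = [1000, 2500, 5000, 7500, 10000]


def summarize_endpoint(slo_ms: float, p95_ms: list[float]) -> dict:
    """Derive the 'Met?' verdict and breach points for one scorecard row.

    p95_ms holds one p95 latency per entry in STAGES_VU, in the same order.
    """
    if len(p95_ms) != len(STAGES_VU):
        raise ValueError("expected one p95 value per stage")

    first_breach: Optional[int] = None
    peak_within_slo: Optional[int] = None
    for vus, p95 in zip(STAGES_VU, p95_ms):
        if p95 >= slo_ms:  # assumption: the SLO is strict (p95 < target)
            first_breach = vus
            break  # stages after the first breach do not count as sustained
        peak_within_slo = vus

    return {
        "met": first_breach is None,
        "peak_within_slo_vus": peak_within_slo,
        "first_breach_vus": first_breach,
    }


# Hypothetical numbers for illustration only.
print(summarize_endpoint(slo_ms=150, p95_ms=[40, 62, 118, 190, 410]))
# {'met': False, 'peak_within_slo_vus': 5000, 'first_breach_vus': 7500}
```

An endpoint that is already over target at the first stage, as the dice-bet baseline suggests, yields `peak_within_slo_vus` of `None`.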

How to fill this table

For each stage, capture:

```
# p95 latency per endpoint (adjust time range to the stage window)
histogram_quantile(0.95,
  sum by (le) (rate(duration_milliseconds_bucket{
    span_kind="SPAN_KIND_SERVER",
    span_name="POST /auth/sign-in",
    service_name="api"
  }[1m]))
)

# Error rate per endpoint
sum(rate(calls_total{
  span_kind="SPAN_KIND_SERVER",
  status_code="STATUS_CODE_ERROR",
  span_name="POST /auth/sign-in"
}[1m]))
/
sum(rate(calls_total{
  span_kind="SPAN_KIND_SERVER",
  span_name="POST /auth/sign-in"
}[1m]))
```
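
Filling every cell by hand is error-prone, so the p95 query above can be generated per endpoint and evaluated at the end of each stage window through the Prometheus HTTP API (`/api/v1/query`, which accepts `query` and `time` parameters). The following is a sketch under assumptions: the Prometheus address `http://localhost:9090`, the helper names, and the choice to evaluate at the stage end are illustrative and not taken from this repository.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

PROM_URL = "http://localhost:9090"  # assumption: adjust to the local stack


def p95_query(span_name: str, service_name: str = "api", window: str = "1m") -> str:
    """Build the p95 query from this section for one endpoint."""
    return (
        "histogram_quantile(0.95, sum by (le) (rate(duration_milliseconds_bucket{"
        f'span_kind="SPAN_KIND_SERVER",span_name="{span_name}",'
        f'service_name="{service_name}"'
        f"}}[{window}])))"
    )


def parse_instant_value(payload: dict) -> Optional[float]:
    """Extract the scalar from an instant-query response, or None if empty."""
    if payload.get("status") != "success":
        raise RuntimeError(f"Prometheus query failed: {payload}")
    result = payload["data"]["result"]
    if not result:
        return None  # no series matched (endpoint not hit, or wrong labels)
    # Instant-vector samples are [unix_timestamp, "value-as-string"].
    return float(result[0]["value"][1])


def fetch_p95_ms(span_name: str, stage_end_unix: float) -> Optional[float]:
    """Evaluate the p95 query at the end of a stage window."""
    params = urllib.parse.urlencode(
        {"query": p95_query(span_name), "time": stage_end_unix}
    )
    url = f"{PROM_URL}/api/v1/query?{params}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return parse_instant_value(json.load(resp))
```

With a `1m` rate window, evaluating at a stage's end timestamp covers only the final minute of that stage. A wider window would also average in the ramp-up, so the window should be chosen to match how the stage is defined in the methodology document.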

Pre-Existing SLO Exception: Dice Bet

The POST /casino/games/house/dice/bet endpoint has a baseline p95 of 108 ms at 1 virtual user — already above the 100 ms SLO target before any load is applied.

Evidence: Trace 8aaa902b3964af1d33dec7000bb36e02 (102 spans). The latency is dominated by the Prisma transactional block in PlaceBetService: user seed pop, game identity lookup, self-exclusion check, balance update, two transaction creates, bet create, and nonce increment — all within a single serializable transaction.

This is a code-level concern, not a load concern. Optimization opportunities include:

Reducing the number of queries inside the transaction (batch where possible).
Relaxing the isolation level if business rules allow.
Caching immutable lookups (GameIdentity, UserSelfExclusion) outside the transaction.

Because this endpoint already exceeds its target at idle, it is expected to be the first SLO breach under load and the one with the widest margin. Read its results against the 108 ms idle baseline, not only against the 100 ms target.

Bottleneck Findings

Ordered by impact (highest first). Each finding includes evidence pointers for independent verification.

Bottleneck 1: {{TBD: title}}

Symptom: {{TBD — e.g. "p95 for /dice/bet rose from 108 ms to 450 ms between 2.5k and 5k VUs"}}
Stage of first appearance: {{TBD}} VUs
Evidence:
Grafana panel: {{TBD: panel name + screenshot path}}
Jaeger trace: {{TBD: trace ID}} (sort by duration, pick slowest at breach moment)

PromQL:
```
{{TBD: the exact query that shows the spike}}
```

Root cause: {{TBD — e.g. "Postgres connection pool exhausted. Default pool size is 10 per service; at 5k VUs the queue depth exceeded pool capacity, causing wait times."}}
Remediation:
{{TBD: option + effort — e.g. "Increase connection pool to 25 (small — env var change)"}}
{{TBD: alternative — e.g. "Add PgBouncer connection pooler (medium — infra change)"}}

Bottleneck 2: {{TBD: title}}

Symptom: {{TBD}}
Stage of first appearance: {{TBD}} VUs
Evidence:
Grafana panel: {{TBD}}
Jaeger trace: {{TBD}}
PromQL:
```
{{TBD}}
```
Root cause: {{TBD}}
Remediation:
{{TBD}}

Bottleneck 3: {{TBD: title}}

Symptom: {{TBD}}
Stage of first appearance: {{TBD}} VUs
Evidence:
Grafana panel: {{TBD}}
Jaeger trace: {{TBD}}
PromQL:
```
{{TBD}}
```
Root cause: {{TBD}}
Remediation:
{{TBD}}

Expected bottleneck patterns to watch for

Use these PromQL queries during the ramp to identify which bottleneck class fires first:

Pattern	Query	Indicates
Postgres pool saturation	`histogram_quantile(0.95, sum by (le) (rate(duration_milliseconds_bucket{span_name="prisma:engine:db_query"}[1m])))`	DB query latency climbing = pool wait time
Redis throughput ceiling	`sum(rate(calls_total{span_kind="SPAN_KIND_CLIENT",span_name=~"(?i)(get|set|evalsha|del)"}[1m]))`	Rate flat or declining while VUs keep rising = single-thread CPU bound (abbreviated filter; the dashboard uses the full command list)
BullMQ back-pressure	`bullmq_queue_jobs{state="wait"}`	Sustained positive slope = workers can't keep up
HTTP error spike	`sum(rate(calls_total{span_kind="SPAN_KIND_SERVER",status_code="STATUS_CODE_ERROR"}[1m])) by (service_name)`	5xx rate climbing = upstream saturation
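
The "sustained positive slope" criterion for BullMQ back-pressure can be made concrete with a least-squares fit over sampled queue depths. The sketch below assumes samples of `bullmq_queue_jobs{state="wait"}` have already been exported as (seconds, depth) pairs; the function name and the sample data are hypothetical.

```python
def queue_depth_slope(samples: list[tuple[float, float]]) -> float:
    """Least-squares slope of queue depth over time, in jobs per second."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a slope")
    mean_t = sum(t for t, _ in samples) / n
    mean_d = sum(d for _, d in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("all samples share the same timestamp")
    cov = sum((t - mean_t) * (d - mean_d) for t, d in samples)
    return cov / var_t


# Hypothetical samples taken every 15 s: depth grows by 30 jobs per sample.
samples = [(0, 100), (15, 130), (30, 160), (45, 190)]
print(queue_depth_slope(samples))  # 2.0
```

A slope that stays positive across a whole stage means jobs arrive faster than workers drain them. PromQL's `deriv()` function applies the same kind of linear regression to a gauge on the server side, which may be more convenient on a dashboard.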

Known Limitations

Carried from methodology doc:

Co-located load generator competes for CPU/RAM — latencies pessimistic, ceilings optimistic.
External services stubbed (reCAPTCHA, Fast Track, EVO wallet) — real-world latency and failure modes absent.
Single Postgres instance (no read replicas).
Single Redis instance (no cluster/sentinel).
rt WebSocket lacks Redis adapter — per-instance socket capacity bounded.
k6 WebSocket scenario measures transport, not full game logic.

Headroom Projection

If the top 3 bottlenecks were addressed:

Fix	Expected effect	Confidence
{{TBD: e.g. "Increase Postgres pool to 25"}}	{{TBD: e.g. "Push pool-saturation ceiling from 5k to ~8k VUs"}}	Medium
{{TBD}}	{{TBD}}	{{TBD}}
{{TBD}}	{{TBD}}	{{TBD}}

These are projections based on the bottleneck analysis, not measurements. Verification requires a rerun after implementing the fixes.
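
One way to sanity-check a pool-size projection is Little's Law: a pool of N connections, each held for an average of W seconds per request, can serve at most N / W requests per second. The sketch below applies that bound. The hold time and pool sizes are hypothetical placeholders, not values measured in this test, and the bound ignores every other resource limit (CPU, Postgres itself, lock contention).

```python
def pool_throughput_ceiling(pool_size: int, mean_hold_s: float) -> float:
    """Upper bound on requests/s that a connection pool can serve (Little's Law)."""
    if pool_size <= 0 or mean_hold_s <= 0:
        raise ValueError("pool_size and mean_hold_s must be positive")
    return pool_size / mean_hold_s


# Hypothetical: each request holds a connection for 20 ms on average.
before = pool_throughput_ceiling(pool_size=10, mean_hold_s=0.020)
after = pool_throughput_ceiling(pool_size=25, mean_hold_s=0.020)
print(f"{before:.0f} req/s -> {after:.0f} req/s")  # 500 req/s -> 1250 req/s
```

Translating a requests-per-second ceiling into a VU ceiling additionally requires the per-VU request rate of the k6 scenario, which this report does not state.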

Recommended Next Steps

{{TBD: e.g. "Optimize the dice-bet transaction to bring idle p95 below 100 ms before further load testing."}}
{{TBD: e.g. "Rerun on dedicated infrastructure using terraform/perf/ for authoritative numbers."}}
{{TBD: e.g. "Address Bottleneck 1 (pool saturation) and retest at 5k."}}
{{TBD: e.g. "Add Redis adapter to rt service before testing >5k WebSocket connections."}}

Appendix

Reproduction

```
# Full reproduction from clean state
sudo docker compose up -d
k6 run --out experimental-prometheus-rw tests-perf/profiles/stepped-ramp.js
# In a separate terminal:
npx playwright test tests-perf/playwright-canary/
```

See docs/performance-testing.md for kernel tuning and detailed reproduction steps.

Raw results

k6 summary output: {{TBD: path to results/k6-summary-YYYYMMDD.json}}
Grafana snapshots: {{TBD: path or URL}}
Jaeger trace IDs: listed inline in each bottleneck finding above

Canary UX timings (Playwright)

Metric	Idle	Under 1k VU	Under 5k VU	Under 10k VU
LCP	{{TBD}} ms	{{TBD}} ms	{{TBD}} ms	{{TBD}} ms
FCP	{{TBD}} ms	{{TBD}} ms	{{TBD}} ms	{{TBD}} ms
Bet-place round-trip	{{TBD}} ms	{{TBD}} ms	{{TBD}} ms	{{TBD}} ms