Performance Testing Methodology
1. Goal
Validate service-level objectives (SLOs) under a stepped ramp from 50 to 10,000 concurrent virtual users. The ramp profile is:
50 -> 1,000 -> 2,500 -> 5,000 -> 7,500 -> 10,000
The primary deliverables are:
- The concurrency ceiling at which each SLO first breaches.
- A bottleneck attribution for every breach, traced back to a specific subsystem (CPU, Postgres, Redis, BullMQ, or application code).
- Reproduction steps that allow any engineer to re-run the test and arrive at comparable numbers.
2. Tooling Rationale
k6 (synthetic load generation)
k6 drives both HTTP and WebSocket traffic. It supports Prometheus remote-write
natively (--out experimental-prometheus-rw), which feeds real-time panels in
Grafana without any intermediary. k6 scripts live under tests-perf/k6/ and
tests-perf/profiles/.
Playwright canary (real-browser UX validation)
A small Playwright suite runs alongside k6 to measure real-browser metrics
(LCP, FCP, bet-place latency) under load. This validates that the user
experience degrades gracefully rather than catastrophically. Canary tests live
under tests-perf/playwright-canary/.
Playwright is not used for load generation. A single Chromium instance consumes roughly 300 MB of RAM and significant CPU, so 10,000 concurrent sessions would need on the order of 3 TB of RAM. Scaling to that level is not feasible, nor is it the tool's design intent.
Existing observability stack
The platform already emits OpenTelemetry traces to Jaeger, spanmetrics-derived counters/histograms to Prometheus, and structured logs to Loki. Grafana dashboards unify all three signals. No additional instrumentation is required for bottleneck attribution.
Relevant metric families:
| Metric | Source | Labels |
|---|---|---|
| calls_total | spanmetrics connector | service_name, span_name, span_kind, status_code |
| duration_milliseconds_bucket | spanmetrics connector | same as above; buckets: 5, 10, 25, 50, 100, 250, 500, 1000, 2500, 5000 ms |
| http_server_duration_milliseconds_bucket | OTel HTTP instrumentation | http_route, http_status_code, service_name |
| bullmq_queue_jobs | custom gauge | queue, state |
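Percentiles derived from duration_milliseconds_bucket are estimates rather than exact values: PromQL's histogram_quantile interpolates linearly inside the bucket that contains the target rank. The following is a minimal sketch of that interpolation, useful for judging how coarse a p95 reading can be. The function name and the sample counts are illustrative assumptions, and the +Inf bucket is ignored.

```python
from bisect import bisect_left

# Upper bounds (ms) of the spanmetrics buckets listed in the table above.
BUCKET_BOUNDS_MS = [5, 10, 25, 50, 100, 250, 500, 1000, 2500, 5000]


def estimate_quantile(q, cumulative_counts, bounds=BUCKET_BOUNDS_MS):
    """Approximate a quantile from cumulative bucket counts.

    Mirrors the linear interpolation PromQL applies to finite buckets.
    This is a sketch: the +Inf bucket is not handled.
    """
    total = cumulative_counts[-1]
    if total == 0:
        return float("nan")
    rank = q * total
    # First bucket whose cumulative count reaches the target rank.
    i = bisect_left(cumulative_counts, rank)
    lower_bound = bounds[i - 1] if i > 0 else 0.0
    lower_count = cumulative_counts[i - 1] if i > 0 else 0
    in_bucket = cumulative_counts[i] - lower_count
    if in_bucket == 0:
        return float(lower_bound)
    return lower_bound + (bounds[i] - lower_bound) * (rank - lower_count) / in_bucket


# Hypothetical cumulative counts, one per bucket bound.
example_counts = [0, 10, 40, 70, 90, 98, 100, 100, 100, 100]
print(estimate_quantile(0.95, example_counts))
```

Because a true p95 of 110 ms and one of 240 ms both fall in the 100-250 ms bucket, estimates near the 100 ms and 150 ms targets should be read as approximate.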
3. Test Environment
Single-VM development setup
A single 16 GB VM runs both the load generator and all services (NestJS apps, Postgres, Redis, Docker infrastructure). This is the default for local development and CI.
Caveat: Numbers from a co-located setup are directional, not authoritative. The load generator competes with the services under test for CPU and RAM, which means observed latencies will be higher (and throughput ceilings lower) than in production. Customers should re-run on dedicated infrastructure for authoritative results.
Multi-VM production-grade setup
For isolated, reproducible results, deploy the test environment across dedicated VMs using the Terraform modules:
| Module | Path | Role |
|---|---|---|
| Monitoring VM | terraform/modules/monitoring | Prometheus, Grafana, Loki, Jaeger, OTel Collector |
| Application VM | terraform/modules/app | NestJS apps (api, rt, bj, bo, speed-roulette), Postgres, Redis |
| Perf wiring | terraform/perf/ | Ties monitoring and app modules together for perf environments |
In a multi-VM layout, the load generator runs on a third machine (or on the monitoring VM if its resource footprint is low). This eliminates contention between k6 and the services under test.
4. SLO Definitions
Endpoint latency targets
| Endpoint | Method | p95 Target | Notes |
|---|---|---|---|
| /auth/sign-in | POST | 150 ms | bcrypt is irreducible at ~60-80 ms per hash; the budget allows for DB lookup and session creation on top |
| /casino/games/house/dice/bet | POST | 100 ms | Baseline measured at 108 ms with 1 VU -- this endpoint does not meet SLO even at minimal load. Pre-noted as SLO-unmet; optimization work is required before this target is achievable |
| /bets | GET | 50 ms | Paginated bet history |
| /accounting/balances | GET | 50 ms | Cached balance lookup |
| rt WebSocket handshake | WS | 200 ms | Measured as time from TCP connect to receipt of AuthSuccess event |
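As a rough illustration of how raw latency samples could be checked against these targets offline, the sketch below encodes the table as a dictionary and applies a nearest-rank p95. The dictionary keys, the function names, and the choice to treat "at or below the target" as passing are assumptions, not part of the test suite.

```python
# p95 targets (ms) copied from the table above, keyed by (method, endpoint).
# The key for the WebSocket handshake is an illustrative label.
P95_TARGETS_MS = {
    ("POST", "/auth/sign-in"): 150,
    ("POST", "/casino/games/house/dice/bet"): 100,
    ("GET", "/bets"): 50,
    ("GET", "/accounting/balances"): 50,
    ("WS", "rt handshake"): 200,
}


def p95_nearest_rank(samples_ms):
    """Return the 95th percentile using the nearest-rank method."""
    if not samples_ms:
        raise ValueError("no samples")
    ordered = sorted(samples_ms)
    # Integer form of ceil(0.95 * n); avoids floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def meets_slo(method, endpoint, samples_ms):
    """True when the measured p95 is at or below the endpoint's target."""
    return p95_nearest_rank(samples_ms) <= P95_TARGETS_MS[(method, endpoint)]
```

For example, twenty samples of 108 ms for the dice bet endpoint fail the 100 ms target, which matches the pre-noted SLO-unmet status.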
System-level SLOs
- Error rate: less than 0.1% per endpoint across the full test duration.
- Queue stability: bullmq_queue_jobs{state="wait"} must not trend upward over any 2-minute window. Sustained growth indicates worker throughput is below arrival rate.
- No OOM kills: no container may be killed by the kernel OOM killer during the test. Validated via dmesg and container exit codes.
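One way to make "trending upward over a 2-minute window" concrete is a least-squares slope over the most recent gauge samples. The sketch below assumes samples arrive as (timestamp in seconds, value) pairs sorted by time; the function names and the zero-slope cutoff are illustrative choices. PromQL's deriv() function performs a comparable regression on a gauge server-side.

```python
def slope_per_second(samples):
    """Least-squares slope of (timestamp_s, value) pairs."""
    n = len(samples)
    if n < 2:
        return 0.0
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return 0.0
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    return cov / var_t


def queue_is_growing(samples, window_s=120, min_slope=0.0):
    """True when the wait-queue gauge trends upward over the latest window.

    Expects samples sorted by timestamp; only the last window_s seconds
    are considered.
    """
    if not samples:
        return False
    latest = samples[-1][0]
    window = [(t, v) for t, v in samples if t >= latest - window_s]
    return slope_per_second(window) > min_slope
```

A small positive min_slope may be preferable in practice, so that noise around a flat queue depth does not count as growth.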
5. Stepped-Ramp Protocol
Stage definition
| Stage | Target VUs | Duration | Purpose |
|---|---|---|---|
| 1 -- Warmup | 50 | 2 min | Populate caches, establish baseline |
| 2 | 1,000 | 5 min | Light production-equivalent load |
| 3 | 2,500 | 5 min | Moderate concurrency |
| 4 | 5,000 | 5 min | High concurrency |
| 5 | 7,500 | 5 min | Stress region |
| 6 | 10,000 | 5 min | Peak target |
Total test duration: 27 minutes.
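The total follows directly from the stage durations, and the same table can be used to map a breach timestamp back to the stage that was active. The sketch below assumes each stage starts at its target immediately, with no separate ramp time between stages; the names are illustrative.

```python
# (stage label, target VUs, duration in minutes), from the table above.
STAGES = [
    ("1 -- Warmup", 50, 2),
    ("2", 1_000, 5),
    ("3", 2_500, 5),
    ("4", 5_000, 5),
    ("5", 7_500, 5),
    ("6", 10_000, 5),
]


def total_minutes(stages=STAGES):
    """Sum of all stage durations."""
    return sum(minutes for _, _, minutes in stages)


def stage_at(elapsed_min, stages=STAGES):
    """Return (label, target_vus) of the stage active at elapsed_min."""
    end = 0
    for label, vus, minutes in stages:
        end += minutes
        if elapsed_min < end:
            return label, vus
    raise ValueError("elapsed time is past the end of the test")


print(total_minutes())  # 2 + 5 * 5 = 27
```

Under these assumptions, a breach observed 13 minutes into the run falls in stage 4 (5,000 VUs).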
Auto-abort thresholds
The k6 scenario must abort the current stage (and log the breach) when any of the following conditions hold for 30 consecutive seconds:
- p95 latency exceeds 2x the SLO budget for any monitored endpoint.
- Aggregate error rate exceeds 1%.
- Any container is OOM-killed (detected via a sidecar health check or docker events watcher).
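The "30 consecutive seconds" rule can be expressed as a small state machine over timestamped samples. The sketch below illustrates the rule only, not how the k6 script implements it; the function names and the (timestamp in seconds, value) sample format are assumptions.

```python
def abort_limit_ms(slo_p95_ms, factor=2.0):
    """Latency abort limit: a multiple of the SLO budget (2x by default)."""
    return slo_p95_ms * factor


def sustained_breach(samples, limit, hold_s=30):
    """True if the value stays above limit for at least hold_s seconds.

    samples: iterable of (timestamp_s, value), sorted by timestamp.
    Any sample at or below the limit resets the timer.
    """
    breach_start = None
    for t, value in samples:
        if value > limit:
            if breach_start is None:
                breach_start = t
            if t - breach_start >= hold_s:
                return True
        else:
            breach_start = None
    return False
```

The same function covers the error-rate condition by passing error-rate samples and a limit of 0.01.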
Artifact capture
At each breach moment, capture:
- A Grafana dashboard snapshot covering the 5 minutes surrounding the breach.
- Jaeger exemplar trace IDs for the slowest requests at the breach point.
- The k6 summary output at the time of abort.
6. Bottleneck-Hunting Runbook
Symptom-to-diagnosis map
| Metric Pattern | Diagnosis | PromQL Example |
|---|---|---|
| duration_milliseconds p95 rising across all routes simultaneously | CPU or RAM saturation at the container level. Check container health before investigating application code. | histogram_quantile(0.95, sum(rate(duration_milliseconds_bucket{span_kind="SPAN_KIND_SERVER"}[1m])) by (le, service_name)) |
| prisma:engine:db_query p95 rising | Postgres connection pool saturation. The default pool size is ~10 connections per service instance. | histogram_quantile(0.95, sum(rate(duration_milliseconds_bucket{span_name="prisma:engine:db_query"}[1m])) by (le, service_name)) |
| ioredis command rate hitting a ceiling | Cache Redis instance is saturated (CPU-bound, since Redis is single-threaded). | sum(rate(calls_total{span_kind="SPAN_KIND_CLIENT",span_name=~"(?i)(get\|set\|del\|evalsha\|hget\|hset\|expire\|publish)"}[1m])) by (service_name) |
| bullmq_queue_jobs{state="wait"} growing over time | Worker throughput is below arrival rate. Either add workers or optimize job processing time. | bullmq_queue_jobs{state="wait"} (raw gauge, watch for sustained positive slope) |
| rt socket count per instance exceeds expected capacity | The rt service stores socket state in an in-process Map. Without a Redis adapter for socket.io, sticky sessions are required and per-instance capacity is bounded by memory. | No built-in metric; monitor via docker exec ebit-rt node -e "..." or add a custom gauge. The rt service does not currently expose a connection count metric. |
| Container CPU utilization above 90% or RSS approaching memory limit | Vertical scaling ceiling reached. Scale up (larger instance) or scale out (more replicas). | Not available without cadvisor/node_exporter -- this is a known observability gap. Use docker stats as a stopgap. |
| k6_http_req_failed rising | Upstream is returning 5xx errors. Correlate the failing route with Jaeger exemplar traces to identify the root cause. | sum(rate(k6_http_req_failed[1m])) by (url) (available only when k6 remote-write is active) |
Investigating a specific breach
- Open the Grafana dashboard and identify the timestamp where the SLO breach begins.
- Filter Jaeger traces to the affected service and time window. Sort by duration descending.
- In the slowest trace, identify which span contributes the most wall-clock time (database query, Redis call, downstream HTTP, or application code).
- Cross-reference with Loki logs for the same trace_id to check for error messages or warnings.
- If the bottleneck is infrastructure (Postgres, Redis), check connection pool metrics and resource utilization. If it is application code, profile the specific handler.
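The step that looks for the span contributing the most wall-clock time can be approximated by ranking spans on self time, meaning a span's duration minus the time spent in its direct children. The sketch below works on a simplified span record; the field names (span_id, parent_id, name, duration_ms) and the example trace are hypothetical and do not reflect Jaeger's export format. It also assumes child spans run one after another.

```python
from collections import defaultdict


def self_times(spans):
    """Map span_id -> duration minus the summed duration of direct children.

    Assumes children run sequentially. Parallel children would push the
    result below zero, so it is clamped at zero.
    """
    child_total = defaultdict(float)
    for span in spans:
        if span["parent_id"] is not None:
            child_total[span["parent_id"]] += span["duration_ms"]
    return {
        span["span_id"]: max(0.0, span["duration_ms"] - child_total[span["span_id"]])
        for span in spans
    }


def dominant_span(spans):
    """Return the span with the largest self time."""
    times = self_times(spans)
    return max(spans, key=lambda span: times[span["span_id"]])


# Hypothetical trace: one server span with a database and a Redis child.
EXAMPLE_TRACE = [
    {"span_id": "a", "parent_id": None, "name": "POST /casino/games/house/dice/bet", "duration_ms": 120.0},
    {"span_id": "b", "parent_id": "a", "name": "prisma:engine:db_query", "duration_ms": 85.0},
    {"span_id": "c", "parent_id": "a", "name": "evalsha", "duration_ms": 10.0},
]
print(dominant_span(EXAMPLE_TRACE)["name"])
```

In this made-up trace the database query dominates, which would point the investigation at Postgres rather than the handler code.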
7. Reproduction
All commands assume the working directory is the repository root.
Start infrastructure
This starts Postgres, Redis, RabbitMQ, and the observability stack (Prometheus, Grafana, Loki, Jaeger, OTel Collector).
Run a k6 smoke test
The smoke test sends minimal traffic (1-5 VUs) to verify that all endpoints respond correctly and that k6 metrics appear in Grafana.
Run the stepped ramp
This executes the full 27-minute ramp described in Section 5. Auto-abort thresholds are configured within the script.
Run the Playwright canary
Run this in a separate terminal while k6 is active. The canary reports LCP, FCP, and bet-place latency as the system is under load.
Open dashboards
| Tool | URL |
|---|---|
| Grafana | http://localhost:3000 |
| Jaeger | http://localhost:16686 |
8. Limitations
- Co-located load generation. On a single-VM setup, k6 competes with the services under test for CPU and RAM. Observed latencies will be pessimistic, and measured throughput ceilings will be understated as well, because k6 itself may become the bottleneck before the services do.
- Stubbed external services. reCAPTCHA validation is bypassed in the test environment. The Fast Track integration is disabled (the RabbitMQ producer is stubbed). The EVO wallet is stubbed. These stubs remove real-world latency and failure modes from the test.
- Single Postgres instance. Production may use read replicas to offload query traffic. The test environment runs a single instance, so Postgres will saturate earlier than in a replicated deployment.
- No Redis Cluster. Redis runs as a single instance. In production, a cluster or sentinel setup would provide higher throughput and failover.
- rt WebSocket lacks a Redis adapter. The socket.io gateway in the rt service does not use a Redis adapter for pub/sub. This means horizontal scaling requires sticky sessions, and per-instance socket capacity is bounded by the in-process connection Map.
- k6 WebSocket scenario scope. The k6 WebSocket scenario measures transport-level metrics (handshake time, message round-trip). It does not exercise full game logic, which requires multi-step stateful interactions that are better validated by the Playwright canary.
9. Kernel Tuning Checklist (Load Generator)
These settings are required on the load generator machine to sustain 10,000+ concurrent connections. Apply them before running the stepped ramp.
| Setting | Command | Value | Rationale |
|---|---|---|---|
| Open file limit | ulimit -n 65536 | 65536 | Each TCP socket consumes one file descriptor. The default limit (typically 1024) is insufficient for 10k connections. |
| Listen backlog | sysctl net.core.somaxconn=65535 | 65535 | Increases the accept queue depth for listening sockets, preventing connection drops under burst. |
| TIME_WAIT reuse | sysctl net.ipv4.tcp_tw_reuse=1 | 1 | Allows reuse of sockets in TIME_WAIT state for new outbound connections, reclaiming ports faster. |
| Local port range | sysctl net.ipv4.ip_local_port_range="1024 65535" | 1024-65535 | Expands the ephemeral port range from the default (~28k ports) to ~64k ports. |
| FIN timeout | sysctl net.ipv4.tcp_fin_timeout=15 | 15 seconds | Reduces the time sockets spend in FIN_WAIT_2 state, accelerating teardown. Default is 60 seconds. |
| System-wide fd limit | sysctl fs.file-max=2097152 | 2097152 | Raises the kernel-level cap on open file descriptors across all processes. |
To apply sysctl settings persistently, add them to /etc/sysctl.d/99-perf.conf
and run sysctl --system. The ulimit setting must be configured in
/etc/security/limits.conf or the shell profile for persistence.
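Before starting the ramp, it can help to confirm that the checklist values are actually in effect. The sketch below reads the current values from /proc/sys on Linux and compares them with the table. The function names are illustrative assumptions, as is the choice to treat somaxconn, file-max, and the port range as minimums and tcp_fin_timeout as a maximum.

```python
import resource
from pathlib import Path

REQUIRED_NOFILE = 65536


def port_count(range_text):
    """Number of ports in an ip_local_port_range value such as '1024 65535'."""
    low, high = (int(part) for part in range_text.split())
    return high - low + 1


# sysctl name -> predicate over the raw text value read from /proc/sys.
CHECKS = {
    "net.core.somaxconn": lambda v: int(v) >= 65535,
    "net.ipv4.tcp_tw_reuse": lambda v: int(v) == 1,
    "net.ipv4.ip_local_port_range": lambda v: port_count(v) >= 64512,
    "net.ipv4.tcp_fin_timeout": lambda v: int(v) <= 15,
    "fs.file-max": lambda v: int(v) >= 2097152,
}


def sysctl_path(name):
    """Translate a sysctl name into its /proc/sys path."""
    return Path("/proc/sys") / name.replace(".", "/")


def read_sysctl(name):
    """Return the current value as text, or None if it cannot be read."""
    try:
        return sysctl_path(name).read_text().strip()
    except OSError:
        return None


def failing_settings():
    """List the checklist entries that are missing or below target."""
    failures = []
    soft_nofile, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft_nofile != resource.RLIM_INFINITY and soft_nofile < REQUIRED_NOFILE:
        failures.append("ulimit -n")
    for name, is_ok in CHECKS.items():
        value = read_sysctl(name)
        if value is None or not is_ok(value):
            failures.append(name)
    return failures


if __name__ == "__main__":
    print(failing_settings() or "all checklist settings satisfied")
```

The port arithmetic matches the table: the usual Linux default of 32768-60999 gives 28,232 ports, while 1024-65535 gives 64,512. The open file limit reported here is that of the process running the check, so it should be run from the same shell that will launch k6.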