Performance Testing Methodology

1. Goal

Validate service-level objectives (SLOs) under a stepped ramp from 50 to 10,000 concurrent virtual users. The ramp profile is:

50 -> 1,000 -> 2,500 -> 5,000 -> 7,500 -> 10,000

The primary deliverables are:

The concurrency ceiling at which each SLO first breaches.
A bottleneck attribution for every breach, traced back to a specific subsystem (CPU, Postgres, Redis, BullMQ, or application code).
Reproduction steps that allow any engineer to re-run the test and arrive at comparable numbers.

2. Tooling Rationale

k6 (synthetic load generation)

k6 drives both HTTP and WebSocket traffic. It supports Prometheus remote-write natively (`--out experimental-prometheus-rw`), which feeds real-time panels in Grafana without any intermediary. k6 scripts live under `tests-perf/k6/` and `tests-perf/profiles/`.

Playwright canary (real-browser UX validation)

A small Playwright suite runs alongside k6 to measure real-browser metrics (LCP, FCP, bet-place latency) under load. This validates that the user experience degrades gracefully rather than catastrophically. Canary tests live under tests-perf/playwright-canary/.

Playwright is not used for load generation. A single Chromium instance consumes roughly 300 MB of RAM and significant CPU, so 10,000 concurrent sessions would need on the order of 3 TB of RAM. That is not feasible, nor is it the tool's design intent.

Existing observability stack

The platform already emits OpenTelemetry traces to Jaeger, spanmetrics-derived counters/histograms to Prometheus, and structured logs to Loki. Grafana dashboards unify all three signals. No additional instrumentation is required for bottleneck attribution.

Relevant metric families:

Metric	Source	Labels
`calls_total`	spanmetrics connector	`service_name`, `span_name`, `span_kind`, `status_code`
`duration_milliseconds_bucket`	spanmetrics connector	same as above; buckets: 5, 10, 25, 50, 100, 250, 500, 1000, 2500, 5000 ms
`http_server_duration_milliseconds_bucket`	OTel HTTP instrumentation	`http_route`, `http_status_code`, `service_name`
`bullmq_queue_jobs`	custom gauge	`queue`, `state`
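
Because latency percentiles are read from bucketed histograms, the bucket boundaries above limit how precisely a p95 can be resolved. The sketch below is a simplified JavaScript approximation of the linear interpolation that Prometheus's `histogram_quantile` applies inside a bucket. The function name and the sample counts are hypothetical and serve only as an illustration.

```javascript
// Upper bounds (ms) of the finite spanmetrics buckets listed in the table above.
const BOUNDS_MS = [5, 10, 25, 50, 100, 250, 500, 1000, 2500, 5000];

// Estimate quantile q from cumulative bucket counts.
// cumulativeCounts has one entry per bound plus a final +Inf entry (the total).
function estimateQuantile(q, cumulativeCounts, bounds = BOUNDS_MS) {
  const total = cumulativeCounts[cumulativeCounts.length - 1];
  if (!(total > 0) || q <= 0) return NaN;
  const rank = q * total;
  for (let i = 0; i < bounds.length; i++) {
    if (cumulativeCounts[i] >= rank) {
      const lower = i === 0 ? 0 : bounds[i - 1];
      const below = i === 0 ? 0 : cumulativeCounts[i - 1];
      const inBucket = cumulativeCounts[i] - below;
      // Assume observations are spread evenly between the bucket's bounds.
      return lower + (bounds[i] - lower) * ((rank - below) / inBucket);
    }
  }
  // The rank falls in the +Inf bucket: report the highest finite bound.
  return bounds[bounds.length - 1];
}

// Hypothetical sample: 900 requests at or under 100 ms, 100 requests in (100, 250] ms.
const counts = [0, 0, 0, 0, 900, 1000, 1000, 1000, 1000, 1000, 1000];
const p95 = estimateQuantile(0.95, counts);
console.log(p95.toFixed(1)); // 175.0
```

With these counts the estimate is 175 ms even if every slow request actually took 101 ms, because the 100-250 ms bucket is interpolated as a whole. A latency target that falls inside a wide bucket is therefore resolved only coarsely.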

3. Test Environment

Single-VM development setup

A single 16 GB VM runs both the load generator and all services (NestJS apps, Postgres, Redis, Docker infrastructure). This is the default for local development and CI.

Caveat: Numbers from a co-located setup are directional, not authoritative. The load generator competes with the services under test for CPU and RAM, which means observed latencies will be higher (and throughput ceilings lower) than in production. Customers should re-run on dedicated infrastructure for authoritative results.

Multi-VM production-grade setup

For isolated, reproducible results, deploy the test environment across dedicated VMs using the Terraform modules:

Module	Path	Role
Monitoring VM	`terraform/modules/monitoring`	Prometheus, Grafana, Loki, Jaeger, OTel Collector
Application VM	`terraform/modules/app`	NestJS apps (api, rt, bj, bo, speed-roulette), Postgres, Redis
Perf wiring	`terraform/perf/`	Ties monitoring and app modules together for perf environments

In a multi-VM layout, the load generator runs on a third machine (or on the monitoring VM if the monitoring stack's resource footprint is low). Either placement eliminates contention between k6 and the services under test.

4. SLO Definitions

Endpoint latency targets

Endpoint	Method	p95 Target	Notes
`/auth/sign-in`	POST	150 ms	bcrypt is irreducible at ~60-80 ms per hash, which leaves roughly 70-90 ms of the budget for DB lookup and session creation
`/casino/games/house/dice/bet`	POST	100 ms	Baseline measured at 108 ms with 1 VU, so this endpoint misses its SLO even at minimal load. Pre-noted as SLO-unmet; optimization work is required before the target is achievable
`/bets`	GET	50 ms	Paginated bet history
`/accounting/balances`	GET	50 ms	Cached balance lookup
rt WebSocket handshake	WS	200 ms	Measured as time from TCP connect to receipt of `AuthSuccess` event

System-level SLOs

Error rate: less than 0.1% per endpoint across the full test duration.
Queue stability: `bullmq_queue_jobs{state="wait"}` must not trend upward over any 2-minute window. Sustained growth indicates worker throughput is below arrival rate.
No OOM kills: no container may be killed by the kernel OOM killer during the test. Validated via dmesg and container exit codes.
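
One way to make "trend upward" concrete for the queue-stability SLO is a least-squares slope over the gauge samples in a 2-minute window, similar in spirit to PromQL's `deriv()` function. The sketch below is illustrative only: the function names, the sample shape, and the zero-slope tolerance are assumptions, not part of the existing tooling.

```javascript
// samples: [{ t: seconds, v: waiting job count }, ...] covering one 2-minute window.
// Returns the least-squares slope in jobs per second.
function slopePerSecond(samples) {
  const n = samples.length;
  if (n < 2) return 0;
  const meanT = samples.reduce((sum, p) => sum + p.t, 0) / n;
  const meanV = samples.reduce((sum, p) => sum + p.v, 0) / n;
  let num = 0;
  let den = 0;
  for (const { t, v } of samples) {
    num += (t - meanT) * (v - meanV);
    den += (t - meanT) ** 2;
  }
  return den === 0 ? 0 : num / den;
}

// The tolerance (jobs per second) is an assumed knob to absorb scrape noise.
function isQueueGrowing(samples, tolerance = 0) {
  return slopePerSecond(samples) > tolerance;
}

// Hypothetical window: the backlog grows by 30 jobs every 30 seconds.
const growing = [0, 30, 60, 90, 120].map((t) => ({ t, v: 10 + t }));
console.log(slopePerSecond(growing), isQueueGrowing(growing)); // 1 true
```

A fitted slope is less sensitive to a single noisy scrape than comparing only the first and last sample of the window.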

5. Stepped-Ramp Protocol

Stage definition

Stage	Target VUs	Duration	Purpose
1 -- Warmup	50	2 min	Populate caches, establish baseline
2	1,000	5 min	Light production-equivalent load
3	2,500	5 min	Moderate concurrency
4	5,000	5 min	High concurrency
5	7,500	5 min	Stress region
6	10,000	5 min	Peak target

Total test duration: 27 minutes.
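
The stage table can be expressed as k6 `stages` for the `ramping-vus` executor. k6 ramps linearly toward each target, so this sketch splits every stage into a short ramp followed by a hold to approximate a step. The 30-second ramp and the helper names are assumptions, and the actual `stepped-ramp.js` may be structured differently.

```javascript
// Stage table from this section: target VUs and total minutes per stage.
const STEPS = [
  { vus: 50, minutes: 2 },
  { vus: 1000, minutes: 5 },
  { vus: 2500, minutes: 5 },
  { vus: 5000, minutes: 5 },
  { vus: 7500, minutes: 5 },
  { vus: 10000, minutes: 5 },
];

// Split each stage into a ramp and a hold so the stage's total length is unchanged.
function toK6Stages(steps, rampSeconds = 30) {
  return steps.flatMap(({ vus, minutes }) => [
    { duration: `${rampSeconds}s`, target: vus },
    { duration: `${minutes * 60 - rampSeconds}s`, target: vus },
  ]);
}

const stages = toK6Stages(STEPS);
const totalMinutes = STEPS.reduce((sum, step) => sum + step.minutes, 0);

// In a k6 script this object would be exported as `options`.
const options = {
  scenarios: {
    stepped_ramp: { executor: 'ramping-vus', startVUs: 0, stages },
  },
};

console.log(totalMinutes, stages.length); // 27 12
```

Keeping the ramp inside each stage's budget preserves the 27-minute total; a longer ramp shortens the steady-state hold available for measurement.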

Auto-abort thresholds

The k6 scenario must abort the current stage (and log the breach) when any of the following conditions hold for 30 consecutive seconds:

p95 latency exceeds 2x the SLO budget for any monitored endpoint.
Aggregate error rate exceeds 1%.
Any container is OOM-killed (detected via a sidecar health check or docker events watcher).
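
The latency and error-rate conditions map roughly onto k6 thresholds with `abortOnFail`. The sketch below builds such a thresholds object under several assumptions: requests carry a hypothetical `endpoint` tag, the helper name is invented, and only the HTTP endpoints are covered. Note the approximations: k6 evaluates thresholds over values accumulated during the run, `delayAbortEval` postpones evaluation rather than requiring 30 consecutive breaching seconds, and `abortOnFail` stops the whole run rather than a single stage. OOM detection has to happen outside k6.

```javascript
// p95 budgets (ms) from the SLO table, keyed by a hypothetical `endpoint` tag
// that each request in the k6 script would need to set.
const SLO_P95_MS = { sign_in: 150, dice_bet: 100, bets: 50, balances: 50 };

function buildAbortThresholds(slos, multiplier = 2, delay = '30s') {
  const thresholds = {
    // Aggregate error rate must stay under 1%.
    http_req_failed: [
      { threshold: 'rate<0.01', abortOnFail: true, delayAbortEval: delay },
    ],
  };
  for (const [endpoint, budgetMs] of Object.entries(slos)) {
    // Abort when p95 reaches the multiplier times the SLO budget.
    thresholds[`http_req_duration{endpoint:${endpoint}}`] = [
      {
        threshold: `p(95)<${budgetMs * multiplier}`,
        abortOnFail: true,
        delayAbortEval: delay,
      },
    ];
  }
  return thresholds;
}

// In a k6 script this would be assigned to `options.thresholds`.
const thresholds = buildAbortThresholds(SLO_P95_MS);
console.log(thresholds['http_req_duration{endpoint:sign_in}'][0].threshold); // p(95)<300
```

Generating the thresholds from one budget map keeps the abort limits in step with the SLO table if a target changes.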

Artifact capture

At each breach moment, capture:

A Grafana dashboard snapshot covering the 5 minutes surrounding the breach.
Jaeger exemplar trace IDs for the slowest requests at the breach point.
The k6 summary output at the time of abort.

6. Bottleneck-Hunting Runbook

Symptom-to-diagnosis map

Metric Pattern	Diagnosis	PromQL Example
`duration_milliseconds` p95 rising across all routes simultaneously	CPU or RAM saturation at the container level. Check container health before investigating application code.	`histogram_quantile(0.95, sum(rate(duration_milliseconds_bucket{span_kind="SPAN_KIND_SERVER"}[1m])) by (le, service_name))`
`prisma:engine:db_query` p95 rising	Postgres connection pool saturation. The default pool size is ~10 connections per service instance.	`histogram_quantile(0.95, sum(rate(duration_milliseconds_bucket{span_name="prisma:engine:db_query"}[1m])) by (le, service_name))`
ioredis command rate hitting a ceiling	Cache Redis instance is saturated (CPU-bound, since Redis is single-threaded).	`sum(rate(calls_total{span_kind="SPAN_KIND_CLIENT",span_name=~"(?i)(get|set|del|evalsha|hget|hset|expire|publish)"}[1m])) by (service_name)`
`bullmq_queue_jobs{state="wait"}` growing over time	Worker throughput is below arrival rate. Either add workers or optimize job processing time.	`bullmq_queue_jobs{state="wait"}` (raw gauge, watch for sustained positive slope)
rt socket count per instance exceeds expected capacity	The rt service stores socket state in an in-process Map. Without a Redis adapter for socket.io, sticky sessions are required and per-instance capacity is bounded by memory.	No built-in metric; monitor via `docker exec ebit-rt node -e "..."` or add a custom gauge. The rt service does not currently expose a connection count metric.
Container CPU utilization above 90% or RSS approaching memory limit	Vertical scaling ceiling reached. Scale up (larger instance) or scale out (more replicas).	Not available without cadvisor/node_exporter -- this is a known observability gap. Use `docker stats` as a stopgap.
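`k6_http_req_failed` rising	Requests are failing. By default k6 counts any response with a status outside 200-399 as failed, so this covers 4xx responses and network errors as well as upstream 5xx errors. Correlate the failing route with Jaeger exemplar traces to identify the root cause.	`sum(rate(k6_http_req_failed[1m])) by (url)` (available only when k6 remote-write is active)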

Investigating a specific breach

Open the Grafana dashboard and identify the timestamp where the SLO breach begins.
Filter Jaeger traces to the affected service and time window. Sort by duration descending.
In the slowest trace, identify which span contributes the most wall-clock time (database query, Redis call, downstream HTTP, or application code).
Cross-reference with Loki logs for the same trace_id to check for error messages or warnings.
If the bottleneck is infrastructure (Postgres, Redis), check connection pool metrics and resource utilization. If it is application code, profile the specific handler.
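
The third step above, finding the span that contributes the most wall-clock time, can be approximated by ranking spans by self time (a span's duration minus the durations of its direct children). The span shape below is a hypothetical simplification rather than Jaeger's export format, and the subtraction assumes that child spans do not overlap in time.

```javascript
// spans: [{ id, parentId, name, durationMs }] for a single trace.
// Returns spans ranked by self time, largest first.
function rankBySelfTime(spans) {
  const childTotal = new Map();
  for (const span of spans) {
    if (span.parentId != null) {
      const soFar = childTotal.get(span.parentId) || 0;
      childTotal.set(span.parentId, soFar + span.durationMs);
    }
  }
  return spans
    .map((span) => ({
      name: span.name,
      // Clamp at zero in case overlapping children exceed the parent's duration.
      selfMs: Math.max(0, span.durationMs - (childTotal.get(span.id) || 0)),
    }))
    .sort((a, b) => b.selfMs - a.selfMs);
}

// Hypothetical trace for a slow bet request.
const trace = [
  { id: 'a', parentId: null, name: 'POST /casino/games/house/dice/bet', durationMs: 120 },
  { id: 'b', parentId: 'a', name: 'prisma:engine:db_query', durationMs: 85 },
  { id: 'c', parentId: 'a', name: 'evalsha', durationMs: 10 },
];
const ranked = rankBySelfTime(trace);
console.log(ranked[0].name, ranked[0].selfMs); // prisma:engine:db_query 85
```

In this made-up trace the handler itself accounts for only 25 ms, so the database query is the first place to look.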

7. Reproduction

All commands assume the working directory is the repository root.

Start infrastructure

sudo docker compose up -d

This starts Postgres, Redis, RabbitMQ, and the observability stack (Prometheus, Grafana, Loki, Jaeger, OTel Collector).

Run a k6 smoke test

k6 run --out experimental-prometheus-rw tests-perf/k6/smoke.js

The smoke test sends minimal traffic (1-5 VUs) to verify that all endpoints respond correctly and that k6 metrics appear in Grafana.

Run the stepped ramp

k6 run --out experimental-prometheus-rw tests-perf/profiles/stepped-ramp.js

This executes the full 27-minute ramp described in Section 5. Auto-abort thresholds are configured within the script.

Run the Playwright canary

npx playwright test tests-perf/playwright-canary/

Run this in a separate terminal while k6 is active. The canary reports LCP, FCP, and bet-place latency as the system is under load.

Open dashboards

Tool	URL
Grafana	`http://localhost:3000`
Jaeger	`http://localhost:16686`

8. Limitations

Co-located load generation. On a single-VM setup, k6 competes with the services under test for CPU and RAM. Observed latencies and throughput ceilings will both be pessimistic: latencies read higher and ceilings read lower than on dedicated hardware, and k6 itself may become the bottleneck before the services do.
Stubbed external services. reCAPTCHA validation is bypassed in the test environment. The Fast Track integration is disabled (the RabbitMQ producer is stubbed). The EVO wallet is stubbed. These stubs remove real-world latency and failure modes from the test.
Single Postgres instance. Production may use read replicas to offload query traffic. The test environment runs a single instance, so Postgres will saturate earlier than in a replicated deployment.
No Redis Cluster. Redis runs as a single instance. In production, Redis Cluster would provide higher throughput through sharding, and either Cluster or Sentinel would provide failover.
rt WebSocket lacks a Redis adapter. The socket.io gateway in the rt service does not use a Redis adapter for pub/sub. This means horizontal scaling requires sticky sessions, and per-instance socket capacity is bounded by the in-process connection Map.
k6 WebSocket scenario scope. The k6 WebSocket scenario measures transport-level metrics (handshake time, message round-trip). It does not exercise full game logic, which requires multi-step stateful interactions that are better validated by the Playwright canary.

9. Kernel Tuning Checklist (Load Generator)

These settings are required on the load generator machine to sustain 10,000+ concurrent connections. Apply them before running the stepped ramp.

Setting	Command	Value	Rationale
Open file limit	`ulimit -n 65536`	65536	Each TCP socket consumes one file descriptor. The default limit (typically 1024) is insufficient for 10k connections.
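Listen backlog	`sysctl net.core.somaxconn=65535`	65535	Increases the accept queue depth for listening sockets, preventing connection drops under burst. This affects the host that runs the services under test, which is the same machine in the single-VM setup.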
TIME_WAIT reuse	`sysctl net.ipv4.tcp_tw_reuse=1`	1	Allows reuse of sockets in TIME_WAIT state for new outbound connections, reclaiming ports faster.
Local port range	`sysctl net.ipv4.ip_local_port_range="1024 65535"`	1024-65535	Expands the ephemeral port range from the default (~28k ports) to ~64k ports.
FIN timeout	`sysctl net.ipv4.tcp_fin_timeout=15`	15 seconds	Reduces the time sockets spend in FIN_WAIT_2 state, accelerating teardown. Default is 60 seconds.
System-wide fd limit	`sysctl fs.file-max=2097152`	2097152	Raises the kernel-level cap on open file descriptors across all processes.

To apply sysctl settings persistently, add them to `/etc/sysctl.d/99-perf.conf` and run `sysctl --system`. The `ulimit` setting must be configured in `/etc/security/limits.conf` or the shell profile for persistence.
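
The port-range and TIME_WAIT rows in the table can be sanity-checked with simple arithmetic. Each outbound connection to a single destination address and port needs its own local port, and on Linux a closed connection's port stays in TIME_WAIT for about 60 seconds. The sketch below estimates the sustainable rate of new connections under those assumptions (one destination, `tcp_tw_reuse` disabled). The function names are illustrative, and 32768-60999 is taken as the typical Linux default range.

```javascript
const TIME_WAIT_SECONDS = 60; // Linux holds closed sockets in TIME_WAIT for about 60 s

// Number of usable local ports in an inclusive range.
function portCount(low, high) {
  return high - low + 1;
}

// Rough ceiling on new connections per second to one destination when every
// closed connection keeps its local port busy for the full TIME_WAIT period.
function maxNewConnectionsPerSecond(low, high) {
  return Math.floor(portCount(low, high) / TIME_WAIT_SECONDS);
}

console.log(portCount(32768, 60999), maxNewConnectionsPerSecond(32768, 60999)); // 28232 470
console.log(portCount(1024, 65535), maxNewConnectionsPerSecond(1024, 65535)); // 64512 1075
```

Both ranges can hold 10,000 long-lived connections. The wider range and `tcp_tw_reuse` matter mainly when the test opens and closes connections at a high rate.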