Troubleshooting Runbooks

Quick-reference guides for common local-dev and infrastructure issues across the Evospin platform.

Index

| Runbook | Symptom |
| --- | --- |
| Trace missing from Jaeger | Request spans don't appear in the Jaeger UI |
| BullMQ job stuck | Queued job stays in waiting, active, or failed state indefinitely |
| Loki missing logs | Grafana Logs dashboard shows no results for a service |
| Login fails (bcrypt) | "Invalid credentials" despite correct password on seeded user |
| reCAPTCHA fails locally | Auth endpoints return captcha validation errors in local dev |
| 2FA unknown secret | Admin account requires MFA and the TOTP secret is lost |
| npm EACCES on host | npm install fails with permission denied on node_modules/ |
| Postgres under high load | DB CPU pegged; p95 climbs across the API; Prisma errors / pool exhaustion |
| Postgres unreachable | Can't reach database server; ebit-db container exited or restart-looping |
| Speed-roulette round stuck | Shared roulette round doesn't advance for > 90 s; state queue wedged |
| Redis under memory pressure | cache Redis RSS climbing toward mem_limit; OOM command not allowed; eviction rate spike |
| Captcha provider down (break-glass) | Sign-up / sign-in failing with AUTH_INVALID_CAPTCHA at scale; reCAPTCHA upstream impaired |
| ebit-rt connection saturation | Websocket clients refused (Too many connections / Too many requests); rt CPU pegged; single-replica ceiling hit |

Structure

Every runbook follows the same template:

  1. Symptom — what the developer sees
  2. Likely cause — the most common root cause
  3. Diagnosis — commands to confirm the hypothesis
  4. Fix — step-by-step resolution (multiple options when applicable)
  5. Prevention — how to avoid recurrence
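
The five sections above could be scaffolded with a small helper like the one below. This is only a sketch: the section names come from the list above, but the function name `renderRunbookSkeleton`, the Markdown heading levels, and the `TODO` placeholders are assumptions for illustration, not an existing project tool.

```typescript
// Section names are taken from the template list; the heading levels
// and placeholder text are assumptions made for this example.
const RUNBOOK_SECTIONS = [
  "Symptom",
  "Likely cause",
  "Diagnosis",
  "Fix",
  "Prevention",
] as const;

// Hypothetical helper: builds a Markdown skeleton for a new runbook,
// with one numbered heading per template section.
function renderRunbookSkeleton(title: string): string {
  const sections = RUNBOOK_SECTIONS.map(
    (name, i) => `## ${i + 1}. ${name}\n\nTODO\n`
  );
  return [`# ${title}\n`, ...sections].join("\n");
}

console.log(renderRunbookSkeleton("BullMQ job stuck"));
```

Keeping the section names in a single constant means a change to the template only has to be made in one place.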

Adding a new runbook

  1. Create docs/runbooks/<short-slug>.md using the template above
  2. Add a row to the index table in this file
  3. Keep the slug lowercase, hyphenated, and under 30 characters
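
The slug rules in step 3 can be checked mechanically. The sketch below assumes that "under 30 characters" means at most 29 characters and that digits are allowed alongside lowercase letters. The function names and the example slug `bullmq-job-stuck` are hypothetical, not taken from the repository.

```typescript
// "Under 30 characters" is read here as a strict upper bound (assumption).
const MAX_SLUG_LENGTH = 30;

// Lowercase alphanumeric words joined by single hyphens.
// Allowing digits is an assumption (e.g. for a "2fa-..." slug).
const SLUG_PATTERN = /^[a-z0-9]+(?:-[a-z0-9]+)*$/;

// Hypothetical helper: true when the slug satisfies the naming rules.
function isValidRunbookSlug(slug: string): boolean {
  return slug.length < MAX_SLUG_LENGTH && SLUG_PATTERN.test(slug);
}

// Hypothetical helper: maps a valid slug to its path under docs/runbooks/.
function runbookPath(slug: string): string {
  if (!isValidRunbookSlug(slug)) {
    throw new Error(`Invalid runbook slug: ${slug}`);
  }
  return `docs/runbooks/${slug}.md`;
}

console.log(runbookPath("bullmq-job-stuck")); // prints "docs/runbooks/bullmq-job-stuck.md"
```

A check like this could run in a pre-commit hook or CI step so that a badly named file is caught before the index row is added.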