ADR-0003: BullMQ for production async; RabbitMQ kept but stubbed
- Status: Accepted
- Date: 2026-04-16 (original); expanded 2026-04-25
- Author(s): Platform engineering
Context
ebit-api/ ships with two message-broker integrations:
- BullMQ (`@nestjs/bullmq`): Redis-backed task queues, in production use across the entire codebase.
- RabbitMQ (`@golevelup/nestjs-rabbitmq`): AMQP broker, present in `docker-compose.yml`, present in module imports, but wired to a stub (`disabled = true` at `apps/api/src/fast-track/rabbitmq/fast-track.rmq.module.ts:8`).
The duality is historical, not by design. RabbitMQ was introduced for a planned Fast Track CRM integration that would emit transactional and bonus events to an external CRM via AMQP. The integration was never completed: the producer was stubbed pending a product decision, and the broker stayed in the compose file so the import graph wouldn't fail. Eight months on, the stub is unchanged and BullMQ has absorbed every async use case that emerged in the meantime.
This ADR codifies the decision so future engineers — including new ones who see two brokers and assume both carry traffic — don't burn time debugging an empty RabbitMQ vhost.
Inventory of async work in the codebase
All of the following ride BullMQ on the cache Redis instance (port 6379):
| Domain | Producer | Consumer |
|---|---|---|
| Auth session updates | `apps/api/src/auth/session/session.queue-producer.ts` | session worker |
| Bets settlement | `apps/api/src/bet/queue/bet.queue-producer.ts` | `bet.queue-processor.ts` |
| Bot activity | `apps/api/src/bots/system/bull/` | bot scheduler |
| Challenge progress | `apps/api/src/challenge/...` | challenge worker |
| Leaderboard updates | `apps/api/src/leaderboard/...` | leaderboard worker |
| Promo tasks | `apps/api/src/promo/...` | promo worker |
| User stats migration | `apps/api/src/user/...` | one-shot |
| Skindeck deposits | `apps/api/src/payment/provider/integration/skindeck/...` | settlement worker |
| Speed-roulette state | `apps/speed-roulette/src/...` (state queue) | per-table worker |
| Speed-roulette bets | `apps/speed-roulette/src/...` (bet queue) | settlement |
All of the following ride the RabbitMQ stub (no traffic ever leaves the process):
- `apps/api/src/bet/bet.service.ts` × 4 call sites: `emitTransaction(...)`.
- `apps/api/src/promo/promo-effect.service.ts` × 7 call sites: `emitBonus(...)`.
Both classes of call site invoke methods on FastTrackRabbitMQProducer. The provider is replaced with a Stub at module-register time; every call resolves to a no-op promise.
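
The stub swap described above can be sketched in plain TypeScript. This is a minimal illustration, not the module's actual code: the interface shape, the payload types, and the `StubFastTrackProducer` and `createFastTrackProducer` names are assumptions. Only `FastTrackRabbitMQProducer`, `emitTransaction`, `emitBonus`, and the `disabled` flag come from the codebase as described in this ADR.

```typescript
// Hypothetical payload shapes; the real ones live in the Fast Track module.
interface TransactionEvent { userId: string; betId: string; amount: number }
interface BonusEvent { userId: string; bonusId: string }

// Assumed shape of the producer interface the 11 call sites depend on.
interface FastTrackProducer {
  emitTransaction(event: TransactionEvent): Promise<void>;
  emitBonus(event: BonusEvent): Promise<void>;
}

// Stand-in for the real AMQP-backed producer (publish logic omitted).
class FastTrackRabbitMQProducer implements FastTrackProducer {
  async emitTransaction(_event: TransactionEvent): Promise<void> {
    throw new Error("real AMQP publish is not part of this sketch");
  }
  async emitBonus(_event: BonusEvent): Promise<void> {
    throw new Error("real AMQP publish is not part of this sketch");
  }
}

// The stub: same interface, every call resolves without doing any I/O.
class StubFastTrackProducer implements FastTrackProducer {
  emitTransaction(_event: TransactionEvent): Promise<void> {
    return Promise.resolve();
  }
  emitBonus(_event: BonusEvent): Promise<void> {
    return Promise.resolve();
  }
}

// Mirrors the `disabled = true` flag: decided once, at module-register time.
function createFastTrackProducer(disabled: boolean): FastTrackProducer {
  return disabled ? new StubFastTrackProducer() : new FastTrackRabbitMQProducer();
}

const producer = createFastTrackProducer(true);
// Call sites stay unchanged; with the stub nothing leaves the process.
const pending = producer.emitTransaction({ userId: "u1", betId: "b1", amount: 10 });
```

Because both classes satisfy the same interface, flipping the flag changes behaviour without touching any of the call sites, which is why the stub is cheap to keep.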
Decision
- BullMQ is the canonical production async transport. New async work goes here; do not add to the RabbitMQ producer.
- RabbitMQ stays in compose because removing it would require coordinated source changes across the `FastTrackRabbitMQModule` import + the 11 call sites + `BROKER_URI` env handling. That refactor is reserved for the day product confirms either (a) Fast Track is permanently abandoned, or (b) Fast Track is re-enabled and the stub flips to real.
- The stub is documented, not hidden. The `disabled = true` flag at the top of `fast-track.rmq.module.ts:8` is read on every container start. A comment block above `ebit-rabbitmq:` in `docker-compose.yml` explains "this broker boots empty by design."
- OTel traces show BullMQ enqueue as ioredis `EVALSHA` spans; they do not appear as a separate broker hop. Operators chasing a "missing broker hop" are tracing this correctly; BullMQ rides Redis Lua scripts.
- Cross-job traceparent is not propagated. The BullMQ bet-settled consumer (and others) start an orphan trace because the producer doesn't write `traceparent` into the job payload. Documented limitation; correlate by user/bet id when needed. See `docs/audits/perf-trace-coverage-audit.md` and the `project_otel_microservice_transport_gap` memory.
Considered alternatives
A. Remove RabbitMQ from compose entirely
The simplest cleanup. Rejected because:
- The `FastTrackRabbitMQModule` is still imported by `apps/api/src/app.module.ts`.
- Removing the broker without also removing the module would surface as Nest startup failures (the module reads `BROKER_URI` even when disabled; the URI must resolve to a host that responds, even if the producer is stubbed).
- 11 call sites would need refactoring to drop the `emitTransaction` / `emitBonus` invocations or replace them with no-ops not gated by the module.
- This is a real refactor, not a compose tweak; it stays on the backlog until a product owner explicitly retires Fast Track.
B. Wire up a real Fast Track sandbox in dev
Replace the stub with the real producer pointing at a Fast Track-supplied sandbox. Rejected for now because:
- Requires external coordination (Fast Track-issued sandbox URL + JWT keypair). The compose env values `FASTTRACK_JWT_PRIVATE_KEY` / `FASTTRACK_JWT_PUBLIC_KEY` are placeholders ("local-stub-not-used").
- No business case to do this absent a product decision to re-launch the integration.
- Re-evaluate if product confirms Fast Track returns; until then the stub is correct behaviour.
C. Delete the Fast Track module entirely
Tear out the 11 call sites + the module + the broker. Rejected because:
- It's a destructive change. Restoring Fast Track later means re-introducing 11 call sites in their correct semantic positions across `bet` and `promo` flows. Risk of getting the semantics wrong on re-introduction.
- The cost of the stub today is ~300 MB idle RAM and a small amount of confusion mitigated by this ADR. Not worth a destructive change.
D. Migrate the Fast Track stub to BullMQ
Use BullMQ for what the stubbed RabbitMQ producer would have emitted. Rejected because:
- Fast Track expects AMQP semantics (topic exchanges, headers exchanges, durable bindings) that BullMQ doesn't model natively. A migration would mean re-encoding the routing logic in the BullMQ worker.
- If Fast Track is ever turned on, AMQP-shaped routing is the native fit. Pre-migrating to BullMQ would either need a second migration back to RabbitMQ at that point, or commit the team to building a BullMQ-shaped adapter for Fast Track's CRM. Neither is a clear win.
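
To make the routing cost concrete, here is a minimal sketch of the topic-exchange matching that a BullMQ worker would have to re-implement by hand. The wildcard rules (`*` matches exactly one dot-separated word, `#` matches zero or more words) are standard AMQP 0-9-1 topic semantics. The function name and the routing keys are hypothetical, since this ADR does not record the keys Fast Track expects.

```typescript
// Returns true when an AMQP-style topic pattern matches a routing key.
// '*' matches exactly one word; '#' matches zero or more words.
function topicMatches(pattern: string, routingKey: string): boolean {
  const p = pattern.split(".");
  const k = routingKey.split(".");

  // match(i, j): does pattern[i..] match key[j..]?
  const match = (i: number, j: number): boolean => {
    if (i === p.length) return j === k.length;
    if (p[i] === "#") {
      // Try letting '#' absorb 0, 1, 2, ... of the remaining words.
      for (let next = j; next <= k.length; next++) {
        if (match(i + 1, next)) return true;
      }
      return false;
    }
    if (j === k.length) return false;
    return (p[i] === "*" || p[i] === k[j]) && match(i + 1, j + 1);
  };

  return match(0, 0);
}

// Hypothetical routing keys, for illustration only.
const oneWord = topicMatches("bet.*.settled", "bet.eu.settled"); // true
const zeroWords = topicMatches("bet.#", "bet");                  // true
const tooLong = topicMatches("bet.*", "bet.eu.settled");         // false
const otherTopic = topicMatches("bonus.#", "bet.settled");       // false
```

RabbitMQ evaluates these bindings inside the broker. On BullMQ the equivalent logic would live in application code, alongside whatever the headers exchanges and durable bindings require.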
E. Pick a third broker (NATS / Kafka)
Replace BullMQ with a "real" message broker for production async work. Rejected because BullMQ is sufficient at our scale, the operational tooling (redis-cli, BullBoard) is well-understood by the team, and migrating away from BullMQ would be a multi-month project against zero current pain.
Consequences
Operational
- Debug stalled jobs in Redis, not RabbitMQ:

  ```bash
  redis-cli -a cache KEYS 'bull:*'               # list queues
  redis-cli -a cache LLEN bull:bet-settle:wait   # pending in queue
  ```

  The RabbitMQ management UI (`http://localhost:15672`, `rabbitmq/rabbitmq`) will always show vhost `ft` empty; that's by design.
- BullBoard (or `@nestjs/bullmq` UI add-on) is the right inspection tool for queue depth, retries, failures.
- Idle RAM cost of the RabbitMQ container is ~300 MB. Tolerable on 16 GB hosts; noticeable on 8 GB. Local dev can `docker compose stop ebit-rabbitmq` without breaking anything (the producer is stubbed), but `docker compose down` and back up restores it.
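
A caveat on the `redis-cli` commands in the first bullet: BullMQ does not store every job state as a Redis list, so `LLEN` only works for some keys. The sketch below maps each state to the matching length command. It assumes the default `bull` key prefix and reuses the `bet-settle` queue name from the example; the helper name is hypothetical, and the state-to-type mapping reflects BullMQ's usual key layout, so check it against the installed version.

```typescript
// Redis data type behind each BullMQ job state (assumed from BullMQ's usual layout).
const STATE_TYPES = {
  wait: "list",
  active: "list",
  delayed: "zset",
  completed: "zset",
  failed: "zset",
} as const;

type JobState = keyof typeof STATE_TYPES;

// Builds the Redis key and picks the length command that matches its data type.
function lengthCommand(queue: string, state: JobState, prefix = "bull"): string {
  const key = `${prefix}:${queue}:${state}`;
  const command = STATE_TYPES[state] === "list" ? "LLEN" : "ZCARD";
  return `${command} ${key}`;
}

const waitCmd = lengthCommand("bet-settle", "wait");     // "LLEN bull:bet-settle:wait"
const failedCmd = lengthCommand("bet-settle", "failed"); // "ZCARD bull:bet-settle:failed"
```

Running `LLEN` against a sorted-set key returns a `WRONGTYPE` error, which is easy to misread as a missing queue.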
Observability
- BullMQ enqueue → ioredis `EVALSHA` in OTel traces. Operators searching Jaeger for a "broker hop" between producer and consumer will not find one. Cross-link `docs/engineering/observability-runbook.md` §5 ("Cross-service tracing gotchas").
- BullMQ consumer starts an orphan trace. No `traceparent` is in the job payload, so the consumer's root span is not linked to the producer's. Same workaround as the Redis pub/sub case (ADR-0005): correlate by user_id / bet_id in Loki, not by trace_id.
- The bet-settled consumer is the most-cited orphan case in the perf programme. Documented in `docs/audits/perf-trace-coverage-audit.md`.
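
For reference, closing the propagation gap would mean carrying a W3C `traceparent` string in the job payload and parsing it on the consumer side. The sketch below shows only the string handling, with no OTel SDK calls, and is not what the codebase does today. The payload shape and function names are hypothetical; the `version-traceid-spanid-flags` layout follows the W3C Trace Context format.

```typescript
// Hypothetical job payload: business fields plus an optional trace header.
interface BetSettledJob {
  betId: string;
  userId: string;
  traceparent?: string; // absent today, which is why consumers start orphan traces
}

interface ParentContext { traceId: string; spanId: string; sampled: boolean }

// W3C Trace Context: 2 hex version, 32 hex trace-id, 16 hex span-id, 2 hex flags.
const TRACEPARENT = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

// Producer side: serialise the current span context into the header format.
function formatTraceparent(ctx: ParentContext): string {
  return `00-${ctx.traceId}-${ctx.spanId}-${ctx.sampled ? "01" : "00"}`;
}

// Consumer side: returns the parent context, or undefined when the header
// is missing or invalid (the consumer would then start a new root span).
function parseTraceparent(header: string | undefined): ParentContext | undefined {
  const m = header ? TRACEPARENT.exec(header) : null;
  if (!m) return undefined;
  const [, version, traceId, spanId, flags] = m;
  // Version ff and all-zero ids are invalid per the spec.
  if (version === "ff" || /^0+$/.test(traceId) || /^0+$/.test(spanId)) return undefined;
  return { traceId, spanId, sampled: (parseInt(flags, 16) & 1) === 1 };
}

const job: BetSettledJob = {
  betId: "b1",
  userId: "u1",
  traceparent: formatTraceparent({
    traceId: "4bf92f3577b34da6a3ce929d0e0e4736",
    spanId: "00f067aa0ba902b7",
    sampled: true,
  }),
};
const parent = parseTraceparent(job.traceparent); // linked to the producer's trace
const orphan = parseTraceparent(undefined);       // today's behaviour: no parent
```

Until something like this exists, the `user_id` / `bet_id` correlation in Loki remains the supported workaround.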
Code-shape constraints
- New async work goes through `apps/api/src/<domain>/queue/` with a `*.queue-producer.ts` and `*.queue-processor.ts` pair, registered via `BullModule.registerQueueAsync(...)`. The bet-queue (`apps/api/src/bet/queue/bet.queue-producer.ts` + `bet.queue-processor.ts`) is the canonical example to copy. See `add-bullmq-queue.md` for the recipe.
- Do not add to `apps/api/src/fast-track/rabbitmq/`. That module is frozen pending product decision.
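
The producer/processor split can be sketched without any framework wiring. The example below is an illustration, not a copy of the bet queue: `QueueLike` is a hypothetical stand-in that mirrors the `add(name, data)` shape of a BullMQ queue, and the class names, job name, and payload are assumptions. In the real codebase the queue would be injected by `@nestjs/bullmq` and the processor would run inside a worker; follow the recipe for the actual registration.

```typescript
// Hypothetical payload for a settlement job.
interface SettleBetData { betId: string; userId: string }

interface JobLike<T> { name: string; data: T }

// Minimal stand-in mirroring the add(name, data) shape of a BullMQ queue.
interface QueueLike<T> {
  add(name: string, data: T): Promise<JobLike<T>>;
}

// In-memory queue used only to make this sketch runnable.
class InMemoryQueue<T> implements QueueLike<T> {
  readonly jobs: JobLike<T>[] = [];
  add(name: string, data: T): Promise<JobLike<T>> {
    const job = { name, data };
    this.jobs.push(job);
    return Promise.resolve(job);
  }
}

// *.queue-producer.ts: the only place that knows the job name and payload shape.
class BetQueueProducer {
  private readonly queue: QueueLike<SettleBetData>;
  constructor(queue: QueueLike<SettleBetData>) {
    this.queue = queue;
  }
  enqueueSettlement(betId: string, userId: string): Promise<JobLike<SettleBetData>> {
    return this.queue.add("settle-bet", { betId, userId });
  }
}

// *.queue-processor.ts: handles one job at a time; throwing signals a failed job.
class BetQueueProcessor {
  readonly settled: string[] = [];
  async process(job: JobLike<SettleBetData>): Promise<void> {
    if (job.name !== "settle-bet") throw new Error(`unexpected job: ${job.name}`);
    this.settled.push(job.data.betId);
  }
}

const queue = new InMemoryQueue<SettleBetData>();
const betProducer = new BetQueueProducer(queue);
const processor = new BetQueueProcessor();

void betProducer.enqueueSettlement("b1", "u1");
// Drain the in-memory queue the way a worker would pull jobs.
for (const job of queue.jobs) void processor.process(job);
```

Keeping the job name and payload type on the producer side means callers in the domain service never touch the queue directly, which is what makes the pair easy to copy for a new domain.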
Future Fast Track decision
If product decides to ship Fast Track:
- Coordinate Fast Track sandbox creds; populate `FASTTRACK_JWT_*` in Doppler.
- Set `disabled = false` at `fast-track.rmq.module.ts:8`.
- Verify the 11 call sites against the real producer in a sandbox round-trip.
- Update this ADR's status to `Superseded by ADR-NNNN` (the new ADR codifying the live integration).
If product decides Fast Track is permanently dead:
- Remove the 11 call sites.
- Remove the `FastTrackRabbitMQModule` import from `app.module.ts`.
- Remove the `ebit-rabbitmq` service from `docker-compose.yml`.
- Remove `BROKER_URI` from `.example.env` and Doppler.
- Update this ADR's status to `Deprecated`.
Until one of those happens, the stub is correct.
References
- `apps/api/src/fast-track/rabbitmq/fast-track.rmq.module.ts:8`: `disabled = true`.
- `apps/api/src/fast-track/rabbitmq/fast-track.rmq.producer.ts`: interface.
- `apps/api/src/bet/bet.service.ts`: × 4 stub call sites.
- `apps/api/src/promo/promo-effect.service.ts`: × 7 stub call sites.
- `apps/api/src/bet/queue/bet.queue-producer.ts` + `bet.queue-processor.ts`: canonical BullMQ producer/consumer pair.
- `docker-compose.yml`: `ebit-rabbitmq:` service, comment block.
- `CLAUDE.md`: "BullMQ, not RabbitMQ" workspace summary.
- `docs/architecture.md`: AF-6 weakness entry.
- `docs/audits/perf-trace-coverage-audit.md`: orphan trace cases.
- `docs/engineering/observability-runbook.md` §5: cross-service trace gotchas.
- `docs/recipes/add-bullmq-queue.md`: internal recipe for new queues.
- Memory: `project_otel_microservice_transport_gap.md` (trace propagation gap).
- Sibling ADRs: 0005 (Redis pub/sub), 0011 (cross-app calls), 0012 (sampling interaction).