Flow: rt websocket connect + events (/events namespace)¶
Trace IDs:
715c41f678f84462eab761c6f423bacc(send /events · op=AuthSuccess · 0.24 ms) — per-emit, single-span. Jaeger: http://localhost:16686/trace/715c41f678f84462eab761c6f423bacc · E2E:tests-e2e/tests/dropbet-rt-websocket.spec.tsGenerated: 2026-04-16 · Services touched (trace):ebit-rtonly (connection auth RPC → ebit-api is not trace-linked — §6.1)
1. User-visible contract¶
ebit-rt (apps/rt) is Evospin's websocket surface. main.ts boots with http: null — the process only serves socket.io on :4001, namespace /events, transports: ['websocket'] (polling is off server-side at client.gateway.ts:51). CORS: APP_FE_ORIGIN or http://localhost:3000, credentials on. Max message length: WS_CLIENT_MESSAGE_MAX_LEN (default 8192 bytes).
Contract observed by browser clients (ebit-fe src/providers/socketsWithAuth.tsx is the reference consumer):
- Connect handshake carries
socket_tokenviaauth: { socket_token }, orheaders.socket_token, orcookie: socket_token=…(extractSocketAuthTokenatapps/rt/src/utils.ts:24). The token is issued by/auth/verify-2fa(and/auth/sign-infor non-2FA users) alongsideaccess_token+refresh_token. - Server emits
AuthSuccess({success:true}) on successful validation, orAuthErrorthendisconnect()on failure or over-length token. If no token is presented at all, neither event fires — the socket stays connected, unauthenticated, so the client can later send anAuthorizationmessage (client.gateway.ts:97gates the emit onif (accessToken)). - Runtime re-auth via
emit('Authorization', { accessToken })— bypasses theclientListenedEventsgate (explicit check atclient.gateway.ts:169). - Custom client events fan out via
BackendService.emitEvent(fire-and-forget publish to the Redis gateway transport) whenisAckis false, orsendEvent(RPC with 5 s timeout + reply) when the client usesemitWithAck. Only event names registered intoclientListenedEventsvia thePrivate.RequestClientEventscollection round-trip are forwarded; unknown events are silently warn-logged and dropped (client.gateway.ts:178-183). - Server → client pushes land through
BackendService.onServerEvent→ClientGateway.handleServerEvent. Routing: ifmessage.userset → only that user's sockets; ifmessage.room→server.to(room).emit; otherwiseserver.emit(broadcast). Each emit produces onesend /eventsspan in Jaeger. - Rooms are server-authoritative: a client emits an event whose RPC reply has
_type=ClientRoom, action=join|leave, rooms[]— the gateway then callsclient.join(rooms)orclient.leave(room)(client.gateway.ts:229-248). Clients cannot unilaterally join. - Liveness: every minute
warpUpOnlineUserswalksclientSocketsand re-zadds each authenticated user id into the cache-RedisONLINE_USERS_KEYzset, score=now (ONLINE_USER_TTL_SECONDS). A separate cron on the ebit-api side (online-tracker.service.ts:30, every 10 s) broadcastsUsersOnlineUpdated— with an inflated count, see §6.3.
2. Sequence diagram¶
sequenceDiagram
participant C as Browser (socket.io-client)
participant RT as ebit-rt (:4001 /events)
participant R as Redis (gateway pub/sub)
participant API as ebit-api (auth controller)
participant CACHE as Redis (cache — ONLINE_USERS)
C->>RT: WS upgrade + auth={socket_token}
RT->>RT: extractSocketAuthToken (auth → headers → cookie)
RT->>R: publish Private.AuthSocket (ClientProxy.send, 5s timeout)
R->>API: deliver (Redis transport — no traceparent, §6.1)
API->>API: JwtGuard validates accessToken → user
API-->>R: AuthCheckResult { user }
R-->>RT: reply
RT->>CACHE: zadd ONLINE_USERS_KEY score=now member=userId
RT-->>C: emit AuthSuccess { success:true } — span send /events
Note over API,C: 10 s ticker
API->>R: emitEvent Server.UsersOnlineUpdated { value }
R->>RT: MessagePattern `server_channel_event.*` (GatewayController.serverEvent)
RT-->>C: emit UsersOnlineUpdated (broadcast — span send /events)
Note over C,RT: client emits unknown event
C->>RT: emit UnknownThing {…}
RT-->>RT: not in clientListenedEvents — warn-log and drop
Note over C,RT: disconnect
C->>RT: disconnect()
RT->>RT: handleDisconnect → clientSockets.delete(client.id)
3. Component diagram¶
Edges are numbered in connect-then-event-loop order. Section §4 below has the same numbers — each (N) on the diagram has its own §4.N subsection, so you can click straight through.
flowchart TD
%% Datastores
rd[("Redis (gateway pub/sub)<br/>server_channel_event.* · private_channel_event.* · Private.AuthSocket RPC")]
cache[("Redis (cache)<br/>ONLINE_USERS_KEY zset · TTL sweep")]
%% Browser
subgraph br["Browser (socket.io-client)"]
fe["SocketsWithAuthProvider<br/><i>ebit-fe/src/providers/socketsWithAuth.tsx</i>"]
end
%% ebit-rt process
subgraph rt["ebit-rt :4001 (socket.io, /events, websocket transport)"]
cg["ClientGateway<br/><i>@WebSocketGateway(namespace=events) + WildcardsIoAdapter</i>"]
auth["AuthService.authorizeConnection<br/><i>extractSocketAuthToken + authCheck</i>"]
bs["BackendService<br/><i>sendEvent (5s RPC) / emitEvent (fire-forget)</i>"]
gc["GatewayController.serverEvent<br/><i>@MessagePattern server_channel_event.*</i>"]
online["OnlineTrackerService (rt)<br/><i>warmUpUserOnline + cleanColdUsers cron</i>"]
end
%% ebit-api process (reached via Redis pub/sub, AF-2 trace gap)
subgraph api["ebit-api :4000 (reached via Redis pub/sub — AF-2)"]
authCtl["AuthController.authSocket<br/><i>@GatewayMethod(Private.AuthSocket) + JwtGuard</i>"]
onlineApi["OnlineTrackerService (api)<br/><i>10s cron · zcard + fakeUserOnline (AF-5)</i>"]
notifier["ProfileNotifier / game services<br/><i>EventsGateway.emitEvent Server.*</i>"]
end
%% (1)-(5) Connect + auth handshake
fe -- "(1) WS upgrade + auth={socket_token}" --> cg
cg -- "(2) authorizeClient → authCheck" --> auth
auth -- "(3) sendEvent Private.AuthSocket (5s, AF-2)" --> rd
rd -- "(4) @MessagePattern dispatch" --> authCtl
cg -- "(5) warmUpUserOnline (zadd)" --> online
online -- "(5a) zadd ONLINE_USERS_KEY" --> cache
cg -- "(6) emit AuthSuccess (send /events span)" --> fe
%% (7)-(8) Client-initiated emit loop
fe -- "(7) emit Authorization / room event / unknown" --> cg
cg -- "(8) emitEvent or sendEvent (idempotencyKey)" --> bs
bs -- "(8a) publish client_channel_event.* / private_channel_event.*" --> rd
%% (9)-(12) Server-push fan-out loop
onlineApi -- "(9) emit Server.UsersOnlineUpdated (10s tick, AF-5)" --> rd
notifier -- "(9a) emit Server.* (profile/ban/bet)" --> rd
rd -- "(10) @MessagePattern server_channel_event.*" --> gc
gc -- "(11) handleServerEvent → BackendService" --> bs
bs -- "(12) emit to user / room / broadcast (send /events span)" --> fe
%% (13) Disconnect + TTL sweep
fe -- "(13) disconnect → handleDisconnect (no zrem, FM-RT-5)" --> cg
online -- "(13a) zremrangebyscore (per-minute cron)" --> cache
%% Style: datastores stand out
classDef db fill:#1f4e79,stroke:#bbb,color:#fff;
class rd,cache db;
4. Per-step walkthrough¶
Section headers below mirror the diagram step numbers in §3 — each §4.N covers (N) on the diagram. The numbering follows connect (steps 1–6) → client emit loop (7–8) → server push loop (9–12) → disconnect (13). Sub-edges marked (Na) are inseparable side effects of the parent step and collapse into the same walkthrough section.
4.1 Step (1) — WS upgrade + auth={socket_token} reaches ClientGateway¶
Engine.io upgrades HTTP → WS; socket.io namespace /events accepts the connection at apps/rt/src/gateway/client.gateway.ts:51 with transports:['websocket'] only (polling disabled server-side). CORS: APP_FE_ORIGIN or http://localhost:3000, credentials on. handleConnection at client.gateway.ts:94 extracts the token via extractSocketAuthToken (apps/rt/src/utils.ts:24) in auth → headers → cookie order. If accessToken is absent or over WS_CLIENT_MESSAGE_MAX_LEN (default 8192 bytes), the socket stays anonymous in clientSockets and steps (2)–(6) are skipped.
4.2 Step (2) — authorizeClient → AuthService.authCheck¶
ClientGateway.handleConnection hands off to AuthService.authorizeConnection, which sets client.accessToken and prepares the RPC envelope. No trace span: ebit-rt does not instrument inbound socket events, only outbound emits.
4.3 Steps (3)–(4) — Private.AuthSocket RPC over Redis pub/sub (AF-2 trace gap)¶
BackendService.sendEvent (apps/rt/src/gateway/backend.service.ts) publishes to the gateway transport via Nest's ClientProxy.send, 5 s timeout. On the ebit-api side the message lands at auth.controller.ts:192 where @GatewayMethod(Private.AuthSocket) + JwtGuard validates payload.accessToken, attaches req.user, and returns new AuthCheckResult(payload.user ?? null). The reply comes back; authorizeConnection sets client.authorized = true; client.user = result.user.
This is the AF-2 trace gap. Nest's Redis microservice transport does not propagate W3C traceparent, so the JWT validation span on ebit-api is orphaned — never linked to the connect on ebit-rt. See AF-2 in ../weaknesses-register.md. E2E anchors on the send /events span at step (6) as a proxy signal.
4.4 Step (5) — warmUpUserOnline zadds into ONLINE_USERS_KEY¶
ClientGateway calls OnlineTrackerService.warmUpUserOnline(userId) (apps/rt/src/online-tracker/); the service zadds ONLINE_USERS_KEY (cache Redis) with score=now, member=userId (step 5a). TTL is ONLINE_USER_TTL_SECONDS. The per-minute warpUpOnlineUsers walks clientSockets and re-zadds each authenticated user so live sessions hold their score.
4.5 Step (6) — emit AuthSuccess produces the only observable connect span¶
client.emit('AuthSuccess', {success:true}) produces span send /events tagged messaging.system=socket.io, messaging.destination=/events, messaging.destination_kind=topic, span.kind=producer (0.24 ms observed in trace 715c41f6…). Failure path: client.emit('AuthError', …) then client.disconnect(). No-token path: neither event fires — the client can later send an Authorization message on the existing socket (step 7).
4.6 Steps (7)–(8) — Client-initiated emit loop (@SubscribeMessage('*'))¶
clientMessageHandler at client.gateway.ts:178 is wired via WildcardsIoAdapter:
Authorizationshort-circuits to re-auth (bypasses theclientListenedEventsgate atclient.gateway.ts:169).- Anything not in
clientListenedEventsis warn-logged and dropped (FM-RT-4 below) — ack-style emits leave the client hanging until their own 5 s timeout. - Otherwise
backend.emitEvent(fire-forget, if!isAck) orbackend.sendEvent(5 s RPC withidempotencyKey) is invoked.
Step (8a): the publish lands on Redis gateway transport channels (client_channel_event.* or private_channel_event.*). sendEvent inspects the reply _type:
- ClientRoom → handleSocketRoomUpdate joins or leaves server-named rooms (client.gateway.ts:229-248). Rooms are server-authoritative — clients cannot unilaterally join.
- ClientMessage → returns {data, success:true} back over the ack.
- GatewayError/ApiException/GatewayTimeoutError/HttpException → normalised error shape {message, code, statusCode, success:false}.
4.7 Steps (9)–(12) — Server push loop (separate trace per emit)¶
Step (9): every 10 s apps/api/src/user/online-tracker.service.ts:30 reads zcard ONLINE_USERS_KEY + adds fakeUserOnline (init 500, drifts ±5 per tick, floored at 180) → publishes Server.UsersOnlineUpdated via EventsGateway.emitEvent. AF-5: dashboards reading this number see zcard + padding, not the real figure. Raw zcard is available only via OnlineTrackerService.getOnlineCount(); no HTTP endpoint exposes it. Step (9a) is the same pattern for ProfileNotifierService / game services (profile/ban/bet broadcasts).
Step (10): on the rt side, GatewayController.serverEvent (apps/rt/src/gateway/gateway.controller.ts) picks up everything on server_channel_event.* via @MessagePattern. Step (11): forwards into BackendService.handleServerEvent, which fans out to registered listeners — ClientGateway.handleServerEvent is the only one wired at boot. Stamps __m = {ts, windowId} on object payloads so clients can de-dupe multi-tab fan-out.
Step (12): routing decision in ClientGateway.handleServerEvent —
- message.user set → iterate clientSockets and emit only to that user's sockets (FM-RT-5: per-instance Map; horizontal scale breaks this).
- message.room → server.to(room).emit(…) (rooms are also per-instance under the default adapter).
- otherwise → server.emit(…) broadcast.
Each emit → one send /events producer span. Separate trace from the producer because of AF-2: the publish on ebit-api and the emit on ebit-rt are not linked (SF-016 below).
4.8 Step (13) — Disconnect + TTL sweep (FM-RT-5)¶
handleDisconnect at client.gateway.ts simply clientSockets.delete(client.id). No zrem on ONLINE_USERS_KEY — entries persist until the per-minute cron OnlineTrackerService.cleanColdUsers (apps/rt/src/online-tracker/online-tracker.service.ts:14) runs zremrangebyscore -inf (now - ONLINE_USER_TTL_SECONDS) (step 13a). So a user reads "online" for up to TTL after their last tab closes. For short-lived probes and reconnects this is intentional; for accounting it inflates the count and stacks on top of AF-5.
5. Data model¶
| Store | Key / channel | R/W | Fields | Source |
|---|---|---|---|---|
| In-memory (per rt worker) | ClientGateway.clientSockets |
R+W | Map<socketId, SocketWithContext> (scoped to one rt instance; horizontally scaling rt breaks message.user per-user delivery unless sticky) |
client.gateway.ts:59 |
| In-memory (per rt worker) | ClientGateway.clientListenedEvents |
R+W (append-only) | Set<string> populated by Private.ClientEvents envelopes from other microservices at boot (afterInit publishes RequestClientEvents) |
client.gateway.ts:150 |
| Redis (cache) | ONLINE_USERS_KEY zset |
R+W | member=userId, score=epoch-ms of last warmUpUserOnline. TTL sweep zremrangebyscore -inf (now - ONLINE_USER_TTL_SECONDS) every minute |
apps/rt/src/online-tracker/ |
| Redis (cache) | throttler keys | — | none (rt has no HTTP; no throttler spans in traces) | n/a |
| Redis (gateway pub/sub) | server_channel_event.*, private_channel_event.*, client_channel_event.* |
R+W | envelope {event, data, user?, room?, delayMs?, timestamp, windowId} from EventMessage |
libs/gateway/src/dto/base.dto.ts |
6. Failure modes¶
- AF-2 — Cross-service trace gap on every handshake.
AuthService.authCheckRPCs ebit-api over Nest's Redis pub/sub transport — which does not propagate W3Ctraceparent(memoryproject_otel_microservice_transport_gap). ThePrivate.AuthSocketRPC the auth flow depends on emits orphan spans underserviceName=ebit-api, never linked to the connect. Effect:ebit-rtsurfaces only outboundsend /eventsproducer spans and you cannot walk a single trace from client connect → JWT validation → AuthSuccess emit. E2E anchors on thesend /eventsspan that follows AuthSuccess as a proxy signal. Same gap affects every server-push hop in steps (9)–(12). See AF-2 in../weaknesses-register.md. - AF-3 / FM-RT-5 — Horizontal scale breaks per-user delivery.
handleServerEventiterates onlythis.clientSockets— the local Map. If rt is scaled past one replica without sticky-routing or cross-instance fan-out, amessage.user-targeted emit on instance A silently misses the user's socket on instance B. Same hazard for room joins — rooms are per-instance in socket.io's default adapter. The canonical fix (@socket.io/redis-adapter) is tracked in §7. - AF-5 — Online count is intentionally inflated.
apps/api/src/user/online-tracker.service.tsaddsfakeUserOnline(init 500, drifts ±5 per 10 s, floored at 180) togetOnlineCount()before broadcastingUsersOnlineUpdatedin step (9). Any dashboard or stakeholder that reads this number is reading zcard + padding, not the real figure. The raw count is available only viaOnlineTrackerService.getOnlineCount()(no HTTP endpoint exposes it). Memory:project_online_count_inflation. - FM-RT-4 — Unknown client event → silent drop.
clientMessageHandleratclient.gateway.ts:178warn-logs and returnsundefinedfor any event not inclientListenedEvents(step 7). For ack-style emits this leaves the client hanging until their own 5 s timeout; no error event is emitted back. Bootstrapping issue: if a microservice emitsClientEventsafter rt has already started accepting traffic, early clients can hit this path for valid events. - FM-RT-5 — Stale online score on disconnect.
handleDisconnectat step (13) does notzremthe user — the zset entry stays until the cron TTL sweep (ONLINE_USER_TTL_SECONDS) reclaims it (step 13a). For short-lived probes and reconnects this is fine, but it overstates "online" for the TTL window after the last tab closes — stacking on top of AF-5 inflation. - SF-016 — Server-push emit is not linked to its producer. Every
send /eventsemit in step (12) is its own root trace; the originatingEventsGateway.emitEventon ebit-api (step 9 / 9a) lands on a separate trace because of AF-2. Dashboards counting "events emitted to clients" cannot be joined to the business event that triggered them (bet settled, balance updated, profile changed). Workaround until AF-2 is fixed: stampEventMessage.payload.__traceand reconstruct linkage offline.
7. Unresolved¶
- Admin socket.io UI (
@socket.io/admin-uiatclient.gateway.ts:127) wired behindDEBUG_SOCKET_IO_ADMIN=true; must stay off in production. - Redis adapter for horizontal rt scale (
@socket.io/redis-adapter) is not installed — canonical fix for §6 #2. - Trace propagation across
@GatewayMethodtracked inproject_otel_microservice_transport_gap;/eventsflows stay single-service until fixed.