Skip to content

Flow: rt websocket connect + events (/events namespace)

Trace IDs: 715c41f678f84462eab761c6f423bacc (send /events · op=AuthSuccess · 0.24 ms) — per-emit, single-span. Jaeger: http://localhost:16686/trace/715c41f678f84462eab761c6f423bacc · E2E: tests-e2e/tests/dropbet-rt-websocket.spec.ts Generated: 2026-04-16 · Services touched (trace): ebit-rt only (connection auth RPC → ebit-api is not trace-linked — §6.1)

1. User-visible contract

ebit-rt (apps/rt) is Evospin's websocket surface. main.ts boots with http: null — the process only serves socket.io on :4001, namespace /events, transports: ['websocket'] (polling is off server-side at client.gateway.ts:51). CORS: APP_FE_ORIGIN or http://localhost:3000, credentials on. Max message length: WS_CLIENT_MESSAGE_MAX_LEN (default 8192 bytes).

Contract observed by browser clients (ebit-fe src/providers/socketsWithAuth.tsx is the reference consumer):

  • Connect handshake carries socket_token via auth: { socket_token }, or headers.socket_token, or cookie: socket_token=… (extractSocketAuthToken at apps/rt/src/utils.ts:24). The token is issued by /auth/verify-2fa (and /auth/sign-in for non-2FA users) alongside access_token + refresh_token.
  • Server emits AuthSuccess ({success:true}) on successful validation, or AuthError then disconnect() on failure or over-length token. If no token is presented at all, neither event fires — the socket stays connected, unauthenticated, so the client can later send an Authorization message (client.gateway.ts:97 gates the emit on if (accessToken)).
  • Runtime re-auth via emit('Authorization', { accessToken }) — bypasses the clientListenedEvents gate (explicit check at client.gateway.ts:169).
  • Custom client events fan out via BackendService.emitEvent (fire-and-forget publish to the Redis gateway transport) when isAck is false, or sendEvent (RPC with 5 s timeout + reply) when the client uses emitWithAck. Only event names registered into clientListenedEvents via the Private.RequestClientEvents collection round-trip are forwarded; unknown events are silently warn-logged and dropped (client.gateway.ts:178-183).
  • Server → client pushes land through BackendService.onServerEventClientGateway.handleServerEvent. Routing: if message.user set → only that user's sockets; if message.roomserver.to(room).emit; otherwise server.emit (broadcast). Each emit produces one send /events span in Jaeger.
  • Rooms are server-authoritative: a client emits an event whose RPC reply has _type=ClientRoom, action=join|leave, rooms[] — the gateway then calls client.join(rooms) or client.leave(room) (client.gateway.ts:229-248). Clients cannot unilaterally join.
  • Liveness: every minute warpUpOnlineUsers walks clientSockets and re-zadds each authenticated user id into the cache-Redis ONLINE_USERS_KEY zset, score=now (ONLINE_USER_TTL_SECONDS). A separate cron on the ebit-api side (online-tracker.service.ts:30, every 10 s) broadcasts UsersOnlineUpdated — with an inflated count, see §6.3.

2. Sequence diagram

sequenceDiagram
  participant C as Browser (socket.io-client)
  participant RT as ebit-rt (:4001 /events)
  participant R as Redis (gateway pub/sub)
  participant API as ebit-api (auth controller)
  participant CACHE as Redis (cache — ONLINE_USERS)

  C->>RT: WS upgrade + auth={socket_token}
  RT->>RT: extractSocketAuthToken (auth → headers → cookie)
  RT->>R: publish Private.AuthSocket (ClientProxy.send, 5s timeout)
  R->>API: deliver (Redis transport — no traceparent, §6.1)
  API->>API: JwtGuard validates accessToken → user
  API-->>R: AuthCheckResult { user }
  R-->>RT: reply
  RT->>CACHE: zadd ONLINE_USERS_KEY score=now member=userId
  RT-->>C: emit AuthSuccess { success:true } — span send /events

  Note over API,C: 10 s ticker
  API->>R: emitEvent Server.UsersOnlineUpdated { value }
  R->>RT: MessagePattern `server_channel_event.*` (GatewayController.serverEvent)
  RT-->>C: emit UsersOnlineUpdated (broadcast — span send /events)

  Note over C,RT: client emits unknown event
  C->>RT: emit UnknownThing {…}
  RT-->>RT: not in clientListenedEvents — warn-log and drop

  Note over C,RT: disconnect
  C->>RT: disconnect()
  RT->>RT: handleDisconnect → clientSockets.delete(client.id)

3. Component diagram

Edges are numbered in connect-then-event-loop order. Section §4 below has the same numbers — each (N) on the diagram has its own §4.N subsection, so you can click straight through.

flowchart TD
    %% Datastores
    rd[("Redis (gateway pub/sub)<br/>server_channel_event.* · private_channel_event.* · Private.AuthSocket RPC")]
    cache[("Redis (cache)<br/>ONLINE_USERS_KEY zset · TTL sweep")]

    %% Browser
    subgraph br["Browser (socket.io-client)"]
        fe["SocketsWithAuthProvider<br/><i>ebit-fe/src/providers/socketsWithAuth.tsx</i>"]
    end

    %% ebit-rt process
    subgraph rt["ebit-rt :4001 (socket.io, /events, websocket transport)"]
        cg["ClientGateway<br/><i>@WebSocketGateway(namespace=events) + WildcardsIoAdapter</i>"]
        auth["AuthService.authorizeConnection<br/><i>extractSocketAuthToken + authCheck</i>"]
        bs["BackendService<br/><i>sendEvent (5s RPC) / emitEvent (fire-forget)</i>"]
        gc["GatewayController.serverEvent<br/><i>@MessagePattern server_channel_event.*</i>"]
        online["OnlineTrackerService (rt)<br/><i>warmUpUserOnline + cleanColdUsers cron</i>"]
    end

    %% ebit-api process (reached via Redis pub/sub, AF-2 trace gap)
    subgraph api["ebit-api :4000 (reached via Redis pub/sub — AF-2)"]
        authCtl["AuthController.authSocket<br/><i>@GatewayMethod(Private.AuthSocket) + JwtGuard</i>"]
        onlineApi["OnlineTrackerService (api)<br/><i>10s cron · zcard + fakeUserOnline (AF-5)</i>"]
        notifier["ProfileNotifier / game services<br/><i>EventsGateway.emitEvent Server.*</i>"]
    end

    %% (1)-(5) Connect + auth handshake
    fe -- "(1) WS upgrade + auth={socket_token}" --> cg
    cg -- "(2) authorizeClient → authCheck" --> auth
    auth -- "(3) sendEvent Private.AuthSocket (5s, AF-2)" --> rd
    rd -- "(4) @MessagePattern dispatch" --> authCtl
    cg -- "(5) warmUpUserOnline (zadd)" --> online
    online -- "(5a) zadd ONLINE_USERS_KEY" --> cache
    cg -- "(6) emit AuthSuccess (send /events span)" --> fe

    %% (7)-(8) Client-initiated emit loop
    fe -- "(7) emit Authorization / room event / unknown" --> cg
    cg -- "(8) emitEvent or sendEvent (idempotencyKey)" --> bs
    bs -- "(8a) publish client_channel_event.* / private_channel_event.*" --> rd

    %% (9)-(12) Server-push fan-out loop
    onlineApi -- "(9) emit Server.UsersOnlineUpdated (10s tick, AF-5)" --> rd
    notifier -- "(9a) emit Server.* (profile/ban/bet)" --> rd
    rd -- "(10) @MessagePattern server_channel_event.*" --> gc
    gc -- "(11) handleServerEvent → BackendService" --> bs
    bs -- "(12) emit to user / room / broadcast (send /events span)" --> fe

    %% (13) Disconnect + TTL sweep
    fe -- "(13) disconnect → handleDisconnect (no zrem, FM-RT-5)" --> cg
    online -- "(13a) zremrangebyscore (per-minute cron)" --> cache

    %% Style: datastores stand out
    classDef db fill:#1f4e79,stroke:#bbb,color:#fff;
    class rd,cache db;

4. Per-step walkthrough

Section headers below mirror the diagram step numbers in §3 — each §4.N covers (N) on the diagram. The numbering follows connect (steps 1–6) → client emit loop (7–8) → server push loop (9–12) → disconnect (13). Sub-edges marked (Na) are inseparable side effects of the parent step and collapse into the same walkthrough section.

4.1 Step (1) — WS upgrade + auth={socket_token} reaches ClientGateway

Engine.io upgrades HTTP → WS; socket.io namespace /events accepts the connection at apps/rt/src/gateway/client.gateway.ts:51 with transports:['websocket'] only (polling disabled server-side). CORS: APP_FE_ORIGIN or http://localhost:3000, credentials on. handleConnection at client.gateway.ts:94 extracts the token via extractSocketAuthToken (apps/rt/src/utils.ts:24) in auth → headers → cookie order. If accessToken is absent or over WS_CLIENT_MESSAGE_MAX_LEN (default 8192 bytes), the socket stays anonymous in clientSockets and steps (2)–(6) are skipped.

4.2 Step (2) — authorizeClientAuthService.authCheck

ClientGateway.handleConnection hands off to AuthService.authorizeConnection, which sets client.accessToken and prepares the RPC envelope. No trace span: ebit-rt does not instrument inbound socket events, only outbound emits.

4.3 Steps (3)–(4) — Private.AuthSocket RPC over Redis pub/sub (AF-2 trace gap)

BackendService.sendEvent (apps/rt/src/gateway/backend.service.ts) publishes to the gateway transport via Nest's ClientProxy.send, 5 s timeout. On the ebit-api side the message lands at auth.controller.ts:192 where @GatewayMethod(Private.AuthSocket) + JwtGuard validates payload.accessToken, attaches req.user, and returns new AuthCheckResult(payload.user ?? null). The reply comes back; authorizeConnection sets client.authorized = true; client.user = result.user.

This is the AF-2 trace gap. Nest's Redis microservice transport does not propagate W3C traceparent, so the JWT validation span on ebit-api is orphaned — never linked to the connect on ebit-rt. See AF-2 in ../weaknesses-register.md. E2E anchors on the send /events span at step (6) as a proxy signal.

4.4 Step (5) — warmUpUserOnline zadds into ONLINE_USERS_KEY

ClientGateway calls OnlineTrackerService.warmUpUserOnline(userId) (apps/rt/src/online-tracker/); the service zadds ONLINE_USERS_KEY (cache Redis) with score=now, member=userId (step 5a). TTL is ONLINE_USER_TTL_SECONDS. The per-minute warpUpOnlineUsers walks clientSockets and re-zadds each authenticated user so live sessions hold their score.

4.5 Step (6) — emit AuthSuccess produces the only observable connect span

client.emit('AuthSuccess', {success:true}) produces span send /events tagged messaging.system=socket.io, messaging.destination=/events, messaging.destination_kind=topic, span.kind=producer (0.24 ms observed in trace 715c41f6…). Failure path: client.emit('AuthError', …) then client.disconnect(). No-token path: neither event fires — the client can later send an Authorization message on the existing socket (step 7).

4.6 Steps (7)–(8) — Client-initiated emit loop (@SubscribeMessage('*'))

clientMessageHandler at client.gateway.ts:178 is wired via WildcardsIoAdapter:

  • Authorization short-circuits to re-auth (bypasses the clientListenedEvents gate at client.gateway.ts:169).
  • Anything not in clientListenedEvents is warn-logged and dropped (FM-RT-4 below) — ack-style emits leave the client hanging until their own 5 s timeout.
  • Otherwise backend.emitEvent (fire-forget, if !isAck) or backend.sendEvent (5 s RPC with idempotencyKey) is invoked.

Step (8a): the publish lands on Redis gateway transport channels (client_channel_event.* or private_channel_event.*). sendEvent inspects the reply _type: - ClientRoomhandleSocketRoomUpdate joins or leaves server-named rooms (client.gateway.ts:229-248). Rooms are server-authoritative — clients cannot unilaterally join. - ClientMessage → returns {data, success:true} back over the ack. - GatewayError/ApiException/GatewayTimeoutError/HttpException → normalised error shape {message, code, statusCode, success:false}.

4.7 Steps (9)–(12) — Server push loop (separate trace per emit)

Step (9): every 10 s apps/api/src/user/online-tracker.service.ts:30 reads zcard ONLINE_USERS_KEY + adds fakeUserOnline (init 500, drifts ±5 per tick, floored at 180) → publishes Server.UsersOnlineUpdated via EventsGateway.emitEvent. AF-5: dashboards reading this number see zcard + padding, not the real figure. Raw zcard is available only via OnlineTrackerService.getOnlineCount(); no HTTP endpoint exposes it. Step (9a) is the same pattern for ProfileNotifierService / game services (profile/ban/bet broadcasts).

Step (10): on the rt side, GatewayController.serverEvent (apps/rt/src/gateway/gateway.controller.ts) picks up everything on server_channel_event.* via @MessagePattern. Step (11): forwards into BackendService.handleServerEvent, which fans out to registered listeners — ClientGateway.handleServerEvent is the only one wired at boot. Stamps __m = {ts, windowId} on object payloads so clients can de-dupe multi-tab fan-out.

Step (12): routing decision in ClientGateway.handleServerEvent — - message.user set → iterate clientSockets and emit only to that user's sockets (FM-RT-5: per-instance Map; horizontal scale breaks this). - message.roomserver.to(room).emit(…) (rooms are also per-instance under the default adapter). - otherwise → server.emit(…) broadcast.

Each emit → one send /events producer span. Separate trace from the producer because of AF-2: the publish on ebit-api and the emit on ebit-rt are not linked (SF-016 below).

4.8 Step (13) — Disconnect + TTL sweep (FM-RT-5)

handleDisconnect at client.gateway.ts simply clientSockets.delete(client.id). No zrem on ONLINE_USERS_KEY — entries persist until the per-minute cron OnlineTrackerService.cleanColdUsers (apps/rt/src/online-tracker/online-tracker.service.ts:14) runs zremrangebyscore -inf (now - ONLINE_USER_TTL_SECONDS) (step 13a). So a user reads "online" for up to TTL after their last tab closes. For short-lived probes and reconnects this is intentional; for accounting it inflates the count and stacks on top of AF-5.

5. Data model

Store Key / channel R/W Fields Source
In-memory (per rt worker) ClientGateway.clientSockets R+W Map<socketId, SocketWithContext> (scoped to one rt instance; horizontally scaling rt breaks message.user per-user delivery unless sticky) client.gateway.ts:59
In-memory (per rt worker) ClientGateway.clientListenedEvents R+W (append-only) Set<string> populated by Private.ClientEvents envelopes from other microservices at boot (afterInit publishes RequestClientEvents) client.gateway.ts:150
Redis (cache) ONLINE_USERS_KEY zset R+W member=userId, score=epoch-ms of last warmUpUserOnline. TTL sweep zremrangebyscore -inf (now - ONLINE_USER_TTL_SECONDS) every minute apps/rt/src/online-tracker/
Redis (cache) throttler keys none (rt has no HTTP; no throttler spans in traces) n/a
Redis (gateway pub/sub) server_channel_event.*, private_channel_event.*, client_channel_event.* R+W envelope {event, data, user?, room?, delayMs?, timestamp, windowId} from EventMessage libs/gateway/src/dto/base.dto.ts

6. Failure modes

  1. AF-2 — Cross-service trace gap on every handshake. AuthService.authCheck RPCs ebit-api over Nest's Redis pub/sub transport — which does not propagate W3C traceparent (memory project_otel_microservice_transport_gap). The Private.AuthSocket RPC the auth flow depends on emits orphan spans under serviceName=ebit-api, never linked to the connect. Effect: ebit-rt surfaces only outbound send /events producer spans and you cannot walk a single trace from client connect → JWT validation → AuthSuccess emit. E2E anchors on the send /events span that follows AuthSuccess as a proxy signal. Same gap affects every server-push hop in steps (9)–(12). See AF-2 in ../weaknesses-register.md.
  2. AF-3 / FM-RT-5 — Horizontal scale breaks per-user delivery. handleServerEvent iterates only this.clientSockets — the local Map. If rt is scaled past one replica without sticky-routing or cross-instance fan-out, a message.user-targeted emit on instance A silently misses the user's socket on instance B. Same hazard for room joins — rooms are per-instance in socket.io's default adapter. The canonical fix (@socket.io/redis-adapter) is tracked in §7.
  3. AF-5 — Online count is intentionally inflated. apps/api/src/user/online-tracker.service.ts adds fakeUserOnline (init 500, drifts ±5 per 10 s, floored at 180) to getOnlineCount() before broadcasting UsersOnlineUpdated in step (9). Any dashboard or stakeholder that reads this number is reading zcard + padding, not the real figure. The raw count is available only via OnlineTrackerService.getOnlineCount() (no HTTP endpoint exposes it). Memory: project_online_count_inflation.
  4. FM-RT-4 — Unknown client event → silent drop. clientMessageHandler at client.gateway.ts:178 warn-logs and returns undefined for any event not in clientListenedEvents (step 7). For ack-style emits this leaves the client hanging until their own 5 s timeout; no error event is emitted back. Bootstrapping issue: if a microservice emits ClientEvents after rt has already started accepting traffic, early clients can hit this path for valid events.
  5. FM-RT-5 — Stale online score on disconnect. handleDisconnect at step (13) does not zrem the user — the zset entry stays until the cron TTL sweep (ONLINE_USER_TTL_SECONDS) reclaims it (step 13a). For short-lived probes and reconnects this is fine, but it overstates "online" for the TTL window after the last tab closes — stacking on top of AF-5 inflation.
  6. SF-016 — Server-push emit is not linked to its producer. Every send /events emit in step (12) is its own root trace; the originating EventsGateway.emitEvent on ebit-api (step 9 / 9a) lands on a separate trace because of AF-2. Dashboards counting "events emitted to clients" cannot be joined to the business event that triggered them (bet settled, balance updated, profile changed). Workaround until AF-2 is fixed: stamp EventMessage.payload.__trace and reconstruct linkage offline.

7. Unresolved

  • Admin socket.io UI (@socket.io/admin-ui at client.gateway.ts:127) wired behind DEBUG_SOCKET_IO_ADMIN=true; must stay off in production.
  • Redis adapter for horizontal rt scale (@socket.io/redis-adapter) is not installed — canonical fix for §6 #2.
  • Trace propagation across @GatewayMethod tracked in project_otel_microservice_transport_gap; /events flows stay single-service until fixed.