
Runbook: My trace isn't in Jaeger

Symptom

You triggered a request (sign-in, bet, page load) but the trace doesn't appear in the Jaeger UI at http://localhost:16686. Either the service is missing from the service dropdown, or the service is listed but the specific trace_id cannot be found.

Likely causes

  1. OTel SDK not initialized (missing pre-otel.main.ts import)
  2. OTEL_EXPORTER_OTLP_ENDPOINT not set or pointing at the wrong host
  3. OTel Collector unhealthy or not running
  4. Browser traces blocked by CORS (ebit-fe-browser service)
  5. Cross-service trace broken by Redis pub/sub transport (no traceparent propagation)
  6. Sentry gate conflict — Sentry's own trace SDK runs alongside OTel

Diagnosis

1. Check the NestJS app is exporting traces

# Verify env vars are set
sudo docker exec ebit-api env | grep OTEL
# Expected:
#   OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4318
#   OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
#   OTEL_SERVICE_NAME=ebit-api

# Check pre-otel.main.ts is imported FIRST in main.ts
grep -n "pre-otel\|pre-sentry" ebit-api/apps/api/src/main.ts
# pre-otel MUST come before pre-sentry (Sentry wraps OTel if loaded second)
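
The ordering check can be scripted so that it returns a pass/fail status instead of relying on reading the grep output. This is a minimal sketch: the function name check_otel_import_order is hypothetical (not part of the repo), and like the grep above it matches the first textual occurrence of each marker, so a comment that mentions pre-otel would also count.

```shell
# Hypothetical helper: succeeds only if pre-otel appears before pre-sentry.
check_otel_import_order() {
  local file="$1" otel_line sentry_line
  # Line number of the first occurrence of each marker (empty if absent).
  otel_line=$(grep -n 'pre-otel' "$file" | head -n 1 | cut -d: -f1)
  sentry_line=$(grep -n 'pre-sentry' "$file" | head -n 1 | cut -d: -f1)

  if [ -z "$otel_line" ]; then
    echo "FAIL: pre-otel is not imported in $file"
    return 1
  fi
  if [ -n "$sentry_line" ] && [ "$sentry_line" -lt "$otel_line" ]; then
    echo "FAIL: pre-sentry (line $sentry_line) loads before pre-otel (line $otel_line)"
    return 1
  fi
  echo "OK: pre-otel is imported on line $otel_line"
}
```

Usage: check_otel_import_order ebit-api/apps/api/src/main.ts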

2. Check the OTel Collector is healthy

curl -sf http://localhost:13133/ && echo "healthy" || echo "UNHEALTHY"

# Check collector logs for errors
sudo docker logs --tail 30 ebit-otel-collector 2>&1 | grep -i error

# Verify Jaeger is receiving from the collector
sudo docker logs --tail 10 ebit-jaeger 2>&1 | grep -i "span"
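
If the collector reports healthy but no spans arrive, a synthetic span can help separate collector and Jaeger problems from app-side problems. The sketch below builds a one-span OTLP/HTTP JSON payload and posts it to the collector. It assumes the collector's OTLP HTTP receiver on port 4318 accepts JSON; the service name runbook-smoke-test and the helper names are made up for illustration.

```shell
# Print N random bytes as lowercase hex (trace IDs are 16 bytes, span IDs 8).
random_hex() {
  od -An -tx1 -N "$1" /dev/urandom | tr -d ' \n'
}

# Build a one-span OTLP JSON payload under a hypothetical service name.
# The span starts now and lasts half a second; kind 1 is an internal span.
build_test_span() {
  local trace_id="$1" now
  now=$(date +%s)
  cat <<EOF
{"resourceSpans":[{"resource":{"attributes":[{"key":"service.name","value":{"stringValue":"runbook-smoke-test"}}]},"scopeSpans":[{"scope":{"name":"runbook"},"spans":[{"traceId":"${trace_id}","spanId":"$(random_hex 8)","name":"smoke-test","kind":1,"startTimeUnixNano":"${now}000000000","endTimeUnixNano":"${now}500000000"}]}]}]}
EOF
}

# POST the payload to the collector and print the trace ID to look up in Jaeger.
# The default endpoint is an assumption taken from the port used in this runbook.
send_test_span() {
  local endpoint="${1:-http://localhost:4318}" trace_id
  trace_id=$(random_hex 16)
  build_test_span "$trace_id" |
    curl -sf -X POST "$endpoint/v1/traces" \
      -H 'Content-Type: application/json' --data-binary @- >/dev/null &&
    echo "sent trace_id=$trace_id"
}
```

Usage: run send_test_span, then search Jaeger for the runbook-smoke-test service. If the span shows up, the collector-to-Jaeger path is probably fine and the problem is more likely in the app's SDK setup or endpoint configuration.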

3. Check browser traces (ebit-fe-browser)

# Verify CORS allows browser origin
grep -A10 "cors:" observability/otel-collector.yml
# Must include http://localhost:3000 in allowed_origins

# Open browser DevTools → Network tab, filter for "v1/traces"
# Look for POST to http://localhost:4318/v1/traces
# Check the response: 200 = accepted by the collector; status 0 or a CORS error = blocked by the browser

# Verify browser OTel initialized
# Console should show: "[otel-client] Browser OTel initialized"
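
The browser's CORS preflight can also be reproduced from a terminal, which avoids guessing from DevTools output. This is a sketch, assuming the browser app is served from http://localhost:3000 and the collector's OTLP HTTP receiver listens on localhost:4318; has_cors_allow_origin and check_collector_cors are hypothetical helper names.

```shell
# Read HTTP response headers on stdin; succeed if Access-Control-Allow-Origin
# is present and equals the given origin or the wildcard *.
has_cors_allow_origin() {
  tr -d '\r' | awk -v origin="$1" '
    tolower($1) == "access-control-allow-origin:" && ($2 == origin || $2 == "*") { found = 1 }
    END { exit (found ? 0 : 1) }'
}

# Send the same OPTIONS preflight a browser would send before POSTing traces.
# Both defaults are assumptions based on the URLs used in this runbook.
check_collector_cors() {
  local origin="${1:-http://localhost:3000}" url="${2:-http://localhost:4318/v1/traces}"
  curl -s -o /dev/null -D - -X OPTIONS "$url" \
    -H "Origin: $origin" \
    -H 'Access-Control-Request-Method: POST' \
    -H 'Access-Control-Request-Headers: content-type' |
    has_cors_allow_origin "$origin"
}
```

Usage: check_collector_cors && echo "CORS OK" || echo "CORS BLOCKED". A blocked result with a healthy collector suggests the origin is missing from allowed_origins.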

4. Check cross-service trace propagation

# Redis pub/sub RPC (@ExternalControllerClient) does NOT propagate traceparent.
# If your trace starts in speed-roulette and calls ebit-api via wallet RPC,
# the callee creates a new trace root. This is a known gap.
#
# Workaround: search Jaeger by operation name or time range, not by trace_id.
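
Because the callee starts a new trace, the two halves of a request have to be matched by time rather than by ID. The sketch below lists a service's traces from the last few minutes. It assumes the query API that the Jaeger UI itself uses (start and end given in microseconds since the epoch), which is not a stable public API; the helper names are hypothetical.

```shell
# Print "START END" in microseconds for a window covering the last N minutes.
time_window_us() {
  local minutes="$1" now
  now=$(date +%s)
  echo "$(( ($now - $minutes * 60) * 1000000 )) $(( $now * 1000000 ))"
}

# List up to 20 traces for SERVICE from the last MINUTES minutes,
# optionally restricted to one OPERATION name.
search_recent_traces() {
  local service="$1" minutes="${2:-5}" operation="${3:-}" window start end
  window=$(time_window_us "$minutes")
  start=${window% *}
  end=${window#* }
  # Collect the query parameters as positional arguments for curl.
  set -- --data-urlencode "service=$service" \
         --data-urlencode "start=$start" --data-urlencode "end=$end" \
         --data-urlencode "limit=20"
  if [ -n "$operation" ]; then
    set -- "$@" --data-urlencode "operation=$operation"
  fi
  curl -s -G "http://localhost:16686/api/traces" "$@"
}
```

Usage: run search_recent_traces once for the caller's service name and once for ebit-api over the same window, then pair traces whose start times line up.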

5. Verify the trace landed in Jaeger

# Search by service name
curl -s "http://localhost:16686/api/traces?service=ebit-api&limit=5" | python3 -c "
import json,sys
data = json.load(sys.stdin)['data']
for t in data[:3]:
    print(t['traceID'], len(t['spans']), 'spans')
"
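
When you already have a trace_id (for example from a log line), a direct lookup is quicker than scanning a service's recent traces. This is a sketch, assuming Jaeger's query API serves single traces at /api/traces/<trace_id> and answers with a non-200 status for unknown IDs; is_trace_id and trace_exists are hypothetical names.

```shell
# Succeed if the argument looks like a W3C trace ID: 32 lowercase hex characters.
is_trace_id() {
  printf '%s\n' "$1" | grep -Eq '^[0-9a-f]{32}$'
}

# Print found/missing for a trace ID based on the HTTP status of the lookup.
trace_exists() {
  local trace_id="$1" status
  if ! is_trace_id "$trace_id"; then
    echo "not a 32-character hex trace ID: $trace_id"
    return 2
  fi
  status=$(curl -s -o /dev/null -w '%{http_code}' \
    "http://localhost:16686/api/traces/$trace_id")
  if [ "$status" = "200" ]; then
    echo "found"
  else
    echo "missing (HTTP $status)"
    return 1
  fi
}
```

Usage: trace_exists 5b8efff798038103d269b633813fc60c (the ID here is only an example value).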

Fix

Cause | Fix
--- | ---
OTEL env vars missing | Add to docker-compose.yml service environment block
pre-otel.main.ts not imported | Add import '../../../libs/shared/src/basic/pre/pre-otel.main'; as the first line in the app's main.ts. For bj/bo, use NODE_OPTIONS: "--require @opentelemetry/auto-instrumentations-node/register"
Collector unhealthy | sudo docker compose up -d --force-recreate otel-collector
Browser CORS blocked | Add origin to observability/otel-collector.yml cors.allowed_origins, restart collector
Redis RPC gap | Known limitation; see docs/architecture.md. No fix available without a custom NestJS transport

Prevention

  • All new NestJS apps must import pre-otel.main.ts before any other import in main.ts
  • Run curl -s localhost:16686/api/services | jq . after adding a new service to verify it appears in Jaeger (a service is listed only after it has exported at least one span)
  • Browser RUM verification: open any page → DevTools Network → filter v1/traces → confirm 200 response
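
The service check in the list above can be turned into a pass/fail step for a smoke test. This is a minimal sketch, assuming /api/services returns a JSON object whose data array holds the service names; service_listed and wait_for_service are hypothetical helpers, and the grep match is a shortcut rather than real JSON parsing.

```shell
# Read the /api/services response on stdin; succeed if NAME appears
# as a complete quoted string (so "ebit-api" does not match "ebit-api-v2").
service_listed() {
  grep -Fq "\"$1\""
}

# Poll Jaeger until NAME is listed, since a service only appears after its
# first exported span. Gives up after TRIES attempts, 2 seconds apart.
wait_for_service() {
  local name="$1" tries="${2:-10}" i=0
  while [ "$i" -lt "$tries" ]; do
    curl -s "http://localhost:16686/api/services" | service_listed "$name" && return 0
    i=$((i + 1))
    sleep 2
  done
  return 1
}
```

Usage: wait_for_service ebit-api && echo "registered" || echo "NOT registered"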