Runbook: Logs for service X aren't in Loki

Symptom

The Grafana Logs-Trace Pivot dashboard shows no results for a specific service, or curl -s 'http://localhost:3100/loki/api/v1/label/service_name/values' is missing the service.
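
The second check can be wrapped in a small function for scripting. This is a minimal sketch, not part of the existing tooling: the name service_in_loki is hypothetical, and it assumes the usual Loki response shape {"status":"success","data":[...]}. It reads the JSON on stdin so it can be piped straight from the curl command above.

```shell
# Hypothetical helper: exits 0 if the given service name appears as a
# complete JSON string in the label-values response read from stdin.
service_in_loki() {
  # Including the surrounding quotes stops "ebit" from matching "ebit-api".
  grep -qF -- "\"$1\""
}

# Usage (assumes Loki on localhost:3100, as elsewhere in this runbook):
# curl -s 'http://localhost:3100/loki/api/v1/label/service_name/values' \
#   | service_in_loki '<service>' && echo present || echo missing
```

Because this is a plain string match, a service literally named "status", "success", or "data" would match the response envelope; a JSON-aware parser would be needed to rule that out.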

Likely causes

  1. The NestJS app's logger isn't wired to OTel (missing nestjs-pino or pre-otel.main.ts)
  2. The filelog receiver can't read Docker container logs (permission denied)
  3. The OTel Collector's logs pipeline isn't running
  4. Loki is down or full
  5. The service hasn't emitted any logs since the last Loki retention cycle

Diagnosis

1. Verify the service is logging to Docker stdout

# Check raw Docker logs
sudo docker logs --tail 5 ebit-<service>
# Should see JSON lines with "body", "traceid", "spanid" keys (from EvoLogger/winston)
# or pino-style JSON with "level", "time", "msg" keys
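
To tell the two formats apart without reading each line by eye, something like the following could be used. It is a rough sketch: has_key and classify_log_line are hypothetical names, and treating every listed key as required is an assumption based on the comment above.

```shell
# Hypothetical helper: succeeds if line $1 contains the JSON key $2.
has_key() {
  printf '%s\n' "$1" | grep -q "\"$2\"[[:space:]]*:"
}

# Hypothetical helper: prints evologger, pino, or unknown for one log line.
classify_log_line() {
  if has_key "$1" body && has_key "$1" traceid && has_key "$1" spanid; then
    echo evologger
  elif has_key "$1" level && has_key "$1" time && has_key "$1" msg; then
    echo pino
  else
    echo unknown
  fi
}

# Usage:
# sudo docker logs --tail 5 ebit-<service> 2>&1 \
#   | while IFS= read -r l; do classify_log_line "$l"; done
```

A run of "unknown" results suggests the service is writing unstructured text to stdout, which points at cause 1.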

2. Check the filelog receiver

# Verify the receiver is watching files
sudo docker logs ebit-otel-collector 2>&1 | grep "filelog"
# Should see: "Started watching file" entries

# Check for permission errors
sudo docker logs ebit-otel-collector 2>&1 | grep "permission denied"
# If present: the collector needs user: "0:0" in docker-compose.yml
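# This was fixed in task #28. Verify it's in place under the otel-collector service
# (a bare grep 'user:' | head -1 can match a different service's user: key):
grep -A20 'otel-collector:' docker-compose.yml | grep 'user:'
# Expected: user: "0:0" (widen -A20 if the service block is longer)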

3. Check the OTel logs pipeline

# Verify Loki exporter is configured
grep -A5 "loki:" observability/otel-collector.yml
# Expected: endpoint: http://loki:3100/loki/api/v1/push

# Verify logs pipeline includes filelog
grep -A3 "logs:" observability/otel-collector.yml
# Expected: receivers: [otlp, filelog/docker]

4. Check Loki health

curl -s http://localhost:3100/ready
# Expected: "ready"

curl -s http://localhost:3100/loki/api/v1/labels | python3 -m json.tool
# Should list: job, level, service_name
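
The label check can also be automated. The sketch below assumes the three labels named above are the ones that matter and that the response is the standard Loki JSON; missing_labels is a hypothetical name.

```shell
# Hypothetical helper: reads the /loki/api/v1/labels response on stdin and
# prints each expected label that is absent. Prints nothing if all are present.
missing_labels() {
  response="$(cat)"
  for label in job level service_name; do
    printf '%s' "$response" | grep -qF -- "\"$label\"" || echo "$label"
  done
}

# Usage:
# curl -s http://localhost:3100/loki/api/v1/labels | missing_labels
```

Empty output means all three labels exist. Any printed label name is one Loki has not indexed yet.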

5. For bj/bo specifically

The ebit-bj and ebit-bo apps originally lacked the pre-otel.main.ts import chain that api/rt/speed-roulette use. Instead, they bootstrap OTel via NODE_OPTIONS: "--require @opentelemetry/auto-instrumentations-node/register" in docker-compose.yml. If this setting is missing, they won't emit OTLP logs, but the filelog receiver still picks up their Docker stdout.

# Check NODE_OPTIONS is set
sudo docker exec ebit-bj env | grep NODE_OPTIONS
sudo docker exec ebit-bo env | grep NODE_OPTIONS
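
A non-empty NODE_OPTIONS does not by itself prove the register hook is loaded. As a sketch, the check can be tightened to look for the hook path itself; has_otel_register is a hypothetical name and reads the container's env output on stdin.

```shell
# Hypothetical helper: exits 0 only if NODE_OPTIONS is set and references
# the auto-instrumentation register hook.
has_otel_register() {
  grep '^NODE_OPTIONS=' \
    | grep -qF -- '@opentelemetry/auto-instrumentations-node/register'
}

# Usage:
# sudo docker exec ebit-bj env | has_otel_register || echo "ebit-bj: register hook missing"
# sudo docker exec ebit-bo env | has_otel_register || echo "ebit-bo: register hook missing"
```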

Fix

Cause | Fix
Permission denied on Docker logs | Add user: "0:0" to the otel-collector service in docker-compose.yml, then run sudo docker compose up -d --force-recreate otel-collector
filelog receiver not in pipeline | Add filelog/docker to logs.receivers in observability/otel-collector.yml
Loki unhealthy | Run sudo docker compose restart loki, then check observability/loki.yml for invalid config fields
Service not logging | Verify nestjs-pino or EvoLogger is wired in the app's app.module.ts
No recent logs | Generate traffic: curl http://localhost:400X/health (where 400X is the service's port)

Trace-ID pivot not working

If logs appear in Loki but the TraceID link to Jaeger doesn't work:

  1. Check the field name: EvoLogger emits "traceid" (all lowercase, no underscore), not "trace_id"
  2. Verify the Loki datasource derivedFields regex matches: '"traceid":"(\w+)"'
  3. Check observability/grafana/provisioning/datasources/datasources.yml — the datasourceUid must be jaeger
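
To check the derivedFields pattern against a real log line without going through Grafana, an equivalent expression can be run locally. This is a sketch: extract_traceid is a hypothetical name, and the sed pattern uses [[:alnum:]_] as a portable stand-in for \w.

```shell
# Hypothetical helper: prints the trace ID captured from one log line,
# or nothing if the line does not match the "traceid":"..." pattern.
extract_traceid() {
  printf '%s\n' "$1" \
    | sed -n 's/.*"traceid":"\([[:alnum:]_]\{1,\}\)".*/\1/p'
}

# Usage:
# extract_traceid "$(sudo docker logs --tail 1 ebit-<service> 2>&1)"
```

The pattern allows no whitespace between the colon and the value, so a line serialized as "traceid": "abc" produces no match even though the key name is correct. If the helper prints nothing for a line that visibly contains a trace ID, compare the exact key spelling and spacing against the regex.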

Prevention

  • After adding any new NestJS app, verify it appears in curl -s 'http://localhost:3100/loki/api/v1/label/service_name/values'
  • The filelog receiver auto-discovers all Docker containers — no per-service config needed
  • Keep user: "0:0" on the collector to avoid future permission issues