Runbook: Logs for service X aren't in Loki
Symptom
The Grafana Logs-Trace Pivot dashboard shows no results for a specific service, or curl -s 'http://localhost:3100/loki/api/v1/label/service_name/values' is missing the service.
Likely causes
- The NestJS app's logger isn't wired to OTel (missing nestjs-pino or pre-otel.main.ts)
- The filelog receiver can't read Docker container logs (permission denied)
- The OTel Collector's logs pipeline isn't running
- Loki is down or full
- The service hasn't emitted any logs since the last Loki retention cycle
Diagnosis
1. Verify the service is logging to Docker stdout
# Check raw Docker logs
sudo docker logs --tail 5 ebit-<service>
# Should see JSON lines with "body", "traceid", "spanid" keys (from EvoLogger/winston)
# or pino-style JSON with "level", "time", "msg" keys
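To avoid eyeballing each line, the two expected formats can be told apart with a small helper. This is a minimal sketch: the function name `classify_log_line` is hypothetical, and it does plain substring matching on the keys listed above rather than real JSON parsing.

```shell
# Classify one log line by the JSON keys this runbook expects.
# Prints "evologger", "pino", or "unknown".
# Assumption: substring matching is good enough for a quick triage.
classify_log_line() {
  if printf '%s\n' "$1" | grep -q '"body"' &&
     printf '%s\n' "$1" | grep -q '"traceid"'; then
    echo evologger
  elif printf '%s\n' "$1" | grep -q '"msg"' &&
       printf '%s\n' "$1" | grep -q '"time"'; then
    echo pino
  else
    echo unknown
  fi
}

# Usage (not run here, requires Docker):
#   sudo docker logs --tail 5 ebit-<service> 2>&1 |
#     while IFS= read -r line; do classify_log_line "$line"; done
```

Lines reported as `unknown` are usually plain-text output, which suggests the structured logger is not wired in.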
2. Check the filelog receiver
# Verify the receiver is watching files
sudo docker logs ebit-otel-collector 2>&1 | grep "filelog"
# Should see: "Started watching file" entries
# Check for permission errors
sudo docker logs ebit-otel-collector 2>&1 | grep "permission denied"
# If present: the collector needs user: "0:0" in docker-compose.yml
# This was fixed in task #28; verify it's in place.
# (The -A window assumes user: sits within 20 lines of the service key.)
grep -A 20 'otel-collector:' docker-compose.yml | grep 'user:'
# Expected: user: "0:0" under the otel-collector service
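A plain grep for `user:` matches every service in the compose file, so it can help to read the value from the otel-collector block only. The filter below is a sketch under assumptions: the helper name `collector_user` is hypothetical, and it assumes service keys are indented by exactly two spaces under `services:`.

```shell
# Reads docker-compose YAML on stdin and prints the user: value found
# inside the otel-collector service block (quotes stripped).
# Assumption: service names sit at a two-space indent.
collector_user() {
  awk '
    /^  [A-Za-z0-9_-]+:[[:space:]]*$/ { in_svc = ($1 == "otel-collector:"); next }
    in_svc && $1 == "user:" { gsub(/"/, "", $2); print $2; exit }
  '
}

# Usage:
#   collector_user < docker-compose.yml
```

Empty output means no `user:` key was found in that block, which matches the "permission denied" cause in this runbook.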
3. Check the OTel logs pipeline
# Verify Loki exporter is configured
grep -A5 "loki:" observability/otel-collector.yml
# Expected: endpoint: http://loki:3100/loki/api/v1/push
# Verify logs pipeline includes filelog
grep -A3 "logs:" observability/otel-collector.yml
# Expected: receivers: [otlp, filelog/docker]
4. Check Loki health
curl -s http://localhost:3100/ready
# Expected: "ready"
curl -s http://localhost:3100/loki/api/v1/labels | python3 -m json.tool
# Should list: job, level, service_name
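Once Loki is ready, the check from the Symptom section can be scripted so it returns a pass/fail exit code. This sketch assumes the label-values endpoint returns JSON shaped like `{"status":"success","data":[...]}` and that python3 is available, as it already is for `json.tool` above. The function name `service_in_loki` and the service name in the usage line are hypothetical.

```shell
# Exit 0 if the service name ($1) is listed in a Loki label-values
# response read from stdin, exit 1 otherwise.
service_in_loki() {
  python3 -c '
import json, sys
values = json.load(sys.stdin).get("data") or []
sys.exit(0 if sys.argv[1] in values else 1)
' "$1"
}

# Usage (not run here, requires a running Loki):
#   curl -s 'http://localhost:3100/loki/api/v1/label/service_name/values' |
#     service_in_loki ebit-bj && echo present || echo missing
```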
5. For bj/bo specifically
The ebit-bj and ebit-bo apps originally lacked the pre-otel.main.ts import chain that api/rt/speed-roulette use. Instead, they bootstrap OTel via NODE_OPTIONS: "--require @opentelemetry/auto-instrumentations-node/register" in docker-compose.yml. If this line is missing, they won't emit OTLP logs, but the filelog receiver still picks up their Docker stdout.
# Check NODE_OPTIONS is set
sudo docker exec ebit-bj env | grep NODE_OPTIONS
sudo docker exec ebit-bo env | grep NODE_OPTIONS
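Seeing a NODE_OPTIONS line is not quite enough, because the variable may be set without the register flag. A minimal sketch that checks the value too is shown below. The helper name `has_otel_register` is hypothetical.

```shell
# Reads `env` output on stdin. Exit 0 only if NODE_OPTIONS contains the
# auto-instrumentation register flag described above.
has_otel_register() {
  grep '^NODE_OPTIONS=' |
    grep -qF -- '--require @opentelemetry/auto-instrumentations-node/register'
}

# Usage (not run here, requires Docker):
#   for c in ebit-bj ebit-bo; do
#     sudo docker exec "$c" env | has_otel_register || echo "$c: register flag missing"
#   done
```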
Fix
| Cause | Fix |
|---|---|
| Permission denied on Docker logs | Add user: "0:0" to otel-collector service in docker-compose.yml, then sudo docker compose up -d --force-recreate otel-collector |
| filelog receiver not in pipeline | Add filelog/docker to logs.receivers in observability/otel-collector.yml |
| Loki unhealthy | sudo docker compose restart loki — check observability/loki.yml for invalid config fields |
| Service not logging | Verify nestjs-pino or EvoLogger is wired in the app's app.module.ts |
| No recent logs | Generate traffic: curl http://localhost:400X/health (replace 400X with the service's port) |
Trace-ID pivot not working
If logs appear in Loki but the TraceID link to Jaeger doesn't work:
- Check the field name: EvoLogger emits "traceid" (all lowercase, no underscore), not "trace_id"
- Verify the Loki datasource derivedFields regex matches: '"traceid":"(\w+)"'
- Check observability/grafana/provisioning/datasources/datasources.yml: the datasourceUid must be jaeger
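The derived-field pattern can be tried against a real log line before touching Grafana. The sketch below uses sed with the POSIX class `[[:alnum:]_]` in place of `\w`, which is equivalent for ASCII input. The helper name `extract_traceid` and the sample trace ID in the usage line are illustrative only.

```shell
# Prints the trace ID that the pattern "traceid":"(\w+)" would capture
# from the log line in $1, or nothing if the line does not match.
extract_traceid() {
  printf '%s\n' "$1" |
    sed -nE 's/.*"traceid":"([[:alnum:]_]+)".*/\1/p'
}

# Usage:
#   extract_traceid '{"body":"hi","traceid":"4bf92f3577b34da6","spanid":"00f067aa"}'
```

Empty output for a line that visibly contains a trace ID usually points to a different key name (such as "trace_id") or whitespace after the colon, neither of which this pattern tolerates.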
Prevention
- After adding any new NestJS app, verify it appears in the output of curl -s 'http://localhost:3100/loki/api/v1/label/service_name/values'
- The filelog receiver auto-discovers all Docker containers; no per-service config is needed
- Keep user: "0:0" on the collector to avoid future permission issues