Observability
Local stack for traces, metrics, and logs. Everything runs out of the root compose.yml alongside the apps.
| Component | Port | Role |
|---|---|---|
| otel-collector | 4317 (gRPC), 4318 (HTTP) | OTLP ingress gateway |
| jaeger | 16686 | Trace UI |
| prometheus | 9090 | Metrics TSDB |
| loki | 3100 | Log store |
| grafana | 3003 | Unified UI (admin/grafana) |
Config lives in observability/: otel-collector.yml, loki.yml, prometheus.yml, grafana/ (provisioned datasources + dashboards).
How traces are produced
All five NestJS apps (api, rt, bj, bo, speed-roulette) share libs/shared/src/basic/pre/pre-otel.main.ts, which is imported at the top of every main.ts before Nest bootstraps. It initializes @opentelemetry/sdk-node with:
- `getNodeAutoInstrumentations()`: covers http, express, nestjs-core, ioredis, pg, bullmq, winston, and more. `fs` and `dns` are disabled to keep spans readable.
- `new PrismaInstrumentation()`: Prisma spans (model + method as span attributes).
The SDK exports via OTLP HTTP to OTEL_EXPORTER_OTLP_ENDPOINT (set to http://otel-collector:4318 in compose). OTEL_SERVICE_NAME identifies the service in Jaeger.
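The initialization described above can be sketched roughly as follows. This is a minimal configuration fragment, not the contents of pre-otel.main.ts: it assumes the standard `@opentelemetry/sdk-node`, `@opentelemetry/exporter-trace-otlp-http`, `@opentelemetry/auto-instrumentations-node`, and `@prisma/instrumentation` packages, and it leaves out the pino wiring covered later on this page.

```typescript
import { NodeSDK } from '@opentelemetry/sdk-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import { PrismaInstrumentation } from '@prisma/instrumentation';

const sdk = new NodeSDK({
  // With no explicit url, the exporter reads OTEL_EXPORTER_OTLP_ENDPOINT
  // (http://otel-collector:4318 in compose).
  traceExporter: new OTLPTraceExporter(),
  instrumentations: [
    getNodeAutoInstrumentations({
      // Disabled to keep spans readable.
      '@opentelemetry/instrumentation-fs': { enabled: false },
      '@opentelemetry/instrumentation-dns': { enabled: false },
    }),
    new PrismaInstrumentation(),
  ],
});

// Must run before Nest and the libraries it loads are imported,
// otherwise the auto-instrumentations cannot patch them.
sdk.start();
```

The service name is not set in code here; the SDK's default resource detection picks it up from OTEL_SERVICE_NAME.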
ebit-fe (Next.js) exports browser + server spans via @vercel/otel with propagateContextUrls covering the ebit-api base URL so traceparent is forwarded across the FE→API hop.
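For reference, the `traceparent` header forwarded across the FE→API hop follows the W3C Trace Context format `version-traceid-spanid-flags`. The helper below is an illustrative sketch for inspecting such a header by hand (for example one copied from browser devtools); `parseTraceparent` is a hypothetical name and not part of the repo or of any OTel package.

```typescript
// W3C Trace Context `traceparent`: version-traceid-spanid-flags, lowercase hex.
interface TraceParent {
  version: string;
  traceId: string; // 32 hex chars, the ID shown in Jaeger
  spanId: string; // 16 hex chars, the caller's span
  sampled: boolean; // bit 0 of the trace-flags byte
}

const TRACEPARENT_RE = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

function parseTraceparent(header: string): TraceParent | null {
  const m = TRACEPARENT_RE.exec(header.trim());
  if (!m) return null;
  const [, version, traceId, spanId, flags] = m;
  // Version ff and all-zero IDs are invalid per the spec.
  if (version === 'ff' || /^0+$/.test(traceId) || /^0+$/.test(spanId)) return null;
  return { version, traceId, spanId, sampled: (parseInt(flags, 16) & 1) === 1 };
}
```

The `traceId` field is the value to search for in Jaeger and Loki.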
How traces correlate to logs
Every log record carries trace_id/span_id/trace_flags matching the active OTel span, so a Jaeger trace can pivot into Loki with {service_name="ebit-api"} |= "<trace_id>" and vice versa.
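The pivot in both directions only needs the `trace_id` field. As a rough sketch, the hypothetical helper below takes one JSON log line and derives the Jaeger trace URL and the LogQL filter from it; the default Jaeger base URL and service label are the local values used on this page, and `pivotFromLogLine` itself is an illustration, not repo code.

```typescript
interface Pivot {
  traceId: string;
  jaegerUrl: string; // log -> trace
  logql: string; // trace -> logs
}

function pivotFromLogLine(
  line: string,
  jaegerBase = 'http://localhost:16686',
  service = 'ebit-api',
): Pivot | null {
  let record: unknown;
  try {
    record = JSON.parse(line);
  } catch {
    return null; // not a JSON log line
  }
  if (typeof record !== 'object' || record === null) return null;
  const traceId = (record as Record<string, unknown>)['trace_id'];
  // Records emitted outside an active span carry no usable trace_id.
  if (typeof traceId !== 'string' || !/^[0-9a-f]{32}$/.test(traceId)) return null;
  return {
    traceId,
    jaegerUrl: `${jaegerBase}/trace/${traceId}`,
    logql: `{service_name="${service}"} |= "${traceId}"`,
  };
}
```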
Mixed stack: pino for framework, winston for app-code facade
All five ebit-api services run a two-logger setup:
- `nestjs-pino` is the Nest framework logger. It captures Nest lifecycle output and every HTTP request/response. Records are JSON on stdout, bridged into OTel's logs API by `@opentelemetry/instrumentation-pino`, and exported via OTLP to the collector. These are the records that reach Loki over OTLP.
- `@bebkovan/server-core`'s `EvoLogger` facade still backs the ~40 app-code call sites that already use `EvoLogger.log/debug/error(...)`. It writes to winston. `WinstonInstrumentation` (enabled by default in `getNodeAutoInstrumentations`) injects the same `trace_id`/`span_id`/`trace_flags` at the winston transport layer. Those records go to docker stdout only and are not exported over OTLP; they reach Loki only through the `filelog/docker` receiver described under Known sharp edges.
Pino is the canonical 2025+ OTel log-correlation recipe: `@opentelemetry/instrumentation-pino` bridges records into the logs SDK, so OTLP export comes for free. Winston's equivalent doesn't exist in a stable form. We kept EvoLogger (winston) for the existing call sites because a mass rewrite would touch 40+ files for no gain: the records are still trace-tagged, they just bypass the OTLP log pipeline and are written to stdout.
The wiring (NestJS)
Shared helper in libs/shared/src/logger/pino-logger.module.ts:
```ts
import { LoggerModule } from 'nestjs-pino';

export class NestLoggerModule {
  // Thin wrapper so every app configures nestjs-pino the same way.
  static forRoot(opts: { serviceName: string; level?: string }) {
    return LoggerModule.forRoot({
      pinoHttp: { name: opts.serviceName, level: opts.level ?? 'info', autoLogging: true },
    });
  }
}
```
Every app.module.ts imports it:
```ts
imports: [
  EnvConfigModule,
  NestLoggerModule.forRoot({ serviceName: `${Project.Name}-api` }), // before EvoLoggerModule
  MetricsModule,
  EvoLoggerModule.forRoot({ winston: {...}, ... }), // unchanged, still registered
  ...
]
```
libs/shared/src/basic/base.main.ts swaps the Nest framework logger to pino after NestFactory.create(...) via app.useLogger(app.get(Logger)), where Logger comes from nestjs-pino.
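A minimal sketch of that swap, assuming the usual nestjs-pino pattern, is shown below. The `createNestApp` name comes from this page, but its signature and the `bufferLogs` option here are assumptions; the real helper in base.main.ts may take other arguments and do more.

```typescript
import { NestFactory } from '@nestjs/core';
import type { INestApplication, Type } from '@nestjs/common';
import { Logger } from 'nestjs-pino';

// Hypothetical shape of createNestApp, for illustration only.
export async function createNestApp(rootModule: Type<unknown>): Promise<INestApplication> {
  // bufferLogs holds early framework logs until the pino logger is attached.
  const app = await NestFactory.create(rootModule, { bufferLogs: true });
  // Replace Nest's default console logger with the nestjs-pino Logger
  // registered by NestLoggerModule.forRoot(...).
  app.useLogger(app.get(Logger));
  return app;
}
```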
libs/shared/src/basic/pre/pre-otel.main.ts registers pino instrumentation explicitly so the logKeys field names are under our control:
```ts
getNodeAutoInstrumentations({
  '@opentelemetry/instrumentation-pino': { enabled: false }, // we register one below
}),
new PinoInstrumentation({
  logKeys: { traceId: 'trace_id', spanId: 'span_id', traceFlags: 'trace_flags' },
}),
```
WinstonInstrumentation stays in the default-enabled set, so EvoLogger records still carry the same three fields on docker stdout.
The wiring (Next.js)
ebit-fe/src/lib/log.ts is a thin structured-logger wrapper:
```ts
import { trace } from '@opentelemetry/api';

const emit = (level: string, msg: string, fields?: Record<string, unknown>) => {
  // Read the active span (if any) so the record carries its trace context.
  const ctx = trace.getActiveSpan()?.spanContext();
  const record = {
    time: new Date().toISOString(), level, msg,
    service: process.env.OTEL_SERVICE_NAME ?? 'ebit-fe',
    trace_id: ctx?.traceId, span_id: ctx?.spanId, trace_flags: ctx?.traceFlags,
    ...fields,
  };
  // One JSON object per line; errors go to stderr.
  (level === 'error' ? console.error : console.log)(JSON.stringify(record));
};
```
We wire this only at proof points (e.g., src/app/api/auth/cookies/route.ts), not repo-wide — Next.js's own request logs don't need trace_id for our current use cases.
Validating correlation
- Run the sign-in E2E (`ebit-fe/e2e/`), grab the FE `traceparent` from browser devtools or the Jaeger search UI.
- Find the trace in Jaeger: http://localhost:16686/search?service=ebit-api (click through to the trace detail). The root span's `trace_id` is the anchor.
- Query Loki for the same ID:

  ```shell
  curl -sG 'http://localhost:3100/loki/api/v1/query_range' \
    --data-urlencode 'query={service_name="ebit-api"} |= "<trace_id>"' \
    --data-urlencode 'start='$(date -d '10 minutes ago' +%s%N) \
    --data-urlencode 'end='$(date +%s%N) | jq '.data.result[].values'
  ```

  Expect: at least one log line per service that participated in the trace.
- In Grafana Explore (http://localhost:3003, admin/grafana), choose the Loki datasource, paste the same LogQL, and click a matching line. The provisioned derivedFields config renders a "View trace" button that links back to Jaeger.
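The Loki query in the steps above can also be issued from a script. The sketch below builds the same `query_range` URL as the curl command; `lokiQueryRangeUrl` and its option names are hypothetical, and the defaults mirror the local values on this page (port 3100, `ebit-api`, a 10-minute lookback).

```typescript
interface LokiQueryOptions {
  base?: string;
  service?: string;
  lookbackMinutes?: number;
  nowMs?: number; // injectable clock, mainly for tests
}

function lokiQueryRangeUrl(traceId: string, opts: LokiQueryOptions = {}): string {
  const {
    base = 'http://localhost:3100',
    service = 'ebit-api',
    lookbackMinutes = 10,
    nowMs = Date.now(),
  } = opts;
  // Loki accepts Unix-epoch nanoseconds. Appending six zeros to the
  // millisecond value avoids the precision loss of multiplying by 1e6.
  const toNs = (ms: number): string => `${Math.trunc(ms)}000000`;
  const params = new URLSearchParams({
    query: `{service_name="${service}"} |= "${traceId}"`,
    start: toNs(nowMs - lookbackMinutes * 60_000),
    end: toNs(nowMs),
  });
  return `${base}/loki/api/v1/query_range?${params.toString()}`;
}
```

Passing the returned URL to `fetch` and reading `data.result[].values` from the JSON response corresponds to the `jq` filter in the curl version.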
Dashboards (provisioned as code)
Dashboard definitions live under observability/grafana/provisioning/dashboards/:
| File | Purpose |
|---|---|
| `service-overview.json` | RED per service (rate / error / duration) from the spanmetrics connector |
| `bullmq.json` | Queue depth + processing rate across all BullMQ queues (sessions, bets, bots, leaderboard, promo, rakeback, skindeck, SpeedRoulette*) |
| `redis.json` | ioredis ops + latency, split cache vs bot Redis |
| `prisma-postgres.json` | Prisma model/method heatmap + Postgres top slow queries + connection pool saturation |
| `browser-rum.json` | Web Vitals p75 / p95 from ebit-fe (@vercel/otel browser export) |
| `logs-trace-pivot.json` | Loki search with derivedFields "View trace" link back to Jaeger |
| `perf-test.json` | k6 custom metrics, threshold status, run-vs-baseline (during perf runs) |
| `perf-system.json` | Host metrics from node_exporter (CPU, mem, disk, net) |
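To make the RED terms in the service-overview row concrete, the sketch below computes rate, error ratio, and a p95 duration from a plain list of span samples. It is a conceptual illustration only: the dashboard derives these from the metrics the spanmetrics connector emits, not from code like this, and `redSummary` and `SpanSample` are hypothetical names.

```typescript
interface SpanSample {
  durationMs: number;
  error: boolean;
}

function redSummary(spans: SpanSample[], windowSeconds: number) {
  const durations = spans.map((s) => s.durationMs).sort((a, b) => a - b);
  // Nearest-rank p95: the smallest value with at least 95% of samples at or below it.
  const rank = Math.max(1, Math.ceil((95 * durations.length) / 100));
  return {
    ratePerSecond: spans.length / windowSeconds, // R: requests per second
    errorRatio: spans.length === 0 ? 0 : spans.filter((s) => s.error).length / spans.length, // E
    p95Ms: durations.length === 0 ? 0 : durations[rank - 1], // D
  };
}
```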
What produces what
| Produces spans | Transport | Notes |
|---|---|---|
| Every NestJS app | OTLP HTTP → otel-collector :4318 | @opentelemetry/sdk-node + PrismaInstrumentation + auto-instrumentations (http/express/nestjs/ioredis/pg/bullmq/winston) |
| ebit-fe browser + server | OTLP via @vercel/otel | Requires propagateContextUrls covering ebit-api base URL |
| ebit-admin-fe | none (Vite SPA — no SSR; browser-only traces if any) | Migrated from Next.js; AF-1 still applies for cookie/header propagation |
| Inter-Nest-app RPC | — | Redis pub/sub transport does NOT propagate traceparent; callee produces orphan traces (AF-2 in weaknesses-register.md) |
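The orphan traces in the last row arise because a pub/sub message, unlike an HTTP request, has no header slot in which `traceparent` travels automatically. Purely as an illustration of what carrying it would involve (this is not something the repo does today, and the `Envelope` shape is hypothetical), the caller's context could ride alongside the payload:

```typescript
interface Envelope<T> {
  traceparent?: string; // W3C traceparent of the publishing span, if any
  payload: T;
}

// Publisher side: attach the caller's traceparent next to the payload.
function wrapMessage<T>(payload: T, traceparent?: string): string {
  const envelope: Envelope<T> = traceparent ? { traceparent, payload } : { payload };
  return JSON.stringify(envelope);
}

// Subscriber side: recover the payload plus the parent context, which the
// callee would need in order to continue the trace instead of starting a new one.
function unwrapMessage<T>(raw: string): Envelope<T> {
  const parsed = JSON.parse(raw) as Envelope<T>;
  return {
    traceparent: typeof parsed.traceparent === 'string' ? parsed.traceparent : undefined,
    payload: parsed.payload,
  };
}
```

In a real implementation the header value would normally be written and read through the OTel propagation API rather than handled as a raw string.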
Known sharp edges
- `EvoLogger.log(...)` records now reach Loki via the `filelog/docker` receiver in `observability/otel-collector.yml`, which scrapes Docker container JSON logs from `/var/lib/docker/containers/`. These records carry a `source=docker_filelog` resource attribute so they are distinguishable from OTLP-bridged pino records. Query EvoLogger-only records in Loki: `{source="docker_filelog"} |= "EvoLogger"`. Pino records still arrive via both OTLP (primary, with trace correlation) and filelog (secondary, without `service.name` resource); prefer the OTLP path (`{service_name="ebit-api"}`) for trace-correlated queries.
- `@opentelemetry/resources@1.30` exports `Resource` (class constructor), not `resourceFromAttributes`; that helper arrived in 2.x. `pre-otel.main.ts` uses the class form.
- Next.js `@vercel/otel` needs `propagateContextUrls` set to the ebit-api base URL or `traceparent` won't propagate across the FE→API fetch boundary. See the FE's `instrumentation.ts`.
- bj/bo bootstrap their own `NestFactory.create` in `apps/bj/src/main.ts` and `apps/bo/src/main.ts` instead of going through `createNestApp`; they must `import '@app/shared/basic/pre-imports'` as the first line (to boot the OTel SDK) and call `app.useLogger(app.get(Logger))` (nestjs-pino) explicitly.