ADR-0009 — Jaeger v2 + Badger over Tempo / managed OpenSearch

Status: Accepted
Date: 2026-04-25
Author(s): Platform engineering (research in ../audits/jaeger-storage-research.md)

Context

The dev observability VM ran jaegertracing/all-in-one:1.57 with the default in-memory backend. On 2026-04-23 the host hit an OOM kill: Jaeger consumed 19.2 GB RAM under perf-test-shaped span volume and breached the 16 GB host limit. The default MEMORY_MAX_TRACES=0 means "unbounded"; this is documented behaviour, not a Jaeger bug.

Two pressures forced a decision:

  1. The OOM is recurring. The next perf-test ramp (1k → 10k VUs over 42 min, peak ~50k spans/sec) will reproduce the failure on any host with less than ~32 GB RAM, well beyond the budget for the observability VM (c7g.xlarge, 8 GB RAM, eu-north-1).
  2. Jaeger v1 reached EOL on 2025-12-31 (CNCF blog 2024-11-12, issue #6321). Migrating to v2 is mandatory regardless of the OOM. v2 also rebuilds Jaeger on the OpenTelemetry Collector core, simplifying the pipeline.
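A back-of-envelope estimate of how many spans one ramp produces helps size any backend. This sketch assumes span rate scales linearly with VU count (10k VUs ≈ 50k spans/sec, i.e. ~5 spans/sec per VU) and that the VU count ramps linearly over the full 42 min; both assumptions are illustrative, and the function name is hypothetical.

```python
# Back-of-envelope span volume for one perf-test ramp.
# ASSUMPTIONS (not measured): span rate scales linearly with VUs,
# and the VU count ramps linearly from start to end.

def ramp_span_total(start_vus: int, end_vus: int, minutes: int,
                    peak_spans_per_sec: int) -> int:
    """Total spans emitted over a linear VU ramp (hypothetical helper)."""
    spans_per_vu = peak_spans_per_sec / end_vus         # ~5 spans/sec per VU
    start_rate = start_vus * spans_per_vu               # spans/sec at ramp start
    avg_rate = (start_rate + peak_spans_per_sec) / 2    # linear ramp: mean of endpoints
    return round(avg_rate * minutes * 60)

total = ramp_span_total(1_000, 10_000, 42, 50_000)
print(f"{total:,} spans")  # 69,300,000 spans
```

Multiplying this total by a measured per-span storage cost gives a first-order bound for whichever backend holds the spans; the per-span cost is the figure worth measuring.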

The team is two engineers; ops capacity is the dominant constraint. Forensic retention requirement: 24–72 h post-test trace replay for incident analysis. No multi-tenancy, no global search across regions.

Decision

  1. Switch to jaegertracing/jaeger:2.17 with the Badger backend on the observability VM.
  2. Mount Badger storage on a 50 GB gp3 EBS volume, paths /var/lib/jaeger/keys and /var/lib/jaeger/values.
  3. Set the span TTL to 72 h (extensions.jaeger_storage.backends.badger_main.badger.ttl.spans: 72h per terraform/modules/monitoring/jaeger-v2-config.yaml.tftpl).
  4. Cap memory with GOMEMLIMIT=1500MiB, a soft limit that makes the Go garbage collector work harder as the runtime approaches it, backed by a hard Docker mem_limit of 2 GB, so Jaeger (including Badger's LSM compaction) stays within the host budget.
  5. Set ephemeral: false so storage survives container restarts.
  6. Tempo single-binary with the local backend is the documented fallback, adopted only if Badger struggles under high-cardinality writes (Jaeger #2987). The team is less familiar with TraceQL, so we accept that migration cost only if forced.
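The storage settings from items 2, 3 and 5 can be pictured as the fragment the template renders. This is a sketch, not the contents of jaeger-v2-config.yaml.tftpl: it builds the fragment as a Python dict and prints it as JSON (which YAML parsers accept). The directories.keys / directories.values layout follows the Badger example config shipped with Jaeger v2 and should be checked against the pinned 2.17 release; the helper name is hypothetical.

```python
import json

def badger_storage_fragment(ttl_hours: int = 72) -> dict:
    """Hypothetical helper: build the jaeger_storage block for the Badger backend."""
    if ttl_hours <= 0:
        raise ValueError("ttl_hours must be positive")
    badger = {
        # Keys and values live in separate directories on the EBS mount.
        "directories": {
            "keys": "/var/lib/jaeger/keys",
            "values": "/var/lib/jaeger/values",
        },
        # False: keep data on disk across container restarts.
        "ephemeral": False,
        # Spans expire after this duration (Badger entry TTL).
        "ttl": {"spans": f"{ttl_hours}h"},
    }
    return {"extensions": {"jaeger_storage": {"backends": {"badger_main": {"badger": badger}}}}}

# JSON output is accepted by YAML parsers, so it can be compared with the rendered config.
print(json.dumps(badger_storage_fragment(), indent=2))
```

Keeping the fragment as plain data makes it easy to diff against the rendered template in a test.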

The Terraform user-data renders the v2 config from jaeger-v2-config.yaml.tftpl and starts the container with GOMEMLIMIT set; see terraform/modules/monitoring/main.tf for the docker_compose definition.
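The ordering of the limits in item 4 can be checked mechanically. A sketch, assuming the Docker limit of "2 GB" is written as 2g (Docker treats g as GiB) and the host's 8 GB means 8 GiB; the helper names are hypothetical. GOMEMLIMIT takes a byte count with an optional B/KiB/MiB/GiB/TiB suffix, and the gap between it and the container limit is what absorbs memory the Go runtime does not account for, such as memory-mapped files.

```python
# Sanity-check GOMEMLIMIT < container mem_limit < host RAM.
# ASSUMPTIONS: Docker "2g" = 2 GiB; host "8 GB" = 8 GiB.

UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}

def parse_gomemlimit(value: str) -> int:
    """Parse a GOMEMLIMIT-style value such as '1500MiB' into bytes."""
    # Check the 3-character suffixes before the bare "B".
    for suffix in sorted(UNITS, key=len, reverse=True):
        if value.endswith(suffix):
            return int(value[: -len(suffix)]) * UNITS[suffix]
    return int(value)  # a bare number is a byte count

def check_limits(gomemlimit: str, container_bytes: int, host_bytes: int) -> int:
    """Return headroom in bytes between the Go soft limit and the container hard limit."""
    soft = parse_gomemlimit(gomemlimit)
    if not soft < container_bytes < host_bytes:
        raise ValueError("expected GOMEMLIMIT < container mem_limit < host RAM")
    return container_bytes - soft

headroom = check_limits("1500MiB", 2 * 1024**3, 8 * 1024**3)
print(headroom // 1024**2, "MiB")  # 548 MiB
```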

Considered alternatives

A. Stay on Jaeger v1 in-memory, set MEMORY_MAX_TRACES

The minimum-effort path: cap the ring buffer at, e.g., 100 000 traces to avoid the OOM. Rejected because v1 is EOL and the migration is needed anyway; doing it now is cheaper than splitting the work.
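For illustration, a trace cap behaves like a FIFO-bounded store: once the limit is reached, each new trace evicts the oldest one. This Python sketch shows the idea only; it is not Jaeger's in-memory store code, and the class and method names are hypothetical.

```python
from collections import OrderedDict

class BoundedTraceStore:
    """Illustrative FIFO-bounded trace store (not Jaeger's implementation)."""

    def __init__(self, max_traces: int):
        if max_traces <= 0:
            raise ValueError("max_traces must be positive; 0 would mean unbounded")
        self.max_traces = max_traces
        self._traces: OrderedDict = OrderedDict()  # trace_id -> list of spans, oldest first

    def write_span(self, trace_id: str, span: dict) -> None:
        if trace_id not in self._traces:
            if len(self._traces) >= self.max_traces:
                self._traces.popitem(last=False)  # evict the oldest trace
            self._traces[trace_id] = []
        self._traces[trace_id].append(span)

    def get_trace(self, trace_id: str):
        return self._traces.get(trace_id)

    def __len__(self) -> int:
        return len(self._traces)

store = BoundedTraceStore(max_traces=2)
for tid in ("a", "b", "c"):
    store.write_span(tid, {"name": f"op-{tid}"})
print(len(store), store.get_trace("a"))  # 2 None
```

The cap counts traces, not bytes, so a handful of very large traces can still consume substantial memory; the limit bounds growth without sizing it.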

B. Self-hosted Elasticsearch single-node

Mature backend, well supported by Jaeger v2. Rejected because ES alone needs ≥ 4 GB of JVM heap, and the c7g.xlarge's 8 GB RAM cannot fit ES plus the OTel collector, Prometheus, Grafana and Loki on the same box. Moving ES to a dedicated VM doubles the cost. JVM tuning, snapshot scripts and version upgrades add operational cost that a 2-person team cannot justify.

C. AWS managed OpenSearch (t3.medium.search)

Removes the ops burden. Jaeger v2 has first-class OpenSearch support. Rejected on cost: ~$60–80/month for the smallest viable instance plus EBS, against no additional compute cost for Badger, which runs on the existing VM. The forensic-only use case doesn't justify the spend.

D. Self-hosted single-node Cassandra

Jaeger's most mature backend. Rejected decisively: Cassandra's ops cost is the highest of any option considered. JVM heap tuning, repair scheduling and tombstone management are beyond what two engineers can operate.

E. AWS Keyspaces (managed Cassandra)

Removes the ops burden of self-hosted Cassandra. Rejected on cost: per-request pricing at ~$1.45 per million WRU × ~1 B writes ≈ $1.5k for the perf-test ingest alone. The forensic use case does not justify the spend.
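The Keyspaces figure reproduced as a sketch. It uses the ~$1.45 per million WRU and ~1 B writes quoted above and assumes each write consumes exactly one WRU (which holds only for rows within the 1 KB covered by one WRU); the function name is hypothetical.

```python
from decimal import Decimal

# On-demand write cost = writes x WRU per write / 1e6 x price per million WRU.
# ASSUMPTION: one WRU per write (rows of at most 1 KB).

def keyspaces_write_cost(writes: int, usd_per_million_wru: Decimal,
                         wru_per_write: int = 1) -> Decimal:
    wrus = Decimal(writes * wru_per_write)
    return wrus / Decimal(1_000_000) * usd_per_million_wru

cost = keyspaces_write_cost(1_000_000_000, Decimal("1.45"))
print(f"${cost:,.2f}")  # $1,450.00
```

Rows over 1 KB consume additional WRUs, so raising wru_per_write scales the cost linearly.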

F. Grafana Tempo single-binary, backend: local

Purpose-built for high-throughput trace ingest; Parquet block storage compresses well. Rejected for now, kept as fallback. Two reasons:

  1. Loses the Jaeger UI — search happens in Grafana via TraceQL, which the team is less familiar with than Jaeger's text-search.
  2. Badger is expected to be sufficient for our throughput, so Tempo's high-throughput strengths would go unused at our scale.

If Badger struggles under perf-test ingest, Tempo replaces it through a config-only change: the OTel collector keeps sending OTLP to port 4318.

Consequences

Capacity & retention

  • Disk: 50 GB gp3 EBS holds ~30–60 GB of compressed Badger blocks for 72 h of spans (depends on tail-sampling rate from ADR-0012). Cost: ~$4.64/month.
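  • RAM: Go runtime soft-limited to ~1.5 GB (GOMEMLIMIT) plus a ~256 MB Badger working set, with the 2 GB Docker mem_limit as the hard ceiling; well under the host's 8 GB.
  • TTL: spans expire automatically after 72 h. There is no runtime setting to shorten retention; changing it requires a code change.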
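How the disk footprint depends on the tail-sampling rate comes down to multiplication. A first-order sketch: every input, including the example figures, is a hypothetical placeholder rather than a measured value from this deployment, and the function name is illustrative.

```python
# First-order Badger disk estimate for one retention window.
# All inputs are HYPOTHETICAL placeholders; replace with measured values:
#   spans_per_sec   - average ingest rate after tail sampling (ADR-0012)
#   bytes_per_span  - average on-disk size per span after compression
#   retention_hours - span TTL (72 h here)
#   overhead        - multiplier for compaction / not-yet-reclaimed data

def badger_disk_gb(spans_per_sec: float, bytes_per_span: float,
                   retention_hours: float, overhead: float = 1.0) -> float:
    total_bytes = spans_per_sec * retention_hours * 3600 * bytes_per_span * overhead
    return total_bytes / 1e9  # decimal gigabytes

# Hypothetical example: 1,000 sampled spans/sec at 100 bytes each for 72 h.
print(round(badger_disk_gb(1_000, 100, 72), 2))  # 25.92
```

Halving the sampled span rate halves the estimate, which is why the disk range above depends on ADR-0012.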

Operations

  • Single-node only; no HA. If the observability VM and its volume are lost, at most the last 72 h of stored traces go with them (acceptable: they would expire within 72 h anyway), and spans emitted during the outage are lost as well. Mitigation: snapshot the EBS volume nightly via Terraform, which limits stored-trace loss to data written since the last snapshot.
  • Healthcheck path is /status in v2 (not / as in v1). Probes still pointing at / will always report unhealthy. Already corrected in jaeger-v2-config.yaml.tftpl (healthcheckv2.use_v2: true); flagged in docs/engineering/observability-runbook.md §8.
  • Badger compaction is asynchronous; sustained ingest can briefly raise disk usage above the steady-state mean. Mitigation: 50 GB volume gives ~3× headroom over peak.
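A minimal probe sketch for the v2 health endpoint. The /status path comes from this ADR; port 13133 is the OpenTelemetry healthcheck extension's usual default and is an assumption here, as are the helper names. Check the rendered config for the actual endpoint.

```python
import http.client
import urllib.request

def health_url(host: str, port: int = 13133, path: str = "/status") -> str:
    """Build the v2 health URL (port 13133 is an assumed default)."""
    return f"http://{host}:{port}{path}"

def probe(url: str, timeout: float = 2.0) -> bool:
    """True only for a 2xx response; HTTP errors, refusals and timeouts count as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    # URLError/HTTPError and socket timeouts derive from OSError;
    # malformed responses raise http.client.HTTPException.
    except (OSError, http.client.HTTPException):
        return False

if __name__ == "__main__":
    print(probe(health_url("localhost")))
```

Treating every exception as unhealthy keeps the probe's answer binary, which is what an orchestrator's health check expects.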

Observability impact

  • The OTel collector pipeline is unchanged: still OTLP gRPC :4317 + HTTP :4318 → batch → spanmetrics + jaeger_storage_exporter.
  • Spanmetrics-derived metrics (ADR-0002) continue to flow into Prometheus regardless of trace storage.
  • Tail-sampling decisions (ADR-0012) are upstream of Jaeger, so Badger only ingests already-sampled traces.

Migration & rollback

  • v1 → v2 cutover was a Terraform apply on 2026-04-25. Rollback to v1 not planned (v1 is EOL).
  • If Badger underperforms during a perf-test, swap to Tempo by changing one Terraform variable; the rest of the pipeline stays unchanged.

References