Debugging is straightforward when you run a single AI agent. You read the log, you see what happened.
When you run five agents in a swarm, each making their own tool calls and producing their own output, “read the log” ceases to be a strategy.
I built Claude Forge, an adversarial multi-agent coding framework on top of Claude Code. A typical run spawns a planner, an implementer, an evaluator, and a fixer. They review each other’s work and bounce it back if it fails quality checks.
But when something went wrong, I had timestamps and text dumps but no way to see which agent was responsible, how long it took, or where the tokens went.
Jaeger fixed it. This article covers setting up Jaeger v2 with Docker, wiring it into a multi-agent system via OpenTelemetry, and what I learned along the way.
What is distributed tracing?
Distributed tracing tracks a single operation as it passes through multiple services. A span is a unit of work with a start time, an end time, and key-value attributes. Spans nest into parent-child trees, and one tree is one trace per operation.
Microservices people already know this pattern: follow an HTTP request from the gateway through auth, database, and cache. The same idea works for multi-agent AI: follow a swarm request from the orchestrator through each subagent and its tool calls.
OpenTelemetry (OTel) is the standard. It gives you SDKs to build spans and export them over OTLP. Jaeger receives that data and presents it as a searchable timeline.
Why Jaeger v2?
Jaeger started at Uber and graduated as a CNCF project in 2019. v1 reaches end of life in December 2025. v2 is the current release, built on the OpenTelemetry Collector framework: a single binary bundling the collector, query service, and UI. It speaks OTLP natively on ports 4317 (gRPC) and 4318 (HTTP), so a separate collector is not required for local operation.
One important difference from v1: configuration moved from CLI flags and environment variables to a YAML file. Old -e SPAN_STORAGE_TYPE=badger env vars are silently ignored in v2. The container starts fine but falls back to in-memory storage. I lost two days’ worth of traces before I noticed. More on the correct setup below.
Prerequisites
Docker, installed and running.
Claude Code, installed.
Python 3.8+ for the tracing hook.
Claude Forge, or any other multi-agent system to instrument.
Installing Docker on Debian
If you already have Docker, skip this section. macOS and Windows users can use Docker Desktop. On Debian:
sudo apt-get update
sudo apt-get install -y ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/debian/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/debian \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo usermod -aG docker $USER
newgrp docker
Ubuntu users: replace linux/debian with linux/ubuntu in both URLs.
Setting up Jaeger v2
Basic run
For a quick check without persistence:
docker run -d --name jaeger \
-p 16686:16686 \
-p 4317:4317 \
-p 4318:4318 \
jaegertracing/jaeger:2.17.0
Port 16686 is the UI. Port 4317 is OTLP/gRPC. Port 4318 is OTLP/HTTP. Remove the container and your traces are gone.
Persistent storage with Badger
v2 reads configuration from a YAML file, not from environment variables. Save it as ~/.local/share/jaeger/config.yaml:
service:
  extensions: [jaeger_storage, jaeger_query, healthcheckv2]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [jaeger_storage_exporter]
extensions:
  healthcheckv2:
    use_v2: true
    http: { endpoint: 0.0.0.0:13133 }
  jaeger_query:
    storage: { traces: main_store }
  jaeger_storage:
    backends:
      main_store:
        badger:
          directories: { keys: /badger/key, values: /badger/data }
          ephemeral: false
          ttl: { spans: 720h }
receivers:
  otlp:
    protocols:
      grpc: { endpoint: 0.0.0.0:4317 }
      http: { endpoint: 0.0.0.0:4318 }
processors:
  batch:
exporters:
  jaeger_storage_exporter:
    trace_storage: main_store
The Jaeger container runs as UID 10001. If Docker creates the volume with default root ownership and you mount it without fixing permissions first, the container crashes with mkdir /badger/key: permission denied.
Initialize the volume and fix the ownership:
docker volume create jaeger-data
docker run --rm \
-v jaeger-data:/badger \
alpine sh -c "mkdir -p /badger/data /badger/key && chown -R 10001:10001 /badger"
Then run Jaeger with the mounted configuration:
docker run -d --name jaeger \
--restart unless-stopped \
-v ~/.local/share/jaeger/config.yaml:/etc/jaeger/config.yaml:ro \
-v jaeger-data:/badger \
-p 16686:16686 \
-p 4317:4317 \
-p 4318:4318 \
jaegertracing/jaeger:2.17.0 \
--config /etc/jaeger/config.yaml
Verify persistence: run docker restart jaeger and confirm that previously recorded traces are still there. Open http://localhost:16686 and you should see the UI.
Setting up Claude Forge tracing
Installing Claude Forge
Install it via the Claude Code plugin marketplace:
/plugin marketplace add hatmanstack/claude-forge
/plugin install forge@claude-forge
/reload-plugins
Installation opens a TUI to confirm scope and settings. After reloading, the commands are available under the forge: prefix (for example, /forge:pipeline).
You can also clone the repo from GitHub.
Installing the tracing hook
From your target project directory, run the install script. For the plugin install:
cd your-project
forge-trace # if you set up the alias from the README
# or, without the alias:
bash "$(find ~/.claude -path '*/forge*' -name install-tracing.sh 2>/dev/null | head -1)"
For a cloned repo:
cd your-project
bash /path/to/claude-forge/bin/install-tracing.sh
The script creates a dedicated venv at ~/.local/share/claude-forge/venv (prefers uv, falls back to python3 -m venv), installs the OpenTelemetry packages, copies the hook into place, merges the hook entries into .claude/settings.local.json, and runs a self-test against the OTLP endpoint.
Pass --no-settings to skip merging settings, or --uninstall to remove everything.
Opting in
Add to your shell init and restart your terminal:
export CLAUDE_FORGE_TRACING=1
Restart Claude Code, run /forge:pipeline, then check Jaeger for the claude-forge service.
Understanding the span model
Here’s what the hierarchy looks like for a typical swarm run:
session: "implement login form with OAuth" <- root span
├── subagent:planner
│ ├── tool:Write (Phase-0.md) <- mutation spans (on by default)
│ ├── tool:Write (Phase-1.md)
│ └── subagent_result:planner <- duration, token counts, output
├── subagent:implementer
│ ├── tool:Edit (src/auth.ts)
│ ├── tool:Bash (npm test)
│ ├── tool:Write (src/oauth.ts)
│ └── subagent_result:implementer
├── subagent:reviewer
│ └── subagent_result:reviewer
└── session_complete <- session totals
The name of the root span comes from the first line of your prompt. Search traces by what you requested, not by UUID.
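A sketch of that naming rule (the function and the 80-character cap are assumptions, not the hook’s actual code):

```python
def span_name_from_prompt(prompt: str, max_len: int = 80) -> str:
    """Derive a human-searchable span name from the first line of a prompt."""
    stripped = prompt.strip()
    # Fall back to a generic name for empty prompts.
    first_line = stripped.splitlines()[0] if stripped else "session"
    # Cap the length so the Jaeger search results stay readable.
    return first_line[:max_len]
```

With this, a prompt like "implement login form with OAuth" becomes the root span name you later type into the search box.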
Subagents get an anchor span on start and a result span on completion. The result contains duration, token count, prompt and output.
Three levels of detail
Not all internal tool calls are equally interesting. Write, Edit, MultiEdit, and Bash are mutations: fewer in number, more signal. They tell you what actually changed. Read, Glob, Grep, and WebFetch are navigation: lots of them, mostly noise.
Tracing captures mutations by default. That middle ground turned out to be right: before this change, you either saw nothing inside subagents or you saw 200+ spans per run.
| Mode | Subagent spans | Mutations (Write/Edit/Bash) | Other internal tools |
|---|---|---|---|
| Default | yes | yes | no |
| CLAUDE_FORGE_TRACE_INNER=1 | yes | yes | yes (minus blocklist) |
| CLAUDE_FORGE_TRACE_MUTATIONS=0 | yes | no | no (unless INNER is set) |
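The decision logic behind that table can be sketched as follows (should_trace_tool is a hypothetical name; the plugin’s real logic may differ):

```python
MUTATION_TOOLS = {"Write", "Edit", "MultiEdit", "Bash"}
DEFAULT_BLOCKLIST = {"Read", "Glob", "Grep", "TodoWrite", "NotebookRead"}
TRUTHY = {"1", "true", "yes", "on"}

def should_trace_tool(tool_name: str, env: dict) -> bool:
    """Decide whether an internal tool call gets its own span."""
    inner = env.get("CLAUDE_FORGE_TRACE_INNER", "") in TRUTHY
    mutations_on = env.get("CLAUDE_FORGE_TRACE_MUTATIONS", "1") not in {"0", "false", "no", "off"}
    if inner:
        # Everything except the (configurable) blocklist of noisy tools.
        raw = env.get("CLAUDE_FORGE_TRACE_TOOL_BLOCKLIST", ",".join(DEFAULT_BLOCKLIST))
        return tool_name not in set(raw.split(","))
    if mutations_on:
        # Default middle ground: mutation spans only.
        return tool_name in MUTATION_TOOLS
    return False
```

Passing env as a dict rather than reading os.environ directly keeps the function easy to test.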
Span attributes
On session_complete: session.tokens.input, session.tokens.output, session.tokens.total, session.tokens.turns, session.duration_ms, user.prompt (truncated to 2KB).
On subagent_result: agent.description, agent.prompt, agent.output, agent.duration_ms, agent.is_error, agent.tokens.input, agent.tokens.output.
On tool:*: tool.name, tool.input, tool.output, tool.duration_ms, tool.is_error.
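Attributes like user.prompt and agent.output are arbitrary text, so they need a size cap before being attached to a span. A minimal sketch of such a truncation helper (clamp_attr and the marker suffix are hypothetical; only the 2KB figure comes from the text above):

```python
def clamp_attr(value: str, limit: int = 2048) -> str:
    """Truncate a span attribute so large prompts/outputs don't bloat storage."""
    if len(value) <= limit:
        return value
    # Keep the head and mark the cut so readers know text was dropped.
    return value[:limit] + "[truncated]"
```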
Building a multi-agent swarm tracer
Hook architecture
Claude Code has lifecycle hooks that fire scripts on specific events. The tracer uses four of them:
UserPromptSubmit (create the root span),
PreToolUse (start a span),
PostToolUse (finish it with results), and
Stop (finalize the trace).
Each hook receives a JSON payload on stdin and runs as a subprocess.
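Sketched in Python, the skeleton of such a hook might look like this (the handler action names are illustrative; the hook_event_name field follows Claude Code’s hook payload format, but treat the details as assumptions):

```python
import json
import sys

def handle(payload: dict) -> str:
    """Route a hook payload to the right tracing action (names illustrative)."""
    actions = {
        "UserPromptSubmit": "create_root_span",
        "PreToolUse": "open_tool_span",
        "PostToolUse": "close_tool_span",
        "Stop": "finalize_trace",
    }
    return actions.get(payload.get("hook_event_name", ""), "ignore")

def main() -> None:
    # Each hook invocation is a fresh subprocess: read one JSON payload
    # from stdin, act on it, and exit quickly so the swarm isn't blocked.
    payload = json.loads(sys.stdin.read() or "{}")
    print(handle(payload))
```

Because every invocation is a separate process, anything the handlers need to remember between events has to live outside the process, which is what the state files below are for.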
Sending Spans with OpenTelemetry
Here’s minimal Python to get spans into Jaeger:
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
resource = Resource.create({"service.name": "my-agent-system"})
exporter = OTLPSpanExporter(endpoint="http://localhost:4317", insecure=True)
provider = TracerProvider(resource=resource)
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("agent-tracer")
with tracer.start_as_current_span("my-agent-task") as span:
span.set_attribute("agent.name", "planner")
span.set_attribute("agent.tokens.input", 1500)
span.set_attribute("agent.tokens.output", 800)
Open http://localhost:16686, select your service, and click “Find Traces”.
Coordinating pre and post events
You need to match each PreToolUse to its PostToolUse. Agent-type tool calls did not include a tool_use_id in the payload, so I hashed the tool name and input instead. Pre and post carry the same tool_input, so the hashes line up.
import hashlib, json
def correlation_key(tool_name: str, tool_input: dict) -> str:
content = json.dumps({"tool": tool_name, "input": tool_input}, sort_keys=True)
return hashlib.sha1(content.encode()).hexdigest()[:16]
State across hook processes
Each hook call is a separate process. There is no shared memory. So I wrote span contexts to JSON files on pre and read them back on post:
/tmp/claude-forge-tracing//
├── _root.json # trace ID, root span context
├── _session_start_ns.json # timestamp for duration calculation
├── subagent_.json # per-subagent span context
└── tool_.json # per-tool span context
Filenames are sanitized against path traversal: _safe_name() strips everything outside [A-Za-z0-9._-] and falls back to a SHA1 slug.
Flushing without blocking
try:
provider.force_flush(timeout_millis=1000)
except Exception:
pass # Never block the swarm
I tried 2000ms first and the swarm felt sluggish; 100ms timed out on cold TLS connections. 1000ms worked. If Jaeger is down, the swarm keeps running regardless.
Viewing traces in the Jaeger UI
Open http://localhost:16686 and choose claude-forge from the Service dropdown. Click “Find Traces”.
Trace search filters by operation name, tags and time range. Since the session span takes its name from your prompt, searching for “login form” returns the runs where you asked for one.
The timeline view is where I spend most of my time. Each span is a horizontal bar, nested by parent-child relationship. I can see that the planner took 12 seconds, the implementer 45, the reviewer 8. Click any bar to view token counts, prompt, output, and error status.
A trace comparison puts two runs side by side. Good for figuring out why one run succeeded and another didn’t.
Lessons from the trenches
One trace per swarm, not per subagent: My first version cleared the root span’s state file on every Stop event, so each subagent started a new trace. I changed the Stop handler to record the timestamp while preserving the root.
Use descriptions, not type names: Subagents all report their type as general-purpose. The description field is where the actual role lives.
Token attributes require per-agent transcripts: Claude Code writes transcripts under ~/.claude/projects/. Match them via agent-*.meta.json.
Parse boolean env vars explicitly: bool("0") is True in Python. Use an allowlist: {"1", "true", "yes", "on"}.
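A sketch of such an explicit parser (env_flag is a hypothetical name):

```python
import os

TRUTHY = {"1", "true", "yes", "on"}

def env_flag(name: str, default: bool = False) -> bool:
    """Parse a boolean env var with an explicit allowlist instead of bool()."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    # Anything outside the allowlist ("0", "false", typos) reads as False.
    return raw.strip().lower() in TRUTHY
```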
Environment variable reference
| Variable | Purpose |
|---|---|
| CLAUDE_FORGE_TRACING=1 | Master opt-in. The hook is a no-op without it. |
| CLAUDE_FORGE_TRACE_MUTATIONS=0 | Disable the default mutation spans (Write/Edit/Bash). On by default. |
| CLAUDE_FORGE_TRACE_INNER=1 | Capture all internal tool calls as child spans (off by default). |
| CLAUDE_FORGE_TRACE_TOOL_BLOCKLIST | Comma-separated tools to skip when internal tracing is on. Default: Read,Glob,Grep,TodoWrite,NotebookRead. |
| CLAUDE_FORGE_HOOK_DEBUG=1 | Enable debug logging of raw hook payloads. Off by default. |
| CLAUDE_FORGE_HOOK_DEBUG_LOG | Override the debug log path. Default: ~/.cache/claude-forge/hook.log. |
| OTEL_EXPORTER_OTLP_ENDPOINT | OTLP/gRPC endpoint. Default: http://localhost:4317. |
Wrapping up
Without visibility into the process, you’re wasting tokens and your time. Each run of a multi-agent swarm costs real money. When an agent fails and tries again, or when a reviewer rejects a job that was close, you’re paying for that blind spot.
Tracing gives you a map. You find out where the failure modes are. You find out which agents spin their wheels and burn tokens. A 45-second run can become a 10-second run with a better-planned prompt, but you’d never know without seeing the breakdown.
Get visibility early: Jaeger and OpenTelemetry make it cheap to set up. Once you can see where things go wrong, you can fix them.
Claude Forge is under active development on the main branch.