Debugging is straightforward when you run a single AI agent. You read the log, you see what happened.
When you run five agents in a swarm, each making their own tool calls and producing their own output, “read the log” ceases to be a strategy.
I built Claude Forge, an adversarial multi-agent coding framework on top of Claude Code. A typical run spawns a planner, an implementer, an evaluator, and a fixer. They review each other’s work and bounce it back if it fails quality checks.
But when something went wrong, I had timestamps and text dumps but no way to see which agent was responsible, how long it took, or where the tokens went.
Jaeger fixed it. This article covers setting up Jaeger v2 with Docker, wiring it into a multi-agent system via OpenTelemetry, and what I learned along the way.
What is distributed tracing?
Distributed tracing tracks a single operation as it passes through multiple services. A span is a unit of work with a start time, an end time, and key-value attributes. Spans nest into parent-child trees, and one tree is one trace per operation.
Microservices people already know this pattern: follow an HTTP request from the gateway through auth, database, and cache. The same idea works for multi-agent AI: follow a swarm request from the orchestrator through each subagent and its tool calls.
OpenTelemetry (OTel) is the standard. It gives you SDKs to build spans and export them over OTLP. Jaeger receives that data and presents it as a searchable timeline.
Why Jaeger v2?
Jaeger started at Uber and graduated as a CNCF project in 2019. v1 reaches end of life in December 2025. v2 is the current release, built on the OpenTelemetry Collector framework: a single binary bundling the collector, query service, and UI. It speaks OTLP natively on ports 4317 (gRPC) and 4318 (HTTP), so a separate collector is not required for local operation.
One important difference from v1: configuration moved from CLI flags and environment variables to a YAML file. Old -e SPAN_STORAGE_TYPE=badger env vars are silently ignored in v2. The container starts fine but falls back to in-memory storage. I lost two days’ worth of traces before I noticed. More on the correct setup below.
Prerequisites
Docker, installed and running.
Claude Code, installed.
Python 3.8+ for the tracing hook.
Claude Forge, or any other multi-agent system to instrument.
Installing Docker on Debian
If you already have Docker, skip this section. macOS and Windows users can use Docker Desktop. On Debian:
sudo apt-get update
sudo apt-get install -y ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/debian/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/debian \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo usermod -aG docker $USER
newgrp docker
Ubuntu users: replace linux/debian with linux/ubuntu in both URLs.
Setting up Jaeger v2
Basic run
For a quick check without persistence:
docker run -d --name jaeger \
-p 16686:16686 \
-p 4317:4317 \
-p 4318:4318 \
jaegertracing/jaeger:2.17.0
Port 16686 is the UI. Port 4317 is OTLP/gRPC. Port 4318 is OTLP/HTTP. Remove the container and your traces are gone.
Persistent storage with Badger
v2 reads configuration from a YAML file, not from environment variables. Save it as ~/.local/share/jaeger/config.yaml:
service:
  extensions: [jaeger_storage, jaeger_query, healthcheckv2]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [jaeger_storage_exporter]
extensions:
  healthcheckv2:
    use_v2: true
    http: { endpoint: 0.0.0.0:13133 }
  jaeger_query:
    storage: { traces: main_store }
  jaeger_storage:
    backends:
      main_store:
        badger:
          directories: { keys: /badger/key, values: /badger/data }
          ephemeral: false
          ttl: { spans: 720h }
receivers:
  otlp:
    protocols:
      grpc: { endpoint: 0.0.0.0:4317 }
      http: { endpoint: 0.0.0.0:4318 }
processors:
  batch:
exporters:
  jaeger_storage_exporter:
    trace_storage: main_store
The Jaeger container runs as UID 10001. If Docker creates the volume with default root ownership and you mount it without fixing permissions first, the container crashes with mkdir /badger/key: permission denied.
Initialize the volume and fix the ownership:
docker volume create jaeger-data
docker run --rm \
-v jaeger-data:/badger \
alpine sh -c "mkdir -p /badger/data /badger/key && chown -R 10001:10001 /badger"
Then run Jaeger with the mounted configuration:
docker run -d --name jaeger \
--restart unless-stopped \
-v ~/.local/share/jaeger/config.yaml:/etc/jaeger/config.yaml:ro \
-v jaeger-data:/badger \
-p 16686:16686 \
-p 4317:4317 \
-p 4318:4318 \
jaegertracing/jaeger:2.17.0 \
--config /etc/jaeger/config.yaml
Verify persistence: run docker restart jaeger and confirm that previously recorded traces are still there. Open http://localhost:16686 and you should see the UI.
Setting up Claude Forge tracing
Installing Claude Forge
Install it via the Claude Code plugin marketplace:
/plugin marketplace add hatmanstack/claude-forge
/plugin install forge@claude-forge
/reload-plugins
Installation opens a TUI to confirm scope and settings. After reloading, the commands are available under the forge: prefix (for example, /forge:pipeline).
You can also clone the repo from GitHub.
Installing the tracing hook
From your target project directory, run the install script. For the plugin install:
cd your-project
forge-trace # if you set up the alias from the README
# or, without the alias:
bash "$(find ~/.claude -path '*/forge*' -name install-tracing.sh 2>/dev/null | head -1)"
For a cloned repo:
cd your-project
bash /path/to/claude-forge/bin/install-tracing.sh
The script creates a dedicated venv at ~/.local/share/claude-forge/venv (prefers uv, falls back to python3 -m venv), installs the OpenTelemetry packages, copies the hook into place, merges the hook entries into .claude/settings.local.json, and runs a self-test against the OTLP endpoint.
Pass --no-settings to skip merging settings, or --uninstall to remove everything.
Opting in
Add to your shell init and restart your terminal:
export CLAUDE_FORGE_TRACING=1
Restart Claude Code, run /forge:pipeline, then check Jaeger for the claude-forge service.
Understanding the span model
Here’s what the hierarchy looks like for a typical swarm run:
session: "implement login form with OAuth" <- root span
├── subagent:planner
│ ├── tool:Write (Phase-0.md) <- mutation spans (on by default)
│ ├── tool:Write (Phase-1.md)
│ └── subagent_result:planner <- duration, token counts, output
├── subagent:implementer
│ ├── tool:Edit (src/auth.ts)
│ ├── tool:Bash (npm test)
│ ├── tool:Write (src/oauth.ts)
│ └── subagent_result:implementer
├── subagent:reviewer
│ └── subagent_result:reviewer
└── session_complete <- session totals
The name of the root span comes from the first line of your prompt. Search traces by what you requested, not by UUID.
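A sketch of that naming rule (the function and the 80-character cap are assumptions, not the hook’s actual code):

```python
def span_name_from_prompt(prompt: str, max_len: int = 80) -> str:
    """Derive a human-searchable span name from the first line of a prompt."""
    stripped = prompt.strip()
    # Fall back to a generic name for empty prompts.
    first_line = stripped.splitlines()[0] if stripped else "session"
    # Cap the length so the Jaeger search results stay readable.
    return first_line[:max_len]
```

With this, a prompt like "implement login form with OAuth" becomes the root span name you later type into the search box.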
Subagents get an anchor span on start and a result span on completion. The result contains duration, token count, prompt and output.
Three levels of detail
Not all internal tool calls are equally interesting. Write, Edit, MultiEdit, and Bash are mutations: fewer in number, more signal. They tell you what actually changed. Read, Glob, Grep, and WebFetch are navigation: lots of them, mostly noise.
Tracing captures mutations by default. That middle ground turned out to be right: before this change, you either saw nothing inside subagents or you saw 200+ spans per run.
| Mode | Subagent spans | Mutations (Write/Edit/Bash) | Other internal tools |
|---|---|---|---|
| Default | yes | yes | no |
| CLAUDE_FORGE_TRACE_INNER=1 | yes | yes | yes (minus blocklist) |
| CLAUDE_FORGE_TRACE_MUTATIONS=0 | yes | no | no (unless INNER is set) |
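The decision logic behind that table can be sketched as follows (should_trace_tool is a hypothetical name; the plugin’s real logic may differ):

```python
MUTATION_TOOLS = {"Write", "Edit", "MultiEdit", "Bash"}
DEFAULT_BLOCKLIST = {"Read", "Glob", "Grep", "TodoWrite", "NotebookRead"}
TRUTHY = {"1", "true", "yes", "on"}

def should_trace_tool(tool_name: str, env: dict) -> bool:
    """Decide whether an internal tool call gets its own span."""
    inner = env.get("CLAUDE_FORGE_TRACE_INNER", "") in TRUTHY
    mutations_on = env.get("CLAUDE_FORGE_TRACE_MUTATIONS", "1") not in {"0", "false", "no", "off"}
    if inner:
        # Everything except the (configurable) blocklist of noisy tools.
        raw = env.get("CLAUDE_FORGE_TRACE_TOOL_BLOCKLIST", ",".join(DEFAULT_BLOCKLIST))
        return tool_name not in set(raw.split(","))
    if mutations_on:
        # Default middle ground: mutation spans only.
        return tool_name in MUTATION_TOOLS
    return False
```

Passing env as a dict rather than reading os.environ directly keeps the function easy to test.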
Span attributes
On session_complete: session.tokens.input, session.tokens.output, session.tokens.total, session.tokens.turns, session.duration_ms, user.prompt (truncated to 2KB).
On subagent_result: agent.description, agent.prompt, agent.output, agent.duration_ms, agent.is_error, agent.tokens.input, agent.tokens.output.
On tool:*: tool.name, tool.input, tool.output, tool.duration_ms, tool.is_error.
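Attributes like user.prompt and agent.output are arbitrary text, so they need a size cap before being attached to a span. A minimal sketch of such a truncation helper (clamp_attr and the marker suffix are hypothetical; only the 2KB figure comes from the text above):

```python
def clamp_attr(value: str, limit: int = 2048) -> str:
    """Truncate a span attribute so large prompts/outputs don't bloat storage."""
    if len(value) <= limit:
        return value
    # Keep the head and mark the cut so readers know text was dropped.
    return value[:limit] + "[truncated]"
```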
Building a multi-agent swarm tracer
Hook architecture
Claude Code has lifecycle hooks that fire scripts on specific events. The tracer uses four of them:
UserPromptSubmit (create the root span),
PreToolUse (start a span),
PostToolUse (finish it with results), and
Stop (finalize the trace).
Each hook receives a JSON payload on stdin and runs as a subprocess.
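Sketched in Python, the skeleton of such a hook might look like this (the handler action names are illustrative; the hook_event_name field follows Claude Code’s hook payload format, but treat the details as assumptions):

```python
import json
import sys

def handle(payload: dict) -> str:
    """Route a hook payload to the right tracing action (names illustrative)."""
    actions = {
        "UserPromptSubmit": "create_root_span",
        "PreToolUse": "open_tool_span",
        "PostToolUse": "close_tool_span",
        "Stop": "finalize_trace",
    }
    return actions.get(payload.get("hook_event_name", ""), "ignore")

def main() -> None:
    # Each hook invocation is a fresh subprocess: read one JSON payload
    # from stdin, act on it, and exit quickly so the swarm isn't blocked.
    payload = json.loads(sys.stdin.read() or "{}")
    print(handle(payload))
```

Because every invocation is a separate process, anything the handlers need to remember between events has to live outside the process, which is what the state files below are for.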
Sending Spans with OpenTelemetry
Here’s minimal Python to get spans into Jaeger:
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
resource = Resource.create({"service.name": "my-agent-system"})
exporter = OTLPSpanExporter(endpoint="http://localhost:4317", insecure=True)
provider = TracerProvider(resource=resource)
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("agent-tracer")
with tracer.start_as_current_span("my-agent-task") as span:
span.set_attribute("agent.name", "planner")
span.set_attribute("agent.tokens.input", 1500)
span.set_attribute("agent.tokens.output", 800)
Open http://localhost:16686, select your service, and click “Find Traces”.
Coordinating pre and post events
You need to match each PreToolUse to its PostToolUse. Agent-type tool calls did not include a tool_use_id in the payload, so I hashed the tool name and input instead. Pre and post carry the same tool_input, so the hashes line up.
import hashlib, json
def correlation_key(tool_name: str, tool_input: dict) -> str:
content = json.dumps({"tool": tool_name, "input": tool_input}, sort_keys=True)
return hashlib.sha1(content.encode()).hexdigest()[:16]
State across hook processes
Each hook call is a separate process. There is no shared memory. So I wrote span contexts to JSON files on pre and read them back on post:
/tmp/claude-forge-tracing//
├── _root.json # trace ID, root span context
├── _session_start_ns.json # timestamp for duration calculation
├── subagent_.json # per-subagent span context
└── tool_.json # per-tool span context
Filenames are sanitized against path traversal: _safe_name() strips everything outside [A-Za-z0-9._-] and falls back to a SHA1 slug.
Flushing without blocking
try:
provider.force_flush(timeout_millis=1000)
except Exception:
pass # Never block the swarm
I tried 2000ms first and the swarm felt sluggish; 100ms timed out on cold TLS connections. 1000ms worked. If Jaeger is down, the swarm keeps running regardless.
Viewing traces in the Jaeger UI
Open http://localhost:16686 and choose claude-forge from the Service dropdown. Click “Find Traces”.
Trace search filters by operation name, tags and time range. Since the session span takes its name from your prompt, searching for “login form” returns the runs where you asked for one.
The timeline view is where I spend most of my time. Each span is a horizontal bar, nested by parent-child relationship. I can see that the planner took 12 seconds, the implementer 45, the reviewer 8. Click any bar to view token counts, prompt, output, and error status.
A trace comparison puts two runs side by side. Good for figuring out why one run succeeded and another didn’t.
Lessons from the trenches
One trace per swarm, not per subagent: My first version cleared the root span’s state file on every Stop event, so each subagent started a new trace. I changed the Stop handler to record the timestamp while preserving the root.
Use descriptions, not type names: Subagents all report their type as general-purpose. The description field is where the actual role lives.
Token attributes require per-agent transcripts: Claude Code writes transcripts under ~/.claude/projects/. Match them via agent-*.meta.json.
Parse boolean env vars explicitly: bool("0") is True in Python. Use an allowlist: {"1", "true", "yes", "on"}.
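A sketch of such an explicit parser (env_flag is a hypothetical name):

```python
import os

TRUTHY = {"1", "true", "yes", "on"}

def env_flag(name: str, default: bool = False) -> bool:
    """Parse a boolean env var with an explicit allowlist instead of bool()."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    # Anything outside the allowlist ("0", "false", typos) reads as False.
    return raw.strip().lower() in TRUTHY
```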
Environment variable reference
| Variable | Purpose |
|---|---|
| CLAUDE_FORGE_TRACING=1 | Master opt-in. The hook is a no-op without it. |
| CLAUDE_FORGE_TRACE_MUTATIONS=0 | Disable the default mutation spans (Write/Edit/Bash). On by default. |
| CLAUDE_FORGE_TRACE_INNER=1 | Capture all internal tool calls as child spans (off by default). |
| CLAUDE_FORGE_TRACE_TOOL_BLOCKLIST | Comma-separated tools to skip when internal tracing is on. Default: Read,Glob,Grep,TodoWrite,NotebookRead. |
| CLAUDE_FORGE_HOOK_DEBUG=1 | Enable debug logging of raw hook payloads. Off by default. |
| CLAUDE_FORGE_HOOK_DEBUG_LOG | Override the debug log path. Default: ~/.cache/claude-forge/hook.log. |
| OTEL_EXPORTER_OTLP_ENDPOINT | OTLP/gRPC endpoint. Default: http://localhost:4317. |
Wrapping up
Without visibility into the process, you’re wasting tokens and your time. Each run of a multi-agent swarm costs real money. When an agent fails and tries again, or when a reviewer rejects a job that was close, you’re paying for that blind spot.
Tracing gives you a map. You find out where the failure modes are. You find out which agents spin their wheels and burn tokens. A 45-second run can become a 10-second run with a better-planned prompt, but you’d never know without seeing the breakdown.
Get visibility early: Jaeger and OpenTelemetry make it cheap to set up. Once you can see where things go wrong, you can fix them.
Claude Forge is under active development on the main branch.