distributed-tracing · March 26, 2026 · 4 min

Tracing AI Agent Calls Across Service Boundaries with OTel Collector and Galileo

A demo of distributed tracing across multiple application services, using the OTel Collector and Galileo Observability

I. Introduction: The Observability Problem in Multi-Service AI Apps

The rise of multi-service architectures — especially with AI agents spanning multiple runtimes
Why logs and metrics alone don’t cut it: the need to follow a request across service boundaries
What distributed tracing gives you: a single trace ID that travels end-to-end
Brief preview of what the demo builds: Python/LangGraph → TypeScript/Express, unified in Galileo

II. The Stack at a Glance

Service A: Python + FastAPI + LangGraph (entry point, orchestrates the agent graph)
Service B: TypeScript + Express (downstream agent logic — LLM calls, tool use)
OTel Collector: the central telemetry hub — receives, batches, and fans out spans
Galileo Observability: the trace visualization and LLM observability backend
Quick architecture diagram walkthrough (the ASCII diagram from the README is blog-ready)

III. How Context Propagation Works (The “Magic” Explained)

What traceparent is and why it matters (W3C Trace Context standard)
How Service A’s opentelemetry-instrumentation-httpx auto-injects the header on outbound calls — no manual code
How Service B’s @opentelemetry/instrumentation-http auto-extracts it on the inbound side
Result: all spans in both services share the same trace_id
Key insight: auto-instrumentation means you don’t touch your business logic
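
To make the header concrete, here is a minimal sketch of the traceparent wire format as defined by W3C Trace Context version 00. It is not how the OTel SDKs implement propagation internally. The names TraceContext, format_traceparent, parse_traceparent and child_context are made up for this illustration, and the parser only accepts the strict version-00 layout.

```python
import re
import secrets
from typing import NamedTuple, Optional

# version "00" layout: version-traceid-parentid-flags, all lowercase hex
_TRACEPARENT_RE = re.compile(r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


class TraceContext(NamedTuple):
    trace_id: str   # 32 hex chars, shared by every span in the trace
    parent_id: str  # 16 hex chars, the span that made the outbound call
    sampled: bool   # lowest bit of the trace-flags byte


def format_traceparent(ctx: TraceContext) -> str:
    """Serialize a context into the header value sent on the outbound call."""
    flags = "01" if ctx.sampled else "00"
    return f"00-{ctx.trace_id}-{ctx.parent_id}-{flags}"


def parse_traceparent(header: str) -> Optional[TraceContext]:
    """Parse an inbound header; return None when it is malformed."""
    match = _TRACEPARENT_RE.fullmatch(header.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # The spec forbids version ff and all-zero trace or parent ids.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return TraceContext(trace_id, parent_id, bool(int(flags, 16) & 0x01))


def child_context(parent: TraceContext) -> TraceContext:
    """What the receiving service does: keep the trace_id, mint a new span id."""
    return TraceContext(parent.trace_id, secrets.token_hex(8), parent.sampled)
```

Service A's side corresponds to format_traceparent, Service B's side to parse_traceparent followed by child_context. The trace_id never changes across the hop, which is why both services end up in one trace.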

IV. Instrumentation Deep Dive

Service A (Python)

tracing.py: setting up the OTel SDK, configuring the OTLP exporter pointed at the Collector
FastAPI auto-instrumentation: server spans created automatically on each request
LangGraph node (call_ts_service) — how the graph triggers the cross-service call via httpx
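
A rough sketch of what a tracing.py along these lines could look like. This is not the repo's actual file: the function names, the "service-a" default, and the localhost fallback are assumptions for illustration. The OTel imports are real package paths, deferred into the function so the sketch loads even where the SDK is not installed.

```python
import os
from typing import Mapping

# Assumed fallback: the Collector's OTLP/gRPC port on the local machine.
DEFAULT_COLLECTOR_ENDPOINT = "http://localhost:4317"


def collector_endpoint(env: Mapping[str, str] = os.environ) -> str:
    """Resolve the Collector address, preferring the standard OTel env var."""
    return env.get("OTEL_EXPORTER_OTLP_ENDPOINT", DEFAULT_COLLECTOR_ENDPOINT)


def configure_tracing(app, service_name: str = "service-a") -> None:
    """Wire the SDK to the Collector and turn on auto-instrumentation.

    `app` is the FastAPI application; `service_name` is a hypothetical default.
    """
    # Deferred imports: require opentelemetry-sdk, the OTLP gRPC exporter,
    # and the fastapi/httpx instrumentation packages at call time.
    from opentelemetry import trace
    from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
    from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
    from opentelemetry.instrumentation.httpx import HTTPXClientInstrumentor
    from opentelemetry.sdk.resources import Resource
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import BatchSpanProcessor

    provider = TracerProvider(resource=Resource.create({"service.name": service_name}))
    exporter = OTLPSpanExporter(endpoint=collector_endpoint(), insecure=True)
    provider.add_span_processor(BatchSpanProcessor(exporter))
    trace.set_tracer_provider(provider)

    # Server spans for inbound requests, client spans plus traceparent
    # injection for outbound httpx calls.
    FastAPIInstrumentor.instrument_app(app)
    HTTPXClientInstrumentor().instrument()
```

Calling configure_tracing(app) once at startup, before the first request, is enough; the LangGraph node that calls Service B via httpx needs no tracing code of its own.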

Service B (TypeScript)

tracing.ts: SDK initialization, registering HTTP + Express instrumentation
server.ts: Express route receiving the propagated context; USE_REAL_LLM env var read here (server.ts:20) and passed into the agent
agent.ts: the downstream agent logic (invoke_agent → chat → tool → chat cycle)
- Mode branching at agent.ts:106 and agent.ts:194 — mock vs. real LLM path
- In real mode, realLLMCall (agent.ts:10–50) calls OpenAI; tool execution still uses the mock (agent.ts:167)
mock-llm.ts: simulated LLM responses — the default run mode (USE_REAL_LLM=false)
- Important note: this is a deliberate design choice, not a placeholder. The demo is fully functional without an OpenAI key.
- USE_REAL_LLM=true swaps in real OpenAI reasoning while keeping tool execution mocked — a useful middle ground for testing

V. The OTel Collector: Glue Between Services and Backend

Why route through a Collector instead of exporting directly to Galileo
- Decoupling: services don’t need to know about the backend
- Batching: batch processor with 5s timeout and 512-span batch size
- Fan-out: same spans go to both Galileo (OTLP/HTTP) and stdout debug
Walking through otel-collector-config.yaml:
- Receivers: OTLP over gRPC (4317) and HTTP (4318)
- Exporters: otlphttp/galileo with API key + project headers; debug for local visibility
- Pipeline: otlp → batch → [galileo, debug]
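
The batch-then-fan-out behavior of that pipeline can be modeled in a few lines. This is a toy model for intuition only, not the Collector's batch processor: the class name BatchingFanOut and its methods are invented here, and time is passed in explicitly to keep the sketch deterministic. The defaults mirror the 512-span and 5s settings above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Span = Dict[str, str]
Exporter = Callable[[List[Span]], None]


@dataclass
class BatchingFanOut:
    """Buffer spans, then hand each batch to every exporter."""

    exporters: List[Exporter]
    max_batch_size: int = 512
    timeout_s: float = 5.0
    _buffer: List[Span] = field(default_factory=list)
    _first_seen: Optional[float] = None  # arrival time of the oldest buffered span

    def add(self, span: Span, now: float) -> None:
        if not self._buffer:
            self._first_seen = now
        self._buffer.append(span)
        # Size trigger: flush as soon as the batch is full.
        if len(self._buffer) >= self.max_batch_size:
            self.flush()

    def tick(self, now: float) -> None:
        # Time trigger: flush a partial batch once it has waited long enough.
        if self._buffer and now - self._first_seen >= self.timeout_s:
            self.flush()

    def flush(self) -> None:
        if not self._buffer:
            return
        batch, self._buffer = self._buffer, []
        self._first_seen = None
        # Fan-out: every exporter receives its own copy of the same batch.
        for export in self.exporters:
            export(list(batch))
```

With two exporters registered, one standing in for Galileo and one for debug, each flush delivers the identical batch to both. The services never see either destination, which is the decoupling argument in miniature.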

VI. Running It Locally

Prerequisites: Docker, .env file with Galileo credentials
docker compose up -d — brings up the Collector, Service A, and Service B together
Sending a test request: curl -X POST http://localhost:8000/ask -H "Content-Type: application/json" -d '{"question":"Find me a good restaurant"}'
What to look for in Galileo: one unified trace tree spanning both services
Local dev workflow (Collector in Docker, services run natively for fast iteration)
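
For scripted runs, the same test request can be sent from Python using only the standard library. A small sketch: the URL and payload come from the curl command above, while the function names and the 30-second timeout are assumptions.

```python
import json
import urllib.request

ASK_URL = "http://localhost:8000/ask"


def build_ask_request(question: str, url: str = ASK_URL) -> urllib.request.Request:
    """Build the POST request without sending it."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, timeout_s: float = 30.0) -> str:
    """Send the request to Service A and return the raw response body."""
    with urllib.request.urlopen(build_ask_request(question), timeout=timeout_s) as resp:
        return resp.read().decode("utf-8")
```

With the stack running, ask("Find me a good restaurant") should produce one new trace in Galileo per call, which makes it easy to generate a handful of traces for comparison.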

VII. What You See in Galileo

The reconstructed trace: how Galileo assembles spans from two independent services into a single tree
Use the screenshot here — the trace tree from a real end-to-end run
Span hierarchy walkthrough: POST /ask (FastAPI) → POST /ask http send (httpx outbound) → POST (Express inbound) → LangGraph → call_ts_service → format_response
Latency breakdown: total 1.09s, with call_ts_service at 1.07s — immediately shows where time is spent
LangGraph span metadata: langgraph_node, langgraph_path, langgraph_step, langgraph_triggers — OTel carrying framework-level context, not just HTTP spans
Input/Output visibility in the format_response span — the question in, the final formatted answer out
Practical debugging scenarios this unlocks (e.g., “why is this agent call slow?”, “what did the LLM actually receive?”)
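
The reassembly step is less mysterious than it looks. Any backend can rebuild the tree from three fields on each span: trace_id, span_id, and the parent span id. The sketch below shows that general idea, not Galileo's implementation; the Span fields, the render_trace helper, and the service labels are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Span:
    trace_id: str
    span_id: str
    parent_span_id: Optional[str]
    name: str
    service: str


def render_trace(spans: List[Span], trace_id: str) -> List[str]:
    """Return an indented outline of one trace, regardless of arrival order."""
    in_trace = [s for s in spans if s.trace_id == trace_id]
    known_ids = {s.span_id for s in in_trace}

    # Group spans under their parent; a span whose parent is missing is a root.
    children: Dict[Optional[str], List[Span]] = defaultdict(list)
    for span in in_trace:
        parent = span.parent_span_id if span.parent_span_id in known_ids else None
        children[parent].append(span)

    lines: List[str] = []

    def walk(parent_id: Optional[str], depth: int) -> None:
        for span in children.get(parent_id, []):
            lines.append(f"{'  ' * depth}{span.name} [{span.service}]")
            walk(span.span_id, depth + 1)

    walk(None, 0)
    return lines
```

The cross-service link is just one more parent pointer: the server span in Service B names the client span in Service A as its parent, because that id arrived in the traceparent header.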

VIII. Key Takeaways and Extensions

Auto-instrumentation is your friend: minimal code, maximum coverage
The OTel Collector pattern scales — add more services, more exporters, without changing app code
Polyglot is not a barrier: W3C traceparent is language-agnostic
What to try next:
- Add a third service in another language (Go, Java, etc.) and watch the trace tree grow
- Swap the mock LLM for a real model and observe token-level spans
- Add metrics and logs pipelines to the same Collector
- Explore Galileo’s evaluation features on top of the trace data

IX. Conclusion

Distributed tracing across polyglot AI services is achievable with surprisingly little boilerplate
OpenTelemetry’s auto-instrumentation + W3C propagation standards do the heavy lifting
The OTel Collector is the right abstraction layer between your apps and your observability backend
Repo for readers to clone and run themselves: https://github.com/KazChe/distributed-tracing-otelcollector-galileo

*All opinions are my own
Target audience: backend/platform engineers building multi-service AI apps; ~1,500–2,500 words
Companion assets: architecture diagram, annotated config snippets, Galileo screenshot

#distributed-tracing #otel-collector #galileo-gen-ai-observability