distributed-tracing · March 26, 2026 · 4 min
Tracing AI Agent Calls Across Service Boundaries with OTel Collector and Galileo
A demo showcasing distributed tracing across multiple application services, the OTel Collector, and Galileo Observability

I. Introduction: The Observability Problem in Multi-Service AI Apps
- The rise of multi-service architectures — especially with AI agents spanning multiple runtimes
- Why logs and metrics alone don’t cut it: the need to follow a request across service boundaries
- What distributed tracing gives you: a single trace ID that travels end-to-end
- Brief preview of what the demo builds: Python/LangGraph → TypeScript/Express, unified in Galileo
II. The Stack at a Glance
- Service A: Python + FastAPI + LangGraph (entry point, orchestrates the agent graph)
- Service B: TypeScript + Express (downstream agent logic — LLM calls, tool use)
- OTel Collector: the central telemetry hub — receives, batches, and fans out spans
- Galileo Observability: the trace visualization and LLM observability backend
- Quick architecture diagram walkthrough (the ASCII diagram from the README is blog-ready)
III. How Context Propagation Works (The “Magic” Explained)
- What `traceparent` is and why it matters (W3C Trace Context standard)
- How Service A’s `opentelemetry-instrumentation-httpx` auto-injects the header on outbound calls, with no manual code
- How Service B’s `@opentelemetry/instrumentation-http` auto-extracts it on the inbound side
- Result: all spans in both services share the same `trace_id`
- Key insight: auto-instrumentation means you don’t touch your business logic
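To make the header less magical, here is a minimal standard-library sketch of what inject and extract amount to conceptually. It is not the OpenTelemetry implementation: the function names `parse_traceparent`, `inject`, and `child_span_for` are hypothetical, and only the version `00` header layout from the W3C Trace Context standard is handled.

```python
import re
import secrets
from typing import Optional

# W3C Trace Context layout: version-traceid-parentid-flags, lowercase hex.
_TRACEPARENT_RE = re.compile(
    r"^(?P<version>[0-9a-f]{2})-(?P<trace_id>[0-9a-f]{32})-"
    r"(?P<parent_id>[0-9a-f]{16})-(?P<flags>[0-9a-f]{2})$"
)


def parse_traceparent(header: str) -> Optional[dict]:
    """Return the four header fields, or None if the header is invalid."""
    match = _TRACEPARENT_RE.match(header.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # The standard forbids version ff and all-zero trace or parent IDs.
    if fields["version"] == "ff":
        return None
    if set(fields["trace_id"]) == {"0"} or set(fields["parent_id"]) == {"0"}:
        return None
    return fields


def inject(headers: dict, trace_id: str, span_id: str, sampled: bool = True) -> None:
    """Client side (hypothetical helper): write the current context into headers."""
    flags = "01" if sampled else "00"
    headers["traceparent"] = f"00-{trace_id}-{span_id}-{flags}"


def child_span_for(headers: dict) -> dict:
    """Server side (hypothetical helper): keep the trace_id, mint a new span_id."""
    parent = parse_traceparent(headers.get("traceparent", ""))
    if parent is None:
        # No usable context arrived, so this request starts a new trace.
        return {"trace_id": secrets.token_hex(16), "parent_id": None,
                "span_id": secrets.token_hex(8)}
    return {"trace_id": parent["trace_id"], "parent_id": parent["parent_id"],
            "span_id": secrets.token_hex(8)}


# Service A's outbound call and Service B's inbound handling, in miniature.
outbound_headers: dict = {}
inject(outbound_headers, "4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7")
server_span = child_span_for(outbound_headers)
```

The server span carries the same `trace_id` as the caller and records the caller's span ID as its parent, which is all a backend needs to stitch the two services into one tree.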
IV. Instrumentation Deep Dive
Service A (Python)
- `tracing.py`: setting up the OTel SDK, configuring the OTLP exporter pointed at the Collector
- FastAPI auto-instrumentation: server spans created automatically on each request
- LangGraph node (`call_ts_service`): how the graph triggers the cross-service call via `httpx`
Service B (TypeScript)
- `tracing.ts`: SDK initialization, registering HTTP + Express instrumentation
- `server.ts`: Express route receiving the propagated context; `USE_REAL_LLM` env var read here (server.ts:20) and passed into the agent
- `agent.ts`: the downstream agent logic (invoke_agent → chat → tool → chat cycle)
- Mode branching at agent.ts:106 and agent.ts:194: mock vs. real LLM path
- In real mode, `realLLMCall` (agent.ts:10–50) calls OpenAI; tool execution still uses mock (agent.ts:167)
- `mock-llm.ts`: simulated LLM responses, the default run mode (`USE_REAL_LLM=false`)
- Important note: this is a deliberate design choice, not a placeholder. The demo is fully functional without an OpenAI key.
- `USE_REAL_LLM=true` swaps in real OpenAI reasoning while keeping tool execution mocked, a useful middle ground for testing
V. The OTel Collector: Glue Between Services and Backend
- Why route through a Collector instead of exporting directly to Galileo
- Decoupling: services don’t need to know about the backend
- Batching: `batch` processor with 5s timeout and 512-span batch size
- Fan-out: same spans go to both Galileo (OTLP/HTTP) and stdout debug
- Walking through `otel-collector-config.yaml`:
- Receivers: OTLP over gRPC (4317) and HTTP (4318)
- Exporters: `otlphttp/galileo` with API key + project headers; `debug` for local visibility
- Pipeline: `otlp → batch → [galileo, debug]`
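The batch-then-fan-out behavior can be modeled in a few lines. This is a toy sketch, not the Collector's code: the real batch processor is written in Go and its timer semantics may differ in detail. The class name `BatchFanOut` and its methods are hypothetical, and the clock is passed in explicitly so the behavior is easy to follow.

```python
from typing import Callable, List, Optional

Span = dict
Exporter = Callable[[List[Span]], None]


class BatchFanOut:
    """Toy model: buffer spans, then hand each batch to every exporter."""

    def __init__(self, exporters: List[Exporter],
                 max_batch_size: int = 512, timeout_s: float = 5.0) -> None:
        self.exporters = exporters
        self.max_batch_size = max_batch_size  # 512 spans, as in the demo config
        self.timeout_s = timeout_s            # 5 seconds, as in the demo config
        self._buffer: List[Span] = []
        self._oldest: Optional[float] = None  # arrival time of the oldest buffered span

    def receive(self, span: Span, now: float) -> None:
        """Buffer a span and flush immediately once the batch is full."""
        if not self._buffer:
            self._oldest = now
        self._buffer.append(span)
        if len(self._buffer) >= self.max_batch_size:
            self.flush()

    def tick(self, now: float) -> None:
        """Flush a partial batch once its oldest span has waited timeout_s."""
        if self._buffer and now - self._oldest >= self.timeout_s:
            self.flush()

    def flush(self) -> None:
        """Send the buffered spans to every exporter, then start a new batch."""
        if not self._buffer:
            return
        batch, self._buffer = self._buffer, []
        for export in self.exporters:
            export(list(batch))  # each exporter receives its own copy


# Two stand-in exporters that simply record the batches they are given.
galileo_batches: List[List[Span]] = []
debug_batches: List[List[Span]] = []
pipeline = BatchFanOut([galileo_batches.append, debug_batches.append])
pipeline.receive({"name": "POST /ask"}, now=0.0)
pipeline.tick(now=5.0)  # timeout reached, so the single span is exported to both
```

Because the services only ever talk to the receiver side, adding or removing an exporter changes this list and nothing in application code.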
VI. Running It Locally
- Prerequisites: Docker, `.env` file with Galileo credentials
- `docker compose up -d`: brings up the Collector, Service A, and Service B together
- Sending a test request: `curl -X POST http://localhost:8000/ask -d '{"question":"Find me a good restaurant"}'`
- What to look for in Galileo: one unified trace tree spanning both services
- Local dev workflow (Collector in Docker, services run natively for fast iteration)
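For scripted smoke tests, the same request can be sent from Python using only the standard library. This is a sketch under assumptions: the helper names are hypothetical, and it assumes Service A accepts a JSON body with an explicit `Content-Type: application/json` header. The response is returned as raw text because its shape is not described here.

```python
import json
import urllib.request


def build_ask_request(question: str,
                      base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build the POST /ask request without sending it."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/ask",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, timeout_s: float = 30.0) -> str:
    """Send the request to a running Service A and return the raw response body."""
    request = build_ask_request(question)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return response.read().decode("utf-8")


# With the stack running (docker compose up -d), uncomment to try it:
# print(ask("Find me a good restaurant"))
```

Splitting request construction from sending keeps the payload easy to inspect before anything touches the network.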
VII. What You See in Galileo
- The reconstructed trace: how Galileo assembles spans from two independent services into a single tree
- Use the screenshot here — the trace tree from a real end-to-end run
- Span hierarchy walkthrough: `POST /ask` (FastAPI) → `POST /ask http send` (httpx outbound) → `POST` (Express inbound) → `LangGraph` → `call_ts_service` → `format_response`
- Latency breakdown: total 1.09s, with `call_ts_service` at 1.07s, which immediately shows where time is spent
- LangGraph span metadata (`langgraph_node`, `langgraph_path`, `langgraph_step`, `langgraph_triggers`): OTel carrying framework-level context, not just HTTP spans
- Input/Output visibility in the `format_response` span: the question in, the final formatted answer out
- Practical debugging scenarios this unlocks (e.g., “why is this agent call slow?”, “what did the LLM actually receive?”)
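How a backend turns independently exported spans into one tree can be sketched with parent IDs alone. This is an illustration, not Galileo's implementation. The span IDs are invented, the parent links simply follow the order of the hierarchy listed above (the real nesting may differ), and only the two durations quoted above are filled in.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical flat export: IDs and parent links are invented for illustration.
SPANS: List[dict] = [
    {"span_id": "s1", "parent_id": None, "name": "POST /ask", "duration_s": 1.09},
    {"span_id": "s2", "parent_id": "s1", "name": "POST /ask http send"},
    {"span_id": "s3", "parent_id": "s2", "name": "POST"},
    {"span_id": "s4", "parent_id": "s3", "name": "LangGraph"},
    {"span_id": "s5", "parent_id": "s4", "name": "call_ts_service", "duration_s": 1.07},
    {"span_id": "s6", "parent_id": "s5", "name": "format_response"},
]


def build_tree(spans: List[dict]) -> Tuple[List[dict], Dict[str, List[dict]]]:
    """Group spans by parent; spans with no known parent become roots."""
    known_ids = {span["span_id"] for span in spans}
    children: Dict[str, List[dict]] = defaultdict(list)
    roots: List[dict] = []
    for span in spans:
        parent = span.get("parent_id")
        if parent is None or parent not in known_ids:
            roots.append(span)
        else:
            children[parent].append(span)
    return roots, children


def render(spans: List[dict]) -> List[str]:
    """Return one indented line per span, depth-first from each root."""
    roots, children = build_tree(spans)
    lines: List[str] = []

    def walk(span: dict, depth: int) -> None:
        duration = span.get("duration_s")
        suffix = f" ({duration:.2f}s)" if duration is not None else ""
        lines.append("  " * depth + span["name"] + suffix)
        for child in children[span["span_id"]]:
            walk(child, depth + 1)

    for root in roots:
        walk(root, 0)
    return lines


print("\n".join(render(SPANS)))
```

The arrival order of spans does not matter, which is why two services exporting on their own schedules still produce one coherent tree.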
VIII. Key Takeaways and Extensions
- Auto-instrumentation is your friend: minimal code, maximum coverage
- The OTel Collector pattern scales — add more services, more exporters, without changing app code
- Polyglot is not a barrier: W3C `traceparent` is language-agnostic
- What to try next:
- Add a third service (Node.js, Go, etc.) and watch the trace tree grow
- Swap the mock LLM for a real model and observe token-level spans
- Add metrics and logs pipelines to the same Collector
- Explore Galileo’s evaluation features on top of the trace data
IX. Conclusion
- Distributed tracing across polyglot AI services is achievable with surprisingly little boilerplate
- OpenTelemetry’s auto-instrumentation + W3C propagation standards do the heavy lifting
- The OTel Collector is the right abstraction layer between your apps and your observability backend
- Link to repo for readers to clone and run themselves https://github.com/KazChe/distributed-tracing-otelcollector-galileo
*All opinions are my own
Target audience: backend/platform engineers building multi-service AI apps; ~1,500–2,500 words
Companion assets: architecture diagram, annotated config snippets, Galileo screenshot
#distributed-tracing #otel-collector #galileo-gen-ai-observability