Datadog published their State of AI report in June. The headline number: 68% of organizations deploying AI agents have no dedicated monitoring for them. They monitor the servers. They monitor the APIs. They don't monitor the agent's decision-making.
This is like monitoring your car's engine but not the steering.
What agent observability means
Traditional observability covers three pillars: logs, metrics, traces. Agent observability needs a fourth: decision traces.
A decision trace records not just what the agent did, but why. Which tools did it consider? What was the confidence score? Why did it pick tool A over tool B? What was the full context window at each step?
Without decision traces, debugging an agent failure means re-running the request and hoping it fails the same way. It usually doesn't, because the model is non-deterministic and the external world has changed.
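As a rough illustration of what a decision trace record could contain, here is a minimal Python sketch. The class and field names (`DecisionTrace`, `ToolCandidate`, `score`, `reason`) are hypothetical and not taken from any particular library. It also assumes your agent framework exposes the candidate tools and some kind of score for each, which not every framework does.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ToolCandidate:
    name: str
    score: float  # confidence reported by the model or router, if one is available
    reason: str   # free-text rationale captured at decision time


@dataclass
class DecisionTrace:
    correlation_id: str               # ties this decision to one user interaction
    step: int                         # position of the decision within the agent run
    candidates: list[ToolCandidate]   # every tool that was considered
    chosen: str                       # name of the tool that was actually picked
    context_window: list[dict]        # messages exactly as the model saw them

    def to_json(self) -> str:
        # asdict() recurses into the nested ToolCandidate objects
        return json.dumps(asdict(self), sort_keys=True)
```

Emitting one such record per decision keeps the "why" next to the "what", so a failed run can be read back without re-running the request.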
The minimum viable stack
You don't need a specialized AI observability platform. You need four things.
Correlation IDs end-to-end. From the user's HTTP request through every model call, tool invocation, and response assembly. One ID per user interaction, propagated everywhere.
Structured logs per agent step. Not console.log("calling tool"). A JSON object with: step number, tool name, input payload, output payload, latency, token count, model used, temperature, and whether the result was cached.
Cost tracking per request. Sum the token costs across all model calls in a single agent interaction. Alert when a single request exceeds a threshold set at your p99 per-request cost. That alert can catch prompt injection, infinite loops, and runaway chains before your bill does.
Replay capability. Store the full context window at each decision point. When a user reports a bad result, replay the exact sequence of inputs and decisions to understand what went wrong.
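The four pieces above can be sketched together in a few dozen lines of Python using only the standard library. This is a sketch under stated assumptions, not a reference implementation. The `AgentRun` class, the `example-model` name, and the per-token price are all hypothetical, and `print` stands in for whatever log shipper you already run.

```python
import contextvars
import json
from dataclasses import dataclass, field

# Hypothetical price in USD per 1,000 tokens; substitute your provider's real rates.
PRICE_PER_1K_TOKENS = {"example-model": 0.01}

# One correlation ID per user interaction, readable from any function on the call path.
correlation_id = contextvars.ContextVar("correlation_id", default="unset")


@dataclass
class AgentRun:
    cost_alert_usd: float                      # e.g. your observed p99 cost per request
    steps: list = field(default_factory=list)  # structured step records, in order

    def log_step(self, tool, input_payload, output_payload, context_window,
                 latency_ms, tokens, model, temperature, cached=False):
        record = {
            "correlation_id": correlation_id.get(),
            "step": len(self.steps) + 1,
            "tool": tool,
            "input": input_payload,
            "output": output_payload,
            "context_window": context_window,  # kept so the step can be replayed later
            "latency_ms": latency_ms,
            "tokens": tokens,
            "model": model,
            "temperature": temperature,
            "cached": cached,
            "cost_usd": tokens / 1000 * PRICE_PER_1K_TOKENS[model],
        }
        self.steps.append(record)
        print(json.dumps(record))  # one JSON object per line
        return record

    def total_cost_usd(self):
        # Sum of token costs across every model call in this interaction
        return sum(s["cost_usd"] for s in self.steps)

    def over_budget(self):
        # True once the interaction has cost more than the alert threshold
        return self.total_cost_usd() > self.cost_alert_usd

    def replay_inputs(self):
        # The exact inputs each step saw, in order, for offline replay
        return [(s["step"], s["tool"], s["context_window"]) for s in self.steps]
```

At the start of each request you would call something like `correlation_id.set(str(uuid.uuid4()))`, create one `AgentRun`, and call `log_step` after every model or tool call. Checking `over_budget()` after each step lets a runaway chain be stopped while it is still running, not discovered on the invoice.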
The cost of not having it
We've seen teams spend 40+ engineering hours debugging a single agent failure that proper observability would have surfaced in 5 minutes. At senior engineer rates of roughly $200-300 per hour, that's $8,000-12,000 per incident.
Observability infrastructure that prevents two incidents per quarter pays for itself in the first month.
We instrument what others overlook. If your agents are a black box, let's fix that.