The pitch is compelling: deploy your agent closer to the user, cut latency, improve the user experience. Cloudflare Workers, Deno Deploy, Vercel Edge Functions. The platforms are mature. The developer experience is excellent.
The problems start when you try to run a stateful agent on stateless infrastructure.
Edge is stateless
Edge functions are designed for short-lived, stateless computation. An HTTP request arrives, a function runs, a response returns. The function dies. There's no persistent memory, no long-running process, no background threads.
AI agents are inherently stateful. They maintain conversation context, track tool call results, and accumulate a working memory across multiple steps. A typical agent interaction involves 5-12 sequential LLM calls. Each call depends on the results of the previous one.
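A minimal sketch of that loop, with `callModel` as a hypothetical stand-in for a real LLM API call. The point is that each step consumes the context accumulated by every previous step, so the steps cannot run independently:

```typescript
type Message = { role: "user" | "assistant"; content: string };

async function callModel(context: Message[]): Promise<Message> {
  // Stand-in for real model inference (1-2 s per call in practice).
  return { role: "assistant", content: `step ${context.length}` };
}

async function runAgent(prompt: string, maxSteps = 5): Promise<Message[]> {
  const context: Message[] = [{ role: "user", content: prompt }];
  for (let i = 0; i < maxSteps; i++) {
    const reply = await callModel(context); // depends on all prior steps
    context.push(reply);                    // working memory accumulates
  }
  return context;
}
```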
Running that on edge means either serializing the entire agent state to a database between every step, or keeping the connection alive for the full duration. The first option adds 50-100ms per step in storage latency. The second fights the platform's execution time limits.
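The first option looks roughly like this: each invocation is stateless, loading and persisting the agent's state around a single step. `kv` here is an in-memory stand-in for a real store such as Workers KV or Redis, and `AgentState` is a hypothetical shape:

```typescript
interface AgentState {
  step: number;
  context: string[];
}

const kv = new Map<string, string>(); // stand-in for a real KV store

async function handleStep(sessionId: string, input: string): Promise<AgentState> {
  // Load: 50-100 ms against a real store, free here.
  const raw = kv.get(sessionId);
  const state: AgentState = raw ? JSON.parse(raw) : { step: 0, context: [] };

  // Run exactly one agent step, then persist before the function dies.
  state.context.push(input);
  state.step += 1;
  kv.set(sessionId, JSON.stringify(state));
  return state;
}
```

The storage round trip happens on every step, which is where the per-step latency tax comes from.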
The execution time wall
Cloudflare Workers have a 30-second CPU time limit on the paid plan. Deno Deploy has 50 seconds. A complex agent interaction with 8 LLM calls, each taking 1-2 seconds of model inference plus tool execution, can easily exceed these limits.
You can work around this with streaming, sub-requests, and Durable Objects. But at that point you're fighting the platform instead of building on it. The ergonomics that made edge attractive in the first place are gone.
Where edge does work for AI
Routing and classification. A small model or even a fine-tuned classifier can run on edge to determine where a request should go. Fast, stateless, perfect for edge.
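As a hedged sketch of the routing idea, here is a trivial rule-based classifier standing in for the small model; the route names are hypothetical:

```typescript
function routeRequest(prompt: string): "faq-service" | "agent-origin" {
  const simpleIntents = ["price", "hours", "refund", "shipping"];
  const p = prompt.toLowerCase();
  return simpleIntents.some((k) => p.includes(k))
    ? "faq-service"   // cheap lookup, can be answered near the edge
    : "agent-origin"; // needs the full stateful agent at origin
}
```

The decision is pure and stateless, so it fits edge constraints exactly.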
Response streaming proxies. Edge functions that proxy model API calls and stream responses to the client. The function doesn't run the model, it just pipes the stream with minimal transformation.
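A sketch of that proxy using the standard fetch/Streams APIs these platforms expose. `UPSTREAM_URL` is a placeholder for the model provider's endpoint:

```typescript
const UPSTREAM_URL = "https://api.example.com/v1/chat"; // placeholder

// Pass the upstream body through untouched: no buffering, no per-token work.
function pipeStream(upstream: Response): Response {
  return new Response(upstream.body, {
    status: upstream.status,
    headers: { "content-type": "text/event-stream" },
  });
}

async function handleProxy(request: Request): Promise<Response> {
  const upstream = await fetch(UPSTREAM_URL, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: request.body,
  });
  return pipeStream(upstream);
}
```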
Caching layers. Semantic caching on edge, checking if a similar request was recently answered. Cache hit means an instant response from edge. Cache miss routes to origin for full agent processing.
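The hit/miss flow can be sketched as follows. Real semantic caching compares prompt embeddings; this stand-in keys on a normalized prompt, which is enough to show the control flow at the edge:

```typescript
const answerCache = new Map<string, string>();

function normalize(prompt: string): string {
  return prompt.toLowerCase().replace(/[^a-z0-9 ]/g, "").trim();
}

async function answer(
  prompt: string,
  origin: (p: string) => Promise<string>, // full agent processing at origin
): Promise<string> {
  const key = normalize(prompt);
  const hit = answerCache.get(key);
  if (hit !== undefined) return hit;   // cache hit: instant edge response
  const result = await origin(prompt); // cache miss: route to origin
  answerCache.set(key, result);
  return result;
}
```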
The pattern: use edge for the fast, stateless parts. Use traditional compute for the stateful orchestration. Don't try to run your agent loop on a platform designed for request-response cycles.
Figuring out where edge fits in your agent architecture? We help with those tradeoffs.