Why Your AI UX Is Broken (and It's Not the Model's Fault) — Mike Christensen, Ably

Direct HTTP streaming for AI chat applications is fundamentally limited to single-client, single-connection patterns; adopting a durable sessions architectur...

By Sean Weldon

Decoupling Client-Agent Connections: A Durable Sessions Architecture for Production AI Applications

Abstract

Contemporary AI chat applications predominantly implement Server-Sent Events (SSE) for streaming language model responses, establishing persistent point-to-point connections between individual clients and agents. This architectural pattern fundamentally constrains production AI experiences by coupling stream health to client connection stability, preventing cross-device synchronization, and limiting real-time agent control. This analysis examines the structural limitations of single-connection paradigms and presents a durable sessions architecture that decouples agent and client layers through persistent, stateful intermediary channels. Through examination of production implementations serving 2+ billion devices and insights from 40+ companies across 10 industries, this work identifies three critical capabilities enabled by durable sessions: resilient delivery across network disruptions, continuity across multiple surfaces, and bidirectional control for live agent steering. The analysis demonstrates that pub/sub-based durable sessions significantly simplify multi-agent architectures while supporting concurrent activity and seamless reconnection without additional agent logic.

1. Introduction

The rapid adoption of AI-powered conversational interfaces has established a dominant architectural pattern: direct HTTP streaming via Server-Sent Events (SSE). Popular frameworks, including the Vercel AI SDK, use SSE as their default transport, so the typical application has each client hold a persistent, point-to-point connection to a single agent. While this approach is sufficient for demonstrations, it imposes structural constraints on robust, production-grade AI applications.

The central limitation of direct HTTP streaming architectures stems from their inherent coupling: stream health is directly tied to end-client connection health, creating a fragile system wherein network disruptions, page refreshes, or device switches result in complete loss of streaming context. Furthermore, the private nature of these point-to-point connections prevents visibility across multiple tabs or devices, fundamentally precluding the development of seamless multi-surface experiences that contemporary users expect from production applications.

This analysis examines the technical constraints imposed by SSE-based architectures and proposes an alternative paradigm based on durable sessions—persistent, stateful intermediaries that decouple agent computation from client connectivity. The investigation establishes that single-connection patterns are structurally incompatible with three foundational capabilities required for production AI experiences: resilient delivery, cross-surface continuity, and live control. Through analysis of production implementations and examination of specific technical trade-offs, this work demonstrates how pub/sub-based durable sessions address these limitations while simplifying multi-agent coordination patterns.

2. Background and Related Work

2.1 Server-Sent Events and Direct HTTP Streaming

Server-Sent Events (SSE) is a standardized protocol for unidirectional server-to-client streaming over HTTP. The client opens a long-lived HTTP connection, and the server pushes events over it in real time. The pattern has achieved widespread adoption in AI applications due to its simplicity, native browser support, and straightforward implementation model. It enables real-time streaming of language model tokens without the complexity of a WebSocket implementation, making it an attractive default choice for AI framework developers.
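To make the wire format concrete, the following is a minimal sketch of parsing a text/event-stream body. The field names (id, event, data), comment lines starting with a colon, and the blank-line event terminator come from the SSE specification; the function name parse_sse, the SSEEvent class, and the sample payload are illustrative assumptions, and the parser skips parts of the specification such as the retry field.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class SSEEvent:
    data: str
    event: str = "message"
    id: Optional[str] = None


def parse_sse(body: str) -> Iterator[SSEEvent]:
    """Yield events from a text/event-stream body (simplified parser)."""
    data_lines: list[str] = []
    event_type = "message"
    last_id: Optional[str] = None  # persists across events, as in the spec
    for line in body.splitlines():
        if line == "":
            # A blank line dispatches whatever has been accumulated so far.
            if data_lines:
                yield SSEEvent("\n".join(data_lines), event_type, last_id)
            data_lines, event_type = [], "message"
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is stripped
        if field == "data":
            data_lines.append(value)
        elif field == "event":
            event_type = value
        elif field == "id":
            last_id = value


# Hypothetical stream: two token events separated by a keep-alive comment.
sample = "id: 1\ndata: Hello\n\n: keep-alive\n\nid: 2\ndata: wor\ndata: ld\n\n"
events = list(parse_sse(sample))
```

The id field is what a reconnecting browser sends back in the Last-Event-ID request header; whether the server can honor it depends on the server having kept the events, which is the gap the rest of this analysis is concerned with.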

2.2 Publish-Subscribe Messaging Patterns

The publish-subscribe pattern provides an alternative communication model wherein publishers and subscribers interact through intermediary message brokers rather than maintaining direct connections. This architectural approach decouples message producers from consumers, enabling asynchronous communication, message persistence, and one-to-many distribution patterns. Pub/sub systems typically provide guarantees around message ordering, delivery semantics, and connection resumability through mechanisms such as sequence numbers on stored events and automatic reconnection protocols. Resumable-stream implementations commonly buffer events in an in-memory store such as Redis while a client is disconnected, so that a stream can survive temporary network failures.
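The decoupling described above can be illustrated with a toy in-process broker. This is a sketch under stated assumptions, not any particular product's API: the Broker class, its method names, and the channel name "chat:123" are all hypothetical, and a real broker would add networking, persistence, and ordering guarantees.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[str], None]


class Broker:
    """Toy broker: publishers and subscribers share only a channel name."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, channel: str, handler: Handler) -> None:
        self._subscribers[channel].append(handler)

    def publish(self, channel: str, message: str) -> int:
        """Deliver to every current subscriber; return how many received it."""
        handlers = list(self._subscribers[channel])
        for handler in handlers:
            handler(message)
        return len(handlers)


broker = Broker()
phone: list[str] = []
laptop: list[str] = []

# Two consumers attach to the same channel; the publisher knows neither.
broker.subscribe("chat:123", phone.append)
broker.subscribe("chat:123", laptop.append)

delivered = broker.publish("chat:123", "token-1")
```

The publisher's code does not change as subscribers come and go, which is the property the one-to-many distribution pattern relies on.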

3. Core Analysis

3.1 Structural Limitations of Single-Connection Paradigms

The direct HTTP streaming pattern exhibits three fundamental limitations that constrain production AI applications. First, connection health coupling creates fragility: because the live response stream's health is tied to the end client's connection health, any network disruption (a mobile network switch, page navigation, or temporary connectivity loss) terminates the stream entirely. A dropped connection means a lost stream, which forces a complete session restart and discards the conversational context.

Second, the private pipe architecture prevents cross-device and cross-tab visibility. Because the connection exists as a point-to-point channel between a single client and single agent, other clients and devices cannot interact with or observe the agent's activity. Opening the same session in a second browser tab provides no visibility of the live response stream from the first tab's request, nor does the second tab possess any upstream channel to send follow-up requests to the agent.

Third, SSE's strictly unidirectional nature makes resumability and live control mutually exclusive. Because the protocol only carries data from server to client, common interaction patterns become difficult to implement. A stop button, for example, requires the client to signal cancellation upstream. With SSE the only available signal is closing the connection, which leaves the agent unable to tell whether it should halt processing or whether the client has merely disconnected temporarily. The Vercel AI SDK documentation states explicitly that abort functionality is incompatible with resume functionality because of these SSE constraints.

3.2 Durable Sessions Architecture Design

The durable sessions architecture addresses these limitations by introducing a persistent, stateful intermediary layer that decouples the agent layer from the client layer. This shared medium between agents and clients enables agents to write events without managing individual client connection health, while simultaneously allowing clients to connect to sessions and resume streams without requiring agents to implement reconnection logic.

The architecture's core mechanism operates through independently addressable channels: any client or agent connects by specifying a channel name, with the channel itself persisting independently of individual connections, devices, or agents. Messages outlive individual connections through persistent storage, while dropped clients automatically reconnect and receive events from the exact point of disconnection through sequence number-based replay mechanisms.
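The replay mechanism can be sketched as an append-only log keyed by sequence number. This is an illustrative model only: DurableChannel, Client, attach, detach, and the last_seen parameter are hypothetical names, and the in-memory list stands in for whatever persistent storage a real implementation uses.

```python
from typing import Callable

Handler = Callable[[int, str], None]


class DurableChannel:
    """Append-only event log that outlives any single connection."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._log: list[str] = []            # stands in for persistent storage
        self._live: dict[str, Handler] = {}  # currently attached clients

    def publish(self, event: str) -> int:
        self._log.append(event)
        seq = len(self._log)                 # 1-based sequence number
        for handler in list(self._live.values()):
            handler(seq, event)
        return seq

    def attach(self, client_id: str, handler: Handler, last_seen: int = 0) -> None:
        # Replay everything after the client's last seen sequence number,
        # then register the client for live delivery.
        for seq in range(last_seen + 1, len(self._log) + 1):
            handler(seq, self._log[seq - 1])
        self._live[client_id] = handler

    def detach(self, client_id: str) -> None:
        self._live.pop(client_id, None)


class Client:
    def __init__(self) -> None:
        self.received: list[str] = []
        self.last_seen = 0

    def on_event(self, seq: int, event: str) -> None:
        self.received.append(event)
        self.last_seen = seq


channel = DurableChannel("session:abc")
client = Client()
channel.attach("tab-1", client.on_event)

channel.publish("The")
channel.detach("tab-1")      # network drop; the agent is unaware of it
channel.publish(" quick")    # the agent keeps writing to the channel
channel.publish(" fox")
channel.attach("tab-1", client.on_event, last_seen=client.last_seen)
```

After the reattach, the client holds all three events in order even though it was absent for two of them, and the publishing side contains no reconnection logic.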

This decoupling enables three critical capabilities. Resilient delivery ensures streams survive disconnections across mobile network switches, page refreshes, and navigation patterns. Continuity across surfaces allows conversation sessions to follow users across devices and tabs with full synchronization including live activity. Live control enables clients to communicate with agents while they work, supporting steering and follow-up requests without terminating the primary stream.

3.3 Bidirectional Transport and Multi-Client Coordination

Moving from SSE to a bidirectional transport such as WebSockets removes the unidirectional limitation but does not by itself solve multi-device visibility. WebSockets give clients an upstream path for richer client-agent interaction, which resolves the mutual exclusivity between resume and cancel, but they retain the single-connection paradigm unless combined with a durable sessions approach.
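The difference between a disconnect and an explicit cancel can be sketched as follows, assuming a session that carries an upstream control inbox alongside the downstream event log. The Session fields, the run_agent function, the "cancel" message, and the script parameter used to simulate client actions are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Session:
    """Hypothetical session: downstream events plus an upstream control inbox."""
    events: list = field(default_factory=list)
    control: deque = field(default_factory=deque)
    connected_clients: set = field(default_factory=set)


def run_agent(session: Session, tokens: list, script: dict) -> str:
    """Emit tokens one by one, checking for control messages between tokens.

    `script` maps a token index to a client action, simulating things that
    happen while the agent is working.
    """
    for i, token in enumerate(tokens):
        action = script.get(i)
        if action is not None:
            action(session)
        # Only an explicit message stops the agent; the agent never inspects
        # connected_clients, so a disconnect alone changes nothing.
        while session.control:
            if session.control.popleft() == "cancel":
                session.events.append("[cancelled]")
                return "cancelled"
        session.events.append(token)
    return "completed"


tokens = ["a", "b", "c", "d"]

# Case 1: the only client drops after the first token; the agent finishes.
s1 = Session(connected_clients={"tab-1"})
r1 = run_agent(s1, tokens, {1: lambda s: s.connected_clients.discard("tab-1")})

# Case 2: a client presses stop after the second token.
s2 = Session(connected_clients={"tab-1"})
r2 = run_agent(s2, tokens, {2: lambda s: s.control.append("cancel")})
```

Because cancellation is a message rather than a closed connection, the two cases are no longer ambiguous to the agent.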

The durable sessions pattern enables multi-client synchronization through continuously maintained connections to the session entity itself, rather than connections established only when initiating requests. All clients hold persistent connections to the session, transforming it into a shared resource that allows any client to route to and interact with the agent from any tab or device. This constant visibility mechanism ensures that when a user opens the same session in a second tab, that tab immediately receives the live response stream and possesses full upstream channel access for follow-up requests.

3.4 Multi-Agent Architecture Simplification

Traditional multi-agent implementations employ orchestrator patterns wherein a central agent serves dual purposes: orchestrating task delegation and proxying granular updates from sub-agents. This centralization requires the orchestrator to aggregate and forward all sub-agent progress updates, adding unnecessary architectural complexity and creating a single point of failure for update relay.

Durable sessions fundamentally restructure this pattern by allowing all agents to write independently to the session without requiring a central proxy. Clients subscribe to a single session entity and receive full visibility of all agent activity through the shared channel. This pattern drastically simplifies architecture by eliminating the need for centralized update relay—specialized sub-agents write events directly to the session, with the channel handling multiplexing for concurrent activity automatically.
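A minimal sketch of this pattern, assuming each event is tagged with the identity of the agent that wrote it. SharedSession, Event, demultiplex, and the agent names are hypothetical, and the interleaved loop only imitates concurrency.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import zip_longest


@dataclass(frozen=True)
class Event:
    seq: int
    agent: str
    text: str


class SharedSession:
    """Single ordered log that any agent may append to directly."""

    def __init__(self) -> None:
        self.log: list[Event] = []

    def write(self, agent: str, text: str) -> Event:
        event = Event(len(self.log) + 1, agent, text)
        self.log.append(event)
        return event


def demultiplex(events: list[Event]) -> dict[str, list[str]]:
    """Client-side view: group the single interleaved stream by agent."""
    by_agent: dict[str, list[str]] = defaultdict(list)
    for event in events:
        by_agent[event.agent].append(event.text)
    return dict(by_agent)


session = SharedSession()
orders = ["validating PO", "PO approved"]
reservations = ["finding conflicts", "cancelling booking", "done"]

# Interleave writes to mimic two sub-agents working at the same time.
# No orchestrator sits between the sub-agents and the session.
for order_update, reservation_update in zip_longest(orders, reservations):
    if order_update is not None:
        session.write("orders-agent", order_update)
    if reservation_update is not None:
        session.write("reservations-agent", reservation_update)

view = demultiplex(session.log)
```

The client subscribes once and reconstructs a per-agent view locally, so an orchestrator in this model would only need to delegate tasks, not relay progress.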

Production implementations demonstrate this capability through concurrent agent operations: multiple agents working simultaneously (e.g., processing purchase orders while canceling conflicting reservations) maintain full synchronization through the shared session. The architecture additionally supports seamless human agent handoff, wherein new participants are added to sessions with full visibility of AI interaction history without requiring specialized relay logic.

4. Technical Insights

4.1 Implementation Considerations

Production implementations of durable sessions use pub/sub infrastructure to implement the shared session abstraction. The Ably Channels implementation demonstrates this approach through independently addressable channels that serve as shared resources for publisher-subscriber communication. The platform's reported scale (2+ billion devices monthly, 30+ billion monthly connections, and 2+ trillion API operations) is presented as evidence of the architecture's production viability.

The Ably AI Transport SDK is presented as a drop-in implementation that integrates with existing event stream formats and model providers. Key technical capabilities include automatic materialization of streamed text chunks into complete responses, built-in resumability without additional agent logic, and automatic handling of multiplexing for concurrent channel activity. The system supports both client-side tool calls (e.g., geolocation requests) and server-side tool calls (e.g., database queries) within the durable sessions framework.
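The idea of materializing chunks into complete responses can be sketched generically. This is not the Ably AI Transport SDK's API: the Chunk type, its fields, and the materialize function are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    response_id: str
    text: str
    final: bool = False


def materialize(chunks: list[Chunk]) -> dict[str, dict]:
    """Fold streamed chunks into one record per response."""
    responses: dict[str, dict] = {}
    for chunk in chunks:
        record = responses.setdefault(
            chunk.response_id, {"text": "", "complete": False}
        )
        record["text"] += chunk.text
        record["complete"] = record["complete"] or chunk.final
    return responses


# Hypothetical stream with two responses interleaved on one channel.
stream = [
    Chunk("r1", "Hel"),
    Chunk("r2", "Checking"),
    Chunk("r1", "lo", final=True),
    Chunk("r2", " stock"),
]
responses = materialize(stream)
```

A client that joins late can read the materialized record for a finished response instead of replaying every chunk, while an unfinished response such as r2 remains marked incomplete.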

4.2 Trade-offs and Limitations

The durable sessions architecture introduces additional infrastructure requirements compared to direct HTTP streaming. Organizations must deploy and maintain pub/sub messaging infrastructure or adopt third-party platforms, increasing operational complexity. The intermediary layer adds latency compared to direct connections, though this overhead is typically negligible relative to language model inference time.

Message persistence requirements introduce storage considerations: systems must buffer events during client disconnections, necessitating decisions about message retention policies and storage backend selection. Redis is cited as one in-memory store for event buffering; with any such store, the retention window must be balanced against memory cost.
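One consequence of a retention policy is that a client can reconnect too late to resume. The sketch below assumes a count-based policy; BoundedEventBuffer, RetentionGap, and replay_after are hypothetical names, and a real system might bound retention by age or bytes instead.

```python
from collections import deque


class RetentionGap(Exception):
    """Raised when a client asks to resume from events already evicted."""


class BoundedEventBuffer:
    def __init__(self, max_events: int) -> None:
        # deque(maxlen=...) silently drops the oldest entry when full.
        self._events: deque = deque(maxlen=max_events)
        self._next_seq = 1

    def append(self, payload: str) -> int:
        seq = self._next_seq
        self._events.append((seq, payload))
        self._next_seq += 1
        return seq

    def replay_after(self, last_seen: int) -> list:
        oldest = self._events[0][0] if self._events else self._next_seq
        if last_seen + 1 < oldest:
            # The next event the client needs is no longer retained.
            raise RetentionGap(
                f"events {last_seen + 1}..{oldest - 1} were evicted"
            )
        return [(seq, p) for seq, p in self._events if seq > last_seen]


buffer = BoundedEventBuffer(max_events=3)
for i in range(1, 6):
    buffer.append(f"e{i}")   # e1 and e2 are evicted along the way

recent = buffer.replay_after(3)

try:
    buffer.replay_after(1)
    gap_detected = False
except RetentionGap:
    gap_detected = True
```

When the gap is detected, the client has to fall back to some other source of history, so the retention window effectively sets how long a disconnection can last before resumption degrades to a full reload.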

Furthermore, the architecture requires careful attention to sequence number management for maintaining correct event replay order during reconnection scenarios. Implementations must handle edge cases around concurrent client modifications and conflict resolution when multiple clients interact with agents simultaneously.
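On the receiving side, one way to keep replay order correct is to apply events strictly by sequence number, dropping duplicates and holding back anything that arrives early. This is a sketch of that general technique; OrderedReceiver and its fields are hypothetical, and it assumes sequence numbers are contiguous and start at 1.

```python
class OrderedReceiver:
    """Apply events strictly in sequence order, tolerating duplicates and gaps."""

    def __init__(self) -> None:
        self.applied: list[str] = []
        self.next_seq = 1
        self._pending: dict[int, str] = {}

    def receive(self, seq: int, payload: str) -> None:
        if seq < self.next_seq or seq in self._pending:
            return  # duplicate, e.g. a replay overlapping live delivery
        self._pending[seq] = payload
        # Drain every event that is now contiguous with what was applied.
        while self.next_seq in self._pending:
            self.applied.append(self._pending.pop(self.next_seq))
            self.next_seq += 1


receiver = OrderedReceiver()
# Simulated reconnect: event 3 arrives live before the replay of 1 and 2
# completes, and the replay then repeats events already delivered.
for seq, payload in [(1, "A"), (3, "C"), (1, "A"), (2, "B"), (3, "C"), (4, "D")]:
    receiver.receive(seq, payload)
```

This covers ordering and deduplication only; conflict resolution between simultaneous requests from different clients is a separate problem that the sketch does not address.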

5. Discussion

The transition from direct HTTP streaming to durable sessions architectures represents a fundamental shift in how AI applications structure client-agent communication. The analysis demonstrates that this shift is not merely an optimization but addresses structural limitations that prevent SSE-based systems from achieving production-grade user experiences. The mutual exclusivity between resume and cancel functionality in SSE systems, explicitly documented in major framework implementations, illustrates how protocol-level constraints propagate to user-facing feature limitations.

The findings have significant implications for organizations building production AI applications. The three foundational capabilities—resilient delivery, cross-surface continuity, and live control—emerge from analysis of 40+ companies across 10 industries as distinguishing factors separating fragile demonstrations from robust product experiences. Organizations continuing to employ direct HTTP streaming patterns may find themselves architecturally constrained as user expectations evolve toward seamless multi-device experiences and sophisticated agent interaction patterns.

The multi-agent architecture simplification enabled by durable sessions addresses a critical complexity challenge in contemporary AI systems. As applications increasingly employ specialized agents for distinct tasks, the traditional orchestrator pattern's requirement for centralized update relay becomes a significant architectural burden. The demonstrated ability for agents to write independently to shared sessions while maintaining full client visibility suggests that pub/sub-based architectures may become increasingly important as multi-agent systems grow in sophistication.

Areas for future investigation include formal analysis of latency characteristics across different durable session implementations, examination of security and privacy implications of persistent session storage, and exploration of optimal message retention policies for different application domains. Additionally, research into conflict resolution strategies for concurrent multi-client agent interactions would provide valuable guidance for production implementations.

6. Conclusion

This analysis has established that direct HTTP streaming via Server-Sent Events imposes fundamental structural limitations on production AI applications through its single-connection paradigm. The coupling of stream health to client connection stability, inability to support cross-device synchronization, and mutual exclusivity between resumability and live control create constraints incompatible with robust user experiences.

The durable sessions architecture addresses these limitations by decoupling agent and client layers through persistent, stateful intermediary channels implemented via pub/sub patterns. This approach enables three critical capabilities: resilient delivery across network disruptions, continuity across multiple surfaces and devices, and bidirectional control for live agent steering. Furthermore, the architecture significantly simplifies multi-agent coordination by eliminating the need for centralized update relay through orchestrator components.

For practitioners developing production AI applications, the key takeaway is that architectural decisions at the transport layer have cascading effects on achievable user experience quality. Organizations should evaluate whether their current streaming implementations can support the resilient, multi-surface, and interactive experiences that distinguish production systems from demonstrations. The pub/sub-based durable sessions pattern, validated through large-scale production deployments, provides a proven architectural foundation for building sophisticated AI applications that meet contemporary user expectations for seamless, resilient interaction across devices and contexts.



About the Author

Sean Weldon is an AI engineer and systems architect specializing in autonomous systems, agentic workflows, and applied machine learning. He builds production AI systems that automate complex business operations.
