How we solved Context Management in Agents — Sally-Ann Delucia

Context management, not prompt engineering, is the critical factor determining AI agent success, requiring strategic decisions about what data agents see rat...

2026-05-15 By Sean Weldon

Abstract

This synthesis examines the paradigm shift from prompt engineering to context engineering in production AI agent systems, demonstrating that strategic context management—rather than prompt optimization—constitutes the primary determinant of agent reliability. Through analysis of production implementations in observability platforms, this research identifies critical failure modes in naive context handling approaches and presents a three-component solution architecture: smart truncation with memory separation, sub-agent delegation for data-intensive operations, and long-session evaluation methodologies. Empirical observations reveal that agents fail predominantly due to context mismanagement rather than prompt deficiencies, particularly in conversations exceeding 10 turns. The proposed framework employs selective context retention (first and last 100 characters), memory store integration, and architectural separation of concerns to maintain performance across growing conversation lengths. These findings have significant implications for production AI systems handling complex data structures and extended multi-turn interactions.

1. Introduction

The deployment of AI agents in production environments has revealed fundamental limitations in traditional prompt engineering paradigms. While early agent development concentrated extensively on prompt optimization—including instruction clarity, few-shot examples, and structural refinement—operational experience demonstrates that context engineering has emerged as the primary factor determining agent reliability and performance. Context engineering, defined as the strategic selection and management of information presented to language models, represents a qualitatively different challenge than prompt construction.

This synthesis examines the evolution from prompt-centric to context-centric agent design through analysis of production implementations, specifically focusing on challenges encountered in observability platforms where agents must process exponentially growing trace and span data. The central thesis posits that context management represents not merely an engineering challenge but a product and user experience problem requiring architectural solutions beyond simple token limit compliance. As one practitioner observed, "Agents don't fail because of prompts, they fail because of context."

The analysis proceeds through examination of failure modes in naive context handling, presentation of a three-component solution architecture, and discussion of evaluation methodologies for long-session agent interactions. Technical insights focus on practical implementation strategies that have demonstrated stability in production environments over multiple months, with particular attention to the trade-offs between context completeness and agent performance.

2. Background and Related Work

2.1 The Observability Context Problem

Observability platforms generate inherently complex data structures that create unique challenges for context management. A single trace encompasses user inputs, system prompts, model responses, and extensive metadata. When agents analyze multiple traces across conversation turns, context grows exponentially rather than linearly. This data structure creates a recursive constraint: the system analyzing observability data becomes limited by the volume and complexity of that data itself, creating what can be termed a vicious loop problem.

2.2 Traditional Context Management Approaches

Early context management strategies employed two primary approaches: naive truncation and summarization. Naive truncation typically retained initial portions of context (commonly the first 100 characters) while discarding remainder, operating under the assumption that early information holds greater importance. Summarization delegated context reduction to language models themselves, allowing models to determine information salience. However, both approaches demonstrated critical limitations in production environments. Naive truncation broke agent reasoning and caused memory loss in follow-up questions, while summarization proved too inconsistent, providing no control over what the language model deemed important. These failures necessitated development of more sophisticated context management architectures.

3. Core Analysis

3.1 The Three-Part Context Escape Strategy

The solution architecture comprises three interconnected components: context control, context-memory separation, and sub-agent delegation. This three-part escape strategy addresses the fundamental problem that naive approaches either discard critical information or overwhelm the model with excessive data.

The smart truncation with memory approach retains both the first 100 characters and last 100 characters of context while storing the middle section in a memory store. This strategy preserves both initial framing information and recent context while making intermediate content retrievable on demand. Critically, the agent maintains control over what information it considers important through explicit retrieval actions. Implementation details include compression of duplicate messages and long tool calls, retention of only the latest tool results, and preservation of the system prompt without reset. This approach has demonstrated stability over several months in production without requiring modifications, suggesting robustness across diverse use cases.

3.2 Sub-Agent Architecture for Data-Intensive Operations

The recognition that not all context belongs in the same agent led to development of a sub-agent architecture for handling data-intensive operations. In observability contexts where search tasks may involve hundreds of spans, maintaining all relevant data in the main conversation agent proves infeasible. The architectural solution separates concerns: the main agent handles conversation flow and light context only, while delegating heavy data processing tasks to specialized sub-agents.

This pattern functions as follows: sub-agents maintain heavy data context in isolation while the main conversation context remains minimal. Results from sub-agent processing are passed back to the main agent in compressed form, with the memory store remaining accessible when detailed information is required. This architectural separation has proven to be, in the words of practitioners, "a game-changer for data-intensive operations," enabling agents to handle tasks that would otherwise exceed context limits or degrade performance through information overload.

3.3 Long Session Evaluation Methodology

A critical insight from production deployment concerns the temporal distribution of failures. Users do not restart conversations frequently; instead, conversations grow naturally across multiple pages and interactions. Consequently, failures manifest late in conversations and remain undetected until user reports surface them. This observation pattern necessitated development of long session evaluation methodologies.

The evaluation approach loads 10 conversation turns and tests the 11th turn specifically to identify context management failures early in the development cycle. This methodology transforms previously user-discovered bugs into testable conditions, enabling proactive identification of context degradation. The evaluation framework proves particularly valuable given observed conversation growth patterns: initial implementations handled fewer than 10 turns per conversation, while current usage patterns show conversations extending to 20 or more turns as users traverse applications more extensively.

3.4 Observed Failure Patterns and Constraints

Production experience reveals specific failure modes that inform design decisions. Very large prompts continue to hit provider limits despite context management strategies, indicating that huge context "still breaks things" even with optimization. Customer context grows as agents are used to understand agent data itself—system prompts, messages, and history accumulate across sessions. The recursive nature of observability analysis exacerbates this growth: each trace contains user input, prompts, and metadata, and multiple traces multiply data exponentially.

Significantly, these challenges emerge despite context management implementation, suggesting that context engineering represents an ongoing optimization problem rather than a solved challenge. Current implementations employ basic heuristics (first 100, last 100 characters) without principled context budgets or clear quality metrics, indicating substantial room for refinement.

4. Technical Insights

4.1 Implementation Specifications

The smart truncation strategy employs specific numerical thresholds: retention of the first 100 characters and last 100 characters of context, with middle sections stored in memory with associated IDs and conversation positions. The memory store provides agents with a tool containing IDs and preview capabilities, enabling selective retrieval of stored context. This implementation balances deterministic control (guaranteed retention of boundaries) with flexible access (agent-directed retrieval of intermediate content).

Tool call compression removes duplicate messages and excessively long tool calls, retaining only the latest result. This compression proves essential in observability contexts where repeated queries generate similar outputs. The system prompt remains unmodified across turns, providing stable grounding for agent behavior even as conversation context evolves.

4.2 Architectural Trade-offs

The sub-agent pattern introduces architectural complexity in exchange for context scalability. Main agents must manage delegation decisions, handle sub-agent results, and maintain coherent conversation flow despite distributed processing. However, this complexity proves manageable relative to the alternative: context overflow and performance degradation in monolithic agent architectures.

Memory separation creates a trade-off between context completeness and model capacity. By removing information from immediate context, the system risks the agent failing to retrieve critical information when needed. However, empirical evidence suggests that agent-directed retrieval proves more reliable than either naive truncation or summarization, as agents demonstrate capability to recognize when additional information is required.

4.3 Convergent Design Patterns

Notably, Claude's subsequent code release demonstrated similar truncation and compression strategies to those developed independently at Arize. This convergence suggests that the smart truncation approach represents a robust solution to fundamental context management challenges rather than a domain-specific optimization. The independent arrival at similar solutions across different organizations and use cases provides validation for the architectural principles underlying the approach.

5. Discussion

The findings presented in this synthesis have significant implications for production AI agent development. The paradigm shift from prompt engineering to context engineering reflects a maturation in understanding of agent failure modes. While prompt optimization remains relevant, it operates within constraints established by context management decisions. As context management improves, prompt engineering becomes more effective; conversely, optimal prompts cannot compensate for poor context management.

The three critical elements identified—context engineering, memory management, and evaluation methodologies—form an integrated framework rather than independent optimizations. Context engineering determines what information is available; memory management enables selective access to information beyond immediate context; evaluation methodologies ensure that context strategies perform reliably across extended interactions. These elements must be developed in concert to achieve robust agent performance.

Several areas require further investigation. Long-term memory remains unimplemented in current systems; existing memory represents context with memory store access rather than persistent knowledge across sessions. As conversation lengths continue growing from fewer than 10 turns to 20 or more turns, long-term memory will likely become essential. Additionally, cache invalidation has not yet been prioritized, though it will become critical as memory stores grow. Current context selection employs basic heuristics without principled context budgets or clear quality metrics, suggesting opportunities for more sophisticated selection algorithms informed by information theory or relevance modeling.

The observation that context management constitutes a product and UX problem, not merely an engineering challenge, deserves emphasis. Decisions about what information agents see directly impact user experience and agent utility. Product requirements should inform context management strategies: what tasks must agents perform reliably? What information is truly essential versus merely available? These questions require cross-functional collaboration between engineering, product, and user experience teams.

6. Conclusion

This synthesis demonstrates that context management has emerged as the primary determinant of AI agent success in production environments, superseding prompt engineering as the critical optimization vector. The three-part solution architecture—smart truncation with memory separation, sub-agent delegation, and long-session evaluation—provides a practical framework for addressing context management challenges in data-intensive applications.

Key contributions include: (1) identification of the vicious loop problem in observability contexts, (2) specification of the smart truncation with memory approach including concrete implementation details, (3) articulation of the sub-agent pattern for data-intensive operations, and (4) development of long-session evaluation methodologies for detecting context degradation. These contributions are grounded in production experience and have demonstrated stability over multiple months of deployment.

Practical takeaways for practitioners include: prioritizing context engineering over prompt engineering, implementing memory separation to enable selective information access, employing sub-agent architectures for data-intensive operations, and developing evaluation frameworks that test extended conversation scenarios. Importantly, context management should be understood as an iterative optimization process rather than a one-time design decision. As agent capabilities expand and usage patterns evolve, context management strategies must adapt accordingly. The field remains in early stages of understanding optimal context management, with substantial opportunities for refinement through principled approaches to context budgeting, selection algorithms, and long-term memory integration.

Sources

How we solved Context Management in Agents — Sally-Ann Delucia - Original Creator (YouTube)
Analysis and summary by Sean Weldon using AI-assisted research tools

About the Author

Sean Weldon is an AI engineer and systems architect specializing in autonomous systems, agentic workflows, and applied machine learning. He builds production AI systems that automate complex business operations.

LinkedIn | Website | GitHub