MCP = Mega Context Problem - Matt Carey

Matt Carey on MCP's mega-context problem — Cloudflare's 2.3M-token OpenAPI spec, progressive tool discovery, and isolated code-gen for 2,600+ endpoints.

2026-04-27 By Sean Weldon

Abstract

This paper examines the evolution of agent-API interaction paradigms, addressing the fundamental challenge of providing comprehensive API access to AI agents without overwhelming context windows. The analysis traces the progression from bundled tool implementations through standardized Model Context Protocol (MCP) servers to code generation approaches executed in isolated environments. The central finding demonstrates that progressive tool discovery via code generation against typed SDKs, executed in isolation primitives such as V8 isolates, provides superior scalability compared to exhaustive tool enumeration. Cloudflare's implementation, exposing over 2,600 API endpoints through a single MCP server, validates this approach. The paper identifies isolated code execution as an emerging infrastructure primitive and discusses implications for both API providers and client implementations, including the integration of MCP as native middleware in full-stack frameworks.

1. Introduction

The proliferation of AI agents capable of interacting with external systems has necessitated robust mechanisms for API access. The fundamental question—"How do we give agents hands?"—has driven the development of multiple paradigms for tool access and function calling. Early implementations bundled tools directly with agent code, creating duplication and maintenance challenges across deployments where each agent implementation required its own tool definitions.

The Model Context Protocol (MCP), introduced as a standardization effort, attempted to address these issues by treating tools as a shared API surface that service providers could expose. However, this approach encountered immediate scalability limitations when confronted with enterprise-scale API surfaces. Cloudflare's OpenAPI specification, comprising 2.3 million tokens and representing over 2,600 API endpoints, exemplifies the context window explosion problem that renders naive tool enumeration approaches infeasible. Even after conversion to tool definitions, the specification consumed 1.1 million tokens—far exceeding practical context window limits.

This analysis examines three progressive discovery approaches—CLI introspection, tool search, and code generation—and establishes code generation with isolated execution as the most scalable solution. The paper further explores the infrastructure requirements, security considerations, and architectural implications of this paradigm shift, demonstrating how isolated code execution environments enable comprehensive API access while maintaining security boundaries.

2. Background and Related Work

2.1 Tool Calling and Early Standardization Efforts

Tool calling or function calling represents the foundational approach wherein large language models generate structured function invocations that are subsequently executed by the host system. This pattern enables agents to perform discrete actions such as weather queries, database lookups, or API calls while maintaining separation between reasoning and execution. The initial phase of bundled tools, where each agent implementation maintained its own tool definitions, created significant duplication across deployments and complicated maintenance as APIs evolved.
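The separation between reasoning and execution described above can be illustrated with a minimal sketch. The shapes below (`ToolDefinition`, `ToolCall`, `dispatch`) and the stubbed `get_weather` tool are hypothetical names chosen for illustration, not any provider's actual wire format: the model only ever sees the definitions, and the host validates and executes the structured call.

```typescript
// Hypothetical shapes for illustration; real providers define their own formats.
interface ToolDefinition {
  name: string;
  description: string;
  // Maps each parameter name to a primitive type label.
  parameters: Record<string, "string" | "number">;
}

interface ToolCall {
  name: string;
  arguments: Record<string, unknown>;
}

type ToolHandler = (args: Record<string, unknown>) => string;

// The definitions are what the model sees in its context window.
const toolDefinitions: ToolDefinition[] = [
  {
    name: "get_weather",
    description: "Return a short weather summary for a city.",
    parameters: { city: "string" },
  },
];

// The handlers are what the host executes; the model never runs them itself.
const handlers: Record<string, ToolHandler> = {
  get_weather: (args) => `Weather for ${String(args.city)}: sunny (stubbed)`,
};

// Validate a model-produced call against its definition, then execute it.
function dispatch(call: ToolCall): string {
  const definition = toolDefinitions.find((d) => d.name === call.name);
  const handler = handlers[call.name];
  if (!definition || !handler) {
    throw new Error(`Unknown tool: ${call.name}`);
  }
  for (const [param, type] of Object.entries(definition.parameters)) {
    if (typeof call.arguments[param] !== type) {
      throw new Error(`Argument "${param}" must be a ${type}`);
    }
  }
  return handler(call.arguments);
}

// A structured invocation as a model might emit it (parsed from JSON).
const modelOutput: ToolCall = JSON.parse(
  '{"name": "get_weather", "arguments": {"city": "London"}}'
);
const weatherResult = dispatch(modelOutput);
```

Every tool added this way contributes another definition to the context window, which is the cost that grows with the size of the API surface.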

The Model Context Protocol emerged as an industry response to tool fragmentation, enabling service providers to expose standardized tool interfaces. This approach parallels Retrieval-Augmented Generation (RAG) patterns, where relevant context is selectively loaded rather than maintaining all information in active memory. However, the standardization effort revealed a critical limitation: comprehensive API coverage through enumerated tools proved impractical at enterprise scale.

2.2 The Product-Based Fragmentation Problem

Cloudflare's initial MCP implementation exemplifies the scalability challenges inherent in exhaustive tool enumeration. The organization deployed 16 separate MCP servers to manage its API surface, yet each server provided incomplete coverage—approximately six tools per server against product suites containing up to 30 endpoints. This fragmentation forced users to select product-specific MCP services, fundamentally failing the objective of comprehensive API accessibility. Users requiring access to multiple product domains faced the choice of either loading multiple MCP servers (exacerbating context window constraints) or accepting incomplete API coverage. This product-based fragmentation demonstrated that the core goal—"make every API a tool for agents"—remained unachievable through direct tool enumeration.

3. Core Analysis

3.1 Progressive Discovery Mechanisms

Three distinct approaches have emerged to address context window limitations while providing comprehensive API access. The CLI-based introspection approach leverages existing command-line interfaces, enabling agents to invoke help flags and discover available commands dynamically. This method, used by systems such as OpenClaw, requires shell access and depends on well-documented CLI tools. While effective for systems with mature CLIs, it introduces the security risks that come with granting an agent shell access.

The tool search mechanism employs keyword matching to load a subset of relevant tools into context based on the agent's current task. In observed implementations, this approach loaded K=8 tools into context, utilizing approximately 500 of 2,100 available tokens. This method provides a middle ground between exhaustive enumeration and complete progressive discovery, though it requires effective keyword matching and risks missing relevant tools when queries do not align with indexed keywords.
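A minimal sketch of keyword-based tool search follows. The scoring rule (count of distinct query keywords found in a tool's name or description), the `IndexedTool` shape, and the three-entry catalog are assumptions made for illustration; the source does not describe how the observed implementations rank tools. The default of `k = 8` mirrors the figure quoted above.

```typescript
interface IndexedTool {
  name: string;
  description: string;
}

// Lowercase a string and split it into alphanumeric keywords.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((word) => word.length > 0);
}

// Score each tool by how many distinct query keywords appear in its name or
// description, then return the top k tools that matched at least one keyword.
function searchTools(
  query: string,
  tools: IndexedTool[],
  k = 8
): IndexedTool[] {
  const queryWords = new Set(tokenize(query));
  return tools
    .map((tool) => {
      const toolWords = new Set(tokenize(`${tool.name} ${tool.description}`));
      let score = 0;
      queryWords.forEach((word) => {
        if (toolWords.has(word)) score += 1;
      });
      return { tool, score };
    })
    .filter((entry) => entry.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((entry) => entry.tool);
}

// Hypothetical catalog; the names are illustrative only.
const catalog: IndexedTool[] = [
  { name: "dns_list_records", description: "List DNS records for a zone" },
  { name: "workers_deploy", description: "Deploy a worker script" },
  { name: "workers_list", description: "List worker scripts in an account" },
];

// Matches dns_list_records (three keywords) ahead of workers_list (one).
const matches = searchTools("list dns records", catalog, 2);

// A relevant tool exists (workers_deploy), but no keyword overlaps with it.
const missed = searchTools("ship my code", catalog);
```

The second query shows the failure mode noted above: when the agent's phrasing does not overlap with the indexed keywords, the relevant tool is never loaded into context.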

The code generation approach represents the most scalable solution, wherein models generate executable code against typed SDKs derived from OpenAPI specifications. This paradigm shift treats API access as a code generation problem rather than a tool selection problem, enabling agents to reason about API capabilities through type information rather than exhaustive tool definitions. Cloudflare published validation of this approach in a blog post demonstrating comprehensive API access through generated TypeScript code.

3.2 Typed SDKs and Compact Representation

The efficacy of code generation approaches derives from the compactness of type information relative to full tool specifications. Typed SDKs, generated from OpenAPI specifications, provide concise representations of API inputs and outputs that agents can reason about without requiring complete documentation in context. As noted in the analysis, "code is actually a very compact plan"—a single code generation tool with type information provides more degrees of freedom than multiple discrete tool definitions.

The Cloudflare implementation demonstrates this efficiency: rather than loading 1.1 million tokens of tool definitions, the system provides type information for the entire API surface, enabling the model to generate code that lists workers, deploys workers, adds Access policies, inspects DNS configurations, and sends emails. This approach achieves complete API coverage through a single MCP server, contrasting sharply with the previous 16-server fragmented implementation.
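To make the idea concrete, the sketch below shows a heavily simplified typed SDK surface and the kind of script a model might generate against it. The interfaces, method names, and stub data are all hypothetical; they are not Cloudflare's actual SDK, which the source does not reproduce. The point is only that a few lines of type information let one generated program compose several endpoints.

```typescript
// Hypothetical, heavily simplified SDK surface. Real generated SDKs are far
// larger; these names are illustrative and not Cloudflare's actual API.
interface Worker {
  id: string;
  name: string;
}

interface DnsRecord {
  type: string;
  name: string;
  content: string;
}

interface ApiClient {
  workers: { list(accountId: string): Promise<Worker[]> };
  dns: { listRecords(zoneId: string): Promise<DnsRecord[]> };
}

// In-memory stub so the sketch runs without network access.
const stubClient: ApiClient = {
  workers: {
    list: async () => [
      { id: "w1", name: "api-gateway" },
      { id: "w2", name: "image-resizer" },
    ],
  },
  dns: {
    listRecords: async () => [
      { type: "A", name: "example.com", content: "192.0.2.1" },
      { type: "CNAME", name: "www.example.com", content: "example.com" },
    ],
  },
};

// The kind of program a model might generate after seeing only the types
// above: one script that calls two endpoints and returns a summary.
async function generatedPlan(api: ApiClient): Promise<string> {
  const workers = await api.workers.list("account-id-placeholder");
  const records = await api.dns.listRecords("zone-id-placeholder");
  const aRecords = records.filter((record) => record.type === "A");
  return `workers=${workers.length} aRecords=${aRecords.length}`;
}

const planResult: Promise<string> = generatedPlan(stubClient);
```

With discrete tools, the same task would take two tool calls plus a reasoning step to combine the results; here the filtering and aggregation happen inside the generated code.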

3.3 Isolated Execution Environments

The primary barrier to widespread adoption of code generation approaches has been security concerns surrounding untrusted code execution. As stated in the source material, "Running untrusted code is mega mega scary... That's a CVE. Like, it's a vulnerability." Generated code could read filesystems, exfiltrate secrets, execute infinite loops, consume excessive resources, or run cryptocurrency miners.

Previous solutions, including domain-specific languages (DSLs), JSON specifications, virtual machines, sandboxes, and code review, proved insufficient for production deployment. The emergence of V8 isolates as a lightweight, programmable sandbox addresses these security concerns through multiple mechanisms. Cloudflare Workers, built on V8 isolates, provide programmable guardrails including a boolean toggle for internet access, domain-level restrictions, and Node.js compatibility controls. Critically, generated code executes in dynamic workers with no access to process.env or secrets by default, ensuring isolation from sensitive data.

This isolation primitive enables the execution of agent-generated code on backend infrastructure, fully separated from client environments. The programmable nature of these guardrails allows fine-grained control over execution capabilities, balancing functionality with security requirements.
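The guardrails described in this section can be modeled as a small deny-by-default policy. The `SandboxPolicy` shape and its field names are hypothetical and chosen for illustration; they are not the Cloudflare Workers configuration schema. The sketch assumes that domain restrictions also cover subdomains and that environment variables are visible only when Node.js compatibility is enabled.

```typescript
// Hypothetical policy shape; these field names are illustrative, not the
// Cloudflare Workers configuration schema.
interface SandboxPolicy {
  allowInternet: boolean; // master switch for outbound requests
  allowedDomains: string[]; // hostnames reachable when the switch is on
  nodeCompat: boolean; // whether Node.js APIs and env access are exposed
}

// Deny-by-default policy for agent-generated code.
const defaultPolicy: SandboxPolicy = {
  allowInternet: false,
  allowedDomains: [],
  nodeCompat: false,
};

// Decide whether generated code may contact the given hostname.
// Assumption: an allowed domain also permits its subdomains.
function isOutboundAllowed(policy: SandboxPolicy, hostname: string): boolean {
  if (!policy.allowInternet) return false;
  const host = hostname.toLowerCase();
  return policy.allowedDomains.some(
    (domain) => host === domain || host.endsWith(`.${domain}`)
  );
}

// Build the environment handed to generated code: host variables are dropped
// unless Node.js compatibility was explicitly enabled.
function visibleEnv(
  policy: SandboxPolicy,
  hostEnv: Record<string, string>
): Record<string, string> {
  return policy.nodeCompat ? { ...hostEnv } : {};
}

// A policy that opens exactly one API host and nothing else.
const apiOnlyPolicy: SandboxPolicy = {
  ...defaultPolicy,
  allowInternet: true,
  allowedDomains: ["api.example.com"],
};
```

Under `apiOnlyPolicy`, a request to `api.example.com` passes the check while any other host is refused, and `visibleEnv` returns an empty object, so a generated script has no secrets to exfiltrate.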

3.4 Architectural Implications and Scalability

The code generation paradigm necessitates architectural considerations at both infrastructure and client levels. At infrastructure scale, APIs need aggressive rate limiting, because agents can issue requests from for loops running across multiple sandboxes simultaneously. Services require protection against resource exhaustion from AI-generated requests, particularly as APIs become the primary interface for AI users rather than humans. Infrastructure primitives must scale to billions of requests, as demonstrated by Cloudflare's implementation.
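One common way to protect an API from loop-driven bursts is a token bucket, sketched below. The source does not say which rate limiting algorithm Cloudflare uses, so this is a stand-in technique rather than a description of their implementation; the capacity and refill rate are arbitrary illustrative values. The clock is passed in explicitly so the behavior is deterministic.

```typescript
// Token bucket with an explicit clock. Each request consumes one token;
// tokens refill continuously up to a fixed capacity.
class TokenBucket {
  private readonly capacity: number;
  private readonly refillPerSecond: number;
  private tokens: number;
  private lastRefillMs: number;

  constructor(capacity: number, refillPerSecond: number, startMs: number) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity;
    this.lastRefillMs = startMs;
  }

  // Returns true and consumes a token if a request is allowed at nowMs.
  tryAcquire(nowMs: number): boolean {
    const elapsedSeconds = Math.max(0, nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSeconds * this.refillPerSecond
    );
    this.lastRefillMs = Math.max(this.lastRefillMs, nowMs);
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// An agent script firing 10 requests in a tight loop at t = 0 ms against a
// bucket holding 5 tokens that refills at 1 token per second.
const bucket = new TokenBucket(5, 1, 0);
let allowedInBurst = 0;
for (let i = 0; i < 10; i++) {
  if (bucket.tryAcquire(0)) allowedInBurst += 1;
}

// Two seconds later, two tokens have refilled, so the third request fails.
const allowedAfterWait = [
  bucket.tryAcquire(2000),
  bucket.tryAcquire(2000),
  bucket.tryAcquire(2000),
];
```

In a multi-sandbox setting the bucket would need to be keyed per account or per credential and held in shared state, since each sandbox on its own may look well behaved.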

Client-side architectures face different challenges. Building MCP clients has proven difficult due to requirements for managing stateful connections, resumability, and performance optimization, resulting in stripped-down implementations. The analysis identifies programmatic tool calling as an emerging pattern where clients execute untrusted code either locally (termed "YOLO eval") or remotely in sandboxes. This enables saved mini scripts where users preserve generated code for repeated tasks such as cron jobs or web scraping, with agents fixing and resaving code when breakage occurs.
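The saved mini script pattern can be sketched as a small store that reruns preserved code and, on failure, asks an agent for a fix and resaves the result. Everything here is hypothetical: `ScriptStore`, the `Executor` and `Repairer` callback types, and the fake executor that fails on a removed `/v1/` path are illustrative stand-ins. In practice the executor would run the source in a sandbox and the repairer would be a model call.

```typescript
interface SavedScript {
  name: string;
  source: string;
  version: number;
}

// Runs source code in some sandbox and returns its output; throws on failure.
type Executor = (source: string) => string;
// Stands in for an agent that returns corrected source given the error text.
type Repairer = (source: string, errorMessage: string) => string;

class ScriptStore {
  private scripts = new Map<string, SavedScript>();

  // Save a script, bumping the version if the name already exists.
  save(name: string, source: string): SavedScript {
    const previous = this.scripts.get(name);
    const saved: SavedScript = {
      name,
      source,
      version: previous ? previous.version + 1 : 1,
    };
    this.scripts.set(name, saved);
    return saved;
  }

  get(name: string): SavedScript | undefined {
    return this.scripts.get(name);
  }

  // Run a saved script; on failure, repair it once, rerun, and resave the fix.
  runWithRepair(name: string, execute: Executor, repair: Repairer): string {
    const script = this.scripts.get(name);
    if (!script) throw new Error(`No saved script named ${name}`);
    try {
      return execute(script.source);
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      const fixedSource = repair(script.source, message);
      const result = execute(fixedSource); // a second failure propagates
      this.save(name, fixedSource);
      return result;
    }
  }
}

// Fake executor: pretends the /v1/ endpoint has been removed upstream.
const fakeExecute: Executor = (source) => {
  if (source.includes("/v1/")) throw new Error("404: /v1/ was removed");
  return "scraped 3 items";
};
// Fake repairer: rewrites the path the way an agent might after reading the error.
const fakeRepair: Repairer = (source) => source.replace("/v1/", "/v2/");

const store = new ScriptStore();
store.save("nightly-scrape", "fetch('/v1/items')");
const scrapeResult = store.runWithRepair("nightly-scrape", fakeExecute, fakeRepair);
```

The fix is saved only after the repaired script runs successfully, so a bad repair never overwrites the last known version.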

At scale, stateless agent loops are preferred over per-agent sandboxes, particularly in scenarios involving 100+ agents per person. This architectural choice optimizes resource utilization while maintaining isolation boundaries through the underlying execution primitive rather than persistent sandbox instances.

4. Technical Insights

4.1 Implementation Metrics and Performance Characteristics

The Cloudflare implementation provides concrete metrics validating the code generation approach. The system exposes over 2,600 API endpoints through a single authenticated MCP server, providing read-only access to the entire Cloudflare API surface by default. This represents a reduction from 16 servers with incomplete coverage to comprehensive access through a unified interface. The token efficiency gains are substantial: from 2.3 million tokens (OpenAPI specification) to 1.1 million tokens (tool definitions) to the compact type information required for code generation.
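The token figures can be put in perspective with a short calculation. The 2.3 million and 1.1 million token counts come from the source; the 200,000-token context window is an assumed value for illustration only, since the source does not name a specific model or window size.

```typescript
// Figures reported in the source.
const openApiTokens = 2_300_000;
const toolDefinitionTokens = 1_100_000;

// Assumption for illustration: a 200,000-token context window.
const assumedContextWindow = 200_000;

// Converting the OpenAPI spec to tool definitions saves roughly half the tokens.
const reductionPercent = Math.round(
  (1 - toolDefinitionTokens / openApiTokens) * 100
);

// Number of full context windows each representation would occupy.
const windowsForOpenApi = Math.ceil(openApiTokens / assumedContextWindow);
const windowsForTools = Math.ceil(toolDefinitionTokens / assumedContextWindow);
```

Under that assumption, conversion to tool definitions cuts the token count by about 52 percent, yet the result would still fill six context windows (down from twelve), which is why a more compact representation is needed rather than a more efficient encoding of the same tool list.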

4.2 Security Model and Access Control

The security model relies on multiple layers of isolation. V8 isolates provide the foundational isolation primitive, while programmable guardrails enable fine-grained access control. Node.js compatibility can be toggled to control access to Node.js APIs and environment variables. Internet access is a boolean toggle, and outbound requests can be further restricted by domain, enabling precise control over execution capabilities. Importantly, account IDs are not considered secrets in Cloudflare's security model, simplifying authentication patterns while maintaining security boundaries.

4.3 Emerging Infrastructure Patterns

Multiple platforms are developing code execution primitives, including Pydantic Monty for Python, Deno, and Cloudflare Workers (workerd). This convergence suggests isolated code execution is becoming a standard infrastructure primitive. The analysis draws a historical parallel: "In the 1950s, when you wanted to run something on a computer in your local town, you printed out some punch cards and you stamped them... that was running untrusted code." The AI era thus represents a return to untrusted code submission patterns, now mediated by modern isolation primitives.

4.4 Framework Integration Trajectory

The analysis predicts that MCP will become a native flag in TypeScript full-stack frameworks such as Next.js by year's end. The SDK is becoming lightweight enough to bundle natively without introducing bloat. A single Next.js application could expose 1,000+ APIs via MCP with an MCP=true flag. This integration becomes feasible specifically because clients perform programmatic tool calling—one code tool replaces a thousand individual tool definitions, eliminating the context window constraints that would otherwise prevent such comprehensive API exposure.

5. Discussion

The progression from exhaustive tool enumeration to code generation with isolated execution represents a fundamental paradigm shift in agent-API interaction. This evolution addresses the core scalability constraint—context window limitations—while simultaneously providing more comprehensive API access than previous approaches. The success of this paradigm depends critically on the availability of robust isolation primitives, suggesting that infrastructure development in this domain will be a key enabler for agent capabilities.

The emergence of isolated code execution as a standard primitive has broader implications for cloud architecture. The analysis notes that cloud computing historically moved away from untrusted code execution patterns, but the AI era necessitates their return with modern security guarantees. This represents a significant architectural shift that will require infrastructure providers to develop and scale isolation primitives capable of handling billions of agent-generated requests.

Several areas warrant further investigation. The interaction between model capabilities and code generation success rates remains an open question—as models improve, code generation becomes more viable, creating a positive feedback loop. The optimal balance between local and remote code execution for different use cases requires empirical validation. Additionally, the integration patterns for MCP in full-stack frameworks will likely evolve as adoption increases and best practices emerge.

The shift toward APIs as the primary interface for AI users rather than humans necessitates rethinking API design principles. Rate limiting strategies, error handling patterns, and documentation approaches may require adaptation to accommodate agent interaction patterns that differ fundamentally from human usage.

6. Conclusion

This analysis establishes code generation with isolated execution as the most scalable approach for comprehensive agent-API interaction, validated through Cloudflare's implementation exposing over 2,600 endpoints through a single MCP server. The key insight is that type information provides a more compact and flexible representation than exhaustive tool enumeration, enabling agents to reason about entire API surfaces within context window constraints.

The practical implications are significant for both infrastructure providers and application developers. Infrastructure providers must develop robust isolation primitives with programmable guardrails to enable secure execution of agent-generated code at scale. Application developers can leverage emerging MCP integration in full-stack frameworks to expose comprehensive API surfaces without context window concerns. The convergence of multiple platforms on isolated code execution primitives—Pydantic Monty, Deno, Cloudflare Workers—suggests this approach will become standard infrastructure within the AI ecosystem. As model capabilities continue to improve, code generation approaches will increasingly dominate agent-API interaction patterns, fundamentally reshaping how AI systems interface with external services.

Sources

YBYUvGOuotE - Original Creator (YouTube)
Analysis and summary by Sean Weldon using AI-assisted research tools

About the Author

Sean Weldon is an AI engineer and systems architect specializing in autonomous systems, agentic workflows, and applied machine learning. He builds production AI systems that automate complex business operations.
