MCP UI: Extending the frontier — Liad Yosef and Ido Salomon, MCP Apps

MCP apps standardize how UI is passed through the Model Context Protocol, enabling interactive applications from multiple services to be composed within chat...

By Sean Weldon

Abstract

The Model Context Protocol (MCP) apps standard addresses a critical limitation in agent-based interfaces: the degradation of rich, interactive data into unattributed text responses that eliminate source context and brand identity. This synthesis examines how MCP apps enables services to transmit HTML resources through the Model Context Protocol, which host applications transform into sandboxed, interactive interfaces while preserving conversation context and user control. Through standardized message passing mechanisms spanning a spectrum from notifications to tool calls to prompts, the protocol maintains host authority over user interactions while allowing domain experts to contribute specialized interfaces. With adoption across platforms serving over one billion users—representing an audience 160 times larger than the iPhone App Store at launch—MCP apps demonstrates a paradigm shift from monolithic applications to distributed UI components coordinated by intelligent agents, with projected standardization as a global protocol by 2026.

1. Introduction

The proliferation of large language model (LLM) interfaces has exposed fundamental tensions between conversational paradigms and the requirements for rich, interactive data presentation. Traditional Model Context Protocol (MCP) implementations return text-based tool responses that strip contextual information about data provenance and reduce complex information structures to undifferentiated text streams. This limitation has created significant barriers to enterprise adoption, as organizations hesitate to transmit proprietary data through systems that eliminate source attribution and brand identity cultivated through years of user experience refinement.

The core problem manifests in what practitioners describe as a "wall of text" phenomenon: when MCP tools return data as unformatted text, users lose the ability to distinguish whether information originates from Shopify, Booking.com, Expedia, or other services. This homogenization proves particularly problematic for data requiring visual understanding—product catalogs, booking interfaces, data visualizations—where the mismatch between presentation medium and information structure obscures rather than illuminates content. Chat interfaces, while effective for conversational exchanges, prove fundamentally inadequate for presenting structured, interactive data.

MCP apps represents the first official extension to the Model Context Protocol, standardizing how user interfaces are transmitted, rendered, and controlled within agent-based environments. Rather than forcing all interactions through text-based responses, MCP apps enables services to return HTML resources that host applications transform into interactive, sandboxed applications. This synthesis examines the technical architecture of MCP apps, its evolution from experimental protocols, the interaction models it enables, and its implications for application distribution and interface design in agent-mediated computing environments.

2. Background and Related Work

2.1 The MCPUI Prototype and Standardization Process

The MCPUI protocol, released in May 2025, was an initial experimental approach to transmitting UI over MCP while preserving branding and user experience knowledge. The prototype phase validated technical feasibility through significant early adoption: Shopify transmitted MCPUI chunks for millions of stores, while Hugging Face converted all spaces to MCPUI widgets. This proof of concept demonstrated both the technical viability of the approach and substantial demand from service providers seeking to maintain brand identity within conversational interfaces.

Collaboration with Anthropic and OpenAI subsequently formalized MCPUI into MCP apps, establishing it as the first official MCP extension. This standardization process ensured interoperability across major platforms including Visual Studio Code, Cursor, Claude, ChatGPT, and Microsoft Copilot. ChatGPT now recommends MCP apps as the official method for building ChatGPT applications, signaling institutional commitment to the protocol. The standardization has extended beyond traditional graphical interfaces to terminal applications, demonstrating cross-platform applicability across diverse computing environments.

2.2 Theoretical Foundation: Distributed UI in Agent Systems

MCP apps operates on a fundamental principle that challenges traditional web architecture: domain experts with decades of UX refinement should contribute specialized interfaces rather than expecting host applications to generate all user interfaces. This approach preserves accumulated knowledge about effective information presentation while adapting to agent-centric computing paradigms. The protocol recognizes that companies possess irreplaceable expertise in presenting their specific data types—expertise that should not be discarded in transitioning to conversational interfaces.

The architecture reflects a broader shift from monolithic applications to distributed UI components coordinated by intelligent agents. Rather than users navigating complete websites, the vision anticipates personal assistants accepting small UI chunks that compose into coherent experiences. This represents not merely a technical modification but a fundamental reconceptualization of how users interact with digital services.

3. Core Analysis

3.1 Architectural Principles and Resource-Based Design

MCP apps fundamentally alters the response model of MCP servers. Instead of returning text, servers return HTML resources that hosts supporting MCP apps transform into interactive applications. This resource-based approach enables rich, branded interfaces while maintaining the protocol's client-server separation. The architecture employs sandboxed rendering to ensure security while preserving interactivity—UI components execute within isolated environments that prevent unauthorized access to host systems or user data.
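
As a rough illustration of this response model, the sketch below builds a tool result that carries an HTML resource next to a plain-text fallback. The field names are modeled on MCP's embedded resource content blocks as an assumption; the `ui://` URI, the `make_ui_tool_result` helper, and the sample product are hypothetical and are not taken from the MCP apps specification.

```python
import json


def make_ui_tool_result(uri: str, html: str, fallback_text: str) -> dict:
    """Build a tool result holding an HTML resource plus a text fallback."""
    return {
        "content": [
            # Plain-text fallback for hosts that do not render UI.
            {"type": "text", "text": fallback_text},
            # Embedded resource with the HTML a supporting host could sandbox and render.
            {
                "type": "resource",
                "resource": {"uri": uri, "mimeType": "text/html", "text": html},
            },
        ]
    }


result = make_ui_tool_result(
    uri="ui://example-shop/product-card",  # hypothetical URI, for illustration only
    html="<button id='buy'>Add to cart</button>",
    fallback_text="Product: Example Mug, $12",
)
print(json.dumps(result, indent=2))
```

Keeping a text entry alongside the resource is a design choice in this sketch: a host without UI support can still show something meaningful.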

The message passing system standardizes how UI interactions communicate with host applications. When users interact with UI elements, events flow back to the host rather than directly to service backends. This design ensures that the host maintains control and context: every click, every form submission, every UI interaction remains within the conversation context. The host then decides whether to invoke tools, fetch additional resources, or process the interaction through the model. This architectural decision preserves the host's authority over the user's journey within the platform while allowing services to provide rich interfaces.

3.2 Interaction Models and the Control Spectrum

UI interactions in MCP apps fall on a standardized spectrum that describes how control is divided between UI components and the host application. At one end, notifications leave the most control with the UI: the interface informs the host that something occurred without requiring host action. In the middle, tool calls share control: the UI instructs the host to invoke a specific tool with defined parameters. At the opposite end, prompts release all control to the host: the UI asks the host to process an arbitrary prompt, giving the model full discretion over the response.
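
The three points on the spectrum can be sketched as message constructors on the UI side. This is a minimal sketch under stated assumptions: the `Interaction` enum, the `UiMessage` shape, and the helper names are invented for illustration and do not reproduce the protocol's wire format.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict


class Interaction(Enum):
    NOTIFICATION = "notification"  # UI only informs the host
    TOOL_CALL = "tool_call"        # UI names a tool and its parameters
    PROMPT = "prompt"              # UI hands an open-ended request to the host


@dataclass
class UiMessage:
    kind: Interaction
    payload: Dict[str, Any] = field(default_factory=dict)


def notify(event: str) -> UiMessage:
    return UiMessage(Interaction.NOTIFICATION, {"event": event})


def call_tool(name: str, arguments: Dict[str, Any]) -> UiMessage:
    return UiMessage(Interaction.TOOL_CALL, {"name": name, "arguments": arguments})


def prompt(text: str) -> UiMessage:
    return UiMessage(Interaction.PROMPT, {"text": text})


# Who determines what happens next, ordered from most UI control to most host control.
DECIDED_BY = {
    Interaction.NOTIFICATION: "ui",
    Interaction.TOOL_CALL: "ui names the tool, host invokes it",
    Interaction.PROMPT: "host and model",
}

messages = [
    notify("carousel_scrolled"),
    call_tool("add_to_cart", {"sku": "MUG-1"}),  # hypothetical tool name
    prompt("Compare these two mugs"),
]
for message in messages:
    print(message.kind.value, "->", DECIDED_BY[message.kind])
```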

This spectrum breaks with traditional backend-direct interaction models, in which a user click triggers an immediate call to the service backend. By routing all interactions through the host, MCP apps ensures that context remains centralized and that the conversational agent maintains awareness of all user actions. This design enables sophisticated multi-turn interactions where the agent can reason about UI interactions in relation to conversation history, user preferences, and cross-service context.

The standardization of this interaction model solves a critical coordination problem in multi-service environments. When multiple services provide UI components within a single conversation, the host must arbitrate between potentially conflicting interaction patterns. The control spectrum provides a common vocabulary for expressing interaction intent, enabling hosts to make consistent decisions regardless of which service originated the UI component.

3.3 UI Generation Approaches and Interoperability

MCP apps remains agnostic to UI generation methodology, supporting multiple approaches that address different use cases. Predefined UI represents the classic MCP app model where services build and transmit complete interfaces, accounting for approximately 8% of use cases. This approach suits scenarios where services possess specialized knowledge about optimal information presentation and wish to maintain complete control over visual design.

Declarative UI employs structured formats, typically JSON, where applications declare interface structure but hosts control component rendering. This approach enables consistent look-and-feel across services while allowing each service to specify its information architecture. Hosts can apply unified design systems, ensuring that UI components from multiple services compose harmoniously within the conversation interface.
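
The division of labor in declarative UI can be sketched as follows: the service declares structure as JSON-like data, and the host maps each node to its own components. The node types, class names, and `book_room` tool are hypothetical, chosen only to illustrate the idea.

```python
import html
from typing import Any, Callable, Dict, List


# Host-owned components: the host, not the service, decides the final markup.
def render_text(node: Dict[str, Any]) -> str:
    return f'<p class="host-text">{html.escape(node["value"])}</p>'


def render_button(node: Dict[str, Any]) -> str:
    tool = html.escape(node["tool"], quote=True)
    label = html.escape(node["label"])
    return f'<button class="host-button" data-tool="{tool}">{label}</button>'


HOST_COMPONENTS: Dict[str, Callable[[Dict[str, Any]], str]] = {
    "text": render_text,
    "button": render_button,
}


def render(spec: List[Dict[str, Any]]) -> str:
    """Render a declarative spec, skipping node types the host does not support."""
    parts = []
    for node in spec:
        renderer = HOST_COMPONENTS.get(node["type"])
        if renderer is not None:
            parts.append(renderer(node))
    return "\n".join(parts)


# What a service might declare (hypothetical structure).
spec = [
    {"type": "text", "value": "Deluxe room, 2 nights"},
    {"type": "button", "label": "Book", "tool": "book_room"},
    {"type": "carousel", "items": []},  # unsupported here, so it is skipped
]
print(render(spec))
```

Because every service's spec passes through the same component table, two services declaring a button receive the same host styling.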

Generative UI represents the most dynamic approach: models generate interfaces on-the-fly based on context and user needs. Claude's generative UI feature employs MCP apps as its underlying protocol, demonstrating how model-generated interfaces can integrate with the standardized framework. This approach enables unprecedented flexibility, allowing interfaces to adapt to specific user queries and contexts rather than relying on predefined templates.

The protocol's interoperability work extends to other emerging standards. Active development focuses on integration with A2UI (Google's generative UI protocol) and WebMCP, aiming to create a unified ecosystem where different UI generation approaches can coexist and interoperate. This work takes place in tri-weekly working group meetings, where the specification evolves based on community feedback and implementation experience.

3.4 Distribution Model and Adoption Metrics

The distribution opportunity for MCP apps is unusually large. ChatGPT serves 800 million weekly users, approximately 10% of the global population. Combined with Claude, Visual Studio Code, and other supporting platforms, the total addressable audience exceeds one billion users, an audience 160 times larger than the iPhone App Store commanded at launch. This scale fundamentally alters the economics of application development.
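
The figures above can be checked with a few lines of arithmetic. The world population value of roughly 8.2 billion is an assumption introduced here for the check; it does not appear in the source.

```python
total_audience = 1_000_000_000  # "over one billion users" (lower bound)
ratio_to_app_store = 160        # "160 times larger"

# Launch-era audience implied by the 160x comparison.
implied_launch_audience = total_audience / ratio_to_app_store

chatgpt_weekly_users = 800_000_000
world_population = 8_200_000_000  # assumption: rough current estimate

share_of_population = chatgpt_weekly_users / world_population

print(f"{implied_launch_audience:,.0f}")  # 6,250,000
print(f"{share_of_population:.1%}")       # 9.8%
```

The 160x comparison therefore implies a launch-era audience of about 6.25 million, and the 800 million figure is consistent with the "approximately 10%" claim.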

The protocol's "write once, deploy everywhere" model eliminates platform-specific development overhead. A single MCP app codebase functions across all supporting hosts, including LibreChat, ChatGPT, Claude, Visual Studio Code, Cursor, and others. This cross-platform compatibility dramatically reduces development and maintenance costs compared to traditional application ecosystems requiring separate implementations for each platform.

Early adoption patterns demonstrate commercial viability. Shopify's deployment of MCPUI chunks for millions of stores illustrates enterprise-scale implementation, while Hugging Face's conversion of all spaces to MCPUI widgets demonstrates adoption in technical communities. These implementations validate both the technical architecture and the business model for service providers seeking to maintain presence in conversational interfaces.

4. Technical Insights

4.1 Implementation Architecture and Security Model

The technical implementation of MCP apps requires hosts to support resource transformation and sandbox rendering. When an MCP server returns an HTML resource, the host must parse the resource, establish a sandboxed execution environment, and render the interface while maintaining security boundaries. The sandbox prevents malicious code execution while allowing legitimate interactivity—a balance requiring careful security engineering.
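
One common way for a web-based host to build such a boundary is the HTML `iframe` element with its `sandbox` and `srcdoc` attributes. The sketch below assumes that approach; the source does not say which isolation mechanism any particular host uses, and the `sandboxed_iframe` helper is hypothetical.

```python
import html


def sandboxed_iframe(resource_html: str, allow_scripts: bool = True) -> str:
    """Wrap untrusted HTML in an iframe with a restrictive sandbox attribute."""
    # "allow-same-origin" is deliberately never granted, so the frame keeps an
    # opaque origin and cannot read the host page's cookies or storage.
    tokens = ["allow-scripts"] if allow_scripts else []
    sandbox_value = " ".join(tokens)
    # Escape the markup so it is carried as attribute data, not parsed in the host page.
    srcdoc = html.escape(resource_html, quote=True)
    return f'<iframe sandbox="{sandbox_value}" srcdoc="{srcdoc}"></iframe>'


frame = sandboxed_iframe("<script>console.log('hi')</script><button>Buy</button>")
print(frame)
```

An empty `sandbox` value applies every restriction, so passing `allow_scripts=False` yields a static, non-interactive view.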

Message passing between sandboxed UI components and host applications employs standardized event formats. UI interactions generate events containing interaction type (notification, tool call, or prompt), relevant parameters, and context information. Hosts process these events according to their control policies: some interactions may trigger immediate tool invocation, while others may require model reasoning or user confirmation. This flexibility allows hosts to implement varying security and user experience policies while maintaining protocol compatibility.
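
A host-side control policy of this kind might look like the sketch below. The event fields (`type`, `params`, `context`), the decision labels, and the `auto_approve` set are assumptions made for illustration rather than the protocol's actual event format.

```python
from typing import Any, Dict, Set

VALID_TYPES = {"notification", "tool_call", "prompt"}


def validate_event(event: Dict[str, Any]) -> None:
    """Reject events that lack a known type or a parameter mapping."""
    if event.get("type") not in VALID_TYPES:
        raise ValueError(f"unknown interaction type: {event.get('type')!r}")
    if not isinstance(event.get("params"), dict):
        raise ValueError("params must be a mapping")


def decide(event: Dict[str, Any], auto_approve: Set[str]) -> str:
    """Return the host's next step for a UI event under a simple policy."""
    validate_event(event)
    if event["type"] == "notification":
        return "record"          # keep it in conversation context, take no action
    if event["type"] == "prompt":
        return "send_to_model"   # the model reasons about what to do
    # Tool call: run immediately only if the host's policy trusts this tool.
    if event["params"].get("name") in auto_approve:
        return "invoke_tool"
    return "ask_user"


policy = {"search_products"}  # hypothetical tools the host runs without confirmation
events = [
    {"type": "notification", "params": {"event": "scrolled"}, "context": {}},
    {"type": "tool_call", "params": {"name": "search_products"}, "context": {}},
    {"type": "tool_call", "params": {"name": "checkout"}, "context": {}},
    {"type": "prompt", "params": {"text": "Find a cheaper option"}, "context": {}},
]
for event in events:
    print(event["type"], "->", decide(event, policy))
```

Two hosts can share the same event format while differing only in the contents of `auto_approve`, which is the kind of policy flexibility described above.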

4.2 Performance Optimization and Reusable Views

Current development addresses performance challenges in heavy applications. Reusable views are a forthcoming feature that lets services reference the same view and push data into it rather than re-rendering the complete interface. This optimization proves critical for applications such as those from Autodesk, where complex visualizations impose significant rendering overhead. By maintaining view state and updating only data, reusable views reduce computational requirements and improve response latency.
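
Since reusable views are described as forthcoming, the sketch below only illustrates the general idea of keeping a rendered view and pushing data into it. The `ReusableViews` class and its methods are hypothetical and do not reflect a published API.

```python
from typing import Any, Callable, Dict


class ReusableViews:
    """Keep rendered views by id so later calls only update their data."""

    def __init__(self) -> None:
        self._views: Dict[str, Dict[str, Any]] = {}
        self.full_renders = 0

    def push(self, view_id: str, data: Dict[str, Any], build_html: Callable[[], str]) -> str:
        view = self._views.get(view_id)
        if view is None:
            # First use: pay the full rendering cost once.
            self._views[view_id] = {"html": build_html(), "data": dict(data)}
            self.full_renders += 1
            return "rendered"
        # Later uses: keep the existing view and merge in the new data only.
        view["data"].update(data)
        return "updated"

    def data(self, view_id: str) -> Dict[str, Any]:
        return dict(self._views[view_id]["data"])


def build_viewer() -> str:
    # Stand-in for an expensive interface, such as a 3D model viewer.
    return "<canvas id='viewer'></canvas>"


views = ReusableViews()
print(views.push("model-viewer", {"rotation": 0}, build_viewer))
print(views.push("model-viewer", {"rotation": 90}, build_viewer))
print(views.full_renders, views.data("model-viewer"))
```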

Model-to-view interactions represent another active development area, standardizing how language models interact with UI components. By exposing tools to models that enable actions like clicking buttons or filling forms, this feature allows agents to manipulate interfaces programmatically. This capability enables sophisticated automation scenarios where models orchestrate multi-step interactions across complex interfaces based on user intent.
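
The idea of exposing view actions as tools can be sketched as below. Because this area is still under development, everything here is an assumption made for illustration, including the `FormView` class, the `fill_field` and `click_button` tool names, and the action format.

```python
from typing import Any, Callable, Dict, List


class FormView:
    """A toy view with one text field and a submit button."""

    def __init__(self) -> None:
        self.fields: Dict[str, str] = {"email": ""}
        self.submitted = False

    def fill(self, name: str, value: str) -> None:
        if name not in self.fields:
            raise KeyError(f"no such field: {name}")
        self.fields[name] = value

    def click(self, button: str) -> None:
        if button != "submit":
            raise KeyError(f"no such button: {button}")
        self.submitted = True


def view_tools(view: FormView) -> Dict[str, Callable[[Dict[str, Any]], None]]:
    """Expose the view's actions as named tools that a model could call."""
    return {
        "fill_field": lambda args: view.fill(args["field"], args["value"]),
        "click_button": lambda args: view.click(args["button"]),
    }


def run_actions(tools: Dict[str, Callable[[Dict[str, Any]], None]],
                actions: List[Dict[str, Any]]) -> None:
    # Apply the model's chosen actions in order, as a host might after approval.
    for action in actions:
        tools[action["tool"]](action["arguments"])


form = FormView()
run_actions(view_tools(form), [
    {"tool": "fill_field", "arguments": {"field": "email", "value": "user@example.com"}},
    {"tool": "click_button", "arguments": {"button": "submit"}},
])
print(form.fields, form.submitted)
```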

4.3 SDK Ecosystem and Developer Tools

The ext-apps SDK serves as the official toolkit for building MCP apps, providing abstractions for resource generation, event handling, and host communication. For host implementations, the MCP-UI SDK offers the recommended client-side functionality for rendering and managing UI components. These SDKs reduce implementation complexity and ensure consistency across the ecosystem.

The SDK architecture reflects lessons learned during the MCPUI prototype phase. By providing high-level abstractions for common patterns while allowing low-level control when necessary, the tooling accommodates both simple use cases and complex applications requiring fine-grained control over interaction behavior.

5. Discussion

The emergence of MCP apps as a standardized protocol represents a significant inflection point in human-computer interaction. The projection that 2026 will mark MCP apps' establishment as a global standard for UI in chat applications suggests rapid industry convergence around this model. This timeline implies substantial ongoing investment from major platform providers and anticipates that conversational interfaces will become primary interaction modalities for many services currently accessed through traditional web interfaces.

The protocol's success depends on resolving several ongoing challenges. Interoperability with competing standards like A2UI requires sustained coordination across organizations with potentially divergent interests. Performance optimization for complex applications remains an active area of work, with reusable views being one approach among potentially many required solutions. The security model must evolve to address emerging threats as adoption scales and attackers grow more sophisticated at exploiting sandbox boundaries.

The broader implications extend beyond technical architecture to business models and user behavior. If personal assistants accepting small UI chunks replace traditional web browsing, the economics of online services will transform fundamentally. Advertising models predicated on full-page layouts and extended browsing sessions may become obsolete. Service providers will need to optimize for brief, focused interactions within conversational contexts rather than extended engagement within their own properties. The vision that "we won't have browsers as we know them" within two years represents a radical disruption to established patterns of digital interaction.

6. Conclusion

MCP apps establishes a standardized protocol for transmitting, rendering, and controlling user interfaces within agent-based computing environments. By enabling services to return HTML resources that hosts transform into interactive, sandboxed applications, the protocol preserves brand identity and domain expertise while adapting to conversational paradigms. The standardized message passing system—spanning notifications, tool calls, and prompts—maintains host control over user interactions while allowing rich, service-specific interfaces.

The protocol's adoption across platforms serving over one billion users demonstrates commercial viability and suggests potential for fundamental transformation in application distribution models. The "write once, deploy everywhere" architecture dramatically reduces development overhead compared to platform-specific implementations, while the audience scale exceeds early mobile ecosystems by orders of magnitude. For researchers and practitioners, MCP apps represents both a technical specification enabling immediate implementation and a paradigm shift anticipating agent-mediated computing's evolution. Future work should examine user experience patterns emerging from multi-service UI composition, security implications of widespread sandbox rendering, and economic models for services operating within conversational rather than destination-based interaction paradigms.


About the Author

Sean Weldon is an AI engineer and systems architect specializing in autonomous systems, agentic workflows, and applied machine learning. He builds production AI systems that automate complex business operations.
