Combine Skills and MCP to Close the Context Gap — Pedro Rodrigues, Supabase

Building effective AI agent skills requires treating them as documentation, being opinionated about product workflows, and providing critical guidance rather...

2026-05-19 By Sean Weldon

Guidance Over Context: A Framework for Effective AI Agent Skills Development

Abstract

This research synthesis examines the development and implementation of agent skills for AI systems, showing that effective agent guidance requires structured documentation strategies that emphasize critical workflow direction rather than comprehensive context provision. Through systematic evaluation of six Supabase-specific scenarios across five agent and model configurations from two vendors (Claude Code, Claude Opus 4.6, Claude Sonnet 4.6, GPT-4o, and GPT-4o mini), the study finds that combining skills with Model Context Protocol (MCP) servers consistently outperforms both an unassisted baseline and MCP access alone. Three core principles emerge: treating skills as documentation pointers to a single source of truth, embedding critical information directly in skill files because agents tend to skip optional retrieval steps, and providing opinionated workflow guidance based on product expertise. Evaluations graded on the Braintrust platform show that skills-plus-MCP configurations achieve the highest task completeness scores on every tested model and are particularly effective at preventing security vulnerabilities such as PostgreSQL row-level security (RLS) misconfigurations.

1. Introduction

Contemporary AI agents function as autonomous systems capable of interacting with complex software products, APIs, and development workflows. However, these systems face fundamental limitations that constrain their effectiveness: they operate on stale training data, are reluctant to acknowledge knowledge gaps, and frequently miss critical product-specific requirements such as security configurations. The Supabase case study shows that agents given only MCP server access consistently failed to configure row-level security correctly, leaving data potentially exposed.

The central thesis of this research is that effective agent performance requires structured skills (formalized guidance documents that direct agent behavior) rather than mere access to reference materials or tool interfaces. This represents a shift from maximizing context to optimizing guidance. As the talk puts it, "the bottom line is not the context, it's the guidance."

This synthesis examines the development of agent skills for Supabase, an open-source backend-as-a-service platform, where systematic evaluation establishes three foundational principles for skill development: avoiding information duplication through documentation pointers, accounting for agent laziness in information retrieval by embedding critical information, and providing opinionated workflow guidance. The analysis presents controlled evaluation results across multiple agents and models from two vendors and discusses implementation challenges, including distribution mechanisms and standardization efforts.

2. Background and Related Work

2.1 Agent Architecture and Tool-Calling Economics

Contemporary AI agents rely on Large Language Models (LLMs) trained on historical data, so their knowledge grows stale as products evolve. They reach external resources through tool calling, including the Model Context Protocol (MCP), which provides standardized interfaces to external data sources and APIs. Tool calls are not free, however: each one adds latency and consumes context, and in practice agents default to what they learned in training rather than fetching current information, even when a provided tool could retrieve it.

2.2 The Skills Framework

The skills framework structures agent guidance as folders containing front matter metadata (name, description), an instruction file (skill.md), and optional bundled resources and scripts. Unlike traditional documentation or reference files, skills are actionable guidance that agents discover and load progressively. Skills are designed as an agent-agnostic standard, which distinguishes them from model-specific plugins or system prompts. Because skills are discrete, versionable artifacts, their effect on agent behavior can be measured systematically through evals (evaluations).
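
To make the folder structure concrete, the sketch below parses a hypothetical skill.md whose front matter is delimited by `---` lines. It handles only flat `key: value` pairs (a real implementation would use a YAML parser), and the skill name, description, and instructions in the example are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    description: str
    body: str


def parse_skill_md(text: str) -> Skill:
    """Split a skill.md file into front matter metadata and instructions.

    Assumes the file starts with a '---' line, followed by flat
    'key: value' lines and a closing '---' line. Nested YAML is not
    supported in this sketch.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("skill.md must start with a '---' front matter block")
    meta: dict[str, str] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            body = "\n".join(lines[i + 1:]).strip()
            break
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    else:
        raise ValueError("front matter block is never closed")
    # name and description are what an agent sees before loading the body.
    return Skill(name=meta["name"], description=meta["description"], body=body)


# Hypothetical example skill, not Supabase's actual file.
example = """---
name: supabase-database
description: Hypothetical skill for creating Supabase tables, views and RLS policies.
---
# Instructions
Always enable RLS on new tables.
"""

skill = parse_skill_md(example)
print(skill.name)  # supabase-database
```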

2.3 Security Context: PostgreSQL Row-Level Security

Row-level security (RLS) in PostgreSQL restricts which rows a role can read or modify, according to policies defined on each table. By default, a view accesses its underlying tables with the privileges of the view owner (typically the role that created it) rather than those of the querying user. Unless the view is created with the security_invoker = true option (available since PostgreSQL 15), RLS policies on those tables are therefore checked against the owner, and if the owner bypasses RLS, as table owners and superusers normally do, the view can expose rows the querying user should never see. This specific vulnerability serves as a critical test case for agent skill effectiveness, because it is precisely the kind of non-obvious security requirement that training data alone fails to capture.
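
This failure mode is easy to check mechanically. The sketch below is a deliberately naive, regex-based lint (an illustrative assumption, not Supabase's advisor) that flags `CREATE VIEW` statements whose `WITH (...)` options do not set `security_invoker` to `true` or `on`. A production check would use a real SQL parser or inspect the view's reloptions in the system catalog.

```python
import re

# Captures the view name and any text between it and the AS keyword,
# which is where PostgreSQL's WITH (...) view options appear.
# Naive by design: it ignores comments, string literals and TEMP views.
_CREATE_VIEW = re.compile(
    r"create\s+(?:or\s+replace\s+)?view\s+([\w.\"]+)(.*?)\bas\b",
    re.IGNORECASE | re.DOTALL,
)
_INVOKER_ON = re.compile(r"security_invoker\s*=\s*(?:true|on)\b", re.IGNORECASE)


def views_missing_security_invoker(sql: str) -> list[str]:
    """Return the names of views created without security_invoker enabled."""
    return [
        name
        for name, options in _CREATE_VIEW.findall(sql)
        if not _INVOKER_ON.search(options)
    ]


# Hypothetical migration: the first view runs with its owner's privileges.
migration = """
create view public.active_profiles as
  select * from public.profiles where active;

create view public.my_orders with (security_invoker = true) as
  select * from public.orders;
"""

print(views_missing_security_invoker(migration))  # ['public.active_profiles']
```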

3. Core Analysis

3.1 The Guidance Problem: Empirical Evidence of Agent Limitations

Systematic testing revealed that agents provided only with MCP server access to Supabase APIs consistently failed security-critical tasks. Specifically, when instructed to create SQL views with appropriate security configurations, agents without skill guidance omitted the security_invoker = true option, creating potential data exposure vulnerabilities. This failure occurred even though MCP tools that could have provided access to the relevant documentation were available.

The research identified three distinct failure modes. First, agents operated on stale training data rather than fetching current product information, even when tools enabling such retrieval were available. Second, agents exhibited reluctance to acknowledge knowledge gaps, preferring to proceed with potentially incorrect implementations rather than searching for authoritative guidance. Third, agents failed to recognize product-specific optimized workflows, defaulting instead to generic approaches that created inefficiencies or errors.

3.2 Principle One: Documentation Pointers Over Duplication

The first principle holds that skills should point to authoritative documentation rather than duplicate it. This keeps a single source of truth while still directing agents to the relevant resources. The pointers must be emphatic: agents have to be explicitly instructed to search documentation and web resources, or they fall back on training data.

Supabase implemented this principle by exposing its documentation over SSH, so that agents can browse it as a file system with the Linux tools they already handle well. The design builds on existing agent competencies instead of requiring a new interaction paradigm, on the reasoning that navigating documentation through a familiar interface is more reliable than the alternatives.
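
Agents that browse documentation as files mostly list directories and search file contents. The sketch below imitates that pattern locally, assuming the documentation is available as a directory tree of Markdown files; the directory layout, file names, and the `grep_docs` helper are hypothetical, and the code does not reproduce Supabase's actual SSH service.

```python
import tempfile
from pathlib import Path


def grep_docs(root: Path, term: str) -> list[tuple[str, int, str]]:
    """Return (relative path, line number, line) for each Markdown line containing term.

    Hypothetical helper mirroring a recursive, case-insensitive grep over
    documentation exposed as a file system.
    """
    hits = []
    needle = term.lower()
    for path in sorted(root.rglob("*.md")):
        text = path.read_text(encoding="utf-8")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line.lower():
                hits.append((path.relative_to(root).as_posix(), lineno, line.strip()))
    return hits


# Usage: build a tiny hypothetical docs tree and search it.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "database").mkdir()
    (root / "database" / "views.md").write_text(
        "# Views\nSet security_invoker = true so RLS applies to the caller.\n",
        encoding="utf-8",
    )
    (root / "auth.md").write_text("# Auth\nNothing about views here.\n", encoding="utf-8")
    hits = grep_docs(root, "security_invoker")

print(hits)  # [('database/views.md', 2, 'Set security_invoker = true so RLS applies to the caller.')]
```

Returning the path and line number, as grep does, lets an agent open just the relevant file next instead of reading the whole tree.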

3.3 Principle Two: Accounting for Agent Information Retrieval Laziness

The second principle addresses a critical empirical finding: agents consistently skip optional information retrieval steps. Reference files, even when explicitly provided, are rarely loaded, and agents almost never load three or four of them in the course of a task, which sharply limits how much guidance they actually draw from multiple documents. The likely cause is the cost of tool calling, combined with training data as a cheaper fallback.

Consequently, critical information that cannot be missed must be embedded directly in the skill.md file rather than relegated to reference files. The Supabase implementation initially placed a security checklist in a reference file; agents consistently failed to consult it. Moving this checklist to the main skill file immediately improved compliance. This principle establishes a clear hierarchy: information criticality determines placement, with essential guidance embedded in primary skill files and supplementary information available through documentation pointers.
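
The placement rule can be expressed as a small generator that writes critical items straight into skill.md and leaves supplementary material as pointers. The function, its parameters, and the example checklist and paths are hypothetical; only the principle (critical content inline, everything else by reference) comes from the talk.

```python
def render_skill_md(
    name: str,
    description: str,
    critical: list[str],
    pointers: dict[str, str],
) -> str:
    """Build skill.md text: front matter, inline critical checklist, then doc pointers.

    critical: items the agent must not miss; embedded verbatim in the file.
    pointers: topic -> location of authoritative docs; only referenced.
    """
    lines = ["---", f"name: {name}", f"description: {description}", "---", ""]
    lines.append("## Security checklist (always apply)")
    lines += [f"- [ ] {item}" for item in critical]
    lines += ["", "## Further reading (open only when needed)"]
    lines += [f"- {topic}: {location}" for topic, location in pointers.items()]
    return "\n".join(lines) + "\n"


# Hypothetical content for illustration only.
skill_md = render_skill_md(
    name="supabase-database",
    description="Hypothetical example skill for Supabase schema work.",
    critical=[
        "Enable RLS on every new table in an exposed schema.",
        "Create views WITH (security_invoker = true).",
    ],
    pointers={"Migrations": "docs/guides/migrations.md"},
)
print(skill_md)
```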

3.4 Principle Three: Opinionated Workflow Guidance

The third principle advocates for explicit workflow recommendations based on product expertise and observed user behavior patterns. Rather than presenting agents with all possible approaches, skills should guide agents toward workflows known to be most effective for specific products.

Supabase's recommended schema management workflow exemplifies this principle: agents should run DDL (Data Definition Language) operations freely on development and staging environments, use an advisor tool to identify security and performance issues, fix identified issues, and only then create migration files. This opinionated guidance prevents agents from creating migration files on every schema change—a common but inefficient pattern that creates unnecessary version control complexity.
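
The ordering of that workflow can be sketched as a small state holder that refuses to write a migration while advisor issues remain open. The class and its three callables are hypothetical stand-ins for whatever tools the agent actually has (running SQL, querying the advisor, writing a migration file); no real Supabase tool names or signatures are assumed.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SchemaWorkflow:
    """Hypothetical encoding of the recommended order of operations."""

    run_ddl: Callable[[str], None]
    list_advisor_issues: Callable[[], list[str]]
    write_migration: Callable[[list[str]], str]
    applied: list[str] = field(default_factory=list)

    def apply(self, statement: str) -> None:
        # Step 1: run DDL freely against a development or staging database.
        self.run_ddl(statement)
        self.applied.append(statement)

    def finalize(self) -> str:
        # Step 2: ask the advisor for security and performance issues.
        issues = self.list_advisor_issues()
        if issues:
            # Step 3: no migration until every reported issue is fixed.
            raise RuntimeError(f"fix advisor issues first: {issues}")
        # Step 4: capture all accumulated changes in a single migration.
        return self.write_migration(self.applied)


# Usage with fakes: the "advisor" reports one issue until it is cleared.
executed: list[str] = []
open_issues = ["view public.v bypasses RLS"]
wf = SchemaWorkflow(
    run_ddl=executed.append,
    list_advisor_issues=lambda: list(open_issues),
    write_migration=lambda stmts: "\n".join(stmts),
)
wf.apply("create view public.v as select * from public.t;")
try:
    wf.finalize()
    blocked = False
except RuntimeError:
    blocked = True  # migration refused while the issue is open
wf.apply("alter view public.v set (security_invoker = true);")
open_issues.clear()  # pretend the advisor now reports nothing
migration = wf.finalize()
```

The design choice is that migrations are produced once per finished change, after review, rather than once per exploratory statement.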

This principle recognizes that product teams possess valuable knowledge about optimal usage patterns that may not be discoverable through documentation alone. Skills serve as the appropriate vehicle for transmitting this expertise to agents.

3.5 Empirical Validation Through Systematic Evaluation

The research evaluated six specific Supabase scenarios under three experimental conditions: baseline (no MCP or skills), MCP only, and MCP plus skills. The agents and models tested came from two vendors: Claude Code, Claude Opus 4.6, Claude Sonnet 4.6, GPT-4o, and GPT-4o mini. Outputs were graded for task completeness on the Braintrust evaluation platform.

Skills-plus-MCP configurations outperformed both other conditions on every tested model. The result held across the Claude and GPT model families, suggesting that the benefit of skills does not depend on the characteristics of a particular provider. Although drawn from only six scenarios, this consistency indicates that skills address general agent limitations rather than vendor-specific quirks.
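
The evaluation design is a full grid of models, conditions, and scenarios. The harness below is a plain-Python sketch of that grid; it does not use Braintrust's SDK, the `score` callable is a placeholder for running an agent and grading its output, and the numbers in the usage example are invented to exercise the code, not the results reported in the talk.

```python
from itertools import product
from statistics import mean
from typing import Callable

CONDITIONS = ("baseline", "mcp_only", "mcp_plus_skills")


def run_grid(
    models: list[str],
    scenarios: list[str],
    score: Callable[[str, str, str], float],
) -> dict[tuple[str, str], float]:
    """Mean completeness score for each (model, condition) across all scenarios.

    score(model, condition, scenario) is a hypothetical stand-in for running
    the agent and grading its output on a 0-1 scale.
    """
    return {
        (model, condition): mean(score(model, condition, s) for s in scenarios)
        for model, condition in product(models, CONDITIONS)
    }


def best_condition(results: dict[tuple[str, str], float], model: str) -> str:
    """Condition with the highest mean score for one model."""
    return max(CONDITIONS, key=lambda c: results[(model, c)])


# Toy scorer with made-up numbers, only to exercise the harness.
TOY = {"baseline": 0.25, "mcp_only": 0.5, "mcp_plus_skills": 0.75}
results = run_grid(["model-a", "model-b"], ["s1", "s2"], lambda m, c, s: TOY[c])
print(best_condition(results, "model-a"))  # mcp_plus_skills
```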

4. Technical Insights

4.1 Implementation Architecture

Skills consist of structured folders containing front matter metadata, a primary skill.md instruction file, and optional bundled resources and scripts. Agents load them progressively: typically only each skill's name and description are visible up front, the full skill.md is read when a task matches that description, and bundled resources are opened only when needed. This keeps irrelevant guidance out of the agent's context while leaving relevant guidance within reach.
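
Progressive loading can be sketched in two stages: index every skill by its front matter, then read a skill's full instructions only when a task calls for it. Everything here is hypothetical, including the folder names, the assumption that each folder holds a skill.md, and the word-overlap relevance test, which is a crude stand-in for the model judging relevance from the description.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class SkillEntry:
    name: str
    description: str
    path: Path  # location of skill.md; the body is read only on demand


def read_front_matter(path: Path) -> dict[str, str]:
    """Read only the '---' delimited header of skill.md, stopping before the body."""
    meta: dict[str, str] = {}
    with path.open(encoding="utf-8") as f:
        if f.readline().strip() != "---":
            return meta
        for line in f:
            if line.strip() == "---":
                break
            key, sep, value = line.partition(":")
            if sep:
                meta[key.strip()] = value.strip()
    return meta


def build_index(skills_root: Path) -> list[SkillEntry]:
    """Stage 1: collect name and description for every skill folder."""
    index = []
    for skill_md in sorted(skills_root.glob("*/skill.md")):
        meta = read_front_matter(skill_md)
        index.append(SkillEntry(meta["name"], meta["description"], skill_md))
    return index


def load_relevant(index: list[SkillEntry], task: str) -> dict[str, str]:
    """Stage 2: load full instructions only for skills whose description shares
    a word longer than three letters with the task (hypothetical heuristic)."""
    task_words = {w for w in task.lower().split() if len(w) > 3}
    return {
        e.name: e.path.read_text(encoding="utf-8")
        for e in index
        if task_words & set(e.description.lower().split())
    }


# Usage with two hypothetical skills.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for folder, desc in [("db", "Create and secure Supabase database views"),
                         ("edge", "Deploy Supabase edge functions")]:
        (root / folder).mkdir()
        (root / folder / "skill.md").write_text(
            f"---\nname: {folder}\ndescription: {desc}\n---\nInstructions for {folder}.\n",
            encoding="utf-8",
        )
    index = build_index(root)
    loaded = load_relevant(index, "add a secure view for the orders table")
```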

Because the Supabase documentation is exposed over SSH, agents navigate it with ordinary file system commands, relying on their existing competence with Linux tools rather than learning a new interface.

4.2 Distribution Challenges and Current Approaches

Skills distribution remains an unsolved problem in the ecosystem. No standardized registry or package manager for skills exists yet, so distribution approaches are fragmented. Vercel has introduced a skills package, while some implementations bundle skills with MCP servers as agent-specific plugins. Supabase packages skills directly in its open-source product repositories, alongside .claude and .cursor plugin configurations, so that skills packages can fetch them from those repositories given appropriate access credentials.

The lack of standardization creates practical barriers to adoption despite growing recognition of skills as an emerging open standard. Future work must address discovery mechanisms, versioning strategies, and dependency management to enable ecosystem-wide skills adoption.

4.3 Iterative Development Strategy

The research advocates for minimal initial skill development with iterative expansion based on observed agent behavior. Rather than attempting comprehensive skill creation upfront, developers should start with essential guidance and expand as patterns of agent failure emerge through evaluation. This approach acknowledges that predicting agent failure modes proves difficult without empirical testing, making iterative refinement more efficient than comprehensive upfront design.

Creating new skill versions as understanding expands enables A/B testing of guidance approaches and systematic improvement of agent performance over time. The availability of evaluation frameworks makes this iterative approach practical by providing quantifiable metrics of skill effectiveness.
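
Because each skill version is scored on the same scenarios, two versions can be compared pairwise. The function below is a hypothetical sketch of such an A/B comparison, reporting the mean per-scenario change and win/loss/tie counts; the scenario names and scores in the example are invented.

```python
from statistics import mean


def compare_versions(scores_a: dict[str, float], scores_b: dict[str, float]) -> dict[str, float]:
    """Paired comparison of two skill versions evaluated on the same scenarios.

    Returns the mean per-scenario improvement of B over A and the number of
    scenarios where B scored higher, lower, or the same.
    """
    if scores_a.keys() != scores_b.keys():
        raise ValueError("both versions must be scored on the same scenarios")
    diffs = [scores_b[s] - scores_a[s] for s in scores_a]
    return {
        "mean_delta": mean(diffs),
        "wins": sum(d > 0 for d in diffs),
        "losses": sum(d < 0 for d in diffs),
        "ties": sum(d == 0 for d in diffs),
    }


# Invented scores for two hypothetical versions of the same skill.
v1 = {"rls_view": 0.25, "migration": 1.0, "edge_function": 0.5}
v2 = {"rls_view": 1.0, "migration": 1.0, "edge_function": 0.25}
result = compare_versions(v1, v2)
```

Looking at wins and losses per scenario, not just the mean, shows when a new version fixes one failure mode while introducing another.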

5. Discussion

The findings establish that effective agent guidance requires fundamentally different approaches than human-oriented documentation. While comprehensive context provision might benefit human developers, agents require curated, opinionated guidance that explicitly directs behavior toward known-effective patterns. This distinction reflects underlying differences in how agents and humans process information: humans synthesize information from multiple sources and apply contextual judgment, while agents face computational constraints that incentivize shortcuts.

The consistent performance improvements observed across multiple model architectures suggest that current agent limitations reflect fundamental characteristics of tool-calling economics rather than transient implementation details. As models evolve, the relative cost of information retrieval versus training data reliance may shift, potentially altering optimal skill design patterns. However, the core insight—that guidance matters more than context—likely remains relevant across architectural generations.

Several areas warrant further investigation. First, the optimal balance between embedded guidance and documentation pointers remains unclear for different types of information. Second, the interaction between skills and semantic search augmentation using vector embeddings deserves systematic study. Third, the generalization of these principles beyond development tools to other agent application domains requires validation.

The distribution challenge represents a critical barrier to ecosystem-wide adoption. Without standardized discovery and versioning mechanisms, skills risk remaining siloed within individual products rather than becoming truly interoperable agent guidance standards. Industry coordination on registry infrastructure and skill packaging standards would significantly accelerate adoption.

6. Conclusion

This research establishes that effective AI agent skills require treating guidance as fundamentally distinct from context provision. The three core principles (using skills as documentation pointers, embedding critical information because agents skip optional retrieval, and providing opinionated workflow guidance) offer actionable frameworks for skill development. Evaluation across five agent and model configurations shows that skills-plus-MCP consistently outperforms both the baseline and MCP-only conditions, with particular efficacy in preventing security vulnerabilities.

Practical takeaways for practitioners include: pointing skills to single sources of truth rather than duplicating documentation, placing critical information directly in primary skill files rather than reference files, providing explicit workflow recommendations based on product expertise, and adopting iterative development approaches that expand skills based on observed agent behavior patterns. The research also shows that evals make it practical to test documentation effectiveness and agent behavior systematically.

Future work must address distribution standardization to enable ecosystem-wide skills adoption. As agent capabilities evolve, continued empirical validation of skill design principles across model generations will ensure guidance approaches remain effective. The fundamental insight—that agents require guidance over context—provides a foundation for developing increasingly capable autonomous systems that operate reliably with complex software products.

Sources

Combine Skills and MCP to Close the Context Gap — Pedro Rodrigues, Supabase - Original Creator (YouTube)
Analysis and summary by Sean Weldon using AI-assisted research tools

About the Author

Sean Weldon is an AI engineer and systems architect specializing in autonomous systems, agentic workflows, and applied machine learning. He builds production AI systems that automate complex business operations.
