SKILLS are king
Skills are reusable, on-demand workflows containing instructions and code that let AI agents compound their capabilities instead of starting from scratch each time. They sit as a distinct layer above traditional tools and MCP servers, handling complex multi-step tasks more efficiently.
By Sean Weldon
I Finally Get Why AI Agents Need "Skills" (And It's Not What I Expected)
So I watched this fascinating video about something called "Skills" for AI agents, and honestly, it completely changed how I think about building these systems. Let me break down what I learned, because this is actually pretty game-changing.
Wait, What Even Are Skills?
Here's the thing—when I first heard "skills," I thought it was just another buzzword for tools or functions. But it's actually something different entirely. Skills are basically reusable workflows that contain both instructions AND code that agents can call on whenever they need them.
The key insight that clicked for me: instead of your AI agent figuring out the same complex process over and over again, skills let it build up a library of "here's how to do this thing properly" that it can just reuse. It's like the difference between googling "how to make lasagna" every single time versus having your grandma's recipe card you can pull out whenever you need it.
The Three Layers (Or Actually Two Layers Plus One Weird Axis)
The video explained that agent capabilities work in three layers, which helped me understand where skills actually fit:
Tools are the basic building blocks—just code wrapped up so an agent can call it to do one specific thing. Think of it like a single function: "add these two numbers" or "fetch this data." Pretty straightforward.
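To make that concrete, here's a minimal sketch of a tool as a schema-wrapped function. The names and schema shape are illustrative (loosely modeled on common function-calling formats), not from any specific framework:

```python
# A "tool" is just a function plus a schema the agent sees upfront.
def add_numbers(a: float, b: float) -> float:
    """Add two numbers."""
    return a + b

# The schema is what the model reads in context on every request,
# whether or not the tool is ever called.
ADD_NUMBERS_SCHEMA = {
    "name": "add_numbers",
    "description": "Add two numbers and return the sum.",
    "parameters": {
        "type": "object",
        "properties": {
            "a": {"type": "number"},
            "b": {"type": "number"},
        },
        "required": ["a", "b"],
    },
}

# The agent emits a call like {"name": "add_numbers", "arguments": {...}}
# and the runtime dispatches it to the actual function:
def dispatch(call: dict) -> float:
    if call["name"] == "add_numbers":
        return add_numbers(**call["arguments"])
    raise ValueError(f"unknown tool: {call['name']}")
```

One function, one schema, one narrow action, which is exactly where the scaling problems discussed below come from.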
MCP (Model Context Protocol) is basically a standardized way for agents to connect to external systems like Google Drive, Slack, Notion, databases, whatever. The speaker pointed out that the only real difference from custom tools is WHERE they run—custom tools run on your machine, MCP tools run on external servers. MCP makes it super easy to hook agents up to external stuff.
Skills sit on what the speaker called "a separate axis entirely." This is where it got interesting for me. Skills aren't just another type of tool—they're complete workflows. They're the glue that chains code and tools together to actually accomplish something from start to finish. Sometimes a skill is mostly code, sometimes it's calling a bunch of tools, sometimes it's calling MCP tools, but it's always representing a full workflow rather than just one action.
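A hypothetical sketch of the difference: a skill packages plain code and tool calls into one end-to-end workflow with a single entry point. Everything here (the function names, the fake data) is made up for illustration:

```python
# Stand-in for a data-fetching tool the skill calls.
def fetch_rows(source: str) -> list[dict]:
    return [{"region": "EU", "sales": 120}, {"region": "US", "sales": 200}]

# Plain code step: aggregate the fetched data.
def summarize(rows: list[dict]) -> dict:
    return {"total_sales": sum(r["sales"] for r in rows), "regions": len(rows)}

def sales_report_skill(source: str) -> dict:
    """The full workflow, start to finish: fetch -> transform -> report.

    The agent invokes this one entry point instead of re-deriving
    the sequence of tool calls and transformations every time.
    """
    rows = fetch_rows(source)
    summary = summarize(rows)
    return {"source": source, **summary}
```

The individual steps could be custom tools, MCP tools, or pure code; the skill is the recipe that chains them.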
The Problems I Didn't Know Existed
The video highlighted two massive problems with traditional tool calling that I honestly hadn't thought about before:
Token bloat is REAL. Here's what blew my mind: agents have to see ALL available tools upfront, which means you're paying context costs for everything even when you only use one tool. The speaker shared an experiment where they had 2,000 tools and the first run ate up over 76,000 tokens. After converting it to a skill? About 8,000 tokens. That's a 10x reduction! And it was MORE reliable, not less.
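A back-of-envelope sketch of why this happens (this is my own illustration using a crude 4-characters-per-token heuristic, not the speaker's experiment):

```python
import json

def rough_tokens(text: str) -> int:
    return len(text) // 4  # crude heuristic, not a real tokenizer

# One narrow tool schema, of the kind the model must see upfront.
tool_schema = json.dumps({
    "name": "tool_0000",
    "description": "Does one narrow thing with two parameters.",
    "parameters": {"type": "object", "properties": {
        "x": {"type": "string"}, "y": {"type": "string"}}},
})

# 2,000 tools: every schema sits in context on every single request.
all_tools_cost = 2000 * rough_tokens(tool_schema)

# One skill: upfront, the agent only needs a name + description,
# then reads the skill's full instructions on demand.
skill_entry = "pdf_report: generate a formatted PDF report from tabular data"
skill_cost = rough_tokens(skill_entry)
```

Even with this rough estimate, the per-request cost of thousands of always-visible schemas dwarfs a discover-on-demand skill entry by orders of magnitude.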
Workflow complexity becomes a nightmare. When you need multi-step workflows with specific sequences, the model has to learn the entire process from scratch every single time. It never retains that knowledge. Most tools are too narrow—they only expose a few operations and lock your agent into rigid schemas. So you're stuck either building more tools or adding more schemas, and both just make the token bloat worse.
The speaker made a great point: skills become transformative when you're dealing with sequences of steps, edge cases, data transformations, iteration on output, or generating actual artifacts like documents, spreadsheets, or presentations. Basically, exactly where traditional tool calling falls apart.
Why This Actually Matters (The Real Benefits)
I found the concrete benefits pretty compelling:
Token efficiency: Sure, the first time your agent explores a workflow it might use more tokens, but every subsequent run becomes WAY cheaper because it's not generating and debugging from scratch each time.
Increased autonomy: Instead of being boxed into rigid tool schemas, agents can execute full workflows in code and adapt on the fly. The speaker gave a deck generation example that really illustrated this—when they removed hard-coded tools and just gave the agent a single code execution tool with open-ended code generation, the layouts immediately improved. The agent could suddenly leverage anything the underlying library supported: tables, quotes, images, different layouts. Pretty cool.
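The "single code execution tool" idea can be sketched like this. This is a deliberately naive illustration, not a real implementation: production systems must run agent-generated code inside a secure sandbox, never a bare `exec()`:

```python
# Hypothetical single open-ended code-execution tool, replacing many
# narrow hard-coded tools. WARNING: exec() is shown for illustration
# only; real deployments need an isolated sandbox.
def execute_code(source: str) -> dict:
    namespace: dict = {}
    exec(source, namespace)  # run the agent-generated code
    # Return the variables the code defined, minus Python internals.
    return {k: v for k, v in namespace.items() if not k.startswith("__")}

# The agent can now use anything the runtime supports, instead of
# being limited to whatever a rigid tool schema exposes.
result = execute_code("slide = {'title': 'Q3', 'layout': 'two-column'}")
```

The tradeoff is exactly the one discussed in the risks section below: more expressive power, more attack surface.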
Consistency: Your users get documents, spreadsheets, and presentations that follow the same structure and layout instead of getting random outputs every time. This alone seems huge for production use.
Reusability: Once your agent solves the hard part of a workflow, that solution becomes a building block you can reuse instead of reinventing the wheel.
Model distillation: This one was clever—you can use a big, expensive model (like Claude Opus) to CREATE the skill, then give that skill to a smaller, cheaper model without losing quality. Smart.
How Skills Actually Work
The speaker mentioned that ChatGPT already uses skills for documents, PDFs, and spreadsheets. The structure was originally established by Anthropic with MCP servers, and it follows a specific pattern:
- A `skill.md` file with the name, description (so the agent can discover it), and instructions on how to use the scripts in the folder
- Markdown files that can reference other markdown files with concrete examples the agent can read when it needs them
- Reviewable, versioned code stored in your repository so you always know exactly what code your agent can execute
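Based on the pattern described above, a skill folder might look something like this (the frontmatter fields and layout here are my hypothetical example of the convention, not copied from any official repo):

```markdown
skills/pdf-report/
├── skill.md
├── examples.md
└── scripts/
    └── build_report.py

<!-- skill.md -->
---
name: pdf-report
description: Generate a formatted PDF report from tabular data.
---

# PDF Report Skill

1. Validate the input table with `scripts/build_report.py --check`.
2. Run `scripts/build_report.py` to render the PDF.
3. See `examples.md` for concrete input/output examples.
```

The name and description are what the agent reads to discover the skill; the instructions and scripts are only loaded when it actually runs.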
One critical point the speaker emphasized: Skills only become powerful when they contain code. Without code, you're just changing how the agent thinks, not what it can actually do.
They also recommended a best practice: create a router tool that lists all your skills with names and descriptions, so the agent can pick the right skill before executing tasks.
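A router like that can be very small. This sketch is my own illustration of the idea, with made-up skill names:

```python
# Hypothetical skill registry: name -> one-line description.
SKILLS = {
    "pdf_report": "Generate a formatted PDF report from tabular data.",
    "deck_builder": "Build a slide deck with a consistent layout from an outline.",
}

def list_skills() -> list[dict]:
    """The only schema the agent sees upfront: name + description per skill.

    The agent calls this first, picks a skill, and only then loads that
    skill's full instructions and code.
    """
    return [{"name": name, "description": desc} for name, desc in SKILLS.items()]
```

This is what keeps the upfront context cost at one cheap tool, no matter how many skills you accumulate.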
The video mentioned that Anthropic's models have been specifically trained to use skills (which gives better results when using Claude for skill creation), and Anthropic has a GitHub repo with all the skills Claude has access to—including a special "skill creator skill" for building other skills. Meta, right?
The Scary Parts (Because There Are Always Scary Parts)
I appreciated that the speaker didn't gloss over the risks. Skills introduce some specific concerns:
Debugging gets harder: With traditional tool calls, you can see exactly what went wrong. With skills, your traces show massive chunks of code and agent logs, making it harder to spot mistakes without reading through all the code.
Code execution is inherently risky: When your agent is writing and executing open-ended code, you need secure sandboxes and you need to avoid sensitive or destructive operations unless you implement human-in-the-loop approval. The speaker compared it to MCP—giving an agent an MCP tool that can delete CRM records could wipe your entire database. Code execution carries the same risk but potentially faster and more destructive.
The simplest safety rule the speaker suggested: agents can USE skills but cannot MODIFY the skills directory themselves. Lock the folder down with permissions inside the sandbox. Skills should be reviewable and versioned in your repository.
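One way to enforce the use-but-not-modify rule is plain filesystem permissions inside the sandbox. A minimal sketch, assuming a POSIX system and an agent process running as an unprivileged user (paths and filenames are illustrative):

```python
import os
import stat
import tempfile

# Set up a demo skills directory (in practice this comes from your repo).
skills_dir = os.path.join(tempfile.mkdtemp(), "skills")
os.makedirs(skills_dir)
skill_path = os.path.join(skills_dir, "skill.md")
with open(skill_path, "w") as f:
    f.write("demo skill\n")

# Strip all write bits; keep read on the file, and read + execute
# (traversal) on the directory, so the agent can load but not modify.
os.chmod(skill_path, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
os.chmod(skills_dir, stat.S_IRUSR | stat.S_IXUSR | stat.S_IRGRP | stat.S_IXGRP)
```

Changes to the skills then only happen through your normal review-and-version workflow, never through the agent itself.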
Bottom line: if you're deploying production agents with skills, you need secure sandboxes out of the box.
How to Actually Create Good Skills
This part was really practical. The speaker warned against just "prompting AI to create prompts for other AI without additional context"—they called it creating a "slop machine," which made me laugh but is totally accurate.
Effective skill creation requires providing unique documentation, SOPs, and specific process details. Skills represent a way to take something an agent has figured out through open-ended code execution and transform it into a reusable building block.
The workflow they recommended: let your agents explore solutions through open-ended code execution, identify the successful patterns, then codify those patterns as skills for consistent reuse. This approach enables agents to compound their capabilities rather than starting from scratch each time.
My Takeaway
I found this whole concept pretty eye-opening. I've been building agents with traditional tool calling, and I've definitely hit the problems the speaker described—token bloat, inconsistent outputs, agents that seem to forget how to do things they've done before.
Skills seem like a genuinely different approach that solves real problems. It's not just a rebranding of existing concepts—it's a distinct architectural layer that lets agents build up capabilities over time instead of being perpetually stuck in Groundhog Day.
The tradeoffs are real (debugging complexity, security concerns), but for complex workflows and production systems, this seems like the direction things are heading. I'm definitely going to explore implementing this in my own projects.
Have you worked with skills in your agent systems? I'd love to hear about your experience in the comments!