Dynamic Workflows: How Claude Writes Its Own Harness

By Sean Weldon

Most coding work fits neatly inside a single Claude Code session: plan and execute in one context window, ship the result. But long-running, parallel, or adversarial work breaks that model. Dynamic Workflows are the answer: Claude writes a custom harness, in JavaScript, that spawns and coordinates subagents for exactly the task in front of it. This is a working field guide covering the mental model, the failure modes workflows solve, the core API, and the patterns worth internalizing.

Part 1: The Mental Model

1. A workflow is a harness Claude writes

The default Claude Code harness has Claude plan and execute in the same context window. For most coding work, this is great. For long-running, parallel, or adversarial work, it breaks down.

A Dynamic Workflow is Claude writing its own custom harness for the task - a JavaScript file with a few special functions that spawn and coordinate subagents, plus standard JavaScript (Math, JSON, Array) to process the data flowing between them.

Three things this gives you that the default harness cannot:

Start one either by asking Claude directly ("make a workflow that...") or by using the trigger word ultracode. If a workflow is interrupted (by a user action or by quitting the terminal), resuming the session picks up where it left off.

2. The 3 failure modes workflows solve

To know when a workflow is the right tool, you have to know what it fixes. The longer Claude works on a complex task in a single context window, the more it becomes susceptible to three specific failure modes:

A workflow solves all three structurally: separate Claudes with their own contexts, focused goals, and isolated state. If your task suffers from any of these patterns, that is the signal to reach for a workflow.

3. Static vs Dynamic workflows

You may have already built static workflows using the Claude Agent SDK or claude -p, coordinating multiple Claude Code instances together.

For a decision like whether to migrate your billing to a new provider, the reason the dynamic version wins isn't the search step; both can search. It's that the workflow gets to shape itself around your context: read your billing code, check each feature against the actual new provider docs, price at your transaction volume, and run an adversarial "why not to migrate" pass against its own emerging answer. A static harness can't do this because it doesn't know your code exists.

Part 2: The Core API

4. agent(), parallel(), pipeline()

Three functions do most of the work in a workflow. Knowing them is enough to read any workflow Claude writes for you and to nudge Claude when you want a specific shape.

parallel() is a barrier: it fans out, then waits for everything before returning. pipeline() is streaming: each item flows through every stage independently.

Pick by asking one question: do I need all results before I can do anything next? If yes, use parallel(). If no, use pipeline(), which is cheaper and faster overall.
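To make the barrier-versus-streaming distinction concrete, here is a minimal local sketch. It does not use the real workflow runtime: the parallel() and pipeline() below are hypothetical re-implementations written only to show the two scheduling behaviors, the (items, stages) signature for pipeline() is an assumption, and makeAgent() is a stub that waits a few milliseconds instead of spawning a subagent.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical stub for a subagent call: waits `ms`, records "stage:item"
// in the shared log when it finishes, and passes the item through.
function makeAgent(log) {
  return async (stage, item, ms) => {
    await sleep(ms);
    log.push(`${stage}:${item}`);
    return item;
  };
}

// Barrier (hypothetical re-implementation): start every task, return only
// once all of them have finished.
function parallel(tasks) {
  return Promise.all(tasks.map((task) => task()));
}

// Streaming (hypothetical re-implementation): every item runs through all
// stages on its own, without waiting for the other items.
function pipeline(items, stages) {
  return Promise.all(
    items.map(async (item) => {
      let value = item;
      for (const stage of stages) value = await stage(value);
      return value;
    })
  );
}

// "slow" needs 50 ms in the review stage, "fast" needs 5 ms.
const reviewCost = { slow: 50, fast: 5 };

// Two barriers: no fix can start until every review is done.
async function withBarriers() {
  const log = [];
  const agent = makeAgent(log);
  const reviewed = await parallel(
    ["slow", "fast"].map((item) => () => agent("review", item, reviewCost[item]))
  );
  await parallel(reviewed.map((item) => () => agent("fix", item, 5)));
  return log;
}

// One pipeline: "fast" gets fixed while "slow" is still being reviewed.
async function withPipeline() {
  const log = [];
  const agent = makeAgent(log);
  await pipeline(["slow", "fast"], [
    (item) => agent("review", item, reviewCost[item]),
    (item) => agent("fix", item, 5),
  ]);
  return log;
}

withPipeline().then((log) => console.log(log.join(" -> ")));
// prints: review:fast -> fix:fast -> review:slow -> fix:slow
```

In the barrier version the fast item sits idle for roughly 45 ms waiting on the slow review; with real subagents, that idle time is the overhead pipeline() avoids when nothing downstream needs every result at once.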

Part 3: The Patterns

5. Classify-and-act: route the work before doing it

A classifier agent decides on the type of task, then the workflow routes to different agents or behaviors based on the answer. Or a classifier runs at the end, sorting raw outputs into buckets for whatever comes next.

When this pattern earns its keep:

Example: "Explain how the auth module works." A classifier subagent reads the codebase first, estimates complexity, then routes the actual explanation task to Sonnet for a 10-file module or Opus for a 100-file one. The right model for the job, decided after the work is understood.
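The routing step is ordinary code that branches on the classifier's structured answer. The sketch below assumes an agent(prompt, options) function shaped like the agent() calls in section 6; it is passed in as a parameter so a stub can stand in for the real runtime. The Complexity schema object and the 30-file cut-off are hypothetical: the text only contrasts a 10-file module with a 100-file one.

```javascript
// Hypothetical JSON-Schema-style shape for the classifier's answer.
const Complexity = {
  type: "object",
  properties: { fileCount: { type: "number" } },
};

// Hypothetical cut-off between "small" and "large" modules.
const OPUS_THRESHOLD = 30;

function pickModel({ fileCount }) {
  return fileCount >= OPUS_THRESHOLD ? "opus" : "sonnet";
}

async function explainModule(agent, moduleName) {
  // 1. Classify: a subagent reads the code and returns structured data.
  const classification = await agent(
    `Read the ${moduleName} module and count the files it spans`,
    { schema: Complexity }
  );
  // 2. Route: plain code picks the model from the classifier's answer.
  const model = pickModel(classification);
  // 3. Act: the real task goes to the chosen model.
  return agent(`Explain how the ${moduleName} module works`, { model });
}

// Stub agent for local runs: answers the classifier with a fixed file count
// and echoes every other prompt prefixed with the model it was sent to.
function stubAgent(fileCount) {
  return async (prompt, options = {}) =>
    options.schema === Complexity ? { fileCount } : `${options.model}: ${prompt}`;
}

explainModule(stubAgent(10), "auth").then(console.log);
// prints: sonnet: Explain how the auth module works
```

Keeping the routing decision in code rather than in a prompt means it is deterministic and easy to adjust: changing the threshold never touches the classifier.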

6. Fan-out-and-synthesize: many small steps, one merged result

Split a task into many smaller steps. Run an agent on each step in parallel. Synthesize the results into one answer.

The synthesize step is a barrier: it waits for every fan-out agent, then merges their structured outputs. Why this pattern dominates in practice: it solves the "too many things at once" failure of single-context work. Each subagent sees only its piece. The orchestrator never gets distracted by 50 unrelated details.

Use this when:

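// Assumes `files` (an array of file paths) and `IssueList` (the schema for
// structured review output) are defined earlier in the workflow.
// Fan out: one agent per file. Barrier: wait for all.
// Each array entry is a function that starts one review agent.
const reviews = await parallel(
  files.map(file => () => agent(
    `Review ${file} for security issues`,
    { model: "haiku", schema: IssueList }
  ))
)

// Synthesize: one Opus agent merges everything.
const report = await agent(
  `Merge these reviews into one prioritized report:\n${JSON.stringify(reviews)}`,
  { model: "opus" }
)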

7. Adversarial verification

This is the structural fix for self-preferential bias. For each spawned agent, run a separate spawned agent that adversarially verifies its output against a rubric. The verifier has never seen the original work; it can't favor it.

The pattern matters most for:

The pairing rule: the verifier should know only the rubric and the artifact, not who produced it. Otherwise self-preference creeps back in through hints in the prompt.
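One way to enforce the pairing rule is to build the verifier's prompt from an explicit allowlist of fields, so author and task details can't leak in by accident. This is a sketch under assumptions: agent(prompt, options) is passed in so a stub can replace the real runtime, Promise.all stands in for parallel(), and the Verdict schema and prompt wording are hypothetical.

```javascript
// Hypothetical JSON-Schema-style shape for the verifier's answer.
const Verdict = {
  type: "object",
  properties: { pass: { type: "boolean" }, failures: { type: "array" } },
};

// Builds the verifier prompt from the rubric and the artifact text ONLY.
// The author model and the original task prompt are never passed in.
function buildVerifierPrompt(rubric, artifactText) {
  return [
    "You did not write the artifact below. Try to find where it fails.",
    "Rubric:",
    ...rubric.map((item, i) => `${i + 1}. ${item}`),
    "Artifact:",
    artifactText,
  ].join("\n");
}

async function generateAndVerify(agent, tasks, rubric) {
  return Promise.all(
    tasks.map(async (task) => {
      const content = await agent(task.prompt, { model: task.model });
      // A separate agent with a fresh context judges the output.
      const verdict = await agent(buildVerifierPrompt(rubric, content), {
        schema: Verdict,
      });
      return { author: task.model, content, verdict };
    })
  );
}
```

Authorship is still recorded in the returned object for the orchestrator's bookkeeping; it simply never reaches the verifier's context.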

8. Generate-and-filter

Generate a number of ideas on a topic, then filter them by a rubric or by verification. Remove duplicates. Return only the highest-quality, tested ideas.

Where this pattern shines:

This is the opposite of asking Claude for "the best answer." Asking for the best answer makes Claude commit early; generate-and-filter makes Claude commit late, after every option has been challenged.
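A minimal sketch of the three steps (generate, dedupe, filter), again with agent(prompt, options) passed in so it can be stubbed and Promise.all in place of parallel(). The IdeaList and Score schemas, the passed/score fields, and the normalization rule used for deduplication are illustrative assumptions; real duplicates are often paraphrases, which simple string normalization won't catch.

```javascript
// Hypothetical JSON-Schema-style shapes for generator and scorer output.
const IdeaList = { type: "array", items: { type: "string" } };
const Score = {
  type: "object",
  properties: { score: { type: "number" }, passed: { type: "boolean" } },
};

// Cheap dedupe key: lowercase, strip punctuation, collapse whitespace.
function normalize(idea) {
  return idea.toLowerCase().replace(/[^a-z0-9 ]/g, "").replace(/\s+/g, " ").trim();
}

async function generateAndFilter(agent, topic, { generators = 5, keep = 3 } = {}) {
  // Generate: several independent agents brainstorm in parallel.
  const batches = await Promise.all(
    Array.from({ length: generators }, (_, i) =>
      agent(`Batch ${i + 1}: propose ideas for ${topic}`, { schema: IdeaList })
    )
  );

  // Dedupe: keep the first wording of each normalized idea.
  const unique = new Map();
  for (const idea of batches.flat()) {
    const key = normalize(idea);
    if (!unique.has(key)) unique.set(key, idea);
  }

  // Filter: a separate agent scores every surviving idea against the rubric.
  const scored = await Promise.all(
    [...unique.values()].map(async (idea) => ({
      idea,
      ...(await agent(`Score against the rubric: ${idea}`, { schema: Score })),
    }))
  );

  // Keep only ideas that passed, best first.
  return scored
    .filter((s) => s.passed)
    .sort((a, b) => b.score - a.score)
    .slice(0, keep)
    .map((s) => s.idea);
}
```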

9. Tournament: pairwise comparison beats absolute scoring

Instead of dividing the work, have agents compete on it. Spawn N agents that each attempt the same task using different approaches, then judge the results pairwise until one wins.

Comparative judgment is more reliable than absolute scoring, especially for taste-based work. Why this beats sort-by-score: trying to sort 1,000 items in one prompt fails on two fronts, because quality degrades and the items won't fit in context. A tournament splits the bracket across fresh agents, each comparing just two items.

The bracket itself lives in deterministic loop code, not in context. Each comparison is fast, fair, and isolated. The same idea works for taste-based ranking: design choices, candidate selection, content prioritization.
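The bracket logic is a plain loop, sketched below as a single-elimination tournament (one common bracket shape; the text doesn't prescribe one). judge(a, b) wraps a fresh agent call that sees only two candidates. The Choice schema and the prompt wording are hypothetical, agent(prompt, options) is passed in so it can be stubbed, and Promise.all stands in for parallel() within each round.

```javascript
// Hypothetical JSON-Schema-style shape: the judge must answer "A" or "B".
const Choice = {
  type: "object",
  properties: { choice: { enum: ["A", "B"] } },
};

// Wraps agent() as a pairwise judge. Each call is a fresh agent that
// sees exactly two candidates and nothing else from the bracket.
function makeJudge(agent, criterion) {
  return async (a, b) => {
    const { choice } = await agent(
      `Which better satisfies "${criterion}"? Answer A or B.\nA: ${a}\nB: ${b}`,
      { schema: Choice }
    );
    return choice === "A" ? a : b;
  };
}

// Single-elimination bracket kept in ordinary loop code, not in context.
// Matches within a round run in parallel; an odd survivor gets a bye.
async function tournament(candidates, judge) {
  let round = [...candidates];
  while (round.length > 1) {
    const matches = [];
    for (let i = 0; i + 1 < round.length; i += 2) {
      matches.push(judge(round[i], round[i + 1]));
    }
    const winners = await Promise.all(matches);
    if (round.length % 2 === 1) winners.push(round[round.length - 1]);
    round = winners;
  }
  return round[0];
}
```

Every match eliminates exactly one candidate, so N candidates cost N - 1 comparisons in total, and each judge's context holds two items no matter how large N gets.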

10. Loop until done

For tasks with an unknown amount of work, loop spawning agents until a stop condition is met - no new findings, no more errors in the logs, theory verified - instead of running a fixed number of passes.

This pattern is the answer to "keep going until it's actually done":

Pair this pattern with /goal to set a hard completion requirement ("don't stop until one theory works") and with /loop if you want the entire workflow itself to run on a recurring schedule. The loop and the stop condition live in code; only the active iteration stays in context.
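Here the stop condition is "a full pass found nothing new." In the sketch, agent(prompt, options) is passed in so it can be stubbed, the FindingList schema is hypothetical, and maxPasses is an added safety cap (an assumption, not something the text specifies) so a stop condition that never fires can't burn tokens indefinitely. Only a compact list of known findings is carried between passes; each pass is a fresh agent.

```javascript
// Hypothetical JSON-Schema-style shape: each pass returns a list of strings.
const FindingList = { type: "array", items: { type: "string" } };

async function loopUntilDone(agent, task, { maxPasses = 10 } = {}) {
  const findings = new Set();
  for (let pass = 1; pass <= maxPasses; pass++) {
    // Fresh agent each pass; only the de-duplicated findings carry over,
    // not the previous pass's transcript.
    const result = await agent(
      `${task}\nAlready known, do not repeat: ${JSON.stringify([...findings])}`,
      { schema: FindingList }
    );
    const fresh = result.filter((f) => !findings.has(f));
    // Stop condition: a full pass surfaced nothing new.
    if (fresh.length === 0) {
      return { findings: [...findings], passes: pass, done: true };
    }
    fresh.forEach((f) => findings.add(f));
  }
  // Safety cap reached before the stop condition fired.
  return { findings: [...findings], passes: maxPasses, done: false };
}
```

Returning done: false instead of throwing lets the caller decide whether to report partial findings or re-run with a larger budget.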

11. Compose patterns for real use cases

The patterns rarely appear alone. A real workflow composes 2 to 4 of them. The matrix below pairs each use case with the patterns it tends to use:

The right way to internalize these: identify which failure mode your current task is suffering from, then pick the pattern that structurally prevents it. For drift, use fan-out. For self-preference, use adversarial verification. For open-ended work, use loop until done. For hard-to-score output, use a tournament.
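As an illustration of composition, the sketch below chains three of the patterns: fan-out reviews, an adversarial check on each review, then a synthesis over only the findings that survived. As in the other sketches, agent(prompt, options) is passed in for stubbing, Promise.all stands in for parallel(), and the IssueList/Verdict schema shapes and the upheld field are hypothetical.

```javascript
// Hypothetical JSON-Schema-style shapes for reviewer and verifier output.
const IssueList = { type: "array", items: { type: "string" } };
const Verdict = { type: "object", properties: { upheld: { type: "boolean" } } };

async function verifiedSecurityReview(agent, files, rubric) {
  // Fan-out: one reviewer per file, run concurrently.
  const reviews = await Promise.all(
    files.map((file) =>
      agent(`Review ${file} for security issues`, { schema: IssueList })
    )
  );

  // Adversarial verification: a fresh agent tries to refute each review.
  // It sees the rubric and the findings, not which reviewer produced them.
  const verdicts = await Promise.all(
    reviews.map((issues) =>
      agent(
        `Rubric: ${rubric}\nTry to refute these findings:\n${JSON.stringify(issues)}`,
        { schema: Verdict }
      )
    )
  );
  const confirmed = reviews.filter((_, i) => verdicts[i].upheld);

  // Synthesize: one agent merges only the findings that survived.
  return agent(
    `Merge these reviews into one prioritized report:\n${JSON.stringify(confirmed)}`,
    { model: "opus" }
  );
}
```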

Part 4: Cost, Safety, and Reuse

12. Pair with /goal, /loop, and token budgets

Workflows can be expensive. Three controls turn them from "cool but costly" into "a tool I run unattended."

> ultracode quick adversarial review of this assumption:
  "moving to Postgres eliminates our shard rebalancing."
  Use 5k tokens. /goal don't stop until you have either
  a counterexample or three independent confirmations.

Quoting the Claude Code team directly: "Best practices are still developing. Dynamic workflows often use more tokens, so think carefully about when and how to use them." Most traditional coding tasks do not need a panel of 5 reviewers. Ask yourself: does this task really need more compute? If a regular Claude Code session would finish it in five minutes, you don't need a workflow.

13. Use the quarantine pattern for untrusted input

Any workflow that reads untrusted public content - support tickets, bug reports, user feedback, scraped data - needs to assume that content might contain prompt injection.

The fix: quarantine. Bar the agents that read the untrusted content from taking any high-privilege actions. Separate agents, with no exposure to the raw content, do the acting.

This applies to any workflow that processes user-submitted content (support tickets, bug reports, customer feedback, social media), scrapes public web pages, or runs against output from a third-party API. If the input wasn't written by you or a trusted teammate, quarantine it. A 30-line read-only reader agent costs almost nothing and removes an entire class of prompt injection risk.
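A minimal sketch of the split, under explicit assumptions: the tools option (an empty list for the reader, an issue-tracker tool for the actor) is a hypothetical way of expressing privileges, not a documented parameter, and the category list, severity range, and TicketSummary schema are illustrative. The idea it demonstrates is that the only thing crossing from reader to actor is a value checked against an allowlist in plain code, so injected instructions have no free-text channel to travel through.

```javascript
// Hypothetical allowlist: the only categories the reader may report.
const CATEGORIES = ["bug", "billing", "feature_request", "other"];

// Hypothetical JSON-Schema-style shape for the reader's output.
const TicketSummary = {
  type: "object",
  properties: {
    category: { enum: CATEGORIES },
    severity: { type: "integer", minimum: 1, maximum: 4 },
  },
};

// Quarantined reader: sees the raw, untrusted text but gets no tools
// (hypothetical `tools` option). Its output is re-validated in code
// rather than trusted as-is.
async function readTicket(agent, rawTicket) {
  const out = await agent(`Classify this support ticket:\n${rawTicket}`, {
    schema: TicketSummary,
    tools: [],
  });
  const category = CATEGORIES.includes(out.category) ? out.category : "other";
  const severity = Number.isInteger(out.severity)
    ? Math.min(Math.max(out.severity, 1), 4)
    : 1;
  return { category, severity };
}

// Privileged actor: never sees the raw ticket, only the validated summary.
async function fileTicket(agent, ticketId, summary) {
  return agent(
    `File ticket ${ticketId} as ${summary.category}, severity ${summary.severity}`,
    { tools: ["issue_tracker"] }
  );
}
```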

14. Save workflows, ship them as Skills

Once a workflow works, save it: press s in the workflow menu. Saved workflows go to ~/.claude/workflows. From there you have two paths:

One practical nuance worth knowing: when you package a workflow into a Skill, prompt Claude to treat the workflow as a template, not a script to run verbatim. That leaves room for Claude to adapt the workflow shape to the specific task at hand while keeping the overall structure intact. This is especially useful for workflows like "deep verification" or "triage" that need to flex per use case.

The mistakes that waste tokens on workflows