Dynamic Workflows: How Claude Writes Its Own Harness

2026-06-05 By Sean Weldon

Most coding work fits neatly inside a single Claude Code session: plan and execute in one context window, ship the result. But long-running, parallel, or adversarial work breaks that model. Dynamic Workflows are the answer - Claude writing a custom harness, in JavaScript, that spawns and coordinates subagents for exactly the task in front of it. This is a working field guide: the mental model, the failure modes workflows solve, the core API, and the patterns worth internalizing.

Part 1: The Mental Model

1. A workflow is a harness Claude writes

The default Claude Code harness has Claude plan and execute in the same context window. For most coding work, this is great. For long-running, parallel, or adversarial work, it breaks down.

A Dynamic Workflow is Claude writing its own custom harness for the task - a JavaScript file with a few special functions that spawn and coordinate subagents, plus standard JavaScript (Math, JSON, Array) to process the data flowing between them.

Three things this gives you that the default harness cannot:

Per-agent context. Each subagent gets its own context window with one focused goal. No cross-contamination.
Per-agent model choice. The workflow picks which model each subagent uses - Opus for hard reasoning, Haiku for cheap exploration, Sonnet for the middle.
Per-agent isolation level. Worktree (isolated git checkout) or remote (no checkout). The workflow decides what each agent needs.

Start one by either asking Claude directly ("make a workflow that...") or with the trigger word ultracode. If a workflow is interrupted - user action, terminal quit - resuming the session picks up where it left off.

2. The 3 failure modes workflows solve

To know when a workflow is the right tool, you have to know what it fixes. The longer Claude works on a complex task in a single context window, the more it becomes susceptible to three specific failure modes:

Agentic laziness - Claude stops before finishing a complex, multi-part task and declares it done after partial progress. For example, it addresses 20 of the 50 items in a security review and calls the rest "handled."
Self-preferential bias - Claude prefers its own results when asked to verify or judge them against a rubric. A verifier with skin in the game can't be a fair verifier.
Goal drift - the gradual loss of fidelity to the original objective across many turns, especially after compaction. Each summarization step is lossy. "Don't do X" constraints quietly disappear at turn 47.

A workflow solves all three structurally: separate Claudes with their own contexts, focused goals, and isolated state. If your task suffers from any of these patterns, that is the signal to reach for a workflow.

3. Static vs Dynamic workflows

You may have already built static workflows using the Claude Agent SDK or claude -p, coordinating multiple Claude Code instances together.

Static workflows are generic: written once to handle every edge case. They work, but they have to be conservative.
Dynamic Workflows are different: Claude writes this workflow for this task. The harness is tailor-made.

Take a question like whether to migrate to a new billing provider. The reason the dynamic version wins there isn't the search step - both can search. It's that the workflow gets to shape itself around your context: read your billing code, check each feature against the actual new provider docs, price at your transaction volume, and run an adversarial "why not to migrate" pass against its own emerging answer. A static harness can't do this because it doesn't know your code exists.

Part 2: The Core API

4. agent(), parallel(), pipeline()

Three functions do most of the work in a workflow. Knowing them is enough to read any workflow Claude writes for you and to nudge Claude when you want a specific shape.

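agent() spawns a single subagent from a prompt plus options (the example in section 6 passes model and schema). parallel() is a barrier: it fans out, then waits for everything before returning. pipeline() is streaming: each item flows through every stage independently.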

Pick by the question: do I need all results before I can do anything next? Yes, use parallel. No, use pipeline (cheaper, faster overall).
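
To make the barrier-versus-streaming distinction concrete, here is a minimal sketch in plain JavaScript. The names parallelSketch, pipelineSketch, and fakeStage are hypothetical stand-ins written for this illustration; they are not the workflow runtime's implementations, and the real pipeline() signature may well differ. fakeStage simulates an agent call with a timer so the ordering is visible.

```javascript
// Hypothetical stand-ins that only illustrate semantics; not the real runtime API.

// Simulated agent call: resolves after `ms` milliseconds and logs when it finished.
const log = [];
const fakeStage = (name, ms) => async (item) => {
  await new Promise((resolve) => setTimeout(resolve, ms));
  log.push(`${name}:${item}`);
  return item;
};

// Barrier: start every task, return only once all of them have finished.
async function parallelSketch(tasks) {
  return Promise.all(tasks.map((task) => task()));
}

// Streaming: each item moves through all stages on its own,
// without waiting for the other items to finish a stage.
async function pipelineSketch(items, stages) {
  return Promise.all(
    items.map(async (item) => {
      let value = item;
      for (const stage of stages) value = await stage(value);
      return value;
    })
  );
}

async function demo() {
  const items = ["a", "b"];
  // Reading "b" is deliberately slow so the two shapes diverge.
  const read = (item) => fakeStage("read", item === "a" ? 10 : 60)(item);
  const fix = fakeStage("fix", 10);

  // Barrier version: no "fix" starts until both "read" calls are done.
  log.length = 0;
  const readResults = await parallelSketch(items.map((item) => () => read(item)));
  await parallelSketch(readResults.map((item) => () => fix(item)));
  const barrierOrder = [...log];

  // Streaming version: "a" is fixed while "b" is still being read.
  log.length = 0;
  await pipelineSketch(items, [read, fix]);
  const streamingOrder = [...log];

  return { barrierOrder, streamingOrder };
}
```

In this sketch the barrier run finishes both reads before any fix begins, while the streaming run finishes fixing "a" before "b" has even been read. That gap is where the "cheaper, faster overall" claim comes from when stages take uneven time.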

Part 3: The Patterns

5. Classify-and-act: route the work before doing it

A classifier agent decides on the type of task, then the workflow routes to different agents or behaviors based on the answer. Alternatively, a classifier can run at the end, sorting raw outputs into buckets for whatever comes next.

When this pattern earns its keep:

The task is heterogeneous - different sub-types need different treatment.
You want to spend the expensive model only where complexity demands it (classifier on cheap, then route to Opus only when needed).
The decomposition of work is itself non-trivial and benefits from a model deciding the shape.

Example: "Explain how the auth module works." A classifier subagent reads the codebase first, estimates complexity, then routes the actual explanation task to Sonnet for a 10-file module or Opus for a 100-file one. The right model for the job, decided after the work is understood.
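
The routing in that example might be sketched as follows. This is an illustration under stated assumptions, not the code Claude would necessarily write: agent() here is a local stub so the snippet runs on its own, classifyComplexity is a hypothetical stand-in for the cheap classifier subagent, and the 50-file threshold is an arbitrary cut-off chosen for the sketch.

```javascript
// Stand-in for the workflow's agent() call so this sketch runs on its own.
// Assumption: the real agent() takes a prompt plus options such as { model }.
async function agent(prompt, options) {
  return { model: options.model, prompt };
}

// Hypothetical classifier. In a real workflow this would be a cheap agent
// that reads the codebase; here it just buckets by file count.
// The threshold of 50 files is an arbitrary assumption.
async function classifyComplexity(fileCount) {
  return fileCount > 50 ? "complex" : "simple";
}

// Classify first, then spend the expensive model only where it is needed.
async function explainModule(moduleName, fileCount) {
  const complexity = await classifyComplexity(fileCount);
  const model = complexity === "complex" ? "opus" : "sonnet";
  return agent(`Explain how the ${moduleName} module works`, { model });
}
```

The design point is that the model choice is made by ordinary code after the classifier has reported back, so the routing rule stays visible and easy to adjust.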

6. Fan-out-and-synthesize: many small steps, one merged result

Split a task into many smaller steps. Run an agent on each step in parallel. Synthesize the results into one answer.

The synthesize step is a barrier: it waits for every fan-out agent, then merges their structured outputs. Why this pattern dominates in practice: it solves the "too many things at once" failure of single-context work. Each subagent sees only its piece. The orchestrator never gets distracted by 50 unrelated details.

Use this when:

You have a clearly enumerable list of work items (50 files, 200 endpoints, 100 reviews).
Each item is independent - no item needs another's output to begin.
You want a single consolidated answer at the end, not a pile of partial reports.

// Fan out: one agent per file. Barrier: wait for all.
const reviews = await parallel(
  files.map(file => () => agent(
    `Review ${file} for security issues`,
    { model: "haiku", schema: IssueList }
  ))
)

// Synthesize: one Opus agent merges everything.
const report = await agent(
  `Merge these reviews into one prioritized report:\n${JSON.stringify(reviews)}`,
  { model: "opus" }
)

7. Adversarial verification

This is the structural fix for self-preferential bias. For each spawned agent, run a separate spawned agent that adversarially verifies its output against a rubric. The verifier has never seen the original work; it can't favor it.

The pattern matters most for:

Claim-checking - every factual statement in a report gets its own verifier subagent, checking against the original source.
Code review - the author agent writes the fix, the reviewer agent (separate context) reviews it. Never the same Claude judging itself.
Quality gates - before any artifact ships, an adversary tries to find the weakest case against it. If the adversary can't, you ship.

The pairing rule: the verifier should know only the rubric and the artifact, not who produced it. Otherwise self-preference creeps back in through hints in the prompt.

8. Generate-and-filter

Generate a number of ideas on a topic, then filter them by a rubric or by verification. Remove duplicates. Return only the highest-quality, tested ideas.

Where this pattern shines:

Brainstorming - 30 product names, then a verifier kills cliches, trademark conflicts, and weak phonetics. You see 3.
Hypothesis generation and solution design - 5 different approaches to a problem, then each gets scored against your constraints. The winner has earned it.

The opposite of asking Claude for "the best answer." Asking for the best answer makes Claude commit early. Generate-and-filter makes Claude commit late, after every option has been challenged.
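
The generate, dedupe, filter sequence can be sketched in a few lines. Here generate and score stand in for agent calls and are supplied as hypothetical stubs; the option names (count, keep, minScore) and the toy rubric that prefers short names are assumptions made for this example only.

```javascript
// Normalize so "Nimbus" and "nimbus " count as the same idea.
const normalize = (idea) => idea.trim().toLowerCase();

// Drop duplicates, keeping the first occurrence of each idea.
function dedupe(ideas) {
  const seen = new Set();
  return ideas.filter((idea) => {
    const key = normalize(idea);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

// `generate` and `score` stand in for agent calls (stubs below).
async function generateAndFilter(generate, score, { count, keep, minScore }) {
  const ideas = dedupe(await generate(count));
  // Score every surviving idea independently before committing to any.
  const scored = await Promise.all(
    ideas.map(async (idea) => ({ idea, score: await score(idea) }))
  );
  return scored
    .filter((entry) => entry.score >= minScore)
    .sort((a, b) => b.score - a.score)
    .slice(0, keep)
    .map((entry) => entry.idea);
}

// Stubs: a fixed candidate list and a toy rubric that prefers short names.
const stubGenerate = async () => ["Nimbus", "nimbus ", "CloudSyncPro", "Arc", "Vela"];
const stubScore = async (idea) => 10 - idea.trim().length;
```

With these stubs, generateAndFilter(stubGenerate, stubScore, { count: 5, keep: 3, minScore: 0 }) resolves to ["Arc", "Vela", "Nimbus"]: the duplicate is dropped, the long name fails the rubric, and the rest are ranked.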

9. Tournament: pairwise comparison beats absolute scoring

Instead of dividing the work, have agents compete on it. Spawn N agents that each attempt the same task using different approaches, then judge the results in pairwise fashion until one wins.

Comparative judgment is more reliable than absolute scoring, especially for taste-based work. Why this beats sort-by-score: trying to sort 1,000 items in one prompt fails on two fronts - quality degrades, and it won't fit in context. A tournament splits the bracket across fresh agents, each comparing just two items.

The bracket itself lives in deterministic loop code, not in context. Each comparison is fast, fair, and isolated. The same idea works for taste-based ranking: design choices, candidate selection, content prioritization.
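
A single-elimination bracket is one simple form of this. The sketch below keeps the bracket in ordinary loop code and delegates each comparison to a judge function, which in a real workflow would be a fresh agent seeing only two candidates. stubJudge is a hypothetical placeholder that prefers the shorter string.

```javascript
// Single-elimination bracket. `judge(a, b)` stands in for a fresh agent that
// sees only two candidates and returns the better one.
// Returns undefined if `candidates` is empty.
async function tournament(candidates, judge) {
  let round = [...candidates];
  while (round.length > 1) {
    const matches = [];
    for (let i = 0; i < round.length; i += 2) {
      // An unpaired candidate gets a bye into the next round.
      matches.push(
        i + 1 < round.length
          ? judge(round[i], round[i + 1])
          : Promise.resolve(round[i])
      );
    }
    // Comparisons within a round are independent, so run them concurrently.
    round = await Promise.all(matches);
  }
  return round[0];
}

// Stub judge: prefers the shorter headline. A real judge would be an agent call.
const stubJudge = async (a, b) => (a.length <= b.length ? a : b);
```

Each round halves the field, so N candidates need N - 1 comparisons in total, and no single agent ever has to hold more than two items in context.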

10. Loop until done

For tasks with an unknown amount of work, loop spawning agents until a stop condition is met - no new findings, no more errors in the logs, theory verified - instead of running a fixed number of passes.

This pattern is the answer to "keep going until it's actually done":

Flaky test debugging - reproduce, form theories, test them, until one theory holds.
Bug hunting - keep finding bugs until a full pass returns zero.
Mining for patterns - cluster, identify rules, until no new clusters appear.

Pair this pattern with /goal to set a hard completion requirement ("don't stop until one theory works") and with /loop if you want the entire workflow itself to run on a recurring schedule. The loop and the stop condition live in code; only the active iteration stays in context.
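
A minimal sketch of the loop, assuming the stop condition is "a full pass returns nothing new". runPass stands in for spawning an agent and is stubbed here; maxPasses is a safety cap added for this sketch so the loop always terminates, not something the article prescribes.

```javascript
// `runPass` stands in for spawning an agent that returns one pass's findings.
// `maxPasses` is a safety cap added for this sketch.
async function loopUntilDone(runPass, maxPasses = 10) {
  const allFindings = [];
  const seen = new Set();
  for (let pass = 1; pass <= maxPasses; pass++) {
    const findings = await runPass(pass, allFindings);
    const countBefore = allFindings.length;
    for (const finding of findings) {
      if (!seen.has(finding)) {
        seen.add(finding);
        allFindings.push(finding);
      }
    }
    // Stop condition: a full pass that surfaces nothing new.
    if (allFindings.length === countBefore) {
      return { findings: allFindings, passes: pass, done: true };
    }
  }
  // Cap reached without a clean pass: report that the work may be incomplete.
  return { findings: allFindings, passes: maxPasses, done: false };
}

// Stub: passes 1 and 2 find bugs (one repeated), pass 3 finds nothing.
const stubPasses = [["bug-1", "bug-2"], ["bug-2", "bug-3"], []];
const stubRunPass = async (pass) => stubPasses[pass - 1] ?? [];
```

Returning a done flag separates "stopped because the work is finished" from "stopped because the cap was hit", which matters when the workflow runs unattended.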

11. Compose patterns for real use cases

The patterns rarely appear alone. A real workflow composes 2 to 4 of them. The list below pairs each use case with the patterns it tends to use:

Migrations and refactors. Fan-out (one agent per callsite/failing test in a worktree), then adversarial verification (a separate agent reviews each fix), then loop until done. This is the pattern Anthropic used to rewrite Bun from Zig to Rust.
Deep research (the /deep-research skill). Fan-out (parallel web searches), then adversarial verification (each claim verified independently), then synthesize (one cited report).
Deep verification of a draft. Identify all factual claims (one agent), then fan-out (one verifier per claim, each checking against the source), then a meta-verifier (checks that the verifiers' sources are high quality).
Sorting 1,000+ items. Tournament - pairwise comparison, bucket-rank, or bracket. Comparative judgment, never absolute scoring.
Memory and rule adherence. Verifier per rule (fan-out), then a skeptic persona reviews the rules themselves to avoid false positives.
Root-cause investigation. Generate theories from disjoint evidence (different agents read logs, files, data), then a panel of verifiers and refuters for each theory, then loop until one survives.
Triage at scale. Classify-and-act, then dedupe against existing tickets, then either attempt the fix or escalate. Pair with /loop for continuous triage.
Exploration and taste (design, naming, UI choices). Generate-and-filter (5 to 20 options), then a tournament with a rubric, then rank or pick.
Lightweight evals. Run the candidate in a worktree, comparison agents grade against rubric, then refine and re-grade. Same shape as a tournament but for grading, not ranking.

The right way to internalize these: identify which failure mode your current task is hitting, then pick the pattern that structurally prevents it. For drift, use fan-out. For self-preference, use adversarial verification. For open-ended work, use loop until done. For hard-to-score output, use a tournament.

Part 4: Cost, Safety, and Reuse

12. Pair with /goal, /loop, and token budgets

Workflows can be expensive. Three controls turn them from "cool but costly" into "a tool I run unattended."

/goal sets a hard completion requirement. Pair it with the loop pattern: "don't stop until one theory works." Without /goal, a workflow stops at a soft completion point. With /goal, it iterates until the actual end condition is met.
/loop runs the entire workflow on a recurring schedule. Use it for workflows you want running continuously: triage, weekly research updates, recurring verification.
Explicit token budgets. Tell Claude in the prompt: "use 10k tokens." This sets a cap on the workflow run. Without a cap, an ambitious workflow can balloon to 5-10x the tokens you expected.

> ultracode quick adversarial review of this assumption:
  "moving to Postgres eliminates our shard rebalancing."
  Use 5k tokens. /goal don't stop until you have either
  a counterexample or three independent confirmations.

Quoting the Claude Code team directly: "Best practices are still developing. Dynamic workflows often use more tokens, so think carefully about when and how to use them." Most traditional coding tasks do not need a panel of 5 reviewers. Ask yourself: does this task really need more compute? If a regular Claude Code session would finish it in five minutes, you don't need a workflow.

13. Use the quarantine pattern for untrusted input

Any workflow that reads untrusted public content - support tickets, bug reports, user feedback, scraped data - needs to assume that content might contain prompt injection.

The fix: quarantine. Bar the agents that read the untrusted content from taking any high-privilege actions. Separate agents, with no exposure to the raw content, do the acting.

This applies to any workflow that processes user-submitted content (support tickets, bug reports, customer feedback, social media), scrapes public web pages, or runs against output from a third-party API. If the input wasn't written by you or a trusted teammate, quarantine it. A 30-line read-only reader agent costs almost nothing and removes an entire class of prompt injection risk.
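
The reader/actor split can be sketched as below. This is one possible shape, not a prescribed implementation: stubReader and stubActor are hypothetical stand-ins for a low-privilege and a high-privilege agent, and the sanitize step (an allow-list of categories plus a bounded urgency) is an assumption of this sketch about how to keep free text from crossing the boundary.

```javascript
// Reader: the only step that touches raw untrusted text. In a real workflow
// this would be a low-privilege agent; here it is a hypothetical stub.
async function stubReader(rawTicket) {
  return { category: /refund/i.test(rawTicket) ? "refund" : "other", urgency: 2 };
}

const ALLOWED_CATEGORIES = new Set(["refund", "bug", "other"]);

// Force the reader's output into a fixed shape so free text (and any
// instructions hidden in it) cannot pass through to the actor.
function sanitize(summary) {
  const { category, urgency } = summary ?? {};
  return {
    category: ALLOWED_CATEGORIES.has(category) ? category : "other",
    urgency: Number.isInteger(urgency) && urgency >= 1 && urgency <= 3 ? urgency : 1,
  };
}

// Actor: holds the privileges, but only ever receives the sanitized fields.
async function stubActor(safe) {
  return `route:${safe.category}:p${safe.urgency}`;
}

async function handleTicket(rawTicket, reader = stubReader, actor = stubActor) {
  const safe = sanitize(await reader(rawTicket));
  return actor(safe);
}
```

Even if injected text manages to steer the reader, the worst it can do in this shape is pick a different allowed category or urgency; it cannot hand the actor new instructions.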

14. Save workflows, ship them as Skills

Once a workflow works, save it: press s in the workflow menu. Saved workflows go to ~/.claude/workflows. From there you have two paths:

Keep it local - reuse it across your own projects.
Ship it as a Skill - bundle the JavaScript file inside a Skill folder, reference it in SKILL.md, and anyone who installs the Skill runs the same workflow.

One practical nuance worth knowing: when you package a workflow into a Skill, prompt Claude to treat the workflow as a template, not a script to run verbatim. That leaves room for Claude to adapt the workflow shape to the specific task at hand while keeping the overall structure intact. Especially useful for workflows like "deep verification" or "triage" that need to flex per use case.

The mistakes that waste tokens on workflows

Reaching for a workflow when a regular Claude Code session would do. Most traditional coding tasks don't need a panel of 5 reviewers.
No token budget. Ambitious workflows balloon to 5-10x what you expected without an explicit cap.
One agent doing both the work and the verification. Self-preferential bias makes the verifier favor the worker. They must be separate.
Treating parallel() and pipeline() as interchangeable. The barrier matters: parallel waits for all, pipeline streams.
Skipping /goal on loop patterns. The workflow stops early at the first soft completion point. /goal forces hard completion.
Letting untrusted content reach the actor. Quarantine isn't optional once you process anything user-submitted.
Sorting with absolute scores. Comparative judgment is more reliable. Use a tournament.
Never saving working workflows. Re-prompting the same shape every week. Save with s, ship as a Skill.