Loop Your Coding Agent for Hours: The Ralph Wigum Approach to Autonomous AI Development
By Sean Weldon
TL;DR
Ralph Wigum is a devilishly simple approach to autonomous AI coding that uses a bash for loop to run coding agents through a task backlog iteratively. Instead of complex multi-agent orchestration or rigid multi-phase plans, Ralph completes one small task per iteration with robust feedback loops including TypeScript type checking and unit tests, producing working code that passes CI on every git commit.
Key Takeaways
Running multiple AI agents simultaneously creates merge conflicts and dependency issues, while single-pass approaches fail when task scope exceeds context window limits—Ralph solves this by processing one small task per iteration.
Ralph uses two simple files: prd.json (product requirements with passes flags) and progress.txt (LLM sprint memory), with each iteration producing one git commit representing a completed feature.
LLMs degrade significantly as token count increases, making small uniformly-sized tasks critical for maintaining code quality and leaving context window budget for verification and testing.
Enforcing TypeScript type checking and unit tests on every iteration prevents LLMs from committing broken code and losing context about failure origins—strong typing is essential for reliability.
Ralph shifts focus from "how it's going to be done" to "what needs to be done," putting users in the role of requirements gatherer rather than anal retentive planner with rigid multi-phase dependencies.
What Problems Do Traditional AI Coding Approaches Face?
Running 16 AI agents simultaneously on different tasks sounds productive in theory, but the reality involves merge conflicts and dependency nightmares. Each agent makes changes that conflict with others, creating integration chaos that requires constant human intervention.
The alternative—having a single AI work through all tasks collectively—fails for different reasons. Tasks often exceed what fits in a single context window, causing the LLM to lose track of requirements and implementation details. The model simply can't hold enough information to complete comprehensive changes effectively.
Multi-phase plans present their own challenges. Adding new requirements means recalculating dependencies and squeezing items between existing phases. This planning-heavy approach contradicts how engineers actually work: real developers pull tasks iteratively from a backlog rather than following predetermined sequences that become outdated the moment requirements change.
How Does the Ralph Loop Actually Work?
Ralph runs as a bash script invoked as ralph.sh [max_iterations], continuing until all tasks are complete or the iteration limit is hit as a backstop. What if I told you that the way to get this to work is a for loop? That's exactly what Ralph is: a bash loop that runs a coding agent repeatedly until completion.
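As a sketch, the loop might look like the following. The function form, the PROMPT.md file name, and the claude -p invocation in the closing comment are illustrative assumptions, not the author's exact script:

```shell
#!/usr/bin/env bash
# ralph.sh — a minimal sketch of the Ralph loop. The agent command is
# passed as arguments so any coding-agent CLI can be plugged in; the
# `claude -p` example at the bottom is an assumption, not the real script.
set -euo pipefail   # treat any failure (red type check, failing test) as fatal

# Usage: ralph_loop MAX_ITERATIONS AGENT_CMD [AGENT_ARGS...]
ralph_loop() {
  local max_iterations="$1"; shift
  local i output
  for ((i = 1; i <= max_iterations; i++)); do
    echo "--- Ralph iteration $i of $max_iterations ---"
    # One agent pass: pick a feature, implement it, run checks, commit.
    output="$("$@")"
    # The agent emits this string once every PRD passes flag is true.
    if grep -q "promise complete here" <<<"$output"; then
      echo "Backlog complete after $i iteration(s)."
      return 0
    fi
  done
  echo "Hit the iteration limit ($max_iterations) with work remaining." >&2
  return 1
}

# Hypothetical invocation:
#   ralph_loop "${1:-10}" claude -p "$(cat PROMPT.md)"
```

The iteration cap is the backstop mentioned above: if the backlog isn't finished within the budget, the loop exits nonzero instead of running forever.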
The system relies on two key files. prd.json serves as the product requirements document, containing a JSON array of user stories with boolean passes flags tracking completion status. progress.txt functions as the LLM's sprint memory, appended (never rewritten) each iteration to maintain context across the loop.
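The article doesn't show the file contents, but based on that description (an array of user stories, each with a boolean passes flag) prd.json might look roughly like this; every field name other than passes is an illustrative assumption:

```json
[
  {
    "id": "US-001",
    "story": "As a user, I can sign in with my email and password",
    "acceptance": "Valid credentials land on the dashboard; invalid ones show an inline error",
    "passes": true
  },
  {
    "id": "US-002",
    "story": "As a user, I can reset my password from the sign-in page",
    "acceptance": "A reset email is sent and the link expires after one hour",
    "passes": false
  }
]
```

Keeping an acceptance description alongside each story gives the LLM a concrete verification target for the feature it picks up.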
Each iteration follows a fixed sequence:
- The LLM selects the highest priority incomplete feature (not necessarily the first in the list)
- Works exclusively on that single feature
- Updates the PRD with completion status
- Appends progress notes to progress.txt
- Makes a git commit with all changes
The loop exits when the LLM outputs "promise complete here" after detecting all PRD items have their passes flags set to true. Each git commit represents one completed feature, creating queryable git history the LLM can reference for context in future iterations.
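That exit condition can also be spot-checked mechanically. A minimal sketch, assuming the flat boolean passes field described above:

```shell
# all_passed FILE — succeed only if no story still has "passes": false.
# A sketch assuming the flat boolean "passes" field described above.
all_passed() {
  ! grep -q '"passes": false' "$1"
}
```

In practice a JSON-aware tool such as jq would be more robust than a text match, but the idea is the same: the loop should only stop once no passes flag remains false.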
Why Does Task Sizing Matter So Much?
Small, uniformly-sized tasks prevent the LLM from biting off more than it can chew. LLMs get markedly less capable as tokens accumulate in the context window, and the code they produce degrades accordingly. This isn't a minor degradation; it's a fundamental limitation of how these models process information.
All tasks in the PRD should be similarly sized to avoid one enormous task swallowing the entire context window. Large tasks consume available context, leaving insufficient budget for verification, testing, and viewing implementation details. Working on a single feature per iteration produces better code quality and allows focused testing.
Small changes leave context window budget for the LLM to actually verify the code works through testing. The constraint forces meaningful scope limitation rather than attempting comprehensive changes that exceed model capabilities. This approach mirrors how human developers work best—completing focused changes that can be thoroughly reviewed and tested before moving forward.
What Feedback Loops Make Ralph Reliable?
TypeScript type checking via pnpm type check and unit tests via pnpm test must pass on every commit. These quality gates prevent the LLM from committing broken code and losing memory of where problems originated. CI must stay green—non-negotiable.
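A sketch of how those gates might wrap the commit step. The function form is illustrative, and the pnpm script names in the closing comment are assumptions about the project's package.json:

```shell
# gate_and_commit CHECK... — run each quality gate in order; commit only
# if every gate passes, so a red check can never reach git history.
gate_and_commit() {
  local check
  for check in "$@"; do
    if ! $check; then
      echo "Gate failed: $check. Aborting this iteration without committing." >&2
      return 1
    fi
  done
  git add -A && git commit -m "feat: complete one user story"
}

# Hypothetical invocation (script names assumed):
#   gate_and_commit "pnpm typecheck" "pnpm test"
```

Because the gates run before git commit rather than after, a failing type check or test leaves the working tree dirty for the next iteration to fix instead of poisoning the history.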
Browser automation tools like Playwright's MCP Server enable end-to-end testing of the application as a human user would experience it. Robust feedback loops prove essential because Claude tends to mark a feature as complete without proper testing, which undermines the entire process. Once explicitly prompted to use browser automation tools and do all testing as a human user would, Claude did much better at verifying features end to end.
Strong typing is critical—you want types, types, types, types, types. Absolutely. You want the strongest types you can get. TypeScript catches errors at compile time that would otherwise slip through as runtime failures, providing immediate feedback the LLM can use to correct course before committing changes.
What Are the Different Ways to Run Ralph?
AFK Ralph (away from keyboard) runs overnight autonomously through the entire backlog with a maximum iteration limit. You launch it before bed and wake up to completed features—or at least as many as the model could complete within the iteration budget. A WhatsApp notification CLI sends a message when AFK Ralph completes after X iterations.
Human-in-the-loop Ralph (ralph_once.sh) runs a single iteration in an interactive terminal for difficult features requiring steering. This version proves useful for learning Ralph's capabilities and understanding what it's doing. You can observe the decision-making process, see which feature it prioritizes, and understand how it approaches implementation.
Both versions use essentially the same prompt, just different execution modes. Even when you're present, the human-in-the-loop version makes you more productive than creating multi-phase plans. Ralph shifts focus from "how it's going to be done" to "what needs to be done and how it should behave," putting you in the seat of requirements gatherer and product designer rather than anal-retentive planner.
What Makes Ralph More Usable Than Alternatives?
Adding new tasks is simple—just add a specification for what the feature should look like at the end. No dependency mapping, no phase insertion, no recalculating the entire plan. The loop concept of taking stuff off the board feels familiar and intuitive compared to multi-phase plans that require extensive upfront planning.
Ralph puts you in the role of product designer rather than implementation planner. You describe desired behavior and acceptance criteria without prescribing the exact implementation path. The LLM figures out how to achieve the requirements, leveraging its understanding of the codebase and best practices.
The prompt explicitly tells the LLM to choose the highest priority feature, not necessarily the first in the list. This prevents the model from always choosing the first item and allows proper prioritization based on dependencies and importance. The flexibility makes Ralph adaptable to changing requirements without restructuring the entire approach.
What Technical Requirements Does Ralph Need?
Ralph requires a really good underlying coding model—Opus 4.5 and GPT 5.2 make simpler approaches viable. Earlier models might struggle with the autonomous decision-making and quality standards Ralph demands. The implementation uses Claude Code invoked via CLI, but works with any LLM coding agent including OpenAI Code and Codex.
The script is implemented as a bash for loop with strict error handling enabled, so any failure in type checking or testing halts execution and prevents the LLM from proceeding with broken code. The PRD format uses JSON objects containing user stories and boolean passes flags to track completion.
The prompt instructs "append your progress" rather than "update" to prevent the LLM from rewriting the entire progress.txt file. This preserves the complete history of the sprint rather than having the model summarize or lose context. The exit condition checks if output contains the string "promise complete here," signaling all requirements have been satisfied.
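Pulled together, the iteration prompt could look something like the sketch below. Only the append instruction, the passes flags, the priority rule, and the "promise complete here" exit string come from the article; the surrounding wording is a hypothetical reconstruction:

```shell
# Write a hypothetical PROMPT.md for the loop; the exact wording of the
# real prompt is not published, so this is an illustrative reconstruction.
cat > PROMPT.md <<'EOF'
Read prd.json and progress.txt before doing anything else.
1. Choose the highest priority story whose "passes" flag is false
   (not necessarily the first in the list).
2. Implement only that single story.
3. Run the type check and unit tests; fix any failures before continuing.
4. Set the story's "passes" flag to true in prd.json.
5. Append your progress to progress.txt; never rewrite the file.
6. Commit all changes with git.
If every story's "passes" flag is already true, output exactly:
promise complete here
EOF
```

Note the deliberate phrasing in step 5: "append", not "update", so the model adds to the sprint memory instead of summarizing it away.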
What the Experts Say
"LLMs get really stupid as you add more tokens to the context window and you'd produce crappier code as a result."
This observation captures why task sizing matters so fundamentally to Ralph's success. Context window management isn't optional—it's the difference between coherent, working code and confused implementations that fail basic requirements.
"Claude's tendency to mark a feature as complete without proper testing. But it did much better at verifying features end to end once explicitly prompted to use browser automation tools and do all testing as a human user would."
This quote highlights why feedback loops and explicit testing instructions are non-negotiable. LLMs need guardrails and clear expectations about verification standards, not just implementation instructions.
Frequently Asked Questions
Q: How is Ralph different from running multiple AI agents in parallel?
Ralph runs a single agent iteratively through tasks one at a time, avoiding the merge conflicts and dependency issues that plague multi-agent approaches. Each iteration produces one git commit with a completed feature, maintaining clean history and preventing integration nightmares that require constant human intervention to resolve.
Q: What files does Ralph use to track progress and requirements?
Ralph uses two key files: prd.json containing a JSON array of user stories with boolean passes flags, and progress.txt serving as the LLM's sprint memory. The PRD tracks what needs to be done and completion status, while progress.txt maintains context across iterations by appending (never rewriting) notes from each loop.
Q: Why does Ralph focus on small tasks instead of comprehensive changes?
LLMs degrade significantly as token count increases, making small uniformly-sized tasks critical for code quality. Large tasks consume the entire context window, leaving no budget for verification, testing, and viewing implementation details. Small changes allow the LLM to thoroughly test and verify each feature works correctly before committing.
Q: What quality checks does Ralph enforce on every iteration?
Ralph requires TypeScript type checking via pnpm type check and unit tests via pnpm test to pass before any commit. CI must stay green, preventing the LLM from committing broken code and losing context about failure origins. Browser automation tools like Playwright enable end-to-end testing from a user perspective.
Q: Can I run Ralph overnight without supervision?
Yes, AFK Ralph runs autonomously through your entire backlog with a maximum iteration limit as a backstop. You can launch it before bed and receive a WhatsApp notification when it completes. The robust feedback loops ensure CI stays green, though you should review the commits to understand what was implemented.
Q: How does Ralph decide which task to work on next?
The prompt explicitly instructs the LLM to choose the highest priority feature, not necessarily the first in the list. This prevents always choosing the first item and allows proper prioritization based on dependencies and importance. The LLM evaluates incomplete tasks and selects the most critical one each iteration.
Q: What happens if Ralph fails to complete all tasks within the iteration limit?
The loop exits when hitting max_iterations, leaving remaining tasks incomplete with their passes flags set to false. You can review progress.txt to understand what was accomplished, examine the git commits to see completed features, and either restart Ralph with a higher iteration limit or switch to human-in-the-loop mode for difficult remaining tasks.
Q: Do I need to use TypeScript and specific testing tools for Ralph to work?
While the example uses TypeScript and pnpm, Ralph works with any language and tooling that provides strong typing and automated testing. The critical requirement is robust feedback loops that catch errors before committing. Strong typing is essential—the stronger the types, the better Ralph performs at catching errors during development.
The Bottom Line
Ralph Wigum proves that autonomous AI coding doesn't require complex orchestration systems or rigid multi-phase plans—just a simple bash for loop with robust feedback mechanisms. The approach works because it mirrors how engineers actually work: pulling tasks iteratively from a backlog, completing focused changes, and verifying each works before moving forward.
This matters because it shifts your role from implementation planner to product designer. You describe what features should look like at the end rather than prescribing exact implementation paths. Ralph handles the how while you focus on the what, making you more productive even in human-in-the-loop mode than creating detailed multi-phase plans.
Start with human-in-the-loop Ralph to understand how it works and what it's capable of completing autonomously. Once you're comfortable with its decision-making and output quality, graduate to AFK Ralph for overnight development sessions. The key is strong typing, robust testing, and small uniformly-sized tasks—get those right, and you can loop your coding agent for hours.
About the Author
Sean Weldon is an AI engineer and systems architect specializing in autonomous systems, agentic workflows, and applied machine learning. He builds production AI systems that automate complex business operations.