"I don't prompt Claude anymore. I have loops running that prompt Claude and figuring out what to do. My job is to write loops."
That's Boris Cherny, head of Claude Code at Anthropic. Peter Steinberger, creator of OpenClaw, put it even more directly: "You shouldn't be prompting coding agents anymore. You should be designing loops that prompt your agents."
For two years, the way you got value from a coding agent was manual. You typed a prompt, read the diff, typed the next prompt. The agent was a tool and you held it the entire time.
That posture is shifting. Nine out of ten developers have never written a single loop. The highest-leverage work moved from writing individual prompts to designing the system that generates and verifies them automatically.
This is loop engineering. It's not a product you buy. It's a shift in where you put your effort: from typing prompts to designing the system that types them. In this tutorial, you'll learn what makes a loop worth building, walk through the five building blocks (plus the sixth everyone forgets), and write the code for a minimum viable loop that ships work while you sleep.
What Is a Loop, Actually?
A loop is not a long prompt. It's a small system with six parts that finds work, hands it to an agent, checks the result, records what happened, and decides what to do next — without you in the chair.
Google engineer Addy Osmani, who popularized the term, defines it simply: "Loop engineering is replacing yourself as the person who prompts the agent. You design the system that does it instead."
The model already runs an inner loop on every turn: it reasons, acts, observes, and reasons again. Loop engineering sits one floor above that. You build an outer loop that runs on a schedule, spawns helpers, and feeds itself across many sessions.
The economics are serious. Agents consume roughly four times as many tokens as standard chat interactions, and up to fifteen times as many in multi-agent setups. A badly designed loop doesn't just waste tokens — it ships code you don't understand, creating comprehension debt: the growing gap between what the repository contains and what you actually understand.
Before you write a single file, run the test.
The 4-Condition Test: Does This Task Deserve a Loop?
Loops earn their cost under four conditions. Miss one and the loop costs more than it returns.
1. The task repeats. A loop amortizes its setup cost across many runs. For a one-time job, a good prompt is faster and cheaper. If the work doesn't recur at least weekly, you don't have a loop — you have a script you ran once.
2. Verification is automated. The loop needs something that can reject bad output without you in the room. A test suite, a type checker, a linter, or a build step. No automated gate means the agent grades its own homework.
3. Your token budget can absorb the waste. Loops re-read context, retry, and explore. That burns tokens whether or not the run ships anything. On a metered plan with hard limits, a heavy verification loop can hit your cap before it delivers value.
4. The agent has a senior engineer's tools. It needs logs, a reproduction environment, and the ability to run the code it changes. Without these, the loop iterates blind.
If your task passes all four, continue. If not, stick to manual prompts. A single well-aimed prompt still wins for one-off tasks, exploratory work, or anything where "done" is a judgment call.
The Five Building Blocks (Plus the Sixth Everyone Forgets)
A working loop needs five capabilities and one place to remember state.
1. Automations: the heartbeat. They fire on a schedule, discover work, and hand it to the agent. Without a schedule, you have a one-off script. With one, you have a loop.
2. Worktrees: isolation. A git worktree gives each agent its own checkout on its own branch. Two agents literally cannot overwrite each other. Your review bandwidth is still the ceiling, but worktrees remove the mechanical collision.
3. Skills: codified project knowledge. A SKILL.md file tells the agent your conventions, build steps, and the "we don't do it like this because of that one incident" lore. Written once, read by every run.
4. Connectors (MCP): the loop needs to touch real tools, such as GitHub for PRs, Linear for tickets, and Slack for notifications. MCP servers plug the agent into your actual environment, not just the filesystem.
5. Sub-agents: the maker-checker split. The agent that wrote the code is "way too nice grading its own homework." A separate verifier sub-agent with different instructions catches what the first one talked itself into.
6. Memory (the state file). The model's context window resets every session. A STATE.md file or a Linear board holds what's done and what's next. Tomorrow's run resumes instead of restarting. This sounds too dumb to matter, but it's the spine of every working loop.
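To make the worktree isolation concrete, here is a minimal sketch that runs entirely in a throwaway repository. The directory layout, the branch names (loop/agent-a, loop/agent-b), and the file app.txt are assumptions made up for the demo; the two echo lines stand in for whatever edits real agents would make.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway repository so the demo touches nothing real (hypothetical layout).
DEMO="$(mktemp -d)"
git init -q "$DEMO/repo"
cd "$DEMO/repo"
git config user.email "loop@example.com"
git config user.name "Loop Demo"
git config commit.gpgsign false
echo "original" > app.txt
git add app.txt
git commit -q -m "initial commit"

# One worktree per agent, each on its own new branch.
git worktree add -q -b loop/agent-a "$DEMO/agent-a"
git worktree add -q -b loop/agent-b "$DEMO/agent-b"

# Each stand-in "agent" edits the same file inside its own checkout.
echo "change from A" > "$DEMO/agent-a/app.txt"
echo "change from B" > "$DEMO/agent-b/app.txt"

# The edits do not collide, and the main checkout is untouched.
MAIN_CONTENT="$(cat "$DEMO/repo/app.txt")"
A_CONTENT="$(cat "$DEMO/agent-a/app.txt")"
B_CONTENT="$(cat "$DEMO/agent-b/app.txt")"
echo "main: $MAIN_CONTENT | A: $A_CONTENT | B: $B_CONTENT"
```

Both checkouts share one object store, so the branches can later be reviewed and merged like any others. The collision that worktrees remove is only the on-disk one; merge conflicts between the two branches are still possible.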
Good Loops vs. Bad Loops
Good first loops:
- CI failure triage — nightly scan of failures, classify causes, draft fix PRs for the easy ones
- Dependency bump PRs — weekly scan for updates, test compatibility, open PRs
- Lint-and-fix passes — on every PR open event, apply style fixes automatically
- Flaky test reproduction — loop until a theory survives the test
- Issue-to-PR drafts on code with strong tests, where bad output gets rejected by the suite
Bad first loops (need a human in the chair):
- Architecture rewrites
- Auth or payments code
- Production deploys
- Vague product work
- Anything where "done" is a judgment call
The Ralph Wiggum Loop: What Failure Looks Like
Engineer Geoffrey Huntley documented this failure mode and named it. An agent meant to emit a completion token only when finished emits it early, and the loop exits on a half-done job.
The Ralph Wiggum loop happens when:
- No real verifier — just a second agent asked to "review" with no objective signal. Two optimists agreeing.
- Soft completion conditions — "done" defined by the agent's judgment, not by a test, build, or type check.
- No hard stops — the loop continues until something external kills it rather than until success is verified.
The fix is the gate: something objective that can fail the work. A test that passes or fails. A build that compiles or doesn't. Not a verifier that has an opinion.
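The three failure conditions above suggest a shape for the fix: a bounded retry loop where only an objective command can end the run successfully. The sketch below is one way to write that in bash. The function name run_until_green and its argument order are assumptions for illustration, not part of any tool.

```shell
#!/usr/bin/env bash

# run_until_green MAX_ATTEMPTS WORK_CMD GATE_CMD   (hypothetical helper)
# Runs WORK_CMD, then GATE_CMD, until the gate exits 0 or attempts run out.
# Only the gate's exit code counts; the worker's own claim of "done" is ignored.
run_until_green() {
  local max_attempts="$1" work_cmd="$2" gate_cmd="$3"
  local attempt
  for ((attempt = 1; attempt <= max_attempts; attempt++)); do
    echo "attempt $attempt/$max_attempts"
    # The worker may fail or over-report success; neither ends the loop.
    bash -c "$work_cmd" || true
    # Objective signal: a command that can fail the work.
    if bash -c "$gate_cmd"; then
      echo "gate passed on attempt $attempt"
      return 0
    fi
  done
  # Hard stop: give up instead of spinning until something external kills us.
  echo "hard stop: gate still failing after $max_attempts attempts"
  return 1
}
```

In a real loop the worker would be the agent invocation and the gate would be something like the test suite, for example run_until_green 3 "<agent command>" "npm test". The attempt cap is the hard stop; the gate command is the only thing allowed to say "done".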
Build Your First Loop: The Code
Let's build a weekday-morning lint-fix loop for an Express.js API. The loop scans for ESLint errors, attempts fixes, runs the test suite, and records results, all without you at the keyboard.
Project Structure
my-api/
├── .claude/
│   └── skills/
│       └── lint-fix/
│           └── SKILL.md
├── loops/
│   ├── daily-lint.sh
│   ├── STATE.md
│   └── verify.sh
├── src/            (your existing code)
└── package.json
Step 1: The Skill File
Create .claude/skills/lint-fix/SKILL.md:
---
name: lint-fix
description: Run ESLint, attempt auto-fixes, run tests, record results.
---
## Project conventions
- ES modules ("type": "module" in package.json)
- Tests: `npm test` (Jest)
- Lint: `npm run lint` (ESLint)
## Workflow
1. Run `npm run lint` to see current errors
2. Run `npm run lint:fix` for safe auto-fixes
3. Run `npm test` to verify nothing broke
4. Record counts in `loops/STATE.md`
## Safety rules
- Never modify the ESLint config (`eslint.config.mjs`) without human approval
- If tests fail after a fix, revert and try a narrower fix
- Stop after 3 failed attempts on the same file
The front matter gives the agent a handle. The body is the checklist you'd have typed manually. Now the system reads it automatically every run.
Step 2: The State File
Create loops/STATE.md:
# Loop State
## Last run
2026-06-15T09:00:00Z | status: success
Errors found: 12 | Fixed: 8 | Tests: 47/47
Notes: Fixed unused imports in auth.js, router.js
## In progress
- src/services/payment.js — camelcase warnings, legacy snake_case vars
## Lessons learned
- 2026-06-14: ESLint v9 flat config needs `eslint.config.mjs`, not `.eslintrc.js`
This is the loop's memory. Without it, every run is day one.
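A state file is only useful if the loop can read and update it mechanically. The sketch below assumes the layout shown above (a "## Last run" heading followed by a timestamp/status line and a summary line). The helper names last_status and record_run are hypothetical.

```shell
#!/usr/bin/env bash

# last_status STATE_FILE   (hypothetical helper)
# Prints the status recorded on the line after "## Last run", or "unknown".
last_status() {
  local state_file="$1" result
  result="$(awk '/^## Last run/ { getline; print; exit }' "$state_file" \
    | sed -n 's/.*status: *\([a-z]*\).*/\1/p')"
  echo "${result:-unknown}"
}

# record_run STATE_FILE STATUS SUMMARY   (hypothetical helper)
# Replaces the lines under "## Last run" and leaves every other section as-is.
record_run() {
  local state_file="$1" run_status="$2" summary="$3" stamp tmp
  stamp="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
  tmp="$(mktemp)"
  awk -v head="$stamp | status: $run_status" -v summary="$summary" '
    /^## Last run/ { print; print head; print summary; skip = 1; next }
    /^## /         { skip = 0 }   # any other heading ends the section
    !skip          { print }
  ' "$state_file" > "$tmp"
  mv "$tmp" "$state_file"
}
```

A run could start by checking last_status loops/STATE.md to decide whether to resume or start fresh, and end by calling record_run with the gate's verdict. Writing to a temporary file and moving it into place keeps a crashed run from leaving a half-written state file.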
Step 3: The Automation Script
Create loops/daily-lint.sh:
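#!/usr/bin/env bash
set -euo pipefail
PROJECT="$(cd "$(dirname "$0")/.." && pwd)"
WT="$PROJECT/../my-api-loop"
BRANCH="loop/lint-$(date +%Y%m%d-%H%M)"
# Create an isolated worktree on a fresh branch
git -C "$PROJECT" worktree add "$WT" -b "$BRANCH"
# Always remove the worktree on exit, even when a step fails.
# Committed work survives on $BRANCH; uncommitted work is discarded.
trap 'git -C "$PROJECT" worktree remove "$WT" --force' EXIT
cd "$WT"
echo "=== Loop run: $BRANCH ==="
# A fresh worktree has no node_modules, so install before linting or testing
npm ci
# Run Claude Code non-interactively (-p prints the result and exits).
# Assumes tool permissions for unattended runs are already configured.
# The goal is stated in the prompt; the gate below is what enforces it.
PROMPT="Use the lint-fix skill. Read loops/STATE.md. Goal: all ESLint errors fixed and npm test passes with zero failures. Record the results in loops/STATE.md."
claude -p "$PROMPT" || true
# Verify the work (the gate) and keep it only if the gate passes
if bash "$PROJECT/loops/verify.sh"; then
  git add -A
  git commit -m "loop: lint fixes ($BRANCH)" || echo "Nothing to commit"
  STATUS="passed"
else
  STATUS="failed"
fi
echo "Run $BRANCH: gate $STATUS at $(date)" >> "$PROJECT/loops/STATE.md"
# Exit non-zero on a failed gate so cron logs and monitors can see it
[ "$STATUS" = "passed" ]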
Step 4: The Verification Gate
Create loops/verify.sh — the objective gate that prevents Ralph Wiggum loops:
#!/usr/bin/env bash
# Hard gate: lint must be clean, tests must pass
echo "--- Verification Gate ---"
npm run lint -- --max-warnings 0
LINT_EXIT=$?
npm test
TEST_EXIT=$?
if [ "$LINT_EXIT" -eq 0 ] && [ "$TEST_EXIT" -eq 0 ]; then
  echo "✅ GATE PASSED: lint clean, tests green"
  exit 0
else
  echo "❌ GATE FAILED: lint=$LINT_EXIT tests=$TEST_EXIT"
  exit 1
fi
This script exits with a non-zero code if either check fails, so the calling script can branch on the result instead of trusting the agent's own report. No subjective "looks good to me", just a binary pass/fail.
Step 5: Schedule It
Add to crontab for weekday morning runs (08:00, Monday to Friday):
0 8 * * 1-5 /path/to/my-api/loops/daily-lint.sh >> /var/log/loop-lint.log 2>&1
For more reliability, wrap it in a systemd timer, which can catch up on runs missed while the machine was off (Persistent=true) and sends output to the journal.
Comprehension Debt: The Hidden Cost
The faster the loop ships code you didn't write, the larger the gap between what the repository contains and what you understand.
Mitigations:
- Read the diffs. If you don't read what the loop ships, you're renting comprehension debt at compound interest.
- Spot-check the gate. Pick a few PRs the loop opened and verify the test that approved them actually catches the failure mode you care about. Gates rot.
- Block the loop from architecture work. Keep it on small, machine-checkable changes. The moment you let it touch judgment calls, comprehension debt accelerates.
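The "spot-check the gate" mitigation can itself be partly mechanized: plant a defect the gate is supposed to catch and confirm that it fails. The sketch below is one possible shape for that check. The helper name gate_catches and its three-command interface are assumptions for illustration.

```shell
#!/usr/bin/env bash

# gate_catches BREAK_CMD RESTORE_CMD GATE_CMD   (hypothetical helper)
# Introduces a known defect, checks that the gate rejects it, then restores.
# Returns 0 when the gate fails on the defect (the healthy outcome).
gate_catches() {
  local break_cmd="$1" restore_cmd="$2" gate_cmd="$3" result=0
  bash -c "$break_cmd"
  if bash -c "$gate_cmd" >/dev/null 2>&1; then
    echo "gate ROTTED: it passed with a known defect in place"
    result=1
  else
    echo "gate OK: it rejected the known defect"
  fi
  # Always restore, whatever the gate said.
  bash -c "$restore_cmd"
  return "$result"
}
```

For the lint loop in this tutorial, the break command might append a deliberately failing assertion to a test file, the restore command might be a git checkout of that file, and the gate would be bash loops/verify.sh. Run it in a scratch worktree rather than your main checkout, since the defect is real until the restore step runs.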
Should You Build a Loop?
Most developers don't need one yet. The honest version of this story is not that everyone should rush to build loops. The four-condition test exists for a reason. Miss one and the loop costs more than it returns.
But if your task repeats, your verification is automated, your budget can absorb the waste, and your agent has real tools — build small. One automation. One skill. One state file. One gate. Get a manual run reliable first, then wrap it in a loop. Order matters.
The leverage point moved. Your job did too, if you choose it.
Sources
- Addy Osmani, "Loop Engineering" — https://addyosmani.com/blog/loop-engineering/
- Anthropic Engineering, "Effective harnesses for coding agents" — https://www.anthropic.com/engineering/effective-harnesses-for-coding-agents
- 0xMovez, "Loop engineering: the 14-step roadmap from prompter to loop designer" — Substack
- Geoffrey Huntley, "The Ralph Wiggum loop" — on loop engineering failure modes