Loop Engineering: Build Your First Autonomous AI Coding Loop

"I don't prompt Claude anymore. I have loops running that prompt Claude and figuring out what to do. My job is to write loops."

That's Boris Cherny, head of Claude Code at Anthropic. Peter Steinberger, creator of OpenClaw, put it even more directly: "You shouldn't be prompting coding agents anymore. You should be designing loops that prompt your agents."

For two years, the way you got value from a coding agent was manual. You typed a prompt, read the diff, typed the next prompt. The agent was a tool and you held it the entire time.

That posture is shifting, even though nine out of ten developers have never written a single loop. The highest-leverage work has moved from writing individual prompts to designing the system that generates and verifies them automatically.

This is loop engineering. It's not a product you buy. It's a shift in where you put your effort: from typing prompts to designing the system that types them. In this tutorial, you'll learn what makes a loop worth building, walk through the five building blocks (plus the sixth everyone forgets), and write the code for a minimum viable loop that ships work while you sleep.

What Is a Loop, Actually?

A loop is not a long prompt. It's a small system with six parts that finds work, hands it to an agent, checks the result, records what happened, and decides what to do next — without you in the chair.

Google engineer Addy Osmani, who popularized the term, defines it simply: "Loop engineering is replacing yourself as the person who prompts the agent. You design the system that does it instead."

The model already runs an inner loop on every turn: it reasons, acts, observes, and reasons again. Loop engineering sits one floor above that. You build an outer loop that runs on a schedule, spawns helpers, and feeds itself across many sessions.

The economics are serious. Agents consume roughly four times as many tokens as standard chat interactions, and up to fifteen times as many in multi-agent setups. A badly designed loop doesn't just waste tokens — it ships code you don't understand, creating comprehension debt: the growing gap between what the repository contains and what you actually understand.
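A rough budget check makes those multipliers concrete. The sketch below is illustrative only: the chat baseline, the number of runs, and the monthly cap are hypothetical figures, and only the 4x and 15x multipliers come from the paragraph above.

```shell
#!/usr/bin/env bash
# Rough monthly token estimate for a loop. All inputs are hypothetical.

# estimate_monthly_tokens BASELINE_TOKENS MULTIPLIER RUNS_PER_MONTH
estimate_monthly_tokens() {
  local baseline="$1" multiplier="$2" runs="$3"
  echo $(( baseline * multiplier * runs ))
}

# fits_budget ESTIMATE CAP -> exit 0 if the estimate is within the cap
fits_budget() {
  [ "$1" -le "$2" ]
}

BASELINE=50000      # assumed tokens for an equivalent chat session
RUNS=22             # assumed weekday runs per month
CAP=10000000        # assumed monthly token cap

single=$(estimate_monthly_tokens "$BASELINE" 4 "$RUNS")
multi=$(estimate_monthly_tokens "$BASELINE" 15 "$RUNS")
echo "single-agent: $single tokens/month"
echo "multi-agent:  $multi tokens/month"
fits_budget "$multi" "$CAP" || echo "multi-agent setup exceeds the cap"
```

With these assumed inputs the single-agent loop fits under the cap and the multi-agent variant does not, which is the kind of result worth knowing before scheduling anything.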

Before you write a single file, run the test.

The 4-Condition Test: Does This Task Deserve a Loop?

Loops earn their cost under four conditions. Miss one and the loop costs more than it returns.

1. The task repeats. A loop amortizes its setup cost across many runs. For a one-time job, a good prompt is faster and cheaper. If the work doesn't recur at least weekly, you don't have a loop — you have a script you ran once.

2. Verification is automated. The loop needs something that can reject bad output without you in the room. A test suite, a type checker, a linter, or a build step. No automated gate means the agent grades its own homework.

3. Your token budget can absorb the waste. Loops re-read context, retry, and explore. That burns tokens whether or not the run ships anything. On a metered plan with hard limits, a heavy verification loop can hit your cap before it delivers value.

4. The agent has a senior engineer's tools. It needs logs, a reproduction environment, and the ability to run the code it changes. Without these, the loop iterates blind.

If your task passes all four, continue. If not, stick to manual prompts. A single well-aimed prompt still wins for one-off tasks, exploratory work, or anything where "done" is a judgment call.

The Five Building Blocks (Plus the Sixth Everyone Forgets)

A working loop needs five capabilities and one place to remember state.

1. Automations (the heartbeat). They fire on a schedule, discover work, and hand it to the agent. Without a schedule, you have a one-off script. With one, you have a loop.
2. Worktrees (isolation). A git worktree gives each agent its own checkout on its own branch. Two agents literally cannot overwrite each other. Your review bandwidth is still the ceiling, but worktrees remove the mechanical collision.
3. Skills (codified project knowledge). A SKILL.md file tells the agent your conventions, build steps, and the "we don't do it like this because of that one incident" lore. Written once, read by every run.
4. Connectors (MCP). The loop needs to touch real tools: GitHub for PRs, Linear for tickets, Slack for notifications. MCP servers plug the agent into your actual environment, not just the filesystem.
5. Sub-agents (the maker-checker split). The agent that wrote the code is "way too nice grading its own homework." A separate verifier sub-agent with different instructions catches what the first one talked itself into.

6. Memory (the state file). The model's context window resets every session. A STATE.md file or a Linear board holds what's done and what's next. Tomorrow's run resumes instead of restarting. This sounds too dumb to matter — it's actually the spine of every working loop.
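The isolation that worktrees provide is easy to see in a throwaway repository. The sketch below assumes only that git is installed; the directory and branch names are hypothetical, and everything happens inside a temporary directory.

```shell
#!/usr/bin/env bash
# Demo: two worktrees from one repository, each on its own branch.
# Runs entirely in a temporary directory; all names are hypothetical.

demo_root="$(mktemp -d)"
repo="$demo_root/repo"

git init -q "$repo"
git -C "$repo" -c user.name=demo -c user.email=demo@example.com \
  commit -q --allow-empty -m "initial commit"

# One checkout per agent, each on a separate branch
git -C "$repo" worktree add "$demo_root/agent-a" -b loop/agent-a
git -C "$repo" worktree add "$demo_root/agent-b" -b loop/agent-b

# Both "agents" write the same filename without touching each other
echo "change from A" > "$demo_root/agent-a/notes.txt"
echo "change from B" > "$demo_root/agent-b/notes.txt"

# Lists the main checkout plus the two agent worktrees
git -C "$repo" worktree list
```

Each directory has its own working files and its own branch, so the two edits to notes.txt only meet later, at merge or review time.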

Good Loops vs. Bad Loops

Good first loops:

CI failure triage — nightly scan of failures, classify causes, draft fix PRs for the easy ones
Dependency bump PRs — weekly scan for updates, test compatibility, open PRs
Lint-and-fix passes — on every PR open event, apply style fixes automatically
Flaky test reproduction — loop until a theory survives the test
Issue-to-PR drafts on code with strong tests, where bad output gets rejected by the suite

Bad first loops (need a human in the chair):

Architecture rewrites
Auth or payments code
Production deploys
Vague product work
Anything where "done" is a judgment call

The Ralph Wiggum Loop: What Failure Looks Like

Engineer Geoffrey Huntley documented this failure mode and named it. An agent meant to emit a completion token only when finished emits it early, and the loop exits on a half-done job.

The Ralph Wiggum loop happens when:

No real verifier — just a second agent asked to "review" with no objective signal. Two optimists agreeing.
Soft completion conditions — "done" defined by the agent's judgment, not by a test, build, or type check.
No hard stops — the loop continues until something external kills it rather than until success is verified.

The fix is the gate: something objective that can fail the work. A test that passes or fails. A build that compiles or doesn't. Not a verifier that has an opinion.
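One way to combine an objective gate with a hard stop is a bounded retry wrapper. This is a minimal sketch, not a prescribed implementation: `run_until_gate` and the command names passed to it are hypothetical, and the only signal it trusts is the gate's exit code.

```shell
#!/usr/bin/env bash
# Retry a work step until an objective gate passes, with a hard stop.
# run_until_gate MAX_ATTEMPTS WORK_CMD GATE_CMD
#   WORK_CMD and GATE_CMD are names of commands or shell functions.
run_until_gate() {
  local max_attempts="$1" work_cmd="$2" gate_cmd="$3"
  local attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    "$work_cmd" || true            # the worker's own claim of success is ignored
    if "$gate_cmd"; then           # only the gate's exit code counts
      echo "gate passed on attempt $attempt"
      return 0
    fi
    attempt=$(( attempt + 1 ))
  done
  echo "hard stop: gate still failing after $max_attempts attempts"
  return 1
}
```

A call such as `run_until_gate 3 run_agent run_tests` would assume `run_agent` wraps the agent invocation and `run_tests` wraps something like `npm test`. The loop ends either on a verified pass or at the attempt limit, never on the agent's say-so.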

Build Your First Loop: The Code

Let's build a morning lint-fix loop for an Express.js API. The loop scans for ESLint errors, attempts fixes, runs the test suite, and records results — all while you sleep.

Project Structure

my-api/
├── .claude/
│   └── skills/
│       └── lint-fix/
│           └── SKILL.md
├── loops/
│   ├── daily-lint.sh
│   ├── STATE.md
│   └── verify.sh
├── src/              (your existing code)
└── package.json

Step 1: The Skill File

Create .claude/skills/lint-fix/SKILL.md:

---
name: lint-fix
description: Run ESLint, attempt auto-fixes, run tests, record results.
---

## Project conventions
- ES modules ("type": "module" in package.json)
- Tests: `npm test` (Jest)
- Lint: `npm run lint` (ESLint)

## Workflow
1. Run `npm run lint` to see current errors
2. Run `npm run lint:fix` for safe auto-fixes
3. Run `npm test` to verify nothing broke
4. Record counts in `loops/STATE.md`

## Safety rules
- Never modify the ESLint config (`eslint.config.mjs`) without human approval
- If tests fail after a fix, revert and try a narrower fix
- Stop after 3 failed attempts on the same file

The front matter gives the agent a handle. The body is the checklist you'd have typed manually. Now the system reads it automatically every run.

Step 2: The State File

Create loops/STATE.md:

# Loop State

## Last run
2026-06-15T09:00:00Z | status: success
Errors found: 12 | Fixed: 8 | Tests: 47/47
Notes: Fixed unused imports in auth.js, router.js

## In progress
- src/services/payment.js — camelcase warnings, legacy snake_case vars

## Lessons learned
- 2026-06-14: ESLint v9 flat config needs `eslint.config.mjs`, not `.eslintrc.js`

This is the loop's memory. Without it, every run is day one.
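If the loop script rather than the agent writes the run record, a pair of small helpers is enough. The sketch below is an assumption about how that could look: the function names are hypothetical, and the one-line format simply mirrors the "timestamp | status: ..." line shown in the state file above.

```shell
#!/usr/bin/env bash
# Append-only run log helpers for a state file (the line format is an assumption).

# record_run STATE_FILE STATUS NOTE
record_run() {
  local state_file="$1" run_status="$2" note="$3"
  printf '%s | status: %s | %s\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$run_status" "$note" >> "$state_file"
}

# last_status STATE_FILE -> prints the status of the most recent logged run
last_status() {
  grep -E '\| status: ' "$1" | tail -n 1 | sed -E 's/.*\| status: ([^ |]+).*/\1/'
}
```

A run could start by calling `last_status loops/STATE.md` and decide whether to resume the previous task or pick a new one.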

Step 3: The Automation Script

Create loops/daily-lint.sh:

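#!/usr/bin/env bash
set -euo pipefail

PROJECT="$(cd "$(dirname "$0")/.." && pwd)"
WT="$PROJECT/../my-api-loop"
BRANCH="loop/lint-$(date +%Y%m%d-%H%M)"
PROMPT="Follow .claude/skills/lint-fix/SKILL.md. Read loops/STATE.md. Run lint, fix errors, run tests. Goal: all ESLint errors fixed and npm test passes with zero failures."

# Create isolated worktree on a fresh branch
git -C "$PROJECT" worktree add "$WT" -b "$BRANCH"

# Remove the worktree on exit, even if a later step fails; the branch is kept
trap 'git -C "$PROJECT" worktree remove "$WT" --force' EXIT

# Run the loop
cd "$WT"
echo "=== Loop run: $BRANCH ==="

# A fresh worktree has no node_modules, so install from the lockfile first
npm ci

# Invoke Claude Code non-interactively (-p) so the run does not wait for input.
# The tool allowlist is an assumption; check it against your Claude Code version.
claude -p "$PROMPT" --allowedTools "Read,Edit,Bash"

# Verify the work (the gate); capture the result instead of aborting on failure
if bash "$PROJECT/loops/verify.sh"; then
  STATUS="success"
  # Commit on the loop branch so the fixes survive worktree removal
  git add -A
  git commit -m "loop: lint fixes ($BRANCH)" || echo "Nothing to commit"
else
  STATUS="gate-failed"
fi

# Record the outcome; exit non-zero when the gate failed
echo "Done: $(date) | $BRANCH | status: $STATUS" >> "$PROJECT/loops/STATE.md"
[ "$STATUS" = "success" ]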
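Because the script always uses the same worktree path, two overlapping runs would collide. A simple guard is a lock directory, since `mkdir` either creates it or fails atomically. This is a sketch under assumptions: the helper names and the lock path in the usage note are hypothetical.

```shell
#!/usr/bin/env bash
# Single-run guard using an atomic mkdir lock.

# acquire_lock LOCK_DIR -> exit 0 if this process now holds the lock
acquire_lock() {
  mkdir "$1" 2>/dev/null
}

# release_lock LOCK_DIR
release_lock() {
  rmdir "$1" 2>/dev/null || true
}

# with_lock LOCK_DIR CMD [ARGS...]
# Runs CMD only if no other run holds the lock; skips quietly otherwise.
# Call it without `set -e`, so the lock is released even when CMD fails.
with_lock() {
  local lock_dir="$1"; shift
  if ! acquire_lock "$lock_dir"; then
    echo "previous run still active; skipping"
    return 0
  fi
  "$@"
  local rc=$?
  release_lock "$lock_dir"
  return "$rc"
}
```

A scheduler entry could then call something like `with_lock /tmp/my-api-loop.lock /path/to/my-api/loops/daily-lint.sh`, so a slow run is skipped over rather than trampled by the next one.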

Step 4: The Verification Gate

Create loops/verify.sh — the objective gate that prevents Ralph Wiggum loops:

#!/usr/bin/env bash
# Hard gate: lint must be clean, tests must pass
echo "--- Verification Gate ---"

npm run lint -- --max-warnings 0
LINT_EXIT=$?

npm test
TEST_EXIT=$?

if [ $LINT_EXIT -eq 0 ] && [ $TEST_EXIT -eq 0 ]; then
  echo "✅ GATE PASSED: lint clean, tests green"
  exit 0
else
  echo "❌ GATE FAILED: lint=$LINT_EXIT tests=$TEST_EXIT"
  exit 1
fi

This script returns a non-zero exit code if anything fails. The automation script reads that exit code, so a failed gate is never recorded as a successful run. No subjective "looks good to me", just a binary pass/fail.

Step 5: Schedule It

Add to crontab for weekday morning runs (08:00, Monday to Friday):

0 8 * * 1-5 /path/to/my-api/loops/daily-lint.sh >> /var/log/loop-lint.log 2>&1

For more reliability, wrap it in a systemd timer. With Persistent=true it catches up on a run that was missed while the machine was off, and it sends the script's output to the journal.
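A systemd setup needs two small unit files: a oneshot service that runs the script and a timer that triggers it. The helper below writes both. It is a sketch, with hypothetical unit names (`loop-lint.service`, `loop-lint.timer`) and a caller-chosen target directory.

```shell
#!/usr/bin/env bash
# Write a systemd service + timer pair for the loop script.
# Unit names and the target directory are hypothetical.

# write_timer_units TARGET_DIR SCRIPT_PATH
write_timer_units() {
  local target_dir="$1" script_path="$2"
  mkdir -p "$target_dir"

  # Oneshot service: runs the loop script once per activation
  cat > "$target_dir/loop-lint.service" <<EOF
[Unit]
Description=Morning lint-fix loop

[Service]
Type=oneshot
ExecStart=$script_path
EOF

  # Timer: weekdays at 08:00, catching up on a missed run after downtime
  cat > "$target_dir/loop-lint.timer" <<EOF
[Unit]
Description=Run the lint-fix loop on weekday mornings

[Timer]
OnCalendar=Mon..Fri 08:00
Persistent=true

[Install]
WantedBy=timers.target
EOF
}
```

For a per-user setup, one option is `write_timer_units ~/.config/systemd/user /path/to/my-api/loops/daily-lint.sh`, followed by `systemctl --user daemon-reload` and `systemctl --user enable --now loop-lint.timer`.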

Comprehension Debt: The Hidden Cost

The faster the loop ships code you didn't write, the larger the gap between what the repository contains and what you understand.

Mitigations:

Read the diffs. If you don't read what the loop ships, you're renting comprehension debt at compound interest.
Spot-check the gate. Pick a few PRs the loop opened and verify the test that approved them actually catches the failure mode you care about. Gates rot.
Block the loop from architecture work. Keep it on small, machine-checkable changes. The moment you let it touch judgment calls, comprehension debt accelerates.
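Spot-checking the gate can itself be scripted: plant a failure you know about, confirm the gate rejects it, then undo the change. The sketch below assumes you supply all three commands; `gate_catches` and its arguments are hypothetical names.

```shell
#!/usr/bin/env bash
# Spot-check a gate: plant a known failure and confirm the gate rejects it.
# gate_catches BREAK_CMD RESTORE_CMD GATE_CMD  (all three are caller-supplied)
# Returns 0 if the gate failed on the planted defect, 1 if it let it through.
gate_catches() {
  local break_cmd="$1" restore_cmd="$2" gate_cmd="$3"
  local caught=1
  "$break_cmd"                 # introduce the failure on purpose
  if ! "$gate_cmd"; then       # a healthy gate must fail here
    caught=0
  fi
  "$restore_cmd"               # always undo the planted failure
  return "$caught"
}
```

In the lint loop, the break step might append an unused variable to a source file, the restore step might revert that file with git, and the gate would be `bash loops/verify.sh`. If `gate_catches` returns non-zero, the gate has rotted and the loop's green runs mean less than they appear to.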

Should You Build a Loop?

Most developers don't need one yet. The honest version of this story is not that everyone should rush to build loops. The four-condition test exists for a reason. Miss one and the loop costs more than it returns.

But if your task repeats, your verification is automated, your budget can absorb the waste, and your agent has real tools — build small. One automation. One skill. One state file. One gate. Get a manual run reliable first, then wrap it in a loop. Order matters.

The leverage point moved. Your job did too, if you choose it.

Sources

Addy Osmani, "Loop Engineering" — https://addyosmani.com/blog/loop-engineering/
Anthropic Engineering, "Effective harnesses for coding agents" — https://www.anthropic.com/engineering/effective-harnesses-for-coding-agents
0xMovez, "Loop engineering: the 14-step roadmap from prompter to loop designer" — Substack
Geoffrey Huntley, "The Ralph Wiggum loop" — on loop engineering failure modes