claude-overnight

Run 10, 100, or 1000 Claude agents overnight. Come back to shipped work.

Describe what to build. Set a budget. The tool plans, explores your codebase, breaks the objective into tasks, launches parallel agents in isolated git worktrees, iterates toward quality, and handles rate limits automatically. You press Run once, then go to sleep.

Built on the Claude Agent SDK. Works with Claude Opus, Sonnet, and Haiku.

Install

npm install -g claude-overnight

Requires Node.js >= 20 and Claude authentication (claude auth login, or set ANTHROPIC_API_KEY).

Claude Code plugin

This repo also ships a Claude Code plugin so any Claude instance (inside this repo or any other) knows how to use, inspect, and resume claude-overnight runs:

/plugin marketplace add Fornace/claude-overnight
/plugin install claude-overnight

Quick start

claude-overnight

🌙  claude-overnight
────────────────────────────────────

① What should the agents do?
  > refactor auth, add tests, update docs

② Budget [10]: 200

③ Worker model:
  ● Sonnet — Sonnet 4.6 · Best for everyday tasks
  ○ Opus — Opus 4.6 · Most capable

④ Usage cap:
  ● 90% · leave 10% for other work

⑤ Allow extra usage (billed separately):
  ● No · stop when plan limits are reached

╭────────────────────────────────────────────────────────╮
│  sonnet · budget 200 · 5× · flex · cap 90% · no extra  │
╰────────────────────────────────────────────────────────╯

⠹ 8s · $0.04 · 12% · identifying themes   ← every phase shows cost + usage
✓ 5 themes → review, press Run, walk away

◆ Thinking: 5 agents exploring...         ← architects analyze your codebase
◆ Orchestrating plan...                   ← synthesizes 50 concrete tasks
◆ Wave 1 · 50 tasks · $4.20 spent        ← fully autonomous from here
  ↑ 1.2M in  ↓ 340K out  $4.20 / $4.24 total
◆ Assessing... how close to amazing?
◆ Wave 2 · 30 tasks · $18.50 spent       ← improvements from assessment
◆ Reflection: 2 agents reviewing          ← deep quality audit
◆ Wave 3 · 20 tasks · $31.00 spent       ← fixes from review findings
◆ Assessing... ✓ Vision met

You interact once (objective, budget, model, review themes), then everything runs autonomously — thinking, planning, executing, reflecting, steering. Rate-limited? It waits and retries. Crash? Resume where you left off. Capped at usage limit? Pick up next time with full context preserved.

How it works

1. Thinking wave

For budgets > 15, the tool launches architect agents that explore your codebase before any code is written. Each one gets a different research angle (architecture, data models, APIs, testing, etc.) and writes a structured design document. The number scales with budget: 5 for budget=50, 10 for budget=2000.
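As an illustration of how the architect count might scale, here is a minimal sketch. Only the "> 15" threshold and the two data points (budget 50 gives 5, budget 2000 gives 10) come from the text above. The function name `architectCount` and the logarithmic curve between those points are assumptions, not the tool's actual rule.

```typescript
// Hypothetical scaling rule. Known from the README: no thinking wave at
// budget <= 15, 5 architects at budget 50, 10 architects at budget 2000.
// The log interpolation between those points is an assumption.
function architectCount(budget: number): number {
  if (budget <= 15) return 0; // small budgets skip the thinking wave
  const t = Math.log(budget / 50) / Math.log(2000 / 50);
  return Math.max(1, Math.round(5 + 5 * t));
}
```

A logarithmic curve is one plausible choice because the budget grows 40-fold while the architect count only doubles.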

2. Orchestration

An orchestrator agent reads all design documents and synthesizes concrete execution tasks — grounded in real files and patterns the architects found. No guesswork. The task plan is also written to a file for resilience — if orchestration is interrupted, partial results survive.
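One way partial results can survive an interrupted write is to store the plan one task per line and ignore any line that did not finish. The sketch below assumes a JSON Lines file and a hypothetical `PlannedTask` shape; the tool's real plan format is not documented here.

```typescript
// Hypothetical task shape; the real plan file format may differ.
interface PlannedTask {
  id: string;
  prompt: string;
}

// Parse a JSON Lines plan, keeping every complete task and skipping any
// truncated or malformed line (for example the last line after a crash).
function parsePartialPlan(text: string): PlannedTask[] {
  const tasks: PlannedTask[] = [];
  for (const line of text.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    try {
      const value = JSON.parse(trimmed) as Partial<PlannedTask>;
      if (typeof value.id === "string" && typeof value.prompt === "string") {
        tasks.push({ id: value.id, prompt: value.prompt });
      }
    } catch {
      // Incomplete write: drop this line, keep what was already recovered.
    }
  }
  return tasks;
}
```

With this layout, a plan cut off mid-line still yields every task written before the interruption.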

3. Iterative execution

Tasks run in parallel (each agent in its own git worktree). After completing its task, each agent automatically runs a simplify pass — reviewing its own git diff for code reuse opportunities, quality issues, and inefficiencies, then fixing them before the framework commits.
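The isolation described above rests on `git worktree`, which checks out a separate working directory and branch per task so that parallel agents never edit the same files on disk. The sketch below only builds the command lines. The `.worktrees/` directory and the `overnight/` branch prefix are hypothetical names, not the tool's actual layout.

```typescript
interface WorktreeCommands {
  create: string[];
  remove: string[];
}

// Build the git invocations that give one task its own worktree and branch.
// Paths and branch names here are illustrative assumptions.
function worktreeCommands(taskId: string, baseRef: string = "HEAD"): WorktreeCommands {
  const branch = `overnight/${taskId}`;
  const path = `.worktrees/${taskId}`;
  return {
    // git worktree add -b <new-branch> <path> <commit-ish>
    create: ["git", "worktree", "add", "-b", branch, path, baseRef],
    // git worktree remove <path>, once the task's commits are merged
    remove: ["git", "worktree", "remove", path],
  };
}
```

Each argument array can be passed to a process spawner as is, which avoids shell quoting problems with unusual task IDs.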

After each wave, steering assesses: "how good is this?" — not "what's missing?" It can:

Execute more tasks to build features, fix bugs, polish UX
Reflect by spinning up 1-2 review agents for deep quality/architecture audits
Declare done when the vision is met at high quality
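The three steering outcomes above map naturally onto a tagged union. This is a sketch of how such a decision could be modeled; the type and field names (`SteeringDecision`, `kind`, `reviewers`) are hypothetical and not taken from the tool's source.

```typescript
// Hypothetical model of the three outcomes steering can choose after a wave.
type SteeringDecision =
  | { kind: "execute"; tasks: string[] }  // run more tasks
  | { kind: "reflect"; reviewers: 1 | 2 } // spin up 1-2 review agents
  | { kind: "done" };                     // vision met at high quality

function describeDecision(d: SteeringDecision): string {
  switch (d.kind) {
    case "execute":
      return `run ${d.tasks.length} more task(s)`;
    case "reflect":
      return `spin up ${d.reviewers} review agent(s)`;
    case "done":
      return "vision met, stop";
  }
}
```

Modeling the outcomes as a union lets the compiler flag any code path that forgets to handle one of them.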

4. Goal refinement

The tool starts with your broad objective but evolves its definition of "amazing" as it learns your codebase. Steering refines the goal after each wave. Late waves are informed by early discoveries.

5. Three-layer context

Long runs stay sharp because steering maintains three layers of memory:

Status — a living project snapshot, updated every wave. Compressed, never truncated.
Milestones — strategic snapshots archived every ~5 waves. Long-term memory.
Goal — the evolving north star. What "amazing" means for this codebase.
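The three layers above can be pictured as a small record updated after every wave. The sketch below is an assumption-laden model: the `RunContext` shape, the `afterWave` function, and a fixed interval of exactly 5 waves are all hypothetical (the text only says "~5 waves").

```typescript
interface RunContext {
  status: string;       // living snapshot, rewritten every wave
  milestones: string[]; // long-term memory, appended periodically
  goal: string;         // evolving definition of "amazing"
}

const MILESTONE_INTERVAL = 5; // assumed exact cadence for illustration

// Return the context as it would look after a wave completes.
function afterWave(
  ctx: RunContext,
  wave: number,
  newStatus: string,
  refinedGoal?: string
): RunContext {
  const milestones =
    wave % MILESTONE_INTERVAL === 0
      ? ctx.milestones.concat([`wave ${wave}: ${newStatus}`])
      : ctx.milestones;
  return {
    status: newStatus,
    milestones,
    goal: refinedGoal !== undefined ? refinedGoal : ctx.goal,
  };
}
```

Status is replaced rather than appended, which keeps the per-wave prompt small, while milestones grow slowly enough to stay readable over long runs.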

Run history and resume

Every run gets its own folder in .claude-overnight/runs/. Nothing is ever overwritten.

.claude-overnight/
  runs/
    2026-04-04T18-52-49/     ← run A (done, $200, 200 tasks)
      run.json, status.md, goal.md, milestones/, sessions/
    2026-04-05T10-30-00/     ← run B (crashed)
      run.json, sessions/
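The folder names in the tree look like ISO timestamps with the colons replaced, which keeps them valid on every filesystem and makes them sort chronologically. A minimal sketch of producing such a name follows; the function name `runFolderName` and the use of UTC are assumptions, since the tool may use local time.

```typescript
// Derive a run folder name such as "2026-04-04T18-52-49" from a start time.
// Assumes UTC; the real tool may format local time instead.
function runFolderName(start: Date): string {
  // toISOString() gives "2026-04-04T18:52:49.000Z"; keep the first 19
  // characters and swap ":" for "-".
  return start.toISOString().slice(0, 19).replace(/:/g, "-");
}
```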

Any run that stops before the steering system declares the objective complete (usage cap reached, Ctrl+C, crash, rate-limit timeout, or steering failure) is automatically resumable. Run the command again:

claude-overnight
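Choosing which run to offer for resume could work along these lines. The sketch assumes a hypothetical `completed` flag per run (the actual contents of run.json are not shown here) and relies on the timestamp folder names sorting chronologically as plain strings.

```typescript
interface RunSummary {
  folder: string;     // e.g. "2026-04-05T10-30-00"
  completed: boolean; // hypothetical: steering declared the objective done
}

// Return the most recent run that has not been declared complete, if any.
function latestResumable(runs: RunSummary[]): string | undefined {
  const open = runs
    .filter((r) => !r.completed)
    .map((r) => r.folder)
    .sort(); // timestamp names sort chronologically as strings
  return open.length > 0 ? open[open.length - 1] : undefined;
}
```

Applied to the two runs in the tree above, run A is complete and run B crashed, so run B would be the one offered.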
