farnsworth-loop

Name: farnsworth-loop
Author: robanderson


Run a Farnsworth Loop tournament in one of two modes.

Single pass: produce N independent solutions in parallel, then a blind Opus reviewer scores them, lists pros and cons, ranks them, and names a winner.

Two pass: the same first round, but the Opus reviewer also distils what worked and what failed into guidance; the losing attempts are discarded, a second round of N fresh attempts is run with that guidance (positives to emulate, pitfalls to avoid), the saved round one winner is added back, and a final Opus ranker picks the overall winner.

First ask the user which model to use for the attempts: Anthropic Opus/Sonnet/Haiku, a GLM z.ai model (glm-5.2/glm-5.1/glm-4.7/glm-4.5-air, run via the glm CLI), a local on-device MLX model (free, via the omlx server; list is dynamic), an OpenAI model via the codex exec CLI (gpt-5.5, pick a reasoning effort), a MiniMax M-series model (minimax-m3 via the MiniMax endpoint), Top Mixed (an even split across opus/glm-5.2/codex-high), or Mixed per-attempt. The blind reviewer/ranker is always Anthropic Opus.

Trigger on the sigil @@FL[:N][:M[:Z]] (e.g. @@FL:5, @@FL:5:2, bare @@FL) or the prose marker 'farnsworth loop:N[:M[:Z]]', case-insensitive with optional spaces. N (attempts/round) is optional and may be inferred from a prose model spec like '2 opus, 2 glm 5.2, 1 codex high' or the Top Mixed preset; M = passes (1 single, 2 two); Z = grand loops. Z=1 (or omitted) is the isolated tournament. Z>=2 (capped at Z_MAX=5) runs an UNATTENDED chain that per loop runs a tournament, implements the winning proposal into your real repo on a new FL-<loop>-<random7> branch via the Opus farnsworth-implementer agent, runs fail-closed verify, and opens one PR (draft+needs-human on failure), never auto-merged. E.g. 'do abc :farnsworth loop:5' or 'do abc @@FL:5:2'.

npx claudepluginhub robanderson/farnsworth-loop --plugin farnsworth-loop

What's Inside

Agents (8)

farnsworth-codex

/farnsworth-codex

Farnsworth Loop CODEX worker for OpenAI models via the `codex exec` CLI. A command runner: it executes the single benign shell command handed to it (which writes a brief file and runs the bundled farnsworth-loop codex runner script, performing the attempt on an OpenAI model via `codex exec`) and relays the result. It NEVER solves the task itself. One generic agent handles every codex effort level — the exact model/effort is in the command. Invoked only by the farnsworth-loop tournament; not a general-purpose agent.

farnsworth-glm-4-5-air

/farnsworth-glm-4-5-air

Farnsworth Loop GLM worker for the z.ai model glm-4.5-air. A command runner: it executes the single benign shell command handed to it (which writes a brief file and runs the bundled farnsworth-loop GLM runner script, performing the attempt on glm-4.5-air via z.ai) and relays the result. It NEVER solves the task itself. Invoked only by the farnsworth-loop tournament; not a general-purpose agent.

farnsworth-glm-4-7

/farnsworth-glm-4-7

Farnsworth Loop GLM worker for the z.ai model glm-4.7. A command runner: it executes the single benign shell command handed to it (which writes a brief file and runs the bundled farnsworth-loop GLM runner script, performing the attempt on glm-4.7 via z.ai) and relays the result. It NEVER solves the task itself. Invoked only by the farnsworth-loop tournament; not a general-purpose agent.

farnsworth-glm-5-1

/farnsworth-glm-5-1

Farnsworth Loop GLM worker for the z.ai model glm-5.1. A command runner: it executes the single benign shell command handed to it (which writes a brief file and runs the bundled farnsworth-loop GLM runner script, performing the attempt on glm-5.1 via z.ai) and relays the result. It NEVER solves the task itself. Invoked only by the farnsworth-loop tournament; not a general-purpose agent.

farnsworth-glm-5-2

/farnsworth-glm-5-2

Farnsworth Loop GLM worker for the z.ai model glm-5.2. A command runner: it executes the single benign shell command handed to it (which writes a brief file and runs the bundled farnsworth-loop GLM runner script, performing the attempt on glm-5.2 via z.ai) and relays the result. It NEVER solves the task itself. Invoked only by the farnsworth-loop tournament; not a general-purpose agent.

Skills (2)

farnsworth-bench

/farnsworth-bench

Benchmark generation throughput (cold vs hot tok/s) for every model the farnsworth-loop system can call (Anthropic / GLM / local MLX / codex / MiniMax). Two workload profiles — light (tiny paragraph) and heavy (>5k-token input context + long >5k-token output, representative of coding/agentic work). Thin wrapper over bin/fl-bench.mjs. Use when the user asks to benchmark model speed, measure tokens/second, compare cold vs hot throughput across providers, or run /fl-bench.

farnsworth-loop

/farnsworth-loop

Run a Farnsworth Loop tournament in one of two modes.

The sigil is @@FL[:N][:M[:Z]]: N (optional) = attempts per round, M = passes (1 single, 2 two), Z = grand loops. Z>=2 is an UNATTENDED chain that, per loop, runs a full tournament, implements the winning proposal into your real repo on a new FL-<loop>-<random7> branch, runs fail-closed verify, and opens one PR, never auto-merged. Z=1 or omitted is the isolated tournament, and Z is capped at Z_MAX=5. N may be inferred from a prose model spec like '2 opus, 2 glm 5.2, 1 codex high' (sum of counts = N, the items become the per-attempt Mixed assignment) or the Top Mixed preset ('top mixed' + N spread over opus/glm-5.2/codex-high), and bare @@FL falls back to the interactive model gate.

First ask the user which model to use for the attempts (Anthropic Opus, Sonnet, Haiku; a GLM z.ai model via the glm CLI; a free local on-device MLX model via the omlx server; or Mixed per-attempt).

SINGLE PASS: produce N independent solutions in parallel, then a blind Opus reviewer scores them, lists pros and cons, ranks them, and names a winner.

TWO PASS: the same first round, but the Opus reviewer also distils what worked and what failed into guidance; the losing attempts are discarded, a second round of N fresh attempts is run with that guidance (positives to emulate, pitfalls to avoid), the saved round one winner is added back, and a final Opus ranker picks the overall winner.

Trigger whenever the user's message contains a sigil of the form @@FL:N:M (for example @@FL:5, @@FL:5:2, @@fl:7:2), where N is the number of attempts per round and M is the number of passes (omitted or 1 = single pass, 2 = two pass); the text before the sigil is the task. Also trigger on the prose marker 'farnsworth loop:N' (single pass) or 'farnsworth loop:N:2' (two pass), e.g. 'do abc :farnsworth loop:5' or 'do abc: farnsworth loop:5:2'. All forms are case-insensitive with optional spaces around the colons. Also trigger when the user clearly asks for a farnsworth loop / generate-and-rank tournament even without a marker.
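The inference of N from a prose model spec can be pictured with a short Python sketch. It is illustrative only and not the plugin's code: the function names, the hyphenated label format, and the round-robin handling of a remainder in Top Mixed are all assumptions made here.

```python
from typing import List

# The three members of the Top Mixed preset named in the description.
TOP_MIXED = ["opus", "glm-5.2", "codex-high"]


def parse_model_spec(spec: str) -> List[str]:
    """Turn '2 opus, 2 glm 5.2, 1 codex high' into one label per attempt.

    The length of the returned list is N (the sum of the counts).
    Joining name words with '-' is an assumption of this sketch.
    """
    assignment: List[str] = []
    for item in spec.split(","):
        count, _, name = item.strip().partition(" ")
        label = name.strip().replace(" ", "-")
        assignment.extend([label] * int(count))  # ValueError if the count is missing
    return assignment


def top_mixed(n: int) -> List[str]:
    """Spread n attempts over TOP_MIXED round-robin (remainder rule is assumed)."""
    return [TOP_MIXED[i % len(TOP_MIXED)] for i in range(n)]


if __name__ == "__main__":
    print(parse_model_spec("2 opus, 2 glm 5.2, 1 codex high"))
    print(top_mixed(5))
```

When N is not a multiple of three, the round-robin rule gives the extra attempts to the earlier entries; the real preset may break ties differently.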

README

Farnsworth Loop

"Good news, everyone!"

Farnsworth Loop in action

Farnsworth Loop is a Claude Code plugin that runs best-of-N tournaments. You hand it a task; it produces N independent attempts in parallel, then a blind Anthropic Opus reviewer scores them, lists pros and cons, ranks them, and names a winner. The attempts can come from any mix of providers (Anthropic, GLM, on-device MLX, OpenAI Codex, MiniMax); the judge is always Opus, held fixed so the comparison stays honest.

@@FL:5  Build a CLI that flattens nested JSON to dotted keys.

That one line triggers the loop: it asks which model(s) to run the 5 attempts on, you answer, it fans out 5 isolated workers, and a blind Opus reviewer crowns a winner. Add :2 for two passes (a guided second round), or a :Z for grand loops (an unattended chain that implements each winner into a real branch and opens a PR).
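To make the trigger grammar concrete, here is a minimal Python sketch of a parser for the sigil and prose forms. It is not the plugin's implementation. `MARKER`, `Invocation`, and `parse_invocation` are hypothetical names, and the sketch assumes that the first number is always N, that M and Z default to 1, and that whatever text surrounds the marker is the task.

```python
import re
from dataclasses import dataclass
from typing import Optional

Z_MAX = 5  # cap on grand loops, as stated in the plugin description

# Matches '@@FL' or 'farnsworth loop', followed by up to three ':number' parts.
# Case-insensitive, with optional spaces around the colons.
MARKER = re.compile(
    r"(?:@@FL|farnsworth\s+loop)"
    r"(?:\s*:\s*(\d+))?(?:\s*:\s*(\d+))?(?:\s*:\s*(\d+))?",
    re.IGNORECASE,
)


@dataclass
class Invocation:
    task: str
    attempts: Optional[int]  # N; None means infer it or ask the user
    passes: int              # M: 1 = single pass, 2 = two pass (not validated here)
    grand_loops: int         # Z: 1 = isolated tournament, clamped to Z_MAX


def parse_invocation(message: str) -> Optional[Invocation]:
    """Return the parsed invocation, or None if the message has no marker."""
    m = MARKER.search(message)
    if m is None:
        return None
    n, passes, z = (int(g) if g else None for g in m.groups())
    # Assumption: the task is everything around the marker, minus stray colons.
    task = (message[: m.start()] + " " + message[m.end():]).strip(" :")
    return Invocation(
        task=task,
        attempts=n,
        passes=passes or 1,
        grand_loops=min(z or 1, Z_MAX),
    )


if __name__ == "__main__":
    print(parse_invocation("@@FL:5  Build a CLI that flattens nested JSON to dotted keys."))
    print(parse_invocation("do abc: farnsworth loop:5:2"))
```

A bare `@@FL` yields `attempts=None`, which corresponds to falling back to the interactive model gate.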

The core idea
Single pass vs two pass
Invoking it: the sigil and prose forms
Model providers
Diversity injection
Grand loops (Z >= 2)
The dogfood backlog
Installation & setup
The benchmarking system (fl-bench)
Repository layout
Honest limitations

The core idea

A single LLM attempt at a task is one sample from a noisy distribution. The Farnsworth Loop spends tokens to do better than one sample, in two specific ways:

1. Generate, don't iterate. Run N attempts in parallel, each a single-pass exploration: every attempt writes its solution once and stops. No attempt is told it's competing or being judged; none sees another's work. The refinement happens at the tournament level (many diverse one-shots → review), never inside a single attempt grinding "until it works." A rough or even failed attempt is useful signal, not a wasted slot.
2. Judge blind, with a fixed strong judge. One Anthropic Opus reviewer receives the deliverables labelled Candidate A, B, C, … with no model identities attached. It reads (and where feasible runs) each one, scores against task-appropriate criteria, lists concrete pros and cons, ranks them, and names a winner with reasoning. Because the judge never learns which model produced which candidate, a cheap model can win on merit, and the engine takes mechanical steps (below) to keep that blindness real.

The attempts are deliberately diverse: different model families, sampling stochasticity, and a per-attempt framing nudge (diversity injection) all push the N solutions apart so the review has genuinely different things to compare.
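The blind labelling described above can be sketched in a few lines of Python. This is an illustration under assumptions, not the engine's actual mechanism: `blind_label` is a hypothetical helper, and shuffling before labelling is one plausible way to stop the letter order from leaking which model ran first.

```python
import random
import string
from typing import Dict, List, Tuple


def blind_label(
    attempts: List[Tuple[str, str]], rng: random.Random
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Label (model, deliverable) pairs as Candidate A, B, C, ...

    Returns (candidates, key). `candidates` maps label -> deliverable and is
    all the reviewer sees. `key` maps label -> model and stays hidden until
    the ranking is final.
    """
    if len(attempts) > len(string.ascii_uppercase):
        raise ValueError("this sketch only labels up to 26 candidates")
    shuffled = list(attempts)
    rng.shuffle(shuffled)  # assumption: order is randomised before labelling
    candidates: Dict[str, str] = {}
    key: Dict[str, str] = {}
    for letter, (model, deliverable) in zip(string.ascii_uppercase, shuffled):
        label = f"Candidate {letter}"
        candidates[label] = deliverable
        key[label] = model
    return candidates, key


if __name__ == "__main__":
    demo = [("opus", "solution one"), ("glm-5.2", "solution two")]
    visible, hidden = blind_label(demo, random.Random())
    print(visible)  # what the reviewer receives: no model names
```

Keeping the label-to-model key in a separate mapping means the reviewer's input can be built from `candidates` alone.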

Single pass vs two pass

The two modes share one spine. Two pass is single pass plus a learning step in the middle.

Phase	Single pass	Two pass
Round 1 attempts	N parallel, isolated, one diversity nudge each	identical
Blind Opus review	scores, ranks, names winner → this is the result	scores, ranks, names round-1 winner and distils guidance
Carry / discard	—	save the winner's deliverable; discard every other artifact, keep only the distilled lessons
Round 2 attempts	—	N fresh attempts given the task + guidance (positives to emulate, pitfalls to avoid) — but never round-1 code
Final rank	—	pool = N round-2 attempts + the saved round-1 winner (N+1), re-labelled blind, one Opus ranker picks overall winner

The distilled guidance is two short lists — positives to consider and challenges to avoid — phrased as generic principles, each tagged [strong] (held up repeatedly) or [tentative] (a single sighting), with no candidate-specific code.
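One way to picture the shape of that guidance is as a small data structure. The Python sketch below is an assumption about representation only: `Lesson`, `Guidance`, and `render` are hypothetical names, and the two example lessons are invented for illustration rather than taken from any real review.

```python
from dataclasses import dataclass, field
from typing import List, Literal


@dataclass
class Lesson:
    principle: str  # a generic principle, with no candidate-specific code
    confidence: Literal["strong", "tentative"]


@dataclass
class Guidance:
    positives: List[Lesson] = field(default_factory=list)
    challenges: List[Lesson] = field(default_factory=list)

    def render(self) -> str:
        """Format the two lists as plain text for the round-2 brief."""
        lines = ["Positives to consider:"]
        lines += [f"- [{x.confidence}] {x.principle}" for x in self.positives]
        lines.append("Challenges to avoid:")
        lines += [f"- [{x.confidence}] {x.principle}" for x in self.challenges]
        return "\n".join(lines)


if __name__ == "__main__":
    # Invented example lessons, for illustration only.
    guidance = Guidance(
        positives=[Lesson("Handle empty input explicitly", "strong")],
        challenges=[Lesson("Avoid unbounded recursion", "tentative")],
    )
    print(guidance.render())
```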

Why two pass discards the losing code but keeps the lessons: showing round 2 any round-1 code, even the winner's, would invite copying and collapse the diversity that makes the loop work. Re-using the distilled pros and cons keeps diversity while raising the floor. The saved round-1 champion then competes blind in the final pool on its merits: its work is no worse for having been written without the guidance, and if a guided round-2 attempt is genuinely better, it should win.
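The carry-and-discard step can be sketched as a small Python function. It is a simplified illustration, not the plugin's orchestration: `two_pass` and its three callables are hypothetical stand-ins for the attempt workers, the Opus reviewer, and the Opus ranker, attempts run sequentially here rather than in parallel, and blind relabelling is left out.

```python
from typing import Callable, List, Tuple

Attempt = Callable[[str, str], str]              # (task, guidance) -> deliverable
Review = Callable[[List[str]], Tuple[int, str]]  # deliverables -> (winner index, guidance)
Rank = Callable[[List[str]], int]                # pool -> index of overall winner


def two_pass(task: str, n: int, attempt: Attempt, review: Review, rank: Rank) -> str:
    """Run a two-pass tournament and return the overall winning deliverable."""
    # Round 1: n attempts with no guidance.
    round1 = [attempt(task, "") for _ in range(n)]
    winner_index, guidance = review(round1)

    # Carry / discard: keep only the winner's deliverable and the lessons.
    saved_winner = round1[winner_index]
    del round1

    # Round 2: n fresh attempts that see the guidance but never round-1 code.
    round2 = [attempt(task, guidance) for _ in range(n)]

    # Final pool of n + 1: the round-2 attempts plus the saved round-1 winner.
    pool = round2 + [saved_winner]
    return pool[rank(pool)]
```

Passing the workers in as callables keeps the sketch runnable with simple stubs, and it makes the N+1 pool size easy to check in isolation.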

SINGLE PASS
  task ──▶ [N attempts] ──▶ blind Opus review ──▶ winner ✓
