From farnsworth-loop
Benchmark generation throughput (cold vs hot tok/s) for every model the farnsworth-loop system can call (Anthropic / GLM / local MLX / codex / MiniMax). Two workload profiles — light (tiny paragraph) and heavy (>5k-token input context + long >5k-token output, representative of coding/agentic work). Thin wrapper over bin/fl-bench.mjs. Use when the user asks to benchmark model speed, measure tokens/second, compare cold vs hot throughput across providers, or run /fl-bench.
Slash command: /farnsworth-loop:farnsworth-bench
Thin wrapper over bin/fl-bench.mjs. It measures tokens/second for each
selected model on a cold call and an immediate hot call, prints a table,
and appends every result to <plugin>/.bench/results.jsonl.
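
As a rough illustration of the measurement described above, the sketch below computes tokens/second for a cold and a hot call and serialises one result as a JSONL line. The function names and the row fields (`model`, `profile`, `coldTokPerSec`, `hotTokPerSec`) are assumptions made for illustration only; the row format actually written by bin/fl-bench.mjs may differ.

```javascript
// Hypothetical helper: throughput in tokens/second from an output-token
// count and an elapsed wall-clock time in milliseconds.
function tokensPerSecond(outputTokens, elapsedMs) {
  if (elapsedMs <= 0) return 0; // guard against dividing by zero on a bad timing
  return outputTokens / (elapsedMs / 1000);
}

// Hypothetical row shape (NOT the documented results.jsonl schema).
// Each call record is assumed to carry { outputTokens, elapsedMs }.
function toJsonlRow(model, profile, cold, hot) {
  const row = {
    model,
    profile,
    coldTokPerSec: tokensPerSecond(cold.outputTokens, cold.elapsedMs),
    hotTokPerSec: tokensPerSecond(hot.outputTokens, hot.elapsedMs),
  };
  // One JSON object per line is what makes the file JSONL.
  return JSON.stringify(row) + "\n";
}
```

Under these assumptions, a row could be appended to a results file with Node's `fs.appendFileSync(path, toJsonlRow(...))`.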
Resolve the plugin root (the dir containing plugin.json for farnsworth-loop),
then run the benchmark script with node. Pass the user's selection through
verbatim; default to --models all.
node "<plugin-root>/bin/fl-bench.mjs" --models <selection> [--profile light|heavy]
<selection> (comma-separated, de-duped):
- all: every callable model (local MLX list discovered live). Default.
- anthropic | glm | local | codex | minimax
- <provider>:<id>, e.g. glm:glm-5.1, codex:codex-high, anthropic:opus, local:<omlx-id>
- opus, glm-5.2, minimax-m3, codex-high, a local id
Profiles (--profile, default light; shorthand --heavy / --light):
- light: a ~200-word paragraph; fast/cheap throughput smoke test (output cap 2048).
- heavy: a representative coding/agentic workload, made up of a fixed >5k-token input context plus an instruction that elicits a long structured deliverable (>5k-token decode), with output cap 8192 and longer timeouts. Use this when the light profile's few-hundred-token decode is too small to characterise real coding throughput.
The profile name is stored on every result row.
Useful flags: --list (dry-run; prints the resolved plan + profile and makes NO model calls, so it is a cheap way to confirm the selection before spending), --timeout <secs>, --help.
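
The invocation and selection rules above can be sketched as follows. This is a minimal illustration, not the script's own code: `normalizeSelection` and `buildArgs` are hypothetical names, and the de-duplication shown here only demonstrates what "comma-separated, de-duped" means (the wrapper itself passes the selection through verbatim).

```javascript
// Hypothetical illustration of "comma-separated, de-duped":
// split on commas, trim whitespace, drop empty entries, keep first occurrences.
function normalizeSelection(selection) {
  const seen = new Set(); // a Set preserves insertion order
  for (const part of String(selection).split(",")) {
    const id = part.trim();
    if (id) seen.add(id);
  }
  return [...seen];
}

// Hypothetical helper: build the argument list for `node`, using only the
// flags named in this document (--models, --profile, --list).
function buildArgs(pluginRoot, { models = "all", profile = "light", list = false } = {}) {
  if (profile !== "light" && profile !== "heavy") {
    throw new Error(`unknown profile: ${profile}`);
  }
  // The selection is passed through verbatim, as the skill instructs.
  const args = [`${pluginRoot}/bin/fl-bench.mjs`, "--models", models, "--profile", profile];
  if (list) args.push("--list"); // dry run: no model calls
  return args;
}
```

The resulting array could then be handed to Node's `child_process.spawn("node", args)`; building it with `list: true` first mirrors the dry-run-before-spending advice.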
- Run --list with the same --models selection and show the user the plan before the real (paid) sweep.
- Keys: ZAI_API_KEY, MINIMAX_API_KEY, OMLX_AUTH_TOKEN; Anthropic uses the session's own auth; codex uses ~/.codex/auth.json. A provider whose key is unset is recorded as a failed row and the sweep continues; surface those rows.
- See <plugin>/.bench/results.jsonl for history.
- "*": estimated-token note (codex fallback). The table shows cIn (cold input tokens) and cOut/hOut (cold/hot output tokens); under --profile heavy, confirm cIn and the output columns are both comfortably over 5k.
See bin/README.fl-bench.md for the full usage and results-format reference.
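
The two post-run checks above (surface failed rows; confirm heavy-profile token counts exceed 5k) could be automated along these lines. This is a sketch under stated assumptions: the field names `ok`, `profile`, `cIn`, `cOut` and `hOut` are borrowed from the table column names and are hypothetical as JSONL keys; consult bin/README.fl-bench.md for the real results format.

```javascript
// Parse JSONL text into an array of row objects, skipping blank lines.
function parseJsonl(text) {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
}

// Hypothetical review step. Assumes each row has:
//   ok (boolean), profile ("light" | "heavy"), cIn, cOut, hOut (token counts).
function reviewRows(rows, minTokens = 5000) {
  // Rows for providers that failed (e.g. an unset key) should be surfaced.
  const failed = rows.filter((r) => r.ok === false);
  // Successful heavy rows whose input or output counts are not over the threshold.
  const undersizedHeavy = rows.filter(
    (r) =>
      r.ok !== false &&
      r.profile === "heavy" &&
      !(r.cIn > minTokens && r.cOut > minTokens && r.hOut > minTokens)
  );
  return { failed, undersizedHeavy };
}
```

A caller would read the results file as text, pass it through `parseJsonl`, and report both lists to the user.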
npx claudepluginhub robanderson/farnsworth-loop --plugin farnsworth-loop