From code-crew
Use the famous-programmer Code Crew for code review, design critique, refactoring judgment, and multi-lens engineering decisions. Default crew Knuth+Hickey+Torvalds with mandatory diff-grounded verifier between blind passes and synthesis.
How this skill is triggered — by the user, by Claude, or both
Slash command
/code-crew:code-crewThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Use Code Crew when the user asks for a serious software review, architecture critique, refactoring decision, or "what would the crew say" analysis.
Use Code Crew when the user asks for a serious software review, architecture critique, refactoring decision, or "what would the crew say" analysis.
Invocation note: when a user selects this skill from Codex or writes $code-crew ..., Code Crew is already invoked. Do not search for a separate Code Crew callable tool, and do not report "no callable Code Crew tool" as a blocker. Code Crew is a skill/procedure package: use the host's generic subagent facility when it exists, such as Codex spawn_agent or Claude Code's Agent tool. If the host lacks subagent dispatch, label the result as a single-context approximation and continue with the verifier and synthesis rules that can be applied locally.
Code Crew uses reasoning archetypes inspired by famous programmers and computer scientists. These are not impersonations, endorsements, or claims to represent the real people. They are named review contracts: each lens focuses attention on a different kind of engineering failure.
1. triage → procedures/triage.md ← which crew?
2. discipline → procedures/implementation-discipline.md ← only when changing code
3. briefs → briefs/<persona>_agent.md ← per-persona blind passes
4. verify → procedures/verify.md ← drop fabrications against the diff
5. synth → procedures/synthesis.md ← decision-changing findings only
6. artifact → procedures/artifact-format.md ← only if asked
Each step is in a file. Load it when you reach that step. Do not skip the verifier — prior runs measured ~64% of K+H+T synthesis findings as FABRICATED by the judge on SWE-PRBench, and the verifier is the precision gate designed to drop them before synthesis.
When the user just says "review this" without naming a lens, use K+H+T:
In our SWE-PRBench experiments (n=50 PRs, paired binomial test) this 3-persona crew outperformed the full 6-persona sextet on raw recall (+6.4pp, p=0.047), precision (0.106 vs 0.084), and fabrication rate (0.645 vs 0.679). It also held its lead against 8 other tested triples. Treat that as evidence for this preset, not proof famous names alone improve outputs.
The skill ships these files; load them on demand, not always-on:
Persona briefs (briefs/): the full archetypes used in the experiments.
briefs/knuth_agent.mdbriefs/hickey_agent.mdbriefs/torvalds_agent.mdbriefs/dijkstra_agent.mdbriefs/liskov_agent.mdbriefs/pike_agent.mdProcedures (procedures/): the steps of a review.
procedures/triage.md — pick the crew before any pass runsprocedures/implementation-discipline.md — scope, assumptions, and verification loop for code-changing follow-upsprocedures/verify.md — diff-grounded gate; mandatory between blind passes and synthesisprocedures/synthesis.md — final review writeup with severity rankingprocedures/artifact-format.md — runs/YYYY-MM-DD-topic_host/ layout if persistence is requestedExamples (examples/): optional calibration material.
examples/review-examples.md — good/bad examples for crew runs, findings, verifier output, and implementation follow-upFor a default K+H+T run, you will load: triage.md (briefly), 3 briefs (one per persona ≈ 25 KB combined), verify.md, synthesis.md. Total ≈34 KB read on-demand. Always-on cost of this skill is just this SKILL.md (~100 tok per session per Claude Code's projection); the on-demand content stays out of the always-on context.
These are not stylistic guidance — they are conditions for calling the output a Code Crew review.
file:line and a quote (or visible diff line). Bare claims like "consider adding tests" are not findings.verify.md. Candidate findings from the persona passes do not appear in the final review unless they pass the verifier. Default to reject when uncertain.reasoning_effort: xhigh on each spawn_agent call. In Claude Code, prefer the highest-thinking model variant or the maximum thinking.budget_tokens the host allows. If the host exposes no such control, proceed at the default and note "ran at host-default reasoning budget" in the Verification block. A K+H+T crew run at minimal-thinking budgets is not what the experiments measured and should not be labelled a formal crew run.procedures/implementation-discipline.md before editing. The implementation must state assumptions, keep the diff tied to the request, and verify with concrete checks.When the user explicitly asks for one persona:
Solo passes still run the verifier. Label single-lens output clearly; do not present it as a crew synthesis.
The final review emitted by procedures/synthesis.md:
## Findings
- [Critical] file:line — Finding. Evidence: <quoted line or citation>. Lens: Knuth/Hickey/Torvalds.
- [High] file:line — ...
## Disagreement
Material disagreement only. Omit when the lenses agree.
## Recommendation
LAND / LAND WITH FIXES / REQUEST CHANGES / REDESIGN / NEEDS MORE EVIDENCE
## Verification
What was actually checked. "Not run" with reason if nothing ran.
This skill does not authorize destructive operations, commits, pushes, deploys, external comments, or package publication. Ask for explicit user approval before any irreversible or external action.
This skill is prompt-only. It does not install dependencies, contact services, run code, or modify files outside what the user explicitly asks for as implementation work.
Empirically tested scope: PR / diff review. The K+H+T result and the 64%-fabrication number both come from SWE-PRBench experiments at n=50 PRs with paired sign tests, all on PR-review tasks. The reported recall numbers are raw recall; the pre-registered fixed-precision metric was not computable because all arms scored far below the 0.70 precision threshold under the skeptic judge. K+H+T is the best tested default we have for PR review, not a proven global optimum (10 of 20 possible triples were tested).
Broader uses (design critique, file review, refactoring judgment) are supported by the persona briefs and run in source-mode of the verifier, but the empirical claims do not transfer. We have no measurement of K+H+T vs other crews on design critique or file review, and the source-mode verifier is a logical adaptation of the diff-mode rubric, not an independently tested procedure.
The verifier procedure is recommended on the basis of the same experiments showing a 64% fabrication rate without it; the empirical effect of adding this verifier on that rate has not yet been independently measured in this repo.
Creates, edits, and optimizes skills for Claude Code, including drafting, evaluating with test prompts, iterating on performance, and improving skill descriptions for better triggering accuracy.
npx claudepluginhub gmentat/code-crew --plugin code-crew