Code Crew
A downloadable famous-programmer code review crew for Codex, Claude Code, Hermes, Cursor, and AgentSkills-compatible agents.
Code Crew packages named software-engineering reasoning lenses for working through code quality, system design, and code improvement. Think of it as a reusable review discipline you can ask for when a diff deserves more than one opinion. Foreman dispatches the right persona specs as independent subagents, each follows its process, and Foreman synthesizes the returned reports.
This is a review-and-improvement shop for software. The crew is designed to attack a piece of code or a system from multiple angles, with built-in disagreement so you don't get a comfortable consensus that misses what's wrong.
Problems Code Crew Solves
| Problem | Code Crew response |
|---|
| Single-pass reviews miss classes of defects | Knuth + Hickey + Torvalds (K+H+T: Donald Knuth, Rich Hickey, Linus Torvalds) blind passes cover rigor, simplicity, and maintainer reality separately |
| Persona roleplay fabricates plausible findings | Mandatory diff-grounded verifier drops uncited claims before synthesis |
| Bigger crews feel thorough but degrade output | Default stays at the measured 3-persona preset; sextet is opt-in |
| Agent edits drift into drive-by refactors | Implementation discipline requires stated assumptions, surgical scope, and concrete verification |
Install
The canonical distributable package is plugins/code-crew/. It is prompt-only: it adds reusable skill instructions and persona briefs, not running software.
Codex:
codex plugin marketplace add gmentat/code-crew
codex plugin add code-crew@code-crew
The first command tells Codex where this repo's plugin marketplace lives. Most users run it once per Codex profile or machine; after that, the second command installs code-crew from that marketplace.
Codex installs plugins/code-crew/: a .codex-plugin/plugin.json manifest, the code-crew skill, and the full core persona briefs bundled under skills/code-crew/briefs/.
Claude Code:
claude plugin marketplace add gmentat/code-crew --sparse .claude-plugin plugins
claude plugin install code-crew
Hermes:
hermes skills install https://raw.githubusercontent.com/gmentat/code-crew/main/plugins/code-crew/skills/code-crew/SKILL.md \
--category software-development \
--name code-crew \
--yes
OpenClaw-compatible runtimes: the package includes plugins/code-crew/openclaw.plugin.json, but this repository does not yet publish a locally verified OpenClaw CLI install command.
Cursor: this repo includes a project rule at .cursor/rules/code-crew.mdc. The plugin package also includes a copyable template at plugins/code-crew/cursor/code-crew.mdc. It is an explicit-invocation rule, not an always-on reviewer. See CURSOR.md.
For local development and fallback manual installs, see INSTALL.md.
After install
Installing Code Crew makes the code-crew skill available to the host agent. It does not run in the background or review every change automatically; you ask for it when you want the crew.
In Codex, use it in one of three ways:
- Type
/skills, choose code-crew, then write the review request.
- Invoke it explicitly in the prompt:
$code-crew review the current diff.
- Ask naturally:
Use Code Crew to review this PR with the default Knuth + Hickey + Torvalds crew.
In fresh Codex sessions the installed skill may appear as code-crew:code-crew. That is expected. Code Crew is not exposed as a callable review tool; it is a skill the host agent loads when selected or invoked.
Codex may route clearly matching review or architecture prompts to the skill based on its SKILL.md name and description, but users should not rely on it running automatically for every coding task. The predictable path is explicit invocation.
Code Crew does not currently provide /code-crew commands. If command support becomes useful, it should be a thin convenience layer over the same skill, not a separate behavior path.
What's measured
Headlines from experiments/ (read each run's SUMMARY.md for full details):
pp means percentage points. For example, +7pp means recall moved from roughly 20% to 27%, not a 7% relative increase.
The recall headlines below are raw recall. The original fixed-precision primary metric was not computable in the main run because all arms scored far below the 0.70 precision threshold under the skeptic judge, so the public claims use the raw recall analysis and carry that caveat.