From crucible
Multi-LLM adversarial project review. Evaluates progress, tech stack, code quality, CI/CD, pipeline, and plan drift through parallel critique from multiple AI models. Use when user says "crucible", "project review", "adversarial review", "roast my project", "health check", "plan drift", "tech audit", "code audit".
How this skill is triggered — by the user, by Claude, or both
Slash command
/crucible:crucibleThis skill is limited to the following tools:
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
**If the user asks to see the banner, NFO, or splash screen, display the following:**
If the user asks to see the banner, NFO, or splash screen, display the following:
░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░
░ ░
░ ▄████▄ ██▀███ █ ██ ▄████▄ ██▓ ▄▄▄▄ ██▓ ░
░ ▒██▀ ▀█ ▓██ ▒ ██▒██ ▓██▒██▀ ▀█ ▓██▒▓█████▄▓██▒ ░
░ ▒▓█ ▄▓██ ░▄█ ▒██ ▒██▒▓█ ▄▒██▒▒██▒ ▄█▒██░ ░
░ ▒▓▓▄ ▄██▒██▀▀█▄ ░██ ░██▒▓▓▄ ▄██░██░▒██░█▀ ▒██░ ░
░ ▒ ▓███▀ ░██▓ ▒██░ ████▓▒▒ ▓███▀ ░██░░▓█ ▀█░██████░
░ ░ ░▒ ▒ ░ ▒▓ ░▒▓░ ▒░▒░▒░░ ░▒ ▒ ░▓ ░▒▓███▀░ ▒░▓ ░
░ ░ ▒ ░▒ ░ ▒░ ░ ▒ ▒░ ░ ▒ ▒ ░▒░▒ ░░ ░ ▒ ░
░ ░
░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░
╔═══════════════════════════════════════════════════════╗
║ RELEASE INFORMATION ║
╠═══════════════════════════════════════════════════════╣
║ ║
║ Skill.......: crucible ║
║ Author......: machug (hughtec.com) ║
║ Version.....: 0.1.0 ║
║ Released....: 2026 ║
║ License.....: MIT ║
║ Requires....: Python 3.10+, litellm ║
║ ║
╠═══════════════════════════════════════════════════════╣
║ PIPELINE OVERVIEW ║
╠═══════════════════════════════════════════════════════╣
║ ║
║ gather ──> review ──> challenge ──> synthesize ║
║ ║
║ Adversarial project review via multi-LLM panel. ║
║ Each model plays a different adversarial persona. ║
║ N models enter. 1 honest report leaves. ║
║ ║
╚═══════════════════════════════════════════════════════╝
Multi-LLM adversarial project review. Gathers project context and sends it to multiple AI models, each reviewing through a different adversarial persona and dimension. Produces a structured report with severity ratings and actionable recommendations.
Important: Claude is an active participant in this review, not just an orchestrator. You (Claude) will provide your own independent assessment alongside the external models, challenge findings you disagree with, and synthesize the final report.
litellm package installedIMPORTANT: Do NOT install the llm package. This skill uses litellm for API providers.
| Provider | API Key Env Var | Example Models |
|---|---|---|
| OpenAI | OPENAI_API_KEY | gpt-5.4, gpt-5.4-pro, o3-pro, o4-mini |
| Anthropic | ANTHROPIC_API_KEY | claude-opus-4-6, claude-sonnet-4-6 |
GEMINI_API_KEY | gemini/gemini-3.1-pro-preview, gemini/gemini-2.5-pro | |
| xAI | XAI_API_KEY | xai/grok-4.20-0309-reasoning, xai/grok-4-0709 |
| Azure AI | AZURE_AI_API_KEY | foundry/<deployment-name> |
| Mistral | MISTRAL_API_KEY | mistral/mistral-large |
| Groq | GROQ_API_KEY | groq/llama-3.3-70b-versatile |
| OpenRouter | OPENROUTER_API_KEY | openrouter/openai/gpt-5.2-pro |
| Deepseek | DEEPSEEK_API_KEY | deepseek/deepseek-chat |
| ZAI (GLM) | ZAI_API_KEY | zai/glm-5.1, zai/glm-5-turbo |
| Moonshot | MOONSHOT_API_KEY | moonshot/kimi-k2.5 |
| Codex CLI | (ChatGPT subscription) | codex/gpt-5.3-codex |
| Gemini CLI | (Google account) | gemini-cli/gemini-3.1-pro-preview |
Discover latest models: Run cd ${CLAUDE_PLUGIN_ROOT}/skills/crucible/scripts && python3 crucible.py discover-models
Run cd ${CLAUDE_PLUGIN_ROOT}/skills/crucible/scripts && python3 crucible.py providers to see which keys are set.
Each dimension is a focused lens for project evaluation:
| Dimension | Focus |
|---|---|
progress | Plan drift, milestone tracking, velocity, scope creep |
tech-stack | Framework choices, architecture, dependencies, complexity |
code-quality | Type safety, test coverage, dead code, error handling |
pipeline | CI/CD health, deploy frequency, build reproducibility |
security | Dependency vulns, secret management, OWASP risks |
devex | Onboarding friction, README accuracy, project structure |
Each model is assigned a different adversarial persona for maximum critique diversity:
| Persona | Perspective |
|---|---|
investor | Skeptical VC doing due diligence — progress vs burn, PMF evidence |
staff-engineer | Seen it all — architecture sustainability, tech debt, scaling risks |
devops-lead | Will be paged at 3am — deploy reliability, observability, rollback |
qa-skeptic | Zero tolerance — test coverage, edge cases, regression prevention |
new-hire | Day one developer — setup friction, documentation accuracy, tribal knowledge |
Automatically collect project context. This is the critical input — the richer the context, the better the review. Gather ALL of the following that exist:
Git & History:
git log --oneline -30
git log --format="%h %s" --since="2 weeks ago"
git diff --stat HEAD~10..HEAD
git branch -a
Project Structure:
ls -la
find . -name "*.json" -maxdepth 2 -not -path "*/node_modules/*" | head -20
Dependencies (read the relevant files):
package.json / requirements.txt / Cargo.toml / go.modCI/CD Configuration (read if they exist):
.github/workflows/ / Jenkinsfile / .gitlab-ci.yml / vercel.jsonIssue Tracker:
bd list --status=open and bd statsCode Metrics (sample):
find . -name "*.ts" -not -path "*/node_modules/*" | wc -lfind . -name "*.test.*" -o -name "*.spec.*" | head -10tsconfig.json, mypy.ini, .eslintrcBuild & Deploy:
npm run / yarn / make to see available scriptsCompile ALL gathered context into a single JSON structure:
{
"project_name": "...",
"gathered_at": "ISO timestamp",
"git": {
"recent_commits": "...",
"branches": "...",
"recent_diff_stats": "..."
},
"structure": {
"top_level_files": "...",
"file_counts_by_type": {}
},
"dependencies": {
"package_json": "... or null",
"lock_file_exists": true
},
"ci_cd": {
"github_actions": "... or null",
"deploy_config": "... or null"
},
"issues": {
"open_count": 0,
"in_progress": [],
"recent_closed": []
},
"code_quality": {
"test_files_count": 0,
"type_config_exists": true,
"lint_config_exists": true
}
}
Write this context to a temporary file (e.g., /tmp/crucible-context-{project}.json).
First, discover available models:
cd ${CLAUDE_PLUGIN_ROOT}/skills/crucible/scripts && python3 crucible.py discover-models
If that fails, fall back to:
cd ${CLAUDE_PLUGIN_ROOT}/skills/crucible/scripts && python3 crucible.py providers
Present available models to the user using AskUserQuestion with multiSelect. Build options from discover-models output. Recommend at least 3 models for good adversarial coverage.
Also ask which dimensions to review (default: all 6), or let the user focus on specific areas.
Send the context to all selected models in parallel, each assigned a different persona and dimension:
cd ${CLAUDE_PLUGIN_ROOT}/skills/crucible/scripts && python3 crucible.py review --models MODEL1,MODEL2,MODEL3 --context /tmp/crucible-context-{project}.json
The script outputs JSON with all reviews. Parse it and present each review to the user.
Claude's Own Review: After receiving model reviews, provide YOUR OWN independent assessment. You have advantages the models don't: you can read the actual code, run commands, and verify claims. Use this to:
Ask the user: "Want to run a cross-examination round where models challenge each other's findings?"
If yes, save the reviews to a file and run:
cd ${CLAUDE_PLUGIN_ROOT}/skills/crucible/scripts && python3 crucible.py challenge --models MODEL1,MODEL2 --reviews /tmp/crucible-reviews-{project}.json
Consolidate all reviews (model + Claude's own) into a structured final report:
# Crucible Report: {Project Name}
**Date:** {date}
**Models:** {list}
**Dimensions reviewed:** {list}
## Executive Summary
{2-3 sentence overall assessment}
## Scorecard
| Dimension | Score | Severity | Key Finding |
|-----------|-------|----------|-------------|
| Progress | X/10 | ... | ... |
| Tech Stack| X/10 | ... | ... |
| ... | ... | ... | ... |
**Overall: X/10**
## Critical Findings
{Findings rated CRITICAL by any reviewer, with cross-examination results}
## Warnings
{Findings rated WARNING}
## Healthy Areas
{What's working well — important for morale}
## Recommended Actions
{Prioritized list of concrete next steps}
## Reviewer Disagreements
{Areas where models disagreed — these often reveal the most interesting insights}
## Cost Summary
{Token usage and cost breakdown from the review}
Write the report to ./crucible-report-{date}.md in the project directory.
Ask the user if they want to export critical findings and recommendations as issues:
bd create# Run review
cd ${CLAUDE_PLUGIN_ROOT}/skills/crucible/scripts && python3 crucible.py review --models gpt-5.4,xai/grok-4-0709 --context context.json
# Specific dimensions only
cd ${CLAUDE_PLUGIN_ROOT}/skills/crucible/scripts && python3 crucible.py review --models gpt-5.4 --context ctx.json --dimensions progress,code-quality
# Cross-examination round
cd ${CLAUDE_PLUGIN_ROOT}/skills/crucible/scripts && python3 crucible.py challenge --models gpt-5.4,xai/grok-4-0709 --reviews reviews.json
# List providers
cd ${CLAUDE_PLUGIN_ROOT}/skills/crucible/scripts && python3 crucible.py providers
# Discover models
cd ${CLAUDE_PLUGIN_ROOT}/skills/crucible/scripts && python3 crucible.py discover-models
# List dimensions
cd ${CLAUDE_PLUGIN_ROOT}/skills/crucible/scripts && python3 crucible.py dimensions
# List personas
cd ${CLAUDE_PLUGIN_ROOT}/skills/crucible/scripts && python3 crucible.py personas
Provides UI/UX resources: 50+ styles, color palettes, font pairings, guidelines, charts for web/mobile across React, Next.js, Vue, Svelte, Tailwind, React Native, Flutter. Aids planning, building, reviewing interfaces.
Fetches up-to-date documentation from Context7 for libraries and frameworks like React, Next.js, Prisma. Use for setup questions, API references, and code examples.
npx claudepluginhub machug/marketplace --plugin crucible