Marketplace for installing the EvalSmith forensic LLM evaluation plugin.
npx claudepluginhub gangj277/evalsmith
Grounded forensic evaluation for prompts, context, retrieval, tools, and multi-stage LLM workflows.
EvalSmith is an open-source coding-agent skill for grounded prompt and workflow evaluation inside real LLM product repositories.
The canonical direct-install skill package lives at skills/evalsmith/.
The repository root is also packaged as a Claude Code plugin and marketplace.
Instead of giving generic prompt advice, EvalSmith is designed to help an agent:
Most teams still improve LLM features with ad hoc prompt edits and a few manual spot checks. That breaks down once the feature depends on retrieval, tools, structured outputs, or multi-stage orchestration.
EvalSmith exists to make prompt iteration look more like engineering:
For serious workflows, that means more than a benchmark score. EvalSmith is designed to force:
.
├── .claude-plugin/
│ ├── marketplace.json
│ └── plugin.json
├── SKILL.md
├── agents/
│ └── openai.yaml
├── .github/
│ ├── readme/
│ │ └── evalsmith-hero-web.png
│ └── workflows/
│ └── ci.yml
├── docs/
│ ├── marketplace-readiness.md
│ └── plans/
├── examples/
│ └── claude-settings.evalsmith.json
├── PRD.md
├── README.md
├── references/
│ ├── artifact-contract.md
│ ├── evalsmith-method.md
│ ├── forensic-analysis.md
│ └── research-foundations.md
├── scripts/
│ ├── bootstrap_evalsmith.py
│ ├── install_evalsmith.py
│ └── validate_packaged_skill.py
├── skills/
│ └── evalsmith/
│ ├── SKILL.md
│ ├── agents/
│ ├── references/
│ └── scripts/
└── tests/
├── test_bootstrap_evalsmith.py
└── test_install_evalsmith.py
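The tree shows that the packaged skill under skills/evalsmith/ contains a SKILL.md plus agents/, references/ and scripts/ folders. Below is a minimal sketch of a layout check built from that tree. The function name and expected-entry lists are assumptions made for illustration; they are not the actual rules in scripts/validate_packaged_skill.py.

```python
from __future__ import annotations

from pathlib import Path

# Entries read off the repository tree for skills/evalsmith/.
# Hypothetical: not the real rule set of scripts/validate_packaged_skill.py.
EXPECTED_FILES = ("SKILL.md",)
EXPECTED_DIRS = ("agents", "references", "scripts")


def missing_skill_entries(skill_dir: Path) -> list[str]:
    """Return the expected entries that are absent from skill_dir."""
    missing = [name for name in EXPECTED_FILES if not (skill_dir / name).is_file()]
    # Directories are reported with a trailing slash to tell them apart from files.
    missing += [name + "/" for name in EXPECTED_DIRS if not (skill_dir / name).is_dir()]
    return missing
```

Run from the repository root, missing_skill_entries(Path("skills/evalsmith")) returning an empty list means the folder matches the layout shown above.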
Recommended install path for Claude Code (plugin marketplace):
/plugin marketplace add gangj277/EvalSmith
/plugin install evalsmith@evalsmith
This repo publishes a marketplace named evalsmith that lists a plugin named evalsmith, which is why the install target is evalsmith@evalsmith (plugin@marketplace).
After install, Claude Code can auto-invoke the skill when the task matches, and you can explicitly test it with /evalsmith:evalsmith.
The repository is also prepared for Anthropic's official plugin submission flow. Details and the submission links are in docs/marketplace-readiness.md.
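The plugin@marketplace identifier comes from the two manifests in .claude-plugin/. Here is a minimal sketch that derives the identifiers a marketplace file advertises, assuming marketplace.json has a top-level "name" and a "plugins" array whose entries each carry a "name". Only those fields are read; the helper name is hypothetical.

```python
from __future__ import annotations

import json
from pathlib import Path


def install_refs(repo_root: Path) -> list[str]:
    """List plugin@marketplace identifiers from .claude-plugin/marketplace.json.

    Assumes a top-level "name" and a "plugins" array of objects with "name".
    """
    path = repo_root / ".claude-plugin" / "marketplace.json"
    manifest = json.loads(path.read_text(encoding="utf-8"))
    marketplace = manifest["name"]
    # Each listed plugin is installed as <plugin>@<marketplace>.
    return [f"{plugin['name']}@{marketplace}" for plugin in manifest.get("plugins", [])]
```

For this repository, the names stated above imply the single identifier evalsmith@evalsmith.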
Native public install path for Codex:
CODEX_HOME="${CODEX_HOME:-$HOME/.codex}"
python "$CODEX_HOME/skills/.system/skill-installer/scripts/install-skill-from-github.py" \
--repo gangj277/EvalSmith \
--path skills/evalsmith
Clone-based fallback:
python scripts/install_evalsmith.py --target codex
This installs the packaged skill into ~/.codex/skills/evalsmith.
This repository does not claim a public Codex marketplace listing; the verified native path is installing the skill from GitHub.
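The native command resolves the Codex home with the shell expansion ${CODEX_HOME:-$HOME/.codex}. Below is a minimal Python sketch of the same resolution, extended to the skill folder path. Whether scripts/install_evalsmith.py --target codex also honors CODEX_HOME is not stated here, so treat this as an illustration of the shell default, not of that script.

```python
import os
from pathlib import Path


def codex_skill_dir(skill_name, env=None, home=None):
    """Mirror ${CODEX_HOME:-$HOME/.codex}/skills/<skill_name>.

    As with ":-" in POSIX shells, an unset *or empty* CODEX_HOME
    falls back to the default under the home directory.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    codex_home = env.get("CODEX_HOME") or str(home / ".codex")
    return Path(codex_home) / "skills" / skill_name
```

With CODEX_HOME unset, codex_skill_dir("evalsmith") points at ~/.codex/skills/evalsmith, the same location the clone-based fallback installs into.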
If you want a direct standalone skill install for Claude Code instead of the marketplace plugin:
python scripts/install_evalsmith.py --target claude
This installs the packaged skill into ~/.claude/skills/evalsmith.
Claude Code also supports project-local skills at .claude/skills/<skill-name>/SKILL.md and personal skills at ~/.claude/skills/<skill-name>/SKILL.md. This repo packages the exact folder you would copy there: skills/evalsmith/.
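Installing by hand amounts to copying skills/evalsmith/ under one of those two skill roots. Here is a minimal sketch of that copy; the helper name is hypothetical and it is not the implementation of scripts/install_evalsmith.py.

```python
import shutil
from pathlib import Path


def install_skill(src: Path, skills_root: Path) -> Path:
    """Copy a packaged skill folder (e.g. skills/evalsmith) under skills_root.

    skills_root is either <project>/.claude/skills or ~/.claude/skills.
    """
    if not (src / "SKILL.md").is_file():
        raise FileNotFoundError(f"{src} has no SKILL.md")
    dest = skills_root / src.name
    # dirs_exist_ok (Python 3.8+) overwrites files on re-runs; files that
    # no longer exist in src are NOT removed from dest.
    shutil.copytree(src, dest, dirs_exist_ok=True)
    return dest
```

For a personal install, call install_skill(Path("skills/evalsmith"), Path.home() / ".claude" / "skills"); for a project-local one, pass the target project's .claude/skills directory instead.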
For local plugin development, you can also point Claude Code directly at this repo root:
claude --plugin-dir /path/to/EvalSmith
For team-wide rollout, start from examples/claude-settings.evalsmith.json and merge it into your repository's .claude/settings.json.
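The contents of examples/claude-settings.evalsmith.json are not shown here, so the following is only a generic sketch of one way to merge a JSON settings snippet into an existing .claude/settings.json. The merge policy is an assumption: nested objects merge key by key, lists are unioned in order, and any other value from the example replaces the existing one. Review the result before committing it.

```python
import json
from pathlib import Path


def merge_settings(base: dict, overlay: dict) -> dict:
    """Recursively merge overlay into a copy of base (base is not mutated)."""
    merged = dict(base)
    for key, value in overlay.items():
        current = merged.get(key)
        if isinstance(current, dict) and isinstance(value, dict):
            merged[key] = merge_settings(current, value)
        elif isinstance(current, list) and isinstance(value, list):
            # Append only entries not already present, keeping order.
            merged[key] = current + [item for item in value if item not in current]
        else:
            merged[key] = value
    return merged


def merge_settings_file(example: Path, settings: Path) -> None:
    """Merge the example JSON file into settings, creating it if missing."""
    base = json.loads(settings.read_text(encoding="utf-8")) if settings.exists() else {}
    overlay = json.loads(example.read_text(encoding="utf-8"))
    settings.parent.mkdir(parents=True, exist_ok=True)
    settings.write_text(json.dumps(merge_settings(base, overlay), indent=2) + "\n", encoding="utf-8")
```

Unioning lists rather than replacing them keeps any entries a repository already has while adding the ones from the example.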
Example invocation:
Use $evalsmith to inspect this repo's LLM feature, create the right eval plan, scaffold repo-native evals, and propose evidence-backed prompt fixes.
If you want a starter artifact layout in a target repository after installation:
python scripts/bootstrap_evalsmith.py /path/to/target-repo \
--feature-name "Support Copilot" \
--workflow-type rag
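The bootstrap command above takes a positional target repository plus --feature-name and --workflow-type. Here is a minimal argparse sketch that models only those visible arguments, to show how the example invocation maps onto parsed values. It is hypothetical: the real scripts/bootstrap_evalsmith.py may accept more options, require some of them, or restrict --workflow-type to a fixed set (only rag appears here).

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring only the arguments shown in the command above.
    parser = argparse.ArgumentParser(prog="bootstrap_evalsmith.py")
    parser.add_argument("target_repo", type=Path, help="repository to scaffold into")
    parser.add_argument("--feature-name", help="human-readable LLM feature name")
    parser.add_argument("--workflow-type", help="workflow kind, e.g. rag")
    return parser
```

Parsing the example invocation yields args.target_repo == Path("/path/to/target-repo"), args.feature_name == "Support Copilot" and args.workflow_type == "rag"; argparse converts the dashed flag names to underscored attributes.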