Marketplace

evalsmith

Marketplace for installing the EvalSmith forensic LLM evaluation plugin.

npx claudepluginhub gangj277/evalsmith

Plugin: evalsmith (v0.1.1, by gangj277)

Grounded forensic evaluation engineering for prompts, context, retrieval, tools, and multi-stage LLM workflows.

Plugins: 1 · Stars: 1 · Updated: Mar 29, 2026

EvalSmith

Grounded forensic evaluation for prompts, context, retrieval, tools, and multi-stage LLM workflows.

EvalSmith is an open-source coding-agent skill for grounded prompt and workflow evaluation inside real LLM product repositories.

The canonical direct-install skill package lives at skills/evalsmith/. The repository root is also packaged as a Claude Code plugin and marketplace.

Instead of giving generic prompt advice, EvalSmith is designed to help an agent:

- inspect the repository and infer the real LLM workflow
- map evaluation surfaces across prompts, retrieval, tools, and output contracts
- scaffold repo-native eval artifacts
- run or wire up a benchmark loop
- diagnose failures by stage
- perform attribution-driven forensic analysis across traces, prompt clauses, context blocks, tools, and parsers
- propose evidence-backed prompt or workflow changes

Why This Exists

Most teams still improve LLM features with ad hoc prompt edits and a few manual spot checks. That breaks down once the feature depends on retrieval, tools, structured outputs, or multi-stage orchestration.

EvalSmith exists to make prompt iteration look more like engineering:

1. Understand the workflow.
2. Define what good looks like.
3. Scaffold runnable evals.
4. Run them.
5. Diagnose failures.
6. Change the system with evidence.
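
As a rough illustration of the middle of that loop (define what good looks like, run the evals, diagnose failures by stage), here is a minimal sketch. Everything in it is hypothetical: `EvalCase`, `run_workflow`, the toy `DOCS` corpus, and the stage names `retrieval` and `generation` are stand-ins for illustration and are not taken from EvalSmith's actual artifacts.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class EvalCase:
    question: str
    expected_doc: str     # document the retrieval stage should surface
    expected_answer: str  # substring the final answer should contain


# Hypothetical corpus for a toy two-stage workflow.
DOCS: Dict[str, str] = {
    "refunds": "Refunds are issued within 14 days.",
    "shipping": "Orders ship within 2 business days.",
}


def run_workflow(question: str) -> dict:
    """Stand-in for a real LLM pipeline; returns a per-stage trace."""
    text = question.lower()
    doc_id = next((k for k in DOCS if k.rstrip("s") in text), None)
    answer = DOCS.get(doc_id, "I don't know.")
    return {"retrieved": doc_id, "answer": answer}


def diagnose(case: EvalCase, trace: dict) -> Optional[str]:
    """Return the first stage that failed, or None if the case passed."""
    if trace["retrieved"] != case.expected_doc:
        return "retrieval"
    if case.expected_answer not in trace["answer"]:
        return "generation"
    return None


def run_evals(cases: List[EvalCase]) -> dict:
    failures: Counter = Counter()
    for case in cases:
        stage = diagnose(case, run_workflow(case.question))
        if stage:
            failures[stage] += 1
    passed = len(cases) - sum(failures.values())
    return {"pass_rate": passed / len(cases), "failures_by_stage": dict(failures)}


CASES = [
    EvalCase("How long do refunds take?", "refunds", "14 days"),
    EvalCase("When will my order ship?", "shipping", "2 business days"),
    EvalCase("Is shipping free?", "shipping", "free"),
]

report = run_evals(CASES)
print(report)
```

The point of the per-stage tally is that a single pass rate would hide where the workflow breaks: here one case fails because nothing relevant was retrieved, and another fails even though retrieval succeeded.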

For serious workflows, that means more than a benchmark score. EvalSmith is designed to force:

- exact trace capture
- component inventories for prompts and context
- controlled ablations and counterfactual reruns
- component-to-behavior attribution ledgers
- optimization only after the attribution evidence is strong enough
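
One way to picture a controlled ablation feeding an attribution ledger: remove one prompt or context component at a time, rerun the same suite, and record the score change per component. The sketch below is an assumption about the general technique, not EvalSmith's implementation; the `score` function, its weights, and the component names are hypothetical placeholders for a real model call plus grader.

```python
from typing import Callable, Dict, List

Components = Dict[str, str]


def score(components: Components) -> int:
    """Hypothetical stand-in for 'rerun the eval suite and return a score (0-100)'."""
    weights = {"system_rules": 50, "retrieved_context": 30, "few_shot": 0}
    return 20 + sum(w for name, w in weights.items() if name in components)


def ablation_ledger(
    components: Components, score_fn: Callable[[Components], int]
) -> List[dict]:
    """Drop each component in turn and record how much the score falls."""
    baseline = score_fn(components)
    ledger = []
    for name in components:
        ablated = {k: v for k, v in components.items() if k != name}
        ablated_score = score_fn(ablated)
        ledger.append(
            {
                "component": name,
                "baseline": baseline,
                "ablated": ablated_score,
                "delta": baseline - ablated_score,
            }
        )
    # Largest contributors first.
    return sorted(ledger, key=lambda row: row["delta"], reverse=True)


PROMPT_COMPONENTS = {
    "system_rules": "Answer only from the provided context.",
    "retrieved_context": "<retrieved passages>",
    "few_shot": "<worked examples>",
}

ledger = ablation_ledger(PROMPT_COMPONENTS, score)
for row in ledger:
    print(row)
```

A delta of zero suggests the component is not contributing on this suite, but single-component ablations do not capture interactions between components, so a ledger like this is a starting point for attribution rather than proof.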

Repository Layout

.
├── .claude-plugin/
│   ├── marketplace.json
│   └── plugin.json
├── SKILL.md
├── agents/
│   └── openai.yaml
├── .github/
│   ├── readme/
│   │   └── evalsmith-hero-web.png
│   └── workflows/
│       └── ci.yml
├── docs/
│   ├── marketplace-readiness.md
│   └── plans/
├── examples/
│   └── claude-settings.evalsmith.json
├── PRD.md
├── README.md
├── references/
│   ├── artifact-contract.md
│   ├── evalsmith-method.md
│   ├── forensic-analysis.md
│   └── research-foundations.md
├── scripts/
│   ├── bootstrap_evalsmith.py
│   ├── install_evalsmith.py
│   └── validate_packaged_skill.py
├── skills/
│   └── evalsmith/
│       ├── SKILL.md
│       ├── agents/
│       ├── references/
│       └── scripts/
└── tests/
    ├── test_bootstrap_evalsmith.py
    └── test_install_evalsmith.py

Install

Claude Code Marketplace

Recommended install path:

/plugin marketplace add gangj277/EvalSmith
/plugin install evalsmith@evalsmith

This repo publishes a marketplace named evalsmith and a plugin named evalsmith. After install, Claude Code can auto-invoke the skill when the task matches, and you can explicitly test it with /evalsmith:evalsmith. The repository is also prepared for Anthropic's official plugin submission flow. Details and the submission links are in docs/marketplace-readiness.md.

Codex

Native public install path:

CODEX_HOME="${CODEX_HOME:-$HOME/.codex}"
python "$CODEX_HOME/skills/.system/skill-installer/scripts/install-skill-from-github.py" \
  --repo gangj277/EvalSmith \
  --path skills/evalsmith

Clone-based fallback:

python scripts/install_evalsmith.py --target codex

This installs the packaged skill into ~/.codex/skills/evalsmith. This repository does not claim a public Codex marketplace listing; the native verified path is GitHub skill installation.

Claude Code

If you want a direct standalone skill install instead of the marketplace plugin:

python scripts/install_evalsmith.py --target claude

This installs the packaged skill into ~/.claude/skills/evalsmith.
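
To confirm where the skill ended up after either clone-based install, a small check like the following may help. It is a sketch that assumes the installed folder contains `SKILL.md` at its top level (as `skills/evalsmith/` does in this repository) and that the default home-directory locations are in use; it ignores a custom `CODEX_HOME`. The helper name `installed_targets` is hypothetical.

```python
from pathlib import Path
from typing import Dict, Optional

# Default install locations named in this README, relative to the home directory.
SKILL_DIRS = {
    "claude": Path(".claude") / "skills" / "evalsmith",
    "codex": Path(".codex") / "skills" / "evalsmith",
}


def installed_targets(home: Optional[Path] = None) -> Dict[str, bool]:
    """Report, per target, whether <home>/<skill dir>/SKILL.md exists."""
    home = home or Path.home()
    return {
        target: (home / rel / "SKILL.md").is_file()
        for target, rel in SKILL_DIRS.items()
    }


if __name__ == "__main__":
    for target, ok in installed_targets().items():
        print(f"{target}: {'installed' if ok else 'not found'}")
```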

Claude Code also supports project-local skills at .claude/skills/<skill-name>/SKILL.md and personal skills at ~/.claude/skills/<skill-name>/SKILL.md. This repo packages the exact folder you would copy there: skills/evalsmith/. For local plugin development, you can also point Claude Code directly at this repo root:

claude --plugin-dir /path/to/EvalSmith

For team-wide rollout, start from examples/claude-settings.evalsmith.json and merge it into your repository's .claude/settings.json.
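
The merge itself can be done by hand; if you would rather script it, a generic recursive merge is one option. This sketch does not assume anything about the keys inside `examples/claude-settings.evalsmith.json` (its contents are not shown here): nested objects are merged, lists are unioned in order, and scalar values from the example file win. `deep_merge` and `merge_settings` are hypothetical helper names, and the result is worth reviewing before committing it.

```python
import json
from pathlib import Path
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], overlay: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict: overlay merged into base without mutating either."""
    merged = dict(base)
    for key, value in overlay.items():
        current = merged.get(key)
        if isinstance(value, dict) and isinstance(current, dict):
            merged[key] = deep_merge(current, value)
        elif isinstance(value, list) and isinstance(current, list):
            # Keep existing entries, append only the new ones.
            merged[key] = current + [v for v in value if v not in current]
        else:
            merged[key] = value
    return merged


def merge_settings(example_path: Path, settings_path: Path) -> Dict[str, Any]:
    """Merge the example settings into settings_path, creating it if missing."""
    overlay = json.loads(example_path.read_text(encoding="utf-8"))
    base: Dict[str, Any] = {}
    if settings_path.exists():
        base = json.loads(settings_path.read_text(encoding="utf-8"))
    merged = deep_merge(base, overlay)
    settings_path.parent.mkdir(parents=True, exist_ok=True)
    settings_path.write_text(json.dumps(merged, indent=2) + "\n", encoding="utf-8")
    return merged
```

A call such as `merge_settings(Path("examples/claude-settings.evalsmith.json"), Path("/path/to/your-repo/.claude/settings.json"))` would apply it, with the second path adjusted to your repository.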

Use

Example invocation:

Use $evalsmith to inspect this repo's LLM feature, create the right eval plan, scaffold repo-native evals, and propose evidence-backed prompt fixes.

If you want a starter artifact layout in a target repository after installation:

python scripts/bootstrap_evalsmith.py /path/to/target-repo \
  --feature-name "Support Copilot" \
  --workflow-type rag
