Development plugins with eval harnesses — a reference implementation
npx claudepluginhub bailejl/dev-plugins
React component scaffolding, a11y audits, responsive checks, refactoring, and design system compliance
Assess repo and git history for AI-coding assistant readiness — audits, code review, security, testing, architecture, and API design
A Claude Code plugin marketplace for development tooling — with built-in evaluation harnesses for each plugin.
Designed as a reference implementation demonstrating how to build Claude Code plugins with rigorous, eval-driven development.
# 1. Install dependencies
npm install
# 2. Set your API key (used by eval harness)
echo "ANTHROPIC_API_KEY=your-key-here" > .env
# 3. Run evals for one plugin and view results
npm run eval:readiness
npx promptfoo view
┌──────────┐    ┌───────────┐    ┌──────────────────┐    ┌─────────┐
│  Task    │───▶│  Trial    │───▶│  Graders         │───▶│ Outcome │
│ (test    │    │ (single   │    │ • deterministic  │    │ pass@k  │
│ case in  │    │ prompt-   │    │ • llm-rubric     │    │ pass^k  │
│ suite)   │    │ foo run)  │    │ • transcript     │    │ scores  │
└──────────┘    └───────────┘    └──────────────────┘    └─────────┘
See BASELINE.md for current eval metrics and docs/EVAL_TAXONOMY.md for how our eval concepts map to the Anthropic "Demystifying Evals" article.
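The two outcome metrics in the diagram can be illustrated with a short sketch. It assumes the commonly used combinatorial estimators: pass@k as the chance that at least one of k sampled trials passes, and pass^k as the chance that all k pass, both estimated from `n` recorded trials of which `c` passed. The function names here are hypothetical, and the repository's own `compute-pass-at-k.py` may compute these differently.

```python
from math import comb


def _check(n: int, c: int, k: int) -> None:
    # Guard against inputs for which the estimators are undefined.
    if not 0 <= c <= n:
        raise ValueError("need 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate P(at least one of k trials passes) from n trials with c passes."""
    _check(n, c, k)
    # comb(n - c, k) counts the k-subsets made up entirely of failures.
    return 1.0 - comb(n - c, k) / comb(n, k)


def pass_hat_k(n: int, c: int, k: int) -> float:
    """Estimate P(all k trials pass) from n trials with c passes."""
    _check(n, c, k)
    # comb(c, k) counts the k-subsets made up entirely of passes.
    return comb(c, k) / comb(n, k)


if __name__ == "__main__":
    n, c = 5, 3  # illustrative numbers only, not measured results
    for k in (1, 3, 5):
        print(f"k={k}  pass@k={pass_at_k(n, c, k):.2f}  pass^k={pass_hat_k(n, c, k):.2f}")
```

With these illustrative numbers, pass@k rises toward 1 as k grows while pass^k falls toward 0, which is why the two are reported together: the first measures whether a task can be solved at all, the second whether it is solved reliably.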
React component scaffolding, accessibility audits, responsive design checks, component refactoring, and design system compliance.
Commands:
- /frontend-dev:scaffold-component: Scaffold a React component with props, types, tests, and story
- /frontend-dev:a11y-audit: WCAG 2.1 AA compliance audit using axe-core patterns
- /frontend-dev:responsive-check: Responsive design audit (media queries, viewport, touch targets)
- /frontend-dev:refactor: React component refactoring (decompose, extract hooks, reduce complexity)
- /frontend-dev:design-system: Design system compliance (tokens vs hardcoded values)

Assess a repository and its git history for AI-coding assistant readiness, with comprehensive audits covering code quality, security, testing, architecture, git health, and API design.
Commands:
- /ai-readiness:full-audit: 10-section comprehensive AI readiness audit
- /ai-readiness:git-health: 71 git anti-patterns with DORA-based severity scoring
- /ai-readiness:code-review: 7-category weighted code review and static analysis
- /ai-readiness:architecture: 6-category architecture review with SOLID principles
- /ai-readiness:security: 6-category security review (OWASP, auto-fail on critical)
- /ai-readiness:testing: Test quality (patterns, desiderata, pyramid analysis)
- /ai-readiness:api-review: 7-category API design and contract review

dev-plugins/
├── plugins/ # What ships to users (commands, skills, agents, hooks)
│ ├── frontend-dev/
│ └── ai-readiness/
├── evals/ # Per-plugin eval suites, graders, fixtures (stays in repo)
│ ├── frontend-dev/
│ └── ai-readiness/
├── eval-infra/ # Shared eval utilities, scripts, rubric templates
└── docs/ # Contributor and learner guides
# Install dependencies
npm install
# Set your Anthropic API key in .env (gitignored)
echo "ANTHROPIC_API_KEY=your-key-here" > .env
# Single plugin
npm run eval:frontend
npm run eval:readiness
# All plugins
npm run eval:all
# Interactive web viewer
npx promptfoo view
# Compute pass@k metrics
python eval-infra/scripts/compute-pass-at-k.py --results evals/ai-readiness/.promptfoo/output.json --k 1 3 5
See docs/GETTING_STARTED.md for detailed setup instructions.
| Tool | Purpose |
|---|---|
| Promptfoo | Eval harness + LLM grading |
| ESLint | Code-based grading (lint) |
| Prettier | Code-based grading (format) |
| axe-core | Accessibility assertion engine |
| Vite | Test fixture builds (frontend-dev) |
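
To make the "code-based grading" rows concrete, here is a minimal sketch of a deterministic grader. It assumes promptfoo's Python assertion convention of a `get_assert(output, context)` function that returns a dict with `pass`, `score`, and `reason`. The hex-color check itself is a hypothetical stand-in for a design-token rule, not the grader this repository ships, and it is deliberately crude (for example, it would also flag an issue reference such as `#123`).

```python
import re
from typing import Any, Dict

# Matches 3-, 4-, 6-, or 8-digit hex colors such as #fff or #1a2b3c.
HEX_COLOR = re.compile(r"#(?:[0-9a-fA-F]{8}|[0-9a-fA-F]{6}|[0-9a-fA-F]{3,4})\b")


def get_assert(output: str, context: Dict[str, Any]) -> Dict[str, Any]:
    """Fail the trial if the model output contains hardcoded hex colors."""
    # Deduplicate and sort so the reason string is stable across runs.
    hits = sorted(set(HEX_COLOR.findall(output)))
    passed = not hits
    return {
        "pass": passed,
        "score": 1.0 if passed else 0.0,
        "reason": "no hardcoded hex colors"
        if passed
        else "hardcoded colors: " + ", ".join(hits),
    }


if __name__ == "__main__":
    print(get_assert("color: var(--color-primary);", {}))
    print(get_assert("color: #fff; background: #1A2B3C;", {}))
```

Because the check is a pure function of the output text, it returns the same verdict on every run, which is the property that separates a deterministic grader from an llm-rubric one.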
MIT