benchmark-suite

From agent-harness-kit

Run Mini SWE-bench-style harness regression tasks and A/B comparisons to measure harness improvements objectively.
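
As a rough illustration of what an A/B comparison between two harness runs might compute, here is a minimal sketch. The result shape (`taskId`, `passed`) and the helpers `passRate` and `compareRuns` are hypothetical names made up for this example; they are not taken from `bench-runner.mjs` or `bench-compare.mjs`, whose real inputs and outputs are not shown on this page.

```javascript
// Hypothetical result shape: one entry per task, with a boolean `passed`.
// This does not reflect the actual output of the skill's scripts.
function passRate(results) {
  if (results.length === 0) return 0;
  const passed = results.filter((r) => r.passed).length;
  return passed / results.length;
}

// Compare a baseline run (A) against a candidate run (B), task by task.
function compareRuns(baseline, candidate) {
  const baseById = new Map(baseline.map((r) => [r.taskId, r.passed]));
  const regressions = [];
  const fixes = [];
  for (const r of candidate) {
    if (!baseById.has(r.taskId)) continue; // only compare tasks present in both runs
    const before = baseById.get(r.taskId);
    if (before && !r.passed) regressions.push(r.taskId); // passed before, fails now
    if (!before && r.passed) fixes.push(r.taskId); // failed before, passes now
  }
  return {
    baselinePassRate: passRate(baseline),
    candidatePassRate: passRate(candidate),
    delta: passRate(candidate) - passRate(baseline),
    regressions,
    fixes,
  };
}

// Made-up sample data for illustration only.
const baseline = [
  { taskId: "task-1", passed: true },
  { taskId: "task-2", passed: false },
  { taskId: "task-3", passed: true },
  { taskId: "task-4", passed: false },
];
const candidate = [
  { taskId: "task-1", passed: true },
  { taskId: "task-2", passed: true },
  { taskId: "task-3", passed: false },
  { taskId: "task-4", passed: true },
];

const report = compareRuns(baseline, candidate);
console.log(report);
```

Listing per-task regressions alongside the aggregate delta matters because a net improvement in pass rate can still hide tasks that newly fail.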

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/agent-harness-kit:benchmark-suite

User invocable

Model invocable

Inline context

Default effort

Tool Access

This skill is limited to the following tools:

Read

Bash(node .harness/scripts/bench-runner.mjs:*)

Bash(node .harness/scripts/bench-compare.mjs:*)

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Use this when evaluating whether a harness change improved or regressed behavior.

SKILL.md

29 lines · ~182 tokens

Stats

Language: JavaScript

Stars: 3

Maintenance: Excellent

Last Commit: Jun 11, 2026

Benchmark Suite

Commands

Output contract
