Skill

report

Displays a terminal scorecard of benchmark results with pass rates, scores by difficulty, and per-test breakdowns. Use when the user asks about benchmark results, scores, or SDK performance.

developer-tools

Popularity

Stars

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/agentic-usability:report [project-directory] [--json] [--run runId]

User invocable

Model invocable

Inline context

Default effort

Uses dynamic context injection — preprocesses shell commands at runtime

Argument hint[project-directory] [--json] [--run runId]

Tool Access

This skill is limited to the following tools:

Bash(agentic-usability *)ReadGlob

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Display the benchmark scorecard for the pipeline.

SKILL.md

56 lines · ~483 tokens

Stats

LanguageTypeScript

Stars15

MaintenanceExcellent

Last CommitJun 11, 2026

Actions

View Source View Plugin View on GitHub View README

Benchmark Report

Display the benchmark scorecard for the pipeline.

agentic-usability report -p $ARGUMENTS

Options

--json: Output raw structured JSON instead of the colored table
--run <runId>: Show results for a specific run (default: latest)

Where Results Live

results/<runId>/
  report.json                        # Aggregate scorecard
  <targetName>/<testId>/
    judge.json                       # Per-test judge scores
    generated-solution.json          # Agent's solution
    agent-notes.md                   # Agent's working notes

Scoring Dimensions

Dimension	Range	What it measures
`apiDiscovery`	0-100	Found correct SDK endpoints/methods?
`callCorrectness`	0-100	API calls constructed correctly?
`completeness`	0-100	All requirements handled?
`functionalCorrectness`	0-100	Code runs and produces correct output?
`overallVerdict`	boolean	Solution works? (pass/fail)

The report aggregates these across all test cases and breaks them down by difficulty (easy/medium/hard).

Finding Runs

Runs are stored as subdirectories in results/ containing run.json:

{ "id": "run-2026-04-25T10-30-00-000Z", "createdAt": "...", "targets": [...], "testCount": 15, "label": "..." }

To list all runs, look for results/*/run.json files.

Present the results to the user. If they want deeper analysis, suggest using the insights skill.

For detailed file inventory, see pipeline-guide.md.

report

Popularity

Invocation

Tool Access

Context Preview

SKILL.md

report

Popularity

Invocation

Tool Access

Context Preview

SKILL.md

Benchmark Report

Options

Where Results Live

Scoring Dimensions

Finding Runs

Similar Skills

Benchmark Report

Options

Where Results Live

Scoring Dimensions

Finding Runs

Similar Skills