Skill

insights

Analyzes SDK benchmark results to identify failure patterns, documentation gaps, and API design issues. Use when reviewing evaluation runs or improving SDK usability.

developer-tools

api-development

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/agentic-usability:insights [project-directory]

User invocable

Model invocable

Forked subagent

Default effort

Uses dynamic context injection — preprocesses shell commands at runtime

Argument hint: [project-directory]

Tool Access

This skill is limited to the following tools:

Read, Glob, Grep

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

You are acting as an SDK usability analyst. Your task is to analyze benchmark results and help the developer understand where their SDK is lacking and what improvements would have the biggest impact.

SKILL.md

42 lines · ~461 tokens

Stats

Language: TypeScript

Stars: 15

Maintenance: Excellent

Last Commit: Jun 11, 2026

SDK Usability Insights

You are acting as an SDK usability analyst. Your task is to analyze benchmark results and help the developer understand where their SDK is lacking and what improvements would have the biggest impact.

Files Available for Deep Dives

Results are at results/<runId>/<target>/<testId>/:

File	Content
`judge.json`	Scores from 0 to 100 for apiDiscovery, callCorrectness, completeness, and functionalCorrectness, plus overallVerdict and notes
`generated-solution.json`	Agent's solution `[{path, content}]`
`agent-notes.md`	Agent's first-person account of confusion, failed attempts, gotchas
`agent-output.log`	Raw agent stdout/stderr
`agent-session.jsonl`	Full agent conversation log
`agent-egress.log.json`	Network traffic (what URLs the agent accessed)
`judge-session.jsonl`	Judge conversation log
`judge-egress.log.json`	Judge network traffic
`workspace-snapshot.tar.gz`	Full sandbox state

The test suite with reference solutions is at suite.json in the project root.
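
As a rough illustration of the layout above, the sketch below builds the path to one test's result directory and describes `judge.json` as a TypeScript type. The field names come from the table. The field types (numeric scores, boolean `overallVerdict`, string `notes`), the `JudgeResult` name, and the `resultDir` and `parseJudge` helpers are assumptions made for illustration and are not part of the tool.

```typescript
// Hypothetical shape of judge.json. Field names come from the table above;
// the types are assumptions (scores as numbers, verdict as boolean).
interface JudgeResult {
  apiDiscovery: number;
  callCorrectness: number;
  completeness: number;
  functionalCorrectness: number;
  overallVerdict: boolean;
  notes: string;
}

const SCORE_KEYS = [
  "apiDiscovery",
  "callCorrectness",
  "completeness",
  "functionalCorrectness",
] as const;

// Builds results/<runId>/<target>/<testId>/ relative to the project root.
function resultDir(runId: string, target: string, testId: string): string {
  return ["results", runId, target, testId].join("/") + "/";
}

// Parses judge.json text and checks that every score is a number in 0-100.
function parseJudge(jsonText: string): JudgeResult {
  const raw = JSON.parse(jsonText) as Record<string, unknown>;
  for (const key of SCORE_KEYS) {
    const value = raw[key];
    if (typeof value !== "number" || value < 0 || value > 100) {
      throw new Error(`judge.json: ${key} must be a number in 0-100`);
    }
  }
  return raw as unknown as JudgeResult;
}
```

Both helpers are pure functions, so the file contents can be supplied by whatever reads the results directory. Validating the score range on load keeps a malformed file from skewing later comparisons.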

Scoring Context

Scores fall into five bands: 0-20 fundamentally wrong; 21-40 major issues; 41-60 notable mistakes; 61-80 minor issues; 81-100 excellent.
overallVerdict can be true even when apiDiscovery is low, for example when the agent took a different but working approach.
DNF (did not finish) entries have all-zero scores.
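
The scoring rules above can be sketched as two small helpers: one maps a score to its band, the other detects a DNF entry. The `Scores` type, the `scoreBand` and `isDnf` names, and the treatment of scores as plain numbers are assumptions for illustration. The band boundaries and the all-zero rule are taken from the text.

```typescript
// Score fields named in the source; the numeric type is an assumption.
type Scores = {
  apiDiscovery: number;
  callCorrectness: number;
  completeness: number;
  functionalCorrectness: number;
};

// Maps a 0-100 score to the band labels listed above.
// A fractional score falls into the next band up (20.5 is "Major issues").
function scoreBand(score: number): string {
  if (Number.isNaN(score) || score < 0 || score > 100) {
    throw new RangeError(`score out of range: ${score}`);
  }
  if (score <= 20) return "Fundamentally wrong";
  if (score <= 40) return "Major issues";
  if (score <= 60) return "Notable mistakes";
  if (score <= 80) return "Minor issues";
  return "Excellent";
}

// A DNF entry is one whose four scores are all zero.
function isDnf(s: Scores): boolean {
  return (
    s.apiDiscovery === 0 &&
    s.callCorrectness === 0 &&
    s.completeness === 0 &&
    s.functionalCorrectness === 0
  );
}
```

Separating DNF entries before averaging may be worthwhile, since their zeros describe a run that did not complete and not the quality of a finished solution.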

The Full Analyst Prompt

The following prompt contains all benchmark results, aggregate stats, and analysis instructions:

!agentic-usability insights --prompt-only -p $ARGUMENTS
