From agentic-usability
Open the web UI to visually inspect, edit, and run the benchmark pipeline. Use when the user wants a visual interface for their pipeline.
Slash command
/agentic-usability:inspect [project-directory] [--port 7373]
Launch the web-based inspector for the benchmark pipeline.
echo "Arguments: $ARGUMENTS"
--port <number>: Port for the local server (default: 7373)
The web UI serves data from the project directory:
<project>/
  config.json                    # Pipeline configuration
  suite.json                     # Test suite (array of test cases)
  results/
    <runId>/                     # e.g. run-2026-04-25T10-30-00-000Z
      run.json                   # Run manifest (id, targets, testCount, label)
      pipeline-state.json        # Pipeline progress tracker
      report.json                # Aggregate scorecard (if pipeline completed)
      <target>/<testId>/         # Per-test results
        generated-solution.json  # Agent's solution
        judge.json               # Judge scores
        agent-notes.md           # Agent's working notes
        agent-output.log         # Raw output
        agent-session.jsonl      # Agent conversation log
        judge-session.jsonl      # Judge conversation log
Directories under results/ that contain a run.json are runs.
Check pipeline-state.json to see if a run is complete (stage: "report") or paused.
Run agentic-usability inspect -p $ARGUMENTS to start the server. It opens the browser automatically. Press Ctrl+C to stop.
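The two rules above for recognizing runs and reading their state can be sketched in a few lines of Python, for cases where a quick check without the web UI is useful. This is a minimal sketch, not part of the plugin: the function name list_runs is hypothetical, and it assumes pipeline-state.json holds a JSON object with a top-level "stage" key. Any stage other than "report" is reported as paused.

```python
import json
from pathlib import Path


def list_runs(project_dir):
    """Return (run_id, status) pairs for each run under <project>/results/.

    Hypothetical helper: the layout of pipeline-state.json is assumed,
    not taken from the plugin's source.
    """
    results_dir = Path(project_dir) / "results"
    runs = []
    if not results_dir.is_dir():
        return runs
    for run_dir in sorted(results_dir.iterdir()):
        # An entry counts as a run only if it contains a run.json manifest.
        if not (run_dir / "run.json").is_file():
            continue
        status = "unknown"  # no pipeline-state.json found
        state_file = run_dir / "pipeline-state.json"
        if state_file.is_file():
            state = json.loads(state_file.read_text(encoding="utf-8"))
            # Assumption: the current stage is stored under a top-level "stage" key.
            status = "complete" if state.get("stage") == "report" else "paused"
        runs.append((run_dir.name, status))
    return runs
```

Calling list_runs("path/to/project") returns a list sorted by run directory name, which follows chronological order when run IDs use the timestamp format shown in the tree above.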
For the full file inventory, see pipeline-guide.md.
npx claudepluginhub pspdfkit-labs/agentic-usability --plugin agentic-usability