Skill

challenge

Use when testing debate agent bug-finding accuracy against curated code challenges — F1 scoring, 'test debate agents on challenges', 'benchmark agents'.
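
The listing names F1 scoring but does not show how the skill computes it. As a minimal sketch only, assuming each reported finding and each answer-key bug can be reduced to a comparable identifier (the `f1_score` name and the string IDs below are hypothetical, not taken from the skill), the standard F1 calculation looks like this:

```python
def f1_score(reported: set[str], answer_key: set[str]) -> float:
    """Standard F1 of reported bug IDs against an answer key.

    Assumption: findings match the key by exact identifier. The real skill
    may match findings differently (for example by location or description).
    """
    true_positives = len(reported & answer_key)
    if true_positives == 0:
        # Covers empty inputs and the no-overlap case; avoids division by zero.
        return 0.0
    precision = true_positives / len(reported)   # share of reports that are real bugs
    recall = true_positives / len(answer_key)    # share of real bugs that were reported
    return 2 * precision * recall / (precision + recall)
```

Under this definition an agent is penalized both for missed bugs (lower recall) and for false alarms (lower precision), which is presumably why a single F1 number is used to benchmark bug-finding accuracy.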

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/autoimprove:challenge [--suite puzzles|all] [--language python|typescript|go|rust|all] [--difficulty easy|medium|hard|all] [--tags <tag>] [--id <challenge-id>] [--dry-run]

User invocable

Model invocable

Inline context

Default effort

Argument hint

[--suite puzzles|all] [--language python|typescript|go|rust|all] [--difficulty easy|medium|hard|all] [--tags <tag>] [--id <challenge-id>] [--dry-run]

Tool Access

This skill is limited to the following tools:

Read, Bash, Glob, Grep, Agent

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

<SKILL-GUARD>

Supporting Files

references/runbook.md

SKILL.md

209 lines · ~1.9k tokens

Stats

Language: Shell

Stars: 2

Maintenance: Excellent

Last Commit: Jun 13, 2026

1. 🔍 Parse Arguments

2. 📋 Load Manifest

3. 🔄 Run Each Challenge

3a. 🔍 Read Challenge Code

3b. 🛠️ Run Single-Pass Adversarial Review

3c. ✅ Score Against Answer Key

3d. 📋 Report Result

4. 📊 Aggregate and Report

5. 🏷️ Log Results

Final Step — Cleanup

Reference Files
