From deepagents-builder
Add a single eval scenario to an existing dataset, either interactively or from a production trace.
Slash command
/deepagents-builder:add-scenario [--from-trace <path-or-id>]
# Add Eval Scenario

Add a single scenario to an existing eval dataset.

## Workflow

### Step 1: Parse Arguments

Check `$ARGUMENTS` for `--from-trace`:

- **`--from-trace <local-path>`**: Load a local trace file (JSON)
- **`--from-trace langsmith:<run_id>`**: Fetch trace from LangSmith
- **No arguments**: Interactive mode; ask the user to describe the scenario

### Step 2: Find Existing Dataset

Locate the dataset to add to:

1. Search `evals/datasets/*.yaml` and `evals/datasets/*.json`
2. If multiple datasets exist, ask which one to add to
3. If no dataset exists, suggest running `/design-evals` first

If `--from-trace` was provided:

- `expected_tools`
- `mock_responses` from actual tool responses
- `success_criteria`
- `regression` if it's a bug fix scenario

If no `--from-trace`:

- `eval-designer` agent in single-scenario mode

Scenario '{name}' added to evals/datasets/{file}.yaml
Tags: [{tags}]
Next: Run /eval to generate the initial snapshot for this scenario.
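
The command itself is a prompt that Claude follows, not a script, but the argument handling and dataset lookup described above can be pictured with a small Python sketch. Everything named here (`TraceSource`, `parse_arguments`, `find_datasets`, `confirmation`) is hypothetical and exists only for illustration; the sketch assumes `$ARGUMENTS` arrives as a single shell-style string and that datasets live under `evals/datasets/` relative to the project root.

```python
import shlex
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional


@dataclass
class TraceSource:
    """Hypothetical container for the outcome of argument parsing."""
    kind: str                    # "interactive", "local", or "langsmith"
    value: Optional[str] = None  # local file path or LangSmith run id


def parse_arguments(arguments: str) -> TraceSource:
    """Classify the raw argument string into one of the three modes."""
    tokens = shlex.split(arguments)
    if "--from-trace" not in tokens:
        return TraceSource("interactive")
    idx = tokens.index("--from-trace")
    if idx + 1 >= len(tokens):
        raise ValueError("--from-trace needs a path or langsmith:<run_id>")
    target = tokens[idx + 1]
    if target.startswith("langsmith:"):
        return TraceSource("langsmith", target[len("langsmith:"):])
    return TraceSource("local", target)


def find_datasets(root: Path) -> List[Path]:
    """List candidate dataset files matching the two documented globs."""
    base = root / "evals" / "datasets"
    return sorted(list(base.glob("*.yaml")) + list(base.glob("*.json")))


def confirmation(name: str, dataset: Path, tags: List[str]) -> str:
    """Format the three-line confirmation message shown after adding."""
    return (
        f"Scenario '{name}' added to {dataset.as_posix()}\n"
        f"Tags: [{', '.join(tags)}]\n"
        "Next: Run /eval to generate the initial snapshot for this scenario."
    )
```

An empty list from `find_datasets` corresponds to the "suggest running `/design-evals` first" branch, and a list with more than one entry corresponds to asking the user which dataset to use.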
npx claudepluginhub spulido99/claude-toolkit --plugin deepagents-builder
/scenario: Generates use cases, edge cases, and derivative scenarios from a seed scenario using autonomous iterative exploration. Supports the --depth, --domain, --format, and --focus flags.
/self-test: Runs behavioral tests comparing baseline Claude responses against stay-on-target enhanced prompts across scenarios, grades them with rubrics, and produces a Markdown results file with a summary.
/gen-evals: Generates synthetic evaluation test cases (EVAL-*.md) for a named agent by parsing its prompt definition. Accepts an optional --count flag to specify the number of test cases.
/test-prompt: Tests an AI prompt against 5+ scenarios (happy path, edge, ambiguous, error, and adversarial cases), scores outputs for quality and consistency, and outputs a report with failures and targeted improvements.
/create-tests: Generates chaos test scenarios by defining failure modes, faults, and error injection rules for your agent.