scenario-testing

Name: scenario-testing
Author: dlabs

Probabilistic scenario validation for agentic software. Replaces rigid tests with LLM-judged user-story scenarios, satisfaction scoring across observed trajectories, and a Digital Twin Universe for behavioral clones of third-party services. Designed for codebases where agents and LLMs are first-class design primitives.

npx claudepluginhub dlabs/claude-marketplace --plugin scenario-testing

Popularity

Stars: Above avg

Med: 0 · Avg: 285

Installs

Med: 0 · Avg: 1

What's Inside

Slash Commands (10)

st:catalog

/catalog

List, search, and manage the scenario catalog — view scenarios by domain, check twin coverage, and show catalog health

st:chaos

/chaos

Configure chaos injection for a digital twin — set failure probabilities, choose profiles, and customize failure modes

st:init

/init

Initialize the scenario catalog, config, and directory structure for a new project

st:report

/report

Display the satisfaction report — per-scenario scores, threshold status, trend analysis, and comparison to previous runs

st:review

/review

Review and refine an existing scenario — check quality, update criteria, bump version
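
The /chaos entry above mentions failure probabilities, profiles, and failure modes. As a rough illustration of that idea only, the sketch below draws at most one failure mode per simulated call from a profile of per-mode probabilities. The profile name, the mode names, and the functions `pick_failure` and `simulate` are all hypothetical; none of them come from the plugin, whose actual configuration format is not shown on this page.

```python
import random

# Hypothetical chaos profile: failure mode -> probability per call.
# The names and values are illustrative, not the plugin's schema.
FLAKY_PROFILE = {"timeout": 0.10, "rate_limited": 0.05, "server_error": 0.02}


def pick_failure(profile, rng):
    """Return the failure mode to inject for one call, or None for success."""
    roll = rng.random()  # uniform in [0.0, 1.0)
    cumulative = 0.0
    for mode, probability in profile.items():
        cumulative += probability
        if roll < cumulative:
            return mode
    return None


def simulate(profile, calls, seed=0):
    """Count injected failures over a number of simulated twin calls."""
    rng = random.Random(seed)  # seeded so a run can be reproduced
    counts = {mode: 0 for mode in profile}
    counts["ok"] = 0
    for _ in range(calls):
        mode = pick_failure(profile, rng)
        counts[mode or "ok"] += 1
    return counts
```

Seeding the generator is a deliberate choice in this sketch: a failing run can be replayed with the same seed, which helps when a trajectory only breaks under a particular sequence of injected failures.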

Agents (6)

satisfaction-judge

/satisfaction-judge

Evaluates individual trajectories against scenario satisfaction criteria and anti-patterns. Produces binary satisfactory/unsatisfactory judgments with reasoning. Uses LLM-as-judge methodology.

scenario-author

/scenario-author

Translates natural-language user stories into structured YAML scenarios with personas, satisfaction criteria, and anti-patterns. References existing scenarios to avoid duplication.

scenario-reviewer

/scenario-reviewer

Reviews authored scenarios for completeness, ambiguity, and testability. Ensures satisfaction criteria are specific enough for LLM judgment. Flags vague or overly rigid scenarios.

trajectory-runner

/trajectory-runner

Executes scenarios against the codebase with digital twins substituted for real services. Records every action, state transition, and outcome as a trajectory. Manages parallel execution for volume runs.

twin-builder

/twin-builder

Analyzes the codebase to discover third-party API usage and generates behavioral clones (digital twins) with API surfaces, state machines, and chaos injection configurations.
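
To make the "state machine" part of a digital twin concrete, here is a minimal sketch of an in-memory stand-in for a payment service. Everything in it is an assumption made for illustration: the `PaymentTwin` class, its states, and its actions are hypothetical and do not describe what twin-builder actually generates.

```python
class PaymentTwin:
    """In-memory stand-in for a hypothetical payment API."""

    # Allowed transitions: current state -> {action: next state}
    TRANSITIONS = {
        "created": {"authorize": "authorized", "cancel": "cancelled"},
        "authorized": {"capture": "captured", "cancel": "cancelled"},
        "captured": {"refund": "refunded"},
    }

    def __init__(self):
        self.payments = {}  # payment id -> current state
        self.next_id = 1

    def create(self):
        """Register a new payment in the initial state and return its id."""
        payment_id = f"pay_{self.next_id}"
        self.next_id += 1
        self.payments[payment_id] = "created"
        return payment_id

    def apply(self, payment_id, action):
        """Apply an action, rejecting transitions the state machine forbids."""
        state = self.payments[payment_id]
        allowed = self.TRANSITIONS.get(state, {})
        if action not in allowed:
            raise ValueError(f"cannot {action} a payment in state {state!r}")
        self.payments[payment_id] = allowed[action]
        return self.payments[payment_id]
```

Because the twin holds its state in memory, a scenario run can create and mutate payments at whatever volume it needs without touching a live service.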

Skills (3)

digital-twin-universe

/digital-twin-universe

Framework for building behavioral clones (digital twins) of third-party services. Covers API surface replication, state machines, chaos injection, and twin composition for validating at volumes and rates impossible against live services.

satisfaction-metrics

/satisfaction-metrics

Framework for measuring, aggregating, and trending satisfaction scores across scenarios. Covers LLM-as-judge methodology, trajectory evaluation, threshold configuration, comparison analysis, and reporting.

scenario-methodology

/scenario-methodology

Framework for authoring, structuring, and managing scenarios — end-to-end user stories validated probabilistically by LLM-as-judge. Covers the holdout principle, scenario anatomy, versioning, composition, and anti-reward-hacking patterns.
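
One plausible reading of "satisfaction scoring across observed trajectories" is the fraction of trajectories the judge marks satisfactory, compared against a configured threshold. The sketch below assumes exactly that. The `Judgment` shape, the function names, and the default threshold of 0.9 are hypothetical and may differ from how the plugin aggregates scores.

```python
from dataclasses import dataclass


@dataclass
class Judgment:
    """One judge verdict for one trajectory (hypothetical shape)."""
    trajectory_id: str
    satisfactory: bool
    reasoning: str


def satisfaction_score(judgments):
    """Fraction of trajectories judged satisfactory, in [0.0, 1.0]."""
    if not judgments:
        raise ValueError("no judgments to score")
    passed = sum(1 for j in judgments if j.satisfactory)
    return passed / len(judgments)


def meets_threshold(judgments, threshold=0.9):
    """True when the score reaches the (assumed) pass threshold."""
    return satisfaction_score(judgments) >= threshold


judgments = [
    Judgment("t1", True, "order confirmed and user notified"),
    Judgment("t2", True, "retried after timeout, then succeeded"),
    Judgment("t3", False, "charged the card twice"),
    Judgment("t4", True, "order confirmed and user notified"),
]
print(satisfaction_score(judgments))  # 3 of 4 satisfactory -> 0.75
print(meets_threshold(judgments, threshold=0.7))
```

Scoring a scenario as a rate rather than a single pass or fail is what lets a non-deterministic agent be held to a bar such as "satisfactory in at least 90% of runs".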

Hooks (1)

Event Hooks

1 hook across 1 event

Stats

Version: 0.1.0

Language: Shell

Stars: 1

Forks: 1

Maintenance: Good

License: MIT

Last Commit: Feb 21, 2026

Added: Mar 10, 2026

Available In

dlabs-marketplace (1)

Safety Signals

Caution

Uses power tools

Uses Bash, Write, or Edit tools

README

d.labs Claude Marketplace

A collection of Claude Code plugins by d.labs.

Available Plugins

Plugin	Description	Version
blueprint-dev	Planning-first, design-driven development workflow with A/B design variants, architecture robustness checks, trunk-based development enforcement, agent team swarms, compound knowledge accumulation, browser testing, git worktree management, lightweight fast-lane workflows, code simplification, parallel batch operations, and skill eval framework.	2.0.0
design-studio	Code-first design exploration workflow. Product planning, section screen design, brand discovery, variant generation, iterative refinement, token extraction, and production Next.js components using your project's component libraries (shadcn/ui, Radix UI).	0.5.0
scenario-testing	Probabilistic scenario validation for agentic software. LLM-judged user-story scenarios, satisfaction scoring across observed trajectories, and a Digital Twin Universe for behavioral clones of third-party services.	0.1.0

Installation

Step 1: Add the marketplace:

claude plugin marketplace add https://github.com/dlabs/claude-marketplace

Step 2: Install a plugin:

claude plugin install blueprint-dev
claude plugin install design-studio
claude plugin install scenario-testing

This installs with user scope by default (available across all projects). To install for a single project only:

claude plugin install blueprint-dev --scope project

Step 3: Restart Claude Code for the plugin to take effect.

blueprint-dev

Planning-first, design-driven development workflow for Claude Code.

26 specialized agents — from architecture review to compound knowledge extraction
21 slash commands — full pipeline from /discover to /compound, plus lightweight fast-lane, batch operations, browser testing, and video walkthroughs
15 skills — reference knowledge for planning, A/B testing, trunk-based dev, browser automation, git worktrees, eval framework, and more
5 eval suites — skill quality benchmarking with prompt-criteria tests
1 hook — automatic stack detection on session start

/blueprint-dev:bp:discover              # Detect your stack
/blueprint-dev:bp:plan "feature"        # Plan a feature
/blueprint-dev:bp:lfg "feature"         # Full pipeline
/blueprint-dev:bp:go "small task"       # Fast lane for small work
/blueprint-dev:bp:batch "change"        # Parallel codebase-wide changes
/blueprint-dev:bp:test-browser          # Browser tests on affected pages

See the full Blueprint-Dev Guide for detailed usage.

design-studio

Code-first design exploration — from product definition to production components.

10 specialized agents — product planner, brand builder, screen designer, variant generator, and more
12 slash commands — product planning, brand building, screen design, variant picking, token extraction, and Next.js conversion
2 skills — design system patterns and product planning methodology

/design-studio:product                  # Define your product
/design-studio:design-brand             # Build a Minimum Viable Brand
/design-studio:design "description"     # Generate HTML design variants
/design-studio:design-pick              # Pick a variant and extract tokens
/design-studio:design-ship              # Convert to Next.js components