By dlabs
Probabilistic scenario validation for agentic software. Replaces rigid tests with LLM-judged user-story scenarios, satisfaction scoring across observed trajectories, and a Digital Twin Universe for behavioral clones of third-party services. Designed for codebases where agents and LLMs are first-class design primitives.
List, search, and manage the scenario catalog — view scenarios by domain, check twin coverage, and show catalog health
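The catalog command's output format isn't documented here. The sketch below shows one rough reading of "twin coverage". It assumes each scenario lists the external services it depends on, and it treats coverage as the share of those services that already have a digital twin. The field names (`domain`, `services`) and the sample data are hypothetical.

```python
from collections import defaultdict

# Hypothetical catalog entries; the plugin's real scenario format may differ.
scenarios = [
    {"id": "checkout-happy-path", "domain": "billing", "services": ["payments", "email"]},
    {"id": "refund-request", "domain": "billing", "services": ["payments"]},
    {"id": "signup", "domain": "onboarding", "services": ["email", "sms"]},
]
twins = {"payments", "email"}  # services that already have a digital twin


def by_domain(catalog):
    """Group scenario ids by domain."""
    groups = defaultdict(list)
    for s in catalog:
        groups[s["domain"]].append(s["id"])
    return dict(groups)


def twin_coverage(catalog, available_twins):
    """Return (fraction of referenced services with a twin, sorted missing services)."""
    referenced = {svc for s in catalog for svc in s["services"]}
    missing = sorted(referenced - available_twins)
    covered = len(referenced) - len(missing)
    return (covered / len(referenced) if referenced else 1.0), missing


print(by_domain(scenarios))
print(twin_coverage(scenarios, twins))
```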
Configure chaos injection for a digital twin — set failure probabilities, choose profiles, and customize failure modes
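Here is a minimal sketch of probabilistic chaos injection. It assumes a profile maps failure-mode names to per-call probabilities, and that a twin consults the injector before answering each request. The profile names (`mild`, `hostile`) and failure modes (`timeout`, `rate_limit`, `server_error`) are illustrative, not the plugin's actual configuration keys.

```python
import random

# Hypothetical chaos profiles: failure mode -> probability of injecting it per call.
PROFILES = {
    "none": {},
    "mild": {"timeout": 0.02, "rate_limit": 0.03},
    "hostile": {"timeout": 0.10, "rate_limit": 0.15, "server_error": 0.10},
}


class ChaosInjector:
    def __init__(self, profile="mild", overrides=None, seed=None):
        self.modes = dict(PROFILES[profile])
        self.modes.update(overrides or {})  # per-mode customization on top of a profile
        if sum(self.modes.values()) > 1.0:
            raise ValueError("failure probabilities must sum to at most 1.0")
        self.rng = random.Random(seed)  # seeded so chaos runs are reproducible

    def draw(self):
        """Return the failure mode to inject for this call, or None for a normal response."""
        roll = self.rng.random()
        cumulative = 0.0
        for mode, p in self.modes.items():
            cumulative += p
            if roll < cumulative:
                return mode
        return None


chaos = ChaosInjector("hostile", overrides={"server_error": 0.05}, seed=42)
outcomes = [chaos.draw() for _ in range(1000)]
print({m: outcomes.count(m) for m in set(outcomes)})
```

Seeding the injector makes a chaos run reproducible, which helps when a failing trajectory needs to be replayed.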
Initialize the scenario catalog, config, and directory structure for a new project
Display the satisfaction report — per-scenario scores, threshold status, trend analysis, and comparison to previous runs
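This listing doesn't say how the report computes trends. The sketch below shows one simple approach. It assumes each scenario has a satisfaction score between 0 and 1 per run, marks each scenario pass or fail against a threshold, and classifies its change since the previous run. The 0.9 threshold and 0.05 tolerance are hypothetical defaults.

```python
def compare_runs(current, previous, threshold=0.9, tolerance=0.05):
    """Build report rows from per-scenario satisfaction scores (0.0 to 1.0).

    `threshold` and `tolerance` are illustrative defaults, not plugin settings.
    """
    rows = []
    for scenario, score in sorted(current.items()):
        before = previous.get(scenario)
        delta = None if before is None else score - before
        if delta is None:
            trend = "new"  # no score in the previous run
        elif delta <= -tolerance:
            trend = "regressed"
        elif delta >= tolerance:
            trend = "improved"
        else:
            trend = "stable"
        rows.append({
            "scenario": scenario,
            "score": score,
            "status": "pass" if score >= threshold else "fail",
            "delta": delta,
            "trend": trend,
        })
    return rows


previous = {"checkout": 0.95, "refund": 0.80}
current = {"checkout": 0.85, "refund": 0.92, "signup": 0.97}
for row in compare_runs(current, previous):
    print(row)
```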
Review and refine an existing scenario — check quality, update criteria, bump version
Evaluates individual trajectories against scenario satisfaction criteria and anti-patterns. Produces binary satisfactory/unsatisfactory judgments with reasoning. Uses LLM-as-judge methodology.
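The sketch below shows one way to frame an LLM-as-judge call with a binary outcome. It builds a prompt from the criteria, anti-patterns, and recorded trajectory, then parses a strictly formatted verdict line. `call_model` is a hypothetical placeholder for whatever LLM client is in use. The `VERDICT:` line convention is an assumption, not the plugin's actual prompt.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class Judgment:
    satisfactory: bool
    reasoning: str


def build_judge_prompt(criteria, anti_patterns, trajectory):
    lines = ["You are judging whether an agent run satisfied a user story.",
             "Satisfaction criteria:"]
    lines += [f"- {c}" for c in criteria]
    lines.append("Anti-patterns (any occurrence means unsatisfactory):")
    lines += [f"- {a}" for a in anti_patterns]
    lines.append("Trajectory:")
    lines += [f"{i}. {step}" for i, step in enumerate(trajectory, 1)]
    lines.append("Explain your reasoning, then end with exactly one line: "
                 "VERDICT: SATISFACTORY or VERDICT: UNSATISFACTORY")
    return "\n".join(lines)


def judge(criteria, anti_patterns, trajectory, call_model: Callable[[str], str]):
    """call_model is a hypothetical stand-in for an LLM client: prompt in, text out."""
    reply = call_model(build_judge_prompt(criteria, anti_patterns, trajectory))
    match = re.search(r"VERDICT:\s*(SATISFACTORY|UNSATISFACTORY)\s*$", reply.strip())
    # Fail closed: a reply without a parseable verdict counts as unsatisfactory.
    satisfactory = bool(match) and match.group(1) == "SATISFACTORY"
    return Judgment(satisfactory, reply)


def fake_model(prompt):
    return "Refund issued and user notified.\nVERDICT: SATISFACTORY"


print(judge(["User receives a refund"], ["Charges the user twice"],
            ["POST /refunds", "send email"], fake_model))
```

Failing closed matters here. It keeps a malformed judge response from inflating satisfaction scores.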
Translates natural-language user stories into structured YAML scenarios with personas, satisfaction criteria, and anti-patterns. References existing scenarios to avoid duplication.
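The scenario YAML schema isn't reproduced in this listing. The sketch below models the parts the description names (persona, satisfaction criteria, anti-patterns) as a Python dataclass. It uses `difflib` string similarity as a stand-in duplicate check. The field names, the 0.8 cutoff, and the similarity heuristic are all assumptions. The authoring agent itself presumably compares scenarios by meaning rather than by characters.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class Scenario:
    # Hypothetical field names mirroring the parts named in the description.
    id: str
    story: str
    persona: str
    satisfaction_criteria: list[str] = field(default_factory=list)
    anti_patterns: list[str] = field(default_factory=list)
    version: int = 1


def find_near_duplicates(new, catalog, cutoff=0.8):
    """Return ids of catalog scenarios whose story text closely matches `new`."""
    return [
        s.id
        for s in catalog
        if SequenceMatcher(None, new.story.lower(), s.story.lower()).ratio() >= cutoff
    ]


existing = [
    Scenario(
        id="refund-damaged-order",
        story="As a customer I want a refund for a damaged order",
        persona="Frustrated repeat customer",
        satisfaction_criteria=["Refund is issued to the original payment method"],
        anti_patterns=["Asks the customer for the same information twice"],
    )
]
draft = Scenario(id="draft", story="As a customer, I want a refund for my damaged order",
                 persona="First-time buyer")
print(find_near_duplicates(draft, existing))
```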
Reviews authored scenarios for completeness, ambiguity, and testability. Ensures satisfaction criteria are specific enough for LLM judgment. Flags vague or overly rigid scenarios.
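The reviewer is an agent, so its actual judgment is LLM-based. As a crude illustration of the two failure directions it targets, vague criteria and overly rigid ones, here is a keyword heuristic. The word list and patterns are hypothetical.

```python
import re

# Illustrative lists only; a real reviewer would judge meaning, not keywords.
VAGUE_TERMS = {"good", "fast", "properly", "correctly", "nice", "user-friendly", "appropriate"}
RIGID_PATTERNS = [
    (re.compile(r'"[^"]+"'), "quotes an exact string"),
    (re.compile(r"\bexactly\b", re.IGNORECASE), "demands an exact match"),
]


def review_criterion(text):
    """Return a list of human-readable issues for one satisfaction criterion."""
    issues = []
    words = set(re.findall(r"[a-z][a-z-]*", text.lower()))
    vague = sorted(words & VAGUE_TERMS)
    if vague:
        issues.append("vague: " + ", ".join(vague))
    for pattern, reason in RIGID_PATTERNS:
        if pattern.search(text):
            issues.append("rigid: " + reason)
    return issues


criteria = [
    "The checkout works properly and is fast",
    'The confirmation page shows exactly "Order #1234 placed"',
    "The user sees an order confirmation that includes the order total",
]
for c in criteria:
    print(c, "->", review_criterion(c) or "ok")
```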
Executes scenarios against the codebase with digital twins substituted for real services. Records every action, state transition, and outcome as a trajectory. Manages parallel execution for volume runs.
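Here is a sketch of trajectory recording plus parallel volume runs. It assumes each run is independent and can execute on its own thread. `run_scenario` is a hypothetical stand-in for driving the system under test with twins substituted. The event labels (`action`, `state`, `outcome`) are illustrative.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class Trajectory:
    scenario_id: str
    run: int
    events: list[dict] = field(default_factory=list)

    def record(self, kind, detail):
        # Timestamped event; `kind` labels are hypothetical.
        self.events.append({"t": time.monotonic(), "kind": kind, "detail": detail})


def run_scenario(scenario_id, run):
    """Hypothetical stand-in for executing one scenario against digital twins."""
    rng = random.Random(run)  # per-run seed so each trajectory is reproducible
    traj = Trajectory(scenario_id, run)
    traj.record("action", "POST /orders")
    traj.record("state", {"order": "created"})
    ok = rng.random() > 0.2
    traj.record("outcome", "confirmed" if ok else "payment_failed")
    return traj


def run_volume(scenario_id, runs=50, workers=8):
    """Execute many independent runs in parallel and collect their trajectories."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda i: run_scenario(scenario_id, i), range(runs)))


trajectories = run_volume("checkout-happy-path")
print(len(trajectories), trajectories[0].events[-1])
```

`ThreadPoolExecutor.map` returns results in submission order. Trajectory i therefore always corresponds to run i, even when runs finish out of order.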
Analyzes the codebase to discover third-party API usage and generates behavioral clones (digital twins) with API surfaces, state machines, and chaos injection configurations.
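Discovery could start with something as simple as scanning imports. This sketch handles Python sources only, using the standard `ast` module. The package-to-service mapping is an illustrative assumption. A fuller discovery pass would likely also need to catch raw HTTP calls to third-party hosts.

```python
import ast

# Illustrative mapping from SDK import name to the service a twin would clone.
KNOWN_SDKS = {"stripe": "Stripe", "twilio": "Twilio", "boto3": "AWS", "sendgrid": "SendGrid"}


def third_party_services(source):
    """Return the known services imported by one Python source file."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]  # absolute "from x.y import z" imports only
        else:
            continue
        for name in names:
            top = name.split(".")[0]
            if top in KNOWN_SDKS:
                found.add(KNOWN_SDKS[top])
    return sorted(found)


code = """
import stripe
from twilio.rest import Client
import json
"""
print(third_party_services(code))
```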
Framework for building behavioral clones (digital twins) of third-party services. Covers API surface replication, state machines, chaos injection, and twin composition for validating at volumes and rates impossible against live services.
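Below is a sketch of a twin backed by a state machine, using a hypothetical payment service with create, capture, cancel, and refund operations. The states, transitions, and response shapes are invented for illustration. They are not modeled on any specific provider's API.

```python
import itertools


class PaymentTwin:
    """Hypothetical behavioral clone of a payment service, backed by a state machine."""

    TRANSITIONS = {
        ("created", "capture"): "captured",
        ("created", "cancel"): "canceled",
        ("captured", "refund"): "refunded",
    }

    def __init__(self):
        self._ids = itertools.count(1)
        self.payments = {}

    def create(self, amount_cents):
        if amount_cents <= 0:
            return {"status": 400, "error": "amount must be positive"}
        pid = f"pay_{next(self._ids)}"
        self.payments[pid] = {"id": pid, "amount": amount_cents, "state": "created"}
        return {"status": 200, "body": dict(self.payments[pid])}

    def _transition(self, pid, action):
        payment = self.payments.get(pid)
        if payment is None:
            return {"status": 404, "error": "no such payment"}
        new_state = self.TRANSITIONS.get((payment["state"], action))
        if new_state is None:
            # Illegal transitions return an error instead of a canned success.
            return {"status": 409, "error": f"cannot {action} a {payment['state']} payment"}
        payment["state"] = new_state
        return {"status": 200, "body": dict(payment)}

    def capture(self, pid):
        return self._transition(pid, "capture")

    def cancel(self, pid):
        return self._transition(pid, "cancel")

    def refund(self, pid):
        return self._transition(pid, "refund")


twin = PaymentTwin()
pid = twin.create(1999)["body"]["id"]
print(twin.refund(pid)["status"], twin.capture(pid)["status"], twin.refund(pid)["body"]["state"])
```

Rejecting illegal transitions is what lets such a twin expose an agent that calls an API in the wrong order, such as refunding before capture.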
Framework for measuring, aggregating, and trending satisfaction scores across scenarios. Covers LLM-as-judge methodology, trajectory evaluation, threshold configuration, comparison analysis, and reporting.
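This listing doesn't give the exact scoring formula. If each trajectory receives a binary judgment, a natural per-scenario score is the fraction judged satisfactory (satisfactory / total). A catalog-level view can then average those scores and list the scenarios below threshold. The 0.9 threshold is a hypothetical default.

```python
from statistics import mean


def satisfaction(judgments):
    """Share of trajectories judged satisfactory (judgments are booleans)."""
    if not judgments:
        raise ValueError("no trajectories to score")
    return sum(judgments) / len(judgments)


def score_catalog(results, threshold=0.9):
    """results maps scenario id -> list of binary judgments, one per trajectory."""
    per_scenario = {sid: satisfaction(js) for sid, js in results.items()}
    failing = sorted(sid for sid, s in per_scenario.items() if s < threshold)
    return {
        "per_scenario": per_scenario,
        "overall": mean(per_scenario.values()),  # unweighted mean across scenarios
        "failing": failing,
    }


results = {
    "checkout": [True] * 19 + [False],   # 19 of 20 trajectories satisfactory
    "refund": [True] * 7 + [False] * 3,  # 7 of 10 trajectories satisfactory
}
print(score_catalog(results))
```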
Framework for authoring, structuring, and managing scenarios — end-to-end user stories validated probabilistically by LLM-as-judge. Covers the holdout principle, scenario anatomy, versioning, composition, and anti-reward-hacking patterns.
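The sketch below shows one way to apply the holdout principle. It assumes part of the catalog is withheld from the agent implementing the code, so that agent cannot optimize against those scenarios directly. Assignment is deterministic (a salted SHA-256 hash of the scenario id), so a scenario never drifts between visible and hidden across runs. The 30% fraction and the salt are hypothetical.

```python
import hashlib


def is_holdout(scenario_id, fraction=0.3, salt="holdout-v1"):
    """Deterministically assign a scenario to the holdout set.

    `fraction` and `salt` are illustrative parameters.
    """
    digest = hashlib.sha256(f"{salt}:{scenario_id}".encode()).digest()
    bucket = int.from_bytes(digest[:6], "big") / 2**48  # exact float in [0, 1)
    return bucket < fraction


def split_catalog(scenario_ids, fraction=0.3):
    """Return (visible, holdout) lists, preserving the input order."""
    visible = [s for s in scenario_ids if not is_holdout(s, fraction)]
    holdout = [s for s in scenario_ids if is_holdout(s, fraction)]
    return visible, holdout


ids = [f"scenario-{i}" for i in range(100)]
visible, holdout = split_catalog(ids)
# Only `visible` would be exposed to the implementing agent; `holdout` is judged later.
print(len(visible), len(holdout))
```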
Uses power tools: Bash, Write, or Edit.
A collection of Claude Code plugins by d.labs.
| Plugin | Description | Version |
|---|---|---|
| blueprint-dev | Planning-first, design-driven development workflow with A/B design variants, architecture robustness checks, trunk-based development enforcement, agent team swarms, compound knowledge accumulation, browser testing, git worktree management, lightweight fast-lane workflows, code simplification, parallel batch operations, and a skill eval framework. | 2.0.0 |
| design-studio | Code-first design exploration workflow. Product planning, section screen design, brand discovery, variant generation, iterative refinement, token extraction, and production Next.js components using your project's component libraries (shadcn/ui, Radix UI). | 0.5.0 |
| scenario-testing | Probabilistic scenario validation for agentic software. LLM-judged user-story scenarios, satisfaction scoring across observed trajectories, and a Digital Twin Universe for behavioral clones of third-party services. | 0.1.0 |
Step 1: Add the marketplace:
claude plugin marketplace add https://github.com/dlabs/claude-marketplace
Step 2: Install a plugin:
claude plugin install blueprint-dev
claude plugin install design-studio
claude plugin install scenario-testing
This installs with user scope by default (available across all projects). To install for a single project only:
claude plugin install blueprint-dev --scope project
Step 3: Restart Claude Code for the plugin to take effect.
Planning-first, design-driven development workflow for Claude Code.
Covers the full workflow from /discover to /compound, plus a lightweight fast lane, batch operations, browser testing, and video walkthroughs.
/blueprint-dev:bp:discover # Detect your stack
/blueprint-dev:bp:plan "feature" # Plan a feature
/blueprint-dev:bp:lfg "feature" # Full pipeline
/blueprint-dev:bp:go "small task" # Fast lane for small work
/blueprint-dev:bp:batch "change" # Parallel codebase-wide changes
/blueprint-dev:bp:test-browser # Browser tests on affected pages
See the full Blueprint-Dev Guide for detailed usage.
Code-first design exploration — from product definition to production components.
/design-studio:product # Define your product
/design-studio:design-brand # Build a Minimum Viable Brand
/design-studio:design "description" # Generate HTML design variants
/design-studio:design-pick # Pick a variant and extract tokens
/design-studio:design-ship # Convert to Next.js components
Probabilistic validation for agentic software — where traditional tests fall short.
/scenario-testing:author "user story" # Write a scenario from a user story
/scenario-testing:run # Execute scenarios against your system
/scenario-testing:score # Score satisfaction across trajectories
/scenario-testing:twin "service" # Create a behavioral clone of a service
License: MIT
npx claudepluginhub dlabs/claude-marketplace --plugin scenario-testing
Planning-first, design-driven development workflow with A/B design variants, architecture robustness checks, trunk-based development enforcement, predefined agent team swarms, compound knowledge accumulation, browser automation testing, feature video walkthroughs, git worktree management, lightweight fast-lane workflows, code simplification, parallel batch operations, and a skill eval framework. Stack-agnostic with auto-detection.
Code-first design exploration workflow. Product planning → section screen design → brand discovery → variant generation → iterative refinement → token extraction → production components. Generate standalone HTML variants, design app screens scoped to sections, refine picked designs with feedback, extract design tokens, and ship as Next.js components using your project's component libraries (shadcn/ui, Radix UI).
Commands for scenario simulation and decision modeling
Prompt engineering techniques for accurate, grounded Claude responses — anti-hallucination workflow with citation-backed analysis
Verification-first engineering toolkit for Claude Code. 15 skills across a 5-phase spine (Investigate → Design → Implement → Verify → Ship), 8 specialist agents, and an interactive setup wizard. Every skill has rationalizations + evidence requirements. Built for senior ICs and tech leads.
Comprehensive Behavior-Driven Development principles, practices, and collaboration patterns.
Four-layer test framework for Claude Code plugin skills — structure validation, trigger accuracy, session testing, and skill value comparison
Use /ultrathink <TASK_DESCRIPTION> to launch a Coordinator Agent that directs four specialist sub-agents (Architect, Research, Coder, and Tester) to analyze, design, implement, and validate your coding task. The process breaks the task into clear steps, gathers insights, and synthesizes a cohesive solution with actionable outputs. Relevant files can be referenced ad hoc using @filename syntax.