Skill

nw-agent-testing

From nw

Provides a 5-layer testing framework for AI agents: output quality, integration validation, adversarial review, peer critique, and security checks, including prompt injection resistance.

Anthropic

testing

security

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/nw:nw-agent-testing

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Validate agent produces correct, well-structured outputs for typical inputs.

SKILL.md

85 lines · ~882 tokens

Stats

Language: Python

Parent stars: 500

Parent forks: 51

Maintenance: Excellent

Last commit: Apr 1, 2026

Agent Testing Framework

5-Layer Testing Approach

Layer 1: Output Quality (Unit-Level)

Validate agent produces correct, well-structured outputs for typical inputs.

Test: Agent follows workflow phases | Outputs match expected format/structure | Domain-specific rules correctly applied | Token efficiency within bounds

How: Manual invocation with representative inputs. Check against acceptance criteria in agent description.
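The format and token-budget checks above can be partly scripted once an output has been captured from a manual invocation. The following is a minimal sketch, not part of the skill itself: the function name `check_output`, the heading-matching rule, and the four-characters-per-token estimate are all assumptions chosen for illustration.

```python
import re


def check_output(output: str, required_sections: list[str], max_tokens: int) -> list[str]:
    """Return human-readable failures; an empty list means the output passed."""
    failures = []
    for section in required_sections:
        # A section counts as present when a line holds only the heading text,
        # optionally prefixed by markdown '#' markers.
        pattern = rf"^\s*#*\s*{re.escape(section)}\s*$"
        if not re.search(pattern, output, flags=re.MULTILINE | re.IGNORECASE):
            failures.append(f"missing section: {section}")
    # Rough estimate (assumption): about 4 characters per token.
    estimated_tokens = len(output) // 4
    if estimated_tokens > max_tokens:
        failures.append(f"estimated {estimated_tokens} tokens exceeds budget of {max_tokens}")
    return failures


if __name__ == "__main__":
    sample = "# Summary\nAll good.\n## Risks\nNone found.\n"
    print(check_output(sample, ["Summary", "Risks", "Next Steps"], max_tokens=100))
```

The acceptance criteria themselves (which sections are required, what budget applies) still come from the agent description; the script only automates the comparison.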

Layer 2: Integration / Handoff Validation

Validate correct input/output between agents in workflows.

Test: Input parsing handles upstream format | Output format matches downstream expectations | Error signals propagate correctly | Subagent mode activation works (skip greeting, execute autonomously)

How: End-to-end workflow execution through full agent chain (e.g., DISCUSS -> DESIGN -> DELIVER).
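Before running the full chain, the handoff contract can be checked statically. The sketch below assumes each agent can be described by the keys it expects and the keys it emits; the `Stage` class, `check_chain`, and the key names (`requirements`, `architecture`, `test_plan`) are hypothetical and only the DISCUSS, DESIGN, and DELIVER stage names come from the example above.

```python
from dataclasses import dataclass, field


@dataclass
class Stage:
    name: str
    requires: set[str] = field(default_factory=set)  # keys expected from upstream
    produces: set[str] = field(default_factory=set)  # keys this stage emits


def check_chain(stages: list[Stage]) -> list[str]:
    """Report every adjacent pair where downstream needs a key upstream does not emit."""
    problems = []
    for upstream, downstream in zip(stages, stages[1:]):
        missing = downstream.requires - upstream.produces
        if missing:
            problems.append(f"{upstream.name} -> {downstream.name}: missing {sorted(missing)}")
    return problems


if __name__ == "__main__":
    chain = [
        Stage("DISCUSS", produces={"requirements"}),
        Stage("DESIGN", requires={"requirements"}, produces={"architecture"}),
        Stage("DELIVER", requires={"architecture", "test_plan"}),
    ]
    for problem in check_chain(chain):
        print(problem)
```

A static check like this does not replace end-to-end execution; it only catches format mismatches early, before a full workflow run is spent on them.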

Layer 3: Adversarial Output Validation

Challenge validity of agent outputs rather than accepting at face value.

Test: Source verification (cited sources real and accurate?) | Bias detection (favors one approach without evidence?) | Edge case coverage | Completeness (required sections present?)

How: Peer review by a dedicated -reviewer agent using structured critique dimensions.
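One way to keep a review structured is to record a finding per critique dimension and refuse approval while any dimension is failed or unreviewed. This is an illustrative sketch only: the dimension identifiers mirror the four tests listed above, while `Finding` and `summarize` are hypothetical names, not an interface the reviewer agent is known to expose.

```python
from dataclasses import dataclass

# Dimension identifiers mirror the Layer 3 tests; the spelling is an assumption.
DIMENSIONS = ("source_verification", "bias_detection", "edge_case_coverage", "completeness")


@dataclass
class Finding:
    dimension: str
    passed: bool
    note: str = ""


def summarize(findings: list[Finding]) -> dict:
    """Approve only when every dimension was reviewed and none failed."""
    seen = {f.dimension for f in findings}
    unreviewed = [d for d in DIMENSIONS if d not in seen]
    failed = [f.dimension for f in findings if not f.passed]
    return {"approved": not unreviewed and not failed, "failed": failed, "unreviewed": unreviewed}


if __name__ == "__main__":
    review = [
        Finding("source_verification", True),
        Finding("bias_detection", False, "recommends one approach without evidence"),
        Finding("completeness", True),
    ]
    print(summarize(review))
```

Treating an unreviewed dimension as blocking keeps a partial review from being read as a pass.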

Layer 4: Adversarial Verification (Peer Review)

Independent review to catch biases and blind spots in agent design.

Test: Definition follows validation checklist? | Instructions that duplicate Claude's default behavior? | Over- or under-specified? | Could a simpler agent achieve the same results?

How: @nw-agent-builder validates via 11-point checklist or @agent-builder-reviewer runs structured review.

Layer 5: Security Validation

Test resilience against misuse and prompt injection.

Test: Tool restriction enforcement | maxTurns respected | Permission mode correctly scoped | Agent stays within declared scope

How: Frontmatter fields are enforced at the platform level. Verify the configuration.

Prompt Injection Resistance

Claude Code platform provides injection resistance through: subagent isolation (own context, no sub-subagents) | Tool restriction via frontmatter tools | Permission modes via permissionMode | Hook-based validation (PreToolUse, PostToolUse)

Do NOT add prose-based injection defense. Configure platform features:

---
tools: Read, Glob, Grep           # Only tools this agent needs
maxTurns: 30                       # Prevents runaway execution
permissionMode: default            # User approves dangerous actions
---

Security Validation Checklist

tools restricted to minimum necessary (least privilege)
maxTurns set to prevent runaway execution
permissionMode appropriate for risk level
No Bash unless agent requires command execution
No Write unless agent creates/modifies files
Description accurately describes scope
Subagent mode handles autonomous execution correctly
No sensitive data hardcoded in definition
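The mechanical items on this checklist can be checked against an agent definition's frontmatter. The sketch below is a minimal illustration under stated assumptions: `parse_frontmatter` handles only flat `key: value` lines and is not a full YAML parser (a `#` inside a value would be cut off as a comment), and the `needs_bash` and `needs_write` flags are hypothetical inputs the reviewer supplies. Scope accuracy, subagent behavior, and hardcoded secrets still need human review.

```python
# Frontmatter taken from the example earlier in this document.
EXAMPLE = """---
tools: Read, Glob, Grep           # Only tools this agent needs
maxTurns: 30                       # Prevents runaway execution
permissionMode: default            # User approves dangerous actions
---
"""


def parse_frontmatter(text: str) -> dict[str, str]:
    """Parse flat 'key: value' lines between '---' markers (simplified, not real YAML)."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        line = line.split("#", 1)[0]  # drop trailing comments
        if ":" in line:
            key, value = line.split(":", 1)
            fields[key.strip()] = value.strip()
    return fields


def audit(fields: dict[str, str], needs_bash: bool = False, needs_write: bool = False) -> list[str]:
    """Return checklist violations that can be detected from frontmatter alone."""
    issues = []
    tools = [t.strip() for t in fields.get("tools", "").split(",") if t.strip()]
    if not tools:
        issues.append("tools field missing or empty")
    if "Bash" in tools and not needs_bash:
        issues.append("Bash granted but not required")
    if "Write" in tools and not needs_write:
        issues.append("Write granted but not required")
    if not fields.get("maxTurns", "").isdigit():
        issues.append("maxTurns not set")
    if "permissionMode" not in fields:
        issues.append("permissionMode not set")
    return issues


if __name__ == "__main__":
    print(audit(parse_frontmatter(EXAMPLE)))
```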

Testing Workflow for New Agents

1. Create with minimal definition
2. Layer 1: Invoke with 2-3 representative inputs, check outputs
3. Layer 2: Run in workflow chain if applicable
4. Fix failures observed
5. Validate: Run 11-point checklist
6. Iterate: Add instructions only for observed failure modes
