From nlpm
Sets up TDD workflows for NL artifacts: write specs in .nlpm-test/, run /nlpm:test for RED/GREEN cycle, and score with /nlpm:score.
How this skill is triggered — by the user, by Claude, or both
Slash command
/nlpm:testingThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
```
1. Write spec (.nlpm-test/artifact-name.spec.md) — define expectations
2. /nlpm:test — RED: spec fails (artifact doesn't exist)
3. Write the artifact — create the NL artifact
4. /nlpm:test — check if it passes
5. /nlpm:score — check quality score
6. Iterate until GREEN: all specs pass + score ≥ threshold
Location: .nlpm-test/ directory in the project root (or alongside the artifact).
Filename convention: <artifact-name>.spec.md — matches the artifact filename without path.
---
artifact: agents/my-agent.md # path to the artifact being tested
type: agent # agent | skill | command | rule | hook | prompt
min_score: 85 # minimum /nlpm:score threshold for this artifact
---
Body sections (all optional — include what matters for this artifact):
## Triggers On
Queries that SHOULD trigger this artifact:
- "review my database migrations before deploying"
- "check if these schema changes are safe"
- "audit the migration for breaking changes"
## Does Not Trigger On
Queries that should NOT trigger this artifact:
- "write a migration for adding a users table"
- "help me with CSS styling"
- "deploy to production"
## Output Contains
Expected elements in the output:
- "## Migration Review" (heading present)
- "| Table | Change | Risk |" (table structure)
- severity classification (CRITICAL/HIGH/MEDIUM/LOW)
## Output Format
The output should be a markdown report with:
1. Summary section with counts
2. Findings table with columns: File, Issue, Severity
3. Action items list
## Handles Input
| Input | Expected Behavior |
|-------|------------------|
| (empty) | Score all artifacts in cwd |
| directory path | Score artifacts in that directory |
| nonexistent path | Error: "Directory not found: {path}" |
| file path | Score that single file |
## Follows Rules
Code that SHOULD comply:
```python
result: Result[User, AppError] = get_user(id)
Code that SHOULD violate:
user = get_user(id).unwrap() # should be flagged
### frontmatter_valid (all types)
```markdown
## Frontmatter Valid
Required fields:
- description: present and trigger-style ("Use when...")
- model: sonnet
- tools: [Read, Glob, Grep]
- skills: [nlpm:conventions, nlpm:scoring]
NLPM Test Report
Spec Artifact Result Details
─────────────────────────────────────────────────────────────────────────────────
my-agent.spec.md agents/my-agent.md PASS 5/5 checks
my-skill.spec.md skills/core/SKILL.md FAIL 3/5 checks
✗ Trigger: "optimize React hooks" → predicted NO trigger (expected YES)
✗ Score: 68/100 (min: 85)
Overall: 1 passed, 1 failed (50%)
RED items (fix these):
1. skills/core/SKILL.md — trigger gap: "optimize React hooks" not covered by description
2. skills/core/SKILL.md — score 68 < min 85: missing <example> blocks (-15)
handles_input for commandsmin_score should match your project's threshold (default 85 for new artifacts, 70 for legacy)The tester discovers specs by:
.nlpm-test/ directorymy-agent.spec.md → agents/my-agent.md (uses the artifact: frontmatter field)artifact path doesn't exist → spec is RED by default (artifact not yet created — this is the TDD "write test first" state)RED — write the spec first. Create .nlpm-test/my-agent.spec.md with artifact: agents/my-agent.md and a triggers_on: block listing 5 user queries the agent should match. The artifact does not exist yet — /nlpm:test reports RED with one failure: artifact not found.
GREEN — write the artifact. Create agents/my-agent.md with frontmatter (name, description, model, tools) and a body. The description must be specific enough that the listed trigger queries land on it. Re-run /nlpm:test: the agent now exists, the tester predicts triggers, and the spec passes.
REFACTOR — change the body, re-run the spec. Edit the artifact's behavior. The same spec runs unchanged; if a refactor breaks a trigger query or violates a handles_input case, the test goes RED. Fix forward, re-run.
This is the natural-language analogue of pytest or jest — specs are the contract; artifacts are the implementation.
This skill covers the spec format and runner contract for NL-TDD. For the
scoring rubric the tester compares against, see nlpm:scoring. For the
schemas of artifact frontmatter the spec validates, see nlpm:conventions.
npx claudepluginhub xiaolai/nlpm --plugin nlpmChecks NL artifacts (skills, agents, prompts) for anti-patterns like vague triggers, prohibitions without alternatives, oversized skills, and monolithic prompts. Use when authoring or reviewing files.
Generates EvalView test cases from SKILL.md files using LLM, captures real agent interactions as tests, or creates individual test YAMLs manually.
Tests Claude Code skills before deployment using TDD RED-GREEN-REFACTOR: baseline failures without skill, write fixes, iterate pressure scenarios to resist rationalizations.