Skill

auditing-subagents

ALWAYS invoke this skill when auditing, reviewing, or evaluating subagent configuration files. NEVER audit subagents without this skill.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/develop:auditing-subagents <subagent-path>

User invocable

Model invocable

Inline context

Default effort

Argument hint<subagent-path>

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Invoke the `develop:standardizing-agent-prompts` skill before proceeding. If that skill is unavailable, report the missing skill and continue with the closest available workflow.

SKILL.md

299 lines · ~2.9k tokens

Stats

LanguagePython

Parent stars0

MaintenanceExcellent

Last CommitJun 12, 2026

Actions

View Source View Plugin View on GitHub View README

Stats

Actions

Tags

Invoke the develop:standardizing-agent-prompts skill before proceeding. If that skill is unavailable, report the missing skill and continue with the closest available workflow.

Invoke the develop:creating-subagents skill before proceeding. If that skill is unavailable, report the missing skill and continue with the closest available workflow.

Evaluate subagent configuration files against best practices for role definition, prompt quality, tool selection, model appropriateness, and effectiveness. Provide actionable findings with contextual judgment, not arbitrary scores.

This ensures subagents follow proper structure, configuration, pure XML formatting, and implementation patterns.

<quick_start>

Read best practices from the creating-subagents skill and its reference files
Read the subagent configuration file at $ARGUMENTS
Evaluate against all areas: YAML, role, workflow, constraints, tools, XML structure
Report findings using the severity-based output format

</quick_start>

- MUST check for markdown headings (##, ###) in subagent body and flag as critical - MUST verify all XML tags are properly closed - MUST distinguish between functional deficiencies and style preferences - NEVER flag missing tag names if the content/function is present under a different name (e.g., `` vs ``) - ALWAYS verify information isn't present under a different tag name or format before flagging - DO NOT flag formatting preferences that don't impact effectiveness - MUST flag missing functionality, not missing exact tag names - ONLY flag issues that reduce actual effectiveness - ALWAYS apply contextual judgment based on subagent purpose and complexity

<critical_workflow> MANDATORY: Read best practices FIRST, before auditing:

Both skills are already injected above. Read the creating-subagents reference files:
- ${CLAUDE_SKILL_DIR}/../creating-subagents/references/subagents.md
- ${CLAUDE_SKILL_DIR}/../creating-subagents/references/writing-subagent-prompts.md
The standardizing-agent-prompts skill is already injected above — covers voice, description style, constraint language, and anti-patterns.
Before penalizing any missing section, search entire file for equivalent content under different tag names
Read the subagent configuration file at $ARGUMENTS
Evaluate against best practices from steps 1-3, focusing on functionality over formatting

Use ACTUAL patterns from references, not memory. </critical_workflow>

<evaluation_areas> These issues significantly hurt effectiveness - flag as critical:

yaml_frontmatter:

name: Lowercase-with-hyphens, unique, clear purpose
description: Includes BOTH what it does AND when to use it, specific trigger keywords

role_definition:

Does <role> section clearly define specialized expertise?
Anti-pattern: Generic helper descriptions ("helpful assistant", "helps with code")
Pass: Role specifies domain, expertise level, and specialization

workflow_specification:

Does prompt include workflow steps (under any tag like <workflow>, <approach>, <critical_workflow>, etc.)?
Anti-pattern: Vague instructions without clear procedure
Pass: Step-by-step workflow present and sequenced logically

constraints_definition:

Does prompt include constraints section with clear boundaries?
Anti-pattern: No constraints specified, allowing unsafe or out-of-scope actions
Pass: At least 3 constraints using strong modal verbs (MUST, NEVER, ALWAYS)

tool_access:

Are tools limited to minimum necessary for task?
Anti-pattern: All tools inherited without justification or over-permissioned access
Pass: Either justified "all tools" inheritance or explicit minimal list

xml_structure:

No markdown headings in body (##, ###) - use pure XML tags
All XML tags properly opened and closed
No hybrid XML/markdown structure
Note: Markdown formatting WITHIN content (bold, italic, lists, code blocks) is acceptable

Check against `/standardizing-agent-prompts` conventions:

Voice: Uses imperative mood for instructions, "Claude" for failure modes/tendencies. Never "the agent", "the model", or "you"
Description style: Directive pattern for description field. Matches user speech
Constraint language: Strong modal verbs (MUST/NEVER/ALWAYS) in constraint blocks
Anti-patterns: No banned phrases ("helpful assistant", "helps with", "please"). No structural anti-patterns (explaining Claude to Claude, motivational prose)

These improve quality - flag as recommendations:

focus_areas:

Does prompt include focus areas or equivalent specificity?
Pass: 3-6 specific focus areas listed somewhere in the prompt

output_format:

Does prompt define expected output structure?
Pass: <output_format> section with clear structure

model_selection:

Is model choice appropriate for task complexity?
Guidance: Simple/fast → Haiku, Complex/critical → Sonnet, Highest capability → Opus

success_criteria:

Does prompt define what success looks like?
Pass: Clear definition of successful task completion

error_handling:

Does prompt address failure scenarios?
Pass: Instructions for handling tool failures, missing data, unexpected inputs

examples:

Does prompt include concrete examples where helpful?
Pass: At least one illustrative example for complex behaviors

Note these as potential enhancements - don't flag if missing:

context_management: For long-running agents, context/memory strategy extended_thinking: For complex reasoning tasks, thinking approach guidance prompt_caching: For frequently invoked agents, cache-friendly structure testing_strategy: Test cases, validation criteria, edge cases observability: Logging/tracing guidance evaluation_metrics: Measurable success metrics

<contextual_judgment> Apply judgment based on subagent purpose and complexity:

Simple subagents (single task, minimal tools):

Focus areas may be implicit in role definition
Minimal examples acceptable
Light error handling sufficient

Complex subagents (multi-step, external systems, security concerns):

Missing constraints is a real issue
Comprehensive output format expected
Thorough error handling required

Delegation subagents (coordinate other subagents):

Context management becomes important
Success criteria should measure orchestration success

Always explain WHY something matters for this specific subagent, not just that it violates a rule. </contextual_judgment>

<anti_patterns> Flag these structural violations:

Using markdown headings (##, ###) for structure instead of XML tags.

Why this matters: Subagent.md files are consumed only by Claude, never read by humans. Pure XML structure provides ~25% better token efficiency and consistent parsing.

How to detect: Search file for ## or ### symbols outside code blocks/examples.

Fix: Convert to semantic XML tags (e.g., ## Workflow → <workflow>)

XML tags not properly closed or mismatched nesting.

Why this matters: Breaks parsing, creates ambiguous boundaries, harder for Claude to parse structure.

How to detect: Count opening/closing tags, verify each <tag> has </tag>.

Fix: Add missing closing tags, fix nesting order.

Mixing XML tags with markdown headings inconsistently.

Why this matters: Inconsistent structure makes parsing unpredictable, reduces token efficiency benefits.

How to detect: File has both XML tags (<role>) and markdown headings (## Workflow).

Fix: Convert all structural headings to pure XML.

Generic tag names like ``, ``, ``.

Why this matters: Tags should convey meaning, not just structure. Semantic tags improve readability and parsing.

How to detect: Tags with generic names instead of purpose-based names.

Fix: Use semantic tags (<workflow>, <constraints>, <validation>). </anti_patterns>

<output_format> Emit the verdict as JSON conforming to the canonical schema in plugins/spec-tree/skills/auditing/scripts/verdict.py. The skill's entire output is the JSON verdict. The caller captures the JSON and routes it through emit_verdict.py with the requested --format (defaulting to markdown+json for PR-comment delivery).

The skill's overall is PASS iff the critical-issues row has no findings with severity REJECT; FAIL if any critical finding is REJECT; UNKNOWN if the subagent file cannot be read or the audit cannot complete. Recommendations land as WARNING findings; strengths and quick fixes land as INFO findings.

{
  "schema_version": 1,
  "skill": "auditing-subagents",
  "target": "<subagent-path>",
  "overall": "PASS | FAIL | UNKNOWN",
  "rows": [
    {
      "name": "critical-issues",
      "status": "PASS | FAIL | UNKNOWN",
      "findings": [
        {
          "id": "f-001",
          "file": "<subagent-file>",
          "line": null,
          "rule": "<issue-category>",
          "severity": "REJECT",
          "message": "Current: <…>. Should be: <…>. Why it matters: <…>. Fix: <…>."
        }
      ]
    },
    { "name": "recommendations", "status": "PASS", "findings": [] },
    { "name": "strengths", "status": "PASS", "findings": [] },
    { "name": "quick-fixes", "status": "PASS", "findings": [] }
  ],
  "metadata": {
    "subagent_type": "simple | complex | delegation",
    "tool_access": "appropriate | over-permissioned | under-specified",
    "model_selection": "appropriate | reconsider"
  }
}

</output_format>

Before completing the audit, verify:

Completeness: All evaluation areas assessed
Precision: Every issue has file:line reference where applicable
Accuracy: Line numbers verified against actual file content
Actionability: Recommendations are specific and implementable
Fairness: Verified content isn't present under different tag names before flagging
Context: Applied appropriate judgment for subagent type and complexity
Examples: At least one concrete example given for major issues

<final_step> After presenting findings, offer:

Implement all fixes automatically
Show detailed examples for specific issues
Focus on critical issues only
Other

</final_step>

<success_criteria> A complete subagent audit includes:

Assessment summary (1-2 sentences on fitness for purpose)
Critical issues identified with file:line references
Recommendations listed with specific benefits
Strengths documented (what's working well)
Quick fixes enumerated
Context assessment (subagent type, tool access, model selection)
Estimated effort to fix
Post-audit options offered to user
Fair evaluation that distinguishes functional deficiencies from style preferences

</success_criteria>

auditing-subagents

Invocation

Context Preview

SKILL.md

auditing-subagents

Invocation

Context Preview

SKILL.md

Similar Skills

Similar Skills