From sub-test-engineer
Type-aware testing workflow agent for Java/Kotlin/TypeScript/Go backends. Analyzes code under test, selects optimal testing strategies, generates high-quality tests, and validates test effectiveness through coverage analysis and mutation testing. Activated by keywords: "test", "generate tests", "test strategy", "coverage", "mutation test", "property test", "contract test", "test quality", "test debt", "untested code".
How this skill is triggered — by the user, by Claude, or both
Slash command
/sub-test-engineer:sub-test-engineerThis skill is limited to the following tools:
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
> An agent that generates high-quality, convention-compliant tests through the Analyze → Strategize → Generate → Validate workflow, leveraging type system information for intelligent test case discovery.
references/architecture-testing.mdreferences/contract-testing.mdreferences/go-testing-techniques.mdreferences/integration-testing-techniques.mdreferences/mutation-testing.mdreferences/property-based-testing.mdreferences/test-quality-checklist.mdreferences/unit-testing-techniques.mdresources/analyze-protocol.mdresources/error-handling-framework.mdresources/error-playbook.mdresources/generate-protocol.mdresources/multi-module-context.mdresources/phase-index.mdresources/strategize-protocol.mdresources/test-discovery-protocol.mdresources/validate-protocol.mdresources/validation-tiers.mdrules/__tests__/go/extract-functions-test.ymlrules/__tests__/go/extract-interface-test.ymlAn agent that generates high-quality, convention-compliant tests through the Analyze → Strategize → Generate → Validate workflow, leveraging type system information for intelligent test case discovery.
A testing-specialized workflow agent that generates comprehensive, type-aware tests for Java/Kotlin/TypeScript/Go backend projects. Unlike general coding agents that treat testing as a verification step, this agent treats test quality as the primary output. It automatically detects the project's test infrastructure, analyzes code-under-test complexity, selects appropriate testing techniques (unit, property-based, contract, architecture), and iterates until coverage and mutation-kill targets are met.
Parse user input and classify into one of:
1. CLEAR target + mode → "OrderService 테스트 생성. loop 3"
→ target=OrderService, mode=all-in-one, loop=3
→ Proceed to Phase 0
2. CLEAR phase command → "analyze: PaymentService"
→ target=PaymentService, mode=step-by-step, phase=analyze
→ Jump to specified phase
3. CLEAR technique → "property-test: OrderValidator"
→ target=OrderValidator, technique=property-based, mode=technique-specific
→ Skip Phase 2 (Strategize)
4. AMBIGUOUS target → "테스트 해줘", "테스트 좀 추가해줘"
→ No target specified → trigger Target Resolution
→ Ask user or infer from git diff
5. SCOPE too broad → "전체 프로젝트 테스트 생성"
→ Recommend narrowing scope to a package or module
→ Suggest top 3 packages by test debt
When target is ambiguous:
Step 1: Check recent changes
→ git diff --name-only HEAD~3 -- '*.kt' '*.java' '*.ts' '*.go'
→ Filter to source files (exclude test files)
→ If 1-5 changed files → propose as targets
Step 2: If no recent changes
→ Ask user: "어떤 클래스나 패키지의 테스트를 생성할까요?"
→ Suggest: recently modified files, uncovered files (from cached coverage baseline)
Step 3: Validate target exists
→ Glob for target file/package
→ If not found → ask for clarification
| Keyword | Mode | Phase Flow |
|---|---|---|
analyze: | step-by-step | Phase 0 → 1 only |
strategize: | step-by-step | Phase 0 → 1 → 2 only |
generate | step-by-step | Phase 0 → 1 → 2 → 3 only |
validate | validate-only | Phase 4 only (scans for test files matching target) |
loop N | all-in-one | Full cycle, (Generate→Validate) ×1 then (Gap→Generate→Validate) ×(N-1) |
coverage-target N% | coverage-guided | Full cycle, loop until target met |
property-test: | technique-specific | Phase 0 → 1 → 3 (skip 2) |
contract-test: | technique-specific | Phase 0 → 1 → 3 (skip 2) |
test-debt: | analysis-only | Phase 0 → 1 → debt report |
dry-run | dry-run | Phase 0 → 1 → 2 → halt |
┌─────────────────────────────────────────────────────────┐
│ │
┌───────┐ ┌──▼────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│Phase 0│──▶│Phase 1│──▶│ Phase 2 │──▶│ Phase 3 │──▶│ Phase 4 │─────┘
│Discover│ │Analyze│ │Strategize│ │ Generate │ │ Validate │ (loop back
└───────┘ └───────┘ └──────────┘ └──────────┘ └──────────┘ if gap found)
| Phase | Entry Condition | Exit Condition | Skip Condition |
|---|---|---|---|
| 0 Discovery | Always first | Test profile loaded and cached | Profile cache valid (hash of build config matches) |
| 1 Analyze | After Phase 0 | Test targets identified + type info extracted | Target explicitly specified with type info |
| 2 Strategize | After Phase 1 | Strategy approved by user, OR dry-run halt | User specifies explicit technique |
| 3 Generate | After Phase 2 (strategy approved) OR from Phase 4 (loop back) | All planned test files written | dry-run mode active |
| 4 Validate | After Phase 3 | Loop termination (see Loop Control) | loop 0 specified |
Phase 4 → Phase 3 back-edge (loop): When Validate identifies coverage gaps or survived mutants, it feeds a Gap Report back to Generate. Generate then produces only gap-targeted tests (not full regeneration).
Each phase produces structured output consumed by the next phase:
| Producer | Consumer | Handoff Artifact | Format |
|---|---|---|---|
| Phase 0 | All | test-profile.json | JSON (cached) |
| Phase 1 | Phase 2 | Analysis Report | Markdown + type-info YAML |
| Phase 2 | Phase 3 | Strategy Document | Markdown with technique allocation table |
| Phase 3 | Phase 4 | Generated test file list | File paths |
| Phase 4 | Phase 3 (loop back) | Gap Report | Markdown with uncovered lines + survived mutants |
| Mode | Input Example | Behavior |
|---|---|---|
| All-in-one (default) | OrderService 테스트 생성 | Analyze → Strategize → Generate → Validate×1 |
| All-in-one + loop | OrderService 테스트 생성. loop 3 | Analyze → Strategize → Generate → Validate → (Gap→Generate→Validate)×2 |
| Coverage target | OrderService 테스트. coverage-target 80% | Iterate until coverage target met or max loops |
| Step-by-step | analyze: OrderService | Execute only a specific phase |
| Technique-specific | property-test: OrderValidator | Skip strategize, use specified technique |
| Dry-run | OrderService 테스트. dry-run | Execute up to Strategize, simulate without file changes |
| Test debt | test-debt: com.example.order | Analyze package for untested code, prioritize |
| Validate only | validate loop 2 | Discover test files → Validate → (Gap→Generate→Validate)×1 |
Step-by-step commands: analyze, strategize: {target}, generate, validate
When running step-by-step commands in sequence, prior phase outputs are reused:
analyze: OrderService → produces Analysis Report (cached in context)
strategize: OrderService → detects Analysis Report already exists → skip Phase 0+1 → run Phase 2 only
generate → detects Strategy Document already exists → skip Phase 0+1+2 → run Phase 3 only
validate → detects generated test files → run Phase 4 only
IF prior phase output NOT found in context (e.g., new conversation):
→ Run all prerequisite phases from Phase 0
When validate is used without prior generation in this session:
1. Load test-profile.json (Phase 0, from cache if valid)
2. Discover test files: glob "**/*Test.{kt,java}" OR "**/*.test.{ts,tsx}" OR "**/*_test.go"
→ Filter to target scope if specified (e.g., "validate OrderService")
3. Run Phase 4 validation pipeline on discovered test files
4. If loop > 1: generate Gap Report → Phase 3 (gap-targeted) → Phase 4
When test-debt mode is used, reports are persisted in .sub-test-engineer/reports/.
.sub-test-engineer/
├── reports/
│ └── {date}-{package-name}/
│ ├── debt-analysis.md # Untested code inventory
│ ├── strategy.md # Recommended testing strategy
│ ├── coverage-history.json # Coverage trend tracking
│ └── mutation-results.json # Mutation testing results
├── test-profile.json # Cached test infrastructure profile
└── .gitignore
Debt Analysis Report format (debt-analysis.md):
## Test Debt Analysis: {package}
### Summary
- Analyzed: {N} classes
- Untested: {N} classes (0% coverage)
- Under-tested: {N} classes (<50% line coverage)
- Well-tested: {N} classes (≥80% line coverage)
### Untested Classes (Priority Order)
| # | Class | Layer | Complexity | Public Methods | Risk |
|---|-------|-------|------------|----------------|------|
| 1 | {ClassName} | {layer} | {SIMPLE/MODERATE/COMPLEX} | {count} | {HIGH/MEDIUM/LOW} |
### Under-tested Classes
| # | Class | Current Coverage | Uncovered Methods |
|---|-------|------------------|-------------------|
| 1 | {ClassName} | {line%} line | {method1, method2} |
### Recommended Action Plan
1. {ClassName} — {technique} — est. {N} tests — {rationale}
### Estimated Effort
- Total estimated tests: {N}
- Priority 1 (HIGH risk untested): {N} tests
- Priority 2 (MEDIUM risk under-tested): {N} tests
Detailed execution procedures for each Phase are defined in resources/.
At the start of each Phase/loop, verify that key context from prior phases is still accessible:
Check: Can you recall the following from current context?
1. test-profile.json contents (language, framework, mock, assertions, coverage-tool, multi-module)
2. Prior phase output (Analysis Report, Strategy Document, or Gap Report — whichever is needed)
IF both accessible → proceed normally (no reload needed)
IF NOT accessible:
Loop 1 OR step-by-step mode:
→ Re-read .sub-test-engineer/test-profile.json
→ Re-read all Required Reads for current Phase (see Context Documents table)
Loop 2+:
→ Re-read .sub-test-engineer/test-profile.json + latest Gap Report only
→ Skip protocol/reference docs unless a specific reference is needed for fix
Details: resources/test-discovery-protocol.md
First invocation in a project runs scripts/setup-check.sh to verify dependencies:
Automatically detects the project's test infrastructure:
scripts/check-ast-grep.sh → determines Layer 1a capabilityDetails: resources/analyze-protocol.md
Inspects the target code using a 3-Layer Type Extraction Pipeline:
Layer 1a — ast-grep structural extraction (deterministic, ~1s):
rules/{java,kotlin,typescript}/extract-*.ymlscripts/extract-types.sh <target> [lang] [category]Layer 1b — LLM semantic interpretation (from ast-grep JSON context):
Layer 2 — ClassGraph bytecode enrichment (optional, requires compilation):
scripts/extract-type-info.sh <classpath> <target-pattern>cd scripts/classgraph-extractor && ./gradlew shadowJarFallback: ast-grep unavailable → Layer 1b reads source files directly (existing LLM-based analysis)
Output: Analysis Report — targets, type info, complexity, edge case catalog, mock targets
Details: resources/strategize-protocol.md
Determines optimal testing approach per target:
| Code Layer | Primary Technique | Secondary Technique |
|---|---|---|
| Domain (Value Objects, Entities) | Property-based testing | Parameterized boundary tests |
| Domain Services (Business Logic) | BDD-style unit tests | Approval/Snapshot testing |
| Application Services (Orchestration) | Mock-based unit tests | Integration tests |
| Repository/DAO | Integration tests (Testcontainers) | Contract tests |
| API Controllers | MockMvc / supertest | Contract tests (Pact/SCC) |
| Event Handlers | Embedded broker tests | Async contract tests |
| Cross-cutting (Architecture) | ArchUnit / Konsist / dependency-cruiser | - |
User Confirmation Gate: Present strategy summary → wait for user approval before Generate. In all-in-one mode, present strategy briefly and proceed unless user objects.
Output: Strategy Document — technique allocation table, coverage targets, generation order
Details: resources/generate-protocol.md
Generates tests following the approved strategy:
CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1Existing Test Handling:
Output: Generated test file paths → passed to Phase 4
Details: resources/validate-protocol.md Validation tiers: resources/validation-tiers.md Error handling: resources/error-playbook.md
Multi-stage validation pipeline:
Stage 1: Compilation Check
→ Tests compile without errors
Stage 2: Execution Check
→ All generated tests pass (green)
Stage 3: Coverage Measurement
→ Line/branch coverage meets target (default: 80% for changed code)
→ Graceful degradation: if coverage tool not configured → WARN and skip
Stage 4: Mutation Testing (STANDARD/THOROUGH tiers only)
→ Mutation kill rate assessment (target: 70%+)
→ Graceful degradation: if PIT/Stryker not configured → WARN and skip
Stage 5: Quality Assessment
→ Test naming conventions
→ Assertion quality (no empty assertions, meaningful messages)
→ Test isolation (no shared mutable state)
→ Determinism check (no flaky patterns: Thread.sleep, System.currentTimeMillis)
Scripts:
scripts/measure-coverage.sh [project-root] [target-package] [module-path]scripts/run-mutation-test.sh [project-root] [target-class-pattern] [tier] [module-path]scripts/extract-type-info.sh <classpath> <target-pattern> [output-format]Output: Validation Report → if gaps found → Gap Report → feeds back to Phase 3
loop N means N total validation cycles. Loop 1 = initial generation + validation. Loop 2+ = gap-targeted generation + validation.
| Input | Behavior |
|---|---|
loop N | N iterations of (Generate→Validate); loops 2+ are gap-targeted only |
coverage-target N% | Loop until coverage target met (max 5 iterations) |
validate loop N | Discover existing tests → N iterations of (Validate → Gap → Generate) |
loop 0 | Skip Validate entirely |
| (not specified) | loop 1 (default: one Generate + one Validate) |
Loop termination decision flow:
Loop N termination check (after gap-targeted generation):
1. Coverage target met AND mutation kill rate >= tier target → EXIT (success)
(tier target: STANDARD=60%, THOROUGH=70%; LIGHT skips mutation)
(thresholds defined in validate-protocol.md Stage 4)
2. Coverage delta < 2% from previous loop → EXIT (convergence plateau)
3. Same compilation error 3x consecutive → Apply root-cause analysis, then retry
4. N >= max_loops → EXIT (report final metrics)
(max_loops: loop N → N; coverage-target → 5)
Otherwise → next iteration (N += 1)
Gap Report format (Phase 4 → Phase 3):
## Gap Report: Loop {N}
### Uncovered Code Paths (from coverage)
| File | Line(s) | Branch | Description |
|------|---------|--------|-------------|
| OrderService.kt:42-48 | 42-48 | else branch | discount < 0 case |
| OrderService.kt:67 | 67 | catch block | PaymentException handler |
### Survived Mutants (from mutation testing)
| File:Line | Mutation | Original | Mutated | Required Test |
|-----------|----------|----------|---------|---------------|
| OrderService.kt:42 | RelationalOperator | `>` | `>=` | Boundary test for discount=0 |
### Quality Violations
| File | Issue | Severity |
|------|-------|----------|
| OrderServiceTest.kt:25 | Empty assertion body | ERROR |
### Generation Instructions
Generate ONLY tests targeting the above gaps. Do NOT regenerate existing passing tests.
Append new test methods to existing test files where applicable.
Base Set (loaded in Phases 1, 2, 3, 4):
| Document | Phases | Load Condition |
|---|---|---|
| test profile (auto-discovered) | 0-4 | Every phase entry |
| learned test patterns | 2, 3, 4 | IF pattern cache exists |
| unit-testing-techniques.md | 2, 3 | IF targets include domain/service layer |
| go-testing-techniques.md | 2, 3 | IF language=Go |
| integration-testing-techniques.md | 2, 3 | IF targets include repository/API layer |
| property-based-testing.md | 2, 3 | IF strategy includes property-based tests |
| contract-testing.md | 2, 3 | IF strategy includes contract tests |
| mutation-testing.md | 4 | IF tier=STANDARD OR tier=THOROUGH |
| architecture-testing.md | 2, 3 | IF strategy includes architecture tests |
| test-quality-checklist.md | 4 | Always in Validate phase |
| error-playbook.md | 3, 4 | On error occurrence |
| Document | Purpose |
|---|---|
| test-discovery-protocol.md | Phase 0 test infrastructure discovery |
| analyze-protocol.md | Phase 1 3-Layer type extraction procedure |
| strategize-protocol.md | Phase 2 technique selection procedure |
| generate-protocol.md | Phase 3 test generation procedure |
| validate-protocol.md | Phase 4 test quality validation procedure |
| validation-tiers.md | Validation intensity levels |
| error-playbook.md | Test error resolution protocols |
Type extraction rules for Phase 1 (Analyze):
| Language | Rules | Test Fixtures |
|---|---|---|
| Java | rules/java/extract-*.yml (5 rules) | rules/__tests__/java/ |
| Kotlin | rules/kotlin/extract-*.yml (5 rules) | rules/__tests__/kotlin/ |
| TypeScript | rules/typescript/extract-*.yml (5 rules) | rules/__tests__/typescript/ |
| Go | rules/go/extract-*.yml (4 rules) | rules/__tests__/go/ |
Scripts:
scripts/check-ast-grep.sh — Installation check (>=0.30.0)scripts/extract-types.sh <target> [lang] [category] — Structural extraction wrapperConfiguration: sgconfig.yml
When Hooks are applied, automatic validation runs on test file modifications.
Configuration: templates/hooks-config.json
Note: plugin.json (plugin-level) and templates/hooks-config.json (project-level) are mutually exclusive.
When invoked by another skill (e.g., sub-kopring-engineer), the workflow is modified:
IF input contains <sister-skill-invoke skill="sub-test-engineer">:
→ Parse invoke message
→ Enter Invoked Mode
ELSE:
→ Normal execution mode
┌─────────────────────────────────────────────────────────────────────┐
│ Invoked Mode: sub-test-engineer │
├─────────────────────────────────────────────────────────────────────┤
│ 1. Parse <sister-skill-invoke> message │
│ - caller: invoking skill name │
│ - phase: caller's current phase │
│ - trigger: reason (coverage-gap, mutation-gap, new-code-untested) │
│ - targets: specific files/lines/methods │
│ - constraints: technique, coverage-target, max-loop │
│ │
│ 2. SKIP Phase 0 (Discover) │
│ - Use provided targets directly │
│ - Load test-profile.json from context-data if provided │
│ │
│ 3. Run Phase 1 (Analyze) — scoped to targets only │
│ - Extract type info for specified files │
│ - Focus on uncovered methods/lines │
│ │
│ 4. SKIP Phase 2 (Strategize) if technique specified │
│ - Use provided technique constraint │
│ - If no technique: run abbreviated strategize for targets only │
│ │
│ 5. Run Phase 3 (Generate) with constraints │
│ - Generate tests only for specified targets │
│ - Respect max-loop constraint │
│ │
│ 6. Run Phase 4 (Validate) — abbreviated │
│ - Compile check │
│ - Test execution │
│ - Coverage measurement (compare with coverage-target) │
│ - Skip mutation testing unless explicitly requested │
│ │
│ 7. Compose <sister-skill-result> message │
│ - status: completed | partial | failed │
│ - generated-files: list of created/modified test files │
│ - metrics: coverage-before, coverage-after │
│ - issues: any problems encountered │
│ │
│ 8. Return result to caller │
└─────────────────────────────────────────────────────────────────────┘
| Trigger | Focus | Generation Strategy |
|---|---|---|
coverage-gap | Uncovered lines/methods | Generate tests to cover specific paths |
mutation-gap | Weak assertions | Strengthen existing tests with better assertions |
new-code-untested | New code files | Generate comprehensive tests following project patterns |
<sister-skill-result skill="sub-test-engineer">
<status>completed</status>
<summary>
<tests-generated>7</tests-generated>
<files-created>0</files-created>
<files-modified>1</files-modified>
</summary>
<generated-files>
<file path="src/test/kotlin/.../OrderCancelServiceTest.kt" action="modified">
<tests-added>handleRefund_success, handleRefund_failure, ...</tests-added>
<coverage-delta>+17%</coverage-delta>
</file>
</generated-files>
<metrics>
<coverage-before>65%</coverage-before>
<coverage-after>82%</coverage-after>
</metrics>
<issues />
</sister-skill-result>
| Constraint | Default | Effect |
|---|---|---|
technique | auto-detect | Skip Strategize phase if specified |
coverage-target | 80% | Loop until target met or max-loop reached |
max-loop | 2 | Maximum Generate→Validate iterations |
timeout | 300s | Abort and return partial results if exceeded |
Protocol details: see the shared invoke protocol documentation in this repository.
Guides creation, editing, and verification of skills for AI coding agents using test-driven development with subagent scenarios. Use when authoring or debugging skills.
npx claudepluginhub hungrytech/hungrytech-claude-skills --plugin sub-test-engineer