From kivi-claude-skills
Fully automated test-fix-retest loop for full-stack projects. Auto-detects tech stack (Java/Spring Boot, Go, Python, Rust, Node.js + Vue/React/Angular/Next.js/Svelte), runs tests in pyramid order (compile → backend unit → frontend typecheck → frontend unit → API E2E → browser E2E), analyzes failures with causal-chain debugging (symptom → proximate cause → root cause), clusters related failures, auto-fixes code, and iterates until all tests pass or the maximum iteration count is reached. Also identifies missing test coverage gaps and auto-generates tests when none exist for a phase. Use this skill whenever the user wants to run tests, fix failing tests, or verify code changes haven't broken anything, including: "run tests", "run all tests", "make tests pass", "fix failing tests", "test loop", "fullstack test", the Chinese trigger phrases "测试" (test), "跑测试" (run tests), "跑一下所有测试" (run all the tests), "测试全挂了帮我修" (the tests all failed, help me fix them), "帮我修测试" (help me fix the tests), or after code changes touching both frontend and backend (e.g., API contract changes, field naming migrations, new service dependencies). Proactively suggest when the user finishes implementing a feature that spans multiple layers, or when they mention NullPointerException in tests, type errors after refactoring, or cross-module contract mismatches.
Slash command: /kivi-claude-skills:fullstack-test-loop

Bundled files:
- evals/evals.json
- references/backend-frameworks.md
- references/e2e-frameworks.md
- references/failure-analysis.md
- references/fix-strategies.md
- references/frontend-frameworks.md
- references/stack-detection.md
- references/test-generation.md
- references/test-persistence.md
- references/visual-regression.md
- references/vue-fix-safety.md

A fully automated closed-loop test runner for any full-stack project. It discovers your tech stack, runs every applicable test layer from fast to slow, diagnoses failures by tracing causal chains (not surface symptoms), fixes code, and re-runs until green, or until it knows it can't fix something and needs your input.
Run /fullstack-test-loop directly, optionally with a subcommand or flags:
/fullstack-test-loop # Run all layers, auto-detect everything
/fullstack-test-loop backend # Backend only (compile + unit/integration)
/fullstack-test-loop frontend # Frontend only (typecheck + unit)
/fullstack-test-loop e2e # E2E only (API + browser)
/fullstack-test-loop browser # Browser E2E only (dev-browser)
/fullstack-test-loop --no-fix # Run + analyze only, no auto-fix
/fullstack-test-loop --skip-fix # Alias for --no-fix (run all phases, report only)
/fullstack-test-loop --max-iterations 3 # Override max iterations (default: 5)
/fullstack-test-loop --force-all # Run all layers even if earlier ones fail
/fullstack-test-loop --scenarios <file> # Use specific test-scenarios checklist file
Before running anything, understand what this project contains. Read references/stack-detection.md
for the full detection logic.
Quick summary:
- Scan the common project directories (backend/, frontend/, server/, client/, app/, web/, api/, src/)

Output a mental StackProfile before proceeding:
Stack Detection:
Backend: Java/Maven (Spring Boot) @ backend/
Frontend: Vue 3 + TypeScript @ frontend/
E2E: Shell script @ e2e/e2e-test.sh
Test DB: H2 in-memory (application-test.yml)
Services: docker-compose.yml (postgres, minio)
Test Coverage Gaps:
Backend tests: FOUND (src/test/java/ — 11 test classes)
Frontend unit tests: MISSING — will auto-generate with Vitest
API E2E: FOUND (e2e/e2e-test.sh — 47 assertions)
Browser E2E: MISSING — will auto-generate flows from routes
Test Persistence:
Scenario checklist: {FOUND|MISSING} (frontend/tests/e2e/test-scenarios.md — N scenarios)
E2E test files: {FOUND|MISSING} (frontend/tests/e2e/ — N spec files)
Screenshot baselines: {FOUND|MISSING} (frontend/tests/e2e/screenshots/baseline/ — N images)
If detection is ambiguous (e.g., multiple backends), briefly note what was found and pick the most likely primary stack. If truly unclear, ask the user.
Test Coverage Gap Analysis: For each phase, check whether tests/scripts exist. If a phase
has NO tests, flag it as a gap and prepare for auto-generation in that phase. Read
references/test-generation.md for generation templates per framework.
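As a rough illustration of the detection step, the sketch below looks for conventional build files in the candidate directories listed earlier. The marker-file table and the `detect_stack` function are assumptions made for this example; the skill's real logic lives in references/stack-detection.md and is likely more thorough.

```python
from pathlib import Path

# Marker files conventionally found at the root of each kind of project.
# This table is an illustrative assumption, not the skill's actual rule set.
BACKEND_MARKERS = {
    "pom.xml": "Java/Maven",
    "build.gradle": "Java/Gradle",
    "build.gradle.kts": "Java/Gradle",
    "go.mod": "Go",
    "Cargo.toml": "Rust",
    "pyproject.toml": "Python",
    "requirements.txt": "Python",
}

# The project root itself plus the candidate directories named in the summary.
CANDIDATE_DIRS = [".", "backend", "frontend", "server", "client", "app", "web", "api", "src"]


def detect_stack(root):
    """Return {"backend": [(kind, dir)], "frontend": [dir]} for a project root."""
    profile = {"backend": [], "frontend": []}
    for name in CANDIDATE_DIRS:
        directory = Path(root) / name
        if not directory.is_dir():
            continue
        for marker, kind in BACKEND_MARKERS.items():
            if (directory / marker).is_file() and (kind, name) not in profile["backend"]:
                profile["backend"].append((kind, name))
        # A package.json is treated as a frontend here; it could also be a Node.js backend.
        if (directory / "package.json").is_file():
            profile["frontend"].append(name)
    return profile
```

When several backends are found, the result simply lists all of them, which matches the advice above to note the ambiguity and pick (or ask about) the primary stack.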
Phase 1 (compile and typecheck) is the cheapest way to catch errors. If the code doesn't compile, nothing else matters.
Run in order:
1. Backend compile (see references/backend-frameworks.md for per-framework commands):
   - Maven: mvn compile -q -f <path>/pom.xml
   - Gradle: cd <path> && ./gradlew compileJava -q
   - Go: cd <path> && go build ./...
   - Rust: cd <path> && cargo check
   - Python: cd <path> && python -m py_compile <changed_files> (or skip, since Python is interpreted)
2. Frontend typecheck (see references/frontend-frameworks.md):
   - Vue: cd <path> && npx vue-tsc -b --noEmit (or npx vue-tsc --noEmit)
   - TypeScript (React and others): cd <path> && npx tsc --noEmit
   - Angular: cd <path> && npx ng build --configuration=development

On failure: Go directly to Phase 6 (Fix), then re-run Phase 1. Do not proceed to Phase 2 until compilation passes (unless --force-all).
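One way to picture the compile gate is a lookup from stack to command plus a thin wrapper that reports pass or fail. The stack keys (`"maven"`, `"vue"`, and so on) and the `run_compile` helper are hypothetical names for this sketch; the commands themselves are the ones listed above, run with the project directory as the working directory.

```python
import subprocess

# Compile and typecheck commands taken from the list above.
# The dictionary keys are hypothetical labels chosen for this sketch.
COMPILE_COMMANDS = {
    "maven": ["mvn", "compile", "-q"],
    "gradle": ["./gradlew", "compileJava", "-q"],
    "go": ["go", "build", "./..."],
    "rust": ["cargo", "check"],
    "vue": ["npx", "vue-tsc", "--noEmit"],
    "typescript": ["npx", "tsc", "--noEmit"],
}


def run_compile(stack, path, runner=subprocess.run):
    """Run the compile step for `stack` inside `path`; return (ok, combined_output)."""
    command = COMPILE_COMMANDS[stack]
    result = runner(command, cwd=path, capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr
```

The `runner` parameter exists only so the wrapper can be exercised without the real toolchain installed; in normal use the default `subprocess.run` is what executes the command.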
Run the backend's test suite. Read references/backend-frameworks.md for framework-specific
commands, output parsing, and common failure patterns.
First: Check if backend tests exist.
Scan the standard test directories for the detected framework:
- Java: src/test/java/ or <module>/src/test/java/
- Go: *_test.go files alongside source
- Rust: #[cfg(test)] modules or tests/ directory
- Python: tests/, test_*.py, *_test.py

If NO backend tests exist → Auto-generate. Read references/test-generation.md § Backend.
Commands by framework:
| Framework | Run All | Run Single Class | Verbose |
|---|---|---|---|
| Maven | mvn test | mvn test -Dtest=ClassName | mvn test -X |
| Gradle | ./gradlew test | ./gradlew test --tests ClassName | ./gradlew test --info |
| Go | go test ./... | go test ./path/to/package -run TestName | go test -v ./... |
| Rust | cargo test | cargo test test_name | cargo test -- --nocapture |
| Python | pytest | pytest path/to/test.py::TestClass | pytest -v |
On failure: Capture full output, proceed to Phase 6 (Analyze & Fix).
On >20 failures in a single run: Stop and report — this is likely a systemic issue (wrong config, missing dependency, broken migration), not individual test bugs.
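The "more than 20 failures" rule can be checked mechanically once the runner's summary is parsed. The sketch below handles only the Maven Surefire summary line (`Tests run: N, Failures: N, Errors: N, Skipped: N`) and assumes a single-module build where the last such line is the overall total; `summarize` is a hypothetical helper, and the other frameworks would each need their own pattern (see references/failure-analysis.md).

```python
import re

# Surefire prints this line per test class and again as the overall summary.
SUMMARY = re.compile(r"Tests run: (\d+), Failures: (\d+), Errors: (\d+), Skipped: (\d+)")

# From the rule above: more than 20 failures in one run is treated as systemic.
SYSTEMIC_THRESHOLD = 20


def summarize(output):
    """Parse the last Surefire summary line in `output`, or return None if absent."""
    matches = SUMMARY.findall(output)
    if not matches:
        return None
    run, failures, errors, skipped = map(int, matches[-1])
    failed = failures + errors
    return {
        "run": run,
        "failed": failed,
        "skipped": skipped,
        "systemic": failed > SYSTEMIC_THRESHOLD,
    }
```

Counting errors together with failures is a choice made for this sketch: both mean the test did not pass, so both count toward the threshold.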
First: Check if a test framework and tests exist.
Detection:
- package.json: vitest, jest, @angular/cli, karma in devDependencies
- Config files: vitest.config.*, jest.config.*, .mocharc.*
- package.json scripts: "test" entry
- Test files: **/*.test.{ts,tsx,js}, **/*.spec.{ts,tsx,js}, **/__tests__/**

If test framework exists AND tests exist → Run them:
| Framework | Command |
|---|---|
| Vitest | npx vitest run |
| Jest | npx jest |
| Angular | npx ng test --watch=false --browsers=ChromeHeadless |
| Karma | npx karma start --single-run |
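The devDependencies check and the command table can be combined into a small lookup. This is a sketch under assumptions: `detect_test_command` is a hypothetical helper, it inspects only devDependencies, and the priority order (Vitest, Jest, Angular, Karma) is chosen for illustration so that an Angular project, which usually also lists karma, resolves to `ng test`.

```python
import json

# Commands from the table above, in the order this sketch checks them.
RUNNERS = [
    ("vitest", "npx vitest run"),
    ("jest", "npx jest"),
    ("@angular/cli", "npx ng test --watch=false --browsers=ChromeHeadless"),
    ("karma", "npx karma start --single-run"),
]


def detect_test_command(package_json_text):
    """Return the test command for the first known runner in devDependencies, else None."""
    manifest = json.loads(package_json_text)
    dev_dependencies = manifest.get("devDependencies", {})
    for package, command in RUNNERS:
        if package in dev_dependencies:
            return command
    return None
```

A `None` result corresponds to the "no test framework" branch, where tests are auto-generated instead.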
If NO test framework OR NO test files → Auto-generate. Read references/test-generation.md § Frontend.
- Install a test framework: npm install -D vitest @vue/test-utils jsdom (Vue) or npm install -D vitest @testing-library/react jsdom (React-Vite)
- Place generated tests next to their source (e.g., Component.test.ts next to Component.vue)

API E2E tests (Phase 4) exercise the running application's API endpoints. They require services to be up.
First: Check if E2E tests exist.
Scan for: e2e/*.sh, e2e/*.py, cypress.config.*, playwright.config.*,
docker-compose.e2e.yml, package.json "test:e2e" script, *.postman_collection.json.
If NO E2E tests exist → Auto-generate. Read references/test-generation.md § API E2E.
- Discover endpoints from route definitions (@GetMapping, @PostMapping, router.GET, etc.)
- Check for an API spec (openapi.yaml, swagger.json)
- Generate an E2E script (e.g., e2e/e2e-test.sh) covering the discovered endpoints, with a BASE_URL parameter

Pre-check: Are services running?
- If not, start them with docker compose up -d and wait for health

Run E2E tests:
See references/e2e-frameworks.md for details.
- Shell script: bash e2e/e2e-test.sh [BASE_URL]
- Docker Compose: docker compose -f docker-compose.e2e.yml up --abort-on-container-exit
- Cypress: npx cypress run
- package.json: "test:e2e" or "e2e" scripts

On failure: Capture output, parse PASS/FAIL counts, proceed to Phase 6.
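The "are services running?" pre-check could be as simple as probing the TCP ports the tests depend on. The helpers below are a hypothetical sketch, not the skill's implementation; a port that accepts connections is only a rough proxy for a healthy service, and the service names and ports passed in are up to the caller.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def missing_services(required, host="localhost"):
    """Given {service_name: port}, return the names whose port is not accepting connections."""
    return [name for name, port in required.items() if not is_port_open(host, port)]
```

For example, `missing_services({"postgres": 5432, "minio": 9000})` would probe the default ports of the two services named in the sample docker-compose.yml above (assuming they were not remapped); a non-empty result is the cue to run `docker compose up -d` and wait.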
Phase 5 runs independently of Phases 2-4. It is ONLY blocked by Phase 1 failure (if the app doesn't compile, browser testing is meaningless). Unit test failures (Phase 3) do NOT block browser E2E — they test different concerns.
When browser subcommand is used, skip Phases 1-4 entirely.
When --force-all is used, all phases run regardless of prior failures.
Use dev-browser — never Playwright directly.
First: Check if browser test flows are defined.
If no browser test plan exists (no prior testloop runs, no cypress specs, no defined flows):
→ Auto-generate browser test flows from route discovery. Read references/test-generation.md § Browser E2E.
- Discover routes from the router definition (e.g., router/index.ts)

Execute flows using dev-browser:
- Inspect page state with client.getAISnapshot()

Determining the app URL:
- Default to http://localhost:5173 (Vite) or http://localhost:3000 (CRA/Next)

On failure: Screenshot the issue, trace to source code, apply fix, re-run this phase.
Phase 6 (Analyze & Fix) is the core intelligence of the skill. It runs whenever any phase reports failures.
Extract structured data from test output. See references/failure-analysis.md for
per-framework parsing patterns.
For each failure, extract:
Many test failures share a single root cause. Before fixing anything:
For EACH cluster, trace the full chain. This is critical — never fix surface symptoms.
SYMPTOM: 5 tests fail with "expected 200 but got 403"
PROXIMATE: SecurityContext.getCurrentUser() returns null in test environment
ROOT CAUSE: New test class missing @ActiveProfiles("test") annotation
FIX: Add @ActiveProfiles("test") to the test class
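Clustering failures before tracing them can be approximated by grouping on a normalized error message. The normalization below (collapsing whitespace and masking object hashes such as `User@1a2b3c`) is an assumption made for this sketch, and `signature` and `cluster_failures` are hypothetical names; real output usually also warrants grouping by exception type or stack frame.

```python
import re
from collections import defaultdict


def signature(message):
    """Normalize a failure message so incidental details do not split a cluster."""
    text = re.sub(r"@[0-9a-f]+\b", "@<id>", message)  # mask object hashes like User@1a2b3c
    return " ".join(text.split())  # collapse runs of whitespace


def cluster_failures(failures):
    """Group (test_name, message) pairs by signature, largest cluster first."""
    clusters = defaultdict(list)
    for test_name, message in failures:
        clusters[signature(message)].append(test_name)
    return sorted(clusters.items(), key=lambda item: len(item[1]), reverse=True)
```

Sorting by cluster size puts the group most likely to share one root cause first, so a single fix can clear the largest number of failures. Status codes are deliberately left unmasked, since "got 403" and "got 500" usually have different causes.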
Read references/fix-strategies.md for the fix decision tree.
Before fixing any .vue file, read references/vue-fix-safety.md.
- <script> / <script setup> modifications are allowed
- <template> and <style> changes are FORBIDDEN → mark as NEEDS_HUMAN_REVIEW
- After any .vue script fix, run visual regression check (references/visual-regression.md)

Priority: fix the root cause that resolves the most failures first.
Decision tree:
Is the failure in TEST code (setup, mocks, assertions)?
→ Fix test code (mock setup, fixture update, assertion correction)
Is it a COMPILATION / TYPE error?
→ Fix the source type/signature, trace which change broke the contract
Is it a RUNTIME behavior bug (wrong result, NPE, 500 error)?
→ Read data flow: producer → consumer. Apply minimal fix at root cause.
Is it ENVIRONMENT / CONFIG (DB connection, missing env var, wrong port)?
→ Fix test config (application-test.yml, .env.test, docker-compose)
None of the above?
→ Flag as NEEDS_HUMAN_INPUT, explain what you found
Constraints:
After fixing, re-run ONLY the affected test layer (not the full pyramid) to quickly verify the fix works. If it passes, continue to the next failure cluster. If it fails, try a different approach or escalate.
Visual Regression Check: If the fix touched any .vue file's <script> block,
read references/visual-regression.md and run the visual regression workflow.
If regression detected, revert the fix before marking it as failed.
Before generating any test (Phase 3 or Phase 5), read references/test-persistence.md.
Key rules:
- Scenario IDs (e.g., DASH-01, AUTH-03)
- it.skip() + NEEDS_HUMAN_REVIEW
- [ ] → [x]

iteration = 0
max_iterations = 5 (or user override)
blocked_failures = []
LOOP:
    iteration += 1
    IF subcommand == "browser":
        Run Phase 5 only
    ELIF subcommand == "backend":
        Run Phase 1 (backend compile) + Phase 2
    ELIF subcommand == "frontend":
        Run Phase 1 (frontend typecheck) + Phase 3
    ELIF subcommand == "e2e":
        Run Phase 4 + Phase 5
    ELSE:
        Run Phase 1 (compile)
        IF Phase 1 fails AND NOT --force-all:
            Run Phase 6, continue loop
        Run Phase 2 (backend)        # blocked only by backend compile failure
        Run Phase 3 (frontend unit)  # blocked only by frontend typecheck failure
        Run Phase 4 (API E2E)        # blocked only by backend compile failure
        Run Phase 5 (browser E2E)    # blocked only by frontend typecheck failure
        Note: Phases 2-5 are INDEPENDENT of each other.
        Phase 3 failure does NOT block Phase 5.
    IF all pass:
        → Print SUCCESS report, exit loop
    IF iteration >= max_iterations:
        → Print TIMEOUT report with remaining failures, exit loop
    IF a failure persists unchanged across 3 consecutive iterations:
        → Add to blocked_failures, skip it in future iterations
    IF a fix introduces NEW failures (regression):
        → Revert the fix, add original failure to blocked_failures
    IF any single phase has >20 failures:
        → Stop that phase, report as SYSTEMIC issue
    ELSE:
        → Run Phase 6 (Analyze & Fix), then continue loop
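The termination rules of the loop can be made concrete in a few lines. This is a simplified sketch, not the skill's implementation: `run_tests` and `fix` are caller-supplied callables, failures are plain string IDs, the "BLOCKED" status name is hypothetical, and the regression-revert and systemic (>20 failures) branches are left out.

```python
def run_loop(run_tests, fix, max_iterations=5, stall_limit=3):
    """Iterate test → fix until green, timeout, or only blocked failures remain.

    run_tests() returns the IDs of currently failing tests;
    fix(failures) attempts to repair the given failures.
    """
    blocked = set()
    streak = {}  # failure ID -> consecutive iterations in which it was seen
    iteration = 0
    while True:
        iteration += 1
        remaining = set(run_tests()) - blocked
        if not remaining:
            status = "BLOCKED" if blocked else "DONE"
            return {"status": status, "iterations": iteration,
                    "remaining": [], "blocked": sorted(blocked)}
        if iteration >= max_iterations:
            return {"status": "TIMEOUT", "iterations": iteration,
                    "remaining": sorted(remaining), "blocked": sorted(blocked)}
        # Rebuilding the dict drops failures that disappeared, which resets their streak.
        streak = {name: streak.get(name, 0) + 1 for name in remaining}
        stalled = {name for name in remaining if streak[name] >= stall_limit}
        blocked |= stalled
        if remaining - stalled:
            fix(remaining - stalled)
```

A failure that survives `stall_limit` consecutive iterations is moved to `blocked` and no longer passed to `fix`, which keeps the loop from spending every remaining iteration on something it cannot repair.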
At the end of the loop (success or timeout), produce this report:
TEST LOOP REPORT
════════════════════════════════════════════
Stack: Java/Maven + Vue 3/TypeScript
Iterations: 3/5
Status: DONE ✓
Layer Results:
Backend compile: PASS
Backend tests: PASS (11/11) — fixed 2 in iteration 1
Frontend typecheck: PASS — fixed 1 in iteration 2
Frontend unit tests: PASS (24/24) — AUTO-GENERATED with Vitest
API E2E: PASS (47/47)
Browser E2E: PASS (5 flows verified) — 2 flows auto-generated
Fixes Applied:
1. fix(test): add @ActiveProfiles("test") to EventServiceTest
2. fix(service): null check in AggregationService.calculate()
3. fix(frontend): update DistrictDTO type to match API snake_case
Blocked: (none)
════════════════════════════════════════════
Status values:
When failures span both frontend and backend, check these common contract issues:
Field naming: Backend returns district_name but frontend expects districtName?
Check serialization config (Jackson, Gson) and TypeScript interfaces.
Response shape: API wraps in {code, message, data} but frontend reads raw response?
Check Axios interceptors and API response types.
Numeric precision: Backend sends "123.4567" (string) but frontend parses as number?
Check DTO types and JSON parsing.
Date formats: Backend sends ISO 8601 but frontend expects timestamp? Check serialization and dayjs/moment config.
Query parameter binding: Frontend sends group_id but backend controller uses
@RequestParam Long groupId without explicit name? Jackson SNAKE_CASE does NOT affect
@RequestParam — Spring binds from the literal HTTP parameter name. Verify all
@RequestParam have explicit value = "snake_case" when the API convention is snake_case.
Enum values: Backend enum changes but frontend still uses old values? Check both enum definitions.
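The field-naming check in particular lends itself to automation: compare the keys an API actually returns with the field names the frontend type declares. The helpers below are a hypothetical sketch that only catches snake_case versus camelCase differences; the key lists would come from a sample response and a TypeScript interface, however those are obtained.

```python
def snake_to_camel(name):
    """Convert snake_case to camelCase, e.g. district_name -> districtName."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


def naming_mismatches(api_keys, frontend_fields):
    """Return (api_key, frontend_field) pairs that differ only by naming convention."""
    expected = set(frontend_fields)
    pairs = []
    for key in api_keys:
        camel = snake_to_camel(key)
        if key not in expected and camel in expected:
            pairs.append((key, camel))
    return pairs
```

Each reported pair points at one side of the contract to fix: either the backend serialization config or the frontend interface, depending on which convention the API is meant to follow.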
These provide framework-specific details. Read them when you need the specifics:
| File | When to read |
|---|---|
| references/stack-detection.md | Phase 0: need detection logic for an unfamiliar project structure |
| references/backend-frameworks.md | Phases 1-2: need compile/test commands or failure parsing for a specific backend |
| references/frontend-frameworks.md | Phases 1, 3: need typecheck/test commands for a specific frontend |
| references/e2e-frameworks.md | Phase 4: need to run or parse E2E test output |
| references/failure-analysis.md | Phase 6 Step 1: need to parse test output from a specific framework |
| references/fix-strategies.md | Phase 6 Step 4: need guidance on fix patterns for specific failure types |
| references/vue-fix-safety.md | Phase 6 Step 3.5: Vue SFC fix boundary rules (script only, no template/style) |
| references/visual-regression.md | Phase 6 Step 5: screenshot comparison after .vue fixes |
| references/test-persistence.md | Phases 3, 5: incremental test generation, append-only updates |
| references/test-generation.md | Phases 2-5: need to auto-generate tests when none exist for a phase |
dev-browser for browser testing: Always use dev-browser, never Playwright directly.
Use client.getAISnapshot() for page state, client.selectSnapshotRef() for interaction.
In page.evaluate(), use plain JavaScript (no TypeScript annotations).
Debugging protocol: Never fix the first match. Always trace: symptom → proximate cause → root cause. For cross-module bugs, check the data contract between producer and consumer.
Minimal changes: Each fix should be the smallest change that resolves the failure. Do not refactor, add comments, or clean up adjacent code during test fixes.
Service lifecycle: If the skill starts Docker services for E2E tests, it should also offer to stop them when done (but not force-stop — the user might want them running).
npx claudepluginhub phoxiao/kivi-claude-skills --plugin kivi-claude-skills