From claude-impl-tools
Performs comprehensive quality audits verifying planning conformance, DDD validation, security checks, tests, browser verification, and metrics before deployment or PR merge.
Slash command: /claude-impl-tools:quality-auditor
> **Purpose**: Comprehensive quality audit against planning documents + quantitative metric tracking + verification discipline enforcement.
v3.0.0: Absorbed evaluation (metrics) and verification-before-completion (evidence discipline).
implementation agent's responsibility
docs/planning/management/mini-prd.md and docs/planning/*.md are absent, run /governance-setup first
1. Verify planning document existence (Mini-PRD or Socrates)
2. Load context (read reference documents)
3. Two-stage review (Spec Compliance → Code Quality)
4. DDD (Demo-Driven Development) validation
5. Security validation (invoke /security-review)
6. Dynamic validation (run tests)
7. UI/UX browser validation (agent-browser CLI + Lighthouse CLI)
8. Write quality report + provide fix guidance
Before starting the audit, verify the following when the skill is triggered.
management/mini-prd.md or docs/planning/*.md must be present.
/governance-setup first."
Default assumption: "It doesn't work." Prove otherwise with evidence. (Inspired by Harness Evaluator pattern.)
| Rule | Violation Blocked |
|---|---|
| All scores require evidence (file:line, test output, screenshot) | Score without evidence = 0 |
| Console errors > 0 → Functionality capped at 7/10 | Ignoring console errors |
| Uncaught exceptions → auto FAIL for that route | Rationalizing "minor" exceptions |
| Untestable (server down) → Score 0, not skip | "Couldn't test, so pass" |
| BLOCKING issues must be fixed before deploy | Shipping with known blockers |
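The scoring rules in the table above can be sketched as a small function. This is a minimal illustration, not the skill's actual implementation: the `RouteAudit` record and its field names are hypothetical, and because the source does not say what score an auto-failed route receives, the sketch reports the failure as a separate flag.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class RouteAudit:
    """Hypothetical record of one audited route (field names are illustrative)."""
    raw_score: int                                       # 0-10 score proposed by the auditor
    evidence: List[str] = field(default_factory=list)   # file:line, test output, screenshots
    console_errors: int = 0
    uncaught_exceptions: int = 0
    testable: bool = True                                # False when e.g. the server is down


def functionality_score(audit: RouteAudit) -> Tuple[int, bool]:
    """Return (score, auto_fail) after applying the skeptical-QA rules."""
    auto_fail = audit.uncaught_exceptions > 0   # uncaught exceptions fail the route
    if not audit.testable:
        return 0, auto_fail                     # untestable scores 0, it is never skipped
    if not audit.evidence:
        return 0, auto_fail                     # a score without evidence counts as 0
    score = audit.raw_score
    if audit.console_errors > 0:
        score = min(score, 7)                   # console errors cap functionality at 7/10
    return score, auto_fail
```

For example, a route proposed at 9/10 with two console errors comes out at 7/10, and the same route with no evidence attached comes out at 0.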
Issue Classification (applies to all audit output):
# One of the two options
ls management/mini-prd.md 2>/dev/null # Option A: Mini-PRD
ls docs/planning/*.md 2>/dev/null # Option B: Socrates
Mini-PRD required fields: purpose, features, tech_stack
Socrates required documents: 01-prd.md, 02-trd.md, 07-coding-convention.md
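The existence check above could also be scripted. The sketch below is one possible version under two assumptions not stated in the source: that the three Socrates documents sit directly under docs/planning/, and that finding the Mini-PRD file is enough for Option A. It does not validate the Mini-PRD required fields, because the source does not describe the file's format. The function name is hypothetical.

```python
from pathlib import Path
from typing import Optional

# Required Socrates documents, as listed above.
SOCRATES_REQUIRED = ("01-prd.md", "02-trd.md", "07-coding-convention.md")


def detect_planning_option(root: Path) -> Optional[str]:
    """Return "mini-prd", "socrates", or None when neither option is present."""
    # Option A: a Mini-PRD file exists.
    if (root / "management" / "mini-prd.md").is_file():
        return "mini-prd"
    # Option B: all required Socrates documents exist (assumed location: docs/planning/).
    planning = root / "docs" / "planning"
    if all((planning / name).is_file() for name in SOCRATES_REQUIRED):
        return "socrates"
    return None
```

A `None` result corresponds to the case where /governance-setup should be run first.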
design/
/security-review --path src --summary
| Severity | Meaning | Deployable |
|---|---|---|
| CRITICAL | Immediate fix required | No — cannot deploy |
| HIGH | Fix recommended before deployment | Conditional |
| MEDIUM | Known issue | Yes — can deploy |
| Project Type | Test Command |
|---|---|
| Node.js | npm test |
| Python | pytest |
| Python (Poetry) | poetry run pytest |
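Choosing a row from the table above can be automated by looking for marker files. The marker files and their precedence in this sketch are assumptions (the source lists only the project types and commands), and `pick_test_command` is a hypothetical helper name.

```python
from pathlib import Path
from typing import List, Optional


def pick_test_command(root: Path) -> Optional[List[str]]:
    """Guess the test command for a project root, or None if the type is unknown."""
    # Assumption: package.json marks a Node.js project and takes precedence.
    if (root / "package.json").is_file():
        return ["npm", "test"]
    pyproject = root / "pyproject.toml"
    # Assumption: a [tool.poetry] table marks a Poetry-managed project.
    if pyproject.is_file() and "[tool.poetry]" in pyproject.read_text(encoding="utf-8"):
        return ["poetry", "run", "pytest"]
    # Assumption: any of these files marks a plain Python project.
    if any((root / name).is_file() for name in ("pyproject.toml", "setup.py", "requirements.txt")):
        return ["pytest"]
    return None
```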
Uses agent-browser CLI or Lighthouse CLI
# 1. Open page + take snapshot
agent-browser open http://localhost:3000
agent-browser snapshot # accessibility tree (@ref based)
agent-browser screenshot audit.png # visual capture
# 2. Lighthouse audit (accessibility + performance + SEO)
npx lighthouse http://localhost:3000 --output=json --quiet
# 3. Check console errors
agent-browser console # error/warning count
AI Slop Detection — Auto-deduct from visual score: ≥1 pattern → −1pt, ≥3 patterns → −2pt.
| # | Pattern | # | Pattern |
|---|---|---|---|
| 1 | Hero section with no real image (placeholder/gradient) | 6 | Generic icons only (no custom illustrations) |
| 2 | 3-column generic feature grid | 7 | Empty state not handled |
| 3 | Meaningless gradient decoration | 8 | Hardcoded demo data visible in UI |
| 4 | Lorem ipsum text remaining | 9 | Excessive rounded-xl on everything |
| 5 | Identical card layout repeated throughout | 10 | Purposeless animation/motion |
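The deduction rule reads as a step function of the number of matched patterns. A minimal sketch follows; flooring the adjusted score at 0 is an assumption, and both function names are hypothetical.

```python
def slop_deduction(pattern_count: int) -> int:
    """Points deducted from the visual score for the number of slop patterns found."""
    if pattern_count >= 3:
        return 2   # three or more patterns: -2pt
    if pattern_count >= 1:
        return 1   # one or two patterns: -1pt
    return 0


def adjusted_visual_score(visual_score: int, pattern_count: int) -> int:
    """Apply the deduction; the floor at 0 is an assumption of this sketch."""
    return max(0, visual_score - slop_deduction(pattern_count))
```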
┌─────────────────────────────────────────┐
│ Quality Audit Result │
├─────────────────────────────────────────┤
│ Score: 85/100 │
│ Verdict: CAUTION │
│ │
│ ✅ Feature conformance: 95% │
│ ⚠️ Conventions: 75% │
│ Security: 88% (1 medium issue) │
│ ✅ Tests: passed (coverage 82%) │
└─────────────────────────────────────────┘
Verdict criteria:
| Score | Verdict | Meaning |
|---|---|---|
| 90+ | PASS | Ready to deploy immediately |
| 70–89 | CAUTION | Deploy after minor fixes |
| Below 70 | FAIL | Major fixes required |
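The verdict criteria map directly onto two thresholds. In this sketch, fractional scores between 89 and 90 are treated as CAUTION, which is an assumption because the table only lists whole-number bands.

```python
def verdict(score: float) -> str:
    """Map a 0-100 audit score to the verdict bands in the table above."""
    if score >= 90:
        return "PASS"
    if score >= 70:
        return "CAUTION"
    return "FAIL"
```

With the sample report above, `verdict(85)` gives `"CAUTION"`.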
| BLOCKING | Priority | Category | Description | Related File |
|---|---|---|---|---|
| ✅ YES | Critical | Security | Hardcoded API key | src/api/auth.py:23 |
| ✅ YES | High | Bug | Missing duplicate check | src/api/auth.py:45 |
| Audit Result | Recommended Skill |
|---|---|
| Spec mismatch | /agile iterate |
| Code quality issues | /checkpoint → re-audit |
| Security vulnerabilities | Re-run /security-review |
| Deep review needed | /multi-ai-review |
Automates the deployment approval process in collaboration with the QA Manager agent.
/audit → calculate quality score → request QA Manager approval
↓
✅ Approved → proceed to deployment
⚠️ Conditional → re-validate after fixing issues
❌ Rejected → send feedback to Specialist
Detailed integration patterns: see references/agent-integration.md
Absorbed from verification-before-completion. Applies to ALL completion claims.
Iron Law: No claims without fresh evidence.
Before asserting any state ("tests pass", "bug fixed", "build succeeds"):
1. IDENTIFY — What command proves this claim?
2. RUN — Execute the full command (fresh, complete)
3. READ — Check full output, exit code, failure count
4. VERIFY — Does the output confirm the claim?
- NO → state actual status with evidence
- YES → state claim with evidence
5. ONLY THEN — Make the claim
| Claim | Required evidence | NOT sufficient |
|---|---|---|
| Tests pass | Test command output: 0 failures | Previous run, "should pass" |
| Lint clean | Lint output: 0 errors | Partial check, inference |
| Build succeeds | Build command: exit 0 | Lint passed, logs look OK |
| Bug fixed | Original symptom test: passes | Code changed, assumed fixed |
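The IDENTIFY, RUN, READ, VERIFY sequence can be approximated by a helper that always runs the command fresh and returns the evidence alongside the result. This is a sketch under assumptions: `verify_claim` is a hypothetical name, and it checks only the exit code, so claims such as "0 failures" still require reading the captured output.

```python
import subprocess
from typing import Dict, List


def verify_claim(claim: str, command: List[str]) -> Dict[str, object]:
    """Run the proving command now and return the evidence for the claim."""
    # RUN: execute the full command fresh; never reuse an earlier result.
    proc = subprocess.run(command, capture_output=True, text=True)
    # READ + VERIFY: keep the exit code and the complete output as evidence.
    return {
        "claim": claim,
        "command": command,
        "exit_code": proc.returncode,
        "verified": proc.returncode == 0,
        "output": proc.stdout + proc.stderr,
    }
```

A call such as `verify_claim("Tests pass", ["pytest"])` returns the record to quote when stating the claim, or the actual status when `verified` is false.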
Red flags — stop immediately if you catch yourself:
Absorbed from evaluation. Optional: run when metrics tracking is needed.
| Metric | Command | Target | Warning |
|---|---|---|---|
| Test coverage | pytest --cov / vitest --coverage | ≥70% | <60% |
| Cyclomatic complexity | radon cc / eslint complexity | ≤10 | >15 |
| Code duplication | jscpd / pylint duplicate | ≤5% | >10% |
| Lint errors | ruff / eslint | 0 | >5 |
| Type errors | mypy / tsc | 0 | >0 |
| Security score | bandit / npm audit | 0 critical | any critical |
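The Target and Warning columns leave a middle band for several metrics (for example, coverage between 60% and 70%). The sketch below classifies a measured value into one of three states. The metric keys and state names are hypothetical, and the security row is omitted because it is expressed as a count of critical findings, not a numeric range.

```python
# metric key: (target, warning, higher_is_better), taken from the table above
THRESHOLDS = {
    "test_coverage_pct": (70, 60, True),
    "cyclomatic_complexity": (10, 15, False),
    "duplication_pct": (5, 10, False),
    "lint_errors": (0, 5, False),
    "type_errors": (0, 0, False),
}


def classify(metric: str, value: float) -> str:
    """Return "on_target", "warning", or "between" for a measured value."""
    target, warning, higher_is_better = THRESHOLDS[metric]
    if higher_is_better:
        if value >= target:
            return "on_target"
        if value < warning:
            return "warning"
    else:
        if value <= target:
            return "on_target"
        if value > warning:
            return "warning"
    return "between"   # misses the target but has not crossed the warning line
```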
| Metric | Target |
|---|---|
| Task completion rate | ≥95% |
| Average retries per task | ≤2 |
| First-attempt success rate | ≥80% |
Store metrics in .claude/metrics/ for trend tracking across phases.
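One way to persist a snapshot for trend tracking is a JSON file per phase. The source names only the .claude/metrics/ directory; the file naming, the record layout, and the `save_metrics` helper are assumptions of this sketch.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def save_metrics(metrics: dict, phase: str,
                 base_dir: Path = Path(".claude/metrics")) -> Path:
    """Write one metrics snapshot as JSON and return the file path."""
    base_dir.mkdir(parents=True, exist_ok=True)
    record = {
        "phase": phase,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "metrics": metrics,
    }
    path = base_dir / f"{phase}.json"   # hypothetical naming: one file per phase
    path.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return path
```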
references/agent-integration.md: QA Manager integration patterns, feedback routing
Last Updated: 2026-04-01 (v3.1.0: Skeptical QA baseline + BLOCKING/NON-BLOCKING classification + AI Slop Detection)
Core principle: Task completion ≠ Goal achievement
Goal
↓
Must-have (what must be true)
↓
Must-exist (what must exist)
↓
Must-wired (what must be connected)
↓
Verification against the actual codebase
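# Extract the goal of the current Phase/task from TASKS.md
GOAL=$(grep -A5 "## Phase" TASKS.md | grep -E "^>" | head -1)
Derive the required conditions by working backward from the goal:
Goal: "Users must be able to chat"
↓
Must-have:
- Messages can be sent
- Messages can be received
- Messages can be displayed
For each must-have, confirm that the code actually exists:
# Example: verify the chat feature
grep -rE "sendMessage|ChatInput|MessageList" src/
Check the wiring between components:
# Check import/export relationships
grep -r "import.*from.*chat" src/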
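The grep checks above can be collected into file:line evidence for the verification report. The following is a sketch that swaps grep for Python's `re` module; `find_evidence` and `check_must_haves` are hypothetical helpers, and the must-have IDs and patterns passed in are up to the caller.

```python
import re
from pathlib import Path
from typing import Dict, List


def find_evidence(src: Path, pattern: str) -> List[str]:
    """Return "file:line" references for every line under src matching the regex."""
    rx = re.compile(pattern)
    hits: List[str] = []
    for path in sorted(src.rglob("*")):
        if not path.is_file():
            continue
        try:
            lines = path.read_text(encoding="utf-8").splitlines()
        except (UnicodeDecodeError, OSError):
            continue   # skip binary or unreadable files
        for lineno, line in enumerate(lines, start=1):
            if rx.search(line):
                hits.append(f"{path}:{lineno}")
    return hits


def check_must_haves(src: Path, must_haves: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each must-have ID to its evidence; an empty list marks a gap."""
    return {mid: find_evidence(src, pattern) for mid, pattern in must_haves.items()}
```

For instance, `check_must_haves(Path("src"), {"M-01": r"sendMessage|ChatInput|MessageList"})` yields the evidence entries for must-have M-01, or an empty list if nothing matches.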
# {Phase} - Verification Report
**Verified on:** {date}
**Status:** {PASS|FAIL}
## Goal
{goal that was verified}
## Must-haves verification
| ID | Must-have | Status | Evidence |
|----|-----------|--------|----------|
| M-01 | {item} | ✅/❌ | {file:line} |
## Gaps (on failure)
- {missing item}
- {required fix}
## Next steps
{On PASS: next Phase}
{On FAIL: work to close the gaps}
npx claudepluginhub insightflo/claude-impl-tools --plugin claude-impl-tools