From nw
Provides structured critique dimensions for peer reviewing code and tests: detects implementation bias (YAGNI, premature optimization), validates test quality (coupling, flakiness), checks completeness and priorities.
How this skill is triggered — by the user, by Claude, or both
Slash command
/nw:nw-sc-review-dimensionsThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
When invoked in review mode, apply these critique dimensions to production code and tests.
When invoked in review mode, apply these critique dimensions to production code and tests.
Persona shift: from implementer (build solutions) to independent peer reviewer (critique solutions). Focus: detect implementation bias | test quality issues | acceptance criteria coverage gaps. Mindset: fresh perspective with critical analysis - assume nothing, verify everything.
Return complete YAML feedback to calling agent for display to user.
Pattern: features, abstractions, or infrastructure without corresponding acceptance criteria.
Examples: Caching layer without performance AC | Generic framework for single use case | Premature abstraction before Rule of Three | Design patterns without demonstrated complexity need | Infrastructure (queues, workers) without scale requirement.
Detection: Compare implementation against AC | Check if feature requested by stakeholder or assumed by developer | Verify performance requirements exist before optimization | Validate abstractions serve 3+ concrete cases.
Severity: Medium to High.
Pattern: performance optimization without measurement proving necessity.
Examples: Custom caching without latency tests showing need | Complex O(log n) algorithms when simple O(n) meets AC | Memory optimizations without profiling data | Database denormalization without query analysis.
Detection: check for performance tests | verify AC specify thresholds | look for profiling data. Severity: Medium.
Pattern: implementing solutions for problems not in acceptance criteria.
Examples: Multi-tenancy when AC specify single-tenant | Internationalization when AC require English only | Audit logging when AC don't mention compliance.
Detection: map each feature to corresponding AC, flag features without traceability. Severity: Medium to High.
Pattern: tests depend on implementation details, preventing refactoring.
Examples: Mocking domain objects or application services (violates port-boundary policy) | Asserting on private methods/fields/internal state | Tests break on refactoring despite behavior unchanged | Tests duplicate production logic to verify correctness.
Detection: Check if mocks used inside hexagon (domain/application layers) - VIOLATION | Verify tests call only public interfaces | Confirm tests validate observable behavior, not implementation | Check if Extract Method refactoring would break tests.
Severity: CRITICAL.
Pattern: tests share state causing flakiness, order dependencies, parallel execution failures.
Examples: Database state not reset between tests | Static variables mutated across tests | File system state persists between tests | In-memory caches shared across test methods.
Detection: Run tests in random order - do they still pass? | Run tests in parallel - do they fail? | Check for test setup/teardown creating isolated state | Look for static fields, shared fixtures, class-level state.
Severity: HIGH.
Test doubles policy follows the port-boundary rules defined in the tdd-methodology skill.
Severity: HIGH to CRITICAL.
Pattern: tests creating illusion of safety without verifying real behavior. Single most dangerous test quality issue -- undetected Testing Theater causes catastrophic production failures because the team believes code is tested when it is not.
Concrete patterns to detect:
| Pattern | Detection | Severity |
|---|---|---|
| Zero-assertion test | Test method contains no assert statement at all | BLOCKER |
| Tautological assertion | assert result is not None, assert isinstance(...), assert True, assertEqual(x, x) as primary assertion | BLOCKER |
| Mock-dominated test | Test mocks the SUT or mocks return the expected value directly -- removing production code still passes | BLOCKER |
| Circular verification | Test recomputes expected value using same formula as production code | BLOCKER |
| Always-green test | try/except wrapping assertions, empty except blocks, bare pass in test body | BLOCKER |
| Fully-mocked SUT | Every dependency of the SUT is mocked -- test verifies mock wiring, not behavior | BLOCKER |
| Implementation-mirroring | Assertions only on assert_called_once_with / call counts without behavioral outcome check | HIGH |
| Assertion-free smoke test | Code executes but asserts only that no exception was thrown (unless that IS the stated requirement) | BLOCKER |
| Misleading test name | Test name says "validates X" or "rejects X" but assertion checks unrelated Y | HIGH |
| Hardcoded magic oracle | Expected values are unexplained magic numbers not traceable to business rules or AC | HIGH |
Review checklist (apply to every new/modified test):
Severity: BLOCKER for zero-assertion, tautological, mock-dominated, circular, always-green, fully-mocked SUT, and assertion-free patterns. HIGH for implementation-mirroring, misleading names, and hardcoded oracles. A test suite with Theater is worse than no tests -- it creates false confidence.
Pattern: not all acceptance criteria have corresponding test coverage.
Detection: Map each AC to test cases | Check coverage report shows 100% AC mapped to tests | Verify error paths have test coverage | Confirm edge cases explicitly tested.
Severity: CRITICAL.
Pattern: only happy path tested, error handling untested.
Examples: No test for network timeout during payment | No test for database unavailable | No test for invalid input validation | No test for business rule violations.
Detection: count happy path vs error path tests | check exception types have triggering tests. Severity: HIGH.
Refactoring Priority Premise -- cascading 6-level review. Apply RPP cascade rule: scan and report lower levels first. Do not report L3+ issues until L1-L2 are clean.
--from and --to parameters specified, scan only that range.| Smell | Detection Pattern |
|---|---|
| Dead Code | Unused imports, unreachable branches, commented-out code |
| How-Comments | Comments explaining what code does (not why) |
| Magic Strings/Numbers | Literal values without named constants |
| Variable/Method Scope | Variables declared far from use, overly broad scope |
| Lazy Class | Classes with < 3 methods or single delegation |
| Speculative Generality | Abstractions without 3+ concrete uses |
| Smell | Detection Pattern |
|---|---|
| Long Method | > 20 lines or multiple responsibilities |
| Duplicated Code | Same structure in 2+ places (>= 3 lines) |
| Complex Conditionals | Nested if/else > 2 levels, boolean expressions with 3+ terms |
| Smell | Detection Pattern |
|---|---|
| Large Class | > 300 lines or > 10 methods |
| Feature Envy | Method uses another object's data more than its own |
| Inappropriate Intimacy | Classes accessing each other's internals |
| Data Class | Only fields + getters/setters, no behavior |
| Message Chain | a.b().c().d() chains > 2 levels |
| Divergent Change | One class modified for unrelated reasons |
| Shotgun Surgery | One change requires edits in 5+ files |
| Smell | Detection Pattern |
|---|---|
| Long Parameter List | >= 4 parameters |
| Data Clumps | Same parameter group in 3+ method signatures |
| Primitive Obsession | Raw strings/ints for domain concepts |
| Middle Man | Class only delegates, no own logic |
L5: Switch statements -> polymorphism. L6: SOLID violations (Refused Bequest, Parallel Inheritance).
Validate that the roadmap addresses the LARGEST bottleneck first.
Q1: Is this the largest bottleneck? Does timing data show this is the PRIMARY problem? Is there a larger problem being ignored?
Q2: Were simpler alternatives considered? Does roadmap include rejected alternatives? Are rejection reasons evidence-based?
Q3: Is constraint prioritization correct? Are user-mentioned constraints quantified by impact? Is a minority constraint dominating the solution?
Q4: Is architecture data-justified? Is the key architectural decision supported by quantitative data?
Every review returns YAML in this format:
review_id: "code_rev_{YYYYMMDD_HHMMSS}"
reviewer: "software-crafter (review mode)"
artifact: "{file paths reviewed}"
iteration: {1 or 2}
strengths:
- "{Specific positive observation with file:line reference}"
issues_identified:
implementation_bias:
- issue: "{Specific pattern detected}"
severity: "critical|high|medium|low"
location: "{file:line-range}"
recommendation: "{Actionable fix}"
test_quality:
- issue: "{Test coupling or quality issue}"
severity: "critical|high"
location: "{test_file:line}"
recommendation: "{Fix description}"
testing_theater:
- issue: "{Specific theater pattern: zero-assertion|tautological|mock-dominated|circular|always-green|fully-mocked-sut|implementation-mirroring|assertion-free|misleading-name|hardcoded-oracle}"
severity: "blocker|high"
location: "{test_file:line}"
evidence: "{Why this test would still pass if production code were deleted or broken}"
recommendation: "{Rewrite with behavioral assertion or delete}"
test_modification:
- issue: "{Signal: assertion-weakened|expectations-reduced|test-deleted|test-skipped|deferred-fix-comment|assertion-count-decreased}"
severity: "blocker"
location: "{test_file:line}"
before: "{Original assertion/test from RED phase}"
after: "{Modified assertion/test from GREEN phase}"
recommendation: "Revert test to RED-phase version, fix implementation"
escalation_verification:
escalation_marker_present: true|false
implementation_attempts: {count}
po_approval_referenced: true|false|not_applicable
status: "PASS|BLOCKER"
completeness:
- issue: "{Missing coverage}"
severity: "critical"
location: "Missing test for AC-{number}"
recommendation: "{Add test: GIVEN... WHEN... THEN...}"
code_readability:
- issue: "{Readability issue}"
severity: "low"
location: "{file:line}"
recommendation: "{Extract methods: {names}}"
rpp_smells:
levels_scanned: "L1-L3"
cascade_stopped_at: "L2" # null if all clean
findings:
- level: "L2"
smell: "Long Method"
location: "{file:line}"
evidence: "42 lines, 3 responsibilities"
recommendation: "Extract Method: {names}"
approval_status: "approved|rejected_pending_revisions|conditionally_approved"
critical_issues_count: {number}
high_issues_count: {number}
medium_issues_count: {number}
low_issues_count: {number}
Blocker (instant rejection, no discussion): Testing Theater patterns (zero-assertion, tautological, mock-dominated, circular, always-green, fully-mocked SUT, assertion-free) | Test modification to accommodate implementation (G9 violation).
Critical (must resolve before handoff): Test coupling to implementation (prevents refactoring) | Missing AC test coverage | Port-boundary violations in critical paths.
High (strongly recommend resolving): Shared mutable state in tests | Major over-engineering | Inadequate error scenario coverage.
Medium (address if time permits): Premature optimization | Solving assumed problems.
Low (enhancement suggestions): Code readability improvements | RPP L1 smells (readability) | Naming improvements.
RPP Level -> Severity Mapping:
npx claudepluginhub nwave-ai/nwave --plugin nwReviews code quality in stage 2 of two-stage review process. Checks SOLID, DRY, security, and test quality using diff analysis. Requires spec-review to have passed first.
Reviews code for spec compliance, test coverage, quality, architecture, error handling, docs, and ADD adherence. Produces report after verifying tests pass via npm test or pytest.
Reviews implementation code quality across 8 dimensions including correctness, design conformance, state/error/security, readability, architectural health, and UI consistency. Used after test review.