From project-toolkit
Assesses code maintainability using 5 qualities (cohesion, coupling, encapsulation, testability, non-redundancy) with scoring rubrics across languages at method/class/module levels. Generates markdown reports with remediation guidance.
Slash command: /project-toolkit:code-qualities-assessment
Model: claude-sonnet-4-6
Evaluate code maintainability using 5 timeless design qualities with quantifiable scoring rubrics.
Trigger phrases: assess code quality, evaluate maintainability, check code qualities, testability review, run quality assessment

```bash
# Assess a single file
python3 scripts/assess.py --target src/services/auth.py

# Assess changed files only (CI mode)
python3 scripts/assess.py --target . --changed-only --format json

# Full module assessment with HTML report
python3 scripts/assess.py --target src/services/ --format html --output quality-report.html
```
| Quality | Question | Score 10 | Score 1-3 |
|---|---|---|---|
| Cohesion | How related are responsibilities? | Single, well-defined responsibility | Unrelated responsibilities jammed together |
| Coupling | How dependent on other code? | Minimal deps, depends on abstractions | Tightly coupled, hard-coded dependencies |
| Encapsulation | How well are internals hidden? | All internals private, minimal API | Everything public, no information hiding |
| Testability | How easily verified in isolation? | Pure functions, injected dependencies | Hard to test, requires full integration |
| Non-Redundancy | How unique is each piece of knowledge? | Zero duplication, appropriate abstractions | Pervasive copy-paste |
Use this skill when:
Use analyze instead when:
The skill runs automated assessment via scripts/assess.py in these steps:
1. Symbol Extraction
2. Quality Scoring
3. Comparison (if historical data exists)
4. Report Generation
5. Gate Enforcement (CI mode)
```bash
python3 scripts/assess.py --target <path> [options]
```

| Parameter | Required | Default | Description |
|---|---|---|---|
| `--target` | Yes | - | File, directory, or glob pattern |
| `--context` | No | production | production, test, or generated |
| `--changed-only` | No | false | Only assess changed files (git diff) |
| `--format` | No | markdown | markdown, json, or html |
| `--config` | No | .qualityrc.json | Path to config file |
| `--output` | No | stdout | Output file path |
| `--use-serena` | No | auto | auto, yes, or no (Serena integration) |
| Code | Meaning |
|---|---|
| 0 | Assessment complete, all thresholds met |
| 10 | Quality degraded vs previous run |
| 11 | Quality below configured thresholds |
| 1 | Script error (invalid args, file not found) |
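The exit codes above can be turned into a pipeline decision. The following is a minimal sketch, not part of scripts/assess.py: the helper name `gate_decision` and the `block_on_thresholds` switch are assumptions made for illustration. By default it blocks only on regression (code 10).

```python
# Hypothetical helper; not part of scripts/assess.py.
EXIT_MEANINGS = {
    0: "Assessment complete, all thresholds met",
    1: "Script error (invalid args, file not found)",
    10: "Quality degraded vs previous run",
    11: "Quality below configured thresholds",
}


def gate_decision(exit_code: int, block_on_thresholds: bool = False) -> str:
    """Return 'pass', 'fail', or 'error' for a documented exit code."""
    if exit_code not in EXIT_MEANINGS:
        raise ValueError(f"undocumented exit code: {exit_code}")
    if exit_code == 0:
        return "pass"
    if exit_code == 1:
        return "error"
    if exit_code == 10:
        return "fail"  # a regression always blocks in this sketch
    # Code 11: block only when absolute thresholds are opted in.
    return "fail" if block_on_thresholds else "pass"
```

Treating code 11 as non-blocking by default is a design choice of this sketch; a team that wants absolute gates can pass `block_on_thresholds=True`.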
Create .qualityrc.json to customize thresholds:
```json
{
  "thresholds": {
    "cohesion": { "min": 7, "warn": 5 },
    "coupling": { "max": 3, "warn": 5 },
    "encapsulation": { "min": 7, "warn": 5 },
    "testability": { "min": 6, "warn": 4 },
    "nonRedundancy": { "min": 8, "warn": 6 }
  },
  "context": {
    "test": {
      "testability": { "min": 3 }
    }
  },
  "ignore": [
    "**/generated/**",
    "**/*.pb.py",
    "**/migrations/**"
  ]
}
```
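One way to read the `context` section is as a per-context overlay on the base thresholds. The sketch below assumes a key-by-key merge; the source does not specify how scripts/assess.py resolves overrides, and `effective_thresholds` is a hypothetical name.

```python
import copy

# Same shape as the .qualityrc.json example; the merge rule is an assumption.
CONFIG = {
    "thresholds": {
        "cohesion": {"min": 7, "warn": 5},
        "testability": {"min": 6, "warn": 4},
    },
    "context": {"test": {"testability": {"min": 3}}},
}


def effective_thresholds(config: dict, context: str) -> dict:
    """Overlay per-context overrides on the base thresholds, key by key."""
    merged = copy.deepcopy(config.get("thresholds", {}))
    overrides = config.get("context", {}).get(context, {})
    for quality, rule in overrides.items():
        merged.setdefault(quality, {}).update(rule)
    return merged


print(effective_thresholds(CONFIG, "test")["testability"])
# → {'min': 3, 'warn': 4}
```

Under this reading, `--context test` lowers only the testability minimum and leaves every other threshold untouched.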
| Avoid | Why | Instead |
|---|---|---|
| Running on entire codebase every commit | Slow, noisy | Use --changed-only in CI |
| Using scores for performance reviews | Gaming the system | Focus on trend improvement |
| Blocking merges on absolute scores | Discourages refactoring old code | Block on regression only |
| Ignoring context (test vs production) | False positives | Use --context flag |
| Not configuring thresholds | One-size-fits-all does not fit | Customize .qualityrc.json |
After running assessment:
Cohesion: How strongly related are responsibilities within a boundary?
High cohesion = focused, understandable code. Low cohesion = "god objects" doing too much.
| Score | Description |
|---|---|
| 10 | Single, well-defined responsibility |
| 7-9 | Primary responsibility clear, minor supporting concerns |
| 4-6 | Multiple loosely related responsibilities |
| 1-3 | Unrelated responsibilities jammed together |
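To illustrate the two ends of the cohesion rubric, here is a small sketch. All names are hypothetical, and the scores the tool would assign to either form are not known.

```python
# Lower cohesion: one class parses input, computes totals, and formats output.
class OrderManager:
    def parse(self, line: str) -> list[float]:
        return [float(part) for part in line.split(",")]

    def total(self, prices: list[float]) -> float:
        return sum(prices)

    def render(self, total: float) -> str:
        return f"Total: {total:.2f}"


# Higher cohesion: each function owns exactly one responsibility.
def parse_prices(line: str) -> list[float]:
    return [float(part) for part in line.split(",")]


def order_total(prices: list[float]) -> float:
    return sum(prices)


def render_total(total: float) -> str:
    return f"Total: {total:.2f}"
```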
Coupling: How dependent is this code on other code?
Loose coupling = independent evolution, easy testing. Tight coupling = fragile, hard to test.
| Score | Description |
|---|---|
| 10 | Minimal dependencies, depends on abstractions |
| 7-9 | Few dependencies, all explicit |
| 4-6 | Moderate dependencies, some global state |
| 1-3 | Tightly coupled, hard-coded dependencies |
Encapsulation: How well are implementation details hidden?
Good encapsulation = freedom to change internals. Poor encapsulation = brittle API.
| Score | Description |
|---|---|
| 10 | All internals private, minimal public API |
| 7-9 | Mostly private, well-defined API |
| 4-6 | Some internals exposed |
| 1-3 | Everything public, no information hiding |
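As a rough illustration of the top of the encapsulation rubric, the sketch below keeps state non-public and exposes a minimal API. The `Counter` class is a hypothetical example, not code from the skill.

```python
class Counter:
    """Exposes a minimal API; the stored count is an internal detail."""

    def __init__(self) -> None:
        self._count = 0  # leading underscore marks this as non-public

    def increment(self) -> None:
        self._count += 1

    @property
    def value(self) -> int:
        # Read-only view: there is no setter, so callers cannot assign to it.
        return self._count
```

Because callers only see `increment()` and `value`, the internal representation can change without touching them.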
Testability: How easily can behavior be verified in isolation?
Testable code = fast feedback, confidence to refactor. Untestable code = fear of change.
| Score | Description |
|---|---|
| 10 | Pure functions, injected dependencies |
| 7-9 | Mostly testable, straightforward to mock |
| 4-6 | Moderately testable, requires setup |
| 1-3 | Hard to test, requires full integration |
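The difference between the upper and lower testability bands can be sketched with a clock dependency. The function names below are hypothetical.

```python
from datetime import datetime, timezone


# Harder to test: the result depends on the real clock.
def greeting_now() -> str:
    hour = datetime.now(timezone.utc).hour
    return "Good morning" if hour < 12 else "Good afternoon"


# Easier to test: a pure function with the hour passed in as an argument.
def greeting(hour: int) -> str:
    return "Good morning" if hour < 12 else "Good afternoon"
```

The pure version can be checked for any hour without mocking or waiting, which is what the "pure functions, injected dependencies" row describes.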
Non-Redundancy: How unique is each piece of knowledge?
DRY code = fix once, single source of truth. Duplication = fix N times, maintenance burden.
| Score | Description |
|---|---|
| 10 | Zero duplication, appropriate abstractions |
| 7-9 | Minimal duplication (intentional) |
| 4-6 | Moderate duplication, missed abstractions |
| 1-3 | Pervasive copy-paste |
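A small sketch of "fix once" versus "fix N times". The tax rate and function names are invented for illustration only.

```python
# Duplicated knowledge: the 20% rate is written out in two places.
def invoice_total_dup(net: float) -> float:
    return net + net * 0.20


def receipt_tax_dup(net: float) -> float:
    return net * 0.20


# Single source of truth: the rate and the tax rule live in one place.
TAX_RATE = 0.20


def tax(net: float) -> float:
    return net * TAX_RATE


def invoice_total(net: float) -> float:
    return net + tax(net)
```

If the rate changes, the second form needs one edit; the first needs an edit at every copy.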
```bash
python3 scripts/assess.py --target src/models/user.py
```

Output:

```markdown
# Code Quality Assessment: src/models/user.py

## Summary
- **Cohesion**: 8/10
- **Coupling**: 4/10 (warning)
- **Encapsulation**: 9/10
- **Testability**: 7/10
- **Non-Redundancy**: 9/10

## Issues Found

### Coupling: 4/10 (Warning)
**Problem**: Direct instantiation of DatabaseConnection in constructor
**Impact**: Hard to test, tightly coupled to database layer
**Remediation**: Use dependency injection
- See: [Dependency Injection](references/patterns/dependency-injection.md)
- Related ADR: ADR-023 (Dependency Management)
```
Example Fix:

```python
# Before
class User:
    def __init__(self):
        self.db = DatabaseConnection()  # Hard-coded dependency

# After
class User:
    def __init__(self, db: DatabaseInterface):
        self.db = db  # Injected dependency
```
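The payoff of the injected form is that a test can supply a stand-in. Below is a self-contained sketch under assumptions: `fetch_name`, `display_name`, and `FakeDatabase` are hypothetical names, and `DatabaseInterface` is modeled here as a `typing.Protocol`, which the source does not specify.

```python
from typing import Protocol


class DatabaseInterface(Protocol):
    # Hypothetical method; the real interface is not shown in the source.
    def fetch_name(self, user_id: int) -> str: ...


class User:
    def __init__(self, db: DatabaseInterface):
        self.db = db  # Injected dependency

    def display_name(self, user_id: int) -> str:
        return self.db.fetch_name(user_id).title()


class FakeDatabase:
    """In-memory stand-in used only for tests."""

    def __init__(self, names: dict[int, str]):
        self._names = names

    def fetch_name(self, user_id: int) -> str:
        return self._names[user_id]


user = User(FakeDatabase({1: "ada lovelace"}))
print(user.display_name(1))  # → Ada Lovelace
```

No database connection is needed to exercise `User`, which is the kind of change that would move code toward the "injected dependencies" end of the rubric.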
```bash
# In CI pipeline
python3 scripts/assess.py --target . --changed-only --format json --output quality.json
# Exit code 10 = quality degraded, fail PR
# Exit code 0 = quality maintained, pass
```
```bash
python3 scripts/assess.py --target src/ --format html --output reports/quality.html
```
Opens dashboard showing:
```bash
# Identify refactoring targets
python3 scripts/assess.py --target src/ --format json | \
  jq '.files | sort_by(.overall) | .[0:5]' > low-quality-files.json

# Feed to planner
planner --input low-quality-files.json --goal "Refactor lowest quality files"
```
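The jq filter above keeps the five entries with the lowest `overall` score. A Python equivalent might look like the sketch below. The top-level `files` array and the `overall` field are taken from the jq expression; the `path` field and the helper name `lowest_quality` are assumptions.

```python
import json


def lowest_quality(report_json: str, count: int = 5) -> list[dict]:
    """Return the `count` entries of `files` with the smallest `overall`."""
    files = json.loads(report_json)["files"]
    return sorted(files, key=lambda entry: entry["overall"])[:count]


# Sample report; the "path" field is a hypothetical name.
sample = json.dumps({"files": [
    {"path": "a.py", "overall": 8.2},
    {"path": "b.py", "overall": 3.1},
    {"path": "c.py", "overall": 5.6},
]})
print([entry["path"] for entry in lowest_quality(sample, count=2)])
# → ['b.py', 'c.py']
```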
When reviewing ADRs, include quality impact:
```bash
# Before implementing ADR
python3 scripts/assess.py --target affected-files.txt > baseline.md

# After implementing ADR
python3 scripts/assess.py --target affected-files.txt > post-implementation.md

# Compare
diff baseline.md post-implementation.md
```
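A textual diff shows which lines changed. A per-quality delta can also be computed from the two reports. This sketch assumes the Summary lines follow the `- **Quality**: N/10` format of the sample report shown earlier; the helper names are hypothetical.

```python
import re

# Matches summary lines such as "- **Coupling**: 4/10 (warning)".
SCORE_LINE = re.compile(
    r"^- \*\*(?P<quality>[^*]+)\*\*: (?P<score>\d+)/10", re.MULTILINE
)


def summary_scores(report: str) -> dict[str, int]:
    """Extract {quality: score} from a markdown report."""
    return {m["quality"]: int(m["score"]) for m in SCORE_LINE.finditer(report)}


def score_deltas(baseline: str, post: str) -> dict[str, int]:
    """Score change per quality present in both reports."""
    before, after = summary_scores(baseline), summary_scores(post)
    return {q: after[q] - before[q] for q in before if q in after}


baseline = "- **Cohesion**: 8/10\n- **Coupling**: 4/10 (warning)\n"
post = "- **Cohesion**: 8/10\n- **Coupling**: 7/10\n"
print(score_deltas(baseline, post))
# → {'Cohesion': 0, 'Coupling': 3}
```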
Combine broad analysis with focused quality metrics:
```bash
# First: broad exploration
analyze --target src/

# Then: quality deep dive on problem areas
python3 scripts/assess.py --target src/services/auth.py
```
For detailed scoring methodology and examples:
| File | Content |
|---|---|
| dotnet-performance-patterns.md | Allocation-free .NET patterns with quality scoring calibration |
| Support Level | Languages |
|---|---|
| Full | Python (.py), TypeScript/JavaScript (.ts, .js, .tsx, .jsx), C# (.cs), Java (.java), Go (.go) |
| Partial (heuristic) | Ruby (.rb), Rust (.rs), PHP (.php), Kotlin (.kt) |
Serena integration improves accuracy when available.
This skill embodies "sergeant methods directing privates":
Each quality scorer is cohesive (single responsibility), loosely coupled (independent), and testable (pure calculation).
These 5 qualities are computer science fundamentals:
Language-agnostic design ensures longevity across technology shifts.
npx claudepluginhub rjmurillo/ai-agents --plugin project-toolkit