Run a repeatable, evidence-based architecture review of an existing codebase. Use when asked to assess modularity, coupling, cohesion, dependency direction, circular dependencies, blast radius, fragile seams, shallow modules, testability, ownership boundaries, architectural drift, structural risk, or fit between intended and observed architecture. Drives local search/read/grep, code graph, GitNexus/change-history, AST/LSP, language, and operational tool evidence; scores with the scorecard and writes cited findings. NOT for line-level code review, target architecture design (use architecture-design), or implementation sequencing (use architecture-plan after design approval).
Slash command
/architecture:architecture-review
The review loop, in order. Do not skip ahead — each step gates the next.
Use when a user wants their architecture reviewed, audited, or scored; wants to
understand where modularity, coupling, dependency, blast-radius, or fragility
risk lives; or wants to compare intended and observed architecture. For comparing
two existing reports use architect-compare-reports. For a combined "review and
refactor" request, finish the read-only review first, then recommend exactly one
primary next skill: architecture-design when remediation needs a target state,
or architecture-plan only when an approved target design already exists. A
mutator or engineer applies changes after approval. For target architecture or
requirements-to-design work without review, use architecture-design instead.
- architecture-design instead of reviewing implementation quality.
- architecture-review to compare intended architecture with observed
  implementation. Treat design docs as intent only; actual code, runtime
  config, and tests may have drifted.
- architecture-design when findings need target boundaries, contracts,
  tests, or fitness checks;
- architecture-plan only when the target design is already approved and the
  user asks for implementation sequencing.
- architecture-review again with comparable scope to check whether the code
  now matches intent.

Maintain a visible task list for the review flow. Track at least:
Keep task names outcome-based. Do not expose runtime-specific mechanics in the instructions or report.
Interview if context is missing. Do not score from a cold start. If you
lack the intended architecture, quality goals, volatile areas, or scope, run
the interview first. See references/interview.md. Inspect docs, ADRs, and
manifests before asking — never ask a question the repo already answers. Ask
only for missing context whose answer would change the architecture
assessment.
Non-interactive runs (CI, autonomous): when no user is reachable,
reconstruct intent from docs/ADRs/CLAUDE.md/changelog, label the context
reconstructed, and cap analysis_confidence accordingly. Never invent intent.
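The capping rule for non-interactive runs can be sketched as follows. This is only an illustration: the level names (`low`, `medium`, `high`), the default cap of `medium`, and the function name `cap_confidence` are assumptions, not values taken from the scorecard template, which remains the source of truth for bands and caps.

```python
# Hypothetical confidence levels, ordered weakest to strongest.
LEVELS = ["low", "medium", "high"]


def cap_confidence(claimed: str, context_source: str, cap: str = "medium") -> str:
    """Return the claimed confidence, capped when context was reconstructed."""
    if claimed not in LEVELS or cap not in LEVELS:
        raise ValueError(f"unknown confidence level: {claimed!r} / {cap!r}")
    if context_source == "reconstructed":
        # Keep whichever is weaker: the claim or the cap.
        return LEVELS[min(LEVELS.index(claimed), LEVELS.index(cap))]
    return claimed


capped = cap_confidence("high", "reconstructed")
```

The cap only ever lowers a claim. A reconstructed context never raises confidence, and an interviewed context leaves the claim untouched.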
Validate the working model. Before scoring, surface your current
understanding for correction: system purpose, candidate units, responsibilities,
major integrations, domain classifications, ownership/deploy assumptions,
known pain, and doc-vs-code drift risks. Existing architecture docs describe
intended design, not necessarily current implementation. Ask only for
corrections that would change the assessment. In non-interactive runs, mark
this model as reconstructed and record unconfirmed assumptions in
missing_evidence.
Build the system map before judging quality. Establish what exists:
languages, package managers, units, deploy units, public interfaces, declared
modules (manifests/dirs) vs observed modules (graphs/imports/churn), high-risk
entrypoints, and missing evidence. This populates system_map in the report
frontmatter. Scoring before a map is forbidden. Scoring from directory shape
alone is explicitly forbidden — a directory tree is not an architecture.
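To make the declared-versus-observed distinction concrete, here is a minimal sketch that derives observed module dependencies from Python import statements and compares them with a declared module list. It assumes the first path segment names the owning module, and the module names (`billing`, `orders`, `shipping`) are hypothetical. A real review would rely on the code graph and change-history tools rather than this toy scan.

```python
import ast
from collections import defaultdict


def observed_dependencies(sources: dict[str, str], declared: set[str]) -> dict[str, set[str]]:
    """Map each module to the other declared modules its files import.

    `sources` maps a path such as "billing/api.py" to its source text.
    The first path segment is treated as the owning module (an assumption).
    """
    deps: dict[str, set[str]] = defaultdict(set)
    for path, text in sources.items():
        owner = path.split("/")[0]
        for node in ast.walk(ast.parse(text)):
            if isinstance(node, ast.Import):
                names = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
                names = [node.module]
            else:
                continue
            for name in names:
                target = name.split(".")[0]
                if target in declared and target != owner:
                    deps[owner].add(target)
    return dict(deps)


# Hypothetical inputs: three declared modules, two of which contain code.
declared = {"billing", "orders", "shipping"}
sources = {
    "billing/api.py": "from orders.models import Order\nimport json\n",
    "orders/models.py": "import billing.tax\n",
}
edges = observed_dependencies(sources, declared)
declared_only = declared - {path.split("/")[0] for path in sources}
```

In this toy input, `billing` and `orders` import each other, which the directory tree alone would not reveal, and `shipping` is declared but has no observed code.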
Gather evidence across applicable dimensions. Use tools-code-search for
local search/read/grep, then the relevant specialized tool skills (ast-grep,
codegraph, GitNexus, LSP/tree-sitter, language and operational tool skills)
to cover discovery, structural, semantic, dependency, change, and operational
evidence. Cite tools and files you used. Record coverage — used, skipped,
missing, failed — per dimension, even where you find nothing wrong. Summarize
output; do not paste raw dumps.
- gitnexus status). A stale index is a coverage gap, not evidence; record
  it as tools_failed, do not score from it.
- tools_missing with explicit confidence_impact. Do not silently score a
  dimension (e.g. dependency health) from imports alone without flagging the
  gap and capping confidence.
- RUFF_CACHE_DIR=$TMPDIR/...); a sandboxed or read-only target will
  otherwise fail the tool.
- git log --follow per file.

Triage before scoring. Sort signal from noise: which observations are
facts, which are hypotheses, which actually bear on a score. See
references/triage.md.
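The per-dimension coverage bookkeeping from the evidence step (used, skipped, missing, failed) could be tracked with a small record like the one below. The class name, method names, and the dimension label are hypothetical. Only the four status words and the tool names come from the text above.

```python
from dataclasses import dataclass, field

# The four coverage statuses named in the evidence step.
STATUSES = ("used", "skipped", "missing", "failed")


@dataclass
class DimensionCoverage:
    dimension: str
    tools: dict[str, str] = field(default_factory=dict)  # tool name -> status

    def record(self, tool: str, status: str) -> None:
        if status not in STATUSES:
            raise ValueError(f"unknown status: {status!r}")
        self.tools[tool] = status

    def gaps(self) -> list[str]:
        """Tools that produced no usable evidence for this dimension."""
        return sorted(t for t, s in self.tools.items() if s in ("missing", "failed"))


cov = DimensionCoverage("dependency")
cov.record("tools-code-search", "used")
cov.record("GitNexus", "failed")      # for example, a stale index
cov.record("codegraph", "missing")
gaps = cov.gaps()
```

A non-empty `gaps()` result is the cue to flag the gap and cap confidence for that dimension instead of scoring it silently.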
Score with the scorecard skill. Use the architecture-scorecard skill for
every score. Read ../../templates/scorecard.yaml for dimensions, bands,
anchors, and rules — it is the source of truth. Each non-meta score needs at
least one evidence ref. Low confidence caps the quality claim.
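The evidence rule can be checked mechanically before the report is written. The sketch below assumes a hypothetical `Score` shape, a numeric value scale, and evidence IDs of the form `E-001`. The real dimensions, bands, and rules are defined in the scorecard template, and treating `analysis_confidence` as a meta score is an assumption made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Score:
    dimension: str
    value: int
    confidence: str
    evidence_refs: list[str] = field(default_factory=list)
    meta: bool = False  # meta scores are exempt from the evidence rule


def validate(scores: list[Score]) -> list[str]:
    """Return one problem message per non-meta score that lacks evidence."""
    return [
        f"{s.dimension}: no evidence ref"
        for s in scores
        if not s.meta and not s.evidence_refs
    ]


scores = [
    Score("coupling", 3, "medium", ["E-001"]),
    Score("testability", 2, "low"),                        # missing evidence
    Score("analysis_confidence", 2, "medium", meta=True),  # assumed meta score
]
problems = validate(scores)
```

An empty `problems` list is a precondition for publishing scores, not proof that the evidence itself is sound.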
Write the report from the template. Use ../../templates/report.md as the
skeleton. Fill frontmatter (interview context, system map, scores, findings,
evidence, tool coverage) and the prose sections. Findings carry stable IDs
and human-facing narratives: knowledge or boundary leakage, complexity impact,
cascading-change scenarios, recommendation, and trade-offs.
Recommend the next primary skill. If the user asks for review and immediate refactoring, do not edit source or mix audit with implementation. Finish the report, then choose one next skill:
- architecture-design when the target boundaries, contracts, tests, or
  fitness checks are not yet approved;
- architecture-plan when an approved design already exists and the user
  wants executable sequencing.

If the user asks for the full remediation pipeline, name the sequence:
architecture-review → architecture-design → architecture-plan →
implementation by a mutator/engineer → architecture-review re-check. The
final handoff must include finding/evidence IDs, design decision IDs, scoped
modules/files, incremental steps, verification checks, acceptance criteria,
risk/rollback notes, and an explicit mutator/engineer implementation step.
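The handoff requirements listed above lend themselves to a simple completeness check. The field names below are hypothetical keys chosen to mirror that list one to one. The actual report and plan templates may name them differently.

```python
# Hypothetical keys mirroring the handoff contents listed in the text.
REQUIRED_HANDOFF_FIELDS = (
    "finding_ids",
    "evidence_ids",
    "design_decision_ids",
    "scoped_modules",
    "incremental_steps",
    "verification_checks",
    "acceptance_criteria",
    "risk_rollback_notes",
    "implementation_step",
)


def missing_handoff_fields(handoff: dict) -> list[str]:
    """List required fields that are absent or empty in a handoff dict."""
    return [name for name in REQUIRED_HANDOFF_FIELDS if not handoff.get(name)]


draft = {
    "finding_ids": ["F-001"],
    "evidence_ids": ["E-001"],
    "scoped_modules": ["billing"],
    "acceptance_criteria": [],  # present but empty, so still counted as missing
}
missing = missing_handoff_fields(draft)
```

Treating empty values as missing keeps a handoff from passing the check with placeholder sections.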
A completed review produces an architecture report using ../../templates/report.md.
The report must include interview_context, system_map, scores, findings,
evidence, and tool_coverage. Each finding must include a human-facing
narrative explaining the leak or drift, complexity impact, cascading-change
scenarios, recommendation, and trade-offs. If remediation is requested, recommend
exactly one primary next skill unless the user asks for the full pipeline:
architecture-design for target-state work, or architecture-plan only when an
approved design already exists and implementation sequencing is requested.
When asked to describe the review workflow, include these clauses explicitly:
When the user asks to review and refactor, separate the response into review,
scoring/recommendations, next-skill recommendation, and implementation handoff.
Use the exact skill name that applies next: architecture-design when target
state is missing, or architecture-plan when an approved design already exists.
State that the architect refuses source edits, and that the handoff includes
verification steps and acceptance criteria for the mutator or engineer.
Use references/interview.md for the full interview and fallback rules. In the
main review flow:
- missing_evidence and lower analysis_confidence.
- architecture-design, or route approved implementation sequencing to a
  mutator/engineer via architecture-plan with verification-backed
  acceptance criteria.