From jewzaam-reviews
Reverse-engineer C4 architecture diagrams (Context, Container, Component) and a behavioral specification from a codebase. Produces documentation of WHAT the system does — not how it's coded.
Slash command: /jewzaam-reviews:c4-reverse-engineer
<!-- Generated By: Claude Code (Claude Opus 4.6) -->
Files: references/example-output.md, references/failure-modes.md, references/review-format.md, references/subagent-prompts.md, references/verification-templates.md, scripts/count_source_lines.py, scripts/find_external_calls.py, scripts/find_platform_conditionals.py, scripts/render-c4-reverse-engineer.py, tests/fixtures/pre-render.sample.json, tests/test_count_source_lines.py, tests/test_find_external_calls.py, tests/test_find_platform_conditionals.py, tests/test_render_c4_validation.py

Produce C4 architecture diagrams and a behavioral specification from a codebase. No code changes.
Script paths use ~: Use the Plugin Home path from the Pre-Fetch section (starts with ~) when constructing Bash commands for plugin scripts. Do not use absolute /home/... paths. Do not use && or || chaining — each script call must be a standalone Bash invocation.
Core principle: Subagents tell you WHAT to look at. Direct reads tell you what's TRUE.
Subagent exploration produces plausible-looking specs with subtle errors — wrong refresh rates, wrong event binding targets, configurable-vs-hardcoded confusion. The skill uses subagents for the survey (shape of the system) then verifies every specific claim with direct reads before writing it into a deliverable.
All output goes to docs/c4/ in the project directory:
| File | Content |
|---|---|
| l1-c4-context.md | L1 — System boundary + all external actors |
| l2-c4-container.md | L2 — Internal runtime containers (processes, threads, services) |
| l3-c4-component.md | L3 — Component decomposition per container |
| behavioral-spec.md | Full behavioral specification |
| Findings-c4-reverse-engineer.md | Quality record documenting what was verified |
Diagrams use standard Mermaid flowchart notation with classDef styling to distinguish C4
element types. Include an ASCII fallback for the primary data flow diagram. Use these class
definitions at the top of every diagram:
flowchart TD
classDef person fill:#08427b,color:#fff,stroke:#073b6f
classDef system fill:#1168bd,color:#fff,stroke:#0f5ca8
classDef external fill:#999,color:#fff,stroke:#8a8a8a
classDef container fill:#438dd5,color:#fff,stroke:#3c7ebc
classDef component fill:#85bbf0,color:#000,stroke:#78a8d8
classDef store fill:#438dd5,color:#fff,stroke:#3c7ebc,stroke-dasharray: 5 5
Apply classes to nodes: NodeId["Label<br/><i>description</i>"]:::person. Use <br/> and
<i> tags for multi-line labels with technology descriptions.
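As a rough illustration of the node and edge syntax described above, the sketch below builds Mermaid lines from Python. The helper names `c4_node` and `c4_edge` are hypothetical and not part of the plugin; the class names are taken from the classDef lines, and the edge form assumes Mermaid's standard `-->|"label"|` syntax.

```python
# Class names copied from the classDef lines; everything else here is illustrative.
C4_CLASSES = ("person", "system", "external", "container", "component", "store")


def c4_node(node_id: str, label: str, description: str, kind: str) -> str:
    """Return one Mermaid node line with a two-line label and a C4 class."""
    if kind not in C4_CLASSES:
        raise ValueError(f"unknown C4 class: {kind}")
    return f'{node_id}["{label}<br/><i>{description}</i>"]:::{kind}'


def c4_edge(source: str, target: str, label: str) -> str:
    """Return one labelled Mermaid edge line."""
    return f'{source} -->|"{label}"| {target}'


print(c4_node("api", "REST API", "Python, Flask", "container"))
# api["REST API<br/><i>Python, Flask</i>"]:::container
```

Rejecting unknown class names early keeps a typo from silently producing an unstyled node.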
Plugin root with ~ prefix. Use this path in all Bash commands that invoke plugin scripts.
!bash ${CLAUDE_PLUGIN_ROOT}/scripts/print-plugin-home.sh ${CLAUDE_PLUGIN_ROOT}
Absolute path of the project root. The main agent MUST substitute this value for any ./.tmp-c4-reverse-engineer/... or docs/c4/... path it passes to a dispatched sub-agent, so the sub-agent has an unambiguous absolute Write target and cannot drift to /tmp/ or any other directory.
!pwd
Wipes and recreates ./.tmp-c4-reverse-engineer/ at the project root with a .gitignore of *. Phase 6 uses this dir for the validation pre-render JSON and any meta-issues.
!bash ${CLAUDE_PLUGIN_ROOT}/scripts/bootstrap-tmp.sh .tmp-c4-reverse-engineer
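The contents of bootstrap-tmp.sh are not shown here. As a minimal sketch of the behaviour described above (wipe, recreate, write a `.gitignore` of `*`), a Python approximation might look like the following; the function name `bootstrap_tmp` is an assumption made for illustration.

```python
import shutil
import tempfile
from pathlib import Path


def bootstrap_tmp(project_root: Path, name: str = ".tmp-c4-reverse-engineer") -> Path:
    """Wipe and recreate the scratch directory, ignoring all of its contents in git."""
    tmp = project_root / name
    if tmp.exists():
        shutil.rmtree(tmp)  # drop anything left over from a previous run
    tmp.mkdir(parents=True)
    (tmp / ".gitignore").write_text("*\n")
    return tmp


# Demo in a throwaway directory: a stale file does not survive a second bootstrap.
with tempfile.TemporaryDirectory() as root:
    scratch = bootstrap_tmp(Path(root))
    (scratch / "stale.json").write_text("{}")
    scratch = bootstrap_tmp(Path(root))
    print(sorted(p.name for p in scratch.iterdir()))  # ['.gitignore']
```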
!bash ${CLAUDE_PLUGIN_ROOT}/scripts/print-handoff-contract.sh
Before running the 7-phase workflow, verify the target:
- Run scripts/count_source_lines.py (also needed for Phase 1 tier sizing); if it reports zero source lines, stop and tell the user there's nothing to reverse-engineer.
- src/, the root, language-specific conventions). If everything is generated or vendored, tell the user the skill can't produce meaningful C4 artifacts on this shape of repo.

Fail fast with a clear message when prerequisites aren't met — entering Phase 1 on an empty codebase wastes sub-agent dispatches and produces empty diagrams.
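Where the plugin script is unavailable, the zero-lines check could be approximated along these lines. This is a sketch under stated assumptions: the suffix and excluded-directory sets are guesses, and the real count_source_lines.py may apply different rules. Both function names are hypothetical.

```python
from pathlib import Path

# Assumed conventions; adjust to the project's languages and layout.
SOURCE_SUFFIXES = {".py", ".ts", ".go", ".rs"}
EXCLUDED_DIRS = {"tests", "vendor", "node_modules", "generated", ".git"}


def estimate_source_lines(root: Path) -> int:
    """Sum line counts of source files outside test, vendored, and generated trees."""
    total = 0
    for path in root.rglob("*"):
        if not path.is_file() or path.suffix not in SOURCE_SUFFIXES:
            continue
        # Skip the file if any parent directory (relative to root) is excluded.
        if EXCLUDED_DIRS & set(path.relative_to(root).parts[:-1]):
            continue
        with path.open(encoding="utf-8", errors="replace") as handle:
            total += sum(1 for _ in handle)
    return total


def check_prerequisites(root: Path) -> int:
    """Fail fast when there is nothing to reverse-engineer."""
    lines = estimate_source_lines(root)
    if lines == 0:
        raise SystemExit("No source lines found: nothing to reverse-engineer.")
    return lines
```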
Before touching code, establish why the user wants this spec and which artifacts they need. Different purposes reweight what "important" means and what depth each section deserves. Running Phase 0 takes ~30 seconds and prevents you from writing the wrong spec confidently.
Ask the user two questions via AskUserQuestion in a single call:
Purpose — pick one:
| Purpose | Emphasis |
|---|---|
| Client/server split or protocol rewrite | Wire formats, state machines, persistence, external system contracts |
| Refactor preparation | Module boundaries, coupling, cross-module overrides, integration points |
| Developer onboarding | Vocabulary (enums/constants), entry points, mental model, happy-path data flow |
| Security/architecture audit | Trust boundaries, external calls, failure modes, degradation paths, authz checks |
| General architecture documentation | Balanced across all of the above |
Artifacts — pick a set:
| Set | Files produced |
|---|---|
| Full set (default) | c4-context + c4-container + c4-component + behavioral-spec + review |
| Diagrams only | c4-context + c4-container + c4-component + review |
| Spec only | behavioral-spec + review |
| One level | whichever of L1/L2/L3 the user names + review |
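The artifact sets in the table can be expressed as a small lookup. The sketch below assumes that "review" refers to Findings-c4-reverse-engineer.md from the output table; the set keys (`full`, `diagrams`, `spec`, `one-level`) and the function name are hypothetical labels chosen for illustration.

```python
from typing import List, Optional

LEVEL_FILES = {
    "L1": "l1-c4-context.md",
    "L2": "l2-c4-container.md",
    "L3": "l3-c4-component.md",
}
SPEC_FILE = "behavioral-spec.md"
REVIEW_FILE = "Findings-c4-reverse-engineer.md"  # assumed to be the "review" artifact


def files_for(artifact_set: str, level: Optional[str] = None) -> List[str]:
    """Return the files to produce for a Phase 0 artifact choice."""
    if artifact_set == "full":
        chosen = list(LEVEL_FILES.values()) + [SPEC_FILE]
    elif artifact_set == "diagrams":
        chosen = list(LEVEL_FILES.values())
    elif artifact_set == "spec":
        chosen = [SPEC_FILE]
    elif artifact_set == "one-level":
        if level not in LEVEL_FILES:
            raise ValueError("one-level needs level L1, L2, or L3")
        chosen = [LEVEL_FILES[level]]
    else:
        raise ValueError(f"unknown artifact set: {artifact_set}")
    return chosen + [REVIEW_FILE]  # every set includes the review


print(files_for("spec"))  # ['behavioral-spec.md', 'Findings-c4-reverse-engineer.md']
```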
Record both answers and carry them forward:
Skip Phase 0 only if the user has already stated both answers in the conversation. In that case, acknowledge what you heard and proceed.
Read the project directly to establish context and vocabulary before dispatching subagents.
1. Read README, CLAUDE.md, or equivalent project docs.
2. Measure project size to pick the tier. Use ${CLAUDE_PLUGIN_ROOT}/skills/c4-reverse-engineer/scripts/count_source_lines.py
(see Reference Files) — it counts non-test, non-generated, non-vendored lines in the project's primary
languages. Without the script, estimate by globbing source files and summing wc -l
across them, excluding tests/vendored/generated trees.
| Tier | Source lines | Strategy |
|---|---|---|
| Small | < 3,000 | Direct reads for everything. Skip Phase 2 — the Phase 1 survey is the exploration. |
| Medium | 3,000 – 15,000 | Hybrid: direct reads for entry points and vocabulary, 2 subagents in Phase 2 for the remainder. |
| Large | > 15,000 | Subagent-heavy: 3 subagents in Phase 2, direct reads only for sampled verification in Phase 3. |
File count is a poor proxy — a repo with 200 JSON fixtures is not large, and a repo with
one 3,000-line controller is not small. Count lines, not files. Polyglot repos: sum lines
across all primary languages (e.g., Python backend + TypeScript frontend). See
${CLAUDE_PLUGIN_ROOT}/skills/c4-reverse-engineer/references/subagent-prompts.md for polyglot partition guidance.
3. Identify and read entry points — enumerate every CLI arg, HTTP endpoint, event handler.
4. Read vocabulary files: enums, constants, settings/config dataclasses, shared types.
5. Identify integration points: find file(s) with the most imports from other project modules. Read them in full. Common forms: orchestrator/controller, request handler, CLI entry point, application factory. There may be more than one.
For non-obvious behaviors discovered in steps 3–5, check git log --follow <file> for the
commit that introduced them. Commit messages often explain why a guard, timeout, or
fallback exists — context the code alone doesn't reveal.
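For a Python project, the "most imports from other project modules" heuristic for integration points could be sketched with the standard `ast` module. This is an illustration only: the function name and the idea of passing top-level package names are assumptions, and other languages would need their own parser.

```python
import ast
from typing import Dict, Set


def project_import_counts(sources: Dict[str, str], project_packages: Set[str]) -> Dict[str, int]:
    """Count distinct intra-project modules imported by each file.

    sources maps a file name to its source text; project_packages holds the
    project's top-level package names (an input the caller must supply).
    """
    counts = {}
    for filename, code in sources.items():
        imported = set()
        for node in ast.walk(ast.parse(code)):
            if isinstance(node, ast.Import):
                names = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom):
                if node.level > 0:
                    # Relative imports always point inside the project.
                    imported.add("." * node.level + (node.module or ""))
                    continue
                names = [node.module]
            else:
                continue
            for name in names:
                if name.split(".")[0] in project_packages:
                    imported.add(name)
        counts[filename] = len(imported)
    return counts


sources = {
    "app/main.py": "import os\nfrom app import db\nfrom app.api import routes\nimport app.config\n",
    "app/db.py": "import sqlite3\n",
}
counts = project_import_counts(sources, {"app"})
print(max(counts, key=counts.get))  # app/main.py
```

The file with the highest count is only a candidate; it still needs to be read in full.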
Small tier: Skip Phase 2 entirely. The Phase 1 survey covered the codebase directly; go to Phase 3 and verify the claims you already have.
Medium tier: Dispatch 2 subagents in parallel. Merge A and B into a single "Behavior + Interfaces" agent; keep C.
Large tier: Dispatch 3 subagents in parallel.
Before composing subagent prompts, read ${CLAUDE_PLUGIN_ROOT}/skills/c4-reverse-engineer/references/subagent-prompts.md
in full. Include all 7 instructions from that file in every subagent prompt.
Partition by concern, not by file:
| Agent | Scope |
|---|---|
| A — Internal behavior | Modules, state management, events, data flows, business logic |
| B — External-facing interfaces | UI, API, CLI, public API surface, user-visible interactions |
| C — Dependencies & infrastructure | External system calls, file I/O, subprocesses, config loading, platform adapters |
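The tier thresholds from Phase 1 and the dispatch rules above can be combined into one lookup. The sketch below uses the boundaries from the tier table (under 3,000, 3,000 to 15,000, over 15,000); the function names and the agent label strings are illustrative, not part of the plugin.

```python
from typing import List


def pick_tier(source_lines: int) -> str:
    """Map a source-line count to the Small/Medium/Large tier."""
    if source_lines <= 0:
        raise ValueError("no source lines: nothing to reverse-engineer")
    if source_lines < 3_000:
        return "small"
    if source_lines <= 15_000:
        return "medium"
    return "large"


def subagents_for(tier: str) -> List[str]:
    """Return the Phase 2 subagents to dispatch for a tier."""
    if tier == "small":
        return []  # Phase 2 is skipped entirely
    if tier == "medium":
        # A and B are merged; C is kept.
        return ["A+B: Behavior + Interfaces", "C: Dependencies & infrastructure"]
    if tier == "large":
        return [
            "A: Internal behavior",
            "B: External-facing interfaces",
            "C: Dependencies & infrastructure",
        ]
    raise ValueError(f"unknown tier: {tier}")


print(pick_tier(3_000), len(subagents_for(pick_tier(3_000))))  # medium 2
```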
How the partition maps to architectures:
For polyglot repos, partition by language/stack instead of by concern — one subagent per major stack (e.g., Python backend, TypeScript frontend, Rust extension). Each subagent still receives the 7 prompt instructions, specialized for its stack via the framework calibration table.
Each subagent prompt must include all 7 instructions from ${CLAUDE_PLUGIN_ROOT}/skills/c4-reverse-engineer/references/subagent-prompts.md.
These instructions prevent the 9 known failure modes — omitting them produces specs that look
right but have hidden errors.
For each behavioral claim from the subagent reports, verify it against the code using
structured templates. Read ${CLAUDE_PLUGIN_ROOT}/skills/c4-reverse-engineer/references/verification-templates.md for the
full template format, trigger-word heuristics, and priority tiers.
Key concept: Each template produces a claim AND its evidence simultaneously. Don't extract claims from subagent prose then verify separately — that rephrasing step introduces errors. Instead, read the code and fill in the template directly.
Priority: Verify every high-risk claim (timing, configurability, scope, cross-module overrides, entry point inventory). Sample medium-risk (interaction bindings, platform branches, event suppression, degradation paths). Skip low-risk (enum values, file paths, static labels).
For mechanical work in this phase, use the helper scripts:
- ${CLAUDE_PLUGIN_ROOT}/skills/c4-reverse-engineer/scripts/find_platform_conditionals.py — enumerates every
  sys.platform, platform.system(), os.name, and project-specific platform constant.
  Feeds PLATFORM claims.
- ${CLAUDE_PLUGIN_ROOT}/skills/c4-reverse-engineer/scripts/find_external_calls.py — lists every subprocess.,
  urllib, requests, socket, open( call. Feeds DEGRADATION claims and the L1
  external systems list.

These scripts give you a deterministic inventory to diff subagent reports against — any item in the script output that isn't in a subagent report is a gap to close.
Gap-closure gate (before Phase 4): every script-detected item that is still not
accounted for in the working set of verified claims MUST be materialised as a
MISSING finding in the Phase 6 pre-render JSON. No silent drops — the script
output is authoritative for its category, so an omission means the diagram/spec
would underreport what the code actually does. Record the script path and line
number as the finding's primary location and find_external_calls.py /
find_platform_conditionals.py as the contributing source in evidence.
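The gap-closure gate amounts to a set difference between the script inventory and the verified claims. The sketch below shows that diff; the dictionary keys (`kind`, `location`, `evidence`, `match`) are hypothetical placeholders, since the real pre-render JSON shape is defined in references/review-format.md and is not reproduced here.

```python
from typing import Dict, List, Set, Tuple


def missing_findings(
    script_items: List[Dict],
    verified_locations: Set[Tuple[str, int]],
    source_script: str,
) -> List[Dict]:
    """Turn every script-detected item without a verified claim into a MISSING finding."""
    findings = []
    for item in script_items:
        if (item["path"], item["line"]) in verified_locations:
            continue  # already covered by a verified claim
        findings.append(
            {
                "kind": "MISSING",  # hypothetical field names throughout
                "location": {"path": item["path"], "line": item["line"]},
                "evidence": {"source": source_script, "match": item["match"]},
            }
        )
    return findings


inventory = [
    {"path": "app/net.py", "line": 12, "match": "requests.get("},
    {"path": "app/io.py", "line": 40, "match": "open("},
]
gaps = missing_findings(inventory, {("app/net.py", 12)}, "find_external_calls.py")
print([g["location"]["path"] for g in gaps])  # ['app/io.py']
```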
Write diagrams from verified claims only. Respect the artifact set from Phase 0 — skip any level the user didn't request:
Diagram styling conventions:
- ["Label<br/><i>tech/description</i>"] for multi-line node labels
- |label| syntax: A -->|"HTTPS, JSON"| B
- subgraph blocks for containers in L2/L3 diagrams
- api, db, cli); put readable names in the label

Skip this phase if the Phase 0 artifact set doesn't include the behavioral spec.
Write the spec from verified claims. Weight each section's depth by the Phase 0 purpose (onboarding → vocabulary-heavy; audit → degradation-heavy; client rewrite → protocol-heavy). The structure below is the floor, not the ceiling. Every generated spec includes these 10 sections. Add sections when the system has substantial behavior outside them (e.g., agent tracking, ghost sessions, state persistence as standalone sections rather than squeezed into §3 or §8):
The spec documents what the code DOES, including when it does something surprising (like ignoring a configurable setting). It does not judge whether the code SHOULD behave differently.
Describe, don't evaluate. Behavioral descriptions are mechanical: what the code runs, what it returns, what it calls next. Never characterize code as "correct", "reliable", "optimal", "most", "best", or "successful". Those are judgments the skill cannot make — they rot fast and mislead worst.
| Bad (evaluative) | Good (mechanical) |
|---|---|
| "Uses code <cwd> — most reliable on X" | "Runs code <cwd>. VS Code activates the window whose root folder is <cwd>, or opens a new window if none match." |
| "Foregrounds the correct window" | "Calls SetForegroundWindow(hwnd) on the window whose title contains the CWD basename." |
| "Safely handles missing files" | "Catches FileNotFoundError and returns an empty dict." |
| "Always validates input" | "Every caller at [file:line, file:line] routes through _validate() before write." |
If the source explicitly uses an evaluative word (e.g., a comment saying "// fast path"), quote it: "the code labels this 'fast path' (file:line)". Don't launder it into your own voice.
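A draft spec can be screened mechanically for the evaluative words listed above. The following is a minimal sketch, not part of the plugin: it matches only the six exact words, and it treats anything inside double quotes as a quotation from the source, which is a simplifying assumption.

```python
import re
from typing import List, Tuple

EVALUATIVE = ("correct", "reliable", "optimal", "most", "best", "successful")
_QUOTED = re.compile(r'"[^"]*"')  # double-quoted spans count as quoted source text
_WORD = re.compile(r"\b(" + "|".join(EVALUATIVE) + r")\b", re.IGNORECASE)


def evaluative_hits(spec_text: str) -> List[Tuple[int, str]]:
    """Return (line number, word) for each unquoted evaluative word."""
    hits = []
    for number, line in enumerate(spec_text.splitlines(), start=1):
        unquoted = _QUOTED.sub("", line)
        for match in _WORD.finditer(unquoted):
            hits.append((number, match.group(1).lower()))
    return hits


draft = (
    "Foregrounds the correct window.\n"
    "Catches FileNotFoundError and returns an empty dict.\n"
    'The code labels this "best effort" (file:line).\n'
)
print(evaluative_hits(draft))  # [(1, 'correct')]
```

A hit is a prompt to rewrite the sentence mechanically, not proof that it is wrong.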
Run these consistency checks:
Fix-before-delivery: Fix Critical and Important findings in the artifacts (return to Phases 4–5), then re-run checks. Minor findings are documented but not fixed.
Produce the validation output via render-c4-reverse-engineer.py. Collect findings into ./.tmp-c4-reverse-engineer/pre-render.json using the pre-render shape documented in ${CLAUDE_PLUGIN_ROOT}/skills/c4-reverse-engineer/references/review-format.md, then invoke:
python ${CLAUDE_PLUGIN_ROOT}/skills/c4-reverse-engineer/scripts/render-c4-reverse-engineer.py \
--input ./.tmp-c4-reverse-engineer/pre-render.json \
--issues ./.tmp-c4-reverse-engineer/issues.json \
--out-dir <project root> \
--project-name <project name>
Skill-specific renderer behavior (on top of the shared handoff contract):
- Assigns stable IDs per severity bucket (C0.., I0.., S0.. — note: suggestion replaces the old Minor tier).
- Computes a content_hash per finding.
- Writes Findings-c4-reverse-engineer.json and Findings-c4-reverse-engineer.md at --out-dir.

The validation documents the post-fix state as a quality record. Meta-issues from the run (sub-agent failures, missing artifacts, verification scripts that couldn't run) go in the shared issues[] via --issues. Supplementary metadata (the Summary table, Confirmed claims list) goes under supplementary in the pre-render JSON — it flows into the markdown and travels in the handoff JSON but is not part of the findings[] contract.
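To make the per-bucket IDs and content_hash concrete, here is one way such a step could work. It is a guess at the general technique, not the renderer's code: the zero-based numbering (C0, C1, ...), the SHA-256 over sorted-key JSON, and the `severity` field name are all assumptions.

```python
import hashlib
import json
from typing import Dict, List

PREFIX = {"critical": "C", "important": "I", "suggestion": "S"}


def assign_ids(findings: List[Dict]) -> List[Dict]:
    """Attach a per-severity sequential ID and a content hash to each finding."""
    counters = {severity: 0 for severity in PREFIX}
    result = []
    for finding in findings:
        severity = finding["severity"]
        # Hash the finding before the ID is added, so the hash depends on content only.
        payload = json.dumps(finding, sort_keys=True).encode("utf-8")
        result.append(
            {
                **finding,
                "id": f"{PREFIX[severity]}{counters[severity]}",
                "content_hash": hashlib.sha256(payload).hexdigest(),
            }
        )
        counters[severity] += 1
    return result


sample = [
    {"severity": "critical", "title": "L2 omits worker thread"},
    {"severity": "suggestion", "title": "Rename node id"},
    {"severity": "critical", "title": "Spec lists wrong timeout"},
]
print([f["id"] for f in assign_ids(sample)])  # ['C0', 'S0', 'C1']
```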
The Phase 6 "component in L3 isn't referenced anywhere" check is a doc-consistency check, not dead-code review: the concern is that the spec listed a component and then never used it. If the code itself has dead functions, that's for a separate review — this skill stays mute on it.
Read these during the phases that need them:
| File | When to read | Content |
|---|---|---|
| ${CLAUDE_PLUGIN_ROOT}/skills/c4-reverse-engineer/references/subagent-prompts.md | Phase 2, before dispatching subagents | 7 prompt instructions + framework calibration table |
| ${CLAUDE_PLUGIN_ROOT}/skills/c4-reverse-engineer/references/verification-templates.md | Phase 3, before verification | 8 templates, trigger-word heuristics, priority tiers |
| ${CLAUDE_PLUGIN_ROOT}/skills/c4-reverse-engineer/references/review-format.md | Phase 6, before writing pre-render JSON | Severity bucket mapping, pre-render JSON shape consumed by render-c4-reverse-engineer.py |
| ${CLAUDE_PLUGIN_ROOT}/skills/c4-reverse-engineer/references/failure-modes.md | When diagnosing a verification failure | 9 failure modes with root causes and detection methods |
| ${CLAUDE_PLUGIN_ROOT}/skills/c4-reverse-engineer/references/example-output.md | Phase 5, for phrasing calibration | Toy project with a mechanical-phrasing §9 Degradation excerpt and a side-by-side bad/good comparison |
Deterministic helpers for mechanical work. Run them from the project root; each prints results to stdout.
| Script | When to run | What it does |
|---|---|---|
| ${CLAUDE_PLUGIN_ROOT}/skills/c4-reverse-engineer/scripts/count_source_lines.py | Phase 1, step 2 | Counts non-test, non-generated source lines per language and recommends a tier. |
| ${CLAUDE_PLUGIN_ROOT}/skills/c4-reverse-engineer/scripts/find_platform_conditionals.py | Phase 3, PLATFORM claims | Enumerates every sys.platform, platform.system(), os.name, IS_* constant, Go build tag, Rust cfg(target_os), and C/C++ platform ifdef. |
| ${CLAUDE_PLUGIN_ROOT}/skills/c4-reverse-engineer/scripts/find_external_calls.py | Phase 3, DEGRADATION claims and L1 external systems | Enumerates subprocess, network, and filesystem calls across Python/Node/Go/Rust/Java/C#. Use --group for a category-grouped report. Comment detection is line-based — multi-line comment blocks and docstrings may produce false positives. |
| ${CLAUDE_PLUGIN_ROOT}/skills/c4-reverse-engineer/scripts/render-c4-reverse-engineer.py | Phase 6, after validating findings | Assigns stable IDs (C0.., I0.., S0..) per severity bucket, computes content hashes, builds the shared-schema envelope with source: "c4-reverse-engineer", validates against findings.schema.json, and writes Findings-c4-reverse-engineer.{json,md}. Exits non-zero if validation fails — no files are written on failure. |
Invoke directly: ${CLAUDE_PLUGIN_ROOT}/skills/c4-reverse-engineer/scripts/<name>.py [project-root]. The scripts have
shebang lines and are executable. The project-root argument defaults to the current working directory. The three
analysis scripts (count_source_lines.py and the two finder scripts) exclude tests and vendored code by default;
pass --include-tests on the finder scripts if you need the full picture.
npx claudepluginhub jewzaam/jewzaam-reviews --plugin jewzaam-reviews