Skill

c4-reverse-engineer

Reverse-engineer C4 architecture diagrams (Context, Container, Component) and a behavioral specification from a codebase. Produces documentation of WHAT the system does — not how it's coded.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/jewzaam-reviews:c4-reverse-engineer

User invocable

Model invocation disabled

Inline context

Default effort

Uses dynamic context injection — preprocesses shell commands at runtime

Tool Access

This skill is limited to the following tools:

Bash(bash ${CLAUDE_PLUGIN_ROOT}/**)
Bash(python ${CLAUDE_PLUGIN_ROOT}/**)
Bash(python3 ${CLAUDE_PLUGIN_ROOT}/**)
Bash(pwd)
Read(${CLAUDE_SKILL_DIR}/references/*)

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Supporting Files

references/example-output.md
references/failure-modes.md
references/review-format.md
references/subagent-prompts.md
references/verification-templates.md
scripts/count_source_lines.py
scripts/find_external_calls.py
scripts/find_platform_conditionals.py
scripts/render-c4-reverse-engineer.py
tests/fixtures/pre-render.sample.json
tests/test_count_source_lines.py
tests/test_find_external_calls.py
tests/test_find_platform_conditionals.py
tests/test_render_c4_validation.py

SKILL.md

375 lines · ~5.5k tokens (exceeds 5k compaction limit)

Stats

Language: Python

Stars: 0

Maintenance: Excellent

Last Commit: Jun 10, 2026

C4 Reverse-Engineering

Produce C4 architecture diagrams and a behavioral specification from a codebase. No code changes.

Script paths use ~: Use the Plugin Home path from the Pre-Fetch section (starts with ~) when constructing Bash commands for plugin scripts. Do not use absolute /home/... paths. Do not use && or || chaining — each script call must be a standalone Bash invocation.

Core principle: Subagents tell you WHAT to look at. Direct reads tell you what's TRUE.

Subagent exploration produces plausible-looking specs with subtle errors — wrong refresh rates, wrong event binding targets, configurable-vs-hardcoded confusion. The skill uses subagents for the survey (shape of the system) then verifies every specific claim with direct reads before writing it into a deliverable.

Output

All output goes to docs/c4/ in the project directory:

File	Content
`l1-c4-context.md`	L1 — System boundary + all external actors
`l2-c4-container.md`	L2 — Internal runtime containers (processes, threads, services)
`l3-c4-component.md`	L3 — Component decomposition per container
`behavioral-spec.md`	Full behavioral specification
`Findings-c4-reverse-engineer.md`	Quality record documenting what was verified

Diagrams use standard Mermaid flowchart notation with classDef styling to distinguish C4 element types. Include an ASCII fallback for the primary data flow diagram. Use these class definitions at the top of every diagram:

flowchart TD
    classDef person fill:#08427b,color:#fff,stroke:#073b6f
    classDef system fill:#1168bd,color:#fff,stroke:#0f5ca8
    classDef external fill:#999,color:#fff,stroke:#8a8a8a
    classDef container fill:#438dd5,color:#fff,stroke:#3c7ebc
    classDef component fill:#85bbf0,color:#000,stroke:#78a8d8
    classDef store fill:#438dd5,color:#fff,stroke:#3c7ebc,stroke-dasharray: 5 5

Apply classes to nodes: NodeId["Label description"]:::person. Use HTML tags such as <br/> for multi-line labels with technology descriptions.

Pre-Fetch

Plugin Home (auto-detected)

Plugin root with ~ prefix. Use this path in all Bash commands that invoke plugin scripts.

!bash ${CLAUDE_PLUGIN_ROOT}/scripts/print-plugin-home.sh ${CLAUDE_PLUGIN_ROOT}

Project Root (auto-detected)

Absolute path of the project root. The main agent MUST substitute this value for any ./.tmp-c4-reverse-engineer/... or docs/c4/... path it passes to a dispatched sub-agent, so the sub-agent has an unambiguous absolute Write target and cannot drift to /tmp/ or any other directory.

!pwd
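The substitution rule above can be sketched as follows. This is a minimal illustration, assuming POSIX-style paths; the helper name `absolutize` is hypothetical and not part of the plugin.

```python
from pathlib import PurePosixPath


def absolutize(project_root: str, relative: str) -> str:
    """Join a project-relative path onto the absolute project root.

    Hypothetical helper: it only illustrates the substitution rule.
    """
    root = PurePosixPath(project_root)
    if not root.is_absolute():
        raise ValueError(f"project root must be absolute: {project_root!r}")
    rel = PurePosixPath(relative)
    if rel.is_absolute():
        raise ValueError(f"expected a project-relative path: {relative!r}")
    # pathlib collapses a leading "./", so "./x" and "x" resolve identically.
    return str(root / rel)
```

Passing the returned string to a sub-agent gives it a Write target that does not depend on its own working directory.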

Workspace Bootstrap (auto-executed)

Wipes and recreates ./.tmp-c4-reverse-engineer/ at the project root with a .gitignore of *. Phase 6 uses this dir for the validation pre-render JSON and any meta-issues.

!bash ${CLAUDE_PLUGIN_ROOT}/scripts/bootstrap-tmp.sh .tmp-c4-reverse-engineer

Shared Handoff Contract (auto-injected)

!bash ${CLAUDE_PLUGIN_ROOT}/scripts/print-handoff-contract.sh

Prerequisites

Before running the 7-phase workflow, verify the target:

Non-empty codebase. Use scripts/count_source_lines.py (also needed for Phase 1 tier sizing); if it reports zero source lines, stop and tell the user there's nothing to reverse-engineer.
Readable source tree. Glob typical source directories (src/, the root, language-specific conventions). If everything is generated or vendored, tell the user the skill can't produce meaningful C4 artifacts on this shape of repo.

Fail fast with a clear message when prerequisites aren't met — entering Phase 1 on an empty codebase wastes sub-agent dispatches and produces empty diagrams.

Workflow — 7 Phases

Phase 0: Intake

Before touching code, establish why the user wants this spec and which artifacts they need. Different purposes reweight what "important" means and what depth each section deserves. Running Phase 0 takes ~30 seconds and prevents you from writing the wrong spec confidently.

Ask the user two questions via AskUserQuestion in a single call:

Purpose — pick one:

Purpose	Emphasis
Client/server split or protocol rewrite	Wire formats, state machines, persistence, external system contracts
Refactor preparation	Module boundaries, coupling, cross-module overrides, integration points
Developer onboarding	Vocabulary (enums/constants), entry points, mental model, happy-path data flow
Security/architecture audit	Trust boundaries, external calls, failure modes, degradation paths, authz checks
General architecture documentation	Balanced across all of the above

Artifacts — pick a set:

Set	Files produced
Full set (default)	c4-context + c4-container + c4-component + behavioral-spec + review
Diagrams only	c4-context + c4-container + c4-component + review
Spec only	behavioral-spec + review
One level	whichever of L1/L2/L3 the user names + review

Record both answers and carry them forward:

The purpose biases Phase 5 emphasis: the spec's content is the same, but the depth of each section matches what the purpose needs. An onboarding spec spends more words on vocabulary and entry points; an audit spec spends more on degradation and trust boundaries. All 10 floor sections remain present.
The artifact set determines which Phase 4/5 files you produce. The review artifact is always produced — it's the quality record.

Skip Phase 0 only if the user has already stated both answers in the conversation. In that case, acknowledge what you heard and proceed.
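One way to carry the artifact answer forward is a small lookup from the chosen set to the files to write. The sketch below is an assumption-laden illustration: the set keys ("full", "diagrams", "spec", "one-level") and the function name are hypothetical, the file names come from the Output table, and it reads "review" as the Findings-c4-reverse-engineer.md quality record.

```python
from typing import List, Optional

# File names from the Output table; the dictionary keys are hypothetical labels.
DIAGRAMS = {
    "L1": "l1-c4-context.md",
    "L2": "l2-c4-container.md",
    "L3": "l3-c4-component.md",
}
SPEC = "behavioral-spec.md"
REVIEW = "Findings-c4-reverse-engineer.md"  # assumed to be the "review" artifact


def files_for(artifact_set: str, level: Optional[str] = None) -> List[str]:
    """Return the files to produce for a Phase 0 artifact set."""
    if artifact_set == "full":
        chosen = [*DIAGRAMS.values(), SPEC]
    elif artifact_set == "diagrams":
        chosen = list(DIAGRAMS.values())
    elif artifact_set == "spec":
        chosen = [SPEC]
    elif artifact_set == "one-level":
        if level not in DIAGRAMS:
            raise ValueError("one-level needs level L1, L2, or L3")
        chosen = [DIAGRAMS[level]]
    else:
        raise ValueError(f"unknown artifact set: {artifact_set!r}")
    # The review artifact is appended for every set.
    return [*chosen, REVIEW]
```

For example, `files_for("one-level", "L2")` yields the container diagram plus the review file.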

Phase 1: Codebase Survey (direct reads)

Read the project directly to establish context and vocabulary before dispatching subagents.

1. Read README, CLAUDE.md, or equivalent project docs.

2. Measure project size to pick the tier. Use ${CLAUDE_PLUGIN_ROOT}/skills/c4-reverse-engineer/scripts/count_source_lines.py (see Scripts), which counts non-test, non-generated, non-vendored lines in the project's primary languages. Without the script, estimate by globbing source files and summing wc -l across them, excluding tests/vendored/generated trees.

Tier	Source lines	Strategy
Small	< 3,000	Direct reads for everything. Skip Phase 2; the Phase 1 survey is the exploration.
Medium	3,000 – 15,000	Hybrid: direct reads for entry points and vocabulary, 2 subagents in Phase 2 for the remainder.
Large	> 15,000	Subagent-heavy: 3 subagents in Phase 2, direct reads only for sampled verification in Phase 3.

File count is a poor proxy: a repo with 200 JSON fixtures is not large, and a repo with one 3,000-line controller is not small. Count lines, not files. Polyglot repos: sum lines across all primary languages (e.g., Python backend + TypeScript frontend). See ${CLAUDE_PLUGIN_ROOT}/skills/c4-reverse-engineer/references/subagent-prompts.md for polyglot partition guidance.

3. Identify and read entry points: enumerate every CLI arg, HTTP endpoint, and event handler.
4. Read vocabulary files: enums, constants, settings/config dataclasses, shared types.
5. Identify integration points: find the file(s) with the most imports from other project modules. Read them in full. Common forms: orchestrator/controller, request handler, CLI entry point, application factory. There may be more than one.
6. For non-obvious behaviors discovered in steps 3–5, check git log --follow <file> for the commit that introduced them. Commit messages often explain why a guard, timeout, or fallback exists, which is context the code alone doesn't reveal.
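The tier thresholds from the sizing table can be written down as a small function. This is a sketch only: `pick_tier` is a hypothetical name, and count_source_lines.py may compute its recommendation differently.

```python
def pick_tier(source_lines: int) -> str:
    """Map a source-line count to a tier using the table's thresholds."""
    if source_lines <= 0:
        # Mirrors the Prerequisites rule: nothing to reverse-engineer.
        raise ValueError("no source lines: nothing to reverse-engineer")
    if source_lines < 3_000:
        return "small"
    if source_lines <= 15_000:
        return "medium"
    return "large"
```

Note the boundaries: exactly 3,000 lines is Medium, and exactly 15,000 is still Medium.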

Phase 2: Parallel Exploration (tier-scaled)

Small tier: Skip Phase 2 entirely. The Phase 1 survey covered the codebase directly; go to Phase 3 and verify the claims you already have.

Medium tier: Dispatch 2 subagents in parallel. Merge agents A and B from the partition table below into a single "Behavior + Interfaces" agent; keep C.

Large tier: Dispatch 3 subagents in parallel.

Before composing subagent prompts, read ${CLAUDE_PLUGIN_ROOT}/skills/c4-reverse-engineer/references/subagent-prompts.md in full. Include all 7 instructions from that file in every subagent prompt.

Partition by concern, not by file:

Agent	Scope
A — Internal behavior	Modules, state management, events, data flows, business logic
B — External-facing interfaces	UI, API, CLI, public API surface, user-visible interactions
C — Dependencies & infrastructure	External system calls, file I/O, subprocesses, config loading, platform adapters

How the partition maps to architectures:

Desktop app: A = controller/model, B = UI/interaction layer, C = scripts/OS integration
REST API: A = domain logic/services, B = HTTP endpoints/middleware, C = DB/cache/queues
CLI tool: A = command logic, B = arg parsing/output formatting, C = file system/network
Library: A = core algorithms, B = public API surface, C = optional dependencies

For polyglot repos, partition by language/stack instead of by concern — one subagent per major stack (e.g., Python backend, TypeScript frontend, Rust extension). Each subagent still receives the 7 prompt instructions, specialized for its stack via the framework calibration table.

Each subagent prompt must include all 7 instructions from ${CLAUDE_PLUGIN_ROOT}/skills/c4-reverse-engineer/references/subagent-prompts.md. These instructions prevent the 9 known failure modes — omitting them produces specs that look right but have hidden errors.
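The tier-to-dispatch rule can be sketched as a lookup. The function name is hypothetical, the agent labels come from the partition table, and the polyglot case (one subagent per stack) is deliberately left out.

```python
from typing import List

# Scopes from the partition table, keyed by agent letter.
AGENTS = {
    "A": "Internal behavior",
    "B": "External-facing interfaces",
    "C": "Dependencies & infrastructure",
}


def dispatch_plan(tier: str) -> List[str]:
    """Return the subagents to dispatch in parallel for a given tier."""
    if tier == "small":
        return []  # Phase 2 is skipped entirely
    if tier == "medium":
        # A and B are merged into one agent; C stays separate.
        return ["Behavior + Interfaces", AGENTS["C"]]
    if tier == "large":
        return [AGENTS[key] for key in ("A", "B", "C")]
    raise ValueError(f"unknown tier: {tier!r}")
```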

Phase 3: Structured Verification (targeted reads + templates)

For each behavioral claim from the subagent reports, verify it against the code using structured templates. Read ${CLAUDE_PLUGIN_ROOT}/skills/c4-reverse-engineer/references/verification-templates.md for the full template format, trigger-word heuristics, and priority tiers.

Key concept: Each template produces a claim AND its evidence simultaneously. Don't extract claims from subagent prose then verify separately — that rephrasing step introduces errors. Instead, read the code and fill in the template directly.

Priority: Verify every high-risk claim (timing, configurability, scope, cross-module overrides, entry point inventory). Sample medium-risk (interaction bindings, platform branches, event suppression, degradation paths). Skip low-risk (enum values, file paths, static labels).
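The priority rule amounts to a category lookup. In the sketch below the category strings are paraphrased from the sentence above, and treating an unknown category as "verify" is an assumption chosen to be conservative; the authoritative tiers live in verification-templates.md.

```python
# Action per claim category, following the priority paragraph.
POLICY = {
    # high-risk: verify every claim
    "timing": "verify",
    "configurability": "verify",
    "scope": "verify",
    "cross-module override": "verify",
    "entry point inventory": "verify",
    # medium-risk: verify a sample
    "interaction binding": "sample",
    "platform branch": "sample",
    "event suppression": "sample",
    "degradation path": "sample",
    # low-risk: skip
    "enum value": "skip",
    "file path": "skip",
    "static label": "skip",
}


def verification_action(category: str) -> str:
    """Return 'verify', 'sample', or 'skip' for a claim category."""
    # Assumption: anything unrecognized is treated as high-risk.
    return POLICY.get(category, "verify")
```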

For mechanical work in this phase, use the helper scripts:

${CLAUDE_PLUGIN_ROOT}/skills/c4-reverse-engineer/scripts/find_platform_conditionals.py — enumerates every sys.platform, platform.system(), os.name, and project-specific platform constant. Feeds PLATFORM claims.
${CLAUDE_PLUGIN_ROOT}/skills/c4-reverse-engineer/scripts/find_external_calls.py — lists every subprocess., urllib, requests, socket, open( call. Feeds DEGRADATION claims and the L1 external systems list.

These scripts give you a deterministic inventory to diff subagent reports against — any item in the script output that isn't in a subagent report is a gap to close.

Gap-closure gate (before Phase 4): every script-detected item that is still not accounted for in the working set of verified claims MUST be materialised as a MISSING finding in the Phase 6 pre-render JSON. No silent drops — the script output is authoritative for its category, so an omission means the diagram/spec would underreport what the code actually does. Record the script path and line number as the finding's primary location and find_external_calls.py / find_platform_conditionals.py as the contributing source in evidence.
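The gate is essentially a set difference between what the scripts detected and what the verified claims cover. The sketch below assumes each detected item carries the file path and line that the script reported; the field names ("kind", "location", "evidence") are hypothetical stand-ins, since the real pre-render shape is defined in review-format.md.

```python
from typing import Dict, Iterable, List, NamedTuple, Set, Tuple


class Detected(NamedTuple):
    path: str    # file reported by the script
    line: int    # line reported by the script
    source: str  # script that reported the item


def missing_findings(
    detected: Iterable[Detected],
    verified_locations: Set[Tuple[str, int]],
) -> List[Dict]:
    """Emit one MISSING finding per detected item with no verified claim."""
    findings = []
    for item in sorted(set(detected)):
        if (item.path, item.line) in verified_locations:
            continue  # accounted for by a verified claim
        findings.append({
            "kind": "MISSING",  # hypothetical field names throughout
            "location": {"path": item.path, "line": item.line},
            "evidence": {"contributing_source": item.source},
        })
    return findings
```

An empty result means every script-detected item is accounted for and Phase 4 can start.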

Phase 4: C4 Diagram Generation

Write diagrams from verified claims only. Respect the artifact set from Phase 0 — skip any level the user didn't request:

L1 Context: The system as a single box. Every external actor/system it communicates with. Include a trust boundary note and an external systems summary table
L2 Container: Internal runtime boundaries — processes, threads, scripts, persistent stores. Include the threading/concurrency model
L3 Component: Behavioral decomposition per container. Include event-to-state mapping tables, data flow summaries

Diagram styling conventions:

Use ["Label<br/>tech/description"] for multi-line node labels
Edge labels go in |label| syntax: A -->|"HTTPS, JSON"| B
Group related nodes with subgraph blocks for containers in L2/L3 diagrams
Keep node IDs short and lowercase (api, db, cli); put readable names in the label
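The conventions above can be exercised with a few string helpers that emit Mermaid text. This is an illustrative sketch: the function names are hypothetical, the classDef lines are copied from the Output section, and using `<br/>` as the line break inside a label is an assumption based on common Mermaid usage.

```python
from typing import List

# Class definitions copied from the Output section.
CLASS_DEFS = [
    "classDef person fill:#08427b,color:#fff,stroke:#073b6f",
    "classDef system fill:#1168bd,color:#fff,stroke:#0f5ca8",
    "classDef external fill:#999,color:#fff,stroke:#8a8a8a",
    "classDef container fill:#438dd5,color:#fff,stroke:#3c7ebc",
    "classDef component fill:#85bbf0,color:#000,stroke:#78a8d8",
    "classDef store fill:#438dd5,color:#fff,stroke:#3c7ebc,stroke-dasharray: 5 5",
]


def node(node_id: str, label: str, tech: str, css_class: str) -> str:
    """Short lowercase ID, readable label, technology on a second line."""
    return f'{node_id}["{label}<br/>{tech}"]:::{css_class}'


def edge(src: str, dst: str, label: str) -> str:
    """Edge with a quoted label in |label| syntax."""
    return f'{src} -->|"{label}"| {dst}'


def render(nodes: List[str], edges: List[str]) -> str:
    """Assemble a flowchart with the class definitions at the top."""
    body = CLASS_DEFS + nodes + edges
    return "\n".join(["flowchart TD"] + [f"    {line}" for line in body])
```

Calling `edge("cli", "api", "HTTPS, JSON")` returns `cli -->|"HTTPS, JSON"| api`, matching the edge-label convention. Subgraph grouping is omitted here for brevity.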

Phase 5: Behavioral Spec Generation

Skip this phase if the Phase 0 artifact set doesn't include the behavioral spec.

Write the spec from verified claims. Weight each section's depth by the Phase 0 purpose (onboarding → vocabulary-heavy; audit → degradation-heavy; client rewrite → protocol-heavy). The structure below is the floor, not the ceiling. Every generated spec includes these 10 sections. Add sections when the system has substantial behavior outside them (e.g., agent tracking, ghost sessions, state persistence as standalone sections rather than squeezed into §3 or §8):

1. Purpose and operational overview
2. Startup / initialization sequence
3. Core lifecycle (discovery, polling, event processing)
4. State machines and transitions (with guards)
5. Event flows and processing rules
6. User interactions (every click/action and its effect)
7. Integration behaviors (each external system)
8. Persistence and recovery
9. Degradation behavior: for each external dependency, what happens when it fails
10. Behavioral nuances: the non-obvious behaviors that would surprise someone reading the code

The spec documents what the code DOES, including when it does something surprising (like ignoring a configurable setting). It does not judge whether the code SHOULD behave differently.

Describe, don't evaluate. Behavioral descriptions are mechanical: what the code runs, what it returns, what it calls next. Never characterize code as "correct", "reliable", "optimal", "most", "best", or "successful". Those are judgments the skill cannot make — they rot fast and mislead worst.

Bad (evaluative)	Good (mechanical)
"Uses `code <cwd>` — most reliable on X"	"Runs `code <cwd>`. VS Code activates the window whose root folder is `<cwd>`, or opens a new window if none match."
"Foregrounds the correct window"	"Calls `SetForegroundWindow(hwnd)` on the window whose title contains the CWD basename."
"Safely handles missing files"	"Catches `FileNotFoundError` and returns an empty dict."
"Always validates input"	"Every caller at [file:line, file:line] routes through `_validate()` before write."

If the source explicitly uses an evaluative word (e.g., a comment saying "// fast path"), quote it: "the code labels this 'fast path' (file:line)". Don't launder it into your own voice.

Phase 6: Cross-Artifact Consistency Check & Review

Run these consistency checks:

Every external system in L1 appears in the behavioral spec
Every container in L2 maps to components in L3
Every component in L3 appears in the data flow summary or behavioral spec — a listed component nothing references is either dead code (flag it for the user) or missing coverage (trace how it's used and add the missing edge)
Every state transition has a documented trigger
Every timing/frequency claim is consistent across all artifacts
Every "configurable" claim uses the same term (settings field vs constant)
Every cross-module override in L3 is reflected in the behavioral spec
Platform-conditional behaviors documented for both platforms
Superlative scrub: grep the artifacts for "most", "best", "optimal", "always", "never", "correct", "successful", "reliable", "safe". Each hit is either quoted from source (keep, cite file:line) or the model editorializing (rewrite mechanically — see Phase 5 Describe-don't-evaluate table). No superlatives survive into the final output unquoted.
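The superlative scrub lends itself to a mechanical pass. The sketch below is one possible implementation, not the skill's own tooling: it matches whole words only (so "almost" is not a hit for "most"), which is a design assumption; a plain substring grep would flag more lines.

```python
import re
from typing import List, Tuple

# Word list taken from the superlative-scrub check.
SCRUB_WORDS = [
    "most", "best", "optimal", "always", "never",
    "correct", "successful", "reliable", "safe",
]
PATTERN = re.compile(r"\b(" + "|".join(SCRUB_WORDS) + r")\b", re.IGNORECASE)


def scrub_hits(text: str) -> List[Tuple[int, str, str]]:
    """Return (line number, matched word, line text) for every hit."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in PATTERN.finditer(line):
            hits.append((lineno, match.group(1).lower(), line.strip()))
    return hits
```

Each hit still needs a human-style decision: keep it if it is quoted from source with a file:line citation, otherwise rewrite it mechanically. Whole-word matching also misses inflections such as "safely" or "reliably"; dropping the `\b` anchors trades that for more false positives.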

Fix-before-delivery: Fix Critical and Important findings in the artifacts (return to Phases 4–5), then re-run checks. Suggestion findings (the former Minor tier) are documented but not fixed.

Produce the validation output via render-c4-reverse-engineer.py. Collect findings into ./.tmp-c4-reverse-engineer/pre-render.json using the pre-render shape documented in ${CLAUDE_PLUGIN_ROOT}/skills/c4-reverse-engineer/references/review-format.md, then invoke:

python ${CLAUDE_PLUGIN_ROOT}/skills/c4-reverse-engineer/scripts/render-c4-reverse-engineer.py \
  --input ./.tmp-c4-reverse-engineer/pre-render.json \
  --issues ./.tmp-c4-reverse-engineer/issues.json \
  --out-dir <project root> \
  --project-name <project name>

Skill-specific renderer behavior (on top of the shared handoff contract):

Assigns IDs per bucket (C0.., I0.., S0.. — note: suggestion replaces the old Minor tier).
Computes content_hash per finding.
Writes Findings-c4-reverse-engineer.json and Findings-c4-reverse-engineer.md at --out-dir.
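To make the first two renderer steps concrete, here is a rough sketch of per-bucket ID assignment and content hashing. Everything beyond the `C0..`, `I0..`, `S0..` prefixes is an assumption: the field names, the choice of SHA-256 over canonical JSON, and the function names are hypothetical and may not match render-c4-reverse-engineer.py.

```python
import hashlib
import json
from typing import Dict, List

# Bucket prefixes from the renderer description; the keys are assumed names.
PREFIX = {"critical": "C", "important": "I", "suggestion": "S"}


def content_hash(finding: Dict) -> str:
    """Hash a finding's content, ignoring any id or hash already present."""
    payload = {k: v for k, v in finding.items() if k not in ("id", "content_hash")}
    # Sorted keys and fixed separators keep the hash independent of key order.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def assign_ids(findings: List[Dict]) -> List[Dict]:
    """Number findings from 0 within each severity bucket."""
    counters: Dict[str, int] = {}
    out = []
    for finding in findings:
        prefix = PREFIX[finding["severity"]]
        index = counters.get(prefix, 0)
        counters[prefix] = index + 1
        out.append({**finding, "id": f"{prefix}{index}", "content_hash": content_hash(finding)})
    return out
```

Hashing content rather than position means a finding keeps the same hash if it is reordered between runs, which is presumably why a content hash is computed at all.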

The validation output documents the post-fix state as a quality record. Meta-issues from the run (sub-agent failures, missing artifacts, verification scripts that couldn't run) go in the shared issues[] via --issues. Supplementary metadata (the Summary table, Confirmed claims list) goes under supplementary in the pre-render JSON; it flows into the markdown and travels in the handoff JSON but is not part of the findings[] contract.

What This Skill Is NOT

Not a code review — no quality judgments, no hunting for unused functions
Not a refactoring plan — no code changes proposed
Not a test plan
Not documentation of HOW code works — only WHAT it does

The Phase 6 "component in L3 isn't referenced anywhere" check is a doc-consistency check, not dead-code review: the concern is that the spec listed a component and then never used it. If the code itself has dead functions, that's for a separate review — this skill stays mute on it.

Reference Files

Read these during the phases that need them:

File	When to read	Content
`${CLAUDE_PLUGIN_ROOT}/skills/c4-reverse-engineer/references/subagent-prompts.md`	Phase 2, before dispatching subagents	7 prompt instructions + framework calibration table
`${CLAUDE_PLUGIN_ROOT}/skills/c4-reverse-engineer/references/verification-templates.md`	Phase 3, before verification	8 templates, trigger-word heuristics, priority tiers
`${CLAUDE_PLUGIN_ROOT}/skills/c4-reverse-engineer/references/review-format.md`	Phase 6, before writing pre-render JSON	Severity bucket mapping, pre-render JSON shape consumed by `render-c4-reverse-engineer.py`
`${CLAUDE_PLUGIN_ROOT}/skills/c4-reverse-engineer/references/failure-modes.md`	When diagnosing a verification failure	9 failure modes with root causes and detection methods
`${CLAUDE_PLUGIN_ROOT}/skills/c4-reverse-engineer/references/example-output.md`	Phase 5, for phrasing calibration	Toy project with a mechanical-phrasing §9 Degradation excerpt and a side-by-side bad/good comparison

Scripts

Deterministic helpers for mechanical work. Run them from the project root; each prints results to stdout.

Script	When to run	What it does
`${CLAUDE_PLUGIN_ROOT}/skills/c4-reverse-engineer/scripts/count_source_lines.py`	Phase 1, step 2	Counts non-test, non-generated source lines per language and recommends a tier.
`${CLAUDE_PLUGIN_ROOT}/skills/c4-reverse-engineer/scripts/find_platform_conditionals.py`	Phase 3, PLATFORM claims	Enumerates every `sys.platform`, `platform.system()`, `os.name`, `IS_*` constant, Go build tag, Rust `cfg(target_os)`, and C/C++ platform ifdef.
`${CLAUDE_PLUGIN_ROOT}/skills/c4-reverse-engineer/scripts/find_external_calls.py`	Phase 3, DEGRADATION claims and L1 external systems	Enumerates subprocess, network, and filesystem calls across Python/Node/Go/Rust/Java/C#. Use `--group` for a category-grouped report. Comment detection is line-based — multi-line comment blocks and docstrings may produce false positives.
`${CLAUDE_PLUGIN_ROOT}/skills/c4-reverse-engineer/scripts/render-c4-reverse-engineer.py`	Phase 6, after validating findings	Assigns stable IDs (`C0..`, `I0..`, `S0..`) per severity bucket, computes content hashes, builds the shared-schema envelope with `source: "c4-reverse-engineer"`, validates against `findings.schema.json`, and writes `Findings-c4-reverse-engineer.{json,md}`. Exits non-zero if validation fails — no files are written on failure.

Invoke directly: ${CLAUDE_PLUGIN_ROOT}/skills/c4-reverse-engineer/scripts/<name>.py [project-root]. The scripts have shebang lines and are executable. The [project-root] argument defaults to the current working directory. The three analysis scripts (count_source_lines.py and the two finder scripts) exclude tests and vendored code by default; pass --include-tests on the finder scripts if you need the full picture.
