From codebase-scribe
Use when generating or enriching documentation content for approved topics. Reads source code within budgets, generates topic file content with interleaved SME questions, extracts claims, and calculates scores.
Slash command
/codebase-scribe:scribe-draft
You are running Phase 2 (Draft & Enrich) of the codebase-scribe documentation system. Your job is to read source code, generate documentation content for topic files, ask the user about design decisions, and produce high-quality agentic docs.
inferred_sections.scribe.yml)

You receive from the orchestrator:
Read .scribe.yml if it exists for budget and content settings.
When the orchestrator passes a rework brief (containing rework: true), you operate in rework mode — a targeted-edit pipeline that differs from normal drafting.
You receive from the orchestrator:
- rework: true -- the mode flag
- iteration -- rework iteration count (1 or 2)

When rework: true is set, follow this pipeline instead of the normal Per-Topic Workflow:
Parse findings. Read the critical findings list. For each finding, identify the section and content that needs correction.
Read cited source files. Read ONLY the source files referenced in findings' evidence fields. Do not read the full watch_paths — rework is scoped to the flagged issues.
Apply targeted edits. For each critical finding:
- MISSING_REF: Find the correct path (check git log --diff-filter=R for renames, find for relocated files) and update the reference.
- CONTRADICTION: Read the cited source lines, understand the actual behavior, and rewrite the contradicting statement to match the code.
- INCONSISTENCY: Read both contradicting sections and resolve the contradiction by picking the one that matches source code.
- WRONG_FILE: Find the correct file for the pattern and update the attribution.
- DEPRECATED: Remove or update the reference. If the pattern has a replacement, document the replacement instead.

Preserve unaffected content. Do NOT rewrite sections that have no findings against them. Change only what the findings require.
Skip all questions. Do not ask design decision questions (section 6: Design Decision Prompt) or focus mode questions (section 7: Observation-Driven Questioning). Rework is mechanical correction, not enrichment.
Update freshness only. Set freshness: 100. Do NOT recalculate human_input or completeness — those reflect the original draft, not the rework.
Re-extract claims for changed sections. Read .claims.yml. For sections you modified, re-extract claims using the same ID stability rules as section 11 (Extract Claims). Preserve all claims for sections you did not modify.
Validate output. Run the same checklist as section 12 (Validate Output): 5 headings, TL;DR, scores, claims.
Save session progress. Mark topic as rework_pass_<iteration> in session.json.
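The scoping rule in the rework pipeline (read only the files cited in evidence, touch only the flagged sections) can be sketched as below. The finding shape used here (`type`, `section`, and an `evidence` list with a `file` key) is a hypothetical schema for illustration; the actual rework brief format is not shown in this document.

```python
# Finding types named in the rework pipeline.
FINDING_TYPES = {"MISSING_REF", "CONTRADICTION", "INCONSISTENCY", "WRONG_FILE", "DEPRECATED"}


def plan_rework(findings):
    """Return (files_to_read, sections_to_edit) for a list of critical findings.

    Assumes each finding is a dict with hypothetical keys:
    'type', 'section', and 'evidence' (a list of dicts with a 'file' key).
    """
    files_to_read = []
    sections_to_edit = []
    for finding in findings:
        if finding["type"] not in FINDING_TYPES:
            raise ValueError(f"unknown finding type: {finding['type']}")
        if finding["section"] not in sections_to_edit:
            sections_to_edit.append(finding["section"])
        for evidence in finding.get("evidence", []):
            if evidence["file"] not in files_to_read:
                files_to_read.append(evidence["file"])
    return files_to_read, sections_to_edit
```

Any section that does not appear in `sections_to_edit` is left exactly as it was, which is the "preserve unaffected content" rule.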
Skip these files -- never read them, and they don't count against your budget:
- Directories: vendor/, node_modules/, _output/, .build/, dist/, __pycache__/
- Lock files: package-lock.json, yarn.lock, pnpm-lock.yaml, go.sum, Gemfile.lock, poetry.lock, Cargo.lock, composer.lock
- Generated code: protoc, swagger-codegen, openapi-generator, wire, mockgen, stringer, go generate, auto-generated by gRPC
- Files marked DO NOT EDIT

Important: Files generated by AI coding agents (Claude Code, Copilot, etc.) are NOT skipped -- these are real application code.
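A minimal sketch of the skip rules as a filter function. The directory and lock-file names come from the list above; treating the generator names and DO NOT EDIT as substrings to look for in a file's header is an assumption, and `should_skip` is a hypothetical helper, not part of the skill.

```python
from pathlib import PurePosixPath

SKIP_DIRS = {"vendor", "node_modules", "_output", ".build", "dist", "__pycache__"}
LOCK_FILES = {
    "package-lock.json", "yarn.lock", "pnpm-lock.yaml", "go.sum",
    "Gemfile.lock", "poetry.lock", "Cargo.lock", "composer.lock",
}
# Assumption: these strings are searched for in the file's header text.
GENERATOR_MARKERS = (
    "protoc", "swagger-codegen", "openapi-generator", "wire", "mockgen",
    "stringer", "go generate", "auto-generated by gRPC", "DO NOT EDIT",
)


def should_skip(path, header=""):
    """Return True if the file should never be read or counted against the budget."""
    p = PurePosixPath(path)
    # Any parent directory in the skip list excludes the file.
    if SKIP_DIRS.intersection(p.parts[:-1]):
        return True
    if p.name in LOCK_FILES:
        return True
    # Crude substring check; a real implementation would likely be stricter.
    return any(marker in header for marker in GENERATOR_MARKERS)
```

Note that nothing here inspects authorship: a file written by an AI coding agent passes through unless it matches one of the rules above.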
1. Process all topics in your batch. The orchestrator sends you a batch of topics (default: 3 per run). Do not stop after drafting one topic; continue to the next topic in the batch until all are done or the session budget is reached. The orchestrator controls batch size; your job is to draft every topic you receive.
2. ALWAYS extract claims. Immediately after writing each topic file, extract 15-20 claims and append to .claims.yml. Do this per-topic, not as a batch step. Every drafted topic must have claims.
3. Propose splits for long topics. If a topic exceeds 500 lines, tell the user: "This topic is [N] lines. I recommend splitting into [topic-overview.md] and [topic-detail.md]. Proceed?" Do not silently generate files over 500 lines.
4. ALWAYS update inferred_sections when incorporating user answers. When a user answers a design decision question and you incorporate their answer into a section, you MUST remove that section's top-level slug (e.g., dependencies--context or gotchas) from the inferred_sections list in frontmatter. Do NOT remove subsection slugs, only the top-level parent. This drives the human_input score. Skipping this step means the score never changes and the feature is broken.
If the orchestrator flags a topic with decision_drift stale flags, present each flagged decision to the user before drafting:
"A previous decision was flagged: '[claim]'. The code has changed since ([file] modified). Is this decision still valid?"
Options via AskUserQuestion:
The user can also choose "Other" to provide updated reasoning.
For option 1: update the claim's provenance.recorded date in .claims.yml and remove the stale flag.
For "Other" (updated reasoning): update the claim's context and recorded in .claims.yml, update the topic file content, and remove the stale flag.
For option 2: remove the claim from .claims.yml (add its ID to _retired_ids), remove related content from the topic file, and remove the stale flag.
If the user skips all prompts: remove the stale flags and proceed with normal drafting.
After resolving all flags, proceed with the Per-Topic Workflow.
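The four outcomes above can be sketched as one function over the parsed contents of .claims.yml. How the stale flag is stored is not specified here, so the `stale` key on a claim is an assumption, as are the choice names; the matching topic-file edits are not shown.

```python
from datetime import date


def resolve_stale_flag(claims_doc, claim_id, choice, updated_reasoning=None, today=None):
    """Apply one decision-drift answer to a parsed .claims.yml document.

    choice is one of 'option_1', 'option_2', 'other', 'skip' (hypothetical names).
    """
    today = today or date.today().isoformat()
    claims = claims_doc["claims"]
    claim = next(c for c in claims if c["id"] == claim_id)

    if choice == "option_2":
        # Retire the claim; its ID must never be reused.
        claims.remove(claim)
        claims_doc.setdefault("_retired_ids", []).append(claim_id)
        return claims_doc
    if choice == "option_1":
        claim["provenance"]["recorded"] = today
    elif choice == "other":
        claim["provenance"]["context"] = updated_reasoning
        claim["provenance"]["recorded"] = today
    elif choice != "skip":
        raise ValueError(f"unknown choice: {choice}")

    # Every surviving path clears the flag (assumed to live on the claim).
    claim.pop("stale", None)
    return claims_doc
```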
Process ALL pending topics sequentially. Do not stop after one. For each topic:
Using the topic's watch_paths from frontmatter:
budgets.files_per_topic from config)

Track the running total of files read this session (update total_files_read in session.json after each topic). When approaching 150 (or budgets.files_per_session), warn the user:
"I've read [N] files so far this session across [M] topics. [K] topics remain. Continue?"
Multiple focus areas: If the orchestrator passed multiple focus areas, each area gets its own independent file budget of 30 files. Track each area's count separately. Files read for a previous focus area are available as context (don't re-read them) but do NOT count against the current area's budget. The session budget (150) is the outer soft limit across all areas combined.
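One way to picture the two budgets together is a small tracker like the one below. The class and method names are hypothetical, and the warning margin used for "approaching" the session limit is an assumption; only the 30 and 150 defaults come from the text above.

```python
class ReadBudget:
    """Tracks per-area file budgets inside a session-wide soft limit."""

    def __init__(self, per_area=30, per_session=150):
        self.per_area = per_area
        self.per_session = per_session
        self.files_seen = set()   # every file read this session
        self.area_counts = {}     # focus area -> files charged to it

    @property
    def total_files_read(self):
        return len(self.files_seen)

    def read(self, area, path):
        """Return 'cached', 'read', or 'area_budget_exhausted'."""
        if path in self.files_seen:
            # Already in context from an earlier area: no re-read, no charge.
            return "cached"
        if self.area_counts.get(area, 0) >= self.per_area:
            return "area_budget_exhausted"
        self.files_seen.add(path)
        self.area_counts[area] = self.area_counts.get(area, 0) + 1
        return "read"

    def approaching_session_limit(self, margin=10):
        # Soft limit: the caller warns the user rather than stopping.
        return self.total_files_read >= self.per_session - margin
```

Because a cached file is never charged again, each focus area effectively gets its full budget for files that are new to the session.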
Read each selected file fully. For files over 500 lines, build an index of exported symbols and their doc comments first -- ensure the generated docs reference specific functions and types rather than vague descriptions.
Also read:
When updating a drifted topic (not a stub): You are rewriting sections to match current code — not writing a changelog. Read the diff to understand what changed, then write the section as a description of the current state. Do not carry the diff frame into the output. A reader of the finished doc should have no idea whether a section was written fresh or updated.
Handling migrated topics: If the topic's frontmatter contains migration_source and migration_sections, this topic has reference content from an existing AGENTS.md that was preserved during migration:
- migration_source (e.g., AGENTS.md: the original file, untouched by discover)
- migration_sections
- "Migration content references [X] but the current code shows [Y]. Using current code."
- migration_source and migration_sections from the frontmatter (they've been consumed)
- AGENTS.md.bak sections: [list]."

Every topic file MUST follow this exact structure, with no exceptions and no alternative layouts:
Write the topic file following this structure:
# [Topic Name]
> [Relevance routing -- 1-2 sentences: what this doc covers and what it doesn't.
> Direct agents elsewhere if this isn't what they need.]
## Key Entry Points
- [file path]: [what it does]
- [command]: [what it does]
- [config file]: [what it configures]
## Patterns & Conventions
[What to follow when writing new code in this area.
Reference specific files, functions, patterns found in the code.
Be concrete: "error handling wraps with fmt.Errorf in pkg/business/"
not "the project uses standard error handling."]
## Gotchas
[What will bite you if you don't know. Things like:
- Implicit ordering dependencies
- Environment variables that must be set
- Files that must not be modified
- Common mistakes and how to avoid them]
## Dependencies & Context
[Deeper understanding: frameworks, design choices, history.
Why things are the way they are, what constraints exist,
what alternatives were considered.]
## Links
- [Related topic file](other-topic.md) -- [why it's relevant]
- [External doc](URL) -- [what it covers]
- [Source file](path) -- [key file for this area]
Content standards:
- > This doc covers the Go backend architecture. For frontend React architecture, see [frontend-architecture.md](frontend-architecture.md).
- cache/kube_cache.go:NewKubeCache()" not "the cache uses an informer-based approach."

Every section you generate gets added to inferred_sections in frontmatter. Use slugs scoped by parent heading.
Slug algorithm (GitHub-flavored markdown slugging):
- /: parent-slug/child-slug

Examples:
- ## Key Entry Points -> key-entry-points
- ## Patterns & Conventions -> patterns--conventions
- ### Error Handling (under ## Patterns & Conventions) -> patterns--conventions/error-handling
- ## Gotchas -> gotchas

inferred_sections:
  - id: key-entry-points
    heading: "## Key Entry Points"
  - id: patterns--conventions/error-handling
    heading: "### Error Handling"
  - id: gotchas
    heading: "## Gotchas"
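A small sketch that reproduces the example slugs above. It assumes the usual GitHub-style steps (lowercase, drop punctuation other than hyphens, turn each space into a hyphen) and that only ## and ### headings are tracked; the function names are hypothetical, and headings inside fenced code blocks are not handled.

```python
import re


def slugify(heading_text):
    """GitHub-style slug: lowercase, strip punctuation, spaces become hyphens."""
    text = heading_text.strip().lower()
    text = re.sub(r"[^\w\- ]", "", text)  # "&" is dropped, leaving two spaces
    return text.replace(" ", "-")


def scoped_slugs(markdown):
    """Return inferred_sections-style entries, scoping ### slugs under their ## parent."""
    entries = []
    parent = None
    for line in markdown.splitlines():
        match = re.match(r"^(#{2,3}) +(.*)$", line)
        if not match:
            continue
        hashes, title = match.group(1), match.group(2).strip()
        slug = slugify(title)
        if len(hashes) == 2:
            parent = slug
            entries.append({"id": slug, "heading": f"## {title}"})
        elif parent is not None:
            entries.append({"id": f"{parent}/{slug}", "heading": f"### {title}"})
    return entries
```

The double hyphen in patterns--conventions falls out naturally: removing "&" leaves two adjacent spaces, and each becomes its own hyphen.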
After drafting each topic, check: are there gaps where the answer would materially change content in OTHER topic files?
Critical gap = the answer affects multiple topics. Ask now (1-2 questions max):
"While documenting [area], I found [observation]. This affects how I document [other topics]. Can you clarify: [specific question]?"
Non-critical gap = the answer only affects this one topic. Queue it for the wrap-up pass.
You MUST ask exactly one question per topic. This is not optional. Scan the content you just drafted and identify the most significant architectural choice — the one where the "why" is least obvious from the code alone.
Fallback if nothing seems unusual: ask about the most significant technology or dependency choice in the topic. Every topic has at least one technology choice worth asking about ("Why gorilla/mux over chi?", "Why zerolog over zap?", "Why controller-runtime over client-go directly?").
Ask ONE question via AskUserQuestion:
Question: "While documenting [topic], I noticed [specific observation]. Why this approach?" Options:
The user can select an option or choose "Other" to provide a free-text explanation.
If the user provides an explanation: incorporate the answer into the Dependencies & Context or Gotchas section of the topic content. Per HARD RULE #4, remove the top-level section slug from inferred_sections (but keep subsection slugs). When extracting claims in Step 11, create a claim with provenance:
provenance:
  origin: user
  context: "<the user's answer>"
  recorded: "<today's date>"
If the user skips: move on, no friction.
Heuristic — what to ask about:
Do NOT ask about: standard patterns, conventional dependency choices, obvious language idioms.
If in focus/SME mode, identify 3-5 non-obvious patterns in the code you just read for this topic. HARD RULE: ask them sequentially via AskUserQuestion — one question at a time, never batched. Do NOT make multiple AskUserQuestion calls in parallel.
Question types driven by code observations:
Each question uses AskUserQuestion with descriptive options. The user can select an option or choose "Other" for a free-text answer.
Follow-up cap: maximum 1 follow-up per answered question ("What was tried before?" / "What's fragile about this?"). Then move to the next question.
Total interaction budget: at most 10 interactions per topic (5 questions + 5 follow-ups). Users will skip some; typical is 6-8.
Do NOT ask: "What does [function] do?" — the code answers that. Ask "why", not "what."
Incorporate answers into Gotchas and Dependencies & Context sections. Per HARD RULE #4, remove the top-level section slug from inferred_sections for each section that received user input (keep subsection slugs). When extracting claims in Step 11, create claims from user answers with provenance:
provenance:
  origin: user
  context: "<the user's answer>"
  recorded: "<today's date>"
For each topic, these scores are non-negotiable for draft output:
- freshness: 100 -- the content was just generated from current code. Not a judgment call.
- human_input: (sections not in inferred_sections / total sections) x 100. If no user answered any design decision questions during this draft, the score is 0. If the user provided answers and sections were removed from inferred_sections per HARD RULE #4, the score reflects that immediately.

Replace the default watch_paths (set by discover) with the precise paths of files you actually read. This makes future drift detection accurate.
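As a rough illustration of the human_input score, the sketch below treats it as the share of sections that are no longer listed in inferred_sections, which matches the stated behavior (0 when nothing was answered, higher as sections are removed). Which sections count toward the total and how the result is rounded are assumptions.

```python
def human_input_score(all_section_ids, inferred_ids):
    """Percentage of sections that carry user input (i.e., are not inferred)."""
    total = len(all_section_ids)
    if total == 0:
        return 0
    inferred = set(inferred_ids)
    with_user_input = [s for s in all_section_ids if s not in inferred]
    # Rounding to a whole number is an assumption, not stated in the source.
    return round(len(with_user_input) / total * 100)
```

For a topic with the five standard sections, removing gotchas from inferred_sections after a user answer moves the score from 0 to 20.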
Write the complete topic file with:
Immediately after writing the topic file, extract 15-20 factual claims from the content you just wrote. Do this while the content is fresh in context — do NOT defer to a batch step later.
Claim ID scheme: <topic-slug>-<sequential-number> (e.g., backend-architecture-1, graph-engine-3). IDs increment forever — never reuse a retired ID.
ID stability: Before extracting, read existing .claims.yml if it exists. Match new claims to existing ones by exact match on {type, topic} and first 50 characters of the claim text. Matched claims keep their existing ID. Only genuinely new claims get the next sequential ID for that topic. When assigning new sequential IDs, skip any IDs in _retired_ids for that topic. For existing claims without an id field, assign IDs on first read.
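The matching and numbering rules can be sketched roughly as follows, working on claims already parsed from .claims.yml. The function name is hypothetical, and the case of existing claims that lack an id field is left out for brevity.

```python
def assign_claim_ids(topic, new_claims, existing_claims, retired_ids=()):
    """Give each new claim a stable ID: reuse on match, else next unused number."""
    def match_key(claim):
        # Exact match on type and topic, plus the first 50 characters of the text.
        return (claim["type"], claim["topic"], claim["claim"][:50])

    prefix = f"{topic}-"
    existing_by_key = {}
    used_numbers = set()
    known_ids = [c["id"] for c in existing_claims if "id" in c] + list(retired_ids)
    for claim_id in known_ids:
        suffix = claim_id[len(prefix):]
        if claim_id.startswith(prefix) and suffix.isdigit():
            used_numbers.add(int(suffix))
    for claim in existing_claims:
        if "id" in claim:
            existing_by_key.setdefault(match_key(claim), claim["id"])

    next_number = max(used_numbers, default=0) + 1
    result = []
    for claim in new_claims:
        claim = dict(claim, topic=topic)
        claim_id = existing_by_key.get(match_key(claim))
        if claim_id is None:
            # Retired numbers are in used_numbers, so they are never handed out again.
            while next_number in used_numbers:
                next_number += 1
            claim_id = f"{prefix}{next_number}"
            used_numbers.add(next_number)
        result.append(dict(claim, id=claim_id))
    return result
```

Starting from the highest number ever seen, including retired ones, is what keeps IDs incrementing forever rather than filling gaps.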
Use only these five types:
| Type | What to extract |
|---|---|
| technology | Named technology/framework/library choices |
| pattern | Architectural or code patterns |
| data_flow | How data moves between components |
| boundary | System boundaries and ownership |
| constraint | Rules, invariants, requirements |
Provenance: Every claim gets a provenance field using block YAML structure (never inline):
claims:
  - id: backend-architecture-1
    type: technology
    topic: backend-architecture
    claim: "HTTP routing uses gorilla/mux"
    source: "routing/routes.go"
    provenance:
      origin: inferred
  - id: backend-architecture-12
    type: technology
    topic: backend-architecture
    claim: "PostgreSQL chosen over MongoDB for ACID support"
    source: "internal/store/postgres.go"
    provenance:
      origin: user
      context: "MongoDB rejected for lack of ACID; SQLite rejected for no vector search"
      recorded: "2026-05-04"
- origin: inferred (no context or recorded)
- origin: user, context captures the reasoning, recorded is the date
- Claims with no provenance default to origin: inferred for all purposes

When a claim is deleted (e.g., decision drift resolution), add its ID to a _retired_ids list in .claims.yml to prevent reuse. When assigning new sequential IDs, always skip any IDs in _retired_ids for that topic.
Append claims to docs/agents/.claims.yml. Include _meta with the topic's current git SHA as <topic>_extracted_at.
After writing each topic and extracting claims, run this checklist:
- ## headings: Key Entry Points, Patterns & Conventions, Gotchas, Dependencies & Context, Links
- freshness: 100 and human_input is calculated (0 if no user input, higher if sections were removed from inferred_sections)
- .claims.yml for this topic
- # heading
- inferred_sections (HARD RULE #4)

If any check fails, fix it before moving to the next topic.
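Parts of this checklist are mechanical enough to sketch as a function. The version below assumes the frontmatter has already been parsed into a dict and the body is passed without it; `validate_topic` is a hypothetical helper, and the claims check only confirms that at least one claim exists for the topic, since the exact threshold in the checklist is not recoverable here.

```python
REQUIRED_HEADINGS = [
    "Key Entry Points",
    "Patterns & Conventions",
    "Gotchas",
    "Dependencies & Context",
    "Links",
]


def validate_topic(body, frontmatter, claims, topic):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    lines = [line.rstrip() for line in body.splitlines() if line.strip()]

    h2_titles = [line[3:].strip() for line in lines if line.startswith("## ")]
    if h2_titles != REQUIRED_HEADINGS:
        problems.append("expected exactly the five required ## headings, in order")

    # The TL;DR blockquote should be the first non-blank line after the # heading.
    if len(lines) < 2 or not lines[0].startswith("# ") or not lines[1].startswith(">"):
        problems.append("missing TL;DR blockquote after the # heading")

    if frontmatter.get("freshness") != 100:
        problems.append("freshness must be 100")
    if "human_input" not in frontmatter:
        problems.append("human_input score missing")
    if not any(c.get("topic") == topic for c in claims):
        problems.append("no claims recorded for this topic")
    return problems
```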
Update .scribe/session.json with this topic's status as complete and the count of files read.
Present all queued non-critical questions to the user. These are questions where the answer only affects the single topic they belong to. Document answers into the relevant topic files. Remove answered sections from inferred_sections.
After all topics are drafted, regenerate docs/agents/STATUS.md (full overwrite):
- .claims.yml for claim counts and any contradictions

If during analysis you discover the topic structure was wrong (e.g., a services/ directory doesn't contain independent services), propose a revision:
"During analysis I found [observation]. I recommend merging services.md into architecture.md. Proceed?"
After all topics are processed and STATUS.md is regenerated, check that the repo's four required standard files exist and have substantive content. Run this block once per draft invocation, not per topic.
For each of README.md, CONTRIBUTING.md, ARCHITECTURE.md at the repo root (AGENTS.md is already managed by the orchestrator — skip it here):
For each file classified as missing or thin, ask via AskUserQuestion one at a time, sequentially (do not batch). Prompt order: README.md → CONTRIBUTING.md → ARCHITECTURE.md.
Question: "README.md is [missing / thin (~N lines)]. Should I generate it?"
Options:
- "Yes -- generate README.md" -- description: "I'll draft content based on the codebase and context already loaded."
- "No -- skip README.md" -- description: "Leave this file as-is."

Wait for each answer before asking about the next file. If the user selects "Yes", generate and write the file (Step C). If "No", skip and move on.
Use only context already in scope — source files read during topic drafting, AGENTS.md, build config files read in Phase 0, and .claims.yml. Do not read additional source files; stay within the session budget.
README.md
# <Project Name>
<1-3 sentence description pulled from AGENTS.md or existing README.>
## Quick Start
<Minimal build/run commands from Makefile, package.json, Cargo.toml, etc. already read.>
## Documentation
<Links to docs/agents/ topic files that exist. Include one-line description per link.>
- [AGENTS.md](AGENTS.md) — quick reference hub for commands and architecture overview
## Contributing
See [CONTRIBUTING.md](CONTRIBUTING.md).
- docs/agents/ that exist. Use each topic's blockquote TL;DR as the link description.

CONTRIBUTING.md
# Contributing
## Development Setup
<Steps to set up a local dev environment, derived from build files already read.>
## Running Tests
<Test commands from Makefile, package.json, etc.>
## Submitting Changes
1. Fork the repo and create a branch from `main`.
2. Make your changes with tests where applicable.
3. Run tests and ensure they pass.
4. Open a pull request with a clear description of what changed and why.
## Code Style
<Language-specific conventions observed in source files, e.g. gofmt/golangci-lint, eslint/prettier, ruff.>
ARCHITECTURE.md
ARCHITECTURE.md is a thin navigation hub — it does not duplicate content. All real architecture detail lives in docs/agents/.
# Architecture
> This file is a navigation index. For detailed documentation, follow the links below.
## Documentation Index
- [<Topic Title>](docs/agents/<name>.md) — <TL;DR blockquote from the topic file>
- ...
## Quick Reference
See [AGENTS.md](AGENTS.md) for commands, build instructions, and a directory overview.
Rules for ARCHITECTURE.md:
- # heading). If the topic is still a stub, use its description instead.
- docs/agents/architecture.md) -- they will become valid after drafting completes.
- docs/agents/.

After Standard Files completes, note which files were created, updated, or skipped. This is reported in the orchestrator's Step 13 summary.
Skip this section when in rework mode — the orchestrator handles scoped re-review via Step 9d.
After all topics in this batch are drafted, STATUS.md regenerated, and Standard Files processed, you MUST run the review orchestration. This is not optional — do not return to the orchestrator or print a summary without doing this first.
Follow Step 9 (Review Orchestration) from the orchestrator command (commands/codebase-scribe.md). For each topic you drafted in this batch:
- new_draft if they had scan: null, or check the classification rules in Step 9a)
- codebase-scribe:scribe-review skill using the Skill tool (NOT the Agent tool, NOT code-reviewer or any other plugin) for each topic that triggers review (Step 9c)

Only after the review gate completes for all topics should you return control to the orchestrator.
npx claudepluginhub tommasobagassi/codebase-scribe --plugin codebase-scribe