From build-second-brain
Analyze one or more git repositories commit-by-commit from the very first commit using maximum parallel agents to extract engineering patterns, architecture decisions, debugging approaches, scaling strategies, and coding conventions — then build a structured "second brain" knowledge base with a personalized engineer profile and hybrid global/local Claude memory injection. Supports multi-repo analysis (frontend + backend, microservices) to capture cross-repo patterns. Use this skill when the user mentions "second brain", "analyze my repo", "extract my patterns", "learn from my commits", "build my brain", "reverse engineer my thinking", "learn how I code", "analyze git history", or wants to capture their engineering decision-making from a codebase. Also trigger when the user wants to create an engineer profile, extract architecture patterns from commits, build a knowledge base from code history, or analyze multiple repositories together.
Slash command: /build-second-brain:build-second-brain
Extract an engineer's thinking patterns, architecture decisions, and coding philosophy from their git history — commit by commit, using maximum parallel agents — and build a structured knowledge base + engineer profile.
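This skill runs a three-phase pipeline: Phase 1 harvests findings from every commit and artifact into scratchpad files, Phase 2 distills them into 12 category files, and Phase 3 synthesizes the knowledge base, the engineer profile, and the memory files.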
Every agent writes to disk immediately. Nothing is kept only in memory. If context compresses or an agent crashes, all work persists in scratchpad files and can be resumed.
This skill uses hooks and CronCreate loops to enforce spec compliance during execution — not just after the fact.
| Hook | Event | What It Enforces |
|---|---|---|
| validate-write-paths.sh | PreToolUse (Write/Edit) | Blocks relative paths to .second-brain/ or second-brain/. Blocks batch files missing REPO_ID prefix. |
| Inline bash | PreToolUse (Bash) | Warns on relative paths in bash commands targeting build directories. |
| validate-scratchpad-output.sh | PostToolUse (Write) | Validates scratchpad batch files have ## Commit: headers and ### Category Tags sections. |
| validate-agent-completion.sh | SubagentStop | Reports current scratchpad/category file counts after every agent completes. |
| Inline bash | Stop | Blocks session stop if profile is missing or progress.md has unchecked items. |
| Inline bash | PreCompact | Reminds to re-read config.md and progress.md after context compaction. |
These hooks are conditionally active — they only fire when $CLAUDE_PROJECT_DIR/.second-brain/config.md exists (i.e., during an active build). Outside of a build, they silently exit 0.
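As a rough illustration of the "conditionally active" behaviour and the Stop row in the table, the check could look something like the sketch below. The function name `stop_check`, the profile location under `second-brain/profile/`, and the use of exit status 2 to mean "block" are assumptions made for this example, not details taken from the shipped hooks.

```shell
# Sketch of a conditionally active Stop check (hypothetical helper).
# Assumption: status 2 means "block the stop"; status 0 lets the session end.
stop_check() {
  sc_project_dir=$1
  sc_work_dir="$sc_project_dir/.second-brain"
  sc_profile="$sc_project_dir/second-brain/profile/engineer-profile.md"

  # No config.md means no active build: stay silent and allow the stop.
  [ -f "$sc_work_dir/config.md" ] || return 0

  if [ ! -f "$sc_profile" ]; then
    echo "Blocked: engineer-profile.md is missing" >&2
    return 2
  fi

  # Any literal "[ ]" in progress.md counts as an unchecked item.
  if grep -q -F '[ ]' "$sc_work_dir/progress.md" 2>/dev/null; then
    echo "Blocked: progress.md has unchecked items" >&2
    return 2
  fi
  return 0
}
```

Called as `stop_check "$CLAUDE_PROJECT_DIR"`, it returns 0 outside of a build and only starts enforcing once config.md exists.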
| Loop | Phase | What It Monitors |
|---|---|---|
| HARVEST_CRON_ID | Phase 1 | Counts scratchpad files every 2 min, reports % complete |
| CATEGORIZE_CRON_ID | Phase 2 | Counts category files every 2 min, reports N/12 complete |
Both cron jobs must be cancelled with CronDelete when their phase completes (success or failure).
IMPORTANT — Persist cron IDs to disk so they can be recovered after a crash or context compression:
echo "Harvest Cron ID: <ID>" >> "$WORK_DIR/config.md"
echo "Categorize Cron ID: <ID>" >> "$WORK_DIR/config.md"
On resume, read cron IDs from config.md to cancel orphaned jobs with CronDelete.
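Reading the IDs back on resume can be done with a one-line filter. This is a minimal sketch that assumes the exact `<Label> Cron ID: <ID>` line format written by the echo commands above; `read_cron_id` is a hypothetical helper name.

```shell
# Print the most recently recorded cron ID for a label ("Harvest" or "Categorize").
# Because the IDs are appended with >>, the last matching line is the newest one.
read_cron_id() {
  # $1 = label, $2 = path to config.md
  sed -n "s/^$1 Cron ID: //p" "$2" | tail -n 1
}
```

For example, `read_cron_id Harvest "$WORK_DIR/config.md"` yields the ID to pass to CronDelete, or nothing if no harvest job was ever recorded.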
After the build completes, run the verification script for a comprehensive check:
$PYTHON_CMD <SKILL_DIR>/scripts/verify.py "$WORK_DIR" "$OUTPUT_DIR"
Where $PYTHON_CMD is the Python command discovered during preflight (python3 or python). If Python was not available, skip verification.
This checks 12 test groups: config fields, batch naming, commit coverage, profile quality, and more.
Ask the user for three things (or accept as arguments):
1. Repository path(s): a single repo (/path/to/my-app) or several (/path/to/backend, /path/to/frontend)
2. Brain name: the user's chosen name, stored as BRAIN_NAME
3. Memory scope:
   - hybrid (default, recommended): Core identity goes global, repo-specific patterns stay local
   - global: Everything goes to global memory (available in all projects)
   - local: Everything stays in the current project's memory only
Then run preflight checks for each repo:
# For each repo path:
REPO_PATH="$(cd <repo_path> && pwd)"
REPO_ID="$(basename "$REPO_PATH")" # e.g., "my-backend"
git -C "$REPO_PATH" rev-list --count HEAD
git -C "$REPO_PATH" rev-parse HEAD # Record HEAD hash
# Verify Python is available (needed for indexer + verify scripts)
# Try python3 first, then python. Store whichever works as PYTHON_CMD.
if python3 --version 2>/dev/null; then
    PYTHON_CMD="python3"
elif python --version 2>/dev/null; then
    PYTHON_CMD="python"
else
    PYTHON_CMD=""
fi
Preflight gates:
- If git rev-list fails: warn the user and skip that repo (continue with the remaining repos). If ALL repos fail, abort.
- If PYTHON_CMD is empty (Python not found): warn the user with install guidance:
  "Python 3.7+ is required for the indexer and verification scripts but was not found. Install it from https://python.org or via your package manager (brew install python3 / apt install python3 / winget install Python.Python.3). I can fall back to a less-accurate bash indexer, but verification will be skipped. Continue anyway?"
- Store PYTHON_CMD in config.md so downstream phases know which command to use.
For each repo, scan for planning/design/documentation artifacts that reveal thinking philosophy beyond code. These are gold mines: they show how the person thinks about products, breaks down problems, and communicates decisions.
# For each repo, discover artifact files (docs, planning, memory, configs)
ARTIFACT_DIRS=()
for dir in docs .planning .claude .github; do
    [ -d "$REPO_PATH/$dir" ] && ARTIFACT_DIRS+=("$REPO_PATH/$dir")
done
# Also find root-level markdown files (README, CONTRIBUTING, CLAUDE.md, etc.)
ROOT_MDS=$(find "$REPO_PATH" -maxdepth 1 -name "*.md" -type f 2>/dev/null)
# Find nested docs directories (e.g., src/docs/, packages/*/docs/)
NESTED_DOCS=$(find "$REPO_PATH" -maxdepth 3 -type d -name "docs" 2>/dev/null)
What to scan for (ordered by value):
1. docs/: design specs, requirements, ADRs, architecture decision records
2. .planning/: GSD roadmaps, phase plans, PROJECT.md, research files
3. .claude/: memory files, CLAUDE.md (reveals non-negotiables and workflow preferences)
4. .github/: PR templates, issue templates (reveals process expectations)
5. Root-level .md files: README, CONTRIBUTING, CHANGELOG (reveals communication style)
6. CLAUDE.md / AGENTS.md / GEMINI.md: project instructions (reveals coding philosophy)
Chronological ordering: For each discovered artifact, use git log --format="%ai" --diff-filter=A -- <file> to find when it was first created. Sort artifacts by creation date. This reveals the order of thinking (what was important enough to document first).
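The chronological ordering step might be scripted roughly as follows. The git flags are the ones named above; the helper names `artifact_created` and `sort_artifacts` and the `created|file` line format are assumptions made for this sketch.

```shell
# Date (YYYY-MM-DD) of the commit that first added a file; empty if the file is untracked.
# git log lists newest first, so the last line is the earliest "added" commit.
artifact_created() {
  # $1 = absolute repo path, $2 = file path relative to the repo
  git -C "$1" log --format="%ai" --diff-filter=A -- "$2" | tail -n 1 | cut -d' ' -f1
}

# Read "created|file" lines on stdin and print them oldest first.
# ISO dates sort correctly as plain text.
sort_artifacts() {
  sort -t'|' -k1,1
}
```

A caller would emit one `$(artifact_created "$REPO_PATH" "$file")|$file` line per artifact and pipe the result through `sort_artifacts` before writing the manifest.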
Store the discovered artifact manifest in $WORK_DIR/artifacts.md:
# Discovered Artifacts
Repo: <REPO_ID>
| File | Created | Last Modified | Type |
|------|---------|---------------|------|
| docs/specs/auth-design.md | 2024-01-15 | 2024-03-20 | design-spec |
| .planning/PROJECT.md | 2024-02-01 | 2024-06-15 | planning |
| CLAUDE.md | 2024-03-01 | 2024-07-01 | project-instructions |
Report the full discovery:
"Found N repos with X total commits (repo-a: 500, repo-b: 300) and Y artifact files (design specs, planning docs, project instructions). I'll analyze commits in batches of Z with parallel agents, plus harvest all artifacts for philosophy and product thinking. Memory scope: hybrid. Estimated token usage: ~3M-6M per 1000 commits. Proceed?"
Wait for confirmation.
IMPORTANT — Resolve absolute paths now and use them everywhere:
WORK_DIR="$(pwd)/.second-brain"
OUTPUT_DIR="$(pwd)/second-brain"
BRAIN_NAME="<user's chosen name>"
SCOPE="hybrid" # or global or local
BATCH_SIZE=20 # or 50 if >5000 total commits
# Per repo (build arrays):
REPO_PATHS=("<abs_path_1>" "<abs_path_2>" ...)
REPO_IDS=("<repo-id-1>" "<repo-id-2>" ...)
Store these in $WORK_DIR/config.md. ALL subsequent steps, agent prompts, and bash commands MUST use these absolute paths — never relative paths like .second-brain/ or second-brain/.
Before initializing, check if $WORK_DIR/progress.md already exists:
test -f "$WORK_DIR/progress.md" && echo "RESUME" || echo "FRESH"
If resuming, read progress.md and determine the current state of each of the three phases from its [x] checkmarks.
Ask: "Found existing progress: Phase N in progress, X% complete. Resume? (y/n)"
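The "X% complete" figure can be derived by counting checked and unchecked boxes. A minimal sketch, assuming progress.md uses one `[x]` or `[ ]` marker per line (the real layout comes from references/progress-template.md, which is not shown here); `progress_percent` is a hypothetical helper.

```shell
# Percentage of checked items in a progress file, as an integer (rounded down).
progress_percent() {
  # $1 = path to progress.md
  pp_done=$(grep -c -F '[x]' "$1" 2>/dev/null || true)
  pp_open=$(grep -c -F '[ ]' "$1" 2>/dev/null || true)
  pp_total=$(( ${pp_done:-0} + ${pp_open:-0} ))
  if [ "$pp_total" -eq 0 ]; then
    echo 0
  else
    echo $(( ${pp_done:-0} * 100 / pp_total ))
  fi
}
```

Running it against only the Phase 1 lines (for example after filtering with grep) would give a per-phase figure instead of an overall one.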
Create the working directory structure:
mkdir -p "$WORK_DIR/scratchpad" "$WORK_DIR/indexed" "$WORK_DIR/categories"
Write $WORK_DIR/config.md:
# Second Brain Config
Brain Name: <BRAIN_NAME>
Work Dir: <WORK_DIR> (absolute)
Output Dir: <OUTPUT_DIR> (absolute)
Memory Scope: <SCOPE> (hybrid/global/local)
Batch Size: <BATCH_SIZE>
Started: <timestamp>
## Repos
| Repo ID | Path | Commits | HEAD Hash |
|---------|------|---------|-----------|
| <repo-id-1> | <abs_path_1> | <N1> | <hash1> |
| <repo-id-2> | <abs_path_2> | <N2> | <hash2> |
Total Commits: <N1+N2+...>
Total Batches: <ceil(N1/BATCH_SIZE) + ceil(N2/BATCH_SIZE) + ...>
Python Command: <$PYTHON_CMD>
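The Total Batches line is a sum of per-repo ceilings, which shell integer arithmetic can express without floating point. The sketch below also encodes the batch-size rule from the setup step (20, or 50 above 5000 total commits); the function names are hypothetical.

```shell
# ceil(commits / batch_size) using integer arithmetic.
batch_count() {
  # $1 = commit count, $2 = batch size
  echo $(( ($1 + $2 - 1) / $2 ))
}

# Batch size rule: 20 by default, 50 once the total exceeds 5000 commits.
choose_batch_size() {
  # $1 = total commits across all repos
  if [ "$1" -gt 5000 ]; then echo 50; else echo 20; fi
}

# Sum of per-repo batch counts.
total_batches() {
  # $1 = batch size, remaining args = per-repo commit counts
  tb_size=$1
  shift
  tb_sum=0
  for tb_n in "$@"; do
    tb_sum=$(( tb_sum + (tb_n + tb_size - 1) / tb_size ))
  done
  echo "$tb_sum"
}
```

Note that the ceiling is taken per repo, so two repos with 45 and 21 commits at batch size 20 need 3 + 2 = 5 batches, not ceil(66/20) = 4.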
For each repo, extract commits into a separate file using git -C:
# For each repo:
git -C "$REPO_PATH" log --format="%H|%s|%ai" --reverse > "$WORK_DIR/<REPO_ID>-commits.txt"
Write initial progress.md with one checkbox line per batch per repo, per category, per synthesis step. Use the format from references/progress-template.md but replace pseudocode with actual generated lines. Group Phase 1 batches by repo.
This is the most critical phase. Every commit gets analyzed. For multi-repo, harvest each repo independently — spawn a team/wave per repo, or interleave batches from all repos into a single team's task list.
Try Agent Teams for self-balancing workload:
1. Create the team with TeamCreate
2. Add one task per batch with TodoWrite: each task specifies repo ID, repo path, batch number, start line, end line in <REPO_ID>-commits.txt
The prompt for each teammate is in references/harvest-agent-prompt.md. Read it and use it as the agent prompt, filling in these absolute paths:
- REPO_PATH: absolute path to the repo (from the task; varies per batch in multi-repo)
- REPO_ID: short identifier for the repo (e.g., "my-backend")
- COMMITS_FILE: absolute path to $WORK_DIR/<REPO_ID>-commits.txt
- SCRATCHPAD_DIR: absolute path to $WORK_DIR/scratchpad/
- BATCH_SIZE: the configured batch size
- BATCH_ASSIGNMENT: For Teams mode: "Claim tasks from the team's shared task list using TodoWrite. Each task specifies a repo ID, repo path, batch number, and line range. Mark each task complete after writing its scratchpad file."
If TeamCreate fails, fall back to Fallback Mode.
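Each task carries a batch number plus a start and end line in the repo's commits file. One way those ranges might be generated is sketched below; the output wording and the helper names `batch_tasks` and `batch_lines` are illustrative assumptions.

```shell
# Print one task line per batch: "<repo> batch <n> lines <start>-<end>".
batch_tasks() {
  # $1 = repo ID, $2 = commit count, $3 = batch size
  bt_batch=1
  bt_start=1
  while [ "$bt_start" -le "$2" ]; do
    bt_end=$(( bt_start + $3 - 1 ))
    # The last batch may be shorter than the batch size.
    if [ "$bt_end" -gt "$2" ]; then bt_end=$2; fi
    echo "$1 batch $bt_batch lines $bt_start-$bt_end"
    bt_batch=$(( bt_batch + 1 ))
    bt_start=$(( bt_end + 1 ))
  done
}

# Print the commit lines belonging to one batch.
batch_lines() {
  # $1 = commits file, $2 = start line, $3 = end line
  sed -n "${2},${3}p" "$1"
}
```

Because the commits file is written with --reverse, line 1 is the first commit, so batch 1 always covers the oldest history.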
Spawn background subagents in waves of 5:
- Spawn each agent with run_in_background: true
- Use TaskOutput to poll each agent ID until all 5 complete
- Fill BATCH_ASSIGNMENT with explicit batch numbers and repo IDs: "Process: my-backend batch 1, my-frontend batch 2, ..."
After launching harvest agents, set up a progress monitor using CronCreate:
Expression: "*/2 * * * *"
Prompt: "Run: ls <LITERAL_WORK_DIR>/scratchpad/batch-*.md 2>/dev/null | wc -l
Read <LITERAL_WORK_DIR>/config.md for total batches.
Calculate and report percentage.
If percentage >= 100, report HARVEST COMPLETE."
Recurring: true
IMPORTANT: Replace <LITERAL_WORK_DIR> with the actual resolved absolute path (e.g., /home/user/project/.second-brain). Do NOT pass shell variables like $WORK_DIR — cron prompts execute in a separate context where those variables don't exist.
Store the cron job ID as HARVEST_CRON_ID — write it to $WORK_DIR/config.md immediately so it can be recovered after a crash.
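The calculation the cron prompt asks for is a file count divided by the total batch count. A minimal sketch of that arithmetic, with `harvest_percent` as a hypothetical helper that takes literal paths and values as arguments rather than relying on shell variables:

```shell
# Percentage of batch files present in the scratchpad directory (rounded down).
harvest_percent() {
  # $1 = absolute scratchpad dir, $2 = total batches from config.md
  hp_done=$(find "$1" -maxdepth 1 -type f -name 'batch-*.md' | wc -l | tr -d ' ')
  if [ "$2" -le 0 ]; then
    echo 0
    return 0
  fi
  echo $(( hp_done * 100 / $2 ))
}
```

Only batch-*.md files are counted, so artifact findings in the same directory do not inflate the percentage.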
When using wave-based agents, enforce a strict completion loop for each wave:
WAVE = 1
while batches remain:
    1. Spawn up to 5 background agents for this wave
    2. Collect all agent IDs into WAVE_AGENT_IDS
    3. POLL LOOP: For each agent ID in WAVE_AGENT_IDS:
       - Call TaskOutput with the agent ID
       - If agent still running, continue polling
       - If agent complete, mark its batches done in progress.md
    4. VERIFY LOOP: After all 5 agents complete:
       - For each expected batch file: check file exists AND has correct ## Commit: count
       - If any batch missing or incomplete → re-spawn ONLY that batch
       - Repeat verify until all batches for this wave pass
    5. Update progress.md: mark all wave batches [x]
    6. WAVE += 1
The PostToolUse hook on Write validates scratchpad structure automatically — if a batch file is written without ## Commit: headers or ### Category Tags sections, the hook raises a warning.
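The per-batch check in the verify loop can be expressed as a small predicate. This sketch assumes each analyzed commit starts with a line beginning `## Commit:` and that at least one `### Category Tags` heading is present, as the hook description states; `verify_batch` is a hypothetical name.

```shell
# Succeeds only if the batch file exists, has the expected number of
# "## Commit:" headers, and contains a "### Category Tags" section.
verify_batch() {
  # $1 = batch file, $2 = expected number of commits in this batch
  [ -f "$1" ] || return 1
  vb_found=$(grep -c '^## Commit:' "$1" || true)
  [ "${vb_found:-0}" -eq "$2" ] || return 1
  grep -q '^### Category Tags' "$1"
}
```

A wave loop could call it once per expected file and re-spawn only the batches for which it fails.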
After all agents complete, verify per repo:
- Count batch-<REPO_ID>-*.md files for each repo: the count must equal that repo's batch count
- Count ## Commit: headers: flag any files with fewer commits than the expected batch size (the last batch may have fewer)
- Cancel HARVEST_CRON_ID with CronDelete
After commit harvesting, mine the non-code artifacts for philosophy, product thinking, and planning patterns. These files reveal HOW someone thinks, not just what they coded.
If no artifacts were discovered in preflight, skip this step.
Spawn a background agent using the prompt from references/artifact-harvest-prompt.md. Fill in:
- ARTIFACTS_MANIFEST: absolute path to $WORK_DIR/artifacts.md
- SCRATCHPAD_DIR: absolute path to $WORK_DIR/scratchpad/
- REPO_LIST: from config (all repo IDs and absolute paths; the agent needs these for git log --follow)
- BRAIN_NAME: from config
The artifact agent reads each discovered file, extracts philosophy/product-thinking/planning patterns, and writes findings to scratchpad files named artifacts-<REPO_ID>.md using the same category tag format as commit findings (so the indexer picks them up automatically).
Key difference from commit harvest: Artifacts are read directly (not via git show). But the agent also checks git log --follow -- <file> to understand how each artifact evolved over time — what was added, removed, or restructured reveals changing priorities.
Poll with TaskOutput until complete. Verify artifacts-<REPO_ID>.md files exist in scratchpad for each repo that had artifacts.
Split scratchpad findings by category tag so each Phase 2 agent reads only its relevant content. The indexer processes BOTH commit findings (batch-*.md) AND artifact findings (artifacts-*.md) from the scratchpad directory.
Locate the indexer script — it ships with this skill at:
skills/build-second-brain/scripts/indexer.py
Resolve its absolute path by checking the skill's installation directory.
$PYTHON_CMD "$SKILL_DIR/scripts/indexer.py" "$WORK_DIR/scratchpad" "$WORK_DIR/indexed"
Where $SKILL_DIR is the resolved absolute path to the skills/build-second-brain/ directory.
If the exact script path cannot be resolved, or Python is unavailable, do it with bash:
for category in architecture tech-stack debugging scaling security data-modeling code-style refactoring integration error-handling product-thinking workflow; do
    grep -B 100 "### Category Tags" "$WORK_DIR"/scratchpad/{batch,artifacts}-*.md | \
        grep -A 100 "$category" > "$WORK_DIR/indexed/${category}-raw.md" 2>/dev/null || true
done
Note: The bash fallback is approximate. The Python indexer is preferred for accuracy.
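If Python is unavailable and the grep pipeline proves too coarse, a somewhat tighter fallback might split each scratchpad file into per-commit blocks with awk. The sketch below is not the shipped indexer: it assumes every finding starts with a `## Commit:` line and that the category slugs appear as plain text on the lines following `### Category Tags`, up to the next heading. `extract_category` is a hypothetical name.

```shell
# Print every commit block whose Category Tags section mentions the slug.
extract_category() {
  # $1 = category slug, remaining args = scratchpad files
  ec_slug=$1
  shift
  awk -v slug="$ec_slug" '
    # Print the buffered block if it was tagged, then reset the state.
    function emit() {
      if (block != "" && tagged) printf "%s", block
      block = ""; tagged = 0; in_tags = 0
    }
    FNR == 1            { emit() }      # new file: close the previous block
    /^## Commit:/       { emit() }      # new commit: close the previous block
    /^### Category Tags/ { in_tags = 1; block = block $0 "\n"; next }
    /^#/                { in_tags = 0 } # any other heading ends the tag list
    {
      if (in_tags && index($0, slug) > 0) tagged = 1
      block = block $0 "\n"
    }
    END { emit() }
  ' "$@"
}
```

Usage would mirror the loop above, for instance `extract_category debugging "$WORK_DIR"/scratchpad/batch-*.md > "$WORK_DIR/indexed/debugging-raw.md"`. The match is a substring test, which works for the twelve slugs listed because none of them contains another.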
The indexer also produces $WORK_DIR/indexed/statistics-raw.md with pre-computed counts per category, total commits parsed, and commits per month. This feeds into Phase 3.
Spawn category agents in two waves of 6 (not all 12 at once — prevents resource exhaustion):
Wave 1: architecture, tech-stack, debugging, scaling, security, product-thinking
Wave 2: data-modeling, code-style, refactoring, integration, error-handling, workflow
Each agent's prompt is templated from references/category-agent-prompt.md. Fill in:
- CATEGORY_NAME: the category (e.g., "Architecture")
- CATEGORY_SLUG: the slug (e.g., "architecture")
- CATEGORY_DESCRIPTION: what to extract (from the table below)
- INDEXED_FILE: absolute path to $WORK_DIR/indexed/<slug>-raw.md
- OUTPUT_FILE: absolute path to $WORK_DIR/categories/<slug>.md
- BRAIN_NAME: from config
| Category | Slug | Description (for CATEGORY_DESCRIPTION) |
|---|---|---|
| Architecture | architecture | System structure decisions, module boundaries, service splits, folder organization, layering patterns |
| Tech Stack | tech-stack | Library/framework/tool choices and the reasoning behind picking one over alternatives |
| Debugging | debugging | Bug patterns, root causes, diagnostic steps taken, fix approaches, what broke and why |
| Scaling | scaling | Queues, workers, caching strategies, async patterns, load handling, performance optimizations |
| Security | security | Auth mechanisms, input validation, sanitization, access control, secrets management |
| Data Modeling | data-modeling | Schema design, migration patterns, relationships, indexes, query optimization |
| Code Style | code-style | Naming conventions, file structure, import patterns, code organization, formatting rules |
| Refactoring | refactoring | What was messy before, why it was cleaned up, before/after patterns, triggers for refactoring |
| Integration | integration | External API connections, webhook handling, third-party service patterns, SDK usage |
| Error Handling | error-handling | Retry logic, fallbacks, circuit breakers, logging strategies, monitoring, alerting |
| Product Thinking | product-thinking | Feature scoping, requirements analysis, trade-off decisions, roadmap priorities, user-centric design choices, what was built vs explicitly rejected |
| Workflow | workflow | Planning patterns, process decisions, communication style, documentation habits, tool/system choices, review processes, how work is broken down and sequenced |
Set up a second progress monitor before spawning Wave 1:
Expression: "*/2 * * * *"
Prompt: "Run: ls <LITERAL_WORK_DIR>/categories/*.md 2>/dev/null | wc -l
Expected: 12 category files.
Report: N/12 categories complete.
If N >= 12, report CATEGORIZE COMPLETE."
Recurring: true
Store as CATEGORIZE_CRON_ID — write to $WORK_DIR/config.md for crash recovery.
For each wave, enforce strict completion:
WAVE 1 (architecture, tech-stack, debugging, scaling, security, product-thinking):
    1. Spawn 6 background agents, collect agent IDs
    2. POLL LOOP: TaskOutput each agent ID until all 6 complete
    3. VERIFY: Check all 6 category files exist in $WORK_DIR/categories/
    4. RE-SPAWN any missing category agents
    5. Update progress.md: mark Wave 1 categories [x]
WAVE 2 (data-modeling, code-style, refactoring, integration, error-handling, workflow):
    Same loop as Wave 1
The SubagentStop hook fires after each agent completes — it reports current scratchpad and category file counts automatically.
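The VERIFY and RE-SPAWN steps of each wave need a list of categories whose output file is absent. A minimal sketch, with `missing_categories` as a hypothetical helper:

```shell
# Print the slugs whose category file does not exist yet, one per line.
missing_categories() {
  # $1 = categories dir, remaining args = expected slugs for the wave
  mc_dir=$1
  shift
  for mc_slug in "$@"; do
    [ -f "$mc_dir/$mc_slug.md" ] || echo "$mc_slug"
  done
}
```

For Wave 1 this would be called as `missing_categories "$WORK_DIR/categories" architecture tech-stack debugging scaling security product-thinking`; an empty result means the wave passed.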
After both waves:
- Cancel CATEGORIZE_CRON_ID with CronDelete
Run 3 agents sequentially. Each depends on the previous, so you MUST wait for each to complete before spawning the next:
1. Spawn Agent 1 (Brain Builder) with run_in_background: true
2. Poll with TaskOutput until Agent 1 completes
3. VERIFY: Check $OUTPUT_DIR/patterns/ and $OUTPUT_DIR/decisions/ exist
4. Spawn Agent 2 (Profile Generator) with run_in_background: true
5. Poll with TaskOutput until Agent 2 completes
6. VERIFY: Check $OUTPUT_DIR/profile/engineer-profile.md exists and has 100+ lines
7. Spawn Agent 3 (Memory Injector) with run_in_background: true
8. Poll with TaskOutput until Agent 3 completes
9. VERIFY: Check memory files were written
10. Update progress.md: mark all Phase 3 steps [x]
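The profile check in step 6 amounts to an existence test plus a line count. A minimal sketch, where `profile_ok` is a hypothetical helper and the 100-line threshold comes from the step itself:

```shell
# Succeeds if the profile exists and has at least the minimum number of lines.
profile_ok() {
  # $1 = path to engineer-profile.md, $2 = minimum line count (default 100)
  [ -f "$1" ] || return 1
  po_lines=$(wc -l < "$1" | tr -d ' ')
  [ "$po_lines" -ge "${2:-100}" ]
}
```

For example, `profile_ok "$OUTPUT_DIR/profile/engineer-profile.md" || echo "re-run the Profile Generator"`.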
Before spawning any Phase 3 agents, generate the commit log and create the output directory:
mkdir -p "$OUTPUT_DIR/raw"
grep "^## Commit:\|^Repo:\|^Message:" "$WORK_DIR"/scratchpad/batch-*.md > "$OUTPUT_DIR/raw/commit-log.md"
Read the prompt from references/brain-builder-prompt.md. Fill in:
- CATEGORIES_DIR: absolute path to $WORK_DIR/categories/
- OUTPUT_DIR: absolute path (from config: the resolved $OUTPUT_DIR)
- BRAIN_NAME: from config
- REPO_LIST: from config (all repo IDs and paths)
- TOTAL_COMMITS: from config
- STATISTICS_FILE: absolute path to $WORK_DIR/indexed/statistics-raw.md
Important: The Brain Builder should NOT read scratchpad files directly. raw/commit-log.md was already generated in the pre-step above. For raw/statistics.md, use the pre-computed statistics-raw.md from the indexer.
Read the prompt from references/profile-generator-prompt.md. Fill in:
- CATEGORIES_DIR: absolute path
- OUTPUT_DIR: absolute path
- BRAIN_NAME, REPO_LIST, TOTAL_COMMITS: from config
Read the prompt from references/memory-injector-prompt.md. Fill in:
- PROFILE_FILE: absolute path to $OUTPUT_DIR/profile/engineer-profile.md
- PATTERNS_DIR: absolute path to $OUTPUT_DIR/patterns/
- DECISIONS_FILE: absolute path to $OUTPUT_DIR/decisions/tech-decisions.md
- BRAIN_NAME: from config
- SCOPE: from config (hybrid, global, or local)
Resolve memory paths BEFORE spawning this agent:
IMPORTANT: Claude Code has NO global memory directory (~/.claude/memory/ does not exist). Global identity must go to ~/.claude/CLAUDE.md which IS loaded in every session.
# Global: ~/.claude/CLAUDE.md (loaded in every Claude Code session)
GLOBAL_CLAUDE_MD="$HOME/.claude/CLAUDE.md"
# Local (project-specific) memory directory
# Caution: this glob takes the first memory directory found under ANY project,
# which may not belong to the current project. Check the result before using it.
LOCAL_MEMORY_DIR=$(ls -d ~/.claude/projects/*/memory/ 2>/dev/null | head -1)
# If not found, create it in the current project's memory path
if [ -z "$LOCAL_MEMORY_DIR" ]; then
    # The project hash is derived from the CWD
    LOCAL_MEMORY_DIR="$HOME/.claude/projects/$(pwd | sed 's/[\/\\:]/-/g' | sed 's/^-//')/memory"
    mkdir -p "$LOCAL_MEMORY_DIR"
fi
Pass the resolved paths to the agent — all scopes need both paths:
- hybrid: pass both GLOBAL_CLAUDE_MD and LOCAL_MEMORY_DIR (writes local files + appends to global)
- global: pass both GLOBAL_CLAUDE_MD and LOCAL_MEMORY_DIR (writes local files + appends to global)
- local: pass only LOCAL_MEMORY_DIR (writes local files only, no global changes)
Note: Even global scope writes local memory files (detailed patterns/decisions) in addition to the global CLAUDE.md summary. The global label means the identity section also goes global, not that local files are skipped.
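The scope-to-path mapping above is small enough to encode as a case statement. In this sketch `memory_targets` is a hypothetical helper that prints the names of the variables to hand to the Memory Injector.

```shell
# Print which resolved paths the Memory Injector should receive for a scope.
memory_targets() {
  # $1 = scope (hybrid, global, or local)
  case "$1" in
    hybrid|global) echo "GLOBAL_CLAUDE_MD LOCAL_MEMORY_DIR" ;;
    local)         echo "LOCAL_MEMORY_DIR" ;;
    *)             echo "unknown scope: $1" >&2; return 1 ;;
  esac
}
```

Rejecting unknown values early avoids spawning the agent with a scope it cannot interpret.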
After all phases complete:
1. Call CronDelete for both HARVEST_CRON_ID and CATEGORIZE_CRON_ID
2. Run the verification script: $PYTHON_CMD <SKILL_DIR>/scripts/verify.py "$WORK_DIR" "$OUTPUT_DIR"
   If Python was unavailable (preflight fallback), skip this step. If verify.py reports failures, fix them before reporting success.
3. Update progress.md to show 100% completion
The Stop hook will block the session from ending if:
- engineer-profile.md doesn't exist
- progress.md still has unchecked [ ] items
Report to the user:
BUILD COMPLETE
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
second-brain/ — your full knowledge base
second-brain/profile/engineer-profile.md — "<Brain Name>" engineer DNA
second-brain/patterns/ — engineering patterns (8 categories)
second-brain/philosophy/ — product thinking & workflow patterns
second-brain/playbooks/ — debugging & scaling playbooks
Memory scope: <hybrid/global/local>
Global identity: ~/.claude/CLAUDE.md — core identity (loads everywhere)
Local memory: ~/.claude/projects/... — repo patterns (loads in project)
Repos analyzed: <list of repo IDs>
Commits analyzed: <N total> (<per-repo breakdown>)
Artifacts analyzed: <N files> (design specs, planning docs, project instructions)
Patterns found: <count>
Categories covered: <count>/12
Hooks enforced: path validation, scratchpad structure, batch naming
Verification: verify.py passed
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
- Count the ## Commit: headers in a batch file: if fewer than expected, re-run from the last recorded commit hash. The harvest agent checks for existing entries before re-analyzing.
- After context compaction, re-read $WORK_DIR/config.md and progress.md to restore state.
- Use git show --stat plus selective inspection. Binary files are skipped.
- For merge commits, run git show --stat first. If it is empty, log it as "Empty merge." If it has changes, analyze selectively.
- Cancel leftover cron jobs with CronDelete.
Install with: npx claudepluginhub boparaiamrit/build-second-brain --plugin build-second-brain