From agent-flywheel
Rebuild CHANGELOG.md files and release histories from git, tags, releases, and issue trackers. Use when writing changelogs, version timelines, or agent-facing project history summaries.
Slash command: /agent-flywheel:changelog-md-workmanship
<!-- TOC: Core | Problem | Prompt | Quick Start | Modes | Research | Chunking | Structure | Troubleshooting | Validation | References | Tools | Subagents | Self-Validation -->
Core Insight: A real changelog is a research artifact. If the history work is weak, the prose is fake.
Most changelogs fail in one of two ways:
The job is to build an orientation layer that lets another agent answer:
Never draft a serious changelog from memory. Research exhaustively, then write incrementally while the evidence is still in hand.
For large repos, do not wait until the end to write CHANGELOG.md. After each research chunk, update:
That is how you survive long histories without losing findings to context pressure.
Create or rebuild a serious CHANGELOG.md for this project.
Requirements:
1. Research the real history first: git commits, tags, releases, issue tracker, and existing docs.
2. Cover the requested scope window completely, from the beginning if needed.
3. Distinguish actual GitHub Releases from plain git tags.
4. Use live links for representative commits and version pages.
5. Include issue-tracker workstreams when available.
6. Organize by landed capabilities, not raw diff order, but keep a clear version timeline.
7. For large histories, split research into chunks and update CHANGELOG.md incrementally after each chunk.
8. Make it agent-friendly: another agent should be able to understand what changed without reading every diff.
Output:
- A canonical CHANGELOG.md
- A short note describing the evidence sources used
```bash
# 1. Read the repo's intent and rules first
cat AGENTS.md README.md 2>/dev/null

# 2. Create a compaction-resistant worklog immediately
touch CHANGELOG_RESEARCH.md

# 3. Build the version spine
git for-each-ref refs/tags --sort=creatordate --format='%(refname:short)%x09%(creatordate:short)%x09%(subject)'
gh release list --limit 100

# 4. Get early and recent history
git log --reverse --oneline --decorate=no --no-merges | head -n 50
git log --oneline --decorate=no --no-merges --max-count 120

# 5. Start writing the changelog skeleton early
cp .claude/skills/changelog-md-workmanship/assets/CHANGELOG-TEMPLATE.md CHANGELOG.md 2>/dev/null || true
cp .claude/skills/changelog-md-workmanship/assets/CHANGELOG-RESEARCH-TEMPLATE.md CHANGELOG_RESEARCH.md 2>/dev/null || true
```
1. Read AGENTS.md and README.md first.
2. Create CHANGELOG_RESEARCH.md immediately.
3. Gather the version spine: tags, releases, dates.
4. Slice history into chunks if the repo is large.
5. After each chunk, update the live CHANGELOG.md.
6. Finish with validation: dates, links, coverage, and structure.
One-page version: QUICK-REFERENCE.md
Bootstrap script: scripts/bootstrap-changelog-workdir.sh [repo-dir]
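The tab-separated output of the `git for-each-ref` command in step 3 can be parsed into a simple version spine. The following is a minimal sketch, not the bundled `scripts/build-version-spine.py`; the names `TagRecord` and `parse_tag_lines` are hypothetical, and the sample input is made up for illustration.

```python
from typing import NamedTuple


class TagRecord(NamedTuple):
    """One row of the version spine (hypothetical structure)."""
    name: str
    date: str
    subject: str


def parse_tag_lines(text: str) -> list[TagRecord]:
    """Parse lines shaped like '<tag>\\t<YYYY-MM-DD>\\t<subject>'."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on at most two tabs so tabs inside the subject survive.
        parts = line.split("\t", 2)
        # Pad missing fields, e.g. a tag whose subject is empty.
        parts += [""] * (3 - len(parts))
        records.append(TagRecord(*parts))
    return records


# Made-up sample in the same shape as the for-each-ref format string.
sample = "v0.1.0\t2024-01-05\tInitial release\nv0.2.0\t2024-02-11\tAdd export command\n"
spine = parse_tag_lines(sample)
for record in spine:
    print(f"{record.date}  {record.name}  {record.subject}")
```

In practice the input would come from the real command's output rather than a literal string.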
| Mode | Time | Depth | Use When |
|---|---|---|---|
| Small update | 10-20 min | Single version or narrow window | Recent release notes, point update |
| Standard rebuild | 30-90 min | Full version spine + capability waves | Mid-size repo or partial rewrite |
| Large-history reconstruction | Multi-pass | Chunked sequential research | Long-lived or sprawling project |
| Ultra-large history reconstruction | Multi-pass + staged artifacts | Chunk files, coverage ledger, automation aids | Histories that obviously exceed one context window |
If the repo is large enough that you cannot confidently hold the history in context, use chunked reconstruction immediately. Do not try to "just be more careful."
Huge-repo playbook: ULTRA-LARGE-REPOS.md
Trust sources in this order:
If sources disagree, history wins.
For a serious changelog, gather at least:
You are not listing every commit. You are identifying:
Link discipline and evidence rules: LINKING-RULES.md
Quality bar by repo size: QUALITY-BAR.md
Tracker source handling: TRACKER-ADAPTERS.md
Large repos must be researched sequentially in bounded slices. Good chunk boundaries:
Create CHANGELOG_RESEARCH.md or CHANGELOG_RESEARCH/NN-*.md before deep history work.
Start the CHANGELOG.md skeleton early.
If you delay writing until all research is done, you will lose detail and create slop.
Chunking procedure, command patterns, and stopping rules: RESEARCH-WORKFLOW.md
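One way to get bounded slices is to cut history at tag boundaries. The sketch below assumes the tags are already sorted oldest to newest and that history is roughly linear; the function name `tag_ranges` is hypothetical and the tag names are placeholders. It only builds revision ranges for `git log`; it does not run git.

```python
def tag_ranges(tags: list[str], head: str = "HEAD") -> list[str]:
    """Return git revision ranges that cover history in tag-bounded slices."""
    if not tags:
        return [head]
    # First slice: everything reachable from the oldest tag.
    ranges = [tags[0]]
    # Middle slices: commits reachable from the newer tag but not the older one.
    for older, newer in zip(tags, tags[1:]):
        ranges.append(f"{older}..{newer}")
    # Last slice: unreleased work after the newest tag.
    ranges.append(f"{tags[-1]}..{head}")
    return ranges


for rev_range in tag_ranges(["v0.1.0", "v0.2.0", "v1.0.0"]):
    print(f"git log --oneline --no-merges {rev_range}")
```

Each printed command corresponds to one research chunk, so the changelog can be updated after each one is worked through.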
For large histories, maintain a ledger in the research memo:
This prevents silent gaps and duplicate coverage.
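To illustrate how a ledger can catch gaps and duplicates, here is a small sketch. The `LedgerEntry` fields and the status vocabulary are assumptions for this example, not the layout of the bundled coverage-ledger template.

```python
from dataclasses import dataclass


@dataclass
class LedgerEntry:
    start: str   # chunk start boundary, e.g. a tag (assumed field)
    end: str     # chunk end boundary (assumed field)
    status: str  # assumed vocabulary: "pending", "researched", "written"


def audit_ledger(spine: list[str], ledger: list[LedgerEntry]) -> dict[str, list[tuple[str, str]]]:
    """Compare ledger entries against consecutive boundaries in the spine."""
    expected = list(zip(spine, spine[1:]))
    seen = [(entry.start, entry.end) for entry in ledger]
    return {
        # Slices the spine implies but no ledger entry covers.
        "gaps": [pair for pair in expected if pair not in seen],
        # Slices that appear more than once in the ledger.
        "duplicates": sorted({pair for pair in seen if seen.count(pair) > 1}),
        # Slices whose findings have not reached CHANGELOG.md yet.
        "unfinished": [(e.start, e.end) for e in ledger if e.status != "written"],
    }


spine = ["v0.1.0", "v0.2.0", "v0.3.0", "HEAD"]
ledger = [
    LedgerEntry("v0.1.0", "v0.2.0", "written"),
    LedgerEntry("v0.1.0", "v0.2.0", "researched"),
    LedgerEntry("v0.3.0", "HEAD", "pending"),
]
report = audit_ledger(spine, ledger)
print(report)
```

Here the slice from v0.2.0 to v0.3.0 is reported as a gap and the first slice as a duplicate.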
For most substantial repos, this structure works best:
The strongest section shape is:
- Delivered capability
- Closed workstreams
- Representative commits

Copy-paste templates: SECTION-TEMPLATES.md
Templates and scaffolds: SECTION-TEMPLATES.md
Command recipes: COMMAND-RECIPES.md
Three rules matter a lot:
.beads/issues.jsonl, link to that record instead of broad repo search when possible.
Direct examples and rules: LINKING-RULES.md
| Don't | Do |
|---|---|
| Write from memory | Gather evidence first |
| Dump commits chronologically | Build a version spine + thematic synthesis |
| Link naked hashes | Link live commit URLs |
| Pretend every tag is a release | Distinguish Releases, tags, and drafts |
| Wait until all research is done | Update the changelog after each chunk |
| Use generic tracker links | Scope to the real tracker record |
| Write vague summaries | Name the capability, fix, or regression concretely |
| Treat generated release notes as canonical history | Keep CHANGELOG.md separate and durable |
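As an illustration of the "link live commit URLs" row, the sketch below turns a full hash into a markdown link. It assumes the repository is hosted on GitHub, whose commit pages live at `/<owner>/<repo>/commit/<sha>`; the function name `commit_link`, the repo slug, and the hash are placeholders.

```python
def commit_link(repo_slug: str, sha: str, short_len: int = 7) -> str:
    """Render a markdown link whose text is the short hash and whose target is the commit page."""
    return f"[`{sha[:short_len]}`](https://github.com/{repo_slug}/commit/{sha})"


# Placeholder slug and hash, for illustration only.
print(commit_link("example-org/example-repo", "a1b2c3d4e5f60718293a4b5c6d7e8f9012345678"))
```

Keeping the full hash in the URL and only shortening the link text avoids ambiguity if short hashes later collide.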
| Problem | Cause | Fix |
|---|---|---|
| The repo has too many commits to hold in context | No chunking strategy | Split by tag/date/epic and maintain a coverage ledger |
| Tag dates and release dates disagree | Tag-only versions or draft releases | Distinguish Releases from plain tags explicitly |
| The tracker links feel noisy or useless | Links are too broad | Scope directly to the real tracker record |
| The changelog feels like fluff | Themes are vague | Restate sections in terms of actual capabilities and fixes |
| The changelog feels like a commit dump | No synthesis layer | Add capability-wave sections above raw history |
| You cannot tell whether coverage is complete | No research memo | Use a durable worklog and mark chunk status |
Deeper fixes: TROUBLESHOOTING.md
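The release-versus-tag distinction from the table above can be made explicit once both lists are in hand. This is a minimal sketch under stated assumptions: release metadata has already been collected into dictionaries with hypothetical `tag` and `draft` keys, and the labels are illustrative rather than the skill's official wording.

```python
def classify_versions(tags: list[str], releases: list[dict]) -> dict[str, str]:
    """Label each tag as a published release, a draft release, or a plain tag."""
    by_tag = {release["tag"]: release for release in releases}
    labels = {}
    for tag in tags:
        release = by_tag.get(tag)
        if release is None:
            labels[tag] = "tag-only"
        elif release.get("draft"):
            labels[tag] = "draft release"
        else:
            labels[tag] = "release"
    return labels


# Made-up data: three tags, of which one has a published release and one a draft.
tags = ["v0.1.0", "v0.2.0", "v0.3.0"]
releases = [
    {"tag": "v0.2.0", "draft": False},
    {"tag": "v0.3.0", "draft": True},
]
labels = classify_versions(tags, releases)
for tag, label in labels.items():
    print(f"{tag}: {label}")
```

A changelog entry can then state the label next to each version instead of implying that every tag was a release.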
Before finishing, check all of this:
Validation details and final audit questions: RESEARCH-WORKFLOW.md
Quality thresholds: QUALITY-BAR.md
Audit script: scripts/validate-changelog-md.py /path/to/CHANGELOG.md
Network verification mode: scripts/validate-changelog-md.py --verify-links /path/to/CHANGELOG.md
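One check that is easy to automate is spotting commit hashes that appear outside a markdown link. The following is a rough heuristic sketch and makes no claim about what `scripts/validate-changelog-md.py` actually does; the function name `naked_hashes` and the sample text are hypothetical.

```python
import re

# Inline markdown links of the form [text](target).
MD_LINK = re.compile(r"\[[^\]]*\]\([^)]*\)")
# Lowercase hex runs long enough to be an abbreviated or full commit hash.
HEX_TOKEN = re.compile(r"\b[0-9a-f]{7,40}\b")


def naked_hashes(markdown: str) -> list[tuple[int, str]]:
    """Return (line number, token) pairs for hash-like tokens outside links."""
    findings = []
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        # Drop links first so hashes in link text or URLs are not flagged.
        unlinked = MD_LINK.sub("", line)
        for token in HEX_TOKEN.findall(unlinked):
            # Require both digits and letters to skip plain numbers and words.
            if any(c.isdigit() for c in token) and any(c in "abcdef" for c in token):
                findings.append((lineno, token))
    return findings


sample = (
    "- Added export ([`a1b2c3d`](https://github.com/example-org/example-repo/commit/a1b2c3d4e5f6))\n"
    "- Fixed crash in parser (9f8e7d6)\n"
)
print(naked_hashes(sample))
```

The heuristic can produce false positives on other hex-like strings, so its findings are prompts for review rather than definitive errors.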
If this skill is unclear, these should still obviously trigger it:
| I need to... | Read |
|---|---|
| Start fast | QUICK-REFERENCE.md |
| Run a full research workflow | RESEARCH-WORKFLOW.md |
| Copy-paste a structure/template | SECTION-TEMPLATES.md |
| Choose prompts for repo size | PROMPTS.md |
| Fix a weak or confusing draft | TROUBLESHOOTING.md |
| Understand the quality threshold | QUALITY-BAR.md |
| See distilled design lessons | EXAMPLES.md |
| Get link rules right | LINKING-RULES.md |
| Copy concrete command recipes | COMMAND-RECIPES.md |
| Handle different tracker ecosystems | TRACKER-ADAPTERS.md |
| Run the ultra-large workflow | ULTRA-LARGE-REPOS.md |
| Topic | Reference |
|---|---|
| Chunking long histories | RESEARCH-WORKFLOW.md |
| Version timeline skeletons | SECTION-TEMPLATES.md |
| Capability-wave section structure | SECTION-TEMPLATES.md |
| Release vs tag correctness | LINKING-RULES.md |
| Tracker-link scoping | LINKING-RULES.md |
| Small/medium/huge repo prompts | PROMPTS.md |
| Failure modes and recovery | TROUBLESHOOTING.md |
| Quality thresholds | QUALITY-BAR.md |
| Example patterns to emulate | EXAMPLES.md |
| Exact git/gh/jq recipes | COMMAND-RECIPES.md |
| Tracker normalization across ecosystems | TRACKER-ADAPTERS.md |
| Huge multi-pass scaffolding | ULTRA-LARGE-REPOS.md |
| Asset | Purpose |
|---|---|
| assets/CHANGELOG-TEMPLATE.md | Reusable changelog scaffold for a new repo |
| assets/CHANGELOG-RESEARCH-TEMPLATE.md | Durable research memo scaffold for chunked history work |
| assets/CHANGELOG-TEMPLATE-HUGE.md | Huge-repo changelog scaffold with explicit multi-pass structure |
| assets/CHANGELOG-RESEARCH-CHUNK-TEMPLATE.md | Reusable chunk file scaffold for staged history reconstruction |
| assets/CHANGELOG-COVERAGE-LEDGER-TEMPLATE.md | Coverage ledger scaffold to prevent silent research gaps |
| Tool | Purpose |
|---|---|
| git log, git for-each-ref | Commit and tag history |
| gh release list, gh issue list | Release and issue metadata |
| jq | Checked-in tracker mining |
| br / bv | Beads-style issue history and planning context |
| scripts/bootstrap-changelog-workdir.sh | Bootstrap CHANGELOG.md and CHANGELOG_RESEARCH.md from templates |
| scripts/build-version-spine.py | Generate a markdown or JSON version timeline skeleton from local tags and GitHub releases |
| scripts/extract-tracker-workstreams.py | Normalize tracker evidence from beads, GitHub Issues, Linear/Jira exports, and milestone docs |
| scripts/cluster-history.py | Group commit history into candidate capability waves for faster thematic drafting |
| scripts/validate-changelog-md.py | Audit a finished changelog for structural and evidence problems |
| assets/CHANGELOG-TEMPLATE.md | Faster changelog bootstrap |
| assets/CHANGELOG-RESEARCH-TEMPLATE.md | Faster research-memo bootstrap |
| assets/CHANGELOG-TEMPLATE-HUGE.md | Huge-repo changelog bootstrap |
| assets/CHANGELOG-RESEARCH-CHUNK-TEMPLATE.md | Huge-repo chunk bootstrap |
| assets/CHANGELOG-COVERAGE-LEDGER-TEMPLATE.md | Huge-repo coverage-ledger bootstrap |
Run scripts directly (they have shebangs) rather than invoking python manually.
Useful searches during changelog work:
```bash
# Early history
git log --reverse --oneline --decorate=no --no-merges | head -n 50

# Recent history
git log --oneline --decorate=no --no-merges --max-count 120

# Tags
git for-each-ref refs/tags --sort=creatordate --format='%(refname:short)%x09%(creatordate:short)%x09%(subject)'

# Releases
gh release list --limit 100

# Beads-style tracker history
jq -r 'select(.status=="closed") | [.id,.title,.closed_at] | @tsv' .beads/issues.jsonl

# Tracker normalization
scripts/extract-tracker-workstreams.py --repo . --format markdown

# Capability-wave clustering
scripts/cluster-history.py --repo . --format markdown
```
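For cases where `jq` is unavailable, the closed-issue query above can be approximated in a few lines. This sketch relies only on the field names visible in the `jq` filter (`status`, `id`, `title`, `closed_at`); the function name `closed_issues` and the sample records are made up, and sorting by `closed_at` is an added convenience that assumes ISO-style timestamps.

```python
import json
from typing import Iterable


def closed_issues(jsonl_lines: Iterable[str]) -> list[tuple[str, str, str]]:
    """Return (id, title, closed_at) for closed records, oldest closure first."""
    rows = []
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if record.get("status") == "closed":
            # Missing or null fields become empty strings.
            rows.append((
                record.get("id") or "",
                record.get("title") or "",
                record.get("closed_at") or "",
            ))
    # ISO-8601 timestamps sort correctly as plain strings.
    return sorted(rows, key=lambda row: row[2])


# Made-up records in JSON Lines form, for illustration only.
sample_lines = [
    '{"id": "bd-2", "title": "Add export command", "status": "closed", "closed_at": "2024-02-10T09:00:00Z"}',
    '{"id": "bd-3", "title": "Investigate flaky test", "status": "open", "closed_at": null}',
    '{"id": "bd-1", "title": "Initial scaffold", "status": "closed", "closed_at": "2024-01-04T12:30:00Z"}',
]
rows = closed_issues(sample_lines)
for issue_id, title, closed_at in rows:
    print(f"{issue_id}\t{title}\t{closed_at}")
```

With a real tracker, the lines would come from iterating over the opened `.beads/issues.jsonl` file.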
| Subagent | Purpose |
|---|---|
| subagents/history-researcher.md | Research one bounded historical slice and return changelog-ready findings |
| subagents/draft-auditor.md | Review a changelog draft for missing coverage, weak synthesis, and evidence issues |
Validate the skill itself:
./scripts/validate-skill.py .claude/skills/changelog-md-workmanship/
Trigger tests: SELF-TEST.md
This skill is intentionally larger than the usual "concise skill" target.
That is deliberate:
Install: npx claudepluginhub burningportra/claude-orchestrator --plugin agent-flywheel