From docs-sync
This skill should be used when the user says /sync-docs, asks to sync documentation, fetch library docs, download docs for offline/LLM use, cache external documentation, update cached docs, scrape documentation pages, set up docs sources, or initialize documentation sync. Syncs external documentation into docs/external/ as markdown with provenance tracking.
Slash command: `/docs-sync:sync-docs`
Fetch external documentation into `docs/external/` as markdown files with provenance frontmatter. Designed for LLM consumption — Claude can Read these files instead of fetching docs on-demand.
Change tracking: git diff on docs/external/ shows exactly what changed in upstream docs.
Arguments:
- `--init`: scaffold the `docs/sources.yaml` template and the `docs/external/` directory
- `{source-name}`: sync a single source by name from `docs/sources.yaml`

Check the argument:
- `--init` → go to Step 2 (Init Mode)
- `{source-name}` → go to Step 3 with filter

Step 2 (Init Mode): Check if `docs/sources.yaml` already exists. If it does, ask:
`docs/sources.yaml` already exists. Overwrite with template? (This won't delete existing `docs/external/` files.)
Create docs/sources.yaml:
# Documentation sources for /sync-docs
# Each source needs a `name` and exactly one fetch strategy:
# llms_txt — best option, single file with full docs (check if site has /llms.txt)
# sitemap — discover pages from sitemap.xml, optionally filter by path prefix
# base_url — crawl links from a root page (use with depth)
# urls — explicit list of page URLs
#
# Tip: If docs/external/ grows too large for git, add it to .gitignore
sources: []
# - name: dagster
#   llms_txt: https://docs.dagster.io/llms-full.txt
# - name: my-api
#   sitemap: https://docs.example.com/sitemap.xml
#   filter: /api/v2/
# - name: some-lib
#   base_url: https://docs.example.com/guides/
#   depth: 2
# - name: specific-pages
#   urls:
#     - https://docs.example.com/getting-started
#     - https://docs.example.com/api-reference
Create docs/external/.gitkeep (empty file).
Report:
Created `docs/sources.yaml` and `docs/external/`. Edit the config to add your documentation sources, then run `/sync-docs`.
Stop here — do not proceed to syncing.
Step 3: Read `docs/sources.yaml` using the Read tool.
Validate:
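- The file exists (if not, suggest `/sync-docs --init`)
- It has a `sources` array
- Each source has a `name` (kebab-case, unique across sources)
- Each source has exactly one of `llms_txt`, `sitemap`, `base_url`, or `urls`
- Optional fields: `filter` (with `sitemap`), `depth` (with `base_url`, default: 2)

If validation fails, report the specific error and stop.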
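The name rules above can be sketched in shell. This is a minimal sketch, assuming the source names have already been extracted from `docs/sources.yaml`, one per line; `is_kebab_case` and `duplicate_names` are hypothetical helper names, and allowing digits in a name is an assumption.

```shell
# Hypothetical helpers for the name checks; not part of the skill itself.

# Succeeds when the name is lowercase kebab-case, e.g. "my-api".
# Digits are allowed here, which is an assumption.
is_kebab_case() {
  printf '%s\n' "$1" | grep -Eq '^[a-z0-9]+(-[a-z0-9]+)*$'
}

# Reads one source name per line on stdin and prints every name that
# appears more than once. Empty output means all names are unique.
duplicate_names() {
  sort | uniq -d
}
```

For example, `printf 'dagster\nmy-api\ndagster\n' | duplicate_names` prints `dagster`, which would be reported as a validation error.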
If a {source-name} argument was provided, filter to that source only. If not found, list available sources and ask user to pick.
For each source in the validated list, determine the fetch strategy from the config key and execute the corresponding step:
- `llms_txt` → Step 5
- `sitemap` → Step 6
- `base_url` → Step 7
- `urls` → Step 8

After processing all sources, go to Step 9 (Report).
Step 5 (`llms_txt`): The simplest strategy. Fetches a single file (typically `/llms.txt` or `/llms-full.txt`).
Fetch the URL:
curl -sS -L -w '\n%{http_code}' -o /tmp/docs-sync-{name}.txt '{llms_txt_url}'
Check HTTP status (last line of output). If not 200, report error and skip this source.
Compute content hash:
shasum -a 256 /tmp/docs-sync-{name}.txt | cut -d' ' -f1
If docs/external/{name}/ already exists, read the existing file's frontmatter and compare content_hash. If unchanged, report "unchanged" and skip.
Check file size:
wc -c < /tmp/docs-sync-{name}.txt
If > 1MB, warn user:
`{name}` llms file is {size}MB. Large files may bloat the git repo. Consider adding `docs/external/` to `.gitignore` if this becomes an issue.
Determine output filename from the URL (e.g., llms-full.txt → llms-full.md, llms.txt → llms.md).
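One way to derive the filename, as a small sketch. `llms_output_filename` is a hypothetical name, and the URL is assumed to end in a `.txt` file name with no query string.

```shell
# Map an llms.txt URL to its markdown output filename.
llms_output_filename() {
  local base
  base="${1##*/}"                   # keep only the part after the last "/"
  printf '%s\n' "${base%.txt}.md"   # swap a trailing ".txt" for ".md"
}
```

Calling it with `https://docs.dagster.io/llms-full.txt` yields `llms-full.md`.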
Write to docs/external/{name}/{filename} with frontmatter:
---
source_url: {llms_txt_url}
fetched_at: {ISO 8601 UTC timestamp}
content_hash: sha256:{hash}
fetch_method: llms_txt
---
{file content}
Generate the timestamp with: date -u +%Y-%m-%dT%H:%M:%SZ
Step 6 (`sitemap`): Discovers pages from a sitemap.xml and lets the user choose which sections to fetch.
6a. Fetch and parse sitemap:
curl -sS -L '{sitemap_url}' -o /tmp/docs-sync-{name}-sitemap.xml
Extract URLs (macOS-compatible — BSD grep does not support -P):
grep -o '<loc>[^<]*</loc>' /tmp/docs-sync-{name}-sitemap.xml | sed 's/<loc>//g;s/<\/loc>//g'
If filter is specified, keep only URLs containing that path prefix.
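The extraction and the optional filter can be combined in one helper. This is a sketch: `extract_sitemap_urls` is a hypothetical name, and the filter is treated as a plain substring match, which is how "containing that path prefix" is read here.

```shell
# extract_sitemap_urls SITEMAP_FILE [FILTER]
# Prints one URL per line; with FILTER, only URLs containing that string.
extract_sitemap_urls() {
  local sitemap_file="$1" filter="${2:-}"
  local urls
  urls=$(grep -o '<loc>[^<]*</loc>' "$sitemap_file" | sed 's/<loc>//g;s/<\/loc>//g')
  if [ -n "$filter" ]; then
    # -F treats the filter as a literal string, not a regular expression.
    printf '%s\n' "$urls" | grep -F -- "$filter" || true
  else
    printf '%s\n' "$urls"
  fi
}
```

With the `my-api` example from the config template, the call would be `extract_sitemap_urls /tmp/docs-sync-my-api-sitemap.xml /api/v2/`.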
6b. Check for existing manifest:
If docs/external/{name}/_manifest.md exists, this is a re-sync. Read the manifest to know which pages were previously selected.
For re-sync:
Re-syncing `{name}`:
- {N} previously selected pages to check
- {M} new pages found: {list first 5}
- {K} pages removed from sitemap: {list}
Fetch the {M} new pages too?
For first-time sync, proceed to 6c.
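The new and removed page lists for a re-sync can be computed by comparing the URLs recorded in the manifest with the URLs currently in the sitemap. A minimal sketch, assuming the manifest uses the two-column table written in step 6f; all three function names are hypothetical.

```shell
# Print the source URLs recorded in a manifest table
# (rows shaped like "| slug.md | https://... |").
manifest_urls() {
  awk -F' *[|] *' '$3 ~ /^https?:\/\// { print $3 }' "$1"
}

# new_pages PREVIOUS_LIST CURRENT_LIST: URLs in the sitemap now that were
# not selected before. Each list file holds one URL per line.
new_pages() {
  grep -Fxv -f "$1" "$2" || true
}

# removed_pages PREVIOUS_LIST CURRENT_LIST: previously selected URLs that
# no longer appear in the sitemap.
removed_pages() {
  grep -Fxv -f "$2" "$1" || true
}
```

The counts {M} and {K} in the prompt above are then the line counts of the two outputs.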
6c. Group and present to user:
Group URLs by path sections (first 2-3 path segments after domain). Present:
Found {N} pages in sitemap for `{name}`:
/{section-1}/ ({count} pages)
- /{section-1}/page-a
- /{section-1}/page-b
- ... and {remaining} more
/{section-2}/ ({count} pages)
- ...
Which sections do you want to fetch? (all / list specific sections / skip)
Wait for user selection.
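The grouping in 6c could be done with `sed` and `awk`. This is a sketch, not the skill's own method: `group_by_section` is a hypothetical name, and a default of two leading path segments is an assumption within the "2-3 segments" range mentioned above.

```shell
# group_by_section [DEPTH]
# Reads URLs on stdin and prints "<count> <section>" per section, where a
# section is the first DEPTH path segments (default 2), excluding the page.
group_by_section() {
  local depth="${1:-2}"
  sed -E 's,^[a-z]+://[^/]+,,' |
    awk -F/ -v depth="$depth" '
      {
        section = "/"
        # Append leading segments, but never the final (page) segment.
        for (i = 2; i <= depth + 1 && i < NF; i++) section = section $i "/"
        count[section]++
      }
      END { for (s in count) print count[s], s }
    ' |
    sort -k2
}
```

Piping the extracted sitemap URLs through `group_by_section` gives the per-section counts shown in the prompt; pages at the site root are grouped under `/`.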
6d. Fetch selected pages:
For each selected URL, fetch via Jina Reader:
curl -sS -L --max-time 30 -H 'Accept: text/markdown' 'https://r.jina.ai/{url}' -o /tmp/docs-sync-page.md
If Jina returns an error (403, empty response, captcha), fall back to direct curl:
curl -sS -L --max-time 30 '{url}' -o /tmp/docs-sync-page.html
Warn the user that direct-curl output may have worse formatting.
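The fetch-then-fall-back logic might be wrapped like this. It is a sketch under assumptions: `fetch_page` is a hypothetical name, curl's `--fail` (`-f`) flag is added so HTTP errors produce a non-zero exit code, and only HTTP errors and empty responses trigger the fallback (captcha pages are not detected).

```shell
# fetch_page URL OUT_BASE
# Tries Jina Reader first and writes OUT_BASE.md; on failure falls back to
# a direct download into OUT_BASE.html. Prints which method succeeded.
fetch_page() {
  local url="$1" out_base="$2"
  if curl -sS -L -f --max-time 30 -H 'Accept: text/markdown' \
        "https://r.jina.ai/${url}" -o "${out_base}.md" &&
      [ -s "${out_base}.md" ]; then
    echo "jina"
    return 0
  fi
  echo "warning: Jina Reader failed for ${url}; using direct curl, formatting may be worse" >&2
  if curl -sS -L -f --max-time 30 "$url" -o "${out_base}.html"; then
    echo "direct"
    return 0
  fi
  return 1
}
```

A caller can branch on the printed value to decide which file to save and whether to repeat the formatting warning in the final report.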
6e. Save each page:
Generate slug from URL path:
- Replace `/` with `--`
- Strip `.html`, `.htm`, `.php` extensions

Write to `docs/external/{name}/{slug}.md` with frontmatter (`source_url`, `fetched_at`, `content_hash`, `fetch_method: sitemap`).
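The slug rules can be sketched with a single `sed` call. `slug_from_url` is a hypothetical name; dropping the scheme and domain, trimming trailing slashes, and using `index` for the site root are assumptions, while the `--` replacement and the stripped extensions come from the rules above.

```shell
# Turn a page URL into a filename slug.
slug_from_url() {
  local slug
  slug=$(printf '%s\n' "$1" | sed -E \
    -e 's,^[a-z]+://[^/]+/?,,' \
    -e 's,/+$,,' \
    -e 's,\.(html|htm|php)$,,' \
    -e 's,/,--,g')
  # The site root has an empty path; "index" is an assumed fallback name.
  printf '%s\n' "${slug:-index}"
}
```

For example, `https://docs.example.com/api/v2/users.html` becomes `api--v2--users`.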
6f. Write manifest:
Write docs/external/{name}/_manifest.md:
# {name} — Synced Documentation
Source: {sitemap_url}
Last sync: {timestamp}
Pages: {count}
| File | Source URL |
|------|-----------|
| {slug}.md | {url} |
| ... | ... |
Step 7 (`base_url`): Discovers pages by following links from a root page.
7a. Fetch root page:
curl -sS -L --max-time 30 -H 'Accept: text/markdown' 'https://r.jina.ai/{base_url}' -o /tmp/docs-sync-{name}-root.md
7b. Check for existing manifest (re-sync):
Same logic as Step 6b — if _manifest.md exists, re-fetch previously selected pages and present new discoveries.
7c. Extract and present links:
Read the fetched markdown content. Extract all markdown links that:
- are under `base_url`

Present the discovered structure to the user:
Crawled `{base_url}` and found {N} linked pages:
- {link_1_title or path} → {url}
- {link_2_title or path} → {url}
- ... (first 20, then "and {remaining} more")
Which pages do you want to fetch? (all / pick specific / skip)
Wait for user selection.
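Link extraction in 7c could be approximated with `grep`, `sed`, and `awk`. This is a sketch under assumptions: `extract_links` is a hypothetical name, only absolute `http(s)` links in `[text](url)` form are picked up, fragments are dropped, and "under `base_url`" is read as a plain prefix match.

```shell
# extract_links MARKDOWN_FILE BASE_URL
# Prints each distinct absolute link that starts with BASE_URL.
extract_links() {
  local md_file="$1" base_url="$2"
  # Match "](http...)" using bracket expressions, which behave the same
  # in BSD and GNU grep.
  grep -oE '[]][(]https?://[^)[:space:]]+[)]' "$md_file" |
    sed -E 's/^..//; s/.$//; s/#.*$//' |
    awk -v base="$base_url" 'index($0, base) == 1 && !seen[$0]++'
}
```

Relative links are ignored by this sketch, so a root page that uses them would need an extra resolution step.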
7d. Depth crawling:
Maintain a set of all discovered URLs to prevent cycles and duplicates.
For selected pages, fetch each. If depth > 1, extract NEW links from fetched pages (not already in the discovered set). Present new discoveries:
Depth 2: found {M} additional pages linked from fetched content:
- {new_link_1}
- ...
Fetch these too?
Continue until depth is reached or no new links are discovered.
Safety limit: If total pages exceed 50, ask user before continuing. This prevents runaway crawls on large documentation sites.
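The discovered set and the safety limit can be kept in a plain text file. A minimal sketch, assuming one URL per line; `add_new_links` and `over_page_limit` are hypothetical names, and the default of 50 mirrors the limit stated above.

```shell
# add_new_links DISCOVERED_FILE
# Reads candidate URLs on stdin, prints only those not seen before, and
# records them in DISCOVERED_FILE so later rounds skip them.
add_new_links() {
  local discovered_file="$1" url
  touch "$discovered_file"
  while IFS= read -r url; do
    [ -n "$url" ] || continue
    if ! grep -Fxq -- "$url" "$discovered_file"; then
      printf '%s\n' "$url" >> "$discovered_file"
      printf '%s\n' "$url"
    fi
  done
}

# over_page_limit DISCOVERED_FILE [LIMIT]
# Succeeds when more than LIMIT (default 50) pages have been discovered,
# meaning the user should be asked before the crawl continues.
over_page_limit() {
  local discovered_file="$1" limit="${2:-50}"
  [ "$(wc -l < "$discovered_file" | tr -d ' ')" -gt "$limit" ]
}
```

Each depth level pipes its extracted links through `add_new_links`, so a page linked from several places is offered to the user only once.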
7e. Save and write manifest:
Same as Step 6e and 6f — save each page with frontmatter, write _manifest.md.
Step 8 (`urls`): Simplest multi-page strategy, no interaction needed.
For each URL in the urls list:
If docs/external/{name}/{slug}.md exists, read frontmatter and get content_hash.
Fetch via Jina Reader:
curl -sS -L --max-time 30 -H 'Accept: text/markdown' 'https://r.jina.ai/{url}' -o /tmp/docs-sync-page.md
Compute hash. If unchanged, skip.
If Jina fails, fall back to direct curl (warn about formatting).
Write with frontmatter (fetch_method: explicit_urls).
Step 9 (Report): After processing all sources, present a summary:
Sync complete:
| Source | Status | Details |
|---|---|---|
| dagster | updated | llms-full.md changed (12 sections modified) |
| dlt | unchanged | hash match, skipped |
| clickup-api | new | 8 pages fetched |
| saldeo-api | error | HTTP 503 on 1/3 URLs |

Run `git diff docs/external/` to see what changed.
| Error | Behavior |
|---|---|
| `docs/sources.yaml` not found | Suggest `/sync-docs --init` |
| Source name not in config | List available source names |
| HTTP 4xx/5xx | Report URL + status, skip page, continue with rest |
| Empty response from Jina | Fall back to direct curl, warn about formatting |
| Jina blocked (403/captcha) | Fall back to direct curl, warn about formatting |
| curl timeout (>30s) | Report timeout, skip page, continue |
| Invalid YAML config | Report parse error with hint, stop |
Jina Reader API: Converts any URL to clean markdown. Usage: curl -sS -L -H 'Accept: text/markdown' 'https://r.jina.ai/{url}'. No API key required. No rate limits observed. Preserves code blocks, headers, tables, links.
llms.txt convention: Many documentation sites publish /llms.txt (index) and /llms-full.txt (full content) — purpose-built for LLM consumption. Always prefer these over scraping when available.
Config reference: See references/sources-yaml-schema.md for full config schema, examples, decision tree, and list of known llms.txt URLs.
npx claudepluginhub skacper/claude-docs-sync --plugin docs-sync