From hiivmind-corpus
Add documentation source to corpus. Triggers: "add source", "add git repo", "include blog posts", "add local documents", "extend corpus with web pages", "add team docs", "add PDF to corpus", "import PDF book", "split PDF into chapters".
Slash command: /hiivmind-corpus:hiivmind-corpus-add-source
Add a documentation source to an existing corpus. Detects source type, collects configuration, executes acquisition, and updates config.yaml.
A config.yaml must exist in the working directory. If not found, display:
No config.yaml found. Run hiivmind-corpus-init first.
Inputs: working directory
Outputs: computed.config, computed.is_first_source
Read config.yaml and parse it
If the sources array is empty → set computed.is_first_source
If type: self:
This is an embedded corpus (type: self). Additional sources cannot be added.
Embedded corpora index their own repository content only.
To add external sources, create a standalone corpus instead.
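The prerequisite checks above can be sketched as follows. This is a minimal illustration, not the skill's actual implementation: it assumes config.yaml has already been parsed into a dictionary and that `type` and `sources` are top-level keys. The function name `check_prerequisites` is hypothetical.

```python
from typing import Any, Dict


def check_prerequisites(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return the computed values described above, or raise for embedded corpora."""
    # Assumption: `type` is a top-level key of the parsed config.yaml.
    if config.get("type") == "self":
        raise ValueError(
            "This is an embedded corpus (type: self). "
            "Additional sources cannot be added."
        )
    # A missing or empty `sources` array means this is the first source.
    sources = config.get("sources") or []
    return {"config": config, "is_first_source": len(sources) == 0}
```

Raising an exception for the embedded case keeps the "cannot continue" path separate from the normal return value.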
Inputs: optional source_url from invocation
Outputs: computed.source_url, computed.source_type
Check the URL to auto-detect type:
Ends with .pdf or .PDF → go to PDF Detection (below)
Try {base_url}/llms.txt (then {base_url}/docs/llms.txt if 404) → source_type = llms-txt
Remaining outcomes: source_type = git, source_type = web, source_type = generated-docs
Before falling through to the generic type question, check if the source is an Obsidian vault:
For a URL pointing to a git repository:
Run gh api repos/{owner}/{repo}/contents/ to list root contents
If a .obsidian directory is present in the root → vault detected
Set source_type = obsidian
For a local path:
Check whether {path}/.obsidian/ exists
If it does, set source_type = obsidian
Reference: ${CLAUDE_PLUGIN_ROOT}/lib/corpus/patterns/sources/obsidian.md
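Two of the detection checks described above need no network access: the PDF suffix test and the local vault test. A hedged sketch of just those two follows. The function name `detect_special_source` and its return values are illustrative assumptions, and the remote `gh api` check is deliberately left out.

```python
from pathlib import Path
from typing import Optional


def detect_special_source(source: str) -> Optional[str]:
    """Return "pdf", "obsidian", or None when neither special case matches."""
    # PDF check: match the .pdf / .PDF suffixes named in the text.
    if source.endswith((".pdf", ".PDF")):
        return "pdf"
    # Local vault check: a vault has a .obsidian/ directory at its root.
    # For URLs this path does not exist, so the check simply returns False.
    if (Path(source) / ".obsidian").is_dir():
        return "obsidian"
    return None
```

A `None` result means the caller falls through to the remaining detection steps or to the generic type question.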
Ask: "What documentation would you like to add?"
| Option | Sets |
|---|---|
| Git repository | source_type = git |
| Local files | source_type = local |
| Web pages | source_type = web |
| llms.txt site | source_type = llms-txt |
| Obsidian vault | source_type = obsidian |
If user types a URL instead of selecting an option, store it as source_url and
re-enter the detection flow above.
If the URL/path ends with .pdf:
Follow lib/corpus/patterns/sources/pdf.md:
Use the pdf_utils.py building blocks
Write output to uploads/{source_id}/
Register a local source referencing the .md output files
Check that pymupdf is installed (python -c "import pymupdf") before running; if it is missing, prompt the user.
Branch based on computed.source_type. Each path collects what's needed and updates config.yaml.
See: lib/corpus/patterns/sources/git.md
If source_url not set, ask: "What's the git repository URL?"
Parse URL to extract owner and repo_name
Derive source_id from repo name (lowercase, alphanumeric + hyphens)
Ask branch (default: main) and docs root (default: docs/)
Validate:
source_url is set
source_id is not already in config.sources
Clone: git clone --depth 1 --branch {branch} {url} .source/{source_id}
Get SHA: git -C .source/{source_id} rev-parse HEAD
Add to config.yaml per ${CLAUDE_PLUGIN_ROOT}/lib/corpus/patterns/config-yaml-formatting.md § "Git Source Entry"
Substitute collected values: source_id, url, owner, repo_name, branch, docs_root, sha.
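The parsing, ID derivation, and uniqueness check in the git steps above can be sketched like this. The helper names (`parse_git_url`, `derive_source_id`, `ensure_unique`) are hypothetical, and the sketch assumes the existing source IDs have already been collected from config.sources into a set.

```python
import re
from typing import Set, Tuple
from urllib.parse import urlparse


def parse_git_url(url: str) -> Tuple[str, str]:
    """Extract (owner, repo_name) from an HTTPS or SSH-style git URL."""
    if url.startswith("git@"):
        # SSH form: git@host:owner/repo.git
        path = url.partition(":")[2]
    else:
        path = urlparse(url).path
    parts = [p for p in path.strip("/").split("/") if p]
    if len(parts) < 2:
        raise ValueError("Could not parse repository URL")
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return owner, repo


def derive_source_id(repo_name: str) -> str:
    """Lowercase the name and keep only alphanumerics and hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", repo_name.lower()).strip("-")


def ensure_unique(source_id: str, existing_ids: Set[str]) -> None:
    """Raise if the ID is already present among the configured sources."""
    if source_id in existing_ids:
        raise ValueError(f"Source ID '{source_id}' already exists")
```

The two error messages reuse the wording of the error table in this document, so a caller can surface them directly.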
See: lib/corpus/patterns/sources/local.md
Ask: "What should this local source be called? (used as ID)"
Ask: "Brief description of this source:"
Create directory: mkdir -p uploads/{source_id}
Add to config.yaml per ${CLAUDE_PLUGIN_ROOT}/lib/corpus/patterns/config-yaml-formatting.md § "Local Source Entry"
Substitute collected values: source_id, description.
Display: "Place your documents in uploads/{source_id}/. Supported formats: .md, .mdx, .pdf"
See: lib/corpus/patterns/sources/web.md
If source_url not set, ask: "What should this web source be called?"
Otherwise derive source_id from URL
Ask: "Brief description of this web source:"
Ask: "Enter the first URL to cache (you can add more later):"
Create directory: mkdir -p .cache/web/{source_id}
Fetch the URL content using WebFetch
Show preview (first ~500 chars) and ask: "Save this content to cache?"
If yes, save to .cache/web/{source_id}/{filename}.md
Add to config.yaml per ${CLAUDE_PLUGIN_ROOT}/lib/corpus/patterns/config-yaml-formatting.md § "Web Source Entry"
Substitute collected values: source_id, description, url, filename, timestamp.
See: lib/corpus/patterns/sources/llms-txt.md
If source_url not set, ask: "Enter the base URL for the llms.txt site:"
Then fetch manifest per llms-txt.md § "Fetch Manifest"
Parse manifest to extract title, sections, page count
Derive source_id from manifest title (or ask user if parsing fails)
Ask caching strategy: Selective (recommended) / Full / On-demand
Create directory: mkdir -p .cache/llms-txt/{source_id}
Hash manifest content for change detection
Add to config.yaml per ${CLAUDE_PLUGIN_ROOT}/lib/corpus/patterns/config-yaml-formatting.md § "llms-txt Source Entry"
Substitute collected values: source_id, manifest_url, sha256_hash, timestamp, base_url, strategy.
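For the llms.txt path, the manifest lookup order and the change-detection hash might look like the sketch below. It assumes the manifest has already been fetched as text; the function names are illustrative, and SHA-256 is inferred from the `sha256_hash` value named above.

```python
import hashlib
from typing import List


def manifest_candidates(base_url: str) -> List[str]:
    """Return manifest URLs in the order they should be tried."""
    base = base_url.rstrip("/")
    # Try the site root first, then fall back to /docs/ on a 404.
    return [f"{base}/llms.txt", f"{base}/docs/llms.txt"]


def hash_manifest(content: str) -> str:
    """Hex SHA-256 digest of the manifest text, used for change detection."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()
```

Comparing the stored digest with a freshly computed one is enough to tell whether the manifest changed, without diffing its contents.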
See: lib/corpus/patterns/sources/generated-docs.md
Ask: "What's the source repository URL (where docs are generated from)?"
Ask: "What's the published docs URL?"
Derive source_id from repo name
Clone source repo for SHA tracking: git clone --depth 1 {url} .source/{source_id}
Get SHA: git -C .source/{source_id} rev-parse HEAD
Add to config.yaml per ${CLAUDE_PLUGIN_ROOT}/lib/corpus/patterns/config-yaml-formatting.md § "Generated-Docs Source Entry"
Substitute collected values: source_id, source_repo_url, sha, web_base_url.
See: lib/corpus/patterns/sources/obsidian.md
If source_url not set (user selected "Obsidian vault" without URL), ask: "Where is the vault? (git URL or local path)"
Starts with https:// or git@ → git-backed:
Parse URL to extract owner and repo_name
Derive source_id from repo name (lowercase, alphanumeric + hyphens)
Clone: git clone --depth 1 {url} .source/{source_id}
Get SHA: git -C .source/{source_id} rev-parse HEAD
Add to config.yaml per ${CLAUDE_PLUGIN_ROOT}/lib/corpus/patterns/config-yaml-formatting.md § "Obsidian Source Entry"
Otherwise → local path:
Derive source_id from user input
Store vault_path as the absolute path to the vault directory
Add to config.yaml per ${CLAUDE_PLUGIN_ROOT}/lib/corpus/patterns/config-yaml-formatting.md § "Obsidian Source Entry"
Inputs: computed.source_id, computed.source_type
Outputs: computed.extraction_config (merged into config.yaml source entry)
After source type determination and before post-setup, offer extraction configuration.
Note: If the source type's extraction defaults are all false (e.g., web, llms-txt, pdf), present "No extraction" as the pre-selected default option.
Inform the user:
Extraction is available for this source. It extracts wikilinks, tags, and frontmatter
to build a concept graph (graph.yaml) alongside the index. This enables richer navigation.
Ask: "How would you like to configure extraction?"
| Option | Action |
|---|---|
| Enable with defaults | Use default extraction settings for this source type (see extraction.md) |
| Customize | Ask per-feature: wikilinks? frontmatter? tags? (y/n each) |
| No extraction | Skip — no extraction block added to config |
If "Enable with defaults":
Look up the defaults for computed.source_type in ${CLAUDE_PLUGIN_ROOT}/lib/corpus/patterns/extraction.md § "Extraction Config" defaults table
Add an extraction: block to the source's config.yaml entry with those defaults
If "Customize":
Build the extraction: block from responses and add to config.yaml source entry
If "No extraction":
Omit the extraction: block from config.yaml source entry
Reference: ${CLAUDE_PLUGIN_ROOT}/lib/corpus/patterns/extraction.md
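The three options above map onto a small decision function. The sketch below is an assumption-laden illustration: it treats the extraction block as a flat mapping of the three feature names to booleans, whereas the real schema and per-type defaults live in extraction.md and are passed in here rather than reproduced. The option strings and the name `build_extraction_block` are hypothetical.

```python
from typing import Dict, Optional

# Feature names taken from the "Customize" row of the options table.
FEATURES = ("wikilinks", "frontmatter", "tags")


def build_extraction_block(
    choice: str,
    defaults: Dict[str, bool],
    answers: Optional[Dict[str, bool]] = None,
) -> Optional[Dict[str, bool]]:
    """Return the extraction block to add, or None when it should be omitted."""
    if choice == "none":
        # "No extraction": no block is added to the source entry.
        return None
    if choice == "defaults":
        # "Enable with defaults": copy the defaults for this source type.
        return {name: bool(defaults.get(name, False)) for name in FEATURES}
    if choice == "customize":
        # "Customize": use the user's per-feature y/n answers.
        chosen = answers or {}
        return {name: bool(chosen.get(name, False)) for name in FEATURES}
    raise ValueError(f"Unknown extraction choice: {choice}")
```

Returning `None` instead of an empty mapping keeps "no extraction block" distinct from "extraction block with every feature disabled".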
Inputs: computed.source_id, computed.is_first_source
If computed.is_first_source is true, update navigate skill examples per
lib/corpus/patterns/sources/shared.md § "Update Navigate Skill Examples".
Ask: "Would you like to add entries from this source to the index now?"
If yes: "Run /hiivmind-corpus-build to analyze the source and create index entries."
Display:
Source '{source_id}' added successfully.
Type: {source_type}
Location: {.source/ or uploads/ or .cache/}{source_id}
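The Location line depends on the source type. One way to derive it from the directories used in the steps above is sketched below. The mapping covers only the types whose storage directory is stated explicitly in this document; Obsidian and PDF sources are left out, and the function name is hypothetical.

```python
# Directory prefixes as used by the acquisition steps in this document.
LOCATION_PREFIX = {
    "git": ".source",
    "generated-docs": ".source",
    "local": "uploads",
    "web": ".cache/web",
    "llms-txt": ".cache/llms-txt",
}


def format_success(source_id: str, source_type: str) -> str:
    """Build the three-line success message for a newly added source."""
    if source_type not in LOCATION_PREFIX:
        raise ValueError(f"No known location for source type: {source_type}")
    location = f"{LOCATION_PREFIX[source_type]}/{source_id}"
    return (
        f"Source '{source_id}' added successfully.\n"
        f"Type: {source_type}\n"
        f"Location: {location}"
    )
```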
| Error | Message | Recovery |
|---|---|---|
| No config.yaml | "No config.yaml found" | Run hiivmind-corpus-init |
| Invalid git URL | "Could not parse repository URL" | Check URL format |
| Clone failed | "Failed to clone repository" | Check URL and access |
| Config update failed | "Failed to update config.yaml" | Check file permissions |
| Web fetch failed | "Failed to fetch URL content" | Check URL accessibility |
| Source ID exists | "Source ID '{id}' already exists" | Choose different name |
Source-specific operations referenced by this skill:
${CLAUDE_PLUGIN_ROOT}/lib/corpus/patterns/config-yaml-formatting.md
${CLAUDE_PLUGIN_ROOT}/lib/corpus/patterns/sources/git.md
${CLAUDE_PLUGIN_ROOT}/lib/corpus/patterns/sources/local.md
${CLAUDE_PLUGIN_ROOT}/lib/corpus/patterns/sources/web.md
${CLAUDE_PLUGIN_ROOT}/lib/corpus/patterns/sources/llms-txt.md
${CLAUDE_PLUGIN_ROOT}/lib/corpus/patterns/sources/generated-docs.md
${CLAUDE_PLUGIN_ROOT}/lib/corpus/patterns/sources/pdf.md
${CLAUDE_PLUGIN_ROOT}/lib/corpus/patterns/sources/obsidian.md
${CLAUDE_PLUGIN_ROOT}/lib/corpus/patterns/extraction.md
${CLAUDE_PLUGIN_ROOT}/lib/corpus/patterns/sources/shared.md
${CLAUDE_PLUGIN_ROOT}/skills/hiivmind-corpus-init/SKILL.md
${CLAUDE_PLUGIN_ROOT}/skills/hiivmind-corpus-build/SKILL.md
${CLAUDE_PLUGIN_ROOT}/skills/hiivmind-corpus-enhance/SKILL.md
${CLAUDE_PLUGIN_ROOT}/skills/hiivmind-corpus-refresh/SKILL.md
${CLAUDE_PLUGIN_ROOT}/skills/hiivmind-corpus-graph/SKILL.md: View, validate, edit concept graphs
${CLAUDE_PLUGIN_ROOT}/skills/hiivmind-corpus-bridge/SKILL.md: Cross-corpus concept bridges and aliases