From gnosis
Ingests content into gnosis-mcp from local files, git history, or web crawls. Supports flags for embedding, pruning, wiping, and forcing re-ingest. Useful for populating a knowledge base for agent queries.
Slash command: /gnosis:ingest
One skill that covers every way to get content into gnosis-mcp. Routes
based on $ARGUMENTS:
- git <repo> → git-history ingest
- crawl <url> → web-crawl ingest
- reingest → full reset + re-ingest from the default path
- prune <path> → delete DB chunks whose source is gone

ingest <path>: local files

Default entry point. Handles .md, .txt, .ipynb, .toml, .csv, .json (+ .rst / .pdf if those extras are installed).
```bash
gnosis-mcp ingest ./docs --embed
```
--embed runs the bundled ONNX embedder (requires the [embeddings] extra). Without it you get keyword-only search, which is usually enough for dev-doc corpora (see bench-experiments). Still, --embed costs nothing on a first ingest and lets you switch to hybrid search later if you want it.

Pass --force to re-ingest regardless.

Chunk size defaults to 2000 characters (~600 tokens), the peak of the Feb 2026 sweep on a real dev-docs corpus. Override it per ingest or globally:
```bash
# This invocation only
GNOSIS_MCP_CHUNK_SIZE=1500 gnosis-mcp ingest ./docs --embed

# Persistent (put in shell profile)
export GNOSIS_MCP_CHUNK_SIZE=3000 # long-form blogs / ADRs
```
If you're unsure, run /gnosis:tune to sweep sizes against your own
golden queries.
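Before a real sweep, a rough estimate can show how much the chunk count moves with size. This is a hypothetical sketch, not part of gnosis-mcp: it assumes naive fixed-width splitting, counts only .md files, and uses bytes as a stand-in for characters (exact for ASCII text). The real chunker may split on document structure, so treat the numbers as ballpark only.

```bash
#!/usr/bin/env bash
# Hypothetical estimate of chunk count at a given chunk size.
# Assumes naive fixed-width splitting of .md files; bytes approximate characters.
estimate_chunks() {
  local dir=$1 size=${2:-2000} total=0 bytes f
  while IFS= read -r -d '' f; do
    bytes=$(wc -c < "$f")
    # ceil(bytes / size); an empty file contributes zero chunks
    total=$(( total + (bytes + size - 1) / size ))
  done < <(find "$dir" -type f -name '*.md' -print0)
  echo "$total"
}

# Usage: compare a few sizes before running /gnosis:tune for real
# for s in 1500 2000 3000; do echo "$s: $(estimate_chunks ./docs "$s")"; done
```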
When files are moved, deleted, or renamed, pick one:
```bash
# Safest: re-ingest + drop chunks for files that no longer exist
gnosis-mcp ingest ./docs --embed --prune

# Nuclear: drop everything first, then re-ingest
gnosis-mcp ingest ./docs --embed --wipe

# Preview what prune would delete
gnosis-mcp prune ./docs --dry-run
```
By default --prune leaves crawled URLs alone (since those don't
correspond to local files). Add --include-crawled if you want those
gone too.
ingest git <repo>: git commit history

Indexes each file's commit history as a searchable markdown document, so your agent can answer "why does this code exist?" queries.
```bash
gnosis-mcp ingest-git /path/to/repo --since 6m --embed
```
Common flags:

| Flag | Effect |
|---|---|
| --since 6m / --since 2025-01-01 | Window of commits to include |
| --until 2026-03-01 | Upper bound of the window |
| --author "alice@" | Filter by author name or email substring |
| --max-commits-per-file 20 | Per-file cap (default 10; most recent commits win) |
| --include "src/**" | Glob filter on touched files |
| --exclude "*.lock,package.json" | Skip noisy files |
| --include-merges | Include merge commits (excluded by default) |
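To picture roughly what one of these per-file documents contains, here is a minimal sketch built on plain git log. The function name and markdown layout are hypothetical. It only mirrors the documented defaults (newest commits first, 10 per file, merges excluded), not ingest-git's actual output.

```bash
#!/usr/bin/env bash
# Hypothetical approximation of a per-file "commit history as markdown" doc.
# Args: repo path, file path relative to the repo, max commits (default 10).
file_history_md() {
  local repo=$1 file=$2 max=${3:-10}
  printf '# History of %s\n\n' "$file"
  # Newest first, capped at $max, merge commits skipped (mirrors the
  # documented ingest-git defaults, not its implementation).
  git -C "$repo" log --no-merges --max-count="$max" \
    --date=short --format='- %ad %h %an: %s' -- "$file"
}

# Usage: file_history_md . src/app.py 20 > /tmp/app-history.md
```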
Each indexed doc's file_path is git-history/<original-path>.md.
Cross-file co-edits generate git_co_change edges; source-file
references get git_ref. Query them via
mcp__gnosis__search_git_history (or filter mcp__gnosis__get_related
by relation_type=git_co_change).
Re-run ingest-git whenever your history grows past the window you
already indexed.
ingest crawl <url>: web crawl

Indexes a documentation website. Requires the [web] extra (pip install 'gnosis-mcp[web]').
```bash
# Preferred: discover URLs from sitemap.xml
gnosis-mcp crawl https://docs.stripe.com --sitemap --embed

# No sitemap? BFS link crawl, one hop deep
gnosis-mcp crawl https://docs.example.com --max-depth 1 --embed

# Subset only
gnosis-mcp crawl https://docs.example.com --sitemap \
  --include "/docs/api/**" --exclude "*.pdf"

# Preview, don't fetch
gnosis-mcp crawl https://docs.example.com --dry-run
```
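Conceptually, sitemap discovery plus --include / --exclude comes down to pulling every loc entry out of sitemap.xml and filtering by URL path. This is a minimal, hypothetical sketch. It assumes shell-glob matching on the path (gnosis-mcp's exact glob semantics may differ) and a sitemap you have already downloaded.

```bash
#!/usr/bin/env bash
# Hypothetical sitemap discovery with include/exclude filtering.
# Assumption: patterns are shell globs matched against the URL path; in a
# case pattern * also matches "/", so "/docs/api/*" acts like "/docs/api/**".
sitemap_urls() {
  local sitemap=$1 include exclude url rest path
  include=${2:-"*"}
  exclude=${3:-}
  grep -o '<loc>[^<]*</loc>' "$sitemap" |
    sed -e 's#<loc>##' -e 's#</loc>##' |
    while IFS= read -r url; do
      rest=${url#*://}                                  # drop the scheme
      if [[ $rest == */* ]]; then path=/${rest#*/}; else path=/; fi
      case $path in $include) ;; *) continue ;; esac   # not included: skip
      if [ -n "$exclude" ]; then
        case $path in $exclude) continue ;; esac       # excluded: skip
      fi
      printf '%s\n' "$url"
    done
}

# Usage:
#   curl -fsSL https://docs.example.com/sitemap.xml -o sitemap.xml
#   sitemap_urls sitemap.xml "/docs/api/*" "*.pdf"
```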
Other flags:

| Flag | Effect |
|---|---|
| --max-pages 5000 | Safety cap |
| --force | Ignore the ETag / Last-Modified / hash cache |
Behaviour:

- Respects robots.txt. A same-host redirect on /robots.txt is treated as disallow (prevents spoofing).
- Keeps its ETag / Last-Modified / hash cache in ~/.local/share/gnosis-mcp/crawl-cache.json, so subsequent crawls issue conditional GETs and skip unchanged pages.
- Content extraction has a timeout, set via GNOSIS_MCP_CRAWL_EXTRACT_TIMEOUT_S.

Vendor docs strategy: crawl them once, commit the indexed SQLite database to version control, and you have offline, searchable vendor docs alongside your private docs. No Context7 subscription required.
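The content-hash part of that cache can be sketched offline: hash each fetched page body and skip the page when the hash matches the last run. This is hypothetical. The TSV cache file and the function names are assumptions for illustration and do not reflect the layout of crawl-cache.json.

```bash
#!/usr/bin/env bash
# Hypothetical content-hash cache: skip a page whose body hashes the same as
# last time. The TSV format (key<TAB>sha256) is an illustrative assumption.
hash_file() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1"
  else
    shasum -a 256 "$1"     # macOS fallback
  fi | cut -d' ' -f1
}

# content_changed CACHE KEY FILE
# Returns 0 (and records the new hash) if FILE's content differs from the
# hash cached for KEY or KEY is new; returns 1 if unchanged.
content_changed() {
  local cache=$1 key=$2 file=$3 new old
  new=$(hash_file "$file")
  touch "$cache"
  old=$(awk -F'\t' -v k="$key" '$1 == k { h = $2 } END { print h }' "$cache")
  [ "$new" = "$old" ] && return 1
  # Drop any stale entry for KEY, then append the fresh hash.
  { awk -F'\t' -v k="$key" '$1 != k' "$cache"; printf '%s\t%s\n' "$key" "$new"; } > "$cache.tmp"
  mv "$cache.tmp" "$cache"
}
```

For the validator half, curl can send the same kind of conditional GET: --etag-save / --etag-compare (curl 7.68 or later) handle If-None-Match, and -z FILE sends If-Modified-Since based on FILE's timestamp.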
ingest reingest: full reset

Drop everything, reinitialise, reindex. Use when:

- you changed the embedding settings (GNOSIS_MCP_EMBED_MODEL or _EMBED_DIM), so old vectors are now incompatible

init-db is idempotent, so rerunning it is safe.

```bash
gnosis-mcp init-db                       # ensure schema is current
gnosis-mcp ingest ./docs --embed --wipe  # delete everything + reingest
gnosis-mcp stats                         # confirm
```
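If you switch embedding settings often, a small guard can tell you when that full reset is needed. This is a hypothetical sketch, not a gnosis-mcp feature. The stamp file, its name, and the "unset" placeholder are all assumptions. It only compares GNOSIS_MCP_EMBED_MODEL / GNOSIS_MCP_EMBED_DIM with the values recorded after the last successful reingest.

```bash
#!/usr/bin/env bash
# Hypothetical guard: remember the embedding settings an index was built with
# and flag a full reset when they change (old vectors become incompatible).
embed_settings() {
  # "unset" means the variable is not set in this shell
  printf '%s:%s\n' "${GNOSIS_MCP_EMBED_MODEL:-unset}" "${GNOSIS_MCP_EMBED_DIM:-unset}"
}

needs_reingest() {   # $1 = stamp file
  [ ! -f "$1" ] || [ "$(cat "$1")" != "$(embed_settings)" ]
}

record_settings() {  # $1 = stamp file; call only after a successful reingest
  embed_settings > "$1"
}

# Usage:
#   stamp=.gnosis-embed-settings
#   if needs_reingest "$stamp"; then
#     gnosis-mcp init-db &&
#       gnosis-mcp ingest ./docs --embed --wipe &&
#       record_settings "$stamp"
#   fi
```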
ingest prune <path>: dead-chunk cleanup

Standalone prune, independent of re-ingest.
```bash
# What would go
gnosis-mcp prune ./docs --dry-run

# Delete chunks for files no longer on disk under ./docs
gnosis-mcp prune ./docs

# Also drop crawled URLs (normally spared)
gnosis-mcp prune ./docs --include-crawled
```
Safer than --wipe because it only deletes rows whose original
file_path resolved as a local file under the given root AND is now
missing. Crawled URLs, git-history docs (git-history/*), and any
path outside the root are untouched unless you explicitly opt in.
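The rule in the previous paragraph can be restated as a small predicate. This is a hypothetical sketch and not gnosis-mcp's code. In particular, it assumes that opting in with --include-crawled makes every crawled URL prunable.

```bash
#!/usr/bin/env bash
# Hypothetical restatement of the prune rule (not gnosis-mcp's code).
# would_prune ROOT FILE_PATH [yes]: exit 0 if the stored FILE_PATH would be
# deleted; pass "yes" as the third argument to mimic --include-crawled.
would_prune() {
  local root=${1%/} path=$2 include_crawled=${3:-no}
  case $path in
    http://*|https://*)
      [ "$include_crawled" = yes ]   # crawled URLs: spared unless opted in
      return ;;
    git-history/*)
      return 1 ;;                    # git-history docs: never local files
    "$root"/*) ;;                    # under the root: check the disk below
    *)
      return 1 ;;                    # outside the root: untouched
  esac
  [ ! -e "$path" ]                   # prune only if the file is now missing
}
```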
Skip the manual re-run loop — the server can watch a folder and re-ingest on file changes.
```bash
gnosis-mcp serve --watch ./docs --transport streamable-http --rest
```
The watcher uses mtime polling with a debounce, so it works on every OS without an fsnotify dependency. It suits docs-as-code repos where you push a doc and want it searchable from your editor within a few seconds.
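Mtime polling with a debounce is simple enough to sketch in a few lines of shell. This is hypothetical and not the server's implementation. It assumes GNU find (-printf), and the MAX_POLLS bound exists only so the sketch terminates.

```bash
#!/usr/bin/env bash
# Hypothetical mtime polling + debounce (not gnosis-mcp's implementation).
# Assumes GNU find (-printf), as on most Linux systems.

# Fingerprint of every file's mtime and path: changes on edit, add, delete, rename.
snapshot() {
  find "$1" -type f -printf '%T@ %p\n' 2>/dev/null | sort | cksum
}

# watch_dir DIR INTERVAL QUIET MAX_POLLS CMD [ARGS...]
# Poll DIR every INTERVAL seconds. After a change, wait until DIR has been
# quiet for QUIET seconds (debounce), then run CMD once.
watch_dir() {
  local dir=$1 interval=$2 quiet=$3 max=$4 polls=0 last current
  shift 4
  last=$(snapshot "$dir")
  while [ "$polls" -lt "$max" ]; do
    sleep "$interval"
    polls=$((polls + 1))
    current=$(snapshot "$dir")
    [ "$current" = "$last" ] && continue
    # Debounce: keep waiting while edits are still landing.
    while sleep "$quiet"; [ "$(snapshot "$dir")" != "$current" ]; do
      current=$(snapshot "$dir")
    done
    "$@"
    last=$current
  done
}

# Usage: watch_dir ./docs 2 3 100000 gnosis-mcp ingest ./docs --embed
```

Polling trades a little latency and CPU for portability. The debounce stops a burst of saves (for example, a git checkout) from triggering one re-ingest per file.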
Always run gnosis-mcp stats (or /gnosis:status stats) after a big ingest to confirm the document and chunk counts and, if you ingested with --embed, the embedding coverage:

```console
$ gnosis-mcp stats
Documents: 558
Chunks: 1,742
Embeddings: 1,742 / 1,742 (100.0 %)
Last access log entry: 2026-04-18 07:12 UTC
Backend: sqlite
```
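To make that check scriptable (say, in CI after a docs build), you could pipe the stats output into a small filter. This is a hypothetical sketch. It parses the "Embeddings: X / Y" line in the format shown in the sample output, so a change to that format would break it.

```bash
#!/usr/bin/env bash
# Hypothetical post-ingest check: pipe `gnosis-mcp stats` into it.
# Exit codes: 0 complete, 1 missing embeddings, 2 no Embeddings line.
check_embeddings() {
  awk '
    /^ *Embeddings:/ {
      line = $0
      sub(/^[^:]*: */, "", line)      # "1,742 / 1,742 (100.0 %)"
      gsub(/,/, "", line)             # "1742 / 1742 (100.0 %)"
      split(line, f, " */ *")         # f[1] = "1742", f[2] = "1742 (100.0 %)"
      have = f[1] + 0; total = f[2] + 0   # numeric prefix only
      found = 1
    }
    END {
      if (!found) { print "no Embeddings line found" > "/dev/stderr"; exit 2 }
      if (have < total) { printf "missing embeddings: %d of %d\n", total - have, total; exit 1 }
      print "embeddings complete: " have "/" total
    }'
}

# Usage: gnosis-mcp stats | check_embeddings
```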
See also:

- /gnosis:tune (chunk-size sweep on your own corpus)
- /gnosis:status (connectivity + DB health)
- /gnosis:search (query the index you just populated)
- GNOSIS_MCP_* env var