From data-annotation
Stage raw data from a GitHub repo, local path, or remote URL into a working directory so downstream prep skills can operate on it. Use when the user provides a data source and wants it pulled in, or when `shape-dataset` needs to ingest before profiling.
Slash command
/data-annotation:ingest-source
Unified entry point for getting raw data onto disk in a known location. Does not transform the data — just stages it and reports what was staged.
DATA_ROOT="${CLAUDE_USER_DATA:-${XDG_DATA_HOME:-$HOME/.local/share}/claude-plugins}/data-annotation"
WORKSPACE="$DATA_ROOT/workspaces/<dataset-name>/raw"
If the user prefers a path they own (e.g. ~/repos/... or ~/Documents/), use that and store the pointer in $DATA_ROOT/config.json.
GitHub repo (github.com/<owner>/<repo> or <owner>/<repo>)
Clone shallow:
git clone --depth 1 https://github.com/<owner>/<repo>.git "$WORKSPACE"
Then scan the working tree for data files (.csv, .tsv, .json, .jsonl, .parquet, .arrow, .txt, .md, .yaml, .xml, .xlsx, image/audio dirs). Report a tree of candidates by extension with byte sizes. Do not assume the whole repo is data — README and source code are common.
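A minimal sketch of the scan step, assuming a POSIX shell with find, wc, and awk available. The function names list_candidates and summarize_by_extension are hypothetical, the .git directory is skipped, and image/audio directories are not covered here.

```shell
# list_candidates DIR
# Print "<bytes><TAB><path>" for each likely data file under DIR.
# The extension list mirrors the one in the text above.
list_candidates() {
  find "$1" -path '*/.git' -prune -o -type f \( \
      -name '*.csv' -o -name '*.tsv' -o -name '*.json' -o -name '*.jsonl' \
      -o -name '*.parquet' -o -name '*.arrow' -o -name '*.txt' -o -name '*.md' \
      -o -name '*.yaml' -o -name '*.xml' -o -name '*.xlsx' \) -print |
  while IFS= read -r f; do
    # wc output may be space-padded on some systems, so strip the padding
    printf '%s\t%s\n' "$(wc -c < "$f" | tr -d ' ')" "$f"
  done
}

# summarize_by_extension DIR
# Group the candidates by extension and total their sizes.
summarize_by_extension() {
  list_candidates "$1" | awk -F '\t' '
    { n = split($2, parts, "."); ext = parts[n]; count[ext]++; bytes[ext] += $1 }
    END { for (e in count) printf ".%s\t%d files\t%d bytes\n", e, count[e], bytes[e] }' | sort
}
```

Calling summarize_by_extension "$WORKSPACE" gives a per-extension overview, which makes it easier to tell actual data apart from README files and source code.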
GitHub blob URL (github.com/<owner>/<repo>/blob/<ref>/<path>)
Convert to raw URL and download:
curl -L https://raw.githubusercontent.com/<owner>/<repo>/<ref>/<path> -o "$WORKSPACE/<basename>"
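The conversion can be sketched as a small text rewrite, assuming sed with extended regular expressions (-E). The function name blob_to_raw is hypothetical, and URLs that carry a query string or fragment are not handled.

```shell
# blob_to_raw URL
# Rewrite github.com/<owner>/<repo>/blob/<ref>/<path> into the
# raw.githubusercontent.com/<owner>/<repo>/<ref>/<path> form.
# Input that does not match the blob pattern is printed unchanged.
blob_to_raw() {
  printf '%s\n' "$1" |
    sed -E 's#^(https?://)?github\.com/([^/]+)/([^/]+)/blob/(.+)$#https://raw.githubusercontent.com/\2/\3/\4#'
}

# Usage (not run here):
#   curl -L "$(blob_to_raw "$url")" -o "$WORKSPACE/$(basename "$url")"
```

Because the rewrite only drops the /blob/ segment, refs that contain slashes pass through without needing to be parsed.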
Local path
Copy (don't move) the contents into $WORKSPACE/. Preserve directory structure.
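One way to do the copy, as a sketch assuming a POSIX cp. The function name stage_local is hypothetical.

```shell
# stage_local SRC DEST
# Copy SRC into DEST without touching the original.
# For a directory, "SRC/." copies its contents (including dotfiles)
# and keeps the subdirectory layout intact.
stage_local() {
  src="$1"; dest="$2"
  mkdir -p "$dest" || return 1
  if [ -d "$src" ]; then
    cp -R "$src"/. "$dest"/
  else
    cp "$src" "$dest"/
  fi
}

# Usage (not run here):
#   stage_local "$HOME/some/data-dir" "$WORKSPACE"
```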
Direct file URL
Download:
curl -L -o "$WORKSPACE/<basename>" <url>
Verify content-length and content-type after download.
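A sketch of the verification step, assuming the response headers were saved during the download with curl's -D (--dump-header) option. The helper names header_value and check_download and the .headers file name are hypothetical.

```shell
# header_value NAME
# Read HTTP response headers on stdin and print the last value of NAME.
# The last value wins because -L can produce one header block per redirect.
header_value() {
  awk -v name="$1" '
    {
      line = $0
      sub(/\r$/, "", line)
      i = index(line, ":")
      if (i > 0 && tolower(substr(line, 1, i - 1)) == tolower(name)) {
        value = substr(line, i + 1)
        sub(/^[ \t]+/, "", value)
      }
    }
    END { if (value != "") print value }'
}

# check_download FILE HEADERS
# Return 0 if FILE's size matches Content-Length, 1 on a mismatch,
# and 2 if the server did not report a length.
check_download() {
  file="$1"; headers="$2"
  expected="$(header_value content-length < "$headers")"
  actual="$(wc -c < "$file" | tr -d ' ')"
  echo "content-type: $(header_value content-type < "$headers")"
  if [ -z "$expected" ]; then
    echo "content-length: not reported by server (got $actual bytes)"
    return 2
  elif [ "$expected" -eq "$actual" ]; then
    echo "content-length: $actual bytes, matches"
  else
    echo "content-length: expected $expected bytes, got $actual" >&2
    return 1
  fi
}

# Usage (not run here):
#   curl -L -D "$WORKSPACE/.headers" -o "$WORKSPACE/<basename>" <url>
#   check_download "$WORKSPACE/<basename>" "$WORKSPACE/.headers"
```

An unexpected content-type such as text/html on what should be a CSV usually means an error or login page was saved instead of the data, so it is worth noting for the user.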
Archive URL (.zip, .tar.gz, .tar.bz2, .7z)
Download, then extract into $WORKSPACE/. After extraction, delete the archive unless the user wants to keep it.
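The extraction step might look like the sketch below, which assumes unzip, tar, and 7z are installed for the formats they handle. The function name extract_archive and its "keep" argument are hypothetical.

```shell
# extract_archive ARCHIVE DEST [keep]
# Extract ARCHIVE into DEST, choosing the tool by file extension.
# The archive is deleted only after a successful extraction,
# and only when the third argument is not "keep".
extract_archive() {
  archive="$1"; dest="$2"; keep="${3:-}"
  mkdir -p "$dest" || return 1
  case "$archive" in
    *.zip)     unzip -q "$archive" -d "$dest" ;;
    *.tar.gz)  tar -xzf "$archive" -C "$dest" ;;
    *.tar.bz2) tar -xjf "$archive" -C "$dest" ;;
    *.7z)      7z x -y "$archive" -o"$dest" >/dev/null ;;
    *)         echo "unsupported archive: $archive" >&2; false ;;
  esac || return 1
  [ "$keep" = "keep" ] || rm -f -- "$archive"
}

# Usage (not run here):
#   extract_archive "$WORKSPACE/<basename>" "$WORKSPACE"        # delete afterwards
#   extract_archive "$WORKSPACE/<basename>" "$WORKSPACE" keep   # keep the archive
```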
Write $WORKSPACE/_ingest.json with:
- source: original source string
- source_type: one of github-repo, github-blob, local, url-file, url-archive
- staged_at: ISO timestamp
- tree: list of files with sizes
- notes: anything the user should be aware of (binary blobs, unusual extensions, archive layout)
Then briefly summarize for the user: "Staged N files, total X MB, candidates appear to be ..." and hand off to shape-dataset (or whatever the user wants next).
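A rough sketch of writing the manifest in plain shell. The function names write_manifest and json_escape are hypothetical, and the shape of each tree entry (an object with "path" and "size" in bytes) is an assumption, since the text only asks for a list of files with sizes. File names containing newlines or control characters are not handled.

```shell
# json_escape STRING
# Escape backslashes and double quotes for use inside a JSON string.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# write_manifest WORKSPACE SOURCE SOURCE_TYPE [NOTES]
# Write WORKSPACE/_ingest.json describing what was staged.
write_manifest() {
  workspace="$1"; src="$2"; src_type="$3"; notes="${4:-}"
  {
    printf '{\n'
    printf '  "source": "%s",\n' "$(json_escape "$src")"
    printf '  "source_type": "%s",\n' "$(json_escape "$src_type")"
    printf '  "staged_at": "%s",\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
    printf '  "tree": ['
    sep=''
    # Paths are relative to the workspace; the manifest and .git are skipped.
    (cd "$workspace" && find . -type f ! -name '_ingest.json' ! -path './.git/*' | sort) |
    while IFS= read -r f; do
      size="$(wc -c < "$workspace/$f" | tr -d ' ')"
      printf '%s\n    {"path": "%s", "size": %s}' "$sep" "$(json_escape "${f#./}")" "$size"
      sep=','
    done
    printf '\n  ],\n'
    printf '  "notes": "%s"\n' "$(json_escape "$notes")"
    printf '}\n'
  } > "$workspace/_ingest.json"
}

# Usage (not run here):
#   write_manifest "$WORKSPACE" "<owner>/<repo>" github-repo "README and source code present"
```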
If git clone fails, suggest gh repo clone (which uses the user's gh auth) instead of asking for credentials.
Install the plugin with:
npx claudepluginhub danielrosehill/claude-code-plugins --plugin data-annotation