From data-annotation
Set up a Hugging Face dataset repository: create the remote repo (asking the user whether it should be public or private), copy the prepared data over, generate a dataset card, and push. Uses the huggingface-cli, not an MCP. Use when the user says "set up a HF dataset", "publish to Hugging Face", "create the HF dataset repo", or after annotation/prep is complete.
Slash command: /data-annotation:hf-setup
End-to-end setup of an HF dataset repository from a prepared local dataset. Encompasses creation, data copy, dataset card, and push.
Prerequisite: huggingface-cli installed and logged in. Check with huggingface-cli whoami. If not logged in, instruct the user to run huggingface-cli login (don't try to do it programmatically).
Input: <workspace>/final/ from shape-dataset, optionally enriched with annotation outputs from scaffold-annotation-env.
License: mit, apache-2.0, cc-by-4.0, cc-by-sa-4.0, cc0-1.0, or other. Ask if unclear.
Create the remote repo:
huggingface-cli repo create <name> --type dataset [--private] [--organization <org>]
Use --private if the user chose private. Capture the resulting repo URL (https://huggingface.co/datasets/<owner>/<name>).
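A minimal POSIX sh sketch of the checks before creating the repo: confirm the login, validate the license against the list above, and assemble the create command and dataset URL. The function names and the HF_CLI override are hypothetical. The login check looks for the text "Not logged in" in addition to the exit status, on the assumption that this is what the CLI prints when no token is stored; verify that against your installed version. The flags are copied from the command above; check `huggingface-cli repo create --help`, since they can differ between CLI releases.

```sh
# Sketch of the pre-create checks. HF_CLI and all function names are hypothetical.
HF_CLI="${HF_CLI:-huggingface-cli}"

# True only if `whoami` succeeds and does not report a missing login.
# Assumption: the CLI prints "Not logged in" when no token is stored.
hf_logged_in() {
  out=$("$HF_CLI" whoami 2>&1) || return 1
  case "$out" in
    *"Not logged in"*) return 1 ;;
  esac
  return 0
}

# Accept only the license identifiers this skill offers.
is_known_license() {
  case "$1" in
    mit|apache-2.0|cc-by-4.0|cc-by-sa-4.0|cc0-1.0|other) return 0 ;;
    *) return 1 ;;
  esac
}

# Print (not run) the create command. $2 is "private" or "public"; $3 is an optional org.
repo_create_cmd() {
  name="$1"; visibility="$2"; org="$3"
  cmd="$HF_CLI repo create $name --type dataset"
  [ "$visibility" = "private" ] && cmd="$cmd --private"
  [ -n "$org" ] && cmd="$cmd --organization $org"
  printf '%s\n' "$cmd"
}

# URL the skill reports back: https://huggingface.co/datasets/<owner>/<name>
dataset_url() {
  printf 'https://huggingface.co/datasets/%s/%s\n' "$1" "$2"
}
```

Printing the command instead of running it lets the user confirm the public/private choice before anything is created on the Hub.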
If the workspace doesn't already have a git repo for the dataset, run init-dataset-repo first. Otherwise reuse it.
cd <dataset-repo>
git lfs install
huggingface-cli lfs-enable-largefiles .
git remote add origin https://huggingface.co/datasets/<owner>/<name>
Configure .gitattributes for LFS on *.parquet, *.arrow, *.json, *.jsonl over a size threshold, and any media files.
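.gitattributes patterns cannot express a file-size condition, so one way to implement the rule above is to track Parquet, Arrow, and media files by extension and to list each large JSON/JSONL file by its path. This sketch reads the size threshold as applying to the JSON formats only; the 10 MiB default, the media extension list, and the `lfs_gitattributes` name are assumptions, not values from this skill. It also assumes file paths contain no spaces.

```sh
# Sketch: print .gitattributes lines for Git LFS.
# LFS_MEDIA_EXTS and the default threshold are assumptions.
LFS_LINE_SUFFIX='filter=lfs diff=lfs merge=lfs -text'
LFS_MEDIA_EXTS="${LFS_MEDIA_EXTS:-png jpg jpeg wav mp3 mp4}"

# Usage: lfs_gitattributes <repo-dir> [threshold-bytes]
lfs_gitattributes() {
  repo="$1"
  threshold="${2:-10485760}"   # assumed default: 10 MiB

  # Columnar binary formats and media: track by extension.
  # $LFS_MEDIA_EXTS is left unquoted on purpose so it splits into words.
  for ext in parquet arrow $LFS_MEDIA_EXTS; do
    printf '*.%s %s\n' "$ext" "$LFS_LINE_SUFFIX"
  done

  # JSON/JSONL: .gitattributes has no size predicate, so list each file
  # larger than the threshold by its repo-relative path.
  (cd "$repo" && find . -type f \( -name '*.json' -o -name '*.jsonl' \) \
      -size +"$threshold"c ! -path './.git/*' | sed 's|^\./||' | sort) |
  while IFS= read -r path; do
    printf '%s %s\n' "$path" "$LFS_LINE_SUFFIX"
  done
}
```

Append the output with `lfs_gitattributes . >> .gitattributes` and commit it before adding the data files. `git lfs track "*.parquet"` writes an equivalent extension line if you prefer to go through the LFS CLI.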
Lay out the dataset on disk in the conventional HF structure:
<dataset-repo>/
├── README.md # the dataset card (next step)
├── data/
│ ├── train.parquet # or .jsonl
│ ├── validation.parquet
│ └── test.parquet
└── LICENSE
If the prepared data is in a different format/layout, convert it. Prefer Parquet for tabular, JSONL for variable-shape records.
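Before writing the card, a quick check that the files actually match the layout above can catch a missed conversion. This hypothetical `check_layout` helper only looks for the split names and extensions shown in the tree; treating "at least one split file present" as the pass condition, and a missing LICENSE as a warning rather than an error, are assumptions.

```sh
# Sketch: report which expected split files exist under <repo>/data/.
# check_layout is a hypothetical helper, not part of any HF tooling.
check_layout() {
  repo="$1"
  found=0
  for split in train validation test; do
    for ext in parquet jsonl; do
      if [ -f "$repo/data/$split.$ext" ]; then
        echo "found: data/$split.$ext"
        found=1
      fi
    done
  done
  # LICENSE is part of the layout above; only warn, since it may be written later.
  [ -f "$repo/LICENSE" ] || echo "warning: LICENSE not found" >&2
  if [ "$found" -eq 0 ]; then
    echo "error: no split files under $repo/data/" >&2
    return 1
  fi
  return 0
}
```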
Write README.md with the YAML frontmatter HF expects:
---
license: <license>
task_categories:
- <task>
language:
- en
size_categories:
- <auto>
pretty_name: <Pretty Name>
tags:
- <tag>
---
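The frontmatter can be filled from a template once the total row count is known. The sketch below maps a row count onto a size_categories bucket and prints the block above with one task and one tag. The bucket labels are assumed to match the Hub's size_categories values (verify against the Hub's dataset card documentation), the boundary handling (exactly 1,000 rows lands in 1K<n<10K) is an assumption, and values are inserted unquoted, so quote any that contain YAML special characters such as `:`. Counting rows is left to the caller: `wc -l` works for JSONL with one record per line, while Parquet needs a reader.

```sh
# Sketch: map a total row count to a size_categories label (assumed Hub values).
size_category() {
  n="$1"
  if   [ "$n" -lt 1000 ]; then echo "n<1K"
  elif [ "$n" -lt 10000 ]; then echo "1K<n<10K"
  elif [ "$n" -lt 100000 ]; then echo "10K<n<100K"
  elif [ "$n" -lt 1000000 ]; then echo "100K<n<1M"
  elif [ "$n" -lt 10000000 ]; then echo "1M<n<10M"
  elif [ "$n" -lt 100000000 ]; then echo "10M<n<100M"
  elif [ "$n" -lt 1000000000 ]; then echo "100M<n<1B"
  elif [ "$n" -lt 10000000000 ]; then echo "1B<n<10B"
  elif [ "$n" -lt 100000000000 ]; then echo "10B<n<100B"
  elif [ "$n" -lt 1000000000000 ]; then echo "100B<n<1T"
  else echo "n>1T"
  fi
}

# Hypothetical helper: print the frontmatter template with one task and one tag.
# Usage: write_frontmatter <license> <task> <row-count> <pretty-name> <tag>
write_frontmatter() {
  license="$1"; task="$2"; rows="$3"; pretty="$4"; tag="$5"
  cat <<EOF
---
license: $license
task_categories:
- $task
language:
- en
size_categories:
- $(size_category "$rows")
pretty_name: $pretty
tags:
- $tag
---
EOF
}
```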
Below the frontmatter, generate sections from the workspace artifacts:
- schema.json, columns, splits, sizes (read from the actual files).
- pii-scanner run, if any.
Anything the workspace doesn't have an answer for should be a clearly-marked <!-- TODO --> rather than a fabricated detail.
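One way to follow the no-fabrication rule mechanically is to build each card section from an artifact file when it exists and otherwise emit the TODO marker. `card_section` and the file paths passed to it are hypothetical. It copies a pre-written Markdown file verbatim, so structured inputs such as schema.json would need to be rendered into Markdown first.

```sh
# Hypothetical helper: emit a card section from an artifact, or a TODO marker.
# Usage: card_section <title> <markdown-file>
card_section() {
  title="$1"; source_file="$2"
  printf '## %s\n\n' "$title"
  if [ -s "$source_file" ]; then
    # Non-empty artifact: copy it as-is.
    cat "$source_file"
  else
    # Missing or empty artifact: leave a visible placeholder instead of guessing.
    printf '<!-- TODO: %s not found in the workspace -->\n' "$title"
  fi
  printf '\n'
}
```

For example, `card_section "PII scan" "$WORKSPACE/pii-report.md" >> README.md`, where the report path is an assumption about where the scanner output lives.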
git add -A
git commit -m "Initial dataset upload"
git push origin main
Report back the dataset URL and remind the user that the card preview may take a minute to render on the Hub.
Install the plugin with: npx claudepluginhub danielrosehill/claude-code-plugins --plugin data-annotation