From data-annotation
Create a self-contained annotation workspace for a prepared dataset, including a locked annotation schema and ready-to-run boilerplate for batch inference via the Gemini API. Use when the user says "set up annotation", "create annotation environment", "scaffold annotation", or after `shape-dataset` has produced a clean reshaped dataset.
How this skill is triggered — by the user, by Claude, or both
Slash command
/data-annotation:scaffold-annotation-envThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Produces a workspace where annotation runs can be executed, reviewed, and iterated. The annotation engine is **Gemini batch inference** (cost/throughput driven choice), with Python boilerplate using the official `google-genai` SDK.
Produces a workspace where annotation runs can be executed, reviewed, and iterated. The annotation engine is Gemini batch inference (cost/throughput driven choice), with Python boilerplate using the official google-genai SDK.
id field (typically the output of shape-dataset).schema-designer subagent to produce schema.json before scaffolding.GEMINI_API_KEY in env. If missing, tell the user where to set it; don't proceed silently.Inside the user's annotation workspace (default: <dataset-workspace>/annotation/):
annotation/
├── README.md # how to run, edit prompts, review outputs
├── schema.json # the locked annotation schema (copied from prep)
├── prompts/
│ └── annotator.md # system + user prompt templates referencing schema.json
├── input/
│ └── tasks.jsonl # one task per line, {id, ...fields}
├── output/ # batch results written here
├── reviewed/ # post-review outputs
├── scripts/
│ ├── run_batch.py # Gemini batch inference runner
│ ├── poll_batch.py # poll job status, download results
│ └── validate_outputs.py# schema-validate every annotation
├── .env.example # GEMINI_API_KEY=...
├── pyproject.toml # uv-managed deps: google-genai, pydantic, jsonschema
└── .gitignore # output/, reviewed/, .env
scripts/run_batch.pyA runnable script that:
schema.json and prompts/annotator.md.input/tasks.jsonl.client.batches.create(...) using the google-genai SDK.output/batch-<timestamp>.json.Use Gemini's structured-output / response-schema feature to constrain outputs to the annotation schema — don't rely on prompt-only JSON contracts.
Keep the script readable. The user will tweak prompts and re-run; clarity beats cleverness.
scripts/poll_batch.pyPolls the most recent batch job, downloads results when complete, and writes per-task outputs to output/<id>.json. Reports progress on each poll.
scripts/validate_outputs.pyWalks output/, validates each JSON against schema.json using jsonschema, and reports failures. This catches cases where the model returned malformed output despite the response schema.
prompts/annotator.mdTwo sections — system prompt and user prompt template. The system prompt summarizes the task, the label set from schema.json, and the guidelines (including any edge cases captured by schema-designer). The user prompt is a template referencing the task fields.
README.mdTells the user how to: install deps (uv sync), set the API key, run a small smoke batch (e.g. 5 tasks) before the full run, poll, validate, and hand off to review-annotations (the subagent) for human/LLM-in-the-loop review.
id already has an output..env only, .env.example checked in, real .env ignored.npx claudepluginhub danielrosehill/claude-code-plugins --plugin data-annotationSearches MemPalace before answering questions about past work, people, projects, or prior decisions. Returns verbatim stored content instead of guessing from model memory.
Guides Payload CMS config (payload.config.ts), collections, fields, hooks, access control, APIs. Debugs validation errors, security, relationships, queries, transactions, hook behavior.
Implements vector databases with Pinecone, Weaviate, Qdrant, Milvus, pgvector for semantic search, RAG, recommendations, and similarity systems. Optimizes embeddings, indexing, and hybrid search.