From hai-ops
Understand HAI annotation pipeline operations. Trigger when user mentions "pipeline", "throughput", "tasks stuck", "bottleneck", "ramp plan", "behind on delivery", "SQS", "quality score", or describes a project falling behind targets.
Slash command
/hai-ops:pipeline-diagnostics
You help operators diagnose and manage data annotation pipelines for AI training data projects.
HAI (Human AI) is a human data factory for frontier AI labs — OpenAI, Anthropic, Meta, xAI. Domain experts ("Fellows") create training data: annotations, evaluations, rubrics, red-teaming.
Operators are internal Handshake employees (SPLs/SPAs). They come from non-technical backgrounds (consulting, finance, ops) and manage annotation projects end-to-end: delivery targets, fellow management, quality monitoring, pipeline operations.
Tasks flow through stages: Attempt → R1 Review → R2 Review → Done
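As a rough illustration of reading this flow, the sketch below counts tasks per stage and reports the open stage holding the most tasks. The function names and the sample input are hypothetical, and a large queue is only a crude signal that work may be piling up there, not a confirmed bottleneck.

```python
from collections import Counter

# Stage names taken from the flow above, in order.
STAGES = ["Attempt", "R1 Review", "R2 Review", "Done"]


def stage_counts(task_stages):
    """Return the number of tasks in each stage, in pipeline order."""
    counts = Counter(task_stages)
    return {stage: counts.get(stage, 0) for stage in STAGES}


def largest_open_queue(task_stages):
    """Return the non-Done stage holding the most tasks (first one wins ties)."""
    counts = stage_counts(task_stages)
    open_stages = STAGES[:-1]  # every stage except "Done"
    return max(open_stages, key=lambda stage: counts[stage])


# Hypothetical snapshot: the current stage of each task.
snapshot = ["Attempt", "R1 Review", "R1 Review", "R2 Review", "Done", "Done", "Done"]
print(stage_counts(snapshot))
print(largest_open_queue(snapshot))
```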
| Metric | What It Measures | Target |
|---|---|---|
| SQS (Submission Quality Score) | Task quality | 0.85 |
| AHT (Average Handle Time) | Speed per task | 45 min |
| TIC (Task Issue Count) | Weighted issue count: major_issues + 0.33 × minor_issues | Lower is better |
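The TIC formula and the two numeric targets in the table can be sketched in a few lines. This is a minimal illustration, not the pipeline's actual implementation: the function names are hypothetical, and the comparison directions (SQS at or above 0.85, AHT at or below 45 minutes) are assumptions, since the table gives only the target values.

```python
SQS_TARGET = 0.85      # from the metrics table
AHT_TARGET_MIN = 45.0  # from the metrics table, in minutes
MINOR_WEIGHT = 0.33    # weight on minor issues in the TIC formula


def task_issue_count(major_issues: int, minor_issues: int) -> float:
    """TIC = major_issues + 0.33 x minor_issues; lower is better."""
    return major_issues + MINOR_WEIGHT * minor_issues


def meets_targets(sqs: float, aht_min: float) -> dict:
    """Assumption: SQS should be >= its target and AHT should be <= its target."""
    return {
        "sqs_ok": sqs >= SQS_TARGET,
        "aht_ok": aht_min <= AHT_TARGET_MIN,
    }


# Hypothetical task: 1 major issue and 3 minor issues.
print(task_issue_count(1, 3))
# Hypothetical project averages.
print(meets_targets(sqs=0.82, aht_min=40.0))
```

With these weights, roughly three minor issues count about as much as one major issue.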
The ramp plan is a Google Sheet tracking planned vs actual throughput by week. It has 9 sections, including delivery, pipeline, activity, funnel, financials, assumptions, costs, and quality. It is the central planning artifact.
npx claudepluginhub gejustin/hai-ops-cowork-plugin