By Goodeye-Labs
MCP server and agent skills for the Truesight AI quality platform. Score inputs, build evaluations, analyze errors, and review results through natural language.
Fastest route to a deployed live evaluation using a pre-built Truesight template. Use when the user wants a quick start without building judgment configs from scratch.
Build a custom web interface for trace annotation and review. Use when users need a bespoke review surface for their workflow.
Scope what quality should be measured, convert it into one or more actionable binary evaluations, deploy those evaluations through Truesight MCP, and generate a companion skill that applies them correctly. Use when a user wants to create new evals, quality checks, guardrails, or pass/fail criteria for AI outputs.
Systematically identify and categorize failure modes in evaluated traces using Truesight datasets and error-analysis tools. Use when quality issues are unclear, after major pipeline changes, or when incidents indicate drift.
Audit an existing evaluation workflow and produce severity-ranked findings with concrete next actions. Use when inheriting an eval setup, diagnosing quality regressions, or checking LLM evaluation process maturity.
External network access
Connects to servers outside your machine
Agent skills and Cursor plugin for the Truesight MCP. Step-by-step workflow playbooks for scoring inputs, building live evaluations, error analysis, and the review loop.
Works with Claude Code, Cursor, and any client that supports the agent skills standard.
In Claude Code, run:
# Step 1: Register this repository as a marketplace
/plugin marketplace add Goodeye-Labs/truesight-mcp-skills
# Step 2: Install the plugin
/plugin install truesight@goodeye-labs-truesight
This installs the Truesight plugin and its MCP skills in Claude, including truesight-workflows, create-evaluation, and companion workflow skills.
To upgrade:
/plugin update truesight@goodeye-labs-truesight
If you installed via Claude Marketplace above, you can skip this. Use the manual commands below to install skill files directly (works with Claude Code, Cursor, and other clients).
# Project-level install: writes to .claude/skills in the current directory
BASE=https://raw.githubusercontent.com/Goodeye-Labs/truesight-mcp-skills/main/skills
for skill in truesight-workflows evaluate-trace error-analysis generate-synthetic-data review-and-promote-traces bootstrap-template-evaluation create-evaluation eval-audit build-review-interface; do
curl -fsSL "$BASE/$skill/SKILL.md" -o ".claude/skills/$skill/SKILL.md" --create-dirs
done
# User-level install: writes to $HOME/.claude/skills
BASE=https://raw.githubusercontent.com/Goodeye-Labs/truesight-mcp-skills/main/skills
for skill in truesight-workflows evaluate-trace error-analysis generate-synthetic-data review-and-promote-traces bootstrap-template-evaluation create-evaluation eval-audit build-review-interface; do
curl -fsSL "$BASE/$skill/SKILL.md" -o "$HOME/.claude/skills/$skill/SKILL.md" --create-dirs
done
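After either loop finishes, it can be useful to confirm that every `SKILL.md` actually landed on disk. The following is a minimal sketch, not part of the Truesight repository: `check_skills` is a hypothetical helper name, and it assumes only the directory layout produced by the `curl` commands above (`<skills dir>/<skill name>/SKILL.md`).

```shell
#!/usr/bin/env sh
# Hypothetical helper (an illustration, not shipped with the plugin):
# report which skill files are present and non-empty under a skills directory.
# The body runs in a subshell so its variables do not leak into the caller.
check_skills() (
  dir="$1"
  missing=0
  for skill in truesight-workflows evaluate-trace error-analysis generate-synthetic-data review-and-promote-traces bootstrap-template-evaluation create-evaluation eval-audit build-review-interface; do
    if [ -s "$dir/$skill/SKILL.md" ]; then
      echo "ok       $skill"
    else
      echo "missing  $skill"
      missing=$((missing + 1))
    fi
  done
  # Exit status is non-zero when at least one skill file is absent or empty.
  [ "$missing" -eq 0 ]
)
```

Call it with the directory you installed into, for example `check_skills .claude/skills` or `check_skills "$HOME/.claude/skills"`. A non-zero exit status means at least one download should be retried.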
| Skill | What it does |
|---|---|
| truesight-workflows | Strict orchestrator that routes to the correct Truesight MCP skill based on user intent |
| generate-synthetic-data | Create diverse synthetic test inputs using dimension-based variation for evaluation bootstrapping |
| error-analysis | Analyze traces in datasets, label failure modes, consolidate categories, and prioritize fixes |
| bootstrap-template-evaluation | Provision a template dataset and deploy a live evaluation quickly |
| create-evaluation | Scope, build, and deploy new custom live evaluations from scratch |
| evaluate-trace | Evaluate one or more inputs against an existing live evaluation, with optional handoff to review flows |
| review-and-promote-traces | Review flagged traces, submit judgments, and promote judged items back to datasets |
| eval-audit | Audit evaluation workflow maturity and return severity-ranked findings with next-skill actions |
| build-review-interface | Build a custom web annotation interface when Truesight web UI is not the preferred review surface |
Once the MCP is connected and skills are installed, your AI assistant will automatically pick up the right skill based on what you ask.
Some skills (like generate-synthetic-data and build-review-interface) work without any Truesight account. For skills that use the Truesight MCP, you need a free Truesight account. When prompted, sign in to authorize access. All tools are available based on your account permissions.
Want more control over permissions? You can also connect using a Platform API Key instead. See Connecting with an API key below.
npx claudepluginhub goodeye-labs/truesight-mcp-skills