From truesight
Systematically identify and categorize failure modes in evaluated traces using Truesight datasets and error-analysis tools. Use when quality issues are unclear, after major pipeline changes, or when incidents indicate drift.
How this skill is triggered — by the user, by Claude, or both
Slash command
/truesight:error-analysis
The summary Claude sees in its skill listing, used to decide when to auto-load this skill
Guide the user through trace-grounded failure analysis and dataset labeling.
Ask one question at a time using the structured question tool (loaded per the HARD-GATE above).
Example question structure:
Which data source should we analyze first?
A) Existing Truesight dataset
B) New dataset to upload
C) Unsure, list datasets first
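A question in this shape can be held as a small data structure before it is passed to whichever structured question tool is in use. The sketch below is illustrative only: `Question` and `render` are hypothetical names, not part of Truesight or of the question tool, whose actual schema is not shown here.

```python
from dataclasses import dataclass
from string import ascii_uppercase


@dataclass(frozen=True)
class Question:
    """One prompt with a short list of lettered options (hypothetical shape)."""
    prompt: str
    options: tuple[str, ...]


def render(question: Question) -> str:
    """Format the prompt followed by A), B), C) ... option lines."""
    lines = [question.prompt]
    for letter, option in zip(ascii_uppercase, question.options):
        lines.append(f"{letter}) {option}")
    return "\n".join(lines)


first_question = Question(
    prompt="Which data source should we analyze first?",
    options=(
        "Existing Truesight dataset",
        "New dataset to upload",
        "Unsure, list datasets first",
    ),
)
print(render(first_question))
```

Keeping each question as its own object makes it easier to ask exactly one at a time and to record which option the user picked.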
Rules:
list_datasets
upload_dataset
get_dataset_rows with pagination
suggest_error_notes
_ts_error_notes and _ts_error_category with update_dataset_row
consolidate_error_categories
apply_category_mappings

create-evaluation for new evaluation coverage
review-and-promote-traces for judgment backlog
eval-audit for broader process gaps

list_datasets, get_dataset_rows require datasets:read
upload_dataset, update_dataset_row, apply_category_mappings require datasets:write
suggest_error_notes, consolidate_error_categories require error-analysis:execute

npx claudepluginhub goodeye-labs/truesight-mcp-skills

Guides analysis of LLM pipeline traces to identify, categorize, and prioritize failure modes. Use for new eval projects, pipeline changes, metric drops, or incidents.
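The scope requirements listed for the Truesight tools above can be checked before starting an analysis session. The following is a minimal sketch, not Truesight's own API: the tool and scope names are taken from the listing, while `REQUIRED_SCOPE`, `TOOLS_IN_LISTED_ORDER`, and `missing_scopes` are hypothetical helpers. It also assumes the tools are used in the order they are listed.

```python
# Tool -> required scope, copied from the listing above.
# The dict name and helper below are hypothetical, for illustration only.
REQUIRED_SCOPE = {
    "list_datasets": "datasets:read",
    "get_dataset_rows": "datasets:read",
    "upload_dataset": "datasets:write",
    "update_dataset_row": "datasets:write",
    "apply_category_mappings": "datasets:write",
    "suggest_error_notes": "error-analysis:execute",
    "consolidate_error_categories": "error-analysis:execute",
}

# Assumption: the listing order reflects the order of use.
TOOLS_IN_LISTED_ORDER = [
    "list_datasets",
    "upload_dataset",
    "get_dataset_rows",
    "suggest_error_notes",
    "update_dataset_row",
    "consolidate_error_categories",
    "apply_category_mappings",
]


def missing_scopes(tools, granted):
    """Return the scopes the given tools need that are not in `granted`."""
    needed = {REQUIRED_SCOPE[tool] for tool in tools}
    return sorted(needed - set(granted))


# A read-only token is enough to browse rows, but not to label or consolidate.
print(missing_scopes(TOOLS_IN_LISTED_ORDER, {"datasets:read"}))
# → ['datasets:write', 'error-analysis:execute']
```

Running a check like this up front avoids getting partway through labeling and then failing on the first write or consolidation call.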
Use this skill when the user asks to "analyze AI errors", "error analysis for our AI feature", "open coding", "axial coding", "analyze model failures", "categorize AI mistakes", "find patterns in bad AI outputs", "what's wrong with our AI", or has a set of bad AI outputs and wants to understand what's failing and why. This is the first step in the AI eval methodology from Hamel Husain and Shreya Shankar.
Logs data analyst errors like wrong SQL, metrics, schema, or logic with fixes, severity, categories, and datasets for future learning. Triggered by 'log a correction' or /log-correction.