From lightdash-agentops
Orchestrate evaluation runs and test case management for Lightdash agents.
Slash command
/lightdash-agentops:run-lightdash-evals
Skill for managing and executing evaluations for Lightdash AI agents.
Enables the "Eval-Driven Development" workflow by providing tools to create evaluation suites, append test cases (prompts), execute evaluation runs, and analyze the results.
Wraps the following MCP tools from the lightdash-tools server:
- ldt__list_agent_evaluations
- ldt__get_agent_evaluation
- ldt__create_agent_evaluation
- ldt__update_agent_evaluation
- ldt__append_agent_evaluation_prompts
- ldt__run_agent_evaluation
- ldt__list_agent_evaluation_runs
- ldt__get_agent_evaluation_run_results
- ldt__delete_agent_evaluation

The tools fall into three groups by effect:
- Read: list_agent_evaluations, get_agent_evaluation, list_agent_evaluation_runs, get_agent_evaluation_run_results.
- Write: create_agent_evaluation, update_agent_evaluation, append_agent_evaluation_prompts, run_agent_evaluation.
- Delete: delete_agent_evaluation.

Workflow:
1. Use ldt__append_agent_evaluation_prompts to add 20-50 diverse test cases representing real-world user queries.
2. Run the evaluation with ldt__run_agent_evaluation.
3. List the runs with ldt__list_agent_evaluation_runs.
4. Retrieve the results with ldt__get_agent_evaluation_run_results.
5. Use the agent-tuner sub-agent to automatically process evaluation results for improvement.

Install with:
npx claudepluginhub yu-iskw/dbt-heroes --plugin lightdash-agentops
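The append, run, list, and fetch steps of the workflow above can be sketched as one function. This is a minimal illustration, not the plugin's implementation. The `call_tool` callable, the argument keys (`evaluationId`, `prompts`, `runId`), and the response shapes are all assumptions made for the example; only the tool names come from the list above.

```python
from typing import Any, Callable, Dict, List

# Assumption: some MCP client exposes a function shaped like this.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def run_eval_cycle(call_tool: ToolCaller, evaluation_id: str,
                   prompts: List[str]) -> Dict[str, Any]:
    """Append test cases, start a run, then fetch the last listed run's results."""
    # Argument keys below are hypothetical; check the real tool schemas.
    call_tool("ldt__append_agent_evaluation_prompts",
              {"evaluationId": evaluation_id, "prompts": prompts})
    call_tool("ldt__run_agent_evaluation", {"evaluationId": evaluation_id})
    runs = call_tool("ldt__list_agent_evaluation_runs",
                     {"evaluationId": evaluation_id})
    # Assumed response shape: {"runs": [{"id": ...}, ...]} with the newest run last.
    latest_run_id = runs["runs"][-1]["id"]
    return call_tool("ldt__get_agent_evaluation_run_results",
                     {"evaluationId": evaluation_id, "runId": latest_run_id})


class RecordingCaller:
    """Stand-in for a real MCP client: records tool names, returns canned data."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def __call__(self, name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
        self.calls.append(name)
        if name == "ldt__list_agent_evaluation_runs":
            return {"runs": [{"id": "run-1"}]}
        if name == "ldt__get_agent_evaluation_run_results":
            return {"runId": arguments["runId"], "results": []}
        return {}


caller = RecordingCaller()
prompts = [f"example question {i}" for i in range(20)]  # placeholder test cases
results = run_eval_cycle(caller, "eval-123", prompts)
print(caller.calls)
print(results)
```

The stand-in caller only demonstrates the call order. A real evaluation run may take time to complete, so an actual client would probably need to wait or poll before requesting results.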