evaluation

Name: evaluation
Author: owl-listener

Design custom rubrics and benchmarks to evaluate AI outputs and products, run structured assessments scoring quality, task success, failures, UX heuristics, and user satisfaction, then monitor drift over time with dashboards, alerts, and actionable reports.

Tags: ai-ml, testing, monitoring

npx claudepluginhub owl-listener/ai-design-skills --plugin evaluation

Popularity

Stars: 108 (Top 10%; median 0, average 285)

Installs: median 0, average 1

What's Inside

Slash Commands (3)

/create-rubric: Build a scoring rubric for evaluating AI output quality. (Step 1: Define What You're Evaluating)

/design-benchmark: Design a benchmark suite to measure AI product performance over time. (Step 1: Define Benchmark Goals)

/run-evaluation: Execute a structured evaluation of an AI feature against defined criteria. (Step 1: Define the Evaluation Scope)
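The /create-rubric and /run-evaluation commands both center on scoring AI outputs against defined criteria. As a minimal sketch of that idea (not the plugin's actual rubric format), the Python below assumes a hypothetical rubric of weighted criteria, each scored by a reviewer on a 1-to-5 scale; the criterion names, weights, and the `score_output` helper are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    name: str
    weight: float  # relative importance; weights need not sum to 1
    description: str


# Hypothetical rubric: names, weights, and descriptions are illustrative only.
RUBRIC = [
    Criterion("accuracy", 0.4, "Claims are factually correct and verifiable."),
    Criterion("relevance", 0.3, "The output addresses the user's actual request."),
    Criterion("helpfulness", 0.3, "The user can act on the output without follow-up."),
]

SCALE_MIN, SCALE_MAX = 1, 5  # assumed 1-to-5 scoring scale


def score_output(scores: dict[str, int], rubric: list[Criterion] = RUBRIC) -> float:
    """Return the weighted mean score for one output, on the same 1-to-5 scale."""
    total_weight = sum(c.weight for c in rubric)
    weighted = 0.0
    for c in rubric:
        if c.name not in scores:
            raise KeyError(f"missing score for criterion {c.name!r}")
        s = scores[c.name]
        if not SCALE_MIN <= s <= SCALE_MAX:
            raise ValueError(f"{c.name} score {s} outside {SCALE_MIN}-{SCALE_MAX}")
        weighted += c.weight * s
    return weighted / total_weight


if __name__ == "__main__":
    # One reviewer's scores for a single AI response.
    print(round(score_output({"accuracy": 4, "relevance": 5, "helpfulness": 3}), 2))
```

Dividing by the total weight keeps the result on the original scale even when the weights do not sum to 1, so scores stay comparable across rubrics with different criteria.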

Skills (7)

/comparative-evaluation: A/B testing, side-by-side comparison, and preference ranking for AI outputs.

/failure-taxonomy: Classifying AI failures (hallucination, refusal, irrelevance, tone mismatch, latency).

/heuristic-evaluation-ai: Adapting Nielsen's heuristics, plus new AI-specific heuristics, for AI interfaces.

/longitudinal-measurement: Tracking AI product quality over time (drift, degradation, and improvement).

/output-quality-rubrics: Defining what "good" looks like for AI outputs (accuracy, relevance, helpfulness).
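The longitudinal-measurement skill tracks drift, degradation, and improvement over time. One simple way to make that concrete is sketched below under stated assumptions: a single mean score per evaluation run, a fixed early baseline period, and an absolute change threshold. `classify_trend` is a hypothetical helper, not part of the plugin.

```python
from statistics import mean


def classify_trend(scores: list[float], baseline_n: int, window_n: int,
                   threshold: float) -> tuple[str, float]:
    """Compare the most recent runs against an early baseline period.

    scores: mean evaluation score per run, oldest first (e.g. weekly rubric averages).
    baseline_n: number of earliest runs used as the reference period.
    window_n: number of most recent runs compared against that reference.
    threshold: minimum absolute change in mean score worth reporting.
    Returns (label, delta), where delta = recent mean - baseline mean.
    """
    if baseline_n < 1 or window_n < 1:
        raise ValueError("baseline_n and window_n must be positive")
    if baseline_n + window_n > len(scores):
        raise ValueError("need enough runs for a non-overlapping baseline and window")
    delta = mean(scores[-window_n:]) - mean(scores[:baseline_n])
    if delta <= -threshold:
        label = "degradation"
    elif delta >= threshold:
        label = "improvement"
    else:
        label = "stable"
    return label, delta


if __name__ == "__main__":
    # Hypothetical weekly averages from repeated runs of the same benchmark.
    weekly = [4.2, 4.1, 4.3, 4.2, 4.0, 3.8, 3.7, 3.6]
    label, delta = classify_trend(weekly, baseline_n=4, window_n=3, threshold=0.3)
    print(f"{label}: {delta:+.2f}")  # an alerting hook could fire on "degradation"
```

A fixed baseline makes slow degradation visible, whereas comparing only adjacent runs can hide it. A real monitor would likely also account for run-to-run noise, for example with a statistical test, before raising an alert.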

Stats

Version: 1.0.0

Language: Python

Stars: 108

Forks: 16

Maintenance: Good

License: MIT

Last commit: Apr 25, 2026

Added: May 1, 2026

Available in: designer-skills (1,558), ai-design-skills (116)

README

AI Design Skills Collection

Agentic skills, commands, and plugins for designing AI products — from interaction patterns to alignment, evaluation, agent orchestration, and prompt architecture.

42 skills and 18 commands across 6 plugins, available for both Claude Code and Gemini CLI. The skills have the same content and shape on both, and both agents are supported as first-class targets.

Why this exists

Agentic Experience Design (AXD) is a new discipline. It has its own vocabulary, definitions, and practices, and it is emerging in real time. Mixed-initiative flow. Harm anticipation. Handoff protocol. Error personality. These are real, useful, often-academic terms. They have not yet been collected into a form designers can reach for in real work.

This repo is the collection. Six plugins, mapped to six layers of the discipline: model interaction, alignment reasoning, system behaviour, evaluation, agent orchestration, prompt architecture. Inside each are seven skills and three commands your AI agent can load when you are designing or auditing agentic experiences. The underlying ideas come from work by alignment researchers and HCI scholars over the last three years; this repo's contribution is translating them into installable skills. See REFERENCES.md for the mapping from skills back to source papers.

Quick Start

Pick the install command for your agent.

Claude Code

Add the marketplace, then install plugins from it.

claude plugin marketplace add Owl-Listener/ai-design-skills
claude plugin install model-interaction-design@ai-design-skills

To install all six plugins at once:

for p in model-interaction-design ai-alignment-reasoning system-behavior-shaping evaluation design-agent-orchestration prompt-architecture; do
  claude plugin install "$p@ai-design-skills"
done

Gemini CLI

Gemini CLI installs one extension per directory and expects the manifest at the install source's root, so for this monorepo, clone and install each extension from its local path.

git clone https://github.com/Owl-Listener/ai-design-skills
cd ai-design-skills
gemini extensions install ./gemini-extension/model-interaction-design

To install all six:

for ext in gemini-extension/*/; do gemini extensions install "./$ext"; done

For development (symlink instead of copy, so edits take effect immediately):

gemini extensions link ./gemini-extension/model-interaction-design

Both agents load skills the same way: each skill has a description field in its frontmatter, the agent matches your wording against those descriptions, and the relevant skill loads automatically. You do not pick the skill — the agent does.
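To make the description-matching idea concrete, here is a toy sketch. It assumes one SKILL.md-style file per skill with `name` and `description` frontmatter fields, reuses three skill descriptions from this plugin's listing, and scores skills by plain word overlap. Neither agent is documented as selecting skills this way (the agent's model judges relevance itself), so the file layout, `parse_frontmatter`, `best_skill`, and the overlap scoring are all illustrative assumptions.

```python
import re
from typing import Optional

# Hypothetical skill files; the path layout and frontmatter fields are assumptions.
SKILL_FILES = {
    "failure-taxonomy/SKILL.md": (
        "---\n"
        "name: failure-taxonomy\n"
        "description: Classifying AI failures: hallucination, refusal, "
        "irrelevance, tone mismatch, latency.\n"
        "---\n"
        "(skill body omitted)\n"
    ),
    "longitudinal-measurement/SKILL.md": (
        "---\n"
        "name: longitudinal-measurement\n"
        "description: Tracking AI product quality over time: drift, "
        "degradation, and improvement.\n"
        "---\n"
    ),
    "output-quality-rubrics/SKILL.md": (
        "---\n"
        "name: output-quality-rubrics\n"
        'description: Defining what "good" looks like for AI outputs: '
        "accuracy, relevance, helpfulness.\n"
        "---\n"
    ),
}

# Common words ignored so they do not count as matches.
STOPWORDS = {"a", "an", "and", "the", "for", "of", "to", "ai", "i", "how", "over", "in", "is", "my"}


def parse_frontmatter(text: str) -> dict[str, str]:
    """Read flat `key: value` lines between the opening and closing '---'.

    A minimal stand-in for a YAML parser: no nesting, lists, or quoting rules.
    """
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def words(text: str) -> set[str]:
    """Lowercase alphabetic tokens minus stopwords."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def best_skill(prompt: str, skill_files: dict[str, str]) -> Optional[str]:
    """Return the name of the skill whose description shares the most words with the prompt."""
    prompt_words = words(prompt)
    best_name, best_overlap = None, 0
    for path, text in skill_files.items():
        meta = parse_frontmatter(text)
        overlap = len(prompt_words & words(meta.get("description", "")))
        if overlap > best_overlap:  # ties keep the first skill seen
            best_name, best_overlap = meta.get("name", path), overlap
    return best_name


if __name__ == "__main__":
    print(best_skill("How do I track quality drift over time?", SKILL_FILES))  # longitudinal-measurement
```

The sketch also shows why description wording matters: a skill whose description shares no vocabulary with how users phrase their questions is never picked, which is the same reason well-written frontmatter descriptions help the real agents.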

First time using a CLI agent?

If you have not used either Claude Code or Gemini CLI before, the path is short.

Pick an agent. Both work with this skill set; both are well-documented; both are free for personal use. Set-up instructions are available for Claude Code and Gemini CLI.
Open a terminal in any project folder.
Run the install command for your agent (above).
From then on, the agent will load the relevant skills automatically when you ask agentic-experience-design questions.

Try this as a first prompt to see the difference:

"I am designing an AI assistant for customer support. Help me write the error states for when the assistant does not understand the user's question. Walk me through the trade-offs."

You will see the agent reach for error-personality, tone-calibration, and (depending on the framing) harm-anticipation automatically. Compare the answer to the same question without the skills installed — the difference is what this repo is for. A worked example of that exact comparison is at examples/error-states-walkthrough.md.

Where to start

You can install all six plugins at once, but if you want to start small, here is a guide based on what you are working on.

Building any agentic feature: start with model-interaction-design and prompt-architecture. These are the foundation.
Shipping a feature soon: add evaluation and ai-alignment-reasoning. You need the failure modes mapped before launch.
Working on a multi-agent system: add design-agent-orchestration. Handoff protocols and orchestration anti-patterns are non-negotiable here.
Designing the agent's voice: add system-behavior-shaping. Persona architecture, error personality, tone calibration.
Just curious: install all six. They are small.

What Are Skills and Commands?

Skills are domain knowledge units (nouns). They teach the agent about designing AI products — like crafting conversation patterns, specifying guardrails, or structuring system prompts.

Similar Plugins

domain-expertise-extractor

ai-alignment-reasoning

evaluate-agent

pm-advanced

agentic-usability

prompt-engineer

More by Owl-Listener

prototyping-testing

design-research

design-ops

design-systems

ux-strategy
