By Owl-Listener
Design custom rubrics and benchmarks to evaluate AI outputs and products, run structured assessments scoring quality, task success, failures, UX heuristics, and user satisfaction, then monitor drift over time with dashboards, alerts, and actionable reports.
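The following shell sketch shows one common form of rubric scoring: each criterion's score is multiplied by its weight, and the total is divided by the sum of the weights. It is an illustration, not code from the plugin. The rubric_score function name and the criterion,weight,score CSV layout are hypothetical, and scores are assumed to be on a 1-5 scale. Because the total is divided by the sum of the weights, the weights do not need to add up to 1.

```shell
#!/usr/bin/env bash
# Minimal sketch of weighted rubric scoring (hypothetical, not the plugin's code).
# Input: a CSV file with one row per criterion in the assumed layout
#   criterion,weight,score
# with scores on a 1-5 scale. An optional header row is skipped.
rubric_score() {
  awk -F, '
    NR == 1 && $1 == "criterion" { next }    # skip the header row, if present
    NF >= 3 { wsum += $2; total += $2 * $3 } # accumulate weights and weighted scores
    END {
      if (wsum == 0) { print "error: no weighted rows" > "/dev/stderr"; exit 1 }
      printf "%.2f\n", total / wsum          # weighted mean, still on the 1-5 scale
    }
  ' "$1"
}
```

Here is a worked example. Suppose accuracy is weighted 0.5 and scored 4, relevance is weighted 0.3 and scored 5, and helpfulness is weighted 0.2 and scored 3. The result is (2 + 1.5 + 0.6) / 1.0, so rubric_score prints 4.10.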
Build a scoring rubric for evaluating AI output quality.
Design a benchmark suite to measure AI product performance over time.
Execute a structured evaluation of an AI feature against defined criteria.
A/B testing, side-by-side comparison, and preference ranking for AI outputs.
Classifying AI failures — hallucination, refusal, irrelevance, tone mismatch, latency.
Adapting Nielsen's heuristics and new AI-specific heuristics for AI interfaces.
Tracking AI product quality over time — drift, degradation, and improvement.
Defining what "good" looks like for AI outputs — accuracy, relevance, helpfulness.
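A first drift check can simply compare a current evaluation run against a baseline run. The sketch below illustrates that idea and is not the plugin's method. The names mean_score and drift_check are made up, and each input file is assumed to hold one numeric score per line. The default threshold of 0.25 points is an arbitrary placeholder that you would tune. When the mean drops by more than the threshold, the check prints an alert and returns a non-zero status, so a scheduled job could act on it.

```shell
#!/usr/bin/env bash
# Minimal sketch of a baseline-vs-current drift check (hypothetical helpers).
# Assumed input: two files, each with one numeric score per line.

# Print the mean of the scores in a file; fail if the file has no scores.
mean_score() {
  awk 'NF { sum += $1; n++ } END { if (n == 0) exit 1; printf "%.4f\n", sum / n }' "$1"
}

# Usage: drift_check BASELINE_FILE CURRENT_FILE [THRESHOLD]
# Returns 1 and prints an alert if the mean dropped by more than THRESHOLD.
drift_check() {
  local baseline="$1" current="$2" threshold="${3:-0.25}"
  local b c
  b=$(mean_score "$baseline") || { echo "error: empty baseline" >&2; return 2; }
  c=$(mean_score "$current") || { echo "error: empty current run" >&2; return 2; }
  # awk does the floating-point comparison; bash arithmetic is integer-only.
  if awk -v b="$b" -v c="$c" -v t="$threshold" 'BEGIN { exit !((b - c) > t) }'; then
    echo "ALERT: mean score dropped from $b to $c (threshold $threshold)"
    return 1
  fi
  echo "OK: mean score $c vs baseline $b"
}
```

For example, drift_check baseline.txt latest.txt 0.3 compares two runs with a tolerance of 0.3 points. Both file names are placeholders. A mean is the simplest possible signal. A real setup would probably also track per-criterion scores and failure counts.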
Agentic skills, commands, and plugins for designing AI products — from interaction patterns to alignment, evaluation, agent orchestration, and prompt architecture.
42 skills and 18 commands across 6 plugins, available for both Claude Code and Gemini CLI. Same skills, same shape, both supported as first-class agents.
Agentic Experience Design (AXD) is a new discipline. It has its own vocabulary, definitions, and practices, and it is emerging in real time. Mixed-initiative flow. Harm anticipation. Handoff protocol. Error personality. These are real, useful, often-academic terms. They have not yet been collected into a form designers can reach for in real work.
This repo is the collection. Six plugins, mapped to six layers of the discipline: model interaction, alignment reasoning, system behaviour, evaluation, agent orchestration, prompt architecture. Each plugin holds seven skills and three commands that your AI agent can load when you are designing or auditing agentic experiences. The underlying ideas come from work by alignment researchers and HCI scholars over the last three years; the contribution here is translating them into installable skills. See REFERENCES.md for the mapping from skills back to source papers.
Pick the install command for your agent.
For Claude Code, add the marketplace, then install plugins from it.
claude plugin marketplace add Owl-Listener/ai-design-skills
claude plugin install model-interaction-design@ai-design-skills
To install all six plugins at once:
for p in model-interaction-design ai-alignment-reasoning system-behavior-shaping evaluation design-agent-orchestration prompt-architecture; do
claude plugin install "$p@ai-design-skills"
done
Gemini CLI installs one extension per directory and expects the manifest at the install source's root, so for this monorepo, clone and install each extension from its local path.
git clone https://github.com/Owl-Listener/ai-design-skills
cd ai-design-skills
gemini extensions install ./gemini-extension/model-interaction-design
To install all six:
for ext in gemini-extension/*/; do gemini extensions install "./$ext"; done
For development (symlink instead of copy, so edits take effect immediately):
gemini extensions link ./gemini-extension/model-interaction-design
Both agents load skills the same way: each skill has a description field in its frontmatter, the agent matches your wording against those descriptions, and the relevant skill loads automatically. You do not pick the skill — the agent does.
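You may want to see the descriptions the agent will match against. The following sketch lists them, but it relies on an assumption that this README does not confirm. It assumes each skill lives in a file named SKILL.md whose YAML frontmatter, between "---" lines, holds a one-line "description:" field. The helper name list_skill_descriptions is hypothetical. Run it from the root of a clone of the repo, or pass it a directory.

```shell
#!/usr/bin/env bash
# Minimal sketch: print the frontmatter description of every SKILL.md file
# under a directory (default: current directory).
# Assumption: each SKILL.md starts with a '---' frontmatter block containing
# a single-line "description:" field.
list_skill_descriptions() {
  local root="${1:-.}"
  find "$root" -name SKILL.md -type f | sort | while IFS= read -r f; do
    awk -v file="$f" '
      NR == 1 && $0 != "---" { exit }   # no frontmatter: skip this file
      NR > 1 && $0 == "---" { exit }    # reached the end of the frontmatter
      /^description:/ {
        sub(/^description:[ \t]*/, "")  # strip the key, keep the text
        print file ": " $0
        exit
      }
    ' "$f"
  done
}
```

For example, list_skill_descriptions ~/ai-design-skills prints one "path: description" line per skill file it finds.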
If you have not used either Claude Code or Gemini CLI before, the path is short.
Try this as a first prompt to see the difference:
"I am designing an AI assistant for customer support. Help me write the error states for when the assistant does not understand the user's question. Walk me through the trade-offs."
You will see the agent reach for error-personality, tone-calibration, and (depending on the framing) harm-anticipation automatically. Compare the answer to the same question without the skills installed — the difference is what this repo is for. A worked example of that exact comparison is at examples/error-states-walkthrough.md.
You can install all six plugins at once, but if you want to start small, here is a guide based on what you are working on.
model-interaction-design and prompt-architecture. These are the foundation.
evaluation and ai-alignment-reasoning. You need the failure modes mapped before launch.
design-agent-orchestration. Handoff protocols and orchestration anti-patterns are non-negotiable here.
system-behavior-shaping. Persona architecture, error personality, tone calibration.
Skills are domain knowledge units (nouns). They teach the agent about designing AI products, like crafting conversation patterns, specifying guardrails, or structuring system prompts.
npx claudepluginhub owl-listener/ai-design-skills --plugin evaluation
Plan and execute design validation through prototyping strategies, usability testing, heuristic evaluation, and A/B experiments.
User research skills for designers: personas, empathy maps, journey maps, interview scripts, usability testing, and card sorting.
Streamline design operations with critique frameworks, handoff specs, sprint planning, review processes, and team workflows.
Build, document, and maintain scalable design systems — from tokens and components to accessibility and theming.
Shape product direction through competitive analysis, design principles, experience mapping, and strategic alignment.
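A skill that extracts, articulates, and structures the tacit knowledge of digital-product specialties (UI/UX, marketing, copywriting, ASO, growth, and so on), and generates agent definitions and evaluation criteria that other skills can use. The AI researches, analyzes, and articulates at an expert level the criteria for "good" and "bad" that a non-expert cannot judge, and outputs them as reusable domain knowledge.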
Design guardrails, safety boundaries, value alignment, and ethical constraints into AI products.
Set up evaluation of AI agents with tool call validation, correctness checks, task completion, and tool reliability using Dokimos. Framework-agnostic — works with any agent framework.
Advanced PM skills: AI Product Canvas, Multi-Source Signal Synthesiser, Experiment Designer, Design Handoff Brief. For senior PMs working on complex or AI-powered products.
SDK Usability Benchmark — generate, execute, judge, and analyze AI agent benchmark suites
Build evals, A/B test prompts, audit skills, and benchmark LLM outputs at production quality