From ux-researcher
Plan a usability test — define research questions, tasks, participant criteria, and analysis approach. Produces a structured test plan ready for execution. Use before conducting moderated or unmoderated usability testing with real users.
How this skill is triggered — by the user, by Claude, or both
Slash command
/ux-researcher:usability-test-planThis skill is limited to the following tools:
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
Plan a usability test for $ARGUMENTS. This skill produces a structured test plan covering research questions, methodology, participants, tasks, and analysis. Use this before running any moderated or unmoderated usability test — a test without a plan produces anecdotes, not insights.
Plan a usability test for $ARGUMENTS. This skill produces a structured test plan covering research questions, methodology, participants, tasks, and analysis. Use this before running any moderated or unmoderated usability test — a test without a plan produces anecdotes, not insights.
Start with what you want to learn. Research questions drive every subsequent decision — methodology, tasks, participants, and analysis.
Define 3-5 specific, answerable research questions:
### Research Questions
| # | Question | Type | What it will tell us |
|---|---|---|---|
| RQ1 | [Specific question — e.g., "Can users complete checkout without assistance?"] | Behavioural / Attitudinal | [Decision this informs] |
| RQ2 | [e.g., "Where do users get confused in the onboarding flow?"] | Behavioural | [Decision this informs] |
| RQ3 | [e.g., "Do users understand what the pricing tiers include?"] | Comprehension | [Decision this informs] |
| RQ4 | [e.g., "How does the new navigation compare to the current one?"] | Comparative | [Decision this informs] |
Good research questions are:
Output: 3-5 research questions with types and the decisions they inform.
Select the testing approach based on research questions and constraints:
### Methodology
| Dimension | Choice | Rationale |
|---|---|---|
| **Moderation** | Moderated / Unmoderated | [Why — moderated for exploration, unmoderated for scale] |
| **Location** | Remote / In-person | [Why — remote for reach, in-person for context] |
| **Protocol** | Think-aloud / Task-completion / A/B comparison | [Why — think-aloud for discovery, task-completion for benchmarking] |
| **Prototype fidelity** | Live product / High-fidelity prototype / Low-fidelity wireframe | [Why — match to development stage] |
| **Tool** | [UserTesting / Lookback / Maze / in-person with recording] | [Why — capabilities needed] |
Method selection guide:
Output: Methodology table with rationale for each choice.
Determine who to test with and how to find them:
### Participants
| Criterion | Requirement |
|---|---|
| **Number** | [5-8 for qualitative; 20+ for quantitative benchmarking] |
| **User type** | [Match to personas — e.g., "Small business owner, 1-10 employees"] |
| **Experience level** | [Novice / Intermediate / Expert with the product or domain] |
| **Demographics** | [Relevant demographics — age range, accessibility needs, tech literacy] |
| **Exclusions** | [Who to exclude — employees, recent participants, competitors] |
### Screener Questions
| # | Question | Accept | Reject |
|---|---|---|---|
| S1 | [Screening question — e.g., "How often do you use project management tools?"] | [Daily/Weekly] | [Never] |
| S2 | [e.g., "What is your role?"] | [Manager, Team Lead] | [Developer, Designer] |
| S3 | [e.g., "Have you used [product] before?"] | [Yes, in last 30 days] | [Never used it] |
### Recruitment
| Item | Detail |
|---|---|
| **Source** | [Customer panel / Recruitment agency / Social media / In-app intercept] |
| **Incentive** | [Amount and form — e.g., "$50 gift card"] |
| **Timeline** | [Recruitment start → sessions complete] |
Nielsen's research shows 5 participants catch approximately 85% of usability issues. Use 5 for qualitative discovery, more for quantitative benchmarking.
Output: Participant criteria, screener questions, and recruitment plan.
Create realistic scenarios that map to research questions:
### Task Design
| # | Task | Scenario | Success criteria | Time limit | Research question |
|---|---|---|---|---|---|
| T1 | [Action-oriented task name] | [Realistic context — "You want to invite a colleague to your project..."] | [Observable outcome — "User reaches the invitation confirmation screen"] | [minutes] | RQ1 |
| T2 | ... | ... | ... | ... | RQ2 |
| T3 | ... | ... | ... | ... | RQ1, RQ3 |
Task design rules:
Output: Task table with scenarios, success criteria, time limits, and RQ mapping.
Script the session to ensure consistency across participants:
### Moderator Guide
#### Introduction (5 minutes)
- Welcome and thank participant
- Explain purpose: "We're testing the [product/feature], not you — there are no wrong answers"
- Confirm recording consent
- Explain think-aloud protocol (if applicable): "Please say what you're thinking as you work through the tasks"
- Ask: "Do you have any questions before we start?"
#### Warm-up (3 minutes)
- [Background question — "Tell me about how you currently [relevant activity]"]
- [Familiarity question — "How often do you [relevant task]?"]
#### Tasks (30-40 minutes)
For each task:
1. Read the scenario aloud (or share written scenario for unmoderated)
2. Observe without intervening
3. Note: time to complete, errors, hesitations, verbal comments
4. If stuck for > [time limit]: offer one neutral prompt ("What are you looking for?")
**Follow-up probes (use after each task):**
- "What did you expect to happen there?"
- "Was anything confusing or surprising?"
- "How would you rate the difficulty of that task? (1-5)"
#### Debrief (5-10 minutes)
- "Which task was most difficult? Why?"
- "What would you change about this experience?"
- "Is there anything else you'd like to share?"
- Thank participant, provide incentive
Output: Complete moderator guide with introduction, tasks, probes, and debrief.
Decide what to measure and how to analyse it before running sessions:
### Quantitative Metrics
| Metric | Definition | Target | How measured |
|---|---|---|---|
| **Task success rate** | % of participants completing the task | > 80% | Binary: completed / not completed |
| **Time on task** | Seconds from task start to completion | < [target]s | Stopwatch / tool timer |
| **Error rate** | Number of wrong actions per task | < 2 per task | Observer count |
| **Satisfaction (SEQ)** | [Single Ease Question](https://measuringu.com/seq10/) — "How easy was this task?" (1-7) | > 5.5 | Post-task questionnaire |
| **System satisfaction (SUS)** | [System Usability Scale](https://measuringu.com/sus/) — post-test | > 68 (above average) | Post-test questionnaire |
### Qualitative Analysis
| Method | Description |
|---|---|
| **Affinity diagramming** | Group observations into themes across participants |
| **Severity rating** | Rate each issue: [Critical / Major / Minor / Cosmetic] |
| **Frequency count** | How many participants encountered each issue |
| **Rainbow spreadsheet** | Task × Participant matrix showing success/failure/assistance patterns |
### Severity Scale
| Level | Definition | Action |
|---|---|---|
| **Critical** | User cannot complete the task | Must fix before release |
| **Major** | User completes with significant difficulty or errors | Should fix before release |
| **Minor** | User notices but works around it | Fix in next iteration |
| **Cosmetic** | Noticed only when pointed out | Fix if time permits |
Output: Metrics table with targets, qualitative analysis approach, and severity scale.
### Logistics
| Item | Detail |
|---|---|
| **Tool** | [Recording/testing platform] |
| **Recording** | [Screen + audio / Screen + audio + video / Notes only] |
| **Consent form** | [Template — covers recording, data use, withdrawal rights] |
| **Session duration** | [minutes — typically 45-60 for moderated, 15-30 for unmoderated] |
| **Schedule** | [Date range, sessions per day — max 4 moderated sessions/day to avoid fatigue] |
| **Observers** | [Who watches — product, design, engineering; max 2 observers per session] |
| **Note-taking** | [Dedicated note-taker or recording-only] |
| **Pilot session** | [Date — run one pilot to test the guide before real sessions] |
| **Deliverable** | [Report format and delivery date] |
Always run a pilot session with a colleague or friendly user before real sessions. The pilot tests your test — unclear tasks, timing issues, and technical problems.
Output: Logistics checklist with dates, tools, and responsibilities.
# Usability Test Plan: [Feature/Flow Name]
**Date:** [date] | **Researcher:** [name] | **Status:** [Draft/Approved/In progress/Complete]
## 1. Research Questions
[From Step 1 — 3-5 questions with types]
## 2. Methodology
[From Step 2 — moderation, location, protocol, tools]
## 3. Participants
[From Step 3 — criteria, screener, recruitment]
## 4. Tasks
[From Step 4 — scenarios with success criteria]
## 5. Moderator Guide
[From Step 5 — introduction, tasks, probes, debrief]
## 6. Metrics & Analysis
[From Step 6 — quantitative targets, qualitative approach, severity scale]
## 7. Logistics
[From Step 7 — schedule, tools, pilot, deliverables]
/ux-researcher:usability-review — heuristic evaluation without users. Use as a complement: heuristic review finds obvious issues cheaply, usability testing finds issues only real users reveal./ux-researcher:persona-definition — participant criteria should align with defined personas. Define personas first if they don't exist.Creates, edits, and optimizes skills for Claude Code, including drafting, evaluating with test prompts, iterating on performance, and improving skill descriptions for better triggering accuracy.
npx claudepluginhub hpsgd/turtlestack --plugin ux-researcher