run-usability-testing | grimoire

Stats

Actions

Tags

run-usability-testing | grimoire

Run Usability Testing

Systematically observe real users attempting real tasks to surface usability problems before or after launch.

Why This Is Best Practice

Adopted by: Google, Apple, Microsoft, Amazon — all run weekly usability sessions; mandated by US federal Section 508 compliance process Impact: Nielsen (1993) showed 5 users find 85% of usability problems; fixing usability issues costs 10–100× less before development than after launch (Pressman software engineering research) Why best: ISO 9241-11 defines usability as effectiveness + efficiency + satisfaction — testing operationalizes all three with actual task data rather than assumptions.

Sources: Nielsen "Usability Engineering" Ch. 6; Nielsen Norman Group "How Many Test Users"; ISO 9241-11:2018

Steps

Define research questions — write 3–5 specific questions you need answered (e.g., "Can users find the export function without assistance?").
Select participant profile — define 2–3 user segments; recruit 5 participants per segment matching real user demographics and technical proficiency.
Write task scenarios — create realistic, goal-based tasks using natural language; avoid UI vocabulary (say "pay your bill" not "click the payment button").
Choose test format — moderated in-person: richest data; moderated remote (Zoom + screen share): broad reach; unmoderated (Maze, UserTesting): scale with less depth.
Prepare test materials — build or stage prototype/product at correct fidelity; create screener, consent form, NDA, and observer guide.
Conduct pilot test — run one session with a colleague to validate task clarity, timing (~45–60 min total), and recording setup.
Run sessions — facilitate without leading: use "think aloud" protocol; ask "what are you thinking?" not "is this confusing?"; record screen, audio, and (optionally) face.
Capture observations — log quotes, behaviors, hesitations, and errors per task in a structured observation sheet (participant × task grid).
Analyze and prioritize — identify patterns across participants; rate severity using Nielsen's 0–4 scale (0 = not a problem, 4 = usability catastrophe); prioritize by frequency × severity.
Report and action — present findings as problem statements with evidence clips; pair each finding with a recommended design direction; schedule follow-up test to validate fixes.

Rules

Never correct users during tasks — intervene only if they are completely stuck for >3 minutes or distressed.
Tasks must be completable on the prototype; do not test features that do not exist.
Record observer impressions silently during sessions; debrief after, not during.
Report findings as behavioral evidence, not opinions: "3 of 5 participants could not find X" not "the navigation is confusing."
Fix the top 3 severity-4 issues before running another round; iteration is the mechanism of improvement.

Common Mistakes

Testing with colleagues or designers — familiarity bias invalidates results; recruit actual target users.
Leading questions — "Was that confusing?" biases toward yes; ask "What were you trying to do there?"
Recruiting only 1–2 participants — too few to identify patterns; minimum 5 per segment.
Testing too late — usability testing a shipped product with no change budget wastes effort; test prototypes.
Reporting all findings equally — 20+ minor issues overwhelm stakeholders; severity ratings focus the fix list.

When NOT to Use

Validating visual aesthetics alone — use preference testing or surveys instead
Measuring adoption at scale — use analytics and cohort analysis
Comparing two designs quantitatively — use A/B testing with sufficient statistical power