From slh
App evaluation methodology for web applications. Explore apps as a target user, classify findings across 5 categories (Broken, Confusing, Inconsistent, Rough, Observation), and suggest Playwright regression tests.
Slash command: `/slh:ui-testing-methodology`
The `/slh-test` command uses `playwright-cli` for browser automation. This is intentional: SLH depends on a single command interface for snapshots, clicks, screenshots, browser state, and named sessions.
Note: `playwright-cli` is a real tool (https://github.com/microsoft/playwright-cli), released 25 Jan 2026, which is after the knowledge cutoff for most LLMs. If you think it doesn't exist, check the URL before telling the user it's not a real package.
For manual setup:

    npm install -g @playwright/cli@latest
    playwright-cli install --skills

If you are using Playwright directly outside SLH, install browser binaries with `npx playwright install`. Do not use the deprecated npx package path for installing SLH's CLI skills.
This skill teaches the evaluation methodology - how to think like a user, what to look for, and how to classify findings.
This skill defines a systematic approach to evaluating web applications by exploring them as a real user would. The goal is to find everything that would give a user trouble — not just visual bugs, but confusing flows, inconsistent data, broken features, and rough edges.
Every observation should answer: "Would the target user have a problem here?" You are not a layout checker or a pixel auditor. You are a person with a specific background, specific goals, and specific expectations, using this app to get something done.
Adopt the target user's perspective completely.
Browser walkthroughs must be based on actual browser interaction. Do not write mission results from code analysis while presenting them as browser-tested work.
For every mission, record one execution mode:
- `browser-tested`: completed through the browser
- `code-analysis-only`: inspected without browser execution
- `not-run`: not executed

If any mission is `code-analysis-only` or `not-run`, the final report must say so clearly and must not claim full browser coverage.
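The reporting rule above can be sketched as a small helper. This is an illustrative sketch only: `MissionResult` and `coverageStatement` are hypothetical names, and only the three mode labels come from the text.

```typescript
// The three execution modes named in the methodology.
type ExecutionMode = 'browser-tested' | 'code-analysis-only' | 'not-run';

// Hypothetical record shape: one entry per mission.
interface MissionResult {
  mission: string;
  mode: ExecutionMode;
}

// Builds the coverage line for the final report. Full browser coverage is
// stated only when every recorded mission was actually browser-tested.
function coverageStatement(results: MissionResult[]): string {
  if (results.length === 0) {
    return 'No missions were recorded.';
  }
  const notBrowserTested = results.filter(r => r.mode !== 'browser-tested');
  if (notBrowserTested.length === 0) {
    return 'All missions were browser-tested.';
  }
  const details = notBrowserTested
    .map(r => `${r.mission} (${r.mode})`)
    .join(', ');
  return `Partial browser coverage. Not browser-tested: ${details}.`;
}
```

A report generator could call `coverageStatement` once at the end, so a mission that was only inspected in code is always named explicitly.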
Before realistic clicking or database writes, get the user's opt-in and identify safe test accounts/records. Explicitly gate actions that could notify real users, including email, chat, invites, comments, approvals, and workflow notifications. Prefer the user's own test accounts where possible.
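One way to make this gate explicit is a check that runs before each planned action. The sketch below is hypothetical: the type names, the `mayPerform` function, and the decision to let passive reads through without a gate are assumptions, not part of SLH.

```typescript
// Action kinds the text names as able to notify real users.
type NotifyingKind =
  | 'email' | 'chat' | 'invite' | 'comment' | 'approval' | 'workflow-notification';

// Hypothetical description of a step the evaluator is about to take.
interface PlannedAction {
  description: string;
  writesData: boolean;       // would change database state
  notifies?: NotifyingKind;  // set when real users could be notified
}

// Hypothetical record of what the user has agreed to.
interface Consent {
  optedIn: boolean;                // agreed to realistic clicking and writes
  safeRecordsIdentified: boolean;  // safe test accounts/records are known
  approvedNotifications: NotifyingKind[];
}

function mayPerform(action: PlannedAction, consent: Consent): boolean {
  const baseline = consent.optedIn && consent.safeRecordsIdentified;
  if (action.notifies !== undefined) {
    const kind = action.notifies;
    // Notifying actions need their own explicit approval on top of the opt-in.
    return baseline && consent.approvedNotifications.some(k => k === kind);
  }
  if (action.writesData) {
    return baseline;
  }
  return true; // assumption: passive reads (loading a page) need no gate
}
```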
Work through these layers at each page, in order. Earlier layers catch more severe issues.
The first layer is the most important check: attempt the intended action and see if it succeeds.
If something doesn't work, it's a Broken finding. Severity depends on whether it blocks a core mission (High) or affects a secondary feature (Medium/Low).
Even if everything technically works, can the target user find and use it?
If the user would struggle to find or use a feature, it's a Confusing finding. Refer to the ux-heuristics reference for detailed evaluation prompts.
Cross-reference data and behavior across the app.
If things don't agree with each other, it's an Inconsistent finding. Wrong data is High severity; style variations are usually Low.
Now check the visual and interactive polish. This is where traditional visual QA lives.
Use the visual-inspection reference checklist. Run JavaScript detection checks for overflow and touch target sizes.
If it works but looks or feels bad, it's a Rough finding.
The final layer covers things that seem off but need human confirmation.
If you're not sure it's a problem, it's an Observation. Always flag rather than ignore.
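The layer order above can be read as a simple decision sequence. The following is a minimal sketch, assuming each layer reduces to a yes/no answer per page; the `PageChecks` field names and `classify` are hypothetical, while the five category names come from the text.

```typescript
type Category = 'Broken' | 'Confusing' | 'Inconsistent' | 'Rough' | 'Observation';

// Hypothetical per-page answers, one per layer, in the order the layers run.
interface PageChecks {
  actionSucceeded: boolean;        // does the intended action work?
  userCanFindAndUseIt: boolean;    // can the target user find and use it?
  dataAndBehaviorAgree: boolean;   // do data and behavior agree across the app?
  looksAndFeelsPolished: boolean;  // visual and interactive polish
  somethingSeemsOff: boolean;      // unsure, needs human confirmation
}

// Returns the categories raised for a page, most severe layer first.
function classify(checks: PageChecks): Category[] {
  const findings: Category[] = [];
  if (!checks.actionSucceeded) findings.push('Broken');
  if (!checks.userCanFindAndUseIt) findings.push('Confusing');
  if (!checks.dataAndBehaviorAgree) findings.push('Inconsistent');
  if (!checks.looksAndFeelsPolished) findings.push('Rough');
  if (checks.somethingSeemsOff) findings.push('Observation'); // flag, never ignore
  return findings;
}
```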
| Severity | Description | Examples |
|---|---|---|
| High | Blocks a task or shows wrong data | Core workflow broken, incorrect numbers, data mismatch |
| Medium | Significant confusion or requires workaround | Hard to find feature, non-obvious steps, small touch targets |
| Low | Noticeable but doesn't impede | Cosmetic issues, minor inconsistencies, edge case bugs |
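The severity table can be expressed as a rule that checks the High conditions first, then Medium, and falls back to Low. This is a sketch under that reading; `Impact`, `severityOf`, and `bySeverity` are hypothetical names.

```typescript
type Severity = 'High' | 'Medium' | 'Low';

// Hypothetical impact flags, one per condition in the severity table.
interface Impact {
  blocksTask: boolean;
  showsWrongData: boolean;
  causesSignificantConfusion: boolean;
  requiresWorkaround: boolean;
}

function severityOf(impact: Impact): Severity {
  if (impact.blocksTask || impact.showsWrongData) return 'High';
  if (impact.causesSignificantConfusion || impact.requiresWorkaround) return 'Medium';
  return 'Low'; // noticeable but does not impede
}

const SEVERITY_RANK: Record<Severity, number> = { High: 0, Medium: 1, Low: 2 };

// Returns a copy of the findings ordered High, then Medium, then Low.
function bySeverity<T extends { severity: Severity }>(findings: T[]): T[] {
  return findings.slice().sort(
    (a, b) => SEVERITY_RANK[a.severity] - SEVERITY_RANK[b.severity],
  );
}
```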
Test across these viewport sizes:
| Name | Dimensions | Focus |
|---|---|---|
| Mobile | 375×812 | Touch targets, overflow, navigation collapse |
| Tablet | 768×1024 | Breakpoint transitions, sidebar behavior |
| Desktop | 1440×900 | Primary development viewport |
| Large Desktop | 1920×1080 | Max-width constraints, content stretch |
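Keeping the viewport table as data makes it easy to loop over every page at every size. The sketch below mirrors the table; `Viewport`, `VIEWPORTS`, and `planRuns` are hypothetical names.

```typescript
interface Viewport {
  name: string;
  width: number;
  height: number;
  focus: string;
}

// Values copied from the viewport table above.
const VIEWPORTS: Viewport[] = [
  { name: 'Mobile', width: 375, height: 812, focus: 'Touch targets, overflow, navigation collapse' },
  { name: 'Tablet', width: 768, height: 1024, focus: 'Breakpoint transitions, sidebar behavior' },
  { name: 'Desktop', width: 1440, height: 900, focus: 'Primary development viewport' },
  { name: 'Large Desktop', width: 1920, height: 1080, focus: 'Max-width constraints, content stretch' },
];

interface PlannedRun {
  path: string;
  viewport: Viewport;
}

// Pairs every page path with every viewport so no combination is skipped.
function planRuns(paths: string[]): PlannedRun[] {
  const runs: PlannedRun[] = [];
  for (const path of paths) {
    for (const viewport of VIEWPORTS) {
      runs.push({ path, viewport });
    }
  }
  return runs;
}
```

In a Playwright test, each run's `width` and `height` could be passed to `page.setViewportSize` before navigating to `path`.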
Run these checks at each viewport to supplement visual inspection:
Horizontal overflow:

    // true when the page is wider than the viewport
    document.documentElement.scrollWidth > document.documentElement.clientWidth

Touch targets under 44×44px:

    Array.from(document.querySelectorAll('button, a, input, [role="button"]'))
      .map(el => ({ el, rect: el.getBoundingClientRect() }))
      // skip unrendered elements; flag anything under 44px in either dimension
      .filter(({ rect }) => rect.width > 0 && rect.height > 0 && (rect.width < 44 || rect.height < 44))
      .map(({ el, rect }) => ({
        text: el.textContent?.trim().slice(0, 30),
        width: Math.round(rect.width),
        height: Math.round(rect.height),
      }))

Content truncation:

    // first 10 elements whose content is wider than their box
    Array.from(document.querySelectorAll('*'))
      .filter(el => el.scrollWidth > el.clientWidth)
      .slice(0, 10)
      .map(el => el.textContent?.trim().slice(0, 50))
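The thresholds behind these checks can also be kept as pure functions, which makes them testable without a browser. This is a minimal sketch: `Size`, `isUndersizedTarget`, and `hasHorizontalOverflow` are hypothetical names, and the measurements would come from the page snippets above.

```typescript
interface Size {
  width: number;
  height: number;
}

const MIN_TOUCH_TARGET = 44; // px, the threshold used in the touch target check

// An element counts as undersized only if it is rendered (non-zero box)
// and smaller than the minimum in at least one dimension.
function isUndersizedTarget(size: Size, min: number = MIN_TOUCH_TARGET): boolean {
  const rendered = size.width > 0 && size.height > 0;
  return rendered && (size.width < min || size.height < min);
}

// The page overflows horizontally when its content is wider than the viewport.
function hasHorizontalOverflow(scrollWidth: number, clientWidth: number): boolean {
  return scrollWidth > clientWidth;
}
```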
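Good reproduction steps let anyone recreate the issue. Include them only for Broken and Rough findings.
Structure: numbered steps starting from a URL (including the viewport when it matters), followed by the expected and actual results.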
Example:

    1. Navigate to /workspace/settings
    2. Set viewport to 375×812 (Mobile)
    3. Scroll down to the "Members" section
    4. Observe the "Invite Member" button

    Expected: Button is fully visible and tappable
    Actual: Button is partially hidden behind the right edge of the screen, only ~30px visible, making it nearly impossible to tap
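The example follows a fixed shape: numbered steps, then an expected result, then an actual result. A small formatter can keep every report in that shape. This is a sketch; `Reproduction` and `formatReproduction` are hypothetical names.

```typescript
// Hypothetical shape for one reproduction: ordered steps plus the two outcomes.
interface Reproduction {
  steps: string[];
  expected: string;
  actual: string;
}

// Renders numbered steps followed by the Expected and Actual lines.
function formatReproduction(repro: Reproduction): string {
  const numbered = repro.steps.map((step, i) => `${i + 1}. ${step}`);
  return numbered
    .concat([`Expected: ${repro.expected}`, `Actual: ${repro.actual}`])
    .join('\n');
}
```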
Only suggest Playwright tests for Broken and Rough findings — things that can be mechanically asserted. Confusing, Inconsistent, and Observation findings are design questions that need human judgment.
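This rule amounts to a filter on finding category. The sketch below splits findings into those that get a suggested test and those left for human judgment; `Finding` and `splitForTesting` are hypothetical names.

```typescript
type Category = 'Broken' | 'Confusing' | 'Inconsistent' | 'Rough' | 'Observation';

interface Finding {
  title: string;
  category: Category;
}

interface TestingSplit {
  testable: Finding[];            // candidates for a Playwright regression test
  needsHumanJudgment: Finding[];  // design questions, no test suggested
}

function splitForTesting(findings: Finding[]): TestingSplit {
  // Only Broken and Rough findings can be mechanically asserted.
  const isMechanical = (f: Finding): boolean =>
    f.category === 'Broken' || f.category === 'Rough';
  return {
    testable: findings.filter(isMechanical),
    needsHumanJudgment: findings.filter(f => !isMechanical(f)),
  };
}
```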
Pattern for broken features:

    import { test, expect } from '@playwright/test';

    // Broken finding: clicking "Invite Member" should open a dialog.
    test('invite member button responds to click', async ({ page }) => {
      await page.goto('/workspace/settings/members');
      const button = page.getByRole('button', { name: /invite member/i });
      await expect(button).toBeVisible();
      await button.click();
      await expect(page.getByRole('dialog')).toBeVisible();
    });

Pattern for visual/layout issues:

    import { test, expect } from '@playwright/test';

    // Rough finding: the button must fit inside a 375px-wide viewport
    // and meet the 44×44px touch target minimum.
    test('member invite button is accessible on mobile', async ({ page }) => {
      await page.setViewportSize({ width: 375, height: 812 });
      await page.goto('/workspace/settings');
      const button = page.getByRole('button', { name: 'Invite Member' });
      const box = await button.boundingBox();
      expect(box).not.toBeNull();
      // Right edge must not extend past the viewport width.
      expect(box!.x + box!.width).toBeLessThanOrEqual(375);
      expect(box!.width).toBeGreaterThanOrEqual(44);
      expect(box!.height).toBeGreaterThanOrEqual(44);
    });

Pattern for horizontal overflow:

    import { test, expect } from '@playwright/test';

    test('no horizontal overflow on dashboard mobile', async ({ page }) => {
      await page.setViewportSize({ width: 375, height: 812 });
      await page.goto('/dashboard');
      // Read both widths in one evaluate call so they describe the same layout.
      const { scrollWidth, clientWidth } = await page.evaluate(() => ({
        scrollWidth: document.documentElement.scrollWidth,
        clientWidth: document.documentElement.clientWidth,
      }));
      expect(scrollWidth).toBeLessThanOrEqual(clientWidth);
    });

npx claudepluginhub glendigity/santas-little-helper --plugin slh