From ssep
Authors and runs tests across the test pyramid (unit, FE↔BE integration, and end-to-end browser scenarios) using Playwright MCP and runners (jest, vitest, mocha). Use when verifying multi-layer behavior or for "e2e 테스트" (e2e test), "integration 테스트" (integration test), "통합 테스트" (integration test), "playwright로 검증" (verify with Playwright), "브라우저에서 직접 확인" (check directly in the browser), "백엔드-프론트 연동" (backend-frontend integration), "API 통합 테스트" (API integration test), "staging 검증" (staging verification), "배포 후 검증" (post-deployment verification), "smoke test", or when a feature crosses boundaries (API + UI + DB). Also triggers when verifying a deployed feature or when a PR test plan has unchecked browser/API items. Triggers even when only one boundary is crossed and a manual Playwright click was performed; manual verification proves the path works once but does not produce a codified regression, and integration-tier surprises cluster at exactly the boundary that "looked fine" during manual checks. Decides which test level fits, sets up infra, writes tests, runs them, and reports failures with diagnostic context. Distinct from superpowers:test-driven-development (unit-level TDD); handles integration and e2e tiers.
Slash command: /ssep:running-integration-tests
A passing unit test proves a function returns the expected value. It does not prove the API endpoint that calls the function works, that the database returns the expected schema, that the frontend renders the response correctly, or that the user can complete the flow end-to-end. Each of those is a different test level, and skipping the right one is how features ship "with full test coverage" and immediately break in production.
This skill handles the test levels above the unit tier — integration tests that exercise real HTTP/DB/queue boundaries, and end-to-end tests that drive a real browser through complete user flows. It also handles the meta-decision of which level fits which scenario, since the wrong level is worse than no test (slow, brittle, and gives false confidence).
Related skills this one is distinct from:

- superpowers:test-driven-development (unit TDD)
- reviewing-design-fidelity (which uses Playwright too, but for visual diff rather than behavioral assertion)

A test should live at the lowest level where it can express the behavior it needs to verify. Higher levels are slower, more brittle, and harder to debug.
| Level | Speed | Verifies | Use when |
|---|---|---|---|
| Unit | <100ms | Pure logic, single function/class | Behavior is fully expressible in inputs/outputs without I/O |
| Integration | 100ms – 5s | Real HTTP/DB/queue boundaries, real schemas | Boundary contract is what's being tested (route returns expected JSON, query returns expected rows) |
| End-to-end | 5s – 60s | Full user flow through real UI | Multi-step user behavior that crosses many boundaries |
Common mistakes:
See references/test-level-decision.md for the decision tree and worked examples.
Write a single sentence: "When the user does X, the system should Y." This is the test's reason for existence; if you can't write it, the test isn't ready to be written.
Apply the decision tree from references/test-level-decision.md. If multiple levels could express the behavior, choose the lowest. Document the choice briefly in the test (a one-line comment) so future readers know it was deliberate.
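The "choose the lowest level" rule can be sketched as a small function. This is a simplification derived from the level table, not the decision tree in references/test-level-decision.md; the `Behavior` fields, `chooseTestLevel`, `levelComment`, and the example statement are all hypothetical names introduced for illustration.

```typescript
// Test levels, ordered from lowest (fastest, easiest to debug) to highest.
type TestLevel = "unit" | "integration" | "e2e";

// Hypothetical description of a behavior under test. These fields are
// assumptions for illustration, not the skill's actual decision tree.
interface Behavior {
  statement: string;          // "When the user does X, the system should Y."
  needsRealBoundary: boolean; // a real HTTP/DB/queue contract is what's being tested
  needsRealBrowser: boolean;  // a multi-step user flow through the real UI
}

// Return the lowest level that can still express the behavior.
function chooseTestLevel(b: Behavior): TestLevel {
  if (b.needsRealBrowser) return "e2e";
  if (b.needsRealBoundary) return "integration";
  return "unit";
}

// One-line comment to put in the test so the choice reads as deliberate.
function levelComment(b: Behavior): string {
  return `// level: ${chooseTestLevel(b)} (lowest level that expresses: ${b.statement})`;
}

const routeContract: Behavior = {
  statement: "When the client requests an order, the API should return the order JSON.",
  needsRealBoundary: true,
  needsRealBrowser: false,
};

console.log(levelComment(routeContract));
```

A behavior that needs neither a real boundary nor a real browser falls through to the unit tier, which is where superpowers:test-driven-development takes over.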
Each level has its own setup pattern. See:
- references/integration-test-patterns.md for HTTP+DB integration tests: test database setup, fixtures, transactional rollback, request helpers
- references/playwright-patterns.md for end-to-end browser tests via Playwright MCP: selector strategy, waiting, network mocking, capturing diagnostics on failure

Per TDD discipline: write the test, run it, and see it fail with the expected failure mode (not a setup error masquerading as a test failure). Only then is the test ready to be paired with implementation. If the implementation already exists and the test is being added retroactively, still run it red first by intentionally breaking the implementation in one place; this confirms that the test would catch the regression it claims to catch.
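The "run it red first" check for a retroactive test can be sketched as follows. It is a minimal illustration under assumptions: the test is modeled as a function that throws on failure, and `applyDiscount`, `brokenApplyDiscount`, and `catchesRegression` are hypothetical names. A pure function is used only to keep the sketch runnable without a server or browser; the same check applies to integration and e2e tests.

```typescript
// A test is modeled as a function that throws when the behavior is wrong.
type Impl = (price: number, percent: number) => number;
type Test = (impl: Impl) => void;

// Hypothetical implementation that already exists.
const applyDiscount: Impl = (price, percent) => price - (price * percent) / 100;

// The same implementation, intentionally broken in one place (discount is added).
const brokenApplyDiscount: Impl = (price, percent) => price + (price * percent) / 100;

// The retroactive test being added.
const discountTest: Test = (impl) => {
  const actual = impl(200, 10);
  if (actual !== 180) throw new Error(`expected 180, got ${actual}`);
};

// True only if the test passes on the current code AND fails on the broken code.
function catchesRegression(test: Test, good: Impl, broken: Impl): boolean {
  test(good); // must be green against current code; throws otherwise
  try {
    test(broken);
  } catch {
    return true; // red against the broken version: the test would catch it
  }
  return false; // still green: the test would not catch this regression
}

console.log(catchesRegression(discountTest, applyDiscount, brokenApplyDiscount));
```

If `catchesRegression` returns false, the test is asserting something the breakage does not touch, and it needs a sharper assertion before it is committed.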
Make the test pass. For new features, this is the implementation work. For retroactive tests, this is just verifying the test passes against current code.
When an integration or e2e test fails, the cause is rarely visible from the assertion message alone. Required diagnostics by level:
See references/playwright-patterns.md § "Failure diagnostics" for the standard capture sequence to run when a Playwright test fails.
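For the integration tier, one way to make a failure diagnosable is to attach the request and response to the assertion error. The sketch below assumes a hypothetical `HttpExchange` shape and `assertWithContext` helper; the fields are illustrative and are not the skill's required diagnostics list or its Playwright capture sequence.

```typescript
// Hypothetical record of what an integration test knows about an HTTP exchange.
interface HttpExchange {
  method: string;
  url: string;
  status: number;
  body: string;
}

// Run an assertion; if it throws, rethrow with the exchange appended to the message.
function assertWithContext(exchange: HttpExchange, assertion: () => void): void {
  try {
    assertion();
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    throw new Error(
      `${reason}\n  request: ${exchange.method} ${exchange.url}` +
        `\n  status: ${exchange.status}\n  body: ${exchange.body}`
    );
  }
}

// Illustrative failing exchange (made-up values).
const exchange: HttpExchange = {
  method: "GET",
  url: "/api/orders/42",
  status: 500,
  body: '{"error":"db timeout"}',
};

let report = "";
try {
  assertWithContext(exchange, () => {
    if (exchange.status !== 200) {
      throw new Error(`expected status 200, got ${exchange.status}`);
    }
  });
} catch (err) {
  report = err instanceof Error ? err.message : String(err);
}

console.log(report);
```

The point of the wrapper is that the failure report carries the response body, so the reader sees the server-side error alongside the status mismatch.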
A test that lives only on a developer's machine doesn't catch regressions. Commit the test and run it in CI. If CI doesn't yet run integration / e2e tests, that's its own finding to surface (CI gap → blocker for feature completeness).
Prefer data-testid or accessible roles over CSS classes that change with refactors.

- references/test-level-decision.md: decision tree and worked examples for choosing unit vs integration vs e2e
- references/integration-test-patterns.md: HTTP+DB integration patterns (test DB setup, transactional fixtures, request helpers, common pitfalls)
- references/playwright-patterns.md: end-to-end browser test patterns using Playwright MCP (selector strategy, waiting, network mocking, failure diagnostics)

npx claudepluginhub pacho-h/ssep --plugin ssep