From arc
Runs Playwright or Cypress e2e tests, analyzes failures, and iteratively fixes them (selectors, timing, bugs, flakiness) until all pass. Isolates verbose output in separate context. Max 5 iterations per file.
How this agent operates — its isolation, permissions, and tool access model
Agent reference
arc:agents/workflow/e2e-test-runnersonnetThe summary Claude sees when deciding whether to delegate to this agent
<arc_runtime> This agent is part of the full Arc runtime. Paths use these conventions: - `agents/...`, `references/...`, `disciplines/...`, `templates/...`, `scripts/...`, `rules/...`, `skills/<name>/...` are Arc-owned files at the plugin root. Resolve the plugin root from this agent file's filesystem location — it's the directory containing `agents/` and `skills/`. - `.ruler/...`, `docs/...`, ...
<arc_runtime> This agent is part of the full Arc runtime.
Paths use these conventions:
agents/..., references/..., disciplines/..., templates/..., scripts/..., rules/..., skills/<name>/... are Arc-owned files at the plugin root. Resolve the plugin root from this agent file's filesystem location — it's the directory containing agents/ and skills/..ruler/..., docs/..., src/..., or any project-relative path refers to the user's project repository.
</arc_runtime>Run e2e tests and fix failures iteratively until all pass.
# Check for Playwright
[ -f playwright.config.ts ] && echo "playwright"
# Check for Cypress
[ -f cypress.config.ts ] && echo "cypress"
# Check package.json scripts
grep -E "\"(e2e|test:e2e|playwright|cypress)\"" package.json
Playwright:
pnpm exec playwright test --reporter=list
Cypress:
pnpm exec cypress run
Tests must fail fast. A single hanging test should not kill an entire suite. This is critical when hitting real endpoints.
Playwright config (playwright.config.ts):
export default defineConfig({
// Global timeout per test - fail fast, don't hang
timeout: 30_000, // 30s max per test
// Expect assertions timeout
expect: {
timeout: 5_000, // 5s max to find elements
},
// Fail the entire suite on first failure (optional, faster feedback)
// maxFailures: 1,
// Verbose output
reporter: [['list'], ['html', { open: 'never' }]],
// Retries for flaky tests hitting real endpoints
retries: process.env.CI ? 2 : 0,
// Don't retry forever - fail fast on genuine issues
use: {
actionTimeout: 10_000, // 10s max per action (click, fill, etc.)
navigationTimeout: 15_000, // 15s max for page loads
},
});
Key principles:
| Setting | Purpose | Recommendation |
|---|---|---|
timeout | Max time per test | 30s for most tests, extend only if genuinely slow |
actionTimeout | Max time per click/fill/etc | 10s - if an element takes longer, something's wrong |
expect.timeout | Max time for assertions | 5s default, adjust per-assertion if needed |
retries | Handle flaky network | 1-2 in CI, 0 locally to surface real issues |
Per-test timeout override (when genuinely slow):
test('slow endpoint test', async ({ page }) => {
test.setTimeout(60_000); // Only this test gets 60s
// ...
});
Never:
test.slow() as a crutch for poor test designVerbose output flags:
# Playwright - see every step
pnpm exec playwright test --reporter=list
# Debug mode - step through
pnpm exec playwright test --debug
# Show browser
pnpm exec playwright test --headed
For each failure:
Fix strategy:
After each fix:
# Run just the failing test first (faster feedback)
pnpm exec playwright test path/to/test.spec.ts
# Once passing, run full suite
pnpm exec playwright test
Repeat Steps 3-4 until all tests pass.
Max iterations: 5 per test file. If still failing after 5 attempts, report back with:
## E2E Test Results
**Status:** ✅ All passing / ❌ X failures remaining
**Tests run:** N
**Passed:** N
**Failed:** N
### Fixes Applied
- `path/to/test.spec.ts`: Fixed selector for login button
- `path/to/other.spec.ts`: Added wait for network idle
### Remaining Issues (if any)
- `path/to/flaky.spec.ts`: Intermittent timeout, may need investigation
| Symptom | Likely Cause | Fix |
|---|---|---|
| Element not found | Selector changed | Update selector |
| Timeout waiting for element | Slow load / missing element | Add explicit wait or check if element should exist |
| Text mismatch | Content changed | Update expected text |
| Click intercepted | Overlay/modal blocking | Wait for overlay to close, or click through |
| Navigation timeout | Slow page load | Increase timeout or add waitForLoadState |
| "ECONNREFUSED" / "Network error" | Server not running, wrong port | Start server, check URL |
| LLM API timeout | Payload too large OR model overloaded | Reduce input, try faster model |
| "413 Payload Too Large" | Request body exceeds limit | Truncate input, remove images |
<required_reading> For LLM API failures, read:
references/llm-api-testing.md — Payload size is the most common culprit
</required_reading>Prefer data-testid for reliable element location.
When writing or fixing tests, use this selector priority:
data-testid — Most reliable, won't break with UI changesgetByRole('button', { name: 'Submit' })getByLabel('Email address')getByText('Welcome back') (fragile if copy changes)When creating tests, add data-testid to components:
<button data-testid="submit-order">Place Order</button>
// In test
await page.getByTestId('submit-order').click()
If a selector keeps breaking: Add a data-testid to the component rather than writing increasingly complex selectors.
Never:
data-testid would be more stableAlways:
data-testid attributes when writing new testable componentsnpx claudepluginhub commoninstruments/arc --plugin arcRuns and fixes Playwright E2E tests. Handles flaky tests, timing issues, and selector problems. Iterates until green or reports blockers. Contains verbose output.
End-to-end testing specialist using Vercel Agent Browser (preferred) with Playwright fallback. Manages test journeys, quarantines flaky tests, uploads artifacts, and ensures critical user flows work.
Debug and fix failing Playwright tests by inspecting DOM state, analyzing selectors, and applying web-first assertions.