Use when tests pass locally but fail in CI, pass on some runs and fail on others, or someone suspects a test is unreliable but cannot reproduce the failure consistently. Diagnoses root causes and provides concrete fixes. Triggers on: "flaky test", "intermittent test failure", "passes locally fails in CI", "test is non-deterministic", "random test failure", "테스트가 가끔 실패" (test sometimes fails), "간헐적 테스트 오류" (intermittent test error), "CI에서만 실패하는 테스트" (test that fails only in CI), "flaky". Best for: timing races, shared state pollution, ordering dependencies, CI environment differences. Not for: consistently failing tests with a known error (those are bugs, not flakiness).
Slash command: /develop:flaky-test-analyzer
**Use when:**
Do not use when:
… test-master or test-driven-development)

… new Date(), Math.random(), UUID.randomUUID(), System.currentTimeMillis() …

… @AfterEach teardown, unclosed streams, un-stubbed mocks

If sequential-thinking is available, use it to work through steps 1–5 in order; skipping triage before fixing is the primary failure mode.
For each flaky test analysis, provide:
| Claude | You |
|---|---|
| Identifies failure category from symptom description | Provide the test code and failure log |
| Explains why the root cause causes intermittent failure | Run the test 20 times in isolation to confirm |
| Provides the specific fix pattern (waitFor, Clock injection, etc.) | Implement and verify the fix |
| Suggests @BeforeEach / @AfterEach cleanup patterns | Apply to the full test suite |
| Recommends CI flags (random ordering, etc.) | Configure CI pipeline |
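
The "run the test 20 times in isolation" step in the table above can be scripted. What follows is a minimal sketch in plain Java, assuming the test body can be wrapped in a Runnable. RepeatRunner and countFailures are hypothetical names used only for illustration. Inside a real JUnit 5 suite, @RepeatedTest serves a similar purpose.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper: reruns a test body and counts how often it throws. */
final class RepeatRunner {

    /** Runs testBody the given number of times and returns the number of failing runs. */
    static int countFailures(Runnable testBody, int runs) {
        int failures = 0;
        for (int i = 0; i < runs; i++) {
            try {
                testBody.run();
            } catch (RuntimeException | AssertionError e) {
                failures++; // any throw counts as one failed run
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        // Deterministic stand-in for a flaky test: fails on every 5th call.
        AtomicInteger calls = new AtomicInteger();
        Runnable simulatedFlakyTest = () -> {
            if (calls.incrementAndGet() % 5 == 0) {
                throw new AssertionError("simulated intermittent failure");
            }
        };
        int failures = countFailures(simulatedFlakyTest, 20);
        System.out.println(failures + " of 20 runs failed"); // prints "4 of 20 runs failed"
    }
}
```

A non-zero failure count when the test runs alone suggests the cause lies inside the test itself rather than in what ran before it.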
| Fails When | Category |
|---|---|
| Run repeatedly in isolation | Timing dependency or resource leak |
| Run after specific other tests | Ordering / shared state dependency |
| Run in parallel | Concurrency or shared resource conflict |
| Run on CI but not locally | Environment dependency (clock, timezone, path, env var) |
| Run with real external systems | External dependency (network, DB, third-party API) |
| Passes after a sleep/wait | Timing / async race condition |
1. Timing Dependencies: the test asserts on async work before it completes.
Fix: use waitFor/awaitility, event-driven signals, or inject a controllable Clock. Never use Thread.sleep.
2. Shared State Between Tests: tests pollute the database, static variables, in-memory caches, or the file system.
Fix: reset state in @BeforeEach/@AfterEach; use @Transactional rollback or explicit truncate; prefer instance injection over singletons.
3. External Dependencies: tests call real HTTP APIs, databases, or message queues.
Fix: mock at the boundary (WireMock, Mockito); use Testcontainers for integration tests needing a real DB.
4. Test Ordering Dependencies: Test B implicitly relies on state created by Test A.
Fix: self-contained setup per test; run tests in random order (--randomly-seed=random).
5. Concurrency and Parallelism: parallel tests share ports, files, or singletons.
Fix: use port 0 (OS-assigned); inject resources so each test gets its own instance.
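
As an illustration of the fix for category 1, the sketch below waits on an explicit completion signal instead of sleeping for a guessed duration. It uses only java.util.concurrent. The class name, the "processed" result, and the 2000 ms timeout are assumptions made for the example, and a library such as awaitility would express the same idea more compactly.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

final class LatchInsteadOfSleep {

    /** Starts stand-in async work and waits for its completion signal, up to a timeout. */
    static String runAndAwait(long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        AtomicReference<String> result = new AtomicReference<>();
        CountDownLatch done = new CountDownLatch(1);
        try {
            executor.submit(() -> {
                result.set("processed"); // stand-in for the real async work
                done.countDown();        // signal completion explicitly
            });
            boolean finished;
            try {
                // Returns as soon as the signal arrives; the timeout only bounds the failure case.
                finished = done.await(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
            if (!finished) {
                throw new AssertionError("async work did not finish within " + timeoutMillis + " ms");
            }
            return result.get();
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(runAndAwait(2000)); // prints "processed"
    }
}
```

Unlike a fixed sleep, the wait ends the moment the work completes, so the test is neither slower than necessary nor dependent on how loaded the CI machine is.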
| Root Cause | Fix |
|---|---|
| Async race condition | Use waitFor / awaitility; never Thread.sleep |
| System clock dependency | Inject Clock; use fixed clock in tests |
| Database pollution | @Transactional rollback or truncate in @BeforeEach |
| Static/singleton state | Reset in @BeforeEach/@AfterEach; prefer instance injection |
| Real HTTP/DB calls | Mock with WireMock, Mockito, or Testcontainers |
| Ordering dependency | Self-contained setup; run tests in random order |
| Port conflict | Use port 0; never hardcode test ports |
| Timezone sensitivity | Set TZ=UTC in CI; use ZonedDateTime not Date |
| Random data collisions | Use unique test data per run (UUID prefix) |
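
The "Inject Clock" row can be sketched with java.time.Clock from the standard library. TokenValidator is a hypothetical class under test, invented here for illustration. The point is only that production code reads time from an injected Clock, so a test can pin it to a fixed instant in UTC.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneOffset;

/** Hypothetical class under test: decides whether a token has expired. */
final class TokenValidator {
    private final Clock clock;

    TokenValidator(Clock clock) {
        this.clock = clock; // production code would pass Clock.systemUTC()
    }

    boolean isExpired(Instant issuedAt, Duration ttl) {
        // Reads "now" from the injected clock, never from the system directly.
        return Instant.now(clock).isAfter(issuedAt.plus(ttl));
    }
}

final class ClockInjectionExample {
    public static void main(String[] args) {
        // Fixed clock: "now" is always 2024-01-01T00:00:00Z, on any machine and in any timezone.
        Clock fixed = Clock.fixed(Instant.parse("2024-01-01T00:00:00Z"), ZoneOffset.UTC);
        TokenValidator validator = new TokenValidator(fixed);

        Instant issuedRecently = Instant.parse("2023-12-31T23:59:30Z");
        Instant issuedLongAgo = Instant.parse("2023-12-31T23:00:00Z");

        System.out.println(validator.isExpired(issuedRecently, Duration.ofMinutes(1))); // false
        System.out.println(validator.isExpired(issuedLongAgo, Duration.ofMinutes(1)));  // true
    }
}
```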
For category-specific fix code examples, see references/flaky-fix-patterns.md.
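
As one inline illustration of the "Use port 0" row in the table above, the sketch below asks the operating system for a free port rather than hardcoding one. EphemeralPortExample and openOnFreePort are hypothetical names, and a real test would hand the resulting port to whatever server it starts.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;

final class EphemeralPortExample {

    /** Opens a server socket on an OS-assigned free port (port 0 requests one). */
    static ServerSocket openOnFreePort() {
        try {
            return new ServerSocket(0);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Two "test servers" open at the same time never collide on a port.
        try (ServerSocket a = openOnFreePort(); ServerSocket b = openOnFreePort()) {
            System.out.println("test server A listening on port " + a.getLocalPort());
            System.out.println("test server B listening on port " + b.getLocalPort());
        }
    }
}
```

Because each socket receives its own port while it is open, tests running in parallel can each start a server without coordinating port numbers.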
… @Disabled / @Ignore without a linked issue

- test-master: writing new tests with built-in flakiness prevention
- test-driven-development: TDD patterns that reduce flakiness by design
- chaos-engineer: when you want to intentionally test failure behavior

npx claudepluginhub newkayak12/claude-skills --plugin develop