From ttal
Use when authoring an integration test plan before implementation — when about to design tests for a multi-component feature, when probing for already-broken behavior, when a domain has a +bugfix history worth learning from.
Slash command: `/ttal:sp-write-test-plan`
Test plans are reasoning documents, not exhaustive coverage lists. Two halves: constructive (happy paths, edge cases, invariants) and adversarial (find what is already broken). The constructive half is well-trodden work. The adversarial half earns the keep — it surfaces bugs that exist in the current implementation, not the ones you would add later.
Assume the worker is a skilled test writer, but knows almost nothing about the domain. Document every file they need to test, every prior bug class they should check, every seam that could break.
Announce at start: "I am using sp-write-test-plan to author the integration test plan."
First action: Run ttal project list to identify the target project before writing anything.
Use this skill when you are about to write integration tests and any of these are true:
Do NOT use this skill for unit-level TDD loops (use sp-tdd for RED-GREEN-REFACTOR). Do NOT use this skill as a post-hoc code review (sp-review-against-plan covers that).
Rule: one task maps to one plan, and one plan maps to one project or repo.
Before writing any test plan, confirm the target project:
- Run `ttal project list` to see all available projects.
- Run `ttal project get <alias>` to confirm the path.
- Confirm the feature scope (for example, `neuron_purchases` writes vs `neuron_consumptions` writes are different services). If the description is ambiguous, ask the requester to confirm the exact scope BEFORE Phase 1. Ambiguous scope confirmed late costs more than confirmed early.

Hard rule: Do NOT proceed past this gate without a confirmed single target repo AND an unambiguous feature scope.
If the task touches multiple repos, stop and flag it: a single test plan must cover a single project. Split the task into per-repo test plans.
BEFORE writing any test plan, understand what exists. Do not design tests in the abstract — read the implementation.
- `task <uuid> tree`: understand what steps the implementation went through and what decisions were made.
- `flicknote find <keywords>`: search for orientation docs, design docs, and research notes related to this domain. For non-obvious choices visible in the implementation, search for Q-numbered decisions (e.g., "Q1 RESOLVED", "Q3 decision"; orientation flicknotes often record locked design choices this way). The Q-numbered entry is where the rationale lives; without it, code can look arbitrary.
- `task +bugfix project:<alias> status:completed export`: extract descriptions of every past bug in this project.
- `git log --oneline path/to/changed/files`, then read the actual current code: divergences from the plan (stricter guards, reordered operations, additional defensive checks) are regression-trap candidates. Surface them in pass γ. The plan describes intent; the merged code is reality.

After Phase 1, talk through what you found before designing the test plan. Do not go silent and start writing; discuss first.
Conversational checkpoint:
STOP here. End your message after presenting your findings and question. Do not begin Phase 2 or write the test plan until the human responds and confirms alignment.
Author the constructive section of the test plan. Group by area: per command, per service, per data flow.
For each major usage scenario:
For each entry point and data path:
Document what must always be true before and after each operation:
Three sequenced passes, executed in order. Each pass produces a table. Do NOT skip pass gamma — the empirical anchor is the most valuable pass.
task +bugfix project:<alias> status:completed export

| Bug class | Source task | Current vulnerability | Test to write |
|---|---|---|---|
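As a rough sketch of seeding the pass gamma table: assuming the past-bug export has already been reduced to tab-separated `uuid<TAB>description` lines (a hypothetical intermediate format, not something the export command is known to emit directly), the hypothetical helper `bug_rows` prints one row per past bug. The description is only a starting point for the bug class, and the vulnerability and test columns are left empty for the planner to fill after reading the current code.

```shell
# Hypothetical helper: turn "uuid<TAB>description" lines on stdin into
# pass gamma table rows. The TSV layout is an assumption for illustration.
bug_rows() {
  awk -F '\t' 'NF >= 2 { printf "| %s | %s | | |\n", $2, $1 }'
}

# Sample input with made-up task IDs and descriptions.
printf 'a1b2\tretry storm on webhook timeout\nc3d4\tduplicate upsert on replay\n' | bug_rows
```

Descriptions containing a `|` character would break the table row, so a real version would need to escape them.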
Walk a fixed checklist of common-painful seams against the implementation. For each seam, assess vulnerability and design a test.
| Seam | What to check | Vulnerability (yes, no, or don't know) | Test to write |
|---|---|---|---|
| Concurrency | Shared mutable state, goroutine races, mutex hot spots, channel deadlocks | | |
| Retries | Retry-able operations, backoff strategy, max-retries, jitter, circuit breaker | | |
| Partial failures | What breaks when one of N dependencies fails? CRDT or compensation? | | |
| Idempotency | Double-apply of same event or command, dedup key stability, exactly-once vs at-least-once | | |
| Timeouts | Deadline propagation, context cancellation, default timeout values, client-server timeout mismatch | | |
| Data corruption | Encoding or decoding mismatches, stale reads after write, concurrent write-write conflict, serialization version drift | | |
| Edge dates and numbers | Timezone handling, DST transitions, unix epoch boundaries, float precision, overflow on accumulation | | |
| Hostile inputs | SQL injection, path traversal, oversized payloads, unicode normalization, content-type confusion | | |
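To make the "Test to write" column concrete for one seam, here is a minimal sketch of the shape an idempotency test can take: deliver the same event twice and assert that state changed once. The `apply_event` function and its ledger file are toy stand-ins invented for illustration; a real test would drive the actual service under test.

```shell
# Toy at-least-once consumer: apply_event records an event ID in a ledger
# file only if that ID is not already present (the dedup key is the ID).
# Both the ledger format and the function name are assumptions.
apply_event() {
  ledger=$1
  event_id=$2
  if ! grep -qxF -- "$event_id" "$ledger" 2>/dev/null; then
    printf '%s\n' "$event_id" >> "$ledger"
  fi
}

# Test shape: deliver the same event twice, then count applied events.
ledger_file=$(mktemp)
apply_event "$ledger_file" "evt-001"
apply_event "$ledger_file" "evt-001"
applied=$(grep -c . "$ledger_file")
echo "applied=$applied"
rm -f "$ledger_file"
```

The same double-delivery shape carries over to the other rows in the table: pick the seam, provoke it deliberately, and assert on observable state rather than on the response alone.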
Put on the adversary hat. "You want to break this implementation."
| Hypothesis | Falsifying test | Status |
|---|---|---|
Write the test plan as a single flicknote with two sections (Constructive and Adversarial).
# Test Plan: <feature or component>
**Project:** <ttal alias>
**Implementation under test:** <files and relevant task UUIDs>
**Adversarial findings filed separately:** <flicknote hex if applicable, or "None">
## Constructive
### Happy paths
- ...
### Edge cases
- ...
### Invariants and preconditions
- ...
## Adversarial
### Pass gamma — Prior-bug classes
| Bug class | Source task | Current vulnerability | Test to write |
|-----------|-------------|----------------------|---------------|
### Pass beta — Seam walk
| Seam | What to check | Vulnerability | Test to write |
|------|---------------|---------------|---------------|
### Pass alpha — Red team hypotheses
| Hypothesis | Falsifying test | Status |
|------------|-----------------|--------|
## Friction notes
Things the skill could have surfaced earlier, places the methodology felt heavy or thin, gaps in the storage template. Concrete enough to land as edits to `skills/sp-write-test-plan/SKILL.md` in a follow-up.
# Primary: test plan flicknote
cat <<'PLANEOF' | flicknote add --project testplans
# Test Plan: ...
...
PLANEOF
# Annotate the parent task with the hex ID
task <parent-uuid> annotate "testplan: flicknote <hex>"
If pass gamma or beta found confirmed-broken behavior (not just "could break"), write a separate bug or test report flicknote:
cat <<'BUGEOF' | flicknote add --project testplans
# Bug or Test Report: <feature or component>
**Confirmed-broken issues found during adversarial pass.**
**Source:** Test plan flicknote <hex>
## Issue 1: <title>
**Pass:** gamma or beta (which pass found it)
**Evidence:** <what in the code proves this would fail>
**Recommended test:** <test that would catch this>
...
BUGEOF
task <parent-uuid> annotate "bugreport: flicknote <hex>"
Do NOT auto-file +bugfix tasks. The skill writes evidence; humans decide whether to file.
Before declaring the test plan done:
Chain into the completion phase for self-review, open questions, summary, and review handoff:
skill get sp-complete-design
Follow every step in that skill. Do not duplicate its logic here.
The skill's primary deliverable is the test plan. But sometimes the planner is also asked to run the plan, not just hand it off — pairing with a human, smoking findings, or running the adversarial passes against live infrastructure. When that happens, lessons from the first real-world execution session (eve+Neil 2026-05-06, fse.sub) apply:
Read existing production state (DB rows, logs, audit trails) BEFORE running new tests. Many "NOT-RUN" rows can be promoted to PASS using evidence that already exists. Example: 12 cases moved to PASS in one DB sweep (event-type counts, key-set verification, HWM monotonicity check, signed_payload presence) without firing a single curl. The "running" wasn't writing tests; it was forensic-reading the production audit trail and matching observed behavior to test plan expectations.
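One of the checks named above, HWM monotonicity, can be sketched as a small filter over exported rows. This assumes the rows have been exported as `<key> <hwm>` pairs in write order; that layout, and the name `hwm_monotonic`, are hypothetical and would need adapting to the real table.

```shell
# Hypothetical check: stdin holds "<key> <hwm>" pairs in write order
# (for example, exported from an audit table ordered by insert time).
# Exits 0 if no key's HWM ever decreases, 1 otherwise.
hwm_monotonic() {
  awk '
    ($1 in last) && ($2 + 0 < last[$1] + 0) {
      printf "regression: key=%s %s -> %s (row %d)\n", $1, last[$1], $2, NR
      bad = 1
    }
    { last[$1] = $2 }
    END { exit (bad ? 1 : 0) }
  '
}

# Made-up sample rows: both keys only ever move forward.
if printf 'user-1 100\nuser-2 50\nuser-1 120\n' | hwm_monotonic; then
  echo "HWM monotonic"
fi
```

A clean result here is evidence for the corresponding plan row only if the export really covers the period and keys the row is about.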
Practical pattern: before each adversarial pass, query the relevant tables. Look for:
When a test scenario produces an observation that could be explained by multiple protective layers, you have not verified the layer you intended. Mark NOT-RUN with a caveat, not PASS.
Example failure mode (eve 16:43→16:55 on fse.sub): replayed an old VALIDATE while user was already expired; observed entitlement state preserved. Marked PASS for "cael's defensive guard works." Neil challenged "when did this PASS?" — re-examination showed the SQL HWM guard alone would have rejected the upsert independently of cael's guard, AND the user being already-expired meant the "doesn't downgrade an active user" assertion couldn't be tested. Retracted to NOT-RUN with a plan to retest properly post-resubscribe.
Practical pattern: before marking PASS, ask "which layer am I claiming this verifies, and could another layer also have produced this observation?" If yes, isolate before claiming.
When multiple protective layers chain (e.g., a defensive guard + an SQL HWM check), the response code alone won't distinguish which layer fired. Use slog evidence in pod logs as the tie-breaker.
Example: cael's transactionExpired guard logs INFO skip entitlement upsert: transaction already expired ... handler=handleValidateSubscription when it fires. Grep kubectl logs after the test to confirm that exact line appeared, not just "the response was 200." This is the proof that the specific layer you wanted to verify was the one doing the work.
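The log check described above might look like the sketch below. It matches only the two fragments quoted in the example and assumes nothing about the elided middle of the line. The function name `guard_fired` and the sample log line are made up for illustration.

```shell
# Reads pod logs on stdin. Succeeds only if a single line contains both
# fragments quoted in the example; the elided part of the line is not assumed.
guard_fired() {
  awk -v a='skip entitlement upsert: transaction already expired' \
      -v b='handler=handleValidateSubscription' '
    index($0, a) && index($0, b) { found = 1 }
    END { exit (found ? 0 : 1) }
  '
}

# Made-up sample line; "user=demo" stands in for fields not shown in the source.
sample_log='INFO skip entitlement upsert: transaction already expired user=demo handler=handleValidateSubscription'
if printf '%s\n' "$sample_log" | guard_fired; then
  echo "guard fired"
else
  echo "guard did not fire"
fi

# Against a live pod, the input would come from something like:
#   kubectl logs deployment/<name> --since=10m | guard_fired
```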
A `:latest` tag does not roll out automatically in Kubernetes. A merged PR's image won't reach the running pod until the deployment is restarted (`kubectl rollout restart deployment/<name>` or equivalent). Smoking against the wrong build can make a fix look like it worked when it didn't, or make a bug look present when it isn't.
Practical pattern: before smoking a fix, verify pod age vs PR merge time. If kubectl get pod -o jsonpath='{.metadata.creationTimestamp}' predates the merge commit, the new image isn't running. Surface to whoever owns the deploy step (the Merge ≠ Deploy rule).
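A minimal sketch of the pod-age comparison, assuming both timestamps are rendered in the same UTC form (`YYYY-MM-DDTHH:MM:SSZ`), which makes plain string order match chronological order. The helper name `pod_vs_merge` and the sample timestamps are made up.

```shell
# Prints "stale" if the pod was created before the merge, "fresh" otherwise.
# Both arguments must be UTC timestamps in the same format.
pod_vs_merge() {
  awk -v pod="$1" -v merge="$2" 'BEGIN {
    # Appending "" forces a string comparison rather than a numeric one.
    if ((pod "") < (merge "")) print "stale"; else print "fresh"
  }'
}

# Made-up timestamps: pod created 09:00, PR merged 10:30.
pod_vs_merge "2026-05-06T09:00:00Z" "2026-05-06T10:30:00Z"

# One way to obtain the two inputs:
#   pod_created=$(kubectl get pod <pod> -o jsonpath='{.metadata.creationTimestamp}')
#   merged_at=$(TZ=UTC git log -1 --date=format-local:%Y-%m-%dT%H:%M:%SZ --format=%cd <merge-commit>)
```

A "fresh" result only shows that the pod started after the merge. It does not prove the image build had finished by then, so it narrows the question rather than settling it.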
Confirmed-broken via code-read or analysis = FAIL, not NOT-RUN. NOT-RUN means "haven't exercised yet, don't know the outcome." FAIL means "outcome confirmed to diverge from expected, regardless of whether a test program ran."
Example: α12/α13 (cross-tenant injection) were originally NOT-RUN with notes about pending cert-chain analysis. After researcher verified no cryptographic defense exists and code-walk confirmed no app-side check, status flipped to FAIL — the bug is real even though no integration test exists yet. The fix-PR's unit tests are the regression coverage; integration smoke is positive-control only.
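The status rules from this section and the layer-isolation rule above can be condensed into a small decision helper. This is an illustrative encoding only; the function name `row_status` and its argument vocabulary are assumptions, not part of the skill.

```shell
# $1 = confirmed outcome: "matches", "diverges", or anything else for unknown
# $2 = attribution: "isolated" (default) or "ambiguous" when another
#      protective layer could also explain the observation
row_status() {
  outcome=$1
  attribution=${2:-isolated}
  case "$outcome" in
    diverges)
      # Confirmed divergence is FAIL, however it was confirmed.
      echo "FAIL" ;;
    matches)
      if [ "$attribution" = "isolated" ]; then
        echo "PASS"
      else
        echo "NOT-RUN"
      fi ;;
    *)
      echo "NOT-RUN" ;;
  esac
}

row_status diverges            # code-read confirmed the bug
row_status matches ambiguous   # two layers could explain the result
```

Note that how the outcome was established (live test, code walk, production audit trail) is deliberately not an input: only whether it is confirmed and attributable.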
Some adversarial cases can't have a direct integration smoke (e.g., env-mismatch JWS requires Apple's private signing key for a different environment, which no team has). Recognize this upfront in the plan:
Practical pattern: when the fix PR ships with unit tests covering the negative path, accept those as the regression coverage. Live integration smoke is regression-on-positive-control, not exhaustive proof.
Some test-coverage goals are unachievable in any environment, by anyone. Recognize and document, don't chase. Example (fse.sub Apple webhook variants): RESCIND_CONSENT, METADATA_UPDATE, MIGRATE, RENEWAL_EXTENDED, EXTERNAL_PURCHASE_TOKEN, OFFER_REDEEMED, DID_FAIL_TO_RENEW, REFUND have no sandbox trigger mechanism per Apple. The industry-standard ceiling is fixture-replay with captured production JWS — which requires production traffic to accumulate. Pre-prod-deploy, this is unreachable.
Practical pattern: for plans covering Apple/Stripe/payment-processor webhooks (or similar third-party-controlled sources), add a "Realistic Completion Model" section noting which cases are sandbox-testable, which need fixture-corpus, and which have no path. Sets expectations for the next planner.
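As a sketch of how a "Realistic Completion Model" section could be seeded, the hypothetical helper below tags the notification types listed above as needing a fixture corpus and leaves everything else unclassified rather than guessing. The label strings are invented for illustration.

```shell
# Notification types the text above lists as having no sandbox trigger.
# Anything else is reported as "unclassified" instead of being guessed.
completion_path() {
  case "$1" in
    RESCIND_CONSENT|METADATA_UPDATE|MIGRATE|RENEWAL_EXTENDED|EXTERNAL_PURCHASE_TOKEN|OFFER_REDEEMED|DID_FAIL_TO_RENEW|REFUND)
      echo "fixture-corpus" ;;
    *)
      echo "unclassified" ;;
  esac
}

# Emit table rows for a listed type and a placeholder type.
for t in REFUND SOME_OTHER_TYPE; do
  printf '| %s | %s |\n' "$t" "$(completion_path "$t")"
done
```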
Integration plans can produce immediate-value bugs, not just future test code. When the adversarial pass surfaces confirmed-broken behavior, the file-fix-deploy loop can close before the session ends. Example session (eve+Neil 2026-05-06): 4 +bugfix tasks filed, all 4 PR-merged + deployed within ~5 hours.
Practical pattern: when running adversarial passes live, expect to surface real bugs. Have the +bugfix-filing path warmed up (annotation template ready). Coordinate with manager/fixer agents (yuki, lux, kestrel) for the file-design-implement chain.
Researchers (athena/quill) and fixers (lux/cael/kestrel) have specialized depth worth tapping. The skill's adversarial pass can be reinforced by an external researcher's review (different mental model surfaces different gaps). Plan-review handoffs to fixers during fix design close the loop on quality of recommendations.
Practical pattern: when scope warrants, loop in a researcher for adversarial review of the plan, and a fixer for plan-review of any +bugfix designs that come out of the plan. Document their contributions in the test plan flicknote (cross-reference researcher review notes, fixer plan-review threads).
If running a paired session with the human, consider a JSONL-driven HTML report as a living artifact. Each curl converts a row from NOT-RUN to PASS in real-time; pod-log evidence pastes into the actual column. Different from a write-once test plan flicknote — this is execution-phase tracking with progress visualization.
Practical pattern: template at templates/ttal/<agent>/test-report-<project>-<date>.html (per-agent workspace, single self-contained file with embedded JSONL data + JS rendering). Don't conflate with the test plan flicknote — the flicknote is the methodology output, the HTML is the execution log.
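For the JSONL side of such a report, a progress summary can be a one-line pipeline. The sketch below assumes one JSON object per line with a `status` string field; the field names and the helper name `status_counts` are hypothetical, and the regex-based extraction is a shortcut that a real report would likely replace with a proper JSON parser.

```shell
# Assumed row shape (hypothetical field names): {"id":"...","status":"PASS"}
# Prints one "<STATUS>=<count>" line per distinct status, sorted by status.
status_counts() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' |
    sort | uniq -c | awk '{ printf "%s=%d\n", $2, $1 }'
}

# Made-up sample rows.
printf '%s\n' \
  '{"id":"g1","status":"PASS"}' \
  '{"id":"b2","status":"NOT-RUN"}' \
  '{"id":"a12","status":"FAIL"}' \
  '{"id":"a13","status":"NOT-RUN"}' | status_counts
```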
npx claudepluginhub tta-lab/ttal-cli --plugin ttal