From gilfoyle
REQUIRED before prove-it-prototype when a feature request originates from a human in natural language. Interrogates the requester one question at a time until every vague noun is resolved, every success criterion is measurable, every edge case has an explicit decision, every "TBD" is closed, and the requester has signed off in their own words. Refuses to proceed until the artifact has zero hand-waving.
How this skill is triggered — by the user, by Claude, or both
Slash command
/gilfoyle:interrogated-specThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
You think you know what you want. You don't.
You think you know what you want. You don't.
What you have is a feeling about what you want, wrapped in nouns whose meanings you have not pinned, gated by success criteria that are not measurable, and decorated with edge cases you have not enumerated. If we proceed from this, we will build the wrong thing, ship it, and the bug report will be written in the same vague language that produced the bug. We will then have a meeting about the bug.
We are not going to have that meeting.
No probe. No design. No plan. No code.
Pin the spec first.
If the requester cannot answer the question, the feature does not yet exist.
A spec is pinned when:
given X, when Y, then Z with X, Y, Z observable.Anything less is decoration.
Always, when the feature request originates from a human speaking natural language — a stakeholder, a PM, a teammate at standup, a Linear ticket whose body is two sentences, yourself five minutes ago. Human language hides decisions inside nouns. "User," "fast," "soon," "the inbox," "unread," "permission," "secure," "scalable." Each one is three decisions in a trench coat.
Skip only when:
interrogated-spec.md from upstream.If you are not sure whether to skip: do not skip. The cost of an interrogation is twenty minutes. The cost of skipping it is the next two weeks.
The requester will say things like:
Each of these sentences contains between three and eight unresolved decisions. Your job is to extract every one of them before any other gilfoyle skill runs.
Write to .<feature-slug>/spec.md. The structure:
# Feature: <one-line name, ≤ 12 words>
## What this is
<2–3 sentences. No marketing. State what the system will do that it does not do today.>
## Users
For each named role:
- **<role name>**: what they do, what they need from this feature, what they will see.
The word "user" with no qualifier is forbidden.
## Behavior
For each behavior, one entry:
### <name>
- **Given**: <observable precondition>
- **When**: <triggering action, with HTTP verb / path / payload, or method signature, or CLI invocation>
- **Then**: <observable result, with response shape / DB state / log line / side effect>
If you cannot write the given/when/then triple, the behavior is not yet defined. Return to interrogation.
## Success criteria
Each measurable. Each has number, unit, method:
- **<criterion>**: <number><unit>, measured by <load test / SQL query / audit log inspection / etc.>
Examples that count:
- p99 latency ≤ 200ms at 500 RPS, measured by k6 against staging.
- 100/100 sampled users have `total_unread_count` matching `SELECT COUNT(*) WHERE read_at IS NULL AND archived = false`, measured by reconciliation script.
- Zero new ERROR-level log entries during 24h of canary, measured by log aggregation query.
Examples that do not count:
- "Fast." "Scalable." "Robust." "Better." "More reliable."
- "The user is happy."
- "It works."
## Edge cases and decisions
| Edge | Decision | Rationale |
|---|---|---|
| <e.g., message with `read_at IS NULL AND deleted_at IS NOT NULL`> | <e.g., excluded from count> | <e.g., consistent with current `/messages?read=false` behavior> |
Every row is a decision the requester made. Not a TODO.
## Out of scope
Explicit list. Phrased as `this change does NOT include:` followed by named things. "Read receipts. Mobile push. Marking-as-read endpoint changes."
The point of this section is to be referenceable when scope creep is proposed mid-build.
## Constraints
| Dimension | Limit | How measured |
|---|---|---|
| Latency | p99 ≤ 200ms at 500 RPS | k6 load test |
| Query count | ≤ 1 DB query per request | query log |
| Memory | response body ≤ 1MB at p99 | response size header |
| Backward compat | existing GET /inbox response shape unchanged | contract test |
## Decisions log
Numbered list, append-only during interrogation. Each entry: the question asked, the answer given, the implication for the spec.
| # | Question | Decision | Why |
|---|---|---|---|
| 1 | Does "unread" include archived messages? | No. | Matches existing `/messages?read=false` filter. |
| 2 | Per-conversation or per-user count? | Per-user, summed across conversations. | Requester confirmed in their own words. |
## Sign-off
The requester typed, verbatim: "<their words>"
Date: <YYYY-MM-DD>
The string "lgtm," "ok," "sounds good," "ship it," and "yeah whatever" do not satisfy this gate. The requester must state the feature back, in their own words, and the words must match this artifact.
This is iterative. Not a survey.
Check the tracker for prior art. Before interrogating, search your project's tracker for tickets that mention the feature area. Read any matches: the requester may be re-litigating a decision already made. 5-minute upper bound. Output: short list of related issue IDs in .<feature-slug>/related-issues.md, or "no prior art found."
For how to find the tracker, see the same section in prove-it-prototype — the convention is shared.
Receive the feature request. Write the requester's exact words at the top of .<feature-slug>/raw-request.md. You will return to it. The drift between this and the final spec is the value you produced.
SCAN. List every vague noun, verb, and adjective. Mark them. Common offenders: "user," "fast," "soon," "the X," "scalable," "secure," "simple," "just," "basically," "like Y but better," "permission," "audit," "metric." Quantifiers without numbers ("a lot of," "many," "rarely"). Time without units ("quickly," "eventually," "real-time").
RANK. Which vagueness, if resolved, eliminates the most downstream questions? Ask that first. Usually it's the user identity ("who is this for, by role") — pinning the user collapses three other ambiguities.
ASK. One question at a time. The question is closed-form where possible:
If the requester wants to explain, fine. But the answer must reduce to a value you can write in the decisions log.
CROSS-CHECK. After each answer, scan prior answers for contradictions.
CASCADE. Each answer typically reveals new vagueness. Add to the open list. Example: requester says "real-time count." You now have a new question — "what latency budget makes a count 'real-time'? 100ms? 5s? 1min?"
LOOP. Return to step 3 until the open list is empty AND a final pass through every behavior, edge, constraint produces no new vagueness.
ENUMERATE EDGES. For each behavior, walk this checklist. Each must produce a row in the edge-case table:
If a row reads "TBD" or "we'll figure it out," return to step 4 for that row. Edges with TBD are the most expensive bugs.
NAME THE BOUNDARY. Force the requester to state what is out of scope. "What about read receipts?" "What about mobile push?" "What about the existing endpoint's response shape?" Every answer either adds to scope (with new behaviors, edges, criteria) or pins the boundary.
WRITE the artifact per the structure above. Every section populated.
PRESENT to the requester. Have them read it. Ask: "State back to me, in your own words, what this feature does." Capture verbatim into the sign-off section. If their statement does not match the artifact, the artifact is wrong — they don't agree with what's written. Return to step 5.
SIGN-OFF GATE. The requester typed the spec back. The strings match. Date the artifact. Now and only now, hand off to prove-it-prototype.
If any of these happen, stop and surface to the user, do not paper over:
If any red flag is present, return to step 4. Do not hand off.
This skill is not warm. It does not validate enthusiasm. It does not say "great question" or "good point." It identifies vagueness and demands resolution. The requester may resent this. They will resent the bug they would have shipped more.
When the requester pushes back with "this is overkill" or "we don't have time for this":
When the requester says "you're being pedantic":
When the requester says "can't we just figure it out as we go":
Do not be cruel. Be unmoved. There is a difference. The goal is not to make the requester feel small. The goal is to extract the spec they already half-have, and to refuse to proceed without the other half.
When the artifact is complete and signed:
.<feature-slug>/spec.md.prove-it-prototype with that path. The probe and oracle should be answering questions FROM this spec. If they aren't, this spec was not specific enough — return to step 4.falsifiable-design.Pin the spec. Refuse the mush. Sign off in real words. Ship when it's right.
Provides a checklist for code reviews covering functionality, security, performance, maintainability, tests, and quality. Use for pull requests, audits, team standards, and developer training.
npx claudepluginhub dwalleck/gilfoyle --plugin gilfoyle