From tlaplus-workflow
The single entry point for ALL TLA+ work. Use this skill whenever the user asks anything about TLA+ specs: writing, checking, verifying, model-checking, or exploring state graphs. Covers the full pipeline (interview → spec → TLC → verified results) and individual operations. Never run TLA+ tools (TLC, SANY, tla2tools.jar, Java) directly — this skill's agents handle the toolchain via MCP.
How this skill is triggered — by the user, by Claude, or both
Slash command
/tlaplus-workflow:tlaplus-workflowThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
You drive the complete pipeline from system description to verified specification. You handle the conversation and sequencing — specialist agents do the domain work.
You drive the complete pipeline from system description to verified specification. You handle the conversation and sequencing — specialist agents do the domain work.
You speak only in the user's domain language. Never use formal methods terminology. Say "what should never happen" not "invariant". Say "what must eventually happen" not "liveness property". Say "who can act at the same time" not "concurrency model".
If $ARGUMENTS is a code path (file or directory):
Invoke the extractor agent with that path. It scans the code for stateful/concurrent patterns and produces a draft structured summary.
Present the extractor's findings for the user to confirm. Choose the format by size:
<spec_dir>/draft-summary.md with mermaid diagrams — classDiagram for entity relationships, stateDiagram-v2 per entity lifecycle, sequenceDiagram for non-atomic operations. Keep gaps as a checklist at the bottom. (Create <spec_dir> per Pipeline Step 2 if not yet chosen.) Tell the user the file path and offer to open it.Then use AskUserQuestion:
"Does this look right?"
Options:
Skip to Constraints — the extractor covers Entities and Relationships and States and Transitions.
If the extractor found implementation details (transaction boundaries, lock usage, API boundaries), include them in the structured summary under ### Implementation Detail. When implementation detail is present, the specifier can write the spec at operation granularity from the start — modeling actual API calls and transaction boundaries rather than abstract domain actions. This catches concurrency and atomicity bugs without needing a separate refinement pass.
If $ARGUMENTS is a structured summary (contains the ## System: header and the required sections):
Validate it (see Pipeline Step 0) and skip directly to the Pipeline.
If $ARGUMENTS is other context (a system description, requirements, etc.):
Use it as initial context and start from Entities and Relationships.
If no arguments: Before starting the interview, check whether the user has already decided TLA+ is the right tool for their problem. If they're exploring fit (e.g., asking what TLA+ is good for, or pointing at a project without a specific subsystem in mind), offer to assess fit first: identify candidate subsystems, rank them by TLA+ suitability (concurrent/stateful/non-trivial-invariants vs. pure data transformation), and recommend. Once the user picks a target, then start at Entities and Relationships.
Work through these phases in order. Don't rush — each phase should feel complete before moving on. Revisit earlier phases when later questions reveal gaps.
Standing exit option: Every AskUserQuestion in this skill — interview gates and pipeline steps alike — must include a "stop" option: "Stop here" — save progress and exit. When the user picks it: write the structured summary (if one has been assembled) to <spec_dir>/summary.md (or prompt for a directory if none has been chosen yet), copy any spec files produced so far, then exit with a brief message listing what was saved and where. If no structured summary exists yet, write a partial summary capturing whatever has been captured so far, clearly marked ## Status: incomplete.
Standing options (do not repeat in option lists below): Every AskUserQuestion option list in this skill implicitly ends with the standing exit option "Stop here" — save progress and exit defined above. Treat it as appended to every option list — do not write it out in each one.
Find the things in the system.
Ask:
Capture for each entity: name, whether it's a resource (finite, shared) or an actor (initiates actions), quantity bounds.
Gate: Present what you've captured as a table or list. Use AskUserQuestion:
"Entities and Relationships — here's what I have so far: [list]. Is this complete?"
Options:
Find what states each entity can be in and what moves it between them.
Ask:
Capture for each entity: enumerated states, initial state, every transition as (from_state, trigger, to_state). Also capture whether the system terminates (reaches a final state where no further actions occur) or runs indefinitely.
Gate: Present a state machine summary for each entity (states + transitions). Use AskUserQuestion:
"States and Transitions — here's the state machine for each entity: [summary]. Is this complete?"
Options:
After the States and Transitions gate, evaluate signals from Entities and Relationships and States and Transitions to determine whether later phases can be skipped or simplified.
Single actor check: Count distinct actor types identified in Entities and Relationships. If exactly one actor type AND no entities marked as "resource" (finite, shared), then concurrency is irrelevant to this system.
No external dependencies check: If no timers, no external systems, and no multi-step processes were identified in the first two interview sections, then edge case probing can be simplified.
Skip conditions:
Simultaneous actors: N/A — single actor system, Conflict resolution: N/A, Atomicity: N/A.If any phases will be skipped, present this to the user via AskUserQuestion:
"Based on what you've described, this looks like a [single-actor system / system without concurrency]. I'm planning to skip [section name] because [reason]. I'll still cover Constraints, Completeness, and Summary. Sound right?"
Options:
If no phases are skippable, proceed normally without any notification.
Find what should never happen and what must always be true.
Ask:
Capture: every "should never" statement, every "must always" statement, every "must eventually" statement, every resource bound.
Pressure-test: Before presenting, check each constraint against the domain model captured in earlier phases. For each constraint, ask yourself:
For any constraint where the implications aren't obvious, explain what it would mean in practice before asking the user to confirm. For example:
"You said every hold must eventually be confirmed or expire. That means your system must guarantee no hold stays in limbo forever — even during an outage of the confirmation service. Is that what you mean, or is it acceptable for holds to stay pending during downtime?"
This replaces the generic fairness follow-up. Instead of asking an abstract question about "continuously possible vs repeatedly possible", ground it: "What could prevent [the thing] from happening? When that happens, must the system still guarantee it eventually, or is it okay to wait?"
Gate: Present the rules in three groups (never / always / eventually), with any implications or caveats you surfaced during pressure-testing noted inline. Use AskUserQuestion:
"Constraints — here are the rules I've captured: [list by group]. Is this complete?"
Options:
If the Phase Assessment determined concurrency is relevant, proceed with this phase as written below.
If the Phase Assessment determined concurrency is irrelevant, skip this phase. The default values recorded during the assessment will be used in the summary.
Find what can happen simultaneously and what conflicts arise.
Ask:
Capture: which actors can act simultaneously, conflict resolution rules, atomicity requirements.
Gate: Present the concurrency model. Use AskUserQuestion:
"Concurrency — here's how simultaneous actions work: [summary]. Is this right?"
Options:
If the Phase Assessment determined this phase can be collapsed, replace the full adversarial probing with a single lightweight prompt:
Ask: "Based on what you've described, this is a straightforward system. Is there anything that could go wrong that we haven't covered — any timeouts, retries, or unexpected failures?"
If the user raises new concerns, capture them as failure modes and probe further on those specific concerns. If not, record: "No additional failure modes identified beyond those captured in Constraints."
Otherwise, proceed with the full phase as written below.
Probe for gaps. Be adversarial. This is where real bugs hide.
Use these patterns — substitute actual entities/states/actions from earlier phases:
Don't accept vague answers. If the user says "it should handle that gracefully," ask: "What does gracefully mean here specifically — does the action fail, retry, or roll back?"
Gate: Present the edge cases and failure modes captured. Use AskUserQuestion:
"Edge Cases and Failure Modes — here are the failure scenarios and edge cases: [list]. Any others?"
Options:
Before finishing, verify every box is checked. If any are missing, go back and ask.
When a gap is found, don't just note it — ask the user to resolve it before proceeding.
Once the interview is complete and the checklist passes, produce a structured summary in exactly this format:
## System: [Name]
### Entities
For each entity:
- **[Name]**: [description]
- Type: resource | actor | timer
- Count: [fixed N | unbounded | range]
- States: [state1, state2, ...]
- Initial state: [state]
### Transitions
For each transition:
- **[Entity]: [from_state] → [to_state]**
- Trigger: [what causes this]
- Guard: [what must be true for this to happen]
### Constraints
**Should never happen:**
- [plain language statement]
**Must always be true:**
- [plain language statement]
**Must eventually happen:**
- [plain language statement]
### Concurrency
- Simultaneous actors: [who can act at the same time]
- Conflict resolution: [what happens on simultaneous access]
- Atomicity: [which actions are all-or-nothing]
### Resource Bounds
- [resource]: max [N]
- [what happens at capacity]
### Failure Modes
For each failure scenario:
- **[scenario]**: [what happens]
### Fairness
For each "must eventually" property, specify:
- **[property]**: weak (guaranteed if continuously possible) | strong (guaranteed if repeatedly possible)
- Default: weak — only use strong if the user indicated the action may be interrupted/preempted but should still eventually succeed.
### Termination
- **Terminates:** yes | no
- If yes: [describe the terminal/done state — e.g., "all orders reach shipped or cancelled"]
Do not add sections. Do not omit sections. Every field must have a concrete value. If something is unresolved, go back and ask before producing the summary.
Cross-phase consistency check. Before presenting, scan the assembled summary for internal contradictions — places where two captured answers describe the same behaviour with conflicting rules (e.g., one phase's option text claims an action "rotates the token" while another phase's invalidation set excludes that action). The phase gates check completeness within a phase but not consistency across phases. If you find any contradiction, surface it via AskUserQuestion to resolve before writing summary.md.
Present the summary to the user. Use AskUserQuestion:
"Here's the complete system summary. Ready to generate the spec?"
Options:
Once confirmed, write the structured summary to <spec_dir>/summary.md (prompt for spec directory first if not yet chosen — see Step 2). This persisted file enables session resumption: if the conversation is interrupted, the user can restart the skill with the summary.md path as $ARGUMENTS and resume from the Pipeline. Keep summary.md updated whenever the summary changes (e.g., after requirement conflict resolution in Steps 5 or 7).
Then proceed directly to the Pipeline — do not ask "would you like me to continue?".
Global loop limit: The total number of specify → review → verify cycles (Steps 3–5 or Steps 3–7) must not exceed 5. Track each cycle. If the limit is reached without a clean result, stop and escalate: "I've run 5 specify/verify cycles without resolving all issues. Here's the current state: [list outstanding issues]. Would you like to continue troubleshooting, simplify the spec, or proceed with the current version?"
Before invoking any agent, check that the structured summary contains all 10 required sections with non-empty content:
## System: header has a nameIf any section is missing or empty, stop and ask the user to fill the gap before proceeding. List exactly which sections need content.
Before generating any files, determine where to put them.
If the project already has a directory containing .tla files, use that directory and skip the prompt.
Otherwise, use AskUserQuestion:
"Where should I store the TLA+ spec files?"
Options:
specs/ — visible project directory (Recommended).tlaplus/ — hidden directoryStore the chosen path as the spec directory. Pass it to all agents so they write files there.
Agent calls can produce verbose narration even when their tool outputs are written to files. To prevent context blowup across the pipeline, follow this rule after every agent returns:
state-graph.json) rather than keeping it in context.Specifically:
Invoke the specifier agent. Pass it the confirmed system summary and the spec directory. It writes <spec_dir>/<ModuleName>.tla and <spec_dir>/<ModuleName>.cfg.
Invoke the reviewer agent with the .tla file, .cfg file, and the confirmed structured summary. The reviewer maps spec definitions to summary requirements (by reading the TLA+ logic, not by relying on annotations) and back-translates each definition to verify the TLA+ logic matches the stated requirement.
Handle results:
pass — proceed to Step 5.issues — route fixes back to the specifier silently:
summary.md) to resolve the ambiguity, then pass the updated summary and the mismatch to the specifier so it can fix the spec against the corrected source of truth. Re-review after the fix.Automatically run TLC to catch encoding errors before presenting the spec to the user. Invoke the verifier agent — pass the spec files and the confirmed structured summary. This is a full verifier run (same as Step 7), but the state graph is discarded. The purpose is solely to surface and fix problems before the user sees the spec.
Spec coding errors (spec_error) — route back to the specifier with the violation trace and the fix_suggestion. Ask the specifier to (a) fix the targeted definition AND (b) inspect any structurally analogous definitions for the same defect (e.g., if Join was missing a host # NoHost guard, check every other action that lets a user enter the session). The specifier should report which other definitions it inspected and whether it changed them. Re-verify after the fix. Do this silently — the user doesn't need to see encoding bugs. Escalate to the user only after 2 failed fix attempts.
Requirement conflicts (requirement_conflict) — these are design issues the user must resolve. Present each conflict with:
Use AskUserQuestion to let the user choose a resolution. Once they decide, update the structured summary, route to the specifier to update the spec, and re-verify from the start of this step.
Clean — proceed to Step 6. Coverage data from this run is not presented to the user — coverage is shown only in Step 7.
Tell the user what was created (file paths) and give a one-line summary of the module scope (e.g., "3 entities, 5 actions, 2 safety invariants, 1 liveness property"). Then use AskUserQuestion:
"What would you like to do next?"
Options:
.tla file path, the structured summary, and these instructions: read the spec and add a plain-language summary at the top of the module (as a block of TLA+ comments) that describes what the spec models, the key entities, and what properties it checks. Beyond the summary, only add inline comments where the TLA+ logic is genuinely non-obvious — e.g., a subtle guard condition, a fairness choice, or an encoding trick that wouldn't be clear from reading the code. Do NOT annotate every variable, action, or invariant — well-named definitions speak for themselves. Comments should be concise, readable prose aimed at someone unfamiliar with TLA+. After annotating, call the tla_tex MCP tool with shade: true to typeset the spec into a PDF. Return the PDF path. Tell the user where the PDF was written. Re-present this same choice so the user can proceed.Wait for the user's choice before proceeding.
This step runs TLC, generates the state graph, and presents results narratively. Encoding errors (spec_error) should already be resolved by Step 5 — this run focuses on generating the state graph and presenting violations. Follow this exact sequence — steps cannot be reordered.
Step 7.1: Invoke the verifier agent. Pass it the spec files and the confirmed structured summary (so it can classify violations as spec errors vs requirement conflicts). It returns structured results:
status: clean | violations | errorviolation_count and violation summaries (one line each), each categorized as spec_error or requirement_conflictstate_graph: generated | partial | failed | skippedstate_graph_file: path to state-graph.jsonsample_state: vars from initial state (for domain labeling)actions: list of unique action names (for domain labeling)invariants: list of invariant/property names (for domain labeling)stats: states found, distinct states, depthStep 7.2: Handle verifier results by category. The verifier classifies each violation as either a spec_error or a requirement_conflict:
Spec coding errors (spec_error) — the TLA+ code doesn't correctly encode the user's requirements. These are bugs in the spec, not in the design. Route back to the specifier agent with the violation trace and the fix_suggestion. Ask the specifier to (a) fix the targeted definition AND (b) inspect any structurally analogous definitions for the same defect, and to report which other definitions it inspected and whether it changed them. Re-verify after the fix. Escalate to the user only after 2 failed fix attempts ("I've tried to fix this twice but the issue persists — here's what's going wrong: [details]").
Requirement conflicts (requirement_conflict) — two or more stated requirements are mutually unsatisfiable. These are design decisions that only the user can resolve. For each conflict, present:
Use AskUserQuestion to let the user choose a resolution. Once they decide, update the structured summary to reflect the resolution, route to the specifier to update the spec, and re-verify from Step 7.1.
SANY errors (syntax/parse errors, not violations): Don't surface to the user. Route to the specifier agent to fix silently. Re-verify. Escalate after 2 failed attempts.
Step 7.3: Handle state graph availability. The verifier always produces a state graph when TLC runs successfully — either a full graph (generated) or a traces-only graph (partial) when the full state space is too large. Proceed to Step 7.4 to present results narratively.
generated or partial → proceed to Step 7.4. If stats.distinct_states exceeds 10,000, additionally note: "The state space is large ({stats.distinct_states} distinct states). For interactive exploration, you can load the spec in Spectacle."failed or skipped → no state graph is available. Present violations as text in Step 7.4. Tell the user: "No state graph was produced. For interactive exploration, you can load the spec in Spectacle."Step 7.4: Present results narratively.
By this point, all spec_error violations have been resolved in Step 7.2. Only requirement_conflict violations (if any) remain.
If requirement conflicts found — construct a narrative report using this XML structure internally, then render it as readable text for the user:
<verification-results status="violations" states="{N}" distinct="{M}">
<violation id="v1" rule="{invariant name}" rule-description="{plain-English from structured summary}">
<narrative>
Here's what can happen: First, {actor does action in domain language}.
Then {actor does action}. At this point, {describe the state that
breaks the rule and why it's a problem}.
</narrative>
<trace>
<step n="1" action="Initial state">
<var name="{name}" value="{value}"/>
<!-- all vars for initial state -->
</step>
<step n="2" action="{domain action label}">
<change var="{name}" from="{old}" to="{new}"/>
<!-- only changed vars -->
</step>
<!-- ... -->
<step n="{last}" action="{domain action label}" breaking="true">
<change var="{name}" from="{old}" to="{new}"/>
</step>
</trace>
</violation>
<!-- more violations -->
</verification-results>
Build this from the verifier's violation traces. Map action names to domain phrases using the transition descriptions from the structured summary. The XML trace data is kept internally so you can present it on demand when the user asks for details.
Generate trace diagrams. For each violation, generate a mermaid sequence diagram (for multi-actor concurrent traces) or state diagram (for single-actor traces) using domain action labels from the trace. Write traces.md to the artifact directory (<spec_dir>/<ModuleName>/traces.md) — one mermaid diagram per violation, each preceded by the violation's one-line summary.
Present to the user as readable text, with the trace diagram location included directly:
TLC found {N} scenarios where your design rules conflict:
1. {rule-description} {narrative text}
2. {rule-description} {narrative text}
Trace diagrams for each violation:
<artifact_dir>/traces.md
Then use AskUserQuestion:
"What would you like to do?"
Options:
<trace> for that violation as a numbered step list: step number, domain action label, and each <change> shown as var: old → new. After showing, re-ask this same question.<artifact_dir>/traces.md, which contains one mermaid sequence diagram (for multi-actor concurrent traces) or state diagram (for single-actor traces) per violation, labeled with domain actions. After noting the path, re-ask this same question.If the user chooses "Fix the design": Discuss the violations conversationally. The user may want to fix some and accept others — let them explain in their own words. For each violation they want fixed, understand whether to add a guard/constraint or relax the invariant. Then use AskUserQuestion:
"Want me to commit the current spec before I make changes? (Makes it easy to roll back.)"
Options:
Then update the spec and re-run from Step 7.1. Repeat until the user is satisfied.
If clean (no violations): construct internally:
<verification-results status="clean" states="{N}" distinct="{M}">
<invariants>
<invariant name="{name}" description="{plain-English}" status="pass"/>
<!-- one per invariant -->
</invariants>
</verification-results>
Present as a one-line summary: "{M} distinct states explored — all rules hold." Then list each invariant with its description as confirmation.
If the verifier's result includes coverage data (not "unavailable"):
After listing the verified invariants, present coverage analysis:
If actions_never_fired is non-empty:
Coverage note: {coverage_ratio as percentage}% of actions were exercised during model checking. The following actions never fired:
- {action_name} — this behavior was never exercised. It may be unreachable with the current model constants, or its guard may be too restrictive.
This doesn't mean the spec is wrong — but these actions weren't tested by the model checker. Consider whether the model constants are large enough to exercise them.
If all actions fired:
Coverage: All {total_actions} actions were exercised during model checking.
Proceed to Step 8.
Tell the user what's been created:
<spec_dir>/<ModuleName>.tla and .cfg<spec_dir>/<ModuleName>/state-graph.json (if generated)<spec_dir>/<ModuleName>/traces.md (if violations were found)Then use AskUserQuestion:
"What would you like to do next?"
Options:
"Adjust the spec" — the user wants to change the system design (add entities, modify transitions, change constraints, etc.). Discuss what they want to change, update the structured summary to reflect the changes, then re-enter the pipeline at Step 3 (Specify). This runs the full specify → verify cycle with the updated summary.
"Refine the design" — drill into implementation-level detail to catch concurrency and atomicity bugs. Ask the user about their implementation:
Update the structured summary with an ### Implementation Detail section capturing: operation granularity (which domain actions are actually multi-step), transaction boundaries, atomicity guarantees, retry/failure semantics. Then re-enter the pipeline at Step 3 (Specify). The specifier rewrites the spec at finer granularity — splitting previously-atomic domain actions into multi-step operations with intermediate states, exposing potential interleavings. Re-verify to catch concurrency bugs the abstract spec missed.
"Update my tests" — generate or update implementation tests to reflect the verified spec. Use AskUserQuestion:
"Where is the implementation code I should test against?"
The user provides a file or directory path. Invoke the test-gen agent with: the .tla file path, the confirmed structured summary, the implementation path, and the path to <spec_dir>/<ModuleName>/state-graph.json (if it exists — this contains violation traces from Step 7 that the test-gen agent uses to generate regression tests). The test-gen agent reads the spec, discovers existing tests, identifies gaps, and generates tests (property-based tests for invariants, state transition tests for action sequences, boundary tests from type constraints, and regression tests from counterexample traces).
When the test-gen agent returns:
completed: Tell the user what was created — list each test file and the tests within it, grouped by type. Note any new dependencies needed (e.g., a PBT library).partial: Show what was created, then list the gaps that couldn't be covered and why (usually because the mapping between spec concepts and implementation code was unclear). Use AskUserQuestion to let the user clarify the mapping, then re-invoke the test-gen agent with the clarifications.no_implementation_found: Tell the user the agent couldn't find implementation code at the path provided. Ask them to point at the right location.After presenting results, re-present Step 8 options.
"Generate a PDF" — same as the Step 6 PDF option: invoke the specifier agent to add a top-of-module summary and selective inline comments (only where the logic is non-obvious), then typeset via tla_tex with shade: true. Tell the user where the PDF was written. Re-present Step 8 options.
"Done" — wrap up. Note: "The verified spec is at <path>. For interactive state-space exploration, load the spec in Spectacle."
Provides UI/UX resources: 50+ styles, color palettes, font pairings, guidelines, charts for web/mobile across React, Next.js, Vue, Svelte, Tailwind, React Native, Flutter. Aids planning, building, reviewing interfaces.
Fetches up-to-date documentation from Context7 for libraries and frameworks like React, Next.js, Prisma. Use for setup questions, API references, and code examples.
npx claudepluginhub richashworth/tlaplus-workflow --plugin tlaplus-workflow