From pbt
Find genuine property-based tests for existing code, design new code so properties hold by construction, and diagnose property-test failures — always contract-first, reading the implementation only after properties have been designed. Use this skill whenever the user asks for property-based tests, PBT, QuickCheck-style tests, invariant tests, generative tests, or fuzz tests, in any language (Python/Hypothesis, TypeScript/fast-check, Rust/proptest, Haskell/QuickCheck, Scala/ScalaCheck, etc.). Also use it whenever the task mentions properties, invariants, oracles, generators, arbitraries, shrinking, stateful/model-based testing, or a property test that is failing — even if the user doesn't say "property-based" explicitly. Pair this skill with the library-specific skill (pbt-hypothesis, pbt-fast-check, etc.) when one is available — this skill handles property discovery, design, and failure interpretation; the library skill handles syntax, strategies/arbitraries, and ecosystem-specific patterns. Without this skill, generated property tests tend to degenerate into trivial example tests, tautologies that reimplement the function under test, or properties silently coupled to the current implementation rather than the contract.
How this skill is triggered — by the user, by Claude, or both
Slash command
/pbt:property-based-testingThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
This skill is **language-agnostic**. It covers the reasoning every PBT engagement needs — finding properties, naming oracles, avoiding traps — regardless of whether you're writing Hypothesis, fast-check, proptest, QuickCheck, or anything else. For library-specific syntax and patterns, also load the relevant companion skill (e.g., `pbt-hypothesis`, `pbt-fast-check`).
This skill is language-agnostic. It covers the reasoning every PBT engagement needs — finding properties, naming oracles, avoiding traps — regardless of whether you're writing Hypothesis, fast-check, proptest, QuickCheck, or anything else. For library-specific syntax and patterns, also load the relevant companion skill (e.g., pbt-hypothesis, pbt-fast-check).
When asked to write property-based tests, models tend to skip the hard part — finding a real invariant — and jump straight to scaffolding that looks property-based but isn't. Common failure modes, in any language:
assert add(x, 5) == x + 5).assert sorted_list == sorted(my_sort(xs))).The fix is procedural: never write the generator decorator until you have written down the property in English, named the oracle, and rejected obvious traps. This skill enforces that procedure.
What is the user actually doing? The skill has one default workflow and three sibling modes; pick the one that matches the task.
references/property-driven-design.md. Properties come before the implementation and constrain it. Especially valuable when working with an LLM, where it measurably reduces the rate of code-and-test sharing the same misunderstanding.references/interpreting-failures.md. Don't assume the implementation is wrong; there are five possible causes, and the procedure pins down which one.The design and diagnostic modes are siblings to the discovery workflow, not extensions of it. They share Steps 3 (name the oracle), 4 (reject traps), and 6 (critique) — the depth discipline is the same — but they enter the lifecycle at different points.
Follow these steps in order. Do not skip ahead to code. Step 1 splits the act of reading the function into three passes — contract, properties, implementation — because they are different epistemic acts. Hughes (2019, "How to Specify It!") opens his paper warning that the dominant pitfall in PBT is replicating the code in the tests: properties that mirror what the implementation does rather than what the contract requires. Once you have read the body, your properties drift toward it. The three-pass structure prevents that drift.
Read only the contract surface: signature, type annotations, docstring/spec, parameter and return names (which often carry semantic information), declared error types, and adjacent functions in the same module for interface context. Do not read the function body, called helpers, existing tests, or the branch structure. From this surface alone, articulate:
If the contract is unclear from the surface — common with underspecified docstrings — note which parts are unclear and either ask the user or proceed with the smallest defensible interpretation. Resolving ambiguity by reading the body is exactly the contamination this pass is designed to prevent.
Design properties using only the contract from 1A. Every candidate should claim what any correct implementation must do, not what the current one happens to do.
Only after Steps 2–4 are complete, read the body. Use it for three purposes, and only these:
Do not use this pass to add properties suggested by the implementation's structure (a binary search is a how, not a what), to pin tests to whichever variant the current code produces when the contract permits variation, or to silently resolve docstring ambiguity by reading the code — surface the ambiguity to the user; the code might be wrong.
Read references/property-catalog.md for the full taxonomy with examples in multiple languages. Aim for at least one candidate per applicable category, not a fixed number. For most functions this produces 5–10 candidates; for trivial functions fewer is fine, but be explicit about which categories you considered and ruled out. The eight categories to scan:
decode(encode(x)) == x, parse(format(x)) == x, deserialize(serialize(x)) == xf(x) and f(transform(x)) are related in a predictable way (e.g., sort(xs + [y]) contains everything sort(xs) does, plus y)push then pop returns the pushed value; size after n adds and m removes is n - m)Write each candidate property in the form:
For all inputs
Xsatisfying P, the property Y holds.
If you cannot fill in both P and Y precisely, the property is not ready.
For every candidate, answer explicitly: what am I comparing against, and is it independent of the function under test?
A property is only useful if its oracle is independent. If the oracle is the function under test (or trivially derived from it), the test is a tautology.
my_sort(xs) == my_sort(xs) — oracle is the same function.my_sort(xs) == sorted(my_sort(xs)) — oracle depends on the function's own output.my_sort(xs) == sorted(xs) — sorted is from the standard library, independent.is_sorted(my_sort(xs)) and multiset(my_sort(xs)) == multiset(xs) — two independent invariants (sortedness + permutation), no reference implementation needed.If no independent oracle exists, use invariant decomposition: list the structural properties the output must satisfy and check each. This is often as strong as oracle comparison and sometimes stronger — it pins down the meaning of the output rather than its bit-for-bit identity, which can be more robust to acceptable variations (e.g., stable vs unstable sort).
Before writing code, run each candidate through this checklist:
==? → Tautology, reject.input unchanged? Or []? Or null? → If yes, strengthen it.Only now think about the generator decorator. The library skill (pbt-hypothesis, pbt-fast-check, etc.) has the syntax. Universal principles:
Write the test. Then, before declaring it done, read it once more and ask:
return arg (or some other degenerate implementation), would this test still pass? If yes, the property is too weak.@example in Hypothesis, fc.pre and explicit values in fast-check, etc.)Always include a docstring or comment on each property test that states the property in English and names the oracle. Future readers (human or agent) inherit the reasoning, and writing the docstring is itself a check — if you can't state the property clearly, the test isn't ready.
test_sort_permutation_invariant:
Property: sort returns a permutation of its input — same multiset, possibly reordered.
Oracle: multiset equality, computed independently (Counter / Map / HashMap).
Catches: bugs that drop, duplicate, or invent elements.
Steps 3, 4, and 6 give you per-property rigor. This step gives you breadth across the surface, so the suite doesn't stop at the handful of properties that came to mind first.
Before declaring the suite complete, walk the eight categories in references/property-catalog.md and, for each, write one sentence: either which test covers it, or why it doesn't apply to this function. The most commonly missed in practice are:
Decimal function should return values quantized to the right number of places; a JSON producer should return well-formed JSON. Bugs slip through when the assertion checks the value but not the shape.If a category applies and is missing, add a test. If it doesn't apply, say so out loud — the explicit ruling-out is the discipline.
If the system under test has state (a database, a cache, a parser with internal state, a class with mutable instances), a flat property test will miss bugs that only appear after specific sequences of operations. Every major PBT library has a stateful/model-based mode (Hypothesis: RuleBasedStateMachine; fast-check: fc.commands; proptest: state-machine crates). The library skill has the syntax; the pattern is universal:
Signs you need stateful testing: the function is a method on a class; the system has init/teardown lifecycle; bugs depend on operation ordering; the user mentions "regression after N operations" or "intermittent failures."
Read references/anti-patterns.md for an annotated catalog of bad property tests and how to fix them. Some of these patterns are so seductive that even experienced engineers fall into them — review the catalog before generating tests, not after.
If the user says "just write the tests, I don't need the analysis," still do Steps 1–4 and the Step 6.5 completeness check internally, and surface only the result. Do not silently skip them. The discovery work is what separates a real property test from a parameterized example. Present the properties compactly before showing tests:
Properties identified:
1. [round-trip] decode(encode(x)) == x for all valid x
2. [invariant] len(compress(xs)) <= len(xs) for all xs
3. [oracle] my_sum(xs) ~= std_sum(xs) for all xs of finite floats
Then show the tests. If the user only wants one or two, ask which.
If the user asks for property-based tests on a function where no meaningful properties exist (e.g., a pure I/O wrapper, a function whose entire contract is "return this specific string"), say so. Suggest example-based tests with the language's standard parameterization mechanism (pytest parametrize, Jest each, Rust rstest, etc.). Property-based testing is not always the right tool, and pretending it is produces the exact junk this skill is meant to prevent.
When the task involves a specific PBT library, also load the library skill if available:
pbt-hypothesispbt-fast-checkpbt-proptestpbt-quickcheck-hspbt-scalacheckThe library skill covers: generator/arbitrary API, stateful testing mechanics, shrinking specifics, settings and tuning, and library-specific anti-patterns. It assumes this core skill is already loaded — it does not re-derive the discovery workflow.
If no library skill exists for the user's stack, this core skill is still useful — the workflow is universal. You'll need to translate the patterns to the target library's idioms, but the reasoning doesn't change.
references/property-catalog.md — Eight categories of properties with multi-language examples, plus the strength hierarchyreferences/anti-patterns.md — Annotated bad tests and how to fix them, with examples in multiple languagesreferences/property-driven-design.md — Forward-design mode: properties before implementationreferences/interpreting-failures.md — Diagnostic procedure for failed property testsnpx claudepluginhub alialavia/pbt-skills --plugin pbtCreates, edits, and optimizes skills for Claude Code, including drafting, evaluating with test prompts, iterating on performance, and improving skill descriptions for better triggering accuracy.