Skill

lean-vacuity-check

Point this at a Lean 4 repository to decide whether its headline theorem is genuinely established or vacuous — a "proof" that compiles but proves nothing (the rash of "P = NP formalized in Lean" claims). Gathers facts — a kernel `#print axioms` audit, token hazards (sorry/axiom/native_decide), and the unfolding chain of every headline identifier — into a per-theorem brief, then judges soundness, triviality, and faithfulness to the paper (especially whether the Lean weakens it); emits a professional LaTeX report. Use when the user asks to check / audit / verify / vet a Lean proof or repo, asks "is my Lean proof vacuous", is suspicious of a formalization, or points at a P=NP / Riemann / big-conjecture Lean repository.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/lean-vacuity-check:lean-vacuity-check

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

**Maximum effort, and orchestrate.** Vacuity is adversarial — the interesting cases are

SKILL.md

218 lines · ~3.6k tokens

Stats

LanguagePython

Stars0

MaintenanceExcellent

Last CommitJun 4, 2026

Actions

View Source View Plugin View on GitHub View README

Stats

Actions

Lean vacuity check

Maximum effort, and orchestrate. Vacuity is adversarial — the interesting cases are designed to look correct — so optimise for the most exhaustive, correct answer, not the fastest; token cost is not the constraint here. For any non-trivial repo, fan out rather than work serially: when the harness supports it (e.g. Claude Code's Workflow tool / subagents), run a reviewer per headline theorem in parallel and — critically — adversarially verify every suspected vacuity or weakening finding with an independent skeptic before reporting it, so a plausible-but-wrong claim never survives. Keep going until every headline theorem and every axis (soundness, triviality, weakening) is checked. Where orchestration isn't available, reason exhaustively in-session with the same adversarial discipline. The scripts gather the facts; you bring the judgement.

Field-general. This works for any area of mathematics, not just complexity theory. The deterministic layer reports field-agnostic facts; for the judgement you bring general mathematical knowledge — apply it to whatever the repo is about (analysis, topology, algebraic geometry, …). Don't assume the domain. The only complexity-specific code is an optional P/NP escalator that never gates the general checks.

The mission and the trust model

A Lean proof is only as meaningful as (a) what it assumes and (b) what it actually states. A repository can compile, show a green checkmark, and still establish nothing of what its paper/README claims. The job is to separate four outcomes for the headline theorem(s):

Verdict	Meaning
`VACUOUS`	The statement is logically trivial, assumes its own conclusion, or its key terms are defined to make it hold by `rfl`.
`UNSOUND-VECTOR`	It compiles only via an escape hatch (custom `axiom`, `native_decide`, `unsafe`, kernel-bypass `set_option`).
`SUSPECT`	You judge it smuggles its crux into a premise, or adds/narrows a hypothesis, short of outright trivial/aliased — flag for a careful read.
`INCOMPLETE`	Honest `sorry`/`admit`; not yet a proof, but not disguised as one.
`NO-VACUITY-SIGNAL`	No certain fact and no vacuity you can show — NOT a clean bill of health; faithfulness still rests on your read.

The two questions that matter, in order:

Is the proof real? — does the kernel accept it using only the standard axioms (propext, Classical.choice, Quot.sound), with no sorry/custom axiom?
Is the statement the right one? — even a kernel-clean proof is worthless if P, NP, "polynomial time", or the conjecture itself was redefined into something trivial. A green build never answers question 2 — you must.

Run the scripts for FACTS — you are the detector

The scripts gather facts and certain signals: token hazards (sorry/axiom/native_decide…), the kernel #print axioms closure, the parsed statement, the unfold chain of every identifier (so a hidden def NP := P or def Conjecture := True is visible), and the paper↔Lean pairing. They do not decide whether a statement is trivial, smuggles its crux, or weakens the paper — there is no fixed pattern list that catches an infinite space of cheats. Those are open-ended judgements you make from the facts. Assemble the brief, read it, and reason.

Step 0 — pre-flight

python3 --version                     # need ≥ 3.10 (no third-party deps for the static layer)
which lake lean                       # if present, USE it (Step 2 + probing definitions)

The Lean toolchain, when present, is the strongest source of evidence. The static scan never needs a build, but the kernel #print axioms audit (Step 2) is the decisive layer, and running Lean directly reveals more than static analysis can: a lake build of the headline module, #print/#check on it, and lake env lean unfolding a suspicious definition to primitives (a throwaway Probe.lean holding #print axioms <thm> / #reduce <def> / set_option pp.all true in #check <thm>) are all available whenever lake/lean is installed. Compiling is therefore the preferred path; static-only analysis is the fallback when a build is infeasible, and the report states which path the audit took.

Step 1 — assemble the facts brief

# kernel audit first when lake/lean is available (it folds into the brief):
python3 scripts/lean_axiom_probe.py <REPO> --auto --json-out results/axioms.json
python3 scripts/make_brief.py <REPO> --theorem p_equals_np --axioms results/axioms.json \
    --out-dir audits/<repo>
# scope a large codebase: --include SUBPATH / --exclude SUBPATH ; --lean-subdir for the LaTeX pairing

make_brief.py writes audits/<repo>/BRIEF.md — one decision-ordered document per headline theorem: (1) soundness (kernel #print axioms closure + token hazards sorry/axiom/ native_decide/unsafe/debug.skipKernelTC), (2) the statement with its hypotheses, conclusion, and the unfold chain to primitives (so a hidden def NP := P or def Conjecture := True is laid bare), (3) paper vs Lean (the paper statement beside the Lean one — the weakening diff), (4) a decision checklist. The raw facts also land in facts.json / source-vs-lean.json; python3 -m vacuity.cli scan <REPO> gives the same facts plus the deterministic floor verdict directly. These are FACTS — the scripts do not decide vacuity; you do (Step 3).

Step 2 — run Lean when you can (the decisive soundness fact)

The kernel #print axioms closure (folded into the brief via --axioms) is the one check source obfuscation cannot survive: a hidden sorry shows up as sorryAx, an assumed lemma as a named custom axiom, native_decide as ofReduceBool.

python3 scripts/lean_axiom_probe.py <REPO> --theorem MembershipProject.Core.p_equals_np

When lake/lean is installed, go further: lake build the headline module and probe with a throwaway Probe.lean (#print axioms <thm> / #reduce <def> / set_option pp.all true in #check <thm>) to unfold a definition the parser couldn't. Static-only is the fallback when a build is infeasible; say which path you took.

Step 3 — read the brief and JUDGE (this is the detector)

Read BRIEF.md end to end, and read the repo's README/paper (open the actual .tex/PDF). The scripts gave you facts; the verdict is yours. Per headline theorem, decide in order:

Sound? Does the kernel closure use only propext/Classical.choice/Quot.sound, with no sorryAx and no custom axiom? A custom axiom or sorryAx is decisive.
Non-trivial? Follow the unfold chain. If the conclusion reduces to True / x = x / ∈ Set.univ / an aliased identity (def NP := P), or a hypothesis can never hold, the theorem establishes nothing. No pattern list bounds this — reason about what the unfolded statement actually asserts (a class as Set.univ/True/Unit; "polynomial time" with no bound; quantifiers swapped; an opaque headline term that cannot be refuted).
Faithful — and not weakened? (the highest-value check; assume it was weakened.) Put the paper statement beside the Lean one (brief §3, or read the PDF). Vacuity here is subtle: the Lean adds a hypothesis the paper lacks, or narrows the conclusion (a special case, → where the paper says ↔, swapped ∀/∃, a trivial machine model, a missing polynomial bound). The source of truth is the docstring, the dataset's targets.yaml, or the paper's LaTeX/PDF; python3 scripts/tex_lean_compare.py <REPO> [--lean-subdir SUB] pairs each paper statement with its Lean counterpart for the diff. List, per theorem: the source hypotheses, the hypotheses the formalization adds, and whether the Lean is logically at least as strong as the paper. Treat any added hypothesis or narrowed conclusion as a defect until you show it is not.

Write your findings to results/faithfulness.json (schema in vacuity_report.py --help).

Step 4 — synthesize + report

python3 scripts/vacuity_report.py <REPO> --theorem p_equals_np \
  --axioms results/axioms.json \
  --faithfulness results/faithfulness.json \
  --name "<readable title>" \
  --out "report/vacuity-<repo>-<YYYY-MM-DD>.tex" --pdf

Reports carry readable names. One complete check is one report, named so a human can scan a folder of them rather than a generic report.tex:

--name is the title printed on the report — a human title such as the repo's display name plus any paper id, e.g. "Pedigree-Polytopes-Lean4 (arXiv:2606.03194)".
--out is the file name — vacuity-<repo>-<YYYY-MM-DD>.tex, with the matching .pdf beside it. With both flags omitted the default is already report/vacuity-<repo-folder>-<today>.tex, titled by the repo folder name, so the output is readable without either flag; the flags add explicitness or a paper id.

vacuity_report.py re-runs the deterministic scan (single source of truth), folds in the kernel audit and your Layer-2 findings, and renders a professional LaTeX report (compiling a PDF if latexmk/pdflatex is present). Write your faithfulness findings to results/faithfulness.json first (schema in scripts/vacuity_report.py --help): each entry is {decl, term, claim, finding, severity, evidence}.

State plainly: if the build didn't run, say "kernel audit not performed". If the facts are clean but you couldn't verify a definition, say "faithfulness unverified" — never upgrade silence into a clean bill of health.

Vacuity taxonomy (rule ids — yours to assign)

The deterministic layer emits only the fact-level ids (V1/V2 token hazards, AX-* kernel results, V-opt-*, and V4/autoImplicit/class-laundering as advisory INFO observations); the rest (V3, V5-*, V6-*) are tags you assign to your own findings from the facts.

id	pattern
`V1-sorry` / `V1-admit` / `AX-sorry`	unfinished proof; `AX-sorry` = `sorryAx` in the kernel closure (definitive).
`V2-axiom` / `AX-custom`	the claim or a load-bearing lemma is assumed as a `axiom`/`constant`.
`V2-native_decide` / `AX-compiler`	trusts the compiler (`ofReduceBool`), not the kernel.
`V2-unsafe`/`-unsafeCast`/`-lcProof`	bypasses the type system.
`V-opt-debug.skipKernelTC`	the kernel type checker is turned off.
`V-opt-autoImplicit`	free names silently become ∀-variables — the statement may not say what it looks like.
`V3-trivial-concl` / `V3-refl-concl`	the goal is `True` or `x = x`.
`V3-false-hyp`	a premise is unsatisfiable (`0 = 1`, `x ≠ x`) → vacuously true.
`V4-trivial-def` / `V4-alias` / `V4-opaque`	key term defined as everything/nothing, aliased, or unfoldable-free.
`V5-exact-hyp`	the proof is `exact <hyp>`; the theorem assumes its conclusion.
`V5-bridge-named` / `V5-assumed-direction` / `V5-assumed-thesis` / `V5-content-in-hyps`	bridge / extra-hypothesis smuggling: the crux is a premise.

Scope control

To check only part of a codebase (the user may ask "just check the core" or "skip the backup folder"): pass --include <subpath/glob> and/or --exclude <subpath/glob> to vacuity.cli scan (repeatable), or point it at a single file. The kernel probe takes --theorem <name>/--module <Mod> to bound what it audits.

Cautions

A clean fact set + clean axiom audit means the artifact is sound and complete for the stated theorem — it says nothing about whether that theorem is the intended one. Faithfulness (Step 3) is mandatory before any "looks legit".
Triviality, bridges (V5-*), and faithfulness are your judgements from the facts, not script verdicts — back each with a concrete fact (the unfold chain, the kernel closure, or a paper mismatch).
native_decide/Classical.choice are common in honest Mathlib code — weigh them in context, escalate only when in the headline's closure.

lean-vacuity-check

Invocation

Context Preview

SKILL.md

lean-vacuity-check

Invocation

Context Preview

SKILL.md

Lean vacuity check

The mission and the trust model

Run the scripts for FACTS — you are the detector

Step 0 — pre-flight

Step 1 — assemble the facts brief

Step 2 — run Lean when you can (the decisive soundness fact)

Step 3 — read the brief and JUDGE (this is the detector)

Step 4 — synthesize + report

Vacuity taxonomy (rule ids — yours to assign)

Scope control

Cautions

Similar Skills

Lean vacuity check

The mission and the trust model

Run the scripts for FACTS — you are the detector

Step 0 — pre-flight

Step 1 — assemble the facts brief

Step 2 — run Lean when you can (the decisive soundness fact)

Step 3 — read the brief and JUDGE (this is the detector)

Step 4 — synthesize + report

Vacuity taxonomy (rule ids — yours to assign)

Scope control

Cautions

Similar Skills