From Lean Vacuity Check
Point this at a Lean 4 repository to decide whether its headline theorem is genuinely established or vacuous — a "proof" that compiles but proves nothing (the rash of "P = NP formalized in Lean" claims). Gathers facts — a kernel `#print axioms` audit, token hazards (sorry/axiom/native_decide), and the unfolding chain of every headline identifier — into a per-theorem brief, then judges soundness, triviality, and faithfulness to the paper (especially whether the Lean weakens it); emits a professional LaTeX report. Use when the user asks to check / audit / verify / vet a Lean proof or repo, asks "is my Lean proof vacuous", is suspicious of a formalization, or points at a P=NP / Riemann / big-conjecture Lean repository.
How this skill is triggered — by the user, by Claude, or both
Slash command
/lean-vacuity-check:lean-vacuity-checkThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
**Maximum effort, and orchestrate.** Vacuity is adversarial — the interesting cases are
Maximum effort, and orchestrate. Vacuity is adversarial — the interesting cases are designed to look correct — so optimise for the most exhaustive, correct answer, not the fastest; token cost is not the constraint here. For any non-trivial repo, fan out rather than work serially: when the harness supports it (e.g. Claude Code's Workflow tool / subagents), run a reviewer per headline theorem in parallel and — critically — adversarially verify every suspected vacuity or weakening finding with an independent skeptic before reporting it, so a plausible-but-wrong claim never survives. Keep going until every headline theorem and every axis (soundness, triviality, weakening) is checked. Where orchestration isn't available, reason exhaustively in-session with the same adversarial discipline. The scripts gather the facts; you bring the judgement.
Field-general. This works for any area of mathematics, not just complexity
theory. The deterministic layer reports field-agnostic facts; for the judgement you
bring general mathematical knowledge — apply it to whatever the repo is about
(analysis, topology, algebraic geometry, …). Don't assume the domain. The only
complexity-specific code is an optional P/NP escalator that never gates the
general checks.
A Lean proof is only as meaningful as (a) what it assumes and (b) what it actually states. A repository can compile, show a green checkmark, and still establish nothing of what its paper/README claims. The job is to separate four outcomes for the headline theorem(s):
| Verdict | Meaning |
|---|---|
VACUOUS | The statement is logically trivial, assumes its own conclusion, or its key terms are defined to make it hold by rfl. |
UNSOUND-VECTOR | It compiles only via an escape hatch (custom axiom, native_decide, unsafe, kernel-bypass set_option). |
SUSPECT | You judge it smuggles its crux into a premise, or adds/narrows a hypothesis, short of outright trivial/aliased — flag for a careful read. |
INCOMPLETE | Honest sorry/admit; not yet a proof, but not disguised as one. |
NO-VACUITY-SIGNAL | No certain fact and no vacuity you can show — NOT a clean bill of health; faithfulness still rests on your read. |
The two questions that matter, in order:
propext, Classical.choice, Quot.sound), with no sorry/custom axiom?P, NP, "polynomial time", or the conjecture itself was redefined into
something trivial. A green build never answers question 2 — you must.The scripts gather facts and certain signals: token hazards
(sorry/axiom/native_decide…), the kernel #print axioms closure, the parsed
statement, the unfold chain of every identifier (so a hidden def NP := P or
def Conjecture := True is visible), and the paper↔Lean pairing. They do not decide
whether a statement is trivial, smuggles its crux, or weakens the paper — there is no
fixed pattern list that catches an infinite space of cheats. Those are open-ended
judgements you make from the facts. Assemble the brief, read it, and reason.
python3 --version # need ≥ 3.10 (no third-party deps for the static layer)
which lake lean # if present, USE it (Step 2 + probing definitions)
The Lean toolchain, when present, is the strongest source of evidence. The static
scan never needs a build, but the kernel #print axioms audit (Step 2) is the decisive
layer, and running Lean directly reveals more than static analysis can: a lake build
of the headline module, #print/#check on it, and lake env lean unfolding a
suspicious definition to primitives (a throwaway Probe.lean holding
#print axioms <thm> / #reduce <def> / set_option pp.all true in #check <thm>) are
all available whenever lake/lean is installed. Compiling is therefore the preferred
path; static-only analysis is the fallback when a build is infeasible, and the report
states which path the audit took.
# kernel audit first when lake/lean is available (it folds into the brief):
python3 scripts/lean_axiom_probe.py <REPO> --auto --json-out results/axioms.json
python3 scripts/make_brief.py <REPO> --theorem p_equals_np --axioms results/axioms.json \
--out-dir audits/<repo>
# scope a large codebase: --include SUBPATH / --exclude SUBPATH ; --lean-subdir for the LaTeX pairing
make_brief.py writes audits/<repo>/BRIEF.md — one decision-ordered document per headline
theorem: (1) soundness (kernel #print axioms closure + token hazards sorry/axiom/
native_decide/unsafe/debug.skipKernelTC), (2) the statement with its hypotheses,
conclusion, and the unfold chain to primitives (so a hidden def NP := P or
def Conjecture := True is laid bare), (3) paper vs Lean (the paper statement beside the
Lean one — the weakening diff), (4) a decision checklist. The raw facts also land in
facts.json / source-vs-lean.json; python3 -m vacuity.cli scan <REPO> gives the same facts
plus the deterministic floor verdict directly. These are FACTS — the scripts do not decide
vacuity; you do (Step 3).
The kernel #print axioms closure (folded into the brief via --axioms) is the one check
source obfuscation cannot survive: a hidden sorry shows up as sorryAx, an assumed lemma as
a named custom axiom, native_decide as ofReduceBool.
python3 scripts/lean_axiom_probe.py <REPO> --theorem MembershipProject.Core.p_equals_np
When lake/lean is installed, go further: lake build the headline module and probe with a
throwaway Probe.lean (#print axioms <thm> / #reduce <def> / set_option pp.all true in #check <thm>) to unfold a definition the parser couldn't. Static-only is the fallback when a
build is infeasible; say which path you took.
Read BRIEF.md end to end, and read the repo's README/paper (open the actual .tex/PDF). The
scripts gave you facts; the verdict is yours. Per headline theorem, decide in order:
propext/Classical.choice/Quot.sound, with
no sorryAx and no custom axiom? A custom axiom or sorryAx is decisive.True / x = x /
∈ Set.univ / an aliased identity (def NP := P), or a hypothesis can never hold, the theorem
establishes nothing. No pattern list bounds this — reason about what the unfolded statement
actually asserts (a class as Set.univ/True/Unit; "polynomial time" with no bound;
quantifiers swapped; an opaque headline term that cannot be refuted).→
where the paper says ↔, swapped ∀/∃, a trivial machine model, a missing polynomial bound).
The source of truth is the docstring, the dataset's targets.yaml, or the paper's LaTeX/PDF;
python3 scripts/tex_lean_compare.py <REPO> [--lean-subdir SUB] pairs each paper statement with
its Lean counterpart for the diff. List, per theorem: the source hypotheses, the hypotheses the
formalization adds, and whether the Lean is logically at least as strong as the paper. Treat any
added hypothesis or narrowed conclusion as a defect until you show it is not.Write your findings to results/faithfulness.json (schema in vacuity_report.py --help).
python3 scripts/vacuity_report.py <REPO> --theorem p_equals_np \
--axioms results/axioms.json \
--faithfulness results/faithfulness.json \
--name "<readable title>" \
--out "report/vacuity-<repo>-<YYYY-MM-DD>.tex" --pdf
Reports carry readable names. One complete check is one report, named so a human
can scan a folder of them rather than a generic report.tex:
--name is the title printed on the report — a human title such as the repo's
display name plus any paper id, e.g. "Pedigree-Polytopes-Lean4 (arXiv:2606.03194)".--out is the file name — vacuity-<repo>-<YYYY-MM-DD>.tex, with the matching
.pdf beside it. With both flags omitted the default is already
report/vacuity-<repo-folder>-<today>.tex, titled by the repo folder name, so the
output is readable without either flag; the flags add explicitness or a paper id.vacuity_report.py re-runs the deterministic scan (single source of truth),
folds in the kernel audit and your Layer-2 findings, and renders a professional
LaTeX report (compiling a PDF if latexmk/pdflatex is present). Write your
faithfulness findings to results/faithfulness.json first (schema in
scripts/vacuity_report.py --help): each entry is
{decl, term, claim, finding, severity, evidence}.
State plainly: if the build didn't run, say "kernel audit not performed". If the facts are clean but you couldn't verify a definition, say "faithfulness unverified" — never upgrade silence into a clean bill of health.
The deterministic layer emits only the fact-level ids (V1/V2 token hazards, AX-* kernel
results, V-opt-*, and V4/autoImplicit/class-laundering as advisory INFO observations); the
rest (V3, V5-*, V6-*) are tags you assign to your own findings from the facts.
| id | pattern |
|---|---|
V1-sorry / V1-admit / AX-sorry | unfinished proof; AX-sorry = sorryAx in the kernel closure (definitive). |
V2-axiom / AX-custom | the claim or a load-bearing lemma is assumed as a axiom/constant. |
V2-native_decide / AX-compiler | trusts the compiler (ofReduceBool), not the kernel. |
V2-unsafe/-unsafeCast/-lcProof | bypasses the type system. |
V-opt-debug.skipKernelTC | the kernel type checker is turned off. |
V-opt-autoImplicit | free names silently become ∀-variables — the statement may not say what it looks like. |
V3-trivial-concl / V3-refl-concl | the goal is True or x = x. |
V3-false-hyp | a premise is unsatisfiable (0 = 1, x ≠ x) → vacuously true. |
V4-trivial-def / V4-alias / V4-opaque | key term defined as everything/nothing, aliased, or unfoldable-free. |
V5-exact-hyp | the proof is exact <hyp>; the theorem assumes its conclusion. |
V5-bridge-named / V5-assumed-direction / V5-assumed-thesis / V5-content-in-hyps | bridge / extra-hypothesis smuggling: the crux is a premise. |
To check only part of a codebase (the user may ask "just check the core" or "skip
the backup folder"): pass --include <subpath/glob> and/or --exclude <subpath/glob>
to vacuity.cli scan (repeatable), or point it at a single file. The kernel probe
takes --theorem <name>/--module <Mod> to bound what it audits.
V5-*), and faithfulness are your judgements from the facts,
not script verdicts — back each with a concrete fact (the unfold chain, the kernel
closure, or a paper mismatch).native_decide/Classical.choice are common in honest Mathlib code — weigh them
in context, escalate only when in the headline's closure.Creates, edits, and optimizes skills for Claude Code, including drafting, evaluating with test prompts, iterating on performance, and improving skill descriptions for better triggering accuracy.
npx claudepluginhub lionsr/is-my-lean-proof-vacuous --plugin lean-vacuity-check