proofreader

Name: proofreader
Author: binarybison

Proofreader: an agentic LLM tool for pre-submission self-review of formal proof-based papers. Evaluate paper quality, audit proofs, hunt counterexamples agentically, stress-test findings via an independent defender/arbiter chain, and draft LaTeX or Markdown briefs for revision. Grounded in real-time systems but extensible via domain packs.

npx claudepluginhub binarybison/proofreader --plugin proofreader

What's Inside

Slash Commands (3)

/diff-proofread

Compare Proofreader audit outcomes between two versions of a paper. Shows what was fixed, what regressed, what's new, and what's unchanged. Useful for revision rounds, before and after responding to reviewers, or for tracking convergence across drafts.

/proofread

End-to-end Proofreader pass on a paper. Evaluates, audits flagged proofs (one per result), hunts counterexamples in fresh subagents, stress-tests findings via fresh defender + arbiter subagents, and writes the results into a structured report directory.

/stress-test-defense

Dispatches a fresh defender subagent and a fresh arbiter subagent to stress-test an audit finding. Each runs in an isolated context for genuine independence: the defender doesn't know about the arbiter, and the arbiter wasn't involved in producing the defense.

Agents (4)

arbitrate-finding

/arbitrate-finding

Independent adjudicator for an audit-vs-defense dispute over a formal result. Spawn as a fresh subagent after both `audit-proof` and `defend-finding` have produced their outputs. The agent has no stake in either side, reads the paper independently, and renders a true/false-positive verdict with a flaw taxonomy. Returns a structured Markdown report.

defend-finding

/defend-finding

Author-perspective defense of a paper against a proof-audit finding. Spawn as a fresh subagent so the defense is built independently of the eventual arbiter — and so the defender has no incentive to soften its arguments in anticipation of being rebutted. Returns a structured Markdown defense. Use immediately before `arbitrate-finding` whenever an audit (and optional counterexample) needs adversarial review.

find-counterexample

/find-counterexample

Agentic counterexample hunt for a formal result flagged as potentially flawed. Spawn this as a fresh subagent so the iterative Python script-writing, candidate-testing, and debugging output stays out of the main conversation. Returns a structured Markdown report. Use when the orchestrator (or user) needs to try to break a specific theorem/lemma — e.g. after `audit-proof` produces a `likely_flawed` verdict, or when the user explicitly asks to "find a counterexample to Lemma 4" / "try to break Theorem 3".

verify-restatement

/verify-restatement

Cross-paper claim verification. When a paper restates a theorem or lemma from a cited prior work (e.g., "Theorem 1 (Liu-Layland)"), this agent fetches the cited source (with explicit user permission), locates the original statement, and compares against the restatement to detect subtle changes in preconditions, conclusion strength, or quantifier scope. Spawn as a fresh subagent so the web-fetch + reading workload stays out of the main context. Returns a structured comparison report.

Skills (5)

annotate-latex

/annotate-latex

Inject Proofreader audit findings as comment-only annotations into the paper's LaTeX source so the author can review them in their editor next to the actual proof text. Output is plain `%` comments that do NOT affect the rendered PDF and do NOT require any new packages. Produces a unified diff for review before applying. Triggers on "annotate my LaTeX with the findings", "add audit comments to my .tex", "patch my source with the audit results".
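
The comment-only annotation and unified-diff preview described above can be sketched in a few lines of Python. This is an illustrative sketch, not the skill's actual implementation: the `annotate` helper, the `% PROOFREADER:` prefix, the findings format (line index to note), and the file names are all assumptions made for the example. It also ignores verbatim-style environments, where a `%` line would be rendered.

```python
import difflib

def annotate(tex_lines, findings):
    """Return a copy of tex_lines with a '%' comment inserted above each flagged line.

    findings maps a 0-based line index to a short note (hypothetical format).
    """
    out = []
    for i, line in enumerate(tex_lines):
        if i in findings:
            # Collapse whitespace so the note stays on one comment line.
            note = " ".join(findings[i].split())
            # A LaTeX comment runs from '%' to end of line, so it never reaches the PDF.
            out.append(f"% PROOFREADER: {note}")
        out.append(line)
    return out

source = [
    r"\begin{lemma}\label{lem:bound}",
    r"For all $n \ge 1$, $R_n \le D_n$.",
    r"\end{lemma}",
]
# Hypothetical finding attached to the lemma statement (line index 1).
findings = {1: "[major] boundary case n = 1 is not covered by the induction"}

annotated = annotate(source, findings)

# Build a unified diff so the author can review the patch before applying it.
diff = list(difflib.unified_diff(
    source, annotated,
    fromfile="paper.tex", tofile="paper.annotated.tex", lineterm="",
))
print("\n".join(diff))
```

Because every added line is a comment, removing the lines that start with `%` recovers the original source exactly, which makes the patch easy to revert.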

audit-proof

/audit-proof

Deep audit of a single proof (theorem, lemma, corollary, or proposition). Use this when the user wants to scrutinize one specific formal result — checking logical gaps, assumption consistency, boundary cases, dependency correctness, and quantifier scope. Triggers on phrases like "audit the proof of Theorem N", "is this proof correct", "check the proof of Lemma M", "find issues in this proof".

evaluate-paper

/evaluate-paper

First-pass author-facing review of a real-time systems (or related formal) paper. Use this when the user wants an overall quality assessment, score sheet, or an inventory of every formal result with per-result verdicts before submission. Triggers on phrases like "review my paper", "first pass on this draft", "score this submission", "list every theorem and flag the shaky ones".

prepare-paper-context

/prepare-paper-context

Normalize a paper input (PDF, LaTeX source, multi-file LaTeX project, or pre-extracted text) into a clean structured representation that downstream Proofreader skills can consume. Use this once at the start of any non-trivial Proofreader session so the audit, counterexample, defender, and writeup skills don't each re-parse the source. Especially useful for LaTeX projects, where it preserves theorem-environment fidelity that PDF extraction loses. Triggers on "prepare this paper", "set up context for my .tex source", "extract theorems from this project", or when the user supplies a .tex / project root.

writeup-finding

/writeup-finding

Draft a clean, self-contained technical brief documenting a correctness finding — usable to share with coauthors, to record in a project log, or to drop into a paper revision as the basis for a fix. Produces LaTeX or Markdown depending on the user's preference. Use when the user says "write this up", "draft a finding for X", "write a brief on the Theorem N issue", or after `audit-proof` + `find-counterexample` + `stress-test-defense` have produced material to synthesize.

Stats

Version: 0.1.0

Stars: 0

Maintenance: Excellent

License: MIT

Last Commit: Jun 2, 2026

Added: May 16, 2026

Safety Signals

Caution

Uses power tools

Uses Bash, Write, or Edit tools

README

Proofreader

An agentic LLM tool for pre-submission self-review of formal proof-based papers. Designed for the author who wants to find the holes in their own proofs before a reviewer — or a later reader — does.

The name is a deliberate double entendre: the tool proofreads, in the copy-editor's sense, by reading proofs, in the formal sense.

"Proofreading" here means scrutinizing the correctness of the formal arguments — not just typo-hunting, and not mechanized proof checking in Lean / Rocq / Coq. This is human-language proof scrutiny, executed by an LLM with optional Python code execution for counterexample search. Proofreader screens; it does not certify. Its output is stochastic and neither sound nor complete — it can flag results erroneously, and it can miss flaws entirely. We have nonetheless found it useful, including in cases where independent authors confirmed the findings.

What it does

Three inline skills (run in the main conversation), four subagents (fresh isolated context), and three slash-command orchestrators.

Inline skills

| Skill | What it does |
| --- | --- |
| `evaluate-paper` | First-pass read: quality scores, flags, complete inventory of theorems/lemmas with per-result verdicts. One call per paper. |
| `audit-proof` | Deep audit of one theorem or lemma. Lists issues by severity. The orchestrator calls this once per flagged result (mirroring the original pipeline). |
| `writeup-finding` | Produces a clean LaTeX or Markdown brief of a finding, to share with coauthors or keep as a record. |

Subagents

These run in fresh, isolated contexts. The main conversation dispatches them and receives a single report back. The structural independence matters for correctness (see Why subagents? below).

| Agent | What it does |
| --- | --- |
| `find-counterexample` | Adversarial counterexample hunt for a flagged result. Writes and runs Python to verify. Isolated context keeps the noisy iteration out of the main conversation. |
| `defend-finding` | Mounts the strongest legitimate defense of the paper against an audit finding. Fresh context means the defender has no idea about the eventual arbiter, and no incentive to soften its case in anticipation. May fetch cited references with explicit user permission. |
| `arbitrate-finding` | Impartial adjudicator. Reads paper + audit + defense (+ optional counterexample) with fresh eyes and renders a true/false-positive verdict with a flaw taxonomy. |
| `verify-restatement` | When the paper restates a theorem from a cited source (`Theorem 1 (Liu-Layland)`), fetches the original (with permission) and compares the two, detecting precondition drift, conclusion strengthening, and quantifier-scope changes that propagate through the paper's downstream proofs. |
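
To make the counterexample hunt concrete, here is a minimal sketch of the kind of Python script such a hunt might write. It targets a deliberately false, hypothetical claim chosen for illustration (not taken from any audited paper): "every implicit-deadline periodic task set with total utilization at most 1 is schedulable under rate-monotonic priorities." The function names and the search bounds are assumptions; the schedulability check is the standard fixed-priority response-time iteration.

```python
from itertools import product

def rm_response_time(tasks, i):
    """Worst-case response time of task i (index 0 = highest priority).

    tasks is a list of (C, T) integer pairs with implicit deadlines D = T.
    Returns None as soon as the iteration exceeds the deadline.
    """
    c_i, t_i = tasks[i]
    r = c_i
    while True:
        # -(-r // t_j) is integer ceil(r / t_j): releases of task j in a window of length r.
        nxt = c_i + sum(-(-r // t_j) * c_j for c_j, t_j in tasks[:i])
        if nxt > t_i:
            return None      # deadline miss
        if nxt == r:
            return r         # fixed point reached
        r = nxt

def rm_schedulable(tasks):
    """True if every task meets its deadline under rate-monotonic priorities."""
    ordered = sorted(tasks, key=lambda ct: ct[1])  # shorter period = higher priority
    return all(rm_response_time(ordered, i) is not None for i in range(len(ordered)))

def hunt(max_period=8):
    """Exhaustively search two-task sets with U <= 1 that rate-monotonic fails to schedule."""
    for t1, t2 in product(range(2, max_period + 1), repeat=2):
        for c1, c2 in product(range(1, t1 + 1), range(1, t2 + 1)):
            # c1/t1 + c2/t2 <= 1, checked in integers to avoid rounding.
            if c1 * t2 + c2 * t1 <= t1 * t2 and not rm_schedulable([(c1, t1), (c2, t2)]):
                return [(c1, t1), (c2, t2)]
    return None

witness = hunt()
print("counterexample (C, T) pairs:", witness)
```

A witness found this way is a candidate, not a verdict: in the workflow described here it would still go to the defender and arbiter, who can check whether the paper's stated preconditions actually admit the task set.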

Slash commands

| Command | What it does |
| --- | --- |
| `/proofread <paper.pdf>` | Full pipeline: `evaluate-paper` → `audit-proof` (per flagged result) → `find-counterexample` (per likely-flawed audit, in subagents) → `defend-finding` + `arbitrate-finding` (in subagents) → `writeup-finding`. Single Markdown report at the end. |
| `/stress-test-defense <result>` | Dispatches the defender subagent, then the arbiter subagent, then synthesizes their outputs. Use when you want to gut-check whether an audit finding is real. |
| `/diff-proofread <old> <new>` | Compares two versions of a draft and reports what was fixed, what regressed, what's new, and what's unchanged. Use it across revision rounds to confirm fixes took and to catch regressions early. |
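
The four buckets reported by `/diff-proofread` can be illustrated with a small classification sketch. The data model is an assumption made for the example: each audit run is reduced to a mapping from result label to whether that result was flagged, and `diff_audits` is a hypothetical helper, not the plugin's code. Results that disappear from the new draft are simply skipped here.

```python
def diff_audits(old, new):
    """Classify results by comparing two audit outcomes.

    old and new map a result label to True if the audit flagged it
    (hypothetical format). Labels present only in old are ignored.
    """
    report = {"fixed": [], "regressed": [], "new": [], "unchanged": []}
    for label in sorted(new):
        if label not in old:
            report["new"].append(label)          # result first appears in the new draft
        elif old[label] and not new[label]:
            report["fixed"].append(label)        # flagged before, clean now
        elif not old[label] and new[label]:
            report["regressed"].append(label)    # clean before, flagged now
        else:
            report["unchanged"].append(label)    # same verdict in both drafts
    return report

old_run = {"Theorem 1": True, "Lemma 2": False, "Lemma 3": True}
new_run = {"Theorem 1": False, "Lemma 2": True, "Lemma 3": True, "Lemma 4": True}

report = diff_audits(old_run, new_run)
for bucket, labels in report.items():
    print(f"{bucket}: {labels}")
```

Since the audits are stochastic, a label moving between buckets may reflect run-to-run variation rather than a real change in the proof, so a "fixed" entry is worth rechecking before relying on it.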

Why subagents?

The defender, arbiter, and counterexample-hunter are subagents (not inline skills) for one reason: structural independence. The original paper-evaluation pipeline got independence for free by making each role a separate API call; the plugin recreates that property via fresh-context subagents.

- The defender has an asymmetric incentive to defend. If it knew the arbiter would later rebut it in the same context, it would soften its case. Fresh context preserves the asymmetric incentive.
- The arbiter brings genuine independent judgment because it never produced the audit or the defense; it reads both as documents.
- The counterexample hunt is long, iterative, and Python-heavy. Isolating it keeps the noise out of the main conversation.

If your tool doesn't support subagents (Codex/Gemini in some configurations), the orchestrators degrade gracefully and mark the report `independence: degraded`.
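
The defender-then-arbiter dispatch and its fallback can be sketched as follows. Everything here is a stand-in: `stress_test`, the `spawn` hook, the stub roles, and the `"true_positive"` verdict string are hypothetical, and `"full"` is an assumed label for the non-degraded case (only `degraded` appears in this README). The point is the shape of the control flow, not the plugin's actual orchestration.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class StressTestResult:
    defense: str
    verdict: str
    independence: str  # "degraded" is from the README; "full" is an assumed label

def stress_test(finding: str,
                defend: Callable[[str], str],
                arbitrate: Callable[[str, str], str],
                spawn: Optional[Callable[..., str]] = None) -> StressTestResult:
    """Run the defender, then the arbiter, on one audit finding.

    spawn(role, *docs) is a hypothetical hook that runs a role in a fresh
    context and returns its report. If the host has no such hook (spawn is
    None), both roles run inline and the result is marked degraded.
    """
    if spawn is not None:
        defense = spawn(defend, finding)              # defender sees only the finding
        verdict = spawn(arbitrate, finding, defense)  # arbiter reads both as documents
        return StressTestResult(defense, verdict, "full")
    defense = defend(finding)
    verdict = arbitrate(finding, defense)
    return StressTestResult(defense, verdict, "degraded")

# Stub roles for illustration; a real host would call the model here.
def stub_defend(finding: str) -> str:
    return f"defense of: {finding}"

def stub_arbitrate(finding: str, defense: str) -> str:
    return "true_positive"

def stub_spawn(role, *docs):
    # Stand-in for starting a new isolated context.
    return role(*docs)

with_subagents = stress_test("Lemma 4 gap", stub_defend, stub_arbitrate, spawn=stub_spawn)
without_subagents = stress_test("Lemma 4 gap", stub_defend, stub_arbitrate)
print(with_subagents.independence)     # full
print(without_subagents.independence)  # degraded
```

Carrying the independence marker in the result, rather than logging it separately, keeps the caveat attached to the verdict wherever the report is later quoted.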

Every skill has a mode knob:

- `rigorous` (default): flags issues clearly with severity and always suggests a fix or follow-up. Reviewer tone for journal/conference feedback.
- `adversarial`: red-teams the paper and gives it no benefit of the doubt. Useful for self-review when you want to break your own work before someone else does.

You set the mode by including `mode: adversarial` (or `mode: rigorous`) in your request, e.g. "Audit the proof of Theorem 3 in adversarial mode."
