By binarybison
Proofreader: an agentic LLM tool for pre-submission self-review of formal proof-based papers. Evaluate paper quality, audit proofs, hunt counterexamples agentically, stress-test findings via an independent defender/arbiter chain, and draft LaTeX or Markdown briefs for revision. Grounded in real-time systems but extensible via domain packs.
Compare Proofreader audit outcomes between two versions of a paper. Shows what was fixed, what regressed, what's new, and what's unchanged -- useful for revision rounds, before/after responding to reviewers, or tracking convergence across drafts.
End-to-end Proofreader pass on a paper. Evaluates, audits flagged proofs (one per result), hunts counterexamples in fresh subagents, stress-tests findings via fresh defender + arbiter subagents, and writes the results into a structured report directory.
Dispatches a fresh defender subagent and a fresh arbiter subagent to stress-test an audit finding. Each runs in isolated context for genuine independence — defender doesn't know about the arbiter, arbiter wasn't involved in producing the defense.
Independent adjudicator for an audit-vs-defense dispute over a formal result. Spawn as a fresh subagent after both `audit-proof` and `defend-finding` have produced their outputs. The agent has no stake in either side, reads the paper independently, and renders a true/false-positive verdict with a flaw taxonomy. Returns a structured Markdown report.
Author-perspective defense of a paper against a proof-audit finding. Spawn as a fresh subagent so the defense is built independently of the eventual arbiter — and so the defender has no incentive to soften its arguments in anticipation of being rebutted. Returns a structured Markdown defense. Use immediately before `arbitrate-finding` whenever an audit (and optional counterexample) needs adversarial review.
Agentic counterexample hunt for a formal result flagged as potentially flawed. Spawn this as a fresh subagent so the iterative Python script-writing, candidate-testing, and debugging output stays out of the main conversation. Returns a structured Markdown report. Use when the orchestrator (or user) needs to try to break a specific theorem/lemma — e.g. after `audit-proof` produces a `likely_flawed` verdict, or when the user explicitly asks to "find a counterexample to Lemma 4" / "try to break Theorem 3".
Cross-paper claim verification. When a paper restates a theorem or lemma from a cited prior work (e.g., "Theorem 1 (Liu-Layland)"), this agent fetches the cited source (with explicit user permission), locates the original statement, and compares against the restatement to detect subtle changes in preconditions, conclusion strength, or quantifier scope. Spawn as a fresh subagent so the web-fetch + reading workload stays out of the main context. Returns a structured comparison report.
Inject Proofreader audit findings as comment-only annotations into the paper's LaTeX source so the author can review them in their editor next to the actual proof text. Output is plain `%` comments that do NOT affect the rendered PDF and do NOT require any new packages. Produces a unified diff for review before applying. Triggers on "annotate my LaTeX with the findings", "add audit comments to my .tex", "patch my source with the audit results".
Deep audit of a single proof (theorem, lemma, corollary, or proposition). Use this when the user wants to scrutinize one specific formal result — checking logical gaps, assumption consistency, boundary cases, dependency correctness, and quantifier scope. Triggers on phrases like "audit the proof of Theorem N", "is this proof correct", "check the proof of Lemma M", "find issues in this proof".
First-pass author-facing review of a real-time systems (or related formal) paper. Use this when the user wants an overall quality assessment, score sheet, or an inventory of every formal result with per-result verdicts before submission. Triggers on phrases like "review my paper", "first pass on this draft", "score this submission", "list every theorem and flag the shaky ones".
Normalize a paper input (PDF, LaTeX source, multi-file LaTeX project, or pre-extracted text) into a clean structured representation that downstream Proofreader skills can consume. Use this once at the start of any non-trivial Proofreader session so the audit, counterexample, defender, and writeup skills don't each re-parse the source. Especially useful for LaTeX projects, where it preserves theorem-environment fidelity that PDF extraction loses. Triggers on "prepare this paper", "set up context for my .tex source", "extract theorems from this project", or when the user supplies a .tex / project root.
Draft a clean, self-contained technical brief documenting a correctness finding — usable to share with coauthors, to record in a project log, or to drop into a paper revision as the basis for a fix. Produces LaTeX or Markdown depending on the user's preference. Use when the user says "write this up", "draft a finding for X", "write a brief on the Theorem N issue", or after `audit-proof` + `find-counterexample` + `stress-test-defense` have produced material to synthesize.
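The LaTeX annotation step described above (comment-only `%` lines, reviewed as a unified diff before applying) can be sketched in a few lines of Python. This is an illustration, not the plugin's implementation: the `annotate` and `unified_diff` helpers, the `% PROOFREADER:` prefix, and the mapping from line index to finding text are all assumptions made for the example.

```python
import difflib


def annotate(tex_lines, findings):
    """Insert comment-only annotations before flagged source lines.

    tex_lines: list of LaTeX source lines (no trailing newlines).
    findings:  dict mapping a 0-based line index to finding text
               (a hypothetical format chosen for this sketch).
    """
    out = []
    for i, line in enumerate(tex_lines):
        if i in findings:
            # Emit one comment per line of the note so that a multi-line
            # finding cannot leak uncommented text into the document.
            for note_line in findings[i].splitlines():
                out.append(f"% PROOFREADER: {note_line}")
        out.append(line)
    return out


def unified_diff(before, after, name="paper.tex"):
    """Render the change as a unified diff for review before applying."""
    return "\n".join(
        difflib.unified_diff(
            before, after, fromfile=f"a/{name}", tofile=f"b/{name}", lineterm=""
        )
    )


source = [
    r"\begin{lemma}\label{lem:bound}",
    r"For all $n \ge 1$, $f(n) \le g(n)$.",
    r"\end{lemma}",
]
findings = {1: "[major] boundary case n = 1 is not covered by the induction"}

annotated = annotate(source, findings)
print(unified_diff(source, annotated))
```

Because every added line starts with `%`, removing those lines recovers the original source exactly, which keeps the patch easy to audit and to revert.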
Uses power tools
Uses Bash, Write, or Edit tools
An agentic LLM tool for pre-submission self-review of formal proof-based papers. Designed for the author who wants to find the holes in their own proofs before a reviewer — or a later reader — does.
The name is a deliberate double entendre: the tool proofreads, in the copy-editor's sense, by reading proofs, in the formal sense.
"Proofreading" here means scrutinizing the correctness of the formal arguments — not just typo-hunting, and not mechanized proof checking in Lean / Rocq / Coq. This is human-language proof scrutiny, executed by an LLM with optional Python code execution for counterexample search. Proofreader screens; it does not certify. Its output is stochastic and neither sound nor complete — it can flag results erroneously, and it can miss flaws entirely. We have nonetheless found it useful, including in cases where independent authors confirmed the findings.
Three inline skills (run in the main conversation), four subagents (fresh isolated context), and three slash-command orchestrators.
| Skill | What it does |
|---|---|
| `evaluate-paper` | First-pass read: quality scores, flags, complete inventory of theorems/lemmas with per-result verdicts. One call per paper. |
| `audit-proof` | Deep audit of one theorem or lemma. Lists issues by severity. The orchestrator calls this once per flagged result (mirroring the original pipeline). |
| `writeup-finding` | Produces a clean LaTeX or Markdown brief of a finding, to share with coauthors or keep as a record. |
These run in fresh, isolated contexts. The main conversation dispatches them and receives a single report back. The structural independence matters for correctness (see Why subagents? below).
| Agent | What it does |
|---|---|
| `find-counterexample` | Adversarial counterexample hunt for a flagged result. Writes and runs Python to verify. Isolated context keeps the noisy iteration out of the main conversation. |
| `defend-finding` | Mounts the strongest legitimate defense of the paper against an audit finding. Fresh context means the defender does not know about the eventual arbiter and has no incentive to soften its case in anticipation. May fetch cited references with explicit user permission. |
| `arbitrate-finding` | Impartial adjudicator. Reads paper + audit + defense (+ optional counterexample) with fresh eyes and renders a true/false-positive verdict with a flaw taxonomy. |
| `verify-restatement` | When the paper restates a theorem from a cited source (e.g., "Theorem 1 (Liu-Layland)"), fetches the original (with permission) and compares the two, detecting precondition drift, conclusion strengthening, and quantifier-scope changes that propagate through the paper's downstream proofs. |
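To give a feel for the kind of script a counterexample hunt might write, here is a minimal sketch in the real-time-systems setting. It targets a deliberately false, hypothetical claim: "any two periodic tasks with implicit deadlines and total utilization at most 1 are schedulable under rate-monotonic priorities." The claim, the function names, and the search bounds are assumptions for illustration and are not taken from the plugin. Schedulability is checked with standard response-time analysis on integer parameters.

```python
from fractions import Fraction
from itertools import product


def response_time(c, higher, deadline):
    """Worst-case response time by fixed-point iteration.

    c: execution time of the task under analysis.
    higher: list of (C, T) pairs for higher-priority tasks.
    Returns None as soon as the iterate exceeds the deadline.
    """
    r = c
    while True:
        # -(-r // t) is integer ceil(r / t): releases of each
        # higher-priority task within a window of length r.
        nxt = c + sum(-(-r // t) * ch for ch, t in higher)
        if nxt > deadline:
            return None
        if nxt == r:
            return r
        r = nxt


def rm_schedulable(tasks):
    """tasks: list of integer (C, T) pairs with deadline equal to period."""
    ordered = sorted(tasks, key=lambda ct: ct[1])  # shorter period first
    return all(
        response_time(c, ordered[:i], t) is not None
        for i, (c, t) in enumerate(ordered)
    )


def hunt(max_period=8):
    """Exhaustively search small integer task pairs for a counterexample."""
    for t1, t2 in product(range(2, max_period + 1), repeat=2):
        for c1, c2 in product(range(1, t1 + 1), range(1, t2 + 1)):
            tasks = [(c1, t1), (c2, t2)]
            u = Fraction(c1, t1) + Fraction(c2, t2)
            if u <= 1 and not rm_schedulable(tasks):
                return tasks, u
    return None


found = hunt()
if found:
    tasks, u = found
    print(f"counterexample: tasks={tasks}, utilization={u}")
else:
    print("no counterexample in the searched range")
```

Exact `Fraction` arithmetic avoids floating-point noise at the utilization boundary, and an exhaustive search over a small integer grid makes any counterexample it returns easy to verify by hand. An empty result says only that nothing was found within the bounds searched.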
| Command | What it does |
|---|---|
| `/proofread <paper.pdf>` | Full pipeline: `evaluate-paper` → `audit-proof` (per flagged result) → `find-counterexample` (per likely-flawed audit, in subagents) → `defend-finding` + `arbitrate-finding` (in subagents) → `writeup-finding`. Single Markdown report at the end. |
| `/stress-test-defense <result>` | Dispatches the defender subagent, then the arbiter subagent, then synthesizes their outputs. Use when you want to gut-check whether an audit finding is real. |
| `/diff-proofread <old> <new>` | Compares two versions of a draft and reports what was fixed, what regressed, what's new, and what's unchanged. Use it across revision rounds to confirm fixes took and to catch regressions early. |
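The four buckets reported by `/diff-proofread` can be illustrated with a small classification sketch. The data shape here is an assumption: each audit is reduced to a dict from result label to verdict, with `likely_flawed` as the only flagged verdict and `ok` as a hypothetical clean one. The plugin's actual report format may differ.

```python
# Verdicts treated as "flagged" in this sketch. `likely_flawed` appears in the
# plugin's descriptions; the clean verdict "ok" is a hypothetical stand-in.
FLAGGED = {"likely_flawed"}


def classify(old, new):
    """Bucket each result in the new draft relative to the old audit.

    old, new: dict mapping a result label (e.g. "Lemma 2") to a verdict string.
    Results that exist only in the old draft are ignored in this sketch.
    """
    buckets = {"fixed": [], "regressed": [], "new": [], "unchanged": []}
    for label, verdict in new.items():
        if label not in old:
            buckets["new"].append(label)
            continue
        was_flagged = old[label] in FLAGGED
        is_flagged = verdict in FLAGGED
        if was_flagged and not is_flagged:
            buckets["fixed"].append(label)
        elif is_flagged and not was_flagged:
            buckets["regressed"].append(label)
        else:
            buckets["unchanged"].append(label)
    return buckets


old_audit = {"Theorem 1": "ok", "Lemma 2": "likely_flawed", "Theorem 3": "ok"}
new_audit = {
    "Theorem 1": "ok",
    "Lemma 2": "ok",
    "Theorem 3": "likely_flawed",
    "Lemma 4": "ok",
}

report = classify(old_audit, new_audit)
for bucket, labels in report.items():
    print(f"{bucket}: {', '.join(labels) or '-'}")
# fixed: Lemma 2
# regressed: Theorem 3
# new: Lemma 4
# unchanged: Theorem 1
```

Matching results by label is the simplest choice but breaks when theorems are renumbered between drafts; a real comparison would need some way to track a result across renumbering.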
The defender, arbiter, and counterexample-hunter are subagents (not inline skills) for one reason: structural independence. The original paper-evaluation pipeline got independence for free by making each role a separate API call; the plugin recreates that property via fresh-context subagents.
If your tool doesn't support subagents (for example, Codex or Gemini in some configurations), the orchestrators degrade gracefully and mark the report `independence: degraded`.
Every skill has a mode knob:
- `rigorous` (default): flags issues clearly with severity, always suggests a fix or follow-up. Reviewer tone for journal/conference feedback.
- `adversarial`: red-team the paper. Don't give yourself the benefit of the doubt. Useful for self-review where you want to break your own work before someone else does.

You set the mode by including `mode: adversarial` (or `mode: rigorous`) in your request, e.g. "Audit the proof of Theorem 3 in adversarial mode."
npx claudepluginhub binarybison/proofreader --plugin proofreader