A lot of empirical papers are, underneath, a search. You pick a sample, a window, a set of controls, a way to cluster the errors, and you keep the version that comes out significant. Then you write the story forward as if that path were the only sensible one. Econoclast plays the other side of that game. It re-derives the numbers in the paper, hunts for the choices that were quietly made, finds and re-runs the data when the data is public, and tells you how much of the headline actually survives.

You are meant to ask once and let it finish. Inside Claude Code or Codex you can just say "check this paper for me" with a link, and the agent works out what it needs, asks you in plain words for anything missing (you do not need to know any of the tech), runs the whole thing, and explains the result in plain language. From a terminal it is one command:

econoclast verify https://arxiv.org/abs/2401.12345

That downloads the paper, runs the offline statistical checks and the model critique, researches any method it does not already cover, looks in the paper for a public dataset, downloads it, works out which regression is the headline result, and re-runs it across hundreds of defensible specifications. Out comes a fragility score and a list of specific, quotable problems.
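To see why "hundreds of defensible specifications" is not an exaggeration, here is a minimal sketch of how a handful of ordinary analyst choices multiply. Everything in it is an assumption made for illustration: the control names, the windows, the clustering options, and the share_surviving summary are hypothetical and are not econoclast's internals or its fragility score.

```python
from itertools import combinations, product

# Hypothetical analyst choices; none of these names come from econoclast.
OPTIONAL_CONTROLS = ["age", "education", "region", "industry", "tenure"]
SAMPLE_WINDOWS = ["full", "pre_2008", "post_2008"]
CLUSTERING = ["none", "state", "state_year"]


def control_subsets(controls):
    """Yield every subset of the optional controls, including the empty one."""
    for size in range(len(controls) + 1):
        yield from combinations(controls, size)


def specification_grid():
    """Return every combination of control set, sample window, and clustering."""
    return [
        {"controls": c, "window": w, "cluster": k}
        for c, w, k in product(
            control_subsets(OPTIONAL_CONTROLS), SAMPLE_WINDOWS, CLUSTERING
        )
    ]


def share_surviving(results, alpha=0.05):
    """Share of (estimate, p_value) pairs that stay positive and significant.

    Assumes the headline estimate is positive. This is one possible
    robustness summary, not econoclast's fragility score.
    """
    if not results:
        raise ValueError("no results to summarise")
    kept = sum(1 for estimate, p_value in results if estimate > 0 and p_value < alpha)
    return kept / len(results)


grid = specification_grid()
print(len(grid))  # 32 control subsets * 3 windows * 3 clusterings = 288
```

Five optional controls, three windows, and three ways to cluster already give 288 regressions. A paper that reports one of them has made a choice, whether it says so or not.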

Two layers

The first layer is a set of statistical checks that run offline with no API key and no network. statcheck recomputes every p-value from its test statistic. GRIM and GRIMMER catch means and standard deviations that are impossible for integer data. p-curve, z-statistic bunching at 1.96 (the caliper test), TIVA, Benford, and terminal-digit tests look at the shape of the reported numbers. The recomputation and GRIM-style checks are plain arithmetic, so a flag there is hard to argue with. The shape tests are statistical evidence, and read best as a pattern across the paper, not as proof from a single number.
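The two arithmetic checks are simple enough to sketch in a few lines. This is an illustration of the general techniques under stated assumptions, not econoclast's code: the function names are hypothetical, the p-value check handles only z-statistics (so it needs nothing beyond the standard library), and it ignores rounding in the reported test statistic itself.

```python
import math


def grim_consistent(reported_mean: str, n: int) -> bool:
    """GRIM: can this mean arise from n integer responses?

    The mean is passed as a string so trailing zeros ("3.50") keep their
    precision. A mean of integers must equal some integer total divided by n.
    """
    decimals = len(reported_mean.partition(".")[2])
    target = float(reported_mean)
    tolerance = 0.5 * 10 ** -decimals + 1e-9  # half a unit in the last digit
    centre = round(target * n)
    # Only integer totals next to target * n can round to the reported mean.
    return any(
        abs(total / n - target) <= tolerance
        for total in (centre - 1, centre, centre + 1)
    )


def two_sided_p_from_z(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def p_value_consistent(z: float, reported_p: str) -> bool:
    """statcheck-style test: does the reported p match the recomputed one?"""
    decimals = len(reported_p.partition(".")[2])
    tolerance = 0.5 * 10 ** -decimals + 1e-12
    return abs(two_sided_p_from_z(z) - float(reported_p)) <= tolerance


print(grim_consistent("5.19", 28))        # False: no integer total / 28 rounds to 5.19
print(grim_consistent("5.18", 28))        # True: 145 / 28 = 5.1786
print(p_value_consistent(1.50, "0.03"))   # False: z = 1.50 gives p of about 0.134
```

The last line is the kind of case that matters most: the reported p-value is below .05 and the recomputed one is not, so the error flips significance.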

The second layer is a set of adversarial critiques written by a model: specification search, cherry-picked samples and windows, weak identification, missing robustness checks, hypotheses that look invented after the fact, and claims the evidence does not support. Every finding has to quote the paper, a mechanical check confirms the quote is really there, and a separate referee pass turns the pile into one verdict.
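The quote check is what keeps the model honest, and the idea fits in a short sketch. This assumes a finding is a dict with a "quote" key and that matching is done on whitespace-normalised, lowercased text. Both are assumptions for illustration, and the function names are hypothetical, not econoclast's API.

```python
import re


def normalize(text: str) -> str:
    """Lowercase, unify curly quotes, and collapse whitespace.

    Collapsing whitespace means line breaks from PDF extraction do not
    cause a genuine quote to be rejected.
    """
    text = text.replace("\u2018", "'").replace("\u2019", "'")
    text = text.replace("\u201c", '"').replace("\u201d", '"')
    return re.sub(r"\s+", " ", text).strip().lower()


def quote_is_present(quote: str, paper_text: str) -> bool:
    """True only if the quoted passage appears in the paper."""
    needle = normalize(quote)
    return bool(needle) and needle in normalize(paper_text)


def keep_verified(findings, paper_text):
    """Drop any finding whose supporting quote cannot be found in the paper."""
    return [f for f in findings if quote_is_present(f["quote"], paper_text)]


paper = "We restrict the sample to\n  teens aged 16-19 in\nborder counties."
findings = [
    {"issue": "narrow sample", "quote": "sample to teens aged 16-19"},
    {"issue": "invented", "quote": "we use all counties"},
]
print([f["issue"] for f in keep_verified(findings, paper)])  # ['narrow sample']
```

A check like this is deliberately dumb. It cannot tell whether a critique is fair, only whether the sentence it leans on exists, which is the part a model is most likely to get wrong.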

You can stop at the first layer (instant, free) or add the second with any backend: OpenAI, Anthropic, Google, OpenRouter, a local model through Ollama, or your existing Claude Code or Codex subscription with no separate key.

Install

One command installs everything and registers the agent tool:

curl -fsSL https://raw.githubusercontent.com/shoal-rat/econoclast/main/install.sh | bash

Or with pip:

pip install "econoclast[all] @ git+https://github.com/shoal-rat/econoclast"
econoclast setup     # detect your backend, write the config, register the MCP tool

Try the offline checks on the bundled demo. No keys needed:

econoclast forensics examples/demo_paper.txt

                          Minimum Wages and Teen Employment (synthetic)
 Test       Verdict      N   Summary
 statcheck  suspicious   2   2/2 reported p-values disagree with the recomputed value, 2 flip
                             significance at .05
 grim       suspicious   1   1/1 reported means are impossible for integer data of that N
 grimmer    suspicious   1   1/1 (mean, SD, N) triples are impossible for integer data
 p-curve    suspicious   6   p-curve is flat/left-skewed: consistent with p-hacking
 caliper    suspicious  18   test statistics bunch just above the significance thresholds
