A lot of empirical papers are, underneath, a search. You pick a sample, a window, a set of controls,
a way to cluster the errors, and you keep the version that comes out significant. Then you write the
story forward as if that path were the only sensible one. Econoclast plays the other side of that
game. It re-derives the numbers in the paper, hunts for the choices that were quietly made, finds the
data and re-runs the analysis when the data is public, and tells you how much of the headline actually survives.
You are meant to ask once and let it finish. Inside Claude Code or Codex you can just say "check this
paper for me" with a link, and the agent works out what it needs, asks you in plain words for anything
missing (you do not need to know any of the tech), runs the whole thing, and explains the result in
plain language. From a terminal it is one command:
econoclast verify https://arxiv.org/abs/2401.12345
That downloads the paper, runs the offline statistical checks and the model critique, researches any
method it does not already cover, looks in the paper for a public dataset, downloads it, works out
which regression is the headline result, and re-runs it across hundreds of defensible specifications.
Out comes a fragility score and a list of specific, quotable problems.
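To make "hundreds of defensible specifications" concrete, here is a minimal sketch of how such a grid can be enumerated and summarised. It is not Econoclast's implementation: the function names (`specification_grid`, `share_not_surviving`), the choice of dimensions, and the definition of the summary as "share of specifications where the estimate loses significance or flips sign" are all assumptions made for illustration. The tool's actual fragility score may be computed differently.

```python
from itertools import combinations, product


def specification_grid(controls, samples, clusterings):
    """Yield one specification per (control subset, sample, clustering) combination.

    Hypothetical helper: the three dimensions are illustrative, not the tool's own.
    """
    control_sets = [
        subset
        for k in range(len(controls) + 1)
        for subset in combinations(controls, k)
    ]
    for subset, sample, cluster in product(control_sets, samples, clusterings):
        yield {"controls": subset, "sample": sample, "cluster": cluster}


def share_not_surviving(results, baseline_sign, critical_z=1.96):
    """Fraction of (estimate, std_error) pairs that lose significance or flip sign.

    Assumed summary statistic for illustration only.
    """
    results = list(results)
    if not results:
        raise ValueError("need at least one specification result")
    failures = 0
    for estimate, std_error in results:
        same_sign = estimate * baseline_sign > 0
        significant = std_error > 0 and abs(estimate / std_error) >= critical_z
        if not (same_sign and significant):
            failures += 1
    return failures / len(results)


grid = list(specification_grid(
    controls=["age", "region", "year"],
    samples=["full", "post-2000"],
    clusterings=["state", "firm"],
))
# Made-up (estimate, std_error) pairs standing in for re-run regressions.
demo_results = [(0.5, 0.1), (0.1, 0.1), (-0.4, 0.1), (0.3, 0.1)]
demo_share = share_not_surviving(demo_results, baseline_sign=1)
```

Three optional controls give 8 control subsets, so two samples and two clustering choices already produce 32 specifications. The count grows quickly, which is why a headline that holds in one hand-picked specification says little on its own.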
Two layers
The first layer is a set of statistical checks that run offline with no API key and no network.
statcheck recomputes every p-value from its test statistic. GRIM and GRIMMER catch means and standard
deviations that are impossible for integer data at the reported sample size. p-curve, z-statistic bunching at 1.96, TIVA, Benford's law,
and terminal-digit tests look at the shape of the reported numbers. These are arithmetic, so a flag
here is hard to argue with.
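Two of these checks are simple enough to sketch in a few lines. The code below is an illustration of the underlying arithmetic, not Econoclast's own code: the function names are hypothetical, the p-value recomputation covers only z statistics (t and F statistics need their degrees of freedom), and the GRIM check assumes the mean was reported to a fixed number of decimals.

```python
import math


def two_sided_p_from_z(z: float) -> float:
    """Two-sided p-value for a z statistic under the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def flips_significance(reported_p: float, recomputed_p: float, alpha: float = 0.05) -> bool:
    """True if reported and recomputed p-values fall on opposite sides of alpha."""
    return (reported_p < alpha) != (recomputed_p < alpha)


def grim_consistent(reported_mean: float, n: int, decimals: int = 2) -> bool:
    """GRIM-style check: can some integer total over n items round to this mean?

    Integer data means the sum is an integer, so the true mean must be k / n
    for some integer k. We test the integer totals nearest to mean * n.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    half_unit = 0.5 * 10 ** (-decimals)  # largest rounding error in the reported mean
    nearest_total = round(reported_mean * n)
    for total in (nearest_total - 1, nearest_total, nearest_total + 1):
        if abs(total / n - reported_mean) <= half_unit + 1e-12:
            return True
    return False


# A paper reports z = 1.5 with p = .03; the recomputed value is about .13.
recomputed = two_sided_p_from_z(1.5)
flagged = flips_significance(reported_p=0.03, recomputed_p=recomputed)

# A mean of 5.19 from 28 integer responses is impossible; 5.18 is fine.
impossible = not grim_consistent(5.19, n=28)
possible = grim_consistent(5.18, n=28)
```

With 28 integer responses the mean can only move in steps of 1/28, about 0.036, so the nearest attainable values to 5.19 are 5.18 and 5.21. No judgment call is involved, which is what makes these flags hard to dispute.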
The second layer is a set of adversarial critiques written by a model: specification search,
cherry-picked samples and windows, weak identification, missing robustness checks, hypotheses that
look invented after the fact, and claims the evidence does not support. Every finding has to quote the
paper, a mechanical check confirms the quote is really there, and a separate referee pass turns the
pile into one verdict.
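The mechanical quote check can be as simple as a normalised substring search. The sketch below is one plausible way to do it, under the assumption that the paper is available as plain text; the names `normalize` and `quote_is_present` are hypothetical and the matching rules (collapse whitespace, ignore case) are a guess at what is reasonable, not a description of the tool's behaviour.

```python
import re


def normalize(text: str) -> str:
    """Collapse runs of whitespace and ignore case so line wraps do not matter."""
    return re.sub(r"\s+", " ", text).strip().casefold()


def quote_is_present(quote: str, paper_text: str) -> bool:
    """True only if the non-empty quote appears in the paper after normalisation."""
    needle = normalize(quote)
    return bool(needle) and needle in normalize(paper_text)


paper = """We restrict the sample to counties with
population above 50,000 in the baseline year."""

real = quote_is_present("counties with population above 50,000", paper)
invented = quote_is_present("counties with population above 10,000", paper)
```

A finding whose quote fails this kind of check can be dropped before the referee pass, which keeps a model from attributing to the paper something it never said.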
You can stop at the first layer (instant, free) or add the second with any backend: OpenAI,
Anthropic, Google, OpenRouter, a local model through Ollama, or your existing Claude Code or Codex
subscription with no separate key.
Install
One command installs everything and registers the agent tool:
curl -fsSL https://raw.githubusercontent.com/shoal-rat/econoclast/main/install.sh | bash
Or with pip:
pip install "econoclast[all] @ git+https://github.com/shoal-rat/econoclast"
econoclast setup # detect your backend, write the config, register the MCP tool
Try the offline checks on the bundled demo. No keys needed:
econoclast forensics examples/demo_paper.txt
Minimum Wages and Teen Employment (synthetic)

Test       Verdict     N   Summary
statcheck  suspicious  2   2/2 reported p-values disagree with the recomputed value, 2 flip significance at .05
grim       suspicious  1   1/1 reported means are impossible for integer data of that N
grimmer    suspicious  1   1/1 (mean, SD, N) triples are impossible for integer data
p-curve    suspicious  6   p-curve is flat/left-skewed: consistent with p-hacking
caliper    suspicious  18  test statistics bunch just above the significance thresholds
Commands