By itzTiru
Reality-check AI-generated documents. Runs a five-phase pipeline (atomize → re-derive → ground → attack → score sycophancy → synthesize) that breaks frame-anchoring by structural force. Uses Claude's inbuilt exa for Phase 2 grounding; does not call the user's local exa MCP. Produces evidence-first reports; no verdict line.
Phase 0 of Litmus. Reads an input document, classifies its type, and extracts every load-bearing claim as an atomic proposition tagged problem/solution/assumption/prediction/numeric. Output is JSON conforming to atoms-schema. Spawned by the litmus skill, do not invoke directly.
Phase 2 of Litmus, the novel piece. For each atomic claim (especially numeric and prediction atoms), semantically search the web via Claude's inbuilt exa (mcp__claude_ai_Exa_2__*) and produce a citation table classifying the claim as GROUNDED / CONTRADICTED / UNGROUNDED / UNFALSIFIABLE. Uses the bundled Claude exa, not the user's local exa MCP. Spawned by the litmus skill, do not invoke directly.
Phase 1 of Litmus. Receives ONLY problem-statement atoms (NOT the source document) and produces a fresh design for the stated problem. The frame-break: the designer cannot anchor on a document it has never seen. Spawned by the litmus skill, do not invoke directly.
Phase 3 of Litmus, Accountability lens. Identifies who owns this in production, who gets paged, who has authority to roll back, chains of responsibility, oversight gaps, compliance/regulatory accountability, enforceability of stated guarantees (SLOs, SLAs, data-retention promises). Spawned by the litmus skill, do not invoke directly.
Phase 3 of Litmus, Cascade lens. Second- and third-order effects, lock-in, vendor capture, copycat propagation (other teams will mimic this), maintenance debt compounding, cognitive-load tax, supply-chain blast radius, downstream-system impact. Time horizons: 0-6 months, 1-3 years, 5-10 years. Spawned by the litmus skill, do not invoke directly.
Uses power tools
Uses Bash, Write, or Edit tools
Reality-check an AI-generated document before you implement it.
Litmus is a Claude Code plugin. Given an architecture doc, RFC, spec, or plan, it runs a five-phase pipeline that breaks frame-anchoring by structural force: it atomizes the doc's claims, re-derives the design from the problem statement alone (without seeing the source), grounds every load-bearing claim against the web via Claude's inbuilt exa, attacks the doc through five orthogonal critic lenses, scores the critics for sycophancy, and emits an evidence-first report.
There is no verdict line. Humans own the call.
/litmus path/to/architecture.md
A team I know was handed an architecture document by a vendor and an engineer implemented it faithfully. Afterward it turned out the architecture was AI-generated and the proposed system didn't make sense. Every AI they asked to evaluate the doc said "looks correct", because it was checking internal consistency, not whether the doc described a system that should exist.
That failure has a name: frame-anchoring. LLMs treat their prompt as the world. Sycophancy compounds it. Asking "is this correct?" gets you agreement, not truth.
The leading prior-art tools (EveryInc's compound-engineering plugin, PlanExe's premise-attack ensemble, arch-review-assistant's 9-agent panel) all do clever adversarial review within the doc's frame. None ground claims against external reality. That is the gap Litmus closes.
See docs/why-litmus-is-different.md for the full comparison.
Litmus is pre-publish. Install from source while it's still early.
Prerequisite. Litmus requires Claude's inbuilt exa
(mcp__claude_ai_Exa_2__*), which ships with Claude.ai sessions. The
grounder aborts the pipeline with an actionable message if those tools
are not reachable.
Litmus does NOT use a locally-installed exa MCP server. If you have one
configured (mcp__exa__*), Litmus will not call it: that path bills
against your personal API key, which the plugin treats as out-of-scope
without explicit per-run permission.
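The tool-selection rule above can be sketched as a small filter over tool names. This is a minimal illustration, not Litmus's actual grounder code: the two prefixes come from the text above, while `select_grounding_tools`, `GroundingUnavailable`, and the wording of the error message are hypothetical names invented for the example.

```python
# Prefixes taken from the prerequisite text; everything else is illustrative.
BUNDLED_EXA_PREFIX = "mcp__claude_ai_Exa_2__"  # Claude's inbuilt exa
LOCAL_EXA_PREFIX = "mcp__exa__"                # user's local exa MCP (never called)


class GroundingUnavailable(RuntimeError):
    """Hypothetical error raised when the bundled exa tools are not reachable."""


def select_grounding_tools(available_tools):
    """Return only the bundled exa tools, ignoring any local exa MCP tools.

    Raises GroundingUnavailable with an actionable message when no bundled
    tool is present, mirroring the "abort the pipeline" behaviour described
    in the text.
    """
    bundled = [name for name in available_tools
               if name.startswith(BUNDLED_EXA_PREFIX)]
    if bundled:
        return bundled

    has_local = any(name.startswith(LOCAL_EXA_PREFIX)
                    for name in available_tools)
    hint = (" A local exa MCP was found but is not used, because it bills"
            " against a personal API key." if has_local else "")
    raise GroundingUnavailable(
        "Claude's inbuilt exa (" + BUNDLED_EXA_PREFIX + "*) is not reachable."
        + hint
    )
```

The point of the sketch is that a local `mcp__exa__*` tool is never a fallback: its presence changes only the error hint, not the outcome.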
Install Litmus from this repo:
claude plugin marketplace add itzTiru/litmus
claude plugin install litmus@litmus-marketplace
For local iteration without installing:
git clone https://github.com/itzTiru/litmus.git
cd litmus
claude --plugin-dir ./plugins/litmus
Then run /litmus examples/bad-architecture.md in the resulting Claude
Code session.
/litmus path/to/your/doc.md
Output lands in ./litmus-reports/<YYYYMMDD-HHMMSS>/:
report.md: primary deliverable.
report.html: same content, viewable in any browser. No external assets.
atoms.json: Phase 0 atomization.
independent-design.md: Phase 1 fresh design, produced by a subagent that never saw the source doc.
citations.json: Phase 2 grounding results.
lenses/<name>.json: one file per activated Phase 3 lens.
sycophancy.json: Phase 4 collapse scores.
audit/: pre- and post-re-prompt lens outputs, kept for traceability.
Open report.html in a browser. The "Independent Re-derivation Diff"
section is the one to read first. It surfaces where the doc made
choices a fresh designer would not.
The five phases, in order:
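Phase 0, atomize. Extracts every load-bearing claim as an atomic proposition tagged problem, solution, assumption, prediction, or numeric. Separates what the doc says from how it argues.
Phase 1, re-derive. A subagent receives only the problem-tagged atoms. The source doc is intentionally withheld from its prompt. It produces a fresh design with three ranked alternatives (including a do-nothing baseline).
Phase 2, ground. Searches the web for each atomic claim via Claude's inbuilt exa (mcp__claude_ai_Exa_2__*) and assigns one of GROUNDED, CONTRADICTED, UNGROUNDED, UNFALSIFIABLE. UNGROUNDED is a finding, not a passing grade.
Phase 3, attack. Five orthogonal critic lenses, Accountability and Cascade among them, attack the doc.
Phase 4, score sycophancy. Scores the critics for sycophancy. Lens outputs from before and after re-prompting are kept in audit/.
The results are synthesized into an evidence-first report with no verdict line.
A deeper walk-through is in docs/how-it-works.md.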
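The two structural ideas in the early phases, tagging atoms and withholding everything but the problem atoms from the re-deriver, can be sketched with a couple of small types. This is an illustration under assumptions, not the real atoms-schema: the tag and verdict names come from the text above, but the `Atom` fields, the function names, and the rule that anything other than GROUNDED needs attention are hypothetical choices made for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class AtomTag(str, Enum):
    """The five atom tags named in Phase 0."""
    PROBLEM = "problem"
    SOLUTION = "solution"
    ASSUMPTION = "assumption"
    PREDICTION = "prediction"
    NUMERIC = "numeric"


class Verdict(str, Enum):
    """The four grounding classifications named in Phase 2."""
    GROUNDED = "GROUNDED"
    CONTRADICTED = "CONTRADICTED"
    UNGROUNDED = "UNGROUNDED"
    UNFALSIFIABLE = "UNFALSIFIABLE"


@dataclass(frozen=True)
class Atom:
    # Field names are assumptions; the real atoms-schema may differ.
    id: str
    text: str
    tag: AtomTag


def problem_atoms(atoms: List[Atom]) -> List[Atom]:
    """Phase 1 input: only problem-tagged atoms, never the source document."""
    return [atom for atom in atoms if atom.tag is AtomTag.PROBLEM]


def needs_attention(verdicts: Dict[str, Verdict]) -> List[str]:
    """Atom ids whose verdict is not GROUNDED.

    Assumption for this sketch: UNGROUNDED counts as a finding (as the text
    states), and CONTRADICTED and UNFALSIFIABLE are surfaced the same way.
    """
    return [atom_id for atom_id, verdict in verdicts.items()
            if verdict is not Verdict.GROUNDED]
```

Because the re-deriver is handed the result of `problem_atoms` rather than the document, it has nothing from the proposed solution to anchor on.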
npx claudepluginhub itztiru/litmus --plugin litmus