research-compiler

A research paper is a compressed implementation artifact. The detail a coding agent needs to actually reproduce it lives across the citation neighborhood — in the cited methods, datasets, baselines, prior architectures, evaluation protocols. research-compiler is a Claude Code plugin that compiles that neighborhood into a queryable Graph RAG store and a Karpathy-style llm-wiki, then teaches Claude — through skills and MCP tools — which lever to pull for which sub-task.

A single command turns one arXiv ID into ~430 papers, ~165 implementation atoms, ~19,000 indexed chunks (prose, tables, captions, equations), three detected research communities, and a regenerable wiki — all sitting in research/ next to your code, addressable by stable UIDs, served read-only over 26 MCP tools.

The hypothesis

PaperBench (2025) measured what we already suspected: frontier models reproduce papers at ~21%, and the dominant failure mode is missing implementation context — the kind of detail that lives one citation hop away. Existing paper-to-code systems start from the target PDF and ignore that neighborhood. We compile it.

The thesis being tested, in three operational conditions:

Condition	Setup	Predicted outcome
A	Claude Code + target paper PDF	baseline
B	Claude Code + compiled `research.md` brief	most of the lift
C	Claude Code + brief + Graph RAG MCP + wiki	+10pp over A, hallucination halved, atom coverage ≥1.5×, last 3-5pp of accuracy on cross-paper queries

The plugin succeeds or fails as that research claim. The evaluation rubric lives in docs/05-evaluation-plan.md.
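The Condition C prediction can be read as three threshold checks against the Condition A baseline. The sketch below is one way to encode them. The `ConditionResult` fields, the function name, and the choice to measure atom coverage relative to A are assumptions made for illustration; the authoritative rubric is the one in docs/05-evaluation-plan.md. The cross-paper accuracy clause is left out because the text does not say how it is measured.

```python
from dataclasses import dataclass
from typing import Dict

EPS = 1e-9  # tolerance so exact-threshold floats (e.g. 0.30 - 0.20) still pass


@dataclass(frozen=True)
class ConditionResult:
    """Hypothetical per-condition summary; field names are assumptions."""
    accuracy: float            # fraction in [0, 1]
    hallucination_rate: float  # fraction in [0, 1]
    atom_coverage: float       # fraction of required atoms found, in [0, 1]


def check_condition_c(a: ConditionResult, c: ConditionResult) -> Dict[str, bool]:
    """Compare Condition C against baseline A on the three stated thresholds."""
    return {
        # "+10pp over A": at least 0.10 absolute accuracy gain
        "accuracy_plus_10pp": c.accuracy - a.accuracy >= 0.10 - EPS,
        # "hallucination halved": at most half of A's rate
        "hallucination_halved": c.hallucination_rate <= 0.5 * a.hallucination_rate + EPS,
        # "atom coverage >= 1.5x": assumed to mean relative to A
        "atom_coverage_1_5x": c.atom_coverage >= 1.5 * a.atom_coverage - EPS,
    }


if __name__ == "__main__":
    # Made-up placeholder numbers, not measurements.
    baseline = ConditionResult(accuracy=0.20, hallucination_rate=0.20, atom_coverage=0.40)
    full = ConditionResult(accuracy=0.32, hallucination_rate=0.08, atom_coverage=0.70)
    for name, passed in check_condition_c(baseline, full).items():
        print(f"{name}: {'pass' if passed else 'fail'}")
```

Returning one boolean per clause, rather than a single verdict, keeps it visible which part of the claim held and which did not.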

What this adds to Claude Code

Three primitives — skills, MCP server, forked subagent — composed into a single plugin so the workflow is one prompt away.

┌────────────────────────────────────────────────────────────────────┐
│                  Claude Code session (user-facing)                 │
│   /paper-compiler:build-research-context arxiv:2603.19312          │
│   /paper-compiler:use-research-context     (auto-invoke)           │
│   /paper-compiler:audit-against-research   (auto-invoke)           │
│   /paper-compiler:wiki-query / wiki-ingest / wiki-lint             │
│   15 mcp__paper-compiler__* tools                                  │
└─────────┬──────────────────────────────────────────▲───────────────┘
          │ queries                                  │ structured evidence
          ▼                                          │
┌────────────────────────────────────────────────────────────────────┐
│            paper-compiler MCP server  (read-only)                  │
│  sqlite + sqlite-vec + FTS5; lazy-loaded; ~72 MB per paper         │
└─────────┬──────────────────────────────────────────────────────────┘
          │
          ▼
┌────────────────────────────────────────────────────────────────────┐
│              research/  (lives in your repo, git-friendly)         │
│   research.md       — ≤ 8000-token brief                           │
│   research.db       — Graph RAG store                              │
│   SCHEMA.md         — DB schema reference for Claude               │
│   evidence/<atom>.md — per-atom verbatim spans                     │
│   wiki/             — Karpathy llm-wiki (atoms, papers,            │
│                       communities, promoted answers, log)          │
└─────────▲──────────────────────────────────────────────────────────┘
          │ writes (compile-time only)
          │
┌────────────────────────────────────────────────────────────────────┐
│      paper-compiler CLI  — runs in background, progress monitored  │
│   resolve → acquire → parse → expand → classify → atom-extract →   │
│   score → render → build DB → communities → wiki                   │
└────────────────────────────────────────────────────────────────────┘
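The CLI box at the bottom of the diagram lists eleven stages in a fixed order. A minimal sketch of such a sequential runner follows. Only the stage names come from the diagram; `run_pipeline`, the state dictionary, the placeholder stages, and the progress callback are hypothetical stand-ins, not the plugin's real interfaces.

```python
from typing import Callable, Dict, List

# Stage names copied from the diagram, in the order shown there.
STAGES: List[str] = [
    "resolve", "acquire", "parse", "expand", "classify", "atom-extract",
    "score", "render", "build DB", "communities", "wiki",
]

Stage = Callable[[dict], dict]


def run_pipeline(
    stage_fns: Dict[str, Stage],
    state: dict,
    on_progress: Callable[[str], None] = print,
) -> dict:
    """Run every stage in order, passing each stage's output to the next."""
    missing = [name for name in STAGES if name not in stage_fns]
    if missing:
        raise KeyError(f"no implementation for stages: {missing}")
    for index, name in enumerate(STAGES, start=1):
        state = stage_fns[name](state)
        on_progress(f"[{index}/{len(STAGES)}] {name} done")
    return state


def make_placeholder(name: str) -> Stage:
    """Build a do-nothing stage that only records that it ran."""
    def stage(state: dict) -> dict:
        return {**state, "trace": state.get("trace", []) + [name]}
    return stage


if __name__ == "__main__":
    fns = {name: make_placeholder(name) for name in STAGES}
    final = run_pipeline(fns, {"target": "arxiv:2603.19312"})
    print(final["trace"])
```

Reporting progress through a callback is one way a background run could be monitored from the session without the runner knowing who is listening.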

The strict separation is the point. The CLI never serves runtime queries; the MCP server never writes. A stale DB can be diagnosed without re-running a compile, a buggy tool can be replaced without re-acquiring papers, and the wiki can be regenerated as a pure function of the DB.
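The write/read split can be illustrated with Python's standard `sqlite3` module. This is a sketch under stated assumptions: the `atoms` table, its columns, and all function names are invented for the example and are not the plugin's actual schema (which SCHEMA.md documents). The compile step is the only code that opens the file read-write; the serving side opens it with SQLite's `mode=ro` URI parameter, so a write attempt fails at the database layer, and the wiki text is derived from query results alone.

```python
import sqlite3
import tempfile
from pathlib import Path


def compile_store(db_path: Path) -> None:
    """Compile-time writer: the only place the DB is opened read-write."""
    conn = sqlite3.connect(str(db_path))
    try:
        # Hypothetical schema, for illustration only.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS atoms (uid TEXT PRIMARY KEY, summary TEXT NOT NULL)"
        )
        conn.executemany(
            "INSERT OR REPLACE INTO atoms (uid, summary) VALUES (?, ?)",
            [("atom:b", "second placeholder atom"), ("atom:a", "first placeholder atom")],
        )
        conn.commit()
    finally:
        conn.close()


def open_read_only(db_path: Path) -> sqlite3.Connection:
    """Runtime reader: mode=ro makes SQLite reject any write."""
    uri = db_path.resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def render_wiki_index(conn: sqlite3.Connection) -> str:
    """Pure function of DB contents: same rows in, same text out."""
    rows = conn.execute("SELECT uid, summary FROM atoms ORDER BY uid").fetchall()
    return "\n".join(f"- {uid}: {summary}" for uid, summary in rows)


def demo():
    """Compile once, then read twice and attempt one write."""
    with tempfile.TemporaryDirectory() as tmp:
        db_path = Path(tmp) / "research.db"
        compile_store(db_path)
        conn = open_read_only(db_path)
        try:
            first = render_wiki_index(conn)
            second = render_wiki_index(conn)
            try:
                conn.execute("DELETE FROM atoms")
                write_blocked = False
            except sqlite3.OperationalError:
                write_blocked = True
        finally:
            conn.close()
    return first, first == second, write_blocked


if __name__ == "__main__":
    text, stable, blocked = demo()
    print(text)
    print("regeneration stable:", stable, "| write blocked:", blocked)
```

Sorting by `uid` in the query is what makes the rendered text reproducible; without an `ORDER BY`, SQLite does not guarantee row order.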

What it does well (and the tasks it actually simplifies)
