Read a downloaded quant finance paper PDF and author note.md + metrics.json using judgment. Use when the user wants a TL;DR that captures the paper's contribution (not a copy-pasted abstract), reported performance metrics (Sharpe, max drawdown, annual return, volatility, Calmar, information ratio, win rate) validated against table values as well as narrative prose, the paper's core formulas verified by cross-checking Results-section citations, and paper-specific open questions needed for replication. Second stage of the paper-to-production pipeline; reads papers/<arxiv-id>/paper.pdf and writes note.md + metrics.json into the same directory. Bundled Python scripts prepare a structured context bundle that Claude then reasons over to produce the outputs.
Slash command: /quant-paper-agent:paper-extract
Turn a quant paper PDF into `note.md` (human synopsis) and `metrics.json` (machine hand-off for `paper-replicate`). Bundled Python scripts prepare a structured context bundle; Claude reads it and synthesizes the outputs — so the TL;DR captures the paper's contribution, metric extraction covers table values as well as narrative, and the replication open-questions are paper-specific.
1. prepare_context.py extracts per-section text, candidate formulas (heuristic), candidate metric hits (regex), and meta into a single context.json. The bundle is bounded by character caps so it fits comfortably in your context window.
2. Read context.json and author note.md + metrics.json directly via the Write tool, using the schema below. The Python candidates are recall hints, not final output: verify, augment, drop false positives, and fix obvious OCR damage on formulas.
3. validate_output.py schema-checks metrics.json so paper-replicate never gets malformed input.

Use when there is a papers/<arxiv-id>/paper.pdf (from paper-search or user-supplied) and the user wants it parsed into structured fields.

Requires Python ≥ 3.9 and pymupdf (pip install -r requirements.txt from the plugin root).
python scripts/prepare_context.py papers/<arxiv-id>/paper.pdf
Writes papers/<arxiv-id>/.extract/context.json. Useful flags:
- --per-section-cap 8000 (default): caps each section's text so the bundle stays focused.
- --total-cap 40000 (default): soft total-char ceiling; shrinks biggest sections first if exceeded.

The bundle contains:

- meta: from paper-search's meta.json (title, authors, abstract, categories, published).
- outline: canonical section list with page ranges and char counts.
- sections: per-section text (abstract, introduction, data, methodology, results, conclusion, references). This is the material you read.
- candidate_formulas / candidate_core_formulas: every formula-looking line the heuristic caught, plus its guess at core formulas (labels cited in both Methodology and Results).
- candidate_metrics: regex hits for eight common metrics with page + 160-char context window.
- candidate_data_period: pattern-matched "from YYYY to YYYY, daily/monthly/..." if present.

If context.json comes back with total_section_chars > 30000 or many truncated sections, spawn an Explore sub-agent via the Task tool on the bundle, asking it to return the fields you need. This keeps the main context clean.
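The two caps can be pictured as a per-section truncation followed by lowering a common ceiling until the total fits. The sketch below is an illustration under that assumption, not the code in prepare_context.py; `apply_caps` is a hypothetical name, and it treats the total cap as a hard limit even though the flag is described as soft.

```python
def apply_caps(sections, per_section_cap=8000, total_cap=40000):
    """Truncate each section, then shrink the biggest ones until the total fits.

    Hypothetical helper: the real script may trim differently.
    """
    # Step 1: per-section cap.
    capped = {name: text[:per_section_cap] for name, text in sections.items()}
    if sum(len(t) for t in capped.values()) <= total_cap:
        return capped

    # Step 2: binary-search the highest common ceiling that fits the budget.
    # Sections shorter than the ceiling are untouched, so only the biggest shrink.
    lo, hi = 0, per_section_cap
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if sum(min(len(t), mid) for t in capped.values()) <= total_cap:
            lo = mid
        else:
            hi = mid - 1
    return {name: text[:lo] for name, text in capped.items()}
```

Lowering one shared ceiling means short sections such as the abstract survive intact while long ones (typically methodology and results) absorb the cut.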
Read papers/<arxiv-id>/.extract/context.json. Then author the two output files using the Write tool.
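Before authoring, the delegation rule (total_section_chars > 30000 or many truncated sections) can be checked mechanically. This is a minimal sketch: the per-section `truncated` flag and the reading of "many" as three or more are assumptions, and `load_context` / `should_delegate` are hypothetical names.

```python
import json
from pathlib import Path


def load_context(paper_dir):
    """Read papers/<arxiv-id>/.extract/context.json for a given paper directory."""
    path = Path(paper_dir) / ".extract" / "context.json"
    return json.loads(path.read_text(encoding="utf-8"))


def should_delegate(context, char_threshold=30000, truncated_threshold=3):
    """Return True when the bundle is large enough to hand to a sub-agent."""
    total = context.get("total_section_chars", 0)
    sections = context.get("sections", {})
    # Assumption: each section is a dict carrying a boolean "truncated" flag.
    truncated = sum(
        1 for s in sections.values() if isinstance(s, dict) and s.get("truncated")
    )
    return total > char_threshold or truncated >= truncated_threshold
```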
metrics.json: target schema
{
  "arxiv_id": "2403.12345",
  "reported_metrics": [
    {
      "name": "sharpe",
      "value": 1.42,
      "variant": "12m lookback, long-short",
      "unit": null,
      "page": 14,
      "context": "Table 3 reports an annualized Sharpe ratio of 1.42 for the 12-month lookback variant."
    }
  ],
  "core_formulas": [
    {
      "label": "Eq. 3",
      "section": "methodology",
      "page": 7,
      "text": "r_{i,t} = (1/L) * sum_{k=1..L} ret_{i,t-k}",
      "cited_in_results": true
    }
  ],
  "data_period": {"start": "1990-01", "end": "2020-12", "frequency": "daily"},
  "universe": ["US equities", "CRSP common shares", "NYSE/NASDAQ/AMEX"],
  "notes": "Authors specify returns are excess of 1-month T-bill; signal formed at month-end and held for 1 month."
}
Fields with null are allowed when the paper genuinely does not state the value — do NOT guess.
reported_metrics: Start from candidate_metrics, then:
- Drop false positives (... sharpe pattern).
- Fill in variant with the specific configuration (lookback length, long-only / long-short, top decile, market-neutral, etc.). The regex writes crude variants; you write precise ones.
- unit: "18%" should be {"value": 18.0, "unit": "%"}, not {"value": 0.18, "unit": null}. Keep values in the paper's original unit; paper-replicate's compare step normalizes.
- Trim context to one sentence that identifies where the number comes from.

core_formulas: Start from candidate_core_formulas, then:
- Fix obvious OCR damage on formulas (e.g. a summation k=1..L should render as sum_{k=1..L}). Keep the text ASCII-safe and plain, with no LaTeX rendering attempts.
- Set cited_in_results: true only when the paper's Results section references the equation label.

data_period: Read the Data section text. Use the narrative (e.g. "our sample spans January 1990 through December 2020, sampled daily") in preference to the candidate_data_period regex output. Null if the paper really does not say.

universe: Extract specific, named universes from the Data section. "US equities" alone is weak; "CRSP common shares on NYSE/NASDAQ/AMEX, excluding stocks below $5" is useful. A list of 1-4 strings.

notes: One to three sentences capturing replication-critical details the four fields above cannot express: signal-formation timing, rebalancing cadence, return definition (excess / total / log), treatment of delisted or missing names.
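Two of the field rules above are mechanical enough to sketch in code: keeping a value in the paper's own unit, and setting cited_in_results only when the Results text mentions the equation label. Both helpers are hypothetical illustrations, not part of the bundled scripts, and the label regex is an assumption that covers only "Eq.", "Eqn." and "Equation" spellings.

```python
import re

_VALUE = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s*(%|bps)?\s*$")


def parse_reported_value(raw):
    """Split a string like '18%' into value and unit without rescaling it."""
    m = _VALUE.match(raw)
    if m is None:
        # Unparseable: leave both null rather than guess.
        return {"value": None, "unit": None}
    return {"value": float(m.group(1)), "unit": m.group(2)}


def is_cited_in_results(label, results_text):
    """True if the equation number in `label` is referenced in the Results text."""
    number = re.search(r"\d+(?:\.\d+)*", label)
    if number is None:
        return False
    # Match 'Eq. 3', 'Eqn 3', 'Equation (3)', but not 'Eq. 31' or 'Eq. 3.1'.
    pattern = (
        r"\bEq(?:uation|n)?\.?\s*\(?" + re.escape(number.group()) + r"\)?(?!\.?\d)"
    )
    return re.search(pattern, results_text, flags=re.IGNORECASE) is not None
```

Python's None serializes to JSON null, so a unitless Sharpe ratio parsed this way lands in metrics.json as "unit": null.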
note.md: target shape
Follow references/note_template.md. The critical section is TL;DR: it must capture the paper's contribution, not its abstract. See the template for an explicit before/after example.
python scripts/validate_output.py papers/<arxiv-id>/metrics.json
Runs a stdlib-only schema check (types, required fields, reasonable ranges). It warns on empty arrays and unknown metric names; use --strict to fail on warnings. If it complains, fix metrics.json and re-run before handing off to paper-replicate.
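To make the error/warning split concrete, here is a minimal stdlib-only sketch of that kind of check. It is not the bundled validate_output.py: the function names are hypothetical, the metric vocabulary is assumed (only "sharpe" appears in the schema example), and it covers just a few fields.

```python
# Assumed vocabulary; the real script's accepted names may differ.
KNOWN_METRICS = {
    "sharpe", "max_drawdown", "annual_return", "volatility",
    "calmar", "information_ratio", "win_rate",
}


def _is_number(x):
    # bool is a subclass of int, so exclude it explicitly.
    return isinstance(x, (int, float)) and not isinstance(x, bool)


def check_metrics(doc):
    """Return (errors, warnings) for a parsed metrics.json document."""
    errors, warnings = [], []
    if not isinstance(doc.get("arxiv_id"), str):
        errors.append("arxiv_id must be a string")
    for key in ("reported_metrics", "core_formulas"):
        items = doc.get(key)
        if not isinstance(items, list):
            errors.append(f"{key} must be a list")
        elif not items:
            warnings.append(f"{key} is empty")
    metrics = doc.get("reported_metrics")
    for i, m in enumerate(metrics if isinstance(metrics, list) else []):
        if not isinstance(m, dict):
            errors.append(f"reported_metrics[{i}] must be an object")
            continue
        if not isinstance(m.get("name"), str):
            errors.append(f"reported_metrics[{i}].name must be a string")
        elif m["name"] not in KNOWN_METRICS:
            warnings.append(f"reported_metrics[{i}].name is unknown: {m['name']}")
        if m.get("value") is not None and not _is_number(m["value"]):
            errors.append(f"reported_metrics[{i}].value must be a number or null")
    return errors, warnings


def passes(doc, strict=False):
    """Errors always fail; warnings fail only in strict mode."""
    errors, warnings = check_metrics(doc)
    return not errors and not (strict and warnings)
```

A document with empty arrays passes by default and fails under strict mode, which mirrors the --strict behaviour described above.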
python scripts/extract_text.py <paper.pdf> # raw text + heading detection
python scripts/extract_formulas.py <paper.pdf> # formula candidates
python scripts/extract_metrics.py <paper.pdf> # regex metric hits
Each prints JSON to stdout.
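Because each script prints JSON to stdout, its output can be captured from Python when debugging a single extraction stage. A minimal sketch, assuming the script exits with status 0 and prints nothing but JSON; `run_json_command` is a hypothetical helper.

```python
import json
import subprocess
import sys


def run_json_command(argv):
    """Run a command and parse the JSON document it prints to stdout."""
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)
```

For example, `run_json_command([sys.executable, "scripts/extract_metrics.py", "papers/<arxiv-id>/paper.pdf"])` would return the regex metric hits as Python objects.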
- core_formulas[].text is a plain-ASCII approximation that a human (or the paper-replicate planner) cross-checks against the PDF.
- If the paper does not state the data period, data_period is null. If no metric table exists, reported_metrics is []. Do not invent anchors paper-replicate will later measure against.
- ... paper-search owns that.
- ... See references/note_template.md for the contrast.
- ... metrics.json.
npx claudepluginhub lucaswychan/quant-paper-agent --plugin quant-paper-agent