From sci-brain
Downloads academic references (arXiv IDs or DOIs) into a sci-brain knowledge base: fetches metadata, PDFs, renders to markdown, updates INDEX.md and ref.bib.
Slash command: /sci-brain:download-ref
- A discussion / draft surfaces a paper not yet in the project KB, and you want it indexed for future search.
ref.bib.

Do NOT use:
The renderer uses pymupdf4llm for highest-fidelity output (it preserves figures). The fallbacks (markitdown → pdftotext) are text-only, so figures are silently dropped. Verify that pymupdf4llm is importable before fetching:
python3 -c "import pymupdf4llm; print('ok', pymupdf4llm.__version__)"
If that errors, install for the same python3 the helpers will use:
# macOS / Homebrew Python
/opt/homebrew/bin/python3 -m pip install --user --break-system-packages pymupdf4llm
# Linux / system Python
python3 -m pip install --user pymupdf4llm
- arXiv IDs (1806.08734, 2006.10739): strip the vN suffix.
- DOIs (10.1103/PhysRevLett.130.036401): lowercase preferred; the renderer normalizes.

download-ref writes:
- $KB/.raw/{arxiv,doi}/<id>.{json,pdf}
- $KB/.figures/{arxiv__<id>,doi__<safe>}/...
- $KB/<id>_<slug>.md (rendered paper, one per ref)
- $KB/INDEX.md (regenerated each run)
- $(dirname $KB)/ref.bib

download-ref never touches:
- $KB/NOTES.md: owned by survey / researchstyle / humans (sub-themes, open problems, bottlenecks).

If the caller passes --kb <abs-path>, use that. Otherwise:
KB=$(python3 skills/download-ref/helpers/resolve_kb.py)
if [ -z "$KB" ]; then
# resolve_kb printed "unresolvable from ..." to stderr and exited 2.
# Ask the user via AskUserQuestion where the KB should live.
exit 1
fi
For advisor flows (/incarnate, /brainstorm-ideas with a selected advisor), resolve the advisor KB instead: KB=$(python3 skills/download-ref/helpers/resolve_kb.py --advisor <slug>). This honors $SCIBRAIN_KB_DIRNAME the same way the project-KB form does.
for id in 1806.08734 2006.10739; do
[ -f "$KB/.raw/arxiv/$id.json" ] && echo "$id present" || echo "$id missing"
done
for doi in 10.1103/PhysRevLett.130.036401; do
safe=$(echo "$doi" | tr '/' '-')
[ -f "$KB/.raw/doi/$safe.json" ] && echo "$doi present" || echo "$doi missing"
done
Helpers are idempotent — this check is for human-readable status, not gating.
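For scripting the same status check from Python, a minimal sketch follows. It assumes only the layout described above (.raw/arxiv/<id>.json, and .raw/doi/<safe>.json with every / mapped to -); the names raw_status and safe_doi are hypothetical and not part of the skill's helpers.

```python
from __future__ import annotations

from pathlib import Path


def safe_doi(doi: str) -> str:
    # Same mapping as the shell loop above: every "/" becomes "-".
    return doi.replace("/", "-")


def raw_status(kb: Path, arxiv_ids: list[str], dois: list[str]) -> dict[str, bool]:
    """Map each requested ref to True if its metadata JSON is already in .raw/."""
    status = {}
    for aid in arxiv_ids:
        status[aid] = (kb / ".raw" / "arxiv" / f"{aid}.json").is_file()
    for doi in dois:
        status[doi] = (kb / ".raw" / "doi" / f"{safe_doi(doi)}.json").is_file()
    return status
```

Like the shell version, this only reports status; printing "present" or "missing" per entry of the returned dict reproduces the loop output.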
3a. Direct input (single-shot mode):
TMP=/tmp/download-ref-manifest.json
cat > "$TMP" <<'EOF'
{"arxiv": ["1806.08734", "2006.10739"], "doi": []}
EOF
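When IDs arrive in mixed form (pasted from a chat or a draft), a small normalizer can produce the manifest shown above. This is a hypothetical sketch, not a skill helper: it strips an arXiv: prefix and vN suffix, lowercases DOIs, and drops duplicates. It only recognizes new-style arXiv IDs (YYMM.NNNN / YYMM.NNNNN) and bare DOIs starting with 10.; old-style arXiv IDs with an archive prefix and doi.org URLs are out of scope.

```python
from __future__ import annotations

import json
import re

# New-style arXiv id (YYMM.NNNN or YYMM.NNNNN), optional "arXiv:" prefix and vN suffix.
ARXIV_RE = re.compile(r"^(?:arxiv:)?(\d{4}\.\d{4,5})(?:v\d+)?$", re.IGNORECASE)


def build_manifest(refs: list[str]) -> dict[str, list[str]]:
    """Sort raw ids into {"arxiv": [...], "doi": [...]}, normalized and de-duplicated."""
    manifest: dict[str, list[str]] = {"arxiv": [], "doi": []}
    for ref in refs:
        ref = ref.strip()
        m = ARXIV_RE.match(ref)
        if m:
            bucket, value = "arxiv", m.group(1)   # bare id: no prefix, no vN
        elif ref.startswith("10."):
            bucket, value = "doi", ref.lower()    # lowercase preferred for DOIs
        else:
            raise ValueError(f"not a new-style arXiv id or bare DOI: {ref!r}")
        if value not in manifest[bucket]:
            manifest[bucket].append(value)
    return manifest


def write_manifest(refs: list[str], path: str = "/tmp/download-ref-manifest.json") -> None:
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(build_manifest(refs), fh)
```

Raising on unrecognized input is a deliberate choice here: a silently dropped ID would only surface later as a missing INDEX.md entry.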
3b. From an existing ref.bib (bulk mode, --from-bib):
TMP=/tmp/download-ref-manifest.json
python3 skills/download-ref/helpers/bibtex_to_manifest.py "$(dirname "$KB")/ref.bib" > "$TMP"
When in bulk mode, optionally ask the user:
"I see 59 refs in the manifest. Render all, topic-filtered, or specific IDs?"
- (a) All: proceed with the full manifest
- (b) Topic-filtered: name a heading from NOTES.md (skill greps for cite keys under it)
- (c) Specific IDs: paste arXiv IDs / DOIs
For (b) and (c), edit $TMP accordingly before continuing.
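Option (b) can be approximated as below. This is a hypothetical sketch, not the skill's implementation. It assumes NOTES.md uses markdown # headings, and that each ref.bib entry closes with } on its own line and carries a standard eprint or doi field. A key counts as "under" the heading if the key string appears anywhere in that section, so no particular citation syntax is assumed. When an entry has both fields, it prefers the arXiv eprint, which is an arbitrary choice.

```python
from __future__ import annotations

import re

# "@type{key, ... \n}" with the closing brace on its own line (assumed layout).
ENTRY_RE = re.compile(r"@\w+\s*\{\s*([^,\s]+)\s*,(.*?)\n\}", re.DOTALL)
# eprint = {...} or doi = "..." on a line of its own.
FIELD_RE = re.compile(r'^\s*(eprint|doi)\s*=\s*[{"]([^}"]+)[}"]', re.IGNORECASE | re.MULTILINE)


def section_text(notes: str, heading: str) -> str:
    """Text under the first heading titled `heading`, up to the next heading of equal or higher level."""
    out, level = [], None
    for line in notes.splitlines():
        m = re.match(r"^(#+)\s+(.*?)\s*$", line)
        if m:
            if level is not None and len(m.group(1)) <= level:
                break                         # section ended
            if level is None and m.group(2) == heading:
                level = len(m.group(1))
                continue
        if level is not None:
            out.append(line)                  # includes deeper sub-headings
    return "\n".join(out)


def bib_ids(bib: str) -> dict[str, dict[str, str]]:
    """Map cite key -> {"eprint": ..., "doi": ...} for whichever fields are present."""
    return {m.group(1): {f.lower(): v.strip() for f, v in FIELD_RE.findall(m.group(2))}
            for m in ENTRY_RE.finditer(bib)}


def topic_manifest(notes: str, bib: str, heading: str) -> dict[str, list[str]]:
    section = section_text(notes, heading)
    manifest: dict[str, list[str]] = {"arxiv": [], "doi": []}
    for key, fields in bib_ids(bib).items():
        # Whole-key match so "smith_2021" does not match "smith_2021_x".
        if not re.search(rf"(?<![\w-]){re.escape(key)}(?![\w-])", section):
            continue
        if "eprint" in fields:
            manifest["arxiv"].append(fields["eprint"])
        elif "doi" in fields:
            manifest["doi"].append(fields["doi"].lower())
    return manifest
```

Writing json.dumps(topic_manifest(...)) over $TMP then yields a manifest in the same shape as Step 3a.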
python3 skills/download-ref/helpers/fetch_metadata.py \
--kb "$KB" \
--manifest "$TMP" \
--download-arxiv-pdfs
Populates $KB/.raw/{arxiv,doi}/<id>.{json,pdf} idempotently. PDFs are downloaded sequentially with 2s sleep between requests to avoid arXiv rate limits. Each PDF is verified for a %%EOF trailer; truncated downloads are discarded and retried. For DOIs whose publisher gates the PDF (APS / Nature / IOP / AAAS / ACS), the helper falls back to the arXiv preprint via externalIds.ArXiv when present. If even that fails, you'll see a miss line — go to Step 4b.
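The fetch policy described above (open-access PDF first, then the arXiv preprint; sequential downloads with a 2 s pause; %%EOF verification with retry) can be sketched as follows. The Semantic Scholar field name openAccessPdf, the retry count, the 1024-byte trailer window, and the injected get/sleep callables are assumptions for illustration; fetch_metadata.py may differ.

```python
from __future__ import annotations

import time
from typing import Callable


def pick_pdf_url(paper: dict) -> str | None:
    """Open-access PDF first, then the arXiv preprint; None means a miss (Step 4b)."""
    oa = paper.get("openAccessPdf") or {}
    if oa.get("url"):
        return oa["url"]
    arxiv_id = (paper.get("externalIds") or {}).get("ArXiv")
    if arxiv_id:
        return f"https://arxiv.org/pdf/{arxiv_id}"
    return None


def is_complete_pdf(data: bytes) -> bool:
    # A complete PDF ends with a %%EOF trailer, possibly followed by whitespace.
    return b"%%EOF" in data[-1024:]


def fetch_all(urls: list[str], get: Callable[[str], bytes],
              sleep: Callable[[float], None] = time.sleep,
              delay: float = 2.0, retries: int = 2) -> dict[str, bytes | None]:
    """Download one URL at a time, pausing between requests; retry truncated files."""
    results: dict[str, bytes | None] = {}
    for i, url in enumerate(urls):
        if i:
            sleep(delay)                  # polite gap between papers
        for _ in range(retries + 1):
            data = get(url)
            if is_complete_pdf(data):
                results[url] = data
                break
            sleep(delay)                  # discard the truncated body, wait, retry
        else:
            results[url] = None           # still truncated after all retries
    return results
```

Injecting get and sleep keeps the loop testable offline; in real use get could wrap urllib.request.urlopen(url).read().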
Tip: Set SEMANTIC_SCHOLAR_API_KEY in your environment to raise the Semantic Scholar rate limit from ~1 req/s to 100 req/s. Get a free key at https://www.semanticscholar.org/product/api#api-key-form.
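A sketch of how the key might be attached to Semantic Scholar Graph API paper lookups, assuming the x-api-key header and the DOI:/ARXIV: paper-id prefixes from the public API documentation. s2_request is a hypothetical name, not a helper shipped with the skill; without the variable set, the request goes out unauthenticated.

```python
from __future__ import annotations

import os
import urllib.parse
import urllib.request

S2_PAPER = "https://api.semanticscholar.org/graph/v1/paper/"


def s2_request(paper_id: str,
               fields: str = "title,year,authors,externalIds,openAccessPdf") -> urllib.request.Request:
    """Build a paper lookup; paper_id like "DOI:10.1103/PhysRevLett.130.036401" or "ARXIV:1806.08734"."""
    url = S2_PAPER + urllib.parse.quote(paper_id, safe=":/") + "?fields=" + fields
    headers = {}
    api_key = os.environ.get("SEMANTIC_SCHOLAR_API_KEY")
    if api_key:
        headers["x-api-key"] = api_key    # only sent when the variable is set
    return urllib.request.Request(url, headers=headers)
```

To execute it: with urllib.request.urlopen(s2_request("ARXIV:1806.08734")) as resp, parse the body with json.load(resp).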
If Step 4 reports miss for any DOI (no open-access PDF and no arXiv preprint), use the sci-hub-server MCP tool. For each missing DOI:
1. Call mcp__sci-hub-server__get_paper_link with the DOI to get a direct PDF URL.
2. Download it with mcp__sci-hub-server__download_pdf and save it to $KB/.raw/doi/<safe>.pdf (<safe> = DOI with / → -).

If the MCP isn't configured, tell the user to add it:
{
  "mcpServers": {
    "sci-hub-server": {
      "command": "npx",
      "args": ["sci-mcp-server"]
    }
  }
}
Skip this step if all PDFs were fetched in Step 4.
python3 skills/download-ref/helpers/render.py --kb "$KB"
Add --only-missing to skip papers that already have a rendered .md file (>500 bytes). This is much faster when adding a few papers to a large KB:
python3 skills/download-ref/helpers/render.py --kb "$KB" --only-missing
No manifest is needed: the renderer auto-discovers .raw/{arxiv,doi}/*.json. It renders new entries and overwrites existing ones (unless --only-missing skips them).
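The --only-missing rule can be sketched as: discover every .raw/{arxiv,doi}/*.json, then skip those that already have a rendered <stem>_*.md larger than 500 bytes. Matching the rendered file by the JSON file's stem is an assumption (it holds for arXiv IDs given the <id>_<slug>.md naming; the DOI naming is not documented here), and pending_renders is a hypothetical name.

```python
from __future__ import annotations

import glob
from pathlib import Path


def pending_renders(kb: Path, min_bytes: int = 500) -> list[Path]:
    """JSON records that still lack a rendered .md larger than min_bytes."""
    pending = []
    for kind in ("arxiv", "doi"):
        raw_dir = kb / ".raw" / kind
        if not raw_dir.is_dir():
            continue
        for meta in sorted(raw_dir.glob("*.json")):
            # Assumed naming: rendered file is <json stem>_<slug>.md at the KB root.
            pattern = glob.escape(meta.stem) + "_*.md"
            done = any(p.stat().st_size > min_bytes for p in kb.glob(pattern))
            if not done:
                pending.append(meta)
    return pending
```

The size threshold treats a near-empty file (for example, a failed render) as missing, so it gets re-rendered.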
PDF backend priority:
1. pymupdf4llm: markdown plus extracted images into $KB/.figures/.
2. markitdown: text-only fallback.
3. pdftotext -layout: last-resort fallback.

.raw/ and .figures/ should stay out of git. Append them to .gitignore if missing.
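The backend priority can be sketched like this, assuming the public entry points of each tool: pymupdf4llm.to_markdown with write_images/image_path, markitdown's MarkItDown().convert(...).text_content, and the pdftotext CLI writing to stdout via -. render.py's actual dispatch and figure-directory naming may differ; choose_backend and pdf_to_markdown are hypothetical names.

```python
from __future__ import annotations

import importlib.util
import shutil
import subprocess
from pathlib import Path


def choose_backend() -> str | None:
    """First available backend in the documented priority order."""
    if importlib.util.find_spec("pymupdf4llm"):
        return "pymupdf4llm"
    if importlib.util.find_spec("markitdown"):
        return "markitdown"
    if shutil.which("pdftotext"):
        return "pdftotext"
    return None


def pdf_to_markdown(pdf: Path, figures_dir: Path) -> str:
    backend = choose_backend()
    if backend == "pymupdf4llm":
        import pymupdf4llm
        figures_dir.mkdir(parents=True, exist_ok=True)
        # Only this backend extracts images.
        return pymupdf4llm.to_markdown(str(pdf), write_images=True, image_path=str(figures_dir))
    if backend == "markitdown":
        from markitdown import MarkItDown
        return MarkItDown().convert(str(pdf)).text_content   # text only
    if backend == "pdftotext":
        out = subprocess.run(["pdftotext", "-layout", str(pdf), "-"],
                             capture_output=True, text=True, check=True)
        return out.stdout                                     # text only
    raise RuntimeError("no PDF backend available")
```

Because the fallback is silent, logging choose_backend() once per run is a cheap way to notice that figures are being dropped.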
In single-shot mode (Step 3a), ask the user to confirm each new cite key. In bulk mode (Step 3b), the keys come from ref.bib directly — skip this step.
python3 skills/download-ref/helpers/append_bibtex.py propose \
--kb "$KB" --id 1806.08734 --type arxiv
Output JSON has proposed_key (form lastname_year_firstkeyword), title, authors, year, bibtex_with_proposed_key. Show the user via AskUserQuestion:
Once confirmed:
python3 skills/download-ref/helpers/append_bibtex.py append \
--kb "$KB" --id 1806.08734 --type arxiv \
--key rahaman_2018_spectral \
--bib "$(dirname "$KB")/ref.bib"
The helper rewrites the BibTeX cite key, refuses duplicate keys, and appends the entry after a single blank line.
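Both halves of this step (building a lastname_year_firstkeyword key, then appending an entry under that key) can be sketched in Python. Everything here is an assumption for illustration, not append_bibtex.py: the stopword list used to pick the first keyword, ASCII folding of accented names, and the regex that finds @type{key, headers. Unlike the helper, which skips a collision silently, this sketch returns False so the caller can report it.

```python
from __future__ import annotations

import re
import unicodedata
from pathlib import Path

# Hypothetical stopword list for picking the "first keyword" of a title.
STOPWORDS = {"a", "an", "the", "on", "of", "in", "for", "and", "to",
             "with", "via", "from", "by", "at", "is", "are"}
# Matches "@type{key," at the start of a BibTeX entry.
KEY_RE = re.compile(r"(@\w+\s*\{)\s*([^,\s]+)\s*,")


def _slug_word(s: str) -> str:
    # Fold accents to ASCII ("Müller" -> "Muller"), lowercase, keep [a-z0-9] only.
    s = unicodedata.normalize("NFKD", s).encode("ascii", "ignore").decode()
    return re.sub(r"[^a-z0-9]", "", s.lower())


def proposed_key(first_author: str, year: int | str, title: str) -> str:
    """Build lastname_year_firstkeyword from "First Last" or "Last, First"."""
    last = first_author.split(",")[0] if "," in first_author else first_author.split()[-1]
    words = (_slug_word(w) for w in title.split())
    keyword = next((w for w in words if w and w not in STOPWORDS), "untitled")
    return f"{_slug_word(last)}_{year}_{keyword}"


def append_entry(bib_path: Path, entry: str, key: str) -> bool:
    """Rewrite the entry's cite key and append it; return False if the key exists."""
    text = bib_path.read_text(encoding="utf-8") if bib_path.exists() else ""
    if key in {m.group(2) for m in KEY_RE.finditer(text)}:
        return False
    entry = KEY_RE.sub(lambda m: f"{m.group(1)}{key},", entry.strip(), count=1)
    body = text.rstrip()
    # Exactly one blank line between the previous entry and the new one.
    bib_path.write_text((body + "\n\n" if body else "") + entry + "\n", encoding="utf-8")
    return True
```

With these rules, proposed_key("Nasim Rahaman", 2018, "On the Spectral Bias of Neural Networks") gives rahaman_2018_spectral, the key used in the example above.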
python3 skills/download-ref/helpers/index.py \
--kb "$KB" \
--title "<project-or-advisor-slug> — references" \
--source-note "Reading list and full-text harness."
Replace <project-or-advisor-slug> with this KB's name. Once chosen, keep --title and --source-note byte-identical across runs — INDEX.md is regenerated wholesale every time; drift causes noisy diffs.
# New md files appear at top level
ls -t "$KB"/*.md | head
# Frontmatter present
for f in "$KB"/*.md; do
case "$(basename "$f")" in INDEX.md|NOTES.md) continue ;; esac
head -1 "$f" | grep -q '^---$' || echo "MISSING FRONTMATTER: $f"
done
# Raw blobs gitignored
KB_NAME=$(basename "$KB")
git -C "$(dirname "$KB")" check-ignore "$KB_NAME/.raw/" 2>/dev/null \
|| echo "WARN: $KB_NAME/.raw/ not gitignored"
# INDEX picked up the new ids
for id in 1806.08734 2006.10739; do
grep -q "$id" "$KB/INDEX.md" || echo "WARN: $id missing from INDEX.md"
done
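The same checks can be run from Python when a shell is inconvenient. This sketch covers only the frontmatter and INDEX.md checks (the gitignore check still needs git check-ignore). done_problems is a hypothetical name, and like the grep above it matches ids as plain substrings of INDEX.md.

```python
from __future__ import annotations

from pathlib import Path

SKIP = {"INDEX.md", "NOTES.md"}


def done_problems(kb: Path, ids: list[str]) -> list[str]:
    """Mirror the shell checks: frontmatter on rendered files, ids listed in INDEX.md."""
    problems = []
    for md in sorted(kb.glob("*.md")):
        if md.name in SKIP:
            continue
        with md.open(encoding="utf-8") as fh:
            first = fh.readline().rstrip("\n")
        if first != "---":                       # same test as grep -q '^---$' on line 1
            problems.append(f"MISSING FRONTMATTER: {md.name}")
    index = kb / "INDEX.md"
    text = index.read_text(encoding="utf-8") if index.exists() else ""
    for ref in ids:
        if ref not in text:
            problems.append(f"WARN: {ref} missing from INDEX.md")
    return problems
```

An empty list means both checks passed.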
Tell the user: new cite key(s), rendered file path(s), full_text yes/no per ref.
After the done checklist passes, offer the next step:
"Papers downloaded and rendered. What next?"
- (a) Write a review: invokes survey-writer to produce a technology assessment from the active KB
- (b) Ideas: continue to brainstorming with /brainstorm-ideas
- (c) Done: stop here
The natural pipeline is: /survey → /download-ref → /survey-writer.
- /survey (upstream): writes/extends $KB/NOTES.md, appends to $(dirname $KB)/ref.bib, regenerates $KB/INDEX.md, then hands off to /download-ref to fetch PDFs and render full text. The survey's transition checkpoint offers this directly.
- /survey-writer (downstream): consumes the rendered KB (full-text .md files + $(dirname $KB)/ref.bib) to produce a structured technology assessment report.
- /survey / /researchstyle: write their own .raw/ JSON via batched fetches and call append_bibtex.py directly (skipping the per-ref confirmation in Step 6). They invoke index.py at the end of their run.
- /brainstorm-ideas end-of-session: surfaces candidate IDs/DOIs from the conversation; for the user's selections, invokes /download-ref in single-shot mode.
- /incarnate: invokes /download-ref (or /researchstyle) targeting the advisor KB resolved by python3 skills/download-ref/helpers/resolve_kb.py --advisor <slug>.

| Mistake | Fix |
|---|---|
| Passing a relative --kb | Always absolute. Helpers don't cd; figures depend on absolute paths. |
| Forgetting --download-arxiv-pdfs in Step 4 | Without it, refs end up full_text: no and Step 5 has nothing to render. |
| Using arXiv:XXXX with prefix or vN suffix | Strip both; the manifest takes bare ids: 1806.08734. |
| Editing the rendered .md and losing it on re-render | Renderer overwrites without warning. Edit the .raw/ source or the renderer logic. |
| Cite-key collision with different content | Helper skips silently. Investigate, then re-run propose with a different key. |
| Drifting --title / --source-note between runs | INDEX.md regenerates wholesale; first-run values are canonical. Copy verbatim from the existing INDEX.md. |
- .raw/{arxiv,doi}/<id>.json exists for every requested id
- .raw/{arxiv,doi}/<id>.pdf exists where the source allows (else recorded as miss)
- <id>_<slug>.md per ref at $KB/ root, with frontmatter
- $KB/INDEX.md regenerated, lists each new entry
- $(dirname $KB)/ref.bib has the new cite key (no duplicate)
- full_text yes/no per ref

npx claudepluginhub quantumbfs/sci-brain --plugin sci-brain