From academic-research
Runs a full PRISMA-style systematic literature review: search, screening, coding, and export. Targets social sciences SLRs with Zotero integration.
Slash command: /academic-research:systematic-review
Glossary: unfamiliar with PRISMA, MCP, BBT, ABS, DOI, ISSN, SFX, TDM, CSL, stage tag, or FE-code? See skills/_glossary.md for one-line definitions of every acronym this skill uses.
Before any step below, verify the plugin has been configured:
python3 "${CLAUDE_PLUGIN_ROOT:-.}/scripts/setup/check_configured.py"
If the result is NOT CONFIGURED, stop immediately and tell the user:
The academic-research project has not been set up on this machine yet. Run the setup skill or setup wizard first to configure API keys (Zotero, Elsevier, WoS, Anthropic, Gemini, Semantic Scholar), MCP servers, and permission rules. Do not attempt an SLR before that.
Do not call MCP tools, run pipeline scripts, or proceed with any stage of the procedure. Running the setup skill/wizard is the required first step.
If the result is configured, proceed.
An SR project needs (a) the canonical directory scaffold, (b) four
regression-test files, and (c) pipeline-stage config templates. Run
the three setup helpers below in order. They are all idempotent —
re-running skips anything already in place. Do not use shell
mkdir -p (prompts the user, bash-only) or chained cp calls
(prompts the user for every chain) for the same work.
Create the directory scaffold:
python3 "${CLAUDE_PLUGIN_ROOT:-.}/scripts/setup/ensure_dir.py" \
scripts screening pdfs analysis analysis/results manuscript
Check what's already present:
python3 "${CLAUDE_PLUGIN_ROOT:-.}/scripts/setup/check_project_scaffold.py" \
scripts/test_common.py scripts/test_citations.py \
scripts/test_empirical_integrity.py scripts/test_systematic_review.py \
search_config.py screening_config.py \
analysis/manuscript_stats.py manuscript/manuscript_tables.py \
manuscript/manuscript.qmd
If any are missing, install them (one call, skip-if-exists for the rest):
python3 "${CLAUDE_PLUGIN_ROOT:-.}/scripts/setup/install_templates.py" \
test_common.py:scripts/test_common.py \
test_citations.py:scripts/test_citations.py \
test_empirical_integrity.py:scripts/test_empirical_integrity.py \
test_systematic_review.py:scripts/test_systematic_review.py \
search_config.py:search_config.py \
screening_config.py:screening_config.py \
manuscript_stats.py:analysis/manuscript_stats.py \
manuscript_tables.py:manuscript/manuscript_tables.py \
manuscript.qmd:manuscript/manuscript.qmd
Tell the user which files were installed and flag that the top of
each test_*.py has project-specific paths, test_empirical_integrity.py
has a FORBIDDEN_LITERALS tuple, and search_config.py /
screening_config.py / manuscript_stats.py all need customisation
before use.
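The "idempotent, skip anything already in place" behaviour of the setup helpers can be illustrated with a minimal sketch. This is not the plugin's ensure_dir.py or install_templates.py: the function names and the SRC:DEST parsing are assumptions modelled on the invocations above.

```python
"""Hypothetical sketch of idempotent scaffold helpers (not the plugin's code)."""
import shutil
from pathlib import Path


def ensure_dirs(root: Path, names: list[str]) -> list[Path]:
    """Create each directory under root; return only the ones that were new."""
    created = []
    for name in names:
        path = root / name
        if not path.is_dir():
            path.mkdir(parents=True, exist_ok=True)  # same effect as mkdir -p
            created.append(path)
    return created


def install_templates(template_dir: Path, root: Path, pairs: list[str]) -> list[Path]:
    """Copy each 'SRC:DEST' pair unless DEST already exists; return what was copied."""
    installed = []
    for pair in pairs:
        src_name, _, dest_name = pair.partition(":")
        dest = root / dest_name
        if dest.exists():
            continue  # skip-if-exists: never overwrite a customised file
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(template_dir / src_name, dest)
        installed.append(dest)
    return installed
```

Calling either function a second time returns an empty list, which is the property that makes re-running the bootstrap safe.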
If the project has no CLAUDE.md yet, suggest using
${CLAUDE_PLUGIN_ROOT:-.}/templates/sr_claude_md.md as a starting
point — but don't write it without the user's say-so. CLAUDE.md is
user-owned. To install once the user confirms:
python3 "${CLAUDE_PLUGIN_ROOT:-.}/scripts/setup/install_templates.py" \
sr_claude_md.md:CLAUDE.md
Follow the zotero-operations skill's IRON RULE. Before any Zotero work in this skill: when reading or writing the
user's library, the access hierarchy is (1) MCP mcp__zotero__*
tools → (2) scripts/pipelines/zotero_io.py and
scripts/pipelines/bbt_client.py → (3) never direct HTTP. A
direct urllib.request.urlopen("http://127.0.0.1:23119/...") or
curl localhost:23119 is a defect signal — propose adding the
missing helper to zotero_io.py rather than working around it
inline. The full rule lives in skills/zotero-operations/SKILL.md
under "IRON RULE — Zotero access goes through the plugin's surface".
The CI guard at tests/unit/test_no_direct_localhost_zotero.py
fails the build if a direct-HTTP call slips into a pipeline file
that isn't zotero_io.py or bbt_client.py.
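A guard of this kind can be as simple as a text scan over the pipeline directory. The sketch below is in the spirit of tests/unit/test_no_direct_localhost_zotero.py but is not its code; it assumes the check only needs to catch the literal 127.0.0.1:23119 / localhost:23119 strings quoted above, and the real test may be stricter.

```python
"""Hypothetical direct-localhost guard (not the plugin's actual CI test)."""
import re
from pathlib import Path

# The only files allowed to talk to the local Zotero / Better BibTeX endpoint.
ALLOWED = {"zotero_io.py", "bbt_client.py"}
# Matches 127.0.0.1:23119 or localhost:23119 anywhere in a source line.
LOCAL_ZOTERO = re.compile(r"(127\.0\.0\.1|localhost):23119")


def find_violations(pipeline_dir: Path) -> list[str]:
    """Return 'file:line' for every direct-HTTP reference outside ALLOWED."""
    hits = []
    for path in sorted(pipeline_dir.glob("*.py")):
        if path.name in ALLOWED:
            continue
        for lineno, line in enumerate(path.read_text().splitlines(), start=1):
            if LOCAL_ZOTERO.search(line):
                hits.append(f"{path.name}:{lineno}")
    return hits
```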
Zotero library selection. Run this first, right after bootstrap. Pin down which Zotero
library will hold this review's bibliography before starting the
scope conversation — the choice is independent of scope, takes a
single question to resolve, and unblocks every later step that
touches Zotero. Running it first also means the project's
CLAUDE.md carries the library reference from the outset, so a
future session opening the project sees it immediately.
The choice is stored in the project's CLAUDE.md and passed
explicitly to every pipeline script as either --group <id> (a
group library) or --user (your personal / My Library), plus
--collection <key> where supported. It is NOT set via the
ZOTERO_GROUP env var — env vars are per-shell, easily lost on a
new terminal, and invisible to future sessions that read the
project's CLAUDE.md to orient themselves.
Procedure:
List available libraries:
mcp__zotero__zotero_list_libraries()
Show the user the personal library (type user) and each group
(type group, with numeric IDs). Ask which to use. Group
libraries are the usual choice for SRs — shared with
collaborators, higher upload quota, cleaner archival than mixing
into personal — but pipeline scripts fully support My Library via
--user, so either works.
Optional: scope to a collection within the chosen library. Ask the user whether this SR's items should go into an existing collection, or a fresh one. For an existing one:
mcp__zotero__zotero_get_collections(library_id=<id>)
For a fresh collection, note the intended name — import_to_zotero.py
creates it on first use when --collection <name> is passed and
no matching key exists.
Write the choice into the project's CLAUDE.md under a
## Zotero library heading (create or extend the file as
needed). Ask the user to confirm the edit before saving. Shape
depends on group vs personal:
## Zotero library
- **Library:** group (or `user` for personal)
- **Group ID:** `<numeric id>` (omit if `Library: user`)
- **Collection key:** `<8-char Zotero key>` (omit if creating
fresh at import time)
All pipeline scripts take `--group <id>` (group library) or
`--user` (personal library) and, where supported,
`--collection <key>`. Do not set `ZOTERO_GROUP` as an env var —
the canonical record is here.
Self-check before every Zotero write: does the project's
CLAUDE.md have a ## Zotero library section with either a group ID
or Library: user? If not, STOP and run the procedure.
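The self-check and the explicit-flag rule can be combined by reading the flags straight from CLAUDE.md. This is a hypothetical helper, not a shipped script; its parsing rules assume the exact layout of the ## Zotero library template above.

```python
"""Hypothetical helper: derive pipeline flags from the '## Zotero library'
section of a project's CLAUDE.md. Parsing assumes the template layout above."""
import re
from typing import Optional


def zotero_flags(claude_md: str) -> list[str]:
    # Section body runs from the heading to the next '## ' heading or end of file.
    section = re.search(r"^## Zotero library\n(.*?)(?=^## |\Z)", claude_md, re.M | re.S)
    if section is None:
        raise ValueError("no '## Zotero library' section: STOP and run the procedure")
    body = section.group(1)

    def field(label: str) -> Optional[str]:
        # Matches '**Label:** value' or '**Label:** `value`' on a single line.
        m = re.search(rf"\*\*{label}:\*\*[ \t]*`?([^`\s]+)`?", body)
        return m.group(1) if m else None

    if field("Library") == "user":
        flags = ["--user"]
    else:
        group_id = field("Group ID")
        if group_id is None:
            raise ValueError("group library recorded without a Group ID")
        flags = ["--group", group_id]
    collection = field("Collection key")
    if collection:
        flags += ["--collection", collection]
    return flags
```

An agent would append the returned list to a pipeline invocation instead of relying on $ZOTERO_GROUP.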
Before calling ANY search tool — MCP (mcp__scopus__search_scopus,
mcp__openalex__search_*, mcp__semantic-scholar__*-search*,
mcp__paper-search*__search_*) or script
(scripts/pipelines/search*.py) — including piloting and volume
probes — the scope brief must exist at
.claude/systematic-review/scope.md AND the user must have
explicitly confirmed it in the current session ("proceed", "looks
good", "confirmed", or equivalent). Silence is not confirmation, and
"experiment with X" is not confirmation of the surrounding scope.
The gate exists because "just a pilot search" shapes the methods: keyword combinations get baked into the user's mental model, volume numbers anchor downstream inclusion calls, and reframing after a pilot is more expensive than reframing on paper. Pin down scope on paper, get explicit sign-off, then search.
Brief contents (every section required before asking for confirmation):
1. Focal construct / phenomenon scope: what is the central topic, and at what breadth? Give the breadth as a narrow / medium / broad choice with a concrete definition of each, and justify the choice. (E.g. a review on "remote work" could go narrow = post-2020 pandemic-induced remote work, medium = any scheduled telework since 2000, broad = all spatially distributed work arrangements.)
2. Population / unit of analysis / context: what units are in scope (individuals / teams / firms / ventures / SMEs / industries / countries / etc.)? Geographic / sectoral / temporal-era restrictions? If multiple units appear in the scope, name the synthesis strategy (separate strands? single framework?).
3. Research question(s): one or more focal questions the review
will answer. If the synthesis will map multiple streams of the
literature (e.g. X-as-antecedent vs X-as-outcome, or phenomenon
used as a tool vs studied as a domain vs applied as a research
method), name the streams. Flag whether streams are a narrative
device only, or whether a per-paper research_stream coding
field should extend FULLTEXT_CODING_FIELDS in
screening_config.py (this is a proposal, not a prescription:
the default template does not include one).
4. Time window: start year (inclusive), end year (inclusive), and the reason for the start year (a pivot paper, a technology event, a round number with a defence).
5. Journal set: tier list (AJG/ABS 2024 / FT50 / ABDC), which field codes within it, and whether ISSN filtering will be used (requires WoS Expanded).
6. Database access: which databases the formal search will
use. Do NOT ask the user blind. First run the probe (it reads
~/.config/academic-research/config.toml out-of-process and
emits only yes/no status, never keys):
python3 "${CLAUDE_PLUGIN_ROOT:-.}/scripts/setup/check_database_access.py"
Then present the detected set and ask which subset to use. For a formal SR, two or three databases are typical; prefer WoS Expanded + Scopus when both are available, with OpenAlex + Semantic Scholar as free fallbacks. Record both the chosen set and any available-but-excluded databases (with the user's reason).
7. Exclusion criteria: language restriction? Editorials / book reviews / proceedings? Conference papers? Predatory-listed journals?
8. Search keyword blocks: the literal term lists for each
Boolean block of the query (typically one block per scope
dimension identified in items 1–2). These lists go verbatim into
Scopus / WoS / OpenAlex queries. Present them block-by-block,
with wildcards where stemming is needed (WoS does not stem
phrases; Scopus does), and ask the user to approve each block.
This is the level of detail that actually goes into
search_config.py — do not commit without explicit approval,
because small term choices (e.g. "firm growth" alone vs
"firm growth" OR "venture growth" OR "business growth*")
drastically change recall.
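To show how approved blocks become a query string, here is a minimal sketch assuming the common convention of OR within a block and AND across blocks. The dict layout and block names are hypothetical and need not match search_config.py. TITLE-ABS-KEY(...) and TS=(...) are the usual Scopus and WoS topic-field wrappers, but check each database's syntax before a formal run.

```python
"""Minimal sketch: approved keyword blocks -> one Boolean query string."""


def block_to_clause(terms: list[str]) -> str:
    # Quote multi-word phrases; single words and wildcard stems stay bare.
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(blocks: dict[str, list[str]]) -> str:
    # Terms within a block are alternatives (OR); every block must match (AND).
    return " AND ".join(block_to_clause(terms) for terms in blocks.values())


blocks = {
    "construct": ["firm growth", "venture growth", "business growth*"],
    "population": ["SME*", "small business*"],
}
core = build_query(blocks)
scopus_query = f"TITLE-ABS-KEY({core})"
wos_query = f"TS=({core})"
```

Adding or dropping a single term in one block changes every database's query at once, which is why each block is approved separately.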
Draft the brief in conversation, ask the user to confirm, then write
.claude/systematic-review/scope.md. Create the directory first:
python3 "${CLAUDE_PLUGIN_ROOT:-.}/scripts/setup/ensure_dir.py" .claude/systematic-review
If the user changes scope mid-run, update scope.md and any
affected search_config.py together before further searches.
Self-check before every search call: has scope.md been written?
Has the user said "proceed" (or equivalent) since the brief was
finalised? If either answer is no, STOP and finish the interview.
Self-check before any Write / Edit on search_config.py: is the
keyword list in the current draft of search_config.py identical
(up to formatting) to the block-by-block keyword list the user
approved as item 8 of the scope brief? If the agent revised
keywords after a pilot or reviewer feedback, update scope.md
first, get fresh user confirmation on the revised blocks, then
write search_config.py. Never silently expand keyword coverage
between scope.md and search_config.py.
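The "identical up to formatting" comparison can be made mechanical. This sketch assumes both keyword lists are available as dicts of block name to terms, and it treats case, extra whitespace, order and duplicates as formatting; those normalisation rules are assumptions, not a documented part of the plugin.

```python
"""Sketch of the keyword-drift check between scope.md and search_config.py."""


def normalise(block: list[str]) -> frozenset[str]:
    # Case-fold and collapse internal whitespace; ignore order and duplicates.
    return frozenset(" ".join(term.lower().split()) for term in block)


def keyword_mismatches(approved: dict[str, list[str]],
                       drafted: dict[str, list[str]]) -> dict[str, tuple[set[str], set[str]]]:
    """Map block name -> (added terms, removed terms) for every block that differs."""
    out = {}
    for name in sorted(approved.keys() | drafted.keys()):
        a = normalise(approved.get(name, []))
        d = normalise(drafted.get(name, []))
        if a != d:
            out[name] = (set(d - a), set(a - d))
    return out
```

Any non-empty result means STOP: update scope.md, get fresh confirmation on the revised blocks, then write search_config.py.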
Abstract-screening protocol (before abstract_screen.py). The abstract-screening system prompt lives in screening_config.py
(ABSTRACT_SCREENING_SYSTEM_PROMPT) and is the record of what
got in and what got out at stage 1. Reviewers will read it to judge
whether the review is reproducible. PRISMA expects it to be fixed
before screening starts, not tuned while decisions accumulate.
Procedure (run the first time the agent is about to call
abstract_screen.py):
Open screening_config.py and locate every <INSERT …> / <…>
placeholder in ABSTRACT_SCREENING_SYSTEM_PROMPT. There are
typically four; the research-question placeholder is filled from
.claude/systematic-review/scope.md item 3.
Draft each replacement in conversation. Show the user the resulting prompt in full (prefer a read-back / inline code block over just "here's the diff"). Ask them to approve criterion by criterion; criteria are the review's spine.
Bump ABSTRACT_SCREENING_PROMPT_VERSION to a fresh string (e.g.
vN-YYYY-MM-DD). The version goes into every log row; reviewers
use it to distinguish a re-run under the same protocol from a
re-run under a revised protocol.
Write the revised file. Record a one-line summary of the
protocol in .claude/systematic-review/scope.md (append, don't
replace) so the scope brief and the screening config stay in
lockstep.
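A fresh version string in the vN-YYYY-MM-DD shape can be produced as below. This is a minimal sketch: the function name is hypothetical, and reading N from the previous string assumes earlier versions followed the same shape.

```python
"""Minimal sketch of a vN-YYYY-MM-DD prompt-version bump."""
import datetime as dt
import re


def bump_version(previous: str, today: dt.date) -> str:
    """Return the next 'vN-YYYY-MM-DD' version; unparseable input restarts at v1."""
    m = re.match(r"v(\d+)-", previous)
    n = int(m.group(1)) + 1 if m else 1
    return f"v{n}-{today:%Y-%m-%d}"
```

Incrementing N, rather than relying on the date alone, keeps two revisions made on the same day distinguishable in the log.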
Self-check before every abstract_screen.py call: does
ABSTRACT_SCREENING_SYSTEM_PROMPT contain any <INSERT /
<CRITERION / bare <…> placeholders? Has the user approved the
prompt in the current session? If either answer is no, STOP and run
the procedure.
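The placeholder half of the self-check is a pattern search. Which markers count as "bare <…>" is an assumption here (a literal <…> or <...>); widen the pattern if your template uses other markers, but avoid matching every angle bracket, since a prompt may use XML-style tags on purpose.

```python
"""Sketch of the placeholder self-check on a system-prompt string."""
import re

# <INSERT ...>, <CRITERION ...>, or a literal <…> / <...> left from the template.
PLACEHOLDER = re.compile(r"<(?:INSERT|CRITERION)[^>]*>|<(?:…|\.\.\.)>")


def find_placeholders(prompt: str) -> list[str]:
    """Return every unfilled placeholder, in order of appearance."""
    return PLACEHOLDER.findall(prompt)
```

The same check applies to FULLTEXT_CODING_SYSTEM_PROMPT before any fulltext_code.py call.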
Revision during screening. If the user wants to tighten a
criterion after seeing real decisions: bump
ABSTRACT_SCREENING_PROMPT_VERSION, have the user re-approve, and
re-run with --rerun so the new version replaces prior decisions
on the affected items. The append-only log preserves the original
decisions under the old version for audit.
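Because the log is append-only, a re-run under a new version adds rows rather than editing old ones. Reading it last-row-wins, as the screening_report.py summary does, can be sketched as follows; the column names are hypothetical, and Zotero tags, not this view, remain the current decision.

```python
"""Sketch of a last-row-wins view over an append-only screening log."""
import csv
import io
from collections import Counter


def latest_rows(log_text: str) -> dict[str, dict[str, str]]:
    latest = {}
    for row in csv.DictReader(io.StringIO(log_text)):
        latest[row["item_key"]] = row  # a later row for the same item wins
    return latest


def rescreened(log_text: str) -> list[str]:
    """Items with more than one logged decision, e.g. after --rerun."""
    counts = Counter(row["item_key"] for row in csv.DictReader(io.StringIO(log_text)))
    return sorted(key for key, n in counts.items() if n > 1)
```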
Full-text coding schema (before fulltext_code.py). The full-text coding schema — FULLTEXT_CODING_FIELDS plus
FULLTEXT_CODING_SYSTEM_PROMPT in screening_config.py — is the
record of what data the review extracts from each included paper.
Fields added or reworded after coding starts create inconsistent
columns in coded_papers.csv; PRISMA expects the schema to be
fixed before coding starts.
Procedure (run the first time the agent is about to call
fulltext_code.py):
Draft the coding schema. The template defaults are
key_findings, sample, method; these are safe starters for
nearly any SR. Propose additions based on the scope brief's
research questions — common add-ons for social-sciences SRs
include theories_and_references, direction_of_relationship,
moderators_boundary_conditions, causal_inference_strength,
future_research, and (if the scope brief named streams) a
research_stream enum field. 5–15 fields total is typical;
each field needs a name, a description written for an LLM
reader, and ideally an example.
Fill in FULLTEXT_CODING_SYSTEM_PROMPT placeholders: research
question (same as stage 1), stage-2 criteria (what the full
text must show that the abstract could not), and exclusion
codes FE1…FE5.
Show the user the full schema (every field) and the prompt. Ask them to approve field-by-field — adding a field mid-run costs a re-code of every already-coded paper.
Bump FULLTEXT_CODING_PROMPT_VERSION to a fresh string.
Write the revised file. Append a one-line summary to
.claude/systematic-review/scope.md.
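A quick pre-approval check on the schema can surface the usual problems before the user sees it. The list-of-dicts layout below is a hypothetical shape for FULLTEXT_CODING_FIELDS (check the shipped screening_config.py for the real structure), and the example values are illustrative only.

```python
"""Hypothetical coding-schema layout plus a pre-approval sanity check."""

FIELDS = [
    {"name": "key_findings", "description": "Main findings relevant to the review's research question, one sentence each.",
     "example": "X is positively associated with Y in the full sample."},
    {"name": "sample", "description": "Unit of analysis, sample size, country and period.",
     "example": "412 manufacturing SMEs, Germany, 2010-2018."},
    {"name": "method", "description": "Research design and analytic approach.",
     "example": "Panel regression with firm fixed effects."},
    {"name": "theories_and_references", "description": "Theories the paper invokes, with the references it cites for them.",
     "example": "Resource-based view (Barney, 1991)."},
    {"name": "future_research", "description": "Future-research directions the authors state explicitly.",
     "example": "Test the effect in service industries."},
]


def schema_notes(fields: list[dict]) -> list[str]:
    """Return problems to raise with the user before asking for approval."""
    notes = []
    if not 5 <= len(fields) <= 15:
        notes.append(f"{len(fields)} fields; 5-15 is typical")
    names = [f.get("name", "") for f in fields]
    for name in sorted({n for n in names if names.count(n) > 1}):
        notes.append(f"duplicate field name: {name}")
    for f in fields:
        if not f.get("description"):
            notes.append(f"{f.get('name', '?')}: missing description")
        elif "example" not in f:
            notes.append(f"{f['name']}: no example (optional but recommended)")
    return notes
```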
Self-check before every fulltext_code.py call: STOP and run the
procedure if FULLTEXT_CODING_SYSTEM_PROMPT still contains any <INSERT /
unpopulated placeholders, if FULLTEXT_CODING_FIELDS still holds only
the template's three starter entries unmodified, or if the user has not
approved the schema in the current session. If only 1–3 fields were revised and prior adjudicator edits on other fields must be preserved, use --update-fields rather than a full re-code (see Revision during coding below).
Revision during coding. Two revision paths exist — choose based on scope:
Add or revise specific fields (--update-fields) — preferred for
additive schema changes (new field) or guideline rewrites for 1–3 fields.
This mode selects items already tagged fulltext:include, calls the LLM for
all fields (using the updated config), then merges only the named fields into
the existing SLR Coding note without touching any other field values or the
screening decision. Adjudicator edits to the targeted fields are
overwritten (warn the user); adjudicator edits to all other fields are
preserved. Bump FULLTEXT_CODING_PROMPT_VERSION before invoking so the log
records which config version produced the update.
uv run ${CLAUDE_PLUGIN_ROOT:-.}/scripts/pipelines/fulltext_code.py \
--group <id> --collection <key> --config ./screening_config.py \
--pdf-dir ./pdfs --update-fields method,theories_and_references
Combine with --only-keys K1,K2,... to limit to a specific subset.
Full schema overhaul (--full-recode) — for major version changes
where every field needs a fresh extraction under the new prompt. This
removes all fulltext:* tags, backs up the CSV log, and re-codes from
scratch. SLR Coding notes for items that re-include are overwritten unconditionally; adjudicator edits to those notes are lost. Notes on items that re-exclude are left untouched. Treat as a v1 → v2 bump and ask the user to confirm they accept
the re-coding cost. Bump FULLTEXT_CODING_PROMPT_VERSION before invoking.
Field reordering in FULLTEXT_CODING_FIELDS is free (it only affects
column order in the export CSV and note rendering). Field renaming needs
a data migration: the old name stays in existing notes' JSON payloads; use
--update-fields <new_name> as a one-pass migration that populates the new
field name, then rename it in config and regenerate.
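The --update-fields merge described above can be sketched as a dict operation, assuming the SLR Coding note stores its fields as a flat JSON object. The function name is hypothetical; this illustrates the documented semantics, not the script's code.

```python
"""Sketch of --update-fields merge semantics on a flat note payload."""


def merge_fields(existing: dict, fresh: dict, update: list[str]) -> dict:
    """Overwrite only the named fields with freshly coded values."""
    unknown = [name for name in update if name not in fresh]
    if unknown:
        raise KeyError(f"not produced by the current config: {unknown}")
    merged = dict(existing)          # untargeted fields keep adjudicator edits
    for name in update:
        merged[name] = fresh[name]   # targeted fields: adjudicator edits are overwritten
    return merged
```

Merging a field that does not yet exist in the note mirrors the rename migration: the new key is populated while the old key stays in the payload.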
Before writing ANY of the following, STOP:
A Python heredoc in Bash (python3 <<'EOF' ... EOF or
python3 - <<'PY' ... PY).
python -c "..." for anything beyond a single-line
probe (the four shipped probes under scripts/setup/check_*.py
and ensure_dir.py cover all the legitimate single-line cases).
If the task is pipeline-shaped (enumerate Zotero items, summarise a screening CSV, mutate tags, fetch abstracts, filter a search CSV, compute counts), one of two things must be true:
A shipped script covers it. Invoke that script with explicit flags. The Pipeline-scripts table below is the canonical list.
No shipped script covers it. Tell the user:
There is no shipped script for <task>. I can either (a) add a new script under
scripts/pipelines/ (recommended: keeps the work auditable and reusable across sessions), or (b) use the Zotero / OpenAlex / etc. MCP tools directly for this one task. Which do you prefer?
Wait for their answer. Do not write the heredoc.
Why this rule is hard:
Permission rules match script paths, not heredoc contents (an allow rule such as
Bash(python3 ${CLAUDE_PLUGIN_ROOT:-.}/scripts/**) matches paths,
not heredocs), so every heredoc triggers a permission prompt.
Common gaps that surface as "I'll just write a quick script":
| Task | Right move |
|---|---|
| Filter / trim a search CSV (top-N by year, year range) | uv run ${CLAUDE_PLUGIN_ROOT:-.}/scripts/pipelines/filter_search_results.py --input <csv> --output <csv> [--year-min Y] [--year-max Y] [--top-n N] |
| Summarise screening decisions across passes (last-row-wins, decision counts, list re-screened items) | uv run ${CLAUDE_PLUGIN_ROOT:-.}/scripts/pipelines/screening_report.py <log.csv> [--list <decision>] [--list-rescreened] |
| Inspect tag state for one item | Use mcp__zotero__zotero_get_item_metadata directly — single MCP call, no Python needed. |
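| Read a CSV row count | wc -l <path> (already a one-liner; it counts lines, so subtract 1 for the header, and quoted multi-line fields such as abstracts inflate it). |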
Self-check before any Bash heredoc / inline python -c
exceeding one line: is this task in the Pipeline-scripts table or
covered by a scripts/setup/ helper? If the answer is "no, but I
could write a quick one", STOP and propose adding the shipped
script instead.
Every systematic review runs through the same stages:
search → import to Zotero → fetch abstracts → attach PDFs →
abstract screening → full-text screening/coding → QA with evaluator agents →
human adjudication → export results → test suite → manuscript
Principles:
See the zotero-operations skill for lower-level Zotero patterns.
Zotero is the ground truth for decisions. Every decision is also logged to screening/*.csv for provenance
and debugging (who decided what, when, with which model and prompt
version). But "what is the current decision on item X?" is answered
by Zotero, not the CSV. Adjudicator flips happen in Zotero directly;
re-runs read Zotero tags to decide what to skip.
Scripts tag each item at decision time (abstract:include / abstract:exclude /
abstract:borderline for abstract screening; fulltext:include /
fulltext:exclude for full-text coding), and skip items already
tagged. The CSV log is written in parallel for provenance but is not
consulted for resume decisions.
Scripts use flush=True on every print and emit [N/total] counters;
invoke them via | tee to a log file. Never pipe to /dev/null.
Narrow a run with --filter-keys-file <path> for enrichment / audit / export scripts,
or --only-keys <k1,k2,…> for screening scripts. Either way, the next
stage drives from the previous stage's Zotero tag state (queried via
MCP or pyzotero); the file / CLI filter is a way to narrow further.
Environment variables:
| Variable | Used by | Purpose |
|---|---|---|
| ZOTERO_API_KEY | All scripts | Zotero API authentication (required) |
| ANTHROPIC_API_KEY | Screening scripts | Claude API (required for LLM screening) |
| ELSEVIER_API_KEY | enrich_pdfs.py, enrich_abstracts.py | Elsevier/ScienceDirect full-text retrieval |
| SCOPUS_API_KEY | Search scripts | Scopus API (often the same as ELSEVIER_API_KEY; some institutions issue a separate key) |
| WILEY_TDM_TOKEN | enrich_pdfs.py --sources wiley | Wiley TDM UUID token |
| OPENALEX_API_KEY | enrich_pdfs.py, enrich_abstracts.py | OpenAlex Content API ($0.01/download, paid) |
| SEMANTIC_SCHOLAR_API_KEY | enrich_abstracts.py | Semantic Scholar (higher rate limit with key) |
| CROSSREF_MAILTO | All scripts | Crossref polite pool (any email) |
| WOS_API_KEY_EXTENDED | Search scripts | WoS Expanded (full Boolean, IS= works); prefer this |
| WOS_API_KEY | Search scripts | WoS Starter (field-limited, no IS=); piloting only |
The /setup skill writes these to ~/.config/academic-research/config.toml
(mode 0600) on first run. Environment variables take precedence over the
file.
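The precedence rule can be sketched as a two-step lookup. The flat KEY = "value" layout assumed for config.toml is a guess about the file's shape, and treating an empty environment variable as unset is likewise an assumption.

```python
"""Sketch of 'environment variables take precedence over the file'."""
import os
from collections.abc import Mapping
from pathlib import Path
from typing import Optional

CONFIG_PATH = Path.home() / ".config" / "academic-research" / "config.toml"


def load_file_values(path: Path = CONFIG_PATH) -> dict:
    """Parse the config file, or return {} if it does not exist."""
    if not path.exists():
        return {}
    import tomllib  # standard library from Python 3.11
    with path.open("rb") as fh:
        return tomllib.load(fh)


def resolve(name: str, file_values: Mapping[str, str],
            environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Non-empty environment value first, then the config file, else None."""
    value = environ.get(name)
    if value:
        return value
    return file_values.get(name)
```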
Project-level Zotero selection (group ID, collection key) is not
an env var — it lives in the project's CLAUDE.md per the Zotero
library selection section above, and is passed to every pipeline
script as --group <id> (and --collection <key> where supported).
Scripts fall back to $ZOTERO_GROUP only as a convenience for
command-line invocations; skill agents pass the flag explicitly.
Zotero is the ground truth for screening decisions, coding fields, and adjudication outcomes (see the Core architecture principles above). This section is the canonical catalogue of every tag and child note the pipeline reads or writes. Scripts and skills reference these conventions; the table below is the single source of truth.
Stage tags tell you where each item is in the pipeline. Mutually exclusive within
each stage — an item has at most one abstract:* tag and at most one
fulltext:* tag at any given time. Scripts apply these at decision
time via the Zotero API and remove prior stage tags on flip.
| Tag | Applied by | Meaning |
|---|---|---|
| abstract:include | abstract_screen.py | Passes title-abstract screening; proceeds to full-text |
| abstract:exclude | abstract_screen.py | Excluded at title-abstract stage |
| abstract:borderline | abstract_screen.py | Kept for full-text review (missing abstract, or LLM uncertain) |
| fulltext:include | fulltext_code.py | Passes full-text screening; has SLR Coding child note |
| fulltext:exclude | fulltext_code.py | Excluded at full-text stage |
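The "at most one tag per stage" rule and the tag-driven resume can be sketched on a plain set of tag strings. Function names are hypothetical; real tag writes go through the MCP tools or zotero_io.py per the IRON RULE, never direct HTTP.

```python
"""Sketch of stage-tag bookkeeping on a plain set of tag strings."""

STAGE_VALUES = {"abstract": {"include", "exclude", "borderline"},
                "fulltext": {"include", "exclude"}}


def apply_stage_tag(tags: set[str], new_tag: str) -> set[str]:
    """Add new_tag and drop any other tag from the same stage."""
    stage, _, value = new_tag.partition(":")
    if value not in STAGE_VALUES.get(stage, set()):
        raise ValueError(f"not a stage tag: {new_tag}")
    return {t for t in tags if not t.startswith(stage + ":")} | {new_tag}


def needs_fulltext_coding(tags: set[str]) -> bool:
    """Pending for full-text coding: passed stage 1 and has no fulltext:* tag yet."""
    passed = bool(tags & {"abstract:include", "abstract:borderline"})
    return passed and not any(t.startswith("fulltext:") for t in tags)
```

Non-stage tags such as predatory:flag survive a flip untouched.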
Warning tags are set outside the main screening loop: by preflight checks (predatory), post-screening quality audits (retraction), or PDF enrichment (TDM recovery). All are warnings the adjudicator sees in Zotero, not automatic exclusions.
| Tag | Applied by | Meaning |
|---|---|---|
| predatory:flag | Preflight journal check against Beall's list (import_to_zotero.py) | Warning, not exclusion. Author decides during full-text review whether to keep each flagged paper. |
| retracted:flag | Post-coding retraction check via mcp__zotero__scite_check_retractions (see Retraction check in Key methodological rules) | Warning, not exclusion. Cited paper has been retracted per Scite's retraction-watch data. Adjudicator decides whether to keep (with a discussion note), replace the citation, or drop the paper. |
| pdf:tdm-recovered | enrich_pdfs.py, when Elsevier's TDM API returns only a 1-page preview and the fetcher falls back to the XML endpoint | Warning, not exclusion. The attached "PDF" is text reconstructed from XML, not the publisher's native PDF, so it may be less complete or lose figures/tables. audit_zotero_library.py lists these under tdm_recovered; review before/during full-text coding. |
QA tags are applied during the post-screening QA evaluator pass and the human adjudication loop (see Post-screening QA below).
| Tag | Applied by | Meaning | Removed when |
|---|---|---|---|
| qa-flag | Main agent after any evaluator flags an item | Sentinel for filtering in Zotero | After human adjudication (replaced by qa-adjudicated-*) |
| qa-hard | Main agent from a HARD evaluator flag | Clear violation of a named inclusion / exclusion criterion | After adjudication |
| qa-soft-include | Main agent from an inclusion-validator SOFT flag | Borderline inclusion | After adjudication |
| qa-soft-exclude | Main agent from an exclusion-validator SOFT flag | Borderline exclusion | After adjudication |
| qa-wrong-code | Main agent from an exclusion-validator WRONG_CODE flag | Exclusion stands but the code is wrong | After the exclusion code is corrected on the item |
| qa-adjudicated-include | Human after reviewing flag | Final decision: INCLUDE | Never (permanent adjudication record) |
| qa-adjudicated-exclude | Human after reviewing flag | Final decision: EXCLUDE | Never |
If the human adjudicator flips an automated decision, the Zotero tag set is updated atomically:
Remove the current fulltext:* tag → add the opposite one.
Remove the qa-* severity tag → add the matching qa-adjudicated-*.
Log the flip to screening/fulltext_screening.csv for
provenance (who flipped, when, why). The CSV is run-history; the
tag is the current state.
Child notes:
| Note title | Attached to | Written by | Purpose |
|---|---|---|---|
| SLR Coding | Every item with fulltext:include | fulltext_code.py after each coding decision | Structured coding fields (constructs, method, findings; see screening_config.py:FULLTEXT_CODING_FIELDS). The adjudicator reads this note directly in Zotero; the CSV row is parallel provenance. Overwritten on --full-recode; selectively updated on --update-fields. |
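The adjudication flip can be sketched as computing the complete new tag set before a single write, which is one way to keep the update atomic. The function name is hypothetical, and the CSV provenance row is left out.

```python
"""Sketch of the adjudication flip on a plain set of tags."""

# Severity tags cleared once a human has decided. qa-wrong-code is not in this
# set: it is cleared only when the exclusion code itself is corrected.
QA_SEVERITY = {"qa-flag", "qa-hard", "qa-soft-include", "qa-soft-exclude"}


def adjudicate(tags: set[str], decision: str) -> set[str]:
    if decision not in {"include", "exclude"}:
        raise ValueError(f"decision must be include or exclude, got {decision!r}")
    kept = {t for t in tags if not t.startswith("fulltext:") and t not in QA_SEVERITY}
    return kept | {f"fulltext:{decision}", f"qa-adjudicated-{decision}"}
```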
An SLR Coding note is created on first code, overwritten on
re-code (via --full-recode), and never deleted automatically.
If the adjudicator edits a field inline in Zotero, the edit is
authoritative: subsequent fulltext_code.py runs skip that item
unless --full-recode is passed, or --update-fields names the edited field.
The --only-keys / --rerun / --full-recode flags are the escape
hatches for re-processing specific items.
fulltext_code.py processes items tagged abstract:include OR
abstract:borderline.
export_coded_includes.py reads items tagged fulltext:include
(adjudication flips propagate automatically because tags are
authoritative).
For counts of fulltext:include items, manuscript_stats.py queries
Zotero, not the CSV.
All scripts live under ${CLAUDE_PLUGIN_ROOT:-.}/scripts/pipelines/. Invoke
with uv run; first-run uv installs declared deps into an ephemeral
venv automatically. Invocations below show the most common form; run
each script with --help to see the full flag surface (every script
has additional options for re-processing, parallelism, caching, and
single-item debugging).
| Stage | Script | Invocation |
|---|---|---|
| Multi-database formal search | search.py | uv run ${CLAUDE_PLUGIN_ROOT:-.}/scripts/pipelines/search.py --config ./search_config.py [--databases scopus,wos,openalex,semantic_scholar] |
| Single-database piloting (Scopus) | search_scopus.py | uv run ${CLAUDE_PLUGIN_ROOT:-.}/scripts/pipelines/search_scopus.py --config ./search_config.py |
| Single-database piloting (Web of Science) | search_wos.py | uv run ${CLAUDE_PLUGIN_ROOT:-.}/scripts/pipelines/search_wos.py --config ./search_config.py |
| Single-database piloting (OpenAlex, free) | search_openalex.py | uv run ${CLAUDE_PLUGIN_ROOT:-.}/scripts/pipelines/search_openalex.py --config ./search_config.py |
| Single-database piloting (Semantic Scholar) | search_semantic_scholar.py | uv run ${CLAUDE_PLUGIN_ROOT:-.}/scripts/pipelines/search_semantic_scholar.py --config ./search_config.py |
| Filter / trim a search CSV (top-N by year, year range) | filter_search_results.py | uv run ${CLAUDE_PLUGIN_ROOT:-.}/scripts/pipelines/filter_search_results.py --input <csv> --output <csv> [--year-min Y] [--year-max Y] [--top-n N] |
| Import deduplicated search CSV into Zotero | import_to_zotero.py | uv run ${CLAUDE_PLUGIN_ROOT:-.}/scripts/pipelines/import_to_zotero.py --group <id> --input <search.csv> [--collection <key>] |
| Abstract screening (Claude Haiku on title+abstract) | abstract_screen.py | uv run ${CLAUDE_PLUGIN_ROOT:-.}/scripts/pipelines/abstract_screen.py --group <id> --collection <key> --config ./screening_config.py |
| Full-text screening + structured coding (Claude Sonnet) | fulltext_code.py | uv run ${CLAUDE_PLUGIN_ROOT:-.}/scripts/pipelines/fulltext_code.py --group <id> --collection <key> --config ./screening_config.py --pdf-dir ./pdfs |
| Update specific coding fields on already-coded items | fulltext_code.py --update-fields | uv run ${CLAUDE_PLUGIN_ROOT:-.}/scripts/pipelines/fulltext_code.py --group <id> --collection <key> --config ./screening_config.py --pdf-dir ./pdfs --update-fields FIELD1,FIELD2 |
| Summarise screening / coding decisions across passes | screening_report.py | uv run ${CLAUDE_PLUGIN_ROOT:-.}/scripts/pipelines/screening_report.py <log.csv> [--list <decision>] [--list-rescreened] |
| Fetch missing abstracts (multi-source cascade) | enrich_abstracts.py | uv run ${CLAUDE_PLUGIN_ROOT:-.}/scripts/pipelines/enrich_abstracts.py --filter-keys-file <keys> |
| Attach missing PDFs (multi-source cascade) | enrich_pdfs.py | uv run ${CLAUDE_PLUGIN_ROOT:-.}/scripts/pipelines/enrich_pdfs.py --filter-keys-file <keys> |
| Attach Wiley PDFs only (TDM token) | enrich_pdfs.py --sources wiley | uv run ${CLAUDE_PLUGIN_ROOT:-.}/scripts/pipelines/enrich_pdfs.py --sources wiley --filter-keys-file <keys> |
| Attach Cloudflare-gated PDFs (Sage, APA, T&F, Emerald, …) | enrich_pdfs.py --sources browser | uv run ${CLAUDE_PLUGIN_ROOT:-.}/scripts/pipelines/enrich_pdfs.py --sources browser --filter-keys-file <keys> |
| Audit library (missing abstracts / PDFs / stubs) | audit_zotero_library.py | uv run ${CLAUDE_PLUGIN_ROOT:-.}/scripts/pipelines/audit_zotero_library.py --group <id> |
| Export includes-only coded view | export_coded_includes.py | uv run ${CLAUDE_PLUGIN_ROOT:-.}/scripts/pipelines/export_coded_includes.py --log-csv <screening.csv> --out <coded.csv> |
| Generate references.bib from manuscript keys | generate_bib.py | uv run ${CLAUDE_PLUGIN_ROOT:-.}/scripts/pipelines/generate_bib.py <project_dir> |
Additional templates shipped with the plugin:
${CLAUDE_PLUGIN_ROOT:-.}/templates/search_config.py: journal
list, query definitions, year window. Read by search.py and
search_openalex.py.
${CLAUDE_PLUGIN_ROOT:-.}/templates/screening_config.py: system
prompts for abstract screening and full-text coding, plus the
FULLTEXT_CODING_FIELDS list that drives the coding schema.
Regression-test templates (copy each, along with test_common.py, into your
project's scripts/ directory). One file per skill so failures map
back cleanly to the rule-book the regression violated:
${CLAUDE_PLUGIN_ROOT:-.}/templates/test_systematic_review.py:
this skill's 11 pipeline invariants (PRISMA arithmetic,
search_run.json integrity, decision-state whitelists,
temperature=0, screening_config round-trip, ghost handling).
${CLAUDE_PLUGIN_ROOT:-.}/templates/test_citations.py: @citekey
resolution, bare Author (YYYY) detection, BBT-key uniqueness.
Owned by the grounded-citations / fact-check skills.
${CLAUDE_PLUGIN_ROOT:-.}/templates/test_empirical_integrity.py:
forbidden-literal grep, label uniqueness, inline s['…'] key
resolution against the live build_stats() dict, figure-file
existence, manuscript_stats.json ↔ build_stats() content
check. Owned by the empirical-integrity skill.
${CLAUDE_PLUGIN_ROOT:-.}/templates/test_common.py: shared
TestRunner infra the three test files import.
${CLAUDE_PLUGIN_ROOT:-.}/templates/manuscript_stats.py:
flat-dict builder that reads every pipeline output and returns keys
like screen.n_included, search.unique_dois, etc. for inline
lookup in the manuscript. Copy into the project's
analysis/manuscript_stats.py; extend as the manuscript needs new
facts. Output: analysis/results/manuscript_stats.json (written by
the script's CLI mode; never hand-edited).
${CLAUDE_PLUGIN_ROOT:-.}/templates/manuscript_tables.py:
pandas-based table functions (methods, regions, exclusion reasons,
construct families) for Quarto code chunks. Keeps prose readable.
Copy into the project's manuscript/manuscript_tables.py so the
.qmd can run from manuscript_tables import ....
${CLAUDE_PLUGIN_ROOT:-.}/templates/manuscript.qmd: Quarto
scaffold with setup chunk importing build_stats(), placeholder
sections, and example inline expressions showing every methodology
number wired to s['key'] rather than hand-typed.
A project CLAUDE.md template for new SLR projects lives at
${CLAUDE_PLUGIN_ROOT:-.}/templates/sr_claude_md.md. A
manuscript-only variant (no SLR-pipeline scaffolding, for research-report
editing projects) lives at
${CLAUDE_PLUGIN_ROOT:-.}/templates/manuscript_claude_md.md.
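The flat-dict idea behind manuscript_stats.py can be illustrated with a short sketch. It is not the shipped template: flatten, build_stats_sketch and write_stats are hypothetical names, and each pipeline output is assumed to be already loaded as a nested dict. The point is that nested provenance ends up under dotted keys the manuscript can look up inline.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any


def flatten(prefix: str, value: Any, out: dict[str, Any]) -> None:
    """Recursively flatten nested dicts into dotted keys such as 'screen.n_included'."""
    if isinstance(value, dict):
        for key, sub in value.items():
            flatten(f"{prefix}.{key}" if prefix else str(key), sub, out)
    else:
        out[prefix] = value


def build_stats_sketch(sections: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical stand-in for build_stats(): one nested dict per pipeline output."""
    stats: dict[str, Any] = {}
    flatten("", sections, stats)
    return stats


def write_stats(stats: dict[str, Any], path: Path) -> None:
    """CLI-mode output: the JSON file is generated, never hand-edited."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(stats, indent=2, sort_keys=True) + "\n", encoding="utf-8")
```

A .qmd inline expression would then read s['screen.n_included'] rather than a typed number.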
Pilot before the formal run. Before committing to the formal search
parameters, probe each candidate database with a handful of keyword
combinations to surface volume estimates and construct-coverage gaps.
Per the Scripted searches only principle above, MCP tools are
acceptable for piloting (they are fast and session-scoped), and are
the only way to probe Scopus / OpenAlex / Semantic Scholar without
first spinning up the full scripted-search machinery. The formal run
then uses the scripted searchers under scripts/pipelines/.
Source preference ordering. Which databases to include depends on what the user's institution provides. Degrade gracefully rather than blocking on a missing subscription:
| Preference | Source | Access | Notes |
|---|---|---|---|
| 1 (preferred) | Web of Science Expanded | Script only, via WOS_API_KEY_EXTENDED. No MCP. | Strongest field coverage for social-sciences SR. Use WOS_API_KEY_EXTENDED, not WOS_API_KEY — Starter's IS= ISSN filter returns 0 results and blocks journal-list filtering. |
| 2 | Scopus | Script + MCP (mcp__scopus__*). Requires ELSEVIER_API_KEY or SCOPUS_API_KEY. | Strong alternative when WoS is unavailable. Journal coverage overlaps heavily with WoS, but dedup patterns differ. |
| 3 | OpenAlex | Script + MCP (mcp__openalex__*). Free, no subscription. | Open-access baseline; always usable. Weaker field-precision for niche social-sciences topics, but improves year over year. |
| 4 | Semantic Scholar | Script + MCP (mcp__semantic-scholar__*). Free tier available. | Good for recent work and preprints; complementary to the above. |
A formal SR typically combines two or three sources from this list
— the exact mix depends on access. A user without WoS or Scopus can
still run a defensible SR using OpenAlex + Semantic Scholar, provided
the coverage gaps are disclosed in the methods section (pulled from
search_metadata.json via the stats dictionary; never typed in prose).
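Combining sources means deduplicating by DOI before screening. A minimal sketch, assuming each record is a dict with "doi" and "source" fields; SOURCE_RANK, normalize_doi and dedup_by_doi are hypothetical names, not the pipeline's dedup code. DOIs are case-insensitive, so they are lower-cased and stripped of URL or doi: prefixes before comparison, and the record from the higher-preference source wins.

```python
from __future__ import annotations

import re

# Prefixes commonly seen in front of DOIs in exported records (extend as needed).
_DOI_PREFIX = re.compile(r"^(https?://(dx\.)?doi\.org/|doi:\s*)", re.IGNORECASE)

# Assumed preference order, mirroring the source table (lower = preferred).
SOURCE_RANK = {"wos": 1, "scopus": 2, "openalex": 3, "semantic_scholar": 4}


def normalize_doi(raw: str | None) -> str | None:
    """Lower-case a DOI and strip URL / 'doi:' prefixes; None if empty."""
    if not raw:
        return None
    doi = _DOI_PREFIX.sub("", raw.strip()).strip().lower()
    return doi or None


def dedup_by_doi(records: list[dict]) -> list[dict]:
    """Keep one record per DOI, preferring the higher-ranked source.

    Records without a DOI are passed through so nothing is silently dropped.
    """
    best: dict[str, dict] = {}
    no_doi: list[dict] = []
    for rec in records:
        doi = normalize_doi(rec.get("doi"))
        if doi is None:
            no_doi.append(rec)
            continue
        rank = SOURCE_RANK.get(rec.get("source", ""), 99)
        current = best.get(doi)
        if current is None or rank < SOURCE_RANK.get(current.get("source", ""), 99):
            best[doi] = {**rec, "doi": doi}
    return list(best.values()) + no_doi
```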
Technical tips for search design:

- TS="growth aspiration" misses the plural "aspirations". Always write TS=("growth aspir*" OR ...).

Abstract enrichment tips:

- Cascade in order: Crossref → Semantic Scholar (DOI) → Semantic Scholar (title) → Scopus → ScienceDirect → OpenAlex GROBID.
- Do not rely on OpenAlex's abstract_inverted_index. It is often reconstructed from GROBID full-text parsing and returns body-text fragments, not abstracts. See https://bmkramer.github.io/SesameOpenScience_site/thought/202411_open_abstracts/.
- The <abstract> element from OpenAlex's GROBID output is the acceptable last-resort OpenAlex source; still verify length > 60 chars and sense-check.

PDF retrieval is a four-phase cascade that enrich_pdfs.py runs automatically. Each phase handles a class of item the previous phase can't; nothing is ever silently dropped.
Phase 1 — API cascade (enrich_pdfs.py default mode). Works for
most open-access and publisher-TDM-enabled items:
publisher TDM API (Elsevier, Wiley) → Crossref TDM → PMC
→ OpenAlex Content → Unpaywall → OpenAlex OA metadata
Elsevier and Wiley TDM require ELSEVIER_API_KEY and
WILEY_TDM_TOKEN. OpenAlex Content is paid ($0.01 per download, gated
on OPENALEX_API_KEY).
Phase 2 — browser cascade for Cloudflare-gated publishers
(enrich_pdfs.py --sources browser). HTTP clients cannot solve the
Cloudflare JS challenge, so for Sage, OUP, Taylor & Francis, Emerald,
and similar CF-gated publishers, a Playwright-driven Chromium opens
visibly. The user passes the Cloudflare challenge once per publisher;
the authenticated session then captures subsequent downloads
automatically. First-time use needs a one-time browser install:
uvx playwright install chromium (the setup wizard pre-approves
this command). If the browser cascade regresses, file an issue and
attach the run log (--log-csv) so the failure can be reproduced.
Run the browser cascade in your own terminal — not via the Bash tool. The Playwright window opens visibly and prompts you for Cloudflare / SSO confirmation. The agent's Bash subprocess has no controlling TTY, so the script detects this on startup and exits with a paste-in command rather than silently hanging on the first prompt. For unattended runs (cron, agent loops) pass
--no-prompt; the script then auto-skips publishers that would prompt and records them in the run log.
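The no-TTY guard described above might look roughly like this. decide_prompt_mode and main are hypothetical; only the --no-prompt flag comes from the text, and the printed command uses a placeholder script path. The check relies on sys.stdin.isatty(), which is False in a subprocess without a controlling terminal.

```python
from __future__ import annotations

import shlex
import sys


def decide_prompt_mode(argv: list[str], stdin_is_tty: bool) -> str:
    """Return 'no-prompt', 'interactive', or 'refuse' for a browser-cascade run."""
    if "--no-prompt" in argv:
        return "no-prompt"  # unattended: skip publishers that would prompt
    if stdin_is_tty:
        return "interactive"
    return "refuse"  # no terminal to answer Cloudflare / SSO prompts


def main(argv: list[str]) -> int:
    mode = decide_prompt_mode(argv, sys.stdin.isatty())
    if mode == "refuse":
        # Print a paste-in command instead of hanging on the first prompt.
        # "enrich_pdfs.py" is a placeholder for the real script path.
        cmd = " ".join(shlex.quote(a) for a in ["uv", "run", "enrich_pdfs.py", *argv])
        print(f"No TTY detected. Run this in your own terminal:\n  {cmd}", file=sys.stderr)
        return 2
    print(f"Running browser cascade in {mode} mode")
    return 0
```

Keeping the decision in a pure function makes it testable without a terminal.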
Phase 3 — Zotero Connector + institutional SFX/OpenURL
(enrich_pdfs.py with Connector handlers). For items the browser
cascade can't reach directly — typically paywalled content accessed via
library proxy — the script launches Zotero Desktop's Connector
extension and routes requests through the institution's SFX/OpenURL
resolver (scripts/pipelines/fetchers/library_resolver.py). Requires:
Zotero Desktop running locally, Zotero Connector installed in the
Chromium profile, and the institution's OpenURL base URL configured.
Phase 4 — graceful failure. Items that all three phases fail on
are logged with a status code (connector_zotero_unavailable,
connector_save_failed, connector_sw_timeout,
connector_extension_missing, and others defined in enrich_pdfs.py)
so the user can surface the residual list for manual retrieval.
Never silently drop items — a paper with no attached PDF after all
phases is a data-quality signal, not a failure to hide.
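The four phases share one control-flow pattern: try each source in order, accept the first response that looks like a PDF, and otherwise return the item with its last status instead of dropping it. A sketch under those assumptions; FetchError, Outcome and run_cascade are hypothetical names, and real fetchers would raise statuses such as connector_save_failed. The %PDF check here is only the first half of the validation; a parse test should follow before caching.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


class FetchError(Exception):
    """Raised by a fetcher with a machine-readable status code."""

    def __init__(self, status: str) -> None:
        super().__init__(status)
        self.status = status


@dataclass
class Outcome:
    item_key: str
    pdf: Optional[bytes]
    phase: Optional[str]  # phase that produced the PDF, if any
    status: str           # "ok", or the last failure status for the residual list


def looks_like_pdf(data: bytes) -> bool:
    """First-line check only: real PDFs start with %PDF (a parse test should follow)."""
    return data.startswith(b"%PDF")


Fetcher = Callable[[str], bytes]


def run_cascade(item_key: str, phases: list[tuple[str, Fetcher]]) -> Outcome:
    """Try each phase in order; return the first valid PDF or a logged residual."""
    last_status = "no_phases_configured"
    for name, fetch in phases:
        try:
            data = fetch(item_key)
        except FetchError as err:
            last_status = err.status
            continue
        if looks_like_pdf(data):
            return Outcome(item_key, data, name, "ok")
        last_status = f"{name}_not_pdf"  # e.g. an HTML page served with HTTP 200
    # Never drop the item: it comes back with its last status for manual retrieval.
    return Outcome(item_key, None, None, last_status)
```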
Cross-cutting tips (apply at every phase):

- Check the %PDF magic bytes and parse-test the PDF before caching. Some downloaders save HTML-with-200 or corrupted PDFs.
- For Playwright, use a persistent user_data_dir with plugins.always_open_pdf_externally=true in Preferences; otherwise PDFs open inline and neither expect_download nor expect_response captures the bytes.

Screening tips:

- Pin "temperature": 0 in screening scripts.
- Parallelize with ThreadPoolExecutor + threading.Lock on the CSV log. Default 8 workers for Haiku / Gemini Flash, 5 for Sonnet / Gemini Pro.
- Parse model output with llm_helpers.extract_json_from_response(), which walks for the first balanced {...}. Errored rows write decision=error with the truncated response in reason; --rerun retries only those.

Before screening, query a predatory-journal list (Beall's archive at
https://beallslist.net/ or equivalent) for each journal ISSN. Papers
from listed journals get a predatory:flag tag in Zotero. This is a
warning, not an exclusion — the author decides during full-text
review whether to keep each flagged paper. Transparent flagging
(not silent removal) is the rule.
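A sketch of the flagging step, assuming the predatory list has already been loaded as a set of ISSN strings (lists such as Beall's are often organised by publisher or title, so that conversion is itself an assumption). normalize_issn and predatory_tags are hypothetical names. ISSNs are normalised and their mod-11 check digit verified before lookup, so formatting differences don't hide a match; the result is a tag, never an exclusion.

```python
from __future__ import annotations

import re


def normalize_issn(raw: str) -> str | None:
    """Return an ISSN as 'NNNN-NNNC', or None if malformed or the check digit fails."""
    chars = raw.strip().upper().replace("-", "")
    if not re.fullmatch(r"[0-9]{7}[0-9X]", chars):
        return None
    # ISSN check digit: weights 8..2 over the first seven digits, mod 11.
    total = sum(int(d) * w for d, w in zip(chars[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    if chars[7] != expected:
        return None
    return f"{chars[:4]}-{chars[4:]}"


def predatory_tags(issns: list[str], predatory_list: set[str]) -> list[str]:
    """Return ['predatory:flag'] if any valid ISSN is listed. Flag, never exclude."""
    listed = {n for n in (normalize_issn(i) for i in predatory_list) if n}
    for issn in issns:
        norm = normalize_issn(issn)
        if norm and norm in listed:
            return ["predatory:flag"]
    return []
```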
PRISMA quality assessment should catch retracted papers in the
included set — citing a retracted paper is a fact-check failure mode.
Run this check after full-text coding is complete and before
exporting coded_papers.csv, so retractions don't slip into the
manuscript's bibliography.
The mechanism (mcp__zotero__scite_check_retractions, the
retracted:flag tag convention, "flag — don't silently drop")
lives in zotero-operations — see its Optional: retraction check
section. The SR-specific twist is scope and timing:
- Scope: run the check only on items tagged fulltext:include, so it covers the papers that matter for the synthesis, not the full library.
- Timing: run it after full-text coding and before export_coded_includes.py; the adjudicator decides whether to keep retracted items (with a prominent discussion note), replace the citation, or exclude.

After every automated full-text screening / coding run (and after every re-run following prompt changes), launch three parallel evaluator agents, then run a human adjudication loop on whatever they flag. Abstract screening is typically not re-QAed (its errors surface at Stage 2 anyway), but the pattern works identically if you want to.

Launch them in a single message with multiple Agent tool calls so they run in parallel (≈max-of-three latency instead of sum-of-three). Every evaluator flags items; no evaluator ever re-decides.

- False-positive evaluator: receives every item decided include, with the automated reason and the key coding fields. Its prompt asks it to flag false positives, i.e. papers that slipped through despite failing one of the inclusion criteria. Each flag is marked HARD (clearly fails a named criterion) or SOFT (borderline, defensible). It returns a bulleted list, one entry per flagged item, with item_key, severity, and a one-sentence reason.
- WRONG_CODE flags: the exclusion stands but the exclusion code is wrong (e.g. E3 recorded when the real reason is E1).

The 20 % threshold for coding-quality spot-checks is the plugin's default. Smaller corpora (< 40 includes) warrant 100 % review; larger corpora (> 200 includes) can drop to 10 % with a quality audit built in.
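The sampling rule above can be made reproducible with a fixed seed, so a re-run QA pass reviews the same items. A sketch; spot_check_percent and spot_check_sample are hypothetical names, and applying the 10 % relaxation automatically above 200 includes is an assumption (the text says coverage "can" drop).

```python
from __future__ import annotations

import random


def spot_check_percent(n_includes: int) -> int:
    """Share of includes to re-check, per the default thresholds."""
    if n_includes < 40:
        return 100
    if n_includes > 200:
        return 10  # optional relaxation; keep 20 if you prefer
    return 20


def spot_check_sample(item_keys: list[str], seed: int = 0) -> list[str]:
    """Reproducible sample of include keys for coding-quality review."""
    pct = spot_check_percent(len(item_keys))
    k = (len(item_keys) * pct + 99) // 100  # ceiling division, integers only
    rng = random.Random(seed)  # fixed seed: the same corpus gives the same sample
    return sorted(rng.sample(sorted(item_keys), k))
```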
Evaluators run as Agent calls in the main session — they cannot
write to Zotero themselves. The main agent takes each flag the
evaluators return and applies the appropriate qa-* tag via
mcp__zotero__zotero_update_item (with an add_tags parameter) or
mcp__zotero__zotero_batch_update_tags for the bulk case.
See Zotero tag and note conventions above for the full tag
vocabulary — qa-flag, qa-hard, qa-soft-include,
qa-soft-exclude, qa-wrong-code, and the two post-adjudication
qa-adjudicated-* tags.
Required: each evaluator emits a decisions.json alongside the
markdown report. The markdown is for human review; the JSON is the
machine-actionable input that apply_qa_adjudications.py consumes
once the human has reviewed and (optionally) overridden the
evaluator's calls. Schema:
[
{
"item_key": "ABCD0001",
"verdict": "include" | "exclude" | "borderline",
"reason": "(optional) free-text rationale, written to apply log",
"flip_fulltext": false
}
]
flip_fulltext is true only when the adjudication overrides
the screener's fulltext:* tag — leave false for confirmations of
the original decision. The evaluator subagent prompt must instruct
the model to emit one JSON entry per flagged item.
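A pre-flight check on decisions.json catches malformed evaluator output before any tag is written. This is a sketch, not part of apply_qa_adjudications.py: validate_decisions is a hypothetical name, treating a missing flip_fulltext as false is an assumption, and rejecting duplicate item_keys follows from the one-entry-per-flagged-item rule.

```python
from __future__ import annotations

import json

ALLOWED_VERDICTS = {"include", "exclude", "borderline"}


def validate_decisions(text: str) -> list[str]:
    """Return a list of problems in a decisions.json payload (empty list = OK)."""
    try:
        entries = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"not valid JSON: {err}"]
    if not isinstance(entries, list):
        return ["top level must be a JSON array"]
    problems: list[str] = []
    seen: set[str] = set()
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict):
            problems.append(f"entry {i}: not an object")
            continue
        key = entry.get("item_key")
        if not isinstance(key, str) or not key:
            problems.append(f"entry {i}: missing item_key")
        elif key in seen:
            problems.append(f"entry {i}: duplicate item_key {key}")
        else:
            seen.add(key)
        if entry.get("verdict") not in ALLOWED_VERDICTS:
            problems.append(f"entry {i}: verdict must be one of {sorted(ALLOWED_VERDICTS)}")
        if "reason" in entry and not isinstance(entry["reason"], str):
            problems.append(f"entry {i}: reason must be a string")
        if not isinstance(entry.get("flip_fulltext", False), bool):
            problems.append(f"entry {i}: flip_fulltext must be true or false")
    return problems
```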
The human opens Zotero, filters the collection by qa-flag, and for each flagged item:

- Reviews the item and its SLR Coding child note.
- Removes the severity tag (qa-hard / qa-soft-* / qa-wrong-code) and qa-flag; adds qa-adjudicated-include or qa-adjudicated-exclude.
- If the decision flips, removes the current fulltext:* tag and adds the opposite one. Tags are the authoritative state: a flip that doesn't update the fulltext:* tag leaves Zotero inconsistent with the adjudication.
- If only the exclusion code was wrong, corrects it in the SLR Coding child note and removes qa-wrong-code.
- Records the flip in screening/fulltext_screening.csv (who flipped, when, why). The CSV is run-history; the Zotero tag is the current state. Downstream scripts (export_coded_includes.py, manuscript_stats.py, test_systematic_review.py) read from Zotero, not from the CSV.
- Adds an entry to screening/qa_review.md recording the decision (format below).

Bulk-applying many adjudications: when the human-curated decision list is non-trivial (more than ~5 items), use the shipped pipeline script rather than per-item MCP tag calls:
uv run "${CLAUDE_PLUGIN_ROOT:-.}/scripts/pipelines/apply_qa_adjudications.py" \
--group <id> --decisions .claude/qa/decisions.json
The script consumes decisions.json (the schema in Applying QA
tags above), routes through zotero_io.batch_update_tags —
constructing the full PATCH payload with the right version field
per item — and writes an audit log to output/qa_adjudications_log.csv.
It replaces an earlier ad-hoc adjudication script that hit a silent-write pyzotero footgun: calling add_tags() with a stub item dict drops the write.
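The fix for that footgun is to start from the full item, including its version, rather than a stub. A sketch of building the per-item update, assuming the item dict has the Zotero Web API's shape (top-level key and version, tags under data as {"tag": ...} objects); build_tag_patch is hypothetical and is not zotero_io's API. Sending the fetched version back lets the Zotero API detect a conflicting edit made since the item was read.

```python
from __future__ import annotations


def build_tag_patch(item: dict, add: list[str], remove: list[str]) -> dict:
    """Build a partial-update object (key, version, tags) for one Zotero item.

    `item` must be the full item as fetched, not a stub: the version it
    carries is what makes the write safe.
    """
    existing = item["data"].get("tags", [])
    remove_set = set(remove)
    # Keep every tag not being removed, preserving extra fields such as "type".
    kept = [t for t in existing if t.get("tag") not in remove_set]
    present = {t.get("tag") for t in kept}
    for tag in add:
        if tag not in present:
            kept.append({"tag": tag})
            present.add(tag)
    return {"key": item["key"], "version": item["version"], "tags": kept}
```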
screening/qa_review.md structure

A single markdown file in the project's screening/ directory with two sections.
Scope clarifications. Protocol-level decisions the adjudicator made while working through flags. These apply going forward and propagate back into the screening prompt version for any future re-run. Format:
- <one-line rule> — <paragraph rationale>. (YYYY-MM-DD)
Example: "Cross-country GEM studies at country-year level are in scope. Rationale: the GEM cluster is a coherent strand; fragmenting it weakens synthesis."
Adjudication log. One line per flagged item, in processing order. Format:
{item_key}{short citation} — {kept DECISION / flipped to DECISION [EXCLUSION_CODE]} — <one-to-two-sentence rationale>. (YYYY-MM-DD)
Group related flips onto one line when the rationale is identical (e.g. "10 GEM studies — all kept INCLUDE — see scope clarification 1"). Individual contentious flips get their own line.
This file is the methods-section evidence for the manuscript's QA paragraph. Without it, the adjudication is not reproducible.
You are about to silently drop a qa-flagged item, removing the flag without recording a disposition in the adjudication log. Never.
Every flagged item gets one line in qa_review.md, even if the
decision is "kept without change". Silent drops break the
reproducibility invariant that makes the QA step worth the effort.
These rules supplement the empirical-integrity skill with SR-specific
patterns:
- Export pipeline constants to search_metadata.json. Never import scripts (side effects); parse with re.search(r'CONSTANT\s*=\s*"([^"]+)"', source). Keywords, year bounds, model names, prompt versions all live in the metadata file. analysis/manuscript_stats.py then ingests search_metadata.json and exposes each field under s['search.*'] or s['provenance.*'] in the manuscript's stats dictionary; the manuscript never reads search_metadata.json directly.
- Never hand-type database names, model names (claude-haiku, claude-sonnet), keyword strings, or year bounds in the manuscript. These must use inline expressions from the stats dictionary (s['search.databases'], s['provenance.fulltext.model'], …).
- PRISMA arithmetic must close: include + borderline + exclude = total screened; coded include + exclude = total coded. Catches missing items or pipeline drops.
- search_run.json records the canonical count of unique DOIs from the scripted search. Post-import invariant: Zotero DOIs == search DOIs. Abort if extras exist (items added outside the pipeline).

See empirical-integrity for the overall approach and file layout.
SR-specific invariants live in
${CLAUDE_PLUGIN_ROOT:-.}/templates/test_systematic_review.py (copy into
the project's scripts/). The file ships 14 active tests:
| Test | What it catches |
|---|---|
| Pipeline artefacts exist and non-empty | Pipeline didn't run |
| search_run.json marker matches dedup CSV | Stale or missing integrity gatekeeper |
| search_metadata.json has required fields | Export bug |
| No duplicate DOIs in dedup CSV | Dedup gap |
| Abstract log uses allowed decision states | Pipeline emitted an unexpected abstract-stage decision |
| Fulltext log final decisions | Non-final (error) decision left at the end of the fulltext log |
| PRISMA arithmetic | Screening funnel inconsistency |
| Coded count == fulltext includes | Export/coding drift |
| temperature=0 pinned in Claude calls | Reproducibility regression |
| screening_config constants match logs | Config changed without a re-run |
| No decision=error left in fulltext log | Unresolved screening errors |
| No ghost keys (fulltext log ⊆ Zotero) | Items removed or renamed outside the pipeline |
| Fulltext tags consistent with CSV log | Zotero tag state diverges from CSV decisions — tag write-back failed, or an out-of-band CSV edit wasn't mirrored in Zotero |
| Every fulltext:include item has an SLR Coding note | Include-tag set without a coded note — export script has nothing to read for that paper |
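Two of the invariants in the table reduce to simple arithmetic and set comparisons. A sketch, assuming a flat stats dict whose key names (abstract.include, coded.total, and so on) are hypothetical and whose totals come from an independent count such as the deduplicated search; prisma_problems and doi_drift are illustrative names, not the shipped tests.

```python
from __future__ import annotations


def prisma_problems(s: dict[str, int]) -> list[str]:
    """Funnel arithmetic over a flat stats dict; returns problems (empty = OK)."""
    problems: list[str] = []
    screened = s["abstract.include"] + s["abstract.borderline"] + s["abstract.exclude"]
    if screened != s["abstract.total"]:
        problems.append(f"abstract: decisions sum to {screened}, but {s['abstract.total']} were screened")
    coded = s["coded.include"] + s["coded.exclude"]
    if coded != s["coded.total"]:
        problems.append(f"coding: decisions sum to {coded}, but {s['coded.total']} were coded")
    return problems


def doi_drift(search_dois: set[str], zotero_dois: set[str]) -> tuple[set[str], set[str]]:
    """Return (extras in Zotero, DOIs missing from Zotero); any extras mean abort."""
    return zotero_dois - search_dois, search_dois - zotero_dois
```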
BBT-key uniqueness and coded_papers.csv → references.bib resolution
live in test_citations.py (citation concerns). Manuscript-prose
invariants — forbidden literals, label uniqueness, inline s['…']
resolution, figure-file existence — live in
test_empirical_integrity.py. Zotero-collection dedup checks are run
via mcp__zotero__zotero_find_duplicates, not as a static test.
Grow the suite with the pipeline. When you find a new SR-pipeline
regression a static check could catch — a new metadata field that
should round-trip, a new PRISMA edge case, a new Zotero-drift pattern —
add the test to scripts/test_systematic_review.py before closing
out the task. The failure becomes the sentinel so the same class of
mistake can't silently return across runs.
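For example, a new round-trip sentinel can reuse the parse-don't-import approach from the SR rules above. A sketch; PROMPT_VERSION, the prompt_version metadata field, read_constant and check_prompt_version are hypothetical names. The ^ anchor with re.MULTILINE stops OLD_PROMPT_VERSION from matching PROMPT_VERSION, a small refinement on the bare re.search pattern.

```python
from __future__ import annotations

import re


def read_constant(source: str, name: str) -> str | None:
    """Read NAME = "value" from Python source text without importing it."""
    match = re.search(rf'^{re.escape(name)}\s*=\s*"([^"]+)"', source, re.MULTILINE)
    return match.group(1) if match else None


def check_prompt_version(config_source: str, metadata: dict) -> None:
    """Sentinel: the prompt version in screening_config must match the metadata."""
    in_config = read_constant(config_source, "PROMPT_VERSION")
    assert in_config is not None, "PROMPT_VERSION not found in screening_config source"
    assert in_config == metadata.get("prompt_version"), (
        f"screening_config has {in_config!r} but metadata records "
        f"{metadata.get('prompt_version')!r}; re-run screening or re-export metadata"
    )
```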
This skill targets social-sciences systematic reviews (management, entrepreneurship, IS, organizational behavior). Medical / clinical SLR instruments — evidence hierarchies (I–VII), RoB 2, ROBINS-I, PRISMA-P preregistration — are out of scope for v0.1. A medical-SLR variant would need those plugged in; forcing them into social-science reviews is domain-inappropriate.
- Never use OpenAlex's abstract_inverted_index as an abstract; the last-resort OpenAlex source is the GROBID <abstract> element.
- Never cache a downloaded PDF without checking the %PDF magic bytes.
- NEVER read ~/.config/academic-research/config.toml via cat, head, tail, grep, less, more, awk, sed, a Python script, or any other command. It holds API keys. Pipeline scripts read it via Python's open() outside your tool layer; you have no legitimate reason to inspect it. If debugging feels like it needs a look inside the file, ask the user to re-run /setup; that's the reset path.
- If a script under scripts/pipelines/ covers the task, invoke it. If none does, tell the user which task is missing and propose adding a shipped script; do not write a one-off. Improvised scripts leak keys through your context and sidestep pre-approved permissions.