From smp-kat-tools
Fetch medical references (PubMed, PMC, EuropePMC, RACGP/ACRRM/AMC exam reports, AJGP, AIHW, ClinicalKey, UpToDate, AMH, eTG, Murtagh, plus 8 generic publishers via USyd OpenAthens) using a unified Python framework. Use when the user provides a DOI, PMID, PMC ID, RACGP exam cycle, AU clinical reference, or asks to fetch any academic / examiner-feedback / clinical-guideline content. Returns standardised AcquisitionResult with tier-tagged content. Single auth gateway (USyd OpenAthens) prompts fresh per run; graceful OA fallback on no-subscription.
Slash command: `/smp-kat-tools:medical-acquisition`
Fleet-wide unified scraper for medical references. One contract, 21 backends, single import surface. The framework lives at `~/claude-library/acquisition-client/`. Compatibility shim at `~/projects/personal/scrape-med-references/src/acquisition/` keeps existing absolute-path imports working.
Activate when the request involves any of:
Do NOT activate for: general SERP search (use Tavily / Brightdata search), screenshot-driven extraction, or any non-medical content.
~/tools/bin/acquisition list-backends
~/tools/bin/acquisition fetch "10.31128/AJGP-04-23-6803"
~/tools/bin/acquisition fetch "PMID:38825755"
~/tools/bin/acquisition fetch '{"cycle":"2024.2","exam_type":"akt"}' --backend racgp_exam
~/tools/bin/acquisition fetch-batch specs.json --max-concurrent 5
The shim is invocable from any working directory and any Python environment.
from acquisition import fetch, fetch_batch, list_backends
res = fetch("https://pubmed.ncbi.nlm.nih.gov/12345678/")
print(res.tier, res.source_type, res.success)
res = fetch({"pmid": "12345678"}, backend_name="pubmed")
results = fetch_batch(
    [
        "https://pubmed.ncbi.nlm.nih.gov/12345678/",
        {"spec": "https://racgp.org.au/ajgp/...", "backend": "ajgp"},
    ],
    max_concurrent=5,
)
If installed via `pip install -e ~/claude-library/acquisition-client`, no path manipulation is needed. Otherwise inject `~/claude-library/acquisition-client/src` into `sys.path` first.
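For the non-pip case, the path injection could look like the sketch below. The directory is the one named above; the helper name `ensure_on_path` is hypothetical and not part of the framework.

```python
import sys
from pathlib import Path

# Source directory named in the text; expanduser() resolves the leading "~".
SRC_DIR = Path("~/claude-library/acquisition-client/src").expanduser()


def ensure_on_path(src_dir: Path = SRC_DIR) -> None:
    """Prepend src_dir to sys.path once (hypothetical helper, not framework API)."""
    entry = str(src_dir)
    if entry not in sys.path:
        sys.path.insert(0, entry)


ensure_on_path()
# After this call, `from acquisition import fetch` should resolve, provided
# the package really lives under SRC_DIR on this machine.
```

Calling the helper more than once is harmless because it checks for the entry before inserting.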
Open access, no auth:
- pubmed (P2, abstract via E-utilities)
- pmc (P2, full text from PMC OA subset)
- europepmc (P2, full text mirror with broader coverage)
- unpaywall (P2, OA PDF locator by DOI)
- doi (P2, dispatcher that walks PMC then Unpaywall then EuropePMC then Wayback)
- wayback (P3, Internet Archive fallback)

Australian clinical, no auth:
- racgp_exam (P1, examiner reports KFP / AKT / OSCE / RCE by cycle)
- acrrm_exam (P1, ACRRM Rural Generalist examiner feedback)
- ajgp (P1, Australian Journal of General Practice articles)
- aihw (P1, Institute of Health and Welfare reports)
- choosing_wisely (P1, Choosing Wisely Australia recommendations)

Authed clinical, USyd-brokered, cache-first:
- clinicalkey (P0, Talley Examination Med 10e and other ClinicalKey content)
- uptodate (P0, topic cards via Camoufox crawler)
- amh (P0, Australian Medicines Handbook drug monographs)
- etg (P0, Therapeutic Guidelines topics)
- murtagh (P0, Murtagh's General Practice 9e chapters)

USyd-brokered generic publishers (paywall fallback to OA chain):
- wiley, elsevier, springer, nature, taylor_francis, bmj, cambridge_oxford, institutional_proxy

Use list_backends() for the live JSON catalogue including requires_auth, default_tier, rate_limit_per_minute, and description per backend.
Every backend returns an AcquisitionResult:
| field | type | notes |
|---|---|---|
| url | str | canonical URL the content came from |
| retrieval_date | str | YYYY-MM-DD UTC |
| source_type | str | e.g. pubmed_abstract, ajgp_article, etg_topic |
| tier | str | P0 / P1 / P2 / P3 |
| content_md | str | Markdown rendering |
| content_html | Optional[str] | raw HTML when available |
| metadata | dict | backend-specific (authors, year, DOI, ...) |
| success | bool | False on any failure |
| error | Optional[str] | error message when success is False |
| fallback_used | Optional[str] | name of fallback path taken, if any |
Failures inside a backend are caught at the framework level and folded into a result with success=False. Batch callers never need to wrap individual calls.
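To make the contract concrete, here is a sketch of how exceptions could be folded into a failed result. The dataclass only mirrors the field table above and is not the framework's own class; `safe_fetch`, `broken_backend`, the defaults, and the error-string format are assumptions for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Optional


@dataclass
class AcquisitionResult:
    """Illustrative mirror of the field table; defaults are assumptions."""
    url: str
    retrieval_date: str
    source_type: str
    tier: str
    content_md: str = ""
    content_html: Optional[str] = None
    metadata: dict = field(default_factory=dict)
    success: bool = True
    error: Optional[str] = None
    fallback_used: Optional[str] = None


def safe_fetch(
    backend: Callable[[str], AcquisitionResult],
    spec: str,
    source_type: str,
    tier: str,
) -> AcquisitionResult:
    """Run a backend and fold any exception into a success=False result."""
    try:
        return backend(spec)
    except Exception as exc:
        return AcquisitionResult(
            url=spec,
            retrieval_date=datetime.now(timezone.utc).strftime("%Y-%m-%d"),
            source_type=source_type,
            tier=tier,
            success=False,
            error=f"{type(exc).__name__}: {exc}",
        )


def broken_backend(spec: str) -> AcquisitionResult:
    """Hypothetical backend that always fails."""
    raise ConnectionError("network down")


res = safe_fetch(broken_backend, "PMID:12345678", "pubmed_abstract", "P2")
print(res.success, res.error)  # False ConnectionError: network down
```

With a wrapper of this shape, a batch caller can filter on `success` rather than catching exceptions per item.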
The owner's only institutional auth route is USyd OpenAthens. No direct subscriptions to Wiley, Elsevier, Springer, Nature, Taylor and Francis, BMJ, Cambridge, Oxford, ClinicalKey, UpToDate, AMH, eTG, or Murtagh exist. Every paywalled backend routes through *.usyd.idm.oclc.org via acquisition.auth.usyd_openathens.
Behaviour:
- The first get_session() call in a Python process invokes prompt_oauth("usyd_openathens", ...). The user pastes cookies, a verbatim Cookie header, and an optional User-Agent.
- Later calls in the same process reuse the cached requests.Session without re-prompting. The session is never persisted to disk across runs.
- When USyd has no subscription, or the publisher serves a paywall, the backend returns success=False, error="usyd_no_subscription_or_paywall". The DOI dispatcher catches this and falls through to Unpaywall, then Europe PMC, then Wayback.
- proxy_url("https://onlinelibrary.wiley.com/doi/10.1002/x") returns "https://onlinelibrary-wiley-com.usyd.idm.oclc.org/doi/10.1002/x". Dots in the publisher hostname become hyphens; usyd.idm.oclc.org is appended.
- acquisition.auth.usyd_openathens.reset_session() clears the cached session and forces a fresh prompt on the next get_session() call. Use when an OpenAthens session expires mid-batch.

from acquisition.auth.usyd_openathens import (
    get_session, proxy_url, is_paywall_interstitial, reset_session,
)
direct_url = "https://onlinelibrary.wiley.com/doi/10.1002/x"
session = get_session()
proxied = proxy_url(direct_url)
resp = session.get(proxied, timeout=60)
if resp.status_code in (401, 402, 403) or is_paywall_interstitial(resp.text):
    pass  # fall through to OA chain
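The hostname rewrite that `proxy_url` performs can be sketched from the single example given above. The function below is an illustration named `proxy_url_sketch` to keep it distinct from the real helper. It assumes the rule is exactly "dots become hyphens, then the proxy suffix is appended", and it ignores ports and credentials in the URL.

```python
from urllib.parse import urlsplit, urlunsplit

PROXY_SUFFIX = "usyd.idm.oclc.org"


def proxy_url_sketch(direct_url: str) -> str:
    """Rewrite a publisher URL onto the proxy host (illustrative only)."""
    parts = urlsplit(direct_url)
    host = parts.hostname or ""
    # Dots in the publisher hostname become hyphens; the suffix is appended.
    proxied_host = host.replace(".", "-") + "." + PROXY_SUFFIX
    return urlunsplit(
        (parts.scheme, proxied_host, parts.path, parts.query, parts.fragment)
    )


print(proxy_url_sketch("https://onlinelibrary.wiley.com/doi/10.1002/x"))
# https://onlinelibrary-wiley-com.usyd.idm.oclc.org/doi/10.1002/x
```

Path, query string, and fragment pass through unchanged; only the host is rewritten.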
ClinicalKey, UpToDate, AMH, eTG, and Murtagh keep their cache-first behaviour against local SQLite databases (talley-exam-med-10e.db, uptodate.db, amh-online.db, etg-complete.db, murtagh-gp-9e.db). On cache miss the backend calls get_session() to confirm the USyd session is live, then returns a structured fail-loud message pointing the caller at the dedicated CDP / Camoufox crawler script. The framework never silently replays cookies and never invents content.
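The cache-first, fail-loud pattern can be sketched with the standard `sqlite3` module. Everything specific here is hypothetical: the `topics` table and its columns, the `cache_first_lookup` helper, the dict-shaped result, the error text, and the placeholder rows. The real databases' schemas are not described in this document.

```python
import sqlite3
from typing import Callable


def cache_first_lookup(
    conn: sqlite3.Connection, key: str, confirm_session: Callable[[], None]
) -> dict:
    """Serve from the local cache; on a miss, check the session and fail loudly."""
    row = conn.execute(
        "SELECT content_md FROM topics WHERE key = ?", (key,)
    ).fetchone()
    if row is not None:
        return {"success": True, "content_md": row[0], "error": None}
    # Cache miss: confirm the session is live, then report instead of
    # inventing content or fetching silently.
    confirm_session()
    return {
        "success": False,
        "content_md": "",
        "error": f"cache_miss: run the dedicated crawler for {key!r}",
    }


# In-memory database with a hypothetical schema and one placeholder row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE topics (key TEXT PRIMARY KEY, content_md TEXT)")
conn.execute("INSERT INTO topics VALUES (?, ?)", ("asthma", "# Asthma"))

calls: list[str] = []
hit = cache_first_lookup(conn, "asthma", lambda: calls.append("checked"))
miss = cache_first_lookup(conn, "gout", lambda: calls.append("checked"))
print(hit["success"], miss["success"], calls)  # True False ['checked']
```

The session check runs only on the miss path, so a cache hit never triggers an auth prompt in this sketch.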
Use this skill when:
AcquisitionResult shape for downstream pipelines
Use Brightdata / Firecrawl / Tavily / Exa MCP tools instead when:
fetch_batch uses asyncio plus a Semaphore to cap simultaneous fetches. Each backend's synchronous fetch() is wrapped in asyncio.to_thread so a single blocking source does not stall the loop. Per-source rate limits are the backend's own responsibility; the framework caps wall-clock concurrency only.
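The concurrency model described above can be sketched as follows, assuming Python 3.9+ for `asyncio.to_thread`. `fetch_batch_sketch` and `slow_fetch` are hypothetical stand-ins, not the framework's functions, and the sketch returns plain strings rather than AcquisitionResult objects.

```python
import asyncio
import time
from typing import Callable


def fetch_batch_sketch(
    specs: list[str], fetch_one: Callable[[str], str], max_concurrent: int = 5
) -> list[str]:
    """Run a blocking fetch_one over specs with at most max_concurrent in flight."""

    async def run_all() -> list[str]:
        sem = asyncio.Semaphore(max_concurrent)

        async def guarded(spec: str) -> str:
            async with sem:
                # to_thread keeps the blocking call off the event loop.
                return await asyncio.to_thread(fetch_one, spec)

        # gather preserves input order regardless of completion order.
        return await asyncio.gather(*(guarded(s) for s in specs))

    return asyncio.run(run_all())


def slow_fetch(spec: str) -> str:
    """Hypothetical blocking backend call."""
    time.sleep(0.05)
    return f"fetched:{spec}"


results = fetch_batch_sketch(["a", "b", "c"], slow_fetch, max_concurrent=2)
print(results)  # ['fetched:a', 'fetched:b', 'fetched:c']
```

The semaphore bounds only how many fetches run at once; it does nothing to pace requests against a single source, which matches the note that rate limiting is each backend's responsibility.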
- ~/claude-library/acquisition-client/
- ~/tools/bin/acquisition
- ~/projects/personal/scrape-med-references/src/acquisition/
- ~/.claude/rules/medical-acquisition.md
- ~/claude-library/claude-md-fragments/medical-acquisition.md
npx claudepluginhub anon2023-halmoni/claude-fleet-marketplace --plugin smp-kat-tools