From thinking-frameworks-skills
Scores candidate papers against a keyword watchlist and relevance criteria, returning KEEP/DROP/REVIEW with a 0-100 score and rationale. Domain-neutral, for literature-scan workflows after fetching papers from bioRxiv, medRxiv, or PubMed.
Slash command
/thinking-frameworks-skills:paper-relevance-filter
Decide whether a fetched paper belongs in this week's digest. Output is a per-paper decision (KEEP / DROP / REVIEW), a 0-100 score, and a one-line rationale that the user can audit.
- [ ] Step 1: Load relevance criteria + the watchlist + last-4-weeks kept-paper IDs
- [ ] Step 2: Score each paper on three axes (match, criteria, novelty)
- [ ] Step 3: Combine to a 0-100 score; map to KEEP / REVIEW / DROP via thresholds
- [ ] Step 4: Apply tie-breakers (cap output at requested max kept count)
- [ ] Step 5: Return decisions + a calibration summary
Step 1 — Inputs
The caller hands the skill:
- papers: list of normalized paper records (output of fetch-preprint-recent or fetch-pubmed-recent)
- watchlist: list of keywords/phrases (with optional weights; default weight 1.0)
- criteria: text from relevance-criteria.md describing what fits and what doesn't
- prior_ids: set of id values that appeared as KEEP in any of the last 4 digests (used for novelty)
- max_kept: target ceiling, e.g. 25 (the digest will not exceed this)

Step 2 — Three-axis scoring
For each paper, compute three sub-scores in [0, 1]:
Axis 1 — Match strength (0-1)
If keywords carry weights (some matter more than others), use the max weight among matched keywords as a multiplier capped at 1.0.
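The weighting rule can be sketched as follows. This is only an illustration under two assumptions: the unweighted match score is computed elsewhere and passed in as `base_match` (a hypothetical parameter name), and "capped at 1.0" is read as a cap on the multiplier itself. If the intent is instead to cap the product, the `min` would move to wrap the return value.

```python
def apply_keyword_weight(base_match: float, matched_weights: list[float]) -> float:
    """Scale a base match score by the largest weight among matched keywords.

    base_match: unweighted match strength in [0, 1] (assumed computed upstream).
    matched_weights: weights of the watchlist keywords that actually matched.
    """
    if not matched_weights:
        # Assumption: no matched keyword means no match signal at all.
        return 0.0
    # Multiplier is the max matched weight, capped at 1.0.
    multiplier = min(max(matched_weights), 1.0)
    return base_match * multiplier
```

Under this reading, weights below 1.0 demote a match while weights at or above 1.0 leave the base score unchanged.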
Axis 2 — Criteria fit (0-1)
This is the qualitative axis. Read the abstract against the relevance-criteria document. The criteria typically state:
Score:
If the criteria document is silent on a paper's territory, default to 0.7 and flag for REVIEW.
Axis 3 — Novelty (0-1)
- 1.0: id is not in prior_ids and the title doesn't fuzzy-match any prior title
- 0.5: title fuzzy-matches a prior title but the id is not in prior_ids (e.g., same DOI prefix 10.1101/... matched, this is a journal version). KEEP-worthy, but tag as "journal version of preprint covered YYYY-WW"
- 0.0: id match in prior_ids (already covered)

Use normalized title (lowercase, strip punctuation, collapse whitespace) for fuzzy matching. A Levenshtein ratio > 0.9 against any prior title counts as a match.
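The title normalization and fuzzy match might look like the sketch below. "Levenshtein ratio" has more than one definition in practice; this version assumes 1 minus the edit distance divided by the length of the longer string. The function names are hypothetical.

```python
import re


def normalize_title(title: str) -> str:
    """Lowercase, strip punctuation, collapse whitespace."""
    lowered = title.lower()
    no_punct = re.sub(r"[^\w\s]", "", lowered)
    return re.sub(r"\s+", " ", no_punct).strip()


def levenshtein(a: str, b: str) -> int:
    """Edit distance via the standard two-row dynamic programme."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Assumed ratio: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def is_title_match(title: str, prior_titles: list[str], threshold: float = 0.9) -> bool:
    """True if the normalized title is close enough to any prior title."""
    norm = normalize_title(title)
    return any(similarity(norm, normalize_title(p)) > threshold for p in prior_titles)
```

Note that stripping punctuation joins hyphenated words ("language-model" becomes "languagemodel"), which is harmless as long as both sides of the comparison are normalized the same way.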
Step 3 — Combine and threshold
score = 100 * (0.45 * match + 0.45 * criteria + 0.10 * novelty)
Match and criteria carry equal weight (a paper that mentions your keywords once but is wildly out-of-topic should not score higher than one that's deeply on-topic with a single mention). Novelty is a small finger on the scale — enough to demote already-covered work but not enough to drop a genuinely important journal-version-of-preprint update.
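As a quick check of the weighting, the formula can be written as a small function. The name `combine_score` and the range validation are illustrative additions, not part of the skill itself.

```python
def combine_score(match: float, criteria: float, novelty: float) -> float:
    """Weighted sum of the three axes, scaled to 0-100."""
    for name, value in (("match", match), ("criteria", criteria), ("novelty", novelty)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return 100 * (0.45 * match + 0.45 * criteria + 0.10 * novelty)


# Illustrative sub-scores (not taken from real papers):
keyword_heavy_off_topic = combine_score(match=1.0, criteria=0.0, novelty=1.0)
single_mention_on_topic = combine_score(match=0.5, criteria=1.0, novelty=1.0)
```

With these illustrative inputs the off-topic paper lands at about 55 and the on-topic one at about 77.5, so a strong keyword hit alone cannot carry a paper past the KEEP threshold.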
Decision thresholds (default; the calling agent may override):
| Score | Decision | Notes |
|---|---|---|
| 70-100 | KEEP | Goes into the digest |
| 50-69 | REVIEW | Boundary cases — caller decides whether to escalate to user |
| 0-49 | DROP | Filtered out, reason logged in the dropped-papers section |
Special-case override: if novelty == 0.0 (already in a prior digest), force DROP regardless of score. The papers section may still list it as "already covered" for traceability.
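A minimal sketch of the threshold mapping with the already-covered override applied first. The function and parameter names are hypothetical. Because the table lists integer bands, this sketch assumes a fractional score such as 69.5 falls into the lower band.

```python
def decide(score: float, novelty: float,
           keep_at: float = 70.0, review_at: float = 50.0) -> str:
    """Map a 0-100 score to KEEP / REVIEW / DROP using the default thresholds."""
    if novelty == 0.0:
        # Already in a prior digest: force DROP regardless of score.
        return "DROP"
    if score >= keep_at:
        return "KEEP"
    if score >= review_at:
        return "REVIEW"
    return "DROP"
```

Exposing `keep_at` and `review_at` as parameters is one way to let the calling agent override the defaults.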
Step 4 — Tie-breakers when KEEP > max_kept
When more papers score ≥ 70 than max_kept:
- Keep the top-scoring papers up to max_kept, and demote the rest to REVIEW (not DROP: they were good enough but couldn't fit). Surface this in the calibration summary.

Never demote to DROP what scored ≥ 70 unless explicitly forced.
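The cap can be sketched as below, assuming KEEP papers are ranked by score and that each decision is a dict with unique `id`, `decision`, and `score` keys. The `demoted-for-cap` tag is a hypothetical label added here for traceability.

```python
def apply_cap(decisions: list[dict], max_kept: int) -> tuple[list[dict], int]:
    """Demote KEEP decisions beyond max_kept to REVIEW; return new list and demotion count."""
    kept = sorted((d for d in decisions if d["decision"] == "KEEP"),
                  key=lambda d: d["score"], reverse=True)
    overflow_ids = {d["id"] for d in kept[max_kept:]}

    result = []
    for d in decisions:
        if d["id"] in overflow_ids:
            # Copy rather than mutate, so the caller's records stay intact.
            d = {**d, "decision": "REVIEW",
                 "tags": [*d.get("tags", []), "demoted-for-cap"]}
        result.append(d)
    return result, len(overflow_ids)
```

The returned count is what a caller could report as `demoted_for_cap` in the calibration summary.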
Step 5 — Return
{
"decisions": [
{
"id": "10.1101/2026.05.07.123456",
"decision": "KEEP",
"score": 96,
"axes": {"match": 0.9, "criteria": 1.0, "novelty": 1.0},
"rationale": "Title + abstract hit 'protein language model' twice; in-scope (primary methods paper, empirical); novel.",
"tags": []
},
{
"id": "PMID:39000000",
"decision": "KEEP",
"score": 82,
"axes": {"match": 1.0, "criteria": 0.7, "novelty": 0.5},
"rationale": "Strong keyword match; review article (criteria penalty); journal version of preprint covered 2026-15.",
"tags": ["journal-version-of:2026-15"]
},
{
"id": "PMID:39111111",
"decision": "DROP",
"score": 33,
"axes": {"match": 0.5, "criteria": 0.0, "novelty": 1.0},
"rationale": "'protein language model' appears once in abstract but the paper is a clinical trial enrollment report — out of scope.",
"tags": ["look-alike-trap"]
}
],
"calibration": {
"kept": 17,
"review": 3,
"dropped": 84,
"force_dropped_already_covered": 2,
"demoted_for_cap": 0,
"stricter_pass_applied": false
}
}
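Part of the calibration block can be derived directly from the decisions list, as in this rough sketch. It assumes `dropped` counts every DROP decision, including forced ones. It also assumes a novelty of 0.0 always marks a force-drop. The `demoted_for_cap` and `stricter_pass_applied` fields depend on state from Step 4, so they are left to the caller.

```python
from collections import Counter


def summarize(decisions: list[dict]) -> dict:
    """Count decisions per outcome plus papers force-dropped as already covered."""
    counts = Counter(d["decision"] for d in decisions)
    force_dropped = sum(1 for d in decisions if d["axes"]["novelty"] == 0.0)
    return {
        "kept": counts["KEEP"],
        "review": counts["REVIEW"],
        "dropped": counts["DROP"],
        "force_dropped_already_covered": force_dropped,
    }
```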
Pattern A — Strict weekly digest: defaults above. Tight thresholds; max_kept=25.
Pattern B — Catch-up over multiple weeks: run per-week with the same prior_ids growing each iteration. Don't pool all 3 weeks of papers and filter once — you'll lose the historical-context signal.
Pattern C — Topic deep-dive (user wants more, not less): relax max_kept to a high number (e.g. 100), keep thresholds, return the full ranked list. Only do this on explicit user request.
Pattern D — Sanity-check the watchlist itself: run with prior_ids = [] and look at calibration.dropped. If the same theme keeps getting dropped for criteria reasons, the watchlist may be drifting away from intent.
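For Pattern B, the per-week loop with a growing prior_ids set might be sketched like this. `filter_week` stands in for whatever runs Steps 1-5 on one week of papers. It is a hypothetical callable, not a function this skill defines.

```python
from typing import Callable


def catch_up(
    weeks: list[list[dict]],
    filter_week: Callable[[list[dict], set[str]], list[dict]],
    prior_ids: set[str],
) -> list[list[dict]]:
    """Filter each week separately, feeding KEEP ids forward into later weeks."""
    seen = set(prior_ids)  # copy so the caller's set is left untouched
    per_week = []
    for papers in weeks:  # assumed ordered oldest week first
        decisions = filter_week(papers, seen)
        per_week.append(decisions)
        # Papers kept this week count as "already covered" for later weeks.
        seen.update(d["id"] for d in decisions if d["decision"] == "KEEP")
    return per_week
```

Pooling all weeks into one call would skip the `seen.update` step between weeks, which is the historical-context signal Pattern B warns about losing.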
relevance-criteria.md text. If something's not in the criteria, the answer is REVIEW with rationale "criteria silent", not "I think it fits."

| Decision | Score | Action by caller |
|---|---|---|
| KEEP | 70+ | Include in digest, cluster, synthesize |
| REVIEW | 50-69 | Surface in a "boundary cases" section, ask user |
| DROP | 0-49 | Log in dropped-papers list with rationale |

| Axis | Default weight | What it measures |
|---|---|---|
| Match | 0.45 | How strongly watchlist keywords appear in title/abstract |
| Criteria | 0.45 | Qualitative fit against relevance-criteria.md |
| Novelty | 0.10 | Not already in last-4-weeks digests |
npx claudepluginhub lyndonkl/claude --plugin thinking-frameworks-skills