Search the academic literature for a CS/ML/NLP topic and produce a ranked, clustered, annotated reading list. Use when the user asks "what's been done on X", "lit review on Y", "what's the SOTA for Z", "is W still the standard approach", "what papers should I read about V", or runs /research-helper:lit-scan. Pulls from Semantic Scholar, arXiv, and OpenAlex with NLP/LLM venue filtering (ACL, EMNLP, NAACL, COLM, NeurIPS, ICML, ICLR, etc.).
Slash command: /research-helper:lit-scan
Given a topic, produce a structured reading list the user can actually use: clustered by approach, ranked by impact and recency, with notes on what's seminal vs. what's just-arrived vs. what's missing.
The skill is opinionated for CS/ML/NLP/LLM research. The default venue filter is core NLP (ACL/EMNLP/NAACL/COLM/TACL/CL/Findings) plus top ML venues that publish a lot of NLP work (NeurIPS/ICML/ICLR/AAAI/IJCAI/JMLR). arXiv-only papers from the last 6 months are always included so recent work isn't missed.
Triggers:
- /research-helper:lit-scan <topic>

Do NOT invoke for:
- Deep-reading a single paper: that's the future lit-digest skill's job (until it exists, just read the PDF directly).

lit-scan reads two optional environment variables:
- OPENALEX_API_KEY: enables authenticated access to the OpenAlex API. Free tier; mostly unnecessary for low-volume use.
- SEMANTIC_SCHOLAR_API_KEY: strongly recommended, since it reduces 429 throttling. Get one at https://www.semanticscholar.org/product/api.

If a .env file is present in the current working directory (or any parent directory), unset keys are loaded from it before the script runs. Existing environment variables always win; .env is only a fallback.
Format:
```
OPENALEX_API_KEY=your-key-here
SEMANTIC_SCHOLAR_API_KEY=another-key
# comments and KEY="quoted values" supported
```
No shell interpolation. The export prefix is not supported.
Most topic queries the user supplies are too broad. "Transformers" returns noise; "parameter-efficient fine-tuning of 7-13B causal LMs, 2024+" returns signal. Before calling the script, propose two or three narrower framings and let the user choose.
If the user has already given a very specific query, skip refinement.
Example:
User: "lit review on attention efficiency"
You: "That'll return thousands of papers. Three narrower options:
A. Linear-attention variants for long context (Performer/Linformer/Mamba/RWKV lineage)
B. KV-cache optimization for inference (paged attention, quantized KV, eviction)
C. FlashAttention-style kernel-level efficiency (memory-IO-aware kernels)
Which one, or all three?"
```bash
python skills/lit-scan/scripts/search.py \
  --query "your refined query here" \
  --since 2023 \
  --max 60 \
  --venues default \
  --output research/<topic-slug>-<YYYY-MM-DD>.json
```
Defaults are usually right. Notable knobs:
- --since YEAR: how far back to look. Default = 3 years ago. For fast-moving subfields (LLMs, RLHF) use the last 2 years; for more mature subfields use 5+.
- --venues {core,default,all,none}:
  - core = ACL/EMNLP/NAACL/COLM/TACL/CL/Findings only. Strictest, NLP-pure.
  - default = core + NeurIPS/ICML/ICLR/AAAI/IJCAI/JMLR + recent arXiv (recommended).
  - all = adds COLING/CoNLL/SIGIR/WMT/BlackboxNLP/LREC/*SEM/NLE/TASLP.
  - none = no venue filtering; sorts purely by citation velocity.
- --max INT: cap on the number of papers returned. 60 is right for a usable reading list; use 200 for an exhaustive one.
- --no-s2: disable the Semantic Scholar source.
- --no-arxiv: disable the arXiv source.
- --no-openalex: disable the OpenAlex source (enabled by default).
- --openalex-key KEY: override the OPENALEX_API_KEY env var (mostly for testing).
- --offline: use test fixtures instead of live APIs. Don't use it for real searches.

arXiv coverage now includes cs.AI, cs.IR, and stat.ML in addition to cs.CL/cs.LG. The API call uses server-side submittedDate filtering for efficiency and paginates if you ever request more than 2,000 results. Inter-request throttling follows the arXiv API Terms of Use limit of one request every 3 seconds.
The script writes JSON. Output schema:
```json
[
  {
    "id": "abc123",
    "title": "...",
    "authors": ["..."],
    "year": 2024,
    "venue": "ACL",
    "venue_tier": 1,
    "citations": 142,
    "citation_velocity": 8.4,
    "abstract": "...",
    "url": "...",
    "doi": "...",
    "arxiv_id": "2401.12345",
    "tldr": "...",
    "source": ["s2", "arxiv", "openalex"],
    "topics": [
      {
        "name": "Natural language processing methods",
        "score": 0.95,
        "subfield": "Artificial Intelligence",
        "field": "Computer Science",
        "domain": "Physical Sciences"
      }
    ]
  }
]
```
The topics field is contributed by OpenAlex (only present when an OpenAlex
match is found). It uses OpenAlex's 4-level taxonomy (domain → field →
subfield → topic) and is truncated to the top 3 topics per paper.
The list is already sorted: venue_tier ascending, then citation_velocity descending, then year descending.
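If you merge, filter, or hand-edit the JSON, the documented order can be reproduced with a composite sort key. A minimal sketch, assuming each entry is a dict shaped like the schema above; how missing values are treated (no tier sorts last, no velocity or year counts as 0) is an assumption, not documented script behavior.

```python
import math


def _num(value, default: float) -> float:
    """Treat a missing/null field as `default`, otherwise coerce to float."""
    return default if value is None else float(value)


def reading_order_key(paper: dict) -> tuple[float, float, float]:
    """venue_tier ascending, citation_velocity descending, year descending."""
    return (
        _num(paper.get("venue_tier"), math.inf),    # missing tier sorts last
        -_num(paper.get("citation_velocity"), 0.0),  # negate for descending
        -_num(paper.get("year"), 0.0),               # negate for descending
    )


def is_sorted_as_documented(papers: list[dict]) -> bool:
    """Sanity check that a list is already in the documented order."""
    keys = [reading_order_key(p) for p in papers]
    return keys == sorted(keys)
```

Typical use is `papers.sort(key=reading_order_key)` after `papers = json.load(f)`; negating the numeric fields lets a single ascending sort express the two descending criteria.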
Read the JSON yourself — don't just pipe it to the user. Group papers by approach, not by author or institution. Typical clusters for an NLP topic:
For each cluster, identify:
Tag each entry. A paper can have multiple tags.
Output to research/<topic-slug>-<YYYY-MM-DD>.md. Structure:
# Lit-scan: <refined query>
**Date:** YYYY-MM-DD
**Source:** Semantic Scholar + arXiv + OpenAlex, --venues default, --since YYYY
**N papers after dedup/filter:** XX
## Refined query
(What you actually searched, and what the user's original phrasing was if
different.)
## Reading order (start here)
3–5 papers, with one-sentence reason each. This is the "if you read nothing
else, read these" list.
## Clusters
### <Cluster name>
Two-sentence framing of what the cluster is about.
- **\[seminal\] Paper Title** (Author et al., YEAR, Venue) — citations: N
- One-sentence summary of the contribution.
- `<url>`
- **\[sota?, recent\] Paper Title** (Author et al., YEAR, Venue) — citations: N
- ...
### <Another cluster>
...
## What seems missing
Bulleted list of gaps. Be specific:
- "No one in this set evaluates on \<benchmark\>."
- "All approaches assume \<assumption\>; no one challenges it."
- "Cluster X is dominated by one lab — independent replication absent."
## Raw data
Full JSON at `research/<topic-slug>-<YYYY-MM-DD>.json`.
Don't dump the whole reading list into the chat. Say: "Reading list at
<path> — N papers, K clusters, top 3 to start: A, B, C." Let them open the
file.
If they want to drill into one paper, that's the future lit-digest skill's
job (until it exists, just open the PDF and read it together).
S2 rate limit (HTTP 429) — the script retries 3x with backoff. If it still fails, it falls back to arXiv-only. Your reading list will be skewed toward recent preprints; tell the user.
Empty result set — usually means the query is too narrow or uses non-standard terminology. Suggest 2–3 rephrasings using more common terms (e.g., "in-context learning" instead of "few-shot prompting").
Suspiciously few results for a known-hot topic — possible misspelling, or the user is using terminology from a different subfield. Ask them to confirm the spelling and offer alternative phrasings.
All results are recent and no seminal paper appears: widen --since
(e.g., to 2017 if the topic is anything transformer-related).
Result is dominated by one author/lab — flag it explicitly in the "what seems missing" section. Often a sign the user should also search for the critique literature.
| Knob | Default | When to change |
|---|---|---|
| --since | now - 3 years | LLMs/RLHF: 2 years. Mature topics: 5+. |
| --max | 60 | 100-200 for exhaustive surveys. 20 for a focused starting set. |
| --venues | default | core for NLP purity; all for IR/dialogue/speech; none only if you're stuck. |
| Output path | research/<slug>-<date>.md | Override only if the user asks. |
npx claudepluginhub mhburg/research-helper --plugin research-helper