Full BibTeX import and abstract enrichment pipeline running entirely in Cowork. Parses BibTeX files, creates research libraries, enriches missing abstracts via OpenRouter/Gemini, deduplicates, and inserts structured citations into Supabase. Use this skill whenever the user says "import my BibTeX," "upload my bibliography," "import this into my research library," "process this BibTeX," "add these papers," "start a new research project" with an attached file, "load my references," or any request to import academic references from a BibTeX file into the research pipeline. Also trigger when a .bib file is uploaded or when the user mentions BibTeX in the context of getting started with the research pipeline.
Slash command: /research-pipeline:import-bibtex
Parse, enrich, deduplicate, and load a BibTeX file into the research pipeline — entirely within Cowork.
The user will either upload a .bib file or paste BibTeX text directly in the conversation. If the file is uploaded, read it from the uploads path. If pasted, capture the full text.
Ask the user what to name this research project, or let them pick an existing library.
New library:
INSERT INTO research_libraries (name, description, metadata)
VALUES ('{name}', '{description}', '{"created_by": "dorian"}'::jsonb)
RETURNING id, name
Existing library:
SELECT id, name FROM research_libraries ORDER BY created_at DESC
Store the library_id for all subsequent operations.
Use the bundled parser script at references/bibtex-parser.py:
python3 ${CLAUDE_PLUGIN_ROOT}/skills/research-pipeline/references/bibtex-parser.py /path/to/input.bib /tmp/parsed_citations.json
Or parse inline if the file is small (<50 entries):
Split the content on @type{ patterns. For each entry extract:
- citation_key: the key after @type{
- entry_type: article, misc, inproceedings, etc.
- title, authors, year, journal, abstract, doi, arxiv_id, url
- bibtex_raw: the original entry text
Deduplicate by citation_key + abstract signature. Report to the user:
"Found [X] entries, [Y] unique after deduplication, [Z] already have abstracts."
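For the inline path, the split-and-extract step and the deduplication report can be sketched in Python as follows. This is not the bundled references/bibtex-parser.py, whose internals are not shown here. It handles {braced}, "quoted", and bare field values, but ignores @string macros and parenthesized @type(...) entries. It assumes arXiv IDs live in the eprint field, and it reads "abstract signature" as a short hash of the case- and whitespace-normalized abstract. Both readings are assumptions.

```python
import hashlib
import re

# Start of an entry such as "@article{smith2020,": group 1 = type, group 2 = citation key.
ENTRY_START = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")
# A field name followed by "=", e.g. "title = ".
FIELD_START = re.compile(r"(\w+)\s*=\s*")


def read_braced(text, i):
    """Return (inner_text, index_after_close) for the balanced {...} group at text[i]."""
    depth = 0
    for j in range(i, len(text)):
        if text[j] == "{":
            depth += 1
        elif text[j] == "}":
            depth -= 1
            if depth == 0:
                return text[i + 1:j], j + 1
    raise ValueError("unbalanced braces")


def parse_fields(s):
    """Parse name = {value}, name = "value" and name = bare pairs into a dict."""
    fields, i = {}, 0
    while (m := FIELD_START.search(s, i)):
        j = m.end()
        if j < len(s) and s[j] == "{":
            value, i = read_braced(s, j)
        elif j < len(s) and s[j] == '"':
            close = s.index('"', j + 1)
            value, i = s[j + 1:close], close + 1
        else:  # bare value such as a year: read up to the next comma or newline
            close = j
            while close < len(s) and s[close] not in ",\n":
                close += 1
            value, i = s[j:close], close
        fields[m.group(1).lower()] = " ".join(value.split())  # collapse whitespace
    return fields


def parse_bibtex(text):
    """Split on @type{key, and extract the fields the citations table expects."""
    entries, pos = [], 0
    while (m := ENTRY_START.search(text, pos)):
        _, end = read_braced(text, text.index("{", m.start()))  # end = just past closing brace
        fields = parse_fields(text[m.end():end - 1])
        year = fields.get("year", "")
        entries.append({
            "citation_key": m.group(2),
            "entry_type": m.group(1).lower(),
            "title": fields.get("title", ""),
            "authors": fields.get("author", ""),
            "year": int(year) if year.isdigit() else None,
            "journal": fields.get("journal", ""),
            "abstract": fields.get("abstract", ""),
            "doi": fields.get("doi", ""),
            "arxiv_id": fields.get("eprint", ""),  # assumption: arXiv IDs are stored in "eprint"
            "url": fields.get("url", ""),
            "bibtex_raw": text[m.start():end],
        })
        pos = end  # skip past this entry so "@" inside field values is never re-matched
    return entries


def abstract_signature(abstract):
    """Assumed signature: short SHA-1 of the lower-cased, whitespace-collapsed abstract."""
    normalized = " ".join(abstract.lower().split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()[:12]


def deduplicate(entries):
    """Keep the first entry for each (citation_key, abstract signature) pair."""
    seen, unique = set(), []
    for e in entries:
        key = (e["citation_key"], abstract_signature(e["abstract"]))
        if key not in seen:
            seen.add(key)
            unique.append(e)
    return unique


def import_report(entries):
    unique = deduplicate(entries)
    with_abstract = sum(1 for e in unique if e["abstract"].strip())
    return (f"Found {len(entries)} entries, {len(unique)} unique after "
            f"deduplication, {with_abstract} already have abstracts.")
```

Brace matching, rather than a single regex, is what keeps protective braces such as {Learning} inside a title from ending the field early.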
For entries without meaningful abstracts (empty, placeholder text, or missing entirely):
Group entries into batches of 10-15 for efficient LLM processing. For each batch, construct a prompt and send to OpenRouter via Rube MCP:
Via Rube MCP (preferred):
Use RUBE_REMOTE_WORKBENCH or RUBE_MULTI_EXECUTE_TOOL to call the OpenRouter
chat completions endpoint:
POST https://openrouter.ai/api/v1/chat/completions
Authorization: Bearer {OPENROUTER_API_KEY}
Content-Type: application/json
{
"model": "google/gemini-3-pro-preview",
"messages": [
{"role": "system", "content": "...enrichment prompt..."},
{"role": "user", "content": "...batch of entries..."}
]
}
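Selecting entries, batching them, and building the request body above can be sketched with the standard library. The placeholder list and the 15-word threshold in needs_abstract are hypothetical heuristics, not rules from this skill. batches() splits work into the fewest batches of at most 15 with balanced sizes, so 16 entries become two batches of 8 rather than 15 and 1. build_request only constructs the request to show its shape. In the skill the call goes through Rube MCP, and sending it directly with urllib.request.urlopen would be an alternative this document does not prescribe.

```python
import json
import math
import urllib.request

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"
# Hypothetical placeholder strings; extend with whatever your .bib files actually contain.
PLACEHOLDERS = {"", "n/a", "na", "none", "tbd", "no abstract", "abstract not available"}


def needs_abstract(entry, min_words=15):
    """Assumed heuristic: missing, a known placeholder, or shorter than min_words."""
    text = " ".join((entry.get("abstract") or "").split())
    return text.lower().rstrip(".") in PLACEHOLDERS or len(text.split()) < min_words


def batches(items, max_size=15):
    """Yield the fewest batches of at most max_size items, with sizes as even as possible."""
    if not items:
        return
    n = math.ceil(len(items) / max_size)
    base, extra = divmod(len(items), n)
    start = 0
    for b in range(n):
        size = base + (1 if b < extra else 0)
        yield items[start:start + size]
        start += size


def build_request(batch, api_key, system_prompt, model="google/gemini-3-pro-preview"):
    """Build (but do not send) the chat-completions request for one batch."""
    keep = ("citation_key", "title", "authors", "year", "journal", "doi", "url", "abstract")
    user_content = json.dumps([{k: e.get(k) for k in keep} for e in batch], ensure_ascii=False)
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_content},
        ],
    }
    return urllib.request.Request(
        OPENROUTER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )
```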
Via direct LLM call (fallback):
If Rube is unavailable, process entries directly in conversation. For each entry missing an abstract:
- If the entry has a DOI, look up its abstract on CrossRef: https://api.crossref.org/works/{doi}
For each batch sent to the LLM, use this enrichment prompt:
You are processing BibTeX entries that need abstracts. For each entry:
1. If a meaningful abstract exists, preserve it exactly
2. If the abstract is missing or placeholder text:
a. If a DOI or URL is provided, describe what the paper likely covers based on
the title, authors, journal, and year
b. Generate a 2-4 sentence academic abstract that:
- States the central aim or problem
- Synthesizes likely main arguments or methods
- Highlights potential implications
c. Mark generated abstracts with [AI-generated] prefix
Output each entry as JSON with fields: citation_key, abstract, abstract_source
where abstract_source is "original", "crossref", or "generated"
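The CrossRef lookup in the fallback path, which produces the "crossref" value of abstract_source, can be sketched with the standard library. CrossRef's /works/{doi} endpoint returns JSON whose message.abstract, when a publisher has deposited one, is usually JATS XML. Many records have no abstract at all, so None is an expected result. The User-Agent contact address is a hypothetical placeholder, and treating every network or parse error as "no abstract" is a simplifying assumption.

```python
import html
import json
import re
import urllib.parse
import urllib.request

JATS_TITLE = re.compile(r"<jats:title>.*?</jats:title>", re.S)  # e.g. an "Abstract" heading
TAG = re.compile(r"<[^>]+>")


def abstract_from_crossref(payload):
    """Plain-text abstract from a /works/{doi} JSON response, or None if absent or empty."""
    raw = (payload.get("message") or {}).get("abstract")
    if not raw:
        return None
    text = TAG.sub(" ", JATS_TITLE.sub(" ", raw))  # drop the heading, then every tag
    text = " ".join(html.unescape(text).split())   # decode entities such as &amp;
    return text or None


def fetch_crossref_abstract(doi, timeout=10):
    """Query the public CrossRef API for one DOI (network call; not used in the tests)."""
    url = "https://api.crossref.org/works/" + urllib.parse.quote(doi, safe="/")
    # Hypothetical contact address: CrossRef asks API clients to identify themselves.
    headers = {"User-Agent": "bibtex-import/0.1 (mailto:you@example.org)"}
    try:
        with urllib.request.urlopen(urllib.request.Request(url, headers=headers),
                                    timeout=timeout) as resp:
            return abstract_from_crossref(json.load(resp))
    except (OSError, ValueError):  # HTTP 404, network errors, timeouts, or a non-JSON body
        return None
```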
After each batch completes, report progress: "Enriched batch [N] of [M] — [X] abstracts found via API, [Y] generated, [Z] preserved."
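Before writing a batch back, it can help to check the model's reply against the contract in the prompt. This sketch assumes the reply is a JSON array, optionally wrapped in a Markdown code fence or in an {"entries": [...]} object (both tolerances are assumptions). It drops rows for citation keys that were not in the batch or with an unknown abstract_source, enforces the [AI-generated] prefix on generated abstracts, and formats a progress line close to the one above.

```python
import json

VALID_SOURCES = {"original", "crossref", "generated"}
AI_PREFIX = "[AI-generated]"


def parse_enrichment(reply_text, expected_keys):
    """Parse and validate one batch reply; returns {citation_key: {abstract, abstract_source}}."""
    text = reply_text.strip()
    if text.startswith("```"):  # assumed tolerance for a fenced reply
        text = text.strip("`")
        text = text[text.find("\n") + 1:]  # drop the language-tag line, e.g. "json"
    rows = json.loads(text)
    if isinstance(rows, dict):  # assumed tolerance for {"entries": [...]}
        rows = rows.get("entries", [])
    results = {}
    for row in rows:
        key, source = row.get("citation_key"), row.get("abstract_source")
        if key not in expected_keys or source not in VALID_SOURCES:
            continue  # not part of this batch, or outside the prompt's allowed values
        abstract = (row.get("abstract") or "").strip()
        if source == "generated" and not abstract.startswith(AI_PREFIX):
            abstract = f"{AI_PREFIX} {abstract}"
        results[key] = {"abstract": abstract, "abstract_source": source}
    return results


def progress_line(n, m, results):
    counts = {s: 0 for s in VALID_SOURCES}
    for r in results.values():
        counts[r["abstract_source"]] += 1
    return (f"Enriched batch {n} of {m}: {counts['crossref']} abstracts found via API, "
            f"{counts['generated']} generated, {counts['original']} preserved.")
```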
For each parsed and enriched citation, insert into the citations table:
INSERT INTO citations (
library_id, citation_key, entry_type, title, authors, year,
journal, abstract, doi, arxiv_id, url, bibtex_raw,
verification_status, source
) VALUES (
{library_id}, '{citation_key}', '{entry_type}', '{title}', '{authors}',
{year}, '{journal}', '{abstract}', '{doi}', '{arxiv_id}', '{url}',
'{bibtex_raw}', 'unverified', 'bibtex_import'
)
ON CONFLICT (library_id, citation_key) DO UPDATE SET
abstract = EXCLUDED.abstract,
updated_at = now()
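Use ON CONFLICT ... DO UPDATE so re-imports update abstracts without creating duplicates. Escape single quotes in every interpolated text value (O'Brien becomes O''Brien) and write NULL for missing values such as year, or pass the values as query parameters instead of splicing them into the SQL string.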
Process in batches of 50 for efficiency.
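The re-import behavior can be checked locally. This sketch uses Python's built-in sqlite3 as a stand-in for Supabase. SQLite 3.24 and later accept the same ON CONFLICT ... DO UPDATE form, with CURRENT_TIMESTAMP in place of now(). The table keeps only a few of the real columns, and values are passed as parameters, so a key or title containing an apostrophe cannot break the statement. Against Postgres the statement would be the same apart from the driver's placeholder style.

```python
import sqlite3

# Reduced stand-in for the citations table; the UNIQUE constraint backs the ON CONFLICT target.
SCHEMA = """
CREATE TABLE citations (
    library_id INTEGER NOT NULL,
    citation_key TEXT NOT NULL,
    title TEXT,
    abstract TEXT,
    verification_status TEXT,
    source TEXT,
    updated_at TEXT,
    UNIQUE (library_id, citation_key)
)
"""

UPSERT = """
INSERT INTO citations (library_id, citation_key, title, abstract,
                       verification_status, source, updated_at)
VALUES (?, ?, ?, ?, 'unverified', 'bibtex_import', CURRENT_TIMESTAMP)
ON CONFLICT (library_id, citation_key) DO UPDATE SET
    abstract = excluded.abstract,
    updated_at = CURRENT_TIMESTAMP
"""


def upsert_citations(conn, library_id, entries, batch_size=50):
    """Insert or update citations in batches of batch_size, one transaction per batch."""
    for i in range(0, len(entries), batch_size):
        chunk = entries[i:i + batch_size]
        conn.executemany(UPSERT, [(library_id, e["citation_key"], e["title"], e["abstract"])
                                  for e in chunk])
        conn.commit()
```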
Generate an annotated bibliography markdown file:
# Annotated Bibliography: {library_name}
Generated: {date}
Total citations: {count}
## Citations
### {citation_key_1}
**{title}** ({year})
{authors}
*{journal}*
DOI: {doi}
{abstract}
---
### {citation_key_2}
...
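Filling the template above from the inserted rows is plain string assembly. In this sketch, skipping empty author, journal, and DOI lines and the "n.d." and "No abstract available." fallbacks are assumptions, not part of the template.

```python
from datetime import date


def render_bibliography(library_name, citations, today=None):
    """Render the annotated-bibliography Markdown for a list of citation dicts."""
    today = today or date.today().isoformat()
    lines = [
        f"# Annotated Bibliography: {library_name}",
        f"Generated: {today}",
        f"Total citations: {len(citations)}",
        "",
        "## Citations",
    ]
    for c in citations:
        lines += ["", f"### {c['citation_key']}", f"**{c['title']}** ({c.get('year') or 'n.d.'})"]
        if c.get("authors"):
            lines.append(c["authors"])
        if c.get("journal"):
            lines.append(f"*{c['journal']}*")
        if c.get("doi"):
            lines.append(f"DOI: {c['doi']}")
        # Assumed fallback text when no abstract could be found or generated.
        lines += ["", c.get("abstract") or "No abstract available.", "", "---"]
    return "\n".join(lines) + "\n"
```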
Upload to Google Drive in the user's preferred folder. Store the Google Drive link.
Present the final results:
BibTeX Import Complete
══════════════════════
Library: {name} (ID: {library_id})
Total entries parsed: {total}
Duplicates removed: {dupes}
Citations inserted: {inserted}
├─ With original abstracts: {original_count}
├─ Abstracts from CrossRef: {crossref_count}
├─ AI-generated abstracts: {generated_count}
└─ No abstract available: {no_abstract_count}
Annotated Bibliography: {google_drive_link}
Next steps:
→ "Verify my citations" — run 4-layer reference validation
→ "Find more papers" — discover related literature
→ "Synthesize my research" — build thematic maps + perspective
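The summary counts follow from the earlier steps. Duplicates removed is total parsed minus unique entries, and the four abstract branches partition the inserted citations. A small sketch, assuming every deduplicated entry was inserted and that entries left without an abstract carry a source of None:

```python
from collections import Counter


def render_summary(name, library_id, total_parsed, sources, drive_link):
    """sources: one abstract_source per inserted citation ("original", "crossref", "generated", or None)."""
    c = Counter(sources)
    title = "BibTeX Import Complete"
    return "\n".join([
        title,
        "═" * len(title),
        f"Library: {name} (ID: {library_id})",
        f"Total entries parsed: {total_parsed}",
        f"Duplicates removed: {total_parsed - len(sources)}",
        f"Citations inserted: {len(sources)}",
        f"├─ With original abstracts: {c['original']}",
        f"├─ Abstracts from CrossRef: {c['crossref']}",
        f"├─ AI-generated abstracts: {c['generated']}",
        f"└─ No abstract available: {c[None]}",
        f"Annotated Bibliography: {drive_link}",
    ])
```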
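For large BibTeX files, insert the parsed citations into Supabase first, then run abstract enrichment in batches afterwards and write the results back through the ON CONFLICT upsert.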
This way the user has their library available immediately and can start synthesis while enrichment continues. Report progress periodically: "Enrichment progress: [X]/[Y] entries processed..."
npx claudepluginhub moxywolfllc/moxywolf-plugins --plugin research-pipeline