From pdf-research
Extracts text from PDF documents via PyMuPDF, indexes it with LightRAG (embeddings plus a knowledge graph), and supports hybrid semantic search with citations for document Q&A.
Slash command: /pdf-research:pdf-research
LightRAG-based PDF document indexing and semantic search for Claude Code research workflows.
When the user invokes /pdf-research, Claude should:
- Run python pdf_research.py status to see the current configuration
# Always run from the scripts directory
cd ~/.claude/skills/pdf-research/scripts
# Check current status
python pdf_research.py status
# Index PDFs (when user provides a directory)
python pdf_research.py index /path/to/pdfs
# Search (single query)
python pdf_research.py search "user's question" --mode hybrid
# Interactive search session
python pdf_research.py search
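The workflow above can also be driven from Python. The following is a minimal sketch, assuming the skill is installed at the scripts path shown above; `build_invocation` and `run_pdf_research` are hypothetical helper names, not part of the skill itself.

```python
import os
import subprocess
import sys
from pathlib import Path

# Scripts directory taken from the workflow above.
SCRIPTS_DIR = Path(os.path.expanduser("~/.claude/skills/pdf-research/scripts"))


def build_invocation(subcommand, *args):
    """Return (argv, cwd) for one pdf_research.py call."""
    argv = [sys.executable, "pdf_research.py", subcommand, *args]
    return argv, SCRIPTS_DIR


def run_pdf_research(subcommand, *args):
    """Hypothetical helper: run the CLI from the scripts directory."""
    argv, cwd = build_invocation(subcommand, *args)
    # check=True raises CalledProcessError on a non-zero exit code.
    return subprocess.run(argv, cwd=cwd, check=True, text=True, capture_output=True)


if __name__ == "__main__":
    argv, cwd = build_invocation("search", "user's question", "--mode", "hybrid")
    print(cwd)
    print(argv[1:])
```

Passing `cwd=` to `subprocess.run` plays the role of the `cd` step, so the caller's own working directory is left unchanged.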
Before running commands, ensure:
# Activate Python environment with dependencies
source /path/to/venv/bin/activate # or use system Python with deps installed
# Ensure OpenAI API key is set
export OPENAI_API_KEY=sk-...
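These prerequisites could be checked before any command runs. This is a sketch only: `preflight` is a hypothetical helper, and the module names are the usual import names for the packages this skill depends on (`lightrag` for lightrag-hku, `fitz` for pymupdf, `dotenv` for python-dotenv).

```python
import importlib.util
import os

# Import names for the packages the skill installs:
# lightrag-hku -> lightrag, pymupdf -> fitz, python-dotenv -> dotenv.
REQUIRED_MODULES = ("lightrag", "fitz", "dotenv")


def preflight(env=None, modules=REQUIRED_MODULES):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    problems = []
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        if importlib.util.find_spec(name) is None:
            problems.append(f"module not importable: {name}")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("-", problem)
```

The check only confirms that the key is present and the modules can be found; it does not verify that the key is valid.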
- index command: python pdf_research.py index <path>
- search command: python pdf_research.py search "<question>"
- status command: python pdf_research.py status
- config command: python pdf_research.py config --pdf-dir <path> --storage-dir <path>
# Configure defaults (run once)
python pdf_research.py config --pdf-dir /path/to/pdfs --storage-dir ./rag_storage
# Index PDFs
python pdf_research.py index [pdf_dir] [--storage <path>]
# Search (single query)
python pdf_research.py search "query" [--mode hybrid|local|global|naive]
# Search (interactive)
python pdf_research.py search
# Check status
python pdf_research.py status
| Mode | Best For | Description |
|---|---|---|
| hybrid | General queries | Combined local + global (default) |
| local | Specific facts | Names, numbers, definitions |
| global | Summaries | Themes, trends, overviews |
| naive | Exact terms | Simple keyword matching |
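One way to guard against a mistyped mode is to validate it before building the search arguments. The sketch below copies the four modes from the table; `search_args` is a hypothetical helper and only emits the `search` subcommand and `--mode` flag documented above.

```python
# The four modes and their intended use, copied from the table above.
SEARCH_MODES = {
    "hybrid": "General queries",
    "local": "Specific facts",
    "global": "Summaries",
    "naive": "Exact terms",
}
DEFAULT_MODE = "hybrid"


def search_args(query, mode=DEFAULT_MODE):
    """Build the argument list for `pdf_research.py search`."""
    if mode not in SEARCH_MODES:
        raise ValueError(
            f"unknown mode {mode!r}; expected one of {sorted(SEARCH_MODES)}"
        )
    return ["search", query, "--mode", mode]


if __name__ == "__main__":
    print(search_args("What is the main finding?"))
    print(search_args("Summarize the themes", mode="global"))
```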
After indexing, rag_storage/ contains:
| File | Description |
|---|---|
| config.json | User configuration |
| kv_store_full_docs.json | Full document text |
| kv_store_text_chunks.json | Semantic chunks |
| kv_store_full_entities.json | Extracted entities |
| vdb_*.json | Vector embeddings |
| graph_*.graphml | Knowledge graph |
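To confirm that indexing produced the files listed above, a small script can match each pattern against the storage directory. This is an illustrative sketch; `summarize_storage` is a hypothetical helper, and `./rag_storage` is the storage directory used in the configuration example.

```python
from pathlib import Path

# Glob patterns and descriptions taken from the table above.
STORAGE_PATTERNS = {
    "config.json": "User configuration",
    "kv_store_full_docs.json": "Full document text",
    "kv_store_text_chunks.json": "Semantic chunks",
    "kv_store_full_entities.json": "Extracted entities",
    "vdb_*.json": "Vector embeddings",
    "graph_*.graphml": "Knowledge graph",
}


def summarize_storage(storage_dir):
    """Map each pattern to (matching file names, total size in bytes)."""
    root = Path(storage_dir)
    summary = {}
    for pattern in STORAGE_PATTERNS:
        files = sorted(p for p in root.glob(pattern) if p.is_file())
        total = sum(p.stat().st_size for p in files)
        summary[pattern] = ([p.name for p in files], total)
    return summary


if __name__ == "__main__":
    for pattern, (names, size) in summarize_storage("./rag_storage").items():
        status = f"{len(names)} file(s), {size} bytes" if names else "missing"
        print(f"{pattern:30} {STORAGE_PATTERNS[pattern]:20} {status}")
```

A pattern reported as missing suggests that indexing has not run yet or did not finish.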
User: /pdf-research Please index ~/Documents/papers
Claude: [Runs indexing]
Indexing complete!
- Documents: 5
- Chunks: 247
- Storage: 32.5 MB
User: Tell me about AI talent development strategies
Claude: [Runs search]
Based on the indexed documents...
[Detailed response with references]
export OPENAI_API_KEY=sk-your-key
python pdf_research.py index /path/to/pdfs
pip install "lightrag-hku[api]" pymupdf python-dotenv
- lightrag-hku[api]>=1.4.9
- pymupdf>=1.24.0
- python-dotenv>=1.0.0
npx claudepluginhub hongsw/plugin-for-claude-research --plugin pdf-research