Skill

pdf-research

Indexes PDF documents with LightRAG, extracts text via PyMuPDF, builds embeddings and knowledge graphs, enables hybrid semantic searches with citations for document Q&A.

Python

OpenAI

ai-ml

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/pdf-research:pdf-research

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

LightRAG-based PDF document indexing and semantic search for Claude Code research workflows.

Supporting Files

prompts/index-pdfs.md
prompts/research-assistant.md
prompts/search-query.md
scripts/config.json
scripts/index_pdfs.py
scripts/pdf_research.py
scripts/requirements.txt
scripts/search.py

SKILL.md

176 lines · ~1.2k tokens

Stats

Language: Python

Parent stars: 19

Parent forks: 6

Maintenance: Excellent

Last Commit: Feb 17, 2026

PDF Research Skill

LightRAG-based PDF document indexing and semantic search for Claude Code research workflows.

Quick Start (For Claude)

When the user invokes /pdf-research, Claude should:

Check status first: run python pdf_research.py status to see the current configuration
Auto-index on request: when the user provides a PDF directory, run indexing automatically
Search queries: execute searches and return formatted results

Automatic Workflow

# Always run from scripts directory
cd ~/.claude/skills/pdf-research/scripts

# Check current status
python pdf_research.py status

# Index PDFs (when user provides a directory)
python pdf_research.py index /path/to/pdfs

# Search (single query)
python pdf_research.py search "user's question" --mode hybrid

# Interactive search session
python pdf_research.py search
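When Claude drives these commands programmatically, passing the user's question as a single argument-list element avoids shell-quoting problems with apostrophes or quotes in the question. The sketch below is a hypothetical helper, not part of the skill: `build_command` and `run` are illustrative names, the scripts path is taken from the `cd` line above, and `sys.executable` assumes the current interpreter is the one with the dependencies installed.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Optional

# Assumed install location, taken from the `cd` line in the workflow above.
SCRIPTS_DIR = Path.home() / ".claude" / "skills" / "pdf-research" / "scripts"

VALID_MODES = ("hybrid", "local", "global", "naive")


def build_command(action: str, arg: Optional[str] = None,
                  mode: Optional[str] = None) -> List[str]:
    """Build an argv list for pdf_research.py (hypothetical helper)."""
    cmd = [sys.executable, "pdf_research.py", action]
    if arg is not None:
        # One argv element per value, so quotes inside the question need no escaping.
        cmd.append(arg)
    if mode is not None:
        if mode not in VALID_MODES:
            raise ValueError(f"unknown mode: {mode!r}")
        cmd += ["--mode", mode]
    return cmd


def run(cmd: List[str]) -> subprocess.CompletedProcess:
    # cwd mirrors "Always run from scripts directory".
    return subprocess.run(cmd, cwd=SCRIPTS_DIR, capture_output=True, text=True)
```

For example, `run(build_command("search", "What's the \"main\" finding?", mode="hybrid"))` would pass the question through unchanged, whereas an interpolated shell string would need careful escaping.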

Environment Requirements

Before running commands, ensure:

# Activate Python environment with dependencies
source /path/to/venv/bin/activate  # or use system Python with deps installed

# Ensure OpenAI API key is set
export OPENAI_API_KEY=sk-...
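A quick preflight check can catch the two most common setup problems before any command runs. This is a minimal sketch under the assumption that only the requirements stated in this document matter (Python 3.10+ and a non-empty `OPENAI_API_KEY`); `preflight` is a hypothetical name, and it does not verify that the key is valid.

```python
import os
import sys
from typing import List, Mapping, Optional


def preflight(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return human-readable problems that would block pdf_research.py.

    Hypothetical helper: checks the Python version and that OPENAI_API_KEY
    is set and non-blank. It does not call the OpenAI API.
    """
    env = os.environ if env is None else env
    problems = []
    if sys.version_info < (3, 10):
        problems.append(f"Python 3.10+ required, found {sys.version.split()[0]}")
    if not env.get("OPENAI_API_KEY", "").strip():
        problems.append("OPENAI_API_KEY not set")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("preflight:", problem)
```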

Core Capabilities

1. PDF Indexing (`index` command)

Extracts text from PDF documents using PyMuPDF
Creates semantic chunks with metadata
Builds knowledge graph with entities and relationships
Generates vector embeddings for semantic search
Supports incremental indexing (only new files)
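The "only new files" behaviour can be approximated by hashing each PDF and skipping hashes already recorded in a manifest. This is an illustrative sketch, not the skill's actual bookkeeping: the manifest file (e.g. `indexed_files.json`), its hash-to-filename layout, the non-recursive directory scan, and all function names are assumptions.

```python
import hashlib
import json
from pathlib import Path
from typing import Dict, List


def file_sha256(path: Path) -> str:
    """Hash file contents so a renamed but identical PDF is not re-indexed."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest()


def list_pdfs(pdf_dir: Path) -> List[Path]:
    # Case-insensitive .pdf match; top-level files only (an assumption).
    return sorted(p for p in pdf_dir.iterdir()
                  if p.is_file() and p.suffix.lower() == ".pdf")


def load_manifest(manifest_path: Path) -> Dict[str, str]:
    if manifest_path.exists():
        return json.loads(manifest_path.read_text(encoding="utf-8"))
    return {}


def select_new_pdfs(pdf_dir: Path, manifest_path: Path) -> List[Path]:
    """Return PDFs whose content hash is not yet in the manifest."""
    seen = load_manifest(manifest_path)
    return [pdf for pdf in list_pdfs(pdf_dir) if file_sha256(pdf) not in seen]


def record_indexed(paths: List[Path], manifest_path: Path) -> None:
    """Add hash -> filename entries after a successful indexing run."""
    seen = load_manifest(manifest_path)
    for pdf in paths:
        seen[file_sha256(pdf)] = pdf.name
    manifest_path.write_text(json.dumps(seen, indent=2), encoding="utf-8")
```

Hashing content rather than comparing filenames or timestamps means a file that is merely renamed or touched is not indexed twice.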

2. Semantic Search (`search` command)

naive: Basic vector search over text chunks, without the knowledge graph
local: Focus on specific entities and details
global: Focus on broad themes and summaries
hybrid: Combined local + global (recommended)
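Conceptually, hybrid mode pools what the local (specific entities) and global (broader themes) retrievals return before the answer is generated. The toy function below only illustrates that pooling idea by interleaving two ranked lists and dropping duplicates; it is not LightRAG's implementation, and the plain strings stand in for whatever context records the real retrievers produce.

```python
from itertools import zip_longest
from typing import List, Sequence


def merge_contexts(local: Sequence[str], global_: Sequence[str],
                   limit: int = 10) -> List[str]:
    """Interleave two ranked context lists, keeping the first copy of each item."""
    merged: List[str] = []
    seen = set()
    for pair in zip_longest(local, global_):  # shorter list is padded with None
        for item in pair:
            if item is not None and item not in seen:
                seen.add(item)
                merged.append(item)
                if len(merged) == limit:
                    return merged
    return merged
```

For example, `merge_contexts(["e1", "shared"], ["t1", "shared"])` yields `["e1", "t1", "shared"]`: both sources contribute, and the overlapping item appears once.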

3. Status Check (`status` command)

Shows current configuration
Lists indexed documents
Reports storage statistics

4. Configuration (`config` command)

Set default PDF directory
Set default storage directory
Set default search mode
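Saved defaults can be modelled as a JSON object merged over built-in values, where only flags the user actually passed overwrite stored ones. The key names (`pdf_dir`, `storage_dir`, `search_mode`), the default values, and the helper names are assumptions for illustration; scripts/config.json holds the skill's real schema.

```python
import json
from pathlib import Path
from typing import Any, Dict, Optional

# Hypothetical defaults; check scripts/config.json for the actual keys.
DEFAULTS: Dict[str, Any] = {
    "pdf_dir": None,
    "storage_dir": "./rag_storage",
    "search_mode": "hybrid",
}
VALID_MODES = {"hybrid", "local", "global", "naive"}


def load_config(path: Path) -> Dict[str, Any]:
    """Return the defaults overlaid with whatever the saved file contains."""
    config = dict(DEFAULTS)
    if path.exists():
        config.update(json.loads(path.read_text(encoding="utf-8")))
    return config


def save_config(path: Path, **updates: Optional[str]) -> Dict[str, Any]:
    """Apply non-None updates (flags actually given) and write the file."""
    config = load_config(path)
    for key, value in updates.items():
        if key not in DEFAULTS:
            raise KeyError(f"unknown config key: {key}")
        if value is None:
            continue  # flag omitted on the command line: keep the stored value
        if key == "search_mode" and value not in VALID_MODES:
            raise ValueError(f"invalid search mode: {value}")
        config[key] = value
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config
```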

Claude Integration Protocol

When User Says "Index PDFs" or Provides a Path

Verify the path exists
Run: python pdf_research.py index <path>
Report results (documents indexed, chunks created, storage size)
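The path check in the first step can be done before invoking the indexer, so Claude can report a clear error instead of a stack trace. A minimal sketch, assuming a non-recursive scan of a single directory; `validate_pdf_dir` is a hypothetical name, and whether the skill itself descends into subdirectories is not stated here.

```python
from pathlib import Path
from typing import List


def validate_pdf_dir(raw_path: str) -> List[Path]:
    """Resolve a user-supplied directory and return the PDFs directly inside it.

    Raises ValueError with a message that can be relayed to the user.
    """
    path = Path(raw_path).expanduser()  # handles paths like ~/Documents/papers
    if not path.exists():
        raise ValueError(f"Path does not exist: {path}")
    if not path.is_dir():
        raise ValueError(f"Not a directory: {path}")
    pdfs = sorted(p for p in path.iterdir()
                  if p.is_file() and p.suffix.lower() == ".pdf")
    if not pdfs:
        raise ValueError(f"No PDF files found in {path}")
    return pdfs
```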

When User Asks a Question About Documents

Check whether storage exists: python pdf_research.py status
If nothing is indexed yet, ask the user for a PDF directory
Run the search: python pdf_research.py search "<question>"
Format and present the results with source references
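One way to present source references is a numbered, de-duplicated list appended to the answer. The sketch assumes each retrieved passage is a dict with hypothetical `source` and `page` keys; the search script's real output may be shaped differently, so treat this as a formatting pattern rather than the skill's code.

```python
from typing import Dict, List, Sequence


def format_with_references(answer: str,
                           passages: Sequence[Dict[str, object]]) -> str:
    """Append a de-duplicated, numbered list of sources to an answer."""
    refs: List[str] = []
    for passage in passages:
        label = str(passage.get("source", "unknown"))
        page = passage.get("page")
        if page is not None:
            label += f", p. {page}"
        if label not in refs:  # the same chunk source may be retrieved twice
            refs.append(label)
    if not refs:
        return answer
    lines = [answer, "", "Sources:"]
    lines += [f"[{i}] {ref}" for i, ref in enumerate(refs, start=1)]
    return "\n".join(lines)
```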

When User Wants to Configure

Run: python pdf_research.py config --pdf-dir <path> --storage-dir <path>
Confirm configuration saved

Command Reference

# Configure defaults (run once)
python pdf_research.py config --pdf-dir /path/to/pdfs --storage-dir ./rag_storage

# Index PDFs
python pdf_research.py index [pdf_dir] [--storage <path>]

# Search (single query)
python pdf_research.py search "query" [--mode hybrid|local|global|naive]

# Search (interactive)
python pdf_research.py search

# Check status
python pdf_research.py status
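The command reference maps naturally onto argparse subcommands. This is a hypothetical reconstruction from the usage lines above (flag names copied from them, the `hybrid` default taken from the Search Modes table), not the actual parser in pdf_research.py.

```python
import argparse
from typing import List, Optional

MODES = ["hybrid", "local", "global", "naive"]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="pdf_research.py")
    sub = parser.add_subparsers(dest="command", required=True)

    config = sub.add_parser("config", help="save default directories")
    config.add_argument("--pdf-dir")       # stored as args.pdf_dir
    config.add_argument("--storage-dir")   # stored as args.storage_dir

    index = sub.add_parser("index", help="index PDFs")
    index.add_argument("pdf_dir", nargs="?", help="falls back to the configured default")
    index.add_argument("--storage")

    search = sub.add_parser("search", help="query the index")
    search.add_argument("query", nargs="?", help="omit for an interactive session")
    search.add_argument("--mode", choices=MODES, default="hybrid")

    sub.add_parser("status", help="show configuration and storage stats")
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    return build_parser().parse_args(argv)
```

Making `query` optional (`nargs="?"`) is what lets the same `search` subcommand serve both single-query and interactive use.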

Search Modes

Mode	Best For	Description
`hybrid`	General queries	Combined local + global (default)
`local`	Specific facts	Names, numbers, definitions
`global`	Summaries	Themes, trends, overviews
`naive`	Exact terms	Basic vector search over chunks, no knowledge graph

Storage Structure

After indexing, rag_storage/ contains:

File	Description
`config.json`	User configuration
`kv_store_full_docs.json`	Full document text
`kv_store_text_chunks.json`	Semantic chunks
`kv_store_full_entities.json`	Extracted entities
`vdb_*.json`	Vector embeddings
`graph_*.graphml`	Knowledge graph
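The document and chunk counts that `status` reports can be read from these files. This sketch assumes each `kv_store_*.json` file holds one JSON object keyed by record ID, which is how LightRAG's JSON key-value storage is laid out as far as we know but may vary between versions; `storage_stats` is a hypothetical helper, and "MB" here means 1024 × 1024 bytes.

```python
import json
from pathlib import Path
from typing import Dict, Union


def count_records(path: Path) -> int:
    """Number of top-level keys in a kv_store JSON object (0 if missing)."""
    if not path.exists():
        return 0
    data = json.loads(path.read_text(encoding="utf-8"))
    return len(data) if isinstance(data, dict) else 0


def storage_stats(storage_dir: Path) -> Dict[str, Union[int, float]]:
    if not storage_dir.is_dir():
        raise FileNotFoundError(f"No indexed data found in {storage_dir}")
    total_bytes = sum(p.stat().st_size for p in storage_dir.rglob("*") if p.is_file())
    return {
        "documents": count_records(storage_dir / "kv_store_full_docs.json"),
        "chunks": count_records(storage_dir / "kv_store_text_chunks.json"),
        "size_mb": round(total_bytes / (1024 * 1024), 1),
    }
```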

Example Session

User: /pdf-research ~/Documents/papers, please index these

Claude: [Runs indexing]
        Indexing complete!
        - Documents: 5
        - Chunks: 247
        - Storage: 32.5 MB

User: Tell me about AI talent development strategies

Claude: [Runs search]
        Based on the indexed documents...
        [Detailed response with references]

Troubleshooting

"OPENAI_API_KEY not set"

export OPENAI_API_KEY=sk-your-key

"No indexed data found"

python pdf_research.py index /path/to/pdfs

"Module not found" errors

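pip install "lightrag-hku[api]" pymupdf python-dotenv  # quotes stop zsh from treating [api] as a glob
# or, from the scripts directory: pip install -r requirements.txt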

Dependencies

Python 3.10+
lightrag-hku[api]>=1.4.9
pymupdf>=1.24.0
python-dotenv>=1.0.0
OpenAI API key
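Installed versions can be checked against these minimums with the standard library. A sketch under stated assumptions: the distribution names and minimums are copied from the list above, `check_versions` and `at_least` are hypothetical helpers, and the version parsing is simplified (pre-release suffixes such as `rc1` are ignored; `packaging.version` handles full PEP 440 if it is available).

```python
import re
from importlib import metadata
from typing import Dict, Optional, Tuple

# Minimum versions copied from the Dependencies list.
MINIMUMS = {
    "lightrag-hku": "1.4.9",
    "pymupdf": "1.24.0",
    "python-dotenv": "1.0.0",
}


def version_tuple(version: str) -> Tuple[int, ...]:
    """Numeric prefix of a version string, e.g. '1.24.10' -> (1, 24, 10)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    return tuple(int(p) for p in match.group(0).split(".")) if match else ()


def at_least(found: str, minimum: str) -> bool:
    a, b = version_tuple(found), version_tuple(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that "1.24" compares equal to "1.24.0".
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


def check_versions(installed: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Map each unmet requirement to a problem description.

    Pass `installed` to test without touching the real environment.
    """
    problems = {}
    for name, minimum in MINIMUMS.items():
        if installed is not None:
            found = installed.get(name)
        else:
            try:
                found = metadata.version(name)
            except metadata.PackageNotFoundError:
                found = None
        if found is None:
            problems[name] = f"not installed (need >= {minimum})"
        elif not at_least(found, minimum):
            problems[name] = f"{found} installed, need >= {minimum}"
    return problems
```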
