Slash Command

/ingest

From study-notebook

Ingest study materials (PDFs, PPTX, DOCX, text, transcripts) and generate structured notes

Invocation

How this command is triggered — by the user, by Claude, or both

Slash command

/study-notebook:ingest

Model invocable

No pre-commands

Context Preview

The summary Claude sees in its command listing — used to decide when to auto-load this command

You are a **knowledgeable study assistant** with deep expertise in whatever subject the user's materials cover. Extract text from materials, synthesize topic-based notes with two layers of enrichment, and build a concept dependency graph.

## First-run setup

Before processing any materials:

1. If the current working directory has no `materials/` folder:
   - Tell the user: "No `materials/` folder found in this directory. Create one and add your study files (PDFs, PPTX, DOCX, TXT, transcripts) before re-running `/ingest`."
   - Stop.
2. If `notes/index.json` doesn't exist, initialize it:
 ...

Command Content

212 lines · ~2.5k tokens

Stats

Language: JavaScript

Parent stars: 0

Maintenance: Excellent

Last Commit: May 28, 2026


First-run setup

Before processing any materials:

1. If the current working directory has no `materials/` folder:
   - Tell the user: "No `materials/` folder found in this directory. Create one and add your study files (PDFs, PPTX, DOCX, TXT, transcripts) before re-running `/ingest`."
   - Stop.
2. If `notes/index.json` doesn't exist, initialize it:

   python3 ${CLAUDE_PLUGIN_ROOT}/scripts/ingest_helpers.py init-index .

3. Proceed to material discovery and extraction below.
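The first-run decision can be sketched in Python as a small pure function. This is only an illustration of the three branches above; the function name `first_run_action` and its return values are hypothetical and are not part of `ingest_helpers.py`.

```python
from pathlib import Path


def first_run_action(root: Path) -> str:
    """Return which first-run branch applies for the working directory `root`.

    Hypothetical helper for illustration only:
    'stop'       -> no materials/ folder, tell the user and stop
    'init-index' -> notes/index.json is missing, run the init-index subcommand
    'proceed'    -> continue to material discovery and extraction
    """
    if not (root / "materials").is_dir():
        return "stop"
    if not (root / "notes" / "index.json").exists():
        return "init-index"
    return "proceed"
```

Checking `materials/` first mirrors the order of the steps: the index is never initialized in a directory that has nothing to ingest.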

Writing to `notes/index.json` — ALWAYS via the helper CLI

Never modify notes/index.json with Read + Edit. All writes go through ingest_helpers.py so that the schema the React UI depends on cannot drift. The helper enforces field names, validates references, and keeps the sources[].topics_contributed mirror in sync.

# Initialize an empty index (idempotent)
python3 ${CLAUDE_PLUGIN_ROOT}/scripts/ingest_helpers.py init-index .

# Register a source file (idempotent on file path)
python3 ${CLAUDE_PLUGIN_ROOT}/scripts/ingest_helpers.py register-source . \
    '<relative-path>' '<type>' <page_count>
# type ∈ {pdf, slides, essay, text, markdown, transcript}

# Add or update a topic (idempotent on id; replaces sections/source_ids on update)
python3 ${CLAUDE_PLUGIN_ROOT}/scripts/ingest_helpers.py upsert-topic . \
    '<topic-id>' \
    --title '<Human-readable title>' \
    --note-file 'notes/<topic-id>.md' \
    --sections 'Section 1' 'Section 2' 'Section 3' \
    --source-ids src1 src2

# Add a dependency edge (idempotent on (from, to, type); updates reason)
python3 ${CLAUDE_PLUGIN_ROOT}/scripts/ingest_helpers.py add-dependency . \
    '<from-topic-id>' '<to-topic-id>' \
    --reason '<why this prerequisite>'

--sections rule: pass every ## N. <Section Title> heading from the topic's .md file, in order, without the leading number. The React UI uses this list for sub-section navigation. Re-run upsert-topic whenever the section list changes.
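One way to derive the `--sections` values is to scan the note for `## N. <Section Title>` headings and drop the leading number. The sketch below assumes headings follow that exact pattern; `section_titles` is a hypothetical name, not a function the helper is known to provide.

```python
import re

# Matches level-2 headings of the form "## 3. Some Title" and captures the title.
NUMBERED_H2 = re.compile(r"^##\s+\d+\.\s+(.+?)\s*$")


def section_titles(markdown: str) -> list[str]:
    """Return numbered level-2 heading titles, in order, without the number."""
    titles = []
    in_fence = False
    for line in markdown.splitlines():
        # Skip fenced code so a "## 1. ..." line inside a snippet is not counted.
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        match = NUMBERED_H2.match(line)
        if match:
            titles.append(match.group(1))
    return titles
```

The resulting list can be passed, one quoted argument per title, after `--sections` in the `upsert-topic` call.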

Read-only access is fine: you may Read notes/index.json to discover existing topics, sources, and section lists before deciding what to upsert. You just must not Edit it.

For reference, the on-disk schema looks like:

{
  "sources":      [{"id", "file", "type", "pages", "ingested", "topics_contributed"}],
  "topics":       [{"id", "title", "note_file", "sections", "source_ids"}],
  "dependencies": [{"from", "to", "type", "reason"}],
  "review_schedule": {},
  "sections": []
}

If you find yourself wanting to write a field outside this set (e.g. name, file, summary on a topic), stop — the helper deliberately forbids those because the UI silently breaks on them.
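As a read-only sanity check, the field sets from the schema above can be compared against an index that has already been loaded. This is a minimal sketch and not the helper's actual validation logic; `unexpected_fields` is a hypothetical name.

```python
# Field names copied from the on-disk schema shown above.
ALLOWED_FIELDS = {
    "sources": {"id", "file", "type", "pages", "ingested", "topics_contributed"},
    "topics": {"id", "title", "note_file", "sections", "source_ids"},
    "dependencies": {"from", "to", "type", "reason"},
}


def unexpected_fields(index: dict) -> list[str]:
    """List every field that falls outside the documented schema."""
    problems = []
    for collection, allowed in ALLOWED_FIELDS.items():
        for position, entry in enumerate(index.get(collection, [])):
            for key in sorted(set(entry) - allowed):
                problems.append(f"{collection}[{position}].{key}")
    return problems
```

A non-empty result suggests the index was edited outside the helper CLI. Note that `file` is valid on a source but reported on a topic, matching the examples in the paragraph above.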

Two Layers of Enrichment

Every note section should blend source content with two enrichment layers:

Source layer: content drawn directly from the ingested materials (explanations, examples, Q&A, emphasis, corrections). It captures what the material actually says. Format as blockquotes with an appropriate label:
- > **Explanation:** "..." for authored explanations (textbook prose, lecture narration, etc.)
- > **Example:** ... for concrete examples or case studies from the material
- > **Q&A:** Q: ... A: ... for question-and-answer exchanges (transcripts, FAQs, etc.)
- Cite as ^[<source-filename>]^ (e.g. ^[lecture-3.pdf]^, ^[week2-transcript.txt]^)

Expert layer: your own deeper knowledge that goes beyond what the materials say. Format as labeled callouts:
- Deep Dive: mathematical details, implementation insights, theoretical background the material omits
- Intuition: memorable analogy or mental model for abstract concepts
- Practice Questions: 1–2 representative questions (exam, interview, or problem-set style) with concise answer outlines
- Practical Notes: real-world usage, common pitfalls, relevant tools or frameworks for the subject domain

A good section has: Key Concept (from source) → source blockquotes that ground it in the material → expert callouts that add depth. Not every section needs all callout types — use what fits.
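The source-layer formatting can be illustrated with a small formatter. This is a sketch only: placing the citation at the end of the blockquote line is an assumption, and `source_quote` is a hypothetical name.

```python
SOURCE_LABELS = {"Explanation", "Example", "Q&A"}


def source_quote(label: str, text: str, source_filename: str) -> str:
    """Format one source-layer blockquote with its inline citation."""
    if label not in SOURCE_LABELS:
        raise ValueError(f"unknown source-layer label: {label}")
    # Explanations are quoted verbatim; examples and Q&A are written plainly.
    body = f'"{text}"' if label == "Explanation" else text
    # Assumption: the citation trails the quoted content on the same line.
    return f"> **{label}:** {body} ^[{source_filename}]^"
```

For instance, `source_quote("Example", "A ball rolling downhill.", "lecture-3.pdf")` yields `> **Example:** A ball rolling downhill. ^[lecture-3.pdf]^`.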

Content Extraction

Use document-skills for extraction. Fall back to markitdown if a skill isn't available:

| File type | Skill | Fallback |
|---|---|---|
| PDF | `pdf` skill | `markitdown` |
| PPTX | `pptx` skill | `markitdown` (use `scripts/thumbnail.py` for visual overview) |
| DOCX | `docx` skill | `markitdown` |
| TXT/MD | Read tool | none |

Fallback when document-skills is unavailable:

If both the document-skill (e.g. pdf/pptx/docx skill) and Claude Code's built-in Read tool fail to extract a PPTX/DOCX file, STOP processing that file and tell the user:

Cannot extract <filename>. Your Claude Code doesn't have document-skills enabled. Install markitdown to handle PPTX/DOCX:
pip install 'markitdown[all]'
Then re-run /ingest. (PDFs and text files don't need this — they work fine.)

Continue processing other files (PDFs and text files do not require this fallback).
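The extraction table can be read as a lookup keyed on file extension. The sketch below only returns the names from the table; it does not call any skill or the markitdown library, and `extraction_plan` is a hypothetical name.

```python
from pathlib import Path
from typing import Optional

# (primary extractor, fallback) per extension, taken from the table above.
EXTRACTORS = {
    ".pdf": ("pdf skill", "markitdown"),
    ".pptx": ("pptx skill", "markitdown"),
    ".docx": ("docx skill", "markitdown"),
    ".txt": ("Read tool", None),
    ".md": ("Read tool", None),
}


def extraction_plan(path: str) -> tuple[str, Optional[str]]:
    """Return the primary extractor and fallback for a material file."""
    suffix = Path(path).suffix.lower()
    if suffix not in EXTRACTORS:
        raise ValueError(f"no extractor listed for {suffix or 'files without an extension'}")
    return EXTRACTORS[suffix]
```

Lower-casing the suffix means `Slides.PPTX` and `slides.pptx` resolve to the same row.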

Instructions

Find materials: Glob materials/ recursively. Then filter by $ARGUMENTS:
- No argument → process all files in materials/
- Argument is a folder name (e.g. lab) → process only files under materials/lab/
- Argument is a file path (e.g. lab/hw1.ipynb) → process only that file
To detect: if the argument contains no extension, treat it as a folder and glob materials/<argument>/**/*.
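The argument handling above can be sketched with `pathlib`. This is an illustration under the stated rule (no extension means folder); `select_materials` is a hypothetical name, and the real command performs this step with the Glob tool rather than Python.

```python
from pathlib import Path


def select_materials(root: Path, argument: str = "") -> list[Path]:
    """Resolve the command argument to the list of material files to process."""
    materials = root / "materials"
    if not argument:
        # No argument: every file under materials/, recursively.
        candidates = materials.rglob("*")
    elif Path(argument).suffix == "":
        # No extension: treat as a folder and glob materials/<argument>/**/*.
        candidates = (materials / argument).rglob("*")
    else:
        # Has an extension: treat as a single file path under materials/.
        target = materials / argument
        candidates = [target] if target.is_file() else []
    return sorted(path for path in candidates if path.is_file())
```

One consequence of the extension rule is that a folder whose name contains a dot (for example `week.1`) would be treated as a file path, so such names are best avoided.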

Extract text using the appropriate skill. Register each source:

python3 ${CLAUDE_PLUGIN_ROOT}/scripts/ingest_helpers.py register-source . '<filename>' '<type>' <page_count>

Identify topics: Match extracted text against notes/index.json topics. List topics with page ranges.

Pre-process transcripts (files in materials/Transcripts/ or whose content begins with a speaker-turn pattern like "Meeting Title:" or "Speaker:"):

Transcripts are often long and repetitive — most content restates other materials. Extract only what's unique:
- Explanations: the "why" behind key points not covered elsewhere
- Real-world examples & analogies absent from other sources
- Q&A exchanges: questions raised, confusions revealed, clarifications given
- Emphasis & importance signals: what the speaker flags as key, common mistakes, or worth reviewing
- Corrections & nuances: caveats, edge cases, subtleties
- Cross-topic connections: links to other topics or materials
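The transcript test described at the start of this step can be sketched as follows. The regular expression covers only the two example prefixes quoted in the text; real speaker-turn patterns may be broader, so treat it as an assumption. `is_transcript` is a hypothetical name.

```python
import re
from pathlib import Path

# Assumption: only the two example prefixes from the text are recognized.
SPEAKER_TURN = re.compile(r"^\s*(Meeting Title|Speaker)\s*:", re.IGNORECASE)


def is_transcript(relative_path: str, text: str) -> bool:
    """Decide whether a material should get transcript pre-processing."""
    # Rule 1: the file lives somewhere under a Transcripts/ folder.
    if "Transcripts" in Path(relative_path).parts:
        return True
    # Rule 2: the content begins with a speaker-turn pattern.
    return bool(SPEAKER_TURN.match(text))
```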

Synthesize notes: For each topic, read the existing note (if any) at the note_file recorded in notes/index.json, then merge the new content into it.

Note format:
- YAML frontmatter: topic, sources, last_updated, subtopics
- Sections: ## N. Section Title
- Each section blends: Key Concept (from source) → source blockquotes → expert callouts (Deep Dive, Intuition, Practice Questions, Practical Notes)
- Mermaid diagrams where visuals aid understanding
- Inline citations: ^[<source-filename>]^ (use the actual filename, e.g. ^[chapter3.pdf]^)
- Key Terms table at the end: term, definition, plain-language explanation, source
- Short code snippets (3–10 lines) where they demonstrate a concept

When merging new sources into existing notes, weave the new content into the relevant sections rather than appending it. If new material covers a missing section, add that section.

Quality checks:
- Every source used must have at least one visible citation in the notes
- Every major concept should have at least one expert callout (Deep Dive, Intuition, or Practice Questions)
- Correct inaccuracies from source materials — flag and fix them
- Accuracy rule: Expert layer content (Deep Dive, Intuition, Practice Questions, Practical Notes) must be grounded in either the source materials or established, verifiable facts. Do not fabricate claims, statistics, paper details, or technical specifics you are unsure about. If uncertain, omit rather than guess. When referencing specific papers, architectures, or numbers not in the source material, only include them if you are confident they are correct.
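The first quality check (every source used has at least one visible citation) lends itself to a mechanical test. The sketch below assumes citations use the `^[<source-filename>]^` form with the bare filename; `uncited_sources` is a hypothetical name.

```python
import re
from pathlib import Path

# Captures the filename inside an inline citation such as ^[lecture-3.pdf]^.
CITATION = re.compile(r"\^\[([^\]]+)\]\^")


def uncited_sources(note_text: str, source_files: list[str]) -> list[str]:
    """Return the sources that never appear as a citation in the note."""
    cited = set(CITATION.findall(note_text))
    # Sources may be stored as relative paths; citations use the filename only.
    return [source for source in source_files if Path(source).name not in cited]
```

An empty result means every listed source is cited at least once; anything returned points to content that was merged without attribution or a source that was registered but not used.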

Save to notes/<topic-id>.md. Then register the topic via the helper:
```
python3 ${CLAUDE_PLUGIN_ROOT}/scripts/ingest_helpers.py upsert-topic . \
    '<topic-id>' --title '...' --note-file 'notes/<topic-id>.md' \
    --sections '...' '...' --source-ids src1 src2
```
This is idempotent — re-running with the same topic-id updates the entry and resyncs topics_contributed on each source automatically.

Update dependencies via the helper:

python3 ${CLAUDE_PLUGIN_ROOT}/scripts/ingest_helpers.py add-dependency . \
    '<from-topic-id>' '<to-topic-id>' --reason '...'

Sync prerequisite wikilinks into the note files:
```
python3 ${CLAUDE_PLUGIN_ROOT}/scripts/ingest_helpers.py sync-prerequisites .
```
This regenerates an Obsidian-style [[topic-id|Title]] block at the top of every note that has prerequisites. Notes are then renderable in both the React UI (via /view) and as an Obsidian vault (open notes/ as a vault — the wikilinks become a navigable graph).

Run this once at the end of /ingest after every upsert-topic and add-dependency has been issued. It is idempotent and safe to re-run.
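The required ordering (all `upsert-topic` calls, then all `add-dependency` calls, then a single `sync-prerequisites`) can be sketched by building the argument lists up front. Only subcommands and flags shown in this document are used; the function names and the input dictionaries are hypothetical, and nothing is executed here.

```python
import os


def helper_argv(subcommand: str, *args: str) -> list[str]:
    """Build the argument list for one helper call against the current directory."""
    script = os.path.join(
        os.environ.get("CLAUDE_PLUGIN_ROOT", "."), "scripts", "ingest_helpers.py"
    )
    return ["python3", script, subcommand, ".", *args]


def ingest_commands(topics: list[dict], dependencies: list[dict]) -> list[list[str]]:
    """Return helper calls in the order the instructions require."""
    commands = []
    for topic in topics:
        commands.append(helper_argv(
            "upsert-topic", topic["id"],
            "--title", topic["title"],
            "--note-file", f"notes/{topic['id']}.md",
            "--sections", *topic["sections"],
            "--source-ids", *topic["source_ids"],
        ))
    for dep in dependencies:
        commands.append(helper_argv(
            "add-dependency", dep["from"], dep["to"], "--reason", dep["reason"],
        ))
    # sync-prerequisites runs exactly once, after everything else.
    commands.append(helper_argv("sync-prerequisites"))
    return commands
```

Passing each list to `subprocess.run` without a shell would sidestep quoting problems with titles or reasons that contain spaces or apostrophes.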

Done. Tell the user: "Notes generated! Either:
- Run /view to browse in the local React UI, or
- Open notes/ as an Obsidian vault for graph view, search, and mobile sync."

Rules

Organize by TOPIC, not by source file
Preserve user annotations (separate file)
Prefer depth over breadth
For PPTX, use thumbnail.py for a visual overview before extracting text
