From thinking-frameworks-skills
Normalizes inbox files (markdown, Claude JSON/JSONL, Readwise, transcripts, link captures) into clean markdown with partial frontmatter. Handles format-specific edge cases like content-block arrays and timestamp stripping.
Slash command
/thinking-frameworks-skills:normalize-format
- [Supported formats](#supported-formats)
Related skills: Called by ingest-inbox-item as step 1. Upstream of tag-by-topic, score-intuition-density, dedupe-against-corpus.
| Extension | Format | Notes |
|---|---|---|
| .md, .txt | Plain markdown | Default; passes through |
| .json | Claude.ai export | Conversation with messages array |
| .jsonl | Claude Code session | Content-block array per response |
| .md (Readwise-shaped) | Readwise export | Highlights + user notes |
| .csv | Readwise CSV | Per-row highlight |
| .vtt, .srt, .md (diarized) | Transcript | May include timestamps + speaker labels |
| .md with URL + commentary | Link capture | User's framing is the signal |
Normalize one file:
- [ ] Step 1: Detect format by extension + first-line sniff
- [ ] Step 2: Apply format-specific parse
- [ ] Step 3: Split long transcripts at topic boundaries (>3000 words)
- [ ] Step 4: Emit [{body, partial_frontmatter}, ...] list (usually one item)
- .jsonl with "type":"assistant" → Claude Code session.
- .json with "conversation" / "messages" top-level key → Claude.ai export.
- .md starting with # and Readwise boilerplate (**Highlights first synced by Readwise...**) → Readwise.
- .vtt / .srt, or .md with [HH:MM:SS] timestamp pattern, or lines prefixed with speaker labels like Me: → transcript.
- .md with ≤50 words and a prominent URL → link capture.

Plain markdown: pass body through unchanged. Title = first H1 or filename-derived.
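
The detection rules above (extension first, then a sniff of the content) could be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: the function name `detect_format`, the returned label strings, and the "at least two speaker-prefixed lines" threshold are all assumptions made for the example.

```python
import json
import re
from pathlib import Path

TIMESTAMP = re.compile(r"\[\d{2}:\d{2}:\d{2}\]")
# Heuristic for speaker labels such as "Me: ..." at the start of a line.
SPEAKER_LINE = re.compile(r"^[A-Za-z][\w ]{0,20}:\s", re.MULTILINE)
URL = re.compile(r"https?://\S+")
READWISE_MARKER = "Highlights first synced by Readwise"


def detect_format(filename: str, text: str) -> str:
    """Return a format label (label strings are hypothetical)."""
    ext = Path(filename).suffix.lower()

    if ext == ".jsonl":
        # Claude Code session: some record has "type": "assistant".
        for line in text.splitlines():
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue
            if isinstance(record, dict) and record.get("type") == "assistant":
                return "claude-code-session"
        return "plain-markdown"

    if ext == ".json":
        try:
            data = json.loads(text)
        except json.JSONDecodeError:
            return "plain-markdown"
        if isinstance(data, dict) and ("conversation" in data or "messages" in data):
            return "claude-ai-export"
        # A .json file that is not a Claude export falls back to plain markdown.
        return "plain-markdown"

    if ext == ".csv":
        return "readwise-csv"
    if ext in {".vtt", ".srt"}:
        return "transcript"

    if ext == ".md":
        if text.lstrip().startswith("#") and READWISE_MARKER in text:
            return "readwise"
        # Assumption: require two speaker-prefixed lines to limit false positives.
        if TIMESTAMP.search(text) or len(SPEAKER_LINE.findall(text)) >= 2:
            return "transcript"
        if len(text.split()) <= 50 and URL.search(text):
            return "link-capture"

    return "plain-markdown"
```

The order of the `.md` checks matters in this sketch: the Readwise check runs before the transcript check so that a Readwise export containing timestamps is not misclassified.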
Claude.ai JSON: flatten content blocks to markdown. Preserve user/assistant turn labels (**Me:** / **Claude:**). Strip system-reminder blocks. provenance.author: claude, confidence: paraphrased.
Claude Code JSONL: flatten content-block array. Drop tool_use blocks unless the adjacent user message references the tool output. Strip system reminders.
Readwise: split per-book file into one seed per highlight. Body = highlight + user note. Boilerplate stripped. For bare highlights (no user note), set provenance.confidence: quoted, density capped at 3 downstream. For user-annotated highlights, confidence: owned.
Transcript: strip timestamps. Preserve speaker labels as **Speaker:** prefixes. If >3000 words, split at topic shifts — emit multiple outputs sharing parent_source. Target ~1500 words per chunk.
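
As a rough sketch of the transcript rule above, timestamp stripping and speaker-label rewriting might look like this. The function name `clean_transcript` and all three regular expressions are assumptions for illustration; real .vtt/.srt files have more variation (styling tags, cue settings) than this handles.

```python
import re

# Inline [HH:MM:SS] markers, as found in diarized markdown.
BRACKET_TS = re.compile(r"\[\d{2}:\d{2}:\d{2}\]\s*")
# .vtt / .srt cue timing lines such as "00:00:01.000 --> 00:00:04.000".
CUE_TIMING = re.compile(
    r"^(?:\d{2,}:)?\d{2}:\d{2}[.,]\d{3}\s+-->\s+(?:\d{2,}:)?\d{2}:\d{2}[.,]\d{3}"
)
# Heuristic: a short name followed by a colon at the start of a line.
SPEAKER = re.compile(r"^([A-Za-z][\w .'-]{0,30}):\s+(.*)$")


def clean_transcript(text: str) -> str:
    """Strip timestamps and rewrite speaker labels as **Speaker:** prefixes."""
    out = []
    for raw in text.splitlines():
        line = raw.strip()
        # Drop the WEBVTT header, blank lines, cue timings, and numeric cue ids.
        # Limitation: an utterance consisting only of digits is dropped too.
        if not line or line == "WEBVTT" or line.isdigit() or CUE_TIMING.match(line):
            continue
        line = BRACKET_TS.sub("", line).strip()
        if not line:
            continue
        match = SPEAKER.match(line)
        if match:
            line = f"**{match.group(1)}:** {match.group(2)}"
        out.append(line)
    return "\n\n".join(out)
```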
Link capture: separate URL from commentary. Body = user's commentary. Frontmatter adds source.linked_url. If <50 words of commentary, flag low_commentary: true so the scorer caps density.
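
A minimal sketch of the link-capture rule above, assuming the first URL in the file is the linked one and that any further URLs are simply removed from the commentary. The function name `parse_link_capture` is hypothetical; the `source.linked_url` and `low_commentary` fields come from the rule itself.

```python
import re

URL = re.compile(r"https?://\S+")


def parse_link_capture(text: str) -> dict:
    """Separate the URL from the user's commentary."""
    match = URL.search(text)
    # Assumption: the first URL is the captured link; trim trailing punctuation.
    linked_url = match.group(0).rstrip(".,;)") if match else None

    # Body is the commentary with URLs removed and spacing tidied.
    commentary = URL.sub("", text)
    commentary = re.sub(r"[ \t]+", " ", commentary).strip()

    frontmatter = {"source": {"linked_url": linked_url}}
    if len(commentary.split()) < 50:
        # Lets the downstream scorer cap density for thin commentary.
        frontmatter["low_commentary"] = True
    return {"body": commentary, "partial_frontmatter": frontmatter}
```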
Split heuristic: paragraph break + topic-vocabulary shift (measured by tag overlap drop across adjacent paragraphs). Each chunk ~1500 words. Preserve parent_source across chunks.
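
The split heuristic could be approximated as below. Note the swapped-in technique: the skill measures tag overlap, but the tagger lives in a separate skill, so this sketch uses Jaccard overlap of each paragraph's longer words as a stand-in. The function names, the `shift_threshold` of 0.1, the half-target minimum, and the twice-target hard cap are all assumptions.

```python
import re


def vocab(paragraph: str) -> set:
    """Crude stand-in for topic tags: the set of words of 4+ letters."""
    return set(re.findall(r"[a-z]{4,}", paragraph.lower()))


def overlap(a: str, b: str) -> float:
    """Jaccard overlap between the vocabularies of two paragraphs."""
    va, vb = vocab(a), vocab(b)
    if not va or not vb:
        return 0.0
    return len(va & vb) / len(va | vb)


def split_transcript(text, parent_source, target_words=1500,
                     max_words=3000, shift_threshold=0.1):
    """Split long text at paragraph breaks where the vocabulary shifts."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    if len(text.split()) <= max_words:
        # Short enough: emit a single item, no parent_source needed.
        return [{"body": "\n\n".join(paragraphs), "partial_frontmatter": {}}]

    chunks, current, count = [], [], 0
    for para in paragraphs:
        words = len(para.split())
        shifted = bool(current) and overlap(current[-1], para) < shift_threshold
        too_long = bool(current) and count + words > 2 * target_words
        # Split on a topic shift once the chunk is reasonably sized,
        # or unconditionally if the chunk would grow past the hard cap.
        if (shifted and count >= target_words // 2) or too_long:
            chunks.append(current)
            current, count = [], 0
        current.append(para)
        count += words
    if current:
        chunks.append(current)

    return [
        {"body": "\n\n".join(chunk),
         "partial_frontmatter": {"parent_source": parent_source}}
        for chunk in chunks
    ]
```

Chunk sizes only approximate the ~1500-word target here, since the sketch never cuts inside a paragraph.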
- Files with email-style headers (From: ..., Date: ..., Subject: ...): reclassify as plain markdown or link capture.
- .json file that isn't a Claude export: treat as plain markdown and wrap in code fences.

Input (inbox/2026-04-21-claude-bnn.json):
{"conversation":{"name":"BNN variational","messages":[
{"role":"user","content":[{"type":"text","text":"help me intuit why variational inference..."}]},
{"role":"assistant","content":[{"type":"text","text":"Think of it as fitting a simple distribution..."}]}
]}}
Output:
# BNN variational
**Me:** help me intuit why variational inference...
**Claude:** Think of it as fitting a simple distribution...
With partial_frontmatter = {id: 2026-04-21-bnn-variational, title: "BNN variational", source: {type: claude-conversation, ...}, provenance: {author: claude, confidence: paraphrased}}.
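
The transformation in this example could be sketched as follows. It is an illustration under assumptions, not the skill's code: the function names are hypothetical, system reminders are assumed to appear as `<system-reminder>...</system-reminder>` spans inside text blocks, and the `id` and the elided `source` fields are left out because the example does not spell out how they are derived.

```python
import json
import re

# Assumption: system reminders are wrapped in <system-reminder> tags.
SYSTEM_REMINDER = re.compile(r"<system-reminder>.*?</system-reminder>", re.DOTALL)
LABELS = {"user": "Me", "assistant": "Claude"}


def flatten_content(content) -> str:
    """Join the text blocks of one message into a single string."""
    if isinstance(content, str):
        parts = [content]
    else:
        parts = [
            block.get("text", "")
            for block in content
            if isinstance(block, dict) and block.get("type") == "text"
        ]
    text = "\n\n".join(part for part in parts if part)
    return SYSTEM_REMINDER.sub("", text).strip()


def normalize_claude_export(raw: str) -> list:
    """Flatten a Claude.ai export into [{body, partial_frontmatter}]."""
    data = json.loads(raw)
    convo = data.get("conversation", data)
    title = convo.get("name", "Untitled")

    lines = [f"# {title}", ""]
    for message in convo.get("messages", []):
        label = LABELS.get(message.get("role"))
        text = flatten_content(message.get("content", ""))
        if label and text:
            lines += [f"**{label}:** {text}", ""]

    frontmatter = {
        "title": title,
        "source": {"type": "claude-conversation"},
        "provenance": {"author": "claude", "confidence": "paraphrased"},
    }
    body = "\n".join(lines).strip() + "\n"
    return [{"body": body, "partial_frontmatter": frontmatter}]
```

The return value is a one-item list so that callers can treat every format the same way, including formats that emit several items.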
- WARN | malformed CSV row in <file> line N to changelog.
- messages key missing: fall back to recursive text extraction; mark confidence: paraphrased regardless.
- [image: awaiting user annotation] and status: dead with reason image-only.
- .processed/ only on success).
- [{body, partial_frontmatter}, ...]: always a list, usually of length 1.
- parent_source.

npx claudepluginhub lyndonkl/claude --plugin thinking-frameworks-skills