From fastCRW
Parses local or remote PDF files into markdown or structured JSON using fastCRW. Supports CLI, MCP, and REST interfaces with options for AI summaries and structured extraction.
Slash command: /crw:crw-parse
- The source is a **file on disk** (PDF), not a web page.
CLI — crw scrape auto-detects a local file path and routes to the PDF
parser; there is no separate crw parse subcommand:
crw scrape report.pdf # → markdown to stdout
crw scrape report.pdf --format json --extract '{"type":"object","properties":{"title":{"type":"string"}}}' -o out.json
MCP (inside an agent harness):
crw_parse_file(
contentBase64="<base64-encoded PDF bytes>",
filename="report.pdf",
formats=["markdown"],
maxLength=0
# For structured JSON output:
# formats=["json"],
# jsonSchema={"type":"object","properties":{"title":{"type":"string"}}}
)
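The MCP call above expects the PDF as base64 text rather than a path. As a minimal sketch of preparing those arguments, the helper below (the name `build_parse_args` is hypothetical, not part of fastCRW) encodes raw PDF bytes and returns a dict using the field names shown in the call. How the dict is passed to the tool depends on the agent harness and is not covered here.

```python
import base64


def build_parse_args(pdf_bytes, filename, formats=("markdown",),
                     max_length=0, json_schema=None):
    """Build an argument dict mirroring the crw_parse_file fields shown above.

    Hypothetical helper: only the field names come from the document.
    """
    args = {
        # Standard base64, decoded to str so it can travel inside JSON.
        "contentBase64": base64.b64encode(pdf_bytes).decode("ascii"),
        "filename": filename,
        "formats": list(formats),
        "maxLength": max_length,  # 0 = unbounded, per the options table
    }
    if json_schema is not None:
        # Structured output pairs jsonSchema with formats=["json"].
        args["formats"] = ["json"]
        args["jsonSchema"] = json_schema
    return args
```

In practice the bytes would come from something like `pathlib.Path("report.pdf").read_bytes()`.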
REST — multipart upload, 50 MB limit, PDF only:
curl -X POST "$CRW_API_URL/v2/parse" \
-H "Authorization: Bearer $CRW_API_KEY" \
-F "file=@report.pdf" \
-F 'options={"formats":["markdown"]}'
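Because the REST endpoint is described as PDF only with a 50 MB limit, a client-side check before uploading can save a round trip. The sketch below is an illustration, not fastCRW code: `preflight` is a hypothetical name, the limit is assumed to mean 50 MiB (the document only says "50 MB"), and the PDF test is just the standard `%PDF-` prefix that PDF files begin with.

```python
MAX_UPLOAD_BYTES = 50 * 1024 * 1024  # assumption: "50 MB" interpreted as 50 MiB
PDF_MAGIC = b"%PDF-"


def preflight(pdf_bytes, limit=MAX_UPLOAD_BYTES):
    """Return a list of problems found before upload; empty means OK."""
    problems = []
    if len(pdf_bytes) > limit:
        problems.append("file exceeds the upload size limit")
    if not pdf_bytes.startswith(PDF_MAGIC):
        problems.append("file does not start with the %PDF- magic header")
    return problems
```

An empty list means the file passes both checks; the server may still reject it for other reasons.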
| Need | CLI (crw scrape <path>) | MCP field | REST options field |
|---|---|---|---|
| Output format | --format markdown\|json\|text\|links | formats | formats |
| Structured JSON | --extract '<schema>' | jsonSchema + formats:["json"] | jsonSchema + formats:["json"] |
| AI summary | --summary | formats:["summary"] | formats:["summary"] |
| Summary prompt | --prompt "TEXT" | — | summaryPrompt |
| Limit output chars | — | maxLength (0 = unbounded) | maxContentChars |
| Force parser | — | parsers:["pdf"] | parsers:["pdf"] |
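The REST column of the table can be assembled into the JSON string sent as the `options` form field. The following is a sketch under assumptions: `build_rest_options` and its parameter names are hypothetical, and only the JSON keys (`formats`, `jsonSchema`, `summaryPrompt`, `maxContentChars`, `parsers`) are taken from the table.

```python
import json


def build_rest_options(formats=("markdown",), json_schema=None,
                       summary_prompt=None, max_content_chars=None,
                       force_pdf=False):
    """Serialize the REST `options` field from the table's option names."""
    options = {"formats": list(formats)}
    if json_schema is not None:
        # The table pairs jsonSchema with formats:["json"].
        if "json" not in options["formats"]:
            raise ValueError('jsonSchema requires "json" in formats')
        options["jsonSchema"] = json_schema
    if summary_prompt is not None:
        options["summaryPrompt"] = summary_prompt
    if max_content_chars is not None:
        options["maxContentChars"] = max_content_chars
    if force_pdf:
        options["parsers"] = ["pdf"]
    return json.dumps(options)
```

The returned string is what would go after `options=` in the multipart request. Raising on a schema without the `json` format is a design choice of this sketch, made so the mismatch surfaces before the request is sent.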
Formats json and summary require a server-side LLM configured in
[extraction.llm] of the server config (or via crw setup for the CLI).
- PDF only (`%PDF-` magic header).
- No `attempt_scanned` option: scanned PDFs are a known gap.
- An LLM is required for `json`/`summary`. Without a configured LLM the request returns a 400.
- For large PDFs, write the output under `.crw/` and grep/head the output: `crw scrape big.pdf -o .crw/big.md`
- The MCP tool takes the file as `contentBase64`. The `filename` field is optional but helps with error messages.
- There is no `warning` field in the envelope. A warning (e.g. `warning: pdf_partial_text`) only appears on the CLI's stderr, never in the REST/MCP response. If you get empty markdown, assume a scanned/image-only PDF and handle it at the call site.
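Since the REST/MCP response carries no warning, the empty-markdown case has to be detected by the caller. A minimal sketch of that call-site check follows. `classify_markdown` is a hypothetical helper that receives the markdown string after the caller has pulled it out of the response; the response envelope's shape is not specified in this document and is not assumed here.

```python
def classify_markdown(markdown):
    """Label parser output so the caller can branch on likely scanned PDFs."""
    if markdown is None or not markdown.strip():
        # Empty or whitespace-only output: treat as a scanned/image-only PDF.
        return "likely_scanned"
    return "ok"
```

What to do on `"likely_scanned"` (skip the file, report it, or route it to an OCR tool outside fastCRW) is left to the caller.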