From llamacloud
This skill should be used when the user wants to "use LiteParse", "parse locally without an API key", do "offline/OSS document parsing", or work with the "LiteParse library/CLI/server/WASM". Owns local/OSS no-key document parsing and the LiteParse-vs-cloud-LlamaParse decision; defers cloud/agentic/scaled parsing, typed extraction, and managed RAG to sibling skills.
Slash command: /llamacloud:liteparse
LiteParse is an open-source, Rust-based document parser that extracts text with spatial layout, per-line bounding boxes, and page screenshots — entirely on the local machine. No cloud calls, no LLMs, no API key. Reach for it when privacy, air-gapping, latency, cost, or tight dev loops matter more than the highest achievable parse quality on messy documents.
This is distinct from cloud LlamaParse (the hosted Parse v2 service, which
needs a LLAMA_CLOUD_API_KEY and uses models for hard layouts). LiteParse trades
that ceiling for zero dependencies and full local control.
Choose LiteParse when:
Hand off to cloud LlamaParse (sibling skill llamacloud-parse) when the job
needs the highest-quality, model-driven, or agentic parsing — dense tables,
complex multi-column layouts, handwriting, or chart/figure understanding at scale.
The two compose well: prototype locally with LiteParse, escalate hard documents to
the cloud. See references/liteparse-vs-llamaparse.md for the full decision table.
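The local-versus-cloud rule above can be sketched as a small routing function. This is illustrative only: `DocumentJob` and `choose_parser` are hypothetical names, not part of LiteParse or LlamaParse, and the criteria simply restate the ones listed in this section. The decision table in references/liteparse-vs-llamaparse.md remains the source of truth.

```python
from dataclasses import dataclass


@dataclass
class DocumentJob:
    """Hypothetical description of a parsing job (not a LiteParse type)."""
    must_stay_local: bool = False    # privacy or air-gapped environment
    has_api_key: bool = False        # a LLAMA_CLOUD_API_KEY is available
    dense_tables: bool = False
    multi_column: bool = False
    handwriting: bool = False
    charts_or_figures: bool = False


def choose_parser(job: DocumentJob) -> str:
    """Return "liteparse" or "llamacloud-parse" for the given job."""
    # Local-only constraints always win: cloud parsing is not an option.
    if job.must_stay_local or not job.has_api_key:
        return "liteparse"
    # Hard layouts are where model-driven cloud parsing is worth escalating to.
    hard_layout = (
        job.dense_tables
        or job.multi_column
        or job.handwriting
        or job.charts_or_figures
    )
    return "llamacloud-parse" if hard_layout else "liteparse"
```

Defaulting to LiteParse and escalating only on hard layouts mirrors the "prototype locally, escalate hard documents" flow described above.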
LiteParse ships the same engine across four modes (library, CLI, server, and WASM); pick by where the code runs:
Library: create a LiteParse instance, call its parse method, and read text / pages / bounding boxes.
WASM (@llamaindex/liteparse-wasm): client-side, zero-server parsing where documents never touch a backend.
Mode-selection trade-offs (init cost, concurrency, deployment) live in
references/liteparse-vs-llamaparse.md.
Confirm the exact current package name and signatures via the MCP before writing production code — package names and parameters shift. Illustrative only:
Python (liteparse):
from liteparse import LiteParse
parser = LiteParse() # construct once, reuse
result = parser.parse("document.pdf") # also accepts raw bytes
print(result.text) # layout-preserving text
for page in result.pages: # per-page items with bounding boxes
    print(page.page_num, len(page.text_items))
TypeScript (@llamaindex/liteparse):
import { LiteParse } from "@llamaindex/liteparse";
const parser = new LiteParse({ ocrEnabled: true });
const result = await parser.parse("document.pdf");
console.log(result.text);
Common knobs (names vary by binding — verify live): OCR enable/language/server URL,
render dpi, output_format (plain text vs JSON with x/y/width/height),
target_pages (e.g. "1-10"), and password for encrypted PDFs. Built-in
Tesseract OCR handles scans; a custom OCR server can be pointed to instead.
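To show what the JSON-style output with x/y/width/height makes possible, here is a minimal sketch that groups positioned text items back into reading-order lines. It does not call LiteParse: the items are plain dictionaries, the `text` key and the `group_into_lines` helper are hypothetical, and the sketch assumes y grows downward from the top of the page. Check the actual field names and coordinate origin against the live docs.

```python
from typing import Any, Dict, List


def group_into_lines(items: List[Dict[str, Any]], y_tolerance: float = 2.0) -> List[str]:
    """Group positioned text items into lines, top to bottom, left to right."""
    lines: List[List[Dict[str, Any]]] = []
    # Sort by vertical position first so items on the same line end up adjacent.
    for item in sorted(items, key=lambda it: (it["y"], it["x"])):
        # Compare against the first item of the current line to avoid drift.
        if lines and abs(lines[-1][0]["y"] - item["y"]) <= y_tolerance:
            lines[-1].append(item)
        else:
            lines.append([item])
    # Within each line, order items left to right and join their text.
    return [
        " ".join(str(it["text"]) for it in sorted(line, key=lambda it: it["x"]))
        for line in lines
    ]
```

The tolerance absorbs small baseline differences between items on the same visual line; it is a tuning value chosen for illustration, not something LiteParse defines.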
For chunking, embedding, and retrieval over these outputs, this skill stops at parse; the downstream RAG pipeline belongs to sibling skills.
Package names, class/method signatures, parameter names, CLI flags, and supported formats change. Look them up live:
mcp__plugin_llamacloud_docs__search_docs: concept/BM25 search.
mcp__plugin_llamacloud_docs__read_doc: read a known doc page.
mcp__plugin_llamacloud_docs__grep_docs: exact symbol/regex search.
Append index.md to any https://developers.llamaindex.ai/... URL and WebFetch it.
Anchor docs for this pillar:
https://developers.llamaindex.ai/liteparse/ - overview, modes, formats.
https://developers.llamaindex.ai/liteparse/guides/library-usage/ - library API.
https://developers.llamaindex.ai/liteparse/guides/ - CLI, server, WASM, and OCR specifics (verify exact subpaths via the MCP rather than guessing).
Sibling skills: llamacloud-parse, llamacloud-extract, llamacloud-index.
Keep LiteParse scoped to local, no-key parsing and the local-vs-cloud decision.
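The index.md lookup for the docs URLs listed above can be sketched as a one-line helper. This assumes the instruction means appending index.md to the page URL; `markdown_doc_url` is a hypothetical name, and fetching the resulting URL is left to WebFetch or any HTTP client.

```python
def markdown_doc_url(page_url: str) -> str:
    """Return the assumed markdown URL for a docs page."""
    # Normalise to exactly one slash between the page path and the file name.
    return page_url.rstrip("/") + "/index.md"
```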
npx claudepluginhub jbaham2/llamacloud-plugin --plugin llamacloud