From llamacloud
This skill should be used when the user wants to "extract structured data from a document", "design/validate an extraction schema", write a "Pydantic/JSON/Zod extract schema", configure "LlamaExtract options/citations", or "pull typed fields from PDFs". Owns LlamaExtract v2 schema design and configuration judgment; defers raw text/markdown parsing, RAG/retrieval, and account/key/region setup to sibling skills.
Slash command: /llamacloud:llamacloud-extract
LlamaExtract turns unstructured documents (PDFs, images, text) into typed output that conforms to a schema you define. Use it when the goal is schema-conformant data for a database, dashboard, spreadsheet, or downstream model — not document text for a human or a RAG pipeline.
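Hand off to a sibling skill when the task is something else:
- Raw text or markdown parsing: llamacloud-parse.
- RAG or retrieval: llamacloud-index.
- Account, API key, or region setup: llamacloud.
Extract is built on parsing, but its contract is schema conformance and data quality, which is why schema design is where most of the work and most of the wins are.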
Extract v2 is the primary, GA path — design all new work against it. The
/v1/ extract API is legacy and exists only as a migration reference; never make
it the default. If you encounter v1 code, treat porting it to v2 as the goal.
- Schema design, per references/schema-design.md: object root, shallow nesting, strong field names, descriptions that carry format and examples, optional booleans and integers, prune auto-generated fields, start minimal and iterate.
- Options and tiers, per references/options-and-tiers.md.
Author the schema as a typed model in your language (Pydantic in Python, Zod in TypeScript) or hand-write JSON Schema; all compile to JSON Schema for the API. Key durable rules:
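- Schema keywords such as default and format are not enforced; encode them in descriptions or post-process.
- Make booleans and integers optional so that a value missing from the document is not reported as false/0.
Full rules, the optional-field syntaxes, caps, and the split-and-merge strategy for oversized schemas are in references/schema-design.md.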
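Because default and format are not enforced, the same intent can be written into field descriptions and then checked after extraction. The following is a minimal sketch, assuming a plain dict result and two hypothetical fields (invoice_date, currency); none of these names come from the LlamaExtract API, and the default currency is an invented business rule.

```python
import re
from datetime import datetime

# Hand-written JSON Schema: object root, with the expected format and the
# fallback behaviour stated in descriptions (hypothetical fields).
INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "invoice_date": {
            "type": "string",
            "description": "Issue date in ISO format YYYY-MM-DD, e.g. 2024-03-31.",
        },
        "currency": {
            "anyOf": [{"type": "string"}, {"type": "null"}],
            "description": "ISO 4217 code, e.g. USD. Null if not stated.",
        },
    },
    "required": ["invoice_date"],
}

DEFAULT_CURRENCY = "USD"  # assumed default, applied by our code, not by the API


def post_process(result: dict) -> dict:
    """Enforce the format and default that the schema can only describe."""
    cleaned = dict(result)
    date_text = cleaned["invoice_date"]
    # Check the shape first, then let strptime reject impossible dates.
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", date_text):
        raise ValueError(f"invoice_date is not YYYY-MM-DD: {date_text!r}")
    datetime.strptime(date_text, "%Y-%m-%d")
    if cleaned.get("currency") is None:
        cleaned["currency"] = DEFAULT_CURRENCY
    return cleaned
```

Keeping the check in code means a badly formatted date fails loudly instead of flowing into a database unnoticed.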
Decision tables and the tuning loop are in references/options-and-tiers.md.
Python (confirm the current package name via the MCP — it has shifted between releases):
from typing import Optional

from pydantic import BaseModel


class Invoice(BaseModel):
    invoice_number: str  # field name = JSON key
    total_amount: float
    paid: Optional[bool] = None  # optional so absent != False


# Submit `Invoice` as the schema to an extraction agent, then validate the
# returned dict back through Invoice(**result) and post-process.
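The validate-and-post-process step described in the comment above can be sketched without committing to any SDK call. This stand-in uses only the standard library and a hypothetical result dict; with Pydantic installed, Invoice(**result) would do the type checking instead. The helper names are illustrative, not part of any library.

```python
from typing import Optional


def check_invoice(result: dict) -> dict:
    """Stdlib stand-in for Invoice(**result): check types field by field."""
    if not isinstance(result.get("invoice_number"), str):
        raise ValueError("invoice_number must be a string")
    total = result.get("total_amount")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(total, bool) or not isinstance(total, (int, float)):
        raise ValueError("total_amount must be a number")
    paid = result.get("paid")
    if paid is not None and not isinstance(paid, bool):
        raise ValueError("paid must be true, false, or absent")
    return {
        "invoice_number": result["invoice_number"],
        "total_amount": float(total),
        "paid": paid,
    }


def paid_status(paid: Optional[bool]) -> str:
    """Keep 'not stated in the document' distinct from 'stated as unpaid'."""
    if paid is None:
        return "unknown"
    return "paid" if paid else "unpaid"


# Hypothetical extraction result in which the document never mentions payment.
invoice = check_invoice({"invoice_number": "INV-001", "total_amount": 120})
print(paid_status(invoice["paid"]))  # unknown
```

The three-way status is the reason the boolean is optional: a required boolean would force the missing case into "unpaid".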
TypeScript (Zod):
import { z } from "zod";

const Invoice = z.object({
  invoice_number: z.string(),
  total_amount: z.number(),
  paid: z.boolean().nullable(), // null when not stated, so absent != false
});
These show shape, not the current API surface. Confirm the exact package name, agent/client classes, and method/parameter names via MCP (below) — the SDK package name has shifted between releases, so do not assert it from memory.
Look up exact, stale-prone details live (package name, class/method signatures, parameter and enum values, the current caps, page-range syntax, pricing):
- mcp__plugin_llamacloud_docs__search_docs: concept/BM25 search.
- mcp__plugin_llamacloud_docs__read_doc: read a known doc page.
- mcp__plugin_llamacloud_docs__grep_docs: exact symbol/regex search.
- Append index.md to any https://developers.llamaindex.ai/... page URL and WebFetch it.
Anchor docs for this pillar:
- https://developers.llamaindex.ai/llamaparse/extract/
- https://developers.llamaindex.ai/llamaparse/extract/guides/schema_design/
- https://developers.llamaindex.ai/llamaparse/extract/guides/options/
For exact, current SDK signatures, prefer the MCP over any snippet in this skill.
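The index.md route mentioned above amounts to a small URL rewrite. This is a sketch, assuming the suffix is joined after a single trailing slash; that join rule is an assumption and the helper name is hypothetical. The fetch itself is left to WebFetch.

```python
def markdown_url(page_url: str) -> str:
    """Return the doc page URL with index.md appended (hypothetical helper)."""
    # Assumption: doc pages end in "/", so normalise to exactly one slash first.
    return page_url.rstrip("/") + "/index.md"


print(markdown_url("https://developers.llamaindex.ai/llamaparse/extract/"))
# https://developers.llamaindex.ai/llamaparse/extract/index.md
```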
Sibling skills: llamacloud-parse, llamacloud-index, llamacloud.
npx claudepluginhub jbaham2/llamacloud-plugin --plugin llamacloud