From fastCRW
Extracts typed JSON from one or more web pages using a JSON Schema with fastCRW's scrape-based extraction pipeline. Use for structured data like prices, stock status, or job listings.
How this skill is triggered — by the user, by Claude, or both
Slash command
/crw:crw-extractThis skill is limited to the following tools:
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
- You need a **structured JSON object** from a page, not prose.
formats:["json"] instead.prompt to describe what you want.crw has no /v1/extract endpoint (Firecrawl's dedicated extract API). Instead,
extraction runs in two ways, both backed by the same LLM pipeline:
| Path | When to use | Sync? |
|---|---|---|
Per-page: formats:["json"] + jsonSchema on scrape | Single URL, or inline during a crawl | Synchronous |
Async multi-URL: POST /v2/extract → poll GET /v2/extract/{id} | Many URLs, fire-and-forget | Async |
/v2/extract is marked deprecated in the server (it recommends /v2/scrape
with formats:["json"]), but it works and is useful for multi-URL batches.
Requires a server-side LLM. Set [extraction.llm] in the server config
(provider, api_key, model). Without it, requests return an error (HTTP 4xx) —
e.g. 422 "no LLM configured". Use crw setup to configure the LLM for the CLI.
CLI — per-page extraction via --extract:
# Inline schema
crw scrape "https://example.com/product" \
--extract '{"type":"object","properties":{"price":{"type":"number"},"inStock":{"type":"boolean"}}}'
# Schema from file
crw scrape "https://example.com/job" --extract @schema.json -o result.json
MCP — pass formats:["json"] with jsonSchema on a scrape:
crw_scrape(
url="https://example.com/product",
formats=["json"],
extract={"schema": {"type":"object","properties":{"price":{"type":"number"}}}}
)
Note: the MCP crw_scrape accepts extract.schema (Firecrawl style). The
REST API also accepts jsonSchema as a top-level alias.
REST — per-page (synchronous):
curl -X POST "$CRW_API_URL/v1/scrape" \
-H "Authorization: Bearer $CRW_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com/product",
"formats": ["json"],
"jsonSchema": {
"type": "object",
"properties": {
"price": {"type": "number"},
"inStock": {"type": "boolean"}
}
}
}'
REST — async multi-URL (deprecated endpoint, still functional):
# Start job
curl -X POST "$CRW_API_URL/v2/extract" \
-H "Authorization: Bearer $CRW_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"urls": ["https://example.com/p1", "https://example.com/p2"],
"schema": {"type":"object","properties":{"price":{"type":"number"}}}
}'
# → {"success":true,"id":"<uuid>","warnings":["...use /v2/scrape..."], ...}
# Poll until completed
curl "$CRW_API_URL/v2/extract/<uuid>" -H "Authorization: Bearer $CRW_API_KEY"
# → {"success":true,"status":"completed|scraping|failed","data":{...}}
| Need | CLI | MCP / REST |
|---|---|---|
| JSON schema | --extract '<schema>' or @file.json | jsonSchema / extract.schema |
| Free-text prompt (no schema) | — | prompt on /v2/extract |
| Save output | -o FILE | write the response yourself |
| Multi-URL async | not available | POST /v2/extract with urls:[...] |
| LLM override | --llm-provider, --llm-key, --llm-model | server config only |
prompt is useful for exploration but less reliable.
Start with a schema when you know the fields you want.crw_crawl accepts a jsonSchema parameter
— each page in the crawl gets extracted against the schema, saving a second
round-trip.data.json in the scrape response. Per-page extraction lands in
data.json, not data.markdown. The markdown field is also populated for
reference.npx claudepluginhub us/crwNavigates complex multi-page websites and returns structured JSON data. Autonomous AI agent suitable for extracting pricing tiers, product listings, and directory entries with or without a JSON schema.
Automates web crawling and data extraction using Firecrawl: scrape pages, crawl sites, extract structured data with AI, batch URLs, and map site structures.
Extracts structured data (tables, lists, prices) from web pages via multi-strategy scraping with pagination, validation, transforms, and CSV/JSON/Markdown export.