From fastCRW
Scrapes a single known URL into clean markdown, HTML, links, or structured JSON via the fastCRW CLI or MCP. Handles JavaScript-rendered SPAs automatically.
How this skill is triggered — by the user, by Claude, or both
Slash command
/crw:crw-scrapeThis skill is limited to the following tools:
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
- You have one (or a handful of) **known URLs** and want their content.
CLI (binary on PATH):
crw scrape "https://example.com" # → markdown to stdout
crw scrape "https://example.com" --format json -o page.json
crw scrape "https://example.com" --js --css "article.main"
crw scrape "https://example.com" --format links -o .crw/links.txt
MCP (inside an agent harness):
crw_scrape(url="https://example.com", formats=["markdown"], onlyMainContent=true)
REST (drop-in for Firecrawl SDKs — just swap the base URL):
curl -X POST "$CRW_API_URL/v1/scrape" -H "Authorization: Bearer $CRW_API_KEY" \
-H 'Content-Type: application/json' \
-d '{"url":"https://example.com","formats":["markdown"],"onlyMainContent":true}'
| Need | CLI flag | MCP / REST field |
|---|---|---|
| Output format | --format markdown|html|rawhtml|text|links|json | formats: [...] |
| Strip nav/footer/sidebar | (on by default; --raw to disable) | onlyMainContent: true |
| Force JS rendering | --js | renderJs: true (null = auto) |
| Wait after load | — | waitFor: 2000 (ms) |
| Keep only selectors | --css "article" / --xpath … | includeTags: ["article"] |
| Drop selectors | — | excludeTags: ["nav","footer"] |
| Pick renderer | — | renderer: "auto|lightpanda|chrome|chrome_proxy|playwright" (auto is default) |
| Save to file | -o FILE | (write the response yourself) |
| Structured JSON | --extract '<schema>' | extract: {schema: {...}} — see crw-extract |
| Use a proxy | --proxy URL --stealth | proxy, proxyRotation, stealth |
? and & are shell-special. Always wrap in quotes.crw scrape … & and
wait, or issue parallel MCP calls.--js / renderJs: true, optionally a
waitFor. crw's auto-detect covers most SPAs without it..crw/, then grep/head.
MCP truncates to ~15 000 chars (maxLength: 0 to opt out).--format json returns the raw full-page
object (metadata + content), not schema-extracted data. For structured
extraction against a schema use --extract '<schema>' — this calls an LLM
and requires a configured LLM provider. See the dedicated
crw-extract skill.npx claudepluginhub us/crwExtracts clean markdown from any URL, including JavaScript-rendered SPAs. Supports concurrent scraping, JS wait times, and content filtering.
Scrapes web pages and websites using Firecrawl API, converting to clean markdown. Handles JavaScript rendering, anti-bot protection, paywalled content, and dynamic sites for articles, blogs, docs.
Extracts clean markdown or text from URLs via the Tavily CLI. Handles JavaScript-rendered pages, supports query-focused chunking, and processes up to 20 URLs per call.