By us
Self-host a web scraping and data extraction pipeline with a Firecrawl-compatible API, enabling crawl, map, scrape, search, PDF parsing, and change detection via CLI, MCP, or REST, using a low-memory Rust binary with SearXNG integration.
Reference skill for building production-ready crw integrations. Covers verb selection, call surfaces (CLI/MCP/REST), post-filtering strategies, context-window hygiene, Hybrid RAG patterns, common pitfalls, and crw-specific operational considerations (SearXNG limits, renderer pool, proxy rotation). Load this when writing application code that embeds crw, designing a multi-step agent workflow, or debugging an integration that isn't behaving as expected.
Crawl an entire website or section and extract content from every page. Use when you need content from many pages under a common URL prefix: "crawl the whole site", "get all docs pages", "scrape every blog post", "download the full docs for RAG", "extract all pages under /api". Async BFS — starts a job and polls for results. Step 4 of the crw workflow ladder.
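The start-then-poll pattern described above can be sketched as follows. This is a minimal illustration, not crw's own client: `get_status` is a hypothetical caller-supplied function (for example, one that issues `GET /v1/crawl/<job_id>`), and the `status` and `data` field names are assumed from Firecrawl's public API shape rather than confirmed against crw.

```python
import time
from typing import Callable


def wait_for_crawl(
    get_status: Callable[[str], dict],
    job_id: str,
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> list:
    """Poll a crawl job until it finishes, then return its pages.

    get_status is hypothetical: it should return the job's status
    document as a dict. Field names "status" and "data" are assumptions.
    """
    waited = 0.0
    while True:
        job = get_status(job_id)
        status = job.get("status")
        if status == "completed":
            return job.get("data", [])
        if status == "failed":
            raise RuntimeError(f"crawl {job_id} failed")
        if waited >= timeout_s:
            raise TimeoutError(f"crawl {job_id} still {status!r} after {timeout_s}s")
        # Back off before asking again so the server is not hammered.
        sleep(interval_s)
        waited += interval_s
```

Passing `sleep` and `get_status` as parameters keeps the loop testable without a network or real delays.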
Programmatic web search and scrape with context isolation. Use for any research task where you need to search the web, filter results, and extract specific information — without flooding your context window with raw HTML and boilerplate. This is the single biggest token-saver in the crw skill set. Triggered by "search for", "look up", "find", "research", "what's the latest on", or any query that requires current web information. Also use when asked to "search and filter", "find the important parts", or any task where you suspect the raw output will be large (multi-page scrapes, news aggregation, competitive research).
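One way to picture the post-filtering step: reduce each result to the paragraphs that mention a keyword before anything reaches the model's context. The sketch below is illustrative only. The `url`, `title`, and `markdown` keys are assumed result fields, and `filter_results` is a hypothetical helper, not part of crw.

```python
def filter_results(results, keywords, max_chars=500):
    """Keep only paragraphs that mention a keyword, capped per result.

    results: list of dicts with assumed keys "url", "title", "markdown".
    Results with no matching paragraph are dropped entirely.
    """
    wanted = [k.lower() for k in keywords]
    kept = []
    for r in results:
        paragraphs = r.get("markdown", "").split("\n\n")
        hits = [p.strip() for p in paragraphs if any(k in p.lower() for k in wanted)]
        if not hits:
            continue
        # Truncate so a single long page cannot dominate the context window.
        excerpt = "\n\n".join(hits)[:max_chars]
        kept.append({"url": r.get("url"), "title": r.get("title"), "excerpt": excerpt})
    return kept
```

Simple substring matching is used here for brevity; a real pipeline might rank paragraphs instead of filtering them.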
Extract a typed JSON object from one or more web pages against a JSON Schema with fastCRW. Use when you need structured data — "get the price and stock status", "extract all job listings as JSON", "pull structured fields from this page". Step 6 of the crw workflow ladder.
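As a rough sketch of schema-driven extraction, the snippet below builds a request body for the "price and stock status" example and checks a returned object for missing required fields. The body keys (`urls`, `schema`, `prompt`) are assumed from Firecrawl's public extract API, not confirmed for crw, and both helper functions are hypothetical.

```python
import json

# JSON Schema for the "price and stock status" example.
PRODUCT_SCHEMA = {
    "type": "object",
    "properties": {
        "price": {"type": "number"},
        "in_stock": {"type": "boolean"},
    },
    "required": ["price", "in_stock"],
}


def build_extract_body(urls, schema, prompt=None):
    """Serialize an extract request body (field names are assumptions)."""
    body = {"urls": list(urls), "schema": schema}
    if prompt:
        body["prompt"] = prompt
    return json.dumps(body)


def missing_required(obj, schema):
    """Return the schema's required keys that are absent from obj."""
    return [key for key in schema.get("required", []) if key not in obj]
```

Checking required keys locally is a cheap guard; full validation would need a JSON Schema validator.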
Discover all URLs on a website without fetching content — fast, low-cost URL inventory via sitemap.xml + link extraction BFS. Use when you need to know which pages exist before deciding what to scrape or crawl: "list all pages", "find URLs on this site", "discover links", "what pages does this site have", "map the site". Step 3 of the crw workflow ladder.
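The link-extraction half of that discovery can be sketched as a breadth-first walk over same-host links. This is a generic BFS under stated assumptions, not crw's implementation: `get_links` is a hypothetical callback that returns the hrefs found on a page, and sitemap.xml handling is left out.

```python
from collections import deque
from urllib.parse import urldefrag, urljoin, urlparse


def map_site(start_url, get_links, limit=100):
    """Breadth-first URL discovery restricted to the start URL's host.

    get_links(url) is hypothetical: it returns the raw hrefs on a page.
    Returns URLs in the order they were visited, at most `limit` of them.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    order = []
    while queue and len(order) < limit:
        url = queue.popleft()
        order.append(url)
        for href in get_links(url):
            # Resolve relative links and drop #fragments so each page counts once.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return order
```

Because only links are collected and no page content is kept, the result is an inventory that can be filtered before any scrape or crawl is started.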
Self-hosted, Rust-native web crawler & scraper for AI agents
The open-source alternative to Firecrawl. One static binary, ~50 MB RAM idle,
Firecrawl-compatible REST API on both /v1/* and /v2/* (scrape, crawl,
map, search, extract, plus v2 batch & parse) — a drop-in for the official
Firecrawl SDKs — plus first-class MCP. Self-host free under
AGPL-3.0, or hit our managed API at api.fastcrw.com. Reproducible 63.74%
truth-recall on the public 1,000-URL dataset (diagnose_3way.py,
2026-05-08) — see fastcrw.com/benchmarks.
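To make the drop-in claim concrete, here is a minimal sketch of a scrape request built with the standard library. It only constructs the request and does not send it. The body fields (`url`, `formats`) and the Bearer-token header are assumed from Firecrawl's public API, and `build_scrape_request` is a hypothetical helper; check the crw docs before relying on the exact shape.

```python
import json
import urllib.request


def build_scrape_request(base_url, api_key, page_url):
    """Build (but do not send) a POST to <base_url>/v1/scrape.

    Body fields and the Authorization scheme are assumptions based on
    Firecrawl's public API, which crw describes itself as compatible with.
    """
    body = json.dumps({"url": page_url, "formats": ["markdown"]}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/scrape",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

Switching between the managed API and a self-hosted instance would then only change `base_url`; sending the request is a matter of passing it to `urllib.request.urlopen`.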
Built in Rust because every millisecond of agent latency compounds.
Works with: Claude Code · Cursor · Windsurf · Cline · Copilot · Continue.dev · Codex · Gemini CLI
/v1/* and /v2/* surfaces (scrape, crawl, map, search, extract; plus v2-only batch & parse) with compatible request/response shapes. The v2 API is a drop-in for the official firecrawl-py v4 SDK (FirecrawlApp(api_url="https://api.fastcrw.com")): swap the base URL and keep your code.
changeTracking primitive in the engine; scheduled monitors + signed-webhook/email alerts on the managed platform. See the Monitoring docs.
api.fastcrw.com for managed proxy network, dashboard, and SLA without the AGPL obligations on your application code.
Qualitative positioning vs. the three most-cited alternatives. Numerical claims trace to the inline sources noted; everything else is descriptive.
npx claudepluginhub us/crw