From fastCRW
Self-host a crw API server via binary, Docker, or docker-compose with SearXNG sidecar. Configure renderers, proxies, auth, and LLM extraction.
How this skill is triggered — by the user, by Claude, or both
Slash command
/crw:crw-self-hostThis skill is limited to the following tools:
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
One static binary, one config file. No Redis, no Postgres, no Node.js runtime in
One static binary, one config file. No Redis, no Postgres, no Node.js runtime in
the request path. Self-host free under AGPL-3.0, or point at api.fastcrw.com
for managed.
Choose one. All install paths run the same Rust binary.
crw-mcp) — recommended for AI agent usenpx crw-mcp # zero install (npm; embedded engine, ~6 MB RAM)
brew install us/crw/crw-mcp # Homebrew
cargo install crw-mcp # Cargo (~17 MB, full embedded)
docker run -i ghcr.io/us/crw crw-mcp # Docker
pip install crw # Python SDK (auto-downloads binary on first use)
Lean build (~4.2 MB, no headless browser engine — proxy/cloud-only mode):
cargo build --profile release-small --no-default-features -p crw-mcp
crw) — scrape from the terminalbrew install us/crw/crw
curl -fsSL https://raw.githubusercontent.com/us/crw/main/install.sh | CRW_BINARY=crw sh
cargo install crw-cli
# APT (Debian/Ubuntu):
curl -fsSL https://apt.fastcrw.com/gpg.key | sudo gpg --dearmor -o /usr/share/keyrings/crw.gpg
echo "deb [signed-by=/usr/share/keyrings/crw.gpg] https://apt.fastcrw.com stable main" \
| sudo tee /etc/apt/sources.list.d/crw.list
sudo apt update && sudo apt install crw
crw-server) — Firecrawl-compatible REST endpointFor serving multiple apps, other languages (Node.js, Go, Java), or as a shared microservice.
brew install us/crw/crw-server
curl -fsSL https://raw.githubusercontent.com/us/crw/main/install.sh | CRW_BINARY=crw-server sh
docker run -p 3000:3000 ghcr.io/us/crw
Binary directly (crw serve):
crw serve # reads config.default.toml, listens on :3000
crw serve --port 3001 # custom port (-p short form also accepted)
crw serve --config myconfig.toml # load specific config file
CRW_PORT=3001 crw-server # env-var alternative for the standalone binary
Docker (single container, no search):
docker run -p 3000:3000 ghcr.io/us/crw
curl http://localhost:3000/v1/scrape \
-H "Content-Type: application/json" \
-d '{"url":"https://example.com"}'
Docker Compose — full stack with SearXNG search:
docker compose up -d # http + LightPanda + SearXNG
docker compose --profile heavy up -d # + Chrome fallback
docker compose -f docker-compose.yml \
-f docker-compose.stealth.yml --profile stealth up -d # + browserless stealth tier
docker compose up starts three services: crw (API server), lightpanda
(JS renderer), and searxng (search backend). The crw service waits for
searxng to pass its healthcheck before accepting traffic, so the first search
request after startup doesn't race the cold start.
After compose up:
curl http://localhost:3000/health # → {"status":"ok",...}
curl -X POST http://localhost:3000/v1/search \
-H "Content-Type: application/json" \
-d '{"query":"fastCRW","limit":5}' # SearXNG-backed, no API key
docker-compose.yml bundles a SearXNG container (searxng/searxng:2026.5.9-0cba32c15)
that backs /v1/search with no API key and no per-query cost. It is internal-only —
the port is not published to the host by default. The crw container points at it as
http://searxng:8080 via the Docker bridge network (set in config.docker.toml
as [search] searxng_url = "http://searxng:8080").
To debug SearXNG directly, add to docker-compose.override.yml:
services:
searxng:
ports:
- "127.0.0.1:8888:8080"
For a standalone binary pointing at an external SearXNG:
CRW_SEARCH__SEARXNG_URL=http://my-searxng:8080 crw-server
When searxng_url is unset, /v1/search returns HTTP 400 search_disabled.
config.default.toml)Config is a TOML file. The binary loads config.default.toml by default; override
with CRW_CONFIG=<name> (loads <name>.toml). Docker uses config.docker.toml
(set via CRW_CONFIG=config.docker in docker-compose.yml).
Every key can be overridden by environment variable using the pattern
CRW_<SECTION>__<KEY> (double underscore between section and key):
CRW_SERVER__PORT=8080
CRW_RENDERER__MODE=chrome
CRW_RENDERER__POOL_SIZE=8
CRW_SEARCH__SEARXNG_URL=http://localhost:8080
CRW_CRAWLER__PROXY=http://user:pass@proxy:8080
CRW_EXTRACTION__LLM__API_KEY=sk-...
CRW_AUTH__API_KEYS='["key-one","key-two"]' # JSON array as string
[renderer] — JS rendering[renderer]
mode = "auto" # auto | lightpanda | chrome | playwright | none
pool_size = 4 # browser context pool size
page_timeout_ms = 30000
auto — tries LightPanda first (fast, ~64 MB), falls back to Chrome for
complex SPAs and Cloudflare challenges. Recommended for production.lightpanda — LightPanda only; fast p50, lower recall on heavy SPAs.chrome — Chromium only via CDP.playwright — Playwright-controlled browser.none — HTTP-only, no JS rendering.Configure renderer endpoints:
[renderer.lightpanda]
ws_url = "ws://127.0.0.1:9222/"
[renderer.chrome]
ws_url = "ws://127.0.0.1:9223/"
[search] — web search[search]
enabled = true
# searxng_url = "http://localhost:8080" # required for /v1/search to work
timeout_ms = 15000
default_limit = 5
max_limit = 20
[crawler] — proxy and stealth[crawler]
# Single proxy:
# proxy = "http://user:pass@proxy:8080"
# proxy = "socks5://user:pass@proxy:1080"
# Pool with rotation:
# proxy_list = ["http://user:pass@a:8080", "http://user:pass@b:8080"]
# proxy_rotation = "sticky_per_host" # sticky_per_host | round_robin | random
# Stealth mode (rotate UA + inject browser-like headers):
# [crawler.stealth]
# enabled = true
# inject_headers = true
# jitter_factor = 0.2
Proxy rotation applies to both the HTTP path and the Chrome/CDP path for scrape, crawl, and map. LightPanda has no proxy support and is skipped (fail-closed) when a proxy is active. SOCKS5 proxies with credentials are not supported on the Chrome path — use http/https proxies there.
[extraction.llm] — structured JSON extraction + summariesRequired to enable formats: ["json"] (schema extraction), formats: ["summary"],
and /v1/search with answer: true.
[extraction.llm]
provider = "anthropic" # "anthropic" | "openai" | "deepseek" | "azure" | "openai-compatible"
api_key = "sk-..." # or CRW_EXTRACTION__LLM__API_KEY env var
model = "claude-sonnet-4-20250514"
max_tokens = 4096
# base_url = "https://custom-endpoint.example.com" # for openai-compatible APIs
max_concurrency = 4
max_html_bytes = 100000
# DeepSeek example:
# provider = "deepseek"
# api_key = "..."
# model = "deepseek-chat"
# base_url = "https://api.deepseek.com/v1"
# Azure OpenAI example:
# provider = "azure"
# api_key = "..."
# model = "gpt-4o-mini" # Azure deployment name
# base_url = "https://<resource>.openai.azure.com"
# azure_api_version = "2024-05-01-preview"
[auth] — API key authBy default, self-hosted crw requires no auth. Add keys to lock it down:
[auth]
api_keys = ["crw-key-one", "crw-key-two"]
Empty (or absent) api_keys = no auth required — good for local/trusted-network
deployments. On managed api.fastcrw.com, Bearer auth is always required.
[document] — PDF parsing[document]
enabled = true
max_pages = 0 # 0 = no limit
max_upload_bytes = 52428800 # 50 MiB cap for POST /v2/parse uploads
upload_concurrency = 4
max_concurrent_parses = 4
parse_timeout_ms = 30000
sandbox = false # set true in Docker (untrusted uploads)
crw-mcp runs in one of two modes determined by the CRW_API_URL env var:
Embedded mode (no CRW_API_URL): the Rust engine runs in-process inside the
MCP process. No separate server needed. ~6 MB RAM. Zero setup.
npx crw-mcp # embedded
claude mcp add crw -- npx crw-mcp # Claude Code, embedded
Proxy mode (CRW_API_URL set): MCP forwards all calls to the REST endpoint.
Use this when you want a shared crw-server to serve multiple agents, or to
point at api.fastcrw.com.
CRW_API_URL=https://api.fastcrw.com CRW_API_KEY=crw_live_... npx crw-mcp
# Claude Code:
claude mcp add -e CRW_API_URL=https://api.fastcrw.com -e CRW_API_KEY=crw_live_... \
crw -- npx crw-mcp
# Health (no auth)
curl http://localhost:3000/health
# → {"status":"ok","version":"..."}
# Scrape (no auth on default self-host)
curl -X POST http://localhost:3000/v1/scrape \
-H "Content-Type: application/json" \
-d '{"url":"https://example.com","formats":["markdown"]}' | jq .success
# → true
# Search (requires SearXNG sidecar via docker compose up)
curl -X POST http://localhost:3000/v1/search \
-H "Content-Type: application/json" \
-d '{"query":"hello","limit":3}' | jq .success
# → true (or 400 search_disabled if SearXNG not wired up)
SEARXNG_SECRET_KEY via .env (openssl rand -hex 32) — the compose default
is a placeholder acceptable only for local use.[document] sandbox = true if you accept untrusted PDF uploads (Docker
images do this by default in config.docker.toml).[auth] api_keys and put a reverse proxy
(nginx/Caddy) in front for TLS.rate_limit_rps = 10 in config.default.toml (global). Set to 0 to disable
(Docker config does this for bench/production load).Full production hardening guide: docs.fastcrw.com/self-hosting-hardening/
npx claudepluginhub us/crwReference for building production-ready crw integrations: verb selection, CLI/MCP/REST call surfaces, post-filtering, context-window hygiene, Hybrid RAG patterns, and operational pitfalls.
Scrapes URLs to markdown/HTML/JSON, crawls websites for multi-page extraction, searches the web, maps sites, and extracts structured data using Firecrawl MCP tools.
Deploys Firecrawl apps to Vercel serverless, Cloud Run containers, and Docker self-hosted setups. Configures secrets and provides Next.js API route example.