From fastCRW
Discovers all URLs on a website via sitemap.xml and link extraction BFS without fetching page content. Use to inventory pages before deciding what to scrape or crawl.
How this skill is triggered — by the user, by Claude, or both
Slash command
/crw:crw-mapThis skill is limited to the following tools:
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
- You need to know **which URLs exist** on a site before committing to a
crw map docs.example.com | wc -l tells you how
many pages a subsequent crawl would touch before you commit.grep instead of
crawling the whole site.CLI (binary on PATH):
crw map "https://docs.example.com" # URLs to stdout
crw map "https://docs.example.com" -d 3 --format json # JSON object, depth 3
crw map "https://example.com" --sitemap-only # sitemap.xml only
crw map "https://example.com" --no-sitemap # link crawl only
crw map "https://example.com" --format json > .crw/urls.json
MCP (inside an agent harness):
crw_map(url="https://docs.example.com")
crw_map(url="https://docs.example.com", maxDepth=3, limit=200)
crw_map(url="https://example.com", useSitemap=false) # link crawl only
crw_map(url="https://example.com", crawlFallback=false) # sitemap only
REST (drop-in for Firecrawl SDKs — just swap the base URL):
curl -X POST "$CRW_API_URL/v1/map" -H "Authorization: Bearer $CRW_API_KEY" \
-H 'Content-Type: application/json' \
-d '{"url":"https://docs.example.com","maxDepth":2,"limit":200}'
# Response: {"success":true,"data":{"links":[...]}} — links are under data.links
# jq tip for REST: jq '.data.links[]'
| Need | CLI flag | MCP / REST field |
|---|---|---|
| Discovery depth | -d/--depth N (default 2) | maxDepth (default 2) |
| Result format | --format text|json | — (always JSON) |
| Sitemap only (no link crawl) | --sitemap-only | crawlFallback: false |
| Link crawl only (no sitemap) | --no-sitemap | useSitemap: false |
| Cap URL count | — | limit (default 100; 0 = unbounded) |
| JS rendering | --js | — |
| Proxy | --proxy URL | — |
| Stealth mode | --stealth | — |
| Rate limit | --rate-limit N (default 5.0 req/s) | — |
| Concurrency | --concurrency N (default 10) | — |
| Per-page timeout | --timeout MS (default 15000) | — |
MCP truncates to 100 URLs by default (truncated: true + totalDiscovered
in response). Pass limit: 0 to opt out.
# 1. Map to see what's there
crw map "https://docs.example.com" --format json > .crw/urls.json
# 2a. Grep for the section you need
grep '"authentication"' .crw/urls.json
# 2b. Scrape a single page
crw scrape "https://docs.example.com/api/authentication"
# 2c. Or crawl the whole /api section
crw crawl "https://docs.example.com/api" -d 2 -l 50
With MCP in a single agent turn:
crw_map(url="https://docs.example.com", limit=0)
# inspect links[], pick the /changelog/* subset
crw_crawl(url="https://docs.example.com/changelog", maxDepth=1, maxPages=20)
/blog/*,
scope the crawl to that sub-path.--sitemap-only only when you trust the sitemap is complete.grep, jq '.links[]' (CLI
JSON output), or wc -l rather than loading the full list into model context.
For REST responses use jq '.data.links[]' (links are nested under data).crw scrape or crw crawl.npx claudepluginhub us/crwDiscovers and lists all URLs on a website with optional search filtering. Use when you know the site but need a specific page.
Discovers and lists all URLs on a website via the Tavily CLI without extracting content. Use when you know the site but not the exact page, or to find subpages on large domains.
Crawls websites or sections to extract all pages via async BFS. Step 4 of crw workflow – map first to estimate scope. Supports depth, limits, JS rendering, and multiple output formats.