From tavily
Crawls websites and extracts content from multiple pages via the Tavily CLI. Supports depth/breadth control, path filtering, semantic instructions, and saving pages as local markdown files.
Slash command
/tavily:tavily-crawl
Crawl a website and extract content from multiple pages. Supports saving each page as a local markdown file.
If tvly is not found on PATH, install it first:
curl -fsSL https://cli.tavily.com/install.sh | bash && tvly login
Do not skip this step or fall back to other tools.
See tavily-cli for alternative install methods and auth options.
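The PATH check described above can be sketched in shell roughly as follows. This is a minimal illustration, not part of the Tavily CLI: the helper name `have_cmd` is hypothetical, and the sketch only reports whether `tvly` is available rather than installing anything.

```shell
# Hypothetical helper (not part of tvly): succeeds when the named command is on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

if have_cmd tvly; then
  echo "tvly is installed"
else
  # Installation is left to the user; this sketch only reports the state.
  echo "tvly is not on PATH; run the install command above, then 'tvly login'"
fi
```

Keeping the check separate from the install step makes it easy to reuse at the top of any wrapper script before a crawl is attempted.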
# Basic crawl
tvly crawl "https://docs.example.com" --json
# Save each page as a markdown file
tvly crawl "https://docs.example.com" --output-dir ./docs/
# Deeper crawl with limits
tvly crawl "https://docs.example.com" --max-depth 2 --limit 50 --json
# Filter to specific paths
tvly crawl "https://example.com" --select-paths "/api/.*,/guides/.*" --exclude-paths "/blog/.*" --json
# Semantic focus (returns relevant chunks, not full pages)
tvly crawl "https://docs.example.com" --instructions "Find authentication docs" --chunks-per-source 3 --json
| Option | Description |
|---|---|
| --max-depth | Levels deep (1-5, default: 1) |
| --max-breadth | Links per page (default: 20) |
| --limit | Total pages cap (default: 50) |
| --instructions | Natural language guidance for semantic focus |
| --chunks-per-source | Chunks per page (1-5, requires --instructions) |
| --extract-depth | basic (default) or advanced |
| --format | markdown (default) or text |
| --select-paths | Comma-separated regex patterns to include |
| --exclude-paths | Comma-separated regex patterns to exclude |
| --select-domains | Comma-separated regex for domains to include |
| --exclude-domains | Comma-separated regex for domains to exclude |
| --allow-external / --no-external | Include external links (default: allow) |
| --include-images | Include images |
| --timeout | Max wait (10-150 seconds) |
| -o, --output | Save JSON output to file |
| --output-dir | Save each page as a .md file in directory |
| --json | Structured JSON output |
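Several options in the table accept only a bounded integer (--max-depth 1-5, --chunks-per-source 1-5, --timeout 10-150). A wrapper script might check values before invoking the CLI. The sketch below is one way to do that under stated assumptions: `in_range` is a hypothetical helper, the sample values are made up, and the snippet never calls `tvly` itself.

```shell
# Hypothetical helper: succeeds when $1 is a non-negative integer in [$2, $3].
in_range() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
  esac
  [ "$1" -ge "$2" ] && [ "$1" -le "$3" ]
}

# Illustrative values only; the bounds come from the options table.
depth=2
wait_secs=60
if in_range "$depth" 1 5 && in_range "$wait_secs" 10 150; then
  echo "ok: --max-depth $depth --timeout $wait_secs"
else
  echo "value out of range" >&2
fi
```

Validating up front gives a clear local error message instead of relying on however the CLI reports an out-of-range value.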
For agentic use (feeding results to an LLM):
Always combine --instructions with --chunks-per-source. The crawl then returns only the relevant chunks instead of full pages, which prevents context explosion.
tvly crawl "https://docs.example.com" --instructions "API authentication" --chunks-per-source 3 --json
For data collection (saving to files):
Use --output-dir without --chunks-per-source to get full pages as markdown files.
tvly crawl "https://docs.example.com" --max-depth 2 --output-dir ./docs/
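After a data-collection crawl, it can be useful to confirm how many pages were actually written. The sketch below counts .md files in an output directory. `count_md` is a hypothetical helper, and the layout inside the directory (flat or nested, file naming) is an assumption, since the text above only says that each page is saved as a .md file.

```shell
# Hypothetical helper: print the number of .md files under the given directory.
count_md() {
  find "$1" -type f -name '*.md' | wc -l | tr -d ' '
}

out_dir="./docs"
if [ -d "$out_dir" ]; then
  echo "$(count_md "$out_dir") markdown file(s) in $out_dir"
else
  echo "$out_dir does not exist yet; run the crawl first"
fi
```

Comparing this count against the --limit value is a quick way to tell whether the crawl stopped at the cap or ran out of pages first.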
Start conservative, with --max-depth 1 and --limit 20, and scale up.
Use --select-paths to focus on the section you need.
Always set --limit to prevent runaway crawls.
To install the plugin:
npx claudepluginhub tavily-ai/skills --plugin tavily