Skill

web-scraping

Web scraping guide for sub-agents. Covers Firecrawl CLI fallback scraping when WebFetch fails (JS-heavy sites, anti-bot walls, 403 errors, empty content) and advanced capabilities like structured data extraction with Zod schemas, multi-page crawls, and search-plus-scrape. Use when WebFetch returns garbage or empty pages, when you need typed data from a page (prices, features, specs), or when you need to ingest multiple pages from a site.

Invocation

Slash command

/newsroom:web-scraping

Supporting Files

references/crawling.md
references/structured-extraction.md

Stats

Parent stars: 0

Maintenance: Good

Last commit: Feb 27, 2026

Web Scraping Field Card

Required tools for consuming agents: WebFetch, Bash(bunx firecrawl-cli *), Read

Integration: Any newsroom sub-agent should consult this skill when WebFetch fails or when structured/multi-page scraping is needed.

What Do You Need?

Need	Tool	Details
Page content as markdown	WebFetch first, then Firecrawl CLI	See below
Structured data from a page (prices, features, specs)	Firecrawl extract	Read references/structured-extraction.md
Multiple pages from one site	Firecrawl crawl	Read references/crawling.md
Search the web + scrape results	Firecrawl search	Read references/crawling.md

Getting Page Content

Step 1: Try WebFetch First

WebFetch is free, fast, and already available. Use it by default.

Works for: blogs, news articles, documentation, static pages, most forum threads.

Step 2: Recognize Failure

Switch to Firecrawl CLI when WebFetch returns:

Empty or near-empty content (page requires JavaScript rendering)
403 or 429 errors (anti-bot protection or rate limiting)
Mangled HTML with no useful text (client-side rendered SPA)
Login walls or cookie consent overlays blocking content (note that Firecrawl cannot get past genuinely login-gated content either)

Do NOT retry WebFetch on the same URL -- it will fail again.
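
The failure signals above can be approximated with a small check. This is a heuristic sketch, not part of the skill: the function name needs_fallback, the saved-to-file convention, and the 200-character threshold are all assumptions chosen for illustration. It strips tags line by line, so inline script bodies still count as visible text.

```shell
#!/usr/bin/env bash
# Heuristic sketch: decide whether a WebFetch result warrants the Firecrawl fallback.
# needs_fallback and the 200-character default are illustrative assumptions.

# needs_fallback STATUS CONTENT_FILE [MIN_CHARS]
# Returns 0 (switch to Firecrawl) or 1 (the WebFetch result looks usable).
needs_fallback() {
  local status="$1" file="$2" min_chars="${3:-200}"

  # 403/429 responses count as failure regardless of body.
  case "$status" in
    403|429) return 0 ;;
  esac

  # A missing or empty file means there is nothing to read.
  [ -s "$file" ] || return 0

  # Count non-whitespace characters left after removing single-line HTML tags.
  local visible
  visible=$(( $(sed -e 's/<[^>]*>//g' "$file" | tr -d '[:space:]' | wc -c) ))
  if [ "$visible" -lt "$min_chars" ]; then
    return 0
  fi

  return 1
}
```

A caller would save the WebFetch body to a file and run something like needs_fallback 200 /tmp/webfetch.html before deciding whether to continue to Step 3.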

Step 3: Firecrawl CLI Scrape

Requires: firecrawl-cli (install: npm install -g firecrawl-cli or use via bunx firecrawl-cli). Authenticates via FIRECRAWL_API_KEY env var or firecrawl auth --api-key <key>.

If firecrawl-cli is not installed or FIRECRAWL_API_KEY is unset, skip to Step 4 (Report Gaps). Do not retry or attempt workarounds.
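
A minimal sketch of that prerequisite check, assuming the runner is either bunx or a globally installed firecrawl binary. The function name firecrawl_ready is hypothetical. It follows the rule above literally, so a key stored only through firecrawl auth is not detected.

```shell
#!/usr/bin/env bash
# Sketch of the Step 3 prerequisite check; firecrawl_ready is a hypothetical helper.

# firecrawl_ready
# Returns 0 only if FIRECRAWL_API_KEY is set and a way to run the CLI is on PATH.
firecrawl_ready() {
  if [ -z "${FIRECRAWL_API_KEY:-}" ]; then
    echo "FIRECRAWL_API_KEY is unset" >&2
    return 1
  fi
  # Accept either bunx (runs firecrawl-cli on demand) or a global install.
  if ! command -v bunx >/dev/null 2>&1 && ! command -v firecrawl >/dev/null 2>&1; then
    echo "neither bunx nor a firecrawl binary found on PATH" >&2
    return 1
  fi
  return 0
}
```

If the check fails, the message on stderr can be reused as the reason recorded in Step 4.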

Output to stdout (default -- pipe or capture as needed):

bunx firecrawl-cli scrape "<url>"

Output to file (more token-efficient -- read from disk instead of context):

bunx firecrawl-cli scrape "<url>" -o /tmp/scrape-output.md

Then use the Read tool on /tmp/scrape-output.md to pull only what you need into context.

Handles: JS rendering, dynamic content, basic anti-bot bypass, clean Markdown output (strips nav, headers, footers with --only-main-content).

Does NOT handle: login-gated content, CAPTCHAs, form filling, aggressive Cloudflare Turnstile.

For multiple URLs, scrape each separately to different files:

bunx firecrawl-cli scrape "<url1>" -o /tmp/scrape-1.md
bunx firecrawl-cli scrape "<url2>" -o /tmp/scrape-2.md
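
For longer URL lists, the same pattern can be wrapped in a loop. The sketch below is illustrative only: scrape_all and the SCRAPE_CMD override are hypothetical names, and the only CLI usage assumed is the scrape "<url>" -o <file> form shown above. It treats a non-zero exit or an empty output file as a failure, so the result can feed directly into Step 4.

```shell
#!/usr/bin/env bash
# Sketch: scrape several URLs to numbered files and report per-URL status.
# scrape_all and SCRAPE_CMD are hypothetical; SCRAPE_CMD exists so the
# scrape command can be swapped out (for example, with a stub in tests).

# scrape_all OUT_DIR URL...
# Prints one line per URL: <output path> TAB <ok|failed> TAB <url>
scrape_all() {
  local out_dir="$1"
  shift
  local n=0 url out
  for url in "$@"; do
    n=$((n + 1))
    out="$out_dir/scrape-$n.md"
    # Intentionally unquoted so the default command splits into words.
    if ${SCRAPE_CMD:-bunx firecrawl-cli scrape} "$url" -o "$out" && [ -s "$out" ]; then
      printf '%s\tok\t%s\n' "$out" "$url"
    else
      printf '%s\tfailed\t%s\n' "$out" "$url"
    fi
  done
}
```

Each file marked ok can then be opened with the Read tool; each line marked failed names a URL to report as a gap.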

The CLI is beta (released Jan 2026) -- expect quirks and flag changes. Run bunx firecrawl-cli scrape --help for current options.

Step 4: Report Gaps Honestly

If both WebFetch and Firecrawl fail:

Note which URL was inaccessible and why
Do not fabricate content or silently skip the source
Move on to other sources
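
One way to keep gaps from being silently dropped is to append them to a small log as they occur. This is a sketch under assumptions: the helper name report_gap, the tab-separated layout, and the log location are all hypothetical and not defined by the skill.

```shell
#!/usr/bin/env bash
# Sketch: record inaccessible sources so they can be listed in the final report.
# report_gap and the tab-separated file layout are illustrative assumptions.

# report_gap GAPS_FILE URL REASON
# Appends one line per inaccessible source: <url> TAB <reason>
report_gap() {
  local gaps_file="$1" url="$2" reason="$3"
  printf '%s\t%s\n' "$url" "$reason" >> "$gaps_file"
}
```

For example, report_gap /tmp/gaps.tsv "<url>" "WebFetch returned 403; Firecrawl hit a CAPTCHA" records both the URL and why it was inaccessible.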
