From jarrettmeyer
Scrape websites using the scrapling library — static pages, JS-rendered content, and anti-bot protected sites
Slash command: `/jarrettmeyer:scrapling`
Scrapling is an adaptive Python web scraping library that handles static pages, JavaScript-rendered content, and anti-bot protected sites through a unified API.
Unless the user specifies otherwise, write all scraping files to:
```
.scratch/jarrettmeyer/scrapling/<name>.py
.scratch/jarrettmeyer/scrapling/<name>.<ext>
```

Ensure `.scratch/` is listed in `.gitignore` to keep scraping artifacts out of version control.
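Here is a minimal standard-library sketch of that convention. It assumes scripts run from the project root. `scratch_path` and `ensure_gitignored` are hypothetical helpers, not part of scrapling, and `ensure_gitignored` only recognizes an exact `.scratch/` line.

```python
from pathlib import Path

# Hypothetical helpers for the output-location convention (not part of scrapling)
SCRATCH_DIR = Path(".scratch/jarrettmeyer/scrapling")


def scratch_path(name: str, ext: str, root: Path = Path(".")) -> Path:
    """Return root/.scratch/jarrettmeyer/scrapling/<name>.<ext>, creating parent dirs."""
    path = root / SCRATCH_DIR / f"{name}.{ext}"
    path.parent.mkdir(parents=True, exist_ok=True)
    return path


def ensure_gitignored(root: Path = Path("."), entry: str = ".scratch/") -> bool:
    """Append `entry` to root/.gitignore if no line matches it exactly.

    Returns True if .gitignore was changed.
    """
    gitignore = root / ".gitignore"
    lines = gitignore.read_text().splitlines() if gitignore.exists() else []
    if entry in (line.strip() for line in lines):
        return False
    lines.append(entry)
    gitignore.write_text("\n".join(lines) + "\n")
    return True
```

For example, `scratch_path("listings", "json")` returns `.scratch/jarrettmeyer/scrapling/listings.json` relative to the current directory. It also creates the directory as a side effect.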
Always run scripts with `uv run --with "scrapling[all]"`. No installation, virtual environment, or `pyproject.toml` is required, so this works in any project.

```bash
uv run --with "scrapling[all]" python .scratch/jarrettmeyer/scrapling/scrape.py
```
Select based on the target site's characteristics:
| Site type | Fetcher | When to use |
|---|---|---|
| Static HTML | Fetcher | Fast, no JS needed, no bot protection |
| Anti-bot / Cloudflare | StealthyFetcher | Bot detection, Cloudflare, fingerprinting |
| JavaScript-rendered | DynamicFetcher | Content only visible after JS executes |
| Multi-page with login | *Session variants | Cookie persistence across requests |
If unsure, start with Fetcher. If you get blocked or see empty content, escalate to StealthyFetcher, then DynamicFetcher.
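That escalation order can be written as a loop that tries each tier until a page passes a content check. This is a sketch, not a scrapling API. `fetch_with_escalation` and `looks_ok` are hypothetical, and the fetch functions are injected so that the loop itself runs without network access.

```python
from typing import Any, Callable, Optional, Sequence, Tuple

# (tier name, fetch function) pairs, ordered cheapest first
Tier = Tuple[str, Callable[[str], Any]]


def fetch_with_escalation(
    url: str,
    tiers: Sequence[Tier],
    looks_ok: Callable[[Any], bool],
) -> Tuple[str, Any]:
    """Try each tier in order; return (tier_name, page) for the first page looks_ok accepts."""
    last_error: Optional[Exception] = None
    for name, fetch in tiers:
        try:
            page = fetch(url)
        except Exception as exc:  # timeouts, connection errors, browser launch failures
            last_error = exc
            continue
        if looks_ok(page):
            return name, page
    raise RuntimeError(f"all fetcher tiers failed for {url}") from last_error
```

In a real script, each tier would wrap a scrapling call, for example `("Fetcher", lambda u: Fetcher.get(u, stealthy_headers=True))`. `looks_ok` might check `page.status == 200` plus a selector you expect to match.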
Fetcher:

```python
from scrapling.fetchers import Fetcher

page = Fetcher.get('https://example.com', stealthy_headers=True)
print(page.status)  # 200
```
StealthyFetcher:

```python
from scrapling.fetchers import StealthyFetcher

page = StealthyFetcher.fetch('https://example.com', headless=True)
print(page.status)
```
DynamicFetcher:

```python
from scrapling.fetchers import DynamicFetcher

page = DynamicFetcher.fetch(
    'https://example.com',
    wait_selector='.target-element',  # wait for element before parsing
    headless=True,
)
print(page.status)
```
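Session variants:

```python
from scrapling.fetchers import FetcherSession  # or StealthySession, DynamicSession

# Cookies set by the login response are reused for later requests in the same session
with FetcherSession(impersonate='chrome') as session:
    # Form field names depend on the target site's login form
    session.post('https://example.com/login', data={'user': '...', 'pass': '...'})
    page = session.get('https://example.com/dashboard')
```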
All fetchers return the same Selector object — extraction syntax is identical regardless of which fetcher was used.
```python
# Single value
title = page.css('h1::text').get()

# All matching values
links = page.css('a::attr(href)').getall()

# Element with nested access
for item in page.css('.product'):
    name = item.css('.name::text').get()
    price = item.css('.price::text').get()
    print(name, price)
```
```python
title = page.xpath('//h1/text()').get()
rows = page.xpath('//table/tr').getall()
```
```python
# Find element containing specific text
el = page.find_by_text('Add to Cart', first_match=True)
```
```python
import json

results = []
for item in page.css('.result'):
    results.append({
        'title': item.css('h2::text').get(),
        'url': item.css('a::attr(href)').get(),
        'desc': item.css('p::text').get(),
    })

print(json.dumps(results, indent=2))
```
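Attribute values such as `a::attr(href)` are often relative. `::text` values can carry surrounding whitespace, or be `None` when a selector misses. Below is a standard-library post-processing sketch to apply before dumping to JSON. `clean_results` is a hypothetical helper, and `base_url` is assumed to be the URL of the scraped page.

```python
from typing import Dict, List, Optional
from urllib.parse import urljoin


def clean_results(
    results: List[Dict[str, Optional[str]]],
    base_url: str,
    url_key: str = "url",
    required_key: str = "title",
) -> List[Dict[str, Optional[str]]]:
    """Strip whitespace, resolve relative URLs against base_url, drop records missing required_key."""
    cleaned: List[Dict[str, Optional[str]]] = []
    for record in results:
        # Strip string values; leave None (selector matched nothing) untouched
        row = {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}
        if not row.get(required_key):
            continue
        if row.get(url_key):
            # Relative hrefs become absolute; absolute ones pass through unchanged
            row[url_key] = urljoin(base_url, row[url_key])
        cleaned.append(row)
    return cleaned
```

Usage: `print(json.dumps(clean_results(results, 'https://example.com'), indent=2))`.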
Use Spider when scraping more than a handful of pages.
```python
from scrapling.spiders import Spider, Response

class MySpider(Spider):
    name = "my_spider"
    start_urls = ["https://example.com/listings"]

    async def parse(self, response: Response):
        for item in response.css('.listing'):
            yield {
                'title': item.css('h2::text').get(),
                'price': item.css('.price::text').get(),
                'url': item.css('a::attr(href)').get(),
            }

        # Follow pagination
        next_page = response.css('a.next::attr(href)').get()
        if next_page:
            yield response.follow(next_page, callback=self.parse)

result = MySpider().start()
result.items.to_json('.scratch/jarrettmeyer/scrapling/output.json')
print(f"Scraped {len(result.items)} items → .scratch/jarrettmeyer/scrapling/output.json")
```
Run from the project root with:

```bash
uv run --with "scrapling[all]" python .scratch/jarrettmeyer/scrapling/scrape.py
```
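Conceptually, following pagination means keeping a queue of URLs plus a record of pages already visited. That way, a "next" link pointing back to an earlier page cannot loop forever. The sketch below shows this pattern with the standard library only. It is not scrapling's Spider implementation. `crawl` and `parse_page` are hypothetical stand-ins for real fetch-and-parse calls, and the queueing step plays the role that `response.follow` appears to play in the Spider above.

```python
from collections import deque
from typing import Callable, Iterable, List, Optional, Tuple
from urllib.parse import urljoin

# parse_page(url) -> (items on that page, next-page href or None)
ParseFn = Callable[[str], Tuple[Iterable[dict], Optional[str]]]


def crawl(start_url: str, parse_page: ParseFn, max_pages: int = 100) -> List[dict]:
    """Breadth-first pagination crawl that never revisits a URL and stops after max_pages."""
    queue = deque([start_url])
    seen = {start_url}
    items: List[dict] = []
    pages = 0
    while queue and pages < max_pages:
        url = queue.popleft()
        pages += 1
        page_items, next_href = parse_page(url)
        items.extend(page_items)
        if next_href:
            next_url = urljoin(url, next_href)  # resolve relative "next" links
            if next_url not in seen:
                seen.add(next_url)
                queue.append(next_url)
    return items
```

The `max_pages` cap is a second safety net for sites whose pagination never ends.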
```python
from scrapling.fetchers import Fetcher, DynamicFetcher

# HTTP method
page = Fetcher.post('https://api.example.com/data', json={'key': 'value'})

# Custom headers / proxy
page = Fetcher.get(
    'https://example.com',
    headers={'Accept-Language': 'en-US'},
    proxy='http://user:pass@proxy:8080',
)

# DynamicFetcher: scroll or run JavaScript before capturing.
# page_action receives the Playwright page object after navigation.
def scroll_to_bottom(pw_page):
    pw_page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
    return pw_page  # some scrapling versions expect the page to be returned

page = DynamicFetcher.fetch(
    'https://example.com',
    wait_selector='.loaded',
    page_action=scroll_to_bottom,
)
```
| Symptom | Fix |
|---|---|
| Empty selectors / missing content | Switch to DynamicFetcher — page likely needs JS |
| 403 / CAPTCHA / bot block | Switch to StealthyFetcher |
| Inconsistent results across runs | Use find_by_text or similarity matching instead of brittle CSS paths |
| Need to stay logged in | Use a *Session variant |
| Large crawl is slow | Increase Spider concurrency settings |
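What a concurrency limit does can be sketched with asyncio: a semaphore caps how many requests are in flight at once. This only illustrates the idea and is not how scrapling's Spider is configured. `gather_limited` is hypothetical, and `fetch` stands in for any async fetch call.

```python
import asyncio
from typing import Awaitable, Callable, List, TypeVar

T = TypeVar("T")


async def gather_limited(
    urls: List[str], fetch: Callable[[str], Awaitable[T]], limit: int
) -> List[T]:
    """Run fetch(url) for every URL with at most `limit` calls in flight; results keep input order."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(url: str) -> T:
        async with semaphore:  # waits here while `limit` fetches are already running
            return await fetch(url)

    return await asyncio.gather(*(bounded(u) for u in urls))
```

Raising the limit trades speed for load on the target site, and that load can itself trigger the bot blocks listed above.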
Install the plugin with:

```bash
npx claudepluginhub jarrettmeyer/skills --plugin jarrettmeyer
```