From ScraperAPI
Guides setup and usage of ScraperAPI's DataPipeline for scheduled, managed scraping projects with webhook or dashboard delivery.
Slash command: /scraperapi:scraperapi-datapipeline
DataPipeline is a managed scraping product. You define a project (what to scrape, how often, where to send results), and ScraperAPI runs it on your schedule without you managing proxies, retries, or infrastructure.
Use DataPipeline when: scraping runs on a fixed schedule, the input list is large (up to 100,000 items), results should flow to a webhook automatically, or you want email notifications on job completion.
Base URL: https://datapipeline.scraperapi.com/api
Auth: ?api_key=YOUR_KEY (query parameter on every request)
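Since the key is passed as a query parameter on every call, URL construction can be kept in one place. A minimal sketch using only the standard library; `pipeline_url` is a hypothetical helper name, not part of any ScraperAPI client:

```python
from urllib.parse import urlencode

BASE = "https://datapipeline.scraperapi.com/api"


def pipeline_url(path: str, api_key: str) -> str:
    """Return the full DataPipeline URL for `path`, with api_key as a query parameter.

    Hypothetical helper for illustration only.
    """
    query = urlencode({"api_key": api_key})  # URL-encodes the key if needed
    return f"{BASE}/{path.lstrip('/')}?{query}"
```

With `requests`, passing `params={"api_key": API_KEY}` achieves the same result without building the URL by hand.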
Set projectType in the create request to choose what to scrape:
| Type | Input |
|---|---|
| urls | Raw HTML from any URL |
| urls_with_js | Same but with JavaScript rendering |
| google_search | Search queries |
| google_news | Search queries |
| google_jobs | Search queries |
| google_shopping | Search queries |
| google_maps | Search queries |
| amazon_product | ASINs |
| amazon_search | Search queries |
| amazon_offers | ASINs |
| walmart_product | Product IDs |
| walmart_search | Search queries |
| walmart_category | Category IDs |
| walmart_reviews | Product IDs |
| ebay_product | 12-digit product IDs |
| ebay_search | Search queries |
| redfin_listing_for_sale | Listing URLs |
| redfin_listing_for_rent | Listing URLs |
| redfin_listing_search | Search result URLs |
| redfin_agent_details | Agent profile URLs |
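The table can be turned into a quick pre-flight check that catches items of the wrong shape before a project is created. This is a sketch, not ScraperAPI's own validation: `INPUT_KIND` and `invalid_items` are hypothetical names, the ASIN pattern (10 uppercase alphanumeric characters) is an assumption about the usual format, and queries and Walmart IDs are only checked for being non-empty.

```python
import re

# Shape checks per input kind. Only the 12-digit eBay rule comes from the table;
# the URL and ASIN patterns are assumptions made for this sketch.
_CHECKS = {
    "url": lambda s: re.fullmatch(r"https?://\S+", s) is not None,
    "asin": lambda s: re.fullmatch(r"[A-Z0-9]{10}", s) is not None,
    "ebay_id": lambda s: re.fullmatch(r"\d{12}", s) is not None,
    "text": lambda s: bool(s.strip()),
}

# Input kind expected by each projectType, transcribed from the table.
INPUT_KIND = {
    "urls": "url",
    "urls_with_js": "url",
    "google_search": "text",
    "google_news": "text",
    "google_jobs": "text",
    "google_shopping": "text",
    "google_maps": "text",
    "amazon_product": "asin",
    "amazon_search": "text",
    "amazon_offers": "asin",
    "walmart_product": "text",
    "walmart_search": "text",
    "walmart_category": "text",
    "walmart_reviews": "text",
    "ebay_product": "ebay_id",
    "ebay_search": "text",
    "redfin_listing_for_sale": "url",
    "redfin_listing_for_rent": "url",
    "redfin_listing_search": "url",
    "redfin_agent_details": "url",
}


def invalid_items(project_type: str, items: list[str]) -> list[str]:
    """Return the items that do not match the shape expected for project_type.

    Raises KeyError for a projectType that is not in the table.
    """
    check = _CHECKS[INPUT_KIND[project_type]]
    return [item for item in items if not check(item)]
```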
```python
import os

import requests

API_KEY = os.environ["SCRAPERAPI_API_KEY"]
BASE = "https://datapipeline.scraperapi.com/api"

project = requests.post(
    f"{BASE}/projects",
    params={"api_key": API_KEY},
    json={
        "name": "Weekly Amazon price monitor",
        "projectType": "amazon_product",
        "schedulingEnabled": True,
        "scrapingInterval": "weekly",
        "scheduledAt": "now",
        "projectInput": {
            "type": "list",
            "list": ["B09V3KXJPB", "B08N5WRWNW"],  # ASINs
        },
        "apiParams": {
            "country_code": "us",
        },
        "webhookOutput": {
            "url": "https://yourapp.com/pipeline-results",
            "webhookEncoding": "multipart_form_data_encoding",
        },
        "notificationConfig": {
            "notifyOnSuccess": "with_every_run",
            "notifyOnFailure": "with_every_run",
        },
    },
).json()

print(f"Project created: id={project['id']}")
```
| Field | Required | Description |
|---|---|---|
| name | No | Human-readable project name |
| projectType | Yes | What to scrape (see table above) |
| schedulingEnabled | No | true to enable recurring schedule |
| scrapingInterval | Yes (if scheduled) | See scheduling options below |
| scheduledAt | No | "now" to run immediately on create |
| projectInput | Yes | Input data (see input methods below) |
| apiParams | No | Standard ScraperAPI parameters |
| webhookOutput | No | Webhook delivery config |
| notificationConfig | No | Email notification settings |
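The required-field rules in the table can be checked locally before a create request is sent. A small sketch; `missing_required_fields` is a hypothetical helper and encodes only what the table states:

```python
def missing_required_fields(payload: dict) -> list[str]:
    """Return required create-request fields that are absent from payload.

    projectType and projectInput are always required; scrapingInterval is
    required only when schedulingEnabled is true.
    """
    required = ["projectType", "projectInput"]
    if payload.get("schedulingEnabled"):
        required.append("scrapingInterval")
    return [field for field in required if field not in payload]
```

An empty list means the payload passes this check; the API may still reject it for other reasons.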
```json
{
  "projectInput": {
    "type": "list",
    "list": ["query one", "query two", "B09V3KXJPB"]
  }
}
```
Upload a CSV with one URL, query, or ASIN per line, with no header row and no commas. CSV upload is only available through the dashboard when creating a project; the API does not accept CSV files.
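A file in that shape can be produced with a few lines of Python. This is a sketch under the stated rules (one item per line, no header, no commas); `to_csv_lines` is a hypothetical helper name:

```python
def to_csv_lines(items: list[str]) -> str:
    """Return file content with one item per line, rejecting items that break the rules."""
    cleaned = [str(item).strip() for item in items]
    bad = [item for item in cleaned if not item or "," in item or "\n" in item]
    if bad:
        raise ValueError(f"items must be non-empty and contain no commas or line breaks: {bad!r}")
    return "\n".join(cleaned) + "\n"


# Usage (writes the file that is then uploaded in the dashboard):
#   from pathlib import Path
#   Path("items.csv").write_text(to_csv_lines(["B09V3KXJPB", "B08N5WRWNW"]), encoding="utf-8")
```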
```json
{
  "projectInput": {
    "type": "webhook",
    "webhookUrl": "https://yourapp.com/input-items"
  }
}
```
ScraperAPI polls your webhook URL for the item list when the job starts. One item per line; no commas. Useful for dynamically generated lists (e.g., new ASINs added since the last run).
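On your side, the input webhook can be a very small HTTP endpoint. The sketch below uses the standard library and makes several assumptions not confirmed by the text above: that the poll is an HTTP GET, that a plain-text body is accepted, and that port 8000 is reachable. `current_items` is a hypothetical stand-in for whatever produces your list.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def current_items() -> list[str]:
    """Hypothetical source of items, e.g. ASINs added since the last run."""
    return ["B09V3KXJPB", "B08N5WRWNW"]


def render_item_list(items: list[str]) -> bytes:
    """Encode items one per line, as the input webhook is expected to return them."""
    return "\n".join(items).encode("utf-8")


class ItemListHandler(BaseHTTPRequestHandler):
    def do_GET(self):  # assumption: the poll arrives as a GET request
        body = render_item_list(current_items())
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Serve the item list until interrupted."""
    HTTPServer(("0.0.0.0", port), ItemListHandler).serve_forever()
```

Calling `serve()` starts the endpoint; in production it would normally sit behind your existing web framework and HTTPS.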
| scrapingInterval | Description |
|---|---|
| "once" | Run a single job immediately |
| "hourly" | Every hour |
| "daily" | Once per day |
| "weekly" | Once per week |
| "monthly" | Once per month |
| "cron" | Custom cron expression (use cron field instead of interval) |
Recurring schedules (hourly, daily, weekly, monthly, cron) require a paid plan.
Set "scheduledAt": "now" to trigger the first run immediately when the project is created.
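The scheduling rules above can be checked before a request is built. A sketch only; `interval_problem` is a hypothetical helper, and whether an account is on a paid plan is something the caller has to supply:

```python
from typing import Optional

RECURRING_INTERVALS = {"hourly", "daily", "weekly", "monthly", "cron"}


def interval_problem(interval: str, paid_plan: bool) -> Optional[str]:
    """Return a description of the problem with interval, or None if it looks fine."""
    if interval != "once" and interval not in RECURRING_INTERVALS:
        return f"unknown scrapingInterval: {interval!r}"
    if interval in RECURRING_INTERVALS and not paid_plan:
        return f"{interval!r} is a recurring schedule and requires a paid plan"
    return None
```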
Results are POSTed to your webhook URL as they complete. The webhookEncoding field controls the format:
```json
{
  "webhookOutput": {
    "url": "https://yourapp.com/results",
    "webhookEncoding": "multipart_form_data_encoding"
  }
}
```
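A receiving endpoint for multipart-encoded results might look like the sketch below. It relies only on the standard library (`email.parser` to split the multipart body). The names of the form fields in the payload are not documented here, so the handler just logs whatever parts arrive; `parse_multipart` and `ResultHandler` are hypothetical names.

```python
from email.parser import BytesParser
from email.policy import default
from http.server import BaseHTTPRequestHandler


def parse_multipart(content_type: str, body: bytes) -> list:
    """Split a multipart/form-data body into (field name, raw payload) pairs."""
    # Prepend the Content-Type header so the email parser knows the boundary.
    raw = (
        b"Content-Type: " + content_type.encode("latin-1")
        + b"\r\nMIME-Version: 1.0\r\n\r\n" + body
    )
    message = BytesParser(policy=default).parsebytes(raw)
    return [
        (part.get_param("name", header="content-disposition"), part.get_payload(decode=True))
        for part in message.iter_parts()
    ]


class ResultHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        # Field names depend on what DataPipeline sends; here they are only logged.
        for name, payload in parse_multipart(self.headers.get("Content-Type", ""), body):
            print(f"received part {name!r}: {len(payload or b'')} bytes")
        self.send_response(200)
        self.end_headers()
```

`ResultHandler` can be mounted with `http.server.HTTPServer(("0.0.0.0", 8000), ResultHandler).serve_forever()`; the port is an arbitrary choice.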
Omit webhookOutput and results are saved for download in the DataPipeline dashboard. Results are retained for 30 days, then automatically deleted.
Output formats by project type:
urls / urls_with_js → HTML wrapped in JSONL

```python
# List all projects
projects = requests.get(f"{BASE}/projects", params={"api_key": API_KEY}).json()

# Get a single project
project = requests.get(f"{BASE}/projects/525", params={"api_key": API_KEY}).json()

# Update (partial update: only include fields to change)
requests.patch(
    f"{BASE}/projects/525",
    params={"api_key": API_KEY},
    json={
        "scrapingInterval": "daily",
        "apiParams": {"premium": True},
        "notificationConfig": {"notifyOnSuccess": "never"},
    },
)

# Delete / archive (irreversible without support)
requests.delete(f"{BASE}/projects/525", params={"api_key": API_KEY})
```
Updatable fields: scrapingInterval, scheduledAt, outputFormat, apiParams, notificationConfig.
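Before sending a PATCH, the change set can be filtered against that list so that non-updatable fields are caught locally. A sketch; `split_update` is a hypothetical helper name:

```python
UPDATABLE_FIELDS = {
    "scrapingInterval",
    "scheduledAt",
    "outputFormat",
    "apiParams",
    "notificationConfig",
}


def split_update(changes: dict) -> tuple:
    """Return (allowed, rejected): the updatable subset of changes and the other keys."""
    allowed = {key: value for key, value in changes.items() if key in UPDATABLE_FIELDS}
    rejected = sorted(key for key in changes if key not in UPDATABLE_FIELDS)
    return allowed, rejected
```

For example, `split_update({"scrapingInterval": "daily", "projectType": "urls"})` keeps the interval and reports `projectType` as rejected.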
```python
# List jobs for a project
jobs = requests.get(
    f"{BASE}/projects/525/jobs",
    params={"api_key": API_KEY},
).json()

# Cancel a running job
job_id = "YOUR_JOB_ID"  # placeholder: take the real id from the jobs listing above
requests.delete(
    f"{BASE}/projects/525/jobs/{job_id}",
    params={"api_key": API_KEY},
)
# Running requests within the job finish first; final status becomes "Cancelled"
```
A new job can only start if no other job for that project is currently running.
```json
{
  "notificationConfig": {
    "notifyOnSuccess": "with_every_run",
    "notifyOnFailure": "with_every_run"
  }
}
```
Options for both fields: "never", "with_every_run", "daily", "weekly".
apiParams Reference

All standard ScraperAPI parameters are supported inside apiParams:
| Parameter | Purpose |
|---|---|
| country_code | Geotarget (e.g. "us", "gb") |
| render | JavaScript rendering |
| premium | Premium residential proxies |
| ultra_premium | Ultra-premium proxies (mutually exclusive with premium) |
| device_type | "desktop" or "mobile" |
| output_format | "text" or "markdown" for LLM pipelines |
| autoparse | Structured JSON extraction for supported sites |
| keep_headers | Forward custom headers |
| follow_redirect | Control redirect handling |
| wait_for_selector | Wait for CSS selector (requires render: true) |
| screenshot | Capture screenshot (auto-enables rendering) |
| retry_404 | Retry 404 responses |
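Two of the rows carry constraints (premium versus ultra_premium, and wait_for_selector needing render) that are easy to check before a project is created or updated. A sketch that encodes only those constraints plus the two listed device_type values; `api_params_problems` is a hypothetical helper:

```python
def api_params_problems(params: dict) -> list:
    """Return human-readable problems found in an apiParams dict (empty if none)."""
    problems = []
    if params.get("premium") and params.get("ultra_premium"):
        problems.append("premium and ultra_premium are mutually exclusive")
    if "wait_for_selector" in params and not params.get("render"):
        problems.append("wait_for_selector requires render: true")
    if params.get("device_type") not in (None, "desktop", "mobile"):
        problems.append('device_type must be "desktop" or "mobile"')
    return problems
```

`screenshot` is deliberately not checked against `render`, since the table says it enables rendering automatically.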
DataPipeline uses the same underlying credit rates as the Standard API. Cost is the sum of all requests in a job run. Preview the estimated cost before launching a project from the dashboard.
Only successful 200 and 404 responses are charged; failed requests are not.
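As a worked example of that billing rule, the function below counts only 200 and 404 responses. The per-request credit rate depends on the parameters and target, so it is passed in by the caller rather than assumed; `charged_credits` is a hypothetical helper name.

```python
def charged_credits(status_codes: list, credits_per_request: int) -> int:
    """Return credits charged for a job run: only 200 and 404 responses are billable."""
    billable = sum(1 for code in status_codes if code in (200, 404))
    return billable * credits_per_request
```

For a run that returned statuses 200, 404, 500, and 200 at a rate of 5 credits per request, three responses are billable, so the charge is 3 × 5 = 15 credits.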
| Limit | Value |
|---|---|
| Max input items | 100,000 per job |
| Direct list input | 500 items |
| Data retention | 30 days |
| Free plan concurrency | 5 connections |
| Free plan scheduling | One-time runs only |
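The two size limits suggest a simple rule for picking an input method. A sketch, reading "Direct list input | 500 items" as a cap on the inline `list` input type; `input_method_for` is a hypothetical helper name:

```python
MAX_ITEMS_PER_JOB = 100_000
MAX_DIRECT_LIST_ITEMS = 500


def input_method_for(item_count: int) -> str:
    """Return "list" when the items fit inline, otherwise "csv_or_webhook"."""
    if item_count < 1:
        raise ValueError("a job needs at least one input item")
    if item_count > MAX_ITEMS_PER_JOB:
        raise ValueError(f"a job accepts at most {MAX_ITEMS_PER_JOB} items; split the input")
    return "list" if item_count <= MAX_DIRECT_LIST_ITEMS else "csv_or_webhook"
```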
npx claudepluginhub scraperapi/scraperapi-skills --plugin scraperapi