From ScraperAPI
Reference for ScraperAPI's Async Jobs API: submit background scraping jobs, poll results, use webhooks, and handle batch jobs up to 50k URLs.
Slash command: `/scraperapi:scraperapi-async`
The Async API submits scraping jobs in the background and retries them for up to 24 hours to maximize success. Results are retrieved by polling a status URL or received automatically via webhook.
For a few fast requests, the Standard API (`api.scraperapi.com`) is simpler and returns results inline. Use Async when you are scraping 20+ URLs, the target site is slow or flaky, you want webhook delivery, or you need to scrape PDFs/images.
| Action | Method | URL |
|---|---|---|
| Submit single job | POST | https://async.scraperapi.com/jobs |
| Submit batch (up to 50k) | POST | https://async.scraperapi.com/batchjobs |
| Check / retrieve job | GET | https://async.scraperapi.com/jobs/<jobId> |
| Cancel job | DELETE | https://async.scraperapi.com/jobs/<jobId> |
Auth: apiKey in the JSON request body (note: apiKey camelCase, unlike the Standard API's api_key).
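The endpoint table includes a `DELETE` route for cancelling a job, but no example of calling it. A minimal sketch using only the standard library (`urllib.request`); `job_url` and `cancel_job` are hypothetical helper names. This page does not say whether the DELETE call needs the API key or where it would go, so this sketch sends none; treat that as an assumption.

```python
import urllib.request

ASYNC_JOBS = "https://async.scraperapi.com/jobs"


def job_url(job_id: str) -> str:
    """Status/cancel URL for a job id returned at submission."""
    return f"{ASYNC_JOBS}/{job_id}"


def cancel_job(job_id: str, timeout: float = 30.0) -> int:
    """Send DELETE to the job URL and return the HTTP status code.

    Assumption: no apiKey is required; urlopen raises HTTPError on 4xx/5xx.
    """
    req = urllib.request.Request(job_url(job_id), method="DELETE")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```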
```python
import os
import time

import requests

API_KEY = os.environ["SCRAPERAPI_API_KEY"]

# Submit
r = requests.post(
    "https://async.scraperapi.com/jobs",
    json={
        "apiKey": API_KEY,
        "url": "https://example.com/product/123",
        "apiParams": {
            "render": True,
            "country_code": "us",
        },
    },
)
r.raise_for_status()  # surfaces 401/403/429 submission errors
job = r.json()
# {"id": "...", "status": "running", "statusUrl": "...", "url": "..."}

# Poll
def poll(status_url, interval=5, max_wait=120):
    """Poll a statusUrl until the job finishes or fails.

    max_wait is a client-side limit only; the job keeps retrying server-side.
    """
    deadline = time.time() + max_wait
    while time.time() < deadline:
        resp = requests.get(status_url)
        resp.raise_for_status()
        data = resp.json()
        if data["status"] == "finished":
            return data["response"]["body"]
        if data["status"] == "failed":
            raise RuntimeError(f"Job failed: {data.get('failReason')}")
        time.sleep(interval)  # still running; wait before checking again
    raise TimeoutError("Job did not finish in time")

html = poll(job["statusUrl"])
```
Finished job response shape:

```json
{
  "id": "...",
  "status": "finished",
  "statusUrl": "...",
  "url": "https://example.com/product/123",
  "response": {
    "headers": { "content-type": "text/html", "sa-final-url": "...", "sa-statuscode": "200" },
    "body": "<!doctype html>...",
    "statusCode": 200
  }
}
```
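A small sketch that pulls the commonly needed fields out of a finished payload like the one above. `summarize_finished_job` is a hypothetical helper, and falling back to the submitted URL when `sa-final-url` is missing is an assumption.

```python
def summarize_finished_job(job: dict) -> dict:
    """Extract URL, final URL, status code and content type from a finished job."""
    if job.get("status") != "finished":
        raise ValueError(f"job {job.get('id')} is not finished: {job.get('status')}")
    response = job["response"]
    headers = response.get("headers", {})
    return {
        "url": job["url"],
        # URL after redirects as reported by ScraperAPI (assumed optional)
        "final_url": headers.get("sa-final-url", job["url"]),
        "status_code": response["statusCode"],
        "content_type": headers.get("content-type", ""),
        # Binary targets return base64EncodedBody instead of a text body
        "is_binary": "base64EncodedBody" in response,
    }
```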
Batch submission sends a `urls` list to `/batchjobs`:

```python
jobs = requests.post(
    "https://async.scraperapi.com/batchjobs",
    json={
        "apiKey": API_KEY,
        "urls": [
            "https://example.com/page/1",
            "https://example.com/page/2",
            # ... up to 50,000
        ],
        "apiParams": {"country_code": "us"},
    },
).json()
# Returns a list of {id, status, statusUrl, url}, one per submitted URL

# Polls one job at a time: fine for small batches only
results = [poll(job["statusUrl"]) for job in jobs]
```
For workloads over 50,000 URLs, split them across multiple batch requests. For large batches, use webhooks (below) instead of polling; polling 10,000 status URLs one after another is slow.
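A sketch of both ideas: splitting a large URL list into 50,000-URL batches, and polling many status URLs in parallel threads when webhooks are not an option. `chunked` and `poll_all` are hypothetical helpers, and the default of 16 worker threads is an arbitrary assumption; keep it modest to avoid rate limits.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List, Sequence

BATCH_LIMIT = 50_000  # per-request cap for /batchjobs, per this page


def chunked(urls: Sequence[str], size: int = BATCH_LIMIT) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield list(urls[start:start + size])


def poll_all(status_urls: Sequence[str], poll: Callable[[str], str],
             workers: int = 16) -> List[str]:
    """Run `poll` on every status URL in a thread pool; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(poll, status_urls))
```

Each list yielded by `chunked(all_urls)` becomes the `urls` field of one `/batchjobs` request, and `poll_all([j["statusUrl"] for j in jobs], poll)` reuses the `poll` function from the first example. If any single `poll` call raises, `poll_all` re-raises that exception.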
Use webhooks to receive results without polling. ScraperAPI POSTs the completed job payload to your URL when the scrape finishes.
```python
requests.post(
    "https://async.scraperapi.com/jobs",
    json={
        "apiKey": API_KEY,
        "url": "https://example.com/",
        "callback": {
            "type": "webhook",
            "url": "https://yourapp.com/scraperapi/callback",
        },
    },
)
```
Webhook mechanics:

- Set `"expectUnsuccessReport": true` to also receive payloads for failed jobs.

Failed job callback payload:
```json
{
  "id": "...",
  "attempts": 50,
  "status": "failed",
  "failReason": "failed_due_to_timeout",
  "url": "https://example.com/"
}
```
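A minimal webhook receiver sketch using only the standard library's `http.server`. It assumes successful callbacks share the finished-job shape shown earlier and that each callback request carries a `Content-Length` header; `handle_callback`, `CallbackHandler`, `serve`, and port 8080 are hypothetical choices for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def handle_callback(raw: bytes) -> str:
    """Parse one callback body and return a one-line summary."""
    payload = json.loads(raw)
    if payload.get("status") == "failed":
        # Only sent when the job was submitted with expectUnsuccessReport: true
        return f"failed {payload.get('id')}: {payload.get('failReason')}"
    # Assumes success callbacks use the finished-job shape
    body = payload.get("response", {}).get("body", "")
    return f"finished {payload.get('id')}: {len(body)} chars"


class CallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        print(handle_callback(self.rfile.read(length)))
        # Reply 200 with an empty body once the payload has been read
        self.send_response(200)
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Blocking server; expose it at the URL you pass as callback.url."""
    HTTPServer(("0.0.0.0", port), CallbackHandler).serve_forever()
```

Keeping the parsing in `handle_callback` separate from the HTTP handler makes it easy to test without opening a socket.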
Request body reference. Use `url` with `/jobs` and `urls` with `/batchjobs`:

```json
{
  "apiKey": "YOUR_KEY",
  "url": "https://example.com",
  "urls": ["url1", "url2"],
  "method": "GET",
  "headers": { "Accept-Language": "en-US" },
  "body": "foo=bar",
  "callback": { "type": "webhook", "url": "https://..." },
  "expectUnsuccessReport": false,
  "timeoutSec": 600,
  "meta": { "jobLabel": "batch-42" },
  "apiParams": {
    "autoparse": false,
    "country_code": "us",
    "keep_headers": false,
    "device_type": "desktop",
    "follow_redirect": true,
    "premium": false,
    "ultra_premium": false,
    "render": false,
    "wait_for_selector": ".content",
    "screenshot": false,
    "retry_404": false,
    "output_format": "html",
    "max_cost": 10
  }
}
```
| Parameter | Type | Purpose |
|---|---|---|
| `expectUnsuccessReport` | boolean | Receive webhook payloads for failed jobs too |
| `timeoutSec` | integer | Override the default job timeout (seconds) |
| `meta` | object | Custom metadata, echoed back in every response/callback for correlation |
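Because the reference body lists both `url` and `urls`, a small builder can pick the endpoint from the number of URLs. This is a sketch under assumptions: `build_job_request` is a hypothetical helper, sending a single URL to `/jobs` rather than a one-item batch is a design choice, and the other top-level fields (`meta`, `callback`, and so on) are assumed to be accepted by both endpoints.

```python
from typing import Any, Dict, Optional, Sequence, Tuple

ASYNC_BASE = "https://async.scraperapi.com"


def build_job_request(api_key: str, urls: Sequence[str],
                      api_params: Optional[Dict[str, Any]] = None,
                      **fields: Any) -> Tuple[str, Dict[str, Any]]:
    """Return (endpoint, JSON body): one URL -> /jobs, several -> /batchjobs."""
    if not urls:
        raise ValueError("need at least one URL")
    if len(urls) > 50_000:
        raise ValueError("batch requests accept at most 50,000 URLs")
    # Extra top-level fields such as meta or callback pass through unchanged
    body: Dict[str, Any] = {"apiKey": api_key, **fields}
    if api_params:
        body["apiParams"] = dict(api_params)
    if len(urls) == 1:
        return f"{ASYNC_BASE}/jobs", {**body, "url": urls[0]}
    return f"{ASYNC_BASE}/batchjobs", {**body, "urls": list(urls)}
```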
`meta` is especially useful for tracking which batch or workflow a job belongs to:

```json
{ "meta": { "batchId": "run-2024-06", "sourceFile": "urls.csv" } }
```
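Since `meta` is echoed back, results arriving from polling or webhooks can be bucketed by run. A sketch that assumes the echoed object appears as a top-level `meta` key of each payload; `group_by_meta` and the `"<no meta>"` bucket name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_meta(payloads: Iterable[dict], key: str = "batchId") -> Dict[str, List[dict]]:
    """Bucket job or callback payloads by one field of their echoed meta object."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for payload in payloads:
        # Payloads submitted without meta (or without this key) share one bucket
        label = payload.get("meta", {}).get(key, "<no meta>")
        groups[label].append(payload)
    return dict(groups)
```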
To send a POST (for example, a form submission) to the target, set `method`, `headers`, and `body`:

```python
requests.post(
    "https://async.scraperapi.com/jobs",
    json={
        "apiKey": API_KEY,
        "url": "https://api.example.com/search",
        "method": "POST",
        "headers": {"content-type": "application/x-www-form-urlencoded"},
        "body": "query=scraperapi&page=1",
    },
)
```
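Hand-writing a form body is easy to get wrong once values contain spaces or `&`. A sketch that builds the same job with `urllib.parse.urlencode`; `form_post_job` is a hypothetical helper that only assembles the JSON body, which you would then POST to `/jobs` as above.

```python
from urllib.parse import urlencode


def form_post_job(api_key: str, target: str, form: dict) -> dict:
    """Build an Async job body that POSTs URL-encoded form data to the target."""
    return {
        "apiKey": api_key,
        "url": target,
        "method": "POST",
        "headers": {"content-type": "application/x-www-form-urlencoded"},
        # urlencode escapes spaces, '&', '=' and similar characters
        "body": urlencode(form),
    }
```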
When the target URL returns binary content (such as a PDF or image), the response body is Base64-encoded in `response.base64EncodedBody`.
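```python
import base64
import time
import requests

# Uses API_KEY from the first example
r = requests.post(
    "https://async.scraperapi.com/jobs",
    json={"apiKey": API_KEY, "url": "https://example.com/report.pdf"},
)
r.raise_for_status()
job = r.json()

# Wait for a terminal status (a webhook avoids this loop)
while (result := requests.get(job["statusUrl"]).json())["status"] not in ("finished", "failed"):
    time.sleep(5)
if result["status"] == "failed":
    raise RuntimeError(f"Job failed: {result.get('failReason')}")

# Binary content arrives Base64-encoded
pdf_bytes = base64.b64decode(result["response"]["base64EncodedBody"])
with open("report.pdf", "wb") as f:
    f.write(pdf_bytes)
```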
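When a pipeline handles both HTML pages and files, one helper can return bytes either way. A sketch assuming a finished result carries either `response.body` (text) or `response.base64EncodedBody` (binary); re-encoding text as UTF-8 is an assumption, since the page's original charset may differ. `response_bytes` is a hypothetical name.

```python
import base64


def response_bytes(result: dict) -> bytes:
    """Return the target's response as bytes, whether it came back as text or Base64."""
    response = result["response"]
    if "base64EncodedBody" in response:
        return base64.b64decode(response["base64EncodedBody"])
    # Text response: encode as UTF-8 (assumed; the source charset may differ)
    return response["body"].encode("utf-8")
```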
Job results are stored for up to 72 hours (24 hours guaranteed) after the job finishes. After that, the data is deleted; resubmit the job if you need it again.
Retrieve results before the retention window closes. For long pipelines, prefer webhooks so results are pushed to your system immediately upon completion.
| Status | Meaning | Action |
|---|---|---|
| Job finished, `statusCode: 200` | Success | Use `response.body` |
| Job finished, `statusCode: 403` | Target blocked the scrape | Retry with `premium: true` in `apiParams` |
| Job failed, `failReason: failed_due_to_timeout` | Timed out after 24h of retries | Check whether the target is reachable; try `render: false` |
| HTTP 401 on submission | Bad API key | Check SCRAPERAPI_API_KEY |
| HTTP 403 on submission | Out of credits or plan limit | Check dashboard |
| HTTP 429 on submission | Too many concurrent submissions | Back off and re-submit in batches |
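Two rows of the table map naturally to code: backing off on HTTP 429 at submission, and resubmitting with `premium: true` when a finished job reports `statusCode: 403`. A sketch under assumptions: `send` is a hypothetical callable you supply (for example, a wrapper around `requests.post` that returns `(status_code, json)`), the backoff constants are arbitrary, and `submit_with_backoff` and `premium_retry_payload` are hypothetical names.

```python
import random
import time
from typing import Callable, Optional, Tuple


def submit_with_backoff(send: Callable[[dict], Tuple[int, dict]], payload: dict,
                        attempts: int = 5, base_delay: float = 1.0) -> dict:
    """Resubmit on HTTP 429 with exponential backoff plus jitter; raise on other errors."""
    for attempt in range(attempts):
        status, body = send(payload)
        if status == 429:
            # Too many concurrent submissions: wait 1x, 2x, 4x ... base_delay, plus jitter
            if attempt < attempts - 1:
                time.sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
            continue
        if status >= 400:
            # 401 (bad key) and 403 (credits/plan) will not fix themselves on retry
            raise RuntimeError(f"submission rejected with HTTP {status}")
        return body
    raise RuntimeError(f"still rate-limited after {attempts} attempts")


def premium_retry_payload(finished_job: dict, payload: dict) -> Optional[dict]:
    """Return a copy of payload with premium enabled if the target answered 403, else None."""
    if finished_job.get("status") != "finished":
        return None
    if finished_job["response"]["statusCode"] != 403:
        return None
    return {**payload, "apiParams": {**payload.get("apiParams", {}), "premium": True}}
```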
Set `max_cost` in `apiParams` to cap per-request credit spend: a request that would exceed the cap returns a 403 instead of consuming more credits than expected.
The Async API uses the same credit costs as the Standard API:
| Request type | Credits |
|---|---|
| Standard | 1 |
| `render: true` | 10 |
| `premium: true` | 10 |
| `ultra_premium: true` | 30 |
| Failed requests | 0 |
Async jobs that fail after exhausting all retries are not charged.
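For budgeting, the table gives an upper bound of URL count × per-request credits, since failed jobs cost nothing: 10,000 URLs with `render: true` cost at most 10,000 × 10 = 100,000 credits. A sketch of that arithmetic; `worst_case_credits` is a hypothetical helper, and the table lists each option on its own, so combined options (such as `render` with `premium`) are deliberately not covered here.

```python
# Per-request credits from the table above; combined options are not covered
CREDITS_PER_REQUEST = {"standard": 1, "render": 10, "premium": 10, "ultra_premium": 30}


def worst_case_credits(n_urls: int, request_type: str = "standard") -> int:
    """Upper bound if every job succeeds; jobs that fail after all retries cost 0."""
    return n_urls * CREDITS_PER_REQUEST[request_type]
```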
npx claudepluginhub scraperapi/scraperapi-skills --plugin scraperapi