Access Bright Data datasets, Web Archive search/dump, Web Unlocker zones, and FINRA/SEC coverage. Only for users with a Bright Data account and token.
Slash command: /workflows:bright-data
- [Cost Enforcement](#cost-enforcement)
Free: GET /datasets/list; POST /webarchive/search + polling GET /webarchive/search/<id> (returns counts + dump_cost_usd WITHOUT charging).
Paid: POST /webarchive/dump.

NEVER call /webarchive/dump, trigger a dataset collection, or create/use a Web Unlocker zone unless the user has explicitly approved the spend for THAT operation. Always run a free search first to get the exact dump_cost_usd and show it to the user before any dump.
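The approval rule above can be enforced in code rather than by convention. The following is a minimal sketch, not part of the Bright Data API: `require_approval` and `SpendNotApproved` are hypothetical names, and the caller is assumed to pass in the `dump_cost_usd` reported by a free search.

```python
class SpendNotApproved(Exception):
    """Raised when a paid operation is attempted without explicit approval."""


def require_approval(operation: str, cost_usd: float, approved_ops: set[str]) -> str:
    """Return a confirmation message, or raise if this operation was not approved.

    `approved_ops` holds the operations the user approved explicitly.
    Approval for one operation never carries over to another.
    """
    if operation not in approved_ops:
        raise SpendNotApproved(
            f"{operation} would cost about ${cost_usd:,.2f}; ask the user before proceeding"
        )
    return f"{operation} approved at about ${cost_usd:,.2f}"
```

Keying approval on the specific operation (for example one search id) mirrors the "THAT operation" wording: a second dump needs its own approval.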
The default read-only token (BRIGHTDATA_API_TOKEN) can list datasets and run archive searches but CANNOT create zones. Do not attempt zone creation with it.
All endpoints use a Bearer token. NEVER hardcode it. Read from env or a gitignored key file:
# preferred: env var
export BRIGHTDATA_API_TOKEN=... # set in shell profile / .env (gitignored)
# fallback used during this project:
TOKEN=$(cat ~/projects/batm/scratch/brd_token.txt) # gitignored key file
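import os
from pathlib import Path
# Prefer the env var; fall back to a gitignored key file (read_text closes the file).
TOKEN = os.environ.get("BRIGHTDATA_API_TOKEN") or Path("~/.config/brightdata/token").expanduser().read_text().strip()
HEADERS = {"Authorization": f"Bearer {TOKEN}", "Content-Type": "application/json"}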
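If neither the env var nor the key file exists, the inline fallback above surfaces as a bare `FileNotFoundError`. A small sketch of the same lookup with a clearer failure follows; `load_token` and `auth_headers` are hypothetical helper names, and the key-file path is the one used in the snippet above.

```python
import os
from pathlib import Path


def load_token(env=None, key_file="~/.config/brightdata/token"):
    """Return the API token from the environment, else from a gitignored key file."""
    env = os.environ if env is None else env
    token = env.get("BRIGHTDATA_API_TOKEN", "").strip()
    if token:
        return token
    path = Path(key_file).expanduser()
    if path.is_file():
        return path.read_text().strip()
    raise RuntimeError(
        "No Bright Data token found: set BRIGHTDATA_API_TOKEN or create the key file"
    )


def auth_headers(token):
    """Build the Bearer-token headers used by every endpoint."""
    return {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}
```

Passing `env` explicitly keeps the function easy to test without touching the real environment.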
Three relevant products:
- Dataset marketplace (1,576 datasets, GET /datasets/list). Heavily social/company/people (LinkedIn 115M people, Instagram 620M, Crunchbase 2.3M, Glassdoor, etc.). No government/regulatory/licensing products except a "US lawyers directory" (1.4M). See references/datasets-catalog.md.
- Web Archive (search and dump). See references/webarchive-api.md.
- Web Unlocker (needs a writable token and a zone).

Base: https://api.brightdata.com/webarchive. Async: search returns a search_id; poll until status == "done".
# 1. Launch a FREE search (returns {"search_id": "..."})
curl -s -X POST https://api.brightdata.com/webarchive/search \
-H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
-d '{"filters":{"min_date":"2015-01-01","max_date":"2026-06-10","domain_whitelist":["brokercheck.finra.org"]}}'
# 2. Poll (FREE) — when done returns files_count, dump_cost_usd, estimate_batch_count
curl -s https://api.brightdata.com/webarchive/search/<search_id> \
-H "Authorization: Bearer $TOKEN"
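The polling step in the curl example can be sketched in Python with only the standard library. This is an illustrative sketch, not the harness from references/webarchive-api.md: `http_get_json` and `poll_search` are hypothetical names, the 20-second interval and poll limit are assumed defaults, and any status other than "done" is treated as still pending.

```python
import json
import time
import urllib.request

BASE = "https://api.brightdata.com/webarchive"


def http_get_json(url, token):
    """GET a URL with the Bearer token and decode the JSON body."""
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)


def poll_search(search_id, token, fetch=http_get_json, interval=20,
                max_polls=60, sleep=time.sleep):
    """Poll a search until status == "done" and return the final response dict."""
    url = f"{BASE}/search/{search_id}"
    for _ in range(max_polls):
        result = fetch(url, token)
        if result.get("status") == "done":
            return result  # carries files_count and dump_cost_usd
        sleep(interval)
    raise TimeoutError(f"search {search_id} not done after {max_polls} polls")
```

The `fetch` and `sleep` parameters exist so the loop can be exercised offline with stubs; in normal use the defaults perform real requests.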
Filters (body {"filters":{...}}):
- max_age OR min_date+max_date (YYYY-MM-DD).
- domain_whitelist: exact host match, array.
- domain_like_whitelist: SQL LIKE, e.g. ["%finra%"].
- url_like_whitelist: SQL LIKE on full URL (use to scope a cheap subset dump).
- unique_url (bool): count/dump distinct URLs only (dedupes repeat snapshots).

Searches can take 9+ minutes. Launch many in parallel, then poll every ~20s. See references/webarchive-api.md for a working parallel-poll Python harness.
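The filter names above map directly onto a request body. As a rough sketch, a hypothetical `build_filters` helper can assemble that body and reject combinations the list rules out; the checks (no `max_age` together with a date range, dates only as a pair) are this sketch's reading of "max_age OR min_date+max_date", not documented API behaviour.

```python
def build_filters(*, max_age=None, min_date=None, max_date=None,
                  domain_whitelist=None, domain_like_whitelist=None,
                  url_like_whitelist=None, unique_url=None):
    """Build the {"filters": {...}} body from the documented filter names."""
    if (min_date is None) != (max_date is None):
        raise ValueError("min_date and max_date must be given together")
    if max_age is not None and min_date is not None:
        raise ValueError("use max_age OR min_date+max_date, not both")
    candidates = {
        "max_age": max_age,
        "min_date": min_date,
        "max_date": max_date,
        "domain_whitelist": domain_whitelist,
        "domain_like_whitelist": domain_like_whitelist,
        "url_like_whitelist": url_like_whitelist,
        "unique_url": unique_url,
    }
    # Send only the filters the caller actually set.
    return {"filters": {k: v for k, v in candidates.items() if v is not None}}
```

For example, `build_filters(min_date="2015-01-01", max_date="2026-06-10", domain_whitelist=["brokercheck.finra.org"], unique_url=True)` produces the body of the curl search shown earlier, plus the dedupe flag.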
dump_cost_usd ≈ files_count / 1000 confirms the ~$0.001/page dump price.
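The arithmetic is simple enough to check in a few lines. This is a sketch only: `estimate_dump_cost` and `cost_matches` are hypothetical helpers, the 5% tolerance is an arbitrary assumption, and the `dump_cost_usd` returned by the API remains the figure to show the user.

```python
def estimate_dump_cost(files_count: int) -> float:
    """Approximate dump cost in USD at roughly $0.001 per page."""
    return round(files_count / 1000, 2)


def cost_matches(files_count: int, dump_cost_usd: float, tolerance: float = 0.05) -> bool:
    """Check that the reported cost is within `tolerance` of the per-page estimate."""
    estimate = estimate_dump_cost(files_count)
    if estimate == 0:
        return dump_cost_usd == 0
    return abs(dump_cost_usd - estimate) / estimate <= tolerance


print(estimate_dump_cost(714_614))  # 714.61
```

A reported cost that fails `cost_matches` would be a reason to stop and re-check the search before asking the user to approve anything.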
curl -s https://api.brightdata.com/datasets/list -H "Authorization: Bearer $TOKEN"
# -> [{"id":"gd_...","name":"...","size":<approx record count>}, ...]
size is approximate record count. Pulling records (snapshot/trigger) is a separate paid step — not covered by the read-only token. See references/datasets-catalog.md for the categorized highlights.
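Once the catalog JSON is loaded, scanning it for a topic takes only a short filter. The sketch below assumes the `id`/`name`/`size` shape shown in the comment above; `find_datasets` is a hypothetical helper, and the sample entries are made up for illustration and are not real catalog rows.

```python
def find_datasets(catalog, keyword, min_size=0):
    """Return (id, name, size) tuples whose name contains keyword, largest first."""
    needle = keyword.lower()
    hits = [
        (d.get("id"), d.get("name", ""), d.get("size") or 0)
        for d in catalog
        if needle in d.get("name", "").lower() and (d.get("size") or 0) >= min_size
    ]
    return sorted(hits, key=lambda row: row[2], reverse=True)


# Hypothetical sample in the documented shape (ids and sizes are invented).
sample = [
    {"id": "gd_example1", "name": "Example lawyers directory", "size": 1_000},
    {"id": "gd_example2", "name": "Example company profiles", "size": 50_000},
    {"id": "gd_example3", "name": "Example Lawyers reviews", "size": 5_000},
]
print(find_datasets(sample, "lawyers"))
```

Because `size` is only an approximate record count, `min_size` is best treated as a coarse filter.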
| Action | Cost | Notes |
|---|---|---|
| GET /datasets/list | free | metadata only |
| POST /webarchive/search + poll | free | returns count + dump_cost_usd |
| POST /webarchive/dump | ~$0.001 / page | the paid step; confirm cost first |
| Web Unlocker request | ~$1.5–3 / 1k successes | needs writable token + zone |
| Dataset records | per-record | snapshot/trigger; varies by dataset |
Verified 2026-06-10 via free Web Archive searches. Bright Data IS a viable source for current BrokerCheck/IAPD data — via the Web Archive, not the marketplace.
- brokercheck.finra.org: 1,434,501 snapshots / 714,614 distinct URLs (~$715 to dump distinct).
- adviserinfo.sec.gov: 1,635,389 snapshots / 664,043 distinct URLs (~$664 to dump distinct).
- api.brokercheck.finra.org and reports.adviserinfo.sec.gov = 0 (only the HTML profile pages were captured, not the JSON API or PDF reports).

Full numbers, year brackets, and verdict in references/finra-sec-coverage.md. For deep disclosure history (pre-2024), use FINRA/SEC bulk downloads or WRDS instead (see the wrds skill, Form ADV).
- references/webarchive-api.md: full Web Archive API reference, filters, the parallel-poll Python harness, url_like_whitelist subset-dump pattern, cost arithmetic.
- references/datasets-catalog.md: categorized highlights of the 1,576-dataset marketplace (company/people/finance/professional), with ids and sizes; how to re-fetch the catalog.
- references/finra-sec-coverage.md: verified FINRA BrokerCheck + SEC IAPD coverage (totals, distinct, year-by-year temporal spread, dump costs, and the viability verdict).