Produces structured technical briefs for web scraping projects, specifying data fields, output schema, edge cases, and legal/ethical considerations for developer handoff.
Trigger: slash command /autopunk-media-skills:scraper-brief
Writes a clear, technical brief describing exactly what data needs to be collected from a website or set of web pages, how it should be structured, and what edge cases and legal/ethical considerations apply — for handoff to a developer or data team.
Required: the URL or website description; what data you need (the specific fields visible on the page); the scope (how many pages or records you expect to collect); and the reason you need it (news investigation, research, ongoing monitoring).
Optional: how frequently the data needs to be updated; whether pagination, search filters, or login walls are involved; any known technical obstacles (JavaScript rendering, anti-scraping measures); and the output format you need.
A technical brief (400–600 words) structured under six headings: Data Source, Fields to Collect (with example values), Output Schema, Scope and Frequency, Technical Considerations, and Legal and Ethical Notes. Tone: clear, precise, technical — written to be acted on by a developer. Includes a simple example row showing the expected output structure.
Website: A national government procurement portal that lists all awarded contracts. Each contract has its own page with: contract title, awarding authority, contractor name, contractor registration number, contract value, award date, contract category, and a link to the procurement notice PDF. There are approximately 45,000 contracts currently listed, going back to 2015. The site has pagination (50 results per page). I need the full dataset, then weekly updates of new awards. Investigation purpose: Corporate concentration analysis — tracking which companies are winning the most public contracts. Output format needed: CSV.
Data Source
URL: [Government procurement portal, to be confirmed]
Description: National government procurement portal listing all awarded contracts. Each contract is on a dedicated page accessible from a paginated index. Approximately 45,000 records spanning 2015 to present.
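The paginated index can be enumerated up front: at 50 results per page, roughly 45,000 records works out to about 900 index pages. The Python sketch below builds those page URLs. The base URL and the "page" query parameter are hypothetical placeholders until the real portal is confirmed.

```python
import math
from urllib.parse import urlencode

# Hypothetical: the real index URL and its paging parameter are not yet confirmed.
BASE_INDEX_URL = "https://procurement.example.gov/contracts"
RESULTS_PER_PAGE = 50  # from the brief: 50 results per page


def index_page_urls(total_records: int, per_page: int = RESULTS_PER_PAGE) -> list[str]:
    """Return one URL per paginated index page needed to cover total_records."""
    page_count = math.ceil(total_records / per_page)
    return [
        f"{BASE_INDEX_URL}?{urlencode({'page': page})}"
        for page in range(1, page_count + 1)
    ]


if __name__ == "__main__":
    urls = index_page_urls(45_000)
    print(len(urls))  # 900
    print(urls[0])    # https://procurement.example.gov/contracts?page=1
```

Because 45,000 is an approximate count, a real crawler would be safer to keep requesting pages until one returns no results instead of relying only on the computed total.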
Fields to Collect
Collect the following fields from each contract page:
| Field name | Description | Example value |
|---|---|---|
| contract_id | Unique identifier visible in the page URL or on the page | GPC-2024-00482 |
| contract_title | Full title of the contract | Maintenance of Central Regional Roads 2024–2027 |
| awarding_authority | Name of the public body awarding the contract | National Roads Authority |
| contractor_name | Name of the winning contractor | Bridgepoint Construction Ltd |
| contractor_reg_no | Company registration number | IE-12345678 |
| contract_value | Total contract value in native currency | €4,200,000 |
| award_date | Date of contract award | 2024-03-15 |
| contract_category | Procurement category or CPV code | Road construction works |
| notice_pdf_url | URL of the linked procurement notice PDF | https://... |
Output Schema
CSV file with one row per contract, using the column names above. Store contract_value as a number (strip the currency symbol and thousands separators), award_date as ISO 8601 (YYYY-MM-DD), and notice_pdf_url as a full absolute URL.
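A minimal Python sketch of these output rules. It assumes the portal displays dates as DD/MM/YYYY, uses commas as thousands separators, and links notice PDFs with relative paths. None of this is confirmed, and the date format and the /notices/ path in the demo are hypothetical. The sketch normalises one raw row and writes the CSV using the column names above.

```python
import csv
import io
from datetime import datetime
from decimal import Decimal
from urllib.parse import urljoin

# Column order follows the Fields to Collect table.
FIELDS = [
    "contract_id", "contract_title", "awarding_authority", "contractor_name",
    "contractor_reg_no", "contract_value", "award_date", "contract_category",
    "notice_pdf_url",
]

# Hypothetical: assumes the portal shows dates as DD/MM/YYYY; adjust once confirmed.
SOURCE_DATE_FORMAT = "%d/%m/%Y"


def normalise_value(raw: str) -> str:
    """'€4,200,000' -> '4200000'. Assumes ',' groups thousands and '.' marks decimals."""
    kept = "".join(ch for ch in raw if ch.isdigit() or ch == ".")
    return str(Decimal(kept))  # raises decimal.InvalidOperation on unexpected input


def normalise_date(raw: str) -> str:
    """Convert the portal's date string to ISO 8601 (YYYY-MM-DD)."""
    return datetime.strptime(raw.strip(), SOURCE_DATE_FORMAT).date().isoformat()


def normalise_row(raw: dict, page_url: str) -> dict:
    row = dict(raw)
    row["contract_value"] = normalise_value(raw["contract_value"])
    row["award_date"] = normalise_date(raw["award_date"])
    # Resolve a relative PDF link against the contract page it was found on.
    row["notice_pdf_url"] = urljoin(page_url, raw["notice_pdf_url"])
    return row


def write_csv(rows: list[dict], stream) -> None:
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    raw = {
        "contract_id": "GPC-2024-00482",
        "contract_title": "Maintenance of Central Regional Roads 2024–2027",
        "awarding_authority": "National Roads Authority",
        "contractor_name": "Bridgepoint Construction Ltd",
        "contractor_reg_no": "IE-12345678",
        "contract_value": "€4,200,000",
        "award_date": "15/03/2024",                         # hypothetical source format
        "contract_category": "Road construction works",
        "notice_pdf_url": "/notices/GPC-2024-00482.pdf",    # hypothetical relative link
    }
    page = "https://procurement.example.gov/contracts/GPC-2024-00482"  # hypothetical
    out = io.StringIO()
    write_csv([normalise_row(raw, page)], out)
    print(out.getvalue())
```

If the portal turns out to use a different number format (for example 4.200.000,00), normalise_value would need adjusting. Failing loudly on unexpected input is a deliberate choice so that malformed values are not silently written to the CSV.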
Scope and Frequency
award_date or a contract_id range to identify new records.
Technical Considerations
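For the weekly updates, new records are identified by award_date or contract_id. The Python sketch below shows one way to combine the two, under stated assumptions: it keeps any row whose contract_id has not been stored before and whose award_date falls within a look-back window. The 14-day LOOKBACK value and every contract ID other than GPC-2024-00482 are hypothetical.

```python
from datetime import date, timedelta

# Hypothetical look-back window: re-checks recent award dates in case the portal
# publishes an award some days after its award_date. Tune once the lag is known.
LOOKBACK = timedelta(days=14)


def select_new_records(rows: list[dict], last_run: date, seen_ids: set[str]) -> list[dict]:
    """Return rows not stored before whose award_date is on or after the cutoff."""
    cutoff = last_run - LOOKBACK
    fresh = []
    for row in rows:
        if row["contract_id"] in seen_ids:
            continue  # stored on a previous run; skip to avoid duplicates
        if date.fromisoformat(row["award_date"]) >= cutoff:
            fresh.append(row)
    return fresh


if __name__ == "__main__":
    scraped = [
        {"contract_id": "GPC-2024-00482", "award_date": "2024-03-15"},  # already stored
        {"contract_id": "GPC-2024-00490", "award_date": "2024-03-20"},  # hypothetical new award
        {"contract_id": "GPC-2023-01000", "award_date": "2023-11-02"},  # outside the window
    ]
    new = select_new_records(scraped, date(2024, 3, 18), {"GPC-2024-00482"})
    print([r["contract_id"] for r in new])  # ['GPC-2024-00490']
```

After each run, the new contract_ids would be added to seen_ids. An award published more than LOOKBACK days after its award_date would still be missed, so an occasional full re-crawl is a reasonable safeguard.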
Legal and Ethical Notes
robots.txt before any scraping begins. If scraping is disallowed in robots.txt, seek legal advice before proceeding.
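The robots.txt check can be automated with Python's standard-library urllib.robotparser. This sketch parses robots.txt text supplied as a string so it runs offline. The user-agent string and example URLs are hypothetical. It reports whether a URL may be fetched and any Crawl-delay the site requests. A "not allowed" result is the cue to seek legal advice, not something to work around.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical user agent; an identifiable string with contact details is good practice.
USER_AGENT = "procurement-research-bot"


def robots_policy(robots_txt: str, url: str, user_agent: str = USER_AGENT):
    """Return (allowed, crawl_delay_seconds_or_None) for url under robots_txt.

    Against the live site you would call parser.set_url(".../robots.txt") and
    parser.read() instead; parsing a string keeps this example offline.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url), parser.crawl_delay(user_agent)


if __name__ == "__main__":
    sample = "User-agent: *\nDisallow: /admin/\nCrawl-delay: 5\n"
    print(robots_policy(sample, "https://procurement.example.gov/contracts?page=1"))  # (True, 5)
```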