Skill

web-scraping

Extracts structured data from websites using Python browser automation with openbrowser-ai. Handles JS-rendered content, pagination, multi-page scraping, and data processing with pandas.

Python

Popularity

Parent stars

234

Parent forks

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/openbrowser:web-scraping

User invocable

Model invocable

Inline context

Default effort

Tool Access

This skill is limited to the following tools:

Bash(openbrowser-ai:*) Bash(curl:*) Bash(uv:*) Bash(irm:*) Read Write

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Extract structured data from websites using Python code execution with browser automation functions. Handles JavaScript-rendered content, pagination, and multi-page scraping.

SKILL.md

206 lines · ~1.3k tokens

Stats

LanguagePython

Parent stars234

Parent forks19

MaintenanceExcellent

Last CommitMar 29, 2026

Actions

View Source View Plugin View on GitHub View README

Web Scraping

Extract structured data from websites using Python code execution with browser automation functions. Handles JavaScript-rendered content, pagination, and multi-page scraping.

All code runs via openbrowser-ai -c. The daemon starts automatically and persists variables across calls. All browser functions are async -- use await.

Setup

Before running, verify openbrowser-ai is installed:

openbrowser-ai --help

If not found, install:

# macOS/Linux
curl -fsSL https://raw.githubusercontent.com/billy-enrizky/openbrowser-ai/main/install.sh | sh

# Windows (PowerShell)
irm https://raw.githubusercontent.com/billy-enrizky/openbrowser-ai/main/install.ps1 | iex

Workflow

Step 1 -- Navigate and get content overview

openbrowser-ai -c - <<'EOF'
await navigate("https://example.com/data")

# Get browser state to see page title, URL, element count
state = await browser.get_browser_state_summary()
print(f"Title: {state.title}")
print(f"URL: {state.url}")
print(f"Elements: {len(state.dom_state.selector_map)}")
EOF

Step 2 -- Extract data with JavaScript

Use evaluate() to run JS in the browser and return structured data directly as Python objects:

openbrowser-ai -c - <<'EOF'
data = await evaluate("""
(function(){
  return Array.from(document.querySelectorAll(".product-card")).map(el => ({
    name: el.querySelector(".title")?.textContent?.trim(),
    price: el.querySelector(".price")?.textContent?.trim(),
    url: el.querySelector("a")?.href
  }))
})()
""")

import json
print(json.dumps(data, indent=2))
EOF

Step 3 -- Process data with Python

Use pandas, regex, or other Python tools to clean and transform extracted data:

openbrowser-ai -c - <<'EOF'
import json

# Filter and transform
filtered = [item for item in data if item.get("price")]
for item in filtered:
    # Extract numeric price
    price_str = item["price"].replace("$", "").replace(",", "")
    item["price_float"] = float(price_str)

# Sort by price
filtered.sort(key=lambda x: x["price_float"])
print(json.dumps(filtered, indent=2))
EOF

Or with pandas if available:

openbrowser-ai -c - <<'EOF'
import pandas as pd
df = pd.DataFrame(data)
print(df.to_string())
EOF

Step 4 -- Handle pagination

openbrowser-ai -c - <<'EOF'
results = []
page = 1

while True:
    # Extract data from current page
    page_data = await evaluate("""
    (function(){
      return Array.from(document.querySelectorAll(".item")).map(el => ({
        name: el.textContent.trim()
      }))
    })()
    """)
    results.extend(page_data)
    print(f"Page {page}: {len(page_data)} items")

    # Check for next button
    has_next = await evaluate("""
    (function(){ return !!document.querySelector(".pagination .next:not(.disabled)") })()
    """)

    if not has_next:
        break

    # Replace with the actual index from browser.get_browser_state_summary()
    await click(next_button_index)
    await wait(2)
    page += 1

print(f"Total: {len(results)} items")
EOF

Step 5 -- Handle infinite scroll

openbrowser-ai -c - <<'EOF'
results = []
prev_count = 0

for _ in range(20):  # Max 20 scroll attempts
    # Get current items
    count = await evaluate("""
    (function(){ return document.querySelectorAll(".item").length })()
    """)

    if count == prev_count:
        break  # No new content loaded

    prev_count = count
    await scroll(down=True, pages=3)
    await wait(1)

# Now extract all loaded items
results = await evaluate("""
(function(){
  return Array.from(document.querySelectorAll(".item")).map(el => ({
    text: el.textContent.trim()
  }))
})()
""")
print(f"Extracted {len(results)} items")
EOF

Step 6 -- Multi-page scraping

openbrowser-ai -c - <<'EOF'
urls = [
    "https://example.com/page-1",
    "https://example.com/page-2",
    "https://example.com/page-3",
]

all_data = []
for url in urls:
    await navigate(url)
    await wait(1)

    page_data = await evaluate("""
    (function(){
      return document.querySelector("h1")?.textContent?.trim()
    })()
    """)
    all_data.append({"url": url, "title": page_data})
    print(f"{url}: {page_data}")

import json
print(json.dumps(all_data, indent=2))
EOF

Tips

Code is piped via stdin using heredoc (-c - <<'EOF'), so all Python syntax works without shell escaping issues.
Use evaluate() for structured DOM extraction -- it returns Python objects directly.
Use Python for post-processing: filtering, sorting, deduplication, format conversion.
For large datasets, process pages incrementally rather than loading everything into memory.
Check for rate limiting; add await wait(2) between page loads if needed.
Variables persist between -c calls while the daemon is running, so you can build up results across multiple calls.

web-scraping

Popularity

Invocation

Tool Access

Context Preview

SKILL.md

web-scraping

Popularity

Invocation

Tool Access

Context Preview

SKILL.md

Web Scraping

Setup

Workflow

Step 1 -- Navigate and get content overview

Step 2 -- Extract data with JavaScript

Step 3 -- Process data with Python

Step 4 -- Handle pagination

Step 5 -- Handle infinite scroll

Step 6 -- Multi-page scraping

Tips

Similar Skills

Web Scraping

Setup

Workflow

Step 1 -- Navigate and get content overview

Step 2 -- Extract data with JavaScript

Step 3 -- Process data with Python

Step 4 -- Handle pagination

Step 5 -- Handle infinite scroll

Step 6 -- Multi-page scraping

Tips

Similar Skills