From openbrowser
Extracts structured data from websites using Python browser automation with openbrowser-ai. Handles JS-rendered content, pagination, multi-page scraping, and data processing with pandas.
How this skill is triggered — by the user, by Claude, or both
Slash command
/openbrowser:web-scrapingThis skill is limited to the following tools:
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
Extract structured data from websites using Python code execution with browser automation functions. Handles JavaScript-rendered content, pagination, and multi-page scraping.
Extract structured data from websites using Python code execution with browser automation functions. Handles JavaScript-rendered content, pagination, and multi-page scraping.
All code runs via openbrowser-ai -c. The daemon starts automatically and persists variables across calls. All browser functions are async -- use await.
Before running, verify openbrowser-ai is installed:
openbrowser-ai --help
If not found, install:
# macOS/Linux
curl -fsSL https://raw.githubusercontent.com/billy-enrizky/openbrowser-ai/main/install.sh | sh
# Windows (PowerShell)
irm https://raw.githubusercontent.com/billy-enrizky/openbrowser-ai/main/install.ps1 | iex
openbrowser-ai -c - <<'EOF'
await navigate("https://example.com/data")
# Get browser state to see page title, URL, element count
state = await browser.get_browser_state_summary()
print(f"Title: {state.title}")
print(f"URL: {state.url}")
print(f"Elements: {len(state.dom_state.selector_map)}")
EOF
Use evaluate() to run JS in the browser and return structured data directly as Python objects:
openbrowser-ai -c - <<'EOF'
data = await evaluate("""
(function(){
return Array.from(document.querySelectorAll(".product-card")).map(el => ({
name: el.querySelector(".title")?.textContent?.trim(),
price: el.querySelector(".price")?.textContent?.trim(),
url: el.querySelector("a")?.href
}))
})()
""")
import json
print(json.dumps(data, indent=2))
EOF
Use pandas, regex, or other Python tools to clean and transform extracted data:
openbrowser-ai -c - <<'EOF'
import json
# Filter and transform
filtered = [item for item in data if item.get("price")]
for item in filtered:
# Extract numeric price
price_str = item["price"].replace("$", "").replace(",", "")
item["price_float"] = float(price_str)
# Sort by price
filtered.sort(key=lambda x: x["price_float"])
print(json.dumps(filtered, indent=2))
EOF
Or with pandas if available:
openbrowser-ai -c - <<'EOF'
import pandas as pd
df = pd.DataFrame(data)
print(df.to_string())
EOF
openbrowser-ai -c - <<'EOF'
results = []
page = 1
while True:
# Extract data from current page
page_data = await evaluate("""
(function(){
return Array.from(document.querySelectorAll(".item")).map(el => ({
name: el.textContent.trim()
}))
})()
""")
results.extend(page_data)
print(f"Page {page}: {len(page_data)} items")
# Check for next button
has_next = await evaluate("""
(function(){ return !!document.querySelector(".pagination .next:not(.disabled)") })()
""")
if not has_next:
break
# Replace with the actual index from browser.get_browser_state_summary()
await click(next_button_index)
await wait(2)
page += 1
print(f"Total: {len(results)} items")
EOF
openbrowser-ai -c - <<'EOF'
results = []
prev_count = 0
for _ in range(20): # Max 20 scroll attempts
# Get current items
count = await evaluate("""
(function(){ return document.querySelectorAll(".item").length })()
""")
if count == prev_count:
break # No new content loaded
prev_count = count
await scroll(down=True, pages=3)
await wait(1)
# Now extract all loaded items
results = await evaluate("""
(function(){
return Array.from(document.querySelectorAll(".item")).map(el => ({
text: el.textContent.trim()
}))
})()
""")
print(f"Extracted {len(results)} items")
EOF
openbrowser-ai -c - <<'EOF'
urls = [
"https://example.com/page-1",
"https://example.com/page-2",
"https://example.com/page-3",
]
all_data = []
for url in urls:
await navigate(url)
await wait(1)
page_data = await evaluate("""
(function(){
return document.querySelector("h1")?.textContent?.trim()
})()
""")
all_data.append({"url": url, "title": page_data})
print(f"{url}: {page_data}")
import json
print(json.dumps(all_data, indent=2))
EOF
-c - <<'EOF'), so all Python syntax works without shell escaping issues.evaluate() for structured DOM extraction -- it returns Python objects directly.await wait(2) between page loads if needed.-c calls while the daemon is running, so you can build up results across multiple calls.npx claudepluginhub billy-enrizky/openbrowser-ai --plugin openbrowserExtracts structured data (tables, lists, prices) from web pages with multi-strategy fallback, pagination, monitoring, and CSV/JSON export.
Extracts structured data from websites like product listings, tables, search results, or profiles, generating an executable Playwright script and JSON/CSV output.
Builds production-ready web scrapers for any site using Bright Data infrastructure. Guides site analysis, API selection, selector extraction, pagination, and implementation.