From superpowers
Generates large-scale labeled image datasets using web scraping and Large Multimodal Models (Gemini Vision) with ~95% accuracy. Intended for object detection and image classification projects.
Slash command: /superpowers:automated-image-dataset-generation
This skill provides a scalable, reusable framework for automatically generating labeled image datasets by combining web scraping with Large Multimodal Models (LMMs) that generate the metadata. It addresses a common bottleneck: manual data collection is resource-intensive, error-prone, and time-consuming.
Key Capabilities:
Use this skill when:
Define Target Categories:
Design Search Queries:
```python
# Generate diverse search queries
categories = ["structural steel beam", "steel column construction", "roof truss"]
query_variations = [
    f"{cat} {mod}"
    for cat in categories
    for mod in ["photo", "site", "construction", "building"]
]
```
Set Collection Parameters:
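Here is a minimal sketch of a collection configuration. It gathers parameters that appear elsewhere in this skill: `num_images`, `batch_size` and the 0.8 confidence threshold. The class name, the resolution floor and the request delay are hypothetical choices and are not part of the original method.

```python
from dataclasses import dataclass, field


@dataclass
class CollectionConfig:
    """Hypothetical collection settings; field names are illustrative."""
    categories: list[str] = field(default_factory=list)
    images_per_query: int = 1000   # mirrors num_images in scrape_images
    batch_size: int = 100          # mirrors process_dataset's default
    min_confidence: float = 0.8    # mirrors the rule-filtering example
    min_width: int = 224           # assumed resolution floor (pixels)
    min_height: int = 224          # assumed resolution floor (pixels)
    request_delay_s: float = 1.0   # assumed pause between requests

    def __post_init__(self):
        # Reject settings that would silently break later steps
        if not 0.0 <= self.min_confidence <= 1.0:
            raise ValueError("min_confidence must be in [0, 1]")
        if self.batch_size <= 0:
            raise ValueError("batch_size must be positive")

    def target_images(self, num_queries: int) -> int:
        """Upper bound on raw downloads before any filtering."""
        return num_queries * self.images_per_query
```

The query example above uses three categories and four modifiers. That gives 12 queries, and at 1,000 images each, up to 12,000 raw downloads before filtering.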
Implement Multi-Source Scraping:
```python
import requests
from bs4 import BeautifulSoup
from selenium import webdriver


def scrape_images(query, num_images=1000):
    """
    Scrape images from multiple sources:
    - Google Images
    - Bing Images
    - Domain-specific sites
    """
    images = []
    # Use appropriate rate limiting
    # Respect robots.txt
    # Store source URLs for attribution
    return images
```
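The rate-limiting and robots.txt comments in `scrape_images` can be made concrete with the standard library's `urllib.robotparser`. This sketch parses robots.txt text you have already fetched. In practice, `RobotFileParser.set_url()` followed by `read()` does the fetching. It also spaces out requests with a minimum interval. The user-agent string, the helper names and the interval are assumptions.

```python
import time
import urllib.robotparser


def make_robots_checker(robots_txt: str, user_agent: str = "dataset-bot"):
    """Return a predicate telling whether a URL may be fetched.

    robots_txt is the text of a site's robots.txt. Fetching it is left
    to the caller so the check can run offline.
    """
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return lambda url: parser.can_fetch(user_agent, url)


class RateLimiter:
    """Enforce a minimum interval between consecutive requests."""

    def __init__(self, min_interval_s: float):
        self.min_interval_s = min_interval_s
        self._last = None

    def wait(self):
        # Sleep only for the part of the interval that has not yet elapsed
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval_s - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()
```

Call `limiter.wait()` before every request. Skip any URL for which the checker returns `False`.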
Image Download and Storage:
```python
def download_images(image_urls, output_dir):
    """
    Download images with:
    - Duplicate detection (hash-based)
    - Format validation
    - Resolution filtering
    - Metadata preservation
    """
    pass
```
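The duplicate-detection and format-validation steps listed above can be sketched with the standard library alone. SHA-256 only catches byte-identical copies. Resized or re-encoded duplicates need a perceptual hash, such as one from the `imagehash` package in the dependency list. The function names and the `(url, bytes)` input shape are assumptions. Resolution filtering is left out because it requires decoding the image, for example with Pillow.

```python
import hashlib

# Leading "magic" bytes that identify common image formats
_SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}


def detect_format(data: bytes):
    """Return the image format implied by the header bytes, or None."""
    for magic, fmt in _SIGNATURES.items():
        if data.startswith(magic):
            return fmt
    # WebP files start with "RIFF", a 4-byte size, then "WEBP"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return None


def dedupe_and_validate(downloads):
    """Keep the first copy of each byte-identical, recognisable image.

    downloads: iterable of (url, raw_bytes) pairs.
    Returns dicts with the source url (for attribution), hash and format.
    """
    seen = set()
    kept = []
    for url, data in downloads:
        fmt = detect_format(data)
        if fmt is None:
            continue  # not a supported image, e.g. an HTML error page
        digest = hashlib.sha256(data).hexdigest()
        if digest in seen:
            continue  # exact duplicate of an earlier download
        seen.add(digest)
        kept.append({"url": url, "sha256": digest, "format": fmt})
    return kept
```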
Initial Filtering:
Configure LMM (Gemini Vision or equivalent):
```python
import os

import google.generativeai as genai
from PIL import Image

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash")


def generate_metadata(image_path, categories):
    """
    Use the LMM to analyze one image and generate metadata.
    """
    image = Image.open(image_path)
    prompt = f"""
    Analyze this image and determine:
    1. Does it contain any of these objects: {', '.join(categories)}?
    2. If yes, which specific category?
    3. Confidence level (high/medium/low)
    4. Object location description (for detection tasks)
    5. Image quality assessment
    Return only a JSON object with the keys "contains_object",
    "category", "confidence", "location" and "quality".
    """
    response = model.generate_content([prompt, image])
    # parse_response extracts the JSON object from the reply text
    metadata = parse_response(response.text)
    metadata["image"] = str(image_path)  # keep provenance for later steps
    return metadata
```
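`generate_metadata` hands the reply text to a `parse_response` helper that the skill leaves undefined. Here is one possible sketch. It assumes the model is asked for a single JSON object but may still wrap it in a markdown code fence or surround it with prose.

```python
import json
import re


def parse_response(text: str) -> dict:
    """Extract the first JSON object from an LMM reply.

    Models often wrap JSON in a markdown code fence or add prose around
    it, so fall back to the outermost {...} span before giving up.
    """
    fenced = re.search(r"`{3}(?:json)?\s*(\{.*?\})\s*`{3}", text, re.DOTALL)
    candidate = fenced.group(1) if fenced else text
    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        start, end = candidate.find("{"), candidate.rfind("}")
        if start == -1 or end <= start:
            raise ValueError(f"No JSON object in response: {text[:80]!r}")
        return json.loads(candidate[start:end + 1])
```

Raising on unparseable replies, rather than returning an empty dict, lets the batch loop record the failure instead of silently keeping an unlabeled image.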
Batch Processing:
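```python
import time


def process_dataset(image_dir, categories, batch_size=100, delay_s=1.0):
    """
    Process images in batches with:
    - Rate limiting (delay_s seconds between LMM calls)
    - Error handling
    - Progress tracking
    - Checkpoint saving
    """
    results = []
    # get_batches and save_checkpoint are helpers you supply
    for batch_num, batch in enumerate(get_batches(image_dir, batch_size), start=1):
        for img in batch:
            try:
                results.append(generate_metadata(img, categories))
            except Exception as exc:  # API errors, unreadable files, bad JSON
                results.append({"image": str(img), "error": str(exc)})
            time.sleep(delay_s)  # simple rate limiting
        save_checkpoint(results)  # persist progress after every batch
        print(f"Batch {batch_num}: {len(results)} images processed")
    return results
```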
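The `get_batches` and `save_checkpoint` helpers called by `process_dataset` are not defined in the skill. This minimal sketch assumes images live under `image_dir` with common file extensions and that checkpoints are a single JSON file. `load_checkpoint` and the default path are hypothetical additions for resuming an interrupted run.

```python
import json
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".webp"}


def get_batches(image_dir, batch_size):
    """Yield lists of image paths, batch_size at a time, in sorted order."""
    paths = sorted(
        p for p in Path(image_dir).rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    for i in range(0, len(paths), batch_size):
        yield paths[i:i + batch_size]


def save_checkpoint(results, path="checkpoint.json"):
    """Write to a temp file, then rename, so a crash never leaves a half-written checkpoint."""
    path = Path(path)
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(results, default=str))  # default=str handles Path values
    tmp.replace(path)  # atomic rename on the same filesystem


def load_checkpoint(path="checkpoint.json"):
    """Return previously saved results, or an empty list on a fresh run."""
    path = Path(path)
    return json.loads(path.read_text()) if path.exists() else []
```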
Quality Metrics:
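One way to check a figure like the ~95% accuracy quoted above is to hand-label a random sample of the generated images and compare the labels. The sketch below reports overall agreement and per-category precision, which shows whether errors cluster in particular classes. The function name and input shape are hypothetical.

```python
from collections import Counter


def label_quality(predicted, verified):
    """Compare LMM labels against a human-verified sample.

    predicted / verified: dicts mapping image id -> category label.
    Only images present in both dicts are scored.
    Returns (overall accuracy, {label: precision}).
    """
    common = predicted.keys() & verified.keys()
    if not common:
        raise ValueError("No overlap between predicted and verified labels")
    correct = sum(predicted[k] == verified[k] for k in common)
    # Precision per predicted label: how often that label was right
    per_label_total = Counter(predicted[k] for k in common)
    per_label_correct = Counter(
        predicted[k] for k in common if predicted[k] == verified[k]
    )
    precision = {
        label: per_label_correct[label] / total
        for label, total in per_label_total.items()
    }
    return correct / len(common), precision
```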
Apply Category Rules:
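```python
# Assumed mapping from the LMM's high/medium/low labels to numeric scores,
# so they can be compared with a numeric threshold; tune per domain.
CONFIDENCE_SCORES = {"high": 0.9, "medium": 0.6, "low": 0.3}


def filter_by_rules(metadata, rules):
    """
    Apply domain-specific rules:
    - Minimum confidence threshold (e.g., 0.8)
    - Category-specific validation
    - Cross-reference with search query
    """
    filtered = []
    for item in metadata:
        score = item.get("confidence")
        if isinstance(score, str):
            score = CONFIDENCE_SCORES.get(score.lower(), 0.0)
        if score is None or score < rules["min_confidence"]:
            continue  # low confidence, or a failed/errored record
        if validate_category(item, rules):
            filtered.append(item)
    return filtered
```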
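`filter_by_rules` relies on a `validate_category` helper that the skill does not define. Here is a hypothetical version. It assumes each record carries the LMM's `category` and, optionally, the `query` that surfaced the image. It also assumes `rules["categories"]` lists the allowed labels.

```python
def validate_category(item, rules):
    """Hypothetical category check used by filter_by_rules.

    1. The LMM label must be one of the allowed categories.
    2. If the source query is known, every word of the label must
       appear in it (a simple cross-reference with the search query).
    """
    category = (item.get("category") or "").strip().lower()
    allowed = {c.lower() for c in rules["categories"]}
    if category not in allowed:
        return False  # no label, or a label outside the target set
    query = item.get("query")
    if query is None:
        return True  # nothing to cross-reference against
    return set(category.split()) <= set(query.lower().split())
```

The query cross-check is deliberately strict. For example, a beam that appears in a result for "steel column construction" is rejected. You may prefer to route such mismatches to manual review instead of discarding them.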
Handle Edge Cases:
Generate Dataset Structure:
```text
dataset/
├── images/
│   ├── category_1/
│   ├── category_2/
│   └── ...
├── annotations/
│   ├── metadata.json
│   └── labels.csv
├── splits/
│   ├── train.txt
│   ├── val.txt
│   └── test.txt
└── README.md
```
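The layout above can be created up front. This is a minimal sketch using `pathlib`. The lowercase, underscore-separated folder naming and the README contents are assumptions.

```python
from pathlib import Path


def create_dataset_skeleton(root, categories):
    """Create the dataset directory layout; safe to call repeatedly."""
    root = Path(root)
    for cat in categories:
        # Assumed folder naming: lowercase, spaces -> underscores
        folder = root / "images" / cat.lower().replace(" ", "_")
        folder.mkdir(parents=True, exist_ok=True)
    (root / "annotations").mkdir(parents=True, exist_ok=True)
    (root / "splits").mkdir(parents=True, exist_ok=True)
    readme = root / "README.md"
    if not readme.exists():
        readme.write_text(f"# Dataset\n\nCategories: {', '.join(categories)}\n")
    return root
```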
Create Annotation Files:
```python
def create_annotations(filtered_data, output_dir):
    """
    Generate standard annotation formats:
    - COCO format (for object detection)
    - CSV with labels (for classification)
    - YOLO format (if needed)
    """
    pass
```
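Two of the formats listed can be sketched with the standard library. COCO stores boxes as `[x_min, y_min, width, height]` in pixels. YOLO expects one `class x_center y_center width height` line per object, normalised by image size. The prompt above only asks the LMM for a textual location description, so pixel boxes would have to come from an additional step. The function names and record fields here are hypothetical.

```python
import csv


def to_yolo_line(class_id, bbox, img_w, img_h):
    """Convert a COCO-style [x_min, y_min, width, height] pixel box into
    a YOLO label line with coordinates normalised to [0, 1]."""
    x, y, w, h = bbox
    xc = (x + w / 2) / img_w  # box centre, as a fraction of image width
    yc = (y + h / 2) / img_h  # box centre, as a fraction of image height
    return f"{class_id} {xc:.6f} {yc:.6f} {w / img_w:.6f} {h / img_h:.6f}"


def write_classification_csv(records, csv_path):
    """Write one 'filename,label' row per record (classification labels)."""
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["filename", "label"])
        for rec in records:
            writer.writerow([rec["filename"], rec["category"]])
```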
Split Dataset:
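Here is a minimal stratified split that fills the `train.txt`, `val.txt` and `test.txt` split files. The 80/10/10 ratio and the fixed seed are assumptions. If you prefer a library call, scikit-learn's `train_test_split` with its `stratify` argument is an alternative.

```python
import random
from collections import defaultdict
from pathlib import Path


def stratified_split(items, ratios=(0.8, 0.1, 0.1), seed=42):
    """Split (path, label) pairs into train/val/test within each category,
    so class proportions stay similar across the three splits."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    by_label = defaultdict(list)
    for path, label in items:
        by_label[label].append(path)
    rng = random.Random(seed)  # fixed seed -> reproducible splits
    splits = {"train": [], "val": [], "test": []}
    for label in sorted(by_label):
        paths = sorted(by_label[label])
        rng.shuffle(paths)
        n_train = int(len(paths) * ratios[0])
        n_val = int(len(paths) * ratios[1])
        splits["train"] += paths[:n_train]
        splits["val"] += paths[n_train:n_train + n_val]
        splits["test"] += paths[n_train + n_val:]  # remainder goes to test
    return splits


def write_splits(splits, splits_dir):
    """Write train.txt / val.txt / test.txt, one image path per line."""
    splits_dir = Path(splits_dir)
    splits_dir.mkdir(parents=True, exist_ok=True)
    for name, paths in splits.items():
        (splits_dir / f"{name}.txt").write_text("\n".join(paths) + "\n")
```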
Based on the original research:
```bash
# Core
pip install requests beautifulsoup4 selenium pillow
# LMM
pip install google-generativeai  # or openai for GPT-4V
# Image processing
pip install imagehash opencv-python
# Dataset tools
pip install pandas scikit-learn
```
```bash
npx claudepluginhub lunartech-x/superpowers --plugin superpowers
```