From milan-jovanovic
Scrapes new articles from Milan Jovanovic's .NET blog (post-November 2025) using optimized pre-filtering from listing page, Firecrawl scraping, and Python scripts to target only new or changed content.
Slash command
/milan-jovanovic:scrape-posts
Scrape new articles from Milan Jovanovic's .NET blog with optimized pre-filtering. Parses dates from listing page to avoid unnecessary per-article scraping.
- --force: Re-scrape all articles (compare content hash to skip unchanged)
- --since YYYY-MM-DD: Custom date filter (default: 2025-11-01)
- --limit N: Limit number of articles (for testing)
- --dry-run: Preview what would be scraped without saving

Invoke the milan-jovanovic:milan-jovanovic-blog skill to load context and access scripts.
Key efficiency optimization: Parse dates from listing page BEFORE scraping individual articles.
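As a rough illustration of the pre-filtering idea, the sketch below buckets listing entries by date and index membership before any per-article request is made. It is not the actual check_new_articles.py; the `ListingEntry` type, the `prefilter` function, the `skipped` bucket, and the inclusive treatment of the `since` date are all assumptions made for the example. Only the to_scrape, in_index, and no_date names come from this document.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ListingEntry:
    """Hypothetical record for one article found on the listing page."""
    slug: str
    published: Optional[date]  # None when the listing date could not be parsed


def prefilter(entries, indexed_slugs, since, force=False):
    """Split listing entries into to_scrape / skipped / no_date buckets."""
    result = {"to_scrape": [], "skipped": [], "no_date": []}
    for entry in entries:
        if entry.published is None:
            result["no_date"].append(entry.slug)
        elif entry.published < since:
            result["skipped"].append(entry.slug)  # older than the date filter
        elif entry.slug in indexed_slugs and not force:
            result["skipped"].append(entry.slug)  # already indexed
        else:
            result["to_scrape"].append(
                {"slug": entry.slug, "in_index": entry.slug in indexed_slugs}
            )
    return result


entries = [
    ListingEntry("new-post", date(2025, 12, 3)),
    ListingEntry("indexed-post", date(2025, 11, 20)),
    ListingEntry("old-post", date(2025, 6, 1)),
    ListingEntry("undated-post", None),
]
normal = prefilter(entries, {"indexed-post"}, since=date(2025, 11, 1))
forced = prefilter(entries, {"indexed-post"}, since=date(2025, 11, 1), force=True)
```

In normal mode only `new-post` lands in to_scrape; with `force=True` the already indexed article is included as well, flagged with `in_index: True` so it can be re-checked rather than blindly rewritten.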
Scrape the blog listing page using firecrawl_scrape:
URL: https://www.milanjovanovic.tech/blog
Format: markdown
Save listing content to temp file (e.g., .claude/temp/milan-listing.md)
Run pre-filter script to identify articles needing scraping:
# Normal mode - only new articles
python scripts/core/check_new_articles.py .claude/temp/milan-listing.md --json --since 2025-11-01
# Force mode - include existing for re-check
python scripts/core/check_new_articles.py .claude/temp/milan-listing.md --json --force --since 2025-11-01
Parse JSON output to get to_scrape list. If empty, skip to Step 5 (no scraping needed).
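A minimal sketch of handling the pre-filter output might look like the following. The exact JSON shape is an assumption: this document only names to_scrape, in_index, content_hash, and no_date, so the `slug` field, the sample payload, and the `split_to_scrape` helper are hypothetical.

```python
import json

# Hypothetical sample of the script's --json output; the real shape may differ.
sample_output = """
{
  "to_scrape": [
    {"slug": "new-post", "in_index": false},
    {"slug": "indexed-post", "in_index": true, "content_hash": "abc123"}
  ],
  "no_date": []
}
"""


def split_to_scrape(raw_json):
    """Return (new_articles, recheck_articles) from the pre-filter JSON."""
    data = json.loads(raw_json)
    to_scrape = data.get("to_scrape", [])
    new_articles = [a for a in to_scrape if not a.get("in_index")]
    recheck_articles = [a for a in to_scrape if a.get("in_index")]
    return new_articles, recheck_articles


new_articles, recheck_articles = split_to_scrape(sample_output)
if not new_articles and not recheck_articles:
    print("Nothing to scrape")
else:
    print(f"{len(new_articles)} new, {len(recheck_articles)} to re-check")
```

Splitting the list up front keeps the two handling paths separate: new articles are always written, while re-checked ones are written only if their content changed.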
For each article in to_scrape:
For articles with in_index: false (new):
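- Scrape the article with firecrawl_scrape
- Save it to canonical/milanjovanovic-tech/blog/{slug}.md

For articles with in_index: true (force mode re-check):
- Scrape the article with firecrawl_scrape
- Compare the result against the content_hash from pre-filter output and skip the write if unchanged

After scraping completes: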
python scripts/management/refresh_index.py
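The force-mode re-check described earlier (compare a freshly scraped article against its known content hash and skip the write when nothing changed) could be sketched as below. The hash algorithm is not stated in this document, so SHA-256 is an assumption, as are the `content_hash` and `write_if_changed` helper names and the example slug.

```python
import hashlib
import tempfile
from pathlib import Path


def content_hash(markdown):
    """SHA-256 hex digest of the article text (the algorithm is an assumption)."""
    return hashlib.sha256(markdown.encode("utf-8")).hexdigest()


def write_if_changed(path, markdown, known_hash):
    """Write the article only when its hash differs from the known one."""
    if known_hash is not None and content_hash(markdown) == known_hash:
        return False  # unchanged: skip the write
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(markdown, encoding="utf-8")
    return True


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "canonical" / "milanjovanovic-tech" / "blog" / "example-slug.md"
    # New article: no known hash, so it is always written.
    first = write_if_changed(target, "# Post\n\nBody", known_hash=None)
    stored = content_hash("# Post\n\nBody")
    # Re-check with identical content: the write is skipped.
    second = write_if_changed(target, "# Post\n\nBody", known_hash=stored)
    # Re-check with edited content: the file is rewritten.
    third = write_if_changed(target, "# Post\n\nUpdated body", known_hash=stored)
```

Note that this only saves disk writes. The scrape request itself still has to happen before the hashes can be compared, which is consistent with the force-mode row in the comparison table.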
Report:
The scraper removes these promotional patterns:
Footer patterns (stop processing):
Sponsor patterns (remove section):
Inline patterns (remove):
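The concrete patterns are not listed here, so the following is only a sketch of how the three behaviors could fit together. Every pattern below is an invented placeholder, and the rule that a sponsor section runs until the next heading is an assumption, not the scraper's documented behavior.

```python
import re

# Placeholder patterns, invented for illustration only.
FOOTER_PATTERNS = [re.compile(r"^EXAMPLE-FOOTER-MARKER")]
SPONSOR_PATTERNS = [re.compile(r"^#+\s*EXAMPLE-SPONSOR-HEADING")]
INLINE_PATTERNS = [re.compile(r"\[EXAMPLE-INLINE-PROMO\]")]


def clean_article(markdown):
    """Apply footer, sponsor, and inline rules to scraped markdown."""
    kept = []
    in_sponsor_section = False
    for line in markdown.splitlines():
        if any(p.search(line) for p in FOOTER_PATTERNS):
            break  # footer: stop processing the rest of the page
        if any(p.search(line) for p in SPONSOR_PATTERNS):
            in_sponsor_section = True  # sponsor: drop lines until the next heading
            continue
        if in_sponsor_section:
            if line.startswith("#"):
                in_sponsor_section = False  # a new heading ends the sponsor section
            else:
                continue
        for pattern in INLINE_PATTERNS:
            line = pattern.sub("", line)  # inline: remove the match, keep the line
        kept.append(line.rstrip())
    return "\n".join(kept)
```

The ordering matters in this sketch: footer checks run first so nothing after the footer is examined, and inline removal runs last so it only touches lines that survive the other two rules.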
| Scenario | Without Optimization | With Optimization |
|---|---|---|
| No new articles | 10+ firecrawl requests | 1-2 requests |
| 1 new article | 10+ firecrawl requests | 2-3 requests |
| Force (unchanged) | 10+ requests | 10+ requests but skips writes |
Why this matters: Firecrawl has API costs and rate limits. Pre-filtering saves 80-90% of requests when articles haven't changed.
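As a quick check of the savings figure, the arithmetic can be worked through with the numbers from the table. The baseline of exactly 10 requests is an assumption (the table says "10+", so real savings may be somewhat higher), and `fraction_saved` is a helper written for this example.

```python
def fraction_saved(baseline_requests, optimized_requests):
    """Share of requests avoided relative to the unoptimized baseline."""
    return 1 - optimized_requests / baseline_requests


# Using 10 as the baseline; the table's "10+" makes this a lower bound.
for optimized in (1, 2, 3):
    print(f"{optimized} request(s): {fraction_saved(10, optimized):.0%} saved")
```

Under that assumption, the 80-90% range corresponds to the 1-2 requests of the "No new articles" row, while the 2-3 requests of the "1 new article" row work out to roughly 70-80%.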
/milan-jovanovic:scrape-posts
/milan-jovanovic:scrape-posts --limit 3 --dry-run
/milan-jovanovic:scrape-posts --force
/milan-jovanovic:scrape-posts --since 2025-12-01
If firecrawl MCP is not connected, the command will fail. Ensure the firecrawl MCP server is configured and running.
If a listing page date can't be parsed, the script logs that article in the no_date category. These articles are skipped unless you provide a specific URL.
If check_new_articles.py shows 0 articles to scrape:
- Use --force to re-check articles that are already indexed
- Adjust the --since date filter

npx claudepluginhub melodic-software/claude-code-plugins --plugin milan-jovanovic