From paperboy
Fetch, filter, and summarize news/articles from configured RSS sources into a daily markdown digest in an Obsidian vault. Trigger with "fetch my newspaper", "run paperboy", "check my feed".
How this skill is triggered — by the user, by Claude, or both
Slash command
/paperboy:paperboyThis skill is limited to the following tools:
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
I fetch news from the user's configured sources (RSS feeds and the 1440 daily newsletter), filter items against the user's stated interests, summarize the keepers, and write a single markdown digest file into an Obsidian vault. The vault holds the interests file, the source list, per-source state (seen item IDs), and all digest output — nothing lives outside it.
I fetch news from the user's configured sources (RSS feeds and the 1440 daily newsletter), filter items against the user's stated interests, summarize the keepers, and write a single markdown digest file into an Obsidian vault. The vault holds the interests file, the source list, per-source state (seen item IDs), and all digest output — nothing lives outside it.
rss — standard RSS 2.0 feed; one candidate per <item>.1440-sitemap — the 1440 daily newsletter; URL points to a sitemap.xml. fetch.py walks the sitemap, fetches each newsletter page in the backfill window, and parses the "Need To Know" section into per-blurb candidates. These candidates come pre-summarized with inline citation links to the underlying source articles, so Step 7 (summarize) is short-circuited for them.reddit-sub — a subreddit listing. URL is a subreddit page (e.g., https://www.reddit.com/r/<sub>/); fetch.py converts it to the equivalent .json endpoint and parses each post as a candidate. A bare subreddit URL defaults to top-of-day; users can override by appending a listing path (/hot/, /new/, /rising/, /top/, /controversial/). Reddit posts are polymorphic — fetch.py routes the url field to whatever WebFetch can actually summarize: external article for link posts; old.reddit.com/<permalink> for self-text, image, video, or reddit-hosted media posts. The user-facing reddit.com comments page is always exposed as discussion_url.$PAPERBOY_VAULT_DIR (default ~/Documents/PaperboyVault). If missing, run init to seed it.scripts/fetch.py — returns JSON of new-to-us items across all sources, plus the user's alternates and paywall_domains lists for step 5.interests.md using LLM judgment. Output keep/skip + one-line rationale.WebSearch for the same story at a non-paywalled source (preferring the user's listed alternates) and swap to it.WebFetch and writing a 2-4 sentence summary.feed/YYYY-MM-DD-HHMMSS.md.scripts/finalize.py — commits all fetched item IDs as "seen" so they won't resurface.$PAPERBOY_VAULT_DIR/
├── interests.md # User's keep/skip rules — read at every run
├── sources.md # Source list (slug | url | type)
├── state/
│ └── <source-slug>.json # Per-source: seen_ids[], last_fetched_at
└── feed/
└── YYYY-MM-DD-HHMMSS.md # One digest per invocation
Invoke these steps in order. $CLAUDE_SKILL_DIR is the skill's own directory.
Resolve the vault path from PAPERBOY_VAULT_DIR (default ~/Documents/PaperboyVault). The vault holds interests.md, sources.md, state/, and feed/.
Check for <vault>/sources.md. If it exists, the vault is initialized — proceed to Step 2.
Otherwise, this is a first run. Walk the user through setup before doing anything else:
PAPERBOY_VAULT_DIR (default ~/Documents/PaperboyVault) — vault location. To use a different path, the user should set this env var and re-run.PAPERBOY_BACKFILL_DAYS (default 7) — lookback window: caps first-run pulls and rejects items older than this on every run. Most users leave it.python3 "${CLAUDE_SKILL_DIR}/scripts/init.py"
This copies seed interests.md and sources.md from ${CLAUDE_SKILL_DIR}/seeds/ into the vault and creates state/ and feed/. It is idempotent — existing files are never overwritten, so the user can safely edit and re-run.interests.md — the keep/skip rules the classifier uses. The seed is intentionally generic; a tighter, more personal version produces a better feed.sources.md — which feeds to pull from. The seed includes HN, Lobsters, Christian Science Monitor (USA + World), the 1440 newsletter, and one example subreddit. Add, remove, or reslug freely.python3 "${CLAUDE_SKILL_DIR}/scripts/fetch.py" > /tmp/paperboy-candidates.json
The script prints a JSON object {fetched_at, vault, candidates: [...], errors: [...], alternates: [...], paywall_domains: [...]} to stdout. Each candidate has: source, id, title, url, published (raw RFC 822 or ISO), pub_iso (ISO 8601 or null), description. Optional fields: discussion_url (set when the source has a discussion/landing page for the item), pre_summarized (true for 1440 blurbs — skip Step 7's WebFetch), citations (list of [text, url] pairs for pre-summarized items). alternates and paywall_domains carry the user's alternate / paywall entries from sources.md (each an object with slug, url, type) — used in Step 6.
Read /tmp/paperboy-candidates.json. If candidates is empty, tell the user "Nothing new since the last run." Do not write a digest file or call finalize. Stop here.
If errors is non-empty, classify each entry before continuing:
type in sources.md is not one of rss, 1440-sitemap, or reddit-sub; the feed responds but doesn't parse as the declared type (e.g., an rss source returning Atom or HTML); fetch.py reports a structural extraction failure.Track the slugs of unsupported sources in unsupported_sources — Steps 8 and 10 use this list to direct the user to https://github.com/rezrov/claude-marketplace/issues to request official support for the new format. Continue the run regardless; no error type blocks other sources.
Read $PAPERBOY_VAULT_DIR/interests.md
Before classifying, merge candidates that point to the same external article so each story is classified, summarized, and rendered once with a combined source list.
url field, normalized (lowercased host, trailing slash stripped, ?utm_* and #fragment removed). Apply only to RSS-type candidates — never dedupe pre_summarized (1440) blurbs against anything; they're standalone curated entries even when topically overlapping.pub_iso (fall back to any non-null one), the longest non-empty title, and the longest non-empty description.discussion_url into a list on the merged candidate (e.g., sources: [{slug: "hn-frontpage", discussion_url: "..."}, {slug: "lobsters-feed", discussion_url: "..."}]). Preserve order: earliest pub_iso first.id per source — finalize (Step 9) must mark them all seen so neither source resurfaces the article.discussion_url (e.g., [HN](hn-url), [Lobsters](lobsters-url)), per the source-rendering rule below.Evaluate each candidate against the interests file. Prefer batching: send all candidates in a single prompt to yourself, returning one decision per candidate. For each candidate produce:
keep: booleanrationale: one line (used for "Why this matched" if kept; internal note if skipped)Marginal calls should err toward skip — the user prefers a tight feed over a noisy one. Political commentary, sensationalism, punditry, and thin rewrites are always skip unless interests.md explicitly says otherwise.
For each candidate where keep is true and pre_summarized is false (1440 blurbs skip this step), check whether the article URL points to a paywalled site. If yes, look for the same story at a non-paywalled source and swap to it before summarizing.
What counts as paywalled. The paywall_domains array from Step 2 lists the user's paywalled sites (each entry has a url like https://www.nytimes.com). Extract each entry's host (lowercased, leading www. stripped — e.g., nytimes.com) into a set. A candidate is paywalled if its url field, parsed the same way, matches one of these hosts (exact host match or subdomain — e.g., both nytimes.com and cooking.nytimes.com count).
Preferred alternates. The alternates array from Step 2 lists non-paywalled sources the user prefers when looking for an alternate. Extract their hosts the same way (lowercased, leading www. stripped). Treat the order as the user's preference order. If the array is empty, there is no preference list — pick from any non-paywalled result.
Procedure for each paywalled candidate:
WebSearch if not already available:
ToolSearch(query="select:WebSearch")
WebSearch with the article's title as the query. If the title is fewer than 6 words or feels generic, append the most distinctive proper noun or phrase from the candidate's description. Ask for the top ~10 results.news.google.com, news.yahoo.com, flipboard.com, medium.com, generic *.substack.com (unless the user has explicitly listed that subdomain as an alternate).url field alone:
alternate_url: the alternate URL (Step 7 fetches this instead of url)original_paywalled_url: the candidate's original url (Step 8 cites this on the "Via" line)alternate_searched: true and alternate_found: false. Step 7 will still try the original URL — WebFetch's prompt already handles paywalled pages gracefully by summarizing the visible dek/excerpt.Run multiple paywalled candidates' searches in parallel (separate WebSearch calls in the same message) when there's more than one to keep latency down.
For each candidate where keep is true:
pre_summarized is true (1440 blurbs): skip WebFetch entirely. The description field IS the summary — copy it through verbatim into the digest. Do not re-summarize, do not paraphrase, do not trim.WebFetch if not already available:
ToolSearch(query="select:WebFetch")
alternate_url on the candidate, use that; otherwise use the candidate's url.Read this article and return a 2-4 sentence neutral summary of its substantive content. Ignore navigation, ads, related links, and comments. Do not repeat the title and do not editorialize. If the page is paywalled and only a dek/excerpt is visible, briefly note that and summarize what is visible.description field and append " (summary from feed description)" to the end so the user knows the article wasn't reachable. If you fell back specifically because the alternate URL failed, append " (alternate fetch failed; summary from feed description)" instead — the original was paywalled, so going back to it isn't useful.Create $PAPERBOY_VAULT_DIR/feed/YYYY-MM-DD-HHMMSS.md where the timestamp is the current local time at invocation (fetch time, not content time). Use this structure:
# Paperboy — YYYY-MM-DD HH:MM
Fetched N candidates across M sources, kept K.
---
## [Article Title](article-url)
**Source:** [<friendly-name>](<discussion-url>) · **Published:** YYYY-MM-DD HH:MM UTC (or "unknown")
**Why this matched:** <one-line rationale from classifier>
<2-4 sentence summary>
---
## [Next Article Title](url)
...
Article heading link. If the candidate has alternate_url, the heading links to the alternate URL (that's the readable copy). Otherwise it links to the candidate's url.
Order items: most recently published first; items with unknown publish date go at the end.
Unsupported-source notice: if unsupported_sources from Step 2 is non-empty, insert a notice block immediately after the "Fetched N candidates..." line and before the first --- divider. Format:
> **Heads up:** paperboy couldn't handle the source(s) `<slug1>`, `<slug2>` with the current version of the skill. To request official support for new source types or feed formats, file an issue at <https://github.com/rezrov/claude-marketplace/issues>.
List every slug in unsupported_sources, comma-separated, each in backticks. Omit the block entirely when unsupported_sources is empty.
Source rendering:
hn-* → HN, lobsters-* → Lobsters, csm-* → CSM, 1440 → 1440, reddit-* → r/<subreddit> (extract <subreddit> from the candidate's discussion_url, which has the form https://www.reddit.com/r/<sub>/comments/... — preserve the original casing).discussion_url field, link the friendly name to that URL: [HN](https://news.ycombinator.com/item?id=...). (For HN/Lobsters this is the post's discussion page; for 1440 it's the newsletter page; for Reddit it's the comments page.)discussion_url is absent (e.g., CSM), render the friendly name as plain text.[HN](hn-url), [Lobsters](lobsters-url).Alternate-source rendering (only when alternate_url is set):
· **Via:** [<alt-host>](<alternate-url>) (original [<orig-host>](<original-paywalled-url>) is paywalled) segment to the Source: line.<alt-host> and <orig-host> are the URLs' registrable hosts (lowercased, leading www. stripped — e.g., apnews.com, nytimes.com).alternate_searched: true was set but alternate_found: false (no usable alternate was found), append · **Note:** paywalled (no alternate found) instead. The article heading still links to the original url.Pre-summarized item rendering (1440):
description already contains inline [link text](url) citations as part of the body — render it verbatim (do NOT rewrite the inline link text).**Cited:** line listing every entry in citations, but replace each citation's link text with the URL's base domain (registrable host, lowercased, with any leading www. stripped). The original citation text is discarded for this line. Example:
["Acme announces foo", "https://www.news.com/articles/123.html"] → renders as [news.com](https://www.news.com/articles/123.html)["report PDF", "https://reports.example.co.uk/x.pdf"] → renders as [reports.example.co.uk](https://reports.example.co.uk/x.pdf)·:
**Cited:** [news.com](url1) · [bbc.com](url2) · [reuters.com](url3)
Also include a collapsed "Skipped" section at the bottom listing every skipped item by title + source + one-line rationale. This gives the user feedback on the filter so they can tune interests.md.
## Skipped (X items)
- [<title>](<article-url>) — <source> — <rationale>
- ...
Use the candidate's url field for the link — same URL that would have been used if the item were kept.
Build a JSON object mapping each source slug to the list of ALL candidate IDs from that source (both kept and skipped — we don't want skipped items to resurface either). Pipe it to the finalize script:
echo '{"hn-frontpage": ["id1", "id2"], ...}' | python3 "${CLAUDE_SKILL_DIR}/scripts/finalize.py"
Finalize updates each state/<slug>.json with new seen IDs, caps seen lists to most recent 2000 per source, and records last_fetched_at.
Tell the user:
unsupported_sources is non-empty: explicitly state which source slug(s) paperboy couldn't handle and direct the user to https://github.com/rezrov/claude-marketplace/issues to request official support. This duplicates the digest notice intentionally — the user sees it in chat whether or not they open the digest.Do not open the file or summarize its contents — the user will read it in Obsidian.
Run this step only if the user's request expressed intent to view the digest — phrasings like "show it to me", "show me", "open it", "and open it in Obsidian", "let me see it", etc. Skip otherwise.
Also skip if Step 2 produced no candidates (nothing to show) or if the digest was not written.
open "obsidian://open?vault=$(basename "${PAPERBOY_VAULT_DIR:-$HOME/Documents/PaperboyVault}")"
This launches/focuses Obsidian on the vault using its registered name (the vault's directory basename). The user can then navigate to the just-written feed/YYYY-MM-DD-HHMMSS.md. Do not attempt to open a specific file — obsidian://open with a file= parameter requires the file to already be indexed, and the vault may not have refreshed yet.
| Variable | Default | Description |
|---|---|---|
PAPERBOY_VAULT_DIR | ~/Documents/PaperboyVault | Obsidian vault root |
PAPERBOY_BACKFILL_DAYS | 7 | Max lookback for backdated items; also first-run cap |
errors[], continue — do not fail the runtype, parse mismatch, structural extraction failure): track in unsupported_sources; surface in the digest's top notice (Step 8) and the chat report (Step 10) with a link to https://github.com/rezrov/claude-marketplace/issues to request official support. Never blocks other sources.interests.md.PAPERBOY_BACKFILL_DAYS that appear in a feed today are ignored — otherwise a source reshuffling its archive would flood the digest.url to whatever WebFetch can actually summarize: external sites for link posts, old.reddit.com/<permalink> for self-text/image/video posts (the discussion page is where the substance lives, and old.reddit is more scrape-friendly than the JS-heavy new UI). The agent does not need special handling — Step 7's WebFetch prompt asks for a 2–4 sentence summary regardless of payload shape. For image-only posts, the resulting summary may effectively describe the title plus what's visible on the page; that's the best available given a Reddit post that is itself just a picture.interests.md or sources.md automatically. The user owns those files.Creates, edits, and optimizes skills for Claude Code, including drafting, evaluating with test prompts, iterating on performance, and improving skill descriptions for better triggering accuracy.
npx claudepluginhub rezrov/claude-marketplace --plugin paperboy