Skill

nod-serp-clusters

Clusters keywords by SERP similarity using Google results (NodesHub) or semantic embeddings (OpenRouter). Includes Louvain-based grouping and LLM-generated cluster names.

Python

automation

Popularity

Stars

Forks

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/nodeshub-seo-skills:nod-serp-clusters

User invocable

Model invocable

Inline context

Default effort

Tool Access

This skill is limited to the following tools:

BashReadWrite

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Two clustering methods available. **Always ask the user which method they want before running.**

SKILL.md

213 lines · ~2.6k tokens

Stats

LanguagePython

Stars24

Forks4

MaintenanceExcellent

Last CommitJun 18, 2026

Actions

View Source View Plugin View on GitHub View README

SERP Clusters

Two clustering methods available. Always ask the user which method they want before running.

Clustering Methods

Method 1: SERP-based (default) — `cluster.py`

Keywords sharing the same Google top-10 results belong to the same cluster. Uses Weighted Jaccard + Louvain.

Best for: Content planning, page mapping, cannibalization analysis
Cost: 1 NodesHub token per keyword
Accuracy: High — based on what Google actually shows

Method 2: Semantic — `cluster_semantic.py`

Keywords with similar meaning belong to the same cluster. Uses OpenRouter embeddings + cosine similarity + Louvain.

Best for: Topic discovery, keyword grouping by intent, when you don't want to spend NodesHub tokens
Cost: Only OpenRouter (embeddings + naming) — no NodesHub tokens
Accuracy: Good for meaning, but doesn't reflect actual SERP overlap

When to use which?

Scenario	Recommended method
"Which keywords can target the same page?"	SERP-based — Google decides
"Group keywords by topic/meaning"	Semantic — faster and cheaper
"Find cannibalization"	SERP-based — needs actual SERP data
"Quick clustering, no SERP budget"	Semantic
"Most accurate content mapping"	Both — compare results

Quick Start

# === SERP-based clustering ===
python3 .claude/skills/nod-serp-clusters/scripts/cluster.py keywords.csv --gl pl --hl pl
python3 .claude/skills/nod-serp-clusters/scripts/cluster.py keywords.csv --gl pl --hl pl --levels 3 --report html
python3 .claude/skills/nod-serp-clusters/scripts/cluster.py keywords.csv --gl pl --hl pl --workers 3 --budget 200

# Rerun without re-fetching SERPs (uses cache from previous run)
python3 .claude/skills/nod-serp-clusters/scripts/cluster.py keywords.csv --gl pl --hl pl --levels 3 --report html
# Force fresh SERP fetch (ignore cache)
python3 .claude/skills/nod-serp-clusters/scripts/cluster.py keywords.csv --gl pl --hl pl --no-cache

# === Semantic clustering (no NodesHub tokens needed) ===
python3 .claude/skills/nod-serp-clusters/scripts/cluster_semantic.py keywords.csv --threshold 0.25 --levels 3
python3 .claude/skills/nod-serp-clusters/scripts/cluster_semantic.py keywords.csv --output clusters_semantic.csv --json

How It Works

Step 1 — Weighted Jaccard Similarity

For each keyword pair, compare top-10 SERP results:

Position weight: 1 / log2(pos + 2) — pos 1 has weight 1.44, pos 10 has 0.43
Exact URL match: average of both position weights
Domain soft match: average weights × 0.3 bonus (same domain, different URL)
Score = sum(intersection) / sum(union)

Step 2 — Dynamic Domain Weighting

Instead of hardcoded blacklists, domains are weighted by how often they appear:

domain_weight = 1 / sqrt(coverage), min 0.2
Wikipedia at 50% coverage → weight 0.63 (reduced impact)
Niche site at 5% → weight 1.0 (full impact)

Step 3 — Louvain Community Detection

Graph with keywords as nodes, weighted Jaccard as edge weights. Louvain finds natural communities without chain clustering (a problem with agglomerative methods).

Step 4 — LLM Naming

OpenRouter generates 2-5 word cluster names based on keyword samples.

Setup

Requires two API keys:

NODESHUB_API_KEY — for SERP data. If not set up: run /connect-nodeshub.
OPENROUTER_API_KEY — for LLM cluster naming. If not set up: run /connect-openrouter.

# Verify both keys
python3 .claude/skills/nod-nodeshub-api/scripts/check_setup.py

Parameters

Always present these parameters to the user before running, so they understand what each one does.

Clustering parameters

Parameter	What it does	Default	Effect
`--threshold T`	Min weighted Jaccard similarity to connect two keywords	0.55	Lower = more keywords grouped together, higher = tighter clusters. 0.55 ≈ 7/10 shared URLs
`--levels N`	Clustering depth (1-3)	1	1 = single flat clustering, 2 = broad + specific, 3 = broad + medium + specific hierarchy
`--domain-bonus F`	Bonus for same-domain-different-URL match	0.3	Higher = more weight to domain-level similarity (not just exact URL). 0 = URL-only matching
`--resolution F`	Override Louvain resolution for all levels	auto	Higher = more clusters, lower = fewer bigger clusters. Overrides per-level defaults

Data parameters

Parameter	What it does	Default	Effect
`--top-n N`	Only cluster top N keywords (sorted by serp_overlap)	0 (all)	Useful to focus on most important keywords and save tokens
`--budget N`	Max NodesHub tokens to spend	no limit	Hard stop for SERP fetching
`--workers N`	Concurrent SERP requests	4	1 = sequential (safe), 4 = default, 8 = aggressive. Higher = faster
`--gl`	Country code for SERP	pl	Match to target market
`--hl`	Language code	pl	Match to target language

Output parameters

Parameter	What it does	Default	Effect
`--report html`	Generate HTML report with dendrogram, domain visibility, SERP features	off	Interactive D3.js dendrogram with collapsible levels
`--report md`	Generate Markdown report with text tree	off	Same data, text format
`--json`	Also output JSON file	off	Machine-readable cluster data
`--output PATH`	Custom output CSV path	auto	Default: `{input_stem}_clustered.csv`
`--model`	OpenRouter model for cluster naming	google/gemini-2.5-flash-lite	Any OpenRouter model ID

Report (`--report`)

HTML report includes:

Dendrogram — interactive D3.js tree showing L1 → L2 → L3 → keyword hierarchy. Click nodes to expand/collapse branches
Top 20 domains — visibility table (appearances, unique URLs, avg position)
SERP features — distribution of PAA, videos, knowledge graph, etc. with percentages
Cluster cards — each multi-keyword cluster with keyword tags and top ranking domains

MD report includes:

Text tree — ASCII hierarchy of clusters
Same domain visibility and SERP features tables

Branding

HTML reports can display your company logo and use your brand colors. To customize:

Edit assets/branding/brand-config.json — set company name, colors, fonts
Replace assets/branding/logo-light.svg and logo-dark.svg with your logos

The report header shows your logo + report title + date; the footer shows your company name + generation timestamp. If no branding is configured, default Nodeshub styling is used.

To extract brand styles from your website automatically, run assets/branding/extract-brand-styles.js in browser DevTools.

Multi-level Clustering

With --levels 2 or --levels 3, the script clusters at multiple thresholds simultaneously. Each keyword gets assigned to a cluster at each level. SERP data is fetched only once — levels only affect the clustering step.

Level	`--levels 2`	`--levels 3`	Resolution	Purpose
L1 (broad)	threshold×0.5, res=0.5	threshold×0.4, res=0.3	Low = fewer big clusters	Pillar topics, content silos
L2 (medium)	—	threshold×0.7, res=0.8	Medium	Subtopic groups
L3 (specific)	threshold×1.0, res=1.0	threshold×1.0, res=1.0	Standard	Individual pages

CSV output columns per level: {level}_id, {level}_name, {level}_size

Example with --levels 3:

keyword, L1_broad_id, L1_broad_name, L2_medium_id, L2_medium_name, L3_specific_id, L3_specific_name
"seo cennik", 0, "SEO uslugi", 3, "Cennik pozycjonowania", 12, "Koszty SEO"

Output

CSV columns (added to original keyword-research columns)

Column	Description
`cluster_id`	Numeric cluster ID (0 = largest cluster)
`cluster_name`	LLM-generated descriptive name for the cluster
`cluster_size`	Number of keywords in this cluster

Threshold guide

Threshold	~Shared URLs	Clustering behavior	Use when
0.22	~2/10	Very loose — large topic buckets	Pillar pages, content silos
0.35	~4/10	Loose — broad subtopic groups	Content planning
0.45	~5/10	Medium — related keywords	Subtopic mapping
0.55	~7/10	Default — strong SERP overlap	Standard SEO clustering
0.65	~8/10	Strict — near-identical SERPs	1-page-per-cluster mapping
0.75+	~9/10	Very strict — true synonyms only	Cannibalization analysis

Cost

NodesHub: 1 token per keyword (SERP fetch)
OpenRouter: ~1 LLM call per 10 clusters (naming). Minimal cost with gemini-2.5-flash-lite.
Example: 100 keywords = 100 NodesHub tokens + a few cents OpenRouter

Workflow

Run /nod-keyword-research to get a keyword CSV
Run this skill on the CSV to cluster by SERP similarity
Use clusters for content planning — one page per cluster

Report

After collecting data, ask the user:

"Add results to an HTML report?"

New report — creates a branded HTML report in reports/

Existing report — appends a section to a chosen report

Skip — no report

Use render_report_section(all_level_results, all_serps, all_snippets) from cluster.py, then create_report() or append_section() from report.py. Note: render_report_section returns (section_html, extra_head) — pass extra_head to create_report(extra_head=extra_head).

Related Skills

nod-keyword-research — generates the input keyword CSV
nod-content-brief — create content briefs for each cluster

nod-serp-clusters

Popularity

Invocation

Tool Access

Context Preview

SKILL.md

nod-serp-clusters

Popularity

Invocation

Tool Access

Context Preview

SKILL.md

SERP Clusters

Clustering Methods

Method 1: SERP-based (default) — cluster.py

Method 2: Semantic — cluster_semantic.py

When to use which?

Quick Start

How It Works

Step 1 — Weighted Jaccard Similarity

Step 2 — Dynamic Domain Weighting

Step 3 — Louvain Community Detection

Step 4 — LLM Naming

Setup

Parameters

Clustering parameters

Data parameters

Output parameters

Report (--report)

HTML report includes:

MD report includes:

Branding

Multi-level Clustering

Output

CSV columns (added to original keyword-research columns)

Threshold guide

Cost

Workflow

Report

Related Skills

Similar Skills

SERP Clusters

Clustering Methods

Method 1: SERP-based (default) — cluster.py

Method 2: Semantic — cluster_semantic.py

When to use which?

Quick Start

How It Works

Step 1 — Weighted Jaccard Similarity

Step 2 — Dynamic Domain Weighting

Step 3 — Louvain Community Detection

Step 4 — LLM Naming

Setup

Parameters

Clustering parameters

Data parameters

Output parameters

Report (--report)

HTML report includes:

MD report includes:

Branding

Multi-level Clustering

Output

CSV columns (added to original keyword-research columns)

Threshold guide

Cost

Workflow

Report

Related Skills

Similar Skills

Method 1: SERP-based (default) — `cluster.py`

Method 2: Semantic — `cluster_semantic.py`

Report (`--report`)

Method 1: SERP-based (default) — `cluster.py`

Method 2: Semantic — `cluster_semantic.py`

Report (`--report`)