From thinking-frameworks-skills
Checks a candidate seed against an existing corpus for exact (sha256) and near-duplicates (title/content Jaccard, shared topics). Exact matches exit; near-matches link via related_seeds instead of creating duplicates.
Trigger: slash command /thinking-frameworks-skills:dedupe-against-corpus
Related skills: Called by ingest-inbox-item step 4. Queried ad-hoc by search-corpus. Backlinks into matched seeds (the one place this skill writes outside new seeds).
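Three tiers of match:
- Tier 1: exact fingerprint (sha256) match. Result: SKIPPED.
- Tier 2: normalized title Jaccard against existing seeds. Result: LINK candidate.
- Tier 3: first-200-word Jaccard, for seeds sharing ≥2 topic tags. Result: LINK candidate.

Tier 2 and 3 candidates are unioned (up to 3 total LINK targets).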
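The exact (sha256) and near-duplicate (Jaccard) checks rest on two small primitives. A minimal sketch follows, assuming the fingerprint is a sha256 over the whitespace-stripped UTF-8 text and that Jaccard is taken over lowercased word sets; the source does not specify either normalization, and the helper names `fingerprint`, `tokens`, and `jaccard` are hypothetical.

```python
import hashlib
import re
from typing import Optional, Set


def fingerprint(text: str) -> str:
    """sha256 hex digest of the text (assumption: stripped, UTF-8 encoded)."""
    return hashlib.sha256(text.strip().encode("utf-8")).hexdigest()


def tokens(text: str, limit: Optional[int] = None) -> Set[str]:
    """Lowercased word set; `limit` keeps only the first N words (e.g. 200)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return set(words if limit is None else words[:limit])


def jaccard(a: Set[str], b: Set[str]) -> float:
    """|A & B| / |A | B|, defined here as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

For instance, `jaccard(tokens("Dropout as ensemble"), tokens("Bagging as ensemble"))` shares two of four distinct words and evaluates to 0.5.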
Dedupe one candidate seed:
- [ ] Step 1: Grep all corpus/**/*.md frontmatter for fingerprint match
- [ ] Step 2: If exact match, return SKIPPED
- [ ] Step 3: Normalized title Jaccard against all existing seeds
- [ ] Step 4: For seeds sharing ≥2 topic tags, first-200-word Jaccard
- [ ] Step 5: Union tier-2 and tier-3 candidates, cap at 3
- [ ] Step 6: If any LINK candidates, return LINK with related_seeds list; else CREATE
- [ ] Step 7: For LINK, Edit matched seeds' related_seeds to add this candidate's id
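Steps 1 through 6 can be sketched as a single decision function. This is an illustration under stated assumptions, not the skill's actual implementation: the `Seed` record, the `dedupe` function, and both threshold constants are hypothetical (the source gives no threshold values), and the cap keeps the first three matches in corpus order because the source does not say how they are ranked. Step 7, the write to matched seeds, is left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

TITLE_THRESHOLD = 0.5    # hypothetical value; not stated in the source
CONTENT_THRESHOLD = 0.5  # hypothetical value; not stated in the source
MAX_LINKS = 3            # cap from step 5


@dataclass
class Seed:
    id: str
    fingerprint: str
    title_tokens: Set[str]
    topics: Set[str]
    first_200_tokens: Set[str]


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dedupe(candidate: Seed, corpus: List[Seed]) -> Dict:
    # Steps 1-2: an exact fingerprint match short-circuits everything else.
    if any(s.fingerprint == candidate.fingerprint for s in corpus):
        return {"action": "SKIPPED"}

    links: List[str] = []
    for seed in corpus:
        # Step 3: title similarity against every existing seed.
        title_hit = jaccard(candidate.title_tokens, seed.title_tokens) >= TITLE_THRESHOLD
        # Step 4: content similarity only for seeds sharing >= 2 topic tags.
        content_hit = (
            len(candidate.topics & seed.topics) >= 2
            and jaccard(candidate.first_200_tokens, seed.first_200_tokens) >= CONTENT_THRESHOLD
        )
        # Step 5: union of tier-2 and tier-3 candidates (each seed listed once).
        if title_hit or content_hit:
            links.append(seed.id)

    links = links[:MAX_LINKS]  # assumption: first matches in corpus order win
    # Step 6: LINK if anything matched, otherwise CREATE.
    if links:
        return {"action": "LINK", "related_seeds": links}
    return {"action": "CREATE"}
```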
The matched seeds get their related_seeds field extended (append, never replace) ONLY IF:
manual_edits: false on the matched seed, OR
Do not touch any other field on the matched seed.
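The append-only rule can be sketched as below, assuming a seed's frontmatter has already been parsed into a plain dict. The function name `append_related_seed` is hypothetical, only the `manual_edits: false` condition is modelled, and the flat `related_seeds` key is an assumption about where the field lives. Skipping an id that is already present is also an assumption, added so repeated runs do not duplicate backlinks.

```python
from typing import Dict


def append_related_seed(matched: Dict, candidate_id: str) -> bool:
    """Append candidate_id to matched['related_seeds'] in place.

    Returns True if the list was extended, False if nothing was written.
    """
    # Write only when manual_edits is explicitly false (conservative assumption:
    # a missing key is treated the same as true).
    if matched.get("manual_edits") is not False:
        return False
    related = matched.setdefault("related_seeds", [])
    if candidate_id in related:
        return False  # already linked; nothing to append
    related.append(candidate_id)  # append, never replace
    return True
```

No other key of `matched` is read or modified, which mirrors the instruction to leave every other field on the matched seed untouched.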
Candidate: new seed about "dropout as ensemble" with topics [regularization, ensembling, dropout], first 200 words describing thinned-network averaging.
Existing corpus:
- 2026-03-11-l2-as-gaussian-prior: topics [regularization, bayesian]. Title Jaccard = 0.1 (different). Shared tags: 1. Skip content-similarity check.
- 2026-02-08-bagging-in-deep-nets: topics [regularization, ensembling]. Shared tags: 2. First-200-word Jaccard: 0.51 → LINK candidate.

Output: {action: LINK, related_seeds: [2026-02-08-bagging-in-deep-nets]}.
Side effect: corpus/seeds/2026-02-08-bagging-in-deep-nets.md gets its related_seeds appended with the new seed's id.
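The shared-tag counts in this example can be checked with plain set intersection. The snippet below is a small sketch using only the topic lists given above; the variable names are illustrative.

```python
candidate_topics = {"regularization", "ensembling", "dropout"}
corpus_topics = {
    "2026-03-11-l2-as-gaussian-prior": {"regularization", "bayesian"},
    "2026-02-08-bagging-in-deep-nets": {"regularization", "ensembling"},
}

# Count the topic tags each existing seed shares with the candidate.
shared = {
    seed_id: len(candidate_topics & topics)
    for seed_id, topics in corpus_topics.items()
}

# Only seeds sharing at least 2 tags go on to the first-200-word comparison.
eligible = [seed_id for seed_id, n in shared.items() if n >= 2]

print(shared)
print(eligible)
```

Only 2026-02-08-bagging-in-deep-nets clears the two-tag gate, which is why it is the only seed that receives a content-similarity score.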
- Writes to matched seeds are limited to the links.related_seeds field.
- corpus/dead/: dead ideas stay dead.
- status: published to status: seed; published posts are immutable except for typo fixes.
- The returned action is one of SKIPPED, LINK, CREATE.

Install: npx claudepluginhub lyndonkl/claude --plugin thinking-frameworks-skills