From repost-with-agent
Layer 2 semantic-similarity dedupe for Repost-with-agent. After Layer 1 (exact + fuzzy-string match) returns "no duplicate", you (the agent) read the candidate draft alongside the destination's most recent posts and use your OWN reasoning to decide whether the candidate is "saying the same thing in different words" as anything already on the destination. If yes, skip the publish — Ethan would rather miss a post than ship an embarrassing paraphrased duplicate.
Slash command
/repost-with-agent:repost-dedup-semantic
Why this skill exists. Ethan voice 6106 (2026-05-01): "It should make sure the agent actually semantically looks and processes the content of the message and checks the target destination and sees if there's a post with similar wording already there. If because there is, then it shouldn't go through. So the ID thing in the JSON files, etc., that's precise, and that's like layer one. But layer two is it should check the semantics, and if there's something already similar, it shouldn't post a duplicate. That'll be embarrassing."
This is Layer 2 of a two-layer dedupe pipeline. Layer 1 catches verbatim re-posts via cheap string ops; Layer 2 catches paraphrased duplicates via your own semantic reasoning.
| Layer | Skill | Method | Catches | Cost |
|---|---|---|---|---|
| 1 | repost-dedup | Exact sourceItemId lookup + fuzzy-string match (normalize, ≥80-char prefix overlap) | Verbatim and near-verbatim re-posts | Cheap (string ops) |
| 2 | repost-dedup-semantic (this) | Agent reads candidate + recent destination posts, makes a judgment | Paraphrased duplicates ("same point, different words") | One pass of your reasoning |
Both run in series. Layer 1 first as a quick filter; Layer 2 only on candidates that survived Layer 1. A candidate must pass BOTH to publish.
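The series arrangement above can be sketched as a short-circuiting check. This is a minimal illustration, not the plugin's implementation: `layer1_is_unique` and `layer2_is_unique` are hypothetical callables standing in for the two skills.

```python
from typing import Callable


def is_publishable(
    candidate: str,
    layer1_is_unique: Callable[[str], bool],
    layer2_is_unique: Callable[[str], bool],
) -> bool:
    """Return True only if the candidate clears Layer 1 and then Layer 2."""
    # Layer 1 is the cheap filter. Stop early so Layer 2 never runs on a
    # candidate that string matching already rejected.
    if not layer1_is_unique(candidate):
        return False
    # Layer 2 (hypothetical callable) only sees Layer-1-clean candidates.
    return layer2_is_unique(candidate)
```

Running the cheap check first means the more expensive reasoning pass is spent only on candidates that could still be published.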
Before you start this skill you should already have:
The destination scrape from Layer 1, capped at pair.policy.semanticDedupeWindowSize.
pair.policy.semanticDedupeEnabled (default true) and pair.policy.semanticDedupeWindowSize (default 30).

If semanticDedupeEnabled === false, skip this skill entirely and proceed to publish. Don't append any audit event: the user explicitly opted out.
If you don't have the destination scrape from Layer 1 (e.g. Layer 1 was
skipped or its scrape failed), this skill cannot run reliably. Behave as
"uncertain": if pair.policy.blockOnUncertainDuplicate === true (default),
SKIP the candidate and append pair.dedupe.uncertain audit with reason
"semantic-dedupe-no-destination-scrape". If false, proceed to publish.
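The opt-out and missing-scrape rules above amount to a small gate in front of Layer 2. The sketch below is one possible reading under stated assumptions: the `Policy` class, the `precheck` function, and the returned dictionary shape are hypothetical, and only the field names, the event name, and the reason string come from the text.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Policy:
    # Field names mirror pair.policy.* in the text; the class is hypothetical.
    semanticDedupeEnabled: bool = True
    semanticDedupeWindowSize: int = 30
    blockOnUncertainDuplicate: bool = True


def precheck(policy: Policy, destination_scrape: Optional[Sequence[str]]) -> dict:
    """Decide whether Layer 2 runs, is bypassed, or skips the candidate."""
    if not policy.semanticDedupeEnabled:
        # Explicit opt-out: publish and write no audit event.
        return {"action": "publish", "audit": None}
    if destination_scrape is None:
        # No scrape from Layer 1, so the verdict is "uncertain".
        if policy.blockOnUncertainDuplicate:
            return {
                "action": "skip",
                "audit": {
                    "event": "pair.dedupe.uncertain",
                    "reason": "semantic-dedupe-no-destination-scrape",
                },
            }
        # The text does not say whether an audit event is written in this
        # branch; this sketch assumes none.
        return {"action": "publish", "audit": None}
    return {"action": "run-layer-2", "audit": None}
```

With the defaults, `precheck(Policy(), None)` skips the candidate and carries the `pair.dedupe.uncertain` audit payload for the caller to append.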
For each candidate, ask yourself this question literally:
Would a reader who has already seen one of the existing destination posts find the candidate redundant?
That's the threshold. Not "do they share keywords" — Layer 1 already handles keyword overlap. The Layer 2 question is about communicative function: is the candidate making essentially the same point, the same announcement, the same opinion, the same claim, with the same call-to-action implied or stated? If yes → it's a paraphrased duplicate, skip. If the candidate has genuine new information, a different angle, a different communicative function, or addresses a different audience → proceed.
Confirm Layer 1 already ran and returned "unique". If not, stop — Layer 2 only runs on Layer-1-clean candidates.
Load the destination scrape window. Take the most recent
windowSize (default 30) post bodies from the destination scrape Layer
1 produced. If fewer than windowSize posts exist, use whatever Layer 1
gathered.
Read the candidate draft AND the destination posts. Hold both in your reasoning. You don't need to dump them to disk — this is an in-context read.
For each existing post in the window, ask the question from "The judgment you have to make" above. Walk through the comparison explicitly in your reasoning.
First match wins. As soon as you decide one existing post is a semantic duplicate of the candidate, stop comparing and treat the candidate as a duplicate. (No need to score every existing post — one match is enough.)
Lean conservative. When genuinely on the fence between "proceed" and "skip", skip. Ethan voice 6106: "that'll be embarrassing." The cost of a missed post is low; the cost of an embarrassing duplicate is high. Asymmetric — bias toward skip.
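The window selection and the first-match-wins rule can be sketched as a simple loop. This is an illustration only: in the skill the comparison is the agent's own reasoning, so `is_redundant` is a hypothetical callable standing in for that judgment, and the sketch assumes the scrape is ordered newest first.

```python
from typing import Callable, Optional, Sequence


def first_semantic_duplicate(
    candidate: str,
    scraped_posts: Sequence[str],
    is_redundant: Callable[[str, str], bool],
    window_size: int = 30,
) -> Optional[int]:
    """Return the index of the first post judged redundant, or None."""
    # Assumes scraped_posts is ordered newest first. If fewer than
    # window_size posts exist, the slice returns whatever is available.
    window = scraped_posts[:window_size]
    for index, existing in enumerate(window):
        # is_redundant stands in for the agent's judgment; to lean
        # conservative it should return True when genuinely on the fence.
        if is_redundant(candidate, existing):
            return index  # first match wins, so stop comparing
    return None
```

A return value of `None` corresponds to the semantic-unique verdict; any index corresponds to semantic-duplicate and identifies the matched post.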
You produce one of two verdicts per candidate:
semantic-duplicate: skip publish
Append a pair.publish.semantic_duplicate audit event to ~/.repost-with-agent/pairs/<id>/audit.jsonl with these fields:
{
"ts": "<ISO-8601>",
"event": "pair.publish.semantic_duplicate",
"pairId": "<id>",
"sourceItemId": "<candidate sourceItemId>",
"candidateExcerpt": "<first 200 chars of candidate draft>",
"matchedExistingUrl": "<destination URL of the matched existing post>",
"matchedExistingExcerpt": "<first 200 chars of the matched existing post>",
"agentReasoning": "<1-3 sentence justification — why these are the same communicative content>",
"windowSize": <int — number of posts you compared against>
}
Append a catch-up line to posted.jsonl so this sourceItemId is treated as Layer-1 done on the next run (this avoids redoing Layer 2 on the same candidate next tick). Also append a matching catch-up line to ~/.repost-with-agent/global-posted.jsonl with event: "global.publish.semantic_duplicate", the resolved contentKey, and status: "skipped-duplicate" so every other pair learns this destination already has the content:
{"ts":"<ts>","sourceItemId":"<id>","canonicalSourceUrl":"<src>","destinationUrl":"<matchedExistingUrl>","destinationId":"<dest id>","note":"caught-up via Layer 2 semantic dedupe"}
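Building the audit event and appending it as one JSON line might look like the sketch below. The field names and the 200-character excerpts follow the schema above; the function names `build_duplicate_event` and `append_jsonl` are hypothetical helpers, not part of the plugin.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def build_duplicate_event(
    pair_id: str,
    source_item_id: str,
    candidate: str,
    matched_url: str,
    matched_body: str,
    reasoning: str,
    window_size: int,
) -> dict:
    """Assemble a pair.publish.semantic_duplicate audit record."""
    return {
        "ts": datetime.now(timezone.utc).isoformat(),  # ISO-8601 timestamp
        "event": "pair.publish.semantic_duplicate",
        "pairId": pair_id,
        "sourceItemId": source_item_id,
        "candidateExcerpt": candidate[:200],  # first 200 chars only
        "matchedExistingUrl": matched_url,
        "matchedExistingExcerpt": matched_body[:200],
        "agentReasoning": reasoning,
        "windowSize": window_size,
    }


def append_jsonl(path: Path, record: dict) -> None:
    """Append one record as a single line, creating parent folders if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, ensure_ascii=False) + "\n")
```

Under these assumptions the audit path would be `Path.home() / ".repost-with-agent" / "pairs" / pair_id / "audit.jsonl"`, and the same `append_jsonl` helper could write the catch-up lines.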
semantic-unique: proceed to publish
Continue to the calling skill's publish step (repost-run step 8 / repost-backfill step 6 publish branch).
The pair.publish.start / pair.publish.success events will fire.
Optionally append pair.dedupe.semantic_clean with {candidateExcerpt, windowSize, candidatesCompared}. Not required by the schema but useful for retrospective analysis.

This skill does NOT send a Telegram message itself. The non-negotiable
Telegram-confirm rule applies only to successful publishes; a skipped
candidate is the absence of a publish, so no ping is owed. The successful-
publish ping happens in the calling skill after Layer 2 returns
semantic-unique.
If you want to surface a streak of skips to Ethan (e.g. backfill skipped all 10 candidates as semantic duplicates), include that in the regular final-summary message the calling skill sends — don't fire a separate Telegram for each Layer 2 skip.
semantic-unique and proceed (or defer to a length / sanity check earlier in the flow).
semantic-unique and proceed.

pair.policy.semanticDedupeWindowSize defaults to 30. This is enough for most accounts: 30 recent posts on a typical destination covers ~1–4 weeks of activity.
The publish flow now looks like:
Step 4 — Layer 1 dedupe (skills/repost-dedup)
├── local — posted.jsonl exact match
├── global — cross-pair contentKey ledger
└── remote — destination fuzzy-string match
Step 4.5 — Layer 2 semantic dedupe (THIS SKILL)
└── agent reasoning over candidate + destination scrape
Step 5+ — pick newest non-duplicate, expand URLs, length check
Step 8 — publish
Step 9 — append posted.jsonl
Step 10 — Telegram-confirm
A candidate is "publishable" iff it cleared BOTH Layer 1 and Layer 2.
skills/repost-dedup/SKILL.md: Layer 1 (exact + fuzzy string match).
skills/repost-global-dedupe/SKILL.md: global cross-pair content ledger.
skills/repost-run/SKILL.md: single-post flow that calls this skill at step 4.5.
skills/repost-backfill/SKILL.md: multi-post flow that calls this skill per loop iteration.
skills/repost-notify/SKILL.md: Telegram payload spec (only on successful publish).
docs/state-files.md: pair.publish.semantic_duplicate audit event schema + the pair.policy.semanticDedupeEnabled / semanticDedupeWindowSize fields.
docs/destinations/<platform>.md: per-platform notes including window-size guidance.
npx claudepluginhub ethansk/repost-with-agent --plugin repost-with-agent