Reference for the Repost-with-agent dedupe algorithm — how to check a candidate against per-pair history, the global cross-pair ledger, and recent destination posts to avoid double-posting. Used by repost-run, repost-backfill, and any other publish path.
Slash command: /repost-with-agent:repost-dedup
Reference algorithm for deciding whether a candidate post is a duplicate of
something already on the destination. Custom user rules run before this skill;
see skills/repost-custom-rules/SKILL.md for not-post-worthy preference skips.
Then three dedupe checks run before publish: local (per-pair posted.jsonl),
global (cross-pair global-posted.jsonl), and remote (against the actual
destination feed).
This skill is Layer 1 of a two-layer dedupe pipeline. It catches
verbatim and near-verbatim re-posts via cheap string ops. Layer 2
(skills/repost-dedup-semantic/SKILL.md) catches paraphrased duplicates via
agent semantic reasoning. Both layers must clear before publish — see
"Layer separation" below.
sourceItemId lookup), but only catches posts
this one pair recorded.

All three checks are mandatory before any publish unless the user explicitly
disables global dedupe for a pair with pair.policy.globalDedupeEnabled: false.
A candidate skipped by custom rules should not reach this dedupe stage and must
not be appended to posted.jsonl / global-posted.jsonl.
Repost-with-agent v4.3+ runs dedupe in two passes:
| Layer | Skill | Method | Catches | Cost |
|---|---|---|---|---|
| 1 | repost-dedup (this skill) | Exact sourceItemId lookup + fuzzy-string match (normalize + ≥80-char prefix overlap) | Verbatim and near-verbatim re-posts | Cheap (string ops) |
| 2 | repost-dedup-semantic | Agent reads candidate + recent destination posts and judges semantic redundancy | Paraphrased duplicates ("same point, different words") | One reasoning pass |
Layer 1 runs first as a quick filter. Only candidates that survive Layer 1 proceed to Layer 2. A candidate is publishable iff it clears BOTH layers.
Layer 1 cannot catch a paraphrase like "We just shipped X — agents do the work, no APIs" vs. an existing "Just launched the X cross-poster. Pure agent-driven, no API needed." The strings differ enough that fuzzy-prefix overlap won't trigger; only Layer 2's semantic check catches it. Conversely, Layer 2 is wasted on verbatim re-posts where Layer 1's string match is trivially correct and orders of magnitude cheaper. Run them in series.
Layer 2 is enabled by default (pair.policy.semanticDedupeEnabled: true)
and can be turned off per-pair if you genuinely want only string-level
dedupe.
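The series arrangement of the two layers can be sketched as follows. This is a minimal Python sketch, not the plugin's implementation: `layer1_is_duplicate` and `layer2_is_duplicate` are hypothetical callables standing in for this skill and repost-dedup-semantic, and passing the pair policy as a plain dict is an assumption.

```python
from typing import Callable


def is_publishable(
    candidate: str,
    policy: dict,
    layer1_is_duplicate: Callable[[str], bool],
    layer2_is_duplicate: Callable[[str], bool],
) -> bool:
    """Run Layer 1 first; run Layer 2 only on candidates that survive it."""
    if layer1_is_duplicate(candidate):
        # The cheap string match already decided; skip the reasoning pass.
        return False
    # Assumption: a missing semanticDedupeEnabled key means "enabled",
    # matching the documented default of true.
    semantic_enabled = policy.get("semanticDedupeEnabled", True)
    if semantic_enabled and layer2_is_duplicate(candidate):
        return False
    return True
```

Because Layer 2 is only reached after Layer 1 returns "not a duplicate", verbatim re-posts never cost a reasoning pass.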
- Read ~/.repost-with-agent/pairs/<id>/posted.jsonl (line-delimited JSON, may be empty or missing).
- For each row, note sourceItemId, status, destinationUrl/destinationId, and remediation/deletion markers.
- Group rows by sourceItemId and compute the latest live-success verdict:
  - live success: status missing/posted/caught-up/skipped-duplicate with a destination URL or ID, no needsRemediation / needsRepost, and no malformed/deleted event/status;
  - deleted: deleted-*, posted-deleted, event: "global.publish.deleted", or needsRepost: true;
  - malformed: posted-malformed, event: "global.publish.malformed", needsRemediation: true, or needs-repost.

Do not use a simple grep on sourceItemId as the final answer when cleanup rows may exist; grep is only a prefilter. Use jq/Python/agent reasoning to compute the newest live-success verdict.
jq -c 'select(.sourceItemId == "<candidate-id>")' ~/.repost-with-agent/pairs/<id>/posted.jsonl
# Inspect the matched rows and apply the latest live-success verdict; row existence alone is not duplicate proof.
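For the Python route, the latest live-success verdict might be computed along these lines. This is a sketch under stated assumptions: rows are treated as chronological in file order, rows that fit none of the three categories are ignored, and the helper names (`read_jsonl`, `classify_row`, `local_verdict`) are hypothetical.

```python
import json
from pathlib import Path
from typing import Optional

# A missing status (None) counts as a live status, per the rules above.
LIVE_STATUSES = {None, "posted", "caught-up", "skipped-duplicate"}


def read_jsonl(path: Path) -> list:
    """Return parsed rows; a missing or empty ledger yields an empty list."""
    if not path.exists():
        return []
    rows = []
    for line in path.read_text(encoding="utf-8").splitlines():
        if line.strip():
            rows.append(json.loads(line))
    return rows


def classify_row(row: dict) -> str:
    """Map one ledger row to 'deleted', 'malformed', 'live', or 'other'."""
    status = row.get("status")
    event = row.get("event")
    if (
        (isinstance(status, str) and status.startswith("deleted-"))
        or status == "posted-deleted"
        or event == "global.publish.deleted"
        or row.get("needsRepost")
    ):
        return "deleted"
    if (
        status in ("posted-malformed", "needs-repost")
        or event == "global.publish.malformed"
        or row.get("needsRemediation")
    ):
        return "malformed"
    has_destination = bool(row.get("destinationUrl") or row.get("destinationId"))
    if status in LIVE_STATUSES and has_destination:
        return "live"
    return "other"


def local_verdict(rows: list, source_item_id: str) -> Optional[str]:
    """Verdict from the newest decisive row for this sourceItemId, if any."""
    latest = None
    for row in rows:  # assumption: file order is chronological
        if row.get("sourceItemId") != source_item_id:
            continue
        kind = classify_row(row)
        if kind != "other":
            latest = kind
    if latest is None:
        return None  # no local proof either way
    if latest == "live":
        return "duplicate-local"
    return "not-duplicate-local-remediation"
```

A later deleted or malformed row overrides an earlier successful post, which is why row existence alone is not treated as duplicate proof.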
Read ~/.repost-with-agent/global-posted.jsonl (append-only NDJSON; may be
missing). Use skills/repost-global-dedupe/SKILL.md to resolve a candidate
contentKey and detect whether any pair has already posted/caught-up that
content for this pair's destination platform/account.
Key points:
- Run this check only when pair.policy.globalDedupeEnabled is true. Treat missing as true.
- Resolve the contentKey from <sourcePlatform>:<sourceItemId> or canonical URL.
- Keep the original contentKey so the downstream pair still represents the LinkedIn-origin content.
- Evaluate rows matching the contentKey using the same latest live-success verdict as local dedupe. global.publish.deleted, deleted-*, posted-malformed, needsRepost, and needsRemediation rows remove or quarantine old proof; they are not duplicates.
- If the latest verdict is live success, the result is duplicate-global. Skip and append the audit/catch-up lines described in repost-global-dedupe.

This check is what makes all pairs look globally instead of thinking in silos.
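The flag default and key construction could look roughly like this in Python. Treat it as an illustrative sketch only: the authoritative contentKey rules live in skills/repost-global-dedupe/SKILL.md, the function names are hypothetical, and preferring the platform:id form over the canonical URL when both are available is an assumption.

```python
from typing import Optional


def global_dedupe_enabled(policy: dict) -> bool:
    """Only an explicit false disables the check; missing counts as true."""
    return policy.get("globalDedupeEnabled") is not False


def content_key(
    source_platform: Optional[str],
    source_item_id: Optional[str],
    canonical_url: Optional[str] = None,
) -> Optional[str]:
    """Build <sourcePlatform>:<sourceItemId>, else fall back to the URL."""
    if source_platform and source_item_id:
        return f"{source_platform}:{source_item_id}"
    # Assumption: the canonical URL is used only when the id form is unavailable.
    return canonical_url or None
```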
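- Scrape recent posts from pair.destination.profileUrl.
- Normalize both the candidate and each destination post:
  - Replace \s+ with a single space.
  - tolower.
  - Strip every https?://\S+ token. (Why? X / Bluesky rewrite URLs into shortened aliases like t.co/abc123 that won't match the original lnkd.in/abc from the source.)
  - Strip trailing .!?,;: and quotes.
- Treat the candidate as a match when the normalized texts share a prefix of at least 80 characters.

Why 80 chars? It's enough that incidental phrases ("good morning everyone") won't match by accident, but short enough that truncated reposts (e.g. when the destination has a tighter char cap and the body was cut) still match.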
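The normalization and 80-character prefix comparison can be sketched in Python as below. Two points are assumptions rather than documented behavior: URLs are stripped before whitespace is collapsed (so no double spaces are left behind), and two texts match when their first 80 normalized characters are identical, which for texts shorter than 80 characters means the whole normalized text must be equal.

```python
import re

URL_RE = re.compile(r"https?://\S+")
WHITESPACE_RE = re.compile(r"\s+")
# Trailing punctuation plus straight and curly quotes.
TRAILING_CHARS = ".!?,;:\"'\u201c\u201d\u2018\u2019"
PREFIX_LEN = 80


def normalize(text: str) -> str:
    """Strip URLs, collapse whitespace, lowercase, trim trailing punctuation."""
    text = URL_RE.sub("", text)
    text = WHITESPACE_RE.sub(" ", text).strip().lower()
    return text.rstrip(TRAILING_CHARS).rstrip()


def is_remote_duplicate(candidate: str, destination_posts) -> bool:
    """True when any destination post shares the candidate's normalized prefix."""
    cand_prefix = normalize(candidate)[:PREFIX_LEN]
    if not cand_prefix:
        return False  # nothing left to compare (e.g. a URL-only post)
    return any(
        normalize(post)[:PREFIX_LEN] == cand_prefix for post in destination_posts
    )
```

For example, a destination post that was cut off mid-sentence and had its link rewritten to a t.co alias still matches its source, while a short greeting does not match a longer post that merely starts with the same words.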
If the destination scrape fails (network error, page failed to load, CAPTCHA modal, login expired):
- pair.policy.blockOnUncertainDuplicate === true (default): SKIP all candidates this run.
- false: proceed and publish anyway (Ethan would rather see a near-duplicate than miss a post, but the default is conservative).

Append a pair.dedupe.uncertain audit event with the reason.
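The scrape-failure policy might be expressed like this. It is a hedged sketch: the function name, the action labels "skip-all" and "proceed", and the shape of the audit event (the ts and reason fields) are assumptions; only the event name pair.dedupe.uncertain and the blockOnUncertainDuplicate flag come from this document.

```python
from datetime import datetime, timezone


def on_scrape_failure(policy: dict, reason: str):
    """Decide what to do when the destination feed could not be read."""
    # Assumption: anything other than an explicit false blocks, which keeps
    # the conservative default when the key is missing.
    block = policy.get("blockOnUncertainDuplicate") is not False
    action = "skip-all" if block else "proceed"
    audit_event = {
        "ts": datetime.now(timezone.utc).isoformat(),  # assumed timestamp format
        "event": "pair.dedupe.uncertain",
        "reason": reason,
    }
    return action, audit_event
```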
For the running agent: the simplest implementation is to do everything in memory in your context. You don't need a script — just read the JSON / scrape the page / compare strings.
If you want to factor into Bash for a backfill of 50+ candidates, here's a sketch:
# Normalize a string: lowercase, strip URLs, collapse whitespace, trim,
# then strip trailing punctuation.
normalize() {
  printf '%s' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | tr '\n' ' ' \
    | sed -E 's#https?://[^[:space:]]+##g' \
    | sed -E 's/[[:space:]]+/ /g; s/^ //; s/ $//' \
    | sed -E 's/[[:punct:]]+$//'
}
But honestly, doing it in your reasoning is fine for the small candidate counts (≤20) you'll typically see.
For each candidate, produce one of these verdicts:
duplicate-local — latest local ledger verdict is live success. Skip.
not-duplicate-local-remediation — local rows exist but the latest verdict is
deleted/malformed/needs-remediation; do not skip on local ledger alone.
duplicate-global — latest global same-destination verdict is live success.
Skip + append the global/per-pair catch-up records from
skills/repost-global-dedupe/SKILL.md.
not-duplicate-global-remediation — global rows exist but the latest verdict
is deleted/malformed/needs-remediation; do not skip on global ledger alone.
duplicate-remote — found in the destination scrape. Skip + append a
catch-up entry to posted.jsonl so we don't re-check next run:
{"ts":"<ts>","sourceItemId":"<id>","canonicalSourceUrl":"<src>","destinationUrl":"<destination url where match was found>","destinationId":"<id>","note":"caught-up via destination dedupe"}
unique — neither match. Eligible for publish.
uncertain — cannot determine. See "Uncertain matches" above.
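Writing the duplicate-remote catch-up entry could be sketched as follows. The field names mirror the JSON line shown in the verdict list; the ISO 8601 UTC timestamp format and the helper names `catch_up_entry` and `append_jsonl` are assumptions.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def catch_up_entry(source_item_id, canonical_source_url,
                   destination_url, destination_id, ts=None) -> dict:
    """Build the catch-up row recorded after a duplicate-remote verdict."""
    return {
        "ts": ts or datetime.now(timezone.utc).isoformat(),  # assumed format
        "sourceItemId": source_item_id,
        "canonicalSourceUrl": canonical_source_url,
        "destinationUrl": destination_url,
        "destinationId": destination_id,
        "note": "caught-up via destination dedupe",
    }


def append_jsonl(path: Path, entry: dict) -> None:
    """Append one JSON object as a single line; never rewrite earlier rows."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry, ensure_ascii=False) + "\n")
```

Opening the file in append mode keeps the ledger append-only, so earlier rows stay available for the latest live-success computation.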
- skills/repost-custom-rules/SKILL.md: user preference skip rules + append-only considered state; runs before this dedupe layer.
- skills/repost-dedup-semantic/SKILL.md: Layer 2 semantic dedupe (agent reasoning over candidate vs. recent destination posts). Runs AFTER this skill on Layer-1-clean candidates. Catches paraphrased duplicates.
- skills/repost-global-dedupe/SKILL.md: cross-pair contentKey ledger.
- skills/repost-run/SKILL.md: calls this dedupe at step 4 (Layer 1) and repost-dedup-semantic at step 4.5 (Layer 2).
- skills/repost-backfill/SKILL.md: runs Layer 1 once across the full candidate set (newest-first) and Layer 2 per loop iteration before the publish loop.
- docs/destinations/<platform>.md: per-platform quirks (e.g. X's t.co rewriting, Bluesky's link cards) plus Layer 2 window-size guidance.