Skill

rebase-validate

Validates git rebase correctness using a five-layer pipeline (range-diff, sem, weave, jscpd, tests). Use when verifying a git rebase (especially stacked PRs with --update-refs), comparing branches post-rebase, or auditing conflict resolution correctness. Covers scenarios like: 'validate rebase', 'check rebase', 'did we lose anything', 'compare before and after rebase', 'duplicate test blocks', 'rebase validation', 'run range-diff', 'lost code during rebase', 'rebase artifacts', 'conflict resolution verification'.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/vp-git:rebase-validate

User invocable

Model invocable

Inline context

Effort: high

Uses dynamic context injection — preprocesses shell commands at runtime

Tool Access

This skill is limited to the following tools:

Bash, Read, Grep, Glob, Agent, mcp__sem__sem_diff, mcp__sem__sem_impact

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Validate a git rebase for lost code, duplicate blocks, broken refs, and stale

Supporting Files

references/silent-behaviors.md

SKILL.md

352 lines · ~3.6k tokens

Stats

Language: JavaScript

Stars: 0

Maintenance: Excellent

Last Commit: May 27, 2026

Rebase Validation — Five-Layer Pipeline

Validate a git rebase for lost code, duplicate blocks, broken refs, and stale references using five complementary layers. No single tool catches all rebase regressions — the value is in layered defense.

When to Use

After completing any rebase with conflict resolution
After git rebase --update-refs on a stacked branch set
When comparing a rebased branch against the original
When the user suspects code was lost or duplicated during rebase

Quick Start

If the user just finished a rebase and wants validation, run this sequence:

Step 0: Verify refs (are we comparing the right things?)
Step 1: git range-diff -s (commit overview — seconds)
Step 2: sem entity-level diff — MCP (mcp__sem__sem_diff) or CLI, if available
Step 3: weave preview target (merge readiness — if weave is available)
Step 4: jscpd + duplicate grep (token-level — catches what sem misses)
Step 5: project validation suite (functional oracle)

Always run Step 0 first. After Step 0 completes, Steps 1-4 can run as parallel agents. Step 5 is the final oracle. On agents without a parallel-Agent tool, run Steps 1-4 sequentially — they're independent.
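The ordering rule above (Step 0 alone, Steps 1-4 concurrently, Step 5 last) can be sketched as a small orchestrator. This is a minimal illustration, not part of the skill: `run_pipeline` and the `Check` callables are hypothetical names, and each check is assumed to return `True` when its layer is clean.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

# Hypothetical: each check returns True when its layer found no problems.
Check = Callable[[], bool]


def run_pipeline(step0: Check, parallel: Dict[str, Check], step5: Check) -> Dict[str, bool]:
    """Run Step 0 alone, the independent steps concurrently, then Step 5 last."""
    results = {"step0": step0()}
    if not results["step0"]:
        # A broken ref makes every later comparison meaningless, so stop here.
        return results
    with ThreadPoolExecutor(max_workers=max(1, len(parallel))) as pool:
        futures = {name: pool.submit(check) for name, check in parallel.items()}
        for name, future in futures.items():
            results[name] = future.result()
    # The functional oracle runs only after the advisory layers have reported.
    results["step5"] = step5()
    return results
```

Where no thread pool or parallel agent is available, calling the same checks in a plain loop gives the same results, since Steps 1-4 do not depend on each other.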

Step 0: Verify Refs

Before ANY comparison, confirm refs point where expected. A broken ref makes every subsequent comparison meaningless.

# Confirm branch positions
git rev-parse --short <rebased-branch>
git rev-parse --short <original-branch>

# Confirm ancestry
git merge-base --is-ancestor <expected-base> <rebased-branch> && echo "OK" || echo "WRONG BASE"

Why this matters: git rebase --update-refs silently moves ALL branch refs pointing to commits in the replayed range — including backup branches. If you created a backup before rebase and it was in the replayed range, it got moved.

Safe backup strategy: use tags (not branches) for pre-rebase snapshots, or create backup branches pointing to commits OUTSIDE the range being replayed.

Optional safety check — if no pre-rebase snapshot exists, git reflog can show where the branch pointed before rebase:

git reflog <rebased-branch> | head -5
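One way to catch a silently moved backup is to snapshot every ref before the rebase and compare afterwards. The sketch below assumes both snapshots were captured with `git for-each-ref --format='%(refname) %(objectname)' refs/heads refs/tags`; the function names are hypothetical and the code only parses the captured text.

```python
from typing import Dict, Tuple


def parse_refs(for_each_ref_output: str) -> Dict[str, str]:
    """Parse lines of the form '<refname> <sha>' into a mapping."""
    refs = {}
    for line in for_each_ref_output.splitlines():
        if line.strip():
            name, sha = line.split()
            refs[name] = sha
    return refs


def moved_refs(before: str, after: str) -> Dict[str, Tuple[str, str]]:
    """Return refs present in both snapshots whose target changed."""
    old, new = parse_refs(before), parse_refs(after)
    return {
        name: (old[name], new[name])
        for name in old
        if name in new and old[name] != new[name]
    }
```

If a ref that was meant to be a backup appears in the result, it no longer records the pre-rebase state and should not be used as the "before" side of any comparison.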

Step 1: `git range-diff -s` (Commit-Level)

The single most important first check. Gives a complete overview in seconds.

# Using remote as "before" snapshot (most practical if pushed before rebase)
git range-diff -s origin/old-branch..origin/old-tip  new-base..new-tip

# Using pre-rebase tags
git range-diff -s pre-rebase/branch..old-tip  new-base..new-tip

# Immediately after rebase (reflog — only works for the tip branch)
# Assumes @{u} has not moved since the rebase (e.g. no force-push to upstream)
git range-diff @{u} @{1} @

Reading the output

| Symbol | Meaning | Action |
|---|---|---|
| `=` | Identical patch | No review needed |
| `!` | Altered (conflict resolution changed content) | Review these |
| `<` | Dropped (only in old range) | Verify intentional |
| `>` | Added (only in new range) | Verify expected |

Use --creation-factor=90 for rebases with heavy conflict resolution — the default (60) can misclassify heavily-resolved commits as drop+add instead of altered. The algorithm builds a cost matrix via "diff of diffs" and solves least-cost assignment — higher creation-factor values make it more forgiving of large changes when pairing commits.
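To see why a higher creation factor keeps heavily-resolved commits paired, consider a deliberately simplified model of a single old/new commit pair. This is not git's implementation (which solves an assignment over the whole cost matrix); the cost formula, the function name, and the sizes are assumptions chosen only to illustrate the trade-off.

```python
def classify_pair(old_size: int, new_size: int, diff_of_diffs_size: int,
                  creation_factor: int = 60) -> str:
    """Toy model: pair two commits only if pairing is cheaper than drop + add.

    Sizes are hypothetical line counts of the patches and of their diff-of-diffs.
    """
    pairing_cost = diff_of_diffs_size
    # Assumed model: treating the commits as unrelated costs a fraction of both patches.
    creation_cost = (old_size + new_size) * creation_factor / 100
    if pairing_cost < creation_cost:
        return "altered (!)"
    return "dropped + added (< and >)"
```

Under this model, two 100-line patches whose diff-of-diffs is 150 lines are split into a drop and an add at factor 60 (creation cost 120), but stay paired as altered at factor 90 (creation cost 180).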

Additional useful flags:

--left-only: suppress commits missing from the old range (hides `>` additions; leaves identical, altered, and dropped commits)
--right-only: suppress commits missing from the new range (hides `<` drops; leaves identical, altered, and added commits)

Note: range-diff ignores merge commits by default. This is correct for rebases (which linearize history) but matters if comparing branches that contain merges.

For a quick count:

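# Match on the status column (field 3) so symbols inside commit subjects are not counted
git range-diff -s ... | awk '$3 == "="' | wc -l   # identical
git range-diff -s ... | awk '$3 == "!"' | wc -l   # altered (review these)
git range-diff -s ... | awk '$3 == "<"' | wc -l   # dropped
git range-diff -s ... | awk '$3 == ">"' | wc -l   # added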
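When the counts need to feed a report, the same summary can be produced by parsing the `-s` output directly. The sketch below assumes the usual one-line-per-commit layout (`N: <sha> <symbol> M: <sha> <subject>`, with `-` and dashes standing in for a missing side); `summarize_range_diff` is a hypothetical helper name.

```python
import re
from collections import Counter
from typing import List, Tuple

# Assumed line layout: "<n|->:  <sha>  <symbol>  <n|->:  <sha>  <subject>"
LINE = re.compile(r"^\s*(?:-|\d+):\s+\S+\s+([=!<>])\s+(?:-|\d+):\s+\S+(?:\s+(.*))?$")


def summarize_range_diff(output: str) -> Tuple[Counter, List[Tuple[str, str]]]:
    """Count commits per status symbol and list everything that is not identical."""
    counts: Counter = Counter()
    to_review = []
    for line in output.splitlines():
        m = LINE.match(line)
        if not m:
            continue  # skip anything that is not a commit line
        symbol, subject = m.group(1), m.group(2) or ""
        counts[symbol] += 1
        if symbol != "=":
            to_review.append((symbol, subject))
    return counts, to_review
```

Because only the status column is inspected, a subject such as "Fix a == b comparison" does not inflate the count of identical commits.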

Step 2: Entity-Level Diff (sem MCP or CLI)

Compares named code entities between branches. See references/silent-behaviors.md for limitations (including the MCP/CLI field-name difference). Prefer the most capable backend available, in the order below. On a large diverged branch, scope to the suspect files (e.g. file_path on the MCP tool, or a -- <path> pathspec on the CLI): a whole-branch diff can return hundreds of entities (the 900-entity noise trap) and bury the real losses.

Tier 1 — sem MCP (preferred when available). Call mcp__sem__sem_diff with base_ref = old-branch and target_ref = new-branch (optional file_path to scope). If the tool is not present in this session or the call errors, do not retry — fall through to Tier 2. It returns JSON shaped:

{ "changes": [ { "change_type": "added|modified|deleted|renamed",
                 "entity_name": "...", "entity_type": "...", "file": "..." } ],
  "total_changes": N }

For risk-ranking a large change set, mcp__sem__sem_impact (file_path + entity_name, mode=all) surfaces the highest-fan-in changed entity to direct attention first (see the Agent Delegation Pattern below).

Tier 2 — sem CLI (fallback). Emit an explicit signal when sem is absent so a silent no-op is never mistaken for "ran clean":

if command -v sem >/dev/null 2>&1; then
  sem diff --format json old-branch..new-branch
  # Alternative syntax if dotdot doesn't work:
  # sem diff --from old-branch --to new-branch --format json
else
  echo "sem CLI not installed — Tier 2 did NOT run; fall through to Tier 3" >&2
fi

As of sem 0.6.0: the CLI emits camelCase (changeType, filePath); the MCP emits snake_case (change_type, file). The summarizer below handles both.

Tier 3 — git (NOT an entity-level check). If neither MCP nor CLI is available, git diff --stat gives a file/line summary — but it cannot see a function dropped from an otherwise-present file. Record Layer 2 as unavailable (not passed) in your final summary and lean harder on Steps 4–5:

git diff --stat old-branch new-branch
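The distinction between "unavailable" and "passed" is easy to lose in a final summary, so it can help to make it explicit in the data structure. The following is a minimal sketch; the `Status` enum, the layer names, and the verdict wording are assumptions for illustration.

```python
from enum import Enum
from typing import Dict


class Status(Enum):
    PASSED = "passed"
    FAILED = "failed"
    UNAVAILABLE = "unavailable"  # the layer did not run; this is not a pass


def overall_verdict(layers: Dict[str, Status]) -> str:
    """Summarize layer results without letting a skipped layer count as clean."""
    if any(status is Status.FAILED for status in layers.values()):
        return "failed"
    missing = sorted(name for name, status in layers.items() if status is Status.UNAVAILABLE)
    if missing:
        return "passed with gaps: " + ", ".join(missing)
    return "passed"
```

A run where sem was missing then reads "passed with gaps: sem", which signals that Steps 4 and 5 carried the weight for that layer.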

Summarize by change type and file.

MCP path: you already hold the structured JSON, so read the change_type/file counts directly; no shell needed. Apply the same guards the script enforces below: if the result has no changes array, or a record lacks both change_type and file, treat it as schema drift and do NOT report zero. Then apply the zero-result check below.

CLI path: pipe through the script, which normalizes both field shapes and fails loud rather than reporting a false zero:

sem diff --format json old-branch..new-branch | python3 -c "
import json, sys, collections
data = json.load(sys.stdin)
changes = data.get('changes')
if not isinstance(changes, list):
    sys.exit('Step 2: no usable changes[] — sem produced no entity diff; do NOT read as zero changes')
ct = lambda c: c.get('change_type') or c.get('changeType')
fp = lambda c: c.get('file') or c.get('filePath')
if any(ct(c) is None or fp(c) is None for c in changes):
    print('  WARNING: records with unrecognized fields — possible sem schema drift; counts may be wrong', file=sys.stderr)
by_type = collections.Counter(ct(c) for c in changes)
by_file = collections.Counter(fp(c) for c in changes)
for t, n in by_type.most_common(): print(f'  {t}: {n}')
print()
for f, n in by_file.most_common(15): print(f'  {n:3d} {f}')
"

Focus on deleted entities (backup has, rebased doesn't) — with base_ref = old-branch and target_ref = new-branch, a backup-only entity surfaces as deleted. These are potential losses; filter out intentionally deleted files before investigating.
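Filtering for potential losses can be sketched as follows, assuming the `changes` list has the shape shown for the MCP tool. The camelCase `entityName` key is an assumption (only `changeType` and `filePath` are documented above for the CLI), and `potential_losses` is a hypothetical helper name.

```python
from typing import Iterable, List, Optional, Tuple


def potential_losses(changes: Iterable[dict],
                     intentionally_deleted: Iterable[str] = ()) -> List[Tuple[str, Optional[str]]]:
    """Return (file, entity) pairs for deleted entities outside known-deleted files."""
    skip = set(intentionally_deleted)
    losses = []
    for record in changes:
        # Accept both the MCP (snake_case) and CLI (camelCase) field shapes.
        change_type = record.get("change_type") or record.get("changeType")
        path = record.get("file") or record.get("filePath")
        name = record.get("entity_name") or record.get("entityName")  # camelCase key is assumed
        if change_type == "deleted" and path not in skip:
            losses.append((path, name))
    return losses
```

Each returned pair is a candidate for manual inspection: check whether the entity was removed on purpose or dropped during conflict resolution.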

A zero/empty result does not clear test files or large files. sem silently yields no entities for describe()/it()/test() blocks (call expressions) and for files over the 32KB tree-sitter cap (see references/silent-behaviors.md). When the change touches test files or files >30KB, treat a sem zero as inconclusive and defer to Step 4 (jscpd + grep) before declaring no losses.
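The rule above can be turned into a quick pre-check over the files a rebase touched. This sketch assumes the caller supplies `(path, size_in_bytes)` pairs; the test-file pattern and the helper name are illustrative and should be adapted to the project.

```python
import re
from typing import Iterable, List, Tuple

# Illustrative pattern for JS/TS test files such as foo.test.js or bar.spec.ts.
TEST_FILE = re.compile(r"\.(spec|test)\.[cm]?[jt]sx?$")
SIZE_LIMIT = 30 * 1024  # the 30KB threshold from the text, a margin under the 32KB cap


def files_needing_step4(touched: Iterable[Tuple[str, int]]) -> List[str]:
    """Return paths for which an empty sem result should be treated as inconclusive."""
    return [
        path
        for path, size in touched
        if TEST_FILE.search(path) or size > SIZE_LIMIT
    ]
```

If the returned list is non-empty, a clean sem result says nothing about those files and Step 4 should decide.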

Step 3: `weave preview` (Merge Readiness)

Check availability first:

command -v weave >/dev/null 2>&1 || echo "weave not installed — skipping to Step 4"

If weave is available, preview the merge into the target branch. After a successful rebase onto target, this should show 0 conflicts.

weave preview target-branch

If conflicts remain, the rebase did not resolve every divergence. This is expected when target has gained commits that are not yet in the rebased branch.

Step 4: Duplicate Detection (Token-Level)

This layer catches what sem/weave miss — they cannot parse ~40% of test files.

4a: jscpd for duplicate code blocks

npx --yes jscpd . --ignore "node_modules/**,dist/**,.git/**" --min-lines 5 --min-tokens 50 --max-lines 5000 --reporters console

CRITICAL: --max-lines defaults to 1000 — files over 1000 lines are silently skipped. Always pass --max-lines 5000.

Compare clone count against baseline (if available) to distinguish pre-existing from rebase-introduced duplicates.

4b: Duplicate test name grep

# Adapt the find path to match the project's test directory layout
while IFS= read -r -d '' f; do
  dupes=$(grep -oE "\b(it|test)\(['\"\`][^'\"\`]*['\"\`]" "$f" | sort | uniq -d)
  [ -n "$dupes" ] && echo "DUPLICATE in $f:" && echo "$dupes"
done < <(find . -path ./node_modules -prune -o \( -name '*.spec.js' -o -name '*.test.js' \) -print0 2>/dev/null)

node:test silently runs both copies of duplicate it() blocks — no warning. Cross-scope duplicates (same name in different describe() blocks) are usually intentional. Same-scope duplicates are bugs.
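Separating same-scope duplicates from cross-scope ones requires knowing which describe() block each test sits in. The sketch below is a line-based heuristic, not a parser: it assumes each describe() opens its brace on the same line, and that braces inside strings or comments are balanced within a line. The function name is hypothetical.

```python
import re
from collections import Counter
from typing import List, Tuple

DESCRIBE = re.compile(r"\bdescribe\(\s*(['\"`])(.*?)\1")
TEST = re.compile(r"\b(?:it|test)\(\s*(['\"`])(.*?)\1")


def same_scope_duplicates(source: str) -> List[Tuple[Tuple[str, ...], str]]:
    """Return (describe-scope, test-name) pairs that occur more than once."""
    stack = []  # entries: (describe name, brace depth before the block opened)
    depth = 0
    seen: Counter = Counter()
    for line in source.splitlines():
        d = DESCRIBE.search(line)
        if d:
            stack.append((d.group(2), depth))
        else:
            t = TEST.search(line)
            if t:
                scope = tuple(name for name, _ in stack)
                seen[(scope, t.group(2))] += 1
        depth += line.count("{") - line.count("}")
        # Leave every describe block whose closing brace has been passed.
        while stack and depth <= stack[-1][1]:
            stack.pop()
    return sorted(key for key, count in seen.items() if count > 1)
```

A test name repeated under two different describe() blocks is not reported; only repeats within the same chain of enclosing blocks are, which matches the "same-scope duplicates are bugs" rule.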

4c: ast-grep for structural patterns (optional)

If ast-grep is installed (brew install ast-grep), it can complement jscpd with structural pattern matching — finding code that matches a known shape rather than detecting unknown duplicates:

# Check for the ast-grep binary, not the sg alias: on many Linux systems sg is an unrelated shadow-utils command
command -v ast-grep >/dev/null 2>&1 || echo "ast-grep not installed — skipping"

# Find all test blocks (useful for manual review of test structure)
ast-grep -p 'it($NAME, $$$BODY)' -l js .

# Find all describe blocks wrapping test blocks
ast-grep -p 'describe($NAME, $$$BODY)' -l js .

ast-grep is a search tool, not a clone detector: it finds known patterns, while jscpd finds unknown duplicates. Use ast-grep when you know what rebase artifact to look for. Unlike jscpd, it has no silent file-size thresholds.

Step 5: Functional Verification

The definitive oracle. Every other layer is advisory.

Run the project's full validation suite — lint, type checking, and tests. Adapt these commands to the project's setup:

# Example (adapt to your project)
npm run check       # or: cargo check, go vet, etc.
npm test            # or: cargo test, go test ./..., etc.

Duplicate test blocks pass both lint and tests — they are invisible to this layer. That's why Layer 4 exists.

Baseline Establishment (Before Rebase)

If you haven't rebased yet, run these first to establish a baseline:

# Save baseline duplicate counts (adapt paths to project layout)
npx --yes jscpd . --ignore "node_modules/**,dist/**,.git/**" --min-lines 5 --min-tokens 50 --max-lines 5000 --reporters console > /tmp/jscpd-baseline.txt

# Save baseline duplicate test names (per-file, same approach as main check)
while IFS= read -r -d '' f; do
  dupes=$(grep -oE "\b(it|test)\(['\"\`][^'\"\`]*['\"\`]" "$f" | sort | uniq -d)
  [ -n "$dupes" ] && echo "$f: $dupes"
done < <(find . -path ./node_modules -prune -o \( -name '*.spec.js' -o -name '*.test.js' \) -print0 2>/dev/null) > /tmp/test-names-baseline.txt

# Tag all branch tips (for range-diff after rebase)
for b in branch1 branch2 branch3; do
  git tag "pre-rebase/$b" "$b"
done

Without baselines, every finding during verification requires manual investigation to determine if it's pre-existing or rebase-introduced.
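With a baseline saved, separating pre-existing findings from rebase-introduced ones reduces to a set difference. The sketch below compares two text reports line by line and assumes one finding per line in both; the function name is hypothetical, and reports that embed line numbers will need normalizing first because those shift after a rebase.

```python
from typing import List


def rebase_introduced(baseline_text: str, current_text: str) -> List[str]:
    """Return findings in the current report that are absent from the baseline."""
    baseline = {line.strip() for line in baseline_text.splitlines() if line.strip()}
    introduced = []
    for line in current_text.splitlines():
        finding = line.strip()
        if finding and finding not in baseline:
            introduced.append(finding)
    return introduced
```

An empty result means every current finding already existed before the rebase and can be noted for later cleanup rather than investigated now.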

Agent Delegation Pattern

This pattern is optimized for Claude Code's Agent tool. On other agents, treat the agent list as a sequential checklist — order doesn't matter beyond Step 0 first, Step 5 last.

For large rebases, delegate verification to parallel agents:

Agent 1: git range-diff -s (commit correspondence)
Agent 2: sem entity-level diff — mcp__sem__sem_diff if available, else sem CLI
Agent 3: weave preview target (merge readiness, if available)
Agent 4: jscpd + grep (duplicate detection)
Agent 5: project validation suite (lint + types + tests)

Once Step 0 has confirmed the refs, launch all 5 right away (in parallel where supported, sequentially otherwise). Check range-diff and jscpd results first: they arrive in seconds and carry the highest signal. Agent 5 remains the final oracle. The gap between "rebase complete" and "verification results" is dangerous, so kick off verification immediately and decide nothing until results arrive.

For a large change set, have Agent 2 follow its diff with mcp__sem__sem_impact on the modified entities and review the highest-fan-in one first — the entity with the most transitive dependents is where a botched conflict resolution does the most damage.

Classifying Findings

| Finding | Classification | Action |
|---|---|---|
| `!` commit in range-diff | Conflict resolution | Review the diff-of-diff |
| `deleted` entity in sem (backup-only, base=old/target=new) | Potential loss | Check if intentionally deleted |
| Duplicate test block (same describe scope) | Rebase artifact | Delete the stale copy |
| Duplicate test name (cross describe scope) | Pre-existing pattern | Note for future cleanup |
| weave conflict | Unresolved divergence | Expected if target moved forward |
| lint/type error | Rebase artifact | Fix immediately |
