From ai-native-toolkit
Compresses LLM-directed documents (prompts, system messages, skills) by replacing core knowledge with pointers while preserving bespoke details, with optional A/B validation for behavioral equivalence.
Slash command: /ai-native-toolkit:semantic-compress
Make a document written **for an LLM reader** smaller while preserving what it *does*. The essence of an LLM-directed document is **behavioural**, not textual - the behaviour it induces in the reading model across the tasks it handles. Compression splits content into two kinds and treats each correctly:
Compression is therefore *point at core, spell out bespoke*. Pointing is neither deletion nor full explanation; it is the minimum that both activates the right core knowledge and preserves every bespoke detail.
This skill operates in one of two modes, selected deterministically:
| Input | Mode | What happens |
|---|---|---|
| Short snippet with an obvious local swap, no behavioural surface | Local | Quick core->pointer pass, no A/B |
| Whole document / skill / system prompt | Distill | Full A/B-validated loop |
Default to distill when:
Local is permitted only when all hold:
When in doubt, distill: a local edit cannot, by construction, preserve a global behavioural property, so anything with a behavioural surface goes through the A/B gate.
A compression is never accepted on inspection - only on behavioural evidence from an A/B run.
This skill must refuse to output a compressed document that has not passed an A/B equivalence run against the original. Introspection about behaviour ("this should work the same") is structurally unreliable - the model guesses optimistically. Execution over the transfer set is the only arbiter.
This rule binds distill mode (the rule's home: whole-document compression always carries behavioural risk). Local mode is the deliberate, narrow exception - a span small enough (< 500 chars, single obvious swap, no downstream behaviour) that the behavioural risk is negligible by construction. The moment a local edit touches a behavioural surface, it is no longer local: it is a distill, and the gate applies.
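The mode selection above can be sketched as a small predicate. This is a minimal illustration, not the skill's actual implementation: the `CompressionInput` type and its two boolean flags are hypothetical names, and how "single obvious swap" and "behavioural surface" are detected is left to the caller. Only the 500-character bound and the rule that every condition must hold come from the text.

```python
from dataclasses import dataclass

LOCAL_MAX_CHARS = 500  # the "< 500 chars" bound stated in the text


@dataclass(frozen=True)
class CompressionInput:
    text: str
    single_obvious_swap: bool      # hypothetical flag: one evident core->pointer swap
    has_behavioural_surface: bool  # hypothetical flag: span affects downstream behaviour


def select_mode(item: CompressionInput) -> str:
    """Return "local" only when every local-mode condition holds, else "distill"."""
    is_local = (
        len(item.text) < LOCAL_MAX_CHARS
        and item.single_obvious_swap
        and not item.has_behavioural_surface
    )
    # Any failed condition, including doubt encoded as a False/True flag,
    # falls through to the A/B-gated distill mode.
    return "local" if is_local else "distill"
```

Because distill is the fall-through, an input that fails any single check is routed to the A/B gate rather than silently edited.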
The v1 span-level operation: find a span that explains a concept the model already holds, replace it with a pointer, keep every bespoke detail verbatim. These steps are also the inner micro-operation distill mode regenerates with (references/distill-loop.md, Part 2, step 1).
This skill applies only when the LLM is the audience for the explanation. If the text explains a concept to a human (onboarding notes, a message to teammates, docs for new hires), the explanation is not redundant for its real audience - leave it. Compress only the spans the model itself is meant to read and act on.
Nested / wrapped instructions. If the input wraps an instruction the model is meant to process (e.g. "preprocess this instruction before executing it: '...'", or a quoted prompt to compress), the wrapper is a meta-directive to you - act on it, do not emit it. Compress the quoted payload by the rules below and return only that. The payload's audience is the model, so the audience gate is satisfied for the payload regardless of the wrapper.
Read the input. For each span, classify:
A single sentence often contains both. Split at that seam.
Replace a core-knowledge explanation with the smallest cue that activates it - usually the concept's name, optionally one disambiguating word:
Always emit the pointer, with one exception: if a surviving bespoke span already names the concept literally, the pointer is redundant - omit it then, and only then. Never drop the pointer on the grounds that the frame is merely "implied" - that judgement is self-certifying and is exactly the escape that collapses this skill back into deleting what the model knows. When in doubt, keep the pointer; it costs almost nothing.
Keep every bespoke span unchanged: specific facts, decisions, constraints, and especially non-standard twists. If a known term is redefined locally ("optimistic locking - but we hash the whole record, not a version counter"), the twist is bespoke: keep it in full. Dropping it because the term looks familiar is the most dangerous failure this skill can make.
Could the model act correctly on the output alone? If a pointer is too thin to disambiguate, widen it by one cue word - never back to a full explanation. If a bespoke detail was lost, restore it verbatim. The all-core degenerate case (input that is purely known concepts with nothing bespoke and no instruction) compresses to its pointers; if there is also no instruction to act on, the output is just those pointers - not empty text and not a meta-note.
Clean text: pointers for core, explicit bespoke. No meta-commentary, no "[compressed]" markers, no "see X" links - the pointer is the reference, inline and natural.
Input: "When evaluating whether to remove a legacy feature, first understand why it was created - you shouldn't destroy something without understanding its purpose, as it may serve a need you're unaware of. Our legacy auth module was built for EU data-residency compliance that still applies to German users."
Output: "Chesterton's Fence on removing the legacy auth module: built for EU data-residency compliance, still applies to German users."

Input: "Remember to follow the Agile Manifesto and the SOLID principles in this project."
Output: "Follow Agile and SOLID."
Nothing bespoke and no instruction beyond the frame, so it compresses to the bare pointers - not to empty text.

Input: "We use optimistic locking - by which we specifically mean the client sends a hash of the entire record snapshot, and the server rejects on any field mismatch, not just a version-counter bump."
Output: "Optimistic locking, but our variant: client sends a hash of the whole record snapshot; server rejects on any field mismatch, not a version-counter bump."

Input: "The staging cluster runs 3 nodes in eu-west-2 with a 4GB heap cap per node; the nightly backup starts at 02:00 UTC to the cold-storage bucket."
Output: (unchanged - all bespoke)

Input: "Reminder for new hires: 'idempotent' means you can safely retry an operation. Skim the API onboarding doc before Friday."
Output: (unchanged - the definition is for new hires, not the model)
The headline operation: produce the smallest document that behaves the same as the original, with acceptance gated on A/B behavioural evidence (the Hard Rule above), never on inspection. Distill mode composes two reference docs:
- references/transfer-set-design.md - derives and confirms the transfer set (the operational definition of the document's essence).
- references/distill-loop.md - the engine: teacher baseline capture, candidate regeneration, and the iterate-to-minimal controller.

The loop composes ab-equivalence (skills/ab-equivalence/references/ab-equivalence.md) for the behavioural test. This skill owns compression (transfer set, candidate regeneration, the loop, the report); ab-equivalence owns the behavioural comparison (running the runner over the transfer set, judging equivalence per case).
references/transfer-set-design.md.
references/distillation-report-template.md).

Strict no-regression gate. The candidate passes iff, on every transfer-set case, it induces every behaviour the original induced. Any dropped behaviour is a fail (essence lost) and triggers add-back. Incidental improvements are acceptable but never required and never the goal - the goal is faithful, smaller reproduction. The gate is behavioural equivalence, not textual similarity.
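The strict no-regression gate amounts to a per-case subset check. The sketch below assumes each case's observed behaviours have already been reduced to a set of labels; that representation and the function name `no_regression_gate` are illustrative assumptions, not the ab-equivalence harness's real output format.

```python
from typing import Dict, Set


def no_regression_gate(
    original: Dict[str, Set[str]],
    candidate: Dict[str, Set[str]],
) -> Dict[str, Set[str]]:
    """Return the behaviours the candidate dropped, keyed by case id.

    An empty result means the gate passes: on every transfer-set case the
    candidate induced every behaviour the original induced. Extra candidate
    behaviours (incidental improvements) are ignored, never required.
    """
    dropped: Dict[str, Set[str]] = {}
    for case_id, original_behaviours in original.items():
        # A case missing from the candidate run counts as dropping everything.
        missing = original_behaviours - candidate.get(case_id, set())
        if missing:
            dropped[case_id] = missing
    return dropped


# Hypothetical behaviour labels, for illustration only.
original = {"case-1": {"asks-before-delete", "cites-source"}, "case-2": {"refuses-pii"}}
candidate = {"case-1": {"asks-before-delete"}, "case-2": {"refuses-pii", "adds-summary"}}
dropped = no_regression_gate(original, candidate)
```

Here `dropped` names `cites-source` under `case-1`, which is the behaviour an add-back step would need to restore.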
Input: "Distill this CLAUDE.md to the smallest version that behaves the same"
Output: The minimal equivalent document plus an A/B distillation report (<document-name>-distillation-report.md) recording the size delta, the transfer set and its coverage, per-case equivalence verdicts, what was dropped vs load-bearing, and the distribution-shift caveat.
A distillation is only valid over the transfer set it was tested against - the same overfitting risk model distillation faces outside its transfer distribution. The transfer set is the operational definition of essence for this run; behaviour outside its coverage is unproven.
If fewer than 70% of the identified behaviour-inducing sections are covered by at least one case, warn the user before proceeding and offer to auto-generate additional cases for the uncovered sections. A distillation gated on a thin set makes a weak equivalence claim - say so loudly rather than proceed silently.
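The coverage check can be sketched as follows. The 70% threshold comes from the text; the data shapes (section ids as strings, a mapping from case id to the sections it exercises) and the name `coverage_report` are assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Set, Tuple

COVERAGE_WARN_PERCENT = 70  # threshold stated in the text


def coverage_report(
    sections: Iterable[str],
    cases: Dict[str, Set[str]],
) -> Tuple[int, List[str], bool]:
    """Return (covered_count, uncovered_sections, should_warn).

    A section is covered when at least one transfer-set case exercises it.
    """
    section_list = list(sections)
    exercised: Set[str] = set()
    for exercised_sections in cases.values():
        exercised |= exercised_sections
    uncovered = [s for s in section_list if s not in exercised]
    covered = len(section_list) - len(uncovered)
    # Integer comparison avoids float rounding at exactly 70%.
    should_warn = covered * 100 < COVERAGE_WARN_PERCENT * len(section_list)
    return covered, uncovered, should_warn
```

The returned `uncovered` list is what an auto-generation step would target when offering extra cases.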
For each section in the original:
    if exercised_by_at_least_one_case(section):
        if A/B says inert: can drop
        if A/B says load-bearing: must keep (pointed or de-duped, never dropped)
    else:
        section is NOT PROVEN INERT
        keep by default (conservative)
        report as "kept (uncovered, conservative default)"
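The same decision rule, written as runnable Python under stated assumptions: the `Decision` enum and `decide` function are hypothetical names, and the report labels other than "kept (uncovered, conservative default)" are invented here for illustration.

```python
from enum import Enum


class Decision(Enum):
    DROP = "dropped (proven inert)"            # label is an assumption
    KEEP_LOAD_BEARING = "kept (load-bearing)"  # label is an assumption
    KEEP_UNCOVERED = "kept (uncovered, conservative default)"  # label from the text


def decide(covered: bool, ab_says_inert: bool) -> Decision:
    """Decide a section's fate from coverage and the A/B verdict.

    An uncovered section is never treated as inert: without a case that
    exercises it, the A/B verdict carries no evidence and is ignored.
    """
    if not covered:
        return Decision.KEEP_UNCOVERED
    if ab_says_inert:
        return Decision.DROP
    # Load-bearing sections may still be pointed or de-duplicated, never dropped.
    return Decision.KEEP_LOAD_BEARING
```

Checking coverage first is the design point: it makes "not proven inert" the default rather than something the A/B result can override.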
The A/B distillation report (references/distillation-report-template.md) makes the coverage legible:
Compression (Local + Distill above) is Transform #1 of this skill's optimizer family: it makes a document smaller while preserving behaviour. Directive-clarity is Transform #2: it makes a document lighter to act on while preserving behaviour, by rewriting instructions that force the reading model to unpack an action before it can act into concrete directives that name the action. Both transforms share one validator - ab-equivalence's harness (skills/ab-equivalence/references/ab-equivalence.md) - and differ only in the gate they apply to its result. The frame that generates this transform, and the caveat that keeps it honest (every claimed gain is a hypothesis the harness must measure, never an assertion), is references/cognitive-ergonomics.md.
--directive-clarity <document> targets a document directly.

1. references/directive-clarity-patterns.md. Detection over-includes on purpose: it optimises for recall, flagging every candidate shape.
2. references/battle-scar-classifier.md. A battle-scar - a prohibition earned from a specific past failure, where the wording is the load-bearing content - is flagged for preservation and never rewritten. The classifier is the precision half: it holds back scars to a sub-10% false-positive target, defaulting to preserve on uncertainty. Only the convert-for-free set proceeds.
3. references/directive-clarity-rewrites.md, under the two acceptance checks (names-the-concrete-action, semantic-equivalence). Keep every qualifier that scopes a prohibition; keep the fact behind a fact-not-action rewrite; never fabricate a referent for a vague pointer - flag it for human confirmation and leave the original in place.
4. skills/ab-equivalence/references/ab-equivalence.md). The harness emits, per case, an equivalence verdict (equivalent / candidate-regressed / candidate-diverged) and an efficiency signal (original_directness, candidate_directness, interpretation_notes).

Directive-clarity's acceptance is stricter than compression's. Compression gates on strict no-regression alone (sameness). Directive-clarity gates on no-regression and a measured directness gain:
- candidate-regressed (summary.pass == true). A rewrite that reads cleaner but permits a behaviour the original forbade is a regression and fails here.
- candidate_directness > original_directness on the rewritten cases, with no candidate-regressed. This is the efficiency signal the A/B harness already records on every run (skills/ab-equivalence/references/ab-equivalence.md) - directive-clarity reads that signal and gates on it; it does not redesign or re-instrument the harness. A rewrite that loses nothing but also measures no directness improvement has not earned its place: keep the original.

On a regression, the divergence names the rewrite that lost behaviour. Revert that specific rewrite to its pre-rewrite form, keep the passing rewrites, and re-validate - the same granular add-back the distill loop uses.
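The two-part directive-clarity gate can be sketched as below. Several things here are assumptions rather than the harness's documented schema: directness is treated as a comparable number, the `rewritten` flag marking which cases a rewrite touched is hypothetical, and "gain on the rewritten cases" is read as a strict gain on every such case. The field names `original_directness` and `candidate_directness` and the three verdict strings are taken from the text.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class CaseResult:
    case_id: str
    verdict: str                 # "equivalent" | "candidate-regressed" | "candidate-diverged"
    original_directness: float   # assumption: a numeric score
    candidate_directness: float  # assumption: a numeric score
    rewritten: bool              # hypothetical: did a rewrite touch this case?


def directive_clarity_gate(results: Iterable[CaseResult]) -> bool:
    """Pass only on no-regression AND a measured directness gain."""
    cases = list(results)
    no_regression = all(c.verdict != "candidate-regressed" for c in cases)
    rewritten = [c for c in cases if c.rewritten]
    # With no rewritten cases there is no gain to measure, so the gate fails
    # and the original is kept.
    directness_gain = bool(rewritten) and all(
        c.candidate_directness > c.original_directness for c in rewritten
    )
    return no_regression and directness_gain
```

A rewrite that is equivalent but scores the same directness fails this gate, matching the rule that a change with no measured improvement has not earned its place.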
When both transforms run on one document, they run as Option C: staged with a checkpoint - compress first, freeze its gains, then directive-clarity on top, with any regressing rewrite reverting to the frozen state without losing the compression.
Why staged, not the alternatives:
Run order and checkpoint mechanism (precise):
1. D_compressed, accepted by compression's strict no-regression gate against the original teacher.
2. D_compressed as the directive-clarity stage's input. Capture a fresh teacher baseline by running the A/B harness's runner over D_compressed once per transfer-set case - these become the cached teacher transcripts for the next stage. (D_compressed is behaviourally equivalent to the original by construction, but its directness differs, so the gain must be measured against D_compressed's directness, not the original's.) The original teacher transcripts from step 1 are not reused here.
3. D_compressed. Detect -> classify -> rewrite (loop steps 1-3) to produce candidate D_direct.
4. D_direct against the D_compressed checkpoint teacher. Apply the directive-clarity gate (no-regression AND directness gain).
5. D_compressed form; passing rewrites stay. Because the teacher is D_compressed, the checkpoint is the floor - no rewrite can drop behaviour below the compressed state. Re-validate and converge.

Compress first (not directive-clarity first) because compression can point-away or de-duplicate prose that directive-clarity would otherwise spend a rewrite and an A/B cycle on; compressing first shrinks the surface directive-clarity must scan and avoids rewriting spans that compression removes.
npx claudepluginhub bjcoombs/ai-native-toolkit --plugin ai-native-toolkit