schliff

Name: schliff
Author: zandereins

Deterministic skill linter and scoring engine for Claude Code. 7-dimension scoring, anti-gaming detection, autoresearch-style autonomous improvement.

npx claudepluginhub zandereins/schliff --plugin schliff

Popularity

Stars: Above avg (Med: 0 · Avg: 285)

Installs: Med: 0 · Avg: 1

What's Inside

Slash Commands (11)

schliff:analyze

/analyze

Perform a comprehensive analysis of an existing Claude Code skill. Examines all 7 dimensions (structure, triggers, quality, edges, efficiency, composability, clarity), shows both composite score and binary eval pass rate, and provides specific, actionable improvement suggestions ranked by impact.

schliff:auto

/auto

Run the autonomous self-driving improvement loop. Scores the skill, generates improvement gradients, applies deterministic patches, re-scores, and keeps or reverts changes automatically. Stops on plateau detection or target reached.
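The loop described above (score, patch, re-score, keep or revert, stop on plateau or target) can be sketched roughly as follows. This is a minimal illustration, not schliff's actual implementation: `improve`, `toy_score`, the `patience` plateau rule, and the toy patches are all hypothetical names and assumptions.

```python
from typing import Callable, Iterable, Tuple

def improve(
    text: str,
    score: Callable[[str], float],
    patches: Iterable[Callable[[str], str]],
    target: float = 95.0,
    patience: int = 3,
) -> Tuple[str, float]:
    """Greedy keep-or-revert loop (hypothetical sketch)."""
    best = score(text)
    stale = 0  # consecutive patches that did not improve the score
    for patch in patches:
        if best >= target or stale >= patience:
            break  # target reached or plateau detected
        candidate = patch(text)
        new_score = score(candidate)
        if new_score > best:
            text, best, stale = candidate, new_score, 0  # keep the change
        else:
            stale += 1  # revert: the candidate is simply discarded
    return text, best

def toy_score(text: str) -> float:
    # Assumed toy scorer: each standalone "maybe" costs 20 points.
    return max(0.0, 100.0 - 20.0 * text.lower().split().count("maybe"))

toy_patches = [
    lambda t: t.replace("maybe ", "", 1),  # helpful: removes one hedge
    lambda t: t + " maybe",                # harmful: gets reverted
    lambda t: t.replace("maybe ", "", 1),  # helpful: removes the other hedge
]

final_text, final_score = improve("maybe run tests, maybe deploy", toy_score, toy_patches)
print(final_text, final_score)
```

Because the scorer is deterministic, a reverted patch never needs to be retried: the same input would produce the same score again.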

schliff:bench

/bench

Run a full benchmark of a skill's current quality. Measures all 7 dimensions (structure, triggers, quality, edges, efficiency, composability, clarity) and runs binary eval assertions if an eval suite exists. Records results in JSONL format and supports comparison against previous benchmarks to show progress deltas.
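One way to picture the JSONL record-and-compare step is the sketch below. The file name, the record layout, and the helpers `append_result` and `latest_deltas` are assumptions made for illustration; the real benchmark files may be structured differently, and the scores are illustrative.

```python
import json
import os
import tempfile

def append_result(path, scores):
    """Append one benchmark run as a single JSON line."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(scores, sort_keys=True) + "\n")

def latest_deltas(path):
    """Return the per-dimension change between the last two recorded runs."""
    with open(path, encoding="utf-8") as fh:
        runs = [json.loads(line) for line in fh if line.strip()]
    if len(runs) < 2:
        return {}  # nothing to compare against yet
    previous, current = runs[-2], runs[-1]
    return {dim: current[dim] - previous[dim] for dim in current if dim in previous}

with tempfile.TemporaryDirectory() as tmp:
    log = os.path.join(tmp, "bench.jsonl")  # hypothetical file name
    append_result(log, {"structure": 65, "composability": 56})
    append_result(log, {"structure": 100, "composability": 91})
    result = latest_deltas(log)

print(result)
```

Appending one line per run keeps the history append-only, so earlier benchmarks are never rewritten when a new one is recorded.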

schliff:doctor

/doctor

Run a health check on all installed skills. Scans skill directories, scores each skill structurally, and produces a summary table with grades and actionable recommendations. Zero arguments needed.

schliff:eval

/eval

Run the unified evaluation suite against a skill, combining 7-dimension scoring with binary assertions. Produces pass rates and composite quality scores.

Skills (1)

schliff

/schliff

Deterministic skill linter and scoring engine for Claude Code — the Ruff for SKILL.md files. 7-dimension structural scoring (structure, triggers, quality, edges, efficiency, composability, clarity) with anti-gaming detection, 60-70% rule-based patches, and cross-session episodic memory. An autoresearch loop that measures first, then fixes — not the other way around. Use for linting, scoring, and autonomously improving any Claude Code skill: trigger accuracy, output quality, edge coverage, token efficiency, composability, or custom metrics. Works with community, custom, project-local, or global skills. Trigger phrases: "make this skill better", "optimize my skill", "iterate on this skill overnight", "improve [metric] from X to Y", "audit skill", "review my skill", "harden skill", "benchmark skill", "lint my skill", "score my skill", or paste SKILL.md for auto-analysis. Also use when user shares skill without explicit instructions. Do NOT use for brand-new skills from scratch — use skill-creator first, then come to Schliff. Do NOT use for SQL query tuning. Do NOT use for prompt template authoring.

Stats

Version: 6.1.0
Released: Apr 24, 2026
Stars: 2
Maintenance: Excellent
License: MIT
Last Commit: Apr 25, 2026
Added: Apr 21, 2026

Available In

schliff (3)

README

Schliff

Your AI instructions are silently degrading. Schliff catches it.

Deterministic quality scoring for CLAUDE.md, SKILL.md, .cursorrules, AGENTS.md, and system prompts. No LLM, no API key — same input, same score. Python 3.9+, zero core dependencies (optional schliff[evolve] adds litellm for the evolution loop).

pip install schliff
schliff score path/to/SKILL.md

$ schliff score demo/bad-skill/SKILL.md
schliff v7.2.0

  structure      ███████░░░   70/100  fair
  efficiency     ████░░░░░░   35/100  poor
  composability  ███░░░░░░░   30/100  poor
  clarity        █████████░   90/100  great

  Structural Score  ███████████░░░░░░░░░  53.8/100  [D]
  ⚠ 4/8 dimensions measured (weight coverage: 40%). Unmeasured: triggers, quality, edges
  → 13 deterministic fixes available. Run `/schliff:auto` in Claude Code to apply.

  Tokens: 378 / 1,000 (ok)

A real optimization

@wan-huiyan ran schliff on the 1,331-line SKILL.md for agent-review-panel, a multi-agent code-review skill. Two optimization rounds later: 340 lines, 75% fewer tokens, structure 65 → 100, composability 56 → 91. A/B tested on a 1,132-line document — identical review quality with a quarter of the tokens.

Skill	Score	Rounds	Author
agent-review-panel	75 [C] → 85.6 [A]	2	@wan-huiyan
shieldclaw (OpenClaw)	68 [C] → 94.6 [A]	1	@Zandereins
demo bad-skill	54 [D] → 98.3 [S]	18 auto	@Zandereins

Score yours: schliff score path/to/SKILL.md — share what you find

Seen in the wild

A root CLAUDE.md written for modelcontextprotocol/servers (Anthropic's official MCP reference repo) was merged to main on April 17, 2026. Running schliff on it returned 59.2/100 at 40% weight coverage: a useful measurement of where the file actually needed work and where the scorer was structurally unfair to a project-root document.

What the data says

We scored 120 public instruction files across 60 source repos. Mean grade: D. 59% score below C. Adding one companion eval suite lifts the mean by 22 points.

Composability is the real weak spot: mean 30.4/100. Files tell agents what to do, but rarely where to stop or hand off.
No companion eval suite anywhere in the corpus: 0 of 60 source repos ship an eval-suite.json, evals/, or any test artifact. Three dimensions stay unmeasured, locking 45% of the score.
Hedging dilutes intent: efficiency averages 52.8/100. "You might want to consider" is noise.
Format alone doesn't save you: AGENTS.md averages 64.8, SKILL.md 55.4. Skipping frontmatter costs ~15 points regardless of format.
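To make the hedging finding concrete, here is a minimal sketch of a rule-based hedge detector. The phrase list and the `find_hedges` helper are illustrative assumptions, not schliff's actual rules; only "you might want to consider" comes from the report above.

```python
import re

# Hypothetical phrase list; the real efficiency rules are not shown in this README.
HEDGE_PHRASES = (
    "you might want to consider",
    "perhaps",
    "maybe",
    "if possible",
    "try to",
)

# Word boundaries stop "maybe" from matching inside longer words.
HEDGE_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(p) for p in HEDGE_PHRASES) + r")\b",
    re.IGNORECASE,
)

def find_hedges(text):
    """Return every hedging phrase found in the text, lowercased, in order."""
    return [m.group(0).lower() for m in HEDGE_PATTERN.finditer(text)]

sample = "You might want to consider adding tests. Maybe run the linter."
print(find_hedges(sample))
```

A pure pattern match like this needs no model call, which is in keeping with the "same input, same score" property the README describes.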

What Schliff Catches

Dimension	Weight	What it catches
structure	15%	Missing frontmatter, empty headers, no examples, dead content
triggers	20%	Eval-suite trigger accuracy, false positives, missed activations
quality	20%	Thin assertions, missing feature coverage, low coherence
edges	15%	No edge cases defined, missing categories (invalid, scale, unicode)
efficiency	10%	Hedging, filler words, repetition, low signal-to-noise
composability	10%	Missing scope boundaries, no error behavior, no handoff points
clarity	5%	Contradictions, vague references, ambiguous instructions
security	5%	(opt-in) Hardcoded secrets, unsafe commands, exposed credentials

Grades: S (≥95) · A (≥85) · B (≥75) · C (≥65) · D (≥50) · E (≥35) · F (<35). Full methodology: docs/SCORING.md
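As a rough illustration of how the weights and grade thresholds above could combine, the sketch below assumes the Structural Score is a weight-normalized average over the measured dimensions. That assumption reproduces the 53.8 [D] and 40% weight coverage shown for demo/bad-skill, but the authoritative formula is in docs/SCORING.md; `structural_score` and `grade` are hypothetical helper names.

```python
# Weights copied from the table above; the opt-in security dimension (5%) is left out.
WEIGHTS = {
    "structure": 15, "triggers": 20, "quality": 20, "edges": 15,
    "efficiency": 10, "composability": 10, "clarity": 5,
}

# Thresholds copied from the "Grades" line; anything below 35 is an F.
GRADES = [(95, "S"), (85, "A"), (75, "B"), (65, "C"), (50, "D"), (35, "E")]

def structural_score(scores):
    """Return (weighted mean over measured dimensions, weight coverage in %)."""
    measured = {dim: s for dim, s in scores.items() if dim in WEIGHTS}
    coverage = sum(WEIGHTS[dim] for dim in measured)
    if coverage == 0:
        raise ValueError("no measured dimensions")
    total = sum(WEIGHTS[dim] * s for dim, s in measured.items())
    return total / coverage, coverage

def grade(score):
    """Map a 0-100 score to a letter grade."""
    for threshold, letter in GRADES:
        if score >= threshold:
            return letter
    return "F"

# Dimension scores from the demo/bad-skill output shown earlier.
demo = {"structure": 70, "efficiency": 35, "composability": 30, "clarity": 90}
score, coverage = structural_score(demo)
print(f"{score:.1f}/100 [{grade(score)}] at {coverage}% weight coverage")
```

Normalizing by the covered weight rather than by 100 means unmeasured dimensions do not count as zeros; they only lower the reported coverage.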

