By zandereins
Deterministic skill linter and scoring engine for Claude Code. 7-dimension scoring, anti-gaming detection, autoresearch-style autonomous improvement.
Perform a comprehensive analysis of an existing Claude Code skill. Examines all 7 dimensions (structure, triggers, quality, edges, efficiency, composability, clarity), shows both composite score and binary eval pass rate, and provides specific, actionable improvement suggestions ranked by impact.
Run the autonomous self-driving improvement loop. Scores the skill, generates improvement gradients, applies deterministic patches, re-scores, and keeps or reverts changes automatically. Stops on plateau detection or target reached.
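The score, patch, re-score, keep-or-revert cycle described above can be sketched in a few lines. This is an illustrative toy, not schliff's implementation: `improve`, `toy_score`, `drop`, and the `patience` parameter are hypothetical names, and the filler-word scorer is a stand-in for the real scoring engine.

```python
from typing import Callable, Iterable, Tuple


def improve(text: str,
            score: Callable[[str], float],
            patches: Iterable[Callable[[str], str]],
            target: float = 95.0,
            patience: int = 3) -> Tuple[str, float]:
    """Apply patches one at a time, keeping only those that raise the score."""
    best, best_score = text, score(text)
    stale = 0  # consecutive patches that did not help (plateau counter)
    for patch in patches:
        if best_score >= target or stale >= patience:
            break  # target reached or plateau detected
        candidate = patch(best)
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score  # keep
            stale = 0
        else:
            stale += 1  # revert: the candidate is simply discarded
    return best, best_score


# Hypothetical deterministic scorer: 20 points off per filler word.
FILLER = ("basically", "just")


def toy_score(text: str) -> float:
    words = text.lower().split()
    return 100.0 - 20.0 * sum(words.count(f) for f in FILLER)


def drop(word: str) -> Callable[[str], str]:
    """Build a patch that removes every occurrence of `word`."""
    return lambda text: " ".join(w for w in text.split() if w.lower() != word)


result, final = improve(
    "Basically just run the tests",
    toy_score,
    [drop("basically"), lambda t: t + " just", drop("just")],
)
```

The second patch makes the text worse, so it is reverted and the third patch is applied to the last accepted version.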
Run a full benchmark of a skill's current quality. Measures all 7 dimensions (structure, triggers, quality, edges, efficiency, composability, clarity) and runs binary eval assertions if an eval suite exists. Records results in JSONL format and supports comparison against previous benchmarks to show progress deltas.
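As a rough illustration of comparing two benchmark records stored as JSONL, the sketch below computes per-dimension deltas. The record layout (one JSON object per line, keyed by dimension name) is an assumption made for this example, not schliff's documented format; the numbers are the structure and composability scores reported for agent-review-panel.

```python
import json


def load_runs(jsonl_text: str) -> list:
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]


def deltas(previous: dict, current: dict) -> dict:
    """Per-dimension change for dimensions present in both runs."""
    return {
        dim: round(current[dim] - previous[dim], 1)
        for dim in current
        if dim in previous
    }


# Hypothetical history file contents: oldest run first.
history = """\
{"structure": 65, "composability": 56}
{"structure": 100, "composability": 91}
"""

runs = load_runs(history)
progress = deltas(runs[-2], runs[-1])
```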
Run a health check on all installed skills. Scans skill directories, scores each skill structurally, and produces a summary table with grades and actionable recommendations. Zero arguments needed.
Run the unified evaluation suite against a skill, combining 7-dimension scoring with binary assertions. Produces pass rates and composite quality scores.
Your AI instructions are silently degrading. Schliff catches it.
Deterministic quality scoring for CLAUDE.md, SKILL.md, .cursorrules, AGENTS.md, and system prompts. No LLM, no API key — same input, same score. Python 3.9+, zero core dependencies (optional schliff[evolve] adds litellm for the evolution loop).
pip install schliff
schliff score path/to/SKILL.md
$ schliff score demo/bad-skill/SKILL.md
schliff v7.2.0
structure ███████░░░ 70/100 fair
efficiency ████░░░░░░ 35/100 poor
composability ███░░░░░░░ 30/100 poor
clarity █████████░ 90/100 great
Structural Score ███████████░░░░░░░░░ 53.8/100 [D]
⚠ 4/8 dimensions measured (weight coverage: 40%). Unmeasured: triggers, quality, edges
→ 13 deterministic fixes available. Run `/schliff:auto` in Claude Code to apply.
Tokens: 378 / 1,000 (ok)
@wan-huiyan ran schliff on the 1,331-line SKILL.md for agent-review-panel, a multi-agent code-review skill. Two optimization rounds later: 340 lines, 75% fewer tokens, structure 65 → 100, composability 56 → 91. A/B tested on a 1,132-line document — identical review quality with a quarter of the tokens.
| Skill | Score | Rounds | Author |
|---|---|---|---|
| agent-review-panel | 75 [C] → 85.6 [A] | 2 | @wan-huiyan |
| shieldclaw (OpenClaw) | 68 [C] → 94.6 [A] | 1 | @Zandereins |
| demo bad-skill | 54 [D] → 98.3 [S] | 18 auto | @Zandereins |
Score yours: schliff score path/to/SKILL.md — share what you find
A root CLAUDE.md written for modelcontextprotocol/servers (Anthropic's official MCP reference repo) merged to main on April 17th, 2026. Running schliff on it returned 59.2/100 at 40% weight coverage — a useful measurement of where the file actually needed work and where the scorer was structurally unfair for a project-root document. Full walkthrough →
We scored 120 public instruction files across 60 source repos. Mean grade: D. 59% below C. Adding one companion eval suite lifts the mean +22 points.
eval-suite.json, evals/, or any test artifact. Three dimensions stay unmeasured, locking 45% of the score. Read the full report → · Reproduce it
| Dimension | Weight | What it catches |
|---|---|---|
| structure | 15% | Missing frontmatter, empty headers, no examples, dead content |
| triggers | 20% | Eval-suite trigger accuracy, false positives, missed activations |
| quality | 20% | Thin assertions, missing feature coverage, low coherence |
| edges | 15% | No edge cases defined, missing categories (invalid, scale, unicode) |
| efficiency | 10% | Hedging, filler words, repetition, low signal-to-noise |
| composability | 10% | Missing scope boundaries, no error behavior, no handoff points |
| clarity | 5% | Contradictions, vague references, ambiguous instructions |
| security | 5% | (opt-in) Hardcoded secrets, unsafe commands, exposed credentials |
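One way to read the weights is as a weighted mean over whichever dimensions were actually measured. The sketch below assumes the score is renormalized by the covered weight; that assumption happens to reproduce the 53.8 and 40% shown in the sample output for demo/bad-skill, but docs/SCORING.md remains the authority on the real method. `WEIGHTS` and `composite` are hypothetical names.

```python
# Weights (percent) copied from the table above.
WEIGHTS = {
    "structure": 15, "triggers": 20, "quality": 20, "edges": 15,
    "efficiency": 10, "composability": 10, "clarity": 5, "security": 5,
}


def composite(measured: dict) -> tuple:
    """Return (weighted mean over measured dimensions, weight coverage in %)."""
    coverage = sum(WEIGHTS[dim] for dim in measured)
    if coverage == 0:
        raise ValueError("no dimensions measured")
    total = sum(WEIGHTS[dim] * value for dim, value in measured.items())
    return total / coverage, coverage


# Per-dimension scores from the sample run on demo/bad-skill/SKILL.md.
demo = {"structure": 70, "efficiency": 35, "composability": 30, "clarity": 90}
structural_score, coverage = composite(demo)
```

Here `structural_score` is 53.75 and `coverage` is 40, which display as 53.8/100 and 40%.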
Grades: S (≥95) · A (≥85) · B (≥75) · C (≥65) · D (≥50) · E (≥35) · F (<35). Full methodology: docs/SCORING.md
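The grade bands above translate directly into a lookup. A minimal sketch, with `GRADE_FLOORS` and `grade` as illustrative names rather than part of schliff's API:

```python
# (minimum score, letter) pairs in descending order, from the grade line above.
GRADE_FLOORS = [(95, "S"), (85, "A"), (75, "B"), (65, "C"), (50, "D"), (35, "E")]


def grade(score: float) -> str:
    """Return the letter for the first band whose floor the score meets."""
    for floor, letter in GRADE_FLOORS:
        if score >= floor:
            return letter
    return "F"  # anything below 35


letters = [grade(s) for s in (98.3, 85.6, 68, 53.8)]
```

Applied to scores quoted earlier in this page, `letters` comes out as S, A, C, D, matching the grades shown next to them.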
npx claudepluginhub zandereins/schliff --plugin schliff