By asragab
Automated skill/prompt/tool evaluation and improvement using session log analysis, signal classification, LLM-as-judge target identification, and autoresearch-style optimization loops
Run an autoresearch-style optimization loop on a target identified by M3. Generates improvements via LLM, evaluates them against the target's eval questions, keeps improvements that beat the best score, and tracks all experiments in a JSONL state file. Inspired by Karpathy's autoresearch pattern.
Classify extracted session events into noise, friction, success, and neutral categories. Identifies friction signals (tool errors, user corrections, retries, long chains, abandoned approaches) and success signals (clean completions, user acknowledgments). Rule-based, no LLM calls required.
Use LLM-as-judge to analyze friction clusters from M2 classified events and identify which skills, prompts, tools, or workflows are the best candidates for optimization. Clusters friction by root cause, scores targets by frequency × severity × improvability, and outputs a ranked target list.
Extract structured events from Claude Code session transcripts into normalized JSONL format. Parses user messages, tool_use blocks, tool results, thinking blocks, errors, progress events, and subagent transcripts.
A Claude Code plugin marketplace containing plugins for session search/analytics and skill evaluation.
| Plugin | Version | Description |
|---|---|---|
| cass | 0.2.0 | Cross-agent session search, context, analytics, export, and learnings powered by CASS CLI |
| skill-eval | 1.0.0 | Automated skill/prompt/tool evaluation and improvement via session log analysis and autoresearch optimization |
claude plugin marketplace add https://github.com/ASRagab/asragab-claude-marketplace
# Install by name (use plugin@marketplace to disambiguate)
claude plugin install cass
claude plugin install skill-eval
Cross-agent session search, context loading, token analytics, session export, and learning synthesis powered by CASS (Coding Agent Session Search). Searches across Claude Code, Codex, Cursor, Gemini CLI, Copilot, and 14+ other agents.
/cass:session-search
Search across all indexed coding agent sessions. Supports lexical (BM25), semantic (vector), and hybrid search modes.
cass search "authentication flow" --mode hybrid --json --limit 10
cass search "error" --days 30 --json --aggregate agent
/cass:session-context
Load relevant past session context for the current task, file, or project.
cass sessions --current --json
cass timeline --since 7d --json --group-by day
/cass:session-analytics
Analyze session history for usage patterns, token consumption, and tool efficiency.
cass analytics tokens --days 7 --group-by day --json
cass analytics tools --limit 20 --json
/cass:session-export
Export sessions to markdown, text, JSON, HTML, or self-contained encrypted HTML.
cass export <session_path> -o conversation.md
cass export-html <session_path> --encrypt --password "secret" --filename report.html
/cass:session-learnings
Extract patterns, recurring issues, and actionable lessons from past sessions.
cass search "error fix bug" --mode hybrid --json --limit 20
cass analytics tools --limit 20 --json
/cass:session-maintenance
Diagnose, repair, and maintain the CASS installation, index, analytics, and remote sources.
cass health --json
cass doctor --fix
cass index --full --json
A four-stage pipeline for identifying friction in coding agent sessions and iteratively optimizing skills, prompts, and tools. Inspired by Karpathy's autoresearch pattern.
The stages run sequentially — each consumes the output of the previous stage.
/skill-eval:transcript-extract (M1)
Extract structured events from Claude Code session transcripts into JSONL.
bun scripts/transcript-extract.ts --since 7d -o events.jsonl
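As a rough illustration of what "normalized JSONL" could look like, the sketch below defines a hypothetical event shape and writes one JSON object per line. The field names (`sessionId`, `kind`, `timestamp`, `text`) and the helpers `toJsonl` and `fromJsonl` are assumptions made for this example; the actual schema emitted by `transcript-extract.ts` is not shown in this README and may differ.

```typescript
// Hypothetical event shape; the real output schema of transcript-extract.ts may differ.
type EventKind =
  | "user_message"
  | "tool_use"
  | "tool_result"
  | "thinking"
  | "error"
  | "progress"
  | "subagent";

interface SessionEvent {
  sessionId: string;
  kind: EventKind;
  timestamp: string; // ISO 8601
  text: string;
}

// Serialize events as JSONL: one JSON object per line.
function toJsonl(events: SessionEvent[]): string {
  return events.map((e) => JSON.stringify(e)).join("\n");
}

// Parse JSONL back into events, skipping blank lines.
function fromJsonl(jsonl: string): SessionEvent[] {
  return jsonl
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as SessionEvent);
}

// Made-up sample data for illustration.
const sample: SessionEvent[] = [
  { sessionId: "s1", kind: "user_message", timestamp: "2025-01-01T00:00:00Z", text: "run the tests" },
  { sessionId: "s1", kind: "tool_use", timestamp: "2025-01-01T00:00:01Z", text: "Bash: bun test" },
];

console.log(toJsonl(sample));
```

Keeping one event per line means later stages can stream the file without loading a whole session into memory.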
/skill-eval:signal-classify (M2)
Classify extracted events into friction, success, noise, and neutral categories using rule-based heuristics. No LLM calls required.
bun scripts/signal-classify.ts -i events.jsonl -o classified.jsonl
bun scripts/signal-classify.ts -i events.jsonl --stats
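To make the idea of rule-based classification concrete, here is a minimal sketch that sorts events into the four categories with a few regular expressions and then tallies them. The `MinimalEvent` shape, the patterns, and the `classify` and `tally` helpers are all hypothetical; the plugin's real heuristics (retries, long chains, abandoned approaches, and so on) are not documented here and are likely richer.

```typescript
type Signal = "friction" | "success" | "noise" | "neutral";

// Hypothetical, reduced event shape used only for this sketch.
interface MinimalEvent {
  kind: "user_message" | "tool_result" | "error" | "progress";
  text: string;
}

// Illustrative patterns only; these are assumptions, not the plugin's rules.
const CORRECTION = /^\s*no\b|\bthat's wrong\b|\bnot what i (asked|meant)\b|\bundo\b/i;
const ACKNOWLEDGMENT = /\b(thanks|thank you|perfect|looks good|lgtm)\b/i;

function classify(event: MinimalEvent): Signal {
  if (event.kind === "progress") return "noise"; // progress pings carry no signal
  if (event.kind === "error") return "friction"; // explicit errors
  if (event.kind === "tool_result" && /\berror\b/i.test(event.text)) return "friction";
  if (event.kind === "user_message") {
    if (CORRECTION.test(event.text)) return "friction"; // user correction
    if (ACKNOWLEDGMENT.test(event.text)) return "success"; // user acknowledgment
  }
  return "neutral";
}

// Count events per category.
function tally(events: MinimalEvent[]): Record<Signal, number> {
  const counts: Record<Signal, number> = { friction: 0, success: 0, noise: 0, neutral: 0 };
  for (const e of events) counts[classify(e)] += 1;
  return counts;
}

console.log(
  tally([
    { kind: "user_message", text: "No, use bun instead" },
    { kind: "user_message", text: "Thanks, looks good" },
    { kind: "progress", text: "indexing" },
  ]),
);
```

Because every rule is a plain predicate, a stage like this stays deterministic and cheap to run, which fits the "no LLM calls" constraint stated above.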
/skill-eval:target-identify (M3)
Use LLM-as-judge to rank friction clusters by frequency, severity, and improvability.
bun scripts/target-identify.ts -i classified.jsonl --top 5 -o targets.jsonl
Requires ANTHROPIC_API_KEY environment variable.
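The ranking step can be pictured as a product of three factors followed by a sort. The sketch below assumes severity and improvability arrive as numbers in the range 0 to 1 (in the plugin they would come from the LLM judge) and that frequency is a raw count. The `Target` shape, the example names, and the `score` and `rankTargets` helpers are hypothetical.

```typescript
// Hypothetical target shape; the real targets.jsonl schema is not shown in this README.
interface Target {
  name: string;
  frequency: number; // how often the friction cluster occurs
  severity: number; // assumed range 0..1
  improvability: number; // assumed range 0..1
}

// Score = frequency x severity x improvability, as described above.
function score(t: Target): number {
  return t.frequency * t.severity * t.improvability;
}

// Return the `top` highest-scoring targets without mutating the input.
function rankTargets(targets: Target[], top: number): Target[] {
  return [...targets].sort((a, b) => score(b) - score(a)).slice(0, top);
}

// Made-up candidates for illustration.
const candidates: Target[] = [
  { name: "commit-skill", frequency: 12, severity: 0.5, improvability: 0.5 },
  { name: "test-runner-prompt", frequency: 4, severity: 1, improvability: 1 },
  { name: "search-tool", frequency: 20, severity: 0.25, improvability: 0.25 },
];

for (const t of rankTargets(candidates, 2)) {
  console.log(t.name, score(t));
}
```

A multiplicative score means a frequent but barely improvable cluster ranks low, which is one plausible reading of why the three factors are combined this way.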
/skill-eval:autoresearch-loop (M4)
Iteratively generate and evaluate improvements against a target's eval criteria.
bun scripts/autoresearch-loop.ts -t targets.jsonl --max-rounds 20
Requires ANTHROPIC_API_KEY environment variable.
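The keep-if-better loop can be sketched independently of any LLM. In the example below, `generate` and `evaluate` are injected functions standing in for the LLM-backed generation and eval-question scoring; the `Experiment` record, the `optimize` function, and the toy stand-ins are assumptions for illustration, not the plugin's implementation. Each experiment is serialized as one JSON line, mirroring the JSONL state file described earlier.

```typescript
// Hypothetical experiment record; the plugin's state-file schema may differ.
interface Experiment {
  round: number;
  candidate: string;
  score: number;
  kept: boolean;
}

interface LoopResult {
  best: string;
  bestScore: number;
  log: string[]; // one JSON line per experiment, as would be appended to a JSONL file
}

function optimize(
  baseline: string,
  generate: (current: string, round: number) => string,
  evaluate: (candidate: string) => number,
  maxRounds: number,
): LoopResult {
  let best = baseline;
  let bestScore = evaluate(baseline);
  const log: string[] = [];
  for (let round = 1; round <= maxRounds; round++) {
    const candidate = generate(best, round);
    const score = evaluate(candidate);
    const kept = score > bestScore; // keep only strict improvements
    if (kept) {
      best = candidate;
      bestScore = score;
    }
    const experiment: Experiment = { round, candidate, score, kept };
    log.push(JSON.stringify(experiment)); // every experiment is tracked, kept or not
  }
  return { best, bestScore, log };
}

// Toy stand-ins so the sketch runs offline; a real loop would call an LLM for both.
const toyGenerate = (current: string, round: number): string =>
  round % 2 === 1 ? current + " Be concise." : current.slice(0, -1);
const toyEvaluate = (candidate: string): number => candidate.length;

const result = optimize("Summarize the diff.", toyGenerate, toyEvaluate, 4);
console.log(result.bestScore, result.log.length);
```

Logging rejected candidates as well as kept ones makes it possible to resume a run or audit why a given version won.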
cd plugins/skill-eval
# 1. Extract events from recent sessions
bun scripts/transcript-extract.ts --since 7d -o events.jsonl
# 2. Classify friction signals
bun scripts/signal-classify.ts -i events.jsonl -o classified.jsonl
# 3. Identify top optimization targets
bun scripts/target-identify.ts -i classified.jsonl --top 5 -o targets.jsonl
# 4. Run autoresearch loop on the top target
bun scripts/autoresearch-loop.ts -t targets.jsonl --max-rounds 10
claude plugin uninstall cass
claude plugin uninstall skill-eval
# To remove the marketplace itself
claude plugin marketplace remove asragab-claude-marketplace
MIT
npx claudepluginhub asragab/asragab-claude-marketplace --plugin skill-eval