By shenmali
Agentic RAG evaluation workbench and skill pack for industrial maintenance documentation QA.
This repository contains two connected pieces:
RAG Evidence Studio: a runnable React + Express workbench.
RetAgGen PowerUP: a Karpathy-style skill/plugin layer for agentic RAG evaluation workflows.
The project demonstrates a production-style RAG workflow using a synthetic PLC maintenance corpus. Instead of only returning an answer, the workbench shows retrieved evidence, citation quality, faithfulness metrics, and the agent trace behind each run.
Naive RAG often fails quietly: it retrieves similar but wrong chunks, answers with confidence, and gives no diagnostic signal. RAG Evidence Studio makes the retrieval and evaluation path visible.
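As one illustration of a diagnostic signal, the sketch below scores how many of an answer's citations point at chunks that were actually retrieved. This README does not define the workbench's citation metric, so the function name `citationPrecision` and its inputs are assumptions made for the example.

```typescript
// Hypothetical metric, not the repository's definition: the share of
// citations in an answer that refer to chunks the retriever returned.
function citationPrecision(citedIds: string[], retrievedIds: string[]): number {
  // An answer with no citations has nothing to support it, so score it 0.
  if (citedIds.length === 0) return 0;
  const retrieved = new Set(retrievedIds);
  const supported = citedIds.filter((id) => retrieved.has(id)).length;
  return supported / citedIds.length;
}

// One of the two citations points at a chunk that was never retrieved.
const precision = citationPrecision(["chunk-1", "chunk-9"], ["chunk-1", "chunk-2"]);
console.log(precision);
```

A low score flags the quiet failure described above: an answer that cites evidence the retrieval step never surfaced.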
.claude-plugin, .cursor/rules, skills/retaggen-powerup, CLAUDE.md, CURSOR.md, and EXAMPLES.md.
.claude-plugin/ Claude Code plugin metadata
.cursor/rules/ Cursor project rule
skills/retaggen-powerup/ Reusable agent skill
data/corpus/ Synthetic industrial maintenance corpus
src/core/ Retrieval, chunking, evaluation, and agent pipeline
src/server/ Local Express API and JSONL run store
src/ui/ React workbench components
tests/ Deterministic unit and integration tests
CLAUDE.md Agent behavior guide for this repo
CURSOR.md Cursor setup notes
EXAMPLES.md Example agent tasks and success criteria
The bundled corpus is synthetic and contains no proprietary vendor manual content.
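The JSONL run store listed under src/server/ can be pictured with a small sketch. JSONL keeps one JSON object per line, which makes appending a run a single write. The `RunRecord` fields and both helper names below are hypothetical; the README does not describe the real record shape.

```typescript
// Hypothetical run record; the actual stored fields are not documented here.
interface RunRecord {
  runId: string;
  question: string;
  retrievedChunkIds: string[];
  faithfulness: number;
}

// Serialize one run as a single JSONL line. JSON.stringify escapes any
// newlines inside strings, so the record always stays on one line.
function toJsonlLine(run: RunRecord): string {
  return JSON.stringify(run) + "\n";
}

// Parse a whole log, skipping blank lines such as the trailing newline.
function parseJsonl(text: string): RunRecord[] {
  return text
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as RunRecord);
}

const log =
  toJsonlLine({ runId: "r1", question: "What does fault F12 mean?", retrievedChunkIds: ["c3"], faithfulness: 0.9 }) +
  toJsonlLine({ runId: "r2", question: "How do I reset the PLC?", retrievedChunkIds: ["c7", "c8"], faithfulness: 0.6 });

console.log(parseJsonl(log).map((run) => run.runId));
```

An append-only line format like this is easy to inspect by hand and to replay in deterministic tests.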
flowchart LR
Corpus[Synthetic PLC Corpus] --> Chunker[Chunker]
Chunker --> BM25[BM25]
Chunker --> Dense[Dense Vector Index]
BM25 --> RRF[RRF Fusion]
Dense --> RRF
RRF --> Rerank[Reranker]
Rerank --> Judge[Retrieval Judge]
Judge --> Generator[Answer Generator]
Generator --> Eval[Answer Judge + Metrics]
Eval --> UI[Evaluation Workbench]
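The RRF Fusion step in the diagram merges the BM25 and dense rankings. The following is a minimal sketch of standard Reciprocal Rank Fusion, not the repository's code: the function name, the input shape (chunk IDs ordered best-first), and the constant k = 60 are assumptions chosen for illustration.

```typescript
// Hypothetical input shape: each retriever returns chunk IDs, best first.
type RankedList = string[];

interface FusedResult {
  id: string;
  score: number;
}

// Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank(d)),
// with ranks starting at 1. k = 60 is a commonly used default.
function reciprocalRankFusion(lists: RankedList[], k = 60): FusedResult[] {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Sort by fused score, highest first; break ties by ID for determinism.
  return Array.from(scores, ([id, score]) => ({ id, score })).sort(
    (a, b) => b.score - a.score || a.id.localeCompare(b.id)
  );
}

const bm25Ranking = ["fault-codes", "wiring", "firmware"];
const denseRanking = ["wiring", "safety", "fault-codes"];
const fused = reciprocalRankFusion([bm25Ranking, denseRanking]);
console.log(fused.map((result) => result.id));
// → ["wiring", "fault-codes", "safety", "firmware"]
```

Because RRF uses only ranks, it needs no score normalization between the lexical and dense retrievers, which is one reason it is a common choice ahead of a reranker.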
Install dependencies:
npm install
Start the API:
npm run dev:api
Start the frontend:
npm run dev
Open http://localhost:5173.
If another app already uses port 5173, run Vite on a different port:
npm run dev -- --port 5174
Verify the project:
npm run verify
This runs the Vitest suite and the production build.
Install the plugin:
npx claudepluginhub shenmali/retaggen-powerup --plugin retaggen-powerup