By 0x0f0f0f
Research Engineering Skills for Agents: a curated set of six skills for doing high-quality engineering on research-grade codebases.
Before non-trivial changes, consult the project's references/ directory — vendored repos and paper notes — to ground the work in primary sources instead of guessing. Trigger when about to design, implement, or modify code that involves an algorithm, formula, or API that's covered by a paper or reference repo in references/; when the user mentions a concept that matches a reference (e.g. "DSR", "CPCV", "GEPA", "Pareto selection"); or when the user explicitly says "check references", "consult the literature", "look up X in the repos". Also use to bootstrap references/ structure in a new project.
Find deepening opportunities in a codebase, informed by the domain language in CONTEXT.md and the decisions in docs/adr/. Use when the user wants to improve architecture, find refactoring opportunities, consolidate tightly-coupled modules, or make a codebase more testable and AI-navigable.
Keep per-directory CLAUDE.md (or AGENTS.md) files accurate and in sync with the code, and keep a Progress log in the root CLAUDE.md. Trigger after finishing a task that created, deleted, moved, or non-trivially changed files; when the user says "update CLAUDE.md", "log progress", "sync memory", "wrap up"; or at the end of a coding session before the user moves on. Also use to bootstrap CLAUDE.md files in a project that doesn't have them yet.
Set up a Python project so files are simultaneously marimo interactive notebooks and pytest test modules — assertions are the source of truth, cells render the results. Use when the user wants notebooks that double as tests, asks to wire up marimo with pytest, or wants executable documentation/tutorials that stay green in CI.
Manage per-PR planning notes, progress logs, and findings in the project's `plans/` directory. Use this skill whenever the user mentions a PR's plan, progress, status, or findings — including phrases like "log progress", "what did we do today", "start a plan for X", "record this finding", "wrap up the PR", "mark the PR done", "ready to merge", or any time work on a PR is starting, advancing, or completing. Also use proactively at the end of a meaningful work session on a PR, even if the user hasn't explicitly asked to log anything, and whenever a non-obvious design decision is made that future contributors would want to know about.
Own this plugin?
Verify ownership to unlock analytics, metadata editing, and a verified badge. GitHub access is read-only (username + org membership).
Sign in to claimOwn this plugin?
Verify ownership to unlock analytics, metadata editing, and a verified badge. GitHub access is read-only (username + org membership).
Sign in to claimBased on adoption, maintenance, documentation, and repository signals. Not a security audit or endorsement.
A curated set of six Agent Skills for doing high-quality engineering on research-grade codebases — the kind of work where correctness is grounded in primary sources, architecture is kept deep and navigable, and project knowledge is written down so it survives across sessions.
Each skill is a self-contained SKILL.md (plus supporting files) usable by any
agent that understands the Skills convention — Claude Code, Codex, Cursor, and others.
| Skill | What it does |
|---|---|
🏛️ improve-codebase-architecture | Surface architectural friction and propose deepening refactors (shallow → deep modules), presented as a visual before/after HTML report, then a grilling loop to design the chosen refactor. |
📚 consult-references | Before non-trivial changes, read the actual paper or vendored repo in references/ instead of guessing. Grounds algorithms and formulas in primary sources. |
📥 read-arxiv-paper | Ingest an arXiv paper into references/papers/<slug>/ using the highest-fidelity format available (HTML → ar5iv → LaTeX → PDF) and write a structured NOTES.md. |
🧠 maintain-memory-md | Keep per-directory CLAUDE.md/AGENTS.md files honest and in sync with the code, plus a root Progress log. Bootstraps them in projects that have none. |
📋 pr-plan-tracking | Maintain lightweight per-PR plans, progress logs, and findings under plans/. Start a plan, log progress, record a finding, complete a PR. |
🧪 marimo-notebook-tests | Wire up a Python project so files are simultaneously marimo interactive notebooks and pytest test modules — executable docs that stay green in CI. |
The six skills aren't a grab-bag; they chain into one workflow. A typical run from "here's a paper" to "the refactor is merged and remembered" looks like this:
📥 Ingest the papers. Point read-arxiv-paper
at an arXiv ID or URL. It fetches the highest-fidelity format available
(arXiv HTML → ar5iv → LaTeX → PDF), unpacks it into references/papers/<slug>/,
and writes a structured NOTES.md you'll actually re-read — equations copied
verbatim, not paraphrased.
📚 Read the papers — and the implementation. Before any non-trivial change,
consult-references opens the relevant NOTES.md
(and the .tex when an equation has to be exact) plus the vendored repos in
references/repos/. Algorithms get grounded in the source, not in memory.
💡 Get the insights — and formalize the problem. With the literature loaded,
improve-codebase-architecture walks the
code for shallow modules and friction, naming things in the project's own domain
vocabulary (CONTEXT.md) and a precise architecture glossary
(deep / shallow / seam / leverage). The deletion test separates pass-throughs
from modules that earn their keep.
📋 Propose the improvements as a plan. The architecture skill presents candidate
deepening refactors as a visual before/after HTML report; pick one, and
pr-plan-tracking turns it into a per-PR plan under
plans/ — tasks, open questions, and a progress log.
🚀 Execute the plan. Implement the refactor, citing the paper slug and the
equation right above the code. marimo-notebook-tests
makes the change verifiable: files that are simultaneously marimo notebooks and
pytest modules, so the docs stay green in CI.
🧠 Record what you learned. maintain-memory-md
keeps per-directory CLAUDE.md/AGENTS.md honest and appends a one-line Progress
entry; pr-plan-tracking logs the load-bearing findings. The next session — human or
agent — gets back up to speed in 20 seconds.
A concrete run: ingest the Deflated Sharpe Ratio paper → consult it before touching walk-forward scoring → notice the scoring module is shallow (a thin wrapper hiding the real bug in how it's called) → plan a deepening refactor → implement it with the DSR equation quoted verbatim above the function → log the finding so nobody re-derives it three weeks later.
Then the loop closes and starts again: read the literature → ground the code in it → keep the architecture deep → record what you learned → track it on the PR.
npx claudepluginhub 0x0f0f0f/research-engineering-skills --plugin research-engineering-skillsComprehensive skill pack with 66 specialized skills for full-stack developers: 12 language experts (Python, TypeScript, Go, Rust, C++, Swift, Kotlin, C#, PHP, Java, SQL, JavaScript), 10 backend frameworks, 6 frontend/mobile, plus infrastructure, DevOps, security, and testing. Features progressive disclosure architecture for 50% faster loading.
Tools to maintain and improve CLAUDE.md files - audit quality, capture session learnings, and keep project memory current.
Develop, test, build, and deploy Godot 4.x games with Claude Code. Includes GdUnit4 testing, web/desktop exports, CI/CD pipelines, and deployment to Vercel/GitHub Pages/itch.io.
A growing collection of Claude-compatible academic workflow bundles. Covers scientific figures, manuscript writing and polishing, reviewer assessment, citation retrieval, data availability, paper reading, literature search, response letters, paper-to-PPTX conversion, and evidence-grounded Chinese invention patent drafting. Rules are organized as reusable skill folders with explicit workflows and quality checks.
Create new skills, improve existing skills, and measure skill performance. Use when users want to create a skill from scratch, update or optimize an existing skill, run evals to test a skill, or benchmark skill performance with variance analysis.
Unity Development Toolkit - Expert agents for scripting/refactoring/optimization, script templates, and Agent Skills for Unity C# development