End-to-end research workflow skill covering the full lifecycle: direction exploration, literature survey, problem formulation, method design, evaluation design, implementation, experiments, paper writing, review, and post-submission. Use when starting any research phase, transitioning between phases, or needing adversarial pre-mortem review. Canonical command: /research-workflow:research <mode>. Triggers on natural-language phrases like 'research workflow', 'adversarial gate', 'pre-mortem review', 'research risk', 'novelty check', or any mention of research phases (explore, foundation, design, eval-design, implement, experiment, write, review, rebuttal).
How this skill is triggered (by the user, by Claude, or both): Slash command
/research-workflow:research
The summary Claude sees in its skill listing (used to decide when to auto-load this skill):
> End-to-end research lifecycle management with adversarial validation gates.
- phases/phase-0-explore.md
- phases/phase-1-foundation.md
- phases/phase-2-design.md
- phases/phase-3-eval-design.md
- phases/phase-4-implement.md
- phases/phase-5-experiment.md
- phases/phase-6-write.md
- phases/phase-7-review.md
- phases/phase-8-rebuttal.md
- phases/phase-9-present.md
- phases/phase-evolve.md
- phases/phase-pilot.md
- phases/phase-supplementary.md
- phases/phase-survey-methodology.md
- phases/phase-theory-proofs.md
- phases/phase-writing-memory.md
- reference/capability-artifacts.md
- reference/capability-critique.md
- reference/capability-evaluation.md
- reference/capability-ideation.md

End-to-end research lifecycle management with adversarial validation gates. Domain-agnostic. Venue-agnostic. Supports multiple paper types. Project config in .claude/research-project.local.md; live state in .claude/research-state.yaml; narrative memory in .claude/findings.md. Phase details in phases/; templates in templates/.
- /research-workflow:research init -- answer 5 questions (project name, domain, paper type, venue, deadline)
- /research-workflow:research explore -- systematic literature scan with gap analysis
- /research-workflow:research status -- see what's done, what's next, how much time left

That's it. Everything else (learning mechanisms, agent patterns, evolution) works automatically in the background.
| Term | Meaning |
|---|---|
| Gate | Adversarial review at phase transitions -- finds fatal flaws before they become unfixable |
| G-V-R | Generator-Verifier-Reviser -- a 3-step loop where content is created, checked, then fixed |
| Worker-Critic | Two agents: one creates, one immediately critiques -- ensures quality through separation |
| CitationAgent | A dedicated post-processing agent that verifies all citations after writing |
| Reflexion | Learning from failures: structured reflection yields predicate rules for next time |
| MPR | Meta-Policy Reflexion -- extracting IF/THEN rules from experience with confidence scores |
| STORM | Multi-perspective questioning technique for thorough literature coverage |
| Recipe | A pre-defined combination of mode + sub-features for a specific task type |
| Oracle | The source of truth used to evaluate results (e.g., test suite, static analyzer, human label) |
Before doing ANYTHING, match the user's intent to a recipe. Don't just pick a mode — pick the full recipe with all sub-features.
→ Full detail in reference/dispatch-recipes.md: the intent→recipe Decision Tree, Domain-Aware Activation ([DOMAIN:xxx] tags), the Quick Pattern Selection Guide, Context-Aware Auto-Suggestions, and Outcome-Driven Recipe Optimization.
CRITICAL RULE: If in doubt, run the cheapest decisive check first — a demand/beneficiary probe or kill-gate beats a broad scan; activate more sub-features only after the direction survives its cheapest gates. Thoroughness means not skipping gates, not running everything at once.
The canonical command — what you actually type at the Claude Code prompt:
/research-workflow:research <mode> [args]
This SKILL.md and the phases/ / reference/ files frequently shorten this to /research <mode> for prose readability. The shorthand is for reading only — when typing a command, suggesting one to the user, or printing one as runtime output, always use the canonical /research-workflow:research <mode> form. The bare /research namespace is unowned post-restructure and does not dispatch.
| Mode | Description |
|---|---|
| init | Initialize project config, select paper type |
| status | Current phase, risks, blockers, next actions |
| explore | Phase 0: Literature landscape, gap analysis, direction decision |
| foundation | Phase 1: Deep reading, motivation, RQ formulation |
| design | Phase 2: Method/framework design with novelty stress test |
| pilot | Phase 2.5: Feasibility pilot (5-10 examples before full eval design) |
| eval-design | Phase 3: Dataset, metrics, baselines, ablation (HIGHEST-ROI GATE) |
| implement | Phase 4: Engineering implementation (systems/tool papers) |
| experiment | Phase 5: Experiment execution, monitoring, honest analysis |
| write [§N] | Phase 6: Paper writing with claim provenance + narrative quality |
| review | Phase 7: Multi-persona simulated peer review + submission prep |
| rebuttal | Phase 8: Reviewer response, camera-ready, or venue pivot |
| gate [N] | Adversarial Gate: pre-mortem for phase N (default: current) |
| rollback N | Phase Rollback: structured return to phase N with impact analysis |
| risk | Risk Registry: view, add, update, mitigate |
| cost | Cost/time estimation with iteration multipliers |
| novelty-watch | Literature freshness scan for competitive threats |
| present | Phase 9 (post-acceptance addon): Presentation preparation (talk, poster, demo) |
| evolve | Periodic self-improvement: analyze usage, promote rules, update protocols |
| mine-patterns | Extract writing patterns from past papers for style consistency |
| advisor-prep | Prepare advisor meeting: progress summary, open decisions, risk highlights |
This skill gets smarter with every use. Five mechanisms work together to make both triggering and content continuously evolve.
→ Full detail in reference/learning-mechanisms.md: Mechanisms 1-5 (Structured Reflection + Predicate Rules, Outcome Quality Tracking, Dynamic Shell Injection, Three-Tier Memory, Semantic Routing), the Post-Invocation Protocol, Progress Tracking, and the /research evolve cycle.
- (.claude/research-project.local.md)
- .claude/research-project.local.md (config), research-state.yaml (live state), findings.md (narrative memory).
- phases/phase-{N}.md for the relevant phase
- phases/phase-supplementary.md for artifact/ethics/collaboration guidance

The following modes work WITHOUT init, for ad-hoc use on any paper/project:
- gate -- Adversarial review of any paper, design, or evaluation plan
- review -- Simulated peer review of any manuscript
- explore -- Literature landscape scan for any topic
- novelty-watch -- Check if a research idea has been done
- cost -- Estimate experiment costs for any setup
- write -- Writing protocol for any paper section
- present -- Presentation preparation for any accepted paper

When invoked standalone, ask the user for minimal context (paper type, venue, topic) inline instead of requiring full init. Apply Global Rules regardless.
/research init protocol

- .claude/ exists, then materialize 3 templates into it:
- mkdir -p .claude -- Bash-tool init MUST NOT assume the directory already exists; brand-new projects do not have a .claude/.
- ${CLAUDE_PLUGIN_ROOT}/skills/research/templates/<file> -- plugin-injected env var when available
- ~/.claude/plugins/marketplaces/research-workflow/skills/research/templates/<file> -- marketplace symlink layout
- ~/.claude/plugins/cache/research-workflow/research-workflow/<version>/skills/research/templates/<file> -- cache fallback (pick highest version if multiple)
- research-project.template.md → .claude/research-project.local.md
- research-state.yaml → .claude/research-state.yaml
- findings.md → .claude/findings.md
- cp "${CLAUDE_SKILL_DIR}/..." -- ${CLAUDE_SKILL_DIR} is not reliably injected into the Bash-tool environment (empirically empty), and the resulting cp /templates/... expansion fails before mkdir can help.
- research-project.local.md answers; research-state.yaml starts at current_phase: 0
- /research-workflow:research explore to begin."

/research status protocol

Read .claude/research-state.yaml. Display:
- research-project.local.md)
- milestones:)
- phases:); per-phase typed artifact + lifecycle status (from artifacts:)
- Gate synthesis: line in findings.md (DEEPEN / BROADEN / PIVOT / CONCLUDE; a direction, not a gate verdict)
- accepted for the current phase + findings.md Open Questions
- risks:)
- last_activity)

Opt-in scaffolding, not active by default. The skill ships a research-router CLI; users may optionally wire it into a UserPromptSubmit hook (see README.md § Optional: semantic-router hook) that monitors conversation for research-related context. When that hook is installed and fires, Claude should briefly mention the relevant /research-workflow:research <mode>, e.g.:

"This scenario could use /research-workflow:research gate for a systematic review. Want to try it?"
Do NOT force-invoke the skill. One sentence suggestion, then follow the user's lead. Without the hook, this protocol does not fire — Claude only invokes a mode when the user does. The skill remains fully correct in either mode (§14.3: prose must never make a no-hook user depend on a hook they have not installed).
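The template lookup order in the /research init protocol can be sketched in Python. This is a minimal illustration under stated assumptions, not the skill's actual implementation: the function names, the `home`/`env` parameters, and the numeric version ordering are hypothetical; only the directory layouts and the three template-to-destination pairs come from the protocol text. It deliberately does not use ${CLAUDE_SKILL_DIR}, which the protocol reports as unreliable.

```python
import re
import shutil
from pathlib import Path

# Template file -> destination name inside .claude/, as listed in the init protocol.
TEMPLATES = {
    "research-project.template.md": "research-project.local.md",
    "research-state.yaml": "research-state.yaml",
    "findings.md": "findings.md",
}

def _version_key(path: Path) -> list[int]:
    # Assumption: versions are dotted numbers, so "1.10.0" sorts after "1.9.0".
    return [int(n) for n in re.findall(r"\d+", path.name)]

def candidate_template_dirs(home: Path, env: dict) -> list[Path]:
    """Return template directories in the fallback order the protocol lists."""
    candidates = []
    plugin_root = env.get("CLAUDE_PLUGIN_ROOT")
    if plugin_root:  # plugin-injected env var, when available
        candidates.append(Path(plugin_root) / "skills/research/templates")
    plugins = home / ".claude/plugins"
    # Marketplace symlink layout.
    candidates.append(plugins / "marketplaces/research-workflow/skills/research/templates")
    # Cache fallback: try the highest version first.
    cache = plugins / "cache/research-workflow/research-workflow"
    if cache.is_dir():
        versions = sorted((p for p in cache.iterdir() if p.is_dir()),
                          key=_version_key, reverse=True)
        candidates.extend(v / "skills/research/templates" for v in versions)
    return candidates

def init_project(project: Path, home: Path, env: dict) -> Path:
    """Create .claude/ and copy the three templates from the first usable directory."""
    target = project / ".claude"
    target.mkdir(parents=True, exist_ok=True)  # same effect as `mkdir -p .claude`
    for directory in candidate_template_dirs(home, env):
        if all((directory / name).is_file() for name in TEMPLATES):
            for src, dst in TEMPLATES.items():
                shutil.copyfile(directory / src, target / dst)
            return directory
    raise FileNotFoundError("no template directory found")
```

A real call might look like `init_project(Path.cwd(), Path.home(), dict(os.environ))`; passing `home` and `env` explicitly keeps the sketch easy to exercise against a temporary directory.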
Set during init. Determines which phases are active and how they adapt.
| Type | Phases Active | Key Differences |
|---|---|---|
| systems | 0-8 (all) | Build artifact → evaluate → full pipeline |
| empirical | 0-3, 5-8 (skip 4) | Study design replaces implementation; data collection replaces coding |
| benchmark | 0-3, 5-8 (skip 4) | Evaluate the dataset/benchmark itself; baseline = existing benchmarks |
| survey | 0-1, survey-methodology, 6-8 | Systematic review methodology replaces Phases 2-5; see phase-survey-methodology.md |
| theory | 0-2, theory-proofs, 6-8 | Formal proofs replace Phases 3-5; see phase-theory-proofs.md |
Phase 9 (present) is a post-acceptance add-on for all paper types; it is not part of the tracked 0-8 main phase sequence.
Phase subfiles contain [SYSTEMS] [EMPIRICAL] etc. markers for type-specific guidance.
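The paper-type table can be read as a lookup from type to an ordered phase sequence. The sketch below transcribes the table into a Python dict; the names `ACTIVE_PHASES`, `is_active`, and `next_phase` are hypothetical and only illustrate the mapping. Phase 9 is left out because it is a post-acceptance add-on outside the tracked 0-8 sequence.

```python
# Active phases per paper type, transcribed from the table above.
# Named phases (survey-methodology, theory-proofs) stand in for the phases they replace.
ACTIVE_PHASES = {
    "systems": [0, 1, 2, 3, 4, 5, 6, 7, 8],
    "empirical": [0, 1, 2, 3, 5, 6, 7, 8],
    "benchmark": [0, 1, 2, 3, 5, 6, 7, 8],
    "survey": [0, 1, "survey-methodology", 6, 7, 8],
    "theory": [0, 1, 2, "theory-proofs", 6, 7, 8],
}

def is_active(paper_type: str, phase) -> bool:
    """True if `phase` is in the tracked sequence for `paper_type`."""
    return phase in ACTIVE_PHASES[paper_type]

def next_phase(paper_type: str, current):
    """Return the phase that follows `current`, or None at the end of the sequence."""
    sequence = ACTIVE_PHASES[paper_type]
    index = sequence.index(current)
    return sequence[index + 1] if index + 1 < len(sequence) else None
```

For example, `next_phase("empirical", 3)` skips Phase 4 and returns 5. The result is a suggested order only, since any mode can be used at any time.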
Phases are a guide, not a prison. Real research is iterative. You can:
- Work on adjacent phases simultaneously (e.g., write §3 while finalizing experiments)
- Skip ahead to test an idea, then come back to formalize
- Use any mode at any time -- the phase number is a suggested order, not a gate
The formal Rollback Protocol (/research rollback N) is only needed for significant cross-gap regressions (e.g., Phase 5 discovers that the Phase 2 design is fundamentally wrong). For routine back-and-forth between adjacent phases, just do it naturally.
- [UNVERIFIED], never guess
- \cite{} clears the CitationAgent contract (≥2 metadata sources + retraction screen, reference/capability-critique.md); every quantitative claim carries [source: file:line] or [source: Paper, §X, p.Y]
- GAP_REPORT.md, never written around (reference/capability-artifacts.md)
- RESTRUCTURE-PLAN.md §15). SKILL.md stays a thin dispatcher; reference/ capability docs grow only by justification; DRY is enforced.
- /research evolve change proposal (and, during the restructure, every migration step) MUST carry a Budget & Structure checklist: the wc -l delta + cumulative trend · each new reference/ / templates/ file's D/R justification · each change classed relocation (names its paired delete) or net-new gap-fill (names the D/R item) · a DRY / pointer audit.
- /research evolve runs the G8 mechanical hook (phases/phase-evolve.md § Budget & Structure Check): the wc -l trend, the 9-check structural sweep, the SKILL.md dispatcher-shape check. G8 REJECTS a proposal that omits the checklist or fails a mechanical check, never on a positive line-count delta alone.
- /research review and the multi-persona reviewer personas (Phase 7; the Adversarial Gate) simulate peer review: they are a pre-mortem self-review of the authors' OWN work, run by or for that work's authors to surface weaknesses before submission. They carry no venue authority.
- review is valid ONLY as that manuscript's authors' own self-review aid (you are a co-author, or its authors asked you).
- phases/phase-7-review.md opens with a scope-&-ethics banner + an operational provenance check; M5 ships this resident rule, M6 adds the fuller public scope-&-ethics doc and completes the §14.8 public-release ethics gate.

autonomy_level in research-project.local.md sets how often the skill pauses for human approval: human-in-the-loop cadence only. It never changes phase order and never decides whether an Adversarial Gate runs; Global Rules G1–G9 apply at every level.
| Level | Routine pause cadence |
|---|---|
| manual | Pause after every mode: the human approves each step forward. |
| checkpoint (default) | Autonomous within a mode; pause at every Adversarial Gate and before every Decision Log must-log event (§ Decision Log). |
| gate-only | Autonomous within a phase; pause only at Adversarial Gates (phase transitions). |
| full-auto | No routine pause between modes: run and report continuously. |
full-auto removes ONLY the routine pause: at every level a Gate CONDITIONAL/BLOCK
verdict still binds (G3 — address the findings, never pause-skip them), and the
mandatory checkpoints + SmartPause below still pause.
Mandatory checkpoints — ALWAYS pause for explicit human approval, every level incl. full-auto:
SmartPause — at any level, pause and surface the uncertainty when the skill cannot cite evidence for a step, faces options with no clear winner, or extrapolates beyond what it verified.
Standalone invocations (no project config) use checkpoint. A shared / public install
defaults to checkpoint; opting up to full-auto is an explicit per-project choice.
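The pause cadence described above can be sketched as a small decision function. The event names (`mode_end`, `gate`, `decision_log`, `mandatory_checkpoint`, `smart_pause`) are hypothetical labels introduced for illustration, and treating `manual` as a superset of `checkpoint` is an assumption; the source only says that `manual` pauses after every mode.

```python
# Routine pause events per autonomy level (event names are hypothetical labels).
ROUTINE_PAUSES = {
    # Assumption: pausing after every mode also covers gate and decision-log events.
    "manual": {"mode_end", "gate", "decision_log"},
    "checkpoint": {"gate", "decision_log"},
    "gate-only": {"gate"},
    "full-auto": set(),
}

# Mandatory checkpoints and SmartPause pause at every level, including full-auto.
ALWAYS_PAUSE = {"mandatory_checkpoint", "smart_pause"}

def should_pause(event: str, level: str = "checkpoint") -> bool:
    """Return True if the skill should stop for human approval on this event."""
    if event in ALWAYS_PAUSE:
        return True
    return event in ROUTINE_PAUSES[level]
```

The sketch models only the pause cadence. A Gate CONDITIONAL/BLOCK verdict binds at every level regardless of what `should_pause` returns.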
Adversarial Gate (/research gate [N])

Principle: Finding a flaw one phase earlier costs 10x less to fix.

- reference/gate-matrix.md (★ = deep, ○ = scan)

Idea/direction-stage rule: personas and gates emit structured critiques, objections, and missing-evidence lists, never numeric novelty/impact scores or accept/reject predictions as a decision basis; comparative ranking among ≥3 surviving alternatives uses pairwise debate.
If the Gate produces CONDITIONAL or BLOCK:
This converts the Gate from a passive "find problems" tool into an active "find AND fix problems" loop. Inspired by 199-biotechnologies deep-research critique loop-back pattern.
Runs for EVERY Gate result — after the step-8 verdict and after any Step-9 loop-back has finished (if loop-back re-ran the Gate, synthesize from the final verdict). This is the skill's outer-loop "step back and synthesize" cadence, folded into the Gate itself, not a separate scheduler or phase:
- findings.md -- Current Understanding, Patterns & Insights, Open Questions.
- findings.md Open Questions naming the next-loop direction:
  Gate synthesis: <DIRECTION> — <one-line reason> (<phase>, <YYYY-MM-DD>)
- /research rollback.

In standalone mode (no project findings.md) state the direction inline in the Gate output instead. The direction is a synthesis call, NOT a gate verdict; the verdict stays PASS / CONDITIONAL / BLOCK (step 8). /research status surfaces the latest Gate synthesis: line.
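As an illustration of the Gate synthesis line format, the sketch below writes and reads such lines. The helper names are hypothetical, the dash inside the string is copied verbatim from the format line, and "latest" is assumed to mean the last matching line in findings.md (entries appended in chronological order).

```python
import re
from datetime import date

# The four directions named for the Gate synthesis line.
DIRECTIONS = ("DEEPEN", "BROADEN", "PIVOT", "CONCLUDE")

# Gate synthesis: <DIRECTION> — <one-line reason> (<phase>, <YYYY-MM-DD>)
LINE_RE = re.compile(r"Gate synthesis: (\w+) — (.+) \((.+), (\d{4}-\d{2}-\d{2})\)$")

def format_synthesis(direction: str, reason: str, phase: str, on: date) -> str:
    """Build one synthesis line; reject directions outside the known set."""
    if direction not in DIRECTIONS:
        raise ValueError(f"unknown direction: {direction}")
    return f"Gate synthesis: {direction} — {reason} ({phase}, {on.isoformat()})"

def latest_synthesis(findings_text: str):
    """Return (direction, reason, phase, date) from the last matching line, or None."""
    for line in reversed(findings_text.splitlines()):
        match = LINE_RE.search(line.strip())
        if match:
            return match.groups()
    return None
```

Parsing from the bottom of the file mirrors how a status view would surface only the most recent direction without interpreting it as a PASS / CONDITIONAL / BLOCK verdict.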
→ The full A–H check-dimension matrix (which dimensions apply per phase, ★ = deep / ○ = scan): reference/gate-matrix.md.
→ Phase 3 cascades catastrophically — ALL 12 mandatory questions must be cleared before Phase 4. The canonical checklist: phases/phase-3-eval-design.md (§ Gate Criteria — MANDATORY 12 QUESTIONS). The cross-cutting evaluation rules it enforces: reference/capability-evaluation.md.
→ The G-V-R within-phase quality loop for high-stakes artifacts, plus the specialized agent patterns (Worker-Critic, CitationAgent, Citation Context Analysis): reference/capability-critique.md.
Phase Rollback (/research rollback N)

→ Full protocol -- Impact Analysis → Decision (fix-at-source / patch-downstream / abandon-&-pivot) → Execute (log, re-gate, propagate): reference/rollback.md.
Risk Registry (/research risk)

The risks: block in .claude/research-state.yaml (schema: reference/capability-artifacts.md).

Rules:
- open → mitigated / accepted_risk / resolved
- accepted_risk MUST have paper defense + Threats section text

Novelty Watch (/research novelty-watch)

Run monthly + before Phase 7. Search multiple query phrasings. Check arXiv and conference accepted-paper lists. Overlap analysis → differentiation → citation update.
RQ ← motivation [Paper, §X, p.Y]
→ metric → oracle → experiment config → raw results → paper claim [file:line]
Every link = concrete path. No abstract references.
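One way to spot broken links in this chain is a rough lint for claims that lack a provenance tag. The sketch below is a heuristic under stated assumptions: it treats any line containing a digit as a quantitative claim and looks for the `[source: ...]` tag shape used for quantitative claims in the Global Rules. The function name and the digit heuristic are hypothetical, not part of the skill.

```python
import re

# Matches both tag shapes: [source: file:line] and [source: Paper, §X, p.Y].
SOURCE_TAG = re.compile(r"\[source:\s*[^\]]+\]")
DIGIT = re.compile(r"\d")

def untagged_claims(text: str) -> list[str]:
    """Return lines that contain a digit but carry no [source: ...] tag."""
    flagged = []
    for line in text.splitlines():
        # Heuristic assumption: a digit anywhere in the line marks a quantitative claim.
        if DIGIT.search(line) and not SOURCE_TAG.search(line):
            flagged.append(line.strip())
    return flagged
```

Run over a draft section, it returns the lines to revisit. It will over-flag dates and section numbers, so the output is a review list, not a verdict.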
The decisions: block in .claude/research-state.yaml (schema: reference/capability-artifacts.md).
Must-log: direction pivots, venue changes, baseline/dataset/metric changes, scope cuts, budget changes. Feeds → Threats to Validity.
Cost Estimation (/research cost)

→ Bottom-up estimate with iteration multipliers and a debug budget, plus the actual-vs-estimated tracking table: reference/cost-estimation.md.
→ Companion-skill delegation table and optional MCP integrations (Semantic Scholar, GPT-Researcher): reference/skill-integration.md.
→ The specialized agent patterns this skill relies on — Worker-Critic, CitationAgent, Citation Context Analysis, Generator-Verifier-Reviser: reference/capability-critique.md.
Progressive-disclosure detail lives in reference/; SKILL.md stays a thin dispatcher and points to it.
| File | Contents |
|---|---|
| reference/dispatch-recipes.md | intent→recipe decision tree · domain activation · pattern guide · auto-suggestions · recipe optimization |
| reference/learning-mechanisms.md | learning mechanisms 1-5 · post-invocation protocol · progress tracking · /research evolve |
| reference/capability-critique.md | Worker-Critic · CitationAgent · citation-context analysis · Generator-Verifier-Reviser |
| reference/capability-artifacts.md | typed artifact contracts · project-state tree · Direction/RQ Card schemas |
| reference/capability-evaluation.md | evaluation-integrity rules · oracle/instrument/contamination · immutable evaluator · leakage audit |
| reference/capability-ideation.md | direction-ideation: gap taxonomy · brainstorming lenses · Diverge/Converge/Refine · disposition filters |
| reference/gate-matrix.md | the A–H gate check-dimension matrix |
| reference/rollback.md | full phase-rollback protocol |
| reference/cost-estimation.md | cost/time estimation formula |
| reference/skill-integration.md | companion-skill delegation + MCP integrations |
| Anti-Pattern | Mechanism |
|---|---|
| Fabricated citations | G1 + /verify-before-write (if installed) — manual citation re-read fallback otherwise (phase-1-foundation.md:21); the verification is mandatory, only the command is optional |
| Late fatal flaws | Gate at every transition |
| Unauthorized edits | G2 Scope Lock |
| Shallow reviews | G4 + multi-persona |
| Optimistic cost | G6 + retry multipliers |
| LLM-evaluates-LLM | B4 + Oracle Matrix |
| Unreliable tools | B3 + Instrument Matrix |
| Unfair baselines | C1-C2 + original code |
| Claim overreach | G7 + B5 boundary check |
| Scope creep | Decision Log + kill conditions |
| Novelty erosion | Novelty Watch |
| Results ≠ RQs | B1 causal chain |
| Irreproducible | E1-E3 + code freeze |
| Post-hoc narrative | Contingency before experiments |
| Bad story | H1 narrative arc check |
| Incomplete related work | H2 audit |
| Weak threats section | H3 + Risk Registry → Threats |
| Over page limit | H4 page budget |
npx claudepluginhub dianxiang-sun/research-workflow --plugin research-workflow