From medsci-project
Generates structured peer review drafts for medical journals with journal-specific formatting and systematic manuscript analysis.
Slash command
/medsci-project:peer-review
references/aczel_2021_reviewer2_patterns.md
references/domain-probes/ai_overclaiming.md
references/domain-probes/narrative_review.md
references/domain-probes/observational_confounding.md
references/domain-probes/radiomics.md
references/domain-probes/rct_trial.md
references/domain-probes/sr_ma.md
references/domain-probes/survival_prognostic.md
references/exemplar_reviews/README.md
references/exemplar_reviews/ai_overclaiming.md
references/exemplar_reviews/calibration_missing.md
references/exemplar_reviews/data_leakage.md
references/exemplar_reviews/optimistic_validation_reporting.md
references/exemplar_reviews/reference_standard_validity.md
references/exemplar_reviews/selective_outcome_reporting.md
references/narrative_review_audit.md
references/reviewer_calibration/README.md
references/reviewer_calibration/compliance_floor.md
references/reviewer_profiles/AJR.md
references/reviewer_profiles/EURE.md

You are assisting a medical researcher in writing peer reviews for scientific journals. The reviews should reflect a constructive, developmental tone and demonstrate expertise in both clinical methodology and study design.
/write-paper
/self-review
{working_dir}/review/{manuscript_id}/.
/analyze-stats "Effect-Size Real-World Translation") and compare it to a known minimal clinically important difference. Flag when significance is driven by sample size rather than magnitude, e.g., a small correlation clearing FDR at large n, or a continuous test significant where the source's categorical comparison was not.
/check-reporting is available, delegate. Then calibrate with references/reviewer_calibration/compliance_floor.md: a percentage is secondary; check that each critical item for the study type is PRESENT, and raise a missing critical item as Major regardless of the headline %. Do not assert numeric desk-reject thresholds; the hard signals are missing critical items and the journal's own required elements (reviewer_profiles/ + author guidelines).
Before finalizing Major Revision (or, for AJR-style forms, a Reconsider tier) for an original AI, LLM, or methodology paper, or for a Review / narrative / primer article, explicitly run this calibration gate. It prevents a valid issue list from under-weighting contribution and priority.
If 2 and 3 both hold, do not default to Major Revision simply because the review is constructive. In the confidential comments, state that the manuscript has a priority/contribution problem in addition to the fixable technical issues, and calibrate the recommendation toward the journal's stronger option (for example, reject/resubmission where that tier exists). If only 1 holds and the value/novelty case is strong, Major Revision remains appropriate.
Fixable vs unfixable tier-domination: separate defects that a revision can repair (extraction errors, missing supplementary, a mislabeled table, an over-claiming sentence) from defects that cannot be repaired within the current submission (poolability of incommensurable studies, a broken construct, an invalid evaluation instrument). When both classes are present, the unfixable class governs the recommendation — do not let a long list of fixable items reframe an unfixable core as "addressable in revision."
Review/narrative/primer escalation (the contribution IS the product): for a review article there is no data to re-analyze; the distinct contribution — novelty, integrative synthesis, domain-specificity — is the deliverable itself. Therefore weak novelty / no distinct contribution / not domain-specific is unfixable-in-current-form: "add a distinct contribution" asks for a substantially different paper, so each gap looking individually "addressable in revision" is a trap. When RV1 (novelty) is a Major in a saturated space and no distinct contribution exists, escalate the recommendation one tier toward Reject (e.g., Reconsider → Reject) rather than defaulting to the revision tier.
Confidential-note Reject-grade self-grep: before committing the recommendation, re-read your own Confidential Comments to the Editor. If they contain Reject-grade language — "hard to distinguish from work it already cites," "cannot be resolved by minor editing," or deferring the value/priority judgment to the editorial board ("whether the incremental value clears the bar is a scope judgment I leave to the board") — that deferral is itself a Reject-grade tell, not a neutral hand-off. Re-examine plain Reject so the confidential note and the recommendation are consistent.
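The tier-domination rule in this gate can be sketched as a small decision function. This is only an illustration under stated assumptions: the `Defect` record, the `fixable` flag, and the default tier labels are hypothetical names rather than anything defined in the skill's files, and the actual judgment stays with the reviewer.

```python
from dataclasses import dataclass


@dataclass
class Defect:
    label: str
    fixable: bool  # True if a revision of the current submission can repair it


def governing_class(defects):
    """Return which defect class should drive the recommendation."""
    if any(not d.fixable for d in defects):
        return "unfixable"
    return "fixable" if defects else "none"


def calibrate(defects, revision_tier="Major Revision", stronger_tier="Reject"):
    """Hypothetical mapping from defect classes to a recommendation tier.

    The tier names are placeholders; use the journal's own options.
    """
    # A long list of fixable items never reframes an unfixable core.
    if governing_class(defects) == "unfixable":
        return stronger_tier
    return revision_tier


# Illustrative defect list using examples named in the text above.
defects = [
    Defect("mislabeled table", fixable=True),
    Defect("over-claiming sentence", fixable=True),
    Defect("invalid evaluation instrument", fixable=False),
]
recommendation = calibrate(defects)
```

The point of the sketch is the ordering of the checks: the unfixable class is tested first, so the count of fixable items cannot change the outcome.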
Apply this internal-consistency-first gate (P0) plus 10-probe checklist (P1–P10) only when manuscript type is "Systematic Review", "Meta-Analysis", or "Systematic Review and Meta-Analysis". These probes complement (do not replace) the generic Phase 2 issue checklist.
SR-MA reviews almost always justify the Tier 3 word budget (1000-1400w): when ≥3 of P1–P10 trigger, default to Tier 3.
Probe detail (P0–P10), with output templates and the leads-vs-findings discipline: ${CLAUDE_SKILL_DIR}/references/domain-probes/sr_ma.md. Load it and apply each probe when the trigger above fires. In this skill, map each probe finding to the review draft as a Major / Minor comment; route conclusion-threatening or integrity findings into the Confidential Comments to the Editor, and place a confirmed error that drives a headline claim as the Major #1 candidate.
Apply this 8-probe checklist only when manuscript involves time-to-event outcomes (OS, DFS, LRFS, DMFS, RFS, PFS, time-to-recurrence) or prognostic model development (Cox proportional hazards, DeepSurv, DeepHit, Random Survival Forest, nomogram development/validation, multi-state or multi-outcome survival cascade, risk-stratification with cutoff-based phenotyping).
These probes complement (do not replace) the generic Phase 2 issue checklist and may be co-applied with Phase 2A for SR-MA of prognostic models.
Exempt:
Probe detail (S1–S8), with output templates: ${CLAUDE_SKILL_DIR}/references/domain-probes/survival_prognostic.md. Load it and apply each probe when the trigger above fires. In this skill, map each probe finding to the review draft as a Major / Minor comment; route a conditioning/causal-framing, competing-risks, or estimand-provenance (S8) design flaw into the Confidential Comments to the Editor and place it as the Major #1 candidate.
Apply this 4-probe checklist only when the manuscript maps radiomic feature reliability/reproducibility or feature stability (test-retest, noise sensitivity, ICC-based reproducibility), runs an acquisition–reconstruction parameter sweep (tube voltage, tube current, bin width, reconstruction kernel, slice thickness, iterative reconstruction), or claims that reliability/robustness/harmonization-based feature filtering (e.g., ComBat, ICC thresholding) improves a downstream clinical task or transports across scanners/centers/vendors.
These probes complement (do not replace) the generic Phase 2 issue checklist. Their purpose is to keep design-level structural validity from being under-weighted: a review can correctly flag the reporting-layer issues (an over-claiming Abstract, a small external cohort) yet still miss whether the central contribution holds, which softens the recommendation by one notch.
Exempt:
Probe detail (R1–R4), with output templates: ${CLAUDE_SKILL_DIR}/references/domain-probes/radiomics.md. Load it and apply each probe when the trigger above fires. In this skill, map each probe finding to the review draft as a Major / Minor comment; a design-grid circularity (R1) or transportability-failure-framed-as-success (R3) finding is design-level, so surface it in the Confidential Comments to the Editor and keep its severity high rather than softening it to a reporting fix.
Apply this 9-probe checklist (RV1–RV9) only when the manuscript is a Review / narrative review / primer / state-of-the-art / educational review — i.e., a non-systematic synthesis rather than original research. Reference material (the SANRA appraisal items, a consolidated evaluation checklist, and a candidate-additions list for AI/LLM-in-radiology reviews) lives in ${CLAUDE_SKILL_DIR}/references/narrative_review_audit.md.
The original-research probes (Phase 2 issue checklist, Phase 2A/2B/2C) do not transfer to review articles. The key inversion: for original research, reviewers are discouraged from scope-expanding requests, but for narrative reviews, identifying thematic gaps and proportionately suggesting missing content is an expected part of the reviewer's role — error-spotting alone is necessary but not sufficient. Keep SANRA in its lane: it is a 6-item critical appraisal tool, not a reporting guideline, so do not over-enforce it (only RV3 is SANRA-aligned, and as a suggestion; do not demand PRISMA — narrative ≠ systematic).
Exempt:
Probe detail (RV1–RV9), with the verify-your-own-criticism gate and output templates: ${CLAUDE_SKILL_DIR}/references/domain-probes/narrative_review.md. Load it and apply each probe when the trigger above fires; the SANRA appraisal items and candidate-additions catalog in ${CLAUDE_SKILL_DIR}/references/narrative_review_audit.md remain peer-review-specific supporting material. In this skill, map each probe finding to the review draft as a Major / Minor comment; for a saturated topic, raise novelty/value-add (RV1) as a Major candidate, and present gap-filling (RV8) as "consider adding" suggestions, never "must cite".
Apply this 6-probe checklist (O1–O6) only when the manuscript is an observational study (cohort, case-control, cross-sectional, health-screening / registry) whose central claim is an adjusted exposure–outcome association estimated by covariate adjustment rather than randomization. These probes complement (do not replace) the generic Phase 2 issue checklist and the STROBE reporting items; they target the gap between the stated adjustment set and what the exposure-stratified Table 1 shows.
Exempt:
Probe detail (O1–O6), with output templates: ${CLAUDE_SKILL_DIR}/references/domain-probes/observational_confounding.md. Load it and apply each probe when the trigger above fires. O1 (a measured covariate that is imbalanced by exposure in Table 1 yet absent from the adjustment set) is data-checkable and the highest-yield probe — verify it against the manuscript's own Table 1. In this skill, map each probe finding to the review draft as a Major / Minor comment; a confounding-completeness gap (O1), a selection/collider structure that could generate the association (O3), or an undisclosed complete-case collapse (O5) is design-level, so surface it in the Confidential Comments to the Editor and place it as the Major #1 candidate rather than softening it to a reporting fix.
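Because O1 is data-checkable, a rough screen of Table 1 can be sketched as below. The sketch assumes continuous covariates reported as mean and SD per exposure group, uses a standardized mean difference with a 0.1 cutoff (a common rule of thumb, not a value taken from this skill), and every covariate name and number is hypothetical. The output is a list of leads to verify against the manuscript, not confirmed findings.

```python
import math


def standardized_mean_difference(mean_a, sd_a, mean_b, sd_b):
    """SMD for a continuous covariate, using the pooled SD of the two groups."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd if pooled_sd else 0.0


def o1_candidates(table1, adjustment_set, threshold=0.1):
    """List covariates imbalanced by exposure but missing from the adjustment set.

    table1 maps covariate -> (mean_exposed, sd_exposed, mean_unexposed, sd_unexposed).
    threshold=0.1 is an assumed convention, not specified by the skill.
    """
    flagged = []
    for name, (m1, s1, m0, s0) in table1.items():
        smd = standardized_mean_difference(m1, s1, m0, s0)
        if abs(smd) > threshold and name not in adjustment_set:
            flagged.append((name, round(smd, 2)))
    return flagged


# Hypothetical Table 1 values, for illustration only.
table1 = {
    "age": (61.0, 10.0, 55.0, 10.0),
    "bmi": (26.1, 4.0, 26.0, 4.0),
    "systolic_bp": (138.0, 15.0, 130.0, 15.0),
}
leads = o1_candidates(table1, adjustment_set={"age", "sex"})
```

Here `age` is imbalanced but adjusted for, `bmi` is balanced, and only `systolic_bp` is returned as an O1 lead.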
Apply when an AI/ML primary study (diagnostic, prognostic, triage, detection) makes a clinical claim in the Title/Abstract/Conclusion — generalizable, outperforms clinicians, deployment-ready, can replace a reader. Complements Phase 2F (recommendation calibration) and the signature "Overclaiming vs evidence level" check; co-applies with Phase 2C for radiomics-AI and Phase 2B for prognostic-AI.
Probe detail (AO0–AO5), with output templates and the leads-vs-findings discipline: ${CLAUDE_SKILL_DIR}/references/domain-probes/ai_overclaiming.md. Load it and apply each probe when the trigger fires. Run AO0 first — locate the load-bearing claim and read it together with its cited evidence before alleging over-reach (a hedged Discussion qualifier is not a headline). In this skill, map each probe finding to the review draft as a Major / Minor comment; a headline generalizability (AO1), superiority/replacement (AO2/AO3), or deployment-readiness (AO4) claim that outruns the design is framing-level — surface it in the Confidential Comments to the Editor and place it as the Major #1 candidate when it is the paper's headline. AO5 catches over-reach in the reported metric itself (best-fold headline without cross-fold CI/SD, unstated/test-tuned operating point, rebalanced-accuracy, or a code-vs-claims mismatch); pair it with the exemplar_reviews/optimistic_validation_reporting.md phrasing model and raise it as Major when it carries the headline.
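For the AO5 case of a best-fold headline reported without cross-fold spread, the comparison a reviewer would ask for can be sketched as follows. The per-fold AUC values are made up for illustration, and `fold_summary` is a hypothetical helper, not part of the skill.

```python
import statistics


def fold_summary(fold_scores):
    """Summarize per-fold scores so the best fold is not read as the headline."""
    mean = statistics.mean(fold_scores)
    sd = statistics.stdev(fold_scores)  # sample SD across folds
    best = max(fold_scores)
    return {
        "mean": round(mean, 3),
        "sd": round(sd, 3),
        "best": best,
        "best_minus_mean": round(best - mean, 3),
    }


# Hypothetical per-fold AUCs from a 5-fold cross-validation.
aucs = [0.81, 0.84, 0.79, 0.91, 0.80]
summary = fold_summary(aucs)
```

With these illustrative numbers the best fold sits 0.08 above the cross-fold mean, which is the kind of gap that would justify asking the authors to report the mean with its SD or CI rather than the single best fold.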
Apply this 8-probe checklist (RC0–RC7) only when the manuscript is a randomised controlled trial (parallel-group, crossover, cluster, stepped-wedge) whose claim is that an intervention causes an outcome difference. These probes complement (do not replace) the generic Phase 2 issue checklist and the CONSORT reporting items; they target the threats randomisation should remove but reporting can hide (allocation concealment, functional unblinding, a non-ITT primary, outcome switching).
Probe detail (RC0–RC7), with output templates and the leads-vs-findings discipline: ${CLAUDE_SKILL_DIR}/references/domain-probes/rct_trial.md. Load it and apply each probe when the trigger fires. Run RC0 first — locate the registration and the pre-specified primary, and compare it to the reported primary (a switch without a dated amendment is design-level, and pairs with exemplar_reviews/selective_outcome_reporting.md). In this skill, map each probe finding to the review draft as a Major / Minor comment; a broken-randomisation primary (RC3, per-protocol/completers), unconcealed allocation (RC1), or an open-label trial with a subjective outcome (RC2, functional unblinding) is design-level — surface it in the Confidential Comments to the Editor and place it as the Major #1 candidate. A reported baseline significance test (RC5) is MINOR.
Before writing comments, skim the relevant model in references/exemplar_reviews/ for the
finding type at hand (AI overclaiming, reference-standard validity, data leakage, missing
calibration, optimistic validation reporting, selective outcome reporting). Each shows the same four moves — anchor the location, state the gap, phrase
it as a partner (Aczel-compliant), and calibrate severity (design-level → Major #1). Model
the anchoring and phrasing; do not copy — they are synthetic teaching examples.
Generate {manuscript_id}_review_draft.md:
# {manuscript_id} — Review Draft
**Manuscript**: {title}
**Journal**: {journal}
**Type**: {Original Research | Review | Technical Note | ...}
**Recommendation**: {Major Revision | Minor Revision}
---
## {Journal-specific scores section, if applicable}
---
## CONFIDENTIAL COMMENTS TO THE EDITOR
{100-150 words: summary + strengths + key concerns + fatal flaw hierarchy if applicable + recommendation}
**Clinical Impact**: {High/Moderate/Low} — {1 sentence on implications}
---
## COMMENTS TO THE AUTHORS
**Research Summary & General Comments**
{2-3 sentences summarizing objective, design, key finding (in your own words)}
Major strengths:
1. {Specific strength}
2. {Specific strength}
3. {Specific strength (optional)}
{Scope + feasibility: 1-2 sentences — "I have suggestions focused on [areas]. Achievable within existing data."}
(80-150 words total)
**Major Comments**
1) **{Issue title}**
{Problem 1-2 sentences. Location cited.}
Suggested revisions:
- {Fix 1}
- {Fix 2}
2) **{Issue title}**
...
**Minor Comments**
1) {One sentence, location cited.}
2) ...
**Closing Remark**
{2-3 sentences, constructive.}
Length targets (3-tier, data-grounded):
Reference baseline (from peer-comment empirical analysis, n=21 reviewer blocks across 13 decision letters): median ≈ 545 words, central 50% range 366-856w, 90th percentile ≈ 870w, only 5% exceed 1000w. Most peer reviewers cluster below 900w.
awk + wc (no estimation), at Phase 3 mid-checkpoint and Phase 6 final.
your_wc / 545 and report. Ratio > 2.0 (above 1090w) flags trim candidate. Ratio < 1.0 may indicate insufficient design-level rigor for AI/methodology critique reviews.
After drafting, verify mechanically:
awk + wc for exact measurement (no estimation). Identify which tier the Author section falls in (Tier 1 ≤700w / Tier 2 700-1000w ★ default / Tier 3 1000-1400w). Most reviews should land in Tier 2. If Tier 3, justify with a one-line rationale (which design-level concern warrants the extra length) and verify Tier 3 frequency stays ≤20% rolling. Hard cap 1400w. Also measure at Phase 3 mid-checkpoint, not only at final. Report reference-baseline ratio (wc / 545w); ratio > 2.0 flags trim candidate.
references/aczel_2021_reviewer2_patterns.md):
Fix all issues found, then present to user.
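The length check above can be sketched in Python as a stand-in for the awk + wc step that the skill itself specifies. Several details are assumptions: that the Author section runs from the "## COMMENTS TO THE AUTHORS" heading to the end of the file, that a count of exactly 700 falls in Tier 1 and exactly 1000 in Tier 2, and the helper names themselves.

```python
BASELINE_WORDS = 545  # reference median quoted in the length targets
HARD_CAP = 1400


def author_section(draft_md, heading="## COMMENTS TO THE AUTHORS"):
    """Return the text after the authors heading (assumed to run to end of file)."""
    _, found, rest = draft_md.partition(heading)
    return rest if found else ""


def length_report(text):
    """Classify a section by word-count tier and baseline ratio."""
    words = len(text.split())  # whitespace-delimited count, like `wc -w`
    if words <= 700:
        tier = "Tier 1"
    elif words <= 1000:
        tier = "Tier 2"
    elif words <= HARD_CAP:
        tier = "Tier 3"
    else:
        tier = "over hard cap"
    ratio = words / BASELINE_WORDS
    return {
        "words": words,
        "tier": tier,
        "ratio": round(ratio, 2),
        "trim_candidate": ratio > 2.0,
    }


# Synthetic 1100-word section, for illustration only.
report = length_report("word " * 1100)
```

A Tier 3 result from this check would still need the one-line rationale; the sketch only measures, it does not justify.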
{manuscript_id}_review_final.md, the polished version.
{manuscript_id}_submission.md, formatted for copy-paste into the editorial system:
wc / 545w) reported; ratio > 2.0 trimmed
references/aczel_2021_reviewer2_patterns.md): avoid attitude markers ("reject," "absurd," "oblivious"), boosters, personal attacks on authors, vague dismissals, and typo nitpicking; prefer first-person rapport ("I appreciate," "I stumbled over"), hedged suggestions ("I'd suggest," "could," "would help"), and critique aimed at the work rather than the people. Apply throughout drafting, not just QC.
Recurring high-yield checks (apply to every manuscript):
For survival / prognostic-model manuscripts, also apply the Phase 2B 8-probe audit (conditioning, censoring, competing risks, cutoff optimism, comparator horizon alignment, C-index variant transparency, calibration beyond discrimination, estimand provenance).
For radiomic feature-reproducibility / phantom parameter-sweep / reliability-filtering manuscripts, also apply the Phase 2C 4-probe audit (design-grid circularity, construct validity / proxy-target gap, transportability framing with Reject-escalate calibration, multiplicity).
For Review / narrative / primer / state-of-the-art manuscripts, apply the Phase 2D 9-probe audit (novelty/value-add, scope/aims, evidence-gathering transparency, technical/medical accuracy, taxonomy/synthesis coherence, balance/currency/citation accuracy, load-bearing figures/tables, constructive gap-filling, curated-base circularity) in place of the original-research probes — error-spotting plus proportionate gap-filling, with SANRA used as an appraisal aid only.
For observational studies whose central claim is an adjusted exposure–outcome association, also apply the Phase 2E 6-probe audit (confounding completeness, adjustment-set provenance, selection/collider bias, exposure measurement validity, missing-data / complete-case collapse, residual-confounding E-value), with O1 — a measured covariate imbalanced by exposure in Table 1 yet absent from the adjustment set — checked against the manuscript's own Table 1.
Canonical source: per-journal profile files at
references/reviewer_profiles/{JOURNAL_SHORTNAME}.md
In Phase 1 (Setup), after identifying the journal, read the matching profile and render its scorecard template at the top of the draft in Phase 3, above Confidential Comments to the Editor. This avoids duplicating journal form fields across multiple skills.
Current profiles:
| Short | Journal | System | Scorecard |
|---|---|---|---|
| KJR | Korean Journal of Radiology | ScholarOne | 8 items, Excellent→Poor |
| RYAI | Radiology: Artificial Intelligence | ScholarOne | 5 items, 1–9 |
| INSI | Insights into Imaging | Editorial Manager | 4 items, H/M/L |
| AJR | American Journal of Roentgenology | Editorial Manager | Section-by-section |
| EURE | European Radiology | Editorial Manager | INSI-style base |
If a journal has no profile yet, use the generic format from Phase 3 and ask the user for the invitation form's scorecard fields so a new profile can be added under reviewer_profiles/.
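The profile lookup and its fallback can be sketched as below. This assumes the profiles sit under the skill directory at the path pattern given above; the function names and the convention of returning `None` for the generic fallback are hypothetical.

```python
from pathlib import Path


def profile_path(skill_dir, journal_shortname):
    """Build the expected profile location for a journal short name."""
    return Path(skill_dir) / "references" / "reviewer_profiles" / f"{journal_shortname}.md"


def load_profile(skill_dir, journal_shortname):
    """Return the profile text, or None so the caller uses the generic format."""
    path = profile_path(skill_dir, journal_shortname)
    if path.is_file():
        return path.read_text(encoding="utf-8")
    return None
```

When `load_profile` returns `None`, the caller would use the generic Phase 3 format and ask the user for the invitation form's scorecard fields, as described above.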
| Artifact | Filename | Format |
|---|---|---|
| Review draft | {manuscript_id}_review_draft.md | Markdown |
| Final review | {manuscript_id}_review_final.md | Markdown |
| Submission text | {manuscript_id}_submission.md | Plain text |
| Need | Skill | When |
|---|---|---|
| Reporting compliance | /check-reporting | Phase 2 — guideline check |
| AI pattern detection | /humanize | If reviewing for AI writing patterns |
/write-paper
/self-review
/search-lit if citations are needed for reviewer comments.
[CHECK] rather than asserting compliance.
npx claudepluginhub aperivue/medsci-skills --plugin medsci-project