From medsci-project
Audits canonical manuscript vs. submission drift, builds or freezes journal submission packages. Useful before submitting or after journal portal edits.
Slash command: /medsci-project:sync-submission
Files in this skill:
- scripts/author_registry_example.yaml
- scripts/blind_sweep.py
- scripts/check_asset_anonymization.py
- scripts/check_cross_artifact_stale.py
- scripts/cover_letter_drift_check.py
- scripts/cross_document_n_check.py
- scripts/detect_copy_divergence.py
- scripts/preflight_gate.py
- scripts/scope_drift_check.py
- scripts/sync_submission.py
- skill.yml
- tests/fixtures/copy_ok.md
- tests/fixtures/copy_stale.md
- tests/fixtures/ssot.md
- tests/test_asset_anonymization.sh
- tests/test_copy_divergence.sh
- tests/test_cross_artifact_stale.sh
- tests/test_cross_document_n.sh
- tests/test_preflight_gate.sh
- tests/test_scope_drift.sh

You help keep the canonical manuscript and journal-specific submission packages
from drifting apart. The skill treats submission/{journal}/ as derived output
and records whether it is current, stale, or frozen.
- /orchestrate --e2e marks a project as submission-ready.
- project.yaml, or a direct canonical manuscript path.
- Journal (passed as --journal): chest, ryai, academic_radiology.
- audit: compare existing submission against canonical source.
- build: copy canonical source into submission/{journal}/manuscript/ and write metadata.
- freeze: mark a package as submitted/frozen.

python "${CLAUDE_SKILL_DIR}/scripts/sync_submission.py" audit --project-root . --journal chest
python "${CLAUDE_SKILL_DIR}/scripts/sync_submission.py" build --project-root . --journal chest
python "${CLAUDE_SKILL_DIR}/scripts/sync_submission.py" freeze --project-root . --journal chest --status submitted
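The audit step compares the canonical manuscript against the source hash recorded at build time. As a minimal sketch of that idea, and not the actual logic of sync_submission.py: the hash algorithm (SHA-256), the metadata keys "status" and "source_hash", and the function names are all assumptions made for illustration.

```python
import hashlib
import json
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def audit_state(canonical: Path, meta_path: Path) -> str:
    """Classify a package as 'missing', 'frozen', 'current', or 'stale'.

    The keys "status" and "source_hash" are hypothetical, not the real schema.
    """
    if not meta_path.exists():
        return "missing"
    meta = json.loads(meta_path.read_text(encoding="utf-8"))
    if meta.get("status") == "frozen":
        return "frozen"
    # Any byte-level change to the canonical file makes the package stale.
    if meta.get("source_hash") == file_sha256(canonical):
        return "current"
    return "stale"
```

Hashing the canonical file rather than comparing timestamps keeps the result stable across checkouts and copies, which is presumably why the metadata stores a source hash.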
For double-blind journals, sweep author identifiers across all upload artifacts:
python "${CLAUDE_SKILL_DIR}/scripts/blind_sweep.py" \
--registry _shared/authors/author_registry.yaml \
--files submission/{journal}/supplementary/*.md submission/{journal}/cover_letter.md \
--backup-dir .cache/blind_sweep_backup
The registry is a project-local YAML mapping author identifiers (full names, native scripts, initials with/without periods, email, ORCID) to role labels (e.g., "Reviewer 1"). See scripts/author_registry_example.yaml for schema. Never commit a populated registry to a public repository — keep it next to the manuscript.
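To illustrate how a registry of identifiers can drive a sweep, here is a small sketch. It assumes the registry has already been loaded into a flat dict of identifier to label; the real YAML schema and the internals of blind_sweep.py may differ, and the author "Ada Example" and the function names are hypothetical.

```python
import re


def build_sweeper(registry: dict[str, str]) -> re.Pattern[str]:
    """Compile one pattern that matches any registered identifier."""
    # Longest identifiers first, so a full name wins over its initials.
    keys = sorted(registry, key=len, reverse=True)
    alternation = "|".join(re.escape(k) for k in keys)
    # Lookarounds instead of \b so identifiers ending in "." still match,
    # while initials such as "AE" do not fire inside a longer word.
    return re.compile(rf"(?<!\w)(?:{alternation})(?!\w)")


def blind(text: str, registry: dict[str, str]) -> tuple[str, int]:
    """Replace every identifier with its label; return (text, hit count)."""
    pattern = build_sweeper(registry)
    return pattern.subn(lambda m: registry[m.group(0)], text)


# Hypothetical registry: period and no-period initials are separate entries.
registry = {
    "Ada Example": "Reviewer 1",
    "A.E.": "Reviewer 1",
    "AE": "Reviewer 1",
    "ada@example.org": "Reviewer 1",
}
swept, hits = blind("A.E. and AE drafted this; contact ada@example.org.", registry)
print(hits, swept)
```

Matching here is case-sensitive and literal; native-script names and ORCID IDs would simply be further registry keys.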
| Artifact | Path | Purpose |
|---|---|---|
| Submission metadata | submission/{journal}/.journal_meta.json | Source hash, status, canonical path |
| Sync audit | qc/submission_sync_{journal}.json | Drift result consumed by orchestrator |
| Manifest update | artifact_manifest.json | Submission package registry |
| Pre-flight gate | qc/preflight_gate_report.json | Aggregated halt-on-failure manifest (see "Pre-flight gate" below) |
Run the pre-flight gate once, right before freeze/submission. It orchestrates the existing
deterministic checks and the /verify-refs audit into one halt-on-failure gate,
writes a single aggregated manifest (qc/preflight_gate_report.json), and exits
non-zero on a halt so a build wrapper or CI step can stop the freeze. It shells out to
the per-check scripts and reimplements none of them; the halt decision is driven
by each sub-check's normalized exit code.
python "${CLAUDE_SKILL_DIR}/scripts/preflight_gate.py" --project-root . --journal chest
# add --strict to also halt on the heuristic/conditional (P1) checks
# add --online to make fabricated / author-mismatched references halt (PubMed/CrossRef)
# add --double-blind to make the asset-anonymization scan halt
By default the gate halts only on the unambiguous, deterministic errors (P0):
leftover placeholder/markers (check_placeholders.py), undefined [@key]
citations (check_citation_keys.py), duplicate references (verify_refs.py,
offline-deterministic), and a canonical-vs-submission hash mismatch
(sync_submission.py audit). The heuristic or conditional checks — check_xref,
detect_copy_divergence, scope_drift_check, cover_letter_drift_check,
cross_document_n_check, check_cross_artifact_stale — run and report as P1
warn but do not halt unless promoted with --strict or --require ID;
check_asset_anonymization is P1 unless --double-blind. A check whose inputs are
absent (no rendered docx, no cover letter, no copies, no journal) is recorded
skipped, never a blocker. Exit codes: 0 clean, 1 halt (≥1 blocker), 2 gate
config error (e.g. a --require'd check could not run).
The gate's offline references pass is the deterministic subset (duplicates +
pagination placeholders); an online /verify-refs --strict against PubMed/CrossRef
remains the authoritative fabrication and author-name check before submission.
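The P0/P1 tiering and the three exit codes described above can be sketched as a small decision function. This is an illustration under assumptions, not the code in preflight_gate.py: the CheckResult shape, the status strings, and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckResult:
    check_id: str
    tier: str    # "P0" or "P1"
    status: str  # "pass", "fail", or "skipped"


def gate_exit_code(results, strict=False, require=frozenset()):
    """Return 0 (clean), 1 (halt: at least one blocker), or 2 (config error)."""
    by_id = {r.check_id: r for r in results}
    # A required check that could not run is a config error, not a pass.
    for check_id in require:
        if check_id not in by_id or by_id[check_id].status == "skipped":
            return 2
    for r in results:
        # P0 always blocks; P1 blocks only when promoted.
        blocking = r.tier == "P0" or strict or r.check_id in require
        if r.status == "fail" and blocking:
            return 1
    # Skipped checks never block on their own.
    return 0
```

With a failing P1 check, this returns 0 by default, 1 under strict=True or when that check is listed in require, and 2 when a required check was skipped.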
- project.yaml or explicit input.
- If audit reports DRIFT, do not retarget or freeze until the user either patches the canonical manuscript or records the difference as journal-only.
- After build succeeds, run /verify-refs before final submission.

Gate 0 (pre-flight, last step before freeze): run scripts/preflight_gate.py --project-root . --journal {journal} to aggregate the deterministic checks below into one halt-on-failure manifest (qc/preflight_gate_report.json). Non-zero exit blocks the freeze. See "Pre-flight gate" above for the P0/P1 tiering and flags. This orchestrates Gates 1–3, 5b, 8, 9, 11 plus the placeholder and citation-key checks; the individual gates remain runnable on their own.
Gate 1: block freezing when canonical manuscript is missing.
Gate 2: block retargeting when the previous submission has unresolved drift.
Gate 3: require /verify-refs audit before marking a package submission-safe.
Gate 4: docx audits must use a recursive walk (paragraphs + tables + nested-table cells); a flat document.paragraphs scan is insufficient.
Gate 5: before freeze, confirm portal free-text fields (cover letter, data availability, acknowledgements, abstract, author contributions) match the manuscript body.
Gate 6 (double-blind journals): before freeze, export the portal's blinded review PDF and grep for all author identifiers across the entire upload set — manuscript, supplementary, cover letter, registry record PDFs (PROSPERO/ClinicalTrials), portal Letter-field text. A clean manuscript blind does not imply a clean portal blind.
Gate 7 (text-only docx rebuilds): never use pandoc --reference-doc=manuscript.docx for response/cover/supplementary text-only docx — the reference docx ships its embedded media (figure files) into the new docx, bloating size 50–100×. Use plain pandoc input.md -o output.docx for text-only artifacts.
Gate 5b (Phase 4 cover-letter free-text drift): before freeze, run scripts/cover_letter_drift_check.py to verify the cover letter's word-count / reference-count / table-figure-count claims still match the manuscript. Cover letters routinely go stale across v_N → v_(N+1) branching and are not covered by any docx-level audit. See "Phase 4 — Cover-letter free-text drift" below.
Gate 8 (Phase 5 cross-document N consistency): before freeze, run scripts/cross_document_n_check.py over the manuscript bundle (abstract, body, PROSPERO record, cover letter, supplementary, INDEX, PRISMA flow caption). Any N category with >1 distinct integer value is a P0 drift. When a FINAL_POOL_LOCK.yaml is present, supply --pool-lock to make the locked counts the authoritative baseline. See "Phase 5 — Cross-document N consistency" below.
Gate 9 (Phase 6 intra-manuscript scope drift): run scripts/scope_drift_check.py against the manuscript (and optionally the PROSPERO record). Numeric anchors (AUC, OR/HR/RR, sensitivity/specificity) appearing in Limitations / Discussion but absent from Methods + Results are P0 SCOPE_DRIFT. PROSPERO ↔ Methods synthesis-method disagreement is a P0 PROSPERO_DRIFT.
Gate 10 (Phase 7 v_(N+1) docx regeneration): when building a new submission from a frozen prior version, run scripts/verify_package_integrity.py --assert-vN-docx-changed --vN-docx <prev>.docx --new-docx <next>.docx. Identical MD5 = unmodified seed copy = block submission. Defense-in-depth — required even when the upstream pipeline appears to have regenerated the docx.
Gate 11 (Phase 8 multi-copy divergence): when the project hand-maintains more than one manuscript copy (working SSOT, circulation, portal), run scripts/detect_copy_divergence.py --ssot <ssot>.md --copy <copy>.md ... before freeze or circulation. Any STALE_COPY (an SSOT numeric claim or heading that did not propagate to a copy) is a P0 drift. See "Phase 8 — Multi-copy manuscript divergence" below.
Gate 12 (target-journal metadata drift): on build / retarget, cross-check the target the manuscript is written for against the target the project is being submitted to. Compare project.yaml target (and any in-manuscript header/footer "for submission to X" string) against the journal the package is built for, and check the structural metadata the target dictates — abstract heading structure (4- vs 5-heading), body word limit, citation style (Vancouver / AMA), required elements (Highlights / Central Illustration / Key Points). A mismatch (e.g., a header still reading the previous journal after a cascade retarget, or a 4-heading abstract for a 5-heading target) is a target-restructure trigger — branch to v_(N+1) per manuscript-versioning.md §2 and sync every sidecar (cover letter, title page, ICMJE COI list) — not a silent build.
# header target vs project.yaml target
TGT=$(python3 -c "import yaml;print(yaml.safe_load(open('project.yaml')).get('target',''))" 2>/dev/null)
grep -niE 'for submission to|submitted to|prepared for' manuscript/manuscript.md # compare against "$TGT"
Cover letters live outside the submission docx files but are read by the
editor side-by-side with the manuscript. Their ## Article details
block — body word count, abstract word count, reference count,
table/figure count — is a sidecar SSOT that routinely goes stale when a
manuscript branches v_N → v_(N+1) (word limit retarget, abstract
restructure, late reference batch).
scripts/cover_letter_drift_check.py measures the manuscript truth and
compares it to the cover letter's numeric claims:
python "${CLAUDE_SKILL_DIR}/scripts/cover_letter_drift_check.py" \
--manuscript manuscript.md \
--cover-letter cover_letter.md \
--refs refs.bib \
--out qc/cover_letter_drift.json
Body words are matched with a 5% tolerance ("approximately N words" phrasing). Abstract words tolerate ±5. Reference / table / figure counts require exact match.
Output qc/cover_letter_drift.json:
{
  "submission_safe": false,
  "truth": {"body_words": 3036, "abstract_words": 319, "references": 12,
            "tables": 3, "figures": 4},
  "claims": {"body_words": 3790, "abstract_words": 250, "references": 12},
  "drifts": [
    {"field": "body_words", "truth": 3036, "cover_letter_claim": 3790,
     "severity": "MAJOR",
     "note": "|claim - truth| = 754 > tolerance 151"}
  ]
}
Drift resolution: regenerate the cover letter from the manuscript at v_(N+1) build time. The script never edits the cover letter — that is left to the manuscript build pipeline so the cover letter stays a deliberate authored artifact.
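The tolerance rules can be worked through with the body-word numbers from the sample output. The sketch below assumes the 5% is taken from the measured count and floored, which is what "tolerance 151" for a truth of 3036 suggests; the exact rounding, the boundary handling, and the function names are assumptions, not the script's confirmed behavior.

```python
def tolerance(field: str, truth: int) -> int:
    """Allowed absolute difference between cover-letter claim and truth."""
    if field == "body_words":
        return truth * 5 // 100  # 5% of the measured count, floored (assumed)
    if field == "abstract_words":
        return 5                 # abstract words tolerate +/- 5
    return 0                     # references, tables, figures: exact match


def is_drift(field: str, truth: int, claim: int) -> bool:
    """True when the claim falls outside the tolerance for its field."""
    return abs(claim - truth) > tolerance(field, truth)


# Body words from the sample: |3790 - 3036| = 754 against a tolerance of 151.
print(tolerance("body_words", 3036), is_drift("body_words", 3036, 3790))
```

Integer arithmetic avoids a floating-point edge case at the boundary, where 0.05 * 3036 is not exactly representable.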
Multi-document cohort-size drift is a high-frequency desk-reject pattern.
Manuscript abstracts, body prose, PROSPERO records, supplementary extraction
sheets, and PRISMA flow captions all repeat the same k included / k excluded
/ N patients totals — and any disagreement between them is read by reviewers
as either a data-integrity failure or a late-edit failure. Either reading
ends the round.
scripts/cross_document_n_check.py scans the submission package, extracts
every count claim (an integer N followed by a category noun), and groups the
claims by category (patients, cases, included, excluded, nodules, tumors,
studies_total). A category with more than one distinct integer value is a P0 drift.
python "${CLAUDE_SKILL_DIR}/scripts/cross_document_n_check.py" \
--root . \
--out qc/cross_document_n.json
When the project has frozen a 2_Data/FINAL_POOL_LOCK.yaml from /meta-analysis
Phase 3f.5, pass it as the authoritative anchor:
python "${CLAUDE_SKILL_DIR}/scripts/cross_document_n_check.py" \
--root . \
--pool-lock 2_Data/FINAL_POOL_LOCK.yaml \
--out qc/cross_document_n.json
Output qc/cross_document_n.json:
{
  "submission_safe": false,
  "drift_count": 1,
  "drifts": [
    {
      "category": "included",
      "values": [63, 64],
      "locations": [
        {"file": "abstract.md", "line": 4, "value": 63, "context": "..."},
        {"file": "supplementary/s1.md", "line": 12, "value": 64, "context": "..."}
      ],
      "severity": "MAJOR"
    }
  ],
  "lock_violations": []
}
Treat submission_safe: false as a halt. Resolve drift by tracing each
location to its data artifact (extraction sheet, PRISMA cascade TSVs) and
correcting the document(s) that disagree with the locked count.
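The grouping rule (one category, more than one distinct integer, therefore drift) can be sketched in a few lines. The regular expression and function names below are hypothetical and far cruder than a real extractor would need to be; they only show the group-then-compare step.

```python
import re
from collections import defaultdict

# Hypothetical pattern: an integer followed directly by a category noun.
CLAIM = re.compile(r"\b(\d[\d,]*)\s+(patients|cases|nodules|tumors)\b", re.IGNORECASE)


def collect_claims(docs):
    """docs maps file name -> text; returns category -> [(file, line, value)]."""
    claims = defaultdict(list)
    for name, text in docs.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            for m in CLAIM.finditer(line):
                value = int(m.group(1).replace(",", ""))
                claims[m.group(2).lower()].append((name, lineno, value))
    return claims


def find_drifts(claims):
    """Return category -> sorted distinct values, for categories that disagree."""
    drifts = {}
    for category, hits in claims.items():
        values = sorted({value for _, _, value in hits})
        if len(values) > 1:
            drifts[category] = values
    return drifts
```

A pattern this naive would also flag legitimate subgroup counts (for example a per-arm patient count), so any real use needs a way to separate totals from subsets; how the actual script does that is not shown in this document.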
Late-revision sensitivity analyses are sometimes introduced in the Discussion or Limitations subsection without ever propagating back to Methods + Results. The manuscript then makes claims (with explicit AUC, OR, or sensitivity values) that have no primary report anywhere in the paper. Reviewers read this as a fabrication-grade red flag, and editors desk-reject.
A second variant of the same anti-pattern: the PROSPERO record commits to a synthesis method (Freeman-Tukey, random-effects DerSimonian-Laird, bivariate, HSROC, Bayesian, etc.) but the Methods section uses a different one — or the PROSPERO record was updated and Methods stayed behind. When accompanied by a Methods line saying "no amendment lodged", this becomes a documented silent protocol deviation.
scripts/scope_drift_check.py detects both patterns:
python "${CLAUDE_SKILL_DIR}/scripts/scope_drift_check.py" \
--manuscript manuscript.md \
--prospero prospero/prospero_v2.md \
--out qc/scope_drift.json
Output:
{
  "submission_safe": false,
  "limitations_only_anchors": [
    {
      "anchor": "0.869",
      "kind": "AUC",
      "found_in": ["Limitations:31"],
      "missing_from": ["Methods", "Results"]
    }
  ],
  "synthesis_method_drift": [
    {"method": "Freeman-Tukey", "prospero": true, "methods": false}
  ]
}
Resolution: either (a) propagate the anchor into Methods + Results as a primary report or (b) remove it from Limitations / Discussion. For synthesis-method drift, file a PROSPERO amendment and update Methods to match — both must agree before submission.
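The first pattern, a numeric anchor that appears in Discussion or Limitations but not in Methods + Results, reduces to a set difference over extracted anchors. The sketch below assumes markdown ATX headings named exactly "Methods", "Results", "Discussion", and "Limitations", and uses a hypothetical anchor regex; scope_drift_check.py may work quite differently.

```python
import re

# Hypothetical: a metric keyword followed within 20 characters by a decimal.
ANCHOR = re.compile(r"\b(AUC|OR|HR|RR)\b[^0-9\n]{0,20}?(\d+\.\d+)")
HEADING = re.compile(r"^#{1,6}\s+(.+?)\s*$")


def split_sections(markdown):
    """Map each heading title to the text that follows it."""
    sections, current = {}, None
    for line in markdown.splitlines():
        m = HEADING.match(line)
        if m:
            current = m.group(1)
            sections.setdefault(current, [])
        elif current is not None:
            sections[current].append(line)
    return {title: "\n".join(lines) for title, lines in sections.items()}


def limitations_only_anchors(markdown):
    """Return (kind, value, section) for anchors with no primary report."""
    sections = split_sections(markdown)
    primary = sections.get("Methods", "") + "\n" + sections.get("Results", "")
    reported = {m.group(2) for m in ANCHOR.finditer(primary)}
    flagged = []
    for name in ("Discussion", "Limitations"):
        for m in ANCHOR.finditer(sections.get(name, "")):
            if m.group(2) not in reported:
                flagged.append((m.group(1), m.group(2), name))
    return flagged
```

Comparing anchors as strings means 0.87 and 0.869 count as different values, which errs on the side of flagging.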
When a v_N submission package was frozen and a v_(N+1) is being built
(after a markdown body edit, reviewer round, or cascade-rejection
re-target), the v_(N+1) docx MUST differ from the v_N docx. The most
common silent-revert pattern is a cp v_N/manuscript.docx v_(N+1)/manuscript.docx step that skips the pandoc / Zotero CWYW
regeneration entirely. The markdown body is then edited, but the docx
the portal receives is the frozen v_N — the change silently reverts at
peer review.
Run the byte-identity assertion at the top of the v_(N+1) submission gate:
python3 /path/to/medsci-skills/scripts/verify_package_integrity.py \
--assert-vN-docx-changed \
--vN-docx SUBMISSION/<journal>/v<N>/manuscript.docx \
--new-docx SUBMISSION/<journal>/v<N+1>/manuscript.docx
Identical MD5 → exit 1 with explanatory error. Block submission until the regeneration step is fixed.
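The byte-identity assertion itself is simple to picture. The following is a standalone sketch of the same idea, not the code in verify_package_integrity.py; the function names are hypothetical.

```python
import hashlib
import sys


def md5_of(path):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def assert_docx_changed(prev_docx, new_docx):
    """Return 0 when the files differ, 1 when they are byte-identical."""
    if md5_of(prev_docx) == md5_of(new_docx):
        print(f"ERROR: {new_docx} is byte-identical to {prev_docx}; "
              "the docx was copied, not regenerated", file=sys.stderr)
        return 1
    return 0
```

A differing digest only proves the new docx is not a verbatim copy; it does not prove the intended edit landed, so this check complements the content-level audits rather than replacing them.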
When a project keeps several hand-maintained manuscript copies — manuscript.md
(the working SSOT), manuscript_circulation.md (co-author feedback), and
submission/<journal>/manuscript.md (portal) — a batch of edits applied to the
SSOT routinely lands in only some of the copies. The portal then receives a copy
missing a subset of the edits, and the divergence surfaces (if at all) only when a
reviewer notices the inconsistency.
Before freezing a package or sending a circulation round, run the directional detector (SSOT → each copy):
python3 "${CLAUDE_SKILL_DIR}/scripts/detect_copy_divergence.py" \
--ssot manuscript.md \
--copy manuscript_circulation.md \
--copy submission/<journal>/manuscript.md \
--out qc/copy_divergence.json --strict
It reports, per copy, the SSOT claims (numeric assertions — n = N, percentages,
p, OR/HR/RR, 95% CI — and section headings) that did not propagate. A STALE_COPY
(DIVERGENT overall) is a P0 blocker: re-propagate the unpropagated claims, or —
better — stop hand-maintaining parallel copies and generate the circulation /
submission variants from the single SSOT via a build step (pandoc transform), so
there is only one editable source. Claims are matched as normalized strings, so
wording differences do not register — only a changed or absent number/heading does;
legitimately copy-specific content (a circulation cover note) shows up as copy_only
and can be ignored.
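The directional comparison described above amounts to extracting claims from each file, normalizing them, and taking set differences. This sketch uses hypothetical patterns that cover only n = N, percentages, p-values, and markdown headings; the real detector's claim grammar is broader and is not reproduced here.

```python
import re

# Hypothetical claim patterns; the real detector covers more (OR/HR/RR, 95% CI).
CLAIM_PATTERNS = [
    re.compile(r"\bn\s*=\s*\d+(?:,\d{3})*", re.IGNORECASE),
    re.compile(r"\d+(?:\.\d+)?\s*%"),
    re.compile(r"\bp\s*[<=>]\s*0?\.\d+", re.IGNORECASE),
]
HEADING = re.compile(r"^#{1,6}\s+.+$", re.MULTILINE)


def normalize(claim):
    """Drop whitespace and case so 'n = 412' and 'N=412' compare equal."""
    return re.sub(r"\s+", "", claim).lower()


def extract_claims(text):
    found = {normalize(m.group(0)) for pat in CLAIM_PATTERNS for m in pat.finditer(text)}
    found |= {normalize(m.group(0)) for m in HEADING.finditer(text)}
    return found


def compare(ssot, copy):
    """Directional diff: SSOT claims missing from the copy are 'stale'."""
    s, c = extract_claims(ssot), extract_claims(copy)
    return {"stale": sorted(s - c), "copy_only": sorted(c - s)}
```

Because only the normalized claim strings are compared, rewording a sentence around an unchanged number produces no finding, while a changed number shows up once as stale and once as copy_only.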
Some Springer Editorial Manager journals offer only Manuscript / Figure / Table / Supplementary / LaTeX upload item types — no separate Title Page or Cover Letter slot, and sometimes no Graphical Abstract slot. Common for observational / cohort submissions.
Upload as the single Manuscript item: the title page + an explicit docx page break (<w:br w:type="page"/>; a bare \newpage is silently dropped in docx output) + the manuscript body with its byline / affiliations / corresponding-author footnote removed so the title page is not duplicated.
# Extract the docx body text once, then check that each declaration heading is present.
TEXT=$(unzip -p manuscript.docx word/document.xml | sed 's/<[^>]*>//g')
for s in Funding "Competing Interests" "Ethics Approval" "Consent to Participate" "Consent for Publication" "Author Contributions" "Data Availability"; do
  printf '%s' "$TEXT" | grep -qF "$s" && echo "OK $s" || echo "MISSING $s"
done
Post-submission learnings (npj Digital Medicine R1, 2026-05): a clean docx-level audit still missed several stale artifacts that surfaced only at the portal review stage. Apply these whenever auditing a submission package.
python-docx paragraph.runs does not expose runs inside <w:hyperlink>; document.paragraphs skips table cells; document.tables does not recurse into nested tables. Figures, captions, and reporting checklists are routinely wrapped in 1×1 or nested tables, so flat scans silently miss them.
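One way to sidestep these blind spots is to read word/document.xml directly, because a full-depth XML traversal does not care how deeply a paragraph is nested. The sketch below uses only the standard library; it is an assumed alternative approach, not the walk this skill's scripts implement, and the function names are hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def iter_docx_paragraph_text(docx_path):
    """Yield the text of every <w:p> in the main document part."""
    with zipfile.ZipFile(docx_path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    # Element.iter() visits all descendants, so paragraphs inside tables and
    # nested tables, and runs inside <w:hyperlink>, are all included.
    for para in root.iter(f"{{{W_NS}}}p"):
        yield "".join(t.text or "" for t in para.iter(f"{{{W_NS}}}t"))


def find_stale(docx_path, needle):
    """Return every paragraph text that still contains the stale string."""
    return [text for text in iter_docx_paragraph_text(docx_path) if needle in text]
```

Headers, footers, footnotes, and comments live in other parts of the package (not word/document.xml), so a complete scan would repeat this over those parts as well.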
- Walk paragraphs + tables + nested-table cells recursively for every stale-string scan.
- .runs: a missing inline element can be misread as an empty () artifact and "fixed" into a real defect.

Cover letter, Data Availability, Acknowledgements, Abstract, and Author Contributions are often typed directly into the journal portal, outside any docx this skill audits. A clean docx audit does not imply a clean portal.
A clean manuscript-level blind sweep does not imply a clean portal-level blind. Author identifiers commonly leak through:
- .md/.docx files, especially methodology logs, agreement metrics, amendment logs

Blind sweep regex coverage must include both period and no-period initial forms (e.g., Y.N. and YN), full names in roman + native scripts, institution names, ORCID IDs, and submission email domains. The first blind PDF export from the portal is the authoritative drift detector: always export and grep before final submit.
PROSPERO's "Print/PDF" export from the public record renders only the current amendment narrative. Previous versions are accessible only by selecting older versions in the public-record version-history dropdown. When citing PROSPERO version state, never rely on a single PDF export to verify cross-version consistency — record each published version's PDF independently and clarify in cover/supplementary which version anchors the methodology vs. which version reflects documentation-only erratum.
For documentation-only PROSPERO errata (correcting a narrative fact without changing methods/eligibility/synthesis), prefer a single Revision-Note append over a new structured amendment entry. This preserves the historical audit trail and minimizes the portal edit surface.
If response_to_reviewers.docx / cover_letter.docx / supplementary text-only docx grow to >100 KB after a rebuild, suspect --reference-doc pulling manuscript figure media. Verify with unzip -l output.docx | grep word/media/ — should be empty for text-only artifacts.
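The same media check can be scripted so it runs over several artifacts at once. A minimal sketch using the standard-library zipfile module; the function name is hypothetical.

```python
import zipfile


def embedded_media(docx_path):
    """List the files stored under word/media/ inside a docx package."""
    with zipfile.ZipFile(docx_path) as zf:
        return [
            info.filename
            for info in zf.infolist()
            if info.filename.startswith("word/media/") and not info.is_dir()
        ]
```

For a text-only artifact the returned list should be empty; any entry suggests figure media was carried over from a reference docx.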
A tone, wording, or number change applied to one file (e.g. the abstract) must propagate to every file that repeats it — discussion, response-to-reviewers quotes, reporting checklists, supplementary captions, title page.
- Short vs. long phrase forms (e.g. expertise-dependent patterns vs expertise-dependent evaluation patterns): an exact-match grep on the short form passes while the long form remains stale.
- /write-paper; it packages already canonical content.
- .journal_meta.json.

npx claudepluginhub aperivue/medsci-skills --plugin medsci-project