From Omics Skills
Critically review, score, compare, and rank one or more AI scientist outputs for biology, bioinformatics, computational life science, or adjacent research tasks. Trigger when the user asks to evaluate notebooks, code, figures, analyses, manuscripts, software, or final reports produced by AI scientists; compare multiple AI scientists on the same task; judge publication readiness; or audit rigor, reproducibility, novelty, and task completion. Do not use this skill to perform the original research task itself unless the user is explicitly asking for a reviewer-style audit of already produced outputs.
Slash command: /omics-skills:ai-scientist-evaluator
Files:
- agents/openai.yaml
- assets/default_weight_profiles.yaml
- assets/evaluation_schema.json
- assets/evaluation_template.json
- assets/report_template.md
- examples/bio_task_mappings.md
- examples/example_prompts.md
- references/category_definitions.md
- references/question_bank.md
- references/red_flags.md
- references/score_scale.md
- references/task_profiles.md
- scripts/aggregate_reviews.py

Use this skill when Codex should behave like a skeptical reviewer panel rather than a research generator. Evaluate completed outputs, not just plans.
1. Read references/task_profiles.md and load the matching weights from assets/default_weight_profiles.yaml. For composite tasks, use the primary scientific profile first, then add manuscript comments as a secondary layer.
2. Read references/question_bank.md. Always include the universal questions, then add the profile-specific and multi-submission questions when needed.
3. Read references/red_flags.md. Penalize missing evidence, task drift, unsupported biological claims, fabricated identifiers, and unverifiable citations; these failures must count for more than polished narrative.
4. Score with references/score_scale.md, and use references/category_definitions.md if a category's meaning is unclear. A score of 5 earns the full category weight.
5. Use assets/evaluation_template.json and validate the shape against assets/evaluation_schema.json. Use assets/report_template.md for markdown reports. For completed JSON reviews, you may aggregate rankings with python scripts/aggregate_reviews.py review1.json review2.json --out_md leaderboard.md.

| Task | Action |
|---|---|
| General scientific audit | Use profile scientific-analysis |
| Phylogenomics or comparative genomics review | Use profile phylogenomics-comparative-genomics |
| Viral functional genomics review | Use profile viral-functional-genomics |
| Methods or software benchmark review | Use profile methods-software |
| Manuscript or short communication review | Use profile manuscript-packaging |
| Pick scoring weights | Read assets/default_weight_profiles.yaml |
| Interpret category names | Read references/category_definitions.md |
| Ask evidence-forcing review questions | Read references/question_bank.md |
| Check integrity and rigor failures | Read references/red_flags.md |
| Score consistently | Read references/score_scale.md |
| Draft a report | Use assets/report_template.md |
| Produce structured JSON | Use assets/evaluation_template.json and assets/evaluation_schema.json |
| Rank finished JSON reviews | Run python scripts/aggregate_reviews.py review1.json review2.json --out_md leaderboard.md |
If key artifacts are missing, continue the review and mark the evidence gap explicitly instead of pretending certainty.
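The scoring rule in the workflow (a score of 5 earns the full category weight) and the recommendation bands listed further down can be made concrete with a short Python sketch. It assumes a 0-5 category scale and profile weights that sum to 100, uses made-up category names, and encodes one possible reading of "do not award publication-ready status" when artifacts are missing (dropping to the next band). It is an illustration, not code shipped with this skill.

```python
# Hypothetical sketch of the scoring arithmetic. Assumptions (not read from
# the skill's files): category scores use a 0-5 scale, a profile's weights
# sum to 100, and the category names below are made up for illustration.

BANDS = [
    (90, "Outstanding / near publication-ready"),
    (75, "Strong but needs minor to moderate revision"),
    (60, "Promising but major revision needed"),
    (40, "Weak / unreliable in important respects"),
    (0, "Not trustworthy for scientific use"),
]


def composite_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Sum weight * score / 5 per category, so a score of 5 earns the full weight."""
    missing = set(weights) - set(scores)
    if missing:
        # Refuse to score silently: a missing category is an evidence gap to report.
        raise ValueError(f"unscored categories: {sorted(missing)}")
    return sum(weights[cat] * scores[cat] / 5 for cat in weights)


def recommendation(total: float, has_artifacts: bool = True) -> str:
    """Map a 0-100 composite to a label from the skill's recommendation bands."""
    for threshold, label in BANDS:
        if total >= threshold:
            if threshold == 90 and not has_artifacts:
                # Assumed cap: a manuscript with no underlying artifacts drops
                # to the next band instead of being called publication-ready.
                return BANDS[1][1]
            return label
    return BANDS[-1][1]  # negative totals, if any, land in the lowest band


if __name__ == "__main__":
    weights = {"rigor": 40, "reproducibility": 30, "task_completion": 30}
    scores = {"rigor": 4, "reproducibility": 3, "task_completion": 5}
    total = composite_score(scores, weights)
    print(total, recommendation(total))  # 80.0 Strong but needs minor to moderate revision
```

Raising on an unscored category, rather than treating it as zero, keeps evidence gaps visible in the review instead of hiding them in the total.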
For a single submission, produce:
For multiple submissions, produce:
Use these recommendation labels:
- 90-100: Outstanding / near publication-ready
- 75-89: Strong but needs minor to moderate revision
- 60-74: Promising but major revision needed
- 40-59: Weak / unreliable in important respects
- <40: Not trustworthy for scientific use

Example prompts:

Use $ai-scientist-evaluator to review five AI scientist submissions for the
same task. Inspect notebooks, code, figures, runtime notes, and manuscripts.
Score each submission with the appropriate weight profile, answer the critical
questions, identify red flags, and produce a ranked consensus table with
best-in-class awards.
Use $ai-scientist-evaluator to review this AI scientist submission as if you are
a skeptical reviewer panel. Tell me whether the notebook and manuscript really
support the main claims, score the work, and list the revisions required before
I would trust it.
python scripts/aggregate_reviews.py review_a.json review_b.json --out_md leaderboard.md
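The command above ranks finished JSON reviews. Its internals and the exact JSON fields it reads are not described here, so the following Python sketch only illustrates what a consensus table over several reviewers can involve. It is not scripts/aggregate_reviews.py: the submission and total field names, and the assumption that each JSON file holds one reviewer's verdict on one submission, are hypothetical.

```python
# Hypothetical sketch of a consensus ranking across reviewer JSON files.
# This is NOT scripts/aggregate_reviews.py; the "submission" and "total"
# field names are assumptions made for illustration only.
import json
import statistics
import sys


def consensus(reviews: list[dict]) -> list[tuple[str, float, float]]:
    """Return (submission, mean total, score spread) rows sorted best-first."""
    by_submission: dict[str, list[float]] = {}
    for review in reviews:
        by_submission.setdefault(review["submission"], []).append(review["total"])
    rows = [
        (name, statistics.mean(totals), max(totals) - min(totals))
        for name, totals in by_submission.items()
    ]
    # Highest mean first; a large spread flags reviewer disagreement.
    return sorted(rows, key=lambda row: row[1], reverse=True)


def to_markdown(rows: list[tuple[str, float, float]]) -> str:
    """Render the consensus rows as a markdown leaderboard table."""
    lines = ["| Rank | Submission | Mean score | Spread |", "|---|---|---|---|"]
    for rank, (name, mean, spread) in enumerate(rows, start=1):
        lines.append(f"| {rank} | {name} | {mean:.1f} | {spread:.1f} |")
    return "\n".join(lines)


if __name__ == "__main__":
    # Usage: python consensus_sketch.py review_a.json review_b.json
    loaded = []
    for path in sys.argv[1:]:
        with open(path) as handle:
            loaded.append(json.load(handle))
    print(to_markdown(consensus(loaded)))
```

Reporting the spread next to the mean keeps disagreement between reviewers visible instead of averaging it away.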
Issue: The submission includes only a polished manuscript and no underlying artifacts. Solution: Continue the review, but mark reproducibility and claim-evidence gaps explicitly and do not award publication-ready status.
Issue: The task spans more than one domain profile. Solution: Score with the closest primary scientific profile first, then add manuscript or secondary-domain comments without inventing a new weight set unless the user asks for one.
Issue: Multiple submissions look close in total score. Solution: Break ties with integrity, task completion, validation strength, and limitation handling before writing quality.
Issue: A claim looks impressive but evidence is thin or missing. Solution: Penalize unsupported claims, cite the missing evidence directly, and keep the verdict skeptical.
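The tie-breaking order in the third issue above can be expressed as a lexicographic sort key. This Python sketch uses hypothetical field names and treats totals as close when they round to the same whole point, which is an assumption; two totals that straddle a rounding boundary (for example 82.4 and 82.6) still separate, so borderline cases need a reviewer's judgment.

```python
# Hypothetical sketch: rank submissions whose totals are close, breaking ties
# in the skill's stated order (integrity, task completion, validation strength,
# limitation handling) before writing quality. Field names and the "close"
# rule (equal after rounding to a whole point) are assumptions.
from dataclasses import dataclass


@dataclass
class Submission:
    name: str
    total: float            # weighted composite on a 0-100 scale
    integrity: float        # category scores on an assumed 0-5 scale
    task_completion: float
    validation: float
    limitations: float
    writing: float


def rank(subs: list[Submission]) -> list[Submission]:
    """Sort best-first; later key fields only matter when earlier ones tie."""

    def key(s: Submission) -> tuple[float, ...]:
        return (round(s.total), s.integrity, s.task_completion,
                s.validation, s.limitations, s.writing)

    return sorted(subs, key=key, reverse=True)


if __name__ == "__main__":
    subs = [
        Submission("polished", 82.3, 3, 4, 3, 3, 5),
        Submission("rigorous", 81.9, 5, 4, 4, 4, 3),
        Submission("weak", 60.0, 5, 5, 5, 5, 5),
    ]
    print([s.name for s in rank(subs)])  # ['rigorous', 'polished', 'weak']
```

Putting writing quality last in the key means a polished but less rigorous submission can only win a near-tie when every integrity and validation field is also tied.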
Related skills:
- /bio-logic: general scientific reasoning beyond AI evaluation
- /manuscript-review-council: equivalent pipeline for human-authored manuscripts
- /scientific-writing: draft the evaluation writeup

Install:
npx claudepluginhub fmschulz/omics-skills --plugin omics-skills