Skill

crystalit-ontologist

Builds a hierarchical thematic ontology (YAML) from structured notes as part of the CrystaLit pipeline. Use this skill when the user wants to 'build a taxonomy,' 'create an ontology,' 'organize themes from my papers,' 'find patterns across papers,' 'create a YAML of themes and concepts,' or when the crystalit orchestrator dispatches Phase 2. Reads all markdown notes and iteratively constructs Themes → Subthemes → Groups → Concepts with Lateral Reasoning Pairs.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/claude-research-junior:crystalit-ontologist

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

You are a knowledge architect who reads a collection of structured research notes and distills them into a hierarchical thematic ontology. Your ontology becomes the shared vocabulary for labeling, visualization, and report writing. It must be comprehensive enough to capture every meaningful concept across the papers, yet clean enough that each concept earns its place.

SKILL.md

95 lines · ~1.5k tokens

Stats

Language: Shell

Parent stars: 0

Maintenance: Good

Last Commit: Apr 7, 2026

CrystaLit Ontologist

The Ontology Structure

Four levels of hierarchy, plus a cross-cutting structure:

themes:
  T1_Theme_Name:
    description: "What this theme covers"
    subthemes:
      T1-S1_Subtheme_Name:
        description: "What this subtheme covers"
        groups:
          Group_Name:
            description: "What this group covers"
            concepts:
              - Concept_One
              - Concept_Two
              - Concept_Three

lateral_reasoning_pairs:
  - pair: ["Concept_A", "Concept_B"]
    rationale: "Why comparing these two concepts reveals something interesting"

Themes (5-8 typically): Major dimensions of the research landscape. Each theme captures a fundamentally different aspect of the field. Examples from a cardiac CT review: Modelling Techniques, Data, Evaluation, Clinical Application, Anatomy, Clinical Translation.

Subthemes (3-6 per theme): Distinct facets within a theme. Under "Modelling Techniques" you might have Algorithm, Preprocessing, Postprocessing, Model Task, Training Strategy, Loss Functions.

Groups (2-5 per subtheme): Clusters of related concepts. Under "Algorithm" you might have Deep Learning, Ensemble Methods, Classical/Traditional, Foundation Models.

Concepts (3-15 per group): Specific, labelable items. Under "Deep Learning" you might have U-Net, nnU-Net, 3D CNN, Vision Transformer, ResNet. Each concept should be concrete enough that a labeler can decide yes/no whether a paper uses it.

Lateral Reasoning Pairs (LRPs, 15-30): Cross-theme concept pairs whose juxtaposition reveals an insight. Example: pairing "Single_Center" (from Data) with "Foundation_Model" (from Modelling) highlights the tension between large-model ambitions and limited data availability.
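The per-level ranges above can be checked mechanically once the YAML is loaded. The following is a minimal sketch, not part of the skill itself: it assumes the ontology is already available as a plain Python dict with the same layout as the YAML, and the names `RANGES`, `check`, and `size_warnings` as well as the description strings are hypothetical.

```python
# Illustrative ontology as a plain dict, mirroring the YAML layout above.
# (Assumption: the YAML file has already been parsed into this shape.)
ontology = {
    "themes": {
        "T1_Modelling_Techniques": {
            "description": "How models are built and trained",
            "subthemes": {
                "T1-S1_Algorithm": {
                    "description": "Model families",
                    "groups": {
                        "Deep_Learning": {
                            "description": "Neural architectures",
                            "concepts": ["U-Net", "nnU-Net", "ResNet"],
                        },
                    },
                },
            },
        },
    },
    "lateral_reasoning_pairs": [],
}

# Typical ranges quoted in the text: (min, max) children per parent.
RANGES = {"themes": (5, 8), "subthemes": (3, 6), "groups": (2, 5), "concepts": (3, 15)}


def check(level, parent, count, warnings):
    """Append a warning if `count` falls outside the typical range for `level`."""
    low, high = RANGES[level]
    if not low <= count <= high:
        warnings.append(f"{parent}: {count} {level} (typical {low}-{high})")


def size_warnings(ontology):
    """Walk themes -> subthemes -> groups -> concepts and collect range warnings."""
    warnings = []
    themes = ontology.get("themes", {})
    check("themes", "ontology", len(themes), warnings)
    for t_name, theme in themes.items():
        subthemes = theme.get("subthemes", {})
        check("subthemes", t_name, len(subthemes), warnings)
        for s_name, sub in subthemes.items():
            groups = sub.get("groups", {})
            check("groups", s_name, len(groups), warnings)
            for g_name, group in groups.items():
                check("concepts", g_name, len(group.get("concepts", [])), warnings)
    return warnings


for line in size_warnings(ontology):
    print(line)
```

The ranges are described as typical rather than mandatory, so the warnings are prompts for review, not failures.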

The Process

Pass 1: Seed Structure

Read all notes (or a representative sample of 15-20 if the collection is very large). Identify the major dimensions of variation across papers. Draft the theme layer first, then expand downward.

Ask yourself: If I had to explain the entire research landscape to a newcomer using only 6 categories, what would they be?

Pass 2: Populate

Re-read all notes, this time extracting every concrete concept that appears in 2+ papers (or is significant enough in one paper to warrant inclusion). Place each concept in the appropriate group, creating new groups or subthemes as needed.

Watch for concepts that could live in multiple places. Choose the most natural home and make a note of the tension as a candidate LRP.
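The "2+ papers" rule in Pass 2 can be sketched as a simple frequency count. This is an illustration under assumptions: the per-paper concept lists, the paper identifiers, and the `keep_anyway` override set (standing in for the judgment call about single-paper concepts that are significant enough to include) are all hypothetical.

```python
from collections import Counter

# Hypothetical concept mentions extracted from each paper's notes.
mentions = {
    "paper_01": ["U-Net", "Dice_Similarity_Coefficient", "Single_Center"],
    "paper_02": ["nnU-Net", "Dice_Similarity_Coefficient"],
    "paper_03": ["U-Net", "Foundation_Model"],
}

# Concepts judged significant enough to keep despite a single mention (manual call).
keep_anyway = {"Foundation_Model"}


def candidate_concepts(mentions, min_papers=2, keep_anyway=frozenset()):
    """Return concepts seen in at least `min_papers` papers, plus manual overrides."""
    counts = Counter()
    for concepts in mentions.values():
        counts.update(set(concepts))  # count each paper at most once per concept
    return sorted(c for c, n in counts.items() if n >= min_papers or c in keep_anyway)


print(candidate_concepts(mentions, keep_anyway=keep_anyway))
```

Concepts that miss the threshold are not necessarily discarded; they are the ones worth a second look before deciding.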

Pass 3: Refine

Review the entire ontology against four checks:

Balance: no theme should have more than three times as many concepts as another unless the literature genuinely skews that way.
Non-redundancy: merge concepts that are synonyms or near-synonyms.
Naming consistency: use the field's standard terminology, with underscores separating words.
Completeness: check whether any papers feel under-represented in the ontology.
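Two of the Pass 3 checks lend themselves to a quick script. The sketch below assumes per-theme concept counts have already been tallied; the counts, function names, and the normalization rule are hypothetical. The duplicate check only catches spelling variants (case, hyphens, underscores), so true synonyms still need a manual pass.

```python
from collections import defaultdict

# Hypothetical concept counts per theme.
counts = {"Modelling_Techniques": 60, "Data": 25, "Evaluation": 18}


def imbalanced_themes(counts, ratio=3):
    """Return themes with at least `ratio` times the concepts of the smallest theme."""
    if not counts:
        return []
    smallest = min(counts.values())
    return sorted(t for t, n in counts.items() if n > smallest and n >= ratio * smallest)


def near_duplicates(concepts):
    """Group names that are identical once case and punctuation are ignored."""
    buckets = defaultdict(list)
    for name in concepts:
        key = "".join(ch for ch in name.lower() if ch.isalnum())
        buckets[key].append(name)
    return [names for names in buckets.values() if len(names) > 1]


print(imbalanced_themes(counts))
print(near_duplicates(["U-Net", "UNet", "ResNet", "U_Net"]))
```

A flagged theme is not automatically wrong, since the literature may genuinely skew that way; the flag only marks where to look.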

Pass 4: Lateral Reasoning Pairs

Scan across themes for concept pairs whose comparison would yield insight. Good LRPs often connect a methodology concept with a clinical concept, a data limitation with a model ambition, or an evaluation metric with a clinical outcome. Each pair needs a one-sentence rationale explaining what the juxtaposition reveals.
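The requirements for an LRP (two known concepts, drawn from different themes, with a rationale) can be verified with a short script. This is a sketch under assumptions: the ontology is a plain dict shaped like the YAML, and the subtheme and group names under `T2_Data` plus the second pair are invented for illustration.

```python
# Illustrative ontology; "T2-S1_Cohort" and "Study_Design" are hypothetical names.
ontology = {
    "themes": {
        "T1_Modelling_Techniques": {"subthemes": {"T1-S1_Algorithm": {"groups": {
            "Deep_Learning": {"concepts": ["U-Net"]},
            "Foundation_Models": {"concepts": ["Foundation_Model"]},
        }}}},
        "T2_Data": {"subthemes": {"T2-S1_Cohort": {"groups": {
            "Study_Design": {"concepts": ["Single_Center"]},
        }}}},
    },
    "lateral_reasoning_pairs": [
        {"pair": ["Single_Center", "Foundation_Model"],
         "rationale": "Large-model ambitions versus limited data availability"},
        {"pair": ["U-Net", "Foundation_Model"],
         "rationale": "Both are model families"},
    ],
}


def concept_homes(ontology):
    """Map each concept to the theme it lives under."""
    homes = {}
    for theme_name, theme in ontology.get("themes", {}).items():
        for sub in theme.get("subthemes", {}).values():
            for group in sub.get("groups", {}).values():
                for concept in group.get("concepts", []):
                    homes[concept] = theme_name
    return homes


def check_pairs(ontology):
    """Return (concept_a, concept_b, problem) for every pair that fails a check."""
    homes = concept_homes(ontology)
    problems = []
    for entry in ontology.get("lateral_reasoning_pairs", []):
        a, b = entry["pair"]
        if a not in homes or b not in homes:
            problems.append((a, b, "unknown concept"))
        elif homes[a] == homes[b]:
            problems.append((a, b, "same theme"))
        elif not entry.get("rationale", "").strip():
            problems.append((a, b, "missing rationale"))
    return problems


print(check_pairs(ontology))
```

Whether a pair actually yields insight remains a judgment call; the script only rules out structurally invalid pairs.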

Naming Conventions

Use Title_Case_With_Underscores for all concept names. This keeps them readable and parseable as JSON keys later.

Be specific: prefer Dice_Similarity_Coefficient over Overlap_Metric, prefer Left_Ventricle over Heart_Chamber, prefer nnU-Net over Segmentation_Network.

Append the common abbreviation as a suffix when the full name is long: CT_Pulmonary_Angiography_CTPA, Statistical_Shape_Model_SSM.
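A naming check along these lines could be sketched with a regular expression. The pattern below is one possible reading of Title_Case_With_Underscores (each underscore-separated token starts with an uppercase letter or digit), and the `allow` set is an assumption that established names such as nnU-Net keep their canonical spelling, as the examples in this document suggest.

```python
import re

# One interpretation of Title_Case_With_Underscores: tokens joined by underscores,
# each starting with an uppercase letter or a digit (so 3D_CNN and CTPA pass).
NAME_PATTERN = re.compile(r"[A-Z0-9][A-Za-z0-9]*(?:_[A-Z0-9][A-Za-z0-9]*)*")


def non_conforming(names, allow=frozenset()):
    """Return names that break the pattern and are not explicitly allowed."""
    return [n for n in names if n not in allow and not NAME_PATTERN.fullmatch(n)]


names = [
    "Dice_Similarity_Coefficient",
    "left_ventricle",
    "nnU-Net",
    "Statistical_Shape_Model_SSM",
    "Heart Chamber",
]

# Assumption: canonical model names are exempt from the pattern.
print(non_conforming(names, allow={"nnU-Net"}))
```

The pattern checks form only; whether a name is specific enough (Left_Ventricle rather than Heart_Chamber) still needs a human reader.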

Quality Criteria

A good ontology satisfies these tests:

Coverage test: Can every paper in the collection be meaningfully labeled using only concepts from this ontology? If a paper has a major contribution that does not map to any concept, the ontology is incomplete.
Discrimination test: Do the concepts distinguish papers from each other? If every paper gets the same label for a subtheme, that subtheme is too coarse.
Utility test: Would a visualization built from these labels (bar charts, heatmaps, networks) tell a meaningful story about the field? If not, the granularity needs adjustment.
Parsimony test: Can you remove any concept without losing the ability to label a paper accurately? If so, remove it.
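The coverage and discrimination tests can be approximated once papers have trial labels. The sketch below is illustrative only: the label assignments, the flat subtheme-to-concepts mapping, and the concept name `Segmentation` are hypothetical, and "coverage" is reduced to the crude proxy of a paper having at least one known concept.

```python
# Hypothetical mapping from subtheme to the concepts beneath it.
subtheme_concepts = {
    "Algorithm": {"U-Net", "nnU-Net"},
    "Model_Task": {"Segmentation"},
}

# Hypothetical trial labels per paper.
labels = {
    "paper_01": {"U-Net", "Segmentation"},
    "paper_02": {"nnU-Net", "Segmentation"},
    "paper_03": set(),
}


def unlabeled_papers(labels, subtheme_concepts):
    """Coverage proxy: papers that received no concept from the ontology."""
    known = set().union(*subtheme_concepts.values())
    return sorted(p for p, concepts in labels.items() if not concepts & known)


def non_discriminating(labels, subtheme_concepts):
    """Subthemes where every labeled paper gets exactly the same label set."""
    flagged = []
    for subtheme, concepts in subtheme_concepts.items():
        per_paper = {frozenset(c & concepts) for c in labels.values() if c}
        if len(per_paper) == 1:
            flagged.append(subtheme)
    return flagged


print(unlabeled_papers(labels, subtheme_concepts))
print(non_discriminating(labels, subtheme_concepts))
```

Here `Model_Task` is flagged because both labeled papers carry the same single label, which is the situation the discrimination test describes as too coarse.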

Typical Scale

For a collection of 30-60 papers in a well-defined subfield, expect roughly 5-8 themes, 20-35 subthemes, 30-50 groups, 200-400 concepts, and 15-30 LRPs. Larger collections or broader fields will need more; smaller or more focused collections may need fewer.

What to Hand Off

The finalized YAML file goes to the crystalit-labeler for Phase 3 (paper labeling) and to the crystalit-vizmaker for Phase 4 (visualization). The ontology is also used by the crystalit-writer for structuring the literature review report.

Present the YAML to the user at the human-in-the-loop (HITL) checkpoint with a summary: the number of themes, subthemes, groups, concepts, and LRPs, plus a brief narrative of what the ontology reveals about the field's structure.
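The counts for the checkpoint summary can be derived directly from the ontology. This is a minimal sketch assuming the YAML has been loaded into a plain dict; the sample ontology (including the `T2_Data` subtheme and group names) and the function names are hypothetical, and the narrative part of the summary still has to be written by hand.

```python
# Illustrative ontology; names under T2_Data are hypothetical.
ontology = {
    "themes": {
        "T1_Modelling_Techniques": {"subthemes": {"T1-S1_Algorithm": {"groups": {
            "Deep_Learning": {"concepts": ["U-Net", "nnU-Net"]},
            "Foundation_Models": {"concepts": ["Foundation_Model"]},
        }}}},
        "T2_Data": {"subthemes": {"T2-S1_Cohort": {"groups": {
            "Study_Design": {"concepts": ["Single_Center"]},
        }}}},
    },
    "lateral_reasoning_pairs": [
        {"pair": ["Single_Center", "Foundation_Model"],
         "rationale": "Large-model ambitions versus limited data availability"},
    ],
}


def summarize(ontology):
    """Count themes, subthemes, groups, unique concepts, and LRPs."""
    themes = ontology.get("themes", {})
    subthemes = [s for t in themes.values() for s in t.get("subthemes", {}).values()]
    groups = [g for s in subthemes for g in s.get("groups", {}).values()]
    concepts = {c for g in groups for c in g.get("concepts", [])}
    return {
        "themes": len(themes),
        "subthemes": len(subthemes),
        "groups": len(groups),
        "concepts": len(concepts),
        "LRPs": len(ontology.get("lateral_reasoning_pairs", [])),
    }


def summary_line(counts):
    """Render the counts as a single comma-separated line."""
    return ", ".join(f"{n} {name}" for name, n in counts.items())


print(summary_line(summarize(ontology)))
```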
