Use when the user asks to convert, prepare, or organize a medical imaging dataset for nnUNet v2 / nnU-Net training, structure imagesTr/labelsTr folders, write dataset.json, generate splits_final.json, or set up classification labels (cls_data.csv). Triggers on the strings nnUNet, nnU-Net, imagesTr, labelsTr, dataset.json, splits_final.json, classification_labels, NaturalImage2DIO, NibabelIO, SimpleITKIO, Tiff3DIO. Inputs may be NIfTI / MHA / NRRD / PNG / BMP / TIFF; raw DICOM inputs must hand off to the dicom-converter skill first.
Slash command: /flare-skills:nnunet-converter
Convert datasets into nnUNet v2 format with **minimal unnecessary conversion**.
nnUNet v2 natively supports many file formats, so avoid format conversion whenever possible.
This SKILL.md is intentionally compact. Detailed guidance lives in references/*.md and is
loaded on demand based on the input dataset's characteristics. The pointer table at the
end of this file tells you exactly which reference to read for a given situation. Mandatory
references are flagged with MUST read — those are non-negotiable.
Upstream handshake with dicom-converter. If the input is raw DICOM, this skill is NOT the right entry point. Hand off to the dicom-converter skill first; come back here only after NIfTI / MHA / NRRD outputs exist.
Trigger conditions (any one of these → STOP and hand off):
- Any file ending in .dcm, .DCM, or .IMA is present.
- A DICOMDIR file is present.
- A pydicom / SimpleITK.ImageSeriesReader call, or an RTSTRUCT/SEG decoder, appears anywhere in this conversion script.

Why this is non-negotiable. dicom-converter runs a 10-check header-only audit (z-spacing uniformity, multi-acquisition under one SeriesUID, duplicate z, orientation, multi-RTSTRUCT, SOP-UID anchor coverage, FoR linkage, etc.) and routes RTSTRUCT contours / SEG frames by SOP-UID, not by z-coordinate geometry. Inlining a DICOM parser here bypasses every one of those checks and silently produces:

- … AcquisitionNumber values,

These failures are silent: the script runs to completion, the NIfTI looks plausible, and nothing flags the missing or mis-routed voxels until you compare against ground truth. The nnUNet stage assumes correct NIfTI inputs; producing those is dicom-converter's job.

- Tell the user: "… dicom-converter skill to produce NIfTI before we format for nnUNet".
- Follow dicom-converter's workflow: audit (scripts/audit_dicom_dataset.py), build the SOP-UID map (scripts/build_sop_to_acq.py) if dirty, parse multi-RTSTRUCT directories (scripts/parse_rtstruct_union.py) when applicable, and write the NIfTI / MHA / NRRD outputs.

If the user insists on doing the DICOM step inside nnunet-converter, refuse and point at dicom-converter. The handshake exists because the failure modes are invisible without it.
nnUNet v2 supports multiple file formats natively via its ReaderWriter abstraction.
Do NOT convert files to a different format unless strictly necessary.
- .nii.gz → keep as .nii.gz
- .mha → keep as .mha (do NOT convert to .nii.gz)
- .nrrd → keep as .nrrd
- .png / .bmp → keep as .png / .bmp (do NOT wrap in NIfTI)
- .tif / .tiff → keep as .tif
- .jpg / .jpeg → must convert to .png (JPEG is lossy; nnUNet requires lossless)

The only valid reasons to convert format are:

- The source is lossy (.jpg): convert to .png.
- The dataset mixes formats (nnUNet expects a single file_ending per dataset).

| ReaderWriter | Extensions | Notes |
|---|---|---|
| NaturalImage2DIO | .png, .bmp, .tif | 2D natural images. RGB stored in a single file (no channel split). |
| NibabelIO | .nii.gz, .nrrd, .mha | Standard 3D medical imaging. |
| NibabelIOWithReorient | .nii.gz, .nrrd, .mha | Same as NibabelIO, reorients to RAS. |
| SimpleITKIO | .nii.gz, .nrrd, .mha | Alternative 3D reader. |
| Tiff3DIO | .tif, .tiff | 3D TIFF stacks. Requires companion .json with spacing. |
IMPORTANT: nnUNet requires lossless (or no) compression. No .jpg.
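The keep-or-convert rule above can be sketched as a small lookup. This is an illustration only, not part of the skill's scripts: the helper name plan_format_action and its return strings are assumptions made for this example.

```python
from pathlib import Path

# Extensions the section says to leave untouched.
KEEP_AS_IS = (".nii.gz", ".mha", ".nrrd", ".png", ".bmp", ".tif", ".tiff")
# Lossy inputs that the section says must be re-encoded as PNG.
LOSSY = (".jpg", ".jpeg")


def plan_format_action(path: str) -> str:
    """Return 'keep', 'convert_to_png', or 'unsupported' for one input file.

    Hypothetical helper: the name and the return values are illustrative.
    """
    name = Path(path).name.lower()  # compare extensions case-insensitively
    if name.endswith(KEEP_AS_IS):
        return "keep"
    if name.endswith(LOSSY):
        return "convert_to_png"
    return "unsupported"
```

Anything reported as unsupported still needs a human decision. In particular, raw DICOM files land in that bucket here and remain subject to the dicom-converter hand-off.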
```
nnUNet_raw/
└── Dataset{ID}_{Name}/                  # e.g. Dataset042_LiverSeg
    ├── dataset.json
    ├── imagesTr/
    │   ├── case_0001_0000.nii.gz        # channel 0 of case 0001
    │   ├── case_0001_0001.nii.gz        # channel 1 (multi-modal only)
    │   └── ...
    ├── labelsTr/
    │   ├── case_0001.nii.gz             # segmentation mask (NO channel suffix)
    │   └── ...
    └── imagesTs/                        # optional test images (no labels needed)
        └── ...
```
File naming rule: {CASE_ID}_{XXXX}.{FILE_ENDING} for images, {CASE_ID}.{FILE_ENDING} for labels.
- CASE_ID: any string, e.g. liver_001, BRATS_042.
- XXXX: 4-digit zero-padded channel identifier (_0000, _0001, ...).
- For single-channel data, only _0000 exists.
- For .png RGB natural images, all 3 colour channels are stored in a single file with suffix _0000. Do NOT split RGB into separate files. Details in references/2d_images.md.

The five steps below are the canonical conversion flow. Each step lists the references you must load before doing the work for the relevant scenario.
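The file naming rule above can be sketched as three small helpers. The function names are hypothetical, and file_ending is assumed to include its leading dot, as in the dataset.json example (".nii.gz").

```python
import re
from typing import Tuple


def image_filename(case_id: str, channel: int, file_ending: str) -> str:
    """Build {CASE_ID}_{XXXX}{file_ending} with a 4-digit zero-padded channel."""
    if not 0 <= channel <= 9999:
        raise ValueError("channel must fit in 4 digits")
    return f"{case_id}_{channel:04d}{file_ending}"


def label_filename(case_id: str, file_ending: str) -> str:
    """Build {CASE_ID}{file_ending}; labels carry no channel suffix."""
    return f"{case_id}{file_ending}"


def split_image_filename(filename: str, file_ending: str) -> Tuple[str, int]:
    """Recover (case_id, channel) from an image file name."""
    if not filename.endswith(file_ending):
        raise ValueError(f"{filename!r} does not end with {file_ending!r}")
    stem = filename[: -len(file_ending)]
    # The case ID may itself contain underscores; only the final _XXXX is the channel.
    match = re.fullmatch(r"(.+)_(\d{4})", stem)
    if match is None:
        raise ValueError(f"{filename!r} lacks a _XXXX channel suffix")
    return match.group(1), int(match.group(2))
```

For example, image_filename("liver_001", 0, ".nii.gz") gives liver_001_0000.nii.gz, and the matching label name is liver_001.nii.gz.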
Archive inventory first. If the input came from a zip/tar/7z archive or an
already-extracted archive folder, list the full archive contents or extracted
folder tree before choosing the input folder. Inspect every candidate folder,
including names such as _preprocessed, preprocessed, processed, and
derived; "prefer least processed data" does not mean skipping processed-looking
folders before you know what they contain.
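One way to take the inventory described above for a zip archive is sketched below. It uses only the standard library and does not extract anything. The function name inventory_zip is hypothetical, and tar or 7z inputs would need a different reader.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def inventory_zip(archive_path: str) -> dict:
    """Count files per top-level folder and per extension inside a zip archive."""
    per_folder = Counter()
    per_suffix = Counter()
    with zipfile.ZipFile(archive_path) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            member = PurePosixPath(info.filename)
            # Files at the archive root are grouped under ".".
            top = member.parts[0] if len(member.parts) > 1 else "."
            per_folder[top] += 1
            name = member.name.lower()
            # Treat the double extension .nii.gz as one suffix.
            suffix = ".nii.gz" if name.endswith(".nii.gz") else member.suffix.lower()
            per_suffix[suffix] += 1
    return {"folders": dict(per_folder), "suffixes": dict(per_suffix)}
```

The per-folder counts make processed-looking folders such as _preprocessed visible before any input folder is chosen.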
FIRST CHECK — is the input DICOM? If yes (any .dcm / .DCM / .IMA / DICOMDIR / RTSTRUCT / SEG present), STOP. Hand off to the dicom-converter skill per the upstream-handshake section above. Do not proceed past Step 1 until the upstream stage has emitted NIfTI / MHA / NRRD outputs. You MUST NOT write DICOM-parsing code in this skill.
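The first check can be done on file names alone, without parsing any DICOM content. The sketch below is one possible way to do it. The name find_dicom_markers is hypothetical, and extensionless DICOM files, or RTSTRUCT / SEG objects saved under other names, would not be caught by it.

```python
from pathlib import Path

DICOM_SUFFIXES = {".dcm", ".ima"}  # compared case-insensitively


def find_dicom_markers(root: str, limit: int = 5) -> list:
    """Return up to `limit` paths under `root` whose names look like raw DICOM."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        if path.suffix.lower() in DICOM_SUFFIXES or path.name.upper() == "DICOMDIR":
            hits.append(str(path))
            if len(hits) >= limit:
                break
    return hits
```

A non-empty result means STOP and hand off to dicom-converter. An empty result is necessary but not sufficient evidence that the input is DICOM-free.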
Once the input is confirmed to be a non-DICOM supported format, determine:
- Folder layout: read references/input_layouts.md before pairing images with labels.
- File format: supported, .jpg (convert to .png), or unsupported (convert to nearest supported). If you see .dcm here, return to the FIRST CHECK above; this skill does not parse DICOM.
- … imagesTr?
- Channel names, which select the normalization scheme:
  - "CT" → CT-specific global normalization (clip 0.5/99.5 percentile + z-score on foreground).
  - "noNorm" → no normalization.
  - "rescale_to_0_1" → rescale intensities to [0, 1].
  - "rgb_to_0_1" → uint8 / 255 (use for RGB natural images).
  - "zscore" or anything else → per-image z-score (default for MRI).
- Classification labels: read references/classification_labels.md before writing any classification CSV or classification_labels block.
- Multi-modal inputs: read references/multi_modal.md before the conversion script touches imaging data.

Use scripts/convert_template.py as a starting point for complex / non-NIfTI inputs.
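The channel-name rule listed above amounts to a lookup with a default. The sketch below only restates that list. The function name is hypothetical, the returned strings are descriptions rather than nnUNet class names, and exact-match lookup is an assumption (check nnUNet itself for how it treats letter case).

```python
def normalization_for(channel_name: str) -> str:
    """Describe the normalization this section associates with a channel name."""
    schemes = {
        "CT": "CT-specific global normalization",
        "noNorm": "no normalization",
        "rescale_to_0_1": "rescale intensities to [0, 1]",
        "rgb_to_0_1": "uint8 / 255",
    }
    # "zscore" and every unlisted name (for example "T1" or "FLAIR")
    # fall back to per-image z-score.
    return schemes.get(channel_name, "per-image z-score")
```

The practical consequence is that a typo in a channel name does not raise an error; it silently selects per-image z-score.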
Shortcut for the simple-NIfTI case. If and only if all of these hold:

- every input file is .nii.gz,
- the layout is raw-dir/images/<case>_<chan>.nii.gz + raw-dir/labels/<case>.nii.gz,

then you MAY use scripts/make_nnunet_dataset_simple.py directly instead of writing your own script. It copies files, writes dataset.json, and writes a seeded splits_final.json in one shot. Do not use it for any other layout.
Choose dependencies by input format:
- .nii.gz / .mha / .nrrd → SimpleITK or nibabel.
- .png / .jpg / .bmp → Pillow.
- .tif / .tiff (2D) → Pillow. .tif (3D stacks) → tifffile.

Mandatory pre-reads, depending on the data you have:

- 2D images (PNG / BMP / TIFF, including RGB): read references/2d_images.md before writing any conversion code.
- 3D TIFF stacks: read references/3d_tiff.md before writing any conversion code.
- Multi-modal inputs: read references/multi_modal.md before writing any conversion code.
- Label remapping, ignore label, or region-based training: read references/label_handling.md before writing any label-remapping code.
- Case-level classification labels: read references/classification_labels.md before writing cls_data.csv or classification_labels.

Universal rules to enforce in the script:

- Output files must use the file_ending declared in dataset.json.
- Use a single file_ending across the dataset.
- … imagesTr and labelsTr.
- For .jpg / .jpeg inputs, convert to .png (lossless).

You MUST read references/dataset_json_spec.md before writing or modifying dataset.json. Do not rely on memory of the schema: required fields and optional fields both have subtle rules (e.g. sort_keys=False for region-based labels).
Minimal required fields:
```json
{
  "channel_names": {"0": "CT"},
  "labels": {"background": 0, "liver": 1},
  "numTraining": 51,
  "file_ending": ".nii.gz"
}
```
Optional but recommended: name, description, reference, licence, overwrite_image_reader_writer.
overwrite_image_reader_writer is optional: nnUNet auto-detects the correct ReaderWriter from the file extension. Set it only if auto-detection fails or you need a specific reader (e.g., NibabelIOWithReorient to force RAS reorientation, or Tiff3DIO for 3D TIFF stacks).
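A minimal sketch of writing the four required fields from Python follows. The helper name write_dataset_json is hypothetical, and references/dataset_json_spec.md remains the authority on the schema. The sketch passes sort_keys=False explicitly so that label order is preserved, as the note on region-based labels requires.

```python
import json
from pathlib import Path


def write_dataset_json(dataset_dir, channel_names, labels, num_training, file_ending):
    """Write the minimal required fields to <dataset_dir>/dataset.json."""
    dataset = {
        # JSON object keys are strings, so channel indices are stored as "0", "1", ...
        "channel_names": {str(k): v for k, v in channel_names.items()},
        "labels": labels,
        "numTraining": int(num_training),
        "file_ending": file_ending,
    }
    out_path = Path(dataset_dir) / "dataset.json"
    # sort_keys=False keeps the labels in the order they were given.
    out_path.write_text(json.dumps(dataset, indent=2, sort_keys=False) + "\n")
    return out_path
```

Calling write_dataset_json(dataset_dir, {0: "CT"}, {"background": 0, "liver": 1}, 51, ".nii.gz") reproduces the minimal example shown above.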
```bash
ls imagesTr/ | wc -l   # should equal numTraining * num_channels
ls labelsTr/ | wc -l   # should equal numTraining
ls imagesTr/ | head -5 # spot-check naming
ls labelsTr/ | head -5

# If nnUNet is installed:
nnUNetv2_plan_and_preprocess -d {ID} --verify_dataset_integrity
```
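The same count checks can be scripted, with a per-case pairing check added on top. This is a sketch under the naming rule used in this document, not a replacement for --verify_dataset_integrity. The function name check_counts is hypothetical.

```python
from pathlib import Path


def check_counts(dataset_dir, num_training, num_channels, file_ending):
    """Return a list of problems found in imagesTr / labelsTr (empty if none)."""
    root = Path(dataset_dir)
    images = sorted(p.name for p in (root / "imagesTr").iterdir() if p.name.endswith(file_ending))
    labels = sorted(p.name for p in (root / "labelsTr").iterdir() if p.name.endswith(file_ending))
    problems = []
    expected_images = num_training * num_channels
    if len(images) != expected_images:
        problems.append(f"imagesTr has {len(images)} files, expected {expected_images}")
    if len(labels) != num_training:
        problems.append(f"labelsTr has {len(labels)} files, expected {num_training}")
    # Every label needs one image per channel, named <case>_<XXXX><file_ending>.
    image_set = set(images)
    for label in labels:
        case_id = label[: -len(file_ending)]
        for channel in range(num_channels):
            expected = f"{case_id}_{channel:04d}{file_ending}"
            if expected not in image_set:
                problems.append(f"missing image {expected}")
    return problems
```

An empty list means the counts and names line up. It says nothing about geometry or label values, which the nnUNet integrity check covers.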
Recommended visual QC after conversion: use the sibling dicom-converter
overlay-video helper documented in dicom-converter/references/visualization_qc.md
to generate a small random sample of image+label videos. This is recommended for
routine conversions and mandatory for any recovered failed case.
After every successful conversion you MUST read references/conversion_notes_template.md and append a fully-populated entry to conversion_notes.md in the nnUNet_raw/ directory. This step is non-negotiable — it is the only durable record of source paths, dropped files, label mapping, and licence.
If conversion_notes.md does not exist, create it with a # nnUNet Dataset Conversion Notes header first.
Skipping this step is treated as an incomplete conversion.
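The create-if-missing-then-append behaviour described above might look like the sketch below. The helper name is hypothetical, and the entry text itself must come from references/conversion_notes_template.md; the sketch only handles the file mechanics.

```python
from pathlib import Path

HEADER = "# nnUNet Dataset Conversion Notes\n"


def append_conversion_note(nnunet_raw_dir, entry_markdown):
    """Append one entry to conversion_notes.md, creating it with its header if needed."""
    notes_path = Path(nnunet_raw_dir) / "conversion_notes.md"
    if not notes_path.exists():
        notes_path.write_text(HEADER, encoding="utf-8")
    # Append mode leaves earlier entries untouched.
    with notes_path.open("a", encoding="utf-8") as handle:
        handle.write("\n" + entry_markdown.rstrip("\n") + "\n")
    return notes_path
```

Appending rather than rewriting keeps earlier entries intact, which matters because the file is the only durable record of previous conversions.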
If the user wants reproducible cross-validation (almost always — anything that will train a model needs frozen splits), you MUST read references/splits_and_provenance.md before generating splits_final.json. The reference covers:
- Where splits_final.json lives ($nnUNet_preprocessed, not $nnUNet_raw).
- The _manifest.json companion (run scripts/write_manifest.py). This is optional but recommended; the mandatory human-readable record is still conversion_notes.md (Step 5).

Record the seed and num_folds in your Step 5 conversion-notes entry under Notes.
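A seeded split can be generated along the lines below. This sketch assumes that splits_final.json is a list with one {"train": [...], "val": [...]} object per fold; references/splits_and_provenance.md is the authority on the format and location. The function names and the default seed are illustrative.

```python
import json
import random
from pathlib import Path


def make_splits(case_ids, num_folds=5, seed=0):
    """Return one {"train": [...], "val": [...]} dict per fold, reproducibly."""
    cases = sorted(set(case_ids))
    if num_folds < 2 or num_folds > len(cases):
        raise ValueError("num_folds must be between 2 and the number of cases")
    rng = random.Random(seed)  # local generator, so global random state is untouched
    rng.shuffle(cases)
    splits = []
    for fold in range(num_folds):
        val = sorted(cases[fold::num_folds])  # every num_folds-th case after shuffling
        val_set = set(val)
        train = sorted(c for c in cases if c not in val_set)
        splits.append({"train": train, "val": val})
    return splits


def write_splits(preprocessed_dataset_dir, splits):
    """Write splits_final.json into the dataset's folder under nnUNet_preprocessed."""
    out_path = Path(preprocessed_dataset_dir) / "splits_final.json"
    out_path.write_text(json.dumps(splits, indent=2) + "\n")
    return out_path
```

Sorting the case IDs before shuffling makes the result depend only on the set of cases and the seed, not on directory listing order. The seed and num_folds passed here are the values to record in the conversion notes.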
| Situation | What you MUST read |
|---|---|
| Migrating from MSD or nnU-Net v1 | references/migration_and_inference.md |
| Setting up nnUNet_raw / nnUNet_preprocessed / nnUNet_results env vars | references/migration_and_inference.md |
| Preparing inference inputs to match a trained model | references/migration_and_inference.md |
| Writing a custom splits_final.json for cross-validation | references/splits_and_provenance.md (preferred) or references/migration_and_inference.md (older brief) |
When the situation matches the left column, the rule on the right is mandatory.
| Situation | Action |
|---|---|
| Input came from an archive or extracted archive folder | List all archive/extracted-folder contents before choosing the input folder; do this before the DICOM handoff check. |
| Input is raw DICOM (.dcm / DICOMDIR / RTSTRUCT / SEG) | STOP. Hand off to the dicom-converter skill per the upstream-handshake section above. Do NOT parse DICOM in this skill. Re-enter only after NIfTI/MHA/NRRD outputs exist. |
| 2D images (PNG / BMP / TIFF including RGB) | MUST read references/2d_images.md before writing conversion code. |
| 3D TIFF stacks (Tiff3DIO, multi-frame TIFF) | MUST read references/3d_tiff.md before writing conversion code. |
| Multi-modal MRI / different resolutions per modality | MUST read references/multi_modal.md before writing conversion code. |
| Case-level classification labels (cls_data.csv) | MUST read references/classification_labels.md before writing classification metadata. |
| Label remapping, ignore label, region-based training | MUST read references/label_handling.md before writing label code. |
| Non-trivial / unfamiliar source folder layout | MUST read references/input_layouts.md before pairing images with labels. |
| Writing or editing dataset.json | MUST read references/dataset_json_spec.md before writing JSON. |
| Step 5 — Conversion notes (every conversion) | MUST read references/conversion_notes_template.md and append the fully-populated entry. |
| Generating splits_final.json or the optional _manifest.json | MUST read references/splits_and_provenance.md before generating either file. |
| Post-conversion visual QC or recovered failed-case review | Use dicom-converter/scripts/make_overlay_qc_videos.py as documented in dicom-converter/references/visualization_qc.md. |
| MSD / nnU-Net v1 migration, env vars, inference, custom splits | MUST read references/migration_and_inference.md. |
```
nnunet-converter/
├── SKILL.md                            # This file (entry point)
├── references/
│   ├── dataset_json_spec.md            # dataset.json field reference
│   ├── 2d_images.md                    # PNG / BMP / RGB natural images
│   ├── 3d_tiff.md                      # Tiff3DIO + companion .json spacing
│   ├── multi_modal.md                  # multi-channel layout + spatial resampling
│   ├── classification_labels.md        # cls_data.csv + classification_labels
│   ├── label_handling.md               # validation, ignore label, region-based
│   ├── input_layouts.md                # Layouts A / B / C / D
│   ├── conversion_notes_template.md    # mandatory Step 5 log template
│   ├── splits_and_provenance.md        # splits_final.json + optional _manifest.json
│   └── migration_and_inference.md      # env vars, MSD/v1, splits, inference
├── scripts/
│   ├── convert_template.py             # reusable Python conversion template (complex inputs)
│   ├── make_nnunet_dataset_simple.py   # CLI for the simple 3D-NIfTI case (writes splits_final.json)
│   └── write_manifest.py               # optional _manifest.json provenance writer
└── README.md
```
Both make_nnunet_dataset_simple.py and write_manifest.py were adapted from
ryanwangk/medimg_skills (MIT). Acquisition
of medical datasets (TCGA/GDC, Kaggle, HuggingFace, Google Drive, sbatch) is intentionally
out of scope here — it lives in a separate dataset-acquisition skill.
npx claudepluginhub medfm-flare/flare-skills --plugin flare-skills