Skill

sample2barcode-generation

Generate Sample2Barcode files for CellRanger multi demultiplexing workflows on FGCZ infrastructure. Supports HTO/CMO (hashtag oligo with TotalSeq-B/C antibodies), OCM (On-Chip Multiplexing with OB1-OB4 barcodes), and Flex v2 probe barcode multiplexing. Use when creating demultiplexing metadata for SUSHI CellRangerMulti app.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/sequencing-pipelines:sample2barcode-generation

User invocable

Model invocable

Inline context

Default effort

Supporting Files

assets/templates/sample2barcode_double_hash.csv
assets/templates/sample2barcode_flexv2_template.csv
assets/templates/sample2barcode_ocm_template.csv
assets/templates/sample2barcode_template.csv
references/barcode_sequences.md
references/flex_v2_cellranger.md
references/sushi_integration.md

SKILL.md

507 lines · ~4.2k tokens

Stats

Language: Python

Parent stars: 0

Maintenance: Excellent

Last Commit: May 27, 2026

Sample2Barcode Generation Skill

Overview

Generate Sample2Barcode.csv files for CellRanger multi demultiplexing workflows on FGCZ infrastructure.

Important: This format is FGCZ/SUSHI-specific. The SUSHI CellRangerMulti app reads these files and converts them to the native Cell Ranger multi config format with [samples] sections. The quoting conventions and file locations are specific to FGCZ workflow integration.

This skill supports:

OCM demultiplexing - On-Chip Multiplexing with OB1-OB4 barcodes (read-level)
HTO/CMO demultiplexing - TotalSeq-B/C hashtag antibodies and 10x CMO (cell-level)
Flex v2 probe barcoding - Fixed RNA Panel v2 plate-based multiplexing
File format specifications - Exact CSV format required by SUSHI/ezRun
Folder structure - Where to place files in gstore
Double-hashing support - Pipe-separated IDs for multiple barcodes per sample (HTO only)
SUSHI integration - Required parameters for CellRangerMulti app

When to Use

Use this skill when:

Setting up OCM (On-Chip Multiplexing) experiments with OB1-OB4 barcodes
Setting up HTO (hashtag oligo) cell hashing experiments for CellRanger
Configuring Flex v2 probe barcode multiplexing (4-plex to 384-plex)
Creating demultiplexing metadata for SUSHI CellRangerMulti app
Mapping samples to OCM barcodes, TotalSeq-B, TotalSeq-C, or Flex v2 probe barcodes
Configuring single or double-hashing demultiplexing workflows

Quick Reference

File Format - OCM (Sample2Barcode.csv)

sample_id,ocm_barcode_ids
Sample1,OB1
Sample2,OB2
Sample3,OB3
Sample4,OB4

Requirements:

Column names: sample_id,ocm_barcode_ids (NOT quoted)
Values are NOT quoted
Valid barcode IDs: OB1, OB2, OB3, OB4 only (4-plex max)
One file per pool: {PoolName}_Sample2Barcode.csv
No barcode reference file needed - chemistry handles it internally

Key difference from HTO/CMO: OCM is read-level multiplexing where the barcode info is embedded in the Gene Expression reads. No separate Multiplexing Capture fastqs or barcode reference files are required.
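
As a minimal sketch of the OCM format above, the following Python builds the unquoted two-column file and rejects barcode IDs outside OB1-OB4. The function name `ocm_csv_text` is hypothetical and is not part of SUSHI or ezRun.

```python
import csv
import io

VALID_OCM_BARCODES = {"OB1", "OB2", "OB3", "OB4"}

def ocm_csv_text(mapping):
    """Return Sample2Barcode.csv text for one OCM pool (unquoted values)."""
    if len(mapping) > 4:
        raise ValueError("OCM supports at most 4 samples per pool")
    bad = sorted(set(mapping.values()) - VALID_OCM_BARCODES)
    if bad:
        raise ValueError(f"invalid OCM barcode IDs: {bad}")
    buf = io.StringIO()
    # QUOTE_MINIMAL leaves plain sample names and barcode IDs unquoted.
    writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writerow(["sample_id", "ocm_barcode_ids"])
    writer.writerows(mapping.items())
    return buf.getvalue()

print(ocm_csv_text({"Sample1": "OB1", "Sample2": "OB2"}), end="")
```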

File Format - HTO/CMO (Sample2Barcode.csv)

"sample_id","cmo_ids"
"Sample1","B0301"
"Sample2","B0302"
"Sample3","B0303"

Requirements:

Column names: "sample_id","cmo_ids" (quoted for SUSHI compatibility)
Values quoted for SUSHI/ezRun compatibility
Barcode IDs must match those in the system reference file
One file per pool: {PoolName}_Sample2Barcode.csv

Note: Cell Ranger also supports hashtag_ids as a separate column for HTO-only experiments. The SUSHI integration uses cmo_ids for both CMO and HTO multiplexing.
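
The quoting requirement can be met from Python with the standard `csv` module. This is a sketch only; `hto_csv_text` is a hypothetical helper name, and it assumes one barcode ID per sample.

```python
import csv
import io

def hto_csv_text(mapping):
    """Return Sample2Barcode.csv text with the header and every value quoted."""
    buf = io.StringIO()
    # QUOTE_ALL quotes every field, including the header row.
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(["sample_id", "cmo_ids"])
    for sample_id, cmo_id in mapping.items():
        writer.writerow([sample_id, cmo_id])
    return buf.getvalue()

print(hto_csv_text({"Sample1": "B0301", "Sample2": "B0302"}), end="")
```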

File Format - Flex v2 Probe Barcodes (Sample2Barcode.csv)

sample_id,probe_barcode_ids,description
Sample1,A-A01,Sample1
Sample2,A-B01,Sample2
Sample3,A-C01,Sample3
Sample4,A-D01,Sample4

Requirements:

Column names: sample_id,probe_barcode_ids,description (NOT quoted)
Values are NOT quoted
Probe barcode format: <plate>-<well> (e.g., A-A01, B-C05)
Available barcodes: A-A01 through P-P24 (384 total)
One file per pool: {PoolName}_Sample2Barcode.csv
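
A short sketch of writing the three-column Flex v2 file, reusing the sample name as the description as in the example above. The helper name `flex_csv_text` is hypothetical, and the regular expression is only a loose shape check for `<plate>-<well>` (an assumption); it does not confirm that a barcode exists in the probe barcode file.

```python
import re

# Assumption: one plate letter, a dash, one row letter, a two-digit column.
PROBE_BARCODE_SHAPE = re.compile(r"^[A-Z]-[A-Z][0-9]{2}$")

def flex_csv_text(mapping):
    """Return unquoted Sample2Barcode.csv text for a Flex v2 pool."""
    lines = ["sample_id,probe_barcode_ids,description"]
    for sample_id, barcode in mapping.items():
        if not PROBE_BARCODE_SHAPE.match(barcode):
            raise ValueError(f"unexpected probe barcode format: {barcode}")
        # The description column repeats the sample name.
        lines.append(f"{sample_id},{barcode},{sample_id}")
    return "\n".join(lines) + "\n"

print(flex_csv_text({"Sample1": "A-A01", "Sample2": "A-B01"}), end="")
```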

Folder Structure

/srv/gstore/projects/pXXXXX/oXXXXX_metaData/
├── Pool1_Sample2Barcode.csv
├── Pool2_Sample2Barcode.csv
└── ... (one per pool)
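
The naming convention can be expressed as a small path helper. This is an illustrative sketch; `sample2barcode_path` is a hypothetical name, and the project, order, and pool values are taken from the multi-pool example in this document.

```python
from pathlib import PurePosixPath

def sample2barcode_path(project, order, pool_name):
    """Build the gstore path for one pool's Sample2Barcode.csv file."""
    meta_dir = PurePosixPath("/srv/gstore/projects") / project / f"{order}_metaData"
    return meta_dir / f"{pool_name}_Sample2Barcode.csv"

print(sample2barcode_path("p39685", "o39834", "WT_PBS_Epi"))
# /srv/gstore/projects/p39685/o39834_metaData/WT_PBS_Epi_Sample2Barcode.csv
```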

System Reference Files

HTO/CMO Files - Located at /misc/GT/databases/10x/CMO_files/:

File	Species	Hashtag Type	IDs
`10x_BL_TotalSeqB_20230620_v1_AntibodyCapture.csv`	Mouse	TotalSeq-B	B0251-B0308
`10x_BL_TotalSeqC_20230620_v1_AntibodyCapture.csv`	Human	TotalSeq-C	C0251-C0310
`10x_CMO_20230620_v1.csv`	Any	10x CMO (lipid)	CMO301-CMO312

Common hashtags:

Mouse: B0301-B0305 (TotalSeq-B)
Human: C0251-C0258 (TotalSeq-C)

Flex v2 Probe Barcode Files - Located at /srv/GT/databases/10x_Probesets/Chromium/:

File	Purpose	Barcodes
`flex-v2-384.txt`	Probe barcode sequences	A-A01 through P-P24 (384 total)
`Chromium_Human_Transcriptome_Probe_Set_v2.0.0_GRCh38-2024-A.csv`	Human probe set	GRCh38 reference
`Chromium_Mouse_Transcriptome_Probe_Set_v2.0.0_GRCm39-2024-A.csv`	Mouse probe set	GRCm39 reference

Flex v2 plate layout:

Row plates: A through P (16 rows)
Column positions: A01 through P24 (24 columns)
Total: 384 unique barcodes (16 × 24)

Core Workflows

Workflow 1: OCM (On-Chip Multiplexing)

For 10x Genomics 3' v3.1 HT, v4, or 5' v3 kits with on-chip multiplexing.

Technology: Read-level multiplexing with overhang barcodes (OB1-OB4)
Cell Ranger version: 9.0.0+ required
Chemistry: Auto-detected (SC3Pv3-polyA-OCM, SC3Pv4-polyA-OCM, etc.)
Max plex: 4 samples per pool

Input: Sample-to-OCM barcode mapping from plate layout

Sample	OCM Barcode
SKNSH_undiff_1	OB1
SKNSH_undiff_2	OB2
SKNSH_undiff_3	OB3
SKNSH_undiff_4	OB4

Output: SKNSH_undiff_Sample2Barcode.csv

sample_id,ocm_barcode_ids
SKNSH_undiff_1,OB1
SKNSH_undiff_2,OB2
SKNSH_undiff_3,OB3
SKNSH_undiff_4,OB4

SUSHI Configuration:

TenXLibrary = GEX,Multiplexing
MultiplexingType = ocm
MultiplexBarcodeSet = (leave empty - not needed)

Key differences from HTO/CMO:

Column name: ocm_barcode_ids (not cmo_ids)
Values NOT quoted
No barcode reference file needed (no MultiplexBarcodeSet)
No Multiplexing Capture library needed (the barcode is read from the GEX fastqs)
No MultiDataDir column needed in dataset

Workflow 2: Flex v2 Probe Barcode Multiplexing

For plate-based multiplexing with Flex v2 chemistry (4-plex to 384-plex).

Technology: Fixed RNA Panel v2 with on-chip probe barcodes
Cell Ranger version: 10.0.0+ required (will fail with earlier versions)
Chemistry: MFRP for multiplexed / SFRP for singleplex (auto-detected by ezRun)

Input: Sample-to-probe barcode mapping from plate layout

Sample	Probe Barcode	Well Position
Bcells_Donor1	A-A01	Plate A, well A01
Bcells_Donor2	A-B01	Plate A, well B01
Bcells_Donor3	A-C01	Plate A, well C01
Bcells_Donor4	A-D01	Plate A, well D01

Output: Bcells_Sample2Barcode.csv

sample_id,probe_barcode_ids,description
Bcells_Donor1,A-A01,Bcells_Donor1
Bcells_Donor2,A-B01,Bcells_Donor2
Bcells_Donor3,A-C01,Bcells_Donor3
Bcells_Donor4,A-D01,Bcells_Donor4

SUSHI Configuration:

TenXLibrary = fixedRNA,Multiplexing
probesetFile = Chromium_Human_Transcriptome_Probe_Set_v2.0.0_GRCh38-2024-A.csv
MultiplexingType = (ignored for Flex v2)
chemistry = MFRP

Key differences from HTO:

Column name: probe_barcode_ids (not cmo_ids)
Values NOT quoted
Third column description required
Uses probe set reference instead of feature reference
No MultiplexBarcodeSet parameter needed

Do not set the following for Flex v2. They apply to HTO/CMO only, and setting them can cause errors:

MultiplexingType: ignored when fixedRNA is in TenXLibrary
FeatureBarcodeFile: HTO/CITE-seq only
MultiplexBarcodeSet: HTO/CMO only

How ezRun detects Flex v2: When fixedRNA is in TenXLibrary, EzAppCellRangerMulti automatically skips all HTO/CMO logic, uses probe_barcode_ids format, and ignores MultiplexingType.
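
The detection rule described above can be illustrated in a few lines of Python. This is not ezRun's code (ezRun is written in R); `barcode_column` is a hypothetical function that only mirrors the behavior stated in this document.

```python
def barcode_column(tenx_library, multiplexing_type=""):
    """Pick the Sample2Barcode column implied by the SUSHI parameters."""
    libraries = [item.strip() for item in tenx_library.split(",")]
    if "fixedRNA" in libraries:
        # Flex v2: MultiplexingType is ignored entirely.
        return "probe_barcode_ids"
    if multiplexing_type == "ocm":
        return "ocm_barcode_ids"
    if multiplexing_type == "antibody":
        return "cmo_ids"
    raise ValueError(f"unsupported MultiplexingType: {multiplexing_type!r}")

print(barcode_column("fixedRNA,Multiplexing", "antibody"))  # probe_barcode_ids
print(barcode_column("GEX,Multiplexing", "ocm"))            # ocm_barcode_ids
```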

Flex v2 Troubleshooting:

Error	Cause and fix
"Column probe_barcode_ids not found"	File uses the HTO format; use the `probe_barcode_ids` column, unquoted
"Chemistry detection failed"	Cell Ranger is older than 10.0.0; run `module load Aligner/CellRanger/10.0.0`
"Probe set file not found"	Wrong filename or version; use the exact filename from the probe set table
"Feature reference not found"	`FeatureBarcodeFile` was set; leave it empty for Flex v2

For detailed reference (probe set file paths, plex tables, g-req copy example): see references/flex_v2_cellranger.md.

Workflow 3: Single-Hashtag Samples (HTO/CMO Standard)

Most common scenario - each sample labeled with one hashtag.

Input: Sample-to-hashtag mapping from lab

Sample	Hashtag
WT_PBS_mouse1	B0303
WT_PBS_mouse2	B0304
WT_PBS_mouse3	B0305

Output: WT_PBS_Sample2Barcode.csv

"sample_id","cmo_ids"
"WT_PBS_mouse1","B0303"
"WT_PBS_mouse2","B0304"
"WT_PBS_mouse3","B0305"

Workflow 4: Double-Hashtag Samples (HTO)

For increased confidence - samples labeled with two hashtags (pipe-separated).

Input: Sample-to-hashtag mapping with multiple hashtags

Sample	Hashtags
Sample_mouse1	B0303
Sample_mouse2	B0301 + B0304
Sample_mouse3	B0302 + B0305

Output: Sample_Sample2Barcode.csv

"sample_id","cmo_ids"
"Sample_mouse1","B0303"
"Sample_mouse2","B0301|B0304"
"Sample_mouse3","B0302|B0305"

Note: ezRun handles pipe-separated IDs as of commit fef7f98c.
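
Lab sheets often list double hashtags as `B0301 + B0304`, as in the input table above. A small conversion sketch, with the hypothetical helper `to_cmo_ids`, turns such entries into the pipe-separated form without spaces:

```python
def to_cmo_ids(lab_entry):
    """Convert a lab-sheet entry such as 'B0301 + B0304' to 'B0301|B0304'."""
    ids = [part.strip() for part in lab_entry.split("+")]
    if any(not hashtag for hashtag in ids):
        raise ValueError(f"empty hashtag in entry: {lab_entry!r}")
    return "|".join(ids)

lab_sheet = {
    "Sample_mouse1": "B0303",
    "Sample_mouse2": "B0301 + B0304",
    "Sample_mouse3": "B0302 + B0305",
}
lines = ['"sample_id","cmo_ids"']
lines += [f'"{sample}","{to_cmo_ids(entry)}"' for sample, entry in lab_sheet.items()]
print("\n".join(lines))
```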

Workflow 5: Multiple Pools (OCM, HTO, or Flex v2)

For experiments with multiple pooled samples, create one file per pool.

Example structure:

/srv/gstore/projects/p39685/o39834_metaData/
├── WT_PBS_Epi_Sample2Barcode.csv
├── WT_PBS_CD45_Sample2Barcode.csv
├── WT_LPS_Epi_Sample2Barcode.csv
├── WT_LPS_CD45_Sample2Barcode.csv
├── KO_PBS_Epi_Sample2Barcode.csv
├── KO_PBS_CD45_Sample2Barcode.csv
├── KO_LPS_Epi_Sample2Barcode.csv
└── KO_LPS_CD45_Sample2Barcode.csv
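
One way to produce the per-pool files from a single flat sample sheet is sketched below for the quoted HTO/CMO format. The function `write_pool_files` and the `(pool, sample_id, cmo_ids)` row layout are assumptions made for illustration; the demo writes to a temporary directory rather than gstore.

```python
import csv
import tempfile
from collections import defaultdict
from pathlib import Path

def write_pool_files(rows, out_dir):
    """Write one quoted {pool}_Sample2Barcode.csv per pool; return the paths."""
    by_pool = defaultdict(list)
    for pool, sample_id, cmo_ids in rows:
        by_pool[pool].append((sample_id, cmo_ids))
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for pool, samples in sorted(by_pool.items()):
        path = out_dir / f"{pool}_Sample2Barcode.csv"
        with path.open("w", newline="") as handle:
            writer = csv.writer(handle, quoting=csv.QUOTE_ALL, lineterminator="\n")
            writer.writerow(["sample_id", "cmo_ids"])
            writer.writerows(samples)
        written.append(path)
    return written

rows = [
    ("WT_PBS_Epi", "WT_PBS_Epi_mouse1", "B0303"),
    ("WT_PBS_Epi", "WT_PBS_Epi_mouse2", "B0304"),
    ("WT_PBS_CD45", "WT_PBS_CD45_mouse1", "B0303"),
]
with tempfile.TemporaryDirectory() as tmp:
    for path in write_pool_files(rows, tmp):
        print(path.name)
```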

Step-by-Step Guide

Step 1: Create metaData Folder

# Create folder in working directory first
mkdir -p /srv/GT/analysis/pXXXXX/oXXXXX_metaData

# Later copy to gstore (g-req creates directories)

Step 2: Determine Hashtag-to-Sample Mapping

Obtain from lab:

Which TotalSeq-B or TotalSeq-C antibodies were used
Which samples were labeled with which hashtags
Verify hashtag IDs match system reference files

Verify barcode exists:

# Check TotalSeq-B
grep "B0303" /misc/GT/databases/10x/CMO_files/10x_BL_TotalSeqB_20230620_v1_AntibodyCapture.csv

# Check TotalSeq-C
grep "C0251" /misc/GT/databases/10x/CMO_files/10x_BL_TotalSeqC_20230620_v1_AntibodyCapture.csv
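
The same check can be scripted for a whole pool at once. The sketch below assumes the reference file is a CSV with an `id` column, as in the 10x feature reference layout; that assumption should be confirmed against the actual file. The function name `missing_ids` is hypothetical, and the inline reference text is a made-up stand-in, not real file content.

```python
import csv
import io

def missing_ids(reference_csv_text, wanted_ids):
    """Return requested barcode IDs that are absent from the reference 'id' column."""
    reader = csv.DictReader(io.StringIO(reference_csv_text))
    known = {row["id"] for row in reader}
    # Pipe-separated entries (double hashing) are checked ID by ID.
    requested = {i for entry in wanted_ids for i in entry.split("|")}
    return sorted(requested - known)

# Stand-in reference content for illustration only.
reference = "id,name\nB0303,Hashtag_3\nB0304,Hashtag_4\n"
print(missing_ids(reference, ["B0303", "B0301|B0304"]))  # ['B0301']
```

To check the real file, read it with `open(path).read()` and pass the text as the first argument.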

Step 3: Generate Sample2Barcode.csv Files

Create one file per pool with exact format:

"sample_id","cmo_ids"
"SampleName_replicate1","B0301"
"SampleName_replicate2","B0302"

Using R:

# One row per sample in the pool
sample_mapping <- data.frame(
  sample_id = c("WT_PBS_mouse1", "WT_PBS_mouse2", "WT_PBS_mouse3"),
  cmo_ids = c("B0303", "B0304", "B0305")
)

# quote = TRUE quotes the header and all character values
write.csv(sample_mapping, "WT_PBS_Sample2Barcode.csv",
          row.names = FALSE, quote = TRUE)

Step 4: Copy to gstore

# Copy all files to metaData folder
g-req copynow /srv/GT/analysis/pXXXXX/oXXXXX_metaData/ /srv/gstore/projects/pXXXXX/

# Verify
ls /srv/gstore/projects/pXXXXX/oXXXXX_metaData/

Step 5: Configure SUSHI Parameters

In the SUSHI CellRangerMulti app, set:

Parameter	Value	Notes
`TenXLibrary`	`GEX,Multiplexing`	Required for HTO
`MultiplexingType`	`antibody`	Critical - must be "antibody"
`FeatureBarcodeFile`	`""` (empty)	Leave empty - uses system reference
`MultiplexBarcodeSet`	`10x_BL_TotalSeqB_20230620_v1_AntibodyCapture.csv`	Or TotalSeqC for human

SUSHI Integration

Required Parameters by Multiplexing Type

OCM (On-Chip Multiplexing)

TenXLibrary = GEX,Multiplexing
MultiplexingType = ocm
MultiplexBarcodeSet = (leave empty)

No MultiDataDir column needed in dataset
No barcode reference file needed

HTO/Hashtag (TotalSeq-B/C)

TenXLibrary = GEX,Multiplexing
MultiplexingType = antibody
FeatureBarcodeFile = ""
MultiplexBarcodeSet = 10x_BL_TotalSeqB_20230620_v1_AntibodyCapture.csv

Flex v2 (Fixed RNA Panel)

TenXLibrary = GEX,fixedRNA,Multiplexing
MultiplexingType = (leave empty)
probesetFile = Chromium_Human_Transcriptome_Probe_Set_v2.0.0_GRCh38-2024-A.csv
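
For scripted setups, the three parameter sets above can be kept as presets. This is a sketch only: the dictionary keys `ocm`, `hto`, and `flex_v2` and the function `render_preset` are hypothetical names, and the values are copied from the blocks above.

```python
SUSHI_PRESETS = {
    "ocm": {
        "TenXLibrary": "GEX,Multiplexing",
        "MultiplexingType": "ocm",
        "MultiplexBarcodeSet": "",
    },
    "hto": {
        "TenXLibrary": "GEX,Multiplexing",
        "MultiplexingType": "antibody",
        "FeatureBarcodeFile": "",
        "MultiplexBarcodeSet": "10x_BL_TotalSeqB_20230620_v1_AntibodyCapture.csv",
    },
    "flex_v2": {
        "TenXLibrary": "GEX,fixedRNA,Multiplexing",
        "MultiplexingType": "",
        "probesetFile": "Chromium_Human_Transcriptome_Probe_Set_v2.0.0_GRCh38-2024-A.csv",
    },
}

def render_preset(name):
    """Format one preset as 'Parameter = value' lines."""
    return "\n".join(f"{key} = {value}" for key, value in SUSHI_PRESETS[name].items())

print(render_preset("hto"))
```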

MultiplexBarcodeSet Options

File	Use For
(empty)	OCM multiplexing - no file needed
`10x_BL_TotalSeqB_20230620_v1_AntibodyCapture.csv`	Mouse samples with TotalSeq-B
`10x_BL_TotalSeqC_20230620_v1_AntibodyCapture.csv`	Human samples with TotalSeq-C
`10x_CMO_20230620_v1.csv`	10x CMO lipid multiplexing (rare)

Common Errors and Solutions

Error: "cannot open file 'CMO_files': it is a directory"

Cause: MultiplexBarcodeSet is empty
Fix: Set to correct reference file

Error: "Column not found in dataset: MultiDataDir"

Cause: MultiplexingType is empty or incorrect
Fix: Set to antibody

Error: "Unknown hashtag_ids provided for sample"

Cause: Barcode ID not in reference file
Fix: Verify IDs match system reference (e.g., B0301 not HTO1)

Error: "Feature barcode not found in reference"

Cause: Mismatch between library type and feature reference
Fix: Use _AntibodyCapture.csv files for HTO (not plain .csv)
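
Several of these errors can be caught before submitting an HTO run. The check below is a hedged sketch that encodes only the causes listed above; `preflight_hto` is a hypothetical helper and does not apply to OCM or Flex v2 runs.

```python
def preflight_hto(params):
    """Return a list of likely problems in HTO parameters (empty if none found)."""
    problems = []
    multiplexing_type = params.get("MultiplexingType", "")
    barcode_set = params.get("MultiplexBarcodeSet", "")
    if multiplexing_type != "antibody":
        problems.append("MultiplexingType must be 'antibody' for HTO")
    if not barcode_set:
        problems.append("MultiplexBarcodeSet is empty; set it to a reference file")
    elif not barcode_set.endswith("_AntibodyCapture.csv"):
        problems.append("HTO needs an _AntibodyCapture.csv reference file")
    return problems

print(preflight_hto({"MultiplexingType": "", "MultiplexBarcodeSet": ""}))
```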

Complete Example: Project p39685

Background

8 pools, each containing 3 mice labeled with TotalSeq-B hashtags (B0303-B0305, plus B0301 and B0302 for the double-hashed samples).

Sample2Barcode Files Created

WT_PBS_Epi_Sample2Barcode.csv:

"sample_id","cmo_ids"
"WT_PBS_Epi_mouse1","B0303"
"WT_PBS_Epi_mouse2","B0304"
"WT_PBS_Epi_mouse3","B0305"

WT_PBS_CD45_Sample2Barcode.csv (with double-hashing):

"sample_id","cmo_ids"
"WT_PBS_CD45_mouse1","B0303"
"WT_PBS_CD45_mouse2","B0301|B0304"
"WT_PBS_CD45_mouse3","B0302|B0305"

SUSHI Configuration Used

TenXLibrary = GEX,Multiplexing
MultiplexingType = antibody
FeatureBarcodeFile = ""
MultiplexBarcodeSet = 10x_BL_TotalSeqB_20230620_v1_AntibodyCapture.csv

Validation Checklist

OCM Multiplexing

Column names: sample_id,ocm_barcode_ids (NOT quoted)
Values NOT quoted
Barcode IDs are OB1, OB2, OB3, or OB4 only
One file per pool: {PoolName}_Sample2Barcode.csv
Files placed in /srv/gstore/projects/pXXXXX/oXXXXX_metaData/
MultiplexingType = ocm in SUSHI
MultiplexBarcodeSet left empty (not needed)

HTO/CMO Multiplexing

Column names: "sample_id","cmo_ids" (quoted)
All values quoted
Barcode IDs match system reference (B0301 not HTO1)
One file per pool: {PoolName}_Sample2Barcode.csv
Files placed in /srv/gstore/projects/pXXXXX/oXXXXX_metaData/
MultiplexingType = antibody in SUSHI
MultiplexBarcodeSet points to correct _AntibodyCapture.csv file
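
The file-format items of both checklists can be checked mechanically. The following sketch covers only the OCM and HTO/CMO formats and only the header, quoting, and OB1-OB4 rules; `check_sample2barcode` is a hypothetical name, and the SUSHI parameters and file locations still need to be verified by hand.

```python
import re

OCM_HEADER = "sample_id,ocm_barcode_ids"
HTO_HEADER = '"sample_id","cmo_ids"'
QUOTED_ROW = re.compile(r'^"[^",]+","[^",]+"$')

def check_sample2barcode(text):
    """Return a list of format problems found in Sample2Barcode.csv text."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return ["file is empty"]
    header, rows = lines[0], lines[1:]
    problems = []
    if header == OCM_HEADER:
        for row in rows:
            fields = row.split(",")
            if '"' in row:
                problems.append(f"OCM values must not be quoted: {row}")
            elif len(fields) != 2 or fields[1] not in {"OB1", "OB2", "OB3", "OB4"}:
                problems.append(f"invalid OCM row: {row}")
    elif header == HTO_HEADER:
        for row in rows:
            if not QUOTED_ROW.match(row):
                problems.append(f"HTO/CMO row needs two quoted fields: {row}")
    else:
        problems.append(f"unexpected header: {header}")
    return problems

print(check_sample2barcode('"sample_id","cmo_ids"\n"WT_PBS_mouse1","B0303"\n'))  # []
```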

Resources

Bundled Files

File	Purpose
`assets/templates/sample2barcode_ocm_template.csv`	OCM template (OB1-OB4)
`assets/templates/sample2barcode_template.csv`	Basic HTO/CMO single-hashtag template
`assets/templates/sample2barcode_double_hash.csv`	Double-hashing template
`assets/templates/sample2barcode_flexv2_template.csv`	Flex v2 probe barcode template
`references/barcode_sequences.md`	Complete barcode sequences reference
`references/sushi_integration.md`	Detailed SUSHI parameter guide
`references/flex_v2_cellranger.md`	Flex v2 deep reference: probe set paths, plex ranges, g-req example, ezRun detection logic

Related Skills

Read structures and bcl2fastq base masks per 10x chemistry (3'/5' GEX, VDJ, ATAC, Multiome, Visium, Flex): see draugr-demultiplexing → references/sequencing-read-requirements.md. Use this when you need to know what R1/R2/i7/i5 lengths a chemistry requires (e.g. why ATAC i5 must be capital I, why VisiumHD R1=43bp).

External Resources

BioLegend TotalSeq: https://www.biolegend.com/en-us/totalseq
10x CellRanger Multi: https://www.10xgenomics.com/support/software/cell-ranger/latest/analysis/running-pipelines/cr-multi
FGCZ SUSHI: https://fgcz-sushi.uzh.ch

Troubleshooting

"No cells assigned to sample"

Check hashtag antibody quality
Verify correct barcode sequences were used
May indicate labeling or sequencing issue

Double-hashing samples not demultiplexing

Ensure ezRun is updated (commit fef7f98c or later)
Verify pipe-separated format: "B0301|B0304" (no spaces)

Feature type mismatch errors

Use _AntibodyCapture.csv reference files for HTO
These have Antibody Capture feature type
Plain .csv files have Multiplexing Capture for CMO only
