Skill

cytetype-annotation

AI-powered cell type annotation for single-cell RNA-seq data using CyteTypeR on FGCZ infrastructure. This skill should be used when performing automated cell type annotation on clustered Seurat objects via the Nygen Analytics CyteType API, especially for PBMC, immune, or other well-characterized cell types. Handles the complete workflow from preprocessing through annotation mapping and visualization.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/single-cell-ml:cytetype-annotation


Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Perform AI-powered automated cell type annotation on clustered Seurat objects using CyteTypeR and the Nygen Analytics CyteType API. This workflow integrates with FGCZ infrastructure for batch processing and follows FGCZ reporting standards.

SKILL.md

477 lines · ~4k tokens

Stats

Language: Python

Parent stars: 0

Maintenance: Fair

Last Commit: May 18, 2026


CyteTypeR Cell Type Annotation

Purpose

CyteTypeR leverages large language models to automatically annotate cell types based on marker gene expression patterns. It provides:

Automated cluster-level cell type annotations with ontology terms
Granular cell state/subtype classifications
Justifications with supporting and conflicting markers
Integration with Seurat workflows on FGCZ HPC

The tool works best for well-characterized cell types (immune cells, PBMCs, neurons, etc.) and requires clustered Seurat objects with identified marker genes.

When to Use This Skill

Use this skill when:

Performing cell type annotation on clustered scRNA-seq data
Working with Seurat objects that have completed clustering and marker discovery
Annotating immune cells, PBMCs, or other well-characterized cell populations
Requiring consistent, reproducible cell type assignments across multiple samples
Needing ontology-based annotations for downstream integration

Requirements:

Access to Nygen Analytics CyteType API (not OpenRouter or generic LLM APIs)
Clustered Seurat object with completed UMAP/t-SNE
Marker genes identified via FindAllMarkers()

Complete Workflow

Prerequisites Check

Before starting, verify:

Seurat object has been normalized and clustered (seurat_clusters present)
UMAP or t-SNE dimensionality reduction computed
Marker genes identified with FindAllMarkers()
R environment has CyteTypeR, Seurat, dplyr packages installed
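
The checks above can be scripted before any API call. This is a minimal sketch: the helper names `missing_packages` and `check_prerequisites` are hypothetical (they are not part of CyteTypeR or Seurat), and the object-level check takes plain character vectors so it runs without a Seurat object.

```r
# Hypothetical helper: report which required packages are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Hypothetical helper: check object-level prerequisites from the metadata
# column names and the reduction names of a Seurat object.
check_prerequisites <- function(meta_cols, reduction_names) {
  c(
    clustered = "seurat_clusters" %in% meta_cols,
    embedded  = any(c("umap", "tsne") %in% reduction_names)
  )
}

# With a real object one would pass
#   colnames(seurat_obj@meta.data) and names(seurat_obj@reductions)
status <- check_prerequisites(
  meta_cols = c("nCount_RNA", "nFeature_RNA", "seurat_clusters"),
  reduction_names = c("pca", "umap")
)
print(status)

# Base packages are used here only so the example runs anywhere;
# in practice the vector would be c("CyteTypeR", "Seurat", "dplyr").
print(missing_packages(c("stats", "utils")))
```

Stopping early when either check fails avoids submitting a job that cannot succeed.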

Data Preprocessing

Ensure the Seurat object meets CyteTypeR requirements. If starting from an older Seurat object, or if preprocessing steps are missing:

library(Seurat)
library(dplyr)

# Update old Seurat objects if needed
seurat_obj <- UpdateSeuratObject(seurat_obj)

# Standard preprocessing (if not already done)
seurat_obj <- NormalizeData(seurat_obj, normalization.method = "LogNormalize", scale.factor = 10000)
seurat_obj <- FindVariableFeatures(seurat_obj, selection.method = "vst", nfeatures = 2000)

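all.genes <- rownames(seurat_obj)
seurat_obj <- ScaleData(seurat_obj, features = all.genes)

# PCA (FindNeighbors() and RunUMAP() below use the "pca" reduction by default)
seurat_obj <- RunPCA(seurat_obj, features = VariableFeatures(object = seurat_obj))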

# Clustering (REQUIRED for CyteTypeR)
seurat_obj <- FindNeighbors(seurat_obj, dims = 1:10)
seurat_obj <- FindClusters(seurat_obj, resolution = 0.5)

# Dimensionality reduction (REQUIRED for CyteTypeR)
seurat_obj <- RunUMAP(seurat_obj, dims = 1:10)

Marker Gene Discovery

CyteTypeR requires cluster marker genes. Always use positive markers only and apply quality filtering:

# Find markers for all clusters (positive markers only)
cluster_markers <- FindAllMarkers(seurat_obj, only.pos = TRUE)

# Apply quality filtering
cluster_markers_filtered <- cluster_markers |>
  group_by(cluster) |>
  dplyr::filter(avg_log2FC > 1)

# Verify markers exist for all clusters
cat("Markers found for", length(unique(cluster_markers_filtered$cluster)), "clusters\n")
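
The final `cat()` call only counts clusters that still have markers; it does not say which clusters lost all of them to the `avg_log2FC > 1` filter. A minimal sketch of an explicit check follows, using an invented toy marker table with the same relevant columns as `FindAllMarkers()` output (`cluster`, `gene`, `avg_log2FC`):

```r
# Toy marker table (invented values); cluster is a factor, as in FindAllMarkers() output
cluster_markers <- data.frame(
  cluster = factor(c("0", "0", "1", "1", "2"), levels = c("0", "1", "2")),
  gene = c("CD3D", "IL7R", "CD14", "LYZ", "MS4A1"),
  avg_log2FC = c(2.1, 1.4, 3.0, 0.8, 0.6)
)

# Same threshold as the filtering step above
filtered <- cluster_markers[cluster_markers$avg_log2FC > 1, ]

# table() on a factor keeps empty levels, so clusters with zero markers stay visible
markers_per_cluster <- table(filtered$cluster)
clusters_without_markers <-
  names(markers_per_cluster)[as.vector(markers_per_cluster) == 0]

print(markers_per_cluster)
if (length(clusters_without_markers) > 0) {
  warning("No markers left for cluster(s): ",
          paste(clusters_without_markers, collapse = ", "))
}
```

Clusters flagged here are candidates for a lower threshold before the data is prepared for submission.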

Prepare Data for CyteTypeR

Use PrepareCyteTypeR() to package the Seurat object and markers for API submission:

library(CyteTypeR)

# Prepare data for annotation
prepped_data <- PrepareCyteTypeR(
  seurat_obj,
  cluster_markers_filtered,
  n_top_genes = 10,                    # Number of top marker genes per cluster
  group_key = 'seurat_clusters',       # Metadata column with cluster IDs
  aggregate_metadata = TRUE,           # Aggregate cell-level metadata by cluster
  coordinates_key = "umap"             # Dimensionality reduction to use (umap/tsne)
)

cat("✓ CyteTypeR data preparation successful!\n")
cat("\nPrepped data structure:\n")
str(prepped_data, max.level = 1)

# Save prepped data for reuse
output_dir <- "/srv/GT/analysis/pXXXXX/Outputs/"
dir.create(output_dir, recursive = TRUE, showWarnings = FALSE)
saveRDS(prepped_data, file.path(output_dir, "cytetype_prepped_data.rds"))

Important notes:

PrepareCyteTypeR() validates marker table structure and checks for required columns
Gene symbols are extracted from Seurat rownames if not in meta.features
Expression percentages are calculated in batches (reports progress for large datasets)
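
To make the last note concrete, the sketch below computes per-cluster expression percentages a few genes at a time. It is an illustration of the general idea on invented data, not CyteTypeR's implementation; the function name `pct_expressed` and the `batch_size` argument are assumptions.

```r
# Toy count matrix: 6 genes x 8 cells, two clusters of 4 cells each (invented data)
set.seed(1)
counts <- matrix(
  rpois(6 * 8, lambda = 0.7), nrow = 6,
  dimnames = list(paste0("gene", 1:6), paste0("cell", 1:8))
)
clusters <- factor(rep(c("0", "1"), each = 4))

# Hypothetical helper: percentage of cells per cluster with a non-zero count,
# computed batch by batch so only one slice of genes is processed at a time.
pct_expressed <- function(counts, clusters, batch_size = 2) {
  gene_idx <- seq_len(nrow(counts))
  batches <- split(gene_idx, ceiling(gene_idx / batch_size))
  pct <- matrix(
    NA_real_, nrow = nrow(counts), ncol = nlevels(clusters),
    dimnames = list(rownames(counts), levels(clusters))
  )
  for (idx in batches) {
    detected <- counts[idx, , drop = FALSE] > 0
    for (cl in levels(clusters)) {
      pct[idx, cl] <- 100 * rowMeans(detected[, clusters == cl, drop = FALSE])
    }
  }
  pct
}

pct <- pct_expressed(counts, clusters)
print(round(pct, 1))
```

Batching changes only how much data is processed at once; the percentages are identical to a single-pass computation.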

Submit Annotation Job

Submit the prepared data to the Nygen Analytics CyteType API. This requires valid API credentials:

# Define optional metadata for tracking
metadata <- list(
  title = 'scRNA-seq cell type annotation',
  run_label = paste0('analysis_', format(Sys.Date(), "%Y%m%d")),
  experiment_name = 'project_pXXXXX'
)

# Submit annotation job
# Note: This submits to Nygen Analytics API (NOT OpenRouter or other LLM providers)
annotation_result <- CyteTypeR(
  obj = seurat_obj,
  prepped_data = prepped_data,
  study_context = "Human peripheral blood mononuclear cells (PBMCs)",  # Describe sample type
  metadata = metadata
)

cat("✓ Annotation complete!\n")

# Save annotation results
saveRDS(annotation_result, file.path(output_dir, "cytetype_annotation.rds"))

Expected behavior:

Function submits job to https://nygen-labs-prod--cytetype-api.modal.run/annotate
Provides job ID and live report URL
Polls API for completion (typically 5-15 minutes for standard datasets)
Returns Seurat object with annotations in @misc$cytetype_results
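
The submit-then-poll pattern described above can be sketched generically. Nothing here calls the CyteType API: `make_fake_status` is a hypothetical stand-in for a status request, and the status strings `"running"` and `"completed"` are assumptions chosen for illustration.

```r
# Hypothetical stand-in for a status request; it reports "completed"
# on the ready_after-th call and "running" before that.
make_fake_status <- function(ready_after) {
  calls <- 0
  function() {
    calls <<- calls + 1
    if (calls >= ready_after) "completed" else "running"
  }
}

# Poll until the job reports completion or the time budget is exhausted.
poll_until_done <- function(get_status, interval_sec = 0, timeout_sec = 5) {
  started <- Sys.time()
  attempts <- 0
  repeat {
    attempts <- attempts + 1
    status <- get_status()
    if (identical(status, "completed")) {
      return(list(status = status, attempts = attempts))
    }
    elapsed <- as.numeric(difftime(Sys.time(), started, units = "secs"))
    if (elapsed > timeout_sec) {
      stop("Timed out waiting for job completion")
    }
    Sys.sleep(interval_sec)
  }
}

result <- poll_until_done(make_fake_status(ready_after = 3))
print(result)
```

In a batch job, the timeout should stay below the SBATCH `--time` limit so that a stalled job fails with a clear message rather than being killed by the scheduler.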

API output example:

INFO [2025-11-19 12:51:36] Submitting job to CyteType...
INFO [2025-11-19 12:52:10] CyteType job (id: 78f00e00-...) submitted. Polling for results...
INFO [2025-11-19 12:52:15] Report (updates automatically) available at:
https://nygen-labs-prod--cytetype-api.modal.run/report/78f00e00-...
INFO [2025-11-19 12:52:15] If disconnected, retrieve results with: GetResults()
[DONE] [✔✔✔✔✔✔✔✔✔] 9/9 completed
INFO [2025-11-19 13:06:39] Job 78f00e00-... completed successfully.

Important: CyteTypeR requires Nygen Analytics API access. OpenRouter or other LLM provider API keys will not work. If receiving "Job submission failed" errors, verify API credentials with Nygen Analytics.

Extract and Map Annotations

CyteTypeR returns cluster-level annotations that must be mapped to individual cells:

# CyteTypeR() returns the Seurat object with results attached in the @misc slot,
# so continue with the returned object
seurat_obj <- annotation_result
cytetype_results <- seurat_obj@misc$cytetype_results

# Inspect result structure
cat("Annotation fields:\n")
names(cytetype_results)
# Expected: clusterId, annotation, ontologyTerm, granularAnnotation, cellState,
#          justification, supportingMarkers, conflictingMarkers,
#          missingExpression, unexpectedExpression

# View annotations table
head(cytetype_results)

# Map cluster-level annotations to cells
cytetype_map <- setNames(
  cytetype_results$annotation,
  cytetype_results$clusterId
)

seurat_obj@meta.data$cytetype_annotation <- cytetype_map[as.character(seurat_obj@meta.data$seurat_clusters)]

# Optional: Add granular annotations
cytetype_granular_map <- setNames(
  cytetype_results$granularAnnotation,
  cytetype_results$clusterId
)

seurat_obj@meta.data$cytetype_granular <- cytetype_granular_map[as.character(seurat_obj@meta.data$seurat_clusters)]

cat("✓ CyteTypeR annotations mapped to cells\n")
cat("  Cell types identified:", paste(unique(seurat_obj@meta.data$cytetype_annotation), collapse = ", "), "\n")
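
The named-vector lookup used for the mapping can be tried on toy data without a Seurat object. In this sketch the cluster IDs and labels are invented, and `cell_clusters` stands in for the `seurat_clusters` metadata column.

```r
# Toy cluster-level results (invented labels)
cytetype_results <- data.frame(
  clusterId = c("0", "1", "2"),
  annotation = c("T cell", "Monocyte", "B cell"),
  stringsAsFactors = FALSE
)

# Per-cell cluster assignments, stored as a factor as Seurat does
cell_clusters <- factor(c("1", "0", "2", "0", "1"))

# Named vector: names are cluster IDs, values are annotations
cytetype_map <- setNames(cytetype_results$annotation, cytetype_results$clusterId)

# as.character() makes the lookup go by name; indexing with the factor
# itself would use its integer codes instead
cell_annotations <- unname(cytetype_map[as.character(cell_clusters)])
print(cell_annotations)

# Cells whose cluster has no annotation would come back as NA
stopifnot(!anyNA(cell_annotations))
```

An `NA` after mapping means a cluster ID in the object has no matching `clusterId` in the results, which is worth checking before plotting.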

Visualize Annotations

Create publication-quality visualizations following FGCZ standards:

library(ggplot2)
library(patchwork)

# FGCZ cluster colors (polychrome palette)
cluster_colors <- unname(pals::polychrome())

# UMAP with cell type annotations
p_celltype <- DimPlot(
  seurat_obj,
  reduction = "umap",
  group.by = "cytetype_annotation",
  label = TRUE,
  repel = TRUE,
  order = TRUE,
  cols = cluster_colors,
  raster = FALSE
) +
  coord_fixed() +
  ggtitle("UMAP - CyteTypeR Cell Type Annotations") +
  theme(legend.position = "right")

print(p_celltype)
ggsave(file.path(output_dir, "cytetype_umap.png"), p_celltype,
       width = 12, height = 8, dpi = 300, bg = "white")

# Cell type distribution
celltype_counts <- as.data.frame(table(seurat_obj$cytetype_annotation))
colnames(celltype_counts) <- c("CellType", "Count")
celltype_counts$Percentage <- sprintf("%.1f%%", 100 * celltype_counts$Count / sum(celltype_counts$Count))

p_bar <- ggplot(celltype_counts, aes(x = reorder(CellType, -Count), y = Count, fill = CellType)) +
  geom_bar(stat = "identity") +
  geom_text(aes(label = Percentage), vjust = -0.5, size = 3) +
  scale_fill_manual(values = cluster_colors) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        legend.position = "none") +
  labs(x = "Cell Type", y = "Number of Cells",
       title = "CyteTypeR Cell Type Distribution")

print(p_bar)
ggsave(file.path(output_dir, "cytetype_barplot.png"), p_bar,
       width = 10, height = 6, dpi = 300, bg = "white")

Inspect Annotation Quality

Review supporting evidence and potential conflicts for each annotation:

# Check supporting markers for each cluster
for (i in seq_len(nrow(cytetype_results))) {
  cat("\n========================================\n")
  cat("Cluster:", cytetype_results$clusterId[i], "\n")
  cat("Annotation:", cytetype_results$annotation[i], "\n")
  cat("Ontology:", cytetype_results$ontologyTerm[i], "\n")
  cat("Granular:", cytetype_results$granularAnnotation[i], "\n")
  cat("\nSupporting markers:", cytetype_results$supportingMarkers[i], "\n")
  cat("Conflicting markers:", cytetype_results$conflictingMarkers[i], "\n")
  cat("\nJustification:", substr(cytetype_results$justification[i], 1, 200), "...\n")
}

# Export full annotation table
write.csv(cytetype_results, file.path(output_dir, "cytetype_annotations.csv"), row.names = FALSE)
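
For many clusters, reading every printed block is slow. The sketch below lists only the clusters that report conflicting markers, for manual review. It assumes `conflictingMarkers` is a character column that is empty or `NA` when nothing conflicts; the actual column format is not confirmed here, and the table is invented.

```r
# Toy results table (invented); the conflictingMarkers format is an assumption
cytetype_results <- data.frame(
  clusterId = c("0", "1", "2"),
  annotation = c("T cell", "Monocyte", "B cell"),
  conflictingMarkers = c("", "CD3E, CD79A", NA),
  stringsAsFactors = FALSE
)

# A cluster needs review when the field is present and not blank
has_conflict <- !is.na(cytetype_results$conflictingMarkers) &
  nzchar(trimws(cytetype_results$conflictingMarkers))

needs_review <- cytetype_results[
  has_conflict, c("clusterId", "annotation", "conflictingMarkers")
]
print(needs_review)
```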

FGCZ Integration

SBATCH Script Template

For batch processing on FGCZ HPC:

#!/bin/bash
#SBATCH --job-name=cytetype_annotation
#SBATCH --output=/srv/GT/analysis/pXXXXX/cytetype_%j.log
#SBATCH --error=/srv/GT/analysis/pXXXXX/cytetype_%j.err
#SBATCH --time=02:00:00
#SBATCH --mem-per-cpu=8G
#SBATCH --cpus-per-task=8
#SBATCH --partition=employee

echo "=========================================="
echo "CyteTypeR Cell Type Annotation"
echo "Started: $(date)"
echo "=========================================="

# Load R module
module load Dev/R/4.5.0

# Navigate to analysis directory
cd /srv/GT/analysis/pXXXXX

# Run annotation script
Rscript cytetype_annotation.R

if [ $? -eq 0 ]; then
  echo "✓ Annotation completed successfully!"
  echo "Results saved to: /srv/GT/analysis/pXXXXX/Outputs/"
else
  echo "✗ Annotation failed - check error log"
  exit 1
fi

echo "Completed: $(date)"

Submit with: sbatch cytetype_annotation.sh

File Organization

Standard FGCZ directory structure:

/srv/GT/analysis/pXXXXX/
├── cytetype_annotation.R           # Main analysis script
├── cytetype_annotation.sh          # SBATCH submission script
├── Outputs/
│   ├── cytetype_prepped_data.rds   # Prepared data for reuse
│   ├── cytetype_annotation.rds     # Full annotation results
│   ├── cytetype_annotations.csv    # Annotation table
│   ├── cytetype_umap.png           # UMAP visualization
│   └── cytetype_barplot.png        # Distribution plot
└── seurat_annotated.qs2            # Final annotated Seurat object

Saving Annotated Objects

Use qs2 format for efficient storage on FGCZ infrastructure:

library(qs2)

# Save final annotated object
qs2::qs_save(
  seurat_obj,
  "/srv/GT/analysis/pXXXXX/seurat_annotated.qs2",
  nthreads = 8   # match --cpus-per-task in the SBATCH script
)

# Copy to gstore for long-term storage
# Note: Use g-req copynow for small files, g-req copy -w for large files
# g-req copynow /srv/GT/analysis/pXXXXX/seurat_annotated.qs2 /srv/gstore/projects/pXXXXX/Analyses_Paul/

R Markdown Integration

For reports following FGCZ standards, use this template:

## Cell Type Annotation {.tabset}

### CyteTypeR Preparation

```{r cytetype_prep}
library(CyteTypeR)

# Prepare data for annotation
prepped_data <- PrepareCyteTypeR(
  seurat_obj,
  cluster_markers,
  n_top_genes = 10,
  group_key = 'seurat_clusters',
  aggregate_metadata = TRUE,
  coordinates_key = "umap"
)

cat("✓ Data preparation complete\n")
```

### Run Annotation

```{r cytetype_annotate, eval=FALSE}
# Note: Set eval=TRUE only if API credentials are configured
annotation_result <- CyteTypeR(
  obj = seurat_obj,
  prepped_data = prepped_data,
  study_context = "Human PBMCs from healthy donors"
)

# CyteTypeR() returns the Seurat object with results in @misc$cytetype_results
seurat_obj <- annotation_result
```

### Visualization

```{r cytetype_viz, fig.width=12, fig.height=8}
# Map annotations to cells
cytetype_map <- setNames(
  seurat_obj@misc$cytetype_results$annotation,
  seurat_obj@misc$cytetype_results$clusterId
)

seurat_obj@meta.data$cytetype_annotation <-
  cytetype_map[as.character(seurat_obj@meta.data$seurat_clusters)]

# UMAP plot
cluster_colors <- unname(pals::polychrome())
p <- DimPlot(seurat_obj, reduction = "umap",
             group.by = "cytetype_annotation",
             label = TRUE, repel = TRUE, order = TRUE,
             cols = cluster_colors, raster = FALSE) +
  coord_fixed() +
  ggtitle("CyteTypeR Cell Type Annotations")

print(p)
ggsave(file.path(output_dir, "cytetype_umap.png"), p,
       width = 12, height = 8, dpi = 300, bg = "white")
```

### Annotation Results

```{r cytetype_table}
library(knitr)

# Display annotation summary
annotation_summary <- seurat_obj@misc$cytetype_results |>
  select(clusterId, annotation, granularAnnotation, ontologyTerm)

kable(annotation_summary, caption = "CyteTypeR Cell Type Annotations")
```

Troubleshooting

API Access Issues

Problem: "Job submission failed" error
Cause: CyteTypeR requires Nygen Analytics API credentials (not OpenRouter)
Solution: Contact Nygen Analytics for API access or use alternative annotation methods (Seurat label transfer, scType, etc.)

Old Seurat Object Errors

Problem: "no slot of name 'images' for this object of class 'Seurat'"
Cause: Seurat object from older version
Solution: Update object with seurat_obj <- UpdateSeuratObject(seurat_obj)

Missing Clusters in Results

Problem: Some clusters not annotated
Cause: Insufficient marker genes for those clusters
Solution:

Reduce avg_log2FC threshold in marker filtering
Increase n_top_genes in PrepareCyteTypeR()
Check marker discovery with table(cluster_markers$cluster)
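
To see which clusters are missing and whether markers are the likely cause, the cluster IDs in the object can be compared with those in the results. This is a minimal sketch on invented values; the comments note where each vector would come from in the workflow above.

```r
# In practice: levels(seurat_obj$seurat_clusters)
object_clusters <- c("0", "1", "2", "3")

# In practice: as.character(cytetype_results$clusterId)
annotated_clusters <- c("0", "1", "3")

# In practice: table(cluster_markers_filtered$cluster), with cluster as a factor
marker_counts <- c("0" = 10, "1" = 10, "2" = 0, "3" = 4)

# Clusters present in the object but absent from the annotation results
missing_clusters <- setdiff(object_clusters, annotated_clusters)

# Marker counts for the missing clusters; zero points to the filtering step
print(marker_counts[missing_clusters])
```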

Large Dataset Performance

Problem: Slow data preparation or API submission
Cause: Large number of cells/features
Solution:

Use subset for testing: seurat_subset <- subset(seurat_obj, downsample = 1000)
Increase --mem-per-cpu in SBATCH script
Process in batches by splitting clusters
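
Splitting clusters into batches can be sketched with base R. The batch size of 4 is arbitrary, and whether separate CyteTypeR runs can be combined afterwards is an assumption that should be verified before relying on this approach.

```r
# Cluster IDs, e.g. from levels(seurat_obj$seurat_clusters)
cluster_ids <- as.character(0:8)
batch_size <- 4

# Consecutive groups of at most batch_size clusters
cluster_batches <- split(
  cluster_ids,
  ceiling(seq_along(cluster_ids) / batch_size)
)
print(cluster_batches)

# With a Seurat object whose identities are the cluster IDs, each batch
# could then be extracted with something like:
#   seurat_batch <- subset(seurat_obj, idents = cluster_batches[[1]])
```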

Additional Resources

CyteTypeR GitHub: https://github.com/NygenAnalytics/CyteTypeR
Nygen Analytics Documentation: Check references/cytetype_guide.md for detailed API specifications
FGCZ Seurat Workflows: See CLAUDE.md in project repository
Example Analysis: /home/pgueguen/git/paul-scripts/Internal_Dev/CyteType_test/

Key Considerations

API Access Required: CyteTypeR is not a purely local tool - it submits to Nygen Analytics' cloud API
Preprocessing Critical: Must have completed clustering and marker discovery before annotation
Best for Known Cell Types: Works best with well-characterized populations (immune, PBMC, neurons)
Cluster-Level Annotations: Returns cluster-level assignments that must be mapped to cells
Processing Time: API jobs typically take 5-15 minutes; use GetResults() to retrieve if disconnected
FGCZ Standards: Follow polychrome colors, coord_fixed(), 300 DPI output, and qs2 storage
