AI-powered cell type annotation for single-cell RNA-seq data using CyteTypeR on FGCZ infrastructure. This skill should be used when performing automated cell type annotation on clustered Seurat objects via the Nygen Analytics CyteType API, especially for PBMC, immune, or other well-characterized cell types. Handles complete workflow from preprocessing through annotation mapping and visualization.
Slash command: /single-cell-ml:cytetype-annotation
Perform AI-powered automated cell type annotation on clustered Seurat objects using CyteTypeR and the Nygen Analytics CyteType API. This workflow integrates with FGCZ infrastructure for batch processing and follows FGCZ reporting standards.
CyteTypeR uses large language models to annotate cell types automatically from marker gene expression patterns.
The tool works best for well-characterized cell types (immune cells, PBMCs, neurons, etc.) and requires clustered Seurat objects with identified marker genes.
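Use this skill when performing automated cell type annotation on clustered Seurat objects via the CyteType API.
Requirements: a clustered Seurat object with marker genes computed by FindAllMarkers().
Before starting, verify:
- Cluster assignments exist in the metadata (seurat_clusters present)
- Marker genes have been computed with FindAllMarkers()
- CyteTypeR, Seurat, dplyr packages installed
Ensure the Seurat object meets CyteTypeR requirements. If starting from an older Seurat object or missing preprocessing steps: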
library(Seurat)
library(dplyr)
# Update old Seurat objects if needed
seurat_obj <- UpdateSeuratObject(seurat_obj)
# Standard preprocessing (if not already done)
seurat_obj <- NormalizeData(seurat_obj, normalization.method = "LogNormalize", scale.factor = 10000)
seurat_obj <- FindVariableFeatures(seurat_obj, selection.method = "vst", nfeatures = 2000)
all.genes <- rownames(seurat_obj)
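seurat_obj <- ScaleData(seurat_obj, features = all.genes)
# PCA must exist before FindNeighbors() and RunUMAP() can use dims = 1:10
seurat_obj <- RunPCA(seurat_obj, features = VariableFeatures(object = seurat_obj))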
# Clustering (REQUIRED for CyteTypeR)
seurat_obj <- FindNeighbors(seurat_obj, dims = 1:10)
seurat_obj <- FindClusters(seurat_obj, resolution = 0.5)
# Dimensionality reduction (REQUIRED for CyteTypeR)
seurat_obj <- RunUMAP(seurat_obj, dims = 1:10)
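The prerequisites named above (a cluster column and a dimensionality reduction) can be checked before any API call. The following is a minimal sketch in base R; `check_prerequisites()` is a hypothetical helper, not part of CyteTypeR, and it takes plain character vectors so it runs without a Seurat object.

```r
# Hypothetical helper (not a CyteTypeR function): returns a character vector
# describing missing prerequisites, or an empty vector if all are present.
check_prerequisites <- function(meta_columns, reduction_names,
                                group_key = "seurat_clusters",
                                coordinates_key = "umap") {
  problems <- character(0)
  if (!group_key %in% meta_columns) {
    problems <- c(problems, sprintf("metadata column '%s' not found", group_key))
  }
  if (!coordinates_key %in% reduction_names) {
    problems <- c(problems, sprintf("reduction '%s' not found", coordinates_key))
  }
  problems
}

# Toy inputs standing in for the metadata columns and reduction names
ready <- check_prerequisites(c("nCount_RNA", "seurat_clusters"), c("pca", "umap"))
not_ready <- check_prerequisites(c("nCount_RNA"), c("pca"))
print(not_ready)
```

With a real object, the two inputs would presumably be `colnames(seurat_obj@meta.data)` and `names(seurat_obj@reductions)`.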
CyteTypeR requires cluster marker genes. Always use positive markers only and apply quality filtering:
# Find markers for all clusters (positive markers only)
cluster_markers <- FindAllMarkers(seurat_obj, only.pos = TRUE)
# Apply quality filtering
# Keep markers with avg_log2FC > 1 (the filter is row-wise, so no grouping is needed)
cluster_markers_filtered <- cluster_markers |>
  dplyr::filter(avg_log2FC > 1)
# Verify markers exist for all clusters
cat("Markers found for", length(unique(cluster_markers_filtered$cluster)), "clusters\n")
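Counting clusters in the filtered table does not say which clusters lost all of their markers. A small sketch of that check follows, using a toy marker table with the `cluster`, `gene`, and `avg_log2FC` columns that FindAllMarkers() returns; `clusters_without_markers()` is a hypothetical helper introduced only for illustration.

```r
# Hypothetical helper: list clusters that have no rows left in a filtered
# marker table.
clusters_without_markers <- function(all_clusters, filtered_markers) {
  setdiff(as.character(unique(all_clusters)),
          as.character(unique(filtered_markers$cluster)))
}

# Toy data: cluster 2 only has a weak marker and is dropped by the filter
toy_clusters <- factor(c(0, 0, 1, 1, 2))
toy_markers <- data.frame(
  cluster = factor(c(0, 0, 1, 2)),
  gene = c("CD3E", "IL7R", "MS4A1", "LYZ"),
  avg_log2FC = c(2.4, 1.3, 3.0, 0.6)
)
toy_filtered <- toy_markers[toy_markers$avg_log2FC > 1, ]
lost <- clusters_without_markers(toy_clusters, toy_filtered)
print(lost)
```

In the workflow, the inputs would be `seurat_obj$seurat_clusters` and `cluster_markers_filtered`; a non-empty result suggests relaxing the threshold before submission.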
Use PrepareCyteTypeR() to package the Seurat object and markers for API submission:
library(CyteTypeR)
# Prepare data for annotation
prepped_data <- PrepareCyteTypeR(
  seurat_obj,
  cluster_markers_filtered,
  n_top_genes = 10,               # Number of top marker genes per cluster
  group_key = 'seurat_clusters',  # Metadata column with cluster IDs
  aggregate_metadata = TRUE,      # Aggregate cell-level metadata by cluster
  coordinates_key = "umap"        # Dimensionality reduction to use (umap/tsne)
)
cat("✓ CyteTypeR data preparation successful!\n")
cat("\nPrepped data structure:\n")
str(prepped_data, max.level = 1)
# Save prepped data for reuse
output_dir <- "/srv/GT/analysis/pXXXXX/Outputs/"
dir.create(output_dir, recursive = TRUE, showWarnings = FALSE)
saveRDS(prepped_data, file.path(output_dir, "cytetype_prepped_data.rds"))
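Saving the prepared data only pays off if later runs load it instead of recomputing. One way to sketch that reuse in base R is a load-or-compute wrapper; `load_or_compute()` is a hypothetical helper, and the toy `make_data()` function stands in for the PrepareCyteTypeR() call.

```r
# Hypothetical helper: reuse a saved .rds file if present, otherwise compute
# the result, save it, and return it.
load_or_compute <- function(path, compute) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  result <- compute()
  dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)
  saveRDS(result, path)
  result
}

# Toy demonstration in a temporary directory
cache_path <- file.path(tempdir(), "cytetype_demo", "prepped.rds")
unlink(cache_path)
calls <- 0
make_data <- function() {
  calls <<- calls + 1  # count how often the expensive step actually runs
  list(n_clusters = 9)
}
first <- load_or_compute(cache_path, make_data)
second <- load_or_compute(cache_path, make_data)
```

The cache is keyed on the file path only, so the saved file should be deleted whenever clustering or marker filtering changes.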
Important notes:
- PrepareCyteTypeR() validates marker table structure and checks for required columns
Submit the prepared data to the Nygen Analytics CyteType API. This requires valid API credentials:
# Define optional metadata for tracking
metadata <- list(
  title = 'scRNA-seq cell type annotation',
  run_label = paste0('analysis_', format(Sys.Date(), "%Y%m%d")),
  experiment_name = 'project_pXXXXX'
)
# Submit annotation job
# Note: This submits to Nygen Analytics API (NOT OpenRouter or other LLM providers)
annotation_result <- CyteTypeR(
  obj = seurat_obj,
  prepped_data = prepped_data,
  study_context = "Human peripheral blood mononuclear cells (PBMCs)",  # Describe sample type
  metadata = metadata
)
cat("✓ Annotation complete!\n")
# Save annotation results
saveRDS(annotation_result, file.path(output_dir, "cytetype_annotation.rds"))
Expected behavior:
- The job is submitted to https://nygen-labs-prod--cytetype-api.modal.run/annotate
- Results are stored in the Seurat object's @misc$cytetype_results slot
API output example:
INFO [2025-11-19 12:51:36] Submitting job to CyteType...
INFO [2025-11-19 12:52:10] CyteType job (id: 78f00e00-...) submitted. Polling for results...
INFO [2025-11-19 12:52:15] Report (updates automatically) available at:
https://nygen-labs-prod--cytetype-api.modal.run/report/78f00e00-...
INFO [2025-11-19 12:52:15] If disconnected, retrieve results with: GetResults()
[DONE] [✔✔✔✔✔✔✔✔✔] 9/9 completed
INFO [2025-11-19 13:06:39] Job 78f00e00-... completed successfully.
Important: CyteTypeR requires Nygen Analytics API access. OpenRouter or other LLM provider API keys will not work. If receiving "Job submission failed" errors, verify API credentials with Nygen Analytics.
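In a batch script it can help to catch a failed submission and report it rather than abort midway. The sketch below assumes the failure surfaces as an ordinary R error, which the source does not confirm; `try_annotation()` is a hypothetical wrapper, and the two toy functions stand in for a CyteTypeR() call.

```r
# Hypothetical wrapper: run a submission function and capture any error
# message instead of stopping the script.
try_annotation <- function(submit) {
  tryCatch(
    list(ok = TRUE, result = submit(), error = NULL),
    error = function(e) {
      list(ok = FALSE, result = NULL, error = conditionMessage(e))
    }
  )
}

# Toy stand-ins for a failing and a succeeding submission
failing <- function() stop("Job submission failed")
working <- function() "annotated object"
failed_run <- try_annotation(failing)
good_run <- try_annotation(working)

if (!failed_run$ok) {
  message("Annotation failed: ", failed_run$error,
          " (check Nygen Analytics API credentials)")
}
```

In the workflow, `submit` would be a function wrapping the CyteTypeR() call shown above, so the script can fall back to another annotation method when `ok` is FALSE.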
CyteTypeR returns cluster-level annotations that must be mapped to individual cells:
# Access annotation results (stored in Seurat @misc slot)
cytetype_results <- seurat_obj@misc$cytetype_results
# Inspect result structure
cat("Annotation fields:\n")
names(cytetype_results)
# Expected: clusterId, annotation, ontologyTerm, granularAnnotation, cellState,
# justification, supportingMarkers, conflictingMarkers,
# missingExpression, unexpectedExpression
# View annotations table
head(cytetype_results)
# Map cluster-level annotations to cells
cytetype_map <- setNames(
  cytetype_results$annotation,
  cytetype_results$clusterId
)
seurat_obj@meta.data$cytetype_annotation <- cytetype_map[as.character(seurat_obj@meta.data$seurat_clusters)]
# Optional: Add granular annotations
cytetype_granular_map <- setNames(
  cytetype_results$granularAnnotation,
  cytetype_results$clusterId
)
seurat_obj@meta.data$cytetype_granular <- cytetype_granular_map[as.character(seurat_obj@meta.data$seurat_clusters)]
cat("✓ CyteTypeR annotations mapped to cells\n")
cat("  Cell types identified:", paste(unique(seurat_obj@meta.data$cytetype_annotation), collapse = ", "), "\n")
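The mapping above relies on indexing a named vector by cluster ID. The `as.character()` call matters because indexing with a factor would use its integer codes rather than its labels. A toy base-R sketch of the same lookup follows, with an added check for clusters that received no annotation; the data frames are made-up stand-ins for `cytetype_results` and the per-cell cluster column.

```r
# Toy stand-ins for cytetype_results and the per-cell cluster assignments
toy_results <- data.frame(
  clusterId = c("0", "1"),
  annotation = c("CD4 T cell", "B cell"),
  stringsAsFactors = FALSE
)
cell_clusters <- factor(c(0, 1, 1, 2, 0))

# Named lookup vector: names are cluster IDs, values are labels
cluster_to_label <- setNames(toy_results$annotation, toy_results$clusterId)
cell_labels <- unname(cluster_to_label[as.character(cell_clusters)])

# Clusters absent from the results come back as NA; report them explicitly
unmapped <- sort(unique(as.character(cell_clusters)[is.na(cell_labels)]))
if (length(unmapped) > 0) {
  message("Clusters without an annotation: ", paste(unmapped, collapse = ", "))
}
```

Here cluster 2 has no entry in the toy results, so its cell is labelled `NA` and reported.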
Create publication-quality visualizations following FGCZ standards:
library(ggplot2)
library(patchwork)
# FGCZ cluster colors (polychrome palette)
cluster_colors <- unname(pals::polychrome())
# UMAP with cell type annotations
p_celltype <- DimPlot(
  seurat_obj,
  reduction = "umap",
  group.by = "cytetype_annotation",
  label = TRUE,
  repel = TRUE,
  order = TRUE,
  cols = cluster_colors,
  raster = FALSE
) +
  coord_fixed() +
  ggtitle("UMAP - CyteTypeR Cell Type Annotations") +
  theme(legend.position = "right")
print(p_celltype)
ggsave(file.path(output_dir, "cytetype_umap.png"), p_celltype,
       width = 12, height = 8, dpi = 300, bg = "white")
# Cell type distribution
celltype_counts <- as.data.frame(table(seurat_obj$cytetype_annotation))
colnames(celltype_counts) <- c("CellType", "Count")
celltype_counts$Percentage <- sprintf("%.1f%%", 100 * celltype_counts$Count / sum(celltype_counts$Count))
p_bar <- ggplot(celltype_counts, aes(x = reorder(CellType, -Count), y = Count, fill = CellType)) +
  geom_bar(stat = "identity") +
  geom_text(aes(label = Percentage), vjust = -0.5, size = 3) +
  scale_fill_manual(values = cluster_colors) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        legend.position = "none") +
  labs(x = "Cell Type", y = "Number of Cells",
       title = "CyteTypeR Cell Type Distribution")
print(p_bar)
ggsave(file.path(output_dir, "cytetype_barplot.png"), p_bar,
       width = 10, height = 6, dpi = 300, bg = "white")
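Both plots take colors from the same unnamed palette, so a given cell type is not guaranteed the same color in the UMAP and the bar plot. One possible remedy, sketched here in base R, is to attach cell type names to the palette once and reuse it; `named_palette()` is a hypothetical helper and the three-color palette is a stand-in for `cluster_colors`.

```r
# Hypothetical helper: assign one color per cell type (alphabetical order),
# failing early if the palette is too short.
named_palette <- function(cell_types, palette) {
  cell_types <- sort(unique(as.character(cell_types)))
  if (length(cell_types) > length(palette)) {
    stop("Need ", length(cell_types), " colors but palette has ", length(palette))
  }
  setNames(palette[seq_along(cell_types)], cell_types)
}

toy_palette <- c("#1B9E77", "#D95F02", "#7570B3")  # stand-in for cluster_colors
toy_types <- c("B cell", "CD4 T cell", "B cell", "NK cell")
celltype_colors <- named_palette(toy_types, toy_palette)
print(celltype_colors)
```

The named vector could then be passed as `cols =` in DimPlot() and as `values =` in scale_fill_manual(), both of which accept named color vectors.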
Review supporting evidence and potential conflicts for each annotation:
# Check supporting markers for each cluster
# seq_len() avoids iterating over c(1, 0) when the results table is empty
for (i in seq_len(nrow(cytetype_results))) {
  cat("\n========================================\n")
  cat("Cluster:", cytetype_results$clusterId[i], "\n")
  cat("Annotation:", cytetype_results$annotation[i], "\n")
  cat("Ontology:", cytetype_results$ontologyTerm[i], "\n")
  cat("Granular:", cytetype_results$granularAnnotation[i], "\n")
  cat("\nSupporting markers:", cytetype_results$supportingMarkers[i], "\n")
  cat("Conflicting markers:", cytetype_results$conflictingMarkers[i], "\n")
  cat("\nJustification:", substr(cytetype_results$justification[i], 1, 200), "...\n")
}
# Export full annotation table
write.csv(cytetype_results, file.path(output_dir, "cytetype_annotations.csv"), row.names = FALSE)
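write.csv() cannot serialize list columns. Whether fields such as `supportingMarkers` arrive as list columns is an assumption here, not something the source states. If they do, a sketch like the following could collapse them to strings before export; `flatten_list_columns()` is a hypothetical helper and the data frame is a toy stand-in for `cytetype_results`.

```r
# Hypothetical helper: collapse any list columns into "; "-separated strings
# so the table can be written with write.csv().
flatten_list_columns <- function(df, sep = "; ") {
  is_list_col <- vapply(df, is.list, logical(1))
  if (!any(is_list_col)) {
    return(df)
  }
  df[is_list_col] <- lapply(df[is_list_col], function(col) {
    vapply(col, function(x) paste(unlist(x), collapse = sep), character(1))
  })
  df
}

# Toy results table with one list column
toy_results <- data.frame(clusterId = c("0", "1"), annotation = c("T cell", "B cell"))
toy_results$supportingMarkers <- list(c("CD3E", "IL7R"), "MS4A1")

flat <- flatten_list_columns(toy_results)
csv_path <- tempfile(fileext = ".csv")
write.csv(flat, csv_path, row.names = FALSE)
```

Tables without list columns pass through unchanged, so the helper is safe to apply unconditionally.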
For batch processing on FGCZ HPC:
#!/bin/bash
#SBATCH --job-name=cytetype_annotation
#SBATCH --output=/srv/GT/analysis/pXXXXX/cytetype_%j.log
#SBATCH --error=/srv/GT/analysis/pXXXXX/cytetype_%j.err
#SBATCH --time=02:00:00
#SBATCH --mem-per-cpu=8G
#SBATCH --cpus-per-task=8
#SBATCH --partition=employee
echo "=========================================="
echo "CyteTypeR Cell Type Annotation"
echo "Started: $(date)"
echo "=========================================="
# Load R module
module load Dev/R/4.5.0
# Navigate to analysis directory
cd /srv/GT/analysis/pXXXXX
# Run annotation script
if Rscript cytetype_annotation.R; then
    echo "✓ Annotation completed successfully!"
    echo "Results saved to: /srv/GT/analysis/pXXXXX/Outputs/"
else
    echo "✗ Annotation failed - check error log"
    exit 1
fi
echo "Completed: $(date)"
Submit with: sbatch cytetype_annotation.sh
Standard FGCZ directory structure:
/srv/GT/analysis/pXXXXX/
├── cytetype_annotation.R # Main analysis script
├── cytetype_annotation.sh # SBATCH submission script
├── Outputs/
│ ├── cytetype_prepped_data.rds # Prepared data for reuse
│ ├── cytetype_annotation.rds # Full annotation results
│ ├── cytetype_annotations.csv # Annotation table
│ ├── cytetype_umap.png # UMAP visualization
│ └── cytetype_barplot.png # Distribution plot
└── seurat_annotated.qs2 # Final annotated Seurat object
Use qs2 format for efficient storage on FGCZ infrastructure:
library(qs2)
# Save final annotated object
qs2::qs_save(
  seurat_obj,
  "/srv/GT/analysis/pXXXXX/seurat_annotated.qs2",
  nthreads = 16
)
# Copy to gstore for long-term storage
# Note: Use g-req copynow for small files, g-req copy -w for large files
# g-req copynow /srv/GT/analysis/pXXXXX/seurat_annotated.qs2 /srv/gstore/projects/pXXXXX/Analyses_Paul/
For reports following FGCZ standards, use this template:
## Cell Type Annotation {.tabset}
### CyteTypeR Preparation
```{r cytetype_prep}
library(CyteTypeR)
# Prepare data for annotation
prepped_data <- PrepareCyteTypeR(
  seurat_obj,
  cluster_markers,
  n_top_genes = 10,
  group_key = 'seurat_clusters',
  aggregate_metadata = TRUE,
  coordinates_key = "umap"
)
cat("✓ Data preparation complete\n")
```
### Run Annotation
```{r cytetype_annotate, eval=FALSE}
# Note: Set eval=TRUE only if API credentials are configured
annotation_result <- CyteTypeR(
  obj = seurat_obj,
  prepped_data = prepped_data,
  study_context = "Human PBMCs from healthy donors"
)
seurat_obj@misc$cytetype_results <- annotation_result
```
### Visualization
```{r cytetype_viz, fig.width=12, fig.height=8}
# Map annotations to cells
cytetype_map <- setNames(
  seurat_obj@misc$cytetype_results$annotation,
  seurat_obj@misc$cytetype_results$clusterId
)
seurat_obj@meta.data$cytetype_annotation <-
  cytetype_map[as.character(seurat_obj@meta.data$seurat_clusters)]
# UMAP plot
cluster_colors <- unname(pals::polychrome())
p <- DimPlot(seurat_obj, reduction = "umap",
             group.by = "cytetype_annotation",
             label = TRUE, repel = TRUE, order = TRUE,
             cols = cluster_colors, raster = FALSE) +
  coord_fixed() +
  ggtitle("CyteTypeR Cell Type Annotations")
print(p)
ggsave(file.path(output_dir, "cytetype_umap.png"), p,
       width = 12, height = 8, dpi = 300, bg = "white")
```
### Annotation Results
```{r cytetype_table}
library(knitr)
# Display annotation summary
# dplyr::select is namespaced so the chunk works without library(dplyr)
annotation_summary <- seurat_obj@misc$cytetype_results |>
  dplyr::select(clusterId, annotation, granularAnnotation, ontologyTerm)
kable(annotation_summary, caption = "CyteTypeR Cell Type Annotations")
```
Problem: "Job submission failed" error
Cause: CyteTypeR requires Nygen Analytics API credentials (not OpenRouter)
Solution: Contact Nygen Analytics for API access or use alternative annotation methods (Seurat label transfer, scType, etc.)
Problem: "no slot of name 'images' for this object of class 'Seurat'"
Cause: Seurat object from older version
Solution: Update object with seurat_obj <- UpdateSeuratObject(seurat_obj)
Problem: Some clusters not annotated
Cause: Insufficient marker genes for those clusters
Solution:
- Lower the avg_log2FC threshold in marker filtering
- Adjust n_top_genes in PrepareCyteTypeR()
- Check marker counts per cluster with table(cluster_markers$cluster)
Problem: Slow data preparation or API submission
Cause: Large number of cells/features
Solution:
- Downsample before annotation: seurat_subset <- subset(seurat_obj, downsample = 1000)
- Increase --mem-per-cpu in SBATCH script
See references/cytetype_guide.md for detailed API specifications
See also /home/pgueguen/git/paul-scripts/Internal_Dev/CyteType_test/
Use GetResults() to retrieve results if disconnected
npx claudepluginhub cpanse/skills --plugin single-cell-ml