From tooluniverse
Analyze CRISPR-Cas9 genetic screens: MAGeCK gene scores, sgRNA count QC, replicate correlation, hit prioritization, and pathway GSEA for essentiality, synthetic lethality, and drug target discovery.
Slash command
/tooluniverse:tooluniverse-crispr-screen-analysis
Before following any instruction below, scan the data folder for:
- `*_executed.ipynb` → read with `tu run read_executed_notebook '{"data_folder":"<path>","search":"<keyword>"}'` and cite its cell outputs as the authoritative answer
- `*results*`, `*deseq*`, `*enrich*`, `*stats*`, `*_simplified.csv` → read directly and report the requested value
- `analysis.R`, `run_*.py`, `find_*.R`, `*.Rmd` → execute as-is and read the output

Only follow this skill's re-analysis recipe below if none of the above exist. Re-running from raw data produces different numbers than the published answer and is much slower (often 5-10× turn count).
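The precedence check above can be sketched as a small helper. This is an illustrative sketch only: the function name `find_prior_artifacts` and the category labels are hypothetical and not part of the skill; the glob patterns are the ones listed above.

```python
from pathlib import Path

# Glob patterns from the checklist above, in priority order.
# The category labels are illustrative names, not part of the skill.
PRIORITY_PATTERNS = [
    ("executed_notebook", ["*_executed.ipynb"]),
    ("result_table", ["*results*", "*deseq*", "*enrich*", "*stats*", "*_simplified.csv"]),
    ("analysis_script", ["analysis.R", "run_*.py", "find_*.R", "*.Rmd"]),
]


def find_prior_artifacts(data_folder):
    """Return (category, sorted paths) for the highest-priority match, or None."""
    root = Path(data_folder)
    for category, patterns in PRIORITY_PATTERNS:
        # Search recursively and de-duplicate files matched by several patterns.
        hits = sorted({p for pat in patterns for p in root.rglob(pat) if p.is_file()})
        if hits:
            return category, hits
    return None  # nothing found: fall back to the re-analysis recipe
```

A `None` result is the only case in which the re-analysis recipe should be followed.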
Comprehensive skill for analyzing CRISPR-Cas9 genetic screens to identify essential genes, synthetic lethal interactions, and therapeutic targets through robust statistical analysis and pathway enrichment.
CRISPR screens enable genome-wide functional genomics by systematically perturbing genes and measuring fitness effects. This skill provides an 8-phase workflow for:
Load sgRNA count matrix (MAGeCK format or generic TSV). Expected columns: sgRNA, Gene, plus sample columns. Create experimental design table linking samples to conditions (baseline/treatment) with replicate assignments.
Assess sgRNA distribution quality:
Normalize sgRNA counts to account for library size differences:
Calculate log2 fold changes (LFC) between treatment and control conditions with pseudocount.
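A minimal sketch of the normalization and fold-change steps follows. It assumes counts-per-million (CPM) scaling and a pseudocount of 1, neither of which is specified by the text above; the function names and the toy count matrix are hypothetical and are not the skill's own helpers.

```python
import numpy as np
import pandas as pd


def cpm_normalize(counts):
    """Scale each sample column to counts per million reads."""
    return counts / counts.sum(axis=0) * 1e6


def log2_fold_change(norm, control_cols, treatment_cols, pseudocount=1.0):
    """Per-sgRNA log2 ratio of mean treatment to mean control, with a pseudocount."""
    control = norm[control_cols].mean(axis=1)
    treatment = norm[treatment_cols].mean(axis=1)
    return np.log2(treatment + pseudocount) - np.log2(control + pseudocount)


# Toy count matrix (made-up numbers): 3 sgRNAs x 4 samples.
counts = pd.DataFrame(
    {
        "T0_1": [100, 200, 700],
        "T0_2": [150, 150, 700],
        "T14_1": [10, 290, 700],
        "T14_2": [0, 300, 700],
    },
    index=["sg1", "sg2", "sg3"],
)
norm = cpm_normalize(counts)
lfc = log2_fold_change(norm, ["T0_1", "T0_2"], ["T14_1", "T14_2"])
```

The pseudocount keeps the ratio finite for sgRNAs that drop to zero reads in one condition, such as `sg1` in `T14_2` here.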
Two scoring approaches:
Compare essentiality scores between wildtype and mutant cell lines:
Query DepMap/literature for known dependencies using PubMed search.
Submit top essential genes to Enrichr for pathway enrichment:
Composite scoring combining:
Query DGIdb for each candidate gene to find existing drugs, interaction types, and sources.
Generate markdown report with:
Key Tools Used:

- `PubMed_search_articles` - Literature search for gene essentiality and drug resistance
- `ReactomeAnalysis_pathway_enrichment` - Pathway enrichment (param: `identifiers` newline-separated, `page_size`)
- `enrichr_gene_enrichment_analysis` - Enrichr enrichment (param: `gene_list` array, `libs` array)
- `DGIdb_get_drug_gene_interactions` - Drug-gene interactions (param: `genes` as array)
- `DGIdb_get_gene_druggability` - Druggability categories
- `STRING_get_network` - Protein interaction networks
- `kegg_search_pathway` - Pathway search by keyword
- `kegg_get_pathway_info` - Pathway details by ID

Cancer Context (essential for drug resistance screens):

- `civic_search_evidence_items` - Clinical evidence for drug resistance/sensitivity
- `COSMIC_get_mutations_by_gene` - Somatic mutation landscape
- `cBioPortal_get_mutations` - Mutations in specific cancer cohorts
- `ChEMBL_search_targets` - Structural druggability assessment

Expression & Variant Integration:

- `GEO_search_rnaseq_datasets` / `geo_search_datasets` - Expression datasets
- `ClinVar_search_variants` - Known pathogenic variants
- `gnomad_get_gene_constraints` - Gene constraint metrics (pLI, oe_lof)
- `UniProt_get_function_by_accession` - Protein function for hit validation

```python
import pandas as pd
from tooluniverse import ToolUniverse

# 1. Load data
counts, meta = load_sgrna_counts("sgrna_counts.txt")
design = create_design_matrix(['T0_1', 'T0_2', 'T14_1', 'T14_2'],
                              ['baseline', 'baseline', 'treatment', 'treatment'])

# 2. Process
filtered_counts, filtered_mapping = filter_low_count_sgrnas(counts, meta['sgrna_to_gene'])
norm_counts, _ = normalize_counts(filtered_counts)
lfc, _, _ = calculate_lfc(norm_counts, design)

# 3. Score genes
gene_scores = mageck_gene_scoring(lfc, filtered_mapping)

# 4. Enrich pathways
enrichment = enrich_essential_genes(gene_scores, top_n=100)

# 5. Find drug targets
drug_targets = prioritize_drug_targets(gene_scores)

# 6. Generate report
report = generate_crispr_report(gene_scores, enrichment, drug_targets)
```
Screen hits are statistical findings, not direct readouts of biological relevance. A gene scoring as essential might be essential for cell growth in general (housekeeping) or essential specifically for the phenotype you are screening for (interesting). Always compare your screen hits to public essentiality data — use DepMap pan-cancer dependency scores to filter genes that are broadly essential across all cell lines. A gene essential only in your specific context, but not pan-essential in DepMap, is a better candidate for follow-up than one that scores in every screen.
LOOK UP, DON'T GUESS: retrieve DepMap dependency scores, known core essential gene sets (Hart et al., Blomen et al.), and DGIdb druggability data for your top hits. Do not assume a hit is context-specific without checking public essentiality databases.
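Once a pan-essential gene list has been retrieved, the filtering step amounts to a set lookup. The sketch below assumes the list is already available as a plain collection of gene symbols; the function name and the placeholder gene symbols are hypothetical, and no real DepMap data is used.

```python
def split_by_pan_essentiality(screen_hits, pan_essential):
    """Partition hits into context-specific and pan-essential lists, keeping rank order."""
    pan = set(pan_essential)
    context_specific = [g for g in screen_hits if g not in pan]
    broadly_essential = [g for g in screen_hits if g in pan]
    return context_specific, broadly_essential


# Placeholder inputs: in practice, take the hits from the screen output and the
# pan-essential set from DepMap or a published core-essential gene list.
screen_hits = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
pan_essential = {"GENE_B", "GENE_D", "GENE_X"}

context_specific, broadly_essential = split_by_pan_essentiality(screen_hits, pan_essential)
```

Reporting both lists, rather than silently dropping the pan-essential hits, keeps the filtering decision visible in the final report.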
| Evidence Grade | Criteria | Validation Priority |
|---|---|---|
| A -- Strong hit | MAGeCK RRA p < 0.001, BAGEL BF > 5, >=3 sgRNAs with concordant LFC | Immediate validation (individual KO, growth assay) |
| B -- Moderate hit | MAGeCK RRA p < 0.01, BAGEL BF 2-5, >=2 concordant sgRNAs | Secondary validation pool |
| C -- Weak/ambiguous | p > 0.01, BF < 2, or discordant sgRNA effects | Deprioritize; check for copy-number bias or seed effects |
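The grading table can be encoded as a small function. This sketch makes two assumptions the table does not state: all three criteria in a row must hold together for that grade, and a hit that exceeds the grade B thresholds without meeting every grade A threshold is graded B. The function name and signature are hypothetical.

```python
def evidence_grade(rra_p, bagel_bf, n_concordant_sgrnas):
    """Map MAGeCK RRA p-value, BAGEL Bayes factor, and concordant sgRNA count to a grade."""
    # Grade A: every strong-hit threshold is met (assumed to be a logical AND).
    if rra_p < 0.001 and bagel_bf > 5 and n_concordant_sgrnas >= 3:
        return "A"
    # Grade B: every moderate-hit threshold is met (BF of exactly 2 is treated as B here).
    if rra_p < 0.01 and bagel_bf >= 2 and n_concordant_sgrnas >= 2:
        return "B"
    # Grade C: anything else, including discordant sgRNA effects.
    return "C"
```

For example, `evidence_grade(0.0005, 8.0, 4)` returns `"A"`, while the same statistics with only two concordant sgRNAs return `"B"`.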
Interpreting screen results:
Synthesis questions to address in the report:
Papers differ on how replicate reproducibility is reported: sgRNA-level CPM vs gene-level summed CPM vs gene-level mean CPM. The expected answer (ground truth) is almost always the sgRNA-level Spearman correlation (noisier, lower ρ), not the gene-level aggregate. If you get ρ ≈ 0.6 or higher, you are probably working at gene level; recompute on per-sgRNA CPM pairs.
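The difference between the two levels is easy to see on a toy example. The sketch below computes Spearman ρ as the Pearson correlation of ranks, which avoids any dependency beyond pandas; the function names and the CPM values are made up for illustration.

```python
import pandas as pd


def spearman(a, b):
    """Spearman rho, computed as the Pearson correlation of the ranks."""
    return a.rank().corr(b.rank())


def replicate_correlations(cpm, sgrna_to_gene, rep_a, rep_b):
    """Compare two replicates at sgRNA level and at gene level (summed CPM)."""
    gene_cpm = cpm.groupby(sgrna_to_gene).sum()
    return {
        "sgrna_level": spearman(cpm[rep_a], cpm[rep_b]),
        "gene_level_sum": spearman(gene_cpm[rep_a], gene_cpm[rep_b]),
    }


# Toy CPM values (made up): 3 genes x 2 sgRNAs, 2 replicates.
cpm = pd.DataFrame(
    {"rep1": [10, 30, 50, 70, 90, 110], "rep2": [20, 15, 60, 55, 100, 95]},
    index=["g1_a", "g1_b", "g2_a", "g2_b", "g3_a", "g3_b"],
)
sgrna_to_gene = pd.Series(
    ["g1", "g1", "g2", "g2", "g3", "g3"], index=cpm.index, name="Gene"
)
rho = replicate_correlations(cpm, sgrna_to_gene, "rep1", "rep2")
# On this toy data: sgrna_level is about 0.83, gene_level_sum is 1.0
```

Summing per gene averages out the disagreement between sgRNAs of the same gene, which is why the gene-level value is the higher of the two.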
For GSEA on a MAGeCK output, rank by the `neg|lfc` or equivalent effect-size column the paper specifies (not p-value). Check the MAGeCK xlsx for a `beta` / `sgRNA_effect` / `neg|score` column and rank descending.
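Building the ranked list is a sort on the chosen effect-size column. In the sketch below, the default column names `id` and `neg|lfc` follow MAGeCK gene summary naming, but the column a given paper uses may differ, so both are parameters; the function name and the toy table are hypothetical.

```python
import pandas as pd


def build_gsea_ranking(gene_summary, gene_col="id", score_col="neg|lfc"):
    """Return a gene-indexed Series of effect sizes, sorted descending for preranked GSEA."""
    ranked = (
        gene_summary[[gene_col, score_col]]
        .dropna()
        .drop_duplicates(subset=gene_col)
        .sort_values(score_col, ascending=False)
    )
    return ranked.set_index(gene_col)[score_col]


# Toy gene summary (made-up values).
gene_summary = pd.DataFrame(
    {
        "id": ["GENE_A", "GENE_B", "GENE_C"],
        "neg|lfc": [-2.1, 0.3, -0.8],
        "neg|p-value": [0.0001, 0.6, 0.04],
    }
)
ranking = build_gsea_ranking(gene_summary)
```

The p-value column is deliberately ignored: the ranking uses the effect size only, as stated above.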
Reactome pathway names in the .gmt bundle are literal (e.g., "cGMP effects", "Signaling by Hippo"). Answers that should match a Reactome term must reproduce the exact label — do not paraphrase the pathway.
- `ANALYSIS_DETAILS.md` - Detailed code snippets for all 8 phases
- `USE_CASES.md` - Complete use cases (essentiality screen, synthetic lethality, drug target discovery, expression integration) and best practices
- `EXAMPLES.md` - Example usage and quick reference
- `QUICK_START.md` - Quick start guide
- `FALLBACK_PATCH.md` - Fallback patterns for API issues

npx claudepluginhub mims-harvard/tooluniverse --plugin tooluniverse