From histai-skillsets
Submit slide classification training jobs on Azure GPU compute. Supports two workflows: full pipeline (recommended for best results) and quick mode (single model, fast iteration).
Slash command: /histai-skillsets:ai_model_trainer
Base URL: https://prod.celldx.net/v1/ml-jobs
All requests require the X-API-Key: ${CELLDX_API_KEY} header.
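As a minimal sketch of the authentication requirement above, the helper below builds (but does not send) a request carrying the X-API-Key header. The function name `build_request` and the `Content-Type: application/json` header are assumptions for illustration; the source only specifies the API key header.

```python
import json
import urllib.request

BASE_URL = "https://prod.celldx.net"  # host taken from the base URL above


def build_request(method, path, api_key, body=None):
    """Build an authenticated request for an API path such as /v1/ml-jobs/options.

    The request is only constructed here; sending it is a separate step
    (urllib.request.urlopen(req)).
    """
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"X-API-Key": api_key}
    if data is not None:
        # Assumption: JSON bodies are sent with a JSON content type.
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )
```

In practice the key would be read from the CELLDX_API_KEY environment variable, for example `build_request("GET", "/v1/ml-jobs/options", os.environ["CELLDX_API_KEY"])`.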
Read this before doing anything else. This is the most common mistake agents make.
CellDX has two completely independent workflows, each with its own dataset, cost model, and skill:
| Workflow | Skill | Dataset | What it costs | When to use |
|---|---|---|---|---|
| Buy WSIs | cohort_builder | Whole Slide Images (220K+ slides instantly available, 1M+ via custom request) | Per-slide pricing: $5/H&E, $40/IHC (volume discounts apply) | User wants to download WSI files for external use, manual review, or their own pipeline |
| Train a model | ai_model_trainer (this skill) | Pre-extracted feature vectors (~66K H&E slides only — IHC slides are not in the feature store) | GPU compute only (session billing in $/GPU-hour) | User wants to train a classifier on CellDX infrastructure |
- Do not call /v1/datahub/cohorts, do not call /pay, and do not download WSIs when the user asks you to train a model. The trainer reads features directly from the server-side feature store. The user does not own, and does not need to own, the underlying WSIs.
- Per-slide pricing from the cohort_builder skill is irrelevant to training and must not be quoted to the user in a training context.
- Use /v1/ml-jobs/data/slides/features to check which file_ids have features available. The endpoint has two shapes: GET returns the full set of available file_ids in one round trip (preferred: cache it locally and intersect against your candidate list); POST {file_ids: [...]} filters a specific list and reports what's missing. Both are read-only, free, and do NOT trigger any purchase. This is the ONLY correct way to assemble a training cohort.
- The training cohort (cohort.train / cohort.val / cohort.test) is an in-memory job parameter. It is NOT the same as a Datahub /v1/datahub/cohorts cohort. Same word, different concept. Never reuse a Datahub cohort ID as a training cohort.
- If the user wants both workflows, quote the costs separately (slide cost from cohort_builder, GPU cost from this skill). Do not bundle them; do not imply one requires the other.

If at any point you find yourself about to call a /v1/datahub/cohorts* or /v1/billing/topup endpoint as part of a training task, stop. You are in the wrong workflow.
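One way an agent could enforce the separation above is a small guard that checks every path before a request is made during a training task. This is a sketch only; the function names are hypothetical, and the prefix list contains just the two endpoint families named in the warning above.

```python
# Path prefixes named above as belonging to the WSI purchase workflow.
PURCHASE_PREFIXES = ("/v1/datahub/cohorts", "/v1/billing/topup")


def is_wrong_workflow(path):
    """Return True if the path belongs to the purchase workflow, not training."""
    return path.startswith(PURCHASE_PREFIXES)


def assert_training_path(path):
    """Raise before a training task accidentally calls a purchase endpoint."""
    if is_wrong_workflow(path):
        raise RuntimeError(
            f"{path} is a purchase endpoint; stop, this is a training task"
        )
    return path
```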
The recommended workflow runs three phases as separate jobs to find the best model automatically:
Submit a parameter tuning job with hp_tuning.enabled = true. The job tests training configurations in stages: each stage tunes two settings together, locks in the best values, and then moves on to the next stage.
Recommended stages (adjust values based on the tunable_params returned by /v1/ml-jobs/options):
1. hyperparams.learning_rate × finetune.hidden_dims
2. finetune.head_dropout × finetune.patch_dropout
3. finetune.label_smoothing × hyperparams.weight_decay

Use attention_mil (the default strategy) for parameter tuning.
Search method (hp_tuning.method):
- grid (default): exhaustive Cartesian product per stage. Use when each stage has at most ~6 combos (e.g. 2 params × 2–3 values). Predictable cost, no missed combinations.
- random: sample n_trials_per_stage combos uniformly without replacement, seeded by hp_tuning.seed. Use when the stage's grid is large (≥ 8 combos, e.g. 3 params × 3 values, or 2 params × 4–5 values) and a full grid would blow the GPU budget. Random search is also a good default when the user gives many candidate values per param and you want to cap total trials.

Pick one method per job; it applies to all stages. If you switch to random, set n_trials_per_stage to a value ≤ the smallest stage grid. Combos beyond that count are simply skipped, so picking too small a number weakens the search.
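To make the difference between the two methods concrete, the sketch below enumerates the combinations a single stage would cover under each one. It is a local illustration for estimating trial counts, not the server's implementation; `stage_trials` is a hypothetical helper, and the assumption is that random search samples from the same Cartesian grid.

```python
import itertools
import random


def stage_trials(stage_params, method="grid", n_trials_per_stage=None, seed=0):
    """Enumerate the parameter combinations one tuning stage would try.

    stage_params maps a parameter path to its list of candidate values.
    """
    paths = list(stage_params)
    grid = [
        dict(zip(paths, combo))
        for combo in itertools.product(*stage_params.values())
    ]
    if method == "grid":
        return grid
    if method == "random":
        if n_trials_per_stage is None:
            raise ValueError("random search needs n_trials_per_stage")
        # Sample without replacement; never more trials than the grid holds.
        k = min(n_trials_per_stage, len(grid))
        return random.Random(seed).sample(grid, k)
    raise ValueError(f"unknown method: {method}")


stage = {
    "hyperparams.learning_rate": [5e-5, 1e-4, 2e-4, 5e-4],
    "finetune.hidden_dims": [[64], [128], [256]],
}
full = stage_trials(stage)  # 4 x 3 = 12 combinations
sampled = stage_trials(stage, method="random", n_trials_per_stage=8, seed=42)
```

Counting `len(stage_trials(...))` per stage before submitting gives a quick estimate of how many trials a job will run, which is what drives GPU cost.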
When the tuning job completes, read the results from GET /v1/ml-jobs/jobs/{job_id}/metrics to extract the best parameter values.
Submit one job per strategy using the best parameters from Phase 1. Available strategies:
| Strategy | Description | Best for |
|---|---|---|
| pooling_mlp | Simple pooling + classifier. Fast, robust baseline. | Small cohorts, noisy data |
| attention_mil | Attention-based aggregation. Learns which tissue regions matter most. | Default, medium cohorts |
| clam_mil | CLAM with region-level supervision. | Large cohorts, interpretability |
| lora_mil | Fine-tunes the feature extractor with lightweight adapters. | Adapting to rare tissue types |
Submit all strategy jobs in parallel. Use the same cohort, data_source, and tuned parameters for fair comparison. Vary only finetune.strategy (and strategy-specific params like aggregator.type).
Aggregator pairing: pooling_mlp uses mean_pool (or mean_max_pool); attention_mil, clam_mil, and lora_mil use abmil.
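The strategy comparison can be prepared by cloning one base payload per strategy, as in the sketch below. The helper name and the job-name suffix are hypothetical conventions; the pairing table follows the aggregator rule stated above, using mean_pool for pooling_mlp.

```python
import copy

# Aggregator pairing as stated above.
AGGREGATOR_FOR = {
    "pooling_mlp": "mean_pool",
    "attention_mil": "abmil",
    "clam_mil": "abmil",
    "lora_mil": "abmil",
}


def strategy_payloads(base_payload):
    """Return one job payload per strategy.

    Cohort, session, and tuned parameters are copied unchanged; only the
    strategy, its aggregator, and the job name differ between payloads.
    """
    payloads = []
    for strategy, aggregator in AGGREGATOR_FOR.items():
        payload = copy.deepcopy(base_payload)
        payload["name"] = f"{base_payload['name']}-{strategy}"
        payload.setdefault("finetune", {})["strategy"] = strategy
        payload["aggregator"] = {"type": aggregator}
        payloads.append(payload)
    return payloads
```

Each returned payload can then be submitted with POST /v1/ml-jobs/jobs in the same session.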
After all strategy jobs complete, compare val/auroc (or the target metric) across jobs. The job with the highest metric has the best checkpoint. Report the winning strategy and its metrics to the user.
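Selecting the winner is a simple arg-max over the target metric. The sketch below assumes the metrics for each job have already been collected into a dictionary keyed by job ID; that shape is hypothetical and is not the documented response format of the metrics endpoint.

```python
def pick_best_job(metrics_by_job, metric="val/auroc", mode="max"):
    """Return (job_id, value) for the job with the best target metric.

    metrics_by_job: {job_id: {metric_name: value}} (assumed shape).
    mode: "max" for metrics where higher is better, "min" otherwise.
    """
    if not metrics_by_job:
        raise ValueError("no jobs to compare")
    if mode not in ("max", "min"):
        raise ValueError(f"unknown mode: {mode}")
    chooser = max if mode == "max" else min
    best_id = chooser(metrics_by_job, key=lambda job: metrics_by_job[job][metric])
    return best_id, metrics_by_job[best_id][metric]
```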
After selecting the best model, do not deploy automatically. Present the results to the user and let them decide whether to deploy.
When the user approves deployment, submit:
POST /v1/ml-jobs/jobs/{job_id}/deploy
X-API-Key: ${CELLDX_API_KEY}
{
"title": "Breast Cancer Subtype Classifier",
"description": "Binary classifier for breast cancer subtypes trained on 200 H&E slides.",
"organ": "Breast"
}
Required fields:
- title: short name for the widget (max 200 chars)
- organ: target organ/tissue type (e.g. "Breast", "Skin", "Lung")

Optional fields:
- description: detailed description (max 2000 chars). If omitted, auto-generated from training metrics.

Response 201:
{
"deploymentId": "a1b2c3d4-...",
"status": "PENDING"
}
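The field limits listed for the deploy request can be checked client-side before calling the endpoint. The following is a minimal sketch; `build_deploy_payload` is a hypothetical helper, and the server remains the authority on validation.

```python
def build_deploy_payload(title, organ, description=None):
    """Build the deploy request body, enforcing the documented field limits."""
    title = title.strip()
    organ = organ.strip()
    if not title or len(title) > 200:
        raise ValueError("title is required and limited to 200 characters")
    if not organ:
        raise ValueError("organ is required")
    payload = {"title": title, "organ": organ}
    if description is not None:
        if len(description) > 2000:
            raise ValueError("description is limited to 2000 characters")
        payload["description"] = description
    # When description is omitted, it is auto-generated from training metrics.
    return payload
```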
Deployment is asynchronous. Poll for status:
GET /v1/ml-jobs/deployments/{deploymentId}
X-API-Key: ${CELLDX_API_KEY}
Response:
{
"deploymentId": "a1b2c3d4-...",
"jobId": "67a123...",
"status": "SUCCESS",
"widgetId": "e5f6g7h8-..."
}
Status values: PENDING → PROCESSING → SUCCESS | FAILED. The widgetId is only present on SUCCESS.
If deploy is called again for the same job, the existing deployment is returned (idempotent).
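Because deployment is asynchronous, a client needs a bounded polling loop. The sketch below takes the HTTP call as an injected function so it stays independent of any particular client library; the interval and poll limit are arbitrary assumptions, not values from the API.

```python
import time

TERMINAL_STATUSES = {"SUCCESS", "FAILED"}


def wait_for_deployment(fetch_status, deployment_id, interval_s=10.0,
                        max_polls=60, sleep=time.sleep):
    """Poll until the deployment reaches SUCCESS or FAILED.

    fetch_status(deployment_id) must return the parsed JSON body of
    GET /v1/ml-jobs/deployments/{deploymentId}.
    """
    for attempt in range(max_polls):
        body = fetch_status(deployment_id)
        if body["status"] in TERMINAL_STATUSES:
            return body  # widgetId is only present when status is SUCCESS
        if attempt < max_polls - 1:
            sleep(interval_s)
    raise TimeoutError(f"deployment {deployment_id} did not finish in time")
```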
Error responses:
- 404: job not found
- 409: job is not SUCCEEDED

Guidelines for the description: Include the classification objective, cohort size, and key performance metrics (val/auroc, val/accuracy). Example: "Classifies breast tissue slides as Lobular carcinoma or Invasive breast carcinoma NOS. Trained on 180 slides (120 train / 60 val). Best val/auroc: 0.95."
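A description in the recommended shape can be assembled from the cohort sizes and metrics already on hand. The helper below is a hypothetical sketch that reproduces the format of the example description; the two-decimal rounding is an assumption.

```python
def build_description(objective, n_train, n_val, metrics, n_test=0):
    """Compose a widget description: objective, cohort size, key metrics.

    objective: a full sentence stating what the model classifies.
    metrics: {metric_name: value}, e.g. {"val/auroc": 0.95}.
    """
    total = n_train + n_val + n_test
    split = f"{n_train} train / {n_val} val"
    if n_test:
        split += f" / {n_test} test"
    metric_text = ", ".join(f"{name}: {value:.2f}" for name, value in metrics.items())
    text = f"{objective} Trained on {total} slides ({split}). Best {metric_text}."
    if len(text) > 2000:
        raise ValueError("description is limited to 2000 characters")
    return text
```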
To archive a previously deployed custom widget (hides it from the user's installed widgets without deleting it):
POST /custom-widgets/{widgetId}/archive
X-API-Key: ${CELLDX_API_KEY}
Response — CustomWidgetDto:
{
"widgetId": "e5f6g7h8-...",
"name": "Lobular vs NOS breast carcinoma",
"description": "...",
"isInstalled": false,
"imageURL": "https://...",
"tags": ["WSI", "classification"],
"recommendedMagnification": 20,
"labels": [{"...": "..."}],
"status": "ARCHIVED"
}
Use this when the user wants to retire an old model from their widget list while keeping it recoverable.
When the user wants to skip parameter tuning and run a single model (e.g., they already know good settings or want a quick test):
1. Check GET /v1/ml-jobs/options for valid ranges.
2. Submit POST /v1/ml-jobs/jobs with hp_tuning.enabled = false (default) and the desired finetune.strategy.

Call GET /v1/ml-jobs/options to get available strategies, aggregators, parameter ranges, tuning config, and data formats.
Ask the user to choose:
- Strategy: pooling_mlp, attention_mil, clam_mil, or lora_mil
- Aggregator: mean_pool, max_pool, mean_max_pool, or abmil

Before building the cohort, filter the slide list to only slides that have extracted features. Not all slides in the database have features yet (~66K available).
Two shapes on the same endpoint — pick whichever matches the workload:
Bulk fetch (preferred for cohort assembly): GET returns every file_id with extracted features in one round trip. Cache the response and intersect against your candidate list locally — avoids one round-trip per slide.
GET /v1/ml-jobs/data/slides/features
X-API-Key: ${CELLDX_API_KEY}
Response:
{
"file_ids": ["<file_id_1>", "<file_id_2>", ...],
"total_available": 66191
}
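Once the GET response is cached, the intersection with a candidate list is a local set lookup. A minimal sketch, with a hypothetical helper name, that also reports the candidates without features:

```python
def keep_available(candidate_ids, features_response):
    """Split candidates into (available, missing) using the cached GET response.

    features_response is the parsed JSON body shown above; only its
    "file_ids" list is used. Candidate order is preserved.
    """
    available_set = set(features_response["file_ids"])
    available = [fid for fid in candidate_ids if fid in available_set]
    missing = [fid for fid in candidate_ids if fid not in available_set]
    return available, missing
```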
Filter a known list: POST filters a specific candidate set and reports what's missing.
POST /v1/ml-jobs/data/slides/features
X-API-Key: ${CELLDX_API_KEY}
{ "file_ids": ["<file_id_1>", "<file_id_2>", ...] }
Response:
{
"available": ["<file_id_1>", ...],
"missing": ["<file_id_2>", ...],
"total_requested": 200,
"total_available": 144
}
Use only the available list (POST) or the intersection with file_ids (GET) when building the cohort. If the available count is too low for meaningful training (e.g. fewer than 20 samples per class), inform the user and do not submit the job.
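The per-class sufficiency check can be done with a simple count over the labels of the available slides. In this sketch the threshold of 20 comes from the example figure above and is exposed as a parameter; the helper name is hypothetical.

```python
from collections import Counter


def underpopulated_classes(labels, min_per_class=20):
    """Return {label: count} for classes with fewer than min_per_class slides.

    labels: one class label per available slide. An empty result means
    every class meets the minimum.
    """
    counts = Counter(labels)
    return {label: n for label, n in counts.items() if n < min_per_class}
```

If the result is non-empty, report the affected classes and their counts to the user instead of submitting the job.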
Note: The feature store currently contains extracted features for H&E slides only. IHC slides will always be missing — this is expected, not a data gap. If the user's cohort includes IHC slides, inform them before checking availability so they are not surprised.
Always split the available slides into train / val / test ≈ 70 / 15 / 15, stratified by class label. A non-empty test set is required: it is the only unbiased estimate of generalization, since val is implicitly overfit through HP tuning, early stopping, and best-checkpoint selection.
Rules:
- Do not set test = [] when a 15% test set would yield ≥ 5 slides per class. The agent must produce a test cohort whenever the data permits; do not skip it to grow train.
- If the data cannot support a test set, fall back to 80 / 20 (train / val) and explicitly tell the user that no held-out test set was created and why.
- The test cohort is held out: it is not used for HP tuning, early stopping, or checkpoint selection. It is reported once at the end on the best checkpoint.

When reporting results, always quote both val/* (model selection) and test/* (generalization) metrics.
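The split rules can be sketched as a stratified split that holds out 15% for val and 15% for test, gives the remainder to train, and falls back to 80 / 20 when any class would have fewer than 5 test slides. The function name, the use of rounding for per-class counts, and the seeded shuffle are assumptions for illustration.

```python
import random
from collections import defaultdict


def stratified_split(samples, val_frac=0.15, test_frac=0.15,
                     min_test_per_class=5, seed=0):
    """Split [{"id": ..., "label": ...}] into train/val/test by class label.

    Returns (cohort, has_test). When has_test is False the cohort uses the
    80/20 train/val fallback and the caller must tell the user why.
    """
    by_label = defaultdict(list)
    for sample in samples:
        by_label[sample["label"]].append(sample)

    # A test set is kept only if every class gets enough held-out slides.
    has_test = all(
        round(len(group) * test_frac) >= min_test_per_class
        for group in by_label.values()
    )
    if not has_test:
        val_frac, test_frac = 0.20, 0.0

    rng = random.Random(seed)
    cohort = {"train": [], "val": [], "test": []}
    for label in sorted(by_label):
        group = list(by_label[label])
        rng.shuffle(group)
        n_test = round(len(group) * test_frac)
        n_val = round(len(group) * val_frac)
        cohort["test"] += group[:n_test]
        cohort["val"] += group[n_test:n_test + n_val]
        cohort["train"] += group[n_test + n_val:]
    return cohort, has_test
```

The returned `cohort` dictionary has the same train / val / test keys as the cohort field of a job request.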
Every job must belong to a session. A session groups all jobs for a single experiment — the same classification question, cohort, and objective.
Same session: Parameter tuning, multi-strategy comparison, and final model training for the same task all go in one session. Resuming a stopped/failed job also stays in the same session.
New session: Create a new session when starting a different experiment — different cohort, different classification objective (e.g. switching from "tumor vs normal" to "grade 1 vs grade 2"), or a fresh attempt with a fundamentally different approach.
Create one first:
POST /v1/ml-jobs/sessions
X-API-Key: ${CELLDX_API_KEY}
{ "name": "Breast cancer experiment v1" }
Response:
{
"session_id": "679def...",
"name": "Breast cancer experiment v1",
"status": "active",
"billing": { "total_gpu_hours": 0.0, "total_estimated_cost_usd": 0.0, "job_count": 0 }
}
Construct a POST /v1/ml-jobs/jobs request. Example for a single training job:
{
"name": "<descriptive-name>",
"session_id": "<session_id>",
"cohort": {
"train": [{"id": "<sample_id>", "label": "<class>"}],
"val": [{"id": "<sample_id>", "label": "<class>"}],
"test": [{"id": "<sample_id>", "label": "<class>"}]
},
"finetune": {"strategy": "attention_mil", "hidden_dims": [128]},
"aggregator": {"type": "abmil"},
"hyperparams": {"epochs": 50, "learning_rate": 2e-4},
"early_stop": {"enabled": false},
"tags": {"organ": "breast", "experiment": "v1"}
}
Example for a parameter tuning job:
{
"name": "<organ>-param-tune-v1",
"session_id": "<session_id>",
"cohort": { ... },
"finetune": {"strategy": "attention_mil", "hidden_dims": [128]},
"hp_tuning": {
"enabled": true,
"method": "grid",
"n_trials_per_stage": 8,
"seed": 42,
"metric": "val/auroc",
"mode": "max",
"epochs_per_trial": 25,
"early_stop_patience": 8,
"stages": [
{"params": [
{"path": "hyperparams.learning_rate", "values": [5e-5, 1e-4, 2e-4, 5e-4]},
{"path": "finetune.hidden_dims", "values": [[64], [128], [256]]}
]},
{"params": [
{"path": "finetune.head_dropout", "values": [0.3, 0.5, 0.7]},
{"path": "finetune.patch_dropout", "values": [0.0, 0.1, 0.3]}
]},
{"params": [
{"path": "finetune.label_smoothing", "values": [0.0, 0.1, 0.2]},
{"path": "hyperparams.weight_decay", "values": [0.0, 0.01, 0.1]}
]}
]
},
"tags": {"organ": "breast", "experiment": "param-tune"}
}
Response 201:
{ "job_id": "67a123...", "state": "QUEUED" }
Error responses:
- 402 Payment Required: insufficient balance (minimum $20 required to submit or resume a job). Top up the account before retrying.
- 404: session not found or belongs to another user
- 409: session is not active (closed, failed, or succeeded)
- 502: training infrastructure error

Use the Training Monitor skill to track progress. For the full pipeline, proceed through phases sequentially, since each phase depends on the previous one's results.
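The job submission error codes listed above can be turned into user-facing messages with a small lookup. This is a sketch; the helper name and the message wording are illustrative, and only the status codes and their meanings come from the list above.

```python
# Meanings of the job submission error codes, paraphrased from the list above.
SUBMIT_ERRORS = {
    402: "Insufficient balance: at least $20 is required to submit or resume "
         "a job. The account must be topped up before retrying.",
    404: "Session not found, or it belongs to another user.",
    409: "Session is not active (closed, failed, or succeeded).",
    502: "Training infrastructure error.",
}


def explain_submit_error(status_code):
    """Map an HTTP status from POST /v1/ml-jobs/jobs to a readable message."""
    return SUBMIT_ERRORS.get(status_code, f"Unexpected HTTP status {status_code}")
```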
- Check /v1/ml-jobs/options for valid strategies, aggregators, and parameter ranges.
- Call /v1/ml-jobs/data/slides/features before submitting a job. Prefer the GET form to fetch and cache the full available set, then build the cohort only from file_ids that are in it. (POST {file_ids: [...]} is the same check in filter form.) The endpoint is free and does not purchase anything.
- Use file_id UUIDs (slide-level) in the cohort; case_id UUIDs are not accepted by the trainer.