From tao-skill-bank
Run the canonical NVIDIA AOI three-phase training pipeline — Phase 1 AutoML baseline (HPO), Phase 2 DEFT loop (RCA → SDG → mining → plain-train retrain), Phase 3 AutoML refinement on the DEFT-augmented dataset. This is the default entry point for any "run the AOI workflow", "fine-tune my PCB AOI model end-to-end", "improve my AOI ChangeNet model", or "AOI workflow with AutoML" request — route here instead of tao-run-deft-aoi directly unless the user explicitly asks for the DEFT loop ONLY (e.g. "run JUST the DEFT loop", "skip AutoML, only DEFT"). Also handles the same three-phase pattern for non-AOI DEFT applications — AutoML baseline then DEFT loop warm-started from AutoML's winning HPs then post-DEFT AutoML refinement on the iteration-augmented dataset. Trigger phrases include "run the AOI workflow", "AOI end-to-end", "AutoML + DEFT", "AutoML then DEFT", "tune hyperparameters then DEFT", "DEFT with AutoML at both ends", "warm-start DEFT", "improve my AOI model".
How this skill is triggered — by the user, by Claude, or both
Slash command
/tao-skill-bank:tao-run-automl-deft-pipelineThis skill is limited to the following tools:
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
A workflow-bridge skill that runs **three phases** in sequence by delegating to two existing skills — `tao-run-automl` for HPO and a DEFT application skill (default `tao-run-deft-aoi` for AOI; other `skills/applications/deft-*` skills for non-AOI cases) for the iterative data-improvement loop.
A workflow-bridge skill that runs three phases in sequence by delegating to two existing skills — tao-run-automl for HPO and a DEFT application skill (default tao-run-deft-aoi for AOI; other skills/applications/deft-* skills for non-AOI cases) for the iterative data-improvement loop.
This skill does not re-implement AutoML or DEFT. It owns only the connective tissue: HPO spec inputs, the spec-handoff between AutoML and DEFT, and the post-DEFT AutoML re-run on the augmented dataset.
tao-run-deft-aoi directly. The bare DEFT loop is the inner stage of this pipeline.tao-run-deft-aoi directlytao-run-automl directlyPhase 1 (AutoML baseline) Phase 2 (DEFT loop, plain train) Phase 3 (AutoML refinement)
───────────────────────── ──────────────────────────────── ───────────────────────────
specs/baseline_spec.yaml (Phase 1 winner pre-seeds baseline ${RESULTS_DIR}/iter${N}/dataset/
train/base/training_set.csv — DEFT skips its baseline train) train_combined_iter${N}.csv
│ │ │
▼ ▼ ▼
[ AutoML HPO sweep ] [ DEFT: baseline-inference → RCA [ AutoML HPO sweep ]
N recommendations → iter 1..N (plain retrain) ] re-tunes HPs against the
pick best by val_loss / FAR RCA / route / SDG / mining DEFT-augmented dataset
│ │ │
▼ ▼ ▼
best HPs spec + ckpt ─────► DEFT-augmented CSV ───────────► final best checkpoint
+ iter winner checkpoint (the deliverable; no
(Phase 3 warm-starts from it) further retrain)
The handoffs are:
baseline_spec.yaml, copies the winning checkpoint into ${RESULTS_DIR}/baseline/train/ under the filename DEFT expects, and pre-populates deft_state.json + loop_log.jsonl so DEFT resumes at baseline inference → evaluate → RCA → iter 1. DEFT itself stays plain-train (automl_policy: off preserved). Verbatim 4-step procedure in references/handoff.md.train_combined_iter${N_final}.csv) is AutoML's training data; the checkpoint (iterations.<best>.best_ckpt_path from deft_state.json) is wired into each rec's train.pretrained_model_path so Phase 3 fine-tunes from Phase 2's winner rather than from scratch. Without this warm-start Phase 3 routinely regresses vs the iter winner. Phase 3's winning checkpoint is the deliverable — no separate retrain after Phase 3. See references/handoff.md.specs/baseline_spec.yaml was hand-authored with — usually not optimal.Running all three: AutoML cheap-tunes once on the original data, DEFT does the heavy data work with reasonable HPs, then AutoML tunes again on the now-richer dataset. Phase 3 is the most important of the three for the final deployed FAR/recall.
The pipeline is sequential. Total wall-clock ≈ Phase 1 (N_automl × per-rec train) + Phase 2 (M iterations × per-iter cost) + Phase 3 (N_automl × per-rec train).
Note that Phase 2 has no separate baseline train — Phase 1's winning checkpoint is reused as DEFT's baseline, so the baseline cost lands inside Phase 1's N_automl trainings rather than as an extra retrain. Surface this to the user before kickoff. Typically Phase 2's iterations still dominate (each includes SDG + retrain), but Phase 1 and Phase 3 each add several hours on a single-GPU box. Use the per-job estimate from the user's setup (if they have one) rather than guessing minutes. See references/pitfalls.md for the per-phase cost breakdown.
The pipeline has exactly one user gate. Before any side-effecting action (docker pull, docker login, any job-launch call delegated to a downstream skill, file mutations under ${RESULTS_DIR}/), the agent must produce a single consolidated Pre-Flight Summary that subsumes every downstream skill's preflight. Once the user approves, the run is autonomous through all three phases — no further interactive pauses.
The user explicitly does not want to be paged between phases. The DEFT loop's own inline ## Pre-Flight Summary gate becomes a zero-question display step (every value pre-supplied), as does tao-run-automl's shared launch preflight in Phases 1 and 3.
Before printing the gate the agent must read every downstream preflight section in full and run every read-only check those sections prescribe, surfacing each outcome in the summary. Running every step of the DEFT skill's ## Pre-Flight is mandatory — if any step is skipped the consolidated gate is invalid and the pipeline must not advance. The summary must include, in order: (1) workspace/host/platform/network, (2) credentials SET/UNSET status, (3) resolved container image URIs with PRESENT/MISSING, (4) dataset table with leakage check, (5) Phase 1 config, (6) Phase 2 config incl. pre-seeded baseline source, (7) Phase 3 config, (8) compute estimate, (9) the confirmation line. After the gate, pass every collected value through to each downstream skill so it has nothing to ask. The only allowed post-gate pauses are mid-run hard-stop safety gates (e.g. DEFT's KPI regression gate); call them out in the summary.
See references/preflight.md for the full build procedure, the exact mandatory contents of each summary section (with the GPU memory rule of thumb, DEFT loop defaults, and required inputs verbatim), the downstream gate-suppression inputs, and the fallback when an older skill-bank version hard-codes its own STOP gate.
Invoke tao-skill-bank:tao-run-automl with:
| Input | AOI default | Notes |
|---|---|---|
network_arch | visual-changenet | Same model the DEFT loop expects |
train_dataset_uri | <workspace>/train/base/training_set.csv | Same training set DEFT will start from |
eval_dataset_uri | <workspace>/train/base/validation_set.csv | Held-out — must NOT be the KPI test set (<workspace>/kpi/testing_set.csv), since that set is reserved for DEFT's final reporting |
metric | FAR @ 100% recall (preferred) or val_loss | See references/pitfalls.md — ChangeNet AOI is class-imbalanced, val_loss alone can mode-collapse |
algorithm | bayesian | LLM-brain or autoresearch if compute is tight |
automl_max_recommendations | 5–10 for AOI | More recs = better HPs but linear in compute |
spec_overrides | Pin epochs / batch_size; sweep optimizer-related HPs only | Otherwise AutoML wanders into long-train regimes that blow Phase 2's budget |
After the sweep finishes, AutoML's result["best"]["specs"] is the winning hyperparameter dict.
Phase 1 hands over two artifacts: the winning spec and the winning checkpoint. Instead of retraining the same HPs in DEFT's baseline step, pre-seed DEFT's baseline state from Phase 1's outputs so DEFT starts at baseline inference → evaluate → RCA → iter 1. The four steps — write the merged baseline_spec_automl.yaml, copy the winning checkpoint into ${RESULTS_DIR}/baseline/train/, initialise deft_state.json with iterations.baseline.stage_completed == "train" (and append the matching loop_log.jsonl entry), then invoke DEFT — are given verbatim with the exact code in references/handoff.md. automl_policy: off inside the loop is preserved.
Run a quick eval of the winning checkpoint against the held-out set: per-class prediction counts (if it collapsed to one class, evaluate the 2nd or 3rd best instead) and a comparison to a zero-shot ChangeNet baseline (if AutoML did not improve over zero-shot, surface that and pause). See references/handoff.md.
Invoke tao-skill-bank:tao-run-deft-aoi (read its SKILL.md for the full interface). For non-AOI applications, invoke the matching DEFT skill; the handoff shape is the same.
The DEFT loop's baseline-train sub-step is skipped. Phase 1 already produced a checkpoint trained at the winning HPs, and Phase 1's handoff (see above) pre-populated ${RESULTS_DIR}/baseline/train/ and ${RESULTS_DIR}/deft_state.json so DEFT resumes at baseline inference → evaluate → RCA → iter 1. The rest of the DEFT loop runs unchanged. Do not modify its automl_policy: off invariant.
The DEFT loop owns:
baseline/train/ source but must not re-prompt, since every input was collected in the consolidated gate.${RESULTS_DIR}/ layout, deft_state.json, loop_log.jsonl, DEFT_Loop_Report.html.After the loop exits (KPI met or max_iterations reached), capture two values from deft_state.json:
iterations.<best>.best_ckpt_path — the loop's best plain-train checkpointN_final — used to locate the augmented training CSVIf the DEFT loop hard-stops on an unrecoverable gate, skip Phase 3. There is no validated augmented CSV to feed AutoML.
Re-invoke tao-skill-bank:tao-run-automl with the augmented training CSV as the train dataset, the same held-out validation CSV as before, and Phase 2's iter winner checkpoint as the warm-start:
| Input | AOI value |
|---|---|
network_arch | visual-changenet |
train_dataset_uri | ${RESULTS_DIR}/iter${N_final}/dataset/train_combined_iter${N_final}.csv |
eval_dataset_uri | Same as Phase 1 (<workspace>/train/base/validation_set.csv) — keep the comparison apples-to-apples |
metric | Same metric as Phase 1 |
algorithm | Same as Phase 1 |
automl_max_recommendations | 5–10 |
| Initial spec | Start from <workspace>/specs/baseline_spec_automl.yaml (Phase 1's winner) — gives the sweep a strong centroid to refine around |
| Warm-start checkpoint | iterations.<best>.best_ckpt_path from ${RESULTS_DIR}/deft_state.json — set spec_overrides["train"]["pretrained_model_path"] to this path. Each Phase 3 rec then fine-tunes from Phase 2's winner instead of training from scratch. |
The warm-start is mandatory: with no warm-start, every rec starts from random init with only 10-20 epochs to reconverge, Phase 3's val_loss regresses 0.03-0.05 vs iter1, and the _pick_best safety net silently rolls back to the iter winner — wasting Phase 3's compute. The concrete spec_overrides code (selecting the lowest-far_pct iteration, excluding any prior final_automl), the broad-exploration tradeoff, output to ${RESULTS_DIR}/final_automl/, and wiring Phase 3's checkpoint back into the DEFT report via iterations.final_automl + re-running prepare_inference_spec.py (with the _pick_best regression safety net) are all in references/handoff.md.
These apply to both AutoML phases — bake them into agent behavior, don't just paste once. The full detail is in references/pitfalls.md:
val_loss can mode-collapse to a zero-recall PASS-everything model. Prefer FAR @ 100%-recall directly, or gate val_loss with a pred_counts sanity check, or decide top-K by FAR @ 100%-recall. For balanced / regression tasks, val_loss is fine.kpi/testing_set.csv), which stays untouched until DEFT's evaluate stage. Phase 3 trains on the augmented CSV but keeps the same val set so Phase 1 and Phase 3 numbers stay comparable.When starting fresh from "run the AOI workflow", the agent delivers a three-phase worded message to the user (Phase 1 AutoML baseline → Phase 2 DEFT loop → Phase 3 AutoML refinement, with the cost framing and "OK to proceed?" close), then after confirmation invokes tao-run-automl (Phase 1), writes the merged spec, pre-seeds deft_state.json, invokes tao-run-deft-aoi (Phase 2) with every input pre-supplied, and invokes tao-run-automl again (Phase 3) — with no further pauses unless a downstream skill hits an unrecoverable hard-stop gate — then summarizes the trajectory (baseline AutoML best → DEFT iter 1 → ... → DEFT iter N_final → Phase 3 best).
See references/quick-start.md for the verbatim customer-facing message and the exact post-confirmation invoke sequence.
The same three-phase pattern applies to other DEFT skills — swap network_arch, the Phase 2 DEFT skill, the spec/checkpoint path conventions, and the Phase 3 augmented-CSV path. The handoff shape (Phase 1 emits spec + checkpoint that pre-seeds the DEFT baseline, Phase 2 emits an augmented dataset, Phase 3 emits the final checkpoint) is identical, and the baseline-skip mechanism is generic to any DEFT-style loop with a resumable baseline state. See references/quick-start.md.
tao-skill-bank:tao-run-automl — AutoML interface, algorithms, HP rangestao-skill-bank:tao-run-deft-aoi — full DEFT AOI loop (Phase 2 default)tao-skill-bank:tao-train-visual-changenet — underlying ChangeNet train/eval/infer skill (used by both AutoML and DEFT)skills/applications/deft-* skills — non-AOI Phase 2 targetsreferences/preflight.md — building the consolidated pre-flight gatereferences/handoff.md — Phase 1→2 pre-seed, Phase 2 quality check, Phase 3 warm-start + report wiringreferences/pitfalls.md — metric, noise, leakage, and compute-budget guidancereferences/quick-start.md — verbatim worked-example message and non-AOI variantnpx claudepluginhub nvidia-tao/tao-skills-bank --plugin tao-skillsGuides creation, editing, and verification of skills for AI coding agents using test-driven development with subagent scenarios. Use when authoring or debugging skills.