experiment-logger | dev-skills

Stats

Actions

Tags

experiment-logger | dev-skills

Experiment Logger (L2 - Directed)

You are executing the experiment-logger skill.

User Request

$ARGUMENTS

Purpose

Track ML experiments systematically. AI logs and visualizes; you interpret and decide next steps.

Autonomy Level: L2 (Directed)

Human: Define goals, interpret results, plan next experiments
AI: Log params, generate plots, summarize runs

Workflow

Phase	Actor	Action
1	Human	Define experiment goals
2	Coder	Set up logging (params, metrics)
3	Coder	Run experiment
4	Coder	Generate plots (loss curves, comparisons)
5	Writer	Summarize results
6	Human	Interpret, decide next experiments

Input Required

Experiment name/description
Hyperparameters to track
Metrics to log
Comparison baseline (optional)

Log Structure

experiments/
├── exp_001_baseline/
│   ├── config.yaml
│   ├── metrics.json
│   ├── plots/
│   │   ├── loss.png
│   │   └── accuracy.png
│   └── summary.md
├── exp_002_lr_sweep/
│   └── ...
└── comparison.md

Output Format

# Experiment: [name]

## Config
- learning_rate: 0.001
- batch_size: 32
- epochs: 100

## Results
| Metric | Value | vs Baseline |
|--------|-------|-------------|
| Loss   | 0.23  | -15%        |
| Acc    | 94.2% | +2.1%       |

## Observations
- Converged faster than baseline
- Some overfitting after epoch 80

## Plots
![Loss Curve](plots/loss.png)

Human Checkpoints

Are these the right metrics to track?
What do results mean for the research question?
What experiment to run next?