From nbgrader-to-otter
This skill should be used when the user asks to "convert an nbgrader notebook", "refactor to otter-grader", "migrate from nbgrader", "create an instructor notebook", or mentions nbgrader-to-otter conversion, otter assign, or grading format migration. Operates on one notebook at a time within its containing folder.
How this skill is triggered — by the user, by Claude, or both
Slash command
/nbgrader-to-otter:refactoring-nbgrader-to-otterThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
<context>
references/assignment-config-template.mdreferences/refactoring-rules.mdreferences/scratchpad-template.mdreferences/transformation-patterns.pyreferences/troubleshooting.mdscripts/_lib.pyscripts/analyze_notebook.pyscripts/cleanup_metadata.pyscripts/diff_notebooks.pyscripts/fix_cells.pyscripts/validate_structure.pyscripts/wrap_transform.pyVerify files before listing in config. Run ls -la before populating the files: key
in assignment config. Listing non-existent files (e.g., utils.py when absent) causes
otter assign to fail with FileNotFoundError.
Validate after transform. Run validate_structure.py --skip-cleanup after
wrap_transform.py to catch structural issues before the eval cycle.
Keep a backup. Copy the original notebook before starting. This allows restoring and restarting if the workflow goes wrong.
<environment_constraints>
python3 (not python)<spark_checklist>
If the notebook uses PySpark, confirm the following before creating environment.yml.
These four must be handled in a single pass — each one is a separate Gradescope failure
mode if missed.
Ask the user before proceeding:
pyspark=3.5). Gradescope's conda-forge defaults to the latest, which may be a
major version ahead and break the notebook.openjdk=11). Without it, PySpark cannot start.Then verify in the notebook:
os.environ['JUPYTERHUB_USER'] must become
os.environ.get('JUPYTERHUB_USER', 'grader') — the key is absent on Gradescope.output.csv/part-00000.csv). The autograder zip needs a single file.
Copy only the part file:
cp shared/<assignment>/output.csv/part-*.csv <assignment>/output.csv
</spark_checklist>
Copy this checklist and track progress in a scratchpad file:- [ ] Step 0: Backup original notebook, create scratchpad
- [ ] Step 1: Scan working directory, verify files
- [ ] Step 2: Analyze notebook structure (analyze_notebook.py)
- [ ] Step 3: Insert assignment config (raw cell at index 0)
- [ ] Step 4: Run wrap_transform.py (single pass, all questions)
- [ ] Step 5: Eval cycle (max 3 iterations)
- [ ] Step 6: Save instructor notebook, rename original to -nbgrader.ipynb
- [ ] Step 7: Strip metadata + full validation
cp <notebook>.ipynb <notebook>-original.ipynb
touch <notebook-stem>-scratchpad.md
Populate the scratchpad with the template from references/scratchpad-template.md. Update it incrementally after each step. The scratchpad survives context compaction and provides an audit trail for the Testing Agent.
ls -la
Identify companion files (.csv, .json, .txt, .md, .png, utils.py) for the
assignment config files: key. Only list files that actually exist.
If the notebook imports PySpark, complete the <spark_checklist> now — before Step 3.
The PySpark and Java versions must be confirmed before writing environment.yml.
python3 scripts/analyze_notebook.py <notebook.ipynb>
Read the full JSON output to understand the notebook before transforming anything.
Insert a raw cell at index 0. Template: references/assignment-config-template.md.
Verify each file exists before listing:
ls utils.py data-card.md diagram.png
Only name: and files: change per notebook.
Single pass transforms all questions at once:
python3 scripts/wrap_transform.py <notebook.ipynb>
5a: python3 scripts/validate_structure.py --skip-cleanup <notebook.ipynb>
5b: python3 scripts/diff_notebooks.py <original-nbgrader.ipynb> <notebook.ipynb>
5c: If both pass → done
5d: If diff fails → python3 scripts/fix_cells.py <original-nbgrader.ipynb> <notebook.ipynb> <report.json>
5e: If validate fails → flag for manual review
5f: Abort if dropped+misplaced count did not decrease from previous iteration
Update the scratchpad after each iteration with validation and diff results.
Write the transformed notebook as {name}.ipynb (NOT {name}-instructor.ipynb). Otter
assign uses the input filename for the student notebook — naming it -instructor would
expose that name to students. Rename the original nbgrader notebook to
{name}-nbgrader.ipynb or {name}-original.ipynb as backup. Preserve existing cell
outputs. Do not execute — that is the Testing Agent's responsibility.
After the eval cycle passes, remove nbgrader metadata and run full validation:
python3 scripts/cleanup_metadata.py <notebook>.ipynb
python3 scripts/validate_structure.py <notebook>.ipynb
This removes cell.metadata.nbgrader, orphaned markers, and placeholder assignments.
Running this earlier breaks the transformation scripts — keep it as the final step.
<structural_rules> All delimiter cells are raw cells (not code, not markdown). Nesting order:
QUESTION
├── SOLUTION
└── TESTS
SOLUTION and TESTS are siblings inside QUESTION. Question config format:
# BEGIN QUESTION
name: qN
points: P
all_or_nothing: true
Place # BEGIN QUESTION before the first solution cell, after the question's markdown
header and any read-only setup code.
</structural_rules>
<context_management> This is a long task that may consume significant context. Save the scratchpad frequently to persist progress. Work systematically — accuracy over speed. If approaching output limits, ensure the current question transformation is complete and validated before stopping. The scratchpad allows pausing and resuming without losing progress. </context_management>
<critical_pitfalls>
Execute the notebook before otter assign. Otter parses cell outputs to generate
test doctests. Missing outputs cause NameError failures in the autograder because
solution variables are never defined. Execute locally with:
cd {notebook_dir}
jupyter nbconvert --to notebook --execute --inplace \
--ExecutePreprocessor.allow_errors=True {name}.ipynb
The --allow-errors flag lets grader.check_all() cells fail harmlessly while
solution cells still produce outputs. No JupyterHub needed.
Resolve shared data paths locally. Notebooks may reference absolute paths like
~/shared/<assignment-name>/. Create symlinks to the downloaded data:
ln -s /path/to/downloaded/data/<assignment-name> ~/shared/<assignment-name>
Without these, data-loading cells fail silently under --allow-errors, and all
downstream solution cells produce empty outputs that cascade into NameErrors.
Always include environment: environment.yml in the assignment config. Create an
environment.yml in the notebook directory listing all required packages. Gradescope
uses this to replicate the execution environment. Without it, otter auto-generates a
minimal environment that may lack packages (scipy, seaborn, etc.), causing ImportError
or different numerical results on Gradescope.
Scan both the notebook AND every bundled .py helper file for imports. A helper
module like helper_functions.py may import seaborn, networkx, or other packages
not used directly in the notebook. If any of these are missing from environment.yml,
import helper_functions fails on Gradescope — and because it is in the same cell as
import os, pathlib and the path setup, ALL subsequent imports in that cell are lost.
Symptoms look like a data-loading failure (NameError: path_data not defined), not a
missing package. This is a silent cascade that is easy to misdiagnose.
Checklist before uploading to Gradescope:
# list every import in all bundled .py helper files
grep -h "^import\|^from" *.py | sort -u
Also scan the notebook imports visually or via:
jupyter nbconvert --to script {name}.ipynb --stdout 2>/dev/null | grep "^import\|^from" | sort -u
Cross-reference both against environment.yml and add any gaps.
Test cells must be self-contained. If a test references variables defined outside
its # BEGIN QUESTION / # END QUESTION block, the autograder will fail with
NameError. Either move the data setup inside the test cell, or wrap dependent
assertions in try/except (air-gap pattern).
Provided computation cells must live inside the question block, before # BEGIN TESTS.
A common NBGrader pattern is: student writes means and standard_deviations (solution
cells), then a provided non-solution cell computes ratings_pivot_table_standardized
from those values, and then the test checks ratings_pivot_table_standardized. During
conversion, the provided computation cell often ends up AFTER # END QUESTION. When
otter runs the test, the computation has not executed yet and the test fails with
NameError. Fix: move any provided computation cell that produces a tested variable to
between # END SOLUTION and # BEGIN TESTS inside the question block.
</critical_pitfalls>
Before handing off to the Testing Agent, verify outputs exist:
python3 scripts/check_outputs.py {name}.ipynb
If any solution cells lack outputs, execute locally (see critical pitfall #1 above). Ensure shared data symlinks are in place first (critical pitfall #2). This script is in the testing-otter-grader skill's scripts directory.
npx claudepluginhub nyu-tandon-tmi/nbgrader-to-otter-grader --plugin nbgrader-to-otterProvides a checklist for code reviews covering functionality, security, performance, maintainability, tests, and quality. Use for pull requests, audits, team standards, and developer training.