Submits jobs to UVA HPC (Rivanna/Afton), writes Slurm scripts (sbatch/srun/squeue), converts SGE to Slurm, and builds WRDS data pipelines with polars on Slurm-managed clusters.
Slash command: /workflows:hpc
Three compute environments, each with a clear role:
| Environment | Use For | Examples |
|---|---|---|
| Local / RJDS | Exploration, prototyping, notebooks | EDA, quick plots, marimo/Jupyter, test on small samples, iterate on code |
| WRDS (SGE) | Data access, SAS ETL, file parsing | SAS jobs against WRDS libraries, SEC filing parsers on /wrds/sec/, scan_covers, ad-hoc SQL |
| UVA HPC (Slurm) | Scale compute | Model estimation (PIN), large polars pipelines, anything needing >10 cores or >1 hour |
1. EXPLORE (local/RJDS) → Prototype code, test on 5-10 items
2. BUILD DATA (WRDS) → SAS ETL or PostgreSQL queries (data lives there)
3. ESTIMATE AT SCALE (HPC) → sbatch when you need 100+ cores
4. ANALYZE RESULTS (local) → Pull results back, notebooks, regressions, tables
/wrds/sec/, SAS libraries) → WRDS
The interactive partition (42 nodes, 12h max) is for testing sbatch scripts on one chunk before submitting 176 tasks, not for replacing local dev work:
salloc -p interactive --cpus-per-task=4 --mem=16G --time=1:00:00
# test your script, then exit and sbatch the real job
PIN estimation proved it: WRDS SGE has 10 concurrent slots and took 8+ hours without starting OWR. UVA HPC ran 70+ OWR tasks simultaneously and finished in 30 minutes. But WRDS is still the right place to build the data — the SAS libraries and SEC filings live there.
ALWAYS write a Slurm submission script and submit via sbatch. No exceptions.
- ssh uva-hpc 'python3 est.py owr 2020' → WRONG. Use sbatch.
- ssh uva-hpc 'nohup ./process &' → WRONG. Still the login node. Use sbatch.
- ssh uva-hpc 'for year in 2003..2024; do python3 ...; done' → WRONG. Use sbatch --array.
- sbatch run_est.sh owr → CORRECT.
The login node is for: sbatch, squeue, scancel, sinfo, scp, ls, head, short queries.
--array=1-1. The login-node "quick test" is the run that flags the account: one stock becomes 5,000 when the args change, and you don't know it "only takes 30 seconds" until it runs.
- ssh uva-hpc 'python3 ... > output' → STOP. Write a submit script.
- ssh uva-hpc 'nohup ... &' → STOP. Use sbatch.
- --array.
- ssh uva-hpc (configured with ProxyJump through Mac via tailnet)
- vwh7mb
- /home/vwh7mb (GPFS, 12PB shared, no per-user quota displayed)
- /scratch/vwh7mb/ (Weka, 12TB)
| Partition | Nodes | CPUs/Node | RAM/Node | MaxTime | MinNodes | MaxNodes | Use For |
|---|---|---|---|---|---|---|---|
| standard | 301 | 40+ | 384GB+ | 7 days | 0 | 1 | Single-node jobs, array tasks |
| parallel | 179 | 96 | 768GB | 3 days | 2 | 64 | Multi-node MPI jobs only |
| gpu | 44 | 36+ | 257GB+ | 3 days | - | - | GPU workloads |
| interactive | 42 | 32+ | 128GB+ | 12 hrs | - | - | Interactive/debugging |
The parallel partition requires MinNodes=2 — it will reject single-node jobs with "Node count specification invalid". It is designed for MPI jobs that span multiple nodes.
Wrong: #SBATCH --partition=parallel for array jobs → submission fails
Right: #SBATCH --partition=standard for array jobs → 301 nodes available
standard (default choice for most research computing):
- ProcessPoolExecutor, multiprocessing, mclapply
parallel (multi-node distributed computing):
- mpi4py, OpenMPI, MVAPICH)
- ProcessPoolExecutor and multiprocessing are single-node only
gpu (GPU-accelerated workloads):
interactive (debugging and development):
- salloc -p interactive --cpus-per-task=4 --mem=16G --time=1:00:00
- $HOME/.pixi/bin/pixi via curl -fsSL https://pixi.sh/install.sh | bash
- $HOME/projects/<name>/.pixi/envs/default/bin/python
- module load python, but pixi preferred for reproducibility
#!/bin/bash
#SBATCH --job-name=my_job
#SBATCH --partition=standard
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=32G
#SBATCH --time=3:00:00
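#SBATCH --output=logs/job-%A_%a.log
# logs/ must already exist at submit time: Slurm opens the output file before
# this script runs, so create it once with mkdir -p logs before the first sbatch.
mkdir -p logs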
export OMP_NUM_THREADS=1
export MKL_NUM_THREADS=1
export OPENBLAS_NUM_THREADS=1
PYTHON=$HOME/projects/my-project/.pixi/envs/default/bin/python
$PYTHON -u my_script.py --workers ${SLURM_CPUS_PER_TASK:-8}
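On the Python side, a script launched this way might read `--workers` and size a process pool to match the allocation. The sketch below is illustrative only: the real `my_script.py` is not shown in this document, and `process_item` is a hypothetical placeholder for the per-item work. Pinning the BLAS/OpenMP thread counts to 1, as the submit script does, keeps each worker process from starting its own thread pool and oversubscribing the allocated cores.

```python
import argparse
import os
from concurrent.futures import ProcessPoolExecutor


def parse_args(argv=None):
    """Read --workers, falling back to the Slurm allocation, then to 1."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--workers",
        type=int,
        default=int(os.environ.get("SLURM_CPUS_PER_TASK", "1")),
    )
    return parser.parse_args(argv)


def process_item(item):
    # Hypothetical placeholder for the real per-item computation.
    return item * item


def run(items, workers):
    """Map process_item over items using one process per allocated CPU."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_item, items))


def main(argv=None):
    args = parse_args(argv)
    print(run(range(16), args.workers))
```

A real script would end with `if __name__ == "__main__": main()` so that worker processes can import the module safely.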
sbatch script.sh # submit
sbatch script.sh arg1 arg2 # args passed to script as $1, $2
Note: unlike SGE's qsub run.sh <model>, Slurm passes arguments after the script name directly. Use ${1:?Usage: sbatch script.sh <arg>} to enforce.
#SBATCH --array=1-176 # tasks 1 through 176
#SBATCH --array=1-176%50 # max 50 concurrent tasks
#SBATCH --array=1,5,9,13 # specific tasks only
#SBATCH --array=1-176
# 22 years × 8 chunks = 176 tasks
# Decode: year = START_YEAR + (id-1)/NCHUNKS, chunk = (id-1)%NCHUNKS
NCHUNKS=8
START_YEAR=2003
idx=$((SLURM_ARRAY_TASK_ID - 1))
year=$((START_YEAR + idx / NCHUNKS))
chunk=$((idx % NCHUNKS))
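The same decoding can be done in Python if the job script reads the task ID from the environment. This is a minimal sketch assuming the 22-year × 8-chunk layout above; `decode_task` and `current_task` are hypothetical helper names.

```python
import os

NCHUNKS = 8        # chunks per year, as in the shell snippet
START_YEAR = 2003  # first year covered by the array


def decode_task(task_id, nchunks=NCHUNKS, start_year=START_YEAR):
    """Map a 1-based array task ID to a (year, chunk) pair."""
    idx = task_id - 1
    return start_year + idx // nchunks, idx % nchunks


def current_task():
    """Decode the task this process was launched as (requires Slurm's env var)."""
    return decode_task(int(os.environ["SLURM_ARRAY_TASK_ID"]))
```

Task 1 maps to year 2003, chunk 0, and task 176 maps to year 2024, chunk 7, which matches `--array=1-176`.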
# Equivalent to SGE's sed -n "${SGE_TASK_ID}p" pattern
ITEM=$(sed -n "${SLURM_ARRAY_TASK_ID}p" "$TASK_LIST")
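If the worker is a Python script, the task-list lookup can happen there instead of in the shell. A small sketch, assuming a plain text file with one item per line; `task_item` is a hypothetical name.

```python
from pathlib import Path


def task_item(task_list, task_id):
    """Return line number task_id (1-based) from the task list file."""
    lines = Path(task_list).read_text().splitlines()
    if not 1 <= task_id <= len(lines):
        raise IndexError(f"task {task_id} is outside 1..{len(lines)}")
    return lines[task_id - 1]
```

Unlike `sed -n`, which prints nothing for an out-of-range line, this version fails loudly when the array range and the task list disagree.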
# Re-run specific tasks
sbatch --array=5,12,87 script.sh
# Re-run a range
sbatch --array=10-20 script.sh
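When the list of tasks to re-run is long, it may help to build the `--array` value programmatically. The helper below is a hypothetical sketch that collapses consecutive IDs into ranges, using only the comma and range syntax shown above.

```python
def array_spec(task_ids):
    """Turn task IDs into a compact --array value, e.g. [5, 12, 13, 14] -> '5,12-14'."""
    ids = sorted(set(task_ids))
    if not ids:
        raise ValueError("no task IDs given")
    parts = []
    start = prev = ids[0]
    for n in ids[1:]:
        if n == prev + 1:
            prev = n  # still inside a consecutive run
            continue
        parts.append(str(start) if start == prev else f"{start}-{prev}")
        start = prev = n
    parts.append(str(start) if start == prev else f"{start}-{prev}")
    return ",".join(parts)
```

The result can be passed straight to the command line, for example `sbatch --array=5,12-14,87 script.sh`.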
| SGE | Slurm | Notes |
|---|---|---|
| #$ -N job_name | #SBATCH --job-name=job_name | |
| #$ -cwd | (default behavior) | Slurm runs from submit dir by default |
| #$ -l m_mem_free=4G | #SBATCH --mem=4G | Per-node memory |
| #$ -pe onenode N | #SBATCH --ntasks=1 --cpus-per-task=N | Single-node parallel |
| #$ -j y | (default behavior) | Slurm merges stderr into stdout by default |
| #$ -o logs/out-$TASK_ID.log | #SBATCH --output=logs/out-%A_%a.log | %A=job, %a=array task |
| #$ -t 1-176 | #SBATCH --array=1-176 | |
| (no equivalent) | #SBATCH --partition=standard | Required — no default partition |
| (no equivalent) | #SBATCH --time=3:00:00 | Default 5h, max 7d on standard |
| SGE | Slurm | Description |
|---|---|---|
| $SGE_TASK_ID | $SLURM_ARRAY_TASK_ID | Array task index |
| $JOB_ID | $SLURM_JOB_ID | Job ID |
| $NSLOTS | $SLURM_CPUS_PER_TASK | Allocated CPUs |
| $HOSTNAME | $SLURM_NODELIST | Assigned node(s) |
| $SGE_TASK_FIRST | $SLURM_ARRAY_TASK_MIN | First array index |
| $SGE_TASK_LAST | $SLURM_ARRAY_TASK_MAX | Last array index |
| SGE | Slurm | Description |
|---|---|---|
| qsub script.sh | sbatch script.sh | Submit job |
| qstat -u $USER | squeue -u $USER | List queued and running jobs |
| qdel job_id | scancel job_id | Cancel job |
| qstat -j job_id | scontrol show job job_id | Job details |
| qacct -j job_id | sacct -j job_id | Job accounting |
| (no equivalent) | sinfo -p partition | Partition info |
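The directive and variable tables above are regular enough to apply mechanically. The sketch below covers only the rows listed in those tables and leaves any other `#$` line untouched for manual review. All function names are hypothetical, and the partition and time defaults are placeholders to adjust per job.

```python
import re

# Variable renames from the variable table.
VAR_MAP = {
    "SGE_TASK_ID": "SLURM_ARRAY_TASK_ID",
    "NSLOTS": "SLURM_CPUS_PER_TASK",
    "JOB_ID": "SLURM_JOB_ID",
}
VAR_RE = re.compile(r"\$(\{?)(SGE_TASK_ID|NSLOTS|JOB_ID)\b")

# Directive rewrites from the directive table. Memory deserves a manual check:
# confirm whether the SGE value was meant per slot before reusing it as --mem.
DIRECTIVE_RULES = [
    (re.compile(r"^#\$\s+-N\s+(\S+)"), "#SBATCH --job-name={}"),
    (re.compile(r"^#\$\s+-l\s+m_mem_free=(\S+)"), "#SBATCH --mem={}"),
    (re.compile(r"^#\$\s+-pe\s+onenode\s+(\d+)"),
     "#SBATCH --ntasks=1\n#SBATCH --cpus-per-task={}"),
    (re.compile(r"^#\$\s+-t\s+(\S+)"), "#SBATCH --array={}"),
]
OUTPUT_RE = re.compile(r"^#\$\s+-o\s+(\S+)")
DROP_RE = re.compile(r"^#\$\s+-(cwd|j\s+y)\b")  # already Slurm defaults


def convert_line(line):
    """Convert one line; return None when the line should be dropped."""
    if DROP_RE.match(line):
        return None
    m = OUTPUT_RE.match(line)
    if m:
        path = m.group(1).replace("$TASK_ID", "%a").replace("$JOB_ID", "%A")
        return f"#SBATCH --output={path}"
    for pattern, template in DIRECTIVE_RULES:
        m = pattern.match(line)
        if m:
            return template.format(m.group(1))
    # Ordinary script line (or an unrecognized #$ directive): rename variables only.
    return VAR_RE.sub(lambda v: "$" + v.group(1) + VAR_MAP[v.group(2)], line)


def convert_script(text, partition="standard", time_limit="3:00:00"):
    """Convert an SGE script and add the partition and time directives Slurm needs."""
    out = [c for c in map(convert_line, text.splitlines()) if c is not None]
    extra = [f"#SBATCH --partition={partition}", f"#SBATCH --time={time_limit}"]
    insert_at = 1 if out and out[0].startswith("#!") else 0
    out[insert_at:insert_at] = extra
    return "\n".join(out) + "\n"
```

Treat the output as a first draft: diff it against the original script before submitting it.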
When converting an SGE script to Slurm:
- Replace #$ directives with #SBATCH equivalents (see table above)
- Add #SBATCH --partition=standard (SGE has no equivalent; the partition is implicit)
- Add #SBATCH --time= (SGE defaults to unlimited on WRDS)
- Rename $SGE_TASK_ID → $SLURM_ARRAY_TASK_ID
- Rename $NSLOTS → $SLURM_CPUS_PER_TASK
- Rename $JOB_ID → $SLURM_JOB_ID
- Remove #$ -cwd and #$ -j y (Slurm defaults)
- In log paths, change $TASK_ID → %a, $JOB_ID → %A
squeue -u $USER # all my jobs
squeue -j 12345678 # specific job
squeue -j 12345678 -t R -h | wc -l # count running tasks (-h drops the header line)
squeue -j 12345678 -t PD # show pending tasks + reasons
squeue -u $USER --format='%.10i %.9P %.12j %.2t %.10M %.4C %R' # detailed
| Reason | Meaning |
|---|---|
| (Priority) | Lower priority than other queued jobs; will run eventually |
| (Resources) | Not enough free nodes/CPUs; waiting for running jobs to finish |
| (QOSMaxCpuPerUserLimit) | Hit per-user CPU limit on this QOS |
| (AssocMaxJobsLimit) | Hit max concurrent jobs for this account |
sacct -j 12345678 --format=JobID,State,ExitCode,Elapsed,MaxRSS,NCPUS
sacct -j 12345678 -X --format=JobID,State,ExitCode # one line per array task (-X hides job steps; -a would mean all users)
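To turn accounting output into a list of tasks worth re-running, one option is to parse pipe-delimited `sacct` output. The sketch below assumes the text came from something like `sacct -j 12345678 -X -n -P --format=JobID,State` (no header, `|` separators). `failed_tasks` is a hypothetical helper, and the set of states treated as "not failed" is an assumption to adjust.

```python
# States that do not call for a re-run (assumption; extend as needed).
OK_STATES = {"COMPLETED", "RUNNING", "PENDING"}


def failed_tasks(sacct_text):
    """Return sorted array task IDs whose state is not in OK_STATES."""
    failed = []
    for line in sacct_text.splitlines():
        fields = line.strip().split("|")
        if len(fields) < 2 or not fields[1].strip():
            continue
        job_id = fields[0]
        state = fields[1].split()[0]  # "CANCELLED by 1000" -> "CANCELLED"
        task = job_id.partition("_")[2]
        if not task.isdigit():
            continue  # non-array job or a collapsed range such as 123_[6-176]
        if state not in OK_STATES:
            failed.append(int(task))
    return sorted(failed)
```

The returned IDs can then be joined with commas and passed to `sbatch --array=...`.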
Output goes to --output path. With %A_%a pattern:
- logs/est-12345678_1.log: job 12345678, array task 1
- grep -rl 'Error\|Traceback' logs/est-12345678_*.log
UVA HPC bills in Service Units (SUs), which are weighted CPU-core-hours:
SU = (CPU_cores × 4.6369 + Memory_GB × 0.2842) × hours
| Config | SU/hour | 176 tasks × 3 hrs |
|---|---|---|
| 1 CPU, 4GB | ~5.8 | ~3,049 |
| 8 CPU, 32GB | ~46.2 | ~24,388 |
| 40 CPU, 160GB | ~231 | ~121,941 |
With 10M SUs allocated, even aggressive usage (8 CPU × 176 tasks × 3 hrs = ~24K SUs) is negligible (<0.25% of allocation).
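The SU formula above is simple enough to check before submitting. A minimal sketch using the two weights quoted in the formula; the function names are hypothetical.

```python
CPU_WEIGHT = 4.6369  # SUs per CPU core per hour, from the formula above
MEM_WEIGHT = 0.2842  # SUs per GB of memory per hour, from the formula above


def su_per_hour(cpus, mem_gb):
    """Hourly SU rate for one task with the given CPU and memory request."""
    return cpus * CPU_WEIGHT + mem_gb * MEM_WEIGHT


def job_su(cpus, mem_gb, hours, tasks=1):
    """Total SUs for an array of identical tasks, each running for `hours`."""
    return su_per_hour(cpus, mem_gb) * hours * tasks
```

For example, `job_su(8, 32, hours=3, tasks=176)` comes to about 24.4K SUs, in line with the estimate above.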
allocations # show allocation balance
allocations -a myallocation # specific allocation
WRDS PostgreSQL is accessible from HPC compute nodes. Use polars + connectorx for fast data pipelines that replace SAS entirely.
- wrds-pgdata.wharton.upenn.edu:9737
- ~/.pgpass (chmod 600)
- edwin_hu (UVA account)
from wrds_conn import read_wrds
# WRDS SQL → polars DataFrame in one line
df = read_wrds("SELECT * FROM crsp.msf WHERE date >= '2020-01-01'")
# Write to Parquet for reuse
df.write_parquet("/scratch/vwh7mb/data/crsp_msf.parquet")
wrds_conn.py (see examples/wrds_conn.py) parses .pgpass and builds a connectorx-compatible URI — connectorx doesn't read .pgpass natively.
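As an illustration of that approach (not the contents of examples/wrds_conn.py, which are not shown here), the sketch below reads a standard `hostname:port:database:username:password` entry from `.pgpass` and builds a PostgreSQL URI. `pgpass_uri` is a hypothetical name, the database name is supplied by the caller, and backslash-escaped colons in `.pgpass` fields are not handled.

```python
from pathlib import Path
from urllib.parse import quote


def pgpass_uri(host, port, database, pgpass_path=None):
    """Build a postgresql:// URI from the first matching .pgpass entry."""
    path = Path(pgpass_path) if pgpass_path else Path.home() / ".pgpass"
    for raw in path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":", 4)
        if len(fields) != 5:
            continue
        h, p, db, user, password = fields
        # "*" is the .pgpass wildcard for host, port, and database.
        if h in (host, "*") and p in (str(port), "*") and db in (database, "*"):
            # Percent-encode credentials so characters like "@" or "/" stay valid.
            return (
                f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
                f"@{host}:{port}/{database}"
            )
    raise LookupError(f"no .pgpass entry for {host}:{port}/{database}")
```

A call such as `pgpass_uri("wrds-pgdata.wharton.upenn.edu", 9737, "wrds")` (the database name `wrds` is an assumption here) yields a URI that can be passed to a connectorx-backed reader such as `polars.read_database_uri(query, uri)`.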
Old: WRDS SAS → .sas7bdat (7GB) → Python HDF5 conversion → .h5 (390MB)
New: WRDS PostgreSQL → polars/connectorx → .parquet
No SAS license needed. Single step. Portable output.
See references/wrds-polars-pipeline.md for full examples (joins, partitioned output, Slurm submission for large queries).
npx claudepluginhub edwinhu/workflows --plugin workflows