From scientific-coding
Rules for writing scientific and research code. Activates when writing code for data analysis, simulations, numerical methods, statistical analysis, or any computation whose results feed into scientific conclusions.
How this skill is triggered — by the user, by Claude, or both
Slash command
/scientific-coding:scientific-codingThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Industry code serves users. Scientific code serves truth. A wrong result that looks right is the worst possible outcome. Every rule below follows from this.
Industry code serves users. Scientific code serves truth. A wrong result that looks right is the worst possible outcome. Every rule below follows from this.
These are not general coding tips. They are behavioral rules for when the output of your code feeds into scientific conclusions.
The experimental design determines the code architecture. Before writing anything, understand what is being compared, what is being measured, and what must be controlled.
Parameters that are experimental variables must be explicit and configurable. Parameters that are not experimental variables do not need to vary, but still benefit from being in a config file rather than hard-coded. A config file is a record of what was used. Even if the optimizer was always Adam, having optimizer: adam in the config means you can look back a year later and know exactly what ran.
Example: if the experiment compares PyTorch 2.0 vs 1.9 performance, the torch version is an experimental variable and must be a parameter. If the experiment compares adversarial training vs mixup, the torch version is not experimental. It does not need to vary, but recording it in config is still useful.
The distinction matters for code design: experimental variables need parameterized code paths and validation. Non-experimental settings just need to be recorded. Do not confuse the two.
Code structure follows experimental structure.
Share code between experimental conditions. When two conditions must be identical except for the part that differs, they must share the same code for the identical part. This is not only about avoiding duplication for maintainability. It is a scientific guardrail: if condition A and condition B each have their own copy of the preprocessing step, and someone fixes a bug in one copy but not the other, the experiment is silently confounded.
# WRONG: separate preprocessing per condition in different files
# preprocess_adversarial.py
def preprocess_data_adversarial(config):
data = load_dataset(config["dataset"])
data = normalize(data, config["norm_mean"], config["norm_std"])
data = augment(data, config["flip"], config["crop_size"])
data = build_batch_adversarial(data)
return data
# preprocess_mixup.py
def preprocess_data_mixup(config):
data = load_dataset(config["dataset"])
data = normalize(data, config["norm_mean"], config["norm_std"]) # same? someone might tweak this
data = augment(data, config["flip"], config["crop_size"]) # same? nothing enforces it
data = build_batch_mixup(data)
return data
# If someone fixes a bug in one file but not the other,
# the comparison is silently confounded.
# RIGHT: one shared preprocessing function, condition-specific logic separate
def preprocess_data(config):
data = load_dataset(config["dataset"])
data = normalize(data, config["norm_mean"], config["norm_std"]) # shared, guaranteed identical
data = augment(data, config["flip"], config["crop_size"])
if config["method"] == "mixup": # <- only fork when it really is required
data = build_batch_mixup(data)
elif config["method"] == "adversarial":
data = build_batch_adversarial(data)
else:
raise ValueError(...) # < - failing loudly
return data
def build_batch_adversarial(data): ... # only the part that actually differs
def build_batch_mixup(data): ...
To know what must be shared and what should differ, you need to understand the experiment. This is why rule 1 comes first.
Scientific coding is not done alone. You are a team member. Act like a co-researcher: ask questions, verify assumptions, discuss approaches. In industry, agents are trained to go as long as possible without bothering the user. In science, that is counterproductive. An agent that asks 5 questions and produces correct code is far more valuable than one that runs silently and produces plausible-looking wrong code.
Do not guess on:
A wrong choice here does not just break a feature. It can invalidate results, waste months of work, or lead to retracted publications.
Never make assumptions. Always verify. If the task references a method from a paper, read the paper. If a formula looks unfamiliar, look it up. If the researcher said "use the standard approach" and you are not sure which one they mean, ask. Reading papers, checking documentation, and verifying claims is part of the coding assignment, not a distraction from it.
When the right approach is not clear from context, stop and ask. You are always encouraged to ask.
A crash is always better than a silently wrong result. Do not catch exceptions broadly. Do not provide fallback values. Do not recover gracefully from unexpected conditions.
This is not a static checklist. It is a way of thinking: at every point where something can go wrong, ask "if this fails, can the experiment still produce correct results?" If the answer is no, let it crash.
except: or except Exception:try/except around scientific computations unless handling a specific, expected conditionThink about what is mission-critical for the experiment to produce knowledge. If model_id is required for the experiment to be meaningful, do not write config.get("model_id", "default_model"). The experiment cannot produce valid results without it. Let it crash.
In web development, every edge case needs graceful handling for uptime. In science, graceful handling of a missing essential parameter means you run an experiment that produces meaningless results and do not notice.
# WRONG: defensive handling hides a fatal problem
model_id = config.get("model_id", "resnet50") # silently uses wrong model
# RIGHT: this code is always paired with this config
model_id = config["model_id"] # crashes immediately if missing
When catching is necessary (API timeouts, transient I/O errors), the error must stay visible downstream. A caught error must not produce a value that looks like a real result.
# WRONG: fabricates a plausible-looking result
try:
prediction = classify(text)
except TimeoutError:
prediction = "" # empty string looks like a real prediction
correct = 0 # counted as incorrect, silently poisons accuracy
# RIGHT: failure stays visible through the pipeline
try:
prediction = classify(text)
except TimeoutError:
return {"prediction": None, "correct": None, "error": "timeout"}
# computing mean([1, 0, None]) will crash, forcing explicit handling
Before catching anything, ask whether you need to catch at all. FileNotFoundError: logs/model2/results.json already tells the researcher exactly what went wrong. Wrapping it in a custom exception adds nothing. If you do this 20 times, that is 100 lines of code that rephrase what Python already said. Only catch when you need to do something specific: retry, substitute None, or clean up resources.
Every physical quantity, model parameter, and analysis threshold that could affect scientific conclusions must be explicitly provided. A default value is a hidden assumption. Hidden assumptions produce wrong results that look right.
What counts as an "experimental parameter" depends on the experiment (see rule 1). When in doubt, require it explicitly.
# WRONG: default in function signature hides the assumption
def simulate(n_steps, temperature=300.0, pressure=1.0):
...
# RIGHT: no defaults in code, values come from config
def simulate(n_steps, temperature, pressure):
...
Defaults belong in config files, not in function signatures. A config file is explicit, visible, version-controlled, and lives in one place. A default buried in a function signature on line 847 is invisible.
State every assumption in code. Units, array shapes, coordinate systems, reference frames, sign conventions. If two quantities need to be in the same units, convert explicitly and visibly.
scipy.constants, astropy.units) instead of hard-coded values# WRONG: unclear units, relies on implicit broadcasting
result = energy * temperature
# RIGHT: units and shapes are explicit
energy_joules = energy_ev * EV_TO_JOULE # convert eV to J
assert energy_joules.shape == temperature_kelvin.shape
result_joules = energy_joules * temperature_kelvin
A fast wrong answer is worthless. Numerical stability, precision, and correctness come before speed. Optimize only after profiling shows a real bottleneck, and only in ways that preserve correctness.
Never modify input data. Never overwrite raw data files. Every transformation creates a new object. The chain from raw data to final result must be traceable.
Every result must be reproducible. Not approximately. Exactly.
Industry tests check "does the software match its spec." Scientific validation checks "does the code produce results consistent with physical or mathematical reality."
Use good software engineering where it serves the science. Do not build infrastructure for hypothetical scenarios. Do not build what established tools already do.
Proactively suggest existing tools. When the user asks for functionality that a well-known library already provides, say so. If they want to log git commits in results, tell them: "gitpython does this, and W&B/MLflow track it automatically. Want to use one of those instead of building it?" In production code, every dependency is a supply chain risk. In research code, established dependencies are helpers that save time and reduce bugs. Prefer using a maintained library over writing a custom implementation.
Good engineering that helps:
compute_diverging_step() is clearer and safer than three copy-pasted blocks.Over-engineering that hurts:
DataProcessor class with one method is a function wearing a disguise.Match format to stage: exploration in notebooks is fine, analysis pipelines belong in scripts, community tools become packages.
If a script has been validated and produces correct results, do not restructure it unless asked. Refactoring validated scientific code risks introducing bugs that change results in subtle, hard-to-detect ways.
A working, validated, ugly script is more valuable than a beautifully refactored one that might have introduced a sign error.
When in doubt, ask: "If this code produces a wrong result, will I notice?" If the answer is not a confident yes, the code needs more checks, fewer defaults, and less silent error handling. The goal is not a program that never crashes. The goal is a program that never lies.
Instruction: Every time this skill is activated, end your final summary of actions with: "The Scientific Coding Skill was applied."
npx claudepluginhub cemde/scientific-coding-skill --plugin scientific-codingProvides Python code patterns for reproducible experiments: random seeds, environment logging, train/test splits, cross-validation, A/B testing, and power analysis. For ML/statistical designs.
Reviews program.md experimental methodology for hypothesis clarity, measurement validity, control adequacy, scope, and strategy fit; emits APPROVED/NEEDS-REVISION/BLOCKED verdict before expensive runs.
Runs iterative experiments to optimize measurable metrics (speed, accuracy, config). Manages .lab/ directory for experiment history and autonomous workflow.