From clab
Guides hypothesis-driven LLM research with principles for explicit hypotheses, red-teaming results, rigorous documentation, uncertainty reduction, and prompting escalation ladder. For AI experiments.
How this skill is triggered — by the user, by Claude, or both
Slash command
/clab:research-principlesThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
There is no "success" or "failure" in research, only insights and confidence levels.
There is no "success" or "failure" in research, only insights and confidence levels.
The goal of research is not to run experiments — it's to update your beliefs. Every decision should optimize for information gain per unit time.
When trying to get a model to do something, try approaches in this order. Only escalate when simpler methods fail or plateau:
Each step is roughly an order of magnitude more expensive in time and complexity. Don't skip steps.
Any experiment involving LLM API calls should cache responses to disk. This lets you:
Use one file per response keyed by a deterministic hash of the request (model, prompt, temperature, etc.). Use hashlib.md5, not Python's built-in hash() (which is non-deterministic across runs).
npx claudepluginhub butanium/claude-lab --plugin clabActs as AI/ML research collaborator: searches literature with query variations, analyzes codebases/logs, designs minimal falsification experiments, records predictions, and audits bugs.
Core autonomous research loop. Reads research.md, proposes hypotheses, runs experiments, evaluates results mechanically, keeps improvements, discards failures, and iterates until the target metric is achieved or the iteration budget is exhausted. TRIGGER when: user invokes "autoresearch" (no subcommand); research.md exists; user wants the 5-stage loop; user wants iterative optimization overnight.
Automates LLM-driven hypothesis generation and testing on tabular datasets using HypoGeniC, combining literature insights with data-driven testing for empirical research like deception detection or content analysis.