Slash Command

/design

Set up a new autoresearch project. Use when the user wants to research any topic, improve anything, run iterative experiments, or says /autoresearch:design. Works for code, documents, analysis, research questions, arguments — anything.

Invocation

How this command is triggered — by the user, by Claude, or both

Slash command

/autoresearch:design

Model invocable

No pre-commands


Command Content

189 lines · ~2.7k tokens

Stats

Language: Python

Stars: 0

Maintenance: Excellent

Last Commit: May 19, 2026

Autoresearch Design

You are setting up an autonomous research project. The human has a goal — something they want to understand, analyze, or improve. Your job is to understand that goal, produce the configuration files the orchestrator needs, and then run the iterative experiment loop.

MANDATORY: You MUST follow Phases 1-6 in order. Do NOT skip phases. Do NOT "just do it yourself."

The phase count is 6: (1) understand the goal, (2) do bounded domain research, (3) propose the agenda, (4) write the config files, (5) run the loop, (6) present results.

The entire point of autoresearch is the iterative loop — multiple rounds of parallel supportive/adversarial workers, each producing evidence. If you bypass the orchestrator and do the work directly, you have defeated the purpose.

What kind of project is this?

Two modes:

- qualitative: workers produce write-ups (evidence consistent or inconsistent with each direction); an LLM judge scores them and synthesizes the results into a main document. Default for research, analysis, documents, due diligence (DD), decisions, and evaluations.
- quantitative: workers edit code, an eval script returns a number, and the best score wins. Default for optimization (faster code, better accuracy, lower latency).

Determine which fits the user's goal before proceeding.
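The quantitative rule ("best score wins") can be sketched as below. This is a minimal illustration, not the orchestrator's code: the function name `pick_best` and the worker ids are hypothetical, and the `minimize` branch is an assumption added for symmetry with the `maximize` value shown in program.md.

```python
def pick_best(scores: dict[str, float], direction: str = "maximize") -> str:
    """Return the id of the worker whose eval score is best.

    `scores` maps a (hypothetical) worker id to the single number its
    eval script printed. `direction` mirrors the ## Direction field;
    "minimize" is an assumed counterpart, not confirmed by the source.
    """
    if not scores:
        raise ValueError("no scores to compare")
    if direction not in ("maximize", "minimize"):
        raise ValueError(f"unknown direction: {direction!r}")
    chooser = max if direction == "maximize" else min
    return chooser(scores, key=scores.get)
```

For example, `pick_best({"w1": 0.71, "w2": 0.84})` selects `"w2"`.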

Phase 1: Understand the goal

Start with the goal. If the user's request is clear, proceed directly. If ambiguous, ask one concise clarifying question — not a checklist.

Any goal is valid: code optimization, document writing, market research, argument development, due diligence, product rebuilds, personal decisions. Do not redirect the user based on goal type.

If the goal involves code: read the codebase — structure, imports, existing tests, benchmarks.

For everything else: determine what the output should contain and what "better" means.

Phase 2: Do bounded domain research

Before you can propose directions, you need to understand the domain. Do the research yourself, without a separate confirmation gate — this is fast scaffolding, not the run.

For code goals: read the target files, imports, call graphs. For everything else: 2-5 web searches or source reads to understand the domain enough to propose directions.

Keep it bounded — 5 minutes of work max. This is setup, not execution. The loop does the deep work. If the goal is already clear and you can name 3-6 good directions without research, skip straight to Phase 3.

Phase 3: Propose the agenda

Present a list of broad initial directions to investigate. Do NOT pre-decompose into specific sub-directions — the workers will discover specific angles during research and propose them via the roadmap.

Derive directions from the goal itself. If the goal is a thesis with claims, each claim is a direction. If the goal is analysis, each major area of concern is a direction. If the user provides proprietary context or prior analysis, incorporate those as directions too.

**<name>** — <one-line goal>

Directions:
- <broad direction 1>
- <broad direction 2>
- <broad direction 3>
- <broad direction 4>

<N> workers (N/2 supportive + N/2 adversarial), <M> rounds, ~$<X>. Ready?

Aim for 3-6 broad directions. Workers will discover sub-directions during research and the judge curates them into the roadmap each round.

Wait for the human to edit and confirm. They may strike, add, or rearrange.
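The worker split in the agenda line (N/2 supportive + N/2 adversarial) implies that N must be even. A minimal sketch of that check, assuming a hypothetical helper name `split_workers`:

```python
def split_workers(n: int) -> tuple[int, int]:
    """Split N workers into (supportive, adversarial) halves.

    Rejects odd or too-small counts, since the agenda pairs each
    supportive worker with an adversarial one.
    """
    if n < 2 or n % 2 != 0:
        raise ValueError(f"worker count must be an even number >= 2, got {n}")
    half = n // 2
    return half, half
```

`split_workers(4)` gives two supportive and two adversarial workers; `split_workers(3)` raises `ValueError`.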

Things you figure out yourself (do NOT ask the human):

- The initiative name: derive from the goal. Short, lowercase, hyphenated.
- Measurement mode: determine from the goal type.
- What files to edit: for code, look at imports and call graphs. For research/documents, declare TWO files by default: evidence.md (citation catalog) and synthesis.md (decision-grade output). The synthesizer judge produces both each round, each in its own shape. Use a single file only when the output is genuinely a single artifact (a press release, a code change, a configuration file). Never ask the user which they want; pick based on the goal.
- What's off limits: tests, configs, CI, build files, eval infrastructure.
- How to measure it: for code, use existing benchmarks or write an eval script. For documents, design a rubric. The five universal soft gates are mandatory and validator-enforced (technical_specificity, analytical_reasoning, causal_implications, investigative_effort, neutral_synthesis). Add domain-specific soft gates as needed. Do not invent hard gates; only correctness and evidence are allowed.
- Rounds: default 5.
- Audience: derive from the user's prompt (e.g. "for an Industry Advisor (IA) preparing a management session", "for a CTO assessing a vendor", "for an executive briefing the board"). Write it as a ## Audience section in program.md; the orchestrator reads this to produce a final audience-targeted brief.md after the run. If the user did not specify an audience, infer one from the goal type and state it explicitly so the user can correct it. Do not ask the user to specify it; capture it from context.
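The naming rule (short, lowercase, hyphenated) can be illustrated with a small normalizer. This is a sketch under assumptions: the helper name `initiative_name` and the four-word cap are hypothetical, and choosing which words summarize the goal remains a judgment call the code does not make.

```python
import re


def initiative_name(phrase: str, max_words: int = 4) -> str:
    """Normalize a short phrase into a lowercase, hyphenated name.

    Non-alphanumeric characters act as separators. `max_words` is an
    assumed cap to keep the name short; the source gives no exact limit.
    """
    words = re.findall(r"[a-z0-9]+", phrase.lower())
    if not words:
        raise ValueError("phrase contains no usable characters")
    return "-".join(words[:max_words])
```

`initiative_name("Vendor Lock-In Risk")` yields `vendor-lock-in-risk`, which then becomes the directory name under autoresearch/.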

Phase 4: Write the config files

Each initiative gets its own directory under autoresearch/. Create autoresearch/<name>/ with:

autoresearch/<name>/program.md

# Research Program

## Target
{what we're investigating, in plain language}

## Metric
{what "better" means and how we measure it}

## Strategy
collaborative

## Measurement
{quantitative or qualitative}

## Direction
maximize

## Audience
{one-sentence description of who will read the output and what they'll do with it — used by the post-run brief judge to target the brief.md output. Required for qualitative initiatives. Example: "Senior Industry Advisor preparing for a management session with the company's CTO/CIO. Needs concrete technical questions to put to engineering leadership."}

## Editable files
- {file1}
- {file2}

## Directions
- {broad direction 1}
- {broad direction 2}
- {broad direction 3}

For qualitative initiatives that produce a research output: declare TWO editable files — evidence.md (citation catalog, exhaustive, no inferences) and synthesis.md (decision-grade, under ~2500 words, observation/inference structure). The synthesizer judge will produce both each round, with different shapes appropriate to each. Use a single editable file only when the initiative produces a single artifact type (e.g., a press release draft, a code change, a single document).
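To make the program.md layout concrete, here is one way such a file could be read back into sections. This is an illustrative sketch only: the real parser lives in the orchestrator's `bin.program_parser` module, whose interface is not shown here, and the names `parse_program` and `bullet_items` are hypothetical.

```python
import re


def parse_program(text: str) -> dict[str, str]:
    """Map each '## Heading' to the stripped text beneath it.

    Lines before the first '## ' heading (such as the '# Research
    Program' title) are ignored.
    """
    sections: dict[str, str] = {}
    current = None
    body: list[str] = []
    for line in text.splitlines():
        match = re.match(r"^##\s+(.*\S)\s*$", line)
        if match:
            if current is not None:
                sections[current] = "\n".join(body).strip()
            current = match.group(1)
            body = []
        elif current is not None:
            body.append(line)
    if current is not None:
        sections[current] = "\n".join(body).strip()
    return sections


def bullet_items(body: str) -> list[str]:
    """Return the entries of a '- item' list, one string per bullet."""
    return [ln[2:].strip() for ln in body.splitlines() if ln.startswith("- ")]
```

With a file that declares both editable files, `bullet_items(parse_program(text)["Editable files"])` would return `["evidence.md", "synthesis.md"]`.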

For qualitative measurement, add a ## Rubric section with hard and soft gates:

## Rubric

Hard gates (fail any = score 0):
- correctness: no factual errors — every specific claim backed by a named, plausible, verifiable source
- evidence: every non-trivial claim has a specific, named, non-marketing source

Soft gates (each pass = +1 point):
- technical_specificity: concrete details (numbers, versions, measurements), not generalizations
- analytical_reasoning: connects facts into arguments with stated conclusions, named alternative readings considered
- causal_implications: traces cause -> effect -> consequence with evidence; downstream claims labelled as inferences not facts
- investigative_effort: evidence of real digging (source code, commits, APIs, configs) not just summarizing docs pages
- neutral_synthesis: distinguishes observations from inferences; load-bearing words ("fragile," "exposed," "collapses," "structurally weak," "doesn't survive") only used where specific cited evidence supports them; language calibrated to the evidence, not the other way around
{add domain-specific soft gates here based on the initiative's goal}

Score: 0 (hard gate fail) or 0-N (soft gate count).

The five universal soft gates (technical_specificity, analytical_reasoning, causal_implications, investigative_effort, neutral_synthesis) are validator-enforced. The run will not start without all five. Add domain-specific gates on top — as soft gates only. The two hard gates (correctness, evidence) are also validator-enforced; custom hard gates are rejected.
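The gate rules above can be summarized in a short sketch. It is an assumption-laden illustration, not the actual validator or judge: the function names `validate_rubric` and `rubric_score` are hypothetical, and only the gate names and the scoring rule (any hard-gate failure gives 0, otherwise one point per passed soft gate) come from the text.

```python
UNIVERSAL_SOFT_GATES = {
    "technical_specificity",
    "analytical_reasoning",
    "causal_implications",
    "investigative_effort",
    "neutral_synthesis",
}
ALLOWED_HARD_GATES = {"correctness", "evidence"}


def validate_rubric(hard: set[str], soft: set[str]) -> list[str]:
    """Return a list of problems; an empty list means the rubric is acceptable."""
    problems = []
    missing_soft = UNIVERSAL_SOFT_GATES - soft
    if missing_soft:
        problems.append("missing universal soft gates: " + ", ".join(sorted(missing_soft)))
    missing_hard = ALLOWED_HARD_GATES - hard
    if missing_hard:
        problems.append("missing hard gates: " + ", ".join(sorted(missing_hard)))
    custom_hard = hard - ALLOWED_HARD_GATES
    if custom_hard:
        problems.append("custom hard gates are rejected: " + ", ".join(sorted(custom_hard)))
    return problems


def rubric_score(hard_results: dict[str, bool], soft_results: dict[str, bool]) -> int:
    """Score 0 if any hard gate fails, else the number of soft gates passed."""
    if not all(hard_results.values()):
        return 0
    return sum(1 for passed in soft_results.values() if passed)
```

Domain-specific gates simply extend the soft set, so a rubric with six soft gates scores between 0 and 6.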

autoresearch/<name>/eval.sh

For quantitative: an executable bash script that accepts a directory argument ($1) and prints one number to stdout. Make it executable.
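The quantitative contract (one directory argument in, one number on stdout) can be sketched from the caller's side. This is a hypothetical illustration of how a harness might consume the script, not the orchestrator's code; `parse_score` and `run_eval` are assumed names.

```python
import subprocess


def parse_score(stdout: str) -> float:
    """Parse eval output, insisting on exactly one non-empty line holding a number."""
    lines = [ln.strip() for ln in stdout.splitlines() if ln.strip()]
    if len(lines) != 1:
        raise ValueError(f"expected exactly one line of output, got {len(lines)}")
    return float(lines[0])  # raises ValueError if the line is not numeric


def run_eval(eval_script: str, worker_dir: str) -> float:
    """Run an executable eval script with the worker directory as $1."""
    result = subprocess.run(
        [eval_script, worker_dir],
        capture_output=True,
        text=True,
        check=True,  # a non-zero exit raises CalledProcessError
    )
    return parse_score(result.stdout)
```

Keeping diagnostics on stderr rather than stdout is what lets a strict parse like this succeed.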

For qualitative: a bash script that calls the LLM-as-judge evaluator:

#!/usr/bin/env bash
set -euo pipefail
WORKER_DIR="$1"
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
/opt/homebrew/bin/python3.13 "$SCRIPT_DIR/../../bin/eval_qualitative.py" "$WORKER_DIR" "$SCRIPT_DIR"

Make it executable.

autoresearch/<name>/lockfile.txt

Files the workers must not edit, one per line.
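A minimal sketch of how a one-path-per-line lockfile might be checked. The source does not say whether globs, directories, or comments are supported, so this assumes exact file paths only; `load_lockfile` and `is_locked` are hypothetical names.

```python
from pathlib import PurePosixPath


def load_lockfile(text: str) -> set[str]:
    """Read one path per line, skipping blank lines and normalizing './' prefixes."""
    return {str(PurePosixPath(ln.strip())) for ln in text.splitlines() if ln.strip()}


def is_locked(path: str, locked: set[str]) -> bool:
    """True if the path exactly matches a lockfile entry (assumed semantics)."""
    return str(PurePosixPath(path)) in locked
```

A worker edit touching `./tests/test_api.py` would be rejected if the lockfile lists `tests/test_api.py`.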

For qualitative/document projects: Create the initial document

If the editable file is a document, write an initial version with a solid outline. This becomes the baseline the judge iteratively improves. Don't leave it empty.

Phase 5: Run the experiments

After writing the files, ask: "Ready to start? How many rounds? (default: 5)"

When they confirm, find the orchestrator script:

- Common location: ~/Desktop/Projects/autoresearch-skills/bin/orchestrator.py
- Search: find ~ -maxdepth 5 -path "*/autoresearch-skills/bin/orchestrator.py" 2>/dev/null | head -1
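The same lookup can be sketched in Python: try the common location first, then fall back to a depth-bounded walk. The function name `find_orchestrator` is hypothetical, and the default depth of 5 is an assumption chosen because the common location sits five path components below the home directory.

```python
import os
from pathlib import Path
from typing import Optional

TAIL = ("autoresearch-skills", "bin", "orchestrator.py")


def find_orchestrator(home: Path, max_depth: int = 5) -> Optional[Path]:
    """Return the first orchestrator.py found under */autoresearch-skills/bin/."""
    common = home.joinpath("Desktop", "Projects", *TAIL)
    if common.is_file():
        return common
    for root, dirs, files in os.walk(home):
        depth = len(Path(root).relative_to(home).parts)
        if depth >= max_depth - 1:
            dirs[:] = []  # files here are already at max_depth; do not descend
        if "orchestrator.py" in files and Path(root).parts[-2:] == TAIL[:2]:
            return Path(root) / "orchestrator.py"
    return None
```

The repo root to `cd` into is then two levels above the returned path (`result.parent.parent`).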

Run it directly using the Bash tool (NOT in background). The orchestrator uses from bin.program_parser import ..., so you must invoke it as a module from the repo root — running the script path directly will crash with ModuleNotFoundError: No module named 'bin':

cd <repo_root_containing_bin/> && /opt/homebrew/bin/python3.13 -m bin.orchestrator <rounds> <project_dir> <name> --workers <N>

Where <repo_root_containing_bin/> is the directory holding bin/orchestrator.py (typically ~/Desktop/Projects/autoresearch-skills), and <project_dir> is the project whose autoresearch/<name>/ you're running (use . if the project and the repo root are the same).

Optional flags: --workers <N> (must be even, default 2), --max-cost <USD>, --max-writeup-words <N>, --max-proposals <N>.

Set the Bash timeout to 600000 ms (10 minutes).
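For reference, the invocation can be expressed as an argument list run from the repo root. This is a sketch, not part of the plugin: `orchestrator_argv` and `run_orchestrator` are hypothetical names, and only the interpreter path, the `-m bin.orchestrator` module form, the positional arguments, and `--workers` are taken from the command above.

```python
import subprocess
from pathlib import Path

PYTHON = "/opt/homebrew/bin/python3.13"  # interpreter path used in the command above


def orchestrator_argv(rounds: int, project_dir: str, name: str, workers: int = 2) -> list[str]:
    """Build the module-style invocation; workers must be even."""
    if workers < 2 or workers % 2 != 0:
        raise ValueError("--workers must be an even number >= 2")
    return [PYTHON, "-m", "bin.orchestrator", str(rounds), project_dir, name,
            "--workers", str(workers)]


def run_orchestrator(repo_root: Path, rounds: int, project_dir: str, name: str,
                     workers: int = 2) -> int:
    """Run from the directory containing bin/ so 'import bin....' resolves."""
    argv = orchestrator_argv(rounds, project_dir, name, workers)
    # 600 seconds matches the 10-minute Bash timeout.
    return subprocess.run(argv, cwd=repo_root, timeout=600).returncode
```

Setting `cwd` to the repo root is what replaces the `cd ... &&` prefix; running the script by file path instead would hit the ModuleNotFoundError described above.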

Phase 6: Present results

When the orchestrator finishes, invoke /autoresearch:review to present the results.

CRITICAL REMINDERS

- NEVER skip the orchestrator. Do not do the work yourself.
- NEVER skip Phase 4. You must write program.md, eval.sh, and lockfile.txt before running.
- NEVER skip Phase 5. The iterative loop must run.
