Generate and rank research ideas given a broad direction. Use when user says "找idea", "brainstorm ideas", "generate research ideas", "what can we work on", or wants to explore a research area for publishable directions.
How this skill is triggered — by the user, by Claude, or both
Slash command
/auto-research-with-eyes:idea-creatorThis skill is limited to the following tools:
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
Generate publishable research ideas for: $ARGUMENTS
Generate publishable research ideas for: $ARGUMENTS
Given a broad research direction from the user, systematically generate, validate, and rank concrete research ideas. This skill composes with /research-lit, /novelty-check, and /research-review to form a complete idea discovery pipeline.
All constants (PILOT_MAX_HOURS, PILOT_TIMEOUT_HOURS, MAX_PILOT_IDEAS, MAX_TOTAL_GPU_HOURS, REVIEWER_MODEL) are defined in the project's CLAUDE.md. Read them from there before proceeding.
💡 Override via argument, e.g.,
/idea-creator "topic" — pilot budget: 4h per idea, 20h total.
Map the research area to understand what exists and where the gaps are.
Scan local paper library first: Check papers/ and literature/ in the project directory for existing PDFs. Read first 3 pages of relevant papers to build a baseline understanding before searching online. This avoids re-discovering what the user already knows.
Search recent literature using WebSearch:
Build a landscape map:
Identify structural gaps:
Use the external LLM via Codex MCP for divergent thinking:
mcp__codex__codex:
model: REVIEWER_MODEL
config: {"model_reasoning_effort": "xhigh"}
prompt: |
You are a senior ML researcher brainstorming research ideas.
Research direction: [user's direction]
Here is the current landscape:
[paste landscape map from Phase 1]
Key gaps identified:
[paste gaps from Phase 1]
Generate 8-12 concrete research ideas. For each idea:
1. One-sentence summary
2. Core hypothesis (what you expect to find and why)
3. Minimum viable experiment (what's the cheapest way to test this?)
4. Expected contribution type: empirical finding / new method / theoretical result / diagnostic
5. Risk level: LOW (likely works) / MEDIUM (50-50) / HIGH (speculative)
6. Estimated effort: days / weeks / months
Prioritize ideas that are:
- Testable with moderate compute (8x RTX 3090 or less)
- Likely to produce a clear positive OR negative result (both are publishable)
- Not "apply X to Y" unless the application reveals genuinely surprising insights
- Differentiated from the 10-15 papers above
Be creative but grounded. A great idea is one where the answer matters regardless of which way it goes.
Save the threadId for follow-up.
For each generated idea, quickly evaluate:
Feasibility check: Can we actually run this experiment with available resources?
Novelty quick-check: For each idea, do 2-3 targeted searches to see if it's already been done. Full /novelty-check comes later for survivors.
Impact estimation: Would a reviewer care about the result?
Eliminate ideas that fail any of these. Typically 8-12 ideas reduce to 4-6.
For each surviving idea, run a deeper evaluation:
Novelty check: Use the /novelty-check workflow (multi-source search + REVIEWER_MODEL cross-verification) for each idea
Critical review: Use REVIEWER_MODEL via mcp__codex__codex-reply (same thread):
Here are our top ideas after filtering:
[paste surviving ideas with novelty check results]
For each, play devil's advocate:
- What's the strongest objection a reviewer would raise?
- What's the most likely failure mode?
- How would you rank these for a top venue submission?
- Which 2-3 would you actually work on?
Combine rankings: Merge your assessment with REVIEWER_MODEL's ranking. Select top 2-3 ideas for pilot experiments.
Before committing to a full research effort, run cheap pilot experiments to get empirical signal. This is the key differentiator from paper-only validation.
Design pilots: For each top idea, define the minimal experiment that would give a positive or negative signal:
Deploy in parallel: Use /run-experiment to launch pilots on different GPUs simultaneously:
GPU 0: Pilot for Idea 1
GPU 1: Pilot for Idea 2
GPU 2: Pilot for Idea 3
Use run_in_background: true to launch all at once.
Collect results: Use /monitor-experiment to check progress. If any pilot exceeds PILOT_TIMEOUT_HOURS, kill it and collect partial results. Once all pilots complete (or timeout), compare:
Re-rank based on empirical evidence: Update the idea ranking using pilot results. An idea with strong pilot signal jumps ahead of a theoretically appealing but untested idea.
Note: Skip this phase if the ideas are purely theoretical or if no GPU is available. Flag skipped ideas as "needs pilot validation" in the report.
Write a structured report to IDEA_REPORT.md. When invoked from a workflow command (e.g., /autor.idea-discovery), save to OUTPUT_DIR/IDEA_REPORT.md. When invoked standalone, save to the project root:
# Research Idea Report
**Direction**: [user's research direction]
**Generated**: [date]
**Ideas evaluated**: X generated → Y survived filtering → Z piloted → W recommended
## Landscape Summary
[3-5 paragraphs on the current state of the field]
## Recommended Ideas (ranked)
### Idea 1: [title]
- **Hypothesis**: [one sentence]
- **Minimum experiment**: [concrete description]
- **Expected outcome**: [what success/failure looks like]
- **Novelty**: X/10 — closest work: [paper]
- **Feasibility**: [compute, data, implementation estimates]
- **Risk**: LOW/MEDIUM/HIGH
- **Contribution type**: empirical / method / theory / diagnostic
- **Pilot result**: [POSITIVE: metric +X% / NEGATIVE: no signal / SKIPPED: needs GPU]
- **Reviewer's likely objection**: [strongest counterargument]
- **Why we should do this**: [1-2 sentences]
### Idea 2: [title]
...
## Eliminated Ideas (for reference)
| Idea | Reason eliminated |
|------|-------------------|
| ... | Already done by [paper] |
| ... | Requires > 1 week GPU time |
| ... | Result wouldn't be interesting either way |
## Pilot Experiment Results
| Idea | GPU | Time | Key Metric | Signal |
|------|-----|------|------------|--------|
| Idea 1 | GPU 0 | 45 min | +2.3% CE | POSITIVE |
| Idea 2 | GPU 1 | 30 min | -0.1% CE | NEGATIVE |
| Idea 3 | GPU 2 | 1.5 hr | +0.8% CE | WEAK POSITIVE |
## Suggested Execution Order
1. Start with Idea 1 (positive pilot signal, lowest risk)
2. Idea 3 as backup (weak signal, may need larger scale to confirm)
3. Idea 2 eliminated by pilot — negative result documented
## Next Steps
- [ ] Scale up Idea 1 to full experiment (multi-seed, full dataset)
- [ ] If confirmed, invoke /autor.auto-review-loop for full iteration
After this skill produces the ranked report:
/idea-creator "direction" → ranked ideas
/novelty-check "top idea" → deep novelty verification (already done in Phase 4, but user can re-run)
/research-review "top idea" → external critical feedback
implement → write code
/run-experiment → deploy to GPU
/autor.auto-review-loop → iterate until submission-ready
npx claudepluginhub llv22/autoresearchwitheyesOrchestrates brainstormer, idea-critic, and research-strategist agents through 6-phase pipeline (Seed → Diverge → Evaluate → Deepen → Frame → Decide) for research ideation, evaluation, and decision-making. Triggers on brainstorming research or project triage.
Generates novel, differentiated research ideas grounded in project context, data feasibility, and literature gaps. Use when needing innovation points, research directions, or evaluating what a dataset can do.
Generates novel research ideas through open-ended brainstorming, interdisciplinary connections, and assumption challenging. Useful for early-stage research planning.