Skill

research

Use when user explicitly requests deep research or comprehensive analysis across many authoritative sources. Creates an agent team for parallel research, gated on answer sufficiency (not a source count), with confidence tracking and structured synthesis. NOT for simple questions answerable with a single search.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/amux:research

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

SKILL.md

324 lines · ~3.4k tokens

Stats

LanguageShell

Stars0

MaintenanceExcellent

Last CommitJun 17, 2026

Actions

View Source View Plugin View on GitHub View README

Stats

Actions

Tags

Comprehensive research using an agent team, web search, and web scraping. Iteratively decomposes topics, gathers evidence from quality sources via parallel researcher teammates, and synthesizes findings into structured reports.

Core principle: Decompose questions, research in parallel with an agent team, evaluate confidence, iterate until sufficient, synthesize with source attribution.

<success_criteria> Done when: state.json phase is "DONE", every question is answered at medium+ confidence (or remaining gaps are documented as limitations), and report.md synthesizes findings with source attribution, conflicts, gaps, and limitations. Sufficiency — not a source count — is the bar. </success_criteria>

<when_to_use> Use when the user explicitly asks for deep research / comprehensive analysis needing multiple authoritative sources, confidence tracking, and source attribution.

Don't use for simple factual questions a single search answers, or topics too narrow for an 8-question decomposition. </when_to_use>

<required_tools>

Tool / Feature	Purpose	Required
`WebSearch`	Search queries (built-in)	Yes
Agent teams	Spawn parallel researcher teammates	Yes
`firecrawl-mcp:firecrawl_scrape`	Scrape full page content (preferred)	No
`WebFetch`	Fetch page content (built-in fallback)	Fallback

Prerequisite: Agent teams must be enabled (CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1 in settings or environment).

Tool Selection: In INIT phase, check if firecrawl-mcp:firecrawl_scrape is available. If not, use WebFetch (built-in). Record choice in state.json as "scraper": "firecrawl" or "scraper": "webfetch".

Tradeoffs:

firecrawl-mcp:firecrawl_scrape: Better content extraction, handles JS-rendered pages
WebFetch: Always available, sufficient for static pages </required_tools>

<state_machine>

INIT → DECOMPOSE → RESEARCH → EVALUATE → [RESEARCH or SYNTHESIZE] → DONE

State File: research/{slug}/state.json

{
  "topic": "string",
  "phase": "INIT|DECOMPOSE|RESEARCH|EVALUATE|SYNTHESIZE|DONE",
  "iteration": 0,
  "maxIterations": 5,
  "targetSources": 30,
  "sourcesGathered": 0,
  "totalSearches": 0,
  "teammateCompletions": 0,
  "codexCompletions": 0,
  "findingsCount": 0,
  "startTime": "ISO-8601 timestamp",
  "scraper": "firecrawl|webfetch",
  "questions": [{"id": 1, "text": "...", "status": "pending|done", "confidence": null}]
}

Rule: Read state.json before acting. Write state.json after acting. </state_machine>

<task_loop> A generic task loop hook prevents the session from ending while task-loop.json has complete: false, so research isn't abandoned mid-flight.

How it works:

When you try to exit, the hook reads research/{slug}/task-loop.json
If complete is false, exit is blocked and continuationPrompt is re-injected
Once you set complete: true, exit is allowed and completionMessage is displayed

You manage task-loop.json alongside state.json. Update statusMessage and continuationPrompt as progress changes. Set complete: true when EVALUATE's sufficiency gate passes or the iteration ceiling is hit — both are legitimate exits, so the loop always terminates. </task_loop>

<state_recovery> On skill invocation, first check for existing state:

If research/{slug}/state.json exists:
- Parse JSON; if invalid, offer to restart
- Resume from current phase
- Notify user: "Resuming research from {phase} phase"
Verify state consistency before resuming:
- RESEARCH: Ensure pending questions exist
- EVALUATE: Ensure findings.json has data
- SYNTHESIZE: Ensure all questions marked "done"
If inconsistent, offer user choice:
- Delete state and restart
- Attempt repair (mark incomplete questions as pending) </state_recovery>

1. Generate slug from topic: - Lowercase the topic - Replace spaces with hyphens - Remove special characters (keep only `a-z`, `0-9`, `-`) - Truncate to 50 characters - Example: "AI in Healthcare 2024!" → `ai-in-healthcare-2024`

Detect available scraper:
- Check if firecrawl-mcp:firecrawl_scrape tool exists
- If firecrawl available → "scraper": "firecrawl"
- If not available → "scraper": "webfetch" (uses built-in WebFetch)
Create working directory:
```
mkdir -p research/{slug}
```
Set targetSources as a rough expectation for the topic's breadth (narrow ~20, standard ~30, broad ~40) and maxIterations (default 5). These are signals/backstops, not the completion bar — sufficiency is.

Initialize state files:

state.json: use the schema from <state_machine> with phase: "DECOMPOSE", counters at 0, startTime set, questions: [].

task-loop.json (activates the generic task loop hook):

{
  "active": true,
  "complete": false,
  "continuationPrompt": "Continue researching: {topic}. Check research/{slug}/state.json and continue the RESEARCH phase.",
  "statusMessage": "Research in progress: {topic}",
  "completionMessage": "Research complete."
}

findings.json:

[]

Decompose the topic into the questions that actually matter for it. Use your judgment on count and framing — cover the angles the topic warrants and skip those it doesn't.

Common angles to draw from (a menu, not a checklist): definition/background, current state, key entities, core mechanisms, evidence and data, criticisms and limitations, comparisons to alternatives, future developments. Add others the specific topic demands.

Add questions to state.json with status="pending". Set phase="RESEARCH".

Create an agent team to research pending questions in parallel. Each teammate independently searches with Claude AND cross-validates with Codex via the `codex` MCP tool.

Claude teammates: One per pending question (up to 8 at a time). Each works independently with its own context window. Each teammate also calls the codex MCP tool to get Codex's perspective on the same question, providing genuine cross-validation — two engines may surface different sources and perspectives.

Read scraper from state.json. Spawn each Claude teammate with these instructions, substituting {SCRAPER} = firecrawl-mcp:firecrawl_scrape or WebFetch accordingly:

<teammate_instructions> You are a researcher teammate with WebSearch and {SCRAPER}.

TASK: {QUESTION}

Run enough searches to answer the question well, then scrape the best sources with {SCRAPER} ("Extract main content and key facts") and extract specific facts with their sources. Continue if a scrape fails. Use your judgment on how many searches and sources are enough.

Quality guide (favour higher tiers): Tier 1 — .gov, .edu, journals, official docs · Tier 2 — Reuters, AP, BBC, industry pubs · Tier 3 — company blogs, Wikipedia · Skip — forums, social media, SEO spam. </teammate_instructions>

<teammate_codex_crossvalidation>

Cross-Validation with Codex

After web research, call the codex MCP tool (model: gpt-5-codex, sandbox: read-only) with prompt: "Research this question: {QUESTION}. Return JSON findings with fields: fact, sourceNote, confidence (high/medium/low). Focus on facts confirmable from training data."

Treat codex as unavailable if the call throws/times out, or returns empty/non-JSON/MCP-error text (e.g. "Codex CLI Not Found") — then return Claude-only findings. If valid JSON, merge per question:

AGREE (same fact): boost confidence, mark cross-validated.
CHALLENGE (contradicts web fact): keep web version, document the contradiction.
COMPLEMENT (Codex-only): include with "status": "hypothesis" (no web citation).
COMPLEMENT (Claude-only): keep as-is. </teammate_codex_crossvalidation>

<teammate_return_format> RETURN ONLY THIS JSON:

{
  "questionId": {ID},
  "questionText": "{QUESTION}",
  "searchQueries": ["query1", "query2", "query3", "query4"],
  "searchesRun": 4,
  "urlsScraped": 4,
  "scrapeFailures": [],
  "findings": [
    {
      "fact": "...",
      "sourceUrl": "...",
      "tier": 1,
      "crossValidated": false,
      "engines": ["claude"],
      "status": "confirmed|hypothesis|disputed"
    }
  ],
  "gaps": ["what you couldn't find"],
  "contradictions": ["X says A, Y says B"],
  "confidence": "high|medium|low",
  "confidenceReason": "...",
  "codexAvailable": true
}

</teammate_return_format>

After each Claude teammate completes:

Validate JSON. Retry once if malformed.
Append to findings.json
Update state.json:
- Mark question done
- Increment totalSearches by searchesRun from response
- Increment teammateCompletions by 1
- Increment sourcesGathered by urlsScraped from response
- Increment findingsCount by length of findings array from response
- If codexAvailable is true, increment codexCompletions by 1
Log progress: "Sources: {sourcesGathered}/{targetSources}"

After all teammates complete:

Set phase="EVALUATE"

Decide whether the research is **sufficient** — judged on whether the questions are actually answered, not on a source count.

Sufficiency gate → SYNTHESIZE when both hold:

Every question is answered at medium confidence or better (high=3, medium=2, low=1).
Significant gaps and contradictions are resolved, or explicitly documented as limitations.

sourcesGathered vs targetSources is a sanity signal, not a gate: if you'd synthesize with far fewer sources than expected, double-check you haven't stopped short; if you're well past target but still thin, keep going. Don't pad to hit a number.

Otherwise → RESEARCH another round, unless the iteration ceiling (maxIterations, default 5) is reached — then SYNTHESIZE with the remaining gaps documented as limitations (futility exit; never loop forever).

If continuing to RESEARCH:

Generate follow-up questions from gaps/contradictions (as many as the gaps warrant)
Add to questions with status="pending"; increment iteration; set phase="RESEARCH"
Update task-loop.json: set statusMessage and continuationPrompt to the specific open questions
Log: "Continuing research: {N} questions still below confidence / {M} open gaps"

Set complete: true in task-loop.json only when the sufficiency gate passes or the iteration ceiling is hit.

Write `report.md`:

# {Topic}

## Executive Summary
[300-400 words. Most important finding first. State confidence. Note caveats.]

## Background
[200 words. Key terms. Context.]

## Key Findings

### [Theme 1]
[Grouped findings. Inline citations. Note source strength.]

### [Theme 2]
[3-5 themes total]

## Conflicting Information
[Both sides. Which has better sourcing.]

## Gaps & Limitations
[What's unknown. What needs more research.]

## Source Assessment
- **High confidence:** [claims with 3+ quality sources]
- **Medium confidence:** [claims with 1-2 sources]
- **Low confidence:** [single source or Tier 3 only]

## Sources

### Primary
[Tier 1 sources with URLs]

### Secondary
[Tier 2-3 sources with URLs]

---
*Sources: {sourcesGathered} | Searches: {totalSearches} | Teammates: {teammateCompletions} | Iterations: {iteration} | Duration: {duration} | Date: {date}*

Set phase="DONE".

Update task-loop.json:

{
  "active": true,
  "complete": true,
  "completionMessage": "Research complete: \"{topic}\"\n\nResources used:\n  Searches: {totalSearches}\n  Sources: {sourcesGathered}/{targetSources}\n  Teammates: {teammateCompletions}\n  Iterations: {iteration}\n\nReport: research/{slug}/report.md"
}

The task loop hook will display this message when the session exits.

<error_handling>

Error	Action
Malformed JSON	Retry once, then mark low confidence
Scrape fails	Continue with other URLs
Rate limit	Wait 60s, reduce batch to 2
No results	Mark low confidence, rephrase as follow-up
Tool not found	Fall back to WebFetch, update state.json
`codex` MCP unavailable, empty, or error-text response	Teammate returns Claude-only findings, research continues
</error_handling>

| Resource | Guide | Notes | |----------|-------|-------| | Target sources | ~30 | Breadth signal, not a gate (set in INIT, ~20-40 by complexity) | | Max iterations | 5 | Hard backstop — forces a futility exit so the loop always terminates | | Teammates per batch | 8 | Parallelism cap — one teammate per pending question |

Searches per teammate, URLs scraped, and follow-ups per iteration are the agent's discretion. Completion is governed by the sufficiency gate (EVALUATE), bounded by maxIterations.

<red_flags> Stop on sufficiency, not a number: synthesize once every question is answered at medium+ confidence with gaps documented — don't pad sources to hit a target, and don't quit while questions are still thin (unless maxIterations is reached). Weight Tier 1 sources higher. If stuck, narrow scope or generate sharper follow-ups. </red_flags>

research

Invocation

Context Preview

SKILL.md

research

Invocation

Context Preview

SKILL.md

Cross-Validation with Codex

Similar Skills

Cross-Validation with Codex

Similar Skills