Multi-Track Parallel Research Pipeline

Architecture pattern for building autonomous research agents using Claude Code CLI. Each agent runs N parallel research tracks, cross-pollinates findings, synthesizes insights, and produces a structured report.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/autonomous-research-pipeline:research-pipeline

User invocable

Model invocable

Inline context

Default effort

Uses dynamic context injection — preprocesses shell commands at runtime

Supporting Files

reference/architecture.md
reference/example_queries.md

SKILL.md

295 lines · ~2.6k tokens

Stats

Language: Python

Parent stars: 0

Maintenance: Fair

Last Commit: Mar 12, 2026

Multi-Track Parallel Research Pipeline

Why Multi-Track Parallel Research

Single-track research produces narrow, confirmation-biased results. Multi-track parallel research produces broader, more structurally coherent intelligence because:

Source diversity: Each track searches different source types (news, academic papers, forums, social media), preventing single-source bias.
Query diversity: Each track uses different query strategies, catching signals that a single query approach would miss.
Failure isolation: If one track fails (API timeout, bad results), others continue. Synthesis works with whatever data is available.
Time efficiency: N tracks running in parallel finish in roughly the wall-clock time of the slowest track, not the sum of all tracks.
Cross-pollination: The synthesis step finds connections BETWEEN tracks that no single track would discover alone. This is where the highest-value insights emerge.

Architecture Overview

Phase 1: Parallel Research (N tracks, each is a separate Claude Code CLI invocation)
    Track A ──→ track_a.json
    Track B ──→ track_b.json
    Track C ──→ track_c.json
    Track D ──→ track_d.json
         (all run simultaneously via bash & + wait)

Phase 2: Cross-Pollination + Synthesis (sequential, reads all track outputs)
    Read all track JSONs ──→ Find connections ──→ synthesis.json

Phase 3: Report Generation (sequential, reads synthesis)
    Read synthesis.json ──→ Write structured report ──→ report.md

Phase 4: Notification + Distribution (optional)
    Send summary via Telegram / email / webhook

How to Define Tracks

Each track is a single Claude Code CLI invocation with:

A specific research focus (topic, domain, source type)
Its own query strategy (different search terms, different source priorities)
Its own output file (JSON with structured findings)
A defined set of allowed tools (WebSearch, WebFetch, Read, Write, etc.)

Track Definition Template

Each track's prompt should follow this pattern:

claude -p "Read {project}/.claude/CLAUDE.md and {project}/.claude/rules/source_registry.md. \
Then execute the research skill at {project}/.claude/skills/research-{track-name}/SKILL.md \
for date $DATE. Save output to {project}/.tmp/$DATE/{track_name}.json" \
    --max-turns 15 \
    --allowedTools "WebSearch,WebFetch,Read,Write,Glob,Grep,Bash"
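When the same template is filled in for several tracks, it can help to generate the command from one place. The following is a minimal Python sketch under stated assumptions: `build_track_command` and `DEFAULT_TOOLS` are hypothetical names, not part of the skill, and the only CLI flags used are the ones already shown in the template above.

```python
# Tool list copied from the template above.
DEFAULT_TOOLS = "WebSearch,WebFetch,Read,Write,Glob,Grep,Bash"


def build_track_command(project: str, track_name: str, date: str,
                        max_turns: int = 15,
                        allowed_tools: str = DEFAULT_TOOLS) -> list[str]:
    """Return the argv list for one track (hypothetical helper)."""
    # Skill directories use hyphens and output files use underscores,
    # mirroring {track-name} and {track_name} in the template.
    skill = f"{project}/.claude/skills/research-{track_name}/SKILL.md"
    output = f"{project}/.tmp/{date}/{track_name.replace('-', '_')}.json"
    prompt = (
        f"Read {project}/.claude/CLAUDE.md and "
        f"{project}/.claude/rules/source_registry.md. "
        f"Then execute the research skill at {skill} for date {date}. "
        f"Save output to {output}"
    )
    # An argv list avoids shell quoting problems inside the prompt text.
    return ["claude", "-p", prompt,
            "--max-turns", str(max_turns),
            "--allowedTools", allowed_tools]
```

Returning a list rather than a single string means the prompt can contain quotes or `$` characters without extra escaping when it is later passed to a process launcher.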

Track Design Principles

Different query strategies per track: Track A searches official sources, Track B searches community forums, Track C searches academic papers, Track D searches social media. Overlap is waste.
Different languages per track: For international topics, dedicate tracks to different language sources (EN, JP, etc.).
Different source types per track: News, government data, academic, social media, industry reports. Each reveals different signal types.
Consistent output schema: All tracks output the same JSON structure so the synthesis step can process them uniformly.

Example Output Schema

{
  "track": "community-pain-points",
  "date": "20260312",
  "query_count": 8,
  "findings": [
    {
      "title": "Finding title",
      "summary": "2-3 sentence summary of the finding",
      "source_url": "https://...",
      "source_type": "reddit|arxiv|news|government|social",
      "relevance_score": 8,
      "raw_quotes": ["Direct quote from source"],
      "tags": ["tag1", "tag2"]
    }
  ],
  "meta": {
    "search_queries_used": ["query1", "query2"],
    "sources_consulted": 12,
    "execution_time_minutes": 4
  }
}
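Because synthesis depends on every track following this structure, a quick check before Phase 2 can catch malformed files early. The sketch below is one possible validator, not something the skill ships: the function name and the type expectations are assumptions inferred from the example above, and the `meta` block is only checked for being an object.

```python
# Expected types, inferred from the example schema (an assumption).
REQUIRED_TOP = {"track": str, "date": str, "query_count": int,
                "findings": list, "meta": dict}
REQUIRED_FINDING = {"title": str, "summary": str, "source_url": str,
                    "source_type": str, "relevance_score": int,
                    "raw_quotes": list, "tags": list}


def validate_track_output(doc: dict) -> list[str]:
    """Return a list of problems; an empty list means the document passed."""
    problems = []
    for key, expected in REQUIRED_TOP.items():
        if not isinstance(doc.get(key), expected):
            problems.append(f"{key}: expected {expected.__name__}")
    findings = doc.get("findings")
    for i, finding in enumerate(findings if isinstance(findings, list) else []):
        if not isinstance(finding, dict):
            problems.append(f"findings[{i}]: expected dict")
            continue
        for key, expected in REQUIRED_FINDING.items():
            if not isinstance(finding.get(key), expected):
                problems.append(f"findings[{i}].{key}: expected {expected.__name__}")
    return problems
```

A track file that fails this check could be skipped with a warning rather than aborting the run, which keeps the failure-isolation property described earlier.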

How to Run Tracks in Parallel

Use bash background processes with PID tracking and wait:

# Track A
claude -p "..." --max-turns 15 --allowedTools "..." \
    2>&1 | tee -a "$LOG_DIR/track_a.log" &
PID_A=$!

# Track B
claude -p "..." --max-turns 15 --allowedTools "..." \
    2>&1 | tee -a "$LOG_DIR/track_b.log" &
PID_B=$!

# Track C
claude -p "..." --max-turns 15 --allowedTools "..." \
    2>&1 | tee -a "$LOG_DIR/track_c.log" &
PID_C=$!

# Track D
claude -p "..." --max-turns 15 --allowedTools "..." \
    2>&1 | tee -a "$LOG_DIR/track_d.log" &
PID_D=$!

# Wait for all tracks (|| log ... so one failed track doesn't abort the script under set -e)
wait $PID_A || log "WARNING: Track A failed"
wait $PID_B || log "WARNING: Track B failed"
wait $PID_C || log "WARNING: Track C failed"
wait $PID_D || log "WARNING: Track D failed"

Key points:

& backgrounds each invocation
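$! captures the PID of the most recently backgrounded job. For a pipeline that is the PID of its last command (tee here), so without set -o pipefail the status that wait reports is tee's, not claude's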
wait $PID blocks until that specific process completes
|| log "WARNING: ..." handles failures gracefully without killing the pipeline (this assumes the script defines a log helper function)
tee -a logs output while still streaming to stdout
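The same launch-then-wait pattern can be expressed in Python if the orchestration lives there instead of in bash. This is a rough equivalent under stated assumptions: `run_tracks` is a hypothetical helper, each command is an argv list, and output goes only to the per-track log file (there is no `tee`-style echo to stdout).

```python
import subprocess
import sys
from pathlib import Path


def run_tracks(commands: dict[str, list[str]], log_dir: Path) -> dict[str, int]:
    """Start every track, then wait for each; return exit codes by track name."""
    log_dir.mkdir(parents=True, exist_ok=True)
    running = {}
    for name, argv in commands.items():
        log_file = open(log_dir / f"{name}.log", "a")
        # Popen returns immediately, like a trailing & in bash.
        proc = subprocess.Popen(argv, stdout=log_file, stderr=subprocess.STDOUT)
        running[name] = (proc, log_file)

    exit_codes = {}
    for name, (proc, log_file) in running.items():
        exit_codes[name] = proc.wait()  # blocks like `wait $PID`
        log_file.close()
        if exit_codes[name] != 0:
            # A failed track is reported but does not stop the others.
            print(f"WARNING: {name} failed", file=sys.stderr)
    return exit_codes
```

Here the exit code comes directly from each child process, so there is no pipeline in between to mask a failure. If an executable cannot be found at all, `Popen` raises an exception, which a real script would need to handle.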

Cross-Pollination Pattern

The synthesis step is the most valuable phase. It reads ALL track outputs and finds connections between them.

Synthesis Prompt Template

claude -p "Read {project}/.claude/CLAUDE.md and relevant rules. \
Read all research track outputs in {project}/.tmp/$DATE/: \
track_a.json, track_b.json, track_c.json, track_d.json. \
Cross-pollinate findings: identify connections between tracks, \
contradictions worth investigating, and emergent patterns that \
no single track reveals alone. \
Check dedup registry at {project}/.tmp/registry/history.json. \
Write synthesis.json and report.md to {project}/.tmp/$DATE/." \
    --max-turns 20 \
    --allowedTools "WebSearch,WebFetch,Read,Write,Edit,Glob,Grep,Bash"

What Cross-Pollination Looks For

Convergence: Multiple tracks independently discovered the same signal from different sources. High-confidence finding.
Contradiction: Track A found X, Track B found the opposite. Worth investigating deeper.
Gap-bridging: Track A found a problem, Track C found a solution from a different domain. Neither track would connect them alone.
Pattern emergence: Across all tracks, a meta-pattern appears (e.g., "three separate industries are all hitting the same regulatory constraint").
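In the pipeline described here, the model itself finds these connections during synthesis. As a crude illustration of what "convergence" means mechanically, the following sketch flags tags that appear in more than one track's findings. It is an assumed pre-pass heuristic, not the skill's method, and `convergence_candidates` is a hypothetical name.

```python
from collections import defaultdict


def convergence_candidates(track_outputs: list[dict],
                           min_tracks: int = 2) -> dict[str, list[str]]:
    """Map each tag to the tracks that used it, keeping tags shared by
    at least min_tracks different tracks."""
    tracks_by_tag = defaultdict(set)
    for output in track_outputs:
        for finding in output.get("findings", []):
            for tag in finding.get("tags", []):
                # Normalize case and whitespace so "AI " and "ai" match.
                tracks_by_tag[tag.strip().lower()].add(output["track"])
    return {tag: sorted(tracks)
            for tag, tracks in tracks_by_tag.items()
            if len(tracks) >= min_tracks}
```

Shared tags are only a hint. Contradictions and gap-bridging need the actual content of the findings, which is why that work is left to the synthesis prompt.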

Synthesis Output Schema

{
  "date": "20260312",
  "cross_pollinations": [
    {
      "type": "convergence|contradiction|gap_bridge|pattern",
      "tracks_involved": ["track_a", "track_c"],
      "insight": "Description of the cross-track connection",
      "supporting_findings": ["finding_id_1", "finding_id_2"],
      "confidence": "high|medium|low"
    }
  ],
  "key_insights": [
    {
      "title": "Insight title",
      "analysis": "Detailed structural analysis",
      "implications": "What this means going forward"
    }
  ]
}
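One use of this structure is producing the short summary that the notification phase sends. The sketch below shows one way to turn a synthesis document into a few plain-text lines, ordered by confidence. The function name and the output layout are assumptions for illustration only.

```python
def summarize_synthesis(synthesis: dict, max_items: int = 3) -> str:
    """Build a short plain-text summary from a synthesis document."""
    order = {"high": 0, "medium": 1, "low": 2}
    # Highest-confidence cross-pollinations first; unknown values sort last.
    items = sorted(synthesis.get("cross_pollinations", []),
                   key=lambda c: order.get(c.get("confidence"), 3))
    lines = [f"Research synthesis {synthesis.get('date', '')}".rstrip()]
    for item in items[:max_items]:
        tracks = ", ".join(item.get("tracks_involved", []))
        lines.append(f"[{item.get('confidence', '?')}] {item.get('type', '?')} "
                     f"({tracks}): {item.get('insight', '')}")
    for insight in synthesis.get("key_insights", [])[:max_items]:
        lines.append(f"* {insight.get('title', '')}")
    return "\n".join(lines)
```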

Dedup Registry Pattern

Prevent repeating the same findings across days. Maintain a JSON registry with TTL (time-to-live).

Registry Structure

{
  "seen_topics": {
    "topic-fingerprint-hash": {
      "title": "Topic title",
      "first_seen": "20260301",
      "last_seen": "20260312",
      "count": 4
    }
  },
  "ttl_days": 14
}

How Dedup Works

Before synthesis, load the registry.
For each finding, generate a fingerprint (title keywords, normalized).
If the fingerprint exists in the registry AND was seen within the TTL window, flag as "seen before" and deprioritize (don't exclude entirely, as recurring signals may indicate structural trends).
If new, add to registry.
Prune entries older than TTL after each run.
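The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the fingerprint is a hash of the sorted, lowercased title keywords (one possible normalization, not necessarily the one used in production), a repeat sighting updates `last_seen` and `count`, and all function names are hypothetical.

```python
import hashlib
import re
from datetime import datetime, timedelta

DATE_FMT = "%Y%m%d"  # matches the "20260312" style used in the registry


def fingerprint(title: str) -> str:
    """Hash the sorted, lowercased title keywords."""
    words = sorted(set(re.findall(r"[a-z0-9]+", title.lower())))
    return hashlib.sha256(" ".join(words).encode("utf-8")).hexdigest()[:16]


def check_and_update(registry: dict, title: str, today: str) -> bool:
    """Record the title; return True if it was already seen within the TTL."""
    seen = registry.setdefault("seen_topics", {})
    ttl = timedelta(days=registry.get("ttl_days", 14))
    now = datetime.strptime(today, DATE_FMT)
    key = fingerprint(title)
    entry = seen.get(key)
    if entry is None:
        seen[key] = {"title": title, "first_seen": today,
                     "last_seen": today, "count": 1}
        return False
    seen_recently = now - datetime.strptime(entry["last_seen"], DATE_FMT) <= ttl
    entry["last_seen"] = today
    entry["count"] += 1
    return seen_recently


def prune(registry: dict, today: str) -> None:
    """Drop entries whose last_seen is older than the TTL window."""
    ttl = timedelta(days=registry.get("ttl_days", 14))
    now = datetime.strptime(today, DATE_FMT)
    seen = registry.get("seen_topics", {})
    stale = [key for key, entry in seen.items()
             if now - datetime.strptime(entry["last_seen"], DATE_FMT) > ttl]
    for key in stale:
        del seen[key]
```

A `True` result from `check_and_update` is meant as a signal to deprioritize a finding, not to drop it, in line with the note that recurring signals may indicate structural trends.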

Implementation Notes

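The registry file is persistent: exclude it from any .tmp/ cleanup so it is never deleted along with the dated output directories.
Store it at {project}/.tmp/registry/history.json (the path the synthesis prompt reads) or similar, and use the same path everywhere.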
TTL is configurable. 14 days is a good default for daily research.
The synthesis prompt should explicitly reference the registry: "Check dedup registry and deprioritize findings that appeared in the last 14 days unless they show significant evolution."

Example 4-Track Setup: Industry Research

Track A: Community Pain Points

Sources: Reddit, Stack Overflow, industry forums, Twitter/X
Query strategy: Search for complaints, frustrations, unmet needs in a target industry
Value: Reveals real problems people face (demand signals)

Track B: Academic & Technical Research

Sources: arXiv, Google Scholar, IEEE, government research reports
Query strategy: Search for recent papers, breakthroughs, emerging techniques
Value: Reveals supply-side innovation that could address pain points

Track C: Business Patterns & Startups

Sources: Crunchbase, TechCrunch, Product Hunt, business media
Query strategy: Search for new companies, funding rounds, product launches in the space
Value: Reveals who is already working on solutions and what gaps remain

Track D: Macro Context & Trends

Sources: Government statistics, think tank reports, macro economic analysis
Query strategy: Search for regulatory changes, demographic shifts, geopolitical factors
Value: Reveals structural forces that shape the opportunity landscape

Wiring Into a Cron Job

The research pipeline runs as a shell script triggered by cron. See the daily-cron-agent plugin for the complete cron template.

Basic cron entry:

# Industry research pipeline, daily at 04:00 server time (JST in this setup)
0 4 * * * /home/user/project/automation/run_nightly.sh >> /home/user/project/.tmp/cron.log 2>&1

Pipeline Shell Script Structure

#!/bin/bash
set -euo pipefail

# 1. Resolve paths
# 2. Source environment
# 3. Create date-based log directory
# 4. Phase 1: Run N tracks in parallel
# 5. Phases 2-3: Run synthesis and report generation (sequential)
# 6. Phase 4: Send notification (optional)

See daily-cron-agent/reference/cron_template.sh for a complete, production-ready template.

Scaling Guidelines

Tracks	Max Turns per Track	Typical Wall Clock	API Cost Estimate
2	10	3-5 min	Low
4	15	5-10 min	Medium
6	15	8-15 min	Medium-high
8+	10	10-20 min	High

More tracks with fewer turns beats fewer tracks with many turns (breadth over depth per track).
The synthesis step should get more turns than individual tracks (it does the hardest work).
Keep --max-turns conservative to avoid runaway costs. 10-15 per track, 20-35 for synthesis.
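To make the turn limits concrete, the worst-case number of turns in one run is the per-track limit times the number of tracks, plus the synthesis limit. The small helper below (a hypothetical name) works through that arithmetic. Turn count is only a rough proxy for cost, since actual spend depends on tokens per turn.

```python
def max_total_turns(tracks: int, turns_per_track: int, synthesis_turns: int) -> int:
    """Upper bound on turns for one pipeline run (all limits fully used)."""
    return tracks * turns_per_track + synthesis_turns


# 4 tracks at 15 turns each, plus a 20-turn synthesis step.
print(max_total_turns(4, 15, 20))  # prints 80
```

By the same formula, 8 tracks at 10 turns with a 20-turn synthesis step are bounded at 100 turns.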

Architecture pattern based on production pipelines running daily across 10+ autonomous research agents.
