Skill

live-research

Produces cited research briefs from live web data using Bright Data's Discover API by decomposing questions into multiple intent-ranked queries and synthesizing findings with inline citations.

data-engineering

Popularity

Stars

201

Forks

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/brightdata-plugin:live-research

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Turn one research question into a **cited, synthesized brief** by fanning out

Supporting Files

references/brief-template.mdscripts/merge_corpus.sh

SKILL.md

139 lines · ~1.8k tokens

Stats

LanguageShell

Stars201

Forks30

MaintenanceExcellent

Last CommitJun 11, 2026

Actions

View Source View Plugin View on GitHub View README

Bright Data — Live Research

Turn one research question into a cited, synthesized brief by fanning out intent-ranked Discover queries, reading the best sources, and writing up findings with inline citations. This is a workflow on top of the discover-api skill — read that for the API mechanics, modes, and parameters.

Use this when the deliverable is understanding (a report/briefing), not a link list (that's search/discover-api) and not a standing system (that's rag-pipeline).

Setup gate

Discover must be reachable. Quick check (CLI path):

command -v bdata >/dev/null 2>&1 || echo "CLI missing — see bright-data-best-practices/references/cli-setup.md"
bdata zones >/dev/null 2>&1 || echo "not authenticated — run: bdata login"

(SDK/REST paths just need BRIGHTDATA_API_TOKEN.)

The method

Step 1 — Scope the question (do this first, don't skip)

If the question is broad or ambiguous, ask 2–3 clarifying questions before spending API calls: time horizon, geography/market, depth, and what decision the research supports. A sharp scope is what makes the intent parameters good.

Step 2 — Decompose into sub-questions

Break the topic into 4–8 angles (definitions, key players, mechanisms, evidence, counter-evidence, recent developments, risks). Each angle becomes one Discover call with its own tailored intent. This beats one broad query — num_results is capped at 20, so coverage comes from breadth of queries, not one big call.

Step 3 — Run Discover per angle (in parallel), with content

# one call per angle; --include-content so you read sources in the same pass
bdata discover "stablecoin regulation 2026" \
  --intent "recent regulatory actions and proposed legislation, primary sources" \
  --include-content --num-results 15 -o angle_regulation.json &

bdata discover "stablecoin reserve transparency" \
  --intent "audits, attestations, reserve composition disclosures" \
  --include-content --num-results 15 -o angle_reserves.json &
wait

For maximum coverage on a hard topic, use the raw REST flow with "mode":"deep" (see discover-api) — deep is exhaustive but slower and REST-only.

Step 4 — Merge, dedup, rank, quality-gate

Each bdata discover -o file is an object {status, results: [...]} — flatten .results[] from every file before merging.
Dedup by URL (normalize: strip query/fragment, lowercase host).
Sort by relevance_score desc.
Quality-gate the content (a high relevance_score can still be a 404 stub or a nav-only page): drop rows where content is null, matches a block-page signature, is shorter than ~200 chars, or looks like "not found".

# VERIFIED: this is the correct merge. `jq -s 'add | unique_by(.link)'` does NOT work —
# each file is {results:[...]}, so you must flatten .results[] first.
jq -s '
  [ .[].results[] ]                                   # flatten results from all files
  | unique_by(.link)                                  # dedup by URL
  | map(select(
      .content != null
      and (.content | length) > 200                   # drop empty / 404 stubs
      and ((.content | test("just a moment|captcha|access denied|cf-browser|page not found|post not found"; "i")) | not)
    ))
  | sort_by(-.relevance_score)
' angle_*.json > corpus.json
echo "kept $(jq length corpus.json) sources"

Or just run the helper (same logic, tested): scripts/merge_corpus.sh -o corpus.json angle_*.json (-m <n> sets the min content length). Copying the jq by hand is error-prone — prefer the script.

Note: with --include-content, the leading part of content is usually page nav/boilerplate (menus, logos). When extracting claims (Step 5), skip past the chrome to the article body.

Step 5 — Read & extract claims

From each kept source's content, pull the specific claims, numbers, dates, and quotes that answer a sub-question. Track which URL each claim came from — you'll cite it.

Step 6 — Synthesize the brief

Write the structured brief (template in references/brief-template.md). Every non-obvious claim gets an inline citation [n] mapping to a numbered source list. Note disagreements between sources rather than averaging them away.

Step 7 — Verify before delivering

Every claim traceable to a source in the list? (no orphan claims)
Conflicting sources surfaced, not hidden?
Gaps named explicitly ("no primary source found for X")?
Recency stated — when was this collected, how fresh are the sources?

Quality bar

Breadth via queries, depth via content. Many sharp intents > one vague query.
Cite everything. A research brief with uncited claims is an opinion. Map each [n] to a real URL from the corpus.
Prefer primary sources. Rank filings/docs/announcements over aggregators when relevance_score is comparable.
Surface dissent. If sources conflict, say so and attribute both sides.
Name the gaps. "Couldn't find …" is a finding, not a failure to hide.

Red flags

One broad Discover call and calling it "live research" — decompose into angles.
Writing claims from memory/training data instead of from retrieved content — every claim must come from the corpus.
Fabricating citations or relevance_scores — if a call failed, report the gap.
Ignoring --include-content and just listing links — that's discover-api, not research.
Averaging away contradictions between sources.
Forgetting to dedup — the same article via 3 aggregators inflates apparent consensus.

References

references/brief-template.md — the output structure (exec summary, findings per sub-question, contradictions, gaps, numbered sources) and a worked citation example.
scripts/merge_corpus.sh — Step 4 as a tested one-liner: flatten .results[] across angle files, dedup by URL, quality-gate (null/short/404/block-page), sort by relevance_score.

Related skills

discover-api — the underlying API (modes, params, trigger/poll). Read first.
rag-pipeline — when the user wants a reusable retrieval system, not a one-time report.
competitive-intel — competitor-focused research (pricing, hiring, positioning).
brand-listening — social/sentiment research across Reddit/X/TikTok/etc.

live-research

Popularity

Invocation

Context Preview

Supporting Files

SKILL.md

live-research

Popularity

Invocation

Context Preview

Supporting Files

SKILL.md

Bright Data — Live Research

Setup gate

The method

Step 1 — Scope the question (do this first, don't skip)

Step 2 — Decompose into sub-questions

Step 3 — Run Discover per angle (in parallel), with content

Step 4 — Merge, dedup, rank, quality-gate

Step 5 — Read & extract claims

Step 6 — Synthesize the brief

Step 7 — Verify before delivering

Quality bar

Red flags

References

Related skills

Similar Skills

Bright Data — Live Research

Setup gate

The method

Step 1 — Scope the question (do this first, don't skip)

Step 2 — Decompose into sub-questions

Step 3 — Run Discover per angle (in parallel), with content

Step 4 — Merge, dedup, rank, quality-gate

Step 5 — Read & extract claims

Step 6 — Synthesize the brief

Step 7 — Verify before delivering

Quality bar

Red flags

References

Related skills

Similar Skills