Skill

linkedin-datasearch

Bulk LinkedIn profile enrichment via the Bright Data Datasets API. Input a list of profile URLs, get structured profile data (name, headline, current company, location, about, experience), write each profile to the tracker via Supabase MCP. Use for batch enrichment of profile URLs you already have — NOT for live discovery (use `linkedin-browser-automation` for that) and NOT for searching existing tracker data (query the tracker directly via Supabase MCP).

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/linkedin-tracker-plugin:linkedin-datasearch

User invocable

Model invocable

Inline context

Default effort

Uses dynamic context injection — preprocesses shell commands at runtime

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Batch profile enrichment for LinkedIn URLs you already have. Async: POST trigger → poll → GET results → write to `prospects`.

SKILL.md

227 lines · ~2.4k tokens

Stats

Stars: 0

Maintenance: Good

Last Commit: Apr 24, 2026

LinkedIn Datasearch (Bright Data)

Batch profile enrichment for LinkedIn URLs you already have. Async: POST trigger → poll → GET results → write to prospects.

Use when: you have a list of profile URLs and need structured data at scale (10s-1000s of profiles).

Do not use for:

Live discovery of new prospects → use linkedin-browser-automation (has access to alumni filters, 2nd-degree signals, mutuals)
Querying your existing tracker rows → talk to Supabase MCP directly

Prerequisites

Both env vars must be set on the machine running this skill:

BRIGHTDATA_API_TOKEN  # Bright Data API token (platform-level, not per-user)
BRIGHTDATA_ZONE       # Zone or dataset identifier for LinkedIn People Profiles

Verify at skill activation:

!printenv BRIGHTDATA_API_TOKEN >/dev/null 2>&1 && echo "TOKEN_SET" || echo "TOKEN_MISSING"
!printenv BRIGHTDATA_ZONE >/dev/null 2>&1 && echo "ZONE_SET" || echo "ZONE_MISSING"

If either is missing, stop and tell the user. Do not prompt them to paste the token into chat — tokens stay in the shell environment only.

Note on BRIGHTDATA_ZONE: the env var is named ZONE, but linkedin-tracker's docs/CODEMAPS/api.md describes an async dataset API (POST trigger → poll → GET results). Bright Data's Web Unlocker uses "zones"; the Datasets API uses dataset_id. If the actual integration turns out to be Web Unlocker (proxy-based: you fetch LinkedIn HTML through their infrastructure), the flow below must be replaced with a proxy request pattern. Ask the user which one they provisioned before writing the first request.

Cost and Quota Awareness

Bright Data bills per-profile. Every POST trigger costs money. Rules:

Dedupe before triggering. Query prospects via Supabase MCP for rows where profile_url is in the input batch. Exclude already-enriched rows (where enriched_at IS NOT NULL and falls within the last 30 days).
Cap batch size at 50 unless the user explicitly approves larger. Bright Data snapshots can run for many minutes on big batches.
Report estimated cost before triggering. Example: "Triggering enrichment for 27 profiles. Est. cost at $0.XX/profile = $Y.YY. Proceed?" — wait for confirmation on batches ≥ 10.
Log every snapshot_id (to the file ~/.linkedin-datasearch.log or to the tracker's notes table) so failed runs can be recovered without re-triggering.
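
The batch cap and confirmation rules above can be sketched as a small pre-flight check. This is a minimal illustration, not part of the skill: `plan_trigger`, `TriggerPlan`, and the `price_per_profile` argument are hypothetical names, and the per-profile price must come from the user's Bright Data plan since this document does not state it.

```python
from dataclasses import dataclass

MAX_BATCH = 50          # default cap from the rules above
CONFIRM_THRESHOLD = 10  # batches of this size or larger wait for confirmation


@dataclass
class TriggerPlan:
    urls: list
    estimated_cost: float
    needs_confirmation: bool
    message: str


def plan_trigger(urls, price_per_profile, approved_large_batch=False):
    """Check the batch cap and build the cost message shown before triggering."""
    if len(urls) > MAX_BATCH and not approved_large_batch:
        raise ValueError(
            f"{len(urls)} URLs exceeds the cap of {MAX_BATCH}; "
            "ask the user to approve or split the batch"
        )
    cost = len(urls) * price_per_profile
    message = (
        f"Triggering enrichment for {len(urls)} profiles. "
        f"Est. cost at ${price_per_profile:.2f}/profile = ${cost:.2f}. Proceed?"
    )
    return TriggerPlan(list(urls), cost, len(urls) >= CONFIRM_THRESHOLD, message)
```

Raising on an oversized batch, rather than silently truncating it, keeps the decision to spend more with the user.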

Configuration

Shared config file (same as linkedin-browser-automation):

!cat ~/.linkedin-automation.config.yml 2>/dev/null || echo "NOT_CONFIGURED"

If NOT_CONFIGURED, onboard per linkedin-browser-automation's first-run flow, then return here.

This skill reads:

tracker.backend (must be supabase for MCP writes; other backends work but add steps)
tracker.supabase_project_id
user.school / user.school_slug (used to mark is_alumni = true when enriched profile shows the same school)

Workflow

Step 1 — Collect profile URLs

Input sources:

User-pasted list of URLs
Output of a prior linkedin-browser-automation session (rows in prospects with about IS NULL or enriched_at IS NULL)
A file the user points at

Normalize: strip query strings, ensure https://www.linkedin.com/in/<slug>/ form. Drop duplicates.
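
A minimal sketch of the normalization step, assuming pasted URLs may lack a scheme or point at a regional subdomain. The function names are hypothetical, and the slug is left in its original case because this document does not say whether the tracker treats slugs as case-sensitive.

```python
from typing import Dict, List, Optional
from urllib.parse import urlsplit


def normalize_profile_url(raw: str) -> Optional[str]:
    """Return the https://www.linkedin.com/in/<slug>/ form, or None if not a profile URL."""
    candidate = raw.strip()
    if "://" not in candidate:
        candidate = "https://" + candidate  # tolerate pasted URLs without a scheme
    parts = urlsplit(candidate)  # query string and fragment are split off here
    host = (parts.hostname or "").lower()
    if host != "linkedin.com" and not host.endswith(".linkedin.com"):
        return None
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) < 2 or segments[0].lower() != "in":
        return None  # company pages, posts, etc. are not personal profiles
    return f"https://www.linkedin.com/in/{segments[1]}/"


def normalize_batch(raw_urls: List[str]) -> List[str]:
    """Normalize every URL, drop invalid entries and duplicates, keep first-seen order."""
    seen: Dict[str, None] = {}
    for raw in raw_urls:
        url = normalize_profile_url(raw)
        if url is not None:
            seen.setdefault(url, None)
    return list(seen)
```

Entries that fail normalization are dropped here; reporting them back to the user before triggering would be a reasonable addition.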

Step 2 — Dedupe against existing tracker rows

Via Supabase MCP:

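supabase.execute_sql
  project_id: <from config>
  query: |
    -- $profile_urls and $user_id are template placeholders: substitute the real values before running
    SELECT profile_url, enriched_at, about
    FROM public.prospects
    WHERE profile_url = ANY($profile_urls::text[])
      AND user_id = $user_id;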

Exclude:

URLs that already exist and whose enriched_at is within the last 30 days (considered fresh)
URLs that already exist and have about IS NOT NULL and full profile data (already enriched)

Keep:

New URLs
Stale enrichments (> 30 days old) if the user wants refresh (ask)
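
The keep/exclude rules can be sketched as a partition over the rows returned by the dedupe query. This is an illustrative sketch under assumptions: `partition_batch` is a hypothetical helper, `enriched_at` is assumed to arrive as a timezone-aware datetime (or None), and the separate `about IS NOT NULL` check is left out for brevity.

```python
from datetime import datetime, timedelta, timezone

FRESH_WINDOW = timedelta(days=30)  # enrichments younger than this are considered fresh


def partition_batch(input_urls, tracker_rows, now=None):
    """Group input URLs by what the tracker already knows about them.

    tracker_rows: dicts with 'profile_url' and 'enriched_at' keys.
    """
    now = now or datetime.now(timezone.utc)
    enriched_at = {row["profile_url"]: row["enriched_at"] for row in tracker_rows}
    groups = {"new": [], "never_enriched": [], "stale": [], "fresh": []}
    for url in input_urls:
        if url not in enriched_at:
            groups["new"].append(url)             # not in the tracker yet
        elif enriched_at[url] is None:
            groups["never_enriched"].append(url)  # e.g. thin rows from discovery
        elif now - enriched_at[url] > FRESH_WINDOW:
            groups["stale"].append(url)           # refresh only if the user agrees
        else:
            groups["fresh"].append(url)           # skip: enriched recently
    return groups
```

Only `new` and `never_enriched` go straight into the trigger batch; `stale` URLs wait for the user's answer on refreshing.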

Step 3 — Trigger Bright Data snapshot

POST https://api.brightdata.com/datasets/v3/trigger?dataset_id=$BRIGHTDATA_ZONE&include_errors=true
Headers:
  Authorization: Bearer $BRIGHTDATA_API_TOKEN
  Content-Type: application/json
Body:
  [
    {"url": "https://www.linkedin.com/in/example-user-1/"},
    {"url": "https://www.linkedin.com/in/example-user-2/"},
    ...
  ]

Expected response:

{"snapshot_id": "s_example_abc123"}

Save snapshot_id immediately (log to file + share with user). If the session dies, you can resume from the snapshot.
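
For reference, the trigger call above could be assembled with the Python standard library roughly as follows. This is a sketch, not the skill's implementation: the function names are hypothetical, and it assumes `BRIGHTDATA_ZONE` really holds a dataset ID (see the note in Prerequisites).

```python
import json
import os
import urllib.request
from urllib.parse import urlencode

TRIGGER_ENDPOINT = "https://api.brightdata.com/datasets/v3/trigger"


def build_trigger_request(profile_urls, dataset_id, api_token):
    """Build (but do not send) the POST request described in this step."""
    query = urlencode({"dataset_id": dataset_id, "include_errors": "true"})
    body = json.dumps([{"url": url} for url in profile_urls]).encode("utf-8")
    return urllib.request.Request(
        f"{TRIGGER_ENDPOINT}?{query}",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
    )


def trigger_snapshot(profile_urls):
    """Send the request and return the snapshot_id. This call costs money."""
    request = build_trigger_request(
        profile_urls,
        os.environ["BRIGHTDATA_ZONE"],       # assumed to be a dataset ID
        os.environ["BRIGHTDATA_API_TOKEN"],  # read from the environment, never from chat
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["snapshot_id"]
```

Keeping request construction separate from sending makes it possible to inspect the URL and body before any billable call is made.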

Step 4 — Poll for completion

GET https://api.brightdata.com/datasets/v3/progress/<snapshot_id>
Headers: Authorization: Bearer $BRIGHTDATA_API_TOKEN

Responses: {"status": "running"} | {"status": "ready"} | {"status": "failed", "error": "..."}.

Poll every 30-60 seconds. Polling more often burns quota and does not speed up the snapshot. A reasonable timeout for a 50-URL batch is 10 minutes; stop at 15 minutes in any case (see Failure Modes).
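
A polling loop along these lines might look like the sketch below. `fetch_status` is a hypothetical callable that performs the GET above and returns the parsed JSON; `sleep` and `clock` are injectable only so the loop can be exercised without waiting. The defaults follow the interval and timeout mentioned in this step.

```python
import time


class SnapshotFailed(Exception):
    """Raised when the progress endpoint reports status 'failed'."""


def wait_for_snapshot(fetch_status, interval_s=30, timeout_s=600,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll until the snapshot is ready, fails, or the timeout would be exceeded."""
    deadline = clock() + timeout_s
    while True:
        progress = fetch_status()
        status = progress.get("status")
        if status == "ready":
            return progress
        if status == "failed":
            # Do not retry automatically; surface the error body to the user.
            raise SnapshotFailed(progress.get("error", "unknown error"))
        if clock() + interval_s > deadline:
            raise TimeoutError("snapshot still running; keep the snapshot_id and check later")
        sleep(interval_s)
```

On `TimeoutError` the snapshot itself is not cancelled, so the logged snapshot_id can still be used to resume later.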

Step 5 — Download and parse

GET https://api.brightdata.com/datasets/v3/snapshot/<snapshot_id>?format=json
Headers: Authorization: Bearer $BRIGHTDATA_API_TOKEN

Response is a JSON array. Representative schema (field names may vary by Bright Data dataset version — verify against the first live response and adapt):

[
  {
    "url": "https://www.linkedin.com/in/example-user/",
    "name": "Example Name",
    "headline": "Software Engineer at Acme",
    "current_company": {"name": "Acme Corp", "title": "Software Engineer"},
    "location": "Irvine, CA",
    "about": "...",
    "experience": [...],
    "education": [...]
  }
]

Step 6 — Write to `prospects` via Supabase MCP

For each profile in the response, upsert into public.prospects. Field mapping:

Bright Data field	`prospects` column
`url`	`profile_url`
`name`	`name`
`headline`	`position` (take first segment before " at ")
`current_company.name`	`company`
`location`	`location`
`about`	`about`
`education[].school`	derive `is_alumni = true` if matches config `user.school` (case-insensitive)

Always set:

source = 'enrichment:brightdata'
enriched_at = now()
enrichment_source = 'brightdata'
user_id = <user's UUID from config or user_profiles lookup>
connection_status = 'To Review' (if inserting a new row; do not overwrite existing status on update)
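
The mapping and the fixed values can be sketched as one function that turns a Bright Data record into the values for a `prospects` row. `to_prospect_row` is a hypothetical helper, and the input field names follow the representative schema in Step 5, which should be verified against the first live response. `enriched_at` is omitted because it is assumed to be set with now() in SQL.

```python
def to_prospect_row(profile, user_id, user_school):
    """Map one Bright Data profile record to prospects column values."""
    headline = profile.get("headline") or ""
    # Take the first segment before " at "; an empty headline maps to None.
    position = headline.split(" at ", 1)[0].strip() or None
    company = (profile.get("current_company") or {}).get("name")
    schools = [(entry.get("school") or "") for entry in profile.get("education") or []]
    target = user_school.strip().lower()
    is_alumni = any(school.strip().lower() == target for school in schools)
    return {
        "user_id": user_id,
        "name": profile.get("name"),
        "company": company,
        "position": position,
        "profile_url": profile.get("url"),
        "location": profile.get("location"),
        "about": profile.get("about"),
        "source": "enrichment:brightdata",
        "enrichment_source": "brightdata",
        "connection_status": "To Review",  # used on insert only; never overwrite on update
        "is_alumni": is_alumni,
    }
```

The school comparison here is an exact case-insensitive match; abbreviations or campus suffixes would need a looser rule.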

Upsert SQL pattern:

INSERT INTO public.prospects (
  user_id, name, company, position, profile_url, location,
  about, source, enrichment_source, enriched_at,
  connection_status, is_alumni
) VALUES (
  $user_id, $name, $company, $position, $profile_url, $location,
  $about, 'enrichment:brightdata', 'brightdata', now(),
  'To Review', $is_alumni
)
ON CONFLICT (profile_url, user_id)
DO UPDATE SET
  about = EXCLUDED.about,
  location = EXCLUDED.location,
  position = EXCLUDED.position,
  enriched_at = now(),
  enrichment_source = 'brightdata',
  updated_at = now()
RETURNING id;

Note: the ON CONFLICT (profile_url, user_id) clause requires a unique constraint. Check list_tables output or query pg_constraint to confirm before relying on this pattern. If the constraint doesn't exist, either (a) add it via a migration (check with the user first — it's a schema change on production), or (b) do manual upsert with a SELECT-then-INSERT-or-UPDATE pattern.

Step 7 — Verify writes

After the batch completes:

SELECT profile_url, name, enriched_at
FROM public.prospects
WHERE enriched_at > (now() - interval '10 minutes')
  AND user_id = $user_id
  AND enrichment_source = 'brightdata';

Report: N profiles enriched. M new prospects. K updated. L skipped (already fresh).

Failure Modes

Failure	Response
`TOKEN_MISSING` / `ZONE_MISSING`	Stop. Tell user to `export BRIGHTDATA_API_TOKEN=…` and `export BRIGHTDATA_ZONE=…`.
HTTP 401 from Bright Data	Token expired or invalid. Stop.
HTTP 429	Rate-limited. Back off 60s, retry once, then stop and report.
Snapshot status `failed`	Log the error body, report to user, do not retry automatically.
Snapshot still `running` after 15 min	Stop polling. Log `snapshot_id`. Tell user to check later; the skill can recover the ID by reading the log file (or via `SELECT` on the notes table if it was logged there).
Profile URL returns no data (private/deleted)	Mark prospect with `tags = ['enrichment_failed']` so it's easy to filter. Do not retry.
Supabase RLS denies the insert	`user_id` is probably wrong. Re-query `user_profiles` for the correct UUID.
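
The HTTP 401 and 429 rows can be sketched as a thin wrapper around any Bright Data call. This is an illustration under assumptions: `send` is a hypothetical callable that performs one request and returns a `(status_code, payload)` tuple, and `BrightDataError` is a made-up exception name.

```python
import time


class BrightDataError(Exception):
    """Raised when the skill should stop and report to the user."""


def call_with_rate_limit_retry(send, backoff_s=60, sleep=time.sleep):
    """Run send(); on HTTP 429 back off once and retry, then give up."""
    status, payload = send()
    if status == 429:
        sleep(backoff_s)  # back off before the single retry
        status, payload = send()
    if status == 429:
        raise BrightDataError("rate-limited twice; stop and report")
    if status == 401:
        raise BrightDataError("token expired or invalid; stop")
    if status >= 400:
        raise BrightDataError(f"unexpected HTTP {status}")
    return payload
```

Retrying only once keeps a rate-limited trigger from turning into repeated billable attempts.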

Integration With `linkedin-browser-automation`

Typical combined flow:

Discovery (browser skill) — Find 20 alumni at target companies. Writes rows with name, profile_url, thin connection_notes, minimal hooks.
Enrichment (this skill) — Take the 20 profile URLs from step 1, run through Bright Data, populate about, position, location, experience. Marks enriched_at.
Re-draft (browser skill again, or manual) — With the richer data, regenerate 5 connection notes per prospect using specific hooks from about and experience.
Send (human) — User reviews + sends on LinkedIn.

The two skills share the same prospects table and the same user. They do not need to be invoked together — each is useful alone.
