From creatorwood-sushidata-gtm
Discover niche first-party signals that differentiate Closed Won vs Closed Lost accounts for ICP analysis. Use when the user provides won/lost customer domain lists and wants differential signals (website content, job listings, tech stack, maturity markers) to build account scoring models and prospecting criteria. Triggers: ICP analysis, niche signals, won vs lost analysis, differential signals, signal discovery, ICP signal report, account scoring signals, lead scoring, first-party signals, buyer signals.
Slash command: /creatorwood-sushidata-gtm:niche-signal-discovery
Discover differential signals between Closed Won and Closed Lost accounts by extracting multi-page website content and job listings, then computing Laplace-smoothed lift scores to identify what distinguishes buyers from non-buyers.
Endpoints and tools: /swarm/deploy/ (BASE URL in sushi-research SKILL.md); hunter_domain_search, hunter_email_finder, hunter_email_verify.
0. Discover target company (what they sell, who they sell to) via Sushidata swarm
0.5. Discover ecosystem (competitors, tech stack, buyer personas) via Sushidata swarm
1. Prepare input CSV (deduplicate within won/lost groups)
1.0.5 Build "do not re-contact" index from user's existing list (scripts/dedupe_utils.py)
1.5. Generate vertical-specific configs (keywords, tools, job roles)
2. Multi-page website + job extraction (WebFetch + Apify)
3. Quality gate — verify file completeness + coverage (>80%)
3.5. Review configs against enriched data
4. Differential analysis (scripts/analyze_signals.py)
5. Generate report — every top signal must include cited evidence
6. Signal interpretation review
7. Top 10 net-new prospects [REQUIRED] + contacts/emails [optional, costs credits]
Step 7 is required. A signal report without 10 actionable companies forces the reader to do their own prospecting pass — exactly the expensive thing they wanted to skip. Contacts/emails are optional only because they cost extra; always offer them.
Highest → lowest confidence:
When website signals fail: For B2B back-office tools (AR, billing, compliance), buyers don't publish their pain on marketing pages. Prioritize jobs + tech stack + firmographics for these verticals.
CRM fields populated by AE activity — catalyst note count, OCR-derived counts, MEDDPICC picklists, any "did the AE do X on this opp" field — correlate with win-rate as engagement artifacts, not causal signals. They get filled in after the AE decides an opp is worth working. Never use them as scoring inputs.
Rule of thumb: every scoring input must be observable BEFORE the AE touches the account. Read references/scoring-pitfalls.md for the full list.
Do this FIRST. The entire pipeline adapts based on this discovery; skipping it produces generic/irrelevant signals.
Deploy a Sushidata research swarm:
POST /swarm/deploy/
{
"query": "Research {{company-domain}}. Summarize: (1) what the company sells, (2) who they sell to and what buyer personas, (3) what makes them different from competitors, (4) example customers. Be specific — avoid generic descriptions.",
"swarmSize": 3
}
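The swarm request above can be built with only the Python standard library. This is a minimal sketch, not the Sushidata client. BASE_URL is a hypothetical placeholder (the real base URL is in sushi-research SKILL.md), acme.com is a made-up domain, and any authentication header the API requires is not shown. The function builds the request but does not send it.

```python
import json
import urllib.request

# Hypothetical placeholder; the real BASE URL is documented in sushi-research SKILL.md.
BASE_URL = "https://sushidata.example"


def build_swarm_request(query: str, swarm_size: int = 3) -> urllib.request.Request:
    """Build (but do not send) a POST /swarm/deploy/ request with the Step 0 body."""
    body = json.dumps({"query": query, "swarmSize": swarm_size}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/swarm/deploy/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


domain = "acme.com"  # hypothetical target company
query = (
    f"Research {domain}. Summarize: (1) what the company sells, (2) who they sell to "
    "and what buyer personas, (3) what makes them different from competitors, "
    "(4) example customers. Be specific and avoid generic descriptions."
)
req = build_swarm_request(query)
# urllib.request.urlopen(req) would send it; auth (if required) is not handled here.
```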
Document: (1) product category, (2) target buyer persona, (3) key differentiation, (4) example customers.
Three Sushidata swarms (swarmSize 3 each) or WebSearch queries:
"{{product category}} software alternatives competitors site:g2.com OR site:capterra.com" → 3-5 names
"{{buyer persona}} software stack tools" → 10-15 tools by category
"{{buyer persona}} job titles seniority VP director manager" → 10-15 title variations
These feed Step 1.5 config generation.
domain,status
customer1.com,won
non-customer1.com,lost
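Deduplicate within each won/lost group, keeping one row per domain. If a domain appears in BOTH won and lost, remove ALL of its rows before enrichment:
import csv
from collections import defaultdict
with open("input.csv", newline="") as f: rows = list(csv.DictReader(f))  # hypothetical path; columns: domain,status
statuses = defaultdict(set)  # normalized domain -> statuses seen
for r in rows: statuses[r["domain"].strip().lower()].add(r["status"])
cross_group = {d for d, s in statuses.items() if len(s) > 1}  # in BOTH won and lost
# Drop every row for cross-group domains, then keep one row per remaining domain.
rows = list({r["domain"].strip().lower(): r for r in rows if r["domain"].strip().lower() not in cross_group}.values())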
Before any prospects ship in Step 7, dedupe candidates against whatever "already known" list the user provides. Always ask explicitly; if the user has no list, note it as a caveat in the final report.
Order: apex domain first, fuzzy company name as fallback. Use the shipped helper:
python3 scripts/dedupe_utils.py --selftest # one-time sanity check
python3 scripts/dedupe_utils.py \
--existing customers.csv --candidates prospects_raw.csv \
--out-actionable prospects_actionable.csv --out-matched already_known.csv
Don't silently drop matches — categorize them: Net-new / Account-only / Re-engage / Active-open / Current-customer.
Read references/dedupe.md for the failure modes (raw-string match missing amsynergy.nikon.com → nikon.com cost 24 of 50 prospects in one run).
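The following shows why apex matching catches amsynergy.nikon.com while a raw-string comparison misses it. It is a hypothetical stand-in, not the shipped extract_apex(). Its multi-label suffix set is deliberately tiny; real code should call extract_apex() from scripts/dedupe_utils.py or consult the full Public Suffix List. It also only splits candidates into net-new versus matched and does not assign the five categories listed above.

```python
# Hypothetical, deliberately small set of multi-label public suffixes.
MULTI_LABEL_SUFFIXES = {"co.uk", "com.au", "co.jp", "com.br"}


def apex(host: str) -> str:
    """Reduce a host or URL to its registrable (apex) domain, approximately."""
    host = host.strip().lower().rstrip(".")
    # Tolerate full URLs, paths, and ports that show up in CRM exports.
    host = host.split("://", 1)[-1].split("/", 1)[0].split(":", 1)[0]
    labels = host.split(".")
    if len(labels) >= 3 and ".".join(labels[-2:]) in MULTI_LABEL_SUFFIXES:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])


def split_candidates(candidates, existing):
    """Return (net_new, matched) by comparing apex domains, never raw strings."""
    known = {apex(d) for d in existing}
    net_new = [c for c in candidates if apex(c) not in known]
    matched = [c for c in candidates if apex(c) in known]
    return net_new, matched
```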
Create three JSON files in output/{{company}}/:
{{company}}-keywords.json # product category, pain language, competitor names, maturity terms
{{company}}-tools.json # niche SaaS tools by category
{{company}}-job-roles.json # buyer persona job titles
Read references/keyword-catalog.md for the JSON schema, generation patterns, and multi-vertical examples.
Never scrape just the homepage. Discover multiple relevant pages first, then extract content.
Step 2a — Discover pages (WebSearch):
WebSearch: site:{{domain}} product OR features OR integrations OR customers OR security OR pricing OR careers OR about
Collect the top 5-8 URLs. Adapt by vertical: add compliance OR audit for back-office, documentation OR api for developer tools.
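Here is one way to build the Step 2a query and trim the results to the 5-8 URL budget. The vertical labels used as dictionary keys are hypothetical. Normalizing away `#fragments` and trailing slashes is an added assumption, made so the same page is not fetched twice.

```python
from typing import Optional

BASE_TERMS = ["product", "features", "integrations", "customers",
              "security", "pricing", "careers", "about"]
# Vertical add-ons named in Step 2a; the dict keys are hypothetical labels.
VERTICAL_TERMS = {
    "back-office": ["compliance", "audit"],
    "developer-tools": ["documentation", "api"],
}


def discovery_query(domain: str, vertical: Optional[str] = None) -> str:
    """Build the site: query, widened with vertical-specific terms."""
    terms = BASE_TERMS + VERTICAL_TERMS.get(vertical or "", [])
    return f"site:{domain} " + " OR ".join(terms)


def pick_urls(urls: list, limit: int = 8) -> list:
    """Keep the first `limit` distinct URLs, ignoring #fragments and trailing slashes."""
    seen = set()
    picked = []
    for url in urls:
        key = url.split("#", 1)[0].rstrip("/")
        if key in seen:
            continue
        seen.add(key)
        picked.append(url)
        if len(picked) == limit:
            break
    return picked
```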
Step 2b — Scrape pages (WebFetch):
For each discovered URL, call WebFetch to extract the page content. If WebFetch returns a shell with no real content (JavaScript-rendered page), use Browser Rendering. If a dedicated web crawler actor is required, follow the missing-actor feedback workflow in skills/sushi-research/provider-playbooks/apify.md.
Aggregate all page text into a per-domain JSON record stored in the website CSV column:
{
"data": {
"results": [
{"url": "https://example.com/pricing", "title": "Pricing", "text": "<page text>"},
{"url": "https://example.com/security", "title": "Security", "text": "<page text>"}
]
}
}
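The record format above can be produced and checked with two small helpers. They assume each fetched page arrives as a `(url, title, text)` tuple, which is a hypothetical interface. Pages whose text is blank (for example, a JavaScript shell that was never re-rendered) are skipped so they do not inflate page counts. The second helper reports the page count and character depth that Step 3 checks.

```python
import json


def website_record(pages):
    """pages: iterable of (url, title, text) tuples from WebFetch or Browser Rendering."""
    results = [
        {"url": url, "title": title, "text": text}
        for url, title, text in pages
        if text and text.strip()  # drop empty shells
    ]
    return json.dumps({"data": {"results": results}}, ensure_ascii=False)


def content_depth(cell: str) -> tuple:
    """Return (page count, total characters) for one website cell."""
    results = json.loads(cell)["data"]["results"]
    return len(results), sum(len(r["text"]) for r in results)
```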
Step 2c — Job listings:
Use WebSearch, WebFetch, Browser Rendering, and focused Sushidata swarms to collect job listings from company career pages and public job boards. Sushidata does not currently expose a general job-listings Apify actor. If a dedicated jobs actor is required, follow the missing-actor feedback workflow in skills/sushi-research/provider-playbooks/apify.md.
Store results in the jobs CSV column as:
{"result": {"listings": [{"title": "...", "description": "...", "url": "..."}]}}
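Job listings gathered from several sources can be written into the shape above with a helper like the one below. Dropping duplicates by URL, or by title plus description when no URL exists, is an added assumption: the same posting often appears on both a careers page and a job board, and counting it twice would distort role prevalence.

```python
import json


def jobs_cell(listings):
    """Wrap listings as {"result": {"listings": [...]}} after dropping duplicates."""
    seen, unique = set(), []
    for job in listings:
        key = job.get("url") or (job.get("title"), job.get("description"))
        if key in seen:
            continue
        seen.add(key)
        unique.append({
            "title": job.get("title", ""),
            "description": job.get("description", ""),
            "url": job.get("url", ""),
        })
    return json.dumps({"result": {"listings": unique}}, ensure_ascii=False)


def listing_count(cell: str) -> int:
    """Number of listings in a jobs cell; an empty cell counts as zero."""
    if not cell or not cell.strip():
        return 0
    return len(json.loads(cell)["result"]["listings"])
```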
Total estimated cost: ~$0.50–1.00/company. Get user approval. Example: "60 companies × ~$0.75 = ~$45."
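The approval estimate is simple arithmetic. The sketch below uses the ~$0.50-1.00 per-company range quoted above and assumes a $0.75 midpoint, which reproduces the "60 × ~$0.75 = ~$45" example.

```python
def estimate_cost(n_companies: int, low: float = 0.50, mid: float = 0.75, high: float = 1.00):
    """Return (low, mid, high) total enrichment cost in dollars."""
    return n_companies * low, n_companies * mid, n_companies * high


n = 60
lo, mid, hi = estimate_cost(n)
print(f"{n} companies × ~$0.75 = ~${mid:.0f} (range ${lo:.0f} to ${hi:.0f})")
```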
Verify CSV completeness before running the analysis script:
INPUT_ROWS=$(wc -l < output/{{company}}-icp-input.csv)
OUTPUT_ROWS=$(wc -l < output/{{company}}-enriched.csv)
echo "Input: $INPUT_ROWS, Output: $OUTPUT_ROWS" # should match; wc -l over-counts if quoted cells contain newlines
Then spot-check: won rows have job data, website coverage >80%, avg content depth 6-8 pages / 12-20K chars per company.
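`wc -l` counts physical lines, so an enriched CSV whose website cells contain newlines will look longer than the input even when no rows were added. The sketch below counts records with the `csv` parser instead and computes the spot-check metrics. It assumes the columns are named `website`, `jobs`, and `status`; the analysis script itself takes column indices, so adjust to your file. The targets in the comments come from the thresholds above.

```python
import csv
import io
import json

csv.field_size_limit(10_000_000)  # multi-page website cells can exceed the 131,072-char default


def count_records(f) -> int:
    """Count data rows with a CSV parser, so newlines inside quoted cells are not counted."""
    return sum(1 for _ in csv.DictReader(f))


def _pages(row, col):
    cell = row.get(col) or ""
    return json.loads(cell)["data"]["results"] if cell.strip() else []


def _has_jobs(row, col):
    cell = row.get(col) or ""
    return bool(cell.strip()) and bool(json.loads(cell)["result"]["listings"])


def _avg(values):
    values = list(values)
    return sum(values) / len(values) if values else 0.0


def coverage_report(f, website_col="website", jobs_col="jobs", status_col="status"):
    rows = list(csv.DictReader(f))
    sited = [r for r in rows if _pages(r, website_col)]
    won = [r for r in rows if r.get(status_col) == "won"]
    return {
        "website_coverage": len(sited) / len(rows) if rows else 0.0,  # target > 0.80
        "won_jobs_coverage": _avg(_has_jobs(r, jobs_col) for r in won),  # target >= 0.60
        "avg_pages": _avg(len(_pages(r, website_col)) for r in sited),  # target ~6-8
        "avg_chars": _avg(sum(len(p["text"]) for p in _pages(r, website_col)) for r in sited),  # ~12-20K
    }


# Demo: one cell holds pretty-printed JSON, so it spans several physical lines.
buf = io.StringIO()
w = csv.writer(buf)
w.writerow(["domain", "status", "website", "jobs"])
site = {"data": {"results": [{"url": "https://a.com", "title": "A", "text": "line1\nline2"}]}}
jobs = {"result": {"listings": [{"title": "AR Manager", "description": "", "url": ""}]}}
w.writerow(["a.com", "won", json.dumps(site, indent=2), json.dumps(jobs)])
w.writerow(["b.com", "lost", "", ""])
raw = buf.getvalue()
n_rows = count_records(io.StringIO(raw))
report = coverage_report(io.StringIO(raw))
```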
Read references/quality-gate.md for the full verification script and the "auto-extracted domain validation" check that has caught up to 53% false-positive rates in CRM-exported customer lists.
Open the enriched CSV and inspect a sample of 5-10 rows. Check that website columns contain multi-page content and job listings are present for ≥60% of won accounts.
Red flags:
Fix and regenerate configs if needed.
The analyze_signals.py script expects a CSV with website and jobs columns (JSON-formatted as in Step 2). Use --website-col and --jobs-col to pass column indices explicitly:
python3 scripts/analyze_signals.py \
--input output/{{company}}-enriched.csv \
--keywords output/{{company}}-keywords.json \
--tools output/{{company}}-tools.json \
--job-roles output/{{company}}-job-roles.json \
--website-col <N> \
--jobs-col <M> \
--output output/{{company}}-analysis.json
The script computes substring-match presence, Laplace-smoothed lift, source breakdown (website/jobs/both), tech-stack mentions, job-role prevalence, anti-fit signals, and per-keyword evidence quotes (±40 chars with URLs).
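The exact formula inside analyze_signals.py is not shown here. The sketch below uses one common add-one (Laplace) form: each group's presence rate gets one pseudo-hit and one pseudo-miss, so a signal seen in 0 lost accounts does not produce infinite lift. It also pulls a ±40-character evidence quote the way the script's output is described. The example counts are hypothetical.

```python
def laplace_lift(won_hits: int, won_n: int, lost_hits: int, lost_n: int, alpha: float = 1.0) -> float:
    """Smoothed presence-rate ratio: ((w+a)/(W+2a)) / ((l+a)/(L+2a))."""
    won_rate = (won_hits + alpha) / (won_n + 2 * alpha)
    lost_rate = (lost_hits + alpha) / (lost_n + 2 * alpha)
    return won_rate / lost_rate


def evidence(text: str, keyword: str, url: str, window: int = 40):
    """First case-insensitive substring hit, with ±window characters of context."""
    i = text.lower().find(keyword.lower())
    if i < 0:
        return None
    start, end = max(0, i - window), min(len(text), i + len(keyword) + window)
    return {"url": url, "quote": text[start:end]}


# Hypothetical counts: keyword on 6 of 37 won sites and 1 of 40 lost sites.
lift = laplace_lift(6, 37, 1, 40)  # (7/39) / (2/42) = 147/39, about 3.77
```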
Read references/report-template.md for the full report structure. Critical rules:
Show counts alongside percentages (15% (6), not just 15%), and put sample sizes in headers (Won (n=37)).
Read references/signal-interpretation.md before writing interpretation columns.
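Two tiny formatters keep the report's counts-with-percentages and sample-size header rules consistent. The function names are hypothetical. Note that Python's `round()` rounds exact .5 values to the nearest even integer.

```python
def pct_with_count(k: int, n: int) -> str:
    """Format a share as 'P% (k)' so small-sample noise stays visible."""
    return f"{round(100 * k / n)}% ({k})" if n else f"n/a ({k})"


def group_header(label: str, n: int) -> str:
    """Column header carrying the group's sample size, e.g. 'Won (n=37)'."""
    return f"{label} (n={n})"
```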
10 companies are required for every run; contacts + emails are optional. Always offer contact discovery; only run it if the user approves the spend.
Companies only (no extra cost):
Deploy a Sushidata research swarm to find ICP-matching prospects:
POST /swarm/deploy/
{
"query": "Find 15 companies that match this buyer profile: {{top 3 signals from analysis}}. Target vertical: {{vertical}}. Headcount: {{range}}. Exclude: {{input list domains}}. For each company: name, domain, which signals they exhibit, estimated headcount, why they're a fit.",
"swarmSize": 8
}
Then dedupe against the user's existing list and deliver the top 10:
python3 scripts/dedupe_utils.py \
--existing customers.csv --candidates prospects_raw.csv \
--out-actionable prospects_actionable.csv --out-matched already_known.csv
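After the dedupe, the top 10 can be picked and tiered as follows. The sketch assumes each actionable prospect carries a 0-100 `score` field, which is hypothetical (references/proven-signals.md covers how to build the scoring model). The tier cutoffs are the ones recorded in the Step 5 context note: 60+ Tier 1, 35-59 Tier 2, <35 nurture.

```python
TIER_CUTOFFS = [(60, "Tier 1"), (35, "Tier 2")]  # 60+ Tier 1, 35-59 Tier 2, <35 nurture


def tier(score: float) -> str:
    for cutoff, label in TIER_CUTOFFS:
        if score >= cutoff:
            return label
    return "nurture"


def top_prospects(prospects: list, limit: int = 10) -> list:
    """Rank deduped prospects by their (hypothetical) 'score' field and attach a tier."""
    ranked = sorted(prospects, key=lambda p: p["score"], reverse=True)[:limit]
    return [{**p, "tier": tier(p["score"])} for p in ranked]
```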
Companies + contacts + emails (Hunter credits required):
Ask for approval, then for each of the top 10 companies:
hunter_domain_search domain={{domain}} → returns known contacts with titles
hunter_email_finder domain={{domain}} first_name={{first}} last_name={{last}}
hunter_email_verify email={{email}}: non-send states are invalid, accept_all, webmail, disposable, blocked, or otherwise non-valid → mark "(email not found)"
skills/sushi-research/provider-playbooks/apify.md.
Read references/step-7-prospects.md for the required output fields, prospect-card skeleton, and the "10 is a ceiling, not a floor" guidance.
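The non-send rule for hunter_email_verify reduces to a single check. This sketch assumes the verifier's status arrives as a plain string and treats only "valid" as sendable, following the "otherwise non-valid" wording above. The exact set of status strings Hunter returns should be confirmed against its documentation.

```python
SENDABLE_STATUSES = {"valid"}  # everything else is a non-send state


def email_for_card(email, verify_status) -> str:
    """Return the address only when verification says it is valid."""
    if email and (verify_status or "").strip().lower() in SENDABLE_STATUSES:
        return email
    return "(email not found)"
```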
amsynergy.nikon.com ≠ nikon.com. Always use extract_apex() from scripts/dedupe_utils.py.
references/scoring-pitfalls.md.
references/keyword-catalog.md — JSON schema + multi-vertical examples for Step 1.5
references/dedupe.md — Step 1.0.5 dedupe failure modes, categorization rules, library usage
references/quality-gate.md — Step 3 verification scripts, auto-extracted-domain validation
references/report-template.md — Step 5 full report structure, signal-strength scale, all quality rules
references/signal-interpretation.md — Step 6 buyer-vs-seller-vs-competitor rules
references/step-7-prospects.md — Step 7 prospect-card skeleton, apex validation
references/scoring-pitfalls.md — Confirmation-biased CRM fields to exclude from scoring
references/pitfalls.md — Full 18-item pitfalls list
references/proven-signals.md — Typical lift ranges + 0-100 scoring model guidance
scripts/analyze_signals.py — Step 4 differential analysis. Use --website-col/--jobs-col for column positions.
scripts/dedupe_utils.py — Step 1.0.5 deduplication. extract_apex(), norm_name(), match_against_existing(). Stdlib only. --selftest for one-time verification.
Save signal analysis results at two points:
After Step 5 (report complete):
POST /context/
{
"serverId": "26",
"content": "ICP signal analysis complete — {{target company}}. Dataset: {{won_count}} won + {{lost_count}} lost. Top 3 positive signals: {{signal 1 (lift)}} / {{signal 2 (lift)}} / {{signal 3 (lift)}}. Top anti-fit signals: {{list}}. Scoring model: 60+ = Tier 1, 35-59 = Tier 2, <35 = nurture. Full analysis JSON: {{output path}}.",
"messageId": "msg-{{Date.now()}}",
"userId": "claude-user",
"username": "Claude",
"createdDate": "<new Date().toISOString() — exact UTC timestamp, never local time or an approximation>",
"channelId": "claude-session",
"threadId": "<cowork-session-id>"
}
After Step 7 (prospects delivered):
POST /context/
{
"serverId": "26",
"content": "Top 10 prospects delivered for {{target company}} ICP. Companies: {{list of domains}}. Contacts found: {{count}}. Dedupe status: {{count net-new vs already-known}}. Signals used for scoring: {{top 3 signals from report}}.",
"messageId": "msg-{{Date.now()}}",
"userId": "claude-user",
"username": "Claude",
"createdDate": "<new Date().toISOString() — exact UTC timestamp, never local time or an approximation>",
"channelId": "claude-session",
"threadId": "<cowork-session-id>"
}
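Both context payloads share one shape, so a single helper can fill them. The sketch reproduces what the JavaScript placeholders produce: `Date.now()` as integer epoch milliseconds, and `toISOString()` as UTC with millisecond precision and a trailing "Z". The `thread_id` argument stands in for the `<cowork-session-id>` placeholder. The example content string is hypothetical.

```python
from datetime import datetime, timezone


def context_payload(content: str, thread_id: str) -> dict:
    now = datetime.now(timezone.utc)  # always UTC, never local time
    return {
        "serverId": "26",
        "content": content,
        "messageId": f"msg-{int(now.timestamp() * 1000)}",  # epoch ms, like Date.now()
        "userId": "claude-user",
        "username": "Claude",
        # Mirrors toISOString(): e.g. 2024-05-01T12:34:56.789Z
        "createdDate": now.isoformat(timespec="milliseconds").replace("+00:00", "Z"),
        "channelId": "claude-session",
        "threadId": thread_id,
    }


p = context_payload("ICP signal analysis complete (hypothetical example).", "session-123")
```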
npx claudepluginhub georgeportillo/mitratech-sushidata-plugin --plugin sushidata-gtm-mitratech