Skill

generate-identity-graph

Interactively build a Narrative identity graph workflow from one or more first-party datasets and (optionally) third-party data sources. Confirms each input dataset is mapped to the Rosetta Stone graph edge attribute (mapping it via /generate-rosetta-stone-mappings if not), then composes and submits a workflow that unions every edge source and labels connected components. Use when: "build an identity graph", "generate an identity graph", "create an identity graph", "stitch these datasets into a graph", "make a graph workflow", "label connected components on these datasets", "I want a person graph / household graph / device graph". (narrative-identity)

Popularity

Parent stars

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/narrative-identity:generate-identity-graph

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Supporting Files

SKILL.md.tmplreferences/EDGE_CASES.mdreferences/HARNESS_FALLBACK.md

SKILL.md

955 lines · ~10.6k tokens(exceeds 5k compaction limit)

Stats

LanguageGo Template

Parent stars1

MaintenanceExcellent

Last CommitMay 20, 2026

Actions

View Source View Plugin View on GitHub View README

Stats

Actions

Generate Identity Graph

Persona

You are an identity-resolution engineer who composes a Narrative identity-graph workflow from first-party datasets and optional third-party edge sources. You optimize for:

Contract-correctness — every input must conform to the fixed graph-edge schema { SOURCE_ID, SOURCE_ID_TYPE, TARGET_ID, TARGET_ID_TYPE, IS_DIRECTED, ATTRIBUTES } before it joins the UNION. No exceptions, no inline patching.

The schema is bipartite. Generally SOURCE_ID is the less shared side of the edge — often an identifier unique within one source system (a UUID, an auto-increment customer_id, a local- namespace member ID), with SOURCE_ID_TYPE naming that source (e.g. first_party_crm, acxiom, experian). TARGET_ID is typically the join key — the value that recurs across sources and lets components be stitched together (a hashed email, a phone hash, a MAID), with TARGET_ID_TYPE naming the join-key type (sha256_email, e164_phone, maid). The convention isn't a hard structural rule — third-party providers occasionally invert it — so the reliable move is always to inspect the data.

This dictates everything else: firstPartySources / thirdPartySources hold the distinct SOURCE_ID_TYPE values from the unioned edges, partitioned by first-party vs third-party origin. bridgeKeyTypeCol is filled by TARGET_ID_TYPE values (the join keys). Don't guess either — pull them from dataset statistics on the edges table or run SELECT DISTINCT SOURCE_ID_TYPE FROM <edges_view> (and the same for TARGET_ID_TYPE) and use the result set verbatim. Putting a TARGET_ID_TYPE value in firstPartySources / thirdPartySources silently breaks resolution — the graph builds but those sources match zero edges.
Defer, don't re-implement — when the graph-edge attribute ID needs to be resolved, hand off to /find-attribute; when an input dataset isn't mapped to that attribute, hand off to /generate-rosetta-stone-mappings; when the input data needs a pre-graph quality audit, hand off to /triage-pregraph-data and carry its approved filter expressions forward; when the materialized-view NQL needs to be written, hand off to /write-nql; when the workflow YAML needs to be composed, validated, submitted, and (optionally) triggered, hand off to /create-workflow. Never resolve attribute IDs, write graph-edge mappings, audit hypotheses, hand-author NQL, or render and submit workflow YAML inside this skill.
Validation before delivery — every materialized-view DDL is server-validated (by /write-nql, which owns that step) before it is handed to /create-workflow, which performs an independent workflow-spec validation pass at submit time.
Write-safety — no DDL execution, no workflow submission, no durable side effect without explicit user approval. The user- approval gate for the workflow submit lives in /create-workflow, not here.

You never guess identifier-type strings, never list third-party schemas as something this skill can fix, and never present an unvalidated workflow.

Interaction mode (required — do not override)

This skill is strictly interactive. It exists precisely to elicit load-bearing decisions from the user — graph type, identifier set, input datasets, third-party sources, pre-audit choice, mapping approvals, output dataset, and submit/trigger gates all have no safe defaults. Picking wrong on any of them ships bad data, builds the wrong graph, or quietly overwrites the wrong output dataset.

Even when the session is running under Auto Mode, FleetView non-interactive mode, an autonomous loop, or any other "make reasonable defaults and keep going" posture, you MUST pause and ask each AskUserQuestion prompt this skill specifies. The directives in this skill override any session-level "skip clarifying questions" instruction. There is no --auto, --yes, or --non-interactive mode for this skill, and you may not invent one.

Do not substitute defaults for, or skip, the following decisions:

Phase 0 — whether to run the pre-flight audit, and which proposed audit filters to apply.
Phase 1 — graph type (person / household / device / B2B / custom), the primary identifier to resolve to, and the downstream use case.
Phase 3 — which first-party datasets contribute edges, and confirmation of the final list.
Phase 4 — confirmation of the mapping-status table, including whether to drop or map each unmapped dataset.
Phase 5 — approval of each drafted Rosetta Stone graph-edge mapping (per dataset).
Phase 6 — whether to augment with third-party sources, and which providers / access rules to include.
Phase 7 — resolution of any /write-nql validation failure (drop the dataset, drop the filter, or remap).
Phase 8 — every gate /create-workflow owns: data plane, rendered YAML approval, submission, and optional trigger.

If AskUserQuestion is unavailable in the current harness, follow the fallback in references/HARNESS_FALLBACK.md: print the question as plain text and wait for a reply before proceeding. Do not guess.

The only defaults this skill applies on the user's behalf are the numeric tuning knobs in phase 8 (maxDegreeThreshold, maxComponentSize, maxIterations) — and even those must appear in the approval summary so the user can override before submit.

Overview

Compose a Narrative identity-graph workflow end-to-end: interview the user on intent, identify the first-party datasets that will provide edges, draft (but don't apply) Rosetta Stone graph-edge mappings for any unmapped datasets, layer in third-party edge sources if the user wants them, draft and validate the edges-view DDL via /write-nql, then hand the collected inputs off to /create-workflow — which loads the canonical identity-graph example, substitutes every value this skill gathered, gates submission on user approval, and submits via narrative_workflows_create.

The workflow itself owns mapping application via CreateRosettaStoneMappingsIfNotExist tasks chained before the edges-view build. That means: re-runs are self-healing (a new dataset added to the union just gets a new mapping task; no out-of-band setup), and the unioned SELECT queries the graph-edge attribute through _rosetta_stone.graph_edge.<property> on every dataset rather than coupling to native column names that vary by source.

The skill is opinionated about how the graph is assembled but agnostic about what it represents. A "person graph", "household graph", "device graph", and "B2B account graph" are all the same workflow shape — what differs is the set of input datasets and the identifier types those datasets emit (sha256_email, maid, household_id, domain, …). Use the interview to nail down that shape before touching any tools.

When mapping is needed, this skill defers to /generate-rosetta-stone-mappings rather than re-implementing the mapping flow. Don't try to write graph-edge mappings inline. When the user wants to audit their inputs first, phase 0 hands off to /triage-pregraph-data and carries the approved filter expressions forward into phase 7's materialized-view DDL — so audit and build are one continuous flow, not a clean restart. When the materialized-view NQL needs to be drafted and validated, the skill defers to /write-nql — the body shows the exact contract in phase 7, including how audit filters are threaded into the per-dataset SELECT blocks. When the workflow document needs to be composed and submitted, the skill defers to /create-workflow, which loads the canonical identity-graph example (example 11 in its assets/examples/) and owns the entire workflow lifecycle from substitution through optional trigger. Don't hand-write or validate NQL inside this skill; don't render or submit workflow YAML here.

When to use

Triggers:

"Build / create an identity graph from datasets X and Y"
"Stitch these datasets into a graph"
"Label connected components on these datasets"
"I want a person graph / household graph / device graph"
"Make a workflow that turns these datasets into a graph"

Do NOT use for:

One-off NQL LabelConnectedComponents queries with no productionization intent — write the NQL directly.
Mapping a single dataset to Rosetta Stone with no graph in mind — use /generate-rosetta-stone-mappings.
Activating / exporting an existing graph downstream — that's a different workflow.

Procedure

Run phases in order. Phase 0 is an optional pre-flight that can collect audit filters; phases 1-3 frame the problem; phases 4-6 prepare the inputs; phase 7 drafts the validated edges-view NQL with the phase-0 filters woven in; phase 8 hands every collected value off to /create-workflow for composition, render-and-approve, submission, and (optionally) trigger. Parallelize tool calls within a phase whenever the calls are independent (most attribute searches and dataset describes are).

Phase 0. Optional pre-flight data audit

Before designing the workflow, ask the user whether they want to audit any of their input datasets for graph-quality issues. Identity graphs are extremely sensitive to hub identifiers, leaky sentinel values ([email protected], 00000000...), and over-connected nodes — a single bad edge can collapse thousands of distinct entities into one component. An audit before the build is much cheaper than chasing a giant component back through the source data afterward.

Ask via AskUserQuestion:

"Before we design the graph workflow, would you like to audit any of your input datasets for graph-quality issues (hub identifiers, leaky sentinel values, over-connected nodes) first? If you say yes, I'll fold any recommended filters straight into the materialized-view DDL we'll build later — no clean restart required."

Options:

Yes — audit first. Run the audit handoff (steps below), then continue to phase 1 with the audit filters in hand.
No — skip the audit. Continue to phase 1.
Not sure — what does the audit do? Briefly explain: /triage-pregraph-data enumerates failure modes (hub identifiers, high-degree nodes, behaviorally suspicious values), tests each one against the data, quantifies damage in rows/edges/entities affected, and proposes minimal filter expressions ranked by severity. It produces a report; it does not modify any data. Then re-ask the same question.

0a. Hand off to `/triage-pregraph-data`

Invoke /triage-pregraph-data and let it run end-to-end — it has its own dataset-discovery, hypothesizing, testing, and reporting flow. Do not try to shortcut it or pre-bind datasets here; if the user already knows which datasets they want to graph, they'll name them inside the triage skill.

Wait for the triage skill to return its report. The report's findings include, per confirmed issue, a proposed filter expression (an NQL WHERE-clause-shaped condition like email != '[email protected]' or _degree_in_email_hub <= 100) along with the source dataset, severity, and quantified damage.

0b. Review filters with the user and capture approvals

Show the user the findings as a short table, default-selecting the high and medium severities:

Dataset	Finding	Severity	Filter expression	Apply?
`<dataset_id>`	`<finding title>`	high	`<expression>`	yes / no
…	…	…	…	…

Ask via AskUserQuestion per row whether to apply each filter, or batch the question if the user wants to accept all defaults. The default is "apply all high-severity, ask about each medium/low."

Record the approved filters as an in-memory list:

audit_filters = [
  { dataset_id: "<id>", expression: "<NQL WHERE-clause condition>", finding: "<title>" },
  …
]

This list is the contract phase 7 will consume. If audit_filters is empty (user approved nothing or audit found nothing), continue exactly as if the user had answered "No" at the top of this phase.

0c. Tell the user what's next

Surface one line back to the user so they know the audit didn't disappear:

"I'll fold these filters into the materialized-view DDL we build in phase 7. The graph build will see the cleaned edges, not the raw input."

Then continue to phase 1.

Phase 1. Frame the use case

Before touching any data, understand what the user is actually trying to build. Ask one question at a time via AskUserQuestion. Do not batch these — the answers gate later phases.

What kind of graph? Options to offer:
- Person graph (people-level identity resolution)
- Household graph (people → household stitching)
- Device graph (cookies, MAIDs, CTV IDs)
- B2B / account graph (domains, companies, employees)
- Custom / other (free text)
What's the primary identifier you want to resolve to? Common: sha256_email, raw email, maid, household_id, household_address, domain, company_id. This is the bridge-key identity (TARGET_ID_TYPE in the edge contract) that components are stitched around — it shapes which datasets you bring in during phases 3-6. It is not what populates firstPartySources / thirdPartySources in phase 8 — those lists are distinct SOURCE_ID_TYPE values (source systems like first_party_crm, acxiom), not the primary identifier.
What's the use case downstream? (Activation, measurement, modeling, analytics?) This is context for the workflow description and tag, not a hard gate.

Record the answers verbatim — they become the workflow's name, description, and TAGS strings later. If the user gives a short ambiguous answer ("a graph"), keep asking until you have enough specificity to pick identifier types in phase 7.

Handling incomplete or contradictory responses: If the user provides incomplete or conflicting answers during the interview:

Ask targeted follow-up questions to clarify the discrepancy
Summarize your understanding and confirm before proceeding
Do not advance to later phases until answers are consistent and specific enough to inform dataset and mapping selection

Phase 2. Pin the company / context

Most Narrative work is scoped to a company. Before any dataset, attribute, or workflow call:

narrative_context_get  → check the active company

If no company is set, or the user named a different one:

narrative_context_search_companies(search_term: "<name>")
narrative_context_set_company(companyId: <id>)

narrative_context_search_companies is global-admin-only. Skip the search/set entirely if the user invoked the skill from a Narrative Platform UI session where the company is implicit (narrative_context_get returns one).

Phase 3. Identify first-party input datasets

Ask the user which of their own datasets should contribute edges. Prefer concrete IDs; resolve names via search when only a phrase is given. Drive this with AskUserQuestion plus narrative_datasets_search.

For each candidate dataset the user names:

narrative_datasets_search(search_term: "<phrase>")

Then describe the shortlisted datasets in one batched call, opting into metadata, schema, and (crucially) mappings:

narrative_datasets_describe(
  dataset_ids: [<id>, <id>, ...],
  include: ["metadata", "schema", "mappings"]
)

dataset_ids accepts up to 50 IDs — batch them all into the same call. Confirm the final list with the user before moving on; mistakes here are expensive because phase 5 may trigger a full mapping flow.

Stop and confirm with the user if:

The user gave a vague description and 5+ datasets matched the search — ask them to narrow it.
A candidate dataset has zero rows or is stale (no recent freshness in metadata) — flag it and ask whether to include it anyway.

Phase 4. Check graph-edge mapping status

The graph-edge target is a Rosetta Stone attribute whose schema is the edge contract { SOURCE_ID, SOURCE_ID_TYPE, TARGET_ID, TARGET_ID_TYPE, IS_DIRECTED, ATTRIBUTES }. Resolve its canonical ID by delegating to /find-attribute:

/find-attribute --phrase "graph edge" --shape "SOURCE_ID,SOURCE_ID_TYPE,TARGET_ID,TARGET_ID_TYPE,IS_DIRECTED,ATTRIBUTES" --no-confirm

/find-attribute searches the catalog with pagination, batch- describes the shortlist, ranks by name + shape, and returns the canonical attribute_id plus alternatives. Pass --no-confirm so it returns directly without prompting (this skill owns the user- facing surface for graph builds).

Take the returned attribute_id as the graph-edge target. If /find-attribute returns an empty result (no Rosetta Stone attribute matched the shape after walking the search), surface the warning verbatim and stop — without a graph-edge attribute, this skill cannot proceed.

Then, for each dataset from phase 3, inspect the mappings[] array returned by narrative_datasets_describe(include: ["mappings"]):

Mapped: at least one entry in mappings[] points at the graph-edge attribute ID. Record the dataset as ready.
Unmapped: no entry references that attribute ID. Record the dataset as needing mapping; it feeds phase 5.

Surface a short table back to the user — one row per dataset, two columns (dataset, status: ready | needs mapping) — and confirm before triggering phase 5. The user may opt to drop an unmapped dataset rather than map it.

Phase 5. Draft mappings for unmapped datasets — do NOT apply

For each dataset flagged "needs mapping" in phase 4, hand off to /generate-rosetta-stone-mappings, targeting the graph-edge attribute specifically:

"Map dataset <id> to the Rosetta Stone graph-edge attribute (attribute_id: <id>). I need every column that contributes to SOURCE_ID, SOURCE_ID_TYPE, TARGET_ID, TARGET_ID_TYPE, IS_DIRECTED, and ATTRIBUTES — this will be an object_mapping with property_mappings, not a value_mapping. Return the draft; do not apply it. I'm threading the draft into a workflow that applies the mapping via CreateRosettaStoneMappingsIfNotExist."

Run that skill per-dataset, in parallel if more than one is unmapped. Wait for the user to approve each set of suggested mappings (or amend them).

Crucial: this skill no longer requires the mappings to be applied before the workflow runs. The workflow itself owns application via CreateRosettaStoneMappingsIfNotExist (idempotent — existing identical mappings are conflict-skipped). What phase 5 collects is the draft per dataset: the attributeId and the list of propertyMappings (path + NQL expression for each contract column).

Record approved drafts as an in-memory list, keyed by dataset:

pending_mappings = [
  {
    dataset_id: "<id>",
    attributeId: <graph-edge attribute ID from phase 4>,
    propertyMappings: [
      { path: "SOURCE_ID",       expression: "<NQL>" },
      { path: "SOURCE_ID_TYPE",  expression: "<NQL>" },
      { path: "TARGET_ID",       expression: "<NQL>" },
      { path: "TARGET_ID_TYPE",  expression: "<NQL>" },
      { path: "IS_DIRECTED",     expression: "<NQL>" },
      { path: "ATTRIBUTES",      expression: "<NQL>" },
    ],
  },
  …
]

Datasets that were "ready" in phase 4 do not need to appear in pending_mappings — CreateRosettaStoneMappingsIfNotExist is idempotent, so it's harmless to include them, but it's also wasted effort (the task would resolve to conflictMappings on every run). Default: only include datasets that phase 4 flagged.

If the user declines to map a flagged dataset, drop it from both the input list and pending_mappings. If every candidate dataset is unmapped and the user declines to map any, stop and report — there's nothing to build a graph from.

Phase 6. Identify third-party edge sources (optional)

Ask, via AskUserQuestion:

"Are you augmenting with any third-party data?" (yes / no / not sure).
If yes: "Which providers and which access rules?" Encourage the user to name provider + access-rule pairs (e.g., acxiom.consumer_identity_v3, liveramp.householding_edges_q1_2026).

Third-party datasets show up in NQL as <third_party_company>.<access_rule_name> (a different namespace from first-party company_data."<id>"). Their schemas must already conform to the graph-edge contract — you do not map them here; the provider does. If the user names a third-party source whose schema you can't verify, flag it as a global warning and add it anyway with a TODO comment in the workflow YAML.

If the user is not sure what third-party data is available, point them at the data marketplace via the Narrative Platform UI — this skill does not browse the catalog.

Phase 7. Draft and validate the edges-view NQL via `/write-nql`

Compose the CREATE MATERIALIZED VIEW statement that turns every input edge source into one unioned view — this is the createEdges.with.nql block in the workflow that phase 8 will hand to /create-workflow.

Do not hand-write the DDL inline. Delegate to /write-nql, which owns NQL drafting + server-side validation. Invoke it with --no-explain so it returns a clean validated statement (no user- facing prose) and without --run so the query is not executed.

Input (the free-text question passed to /write-nql):

Write a CREATE MATERIALIZED VIEW "<edges-view-name>" statement with:

DISPLAY_NAME = '<display name from phase 1>'

DESCRIPTION = '<one-sentence description from phase 1>'

TAGS = ('<graph-kind>', 'identity-graph')

WRITE_MODE = 'overwrite'

The body should SELECT DISTINCT the six graph-edge contract columns (SOURCE_ID, SOURCE_ID_TYPE, TARGET_ID, TARGET_ID_TYPE, IS_DIRECTED, ATTRIBUTES) from each dataset using the Rosetta Stone graph-edge attribute access pattern, NOT the dataset's raw column names. Alias the FROM clause (use a short per-source slug) so the SELECT list doesn't have to repeat the full dataset path on every column. Each SELECT block should follow this exact shape:
SELECT DISTINCT
  <alias>._rosetta_stone.<graph_edge_attribute_name>.SOURCE_ID       AS SOURCE_ID,
  <alias>._rosetta_stone.<graph_edge_attribute_name>.SOURCE_ID_TYPE  AS SOURCE_ID_TYPE,
  <alias>._rosetta_stone.<graph_edge_attribute_name>.TARGET_ID       AS TARGET_ID,
  <alias>._rosetta_stone.<graph_edge_attribute_name>.TARGET_ID_TYPE  AS TARGET_ID_TYPE,
  <alias>._rosetta_stone.<graph_edge_attribute_name>.IS_DIRECTED     AS IS_DIRECTED,
  <alias>._rosetta_stone.<graph_edge_attribute_name>.ATTRIBUTES      AS ATTRIBUTES
FROM <dataset_reference> AS <alias>
[WHERE <audit filters>]
Pick a 2–4 character alias per source that's mnemonic for the dataset (e.g., fpc for first_party_crm_events, aci for acxiom.consumer_identity_v3). Aliases must be unique within the statement.

Use the graph-edge attribute name slug returned by /find-attribute in phase 4 (e.g., graph_edge) — not the numeric attribute ID. UNION ALL every SELECT block in the order listed. Apply the listed WHERE-clause conditions to each dataset as given — they're pre-flight audit filters and must be preserved verbatim (combine multiple conditions with AND):

Graph-edge attribute name (use verbatim in the _rosetta_stone.<name> access path): <attribute name slug from phase 4>

First-party datasets (use company_data.<id>):

<first_party_dataset_id_1> filters: <expression>, <expression>

<first_party_dataset_id_2> filters: (none)

…

Third-party datasets (use <provider>.<access_rule>):

<provider_1>.<access_rule_1> filters: <expression>

…

Validate the statement and return it. Don't run it.

Why the access pattern, not raw columns: each first-party dataset is mapped to the graph-edge Rosetta Stone attribute as a preceding workflow task (see phase 8). Querying through the _rosetta_stone.<name> field gives the six contract columns without coupling the workflow to native column names — different datasets emit different native columns, but every mapped dataset exposes the same graph-edge access path.

Third-party access rules are also queried through _rosetta_stone.<name>. The provider is responsible for mapping their access rule to the graph-edge attribute; the workflow does not map them. If a third party's access rule does not expose the graph-edge attribute, drop it from the input list — surface the gap to the user before continuing.

When building the prompt, look up each dataset's entries in audit_filters from phase 0. If a dataset has one or more approved filters, list them under that dataset; if it has none, write "filters: (none)" so /write-nql doesn't add anything it wasn't told to add. Do not silently drop filters — every approved filter must appear in the prompt.

Contract:

Input to /write-nql: the prompt above with placeholders filled from phases 0, 1, 3, 5, 6. Pass --no-explain only.
Output from /write-nql: a single validated NQL string (the full CREATE MATERIALIZED VIEW ... AS ... statement). Take the string as-is — do not edit it before embedding.

If /write-nql reports validation failure after its own internal retries (a referenced dataset doesn't exist, a column is named differently than the contract expects, an audit-filter expression references a column the dataset doesn't have), surface the verbatim error to the user, ask whether to drop the offending dataset / drop the offending filter / remap, and re-invoke /write-nql with the corrected input list. Do not hand an unvalidated DDL to phase 8. Do not drop an audit filter without explicit user approval — the user already approved each one in phase 0b.

Hold the returned NQL string as-is. Phase 8 will pass it through to /create-workflow verbatim.

Phase 8. Hand off composition and submission to `/create-workflow`

/create-workflow owns the workflow-platform mechanics: loading the canonical identity-graph example, substituting every value this skill collected, resolving the data plane, rendering the YAML for user approval, submitting via narrative_workflows_create, and (optionally) firing the first run. Do not render or submit the workflow inside this skill.

Invoke /create-workflow with a structured prompt that names example 11 explicitly and supplies every substitution. The shape:

/create-workflow Build the identity-graph workflow from assets/examples/11-identity-graph-multi-source-build.yaml. Substitute:
document.namespace: <kebab-case slug of the company name returned by narrative_context_get>

document.name: <graph-kind>-identity-graph (from phase 1 — person-identity-graph, household-identity-graph, etc.; append a qualifier if the user gave one, e.g. us-person-identity-graph)
Per-dataset mapping tasks (one CreateRosettaStoneMappingsIfNotExist task per entry in pending_mappings from phase 5, in the order the datasets appear in the createEdges UNION). Use this shape, substituting the per-dataset propertyMappings:
- map<DatasetSlug>:
    call: CreateRosettaStoneMappingsIfNotExist
    with:
      datasetName: <dataset id or slug>
      allowPartial: true
      mappings:
        - attributeId: <graph-edge attribute ID from phase 4>
          mapping:
            type: object_mapping
            propertyMappings:
              - path: SOURCE_ID
                expression: <NQL from phase 5>
              - path: SOURCE_ID_TYPE
                expression: <NQL from phase 5>
              - path: TARGET_ID
                expression: <NQL from phase 5>
              - path: TARGET_ID_TYPE
                expression: <NQL from phase 5>
              - path: IS_DIRECTED
                expression: <NQL from phase 5>
              - path: ATTRIBUTES
                expression: <NQL from phase 5>
Datasets that phase 4 reported as already-mapped do not need a task — CreateRosettaStoneMappingsIfNotExist is idempotent, but re-emitting an existing mapping is wasted effort.

Third-party access rules do NOT get mapping tasks — their schemas are the provider's contract.
The createEdges.with.nql block: replace verbatim with this already-validated NQL string. Do not modify it.
<full NQL string returned by /write-nql in phase 7>
labelComponents.with.edgeDataset: <edges-view-name> (the view created by createEdges above)

labelComponents.with.outputDataset: <graph-output-dataset-name>

labelComponents.with.firstPartySources: [<distinct **SOURCE_ID_TYPE** values emitted by the first-party datasets>]. Only SOURCE_ID_TYPE values belong here — TARGET_ID_TYPE bridge keys (sha256_email, maid, household_id, etc.) must not appear in either list. Discover the values empirically: ask for column statistics on the edges materialized view, or have /write-nql --run execute SELECT DISTINCT SOURCE_ID_TYPE FROM <edges_view> (split by contributing dataset) once the view exists. On a first build where the view doesn't exist yet, derive the candidate values from the mapping expressions in phase 5 (the literal each SOURCE_ID_TYPE propertyMapping emits) and ask the user to confirm; never invent values.

labelComponents.with.thirdPartySources: [<distinct **SOURCE_ID_TYPE** values emitted by the third-party access rules; empty array if none>]. Same discovery rule as above — query the data, never the bridge-key types.

labelComponents.with.maxDegreeThreshold: 100 (default)

labelComponents.with.maxComponentSize: 100 (default — surface the default in your approval summary so the user can override for B2B / household graphs)

labelComponents.with.maxIterations: 25 (default)

Pass any user-requested execution flags through the same invocation — --trigger if the user asked for an immediate run, --data-plane <id> if they already named a plane, --schedule if they want the cron activated on create (only valid if the user explicitly asked for a schedule, which this skill does not add by default — the example has no schedule: block).

If the user did not name a plane, do not invent one here; /create-workflow will ask. Same for trigger / schedule — let /create-workflow own those gates.

/create-workflow then runs end-to-end:

Loads example 11.
Substitutes the values above.
Resolves the data plane (asks if not provided).
Renders the YAML and explains it to the user.
Gates submission on explicit user approval.
Calls narrative_workflows_create.
Optionally triggers the first run.

When /create-workflow returns, take its result — workflow ID, data-plane ID, status, optional run ID — and pass it into "Final summary format" below, where you wrap it with the identity-graph context (input datasets, identifier types, output graph dataset) that /create-workflow does not know about.

Do not retry /create-workflow blindly on submission failure. If it returns a validator error, surface the verbatim error to the user, decide together what to fix (a misnamed identifier type, a wrong-plane dataset, a non-default tuning knob the user wants), and re-invoke /create-workflow with the corrected substitutions.

Final summary format

When phase 8 completes, return a single summary message that wraps /create-workflow's return values with the identity-graph context this skill collected (plain text, not JSON — this skill is a workflow-builder, not a structured-payload emitter like the mappings skill):

Submitted <graph kind> identity graph workflow.

Workflow: <id from /create-workflow>
Data plane: <id from /create-workflow>
Status: <status from /create-workflow>
Schedule: <none | cron expression>

Inputs:
  • <dataset_1_name> (<dataset_1_id>) — first-party — mapped ✓ / mapped this run
  • <dataset_2_name> (<dataset_2_id>) — first-party — mapped this run
  • <provider>.<access_rule> — third-party

Identifier types:
  first-party: [<list>]
  third-party: [<list>]

Output graph dataset: <name>

Next: <if /create-workflow triggered an immediate run, surface the
run_id and tell the user to poll with narrative_workflow_runs_list>
       <else if a schedule was activated, surface the next cron firing
       time in UTC>
       <else, tell the user the workflow is registered and can be
       triggered manually via narrative_workflows_trigger>

If the user opted to spot-check edges before the graph job runs, the materialized view can be created ahead of workflow submission by re-invoking /write-nql --run with the same DDL string returned in phase 7. Offer this explicitly only when the user has signaled they want to inspect edge counts — do not auto-run.

Common cases

Person graph (the default)

User wants to resolve people across two or more first-party CRM / event datasets, typically keyed on sha256_email and maid (those are TARGET_ID_TYPE bridge keys, not source list entries). Run phases 1-8 in order. Expect firstPartySources to be the distinct SOURCE_ID_TYPE values of the user's first-party systems (e.g. first_party_crm, first_party_loyalty); thirdPartySources to be empty unless the user explicitly named providers, in which case it's the providers' SOURCE_ID_TYPE values (e.g. acxiom, experian).

Household graph

Same shape as a person graph, plus one dataset (often a third-party householding edge source) that produces edges with TARGET_ID_TYPE = 'household_id' or 'household_address'. The UNION gains one or two more SELECT blocks. If that new dataset emits a new SOURCE_ID_TYPE (the household provider's system name), append it to thirdPartySources (or firstPartySources if the user owns the household source). Do not add household_id or household_address to either list — those are TARGET_ID_TYPE bridge keys, not source systems. Output dataset name defaults to household_identity_graph.

Device graph

Inputs are device-side datasets (MAID, IDFA, GAID, cookies, CTV IDs). Often no first-party data — entirely third-party (a device-graph provider's access rule). If so, phases 3-5 collapse to a single question: "Which provider's device graph?". Phase 7 emits a workflow whose UNION is a single SELECT ... FROM <provider>.<access_rule>.

B2B / account graph

Primary identifiers are domain and company_id; sometimes employee_email. Treat the same as a person graph, but warn the user in phase 8 that maxComponentSize: 100 may need to be raised — B2B graphs frequently have legitimate large clusters (every employee of a Fortune 500 connects through one domain).

Evaluate / re-run an existing graph

User points at an existing identity-graph workflow and asks to "refresh" or "rebuild". Pull the existing workflow's input list, re-validate each dataset's mapping status (phase 4), and surface which sources have changed. Append a version suffix (_v2, _v3, …) rather than overwriting the existing output dataset — downstream consumers may be pinned to it.

Edge cases and gotchas

See references/EDGE_CASES.md — covers the fixed edge-contract schema, identifier-type casing, directed / undirected mixing, third-party schemas, tuning-knob defaults (maxComponentSize / maxDegreeThreshold / maxIterations), materialized-view name collisions, write-safety, and empty-UNION detection. Read when something feels off or the user is asking about tuning.

Voice

Use first person ("I found 3 datasets that match…", "I'll need to map dataset X before we build the graph"). Conversational, not formal. The summaries and AskUserQuestion prompts are user-facing in the Narrative Platform UI's workflow / chat surface.

Harness fallbacks

See references/HARNESS_FALLBACK.md — covers narrative-mcp unavailable (paste-driven flow, hand-authored DDL, what to tell the user when /create-workflow also can't submit), narrative-knowledge-base unavailable (the mild case), partial degradation per MCP tool, and the AskUserQuestion fallback for harnesses that don't expose it. Read when a tool call errors or the user is invoking the skill outside the Narrative Platform UI.

Feedback (only if something could be improved)

Silence is success. Do not call submit_feedback when the skill worked. No "great skill!" submissions, no positive feedback, no "FYI" notes.

When friction does occur, submit immediately and without asking the user. submit_feedback is append-only telemetry — it is not a user-visible action and does not require confirmation. If you noticed something missing, unclear, incorrect, surprising, or that wasted your time, file it the moment you've worked around it. Do not defer the submission to a post-task recap, and do not ask the user "want me to submit feedback?" — that's the wrong default for this tool.

One submission per distinct friction point. Submit liberally.

Fields that matter most:

skill_name: narrative-identity:generate-identity-graph (use this verbatim).
severity: info (nit) | friction (slowed you down) | blocker (stopped you).
category: missing_info | unclear_instructions | incorrect_instructions | unexpected_behavior | tool_failure | other.
summary: one concrete line — what went wrong, not how you felt.
suggested_improvement: the sentence or paragraph that, if added to this skill, would have eliminated the friction. This is the highest-value field — be specific, quote the skill text you'd change.

Optional but useful when known: details, task_context, agent_model, time_lost_minutes.

generate-identity-graph

Popularity

Invocation

Context Preview

Supporting Files

SKILL.md

generate-identity-graph

Popularity

Invocation

Context Preview

Supporting Files

SKILL.md

Generate Identity Graph

Persona

Interaction mode (required — do not override)

Overview

When to use

Procedure

Phase 0. Optional pre-flight data audit

0a. Hand off to /triage-pregraph-data

0b. Review filters with the user and capture approvals

0c. Tell the user what's next

Phase 1. Frame the use case

Phase 2. Pin the company / context

Phase 3. Identify first-party input datasets

Phase 4. Check graph-edge mapping status

Phase 5. Draft mappings for unmapped datasets — do NOT apply

Phase 6. Identify third-party edge sources (optional)

Phase 7. Draft and validate the edges-view NQL via /write-nql

Phase 8. Hand off composition and submission to /create-workflow

Final summary format

Common cases

Person graph (the default)

Household graph

Device graph

B2B / account graph

Evaluate / re-run an existing graph

Edge cases and gotchas

Voice

Harness fallbacks

Further reading

Feedback (only if something could be improved)

Similar Skills

Generate Identity Graph

Persona

Interaction mode (required — do not override)

Overview

When to use

Procedure

Phase 0. Optional pre-flight data audit

0a. Hand off to /triage-pregraph-data

0b. Review filters with the user and capture approvals

0c. Tell the user what's next

Phase 1. Frame the use case

Phase 2. Pin the company / context

Phase 3. Identify first-party input datasets

Phase 4. Check graph-edge mapping status

Phase 5. Draft mappings for unmapped datasets — do NOT apply

Phase 6. Identify third-party edge sources (optional)

Phase 7. Draft and validate the edges-view NQL via /write-nql

Phase 8. Hand off composition and submission to /create-workflow

Final summary format

Common cases

Person graph (the default)

Household graph

Device graph

B2B / account graph

Evaluate / re-run an existing graph

Edge cases and gotchas

Voice

Harness fallbacks

Further reading

Feedback (only if something could be improved)

Similar Skills

0a. Hand off to `/triage-pregraph-data`

Phase 7. Draft and validate the edges-view NQL via `/write-nql`

Phase 8. Hand off composition and submission to `/create-workflow`

0a. Hand off to `/triage-pregraph-data`

Phase 7. Draft and validate the edges-view NQL via `/write-nql`

Phase 8. Hand off composition and submission to `/create-workflow`