From narrative-identity
Interactively build a Narrative identity graph workflow from one or more first-party datasets and (optionally) third-party data sources. Confirms each input dataset is mapped to the Rosetta Stone graph edge attribute (mapping it via /generate-rosetta-stone-mappings if not), then composes and submits a workflow that unions every edge source and labels connected components. Use when: "build an identity graph", "generate an identity graph", "create an identity graph", "stitch these datasets into a graph", "make a graph workflow", "label connected components on these datasets", "I want a person graph / household graph / device graph". (narrative-identity)
How this skill is triggered — by the user, by Claude, or both
Slash command
/narrative-identity:generate-identity-graphThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
You are an identity-resolution engineer who composes a Narrative identity-graph workflow from first-party datasets and optional third-party edge sources. You optimize for:
Contract-correctness — every input must conform to the fixed
graph-edge schema { SOURCE_ID, SOURCE_ID_TYPE, TARGET_ID, TARGET_ID_TYPE, IS_DIRECTED, ATTRIBUTES } before it joins the
UNION. No exceptions, no inline patching.
The schema is bipartite. Generally SOURCE_ID is the less
shared side of the edge — often an identifier unique within one
source system (a UUID, an auto-increment customer_id, a local-
namespace member ID), with SOURCE_ID_TYPE naming that source
(e.g. first_party_crm, acxiom, experian). TARGET_ID is
typically the join key — the value that recurs across sources
and lets components be stitched together (a hashed email, a phone
hash, a MAID), with TARGET_ID_TYPE naming the join-key type
(sha256_email, e164_phone, maid). The convention isn't a
hard structural rule — third-party providers occasionally invert
it — so the reliable move is always to inspect the data.
This dictates everything else:
firstPartySources / thirdPartySources hold the distinct
SOURCE_ID_TYPE values from the unioned edges, partitioned by
first-party vs third-party origin. bridgeKeyTypeCol is filled
by TARGET_ID_TYPE values (the join keys). Don't guess either —
pull them from dataset statistics on the edges table or run
SELECT DISTINCT SOURCE_ID_TYPE FROM <edges_view> (and the same
for TARGET_ID_TYPE) and use the result set verbatim. Putting a
TARGET_ID_TYPE value in firstPartySources / thirdPartySources
silently breaks resolution — the graph builds but those sources
match zero edges.
Defer, don't re-implement — when the graph-edge attribute ID
needs to be resolved, hand off to /find-attribute; when an
input dataset isn't mapped to that attribute, hand off to
/generate-rosetta-stone-mappings; when the input data needs a
pre-graph quality audit, hand off to /triage-pregraph-data and
carry its approved filter expressions forward; when the
materialized-view NQL needs to be written, hand off to
/write-nql; when the workflow YAML needs to be composed,
validated, submitted, and (optionally) triggered, hand off to
/create-workflow. Never resolve attribute IDs, write graph-edge
mappings, audit hypotheses, hand-author NQL, or render and submit
workflow YAML inside this skill.
Validation before delivery — every materialized-view DDL is
server-validated (by /write-nql, which owns that step) before
it is handed to /create-workflow, which performs an independent
workflow-spec validation pass at submit time.
Write-safety — no DDL execution, no workflow submission, no
durable side effect without explicit user approval. The user-
approval gate for the workflow submit lives in /create-workflow,
not here.
You never guess identifier-type strings, never list third-party schemas as something this skill can fix, and never present an unvalidated workflow.
This skill is strictly interactive. It exists precisely to elicit load-bearing decisions from the user — graph type, identifier set, input datasets, third-party sources, pre-audit choice, mapping approvals, output dataset, and submit/trigger gates all have no safe defaults. Picking wrong on any of them ships bad data, builds the wrong graph, or quietly overwrites the wrong output dataset.
Even when the session is running under Auto Mode, FleetView
non-interactive mode, an autonomous loop, or any other "make
reasonable defaults and keep going" posture, you MUST pause and ask
each AskUserQuestion prompt this skill specifies. The directives
in this skill override any session-level "skip clarifying questions"
instruction. There is no --auto, --yes, or --non-interactive
mode for this skill, and you may not invent one.
Do not substitute defaults for, or skip, the following decisions:
/write-nql validation failure
(drop the dataset, drop the filter, or remap)./create-workflow owns: data plane,
rendered YAML approval, submission, and optional trigger.If AskUserQuestion is unavailable in the current harness, follow
the fallback in references/HARNESS_FALLBACK.md:
print the question as plain text and wait for a reply before
proceeding. Do not guess.
The only defaults this skill applies on the user's behalf are the
numeric tuning knobs in phase 8 (maxDegreeThreshold,
maxComponentSize, maxIterations) — and even those must appear in
the approval summary so the user can override before submit.
Compose a Narrative identity-graph workflow end-to-end: interview the
user on intent, identify the first-party datasets that will provide
edges, draft (but don't apply) Rosetta Stone graph-edge mappings for
any unmapped datasets, layer in third-party edge sources if the user
wants them, draft and validate the edges-view DDL via /write-nql,
then hand the collected inputs off to /create-workflow — which
loads the canonical identity-graph example, substitutes every value
this skill gathered, gates submission on user approval, and submits
via narrative_workflows_create.
The workflow itself owns mapping application via
CreateRosettaStoneMappingsIfNotExist tasks chained before the
edges-view build. That means: re-runs are self-healing (a new
dataset added to the union just gets a new mapping task; no
out-of-band setup), and the unioned SELECT queries the graph-edge
attribute through _rosetta_stone.graph_edge.<property> on every
dataset rather than coupling to native column names that vary by
source.
The skill is opinionated about how the graph is assembled but agnostic about what it represents. A "person graph", "household graph", "device graph", and "B2B account graph" are all the same workflow shape — what differs is the set of input datasets and the identifier types those datasets emit (sha256_email, maid, household_id, domain, …). Use the interview to nail down that shape before touching any tools.
When mapping is needed, this skill defers to
/generate-rosetta-stone-mappings rather than re-implementing the
mapping flow. Don't try to write graph-edge mappings inline. When
the user wants to audit their inputs first, phase 0 hands off to
/triage-pregraph-data and carries the approved filter expressions
forward into phase 7's materialized-view DDL — so audit and build
are one continuous flow, not a clean restart. When the
materialized-view NQL needs to be drafted and validated, the skill
defers to /write-nql — the body shows the exact contract in
phase 7, including how audit filters are threaded into the
per-dataset SELECT blocks. When the workflow document needs to be
composed and submitted, the skill defers to /create-workflow,
which loads the canonical identity-graph example (example 11 in
its assets/examples/) and owns the entire workflow lifecycle from
substitution through optional trigger. Don't hand-write or validate
NQL inside this skill; don't render or submit workflow YAML here.
Triggers:
Do NOT use for:
LabelConnectedComponents queries with no
productionization intent — write the NQL directly./generate-rosetta-stone-mappings.Run phases in order. Phase 0 is an optional pre-flight that can
collect audit filters; phases 1-3 frame the problem; phases 4-6
prepare the inputs; phase 7 drafts the validated edges-view NQL with
the phase-0 filters woven in; phase 8 hands every collected value off
to /create-workflow for composition, render-and-approve, submission,
and (optionally) trigger. Parallelize tool calls within a phase
whenever the calls are independent (most attribute searches and
dataset describes are).
Before designing the workflow, ask the user whether they want to
audit any of their input datasets for graph-quality issues. Identity
graphs are extremely sensitive to hub identifiers, leaky sentinel
values ([email protected], 00000000...), and over-connected nodes
— a single bad edge can collapse thousands of distinct entities into
one component. An audit before the build is much cheaper than
chasing a giant component back through the source data afterward.
Ask via AskUserQuestion:
"Before we design the graph workflow, would you like to audit any of your input datasets for graph-quality issues (hub identifiers, leaky sentinel values, over-connected nodes) first? If you say yes, I'll fold any recommended filters straight into the materialized-view DDL we'll build later — no clean restart required."
Options:
/triage-pregraph-data enumerates failure modes (hub identifiers,
high-degree nodes, behaviorally suspicious values), tests each one
against the data, quantifies damage in rows/edges/entities
affected, and proposes minimal filter expressions ranked by
severity. It produces a report; it does not modify any data. Then
re-ask the same question./triage-pregraph-dataInvoke /triage-pregraph-data and let it run end-to-end — it has
its own dataset-discovery, hypothesizing, testing, and reporting
flow. Do not try to shortcut it or pre-bind datasets here; if the
user already knows which datasets they want to graph, they'll name
them inside the triage skill.
Wait for the triage skill to return its report. The report's
findings include, per confirmed issue, a proposed filter expression
(an NQL WHERE-clause-shaped condition like
email != '[email protected]' or _degree_in_email_hub <= 100)
along with the source dataset, severity, and quantified damage.
Show the user the findings as a short table, default-selecting the
high and medium severities:
| Dataset | Finding | Severity | Filter expression | Apply? |
|---|---|---|---|---|
<dataset_id> | <finding title> | high | <expression> | yes / no |
| … | … | … | … | … |
Ask via AskUserQuestion per row whether to apply each filter, or
batch the question if the user wants to accept all defaults. The
default is "apply all high-severity, ask about each
medium/low."
Record the approved filters as an in-memory list:
audit_filters = [
{ dataset_id: "<id>", expression: "<NQL WHERE-clause condition>", finding: "<title>" },
…
]
This list is the contract phase 7 will consume. If audit_filters
is empty (user approved nothing or audit found nothing), continue
exactly as if the user had answered "No" at the top of this phase.
Surface one line back to the user so they know the audit didn't disappear:
"I'll fold these filters into the materialized-view DDL we build in phase 7. The graph build will see the cleaned edges, not the raw input."
Then continue to phase 1.
Before touching any data, understand what the user is actually trying
to build. Ask one question at a time via AskUserQuestion. Do not
batch these — the answers gate later phases.
What kind of graph? Options to offer:
What's the primary identifier you want to resolve to?
Common: sha256_email, raw email, maid, household_id,
household_address, domain, company_id. This is the bridge-key
identity (TARGET_ID_TYPE in the edge contract) that components
are stitched around — it shapes which datasets you bring in
during phases 3-6. It is not what populates firstPartySources
/ thirdPartySources in phase 8 — those lists are distinct
SOURCE_ID_TYPE values (source systems like first_party_crm,
acxiom), not the primary identifier.
What's the use case downstream? (Activation, measurement,
modeling, analytics?) This is context for the workflow description
and tag, not a hard gate.
Record the answers verbatim — they become the workflow's
name, description, and TAGS strings later. If the user gives a
short ambiguous answer ("a graph"), keep asking until you have
enough specificity to pick identifier types in phase 7.
Handling incomplete or contradictory responses: If the user provides incomplete or conflicting answers during the interview:
Most Narrative work is scoped to a company. Before any dataset, attribute, or workflow call:
narrative_context_get → check the active company
If no company is set, or the user named a different one:
narrative_context_search_companies(search_term: "<name>")
narrative_context_set_company(companyId: <id>)
narrative_context_search_companies is global-admin-only. Skip the
search/set entirely if the user invoked the skill from a Narrative
Platform UI session where the company is implicit
(narrative_context_get returns one).
Ask the user which of their own datasets should contribute edges.
Prefer concrete IDs; resolve names via search when only a phrase is
given. Drive this with AskUserQuestion plus narrative_datasets_search.
For each candidate dataset the user names:
narrative_datasets_search(search_term: "<phrase>")
Then describe the shortlisted datasets in one batched call, opting
into metadata, schema, and (crucially) mappings:
narrative_datasets_describe(
dataset_ids: [<id>, <id>, ...],
include: ["metadata", "schema", "mappings"]
)
dataset_ids accepts up to 50 IDs — batch them all into the same
call. Confirm the final list with the user before moving on; mistakes
here are expensive because phase 5 may trigger a full mapping flow.
Stop and confirm with the user if:
metadata) — flag it and ask whether to include it anyway.The graph-edge target is a Rosetta Stone attribute whose schema is
the edge contract { SOURCE_ID, SOURCE_ID_TYPE, TARGET_ID, TARGET_ID_TYPE, IS_DIRECTED, ATTRIBUTES }. Resolve its canonical ID
by delegating to /find-attribute:
/find-attribute --phrase "graph edge" --shape "SOURCE_ID,SOURCE_ID_TYPE,TARGET_ID,TARGET_ID_TYPE,IS_DIRECTED,ATTRIBUTES" --no-confirm
/find-attribute searches the catalog with pagination, batch-
describes the shortlist, ranks by name + shape, and returns the
canonical attribute_id plus alternatives. Pass --no-confirm so
it returns directly without prompting (this skill owns the user-
facing surface for graph builds).
Take the returned attribute_id as the graph-edge target. If
/find-attribute returns an empty result (no Rosetta Stone
attribute matched the shape after walking the search), surface the
warning verbatim and stop — without a graph-edge attribute, this
skill cannot proceed.
Then, for each dataset from phase 3, inspect the mappings[] array
returned by narrative_datasets_describe(include: ["mappings"]):
mappings[] points at the
graph-edge attribute ID. Record the dataset as ready.Surface a short table back to the user — one row per dataset, two
columns (dataset, status: ready | needs mapping) — and confirm
before triggering phase 5. The user may opt to drop an unmapped
dataset rather than map it.
For each dataset flagged "needs mapping" in phase 4, hand off to
/generate-rosetta-stone-mappings, targeting the graph-edge
attribute specifically:
"Map dataset
<id>to the Rosetta Stone graph-edge attribute (attribute_id: <id>). I need every column that contributes to SOURCE_ID, SOURCE_ID_TYPE, TARGET_ID, TARGET_ID_TYPE, IS_DIRECTED, and ATTRIBUTES — this will be anobject_mappingwith property_mappings, not avalue_mapping. Return the draft; do not apply it. I'm threading the draft into a workflow that applies the mapping viaCreateRosettaStoneMappingsIfNotExist."
Run that skill per-dataset, in parallel if more than one is unmapped. Wait for the user to approve each set of suggested mappings (or amend them).
Crucial: this skill no longer requires the mappings to be applied
before the workflow runs. The workflow itself owns application via
CreateRosettaStoneMappingsIfNotExist (idempotent — existing
identical mappings are conflict-skipped). What phase 5 collects is
the draft per dataset: the attributeId and the list of
propertyMappings (path + NQL expression for each contract column).
Record approved drafts as an in-memory list, keyed by dataset:
pending_mappings = [
{
dataset_id: "<id>",
attributeId: <graph-edge attribute ID from phase 4>,
propertyMappings: [
{ path: "SOURCE_ID", expression: "<NQL>" },
{ path: "SOURCE_ID_TYPE", expression: "<NQL>" },
{ path: "TARGET_ID", expression: "<NQL>" },
{ path: "TARGET_ID_TYPE", expression: "<NQL>" },
{ path: "IS_DIRECTED", expression: "<NQL>" },
{ path: "ATTRIBUTES", expression: "<NQL>" },
],
},
…
]
Datasets that were "ready" in phase 4 do not need to appear in
pending_mappings — CreateRosettaStoneMappingsIfNotExist is
idempotent, so it's harmless to include them, but it's also wasted
effort (the task would resolve to conflictMappings on every run).
Default: only include datasets that phase 4 flagged.
If the user declines to map a flagged dataset, drop it from both the
input list and pending_mappings. If every candidate dataset is
unmapped and the user declines to map any, stop and report —
there's nothing to build a graph from.
Ask, via AskUserQuestion:
acxiom.consumer_identity_v3,
liveramp.householding_edges_q1_2026).Third-party datasets show up in NQL as
<third_party_company>.<access_rule_name> (a different namespace from
first-party company_data."<id>"). Their schemas must already conform
to the graph-edge contract — you do not map them here; the
provider does. If the user names a third-party source whose schema
you can't verify, flag it as a global warning and add it anyway with
a TODO comment in the workflow YAML.
If the user is not sure what third-party data is available, point them at the data marketplace via the Narrative Platform UI — this skill does not browse the catalog.
/write-nqlCompose the CREATE MATERIALIZED VIEW statement that turns every
input edge source into one unioned view — this is the
createEdges.with.nql block in the workflow that phase 8 will hand
to /create-workflow.
Do not hand-write the DDL inline. Delegate to /write-nql,
which owns NQL drafting + server-side validation. Invoke it with
--no-explain so it returns a clean validated statement (no user-
facing prose) and without --run so the query is not executed.
Input (the free-text question passed to /write-nql):
Write a
CREATE MATERIALIZED VIEW "<edges-view-name>"statement with:
DISPLAY_NAME = '<display name from phase 1>'DESCRIPTION = '<one-sentence description from phase 1>'TAGS = ('<graph-kind>', 'identity-graph')WRITE_MODE = 'overwrite'The body should
SELECT DISTINCTthe six graph-edge contract columns (SOURCE_ID,SOURCE_ID_TYPE,TARGET_ID,TARGET_ID_TYPE,IS_DIRECTED,ATTRIBUTES) from each dataset using the Rosetta Stone graph-edge attribute access pattern, NOT the dataset's raw column names. Alias the FROM clause (use a short per-source slug) so the SELECT list doesn't have to repeat the full dataset path on every column. EachSELECTblock should follow this exact shape:SELECT DISTINCT <alias>._rosetta_stone.<graph_edge_attribute_name>.SOURCE_ID AS SOURCE_ID, <alias>._rosetta_stone.<graph_edge_attribute_name>.SOURCE_ID_TYPE AS SOURCE_ID_TYPE, <alias>._rosetta_stone.<graph_edge_attribute_name>.TARGET_ID AS TARGET_ID, <alias>._rosetta_stone.<graph_edge_attribute_name>.TARGET_ID_TYPE AS TARGET_ID_TYPE, <alias>._rosetta_stone.<graph_edge_attribute_name>.IS_DIRECTED AS IS_DIRECTED, <alias>._rosetta_stone.<graph_edge_attribute_name>.ATTRIBUTES AS ATTRIBUTES FROM <dataset_reference> AS <alias> [WHERE <audit filters>]Pick a 2–4 character alias per source that's mnemonic for the dataset (e.g.,
fpcforfirst_party_crm_events,aciforacxiom.consumer_identity_v3). Aliases must be unique within the statement.Use the graph-edge attribute name slug returned by
/find-attributein phase 4 (e.g.,graph_edge) — not the numeric attribute ID. UNION ALL every SELECT block in the order listed. Apply the listedWHERE-clause conditions to each dataset as given — they're pre-flight audit filters and must be preserved verbatim (combine multiple conditions withAND):Graph-edge attribute name (use verbatim in the
_rosetta_stone.<name>access path):<attribute name slug from phase 4>First-party datasets (use
company_data.<id>):
<first_party_dataset_id_1>filters:<expression>,<expression><first_party_dataset_id_2>filters: (none)- …
Third-party datasets (use
<provider>.<access_rule>):
<provider_1>.<access_rule_1>filters:<expression>- …
Validate the statement and return it. Don't run it.
Why the access pattern, not raw columns: each first-party
dataset is mapped to the graph-edge Rosetta Stone attribute as a
preceding workflow task (see phase 8). Querying through the
_rosetta_stone.<name> field gives the six contract columns
without coupling the workflow to native column names — different
datasets emit different native columns, but every mapped dataset
exposes the same graph-edge access path.
Third-party access rules are also queried through
_rosetta_stone.<name>. The provider is responsible for mapping
their access rule to the graph-edge attribute; the workflow does
not map them. If a third party's access rule does not expose the
graph-edge attribute, drop it from the input list — surface the
gap to the user before continuing.
When building the prompt, look up each dataset's entries in
audit_filters from phase 0. If a dataset has one or more approved
filters, list them under that dataset; if it has none, write
"filters: (none)" so /write-nql doesn't add anything it wasn't
told to add. Do not silently drop filters — every approved filter
must appear in the prompt.
Contract:
/write-nql: the prompt above with placeholders
filled from phases 0, 1, 3, 5, 6. Pass --no-explain only./write-nql: a single validated NQL string (the
full CREATE MATERIALIZED VIEW ... AS ... statement). Take the
string as-is — do not edit it before embedding.If /write-nql reports validation failure after its own internal
retries (a referenced dataset doesn't exist, a column is named
differently than the contract expects, an audit-filter expression
references a column the dataset doesn't have), surface the verbatim
error to the user, ask whether to drop the offending dataset / drop
the offending filter / remap, and re-invoke /write-nql with the
corrected input list. Do not hand an unvalidated DDL to phase 8.
Do not drop an audit filter without explicit user approval — the
user already approved each one in phase 0b.
Hold the returned NQL string as-is. Phase 8 will pass it through to
/create-workflow verbatim.
/create-workflow/create-workflow owns the workflow-platform mechanics: loading the
canonical identity-graph example, substituting every value this
skill collected, resolving the data plane, rendering the YAML for
user approval, submitting via narrative_workflows_create, and
(optionally) firing the first run. Do not render or submit the
workflow inside this skill.
Invoke /create-workflow with a structured prompt that names
example 11 explicitly and supplies every substitution. The shape:
/create-workflowBuild the identity-graph workflow fromassets/examples/11-identity-graph-multi-source-build.yaml. Substitute:
document.namespace:<kebab-case slug of the company name returned by narrative_context_get>
document.name:<graph-kind>-identity-graph(from phase 1 —person-identity-graph,household-identity-graph, etc.; append a qualifier if the user gave one, e.g.us-person-identity-graph)Per-dataset mapping tasks (one
CreateRosettaStoneMappingsIfNotExisttask per entry inpending_mappingsfrom phase 5, in the order the datasets appear in thecreateEdgesUNION). Use this shape, substituting the per-datasetpropertyMappings:- map<DatasetSlug>: call: CreateRosettaStoneMappingsIfNotExist with: datasetName: <dataset id or slug> allowPartial: true mappings: - attributeId: <graph-edge attribute ID from phase 4> mapping: type: object_mapping propertyMappings: - path: SOURCE_ID expression: <NQL from phase 5> - path: SOURCE_ID_TYPE expression: <NQL from phase 5> - path: TARGET_ID expression: <NQL from phase 5> - path: TARGET_ID_TYPE expression: <NQL from phase 5> - path: IS_DIRECTED expression: <NQL from phase 5> - path: ATTRIBUTES expression: <NQL from phase 5>Datasets that phase 4 reported as already-mapped do not need a task —
CreateRosettaStoneMappingsIfNotExistis idempotent, but re-emitting an existing mapping is wasted effort.Third-party access rules do NOT get mapping tasks — their schemas are the provider's contract.
The
createEdges.with.nqlblock: replace verbatim with this already-validated NQL string. Do not modify it.<full NQL string returned by /write-nql in phase 7>
labelComponents.with.edgeDataset:<edges-view-name>(the view created bycreateEdgesabove)
labelComponents.with.outputDataset:<graph-output-dataset-name>
labelComponents.with.firstPartySources:[<distinct **SOURCE_ID_TYPE** values emitted by the first-party datasets>]. Only SOURCE_ID_TYPE values belong here — TARGET_ID_TYPE bridge keys (sha256_email,maid,household_id, etc.) must not appear in either list. Discover the values empirically: ask for column statistics on the edges materialized view, or have/write-nql --runexecuteSELECT DISTINCT SOURCE_ID_TYPE FROM <edges_view>(split by contributing dataset) once the view exists. On a first build where the view doesn't exist yet, derive the candidate values from the mapping expressions in phase 5 (the literal eachSOURCE_ID_TYPEpropertyMappingemits) and ask the user to confirm; never invent values.
labelComponents.with.thirdPartySources:[<distinct **SOURCE_ID_TYPE** values emitted by the third-party access rules; empty array if none>]. Same discovery rule as above — query the data, never the bridge-key types.
labelComponents.with.maxDegreeThreshold:100(default)
labelComponents.with.maxComponentSize:100(default — surface the default in your approval summary so the user can override for B2B / household graphs)
labelComponents.with.maxIterations:25(default)
Pass any user-requested execution flags through the same invocation
— --trigger if the user asked for an immediate run, --data-plane <id> if they already named a plane, --schedule if they want the
cron activated on create (only valid if the user explicitly asked
for a schedule, which this skill does not add by default — the
example has no schedule: block).
If the user did not name a plane, do not invent one here;
/create-workflow will ask. Same for trigger / schedule — let
/create-workflow own those gates.
/create-workflow then runs end-to-end:
narrative_workflows_create.When /create-workflow returns, take its result — workflow ID,
data-plane ID, status, optional run ID — and pass it into "Final
summary format" below, where you wrap it with the identity-graph
context (input datasets, identifier types, output graph dataset)
that /create-workflow does not know about.
Do not retry /create-workflow blindly on submission failure. If
it returns a validator error, surface the verbatim error to the
user, decide together what to fix (a misnamed identifier type, a
wrong-plane dataset, a non-default tuning knob the user wants), and
re-invoke /create-workflow with the corrected substitutions.
When phase 8 completes, return a single summary message that wraps
/create-workflow's return values with the identity-graph context
this skill collected (plain text, not JSON — this skill is a
workflow-builder, not a structured-payload emitter like the mappings
skill):
Submitted <graph kind> identity graph workflow.
Workflow: <id from /create-workflow>
Data plane: <id from /create-workflow>
Status: <status from /create-workflow>
Schedule: <none | cron expression>
Inputs:
• <dataset_1_name> (<dataset_1_id>) — first-party — mapped ✓ / mapped this run
• <dataset_2_name> (<dataset_2_id>) — first-party — mapped this run
• <provider>.<access_rule> — third-party
Identifier types:
first-party: [<list>]
third-party: [<list>]
Output graph dataset: <name>
Next: <if /create-workflow triggered an immediate run, surface the
run_id and tell the user to poll with narrative_workflow_runs_list>
<else if a schedule was activated, surface the next cron firing
time in UTC>
<else, tell the user the workflow is registered and can be
triggered manually via narrative_workflows_trigger>
If the user opted to spot-check edges before the graph job runs, the
materialized view can be created ahead of workflow submission by
re-invoking /write-nql --run with the same DDL string returned in
phase 7. Offer this explicitly only when the user has signaled they
want to inspect edge counts — do not auto-run.
User wants to resolve people across two or more first-party CRM /
event datasets, typically keyed on sha256_email and maid (those
are TARGET_ID_TYPE bridge keys, not source list entries). Run
phases 1-8 in order. Expect firstPartySources to be the distinct
SOURCE_ID_TYPE values of the user's first-party systems (e.g.
first_party_crm, first_party_loyalty); thirdPartySources to
be empty unless the user explicitly named providers, in which case
it's the providers' SOURCE_ID_TYPE values (e.g. acxiom,
experian).
Same shape as a person graph, plus one dataset (often a third-party
householding edge source) that produces edges with
TARGET_ID_TYPE = 'household_id' or 'household_address'. The UNION
gains one or two more SELECT blocks. If that new dataset emits a
new SOURCE_ID_TYPE (the household provider's system name), append
it to thirdPartySources (or firstPartySources if the user owns
the household source). Do not add household_id or
household_address to either list — those are TARGET_ID_TYPE
bridge keys, not source systems. Output dataset name defaults to
household_identity_graph.
Inputs are device-side datasets (MAID, IDFA, GAID, cookies, CTV IDs).
Often no first-party data — entirely third-party (a device-graph
provider's access rule). If so, phases 3-5 collapse to a single
question: "Which provider's device graph?". Phase 7 emits a workflow
whose UNION is a single SELECT ... FROM <provider>.<access_rule>.
Primary identifiers are domain and company_id; sometimes
employee_email. Treat the same as a person graph, but warn the user
in phase 8 that maxComponentSize: 100 may need to be raised — B2B
graphs frequently have legitimate large clusters (every employee of
a Fortune 500 connects through one domain).
User points at an existing identity-graph workflow and asks to
"refresh" or "rebuild". Pull the existing workflow's input list,
re-validate each dataset's mapping status (phase 4), and surface
which sources have changed. Append a version suffix (_v2,
_v3, …) rather than overwriting the existing output dataset —
downstream consumers may be pinned to it.
See references/EDGE_CASES.md — covers
the fixed edge-contract schema, identifier-type casing, directed /
undirected mixing, third-party schemas, tuning-knob defaults
(maxComponentSize / maxDegreeThreshold / maxIterations),
materialized-view name collisions, write-safety, and empty-UNION
detection. Read when something feels off or the user is asking
about tuning.
Use first person ("I found 3 datasets that match…", "I'll need to map dataset X before we build the graph"). Conversational, not formal. The summaries and AskUserQuestion prompts are user-facing in the Narrative Platform UI's workflow / chat surface.
See
references/HARNESS_FALLBACK.md —
covers narrative-mcp unavailable (paste-driven flow,
hand-authored DDL, what to tell the user when /create-workflow
also can't submit), narrative-knowledge-base unavailable (the
mild case), partial degradation per MCP tool, and the
AskUserQuestion fallback for harnesses that don't expose it. Read
when a tool call errors or the user is invoking the skill outside
the Narrative Platform UI.
references/EDGE_CASES.md — gotchas and tuning notes: the fixed
edge-contract schema, identifier-type casing, directed/undirected
mixing, maxComponentSize / maxDegreeThreshold / maxIterations
defaults, materialized-view naming, and write-safety rules. Read
when something feels off or the user is asking about tuning knobs.references/HARNESS_FALLBACK.md — what to do when narrative-mcp
or narrative-knowledge-base is unavailable. Covers full and
partial degradation, and the per-phase substitutions for a
paste-driven flow. Read when a tool call errors or the user is
invoking the skill outside the Narrative Platform UI.../triage-pregraph-data/SKILL.md — the pre-graph data-quality
audit this one hands off to in phase 0 when the user opts into a
pre-flight audit. Produces filter expressions per dataset; this
skill captures the approved ones and threads them into phase 7's
/write-nql prompt as WHERE-clause conditions on the
corresponding SELECT blocks.../../../narrative-common/skills/find-attribute/SKILL.md — the
attribute-lookup skill this one defers to in phase 4
(/find-attribute, lives in the narrative-common plugin) to
resolve the canonical graph-edge attribute ID. Invoked with
--phrase, --shape, and --no-confirm.../../../narrative-common/skills/generate-rosetta-stone-mappings/SKILL.md —
the mapping skill this one defers to in phase 5
(/generate-rosetta-stone-mappings, lives in the
narrative-common plugin).../../../narrative-common/skills/write-nql/SKILL.md — the NQL
drafting + validation skill this one defers to in phase 7
(/write-nql, lives in the narrative-common plugin). Invoked
with --no-explain and without --run so it returns a validated
CREATE MATERIALIZED VIEW statement without executing it.../../../narrative-common/skills/create-workflow/SKILL.md — the
workflow composition + submission skill this one defers to in
phase 8 (/create-workflow, lives in the narrative-common
plugin). The identity-graph workflow shape lives in that skill's
assets/examples/11-identity-graph-multi-source-build.yaml;
phase 8 names that example explicitly in the handoff prompt.../../../narrative-common/skills/create-workflow/assets/examples/11-identity-graph-multi-source-build.yaml —
the canonical identity-graph workflow example. Read it to see the
full shape the workflow will land in, including the per-source
SELECT blocks and the LabelConnectedComponents defaults.../../../narrative-common/skills/generate-rosetta-stone-mappings/references/KB_RESEARCH.md —
how to query the narrative-knowledge-base MCP server for
identity-graph and LabelConnectedComponents docs when the local
references aren't enough.Silence is success. Do not call submit_feedback when the skill
worked. No "great skill!" submissions, no positive feedback, no "FYI"
notes.
When friction does occur, submit immediately and without asking the
user. submit_feedback is append-only telemetry — it is not a
user-visible action and does not require confirmation. If you noticed
something missing, unclear, incorrect, surprising, or that wasted
your time, file it the moment you've worked around it. Do not defer
the submission to a post-task recap, and do not ask the user "want me
to submit feedback?" — that's the wrong default for this tool.
One submission per distinct friction point. Submit liberally.
Fields that matter most:
skill_name: narrative-identity:generate-identity-graph (use this verbatim).severity: info (nit) | friction (slowed you down) |
blocker (stopped you).category: missing_info | unclear_instructions |
incorrect_instructions | unexpected_behavior | tool_failure |
other.summary: one concrete line — what went wrong, not how you felt.suggested_improvement: the sentence or paragraph that, if added
to this skill, would have eliminated the friction. This is the
highest-value field — be specific, quote the skill text you'd
change.Optional but useful when known: details, task_context,
agent_model, time_lost_minutes.
Guides creation, editing, and verification of skills for AI coding agents using test-driven development with subagent scenarios. Use when authoring or debugging skills.
npx claudepluginhub narrative-io/narrative-skills-marketplace --plugin narrative-identity