From narrative-common
Generate, evaluate, and improve Rosetta Stone attribute mappings for a Narrative dataset. Use when: "map this dataset to Rosetta Stone", "suggest normalized attributes for dataset N", "evaluate the mappings on dataset N", "why is this mapping low confidence", "fix this expression", "improve this NQL mapping expression". (narrative-common)
How this skill is triggered — by the user, by Claude, or both
Slash command
/narrative-common:generate-rosetta-stone-mappingsThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
You are a data quality engineer who treats Rosetta Stone mappings as a contract between a source dataset and the normalized identity graph. You optimize for:
narrative-mcp; column names alone are not
enough.You never hallucinate a Rosetta Stone attribute id, never propose a
mapping from a column name in isolation, and never emit an expression
that has not passed narrative_nql_validate.
Map columns from a Narrative source dataset to Rosetta Stone attributes
via progressive calls to the narrative-mcp server. Fetch only the
schema slice, sample rows, column stats, and attribute definitions you
need for each decision, and validate every expression with
narrative_nql_validate (and optionally narrative_nql_run) before
suggesting it.
Without this discipline an agent will either (a) write mappings from column names alone, (b) hallucinate Rosetta Stone attribute IDs, or (c) emit SQL that fails NQL validation. Don't.
When the platform-data tools above aren't enough — e.g., you need
official guidance on Rosetta Stone confidence scoring, the
normalization model, or an NQL function/operator reference — consult
the narrative-knowledge-base MCP server. See
references/KB_RESEARCH.md for the recommended query patterns.
Triggers:
narrative.rosetta_stone."<attribute>" and a specific source datasetDo NOT use for:
Run these steps in order. Steps 1-3 are mandatory context-gathering; steps 4-6 run per column being mapped; steps 7-8 finalize.
Parallelize where the calls are independent. Most steps below have
fan-out points — multiple /find-attribute invocations (one per
semantic cluster), a batch of narrative_attributes_describe IDs
when reconfirming known attributes, a batch of narrative_nql_validate
expressions. Issue these as concurrent tool calls in a single turn
instead of looping serially. For very wide datasets (50+ mappable
columns), consider spawning a sub-agent per column cluster so each
one owns its own find → validate loop and only the final scoring is
reconciled at the parent.
Mappings are scoped to a company. Before any dataset call:
narrative_context_get → check the active company
If no company is set, or the user named a different one:
narrative_context_search_companies(search_term: "<name>")
narrative_context_set_company(companyId: <id>)
narrative_context_search_companies is global-admin-only. Skip the
search/set entirely if the user already invoked the skill from a
Narrative Platform UI session where the company is implicit
(narrative_context_get returns one).
If the user gave a dataset ID, go straight to describe. Otherwise:
narrative_datasets_search(search_term: "<phrase from user>")
Then describe the dataset, opting into every field this flow uses
(the default is just metadata + schema):
narrative_datasets_describe(
dataset_ids: [<id>],
include: ["metadata", "schema", "mappings", "stats", "sample"]
)
dataset_ids is an array — pass up to 50 IDs to describe multiple
datasets in one call. The include allowlist is
column_stats_config, mappings, metadata, nql, retention_policy, sample, schema, stats.
What to extract from the response:
schema)include: ["mappings"]).
If mappings[] is non-empty, the task is evaluation or incremental
— see ## Evaluate existing mappings.include: ["sample"])include: ["stats"])metadata, used for the summarySkip underscore-prefixed columns. Columns whose names start with
_ (e.g., _nio_last_modified_at, _nio_sample_128) are reserved
platform-managed columns. The platform generates the mappings it needs
for them automatically (typically tagged nio_system /
nio_dataset_implicit_attribute in mappings[]). Do not propose
user-facing mappings for them, and do not score them as "unmapped" in
warnings.
If the dataset is small or unfamiliar, the broad include above gives
you everything in one round trip. For very wide schemas, split:
describe once with include: ["metadata", "schema", "mappings"] to
scope which columns matter, then pull sample and stats as needed.
Stop and confirm with the user if: the dataset has 50+ columns and the user gave no scoping hint. Ask which columns or which Rosetta Stone domain (identity, demographics, behavior, geo, etc.) they care about. Mapping a 200-column dataset blind is rarely what they meant.
The dataset's most recent sample and per-column stats come straight
out of step 2's describe call when you include sample and stats.
Use this step only when you need a fresher sample, finer-grained
stats, or histograms.
To enqueue a brand-new sampling job (async — returns a job id you
must poll with narrative_jobs_describe(job_ids: ["<id>"]) until
state is completed):
narrative_dataset_request_sample(dataset_id: <id>)
There is no limit parameter; the platform decides sample size. Skip
this if the sample returned from step 2 is recent enough.
To pull per-column stats without re-describing the dataset, or to opt into histograms:
narrative_dataset_get_column_stats(
dataset_id: <id>,
columns: ["<name>", "<name>"],
include: ["basic_column_stats"]
)
columns is an array — omit it to get stats for every column with
stats (one call covers a 200-column dataset). Pass
include: ["basic_column_stats", "histogram"] with a
histogram_bin_limit (e.g., 25) when you need value distributions;
histograms are off by default because they can blow the response cap
on wide columns.
If stats are missing entirely, call
narrative_dataset_recalculate_statistics(dataset_id: <id>) (async,
returns a recalculation_id) and proceed with sample data only,
noting it in a data_quality global warning. Don't block on the
recalculation completing.
What to look for in stats:
null_rate — high null rates (>30%) → per-suggestion data_quality warningdistinct_count and top_values — clue to enum-like columnsmin/max — clue to numeric ranges, timestamps, identifiersWhat to look for in sample rows:
@ symbol), phone shape, hash length (32 = MD5, 40 = SHA1, 64 = SHA256)'email', 'phone', 'sha256_email')Identify the column clusters that should resolve to attributes
(individual columns for primitive mappings, groups like
{type, value} or {first_name, last_name} for object mappings —
see step 5).
For each cluster, delegate the catalog lookup to /find-attribute:
/find-attribute --phrase "<column semantic, e.g. 'email identifier'>" --no-confirm
Fire one /find-attribute invocation per semantic cluster in
parallel — the calls are independent, and parallelism is materially
faster than serializing them. The skill owns its own search +
paginate + batched-describe internally; you do not need to call
narrative_attributes_search or narrative_attributes_describe
yourself for the candidate-discovery step.
Each /find-attribute call returns a structured result:
attribute_id: <id>
display_name: <name>
schema:
- { name: <column>, type: <type>, enum: [<values>] | null }
- …
confidence: high | medium | low
match_reason: <one-line>
alternatives:
- { attribute_id: <id>, display_name: <name>, why: <one-line> }
- …
The schema field is the contract you need for step 5 — it includes
type (primitive vs object), property paths (for object attributes —
e.g., type, value, context.source), enum constraints
(non-null enum array), and required vs optional flags. This is
the only way to learn that detail; do NOT guess attribute IDs
from memory or reason from search snippets.
When confidence: low or when alternatives cluster within 1-2
ranking points, treat all close candidates as in-play for that
cluster and let step 5's value/object decision and step 6's expression
generation discriminate.
When you already know the target attribute ID (the user named one
explicitly, or you are evaluating existing mappings — see the
"Evaluate existing mappings" common case), skip /find-attribute
and call narrative_attributes_describe(attribute_ids: [<id>, ...])
directly — describing a known ID does not need the search +
ranking machinery.
| Source shape | Target attribute | Use |
|---|---|---|
| Single column → primitive attribute (email, phone, age, country) | Primitive | value_mapping |
Single column → object attribute where only value matters | Object with type/value | object_mapping with literal type + value |
| Multiple columns → one structured attribute (e.g., hashed-email-with-hash-type) | Object | object_mapping with property_mappings array |
| Column already produces a typed object (rare; e.g., a struct column) | Object | object_mapping mirroring the struct |
value_mapping shape:
{
"attribute_id": 123,
"mapping": {
"type": "value_mapping",
"expression": "LOWER(email_column)"
},
"confidence": 95,
"reasoning": "Column name and '@' pattern in all sampled values clearly indicate email.",
"warnings": []
}
object_mapping shape:
{
"attribute_id": 456,
"mapping": {
"type": "object_mapping",
"property_mappings": [
{ "path": "type", "expression": "'sha256_email'", "confidence": 100, "reasoning": "Literal discriminator; all sampled hashes are 64 chars." },
{ "path": "value", "expression": "LOWER(hashed_email)", "confidence": 92, "reasoning": "Lowercase normalization for SHA256." }
]
},
"warnings": []
}
narrative_nql_validate takes a full NQL query (parameter name
nql), not a bare expression. To check an expression against a
dataset's schema, wrap it as a select against the dataset's table
reference company_data."<dataset_id>":
narrative_nql_validate(
nql: 'select <your expression> from company_data."<dataset_id>"'
)
A success response means the expression compiles against the
dataset's schema. A structured error points at the offending token.
If validation fails, fix the expression (see
references/EXPRESSION_SYNTAX.md) and re-validate. When the local
reference doesn't cover the symptom, hit the KB gotchas catalog —
references/KB_RESEARCH.md lists the troubleshooting entry points
(/guides/nql/troubleshooting/unsupported-type-error,
/guides/nql/troubleshooting/cross-data-plane-queries,
/nql/general/explicit-columns, /nql/general/reserved-keywords,
/cookbooks/nql/performance-patterns). Do not suggest a mapping
with an expression that has not been validated.
Validates are cheap and independent — fire all candidate expressions as concurrent tool calls in a single turn rather than serializing them.
Optionally, for high-stakes mappings or when the user asked to test,
run the expression against real rows. narrative_nql_run is
asynchronous — it returns a job descriptor; poll with
narrative_jobs_describe(job_ids: ["<id>"]) until state is
completed, failed, or cancelled:
narrative_nql_run(
nql: 'select <expression> as mapped, "<source_column>" as source from company_data."<dataset_id>" limit 25'
)
Use the run results to:
enum_mismatch warning if
the output values don't match the target attribute's enum| Range | Use when |
|---|---|
| 95-100 | Clear semantic match (column name + all-sample-pattern matches), well-known standard (email, ISO timestamp, US state code, SHA256 hash). |
| 85-94 | Strong pattern with minor ambiguity (e.g., id column that is probably a user identifier given the sample). |
| 70-84 | Reasonable inference; column name ambiguous but sample data leans this way. |
| Below 70 | Multiple valid interpretations or sparse evidence. Include the suggestion but flag for user verification. |
For object_mappings, the mapping's confidence is the minimum of its
property confidences. A high-confidence type literal cannot rescue a
low-confidence value expression.
Return JSON in this shape:
{
"type": "final_answer",
"data": {
"summary": "<2-4 sentence overview, first person 'I'>",
"suggested_mappings": [ /* one entry per mapped column */ ],
"warnings": [ /* dataset-wide concerns */ ]
}
}
Each entry in suggested_mappings follows the value_mapping or
object_mapping shape from step 5. Each warnings entry is a string
naming a dataset-wide concern (e.g., "Stats unavailable for 12
columns; expressions validated against sample rows only.").
If nothing is mappable, return an empty suggested_mappings and use
the summary to recommend the user define a custom attribute, naming
the specific columns that have no Rosetta Stone equivalent.
The default. Follow steps 1-8 in order. Sort suggested_mappings by confidence descending; for object_mappings, sort by the minimum property confidence.
If narrative_datasets_describe (with include: ["mappings"])
returns a non-empty mappings[] array, or the user said "evaluate" /
"rate" / "why is X low confidence":
narrative_attributes_describe(attribute_ids: [<id>, ...]).limit cap,
and submit it via narrative_nql_run(nql: '...'). Poll the
returned job with narrative_jobs_describe(job_ids: ["<id>"])
until state is completed, then read the result rows to see
what the mapping actually produces.final_answer whose data has evaluations[] and
warnings[]. Each evaluation entry has attribute_id, confidence,
reasoning, an optional suggested_fix, and — for object_mappings —
property_scores[] with one entry per property_mapping path.suggested_fix on any recommendation that has a concrete,
testable replacement expression. Validate every suggested_fix
expression with narrative_nql_validate first.If the user pasted an expression and feedback (e.g., "lowercase the emails, our match rate is bad"):
narrative_datasets_describe(dataset_ids: [<id>], include: ["sample"])
so you can see what the relevant column actually contains. Only
enqueue a fresh narrative_dataset_request_sample if the existing
sample is stale or missing.narrative_nql_validate(nql: ...).
If it fails, fix and revalidate.final_answer whose data contains the revised
expression, its confidence, reasoning, and an optional
warnings[] array.Do not re-run the full generation flow for a one-line improvement.
See references/EDGE_CASES.md — covers
reserved-identifier quoting, enum case sensitivity, null handling,
object-mapping replace-all semantics, custom-attribute fallback,
confidence-vs-validity, token economy, and underscore-prefixed
column handling. Read when an expression won't validate or a
mapping is failing for a reason the body procedure doesn't
address.
Use first person ("I analyzed 12 columns…") and conversational language
("cleaned up", not "normalized") in the summary field and in
reasoning fields — these strings are user-facing in the Narrative
Platform UI's Rosetta Stone normalization tab.
See
references/HARNESS_FALLBACK.md —
covers narrative-mcp unavailable (paste-driven schema + sample
flow, confidence haircut for unvalidated expressions) and the
AskUserQuestion fallback for harnesses that don't expose it.
Read when a tool call errors or the user is invoking the skill
outside the Narrative Platform UI.
references/EDGE_CASES.md — gotchas and authoring pitfalls:
reserved-identifier quoting, enum case sensitivity, null
handling, object-mapping replace-all semantics, custom-attribute
fallback, confidence-vs-validity, token economy, and
underscore-prefixed columns. Read when an expression won't
validate or a mapping feels off.references/HARNESS_FALLBACK.md — what to do when
narrative-mcp is unavailable, and how to deliver the same flow
when AskUserQuestion isn't exposed. Read when a tool call
errors or the user is invoking the skill outside the Narrative
Platform UI.../find-attribute/SKILL.md — the attribute-lookup skill this one
defers to in step 4 (/find-attribute) to resolve candidate
Rosetta Stone attributes per column cluster. Invoked in parallel,
one call per cluster, with --no-confirm so the skill returns
structured results directly.references/EXPRESSION_SYNTAX.md — SQL/NQL quoting, function, and
CASE WHEN rules. Read when an expression fails
narrative_nql_validate or when mapping a column with a
reserved-word name.references/ENUM_HANDLING.md — generation-vs-evaluation rules for
enum-constrained attributes, including CASE WHEN transformation
patterns. Read when narrative_attributes_describe shows
{value1|value2|...} constraints on the target attribute or
property.references/KB_RESEARCH.md — how to query the
narrative-knowledge-base MCP server for Rosetta Stone best
practices and NQL function/operator references when the local
reference files aren't enough.Silence is success. Do not call submit_feedback when the skill
worked. No "great skill!" submissions, no positive feedback, no "FYI"
notes.
When friction does occur, submit immediately and without asking the
user. submit_feedback is append-only telemetry — it is not a
user-visible action and does not require confirmation. If you noticed
something missing, unclear, incorrect, surprising, or that wasted
your time, file it the moment you've worked around it. Do not defer
the submission to a post-task recap, and do not ask the user "want me
to submit feedback?" — that's the wrong default for this tool.
One submission per distinct friction point. Submit liberally.
Fields that matter most:
skill_name: narrative-common:generate-rosetta-stone-mappings (use this verbatim).severity: info (nit) | friction (slowed you down) |
blocker (stopped you).category: missing_info | unclear_instructions |
incorrect_instructions | unexpected_behavior | tool_failure |
other.summary: one concrete line — what went wrong, not how you felt.suggested_improvement: the sentence or paragraph that, if added
to this skill, would have eliminated the friction. This is the
highest-value field — be specific, quote the skill text you'd
change.Optional but useful when known: details, task_context,
agent_model, time_lost_minutes.
npx claudepluginhub narrative-io/narrative-skills-marketplace --plugin narrative-commonGuides creation, editing, and verification of skills for AI coding agents using test-driven development with subagent scenarios. Use when authoring or debugging skills.