Skill

spec

Author or validate a DiscoveryConfig dataspec — the YAML describing a target table (columns, grain, business rules, load strategy) consumed by toolkit agent discovery. Use when the user wants to define a new pipeline target, has a dataspec to check, or asks what a dataspec needs to contain.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/toolkit-pipeline:spec

User invocable

Model invocable

Stats

Language: Shell

Parent stars: 1

Maintenance: Good

Last commit: Jun 11, 2026

Author a dataspec

A dataspec is one YAML file per target table. Full field reference: the plugin's references/spec-schema.md (two levels up from this skill's directory); complete examples per tooling under references/examples/. Read the schema reference before writing anything.

If the user already has a YAML

Read it and validate against the schema reference:

- content present and substantive (target table + columns + grain + rules)?
- tooling/materialization combination legal (per-tooling enums)?
- loadStrategy keys spelled right (type, change_detection, watermark_column, ...)?
- targetPlatform consistent with tooling (pyspark → databricks unless they know better)?
- no hand-written targetRequirement/resolvedTransformation (producer-only fields)?

Report problems with concrete fixes. Content gaps, even small ones such as a missing grain or vague rules, matter more than formatting, because discovery quality tracks spec quality.
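The checklist above can be sketched as a small checker over an already-parsed spec. This is a minimal illustration, not the plugin's validator: it assumes the spec has been loaded into a Python dict, that `content`, `tooling`, `targetPlatform`, and `loadStrategy` are top-level keys, and that the enum values mentioned in this skill are the only ones. The function name `check_spec` is hypothetical, and references/spec-schema.md remains the authority.

```python
from __future__ import annotations

from difflib import get_close_matches

# Values taken from the prose of this skill; the authoritative enums live in
# references/spec-schema.md and may differ (assumption).
KNOWN_TOOLING = {"sql", "dbt", "pyspark"}
# Only the keys named in the skill text; the real list is longer.
KNOWN_LOAD_KEYS = ["type", "change_detection", "watermark_column"]
PRODUCER_ONLY = ("targetRequirement", "resolvedTransformation")


def check_spec(spec: dict) -> list[str]:
    """Return human-readable problems found in a parsed dataspec (hypothetical helper)."""
    problems = []

    # 1. content must be present and non-empty.
    if not spec.get("content"):
        problems.append("content is missing or empty")

    # 2. tooling must be one of the values this skill mentions.
    tooling = spec.get("tooling")
    if tooling not in KNOWN_TOOLING:
        problems.append(f"tooling {tooling!r} is not one of {sorted(KNOWN_TOOLING)}")

    # 3. pyspark normally targets databricks; flag anything else for confirmation.
    platform = spec.get("targetPlatform")
    if tooling == "pyspark" and platform not in (None, "databricks"):
        problems.append(f"pyspark with targetPlatform {platform!r}: confirm this is intended")

    # 4. Flag loadStrategy keys that look like misspellings of known keys.
    #    Unknown keys that resemble nothing are left alone, since the key
    #    list here is partial.
    for key in spec.get("loadStrategy", {}):
        if key in KNOWN_LOAD_KEYS:
            continue
        match = get_close_matches(key, KNOWN_LOAD_KEYS, n=1, cutoff=0.8)
        if match:
            problems.append(f"loadStrategy.{key}: did you mean {match[0]!r}?")

    # 5. Producer-only fields must not be hand-written.
    for field in PRODUCER_ONLY:
        if field in spec:
            problems.append(f"{field} is producer-only; remove it")

    return problems
```

Each returned string pairs a problem with a suggested fix, which matches the "report problems with concrete fixes" guidance. The materialization/tooling legality check is left out because the per-tooling enums are defined only in the schema reference.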

If authoring from scratch — interview, then write

Gather, in order (skip what the user already said):

1. Tooling: sql, dbt, or pyspark (what should pipeline-build emit?).
2. Target platform: snowflake or databricks (pyspark defaults to databricks).
3. Target table: name plus database/schema.
4. Content, the heart of the spec:
   - If the user has DDL for the target, paste it in and set format: ddl.
   - Otherwise set format: natural_language and describe the columns (name, source or "derived from X", nullability), the grain ("one row per ..."), and the business rules (derivations, filters, SCD expectations). Push for the grain and rules, because they drive the generated tests.
5. Materialization + load strategy: offer defaults:
   - dimensions: materialization: merge + loadStrategy.type: incremental_merge with change_detection: watermark
   - append-only facts: incremental_append
   - small/reference tables: full_refresh/ctas
6. Context: where the source data lives (database/schema/tables, source system names). This steers discovery's search. If the datasource has a filters block in toolkit.conf, that block bounds what discovery can see at all.
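The materialization and load-strategy defaults can be kept as a small lookup so the same starting point is offered every time. The sketch below is illustrative only: the table-kind labels (`dimension`, `append_only_fact`, `reference`) and the helper name `suggest_defaults` are hypothetical, and reading "full_refresh/ctas" as load type plus materialization is an assumption to confirm against references/spec-schema.md.

```python
import copy

# Starting-point defaults as this skill describes them. Confirm each value
# against references/spec-schema.md before writing it into a spec.
DEFAULTS = {
    "dimension": {
        "materialization": "merge",
        "loadStrategy": {
            "type": "incremental_merge",
            "change_detection": "watermark",
        },
    },
    # The skill names only "incremental_append" for append-only facts; it is
    # treated here as the loadStrategy type (assumption), with no
    # materialization set.
    "append_only_fact": {
        "loadStrategy": {"type": "incremental_append"},
    },
    # "full_refresh/ctas" is read as load type / materialization (assumption).
    "reference": {
        "materialization": "ctas",
        "loadStrategy": {"type": "full_refresh"},
    },
}


def suggest_defaults(table_kind: str) -> dict:
    """Return a fresh copy of the default settings for a table kind."""
    if table_kind not in DEFAULTS:
        raise ValueError(
            f"unknown table kind {table_kind!r}; expected one of {sorted(DEFAULTS)}"
        )
    # Deep copy so callers can edit the result without changing the defaults.
    return copy.deepcopy(DEFAULTS[table_kind])
```

The returned dict is a suggestion to show the user, who may override any value before it goes into the spec.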

Write the file as specs/<target_table>.yaml in the working directory (one spec per target table), show it to the user, and point at /toolkit-pipeline:discover as the next step.
