From toolkit-pipeline
Run toolkit agent discovery to resolve a dataspec against a live datasource into a reviewable data contract, then walk the human-review resolve loop until the contract is approved. Use when the user has (or needs) a dataspec and wants source-to-target mappings, or mentions data contracts, discovery, or humanReviewItems.
Slash command: /toolkit-pipeline:discover
toolkit agent discovery reads a dataspec, scans/profiles the live datasource, proposes
source-to-target column mappings with confidence scores, validates them against real data, and
emits a data contract. Items it isn't confident about are flagged for human review — the loop
below resolves them.
toolkit-check || exit
On failure, surface the hint: line and stop (remediation: /toolkit-core:setup or /toolkit-core:connect).
If it prints a note: about the project directory, run every toolkit command below from that
directory (or export TOOLKIT_PROJECT_HOME) — the toolkit doesn't search parent directories.
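A minimal bash sketch of this preflight. It assumes toolkit-check exits non-zero on failure (as the || exit above implies), prints its remediation on a line containing hint:, and prints any project-directory notice on a line containing note:. The wrapper name toolkit_preflight is hypothetical.

```bash
# Hypothetical wrapper around toolkit-check (the function name is illustrative).
# Assumes: non-zero exit on failure, remediation on a line containing "hint:",
# and any project-directory notice on a line containing "note:".
toolkit_preflight() {
  local out status
  out=$(toolkit-check 2>&1)
  status=$?
  if [ "$status" -ne 0 ]; then
    # Surface only the hint line(s) on stderr and propagate the failure status.
    printf '%s\n' "$out" | grep 'hint:' >&2
    return "$status"
  fi
  # Echo any note: line so the caller can cd there or export TOOLKIT_PROJECT_HOME.
  printf '%s\n' "$out" | grep 'note:' || true
  return 0
}
```

Call it as toolkit_preflight || exit in place of the bare check.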
Additional prerequisites beyond toolkit-check:
- … targetPlatform. If the user's datasource is another type, stop here and explain.
- Without an llmClient block in toolkit.conf, the toolkit falls back to Amazon Bedrock via the phData auth flow (works out of the box for phData consultants). Others configure a provider with /toolkit-core:llm (Bedrock/OpenAI/Anthropic).
- toolkit agent * is license-gated. An authorization error means the user's token doesn't include agent access; that's a licensing conversation, not a config bug.
- … filters block in toolkit.conf to keep that scoped (see /toolkit-core:connect).
Ask for the dataspec YAML path(s). If the user doesn't have one, author it with them. The interview and field reference live in the plugin's references/spec-schema.md and references/examples/ (two levels up from this skill's directory; same flow as /toolkit-pipeline:spec). The interview covers tooling, target platform, target table, content (columns/grain/business rules), materialization + load strategy, and source context. One spec per target table, saved as specs/<target_table>.yaml.
One output subdirectory per spec — discovery always writes <output-dir>/data-contract.json
(the spec name does not change the filename), so shared dirs clobber each other:
toolkit agent discovery <datasource> specs/<name>.yaml --output ./discovery-out/<name>
For multiple specs, loop:
for f in specs/*.yaml; do
  name=$(basename "$f" .yaml)
  toolkit agent discovery <datasource> "$f" --output "./discovery-out/$name"
done
Read discovery-out/<name>/discovery-report.txt. Walk the user through:
- x-marked rows
- [LOW CONFIDENCE] mappings
- [COMPLEX TRANSFORM] business logic
- [ASSERTION FAILED] example-record mismatches
- ambiguous sources
Capture the user's decision on each item in their own words ("the age-band logic is correct, keep it"; "map channel_code from DIM_CHANNEL.CODE instead").
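For a first pass over the report, the tagged items can be pulled out with grep. This sketch assumes the [LOW CONFIDENCE], [COMPLEX TRANSFORM], and [ASSERTION FAILED] tags appear verbatim in discovery-report.txt; x-marked rows and ambiguous sources still need a manual read. Both helper names are hypothetical.

```bash
# Hypothetical helper: print each tagged line (with its line number) from a discovery report.
# Assumes the tags appear verbatim in discovery-report.txt.
list_flagged() {
  local report="$1"
  grep -nE '\[(LOW CONFIDENCE|COMPLEX TRANSFORM|ASSERTION FAILED)\]' "$report"
}

# Hypothetical helper: tab-separated count of lines carrying each tag.
count_flagged() {
  local report="$1" tag
  for tag in 'LOW CONFIDENCE' 'COMPLEX TRANSFORM' 'ASSERTION FAILED'; do
    # -F matches the bracketed tag as a fixed string; -c prints 0 when absent.
    printf '%s\t%s\n' "$tag" "$(grep -cF "[$tag]" "$report")"
  done
}
```

For example, list_flagged discovery-out/<name>/discovery-report.txt prints each tagged line with its line number, and count_flagged gives a per-tag tally to open the conversation with.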
Edit discovery-out/<name>/data-contract.json: find each item under
sourceToTarget.humanReviewItems[] and write the user's decision into its comment field
(leave items the user has no opinion on untouched). Then:
toolkit agent discovery <datasource> specs/<name>.yaml --resolve ./discovery-out/<name>/data-contract.json
The positional datasource/spec arguments are required by the CLI but ignored on the resolve
path — pass the same ones. The resolver applies each comment via the LLM, drops resolved items
from humanReviewItems, sets approvedByHuman: true when none remain, and rewrites both the
contract and the report in place. Tooling, target platform, and target columns are structurally
locked — comments can't change the target schema. Contracts without comments pass through
unchanged, so sweeping a whole directory is safe.
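Writing the comments and sweeping the resolver can be scripted. The sketch below assumes python3 is on PATH for the JSON edit and addresses review items by their position in sourceToTarget.humanReviewItems, since this section documents only their comment field; it also rewrites the file with two-space indentation. The helper names set_review_comment and resolve_all are hypothetical, and the spec path is derived from the output directory name, matching the specs/<name>.yaml to discovery-out/<name> layout used above.

```bash
# Hypothetical helper: write a reviewer decision into humanReviewItems[index].comment.
# Assumes python3 is on PATH; items are addressed by position (an assumption).
set_review_comment() {
  local contract="$1" index="$2" comment="$3"
  python3 - "$contract" "$index" "$comment" <<'PY'
import json, sys
path, index, comment = sys.argv[1], int(sys.argv[2]), sys.argv[3]
with open(path) as f:
    contract = json.load(f)
contract["sourceToTarget"]["humanReviewItems"][index]["comment"] = comment
with open(path, "w") as f:
    json.dump(contract, f, indent=2)
    f.write("\n")
PY
}

# Hypothetical helper: run the resolver on every contract under a discovery root.
# The positional datasource/spec arguments are required by the CLI but ignored on
# the resolve path; contracts without comments pass through unchanged.
resolve_all() {
  local datasource="$1" root="${2:-./discovery-out}" contract name
  for contract in "$root"/*/data-contract.json; do
    [ -e "$contract" ] || continue  # no match: the glob stays literal
    name=$(basename "$(dirname "$contract")")
    toolkit agent discovery "$datasource" "specs/$name.yaml" --resolve "$contract" || return
  done
}
```

For example, set_review_comment discovery-out/<name>/data-contract.json 0 "the age-band logic is correct, keep it", then resolve_all <datasource>. Sweeping every directory relies on the pass-through behavior described above, so contracts nobody commented on are left as they were.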
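Repeat the review and resolve steps above (read the report, write comments, run --resolve) until humanReviewItems is empty and the contract carries approvedByHuman: true. Then hand off to
/toolkit-pipeline:build.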
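To decide whether another pass is needed, the contracts can be checked directly. This is a sketch that assumes python3 is on PATH and treats a missing humanReviewItems key as empty (an assumption; the resolver's exact output shape isn't documented here). The helper name pending_reviews is hypothetical.

```bash
# Hypothetical helper: print "<remaining items> <contract path>" for each contract
# and exit non-zero while any contract still has humanReviewItems left.
# Assumes python3 is on PATH; a missing humanReviewItems key counts as empty.
pending_reviews() {
  local root="${1:-./discovery-out}"
  python3 - "$root" <<'PY'
import glob, json, os, sys
root = sys.argv[1]
pending = 0
for path in sorted(glob.glob(os.path.join(root, "*", "data-contract.json"))):
    with open(path) as f:
        items = json.load(f).get("sourceToTarget", {}).get("humanReviewItems", [])
    print(f"{len(items)} {path}")
    if items:
        pending += 1
sys.exit(1 if pending else 0)
PY
}
```

It exits 0 only when no contract under the root has review items left, which is the point to hand off to /toolkit-pipeline:build.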
npx claudepluginhub phdata/agent-marketplace --plugin toolkit-pipeline