From understudy
Use when a developer already has production LLM traces — a bucket of captures, provider log exports, or gateway capture files — and wants them turned into local, redacted eval sets, or profiled for cost first. "Ingest my traces", "turn these logs into an eval set", "where is my LLM spend going", "which calls could a local model take over".
Slash command
/understudy:ingest-traces
../capture-evidence/SKILL.md assumes a local
harness can be attached and run. Many workloads arrive the other way around:
the traces already exist — in an object-store bucket, a provider log export,
or an Understudy gateway capture export — and the job is to get them into
local evidence artifacts without leaking anything. This worker does that
transformation: source → deterministic classification → frozen slices →
redacted manifests, all on the local machine.
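The classification and freezing steps described above can be sketched as follows. This is a minimal illustration, not the skill's actual implementation: the `PATTERNS` table, the `classify` and `freeze_slice` helpers, and the assumed trace shape (an `id` plus `request.messages` with string content) are all hypothetical.

```python
import hashlib
import json
import re

# Hypothetical classification patterns; real ones would come from the
# developer's own workloads.
PATTERNS = {
    "summarize": re.compile(r"\bsummar(y|ize|ise)\b", re.IGNORECASE),
    "extract": re.compile(r"\bextract\b", re.IGNORECASE),
}


def classify(trace):
    """Return the first workload group whose pattern matches the first message."""
    messages = trace.get("request", {}).get("messages", [])
    content = messages[0].get("content", "") if messages else ""
    text = content if isinstance(content, str) else ""
    for group, pattern in PATTERNS.items():
        if pattern.search(text):
            return group
    return "unclassified"


def freeze_slice(traces):
    """Order traces by id and hash their canonical JSON form.

    Sorting first means the same set of traces always yields the same hash,
    regardless of the order they were pulled in.
    """
    ordered = sorted(traces, key=lambda t: t["id"])
    blob = json.dumps(ordered, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return ordered, hashlib.sha256(blob).hexdigest()
```

Because the rules are plain regular expressions applied in a fixed order, rerunning the classifier on the same staging directory should produce the same groups, which is what makes the slice hash meaningful.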
privacy.local_only: true and
privacy.upload_performed: false, and is itself review-gated before any
external reference.
.understudy/ (gitignored).
Identify the source shape first:
aws s3, rclone, wrangler); pull to a local staging
dir.
captures export <request-id> exports one request at a time,
writes metadata only unless run with --include-payload --yes (payloads may
contain prompts/completions), and platform-side bulk export requires
elevated platform-administrator access — plan on per-request exports or ask
your platform administrator for a bulk dump.
Captures may carry either workload_id or the older placement_id; read
both.
Then confirm with the developer which workload(s) the traces should map to, and what the unit of one "task" is (one request? one conversation?).
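Reading both identifier fields can be sketched as below. This assumes, as an illustration only, that each capture is one JSON object per line and that `workload_id` and `placement_id` sit at the top level of the record; the helper names are hypothetical.

```python
import json


def workload_key(capture):
    """Prefer workload_id and fall back to the older placement_id."""
    return capture.get("workload_id") or capture.get("placement_id")


def group_by_workload(jsonl_lines):
    """Group parsed captures by workload key; captures with neither id go under None."""
    groups = {}
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the export
        capture = json.loads(line)
        groups.setdefault(workload_key(capture), []).append(capture)
    return groups
```

Anything that lands under the `None` key carries neither identifier and is worth raising with the developer when confirming the workload mapping.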
Profile a whole capture directory first. When the source is a fleet of
gateway captures and the developer wants to know where the spend goes and
which call types a local model could take over, run the profiling playbook in
references/profile-captures.md before (or
instead of) per-workload ingestion — it produces a cost + call-type taxonomy
and a ranked local-takeover candidate list from the same .jsonl dump. When
the question is about trace content at fleet scale — "which runs failed",
"label these by failure mode", "summarize what goes wrong" — follow with the
bulk semantic-triage playbook in
references/lotus-semantic-triage.md.
request.messages or equivalent) is the canonical
replay surface — record that invariant in the manifest; never evaluate
against a lossy projection.
../capture-evidence/SKILL.md so its
split/baseline contract owns them from here.
input_id, expected outputs/tool-calls where the trace
contains them) that optimize-workload adapters and
../compare-model-sweep/SKILL.md can
consume directly.
references/profile-captures.md. "What is
this workload actually doing?" →
../understand-workload/SKILL.md.
Harness attachment, metric, and baseline →
../capture-evidence/SKILL.md.
End with: the source shape and trace count pulled (staging path, not bucket
identity); the workload groups with their classification patterns and counts;
the frozen slice names, sizes, and hashes; the manifest paths written under
.understudy/ingest-traces/; the privacy assertions (local-only, no raw
bodies in manifests, review required before upload); and the recommended next
skill for this workload.
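A manifest that satisfies the privacy assertions above might look like the following sketch. Only the `privacy.local_only`, `privacy.upload_performed`, and `input_id` names come from the text; the remaining field names, the `build_manifest` helper, and the trace shape are assumptions made for illustration.

```python
import hashlib
import json


def build_manifest(slice_name, traces):
    """Describe a frozen slice by id and hash only, never by raw request body."""
    entries = []
    for trace in traces:
        body = json.dumps(trace["request"], sort_keys=True).encode("utf-8")
        entries.append({
            "input_id": trace["id"],
            # The hash lets a reviewer verify the local payload without seeing it.
            "request_sha256": hashlib.sha256(body).hexdigest(),
        })
    return {
        "slice": slice_name,
        "size": len(entries),
        "entries": entries,
        "privacy": {"local_only": True, "upload_performed": False},
    }
```

Storing hashes rather than bodies keeps prompts and completions out of the manifest while still letting the final report list slice names, sizes, and hashes.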
../capture-evidence/SKILL.md: owns the
metric/validator/baseline contract the frozen slices feed.
references/profile-captures.md:
fleet-level cost and call-type taxonomy over the same capture set, with a
ranked local-takeover candidate list.
references/lotus-semantic-triage.md:
content-level bulk triage (filter/label/rank/summarize all captures) with
LOTUS semantic operators on a local MLX-served model.
../understand-workload/SKILL.md:
decomposes one ingested workload before model comparison.
../curate-trajectories/SKILL.md: split
provenance and leak-blocking once selections are made.
npx claudepluginhub understudylabs/understudy-agent-tools --plugin understudy