From methodic
Use this skill when the user (or an agent flow) wants to upload a dataset, training data, or a reference field to a Chronicle experiment, or to load one back. Typical phrases: "upload this dataset", "register the training data", "attach this .npz as an input to the variation", "add the reference field", "load the dataset for experiment X", "download the dataset". A dataset is a binary asset (e.g. an `.npz`) uploaded via the presigned-PUT flow with a provenance record (per-component sha256 + size), then linked as an experiment- or variation-level **input**. A directory uploads as one component per file, which is the way to shard GB-scale data. For *reports* / write-ups use chronicle-write-report; for authoring a variation's *config* use chronicle-author-variation. This skill is the data counterpart to those.
Slash command: /methodic:chronicle-dataset
Get a dataset into Chronicle as a binary asset and attach it to an experiment
or variation as an input — or pull an existing dataset back down. This is a
thin orchestration over the SDK's chronicle.datasets namespace; it never
makes raw HTTP calls.
Datasets are plain assets (asset_type="dataset"): there is no separate
dataset table. They are search-indexed on metadata only — the dataset's
name + provenance record, never the bytes — and become discoverable in search
once linked (or output-stamped), alongside discovery through the
experiment/variation link. The upload records a provenance record on the asset
so the bytes are verifiable later.
Each component is a single presigned PUT; there is no multipart upload.
- A single file (e.g. an .npz) → one file, one component.
- A directory → upload makes one component per file, which is the sharding mechanism. On GCS the presigned write URL is a resumable session, so a large single object also works; the S3 fallback caps a single PUT, so prefer file-level sharding for portability. Split a huge monolithic array into per-shard files before uploading.

- path (upload): a single file or a directory of shards.
- experiment_id: the Chronicle experiment UUID. Resolve in order:
  - methodic config (~/.config/methodic/current_experiment)
- variation (optional): variation index (or plaintext name → resolve to index first) to link the dataset as a variation input. Omit to link at the experiment level (shared across the experiment's variations).
- asset_type (default dataset): free-form; use a more specific type (e.g. reference_field) when it helps downstream consumers.
- source / provenance (optional): source is a one-line origin string (a URL, a generator command, a parent run); provenance is a dict of domain facts you know (shape, rows, schema, generator config). Both are merged into the stored provenance record; the sha256/size/component facts are computed for you.
- sensitive (optional): when true, link with propagate_acl=False so the dataset does not inherit the experiment's reader ACLs.
- asset_id + a dest directory.

from methodic import Chronicle
chronicle = Chronicle.from_env()  # CHRONICLE_SERVER_URL + CHRONICLE_API_KEY

# Upload the bytes, record provenance, and link as an input in one call.
ref = chronicle.datasets.upload(
    path,                            # file → 1 component; dir → 1 per file
    name=name,                       # optional; defaults to the file/dir name
    asset_type="dataset",
    content_type="application/octet-stream",
    source=source,                   # e.g. "generated by tools/make_field.py"
    provenance={"shape": [128, 128], "rows": 4096},  # optional domain facts
    link_experiment=experiment_id,   # link as input now…
    link_variation=variation,        # …at the variation level (omit → experiment-level)
    propagate_acl=not sensitive,     # False keeps a sensitive dataset off experiment ACLs
)
print(f"dataset {ref.asset_id}: {len(ref.components)} component(s), "
      f"{ref.provenance['size_bytes']} bytes")
for c in ref.provenance["components"]:
    print(f"  {c['component']} sha256={c['sha256'][:12]}… {c['size_bytes']}B")
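The directory form of upload relies on the data already being split into files. A minimal sketch of that pre-step, assuming NumPy is installed and the array fits in memory; the helper name shard_array, the shard-NNNN.npz naming, and the "data" key are illustrative assumptions, not part of the SDK:

```python
import tempfile
from pathlib import Path

import numpy as np


def shard_array(array: np.ndarray, out_dir: Path, rows_per_shard: int) -> list:
    """Write `array` to `out_dir` as consecutive .npz shards split along axis 0."""
    if rows_per_shard <= 0:
        raise ValueError("rows_per_shard must be positive")
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, start in enumerate(range(0, array.shape[0], rows_per_shard)):
        shard = array[start:start + rows_per_shard]
        path = out_dir / f"shard-{index:04d}.npz"
        np.savez(path, data=shard)  # "data" is a hypothetical key name
        paths.append(path)
    return paths


if __name__ == "__main__":
    field = np.arange(40, dtype=np.float32).reshape(10, 4)
    with tempfile.TemporaryDirectory() as tmp:
        shards = shard_array(field, Path(tmp) / "shards", rows_per_shard=4)
        print([p.name for p in shards])
```

The resulting directory can then be passed as path, so that each shard file becomes one component.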
To link an existing dataset to a new variation, either pass it at
variation creation (variations.create(..., input_asset_ids=[ref.asset_id]))
or call chronicle.datasets.link(asset_id, experiment_id, variation=...).
from methodic import Chronicle
chronicle = Chronicle.from_env()
dest = chronicle.datasets.load(asset_id, "./data") # downloads every component
prov = chronicle.datasets.provenance(asset_id) # the recorded provenance, or None
print(f"loaded into {dest}; provenance: {prov}")
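Because the provenance record carries a per-component sha256 and size, the loaded files can be checked against it. A minimal sketch, assuming the record has the shape printed in the upload example (a components list of dicts with component, sha256, and size_bytes keys) and that each component is written under the destination directory by its component name; verify_components is a hypothetical helper, not an SDK call:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large shards are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_components(dest: Path, provenance: dict) -> list:
    """Return human-readable mismatches; an empty list means everything matched."""
    problems = []
    for entry in provenance["components"]:
        name = entry["component"]
        path = dest / name
        if not path.is_file():
            problems.append(f"{name}: missing")
            continue
        size = path.stat().st_size
        if size != entry["size_bytes"]:
            problems.append(f"{name}: size {size} != {entry['size_bytes']}")
        elif sha256_of(path) != entry["sha256"]:
            problems.append(f"{name}: sha256 mismatch")
    return problems
```

With the values from the load example this would be called as verify_components(Path(dest), prov), guarded by a check that prov is not None.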
When you need to manage the component PUTs yourself (resumable retries,
externally generated bytes), register creates the asset + presigned URLs
without uploading:
info = chronicle.datasets.register(
    components=["shard-0000.npz", "shard-0001.npz"],
    name="big-dataset",
    provenance={"rows": 10_000_000},
)
# local_paths maps each component name to its file on disk (supplied by the caller).
for comp, url in info.upload_urls.items():
    chronicle.assets.upload_component(url, local_paths[comp], "application/octet-stream")
chronicle.assets.finalize(info.asset_id)
chronicle.datasets.link(info.asset_id, experiment_id, variation=variation)
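When the PUTs are managed by hand, transient failures are the caller's to handle. A minimal sketch of a retry wrapper with exponential backoff; with_retries is a hypothetical helper, and since the exception types raised for a failed PUT are not stated here, the retryable tuple is an assumption to adjust:

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    action: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 0.5,
    retryable: Tuple[Type[BaseException], ...] = (OSError,),  # assumed error types
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `action`, retrying on `retryable` errors with doubling delays."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return action()
        except retryable:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error unchanged
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")
```

In the loop above, the upload line could then be wrapped as with_retries(lambda: chronicle.assets.upload_component(url, local_paths[comp], "application/octet-stream")). Each retry re-sends the whole component, which is consistent with one PUT per component.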
Tell the user:
- sensitive kept it off the experiment's readers.

- upload / link 403: the caller lacks Write on the experiment. Surface the message verbatim; the key needs experiment Write (or the dataset must be uploaded unlinked and linked by someone who has it).
- link 409 "experiment/variation is committed": inputs freeze on commit. Link the dataset before committing the variation/experiment, or add a new (open) variation.
- propagate_acl=True.
- upload_asset error "must declare its owning scope": an unlinked upload (link: "none") needs scope: "user" or scope: "organization" + organization_id. Prefer linking as an input at upload time; fall back to an explicit scope + chronicle.link_asset later only when the target experiment/variation doesn't exist yet.

An agent driving Chronicle through the MCP server (not the Python SDK) has
chronicle.upload_asset for the single-file / small-inline upload case
(inline base64 ≤2 MiB, or a presigned upload_url for a single large object)
and chronicle.load_asset(asset_id) to mint presigned read URLs +
provenance for an existing dataset's components. Multi-file / sharded datasets
are this skill's job — the SDK splits a directory into components, which the
one-shot MCP tools deliberately don't.
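The size rule for the one-shot MCP upload can be expressed as a small pre-check. This is a sketch only: the 2 MiB threshold comes from the paragraph above, it is assumed here to apply to the raw file size rather than the encoded size, and plan_mcp_upload together with the keys of the returned dict are hypothetical and do not mirror the real tool's parameters:

```python
import base64
from pathlib import Path

# Inline base64 is allowed up to 2 MiB (assumed to be measured on the raw bytes).
INLINE_LIMIT_BYTES = 2 * 1024 * 1024


def plan_mcp_upload(path: Path) -> dict:
    """Decide how a local path should be sent: inline, presigned URL, or via the SDK."""
    if path.is_dir():
        # Multi-file datasets are split into components by the SDK, not the MCP tools.
        return {"mode": "sdk"}
    size = path.stat().st_size
    if size <= INLINE_LIMIT_BYTES:
        payload = base64.b64encode(path.read_bytes()).decode("ascii")
        return {"mode": "inline", "size_bytes": size, "content_base64": payload}
    # A single large object goes through a presigned upload URL instead.
    return {"mode": "presigned", "size_bytes": size}
```

Keeping the decision in one place makes it easy to route directories to the SDK path before any MCP call is attempted.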
A dataset is an INPUT — always pass link: "input". Inputs are what the
experiment/variation consumes (datasets, reference fields, weights);
outputs are artifacts a run produced. A dataset uploaded with
link: "output" lands on the wrong side of the experiment record (the
Outputs tab) and corrupts lineage. Concretely:
- chronicle.upload_asset(..., link: "input", variation: <idx>). No ACL propagation (workers read it via the experiment's containment).
- Omit variation; experiment-level input links propagate the experiment's reader ACLs (disable with propagate_acl: false for a sensitive dataset).
- An existing dataset (uploaded with link: "none", or reused from a parent experiment) → chronicle.link_asset(experiment_id, asset_id, link: "input", variation?). Same freeze + invalidation gates as REST.
- chronicle.propose_variation(..., input_asset_ids: [<asset_id>]) links it as a variation input at creation time; upload the bytes first.

An unlinked upload (link: "none", the default) requires an explicit owning scope so the asset isn't orphaned: scope: "user" for personal, or scope: "organization" + organization_id (resolve via chronicle.list_scopes; optional visibility, org-wide by default in an org context).

- pip install methodic-research (≥0.9, for the chronicle.datasets namespace)
- CHRONICLE_API_KEY + CHRONICLE_SERVER_URL exported (or methodic auth login already done)
- organization_id: in ~/.methodic/config.yaml (or $CHRONICLE_ORGANIZATION_ID); dataset creates that omit organization_id are then attributed to that org. Pass methodic.PERSONAL to force a personal-scope upload.
- git: this skill moves bytes via the API; no repo checkout needed.

npx claudepluginhub methodic-research/skills --plugin methodic