Skill

chronicle-dataset

Use this skill when the user (or an agent flow) wants to upload a dataset, training data, or a reference field to a Chronicle experiment, or to load one back. Typical trigger phrases: "upload this dataset", "register the training data", "attach this .npz as an input to the variation", "add the reference field", "load the dataset for experiment X", "download the dataset". A dataset is a binary asset (e.g. an `.npz`) uploaded via the presigned-PUT flow with a provenance record (per-component sha256 + size), then linked as an experiment- or variation-level **input**. A directory uploads as one component per file, which is how to shard GB-scale data. For *reports* / write-ups use chronicle-write-report; for authoring a variation's *config* use chronicle-author-variation. This skill is the data counterpart to those.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/methodic:chronicle-dataset

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Get a dataset into Chronicle as a binary asset and attach it to an experiment

SKILL.md

206 lines · ~2.6k tokens

Stats

Language: Python

Stars: 0

Maintenance: Excellent

Last Commit: Jun 13, 2026


Dataset upload + load

Get a dataset into Chronicle as a binary asset and attach it to an experiment or variation as an input — or pull an existing dataset back down. This is a thin orchestration over the SDK's chronicle.datasets namespace; it never makes raw HTTP calls.

Datasets are plain assets (asset_type="dataset"): there is no separate dataset table. They are search-indexed on metadata only (the dataset's name and provenance record, never the bytes) and become discoverable in search once linked (or output-stamped), in addition to discovery through the experiment/variation link. The upload stores a provenance record on the asset so the bytes can be verified later.

Sizing — single PUT per component, no multipart

Each component is a single presigned PUT — there is no multipart upload.

MB-scale (a reference field, a small .npz) → one file, one component.
GB-scale → pass a directory: upload makes one component per file, which is the sharding mechanism. On GCS the presigned write URL is a resumable session, so a large single object also works; the S3 fallback caps a single PUT, so prefer file-level sharding for portability. Split a huge monolithic array into per-shard files before uploading.
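
Splitting a monolithic array into per-shard files starts with deciding the row ranges. The following is a minimal sketch of that planning step only; `plan_shards` and its parameters are hypothetical helpers (not part of the SDK), and the `shard-NNNN.npz` naming is just a convention chosen for illustration.

```python
from typing import List, Tuple


def plan_shards(n_rows: int, rows_per_shard: int) -> List[Tuple[str, int, int]]:
    """Return (file_name, start, stop) for each shard covering rows [0, n_rows)."""
    if n_rows < 0 or rows_per_shard <= 0:
        raise ValueError("n_rows must be >= 0 and rows_per_shard must be > 0")
    shards = []
    for index, start in enumerate(range(0, n_rows, rows_per_shard)):
        stop = min(start + rows_per_shard, n_rows)  # the last shard may be shorter
        # Hypothetical naming convention; any distinct file names would do.
        shards.append((f"shard-{index:04d}.npz", start, stop))
    return shards


if __name__ == "__main__":
    for name, start, stop in plan_shards(10, 4):
        print(name, start, stop)
```

Each planned slice can then be written to its own file in a single directory (for example with `numpy.savez`), and that directory passed as the upload path so every file becomes one component.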

Inputs

path (upload) — a single file or a directory of shards.
experiment_id — the Chronicle experiment UUID. Resolve in order:
1. Explicit argument from the user
2. methodic config (~/.config/methodic/current_experiment)
3. Detect from cwd if inside a clone of the experiment repo
4. Prompt the user
variation (optional) — variation index (or plaintext name → resolve to index first) to link the dataset as a variation input. Omit to link at the experiment level (shared across the experiment's variations).
asset_type (default dataset) — free-form; use a more specific type (e.g. reference_field) when it helps downstream consumers.
source / provenance (optional) — source is a one-line origin string (a URL, a generator command, a parent run); provenance is a dict of domain facts you know (shape, rows, schema, generator config). Both are merged into the stored provenance record — the sha256/size/component facts are computed for you.
sensitive (optional) — when true, link with propagate_acl=False so the dataset does not inherit the experiment's reader ACLs.
For load: asset_id + a dest directory.
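
The experiment_id resolution order listed above can be sketched as a simple fallback chain. This is an illustration under assumptions: the config file is assumed to contain the bare UUID as text, and `detect_from_cwd` and `prompt_user` are hypothetical hooks supplied by the caller, since the skill does not specify how repo detection or prompting is implemented.

```python
from pathlib import Path
from typing import Callable, Optional

DEFAULT_CONFIG = Path.home() / ".config" / "methodic" / "current_experiment"


def resolve_experiment_id(
    explicit: Optional[str] = None,
    config_path: Path = DEFAULT_CONFIG,
    detect_from_cwd: Callable[[], Optional[str]] = lambda: None,  # hypothetical hook
    prompt_user: Callable[[], str] = lambda: input("Experiment UUID: "),
) -> str:
    # 1. An explicit argument from the user wins.
    if explicit:
        return explicit.strip()
    # 2. The methodic config file (assumed to hold the bare UUID as text).
    if config_path.is_file():
        value = config_path.read_text(encoding="utf-8").strip()
        if value:
            return value
    # 3. Detection from the working directory, delegated to the caller's hook.
    detected = detect_from_cwd()
    if detected:
        return detected
    # 4. Last resort: ask the user.
    return prompt_user().strip()
```

Passing the hooks in as arguments keeps the order of precedence in one place and makes each step easy to exercise in isolation.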

Workflow — upload

from methodic import Chronicle

chronicle = Chronicle.from_env()  # CHRONICLE_SERVER_URL + CHRONICLE_API_KEY

# Upload the bytes, record provenance, and link as an input in one call.
ref = chronicle.datasets.upload(
    path,                                  # file → 1 component; dir → 1 per file
    name=name,                             # optional; defaults to the file/dir name
    asset_type="dataset",
    content_type="application/octet-stream",
    source=source,                         # e.g. "generated by tools/make_field.py"
    provenance={"shape": [128, 128], "rows": 4096},  # optional domain facts
    link_experiment=experiment_id,         # link as input now…
    link_variation=variation,              # …at the variation level (omit → experiment-level)
    propagate_acl=not sensitive,           # False keeps a sensitive dataset off experiment ACLs
)

print(f"dataset {ref.asset_id} — {len(ref.components)} component(s), "
      f"{ref.provenance['size_bytes']} bytes")
for c in ref.provenance["components"]:
    print(f"  {c['component']}  sha256={c['sha256'][:12]}…  {c['size_bytes']}B")

To link an existing dataset to a new variation, either pass it at variation creation (variations.create(..., input_asset_ids=[ref.asset_id])) or call chronicle.datasets.link(asset_id, experiment_id, variation=...).

Workflow — load

from methodic import Chronicle

chronicle = Chronicle.from_env()

dest = chronicle.datasets.load(asset_id, "./data")   # downloads every component
prov = chronicle.datasets.provenance(asset_id)        # the recorded provenance, or None
print(f"loaded into {dest}; provenance: {prov}")
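
Because the provenance record carries a sha256 and size per component, downloaded bytes can be checked locally. The sketch below is not an SDK feature; it assumes the record has the `components` list with `component`, `sha256`, and `size_bytes` keys printed in the upload workflow, and it further assumes each component name is the file's path relative to the destination directory.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large shards are never read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_components(dest: Path, provenance: Dict) -> List[str]:
    """Return human-readable mismatches; an empty list means everything matched."""
    problems = []
    for entry in provenance.get("components", []):
        # Assumption: the component name is the file path relative to dest.
        local = Path(dest) / entry["component"]
        if not local.is_file():
            problems.append(f"{entry['component']}: missing")
            continue
        if local.stat().st_size != entry["size_bytes"]:
            problems.append(f"{entry['component']}: size mismatch")
        elif sha256_of(local) != entry["sha256"]:
            problems.append(f"{entry['component']}: sha256 mismatch")
    return problems
```

The size comparison runs first because it is cheap and catches truncated downloads without hashing. When no provenance was recorded (the lookup returns None), there is nothing to verify against and the check should be skipped.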

Driving the upload yourself (very large / custom transfer)

When you need to manage the component PUTs yourself (resumable retries, externally generated bytes), register creates the asset + presigned URLs without uploading:

info = chronicle.datasets.register(
    components=["shard-0000.npz", "shard-0001.npz"],
    name="big-dataset",
    provenance={"rows": 10_000_000},
)
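# local_paths: caller-supplied dict mapping each component name to its local file path.
for comp, url in info.upload_urls.items():
    chronicle.assets.upload_component(url, local_paths[comp], "application/octet-stream")
chronicle.assets.finalize(info.asset_id)
# Link the finished asset as an input (omit variation for an experiment-level link).
chronicle.datasets.link(info.asset_id, experiment_id, variation=variation)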
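
One reason to drive the PUTs yourself is to retry a failed component without restarting the whole upload. A minimal retry-with-backoff wrapper might look like the sketch below. `with_retries` is a hypothetical helper, not an SDK function, and it assumes the presigned URL is still valid when the retry runs. It retries on any exception for brevity; real code would likely narrow that to transient network errors so that a 403 is not retried.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    action: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call action(), retrying with exponential backoff until it succeeds."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return action()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error unchanged
            sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
    raise AssertionError("unreachable")
```

Each component PUT can be wrapped in a zero-argument callable (for example a `lambda`) and passed as `action`, so one flaky shard does not force the others to be re-sent.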

After the skill completes

Tell the user:

The dataset asset id, its component count + total size, and the experiment (and variation) it's linked to as an input.
Whether ACL propagation was applied (so they know who can read it) — call out when sensitive kept it off the experiment's readers.
For a load: the destination directory and the recorded provenance.
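
The upload report can be assembled from values the workflow already has. This sketch uses a hypothetical `upload_summary` helper; the ACL wording follows the rules stated in this document (variation-level links never propagate, experiment-level links propagate unless the dataset is sensitive).

```python
from typing import Optional


def upload_summary(
    asset_id: str,
    component_count: int,
    size_bytes: int,
    experiment_id: str,
    variation: Optional[int] = None,
    sensitive: bool = False,
) -> str:
    """Build the one-line report shown to the user after an upload."""
    target = f"experiment {experiment_id}"
    if variation is not None:
        target += f", variation {variation}"
        acl = "no ACL propagation (variation-level input)"
    elif sensitive:
        acl = "ACL propagation disabled (sensitive)"
    else:
        acl = "experiment reader ACLs propagated"
    return (
        f"Dataset {asset_id}: {component_count} component(s), {size_bytes} bytes, "
        f"linked as an input to {target}; {acl}."
    )
```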

Failure modes

upload / link 403 — the caller lacks Write on the experiment. Surface the message verbatim; the key needs experiment Write (or the dataset must be uploaded unlinked and linked by someone who has it).
link 409 "experiment/variation is committed" — inputs freeze on commit. Link the dataset before committing the variation/experiment, or add a new (open) variation.
GB-scale single file on S3 — a single presigned PUT can't carry it. Shard into a directory of files (one component each) and upload the directory; don't try to stream a multi-GB monolith through one PUT.
Variation-input ACLs don't propagate — linking at the variation level does not stamp experiment readers onto the asset (the server only propagates for experiment-level links). A worker reads a variation-input dataset via the experiment's containment; if you need every experiment member to read it directly, link at the experiment level with propagate_acl=True.
Wrong asset type for a report — datasets are binary inputs. A written findings/takeaways document is a report — use chronicle-write-report, not this skill.
MCP upload_asset error "must declare its owning scope" — an unlinked upload (link: "none") needs scope: "user" or scope: "organization" + organization_id. Prefer linking as an input at upload time; fall back to an explicit scope + chronicle.link_asset later only when the target experiment/variation doesn't exist yet.
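
The GB-scale single-file failure can be caught before any bytes move by scanning the upload path for files that exceed a size ceiling. This is a sketch under assumptions: `oversized_components` is a hypothetical helper, the 5 GiB default is an illustrative ceiling rather than a documented Chronicle limit, and the recursive directory walk is a guess at how a directory maps to components.

```python
from pathlib import Path
from typing import List, Tuple

# Illustrative ceiling only; the real per-PUT cap depends on the storage backend.
ASSUMED_MAX_PUT_BYTES = 5 * 1024**3


def oversized_components(
    path: Path, max_bytes: int = ASSUMED_MAX_PUT_BYTES
) -> List[Tuple[str, int]]:
    """Return (name, size_bytes) for every file larger than max_bytes."""
    root = Path(path)
    if root.is_file():
        candidates = [(root.name, root)]
    else:
        # Assumption: every regular file under the directory becomes a component.
        candidates = [
            (f.relative_to(root).as_posix(), f)
            for f in sorted(root.rglob("*"))
            if f.is_file()
        ]
    return [(name, f.stat().st_size) for name, f in candidates if f.stat().st_size > max_bytes]
```

A non-empty result is the cue to shard those files before uploading rather than discovering the limit mid-transfer.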

MCP-native agents

An agent driving Chronicle through the MCP server (not the Python SDK) has chronicle.upload_asset for the single-file / small-inline upload case (inline base64 ≤2 MiB, or a presigned upload_url for a single large object) and chronicle.load_asset(asset_id) to mint presigned read URLs + provenance for an existing dataset's components. Multi-file / sharded datasets are this skill's job — the SDK splits a directory into components, which the one-shot MCP tools deliberately don't.

A dataset is an INPUT — always pass link: "input". Inputs are what the experiment/variation consumes (datasets, reference fields, weights); outputs are artifacts a run produced. A dataset uploaded with link: "output" lands on the wrong side of the experiment record (the Outputs tab) and corrupts lineage. Concretely:

Variation-scoped dataset → chronicle.upload_asset(..., link: "input", variation: <idx>). No ACL propagation (workers read it via the experiment's containment).
Experiment-shared dataset → omit variation; experiment-level input links propagate the experiment's reader ACLs (disable with propagate_acl: false for a sensitive dataset).
Order matters: inputs freeze at commit. Upload + link before committing the variation/experiment; after commit the input link is refused (the freeze is the point — add a new open variation instead).
Existing asset (uploaded earlier with link: "none", or reused from a parent experiment) → chronicle.link_asset(experiment_id, asset_id, link: "input", variation?). Same freeze + invalidation gates as REST.
Proposing a variation around a dataset → chronicle.propose_variation(..., input_asset_ids: [<asset_id>]) links it as a variation input at creation time; upload the bytes first.
Unlinked upload (link: "none", the default) requires an explicit owning scope so the asset isn't orphaned: scope: "user" for personal, or scope: "organization" + organization_id (resolve via chronicle.list_scopes; optional visibility, org-wide by default in an org context).
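
The linking rules above reduce to a small amount of argument logic. The sketch below only assembles the fields this section names (`link`, `variation`, `propagate_acl`, `scope`, `organization_id`); `choose_transport` and `link_arguments` are hypothetical helpers, the "inline" label is illustrative, and the remaining `chronicle.upload_asset` arguments are deliberately left out because they are not specified here.

```python
from typing import Any, Dict, Optional

INLINE_LIMIT_BYTES = 2 * 1024 * 1024  # the 2 MiB inline base64 ceiling


def choose_transport(size_bytes: int) -> str:
    """Pick inline base64 for small payloads, a presigned upload_url otherwise."""
    return "inline" if size_bytes <= INLINE_LIMIT_BYTES else "upload_url"


def link_arguments(
    link: str = "none",
    variation: Optional[int] = None,
    sensitive: bool = False,
    scope: Optional[str] = None,
    organization_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble only the link-related fields for an upload call."""
    if link == "none":
        # An unlinked upload must declare its owning scope.
        if scope == "user":
            return {"link": "none", "scope": "user"}
        if scope == "organization" and organization_id:
            return {"link": "none", "scope": "organization", "organization_id": organization_id}
        raise ValueError('unlinked upload needs scope "user" or "organization" + organization_id')
    args: Dict[str, Any] = {"link": link}
    if variation is not None:
        args["variation"] = variation  # variation-level: no ACL propagation applies
    elif sensitive:
        args["propagate_acl"] = False  # experiment-level, but keep readers off the asset
    return args
```

For a dataset, `link` should be "input"; the "none" branch exists only for the case where the target experiment or variation does not exist yet.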

Requires

pip install methodic-research (≥0.9, which provides the chronicle.datasets namespace)
CHRONICLE_API_KEY + CHRONICLE_SERVER_URL exported (or methodic auth login already done)
Optional: a default organization via organization_id: in ~/.methodic/config.yaml (or $CHRONICLE_ORGANIZATION_ID). Dataset uploads that omit organization_id are then attributed to that org; pass methodic.PERSONAL to force a personal-scope upload.
No git — this skill moves bytes via the API; no repo checkout needed.
