Skill

fgcz-custom-analysis-register

Promote a one-off R Markdown analysis into a SUSHI-shaped folder on gstore so it can be chained as a parent dataset by downstream SUSHI apps. Use when delivering a custom analysis (a hand-written Rmd, not a SUSHI app) to a user, when an analysis output needs to appear in the SUSHI lineage tree, or when the input came from an upstream SUSHI dataset and the result should link back to it. Triggers on "custom analysis", "register analysis in SUSHI", "Hubert blueprint", "promote Rmd to SUSHI", "dataset.tsv parameters.tsv input_dataset.tsv", "wrap manual analysis for B-Fabric", "make this Rmd look like a SUSHI app output".

Invocation

Slash command

/fgcz-infrastructure:deprecated-custom-analysis-register

Wrap a hand-written R Markdown analysis so it produces the same on-disk shape a SUSHI app would. The output drops cleanly into the SUSHI dataset graph: any downstream SUSHI app (ScSeurat, exploreSC, etc.) can chain off it, and the B-Fabric + SUSHI registrations (Steps 9–10) make it visible in the SUSHI UI and the B-Fabric audit trail.

Supporting Files

references/example-run-annotated.R

SKILL.md

292 lines · ~3.9k tokens

Stats

Language: Python

Parent stars: 0

Maintenance: Excellent

Last commit: May 26, 2026

fgcz-custom-analysis-register

Based on Hubert's blueprint at gitlab.bfabric.org/Genomics/hubert-scripts-2026/p40992-Alithea-FlashSeq/example-run.R. The B-Fabric side is live via Ronald's register_custom_analysis.py (pending merge into btools main). The production-SUSHI side is not wired into that script — Step 10 does it directly via a MySQL insert (the script's built-in --register-sushi only writes to dev SUSHI; see warning in Step 9).

When to use

You wrote an .Rmd, rendered it locally, and the user now wants the output delivered to gstore.
The analysis consumes outputs from a SUSHI app (FeatureCounts, STAR, ScSeurat, CellRangerMulti…) and should be chainable from it in the SUSHI UI.
You want the SUSHI/B-Fabric audit trail to show this analysis even though you didn't build a real SUSHI app.

Do NOT use for:

Building a real SUSHI app → use fgcz-sushi-app-dev.
Just rendering an Rmd with retry/validation logic → use autonomous-render.
Reading/writing B-Fabric workunits or datasets directly → use bfabric.

Inputs to gather before scaffolding

Confirm aloud with the user — echo back project + order before writing anything (the existing "confirm order numbers" rule applies here):

Field	Example	Notes
`project`	`p40992`	The pXXXXX directory under `/srv/gstore/projects/`.
`order_id`	`o41017`	The oXXXXX prefix matching the upstream order.
`analysis_name`	`SC-FlashSeq_QC_Evaluation`	Short, no spaces. Becomes part of the folder name and filenames.
`timestamp`	`2026-05-15--12-00-00`	Generate fresh with `format(Sys.time(), "%Y-%m-%d--%H-%M-%S")`; don't reuse a prior one. Stored as `analysis_version` in the Step 1 code.
Rmd path	`~/git/<your-scripts>/pXXXXX_*/QC.Rmd`	The source `.Rmd`.
Upstream SUSHI dataset path	`/srv/gstore/projects/pXXXXX/oYYYYY_FeatureCounts_YYYY-MM-DD--...`	Used in Step 4 to copy `input_dataset.tsv` for provenance. Without it, the SUSHI lineage tree has nothing to link back to.
Upstream SUSHI dataset ID	e.g. `109531`	The numeric ID from production SUSHI's `data_sets` table (not the dev SUSHI ID, which is different — look it up via the URL `fgcz-sushi.uzh.ch/data_set/pXXXXX/<id>` or query the DB via the `sushi-framework` skill). Needed in Step 10 to set `parent_id` and build the lineage edge.

If the upstream SUSHI dataset path is unknown, ask the user — don't guess. Provenance without a real parent is worse than no provenance.
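The echo-back step can be backed by a mechanical check before anything is written. The following is a minimal Python sketch, assuming project IDs are `p` plus digits and order IDs are `o` plus digits (inferred from the examples in the table above). `check_inputs` and `fresh_timestamp` are hypothetical helper names, not part of btools or ezRun.

```python
import re
from datetime import datetime

# Assumed shapes, inferred from the examples in the inputs table.
PROJECT_RE = re.compile(r"^p\d+$")
ORDER_RE = re.compile(r"^o\d+$")
NAME_RE = re.compile(r"^[^\s/]+$")  # short, no spaces, no path separators


def fresh_timestamp(now=None):
    """Return a timestamp in the same layout as the R format string in the table."""
    now = now or datetime.now()
    return now.strftime("%Y-%m-%d--%H-%M-%S")


def check_inputs(project, order_id, analysis_name):
    """Return a list of human-readable problems; an empty list means the inputs look sane."""
    problems = []
    if not PROJECT_RE.match(project):
        problems.append(f"project {project!r} is not of the form pXXXXX")
    if not ORDER_RE.match(order_id):
        problems.append(f"order_id {order_id!r} is not of the form oXXXXX")
    if not NAME_RE.match(analysis_name):
        problems.append(f"analysis_name {analysis_name!r} contains spaces or slashes")
    return problems


if __name__ == "__main__":
    print(check_inputs("p40992", "o41017", "SC-FlashSeq_QC_Evaluation"))
    print(fresh_timestamp())
```

A check like this only catches malformed values; it cannot tell whether the project and order are the right ones, so the confirmation with the user still applies.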

What this skill produces

/srv/gstore/projects/{project}/{order_id}_{analysis_name}_{timestamp}/
├── {analysis_name}.Rmd                # source (copied from your repo)
├── {analysis_name}.html               # rendered report
├── dataset.tsv                        # output dataset — SUSHI schema
├── parameters.tsv                     # analysis parameters
├── input_dataset.tsv                  # provenance — copied from upstream SUSHI dataset
├── *.qs2                              # any cached objects the Rmd writes
└── scripts/
    ├── {order_id}_run-{analysis_name}.sh   # vanilla bash launcher ({analysis_name} lowercased)
    ├── {order_id}_run-{analysis_name}_o.log
    └── {order_id}_run-{analysis_name}_e.log

The folder name and the three TSVs are the exact contract every SUSHI app honours — that's what makes the output chainable.
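Before copying anything to gstore, the folder layout above can be checked locally. This is a rough Python sketch of such a check; `missing_pieces` is a hypothetical helper, and the folder-name pattern is an assumption derived from the `{order_id}_{analysis_name}_{timestamp}` layout shown in the tree. It does not validate the contents of the TSVs.

```python
import re
from pathlib import Path

# Assumed folder-name pattern: {order_id}_{analysis_name}_{timestamp}
FOLDER_RE = re.compile(r"^(o\d+)_(.+)_(\d{4}-\d{2}-\d{2}--\d{2}-\d{2}-\d{2})$")
REQUIRED_TSVS = ("dataset.tsv", "parameters.tsv", "input_dataset.tsv")


def missing_pieces(folder):
    """List what is missing from a SUSHI-shaped analysis folder (empty list = complete)."""
    folder = Path(folder)
    match = FOLDER_RE.match(folder.name)
    if match is None:
        return ["folder name does not match {order_id}_{analysis_name}_{timestamp}"]
    analysis_name = match.group(2)
    expected = [*REQUIRED_TSVS, f"{analysis_name}.Rmd", f"{analysis_name}.html"]
    missing = [name for name in expected if not (folder / name).is_file()]
    if not (folder / "scripts").is_dir():
        missing.append("scripts/")
    return missing
```

Running `missing_pieces(".")` from inside the finished folder just before Step 8 should return an empty list.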

Step-by-step recipe

Work locally in /srv/GT/analysis/{project}/Analyses_Paul/ (or a /tmp/ scratch dir for dry runs), then g-req the finished folder to gstore at the end. Never write into /srv/gstore/ directly.

Step 1 — Build the folder

library(ezRun)
library(stringr)

project          <- "p40992"
order_id         <- "o41017"
analysis_name    <- "SC-FlashSeq_QC_Evaluation"
analysis_version <- format(Sys.time(), "%Y-%m-%d--%H-%M-%S")

gstore_folder <- paste0(project, "/", order_id, "_", analysis_name, "_", analysis_version)

setwdNew(basename(gstore_folder))   # creates and chdir into the timestamped folder
dir.create("scripts")

setwdNew is from ezRun; it creates the directory if missing and setwd()s into it. The folder name must start with {order_id}_ so SUSHI's lineage parser picks it up.

Step 2 — Write `dataset.tsv` (output schema)

rmd_file  <- paste0(analysis_name, ".Rmd")
html_file <- str_replace(rmd_file, "\\.Rmd$", ".html")

output_dataset <- ezFrame(
  "Name"            = analysis_name,
  "Html [File,Link]"= file.path(gstore_folder, html_file),
  "Rmd [File]"      = file.path(gstore_folder, rmd_file)
)
ezWrite.table(output_dataset, file = "dataset.tsv", row.names = FALSE)

The square-bracket suffixes ([File], [File,Link]) are the SUSHI column-type tags — they tell the SUSHI UI to render the path as a downloadable link. Don't drop them.
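To make the tag convention concrete, here is a small Python sketch that splits a column header into its name and its type tags. `split_header` is a hypothetical helper written for illustration; it assumes at most one trailing bracket group with comma-separated tags, as in the two headers used above, and is not SUSHI's own parser.

```python
import re

# Optional trailing "[Tag1,Tag2]" group after the column name.
HEADER_RE = re.compile(r"^(?P<name>.*?)\s*(?:\[(?P<tags>[^\]]*)\])?$")


def split_header(header):
    """Split 'Html [File,Link]' into ('Html', ['File', 'Link'])."""
    match = HEADER_RE.match(header.strip())
    name = match.group("name")
    tags = match.group("tags")
    tag_list = [tag.strip() for tag in tags.split(",")] if tags else []
    return name, tag_list


if __name__ == "__main__":
    for column in ("Name", "Html [File,Link]", "Rmd [File]"):
        print(split_header(column))
```

A header without brackets, such as `Name`, comes back with an empty tag list.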

Step 3 — Write `parameters.tsv`

params <- list(
  analysis_name    = analysis_name,
  analysis_version = analysis_version
)

paramFrame <- ezFrame(Value = sapply(params, as.character))
ezWrite.table(paramFrame, file = "parameters.tsv", col.names = FALSE)

Keep this honest — add every parameter that materially affected the result (gene annotation version, thresholds, reference paths). Future-you will thank present-you.

Step 4 — Copy `input_dataset.tsv` (provenance)

upstream <- "/srv/gstore/projects/p40992/o41017_FeatureCounts_2026-02-18--12-45-59"
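# file.copy() returns FALSE rather than raising an error, so fail loudly if the upstream dataset.tsv is missing
stopifnot(file.copy(file.path(upstream, "dataset.tsv"), "input_dataset.tsv"))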

This is the file-level provenance link. The DB-level link is the parent_id column on the data_sets row that Step 10 inserts — both are belt-and-braces so the lineage is visible in the file tree on gstore and in the SUSHI UI's parent-child navigation.

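Step 5 — Copy the source Rmd into the folder

file.copy(file.path("..", rmd_file), rmd_file)   # or wherever the source lives

Only the source .Rmd goes in at this point; the HTML is rendered next to it by the launcher in Step 7.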

Step 6 — Emit the vanilla bash launcher

bash_commands <- sprintf('#!/bin/bash
set -eux
set -o pipefail
umask 0002

source /usr/local/ngseq/etc/lmod_profile
module add Dev/R/4.6.0

R --vanilla --slave <<EOT
  rmarkdown::render(
    input      = "%s",
    envir      = new.env(),
    output_dir = ".",
    quiet      = FALSE
  )
EOT
', rmd_file)

launcher <- sprintf("scripts/%s_run-%s.sh", order_id, tolower(analysis_name))
writeLines(bash_commands, con = launcher)
Sys.chmod(launcher, mode = "0755")

Why R --vanilla --slave: --vanilla skips .Rprofile, .Renviron, and any saved .RData, so the render runs with only what the heredoc loads; --slave (spelled --no-echo in recent R versions) only makes R run quietly. That means anyone landing on this folder in gstore six months from now can rerun the script from the folder root and reproduce the result without guessing which environment produced it.

Step 7 — Run the launcher, capture logs

o_log <- sub("\\.sh$", "_o.log", launcher)
e_log <- sub("\\.sh$", "_e.log", launcher)
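status <- system2("bash", args = launcher, stdout = o_log, stderr = e_log)
# system2() returns the exit status; stop here rather than shipping a folder without the HTML
if (status != 0) stop("Render failed (exit status ", status, "); see ", e_log)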

The two log files sit next to the launcher in scripts/ — same level as the script that produced them.

Step 8 — `g-req` to gstore

system2("/usr/local/ngseq/bin/g-req",
        args = c("copynow", ".", dirname(gstore_folder)))

The subtle bit: g-req copynow . X/ copies the current directory as a subdirectory of X/. Because gstore_folder already contains the basename (the timestamped folder name), dirname(gstore_folder) resolves to the project's gstore root, and the copy lands at exactly the right place. This is the only pattern where the "g-req creates a subdirectory" gotcha works in your favour — see CLAUDE.md "g-req Commands" for the general rule (always copy individual files, not directories).
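The path arithmetic behind this can be sketched in a few lines of Python. This only models the behaviour described above (the source directory lands as a subdirectory of the destination); it does not call g-req, and `greq_landing_path` and the `/tmp/scratch` location are hypothetical names used for illustration.

```python
import posixpath


def greq_landing_path(local_dir, destination):
    """Where local_dir ends up if it is copied as a subdirectory of destination."""
    name = posixpath.basename(posixpath.normpath(local_dir))
    return posixpath.join(destination, name)


gstore_folder = "p40992/o41017_SC-FlashSeq_QC_Evaluation_2026-05-15--12-00-00"
local_dir = "/tmp/scratch/" + posixpath.basename(gstore_folder)

# Destination = dirname(gstore_folder): the copy lands exactly on gstore_folder.
landing = greq_landing_path(local_dir, posixpath.dirname(gstore_folder))
print(landing == gstore_folder)

# Destination = gstore_folder itself: the folder name is duplicated one level down.
nested = greq_landing_path(local_dir, gstore_folder)
print(nested)
```

The second call shows the failure mode the gotcha refers to: passing the full target folder as the destination nests the folder inside itself.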

After it succeeds, surface the web URL. The HTML on gstore is reachable via the SUSHI proxy (works in any FGCZ browser session without re-auth):

https://fgcz-sushi.uzh.ch/projects/{project}/{order_id}_{analysis_name}_{timestamp}/{analysis_name}.html

The same file is also served at https://fgcz-gstore.uzh.ch/projects/... — that URL works too but is Basic-auth-walled, so the fgcz-sushi.uzh.ch form is friendlier for sharing.

The SUSHI page (once Step 10 below has run) lives at:

https://fgcz-sushi.uzh.ch/data_set/p{project_number}/{sushi_dataset_id}

Note the URL pattern: /data_set/ (underscore, not /datasets/) and the p prefix on the project number.
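Both URL patterns can be assembled from the inputs gathered earlier. A minimal Python sketch, using only the two patterns quoted above; `report_url` and `sushi_dataset_url` are hypothetical helper names.

```python
from urllib.parse import quote

SUSHI_BASE = "https://fgcz-sushi.uzh.ch"


def report_url(project, order_id, analysis_name, timestamp):
    """URL of the rendered HTML via the SUSHI proxy."""
    folder = f"{order_id}_{analysis_name}_{timestamp}"
    path = f"projects/{project}/{folder}/{analysis_name}.html"
    return f"{SUSHI_BASE}/{quote(path)}"


def sushi_dataset_url(project, sushi_dataset_id):
    """URL of the SUSHI dataset page: /data_set/ (underscore) plus p-prefixed project number."""
    project_number = str(project).removeprefix("p")  # accepts "p40992" or "40992"
    return f"{SUSHI_BASE}/data_set/p{project_number}/{sushi_dataset_id}"


if __name__ == "__main__":
    print(report_url("p40992", "o41017", "SC-FlashSeq_QC_Evaluation", "2026-05-15--12-00-00"))
    print(sushi_dataset_url("p40992", 109531))
```

`str.removeprefix` needs Python 3.9 or newer.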

Step 9 — Register the B-Fabric workunit + dataset

register_custom_analysis.py (currently at /home/rdomi/btools/btools/, pending merge into btools main) creates a B-Fabric workunit and dataset and registers all the gstore files as resources. Do not pass --register-sushi if you want a production-visible SUSHI dataset — see the warning below.

If you don't already have a working btools env, the easiest setup is to copy the script + dependent src/ files into your own btools clone and invoke via uv run:

# One-time setup: bring Ronald's script + sync the src/ helpers it imports
cp /home/rdomi/btools/btools/register_custom_analysis.py ~/git/btools/btools/
cp /home/rdomi/btools/btools/src/{paths,bfabric_utils,tsv_utils,resource_utils}.py \
   ~/git/btools/btools/src/

# Per-run invocation: B-Fabric ONLY (no --register-sushi)
cd ~/git/btools && \
  BFABRICPY_CONFIG_ENV=PRODUCTION uv run python btools/register_custom_analysis.py \
    /srv/gstore/projects/{project}/{order_id}_{analysis_name}_{timestamp} \
    --generated-using Claude_Agent \
    --generated-for {user} \
    --verbose

Capture the output. It prints two integers on its last two lines:

B-Fabric workunit_id: <WU>
B-Fabric dataset_id:  <BFDS>

Both are needed for Step 10 (link them into the SUSHI row).
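If the script output is captured to a string, the two IDs can be pulled out programmatically instead of by eye. A hedged Python sketch, assuming the output lines look exactly like the two shown above; `parse_registration_ids` is a hypothetical helper and the IDs in the example are made up.

```python
import re

# Assumes lines of the form "B-Fabric workunit_id: 123" / "B-Fabric dataset_id:  456".
ID_RE = re.compile(r"^B-Fabric (workunit_id|dataset_id):[ \t]*(\d+)[ \t]*$", re.MULTILINE)


def parse_registration_ids(output):
    """Return (workunit_id, dataset_id) parsed from the registration script's output."""
    found = {key: int(value) for key, value in ID_RE.findall(output)}
    missing = {"workunit_id", "dataset_id"} - found.keys()
    if missing:
        raise ValueError(f"could not find {sorted(missing)} in script output")
    return found["workunit_id"], found["dataset_id"]


if __name__ == "__main__":
    example = "registering resources...\nB-Fabric workunit_id: 123\nB-Fabric dataset_id:  456\n"
    print(parse_registration_ids(example))
```

Raising on a missing ID is deliberate: Step 10 should not run with a guessed or empty value.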

⚠️ Why NOT --register-sushi — and what to use instead:

The script's --register-sushi flag POSTs to http://fgcz-h-083:4071/projects/{project}/datasets/register, which is the DEV SUSHI Python API on fgcz-h-083. Production SUSHI (fgcz-sushi.uzh.ch / fgcz-h-082) has no equivalent Python API on any port — only Rails on :8880 behind auth. So --register-sushi writes to dev only; the dataset will not be visible at https://fgcz-sushi.uzh.ch/data_set/.... Use Step 10 below to do a direct production SUSHI MySQL insert and link it to the B-Fabric IDs from Step 9.

Other notes:

--generated-using Claude_Agent is the governance tag for LLM-origin runs.
Never default to BFABRICPY_CONFIG_ENV=PRODUCTION without explicit user confirmation.

Step 10 — Register the dataset in production SUSHI (MySQL direct)

This is the only way to make the analysis visible at https://fgcz-sushi.uzh.ch/data_set/p{project_number}/{id} today. The full recipe (column names, gotchas, Ruby-hash syntax for samples.key_value) is in the sushi-framework skill under "Registering Results in SUSHI Database". The short version, parameterised for this skill's outputs:

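# Needs: ssh access to fgcz-h-082, and $SUSHI_DB_PASSWORD (ask admin) set on fgcz-h-082 itself:
# the single quotes around the mysql command defer variable expansion to the remote shell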

ssh fgcz-h-082 'mysql -u sushilover -p"$SUSHI_DB_PASSWORD" sushi' <<SQL
-- Identifiers
SET @project_id = (SELECT id FROM projects WHERE number = {project_number});
SET @user_id    = (SELECT id FROM users    WHERE login  = '{user}');
SET @parent_id  = {upstream_sushi_dataset_id};   -- from the SUSHI URL of the upstream job
SET @bfabric_id = {BFDS};                         -- from Step 9
SET @workunit_id = {WU};                          -- from Step 9

INSERT INTO data_sets
  (project_id, parent_id, name, created_at, updated_at,
   num_samples, completed_samples, user_id, child,
   sushi_app_name, order_id, bfabric_id, workunit_id)
VALUES
  (@project_id, @parent_id,
   '{order_id}_{analysis_name}_{timestamp}',
   NOW(), NOW(), 1, 1, @user_id, 1,
   'CustomAnalysis', {order_number},
   @bfabric_id, @workunit_id);
SET @new_id = LAST_INSERT_ID();

-- One sample row matching dataset.tsv columns (Ruby hash-rocket syntax, NOT JSON colons)
INSERT INTO samples (key_value, data_set_id, created_at, updated_at) VALUES
  ('{"Name"=>"{analysis_name}", '
   '"Html [File,Link]"=>"p{project_number}/{order_id}_{analysis_name}_{timestamp}/{analysis_name}.html", '
   '"Rmd [File]"=>"p{project_number}/{order_id}_{analysis_name}_{timestamp}/{analysis_name}.Rmd"}',
   @new_id, NOW(), NOW());

SELECT id, name, parent_id, bfabric_id, workunit_id FROM data_sets WHERE id = @new_id;
SQL

Result: the SUSHI page is now live at https://fgcz-sushi.uzh.ch/data_set/p{project_number}/<id_from_LAST_INSERT_ID>, the B-Fabric icon shows up in the SUSHI UI, and the lineage edge to the upstream parent is in place.

Gotchas inherited from the sushi-framework skill (worth re-reading there if anything looks off):

samples.key_value uses Ruby => hash rockets, not JSON colons. Pipe the SQL via stdin to avoid shell quoting hell.
Production SUSHI is on fgcz-h-082 and owned by trxcopy. You can run mysql read/write as your own user; only the Rails log requires trxcopy.
SUSHI UI displays sample columns in reverse JSON-insertion order — first key becomes the rightmost column. Order the key_value hash with Name first so the leftmost column matches the dataset.tsv schema.
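The hash-rocket string is easy to get wrong by hand, so it can help to generate it. Below is a minimal Python sketch that renders an ordered mapping as a Ruby-style hash literal in the same layout as the key_value example in Step 10. `ruby_hash_literal` is a hypothetical helper, the paths in the example are placeholders, and the result still needs SQL escaping (single quotes) before it goes into the INSERT.

```python
def ruby_hash_literal(pairs):
    """Render a dict as '{"key"=>"value", ...}' using Ruby hash rockets, keeping insertion order."""

    def quoted(text):
        # Escapes only backslashes and double quotes; Ruby interpolation (#{...}) is not handled.
        return '"' + str(text).replace("\\", "\\\\").replace('"', '\\"') + '"'

    body = ", ".join(f"{quoted(key)}=>{quoted(value)}" for key, value in pairs.items())
    return "{" + body + "}"


if __name__ == "__main__":
    sample = {
        "Name": "SC-FlashSeq_QC_Evaluation",
        "Html [File,Link]": "p40992/some_folder/SC-FlashSeq_QC_Evaluation.html",
        "Rmd [File]": "p40992/some_folder/SC-FlashSeq_QC_Evaluation.Rmd",
    }
    print(ruby_hash_literal(sample))
```

Python dicts preserve insertion order, so the key order passed in is the key order written out.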

Guardrails

Confirm project + order_id aloud before scaffolding. Type them back. Cheap to do, expensive to get wrong.
Always generate a fresh timestamp — never reuse one. Each run gets its own immutable folder.
Always supply the web URL after g-req succeeds. CLAUDE.md mandates this.
Don't write into /srv/gstore/ directly. Work in /srv/GT/analysis/{project}/Analyses_Paul/<scratch>/, then g-req.
Don't pass --register-sushi to register_custom_analysis.py (Step 9 warning) — it only writes to dev. Production SUSHI registration goes through MySQL (Step 10).

Follow-ups (deliberately out of scope)

Merge register_custom_analysis.py + src/ updates into main btools so users don't have to copy from /home/rdomi/.
Build a production-SUSHI Python API (mirror of the dev one on fgcz-h-083:4071) so Step 10 can become an HTTP call instead of a raw MySQL insert. Until then, Step 10 is the canonical path.
Add SBATCH wrapping for long renders. The synchronous bash launcher OOMs on heavy CellChat / multi-GB Seurat workloads.
Add evals/evals.json with 2–3 test prompts once the workflow has stabilised.
