From dataset-cleaner
Given a dataset name, find its expression matrix on the VEuPathDB workflow server, rsync it locally, and determine the WGCNA soft-threshold power.
Slash command: /dataset-cleaner:wgcna-power-threshold
!`cat ${CLAUDE_SKILL_DIR}/../../shared/sync-dataset-steps.md`
Substitute every occurrence of DATASET_NAME above with: $ARGUMENTS
After completing the steps above you have remotePath — the full path to the dataset directory on yew. Continue:
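Step 4: Inspect remotePath for "rnaseq" or "microarray" (case-insensitive) to set technologyType. If it contains "rnaseq", set technologyType = RNASeq and continue with Step 5; if it contains "microarray", set technologyType = Microarray and continue with Step 6.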
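A minimal sketch of this check in Python, assuming a plain substring test on the full path. detect_technology_type is a hypothetical helper name. Checking "rnaseq" first, in case a path contains both words, is our assumption and is not specified by the skill.

```python
from typing import Optional


def detect_technology_type(remote_path: str) -> Optional[str]:
    """Map a dataset directory path to technologyType by case-insensitive substring match."""
    lowered = remote_path.lower()
    if "rnaseq" in lowered:
        return "RNASeq"      # continue with Step 5
    if "microarray" in lowered:
        return "Microarray"  # continue with Step 6
    return None              # neither marker present (hypothetical fallback: ask the user)
```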
Step 5 (RNASeq): Search for WGCNA input files under remotePath:
ssh yew "find <remotePath> -type f -name 'wgcnaInput*.txt'"
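The cases below depend only on which wgcnaInput*.txt basenames this search returns. Here is a sketch of that classification. classify_wgcna_inputs is a hypothetical helper that takes the paths printed by find. Returning "other" for combinations the cases do not cover, such as a lone firststrand file, is our addition.

```python
from pathlib import PurePosixPath
from typing import Dict, List, Tuple


def classify_wgcna_inputs(found: List[str]) -> Tuple[str, Dict[str, str]]:
    """Return ("A" | "B" | "C" | "other", {strand: path}) for the paths returned by find."""
    by_strand: Dict[str, str] = {}
    for path in found:
        name = PurePosixPath(path).name
        # wgcnaInput_firststrand.txt -> "firststrand", and so on
        if name.startswith("wgcnaInput_") and name.endswith(".txt"):
            by_strand[name[len("wgcnaInput_"):-len(".txt")]] = path
    if set(by_strand) == {"unstranded"}:
        return "A", by_strand
    if set(by_strand) == {"firststrand", "secondstrand"}:
        return "B", by_strand
    if not found:
        return "C", by_strand
    return "other", by_strand  # a combination Cases A to C do not describe
```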
Case A — only wgcnaInput_unstranded.txt found:
Capture its full path as remoteInputFile and set strandType = unstranded. Skip to Step 7.
Case B — both wgcnaInput_firststrand.txt and wgcnaInput_secondstrand.txt found:
Rsync both files locally to decide which one is the sense strand:
mkdir -p /tmp/$ARGUMENTS
rsync -aL yew:<path/to/wgcnaInput_firststrand.txt> /tmp/$ARGUMENTS/
rsync -aL yew:<path/to/wgcnaInput_secondstrand.txt> /tmp/$ARGUMENTS/
Run R to compare column means:
Rscript ${CLAUDE_SKILL_DIR}/../../bin/select-strand.R $ARGUMENTS
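If the script reports AMBIGUOUS, stop and tell the user the strand could not be determined (report both means).
Otherwise, set remoteInputFile to the remote path of the selected strand file and strandType to the selected strand (firststrand or secondstrand). Skip to Step 7.
Case C — no wgcnaInput*.txt files found: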
Stop and tell the user no WGCNA input files were found under remotePath.
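The internals of select-strand.R are not shown here. The following sketch illustrates one plausible form of the Case B comparison in Python under these assumptions:

* Each file is tab-delimited, with a header row and gene IDs in the first column.
* The sense strand is the one with the larger mean of column means.
* The AMBIGUOUS cutoff (min_ratio = 2.0) is a made-up value.

column_means and select_strand are hypothetical names.

```python
import csv
from statistics import mean
from typing import List, Tuple


def column_means(path: str) -> List[float]:
    """Mean of each numeric column of a tab-delimited file (header row, IDs in column 1)."""
    with open(path, newline="") as fh:
        rows = list(csv.reader(fh, delimiter="\t"))
    header, data = rows[0], rows[1:]
    return [mean(float(row[i]) for row in data) for i in range(1, len(header))]


def select_strand(
    first_path: str, second_path: str, min_ratio: float = 2.0
) -> Tuple[str, float, float]:
    """Return (choice, first_mean, second_mean); choice is a strand name or AMBIGUOUS."""
    first = mean(column_means(first_path))
    second = mean(column_means(second_path))
    high, low = max(first, second), min(first, second)
    # Ambiguous when both means are zero or neither strand dominates by min_ratio.
    if high == 0 or (low > 0 and high / low < min_ratio):
        return "AMBIGUOUS", first, second
    return ("firststrand" if first > second else "secondstrand"), first, second
```

Returning both means alongside the choice matches the instruction to report both means when the result is ambiguous.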
Step 6 (Microarray): Set strandType = microarray, then search for profiles.txt under remotePath:
ssh yew "find <remotePath> -type f -name 'profiles.txt'"
If exactly one file is found, capture it as remoteInputFile. Continue to Step 7.
If zero files are found, tell the user no profiles.txt was found, list all .txt files under remotePath, and ask the user to pick one.
If more than one is found, show the list and ask the user to pick one.
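The three outcomes of this step are sketched below as a hypothetical helper, choose_profiles_file, which takes the paths returned by find. The outcome labels are illustrative only.

```python
from typing import List, Tuple


def choose_profiles_file(found: List[str]) -> Tuple[str, List[str]]:
    """Map the profiles.txt paths returned by find to the step's three outcomes."""
    if len(found) == 1:
        return "use", found  # capture found[0] as remoteInputFile, go to Step 7
    if not found:
        return "list_txt_and_ask", []  # list every .txt under remotePath; the user picks
    return "ask", sorted(found)  # several candidates: show them; the user picks
```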
Step 7: Locate the manual-delivery directory:
!`cat ${CLAUDE_SKILL_DIR}/../../shared/manual-delivery-path.md`
Substitute every occurrence of DATASET_NAME above with: $ARGUMENTS
technologyType was determined in Step 4 (RNASeq or Microarray).
After completing the steps above you have manualDeliveryPath. The analysisConfig.xml lives at:
<manualDeliveryPath>/analysisConfig.xml
Rsync it along with the chosen input file:
mkdir -p /tmp/$ARGUMENTS
rsync -aL yew:<remoteInputFile> /tmp/$ARGUMENTS/
rsync -aL yew:<manualDeliveryPath>/analysisConfig.xml /tmp/$ARGUMENTS/
If analysisConfig.xml is not found on the remote (rsync exits non-zero or the file is absent), stop and tell the user it was not found at <manualDeliveryPath>/analysisConfig.xml.
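Here is a sketch of the fetch-and-check logic in Python. It assumes rsync is on PATH and that yew resolves as an SSH host alias. fetch_remote_file is a hypothetical helper. The run parameter exists only so the check can be exercised without a network.

```python
import subprocess
from pathlib import Path
from typing import Callable, Optional


def fetch_remote_file(
    remote_path: str,
    local_dir: str,
    host: str = "yew",
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> Optional[Path]:
    """Rsync host:remote_path into local_dir; return the local copy, or None on failure."""
    Path(local_dir).mkdir(parents=True, exist_ok=True)
    result = run(["rsync", "-aL", f"{host}:{remote_path}", f"{local_dir}/"])
    local_file = Path(local_dir) / Path(remote_path).name
    # Treat a non-zero exit status or a missing local copy as "not found".
    if result.returncode != 0 or not local_file.is_file():
        return None
    return local_file
```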
Write a paths.txt file recording the key remote paths found during this run:
cat > /tmp/$ARGUMENTS/paths.txt <<EOF
datasetName: $ARGUMENTS
remoteInputFile: <remoteInputFile>
manualDeliveryPath: <manualDeliveryPath>
analysisConfig: <manualDeliveryPath>/analysisConfig.xml
EOF
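Each line of the file has the form key: value, so a later run can read it back easily. Here is a sketch with hypothetical helpers write_paths_file and read_paths_file. The reader splits only on the first ": " because the values are paths.

```python
from pathlib import Path
from typing import Dict


def write_paths_file(out_path: str, dataset: str, remote_input: str, manual_delivery: str) -> None:
    """Write paths.txt in the same 'key: value' layout as the heredoc above."""
    lines = [
        f"datasetName: {dataset}",
        f"remoteInputFile: {remote_input}",
        f"manualDeliveryPath: {manual_delivery}",
        f"analysisConfig: {manual_delivery}/analysisConfig.xml",
    ]
    Path(out_path).write_text("\n".join(lines) + "\n")


def read_paths_file(path: str) -> Dict[str, str]:
    """Parse paths.txt into a dict, splitting each line on its first ': ' only."""
    entries: Dict[str, str] = {}
    for line in Path(path).read_text().splitlines():
        if ": " in line:
            key, value = line.split(": ", 1)
            entries[key] = value
    return entries
```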
Step 8: Run the soft-threshold power script:
Rscript ${CLAUDE_SKILL_DIR}/../../bin/wgcna-power-threshold.R $ARGUMENTS
Capture the recommended power as softThresholdPower. If it is NA, stop and tell the user no clear threshold was found and show the fit table.
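The internals of wgcna-power-threshold.R are not shown, so the following only sketches the usual selection rule:

* Take the lowest power whose signed scale-free fit R² (−sign(slope)·R²) reaches a cutoff.
* Report NA, represented here as None, when no power qualifies.

The 0.85 cutoff is an assumption that mirrors the default RsquaredCut of WGCNA's pickSoftThreshold. The row layout (power, R², slope) and the name pick_soft_threshold are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def pick_soft_threshold(
    fit_table: Sequence[Tuple[int, float, float]], r2_cut: float = 0.85
) -> Optional[int]:
    """Lowest power whose signed scale-free fit R^2 reaches r2_cut; None stands in for NA.

    Each row is (power, r_squared, slope) from a scale-free topology fit table.
    """
    for power, r_squared, slope in sorted(fit_table):
        sign = (slope > 0) - (slope < 0)  # -1, 0 or 1
        signed_r_squared = -sign * r_squared  # a scale-free fit should have a negative slope
        if signed_r_squared >= r2_cut:
            return power
    return None
```

Multiplying by the negative slope sign keeps a high R² with a positive slope from being mistaken for a scale-free fit.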
By this point you have captured:
datasetName = $ARGUMENTS
softThresholdPower from Step 8
organismAbbrev from Step 2
inputFileBasename = basename of remoteInputFile
strandType = one of: firststrand, secondstrand, unstranded, or microarray
technologyType = RNASeq (Step 5) or Microarray (Step 6)
Derive inputSuffixMM:
RNASeq: [module - membership - <strandType> - tpm - unique]
Microarray: [module - membership - microarray]
Create the dataset output directory and copy the rsynced analysisConfig.xml, the plot, and paths.txt into it. Then insert the <step> block as a new child element immediately before the last closing tag in analysisConfig.xml, replacing each <placeholder> in the block with the corresponding value captured during this run:
mkdir -p ~/wgcna/$ARGUMENTS
cp /tmp/$ARGUMENTS/analysisConfig.xml ~/wgcna/$ARGUMENTS/analysisConfig.xml
cp /tmp/$ARGUMENTS/power_threshold_plot.pdf ~/wgcna/$ARGUMENTS/
cp /tmp/$ARGUMENTS/paths.txt ~/wgcna/$ARGUMENTS/paths.txt
python3 - <<'PYEOF'
import os

# Claude Code substitutes $ARGUMENTS before this runs; the quoted 'PYEOF'
# delimiter stops the shell from expanding $-references inside the heredoc.
path = os.path.expanduser("~/wgcna/$ARGUMENTS/analysisConfig.xml")
with open(path) as f:
    content = f.read()

# Replace every <...> placeholder below with its captured value before running.
step_xml = """
  <!-- datasetName: $ARGUMENTS | project: <projectName> | organism: <organismAbbrev> | softThresholdPower: <softThresholdPower> -->
  <step class="ApiCommonData::Load::IterativeWGCNAResults">
    <property name="profileSetName" value="WGCNA $ARGUMENTS" />
    <property name="inputFile" value="<inputFileBasename>" />
    <property name="softThresholdPower" value="<softThresholdPower>" />
    <property name="organismAbbrev" value="<organismAbbrev>" />
    <property name="inputSuffixMM" value="<inputSuffixMM>" />
    <property name="technologyType" value="<technologyType>" />
    <property name="threshold" value="1" />
    <property name="samples" isReference="1" value="$globalReferencable->{samples}" />
  </step>
"""

# Insert the block immediately before the last closing tag (the root element's).
last_close = content.rfind('</')
if last_close == -1:
    # No closing tag found: append the block at the end of the file.
    content = content + step_xml
else:
    content = content[:last_close] + step_xml + content[last_close:]

with open(path, 'w') as f:
    f.write(content)
print("Step inserted into", path)
PYEOF
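The following Python sketch shows the placeholder work that precedes this heredoc:

* input_suffix_mm applies the two inputSuffixMM patterns. It assumes the tpm - unique form is for RNASeq and the shorter form is for Microarray.
* fill_placeholders swaps each <name> token for its captured value. It leaves literal text such as $globalReferencable->{samples} untouched.

Both helper names are hypothetical.

```python
from typing import Dict

RNASEQ_STRANDS = {"firststrand", "secondstrand", "unstranded"}


def input_suffix_mm(technology_type: str, strand_type: str) -> str:
    """Derive inputSuffixMM from technologyType and strandType."""
    if technology_type == "Microarray":
        return "[module - membership - microarray]"
    if technology_type == "RNASeq" and strand_type in RNASEQ_STRANDS:
        return f"[module - membership - {strand_type} - tpm - unique]"
    raise ValueError(f"unexpected technologyType/strandType: {technology_type}/{strand_type}")


def fill_placeholders(template: str, values: Dict[str, str]) -> str:
    """Replace each <name> token (e.g. <softThresholdPower>) with its captured value."""
    for name, value in values.items():
        template = template.replace(f"<{name}>", value)
    return template
```

Plain string replacement avoids str.format, whose brace syntax would clash with the literal {samples} in the step block.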
Report the output directory (~/wgcna/$ARGUMENTS/) and its contents to the user:
analysisConfig.xml: the rsynced file with the step XML inserted
power_threshold_plot.pdf: scale-free topology plot used to choose the power threshold
paths.txt: remote paths found during this run (input file, manual-delivery directory, and analysisConfig.xml)
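A quick sanity check of the output directory before reporting can catch a missed copy or a failed insertion. Here is a sketch with hypothetical helpers. The expected file names come from the list above, and the marker string is the step class used in the inserted block.

```python
from pathlib import Path
from typing import List

EXPECTED_OUTPUTS = ["analysisConfig.xml", "power_threshold_plot.pdf", "paths.txt"]
STEP_MARKER = "ApiCommonData::Load::IterativeWGCNAResults"


def missing_outputs(out_dir: str) -> List[str]:
    """Names from EXPECTED_OUTPUTS that are not regular files in out_dir."""
    root = Path(out_dir).expanduser()
    return [name for name in EXPECTED_OUTPUTS if not (root / name).is_file()]


def step_was_inserted(out_dir: str) -> bool:
    """True if analysisConfig.xml exists and contains the inserted step's class."""
    config = Path(out_dir).expanduser() / "analysisConfig.xml"
    return config.is_file() and STEP_MARKER in config.read_text()
```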