From claude-resources
Migrates GitHub Actions CI from self-hosted-with-fallback (detect-runner pattern) to direct Blacksmith cloud runner labels, handling cross-instance cache, pnpm/Node setup, container jobs, and leftover self-hosted steps.
Slash command: /claude-resources:dev-blacksmith-migration
Migrate a repo from the "detect-runner with self-hosted fallback" CI pattern to direct [Blacksmith](https://blacksmith.sh/) cloud runner labels. The same playbook applies to other ephemeral cloud runner services (RunsOn, BuildJet, Namespace, Depot) — the runner-label syntax differs, the gotchas don't.
The repo has at least one of:
- A .github/workflows/detect-runner.yml (or similarly named) reusable workflow that polls the GitHub API for online self-hosted runners and emits a runner output
- runs-on: ${{ needs.detect-runner.outputs.runner }}
- runs-on: mixing self-hosted and ubuntu-latest via expressions
- set-safe-directory: false on actions/checkout (a self-hosted-only optimization)

If you don't see any of those, this skill is the wrong tool: the user just needs a normal runs-on label switch.
Step 1: Audit
Run the audit script to find every self-hosted-ism in .github/workflows/:
bash $HOME/.claude/skills/dev-blacksmith-migration/scripts/audit.sh
It prints, for each workflow file:
- runs-on: values (which need replacement)
- detect-runner references (job calls + needs: lists)
- set-safe-directory: false occurrences
- safe.directory / chown / "Clean workspace" steps
- Jobs that use container: (these need extra care; see Step 6)
- Inter-job file passing (actions/cache/save → actions/cache/restore across jobs, or actions/upload-artifact → download-artifact)

Read the output before making any edits.
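The bundled scripts/audit.sh is not reproduced on this page. As a rough sketch of the kind of scan described above, assuming only grep and a workflows directory passed as an argument, something like the following would surface most of the listed patterns. The function name audit_workflows and the regular expressions are illustrative assumptions, not the real script.

```shell
# Hypothetical stand-in for scripts/audit.sh: prints lines in workflow files
# that usually need attention during the migration.
# Usage: audit_workflows [dir]   (dir defaults to .github/workflows)
audit_workflows() {
  dir="${1:-.github/workflows}"
  # One pattern per item in the list above.
  for pattern in \
    '^[[:space:]]*runs-on:' \
    'detect-runner' \
    'set-safe-directory:[[:space:]]*false' \
    'safe\.directory|chown|Clean workspace' \
    '^[[:space:]]*container:' \
    'actions/cache/(save|restore)|actions/(upload|download)-artifact'
  do
    printf '== %s\n' "$pattern"
    # -r recurse, -n show line numbers, -E extended regex.
    # "|| true" keeps the loop going when a pattern has no matches.
    grep -rnE -- "$pattern" "$dir" || true
  done
}
```

Calling audit_workflows with no argument scans .github/workflows in the current directory. The output is grouped by pattern rather than by file, which differs from what the real script is described as printing.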
Step 2: Ask the user
Don't guess; ask. The answers determine which steps below to apply.
1. Drop detect-runner entirely, or keep it as a fallback? Keeping it leaves a job whose ubuntu-latest output consumers ignore once they hardcode the Blacksmith label.
2. Runner size: the default is blacksmith-2vcpu-ubuntu-2204 (matches ubuntu-latest's 2-vCPU shape). Bigger jobs may want blacksmith-4vcpu-ubuntu-2204, blacksmith-8vcpu-ubuntu-2204, etc. Confirm with the user.
3. Ubuntu version: ubuntu-2204 is the safe default. Use ubuntu-2404 only if the workflow explicitly needs Ubuntu 24.04 features.
4. Architecture: append -arm (e.g. blacksmith-2vcpu-ubuntu-2204-arm) when targeting arm64 builds.

Step 3: Replace runs-on: values
Every runs-on: value in .github/workflows/ becomes (using the spec from Step 2):
runs-on: blacksmith-2vcpu-ubuntu-2204
This includes:
- runs-on: ${{ needs.detect-runner.outputs.runner }} → replace
- runs-on: ubuntu-latest → replace (including the detect job inside detect-runner.yml itself if you're keeping the file)
- runs-on: self-hosted → replace
- runs-on: runs-on=${{ github.run_id }}/runner=2cpu-linux-x64 (or any RunsOn label) → replace, if the repo went RunsOn → Blacksmith

Use replace_all: true on the Edit tool if all runs-on: values become identical.
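Outside the Edit tool, the same substitution can be sketched with sed. The two helpers below are hypothetical names introduced for illustration: blacksmith_label assembles a label from the Step 2 answers using the naming shown in this document, and rewrite_runs_on rewrites single-line runs-on: values only. Array or multi-line forms are not handled and would need a manual edit, so treat this as a starting point and review the diff.

```shell
# Build a Blacksmith label from the Step 2 answers.
# Usage: blacksmith_label <vcpus> <ubuntu-version> [arm]
blacksmith_label() {
  label="blacksmith-${1}vcpu-ubuntu-${2}"
  if [ "${3:-}" = "arm" ]; then
    label="${label}-arm"
  fi
  printf '%s\n' "$label"
}

# Rewrite every single-line "runs-on:" value read from stdin to the given
# label, preserving the original indentation. Everything after "runs-on:"
# on that line (including a trailing comment) is replaced.
# Usage: rewrite_runs_on <label> < workflow.yml > workflow.yml.new
rewrite_runs_on() {
  sed -E "s|^([[:space:]]*runs-on:[[:space:]]*).*\$|\\1${1}|"
}
```

For example, rewrite_runs_on "$(blacksmith_label 2 2204)" < ci.yml > ci.yml.new writes a copy of ci.yml in which each runs-on: line points at blacksmith-2vcpu-ubuntu-2204.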
Step 4: Remove detect-runner plumbing (if Step 2 #1 was "drop")
In each consumer workflow:
- Delete the detect-runner: job that calls uses: ./.github/workflows/detect-runner.yml
- Remove detect-runner from the needs: list:
  - needs: detect-runner → remove the line entirely (job has no other deps)
  - needs: [detect-runner, build] → needs: build
  - needs: [detect-runner, build, test] → needs: [build, test]

Then git rm .github/workflows/detect-runner.yml.
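A missed needs: entry is easy to overlook, so a quick check that nothing still mentions detect-runner can help before committing. This is a minimal sketch; the function name check_detect_runner_removed is an assumption made for illustration.

```shell
# Succeeds (exit 0) only when no file under the given directory still
# mentions detect-runner. Otherwise prints the offending lines and fails.
# Usage: check_detect_runner_removed [dir]   (defaults to .github/workflows)
check_detect_runner_removed() {
  dir="${1:-.github/workflows}"
  if [ ! -d "$dir" ]; then
    echo "no such directory: $dir" >&2
    return 2
  fi
  if grep -rn -- 'detect-runner' "$dir"; then
    echo "detect-runner references remain" >&2
    return 1
  fi
  return 0
}
```

Run it after the git rm; a non-zero exit lists the file and line of each leftover reference.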
The repo's RUNNER_CHECK_TOKEN GitHub Actions secret becomes orphaned. Tell the user it can be deleted manually if desired (you can't delete secrets via the gh CLI without a scope they may not want to grant).
If detect-runner.yml was emitting an IFTTT "self-hosted offline" notification, that goes away with the file. The deploy-status IFTTT notification (a separate job in the consumer workflow) is a different concern — leave that alone.
Step 5: Remove self-hosted leftovers
For every actions/checkout step:
# Self-hosted leftover: REMOVE the with: block (or just the one option)
- uses: actions/checkout@<sha>
  with:
    set-safe-directory: false # ← DELETE this option
Default set-safe-directory: true is required for container jobs to access the workspace. Leaving it false causes mysterious fatal: detected dubious ownership errors in container subprocesses.
Delete these step types if you find them. They are all "next-run cleanup" patterns that ephemeral runners don't need:
- Clean workspace (rm -rf $GITHUB_WORKSPACE/... before the rest of the job)
- Fix workspace permissions (chown -R ... $GITHUB_WORKSPACE at job end)

Step 6: Mark the workspace safe in container jobs
For any job that runs its steps inside container: (rather than directly on the runs-on: host), add this step before checkout:
test:
  runs-on: blacksmith-2vcpu-ubuntu-2204
  container:
    image: mcr.microsoft.com/playwright:v1.58.2-noble
  steps:
    - name: Mark workspace as safe for git
      run: git config --global --add safe.directory "$GITHUB_WORKSPACE"
    - uses: actions/checkout@<sha>
    # ... rest of the steps
Why: actions/checkout (a node action) writes safe.directory to /root/.gitconfig inside the container, but shell run: steps inside the same container have HOME=/github/home and read /github/home/.gitconfig. Without this step, lifecycle scripts like pnpm install's prepare (which runs lefthook install, husky install, etc.) hit fatal: detected dubious ownership in repository at '/__w/<repo>/<repo>'.
This is not self-hosted-specific — it's a container-on-any-runner concern. The original codebase probably had this step alongside set-safe-directory: false, and the pair looked self-hosted-only. Keep this step; drop the set-safe-directory: false.
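The HOME mismatch described above can be reproduced locally without a container. The sketch below assumes git is installed and uses two temporary directories as stand-ins for /root and /github/home; the helper names are hypothetical. It only illustrates that git config --global is keyed on $HOME, not how actions/checkout behaves internally.

```shell
# Print the global safe.directory entries visible under the given HOME,
# or "(none)" when there are none.
# Usage: show_safe_directory <home-dir>
show_safe_directory() (
  # Subshell body: environment changes do not leak to the caller.
  unset XDG_CONFIG_HOME GIT_CONFIG_GLOBAL
  HOME="$1"; export HOME
  git config --global --get-all safe.directory || echo "(none)"
)

# Add a global safe.directory entry under the given HOME.
# Usage: add_safe_directory <home-dir> <path>
add_safe_directory() (
  unset XDG_CONFIG_HOME GIT_CONFIG_GLOBAL
  HOME="$1"; export HOME
  git config --global --add safe.directory "$2"
)

checkout_home=$(mktemp -d)   # plays the role of /root
shell_home=$(mktemp -d)      # plays the role of /github/home

add_safe_directory "$checkout_home" /__w/repo/repo
show_safe_directory "$checkout_home"   # prints /__w/repo/repo
show_safe_directory "$shell_home"      # prints (none)

# The manual step writes the entry under the HOME that run: steps use.
add_safe_directory "$shell_home" /__w/repo/repo
show_safe_directory "$shell_home"      # prints /__w/repo/repo
```

The second call is the failure case: the entry exists, but under a different HOME, so git running in the shell step never sees it.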
Step 7: Pass files between jobs with artifacts
If the workflow has multiple jobs and shares files between them (typical Build → Test → Deploy split), audit the existing pattern:
- actions/cache/save → actions/cache/restore keyed by ${{ github.run_id }}: works on a single self-hosted runner. Blacksmith provides an accelerated cache backend that survives across instances, but actions/cache was never designed as a job-to-job pipe. It is a "speed up next run" mechanism, and misusing it as inter-job transport is fragile (cache eviction, key collisions, container-network edge cases).
- actions/upload-artifact@v4 → actions/download-artifact@v4: routes through GitHub's own artifact service, works cross-instance, works in containers, and is the documented inter-job transport.

Recommended for any Blacksmith migration with multi-job workflows: switch to artifacts.
Concrete swap (in the upstream job):
# BEFORE
- name: Cache blog build output
  uses: actions/cache/save@<sha>
  with:
    path: blog/dist/
    key: blog-build-${{ github.run_id }}
# AFTER
- name: Upload blog build output
  uses: actions/upload-artifact@v4
  with:
    name: blog-dist
    path: blog/dist/
    retention-days: 1
    if-no-files-found: error
In the downstream jobs:
# BEFORE
- name: Restore blog build cache
  uses: actions/cache/restore@<sha>
  with:
    path: blog/dist/
    key: blog-build-${{ github.run_id }}
# AFTER
- name: Download blog build output
  uses: actions/download-artifact@v4
  with:
    name: blog-dist
    path: blog/dist/
Blacksmith's accelerated actions/cache backend is fine to keep using for its intended purpose — speeding up setup-node, the pnpm store, and build-tool caches across runs. Just don't use it as an inter-job pipe within a single run.
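One heuristic for telling the two uses apart: a cache key that embeds github.run_id can only be hit within the same run, which usually means the cache is acting as an inter-job pipe. The sketch below looks for such keys. The function name and the regular expression are assumptions for illustration, and keys split across lines or assembled from other expressions will be missed.

```shell
# List cache keys that embed github.run_id, i.e. likely inter-job pipes.
# Usage: find_run_scoped_cache_keys [dir]   (defaults to .github/workflows)
find_run_scoped_cache_keys() {
  dir="${1:-.github/workflows}"
  # Prints file:line:content for each match; prints nothing when clean.
  grep -rnE -- '^[[:space:]]*key:.*github\.run_id' "$dir" || true
}
```

Keys built from lockfile hashes or similar inputs are not reported, since those are the "speed up next run" caches that are fine to keep.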
Step 8: Add setup steps to jobs that lack them
Any job that does NOT have its own actions/checkout but DOES run commands like pnpm, npm, or node is a self-hosted leftover. On the persistent runner, the workspace and toolchain were inherited from a previous job; on ephemeral runners, each job starts on a fresh VM.
Symptom: pnpm: command not found in the deploy job after Build and Test pass.
Fix: add the missing setup steps at the top of the job:
deploy:
  runs-on: blacksmith-2vcpu-ubuntu-2204 # the label chosen in Step 2
  steps:
    - name: Checkout repository # if the job needs package.json / pnpm-workspace.yaml
      uses: actions/checkout@<sha>
    - name: Setup pnpm
      uses: pnpm/action-setup@<sha>
    - name: Setup Node.js
      uses: actions/setup-node@<sha>
      with:
        node-version: <match the other jobs>
    # ... existing artifact downloads, deploy commands, etc.
If the deploy job runs pnpm add -w <pkg> or any command that needs a pnpm workspace, the actions/checkout is required (otherwise there's no package.json / pnpm-workspace.yaml for pnpm to find). Otherwise just the two setup steps may be enough.
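To turn the "command not found" symptom into an explicit message, a job can check for its tools up front. The helper below is a hypothetical preflight sketch, not part of this skill; it could be pasted into a run: step placed after the setup steps.

```shell
# Fail fast with a readable message when a tool the job relies on is missing.
# Returns 0 when every tool is on PATH, 1 otherwise.
# Usage: require_tools pnpm node
require_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing tool: $tool (add its setup step to this job)" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Every missing tool is reported before the function returns, so one failed run shows the full list rather than only the first gap.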
PR-level CI (often pr-checks.yml) usually runs a single-job preview-deploy workflow. It cannot exercise the cross-job artifact passing or the container-job paths that the production deploy uses. Pre-merge green on pr-checks is necessary but not sufficient.
The full validation requires merging to the trigger branch (usually main) and watching the production deploy. Plan for one or more iteration cycles directly on main if the user is OK with that, or coordinate via short-lived hotfix PRs.
For each push, watch CI with /watch-ci <pr> (PR mode) or /watch-ci (auto-detects the merged-PR path on the target branch).
When a deploy fails post-merge, check the failing job's step name and match against the table below before re-reading logs in detail:
| Failing step output | Cause | Fix |
|---|---|---|
| Cache not found for input keys: ...-<run_id> | actions/cache used as inter-job transport on ephemeral runners | Switch to artifacts (Step 7) |
| pnpm: command not found in deploy job | Deploy-only job missing setup steps | Add Setup pnpm + Setup Node.js (Step 8) |
| fatal: detected dubious ownership in repository at '/__w/...' | Container-job HOME mismatch between checkout and shell git | Add manual safe.directory step before checkout (Step 6) |
| pnpm add -w <pkg> errors with no pnpm-workspace.yaml found | Deploy job has no checkout | Add actions/checkout to the job (Step 8) |
| Build job's IFTTT alert "self-hosted runner offline" still firing | Old detect-runner.yml still in repo | git rm the workflow file (Step 4) |
For deeper context on each, see references/troubleshooting.md.
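The table can also be expressed as a small lookup for triaging a log line quickly. This is only a sketch: the function name diagnose_failure is hypothetical, and the match strings are taken from the table rows, so real log wording may differ.

```shell
# Map a line of failing-step output to the step of this playbook that
# addresses it. Patterns mirror the table rows; anything else falls through.
# Usage: diagnose_failure "<log line>"
diagnose_failure() {
  case "$1" in
    *"Cache not found for input keys"*)
      echo "Step 7: switch inter-job cache to artifacts" ;;
    *"pnpm: command not found"*)
      echo "Step 8: add Setup pnpm + Setup Node.js" ;;
    *"detected dubious ownership"*)
      echo "Step 6: add safe.directory step before checkout" ;;
    *"no pnpm-workspace.yaml found"*)
      echo "Step 8: add actions/checkout to the job" ;;
    *"self-hosted runner offline"*)
      echo "Step 4: git rm detect-runner.yml" ;;
    *)
      echo "unknown: read the job log and references/troubleshooting.md" ;;
  esac
}
```

The first matching pattern wins, so a line that mentions several symptoms is attributed to the earliest row.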
This skill is for the migration. Day-to-day GitHub Actions best practices (timeouts, concurrency, action pinning, security) live in /gh-actions-wisdom. Read both when starting a migration so you don't accidentally regress on those general rules while shuffling runner labels.