From ravie
Use this skill when creating, modifying, debugging, retiring, or documenting scheduled jobs, cron/systemd timers, VPS or automation-server tasks, retries, health checks, logs, or automation secrets. Do not use for normal app feature work, one-off manual scripts that will not recur, or external writes without explicit approval.
Slash command
/ravie:automation-sre
Operate your VPS as your background automation runtime. Source-controlled, idempotent, observable, and safely rolled back.
Manage scheduled jobs, cron entries, systemd timers, and recurring automations on your VPS with: source in GitHub, runbooks in Notion, idempotent execution, proper logging, retries with limits, and explicit approval for production-impacting changes.
Reads from:
crontab, systemd units, job logs at /var/log/automations/, ~/automations/ (~/code/automations/)
Writes to:
On the VPS:
~/code/
├── automations/ ← source-controlled (GitHub)
│ ├── README.md
│ ├── data-sync-job/
│ │ ├── job.ts
│ │ ├── runbook.md
│ │ └── README.md
│ ├── daily-brief/
│ │ ├── job.ts
│ │ └── runbook.md
│ ├── _shared/
│ │ ├── notion-client.ts
│ │ └── env.ts
│ └── package.json
└── ravie/ ← cloned for skill reference
/etc/systemd/system/ ← systemd units (root, source-controlled snapshot in GitHub)
~/.config/systemd/user/ ← user-scope services
/var/log/automations/ ← logs by job
~/.automation/state/ ← persistent state per job
~/.automation/secrets/ ← env files (chmod 600), gitignored
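The layout above lists `_shared/env.ts` and per-job env files under `~/.automation/secrets/`, but the source does not show the helper itself. A minimal sketch of what it could contain, assuming plain `KEY=value` files: parse the file, then fail fast when a required variable is missing, reporting names only so that no secret value reaches the logs. The function names `parseEnvFile` and `requireEnv`, and the variable names in the usage note, are hypothetical.

```typescript
// Hypothetical sketch of _shared/env.ts; names and behavior are assumptions.

/** Parse simple KEY=value lines. Blank lines and # comments are skipped. */
function parseEnvFile(text: string): Record<string, string> {
  const vars: Record<string, string> = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // not a KEY=value line
    const key = line.slice(0, eq).trim();
    let value = line.slice(eq + 1).trim();
    // Strip one pair of matching surrounding quotes, if present.
    const first = value.charAt(0);
    const last = value.charAt(value.length - 1);
    if (value.length >= 2 && (first === '"' || first === "'") && last === first) {
      value = value.slice(1, -1);
    }
    vars[key] = value;
  }
  return vars;
}

/** Return the required variables, or throw listing the missing names (never values). */
function requireEnv(
  vars: Record<string, string | undefined>,
  required: string[],
): Record<string, string> {
  const missing = required.filter((name) => !vars[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required env vars: ${missing.join(", ")}`);
  }
  const out: Record<string, string> = {};
  for (const name of required) {
    out[name] = vars[name] as string;
  }
  return out;
}
```

A job would read its own file (for example `~/.automation/secrets/data-sync-job.env`), pass the text to `parseEnvFile`, and call `requireEnv` with the names listed in its runbook before doing any work.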
Write 1-2 sentences describing what the job does and why it exists. If you can't, the job isn't ready.
No automation should run from un-tracked code. Either:
~/code/automations/[job-name]/
Commit before deploying.
OnCalendar= for systemd timers
Before scheduling, verify the job is idempotent:
If the job isn't idempotent, stop. Either make it idempotent, or accept it's a one-shot manual job, not a scheduled one.
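One common way to make a job safe to re-run is to record a key for each completed unit of work and skip keys that are already recorded. The sketch below assumes every work item has a stable identifier. The `StateStore` interface, `MemoryStore`, and `processOnce` are hypothetical names; a real job would more likely back the store with a file under `~/.automation/state/`.

```typescript
// Hypothetical idempotency guard; all names here are illustrative assumptions.

interface StateStore {
  has(key: string): boolean;
  add(key: string): void;
}

/** In-memory store for demonstration; swap in a file-backed one for real jobs. */
class MemoryStore implements StateStore {
  private readonly keys = new Set<string>();
  has(key: string): boolean {
    return this.keys.has(key);
  }
  add(key: string): void {
    this.keys.add(key);
  }
}

/** Handle each item at most once per recorded key; report what was done. */
async function processOnce<T>(
  items: T[],
  keyOf: (item: T) => string,
  store: StateStore,
  handle: (item: T) => Promise<void>,
): Promise<{ processed: number; skipped: number }> {
  let processed = 0;
  let skipped = 0;
  for (const item of items) {
    const key = keyOf(item);
    if (store.has(key)) {
      skipped++;
      continue;
    }
    await handle(item);
    // Recorded only after success, so a failed item is retried on the next run.
    store.add(key);
    processed++;
  }
  return { processed, skipped };
}
```

Because the key is written after the handler succeeds, a crash between the two steps repeats that one item on the next run. The handler itself should therefore tolerate a repeat, for example by upserting rather than inserting.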
.env files in ~/.automation/secrets/ (chmod 600)
Environment= directives loaded from secret files
Each job should:
/var/log/automations/[job-name]/[job-name].log
For high-impact jobs:
For low-impact jobs:
daily-brief skill
Every job must be runnable manually:
cd ~/code/automations/data-sync-job
npx ts-node job.ts --once
or
systemctl --user start data-sync-job.service
If you can't trigger it manually, you can't debug it.
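The commands above pass `--once` to `job.ts`, but the source does not show how the job reads that flag. The sketch below is one possible entry point, under the assumption that `--once` simply marks a single manual pass. Here the flag is only parsed and logged. The useful part is that the job returns a non-zero exit code on failure, which is what lets systemd mark the service as failed and lets a manual run show the same result. `parseArgs` and `runJob` are hypothetical names.

```typescript
// Hypothetical job entry point; flag semantics and names are assumptions.

interface JobOptions {
  once: boolean;
}

/** Read the flags this sketch understands from the argument list. */
function parseArgs(argv: string[]): JobOptions {
  return { once: argv.includes("--once") };
}

/** Run the work, log start and outcome, and return a process exit code. */
async function runJob(
  options: JobOptions,
  work: () => Promise<void>,
  log: (line: string) => void,
): Promise<number> {
  log(`${new Date().toISOString()} start once=${options.once}`);
  try {
    await work();
    log(`${new Date().toISOString()} ok`);
    return 0;
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    log(`${new Date().toISOString()} failed: ${message}`);
    return 1; // non-zero so the scheduler sees the failure
  }
}

// Typical wiring at the bottom of job.ts (left as a comment so the sketch has no side effects):
// runJob(parseArgs(process.argv.slice(2)), doWork, console.log).then((code) => process.exit(code));
```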
Every job must be disablable without losing data:
systemctl --user disable --now data-sync-job.timer
# or
crontab -e # comment out the line
Document the disable command in the runbook.
Only after this passes: enable the schedule.
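For systemd, enabling the schedule means installing a `.timer` unit next to the `.service`. The sketch below renders the text of such a timer with the timezone written into `OnCalendar=` (systemd accepts a trailing timezone name in calendar expressions), so the schedule does not depend on the server's local time. `renderTimerUnit` and `TimerSpec` are hypothetical names, and `Persistent=true` (run a missed activation after downtime) is a choice that only suits idempotent jobs.

```typescript
// Hypothetical helper that renders a systemd timer unit; names are assumptions.

interface TimerSpec {
  jobName: string; // the unit is assumed to be named <jobName>.service
  onCalendar: string; // systemd calendar expression, e.g. "*-*-* 06:00:00"
  timezone: string; // e.g. "UTC" or an IANA name such as "Europe/Berlin"
}

function renderTimerUnit(spec: TimerSpec): string {
  return [
    "[Unit]",
    `Description=Timer for ${spec.jobName}`,
    "",
    "[Timer]",
    `OnCalendar=${spec.onCalendar} ${spec.timezone}`,
    "Persistent=true",
    `Unit=${spec.jobName}.service`,
    "",
    "[Install]",
    "WantedBy=timers.target",
    "",
  ].join("\n");
}
```

The rendered text would be saved as `~/.config/systemd/user/data-sync-job.timer`, followed by `systemctl --user daemon-reload` and `systemctl --user enable --now data-sync-job.timer`. Running `systemd-analyze calendar` on the expression shows when it will next fire.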
In automations/[job-name]/runbook.md:
# Runbook: [job-name]
## Purpose
[What this does, why it exists]
## Schedule
[Cron / systemd schedule, with timezone]
## Source
- Code: [GitHub path]
- Service: [systemd unit name or cron entry]
## Run manually
[commands]
## Disable
[commands]
## Logs
- Path: `/var/log/automations/[name]/`
- Retention: [days]
## Secrets required
- ENV_VAR_1
- ENV_VAR_2
- (location: `~/.automation/secrets/[name].env`)
## Idempotency
[How the job handles duplicate runs]
## Failure modes
- [Common failure 1]: [how to detect, how to fix]
- [Common failure 2]: [how to detect, how to fix]
## Alerting
[Where notifications go on failure]
## Last reviewed
[Date]
Push runbook to GitHub. Mirror in Notion if it's a high-impact automation.
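A runbook that follows the template above can be checked mechanically before it is pushed. The sketch below lists which of the template's `##` sections are absent from a runbook's markdown. The function name `missingRunbookSections` is hypothetical, and the check covers headings only, not whether each section has useful content.

```typescript
// Hypothetical runbook check; the section list mirrors the template headings.

const REQUIRED_SECTIONS: string[] = [
  "Purpose",
  "Schedule",
  "Source",
  "Run manually",
  "Disable",
  "Logs",
  "Secrets required",
  "Idempotency",
  "Failure modes",
  "Alerting",
  "Last reviewed",
];

/** Return the required section titles that have no matching "## Title" heading. */
function missingRunbookSections(markdown: string): string[] {
  const present = new Set<string>();
  for (const line of markdown.split(/\r?\n/)) {
    const match = /^##\s+(.+?)\s*$/.exec(line);
    if (match && match[1]) {
      present.add(match[1]);
    }
  }
  return REQUIRED_SECTIONS.filter((title) => !present.has(title));
}
```

An empty result means every section heading from the template is present. A non-empty result names the sections still to be written.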
Track the work as a Linear issue:
After enabling, verify the first scheduled run actually fires:
# Automation SRE — [job-name]
## Purpose
[What this does]
## Source
- Repo: [GitHub path]
- Branch: [name]
## Schedule
- Cron / systemd: [expression]
- Timezone: [tz]
## Inputs / Outputs
- Reads from: [systems]
- Writes to: [systems]
- Mutates: [yes/no, what]
## Idempotency
- Status: idempotent / not idempotent / unknown
- Mechanism: [how it handles re-runs]
## Secrets
- [list of required env vars, no values]
- Storage: [location]
## Logging
- Path: [...]
- Retention: [days]
## Retry / alerts
- Retries: [count]
- Backoff: [strategy]
- Alerting: [channel]
## Manual run command
[commands]
## Disable command
[commands]
## Test result (manual)
- [Pass / fail, what was verified]
## Runbook
- GitHub: [path]
- Notion: [link if mirrored]
## Approvals needed
- [ ] Approve job for scheduled run
- [ ] Approve secrets configuration
- [ ] Approve alert channel
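The Retries and Backoff fields in the template above could be backed by a small helper. This is a minimal sketch of retries with a hard limit and exponential backoff, not the skill's own implementation: `withRetry`, `RetryOptions`, and the injectable `wait` parameter are hypothetical names. The wait function is injectable so the delays can be tested without real sleeping.

```typescript
// Hypothetical retry helper with a fixed attempt limit; names are assumptions.

interface RetryOptions {
  maxAttempts: number; // total attempts, including the first
  baseDelayMs: number; // delay before the second attempt; doubles each time
}

function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

async function withRetry<T>(
  task: () => Promise<T>,
  options: RetryOptions,
  wait: (ms: number) => Promise<void> = sleep,
): Promise<T> {
  if (options.maxAttempts < 1) {
    throw new RangeError("maxAttempts must be at least 1");
  }
  let lastError: unknown;
  for (let attempt = 1; attempt <= options.maxAttempts; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      // Back off only if another attempt is coming.
      if (attempt < options.maxAttempts) {
        await wait(options.baseDelayMs * Math.pow(2, attempt - 1));
      }
    }
  }
  // Limit reached: surface the last error so the job exits non-zero and alerting can fire.
  throw lastError;
}
```

With `maxAttempts: 3` and `baseDelayMs: 100`, a task that keeps failing is tried three times with waits of 100 ms and 200 ms in between, and then the last error is rethrown.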
Tier 1 (read-only) — inspecting logs, jobs, runbooks: always allowed.
Tier 2 (draft) — drafting job code, runbook, schedule: always allowed.
Tier 3 — needs approval:
Tier 4 — needs explicit approval naming exact action:
Tier 5 — always blocked:
permission-guardian — for tier classification
github-operator — for source code commits
notion-brain — for runbook publishing
linear-operator — for tracking automation work
observability-incident-loop — invoked when a job's failure is part of an incident
daily-brief — surfaces automation status
pattern-learner — surfaces automations that fail repeatedly
Untracked drift — Job runs from a script that's not in GitHub. You change it, forget what changed, can't roll back. Always source-control.
Silent failures — Job fails but no alert. Days later, you notice. Always have alerting for high-impact jobs.
Non-idempotent scheduled jobs — Job does something twice when it shouldn't. Data corruption. Always verify idempotency before scheduling.
Hardcoded secrets — Token in the script. Repo gets compromised. Account compromised. Always env-manage.
No manual disable — Job is broken and you can't stop it. Always have a disable command in the runbook.
Schedule drift — Job runs in UTC but you assume local time. Output looks wrong. Always specify timezone.
Untested before scheduling — Job goes live without a manual test. First run fails. Always test manually first.
Stale runbook — Runbook says X, but the code does Y now. During weekly review, verify runbooks against code.
Permission creep — Job started read-only, now has write access "just in case." Audit periodically. Least privilege.
npx claudepluginhub amnafarzy/ravie --plugin ravie