Fully automated implementation loop: reads a written plan, validates it, executes every task using the TDD subagent pattern (Test Writer → Implementer per task), then runs simplify → review → PR without human intervention. Deploy mode available when running on a staging server: adds deployment, health check, and integration/E2E tests before opening the PR. Standard triggers: "implement it with autopilot", "autoship", "ship this", "execute the plan", "run the plan end to end", "implement end to end", "just ship it". Deploy mode triggers: "implement and deploy", "autoship with deployment", "ship and deploy", "full autoship", "deploy mode".
Slash command: `/tapway-superpowers:autoship`
**When to invoke:** A plan already exists in `docs/plans/` and you want to execute it all the way to an open PR without manually running each step.
**Prerequisite:** The plan must exist and be saved to `docs/plans/[feature].md`. If it doesn't exist yet, run `/brainstorming` then `/plan` first.
| Mode | Trigger | When to use |
|---|---|---|
| Standard | "implement it with autopilot" / "autoship" | CI/CD handles deployment; Claude Code is on a developer machine |
| Deploy | "implement and deploy" / "autoship with deployment" | Claude Code is running directly on the staging server |
Standard mode:
Read plan → Health check → Self-assign → Worktree
  └─ For each task:
       Test Writer (RED) → RED gate → Implementer (GREEN+REFACTOR) → Reviews
  └─ After all tasks:
       /simplify → /review → Update docs → /pr

Deploy mode (everything above, plus):
  └─ After all tasks:
       /simplify → /review → Update docs
         → Detect deploy method → Deploy to staging
         → Health check loop → Integration/E2E tests
         → /pr (with deployment evidence in body)
The coordinator handles everything. You only intervene if:
- `/review` finds Critical issues that require a design decision (not just a fix)

Ask if not already provided:
Which plan should I execute? (e.g. docs/plans/user-auth.md)
Or describe the feature and I'll find the matching plan.
Read the plan file. Confirm it exists and has a task breakdown.
If deploy mode was triggered, also detect the deployment configuration now (see Phase 4D below) and confirm it before starting the task loop — better to discover a misconfigured deploy command before 30 minutes of implementation.
Review every task in the plan against these criteria. Fix problems now — they are much cheaper to fix in the plan than mid-implementation.
For each task, verify:
- FILES TO MODIFY lists exact paths (not "the service file"; it must be `backend/src/services/auth_service.py`)

If any task fails the health check:

- Fix the affected task in `docs/plans/[feature].md`

Do not proceed to Phase 2 until every task passes the health check.
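The exact-path criterion can be checked mechanically before the task loop starts. The following is a minimal sketch, assuming the FILES TO MODIFY entries have already been extracted from the plan as plain strings. The helper names `looks_like_exact_path` and `vague_entries` are hypothetical, and the heuristic (no spaces, a file extension) is an assumption rather than a rule stated by the skill.

```python
import re
from pathlib import PurePosixPath
from typing import List

# Hypothetical heuristic: path-like characters only (no spaces),
# optionally with directory components separated by "/".
_PATH_RE = re.compile(r"^[\w.\-]+(/[\w.\-]+)*$")


def looks_like_exact_path(entry: str) -> bool:
    """Return True if the entry resembles a concrete file path."""
    entry = entry.strip().strip("`")
    if not _PATH_RE.match(entry):
        return False
    # Require a file extension; extensionless files such as "Makefile"
    # are flagged by this sketch and need a manual look.
    return PurePosixPath(entry).suffix != ""


def vague_entries(files_to_modify: List[str]) -> List[str]:
    """Return the entries that fail the exact-path heuristic."""
    return [e for e in files_to_modify if not looks_like_exact_path(e)]
```

Anything returned by `vague_entries` would be a candidate for fixing in the plan before implementation begins.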
# Confirm clean state
git status # must be clean
# Create worktree (if not already in one)
git worktree add -b feat/[feature-name] ../[project]-[feature-name] origin/main
cd ../[project]-[feature-name]
# Self-assign in checklist
# Edit docs/checklists/[feature]-checklist.md:
# **Assignee:** autoship 🤖 **Status:** 🟡 In progress
git add docs/checklists/ && git commit -m "chore: self-assign [feature] for autoship"
Maintain a running status board and update it after every task:
## Autoship Status: [Feature]
| Task | Status | Commit |
|---|---|---|
| Task 1: ... | ✅ Done | abc1234 |
| Task 2: ... | 🔄 In progress | — |
| Task 3: ... | ⏳ Pending | — |
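Regenerating the board from a list of task records after each task keeps it consistent. This is a minimal sketch; the `TaskRow` type and `render_status_board` function are hypothetical names introduced for illustration, not part of the skill.

```python
from dataclasses import dataclass
from typing import List, Optional

DASH = "\u2014"  # placeholder shown in the Commit column when there is no commit yet


@dataclass
class TaskRow:
    name: str                      # e.g. "Task 1: ..."
    status: str                    # e.g. "✅ Done", "🔄 In progress", "⏳ Pending"
    commit: Optional[str] = None   # short commit hash once the task is committed


def render_status_board(feature: str, rows: List[TaskRow]) -> str:
    """Render the status board as a markdown table."""
    lines = [
        f"## Autoship Status: {feature}",
        "| Task | Status | Commit |",
        "|---|---|---|",
    ]
    for row in rows:
        lines.append(f"| {row.name} | {row.status} | {row.commit or DASH} |")
    return "\n".join(lines)
```

Rebuilding the whole table each time avoids hand-editing individual rows.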
For each task:
You are the Test Writer for Task N of [Feature] — RED phase only.
TASK: [exact task description]
TEST FILE: [exact path — e.g. backend/tests/unit/test_auth_service.py]
DESIRED BEHAVIOR: [one sentence]
Write ONE test: test_[function]_[condition]_[expected_outcome]
Run it. Paste exact output.
SUCCESS CRITERIA: FAILS with AssertionError, ImportError, AttributeError,
or TypeScript compilation error — NOT a syntax error.
STOP. Do not write production code.
SURGICAL CHANGES: Touch only the test file.
- Test name follows `test_[function]_[condition]_[expected_outcome]`

Gate fails → retry once with specific feedback. Fails again → pause and report to user.
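The RED gate's failure-type check can be sketched as a small classifier over the test runner's exit code and output. This is an illustration under assumptions: it covers only the Python error names listed in the Test Writer prompt (not TypeScript compilation errors), the function name `red_gate` is hypothetical, and matching on substrings of the output is a simplification.

```python
from typing import Tuple

# Failure types the Test Writer prompt accepts as a valid RED result.
# ModuleNotFoundError is included because it is a subclass of ImportError.
VALID_RED_ERRORS = ("AssertionError", "ModuleNotFoundError", "ImportError", "AttributeError")

# Failures that mean the test file itself is broken, not the behavior missing.
INVALID_RED_ERRORS = ("SyntaxError", "IndentationError")


def red_gate(exit_code: int, output: str) -> Tuple[bool, str]:
    """Return (passed, reason) for the RED gate."""
    if exit_code == 0:
        return False, "test passed; the RED phase requires a failing test"
    for name in INVALID_RED_ERRORS:
        if name in output:
            return False, f"{name}: the test file does not parse"
    for name in VALID_RED_ERRORS:
        if name in output:
            return True, name
    return False, "failure type not recognised"
```

The returned reason string can serve as the specific feedback for the single retry.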
You are the Implementer for Task N of [Feature] — GREEN then REFACTOR.
Failing test at [test file path]. Do NOT modify it.
TASK: [exact task description]
FILES TO MODIFY: [production files only]
SUCCESS CRITERIA: [test name] passes. No new failures in full suite.
GREEN: minimum code to make test pass — no gold-plating.
REFACTOR: only if code is unclear — run tests after every step.
SURGICAL CHANGES: production files only.
CONVENTIONS: [key items from CLAUDE.md]
Commit: git commit -m "feat/fix/refactor: [behavior]"
Report: test output, files changed, commit hash.
Spec compliance:
- No placeholder implementations (`pass`, `TODO`, `NotImplementedError`)

Code quality: invoke the code-review skill
Both pass → mark task ✅ in status board. Proceed to next task.
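For Python production files, the placeholder markers named in the spec-compliance check (`pass`, `TODO`, `NotImplementedError`) can be detected with the standard `ast` module. This is a minimal sketch, not the skill's actual check; `find_stub_functions` and `has_todo` are hypothetical helpers, and treating a docstring-only function as a stub is an assumption.

```python
import ast
from typing import List


def _is_placeholder(stmt: ast.stmt) -> bool:
    """True for `pass`, a bare `...`, or `raise NotImplementedError[(...)]`."""
    if isinstance(stmt, ast.Pass):
        return True
    if isinstance(stmt, ast.Expr) and isinstance(stmt.value, ast.Constant):
        return stmt.value.value is Ellipsis
    if isinstance(stmt, ast.Raise) and stmt.exc is not None:
        exc = stmt.exc.func if isinstance(stmt.exc, ast.Call) else stmt.exc
        return isinstance(exc, ast.Name) and exc.id == "NotImplementedError"
    return False


def find_stub_functions(source: str) -> List[str]:
    """Return names of functions whose body consists only of placeholders."""
    stubs = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            body = node.body
            first = body[0]
            # Ignore a leading docstring when judging the body.
            if (isinstance(first, ast.Expr) and isinstance(first.value, ast.Constant)
                    and isinstance(first.value.value, str)):
                body = body[1:]
            if all(_is_placeholder(s) for s in body):
                stubs.append(node.name)
    return stubs


def has_todo(source: str) -> bool:
    """Comments are not part of the AST, so TODO markers need a text scan."""
    return any("TODO" in line for line in source.splitlines())
```

A non-empty result from `find_stub_functions`, or `has_todo` returning `True`, would fail the task's spec-compliance review in this sketch.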
Once all tasks are ✅:
Step 1 — Simplify
/simplify
Apply all suggestions. Run tests to confirm nothing broke.
Step 2 — Self-Review
/review
If Critical fixes are significant enough to require a new task, add it to the status board and execute it using Phase 3 before continuing.
Step 3 — Update Docs
Run /repo-docs — mandatory on every autoship run, no exceptions.
/repo-docs
/repo-docs updates only the sections affected by this branch's changes if docs already exist, or generates all docs from scratch if this is the first time. Commit any changes it makes:
git add docs/
git commit -m "docs: update project docs for [feature]"
Skip this phase entirely in standard mode. Only run when deploy mode was triggered.
Check for these in order and use the first match:
| Signal | Deploy command |
|---|---|
| `docker-compose.yml` or `docker-compose.yaml` | `docker compose pull && docker compose up -d` |
| `Makefile` with a `deploy` target | `make deploy` |
| `package.json` with a `"deploy"` script | `npm run deploy` |
| `Procfile` | `foreman start` or platform-specific command |
| None found | Ask the user for the exact deploy command before continuing |
Record the deploy command. If it was inferred (not explicitly provided by the user), confirm it once before running:
Deploy command detected: docker compose pull && docker compose up -d
Proceeding — interrupt if this is wrong.
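The first-match detection order in the table can be sketched as follows. This is an illustration only: `detect_deploy_command` is a hypothetical name, the Makefile check is a simple regular expression for a line starting with `deploy:`, and returning `foreman start` for a `Procfile` ignores the "platform-specific command" alternative.

```python
import json
import re
from pathlib import Path
from typing import Optional


def detect_deploy_command(root: Path) -> Optional[str]:
    """Return the deploy command for the first matching signal, or None."""
    if any((root / name).is_file() for name in ("docker-compose.yml", "docker-compose.yaml")):
        return "docker compose pull && docker compose up -d"

    makefile = root / "Makefile"
    # Match a rule named "deploy" at the start of a line, not a ":=" assignment.
    if makefile.is_file() and re.search(r"^deploy\s*:(?!=)", makefile.read_text(), re.MULTILINE):
        return "make deploy"

    package_json = root / "package.json"
    if package_json.is_file():
        scripts = json.loads(package_json.read_text()).get("scripts", {})
        if "deploy" in scripts:
            return "npm run deploy"

    if (root / "Procfile").is_file():
        return "foreman start"

    # None found: the caller should ask the user for the exact command.
    return None
```

A `None` result corresponds to the last table row: stop and ask for the deploy command rather than guessing.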
Run the deploy command. Capture stdout and stderr.
# Example for docker compose
docker compose pull && docker compose up -d
docker compose logs --tail=30
If the command exits non-zero: stop, report the exact error output, do not continue to health check or PR.
After deploying, confirm the app is responding. Try each endpoint in order until one returns 2xx:
curl -sf http://localhost:[PORT]/health ||
curl -sf http://localhost:[PORT]/api/health ||
curl -sf http://localhost:[PORT]/healthz ||
curl -sf http://localhost:[PORT]/ping ||
curl -sf http://localhost:[PORT]/
Retry every 5 seconds for up to 60 seconds (12 attempts). Record which endpoint responded and the HTTP status.
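The retry loop can be sketched in Python with the probe and sleep functions injectable, so the loop can be exercised without a running server. This is an illustration under assumptions: `http_probe` and `wait_for_health` are hypothetical names, and `urllib` follows redirects, which differs slightly from plain `curl -sf`.

```python
import http.client
import time
import urllib.error
import urllib.request
from typing import Callable, Optional, Tuple

# Endpoints tried in order, as in the curl chain above.
HEALTH_PATHS = ("/health", "/api/health", "/healthz", "/ping", "/")


def http_probe(url: str, timeout: float = 3.0) -> Optional[int]:
    """Return the HTTP status for a 2xx response, otherwise None."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status if 200 <= resp.status < 300 else None
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Covers connection refused, timeouts, and 4xx/5xx (HTTPError).
        return None


def wait_for_health(
    port: int,
    probe: Callable[[str], Optional[int]] = http_probe,
    attempts: int = 12,
    delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[Tuple[str, int]]:
    """Return (endpoint, status) for the first healthy endpoint, or None on timeout."""
    for attempt in range(attempts):
        for path in HEALTH_PATHS:
            status = probe(f"http://localhost:{port}{path}")
            if status is not None:
                return path, status
        if attempt < attempts - 1:
            sleep(delay)
    return None
```

The returned pair is what gets recorded as the responding endpoint and HTTP status; `None` means the check timed out.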
If still failing after 60 seconds:
- Run `docker compose logs --tail=50` (or equivalent) and report the output

Rollback suggestion on health check failure:
# Docker compose — restart with previous image
docker compose down && git stash && docker compose up -d
# Systemd service
sudo systemctl restart [service-name]
# PM2
pm2 restart all
Detect which integration/E2E test suites exist and run all of them:
| Signal | Command |
|---|---|
| `tests/integration/` directory | `pytest tests/integration/ --tb=short -q` |
| `tests/e2e/` directory | `pytest tests/e2e/ --tb=short -q` |
| `pytest.ini` or `pyproject.toml` with `integration` marker | `pytest -m integration --tb=short -q` |
| `playwright.config.ts` or `playwright.config.js` | `npx playwright test` |
| `cypress.config.ts` or `cypress.config.js` | `npx cypress run` |
| `package.json` with `"test:e2e"` script | `npm run test:e2e` |
| `package.json` with `"test:integration"` script | `npm run test:integration` |
| `Makefile` with `test-integration` target | `make test-integration` |
Run every suite that exists. Report pass/fail counts for each.
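Unlike deploy detection, suite detection collects every matching signal rather than stopping at the first. The sketch below assumes the signals can be checked from the repository root; `detect_test_commands` is a hypothetical name, and the pytest-marker row is deliberately left out because detecting a registered marker requires parsing the pytest configuration.

```python
import json
import re
from pathlib import Path
from typing import List


def detect_test_commands(root: Path) -> List[str]:
    """Return the command for every integration/E2E suite signal found under root."""
    commands: List[str] = []

    if (root / "tests" / "integration").is_dir():
        commands.append("pytest tests/integration/ --tb=short -q")
    if (root / "tests" / "e2e").is_dir():
        commands.append("pytest tests/e2e/ --tb=short -q")

    # Note: the "integration marker" signal is not handled in this sketch.

    if any((root / n).is_file() for n in ("playwright.config.ts", "playwright.config.js")):
        commands.append("npx playwright test")
    if any((root / n).is_file() for n in ("cypress.config.ts", "cypress.config.js")):
        commands.append("npx cypress run")

    package_json = root / "package.json"
    if package_json.is_file():
        scripts = json.loads(package_json.read_text()).get("scripts", {})
        for name in ("test:e2e", "test:integration"):
            if name in scripts:
                commands.append(f"npm run {name}")

    makefile = root / "Makefile"
    if makefile.is_file() and re.search(r"^test-integration\s*:(?!=)", makefile.read_text(), re.MULTILINE):
        commands.append("make test-integration")

    return commands
```

An empty list maps to the "none found" wording used in the deployment evidence.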
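If any integration or E2E test fails, diagnose whether it is a real regression or an environment issue. Fix real regressions before opening the PR; an E2E failure may instead be documented as a known limitation, with user approval.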
Capture results for the PR body:
## Deployment Verified ✅
- Environment: staging (hostname: [hostname])
- Deployed at: [timestamp]
- Deploy command: [command used]
- Health check: ✅ [endpoint] → [HTTP status]
- Unit tests: [N/N passed]
- Integration tests: [N/N passed] (or "none found")
- E2E tests: [N/N passed] (or "none found")
/pr
Standard mode: the PR body includes references to `docs/plans/[feature].md` and `docs/checklists/[feature]-checklist.md`, plus the findings from `/review`.

Deploy mode: the PR body also includes the deployment evidence captured above.
After PR is open:
Update checklist: all tasks ✅, status 🟢 Done, **PR:** #[number]
Commit checklist update to the branch:
git add docs/checklists/
git commit -m "chore: mark [feature] complete, PR #[number]"
git push
Report to user:
Standard mode:
## Autoship Complete ✅
Feature: [name]
Tasks completed: N/N
PR: #[number] — [title]
Review findings: [none / N warnings noted in PR body]
Deploy mode:
## Autoship Complete ✅ (with deployment)
Feature: [name]
Tasks completed: N/N
Deployed to: staging ([hostname])
Health: ✅ [endpoint]
Integration tests: N/N passed
E2E tests: N/N passed
PR: #[number] — [title]
Review findings: [none / N warnings noted in PR body]
| Phase | Model |
|---|---|
| Plan health check | Sonnet — requires judgment about ambiguity |
| Test Writer | Haiku — formulaic |
| Implementer (1-2 files) | Haiku |
| Implementer (multi-file) | Sonnet |
| Implementer (architecture) | Opus |
| Simplify + Review | Sonnet |
| Deploy diagnosis (on failure) | Sonnet — log analysis requires judgment |
- `/pr`
- `main`: always from a worktree
- `/pr`

| What failed | Recovery |
|---|---|
| Task fails twice | Pause, report exact blocker to user, wait for instruction |
| Deploy command exits non-zero | Report exact error, suggest checking logs, do not open PR |
| Health check times out (60s) | Print last 50 log lines, suggest rollback command, do not open PR |
| Integration test fails | Diagnose real regression vs env issue; fix before PR if real |
| E2E test fails | Same as integration — fix or document as known limitation with user approval |
When to Use `/autopilot` Instead

| Situation | Use |
|---|---|
| Plan already written in `docs/plans/` | `/autoship` (this skill) |
| No plan yet, want everything from scratch | Built-in /autopilot |
| Single small task, no plan needed | /tdd directly |
| Complex plan, want adversarial plan critique before implementation | Built-in /autopilot to generate + critique the plan, then /autoship to execute it |
npx claudepluginhub tapway/tapway-superpowers --plugin tapway-superpowers