From autocraft
Evaluate the output of a journey-builder run, identify where the skill instructions failed, and edit AGENTS.md (or add pitfalls) to fix those gaps. Run after every journey-builder run to continuously improve the skill.
Slash command: `/autocraft:refine-journey`
You are a skill engineer. Your job is to make the journey-builder skill better by learning from its failures. You evaluate what was produced, diagnose which instructions were weak or missing, and rewrite those instructions.
Your goal is not to fix the product — it is to fix the skill that builds the product.
Spec file: $ARGUMENTS
If no argument is given, use `spec.md` in the current directory.
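The fallback rule can be expressed as a tiny helper. This is a sketch that assumes `$ARGUMENTS` arrives as a plain string that is empty (or whitespace) when the command is run without arguments; the name `spec_path` is hypothetical.

```python
from pathlib import Path


def spec_path(arguments: str) -> Path:
    """Spec file named in $ARGUMENTS, else spec.md relative to the current directory."""
    name = arguments.strip()  # treat whitespace-only input as "no argument"
    return Path(name) if name else Path("spec.md")
```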
Read everything produced by the last journey-builder run:
- `journeys/`: list all journey folders. For the most recent journey, read:
  - `journey.md`
  - all `testability_review_*.md`
  - all `ui_review_*.md`
  - its `screenshots/` folder
- `AGENTS.md` (at the repo root, if it exists): read the current project-specific instructions

Execute each check and record pass/fail:

`-derivedDataPath build` so the `.app` lands in the project root at `build/Build/Products/Debug/{AppName}.app`.

Run the journey's tests and time them:

`time` or equivalent). Record the duration in seconds.

Read ALL screenshots in the most recent journey's `screenshots/` folder:

`journey.md` have a corresponding screenshot?

For every `waitForExistence(timeout:)` call in the test code:

`T00m00s_` filename timestamps) → flag the step pair and investigate why.

Use the timestamped screenshot filenames (`T{mm}m{ss}s_...`) to spot long gaps. If two consecutive screenshots are more than 5 s apart and the test code has no comment explaining the delay, the gap must be fixed.
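The gap check can be scripted. A minimal Python sketch, assuming screenshots are named `T{mm}m{ss}s_<step>.png`; the `.png` extension and the helper names `seconds_from_name`, `slow_gaps`, and `scan_folder` are illustrative, not part of the skill. It parses the elapsed time from each filename, sorts the shots, and reports consecutive pairs more than 5 s apart. It does not check whether the test code comments the delay; that still needs a read of the test source.

```python
from __future__ import annotations

import re
from pathlib import Path

# Assumed filename shape: "T{mm}m{ss}s_<step>.png", e.g. "T01m07s_login_done.png".
STAMP = re.compile(r"^T(\d{2})m(\d{2})s_")


def seconds_from_name(name: str) -> int | None:
    """Elapsed seconds encoded in a screenshot filename, or None if it has no stamp."""
    m = STAMP.match(name)
    if m is None:
        return None
    return int(m.group(1)) * 60 + int(m.group(2))


def slow_gaps(names: list[str], limit_s: int = 5) -> list[tuple[str, str, int]]:
    """Consecutive (earlier, later, gap_seconds) pairs whose gap exceeds limit_s."""
    stamped: list[tuple[int, str]] = []
    for name in names:
        secs = seconds_from_name(name)
        if secs is not None:  # skip files without a T{mm}m{ss}s_ prefix
            stamped.append((secs, name))
    stamped.sort()
    gaps = []
    for (t0, a), (t1, b) in zip(stamped, stamped[1:]):
        if t1 - t0 > limit_s:
            gaps.append((a, b, t1 - t0))
    return gaps


def scan_folder(folder: Path, limit_s: int = 5) -> list[tuple[str, str, int]]:
    """Run slow_gaps over every stamped PNG in a journey's screenshots/ folder."""
    return slow_gaps([p.name for p in folder.glob("T*.png")], limit_s)
```

For example, `scan_folder(Path("journeys/001-onboarding/screenshots"))` (a hypothetical journey name) returns the step pairs to investigate.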
For the most recent journey:
`journey.md` describe a realistic user path from start to finish?

`testability_review_round{1,2,3}*.md` and `ui_review_round{1,2,3}*.md`

For every requirement in the spec:
| Requirement | Journey Covering It | Test Passes | Screenshots OK |
|-------------|-------------------|-------------|----------------|
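One way to keep this matrix consistent is to fill it from structured data. A Python sketch; the `Coverage` record, the yes/no cell values, and `(none)` for an uncovered requirement are assumptions rather than conventions defined by the skill. `spec_coverage` yields the X / N pair used by the Spec Coverage line of the score card.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Coverage:
    requirement: str
    journey: str | None  # journey folder such as "001-onboarding", or None if uncovered
    test_passes: bool
    screenshots_ok: bool


def yes_no(flag: bool) -> str:
    return "yes" if flag else "no"


def table_rows(rows: list[Coverage]) -> list[str]:
    """Markdown rows for the four-column coverage table."""
    return [
        f"| {r.requirement} | {r.journey or '(none)'} | {yes_no(r.test_passes)} | {yes_no(r.screenshots_ok)} |"
        for r in rows
    ]


def spec_coverage(rows: list[Coverage]) -> tuple[int, int]:
    """(X, N): requirements that have a covering journey, out of all requirements."""
    return sum(1 for r in rows if r.journey), len(rows)
```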
For each of the 3 polish rounds:
For each journey test, answer: "Does this test reach the journey's real outcome?"
Spec Coverage: X / N requirements have a journey (weight: 20%)
Tests: X / N tests passing (weight: 15%)
Build: passing / failing (weight: 10%)
Screenshot Quality: X / N screenshots pass design check (weight: 15%)
Real Outcomes: X / N journey tests reach real outcome (weight: 15%)
Polish Completeness: X / 3 rounds fully completed (weight: 10%)
Step Coverage: X / N journey steps have screenshots (weight: 10%)
Test Speed: Xs total, slowest tests listed (weight: 5%)
Overall Score: XX%
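The weights above sum to 100%, so the overall score is a weighted sum of per-category fractions. A Python sketch of that arithmetic; the category keys, mapping Build passing/failing to 1.0/0.0, and scoring an empty category (N = 0) as 0 are assumptions, not rules stated by the skill.

```python
# Weights in percent, copied from the score card; they sum to 100.
WEIGHTS = {
    "spec_coverage": 20,
    "tests": 15,
    "build": 10,
    "screenshot_quality": 15,
    "real_outcomes": 15,
    "polish_completeness": 10,
    "step_coverage": 10,
    "test_speed": 5,
}


def ratio(x: int, n: int) -> float:
    """X / N as a fraction; an empty category (N == 0) counts as 0 here (an assumption)."""
    return x / n if n else 0.0


def overall_score(parts: dict[str, float]) -> float:
    """Weighted sum of per-category fractions in [0, 1], returned as a percentage."""
    missing = WEIGHTS.keys() - parts.keys()
    if missing:
        raise ValueError(f"missing categories: {sorted(missing)}")
    for name, value in parts.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return sum(weight * parts[name] for name, weight in WEIGHTS.items())
```

With 9/10 requirements covered, 8/10 screenshots passing, 2/3 polish rounds, a passing build, unchanged test time (0.5), and every other ratio at 1.0, the sum is about 89.2, reported as 89%.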
Test Speed scoring:
`journey-refinement-log.md`). Faster = 100%, same = 50%, slower = 0%.

Write this score to `journey-refinement-log.md` (create it if missing), with a timestamp and a summary of findings.
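A Python sketch of the speed rule, assuming the previous total is read back from the last "Total time" line in `journey-refinement-log.md` and that runs within a tolerance (1 s here, an arbitrary choice) count as "same". The skill text does not say how to score the very first run, so the sketch returns `None` there instead of guessing.

```python
from __future__ import annotations

import re

# Matches the "- Total time: Xs (...)" line written by earlier log entries.
TOTAL_TIME = re.compile(r"Total time: (\d+(?:\.\d+)?)s")


def previous_total(log_text: str) -> float | None:
    """Total test time from the most recent log entry, or None if there is none."""
    found = TOTAL_TIME.findall(log_text)
    return float(found[-1]) if found else None


def speed_score(current_s: float, previous_s: float | None, tolerance_s: float = 1.0) -> int | None:
    """100 if faster, 50 if about the same, 0 if slower; None when there is no baseline."""
    if previous_s is None:
        return None  # first run: no rule for this case in the skill text
    delta = current_s - previous_s
    if abs(delta) <= tolerance_s:
        return 50
    return 100 if delta < 0 else 0
```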
For each gap found in Phase 2, ask: "what instruction was missing, unclear, or too weak to prevent this?"
Apply 5 Whys to trace back to the skill instruction:
Failure: <what the journey-builder agent failed to do>
Why 1: Why did it fail to do this?
Why 2: Why did the agent behave that way?
Why 3: Why was it instructed that way?
Why 4: Why does the skill text say that (or not say that)?
Why 5: Why does that gap exist in the skill?
Instruction Gap: <what's missing — in AGENTS.md, pitfalls gist, or journey-builder skill>
Fix: <specific new or revised instruction to add>
Target: <AGENTS.md if project-specific, pitfalls gist if platform-specific>
Common instruction failure patterns:
- `sleep` instead of waiting for conditions
- `waitForExistence(timeout: 10)` without a comment explaining why 10 s is needed

For each diagnosed instruction gap, decide WHERE the fix belongs:
Platform-specific patterns (SwiftUI, XCUITest, xcodegen, codesign, Playwright, etc.) → Add a pitfall to the gist. These are reusable across all projects.
gh gist edit 84a5c108d5742c850704a5088a3f4cbf -a <category>-<short-name>.md
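The command above adds one file to the pitfalls gist; `-a`/`--add` names a local file, so the pitfall is written to disk under its final name first. A Python sketch that derives the `<category>-<short-name>.md` filename and builds the argument list. The slug rules (lowercase, other characters collapsed to single hyphens) are an assumption, and the sketch only builds the command; it does not run it.

```python
import re

GIST_ID = "84a5c108d5742c850704a5088a3f4cbf"  # pitfalls gist named in the skill text


def slug(text: str) -> str:
    """Lowercase and collapse every run of characters outside a-z and 0-9 into one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def pitfall_filename(category: str, short_name: str) -> str:
    """Filename in the <category>-<short-name>.md shape used by the gist."""
    return f"{slug(category)}-{slug(short_name)}.md"


def gist_add_argv(filename: str) -> list[str]:
    """argv for `gh gist edit <id> -a <file>`; run it from the folder holding the file."""
    return ["gh", "gist", "edit", GIST_ID, "-a", filename]
```

After writing the pitfall text to `pitfall_filename("XCUITest", "sleep instead of wait")`, pass `gist_add_argv(...)` to `subprocess.run(..., cwd=folder, check=True)` from an authenticated `gh` session.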
Project-specific rules (this app's architecture decisions, known violations, app-specific workflows) → Edit AGENTS.md at the project root (create if missing).
Rules for editing AGENTS.md:
Anti-bloat rule:
Append to journey-refinement-log.md:
## Refinement Run — <timestamp>
**Score:** XX%
**Journey evaluated:** {NNN}-{name}
### Test Speed
- Total time: Xs (previous: Ys, delta: ±Zs)
- Slowest tests: <name: duration>
### Failures Found
1. <failure> — Root cause: <instruction gap>
2. ...
### Changes Made to AGENTS.md / Pitfalls
1. Section "<section>": <what changed and why>
2. ...
### Predicted Impact
- These changes should fix: <list>
### What to Watch Next Run
<specific things to check next time>
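A Python sketch that renders this template and appends it to the log. The `RefinementRun` fields mirror the template's placeholders; the timestamp format, `n/a` when there is no previous run, and UTC as the default clock are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone
from pathlib import Path

DASH = "\u2014"  # same separator the template uses in the heading and failure lines


@dataclass
class RefinementRun:
    score: int
    journey: str  # "{NNN}-{name}"
    total_s: float
    previous_s: float | None
    slowest: list[tuple[str, float]] = field(default_factory=list)  # (test, seconds)
    failures: list[tuple[str, str]] = field(default_factory=list)  # (failure, instruction gap)
    changes: list[tuple[str, str]] = field(default_factory=list)  # (section, change and why)
    should_fix: list[str] = field(default_factory=list)
    watch: list[str] = field(default_factory=list)


def render(run: RefinementRun, when: datetime) -> str:
    """One log entry in the template's layout."""
    if run.previous_s is None:
        prev, delta = "n/a", "n/a"
    else:
        prev, delta = f"{run.previous_s:g}s", f"{run.total_s - run.previous_s:+g}s"
    lines = [
        f"## Refinement Run {DASH} {when:%Y-%m-%d %H:%M}",
        f"**Score:** {run.score}%",
        f"**Journey evaluated:** {run.journey}",
        "### Test Speed",
        f"- Total time: {run.total_s:g}s (previous: {prev}, delta: {delta})",
        "- Slowest tests: " + ", ".join(f"{name}: {secs:g}s" for name, secs in run.slowest),
        "### Failures Found",
        *(f"{i}. {failure} {DASH} Root cause: {gap}" for i, (failure, gap) in enumerate(run.failures, 1)),
        "### Changes Made to AGENTS.md / Pitfalls",
        *(f'{i}. Section "{sec}": {what}' for i, (sec, what) in enumerate(run.changes, 1)),
        "### Predicted Impact",
        "- These changes should fix: " + ", ".join(run.should_fix),
        "### What to Watch Next Run",
        *run.watch,
    ]
    return "\n".join(lines) + "\n"


def append_run(log_path: Path, run: RefinementRun, when: datetime | None = None) -> None:
    """Append one entry; mode "a" creates journey-refinement-log.md if it is missing."""
    when = when or datetime.now(timezone.utc)
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write("\n" + render(run, when))
```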
Output a concise summary:
/journey-builder