How this skill is triggered — by the user, by Claude, or both
Slash command
/sindri:sindri
The summary Claude sees in its skill listing, used to decide when to auto-load this skill
Communicate with the user in their language. Detect from their messages or system locale.
Parse the first argument after /sindri:
| Input | Action |
|---|---|
| init | Run Init flow below |
| loop | Run Loop flow below |
| cycle | Run Cycle flow below |
| (none) | Show available commands and help user pick |
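
The dispatch table above can be sketched as a small function. This is only an illustration: `Command` and `parseCommand` are hypothetical names, not part of sindri, and treating an unrecognized argument the same as no argument is an assumption the table does not state.

```typescript
// Hypothetical helper illustrating the argument table; not part of sindri.
type Command = "init" | "loop" | "cycle" | "help";

const KNOWN_COMMANDS: readonly Command[] = ["init", "loop", "cycle"];

function parseCommand(args: string): Command {
  // Take the first whitespace-separated token that follows /sindri.
  const first = args.trim().split(/\s+/)[0] ?? "";
  // Assumption: anything that is not a known command falls back to help.
  const match = KNOWN_COMMANDS.find((c) => c === first);
  return match ?? "help";
}
```

For example, `parseCommand("cycle")` selects the Cycle flow, while `parseCommand("")` falls through to the help listing.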
If no argument, tell the user:
- /sindri init: Set up improvement loop (explore project, design evaluate, scaffold .sindri/)
- /sindri loop: Start continuous experiment loop
- /sindri cycle: Run one experiment cycle
Set up an autonomous improvement loop in the current project. The most critical output is a well-designed evaluate function.
Understand the project before asking anything. Use Bash, Glob, Grep, Read — not AskUserQuestion.
Build a mental model of:
After exploration, summarize what you found and suggest potential improvement areas. Let the user pick or refine the goal before proceeding.
Run sindri init, then update .sindri/config.yaml with what you learned:
- artifact: detected source directory or files
- run: detected from package.json, Makefile, etc.
- timeout: estimated from project complexity

Show the config to the user for confirmation.
This is the most important step. A bad metric makes the entire loop useless.
Use AskUserQuestion to have a deep conversation. Ask informed questions based on Phase 1.
Start with the goal:
Determine measurability:
Can you measure the result as a number?
├── Yes: What number? (ms, %, count, KB...)
│   ├── Available immediately? → Direct measurement
│   └── Takes days/weeks? → Find a leading indicator
│       "What's the earliest signal that tells you it's working?"
│
└── No: What does "better" mean to you?
    └── "If you were reviewing this yourself, what would you check?"
        → Decompose into T/F checklist (5-10 yes/no items)
        → NEVER use numeric scores (1-10). LLM scores drift.
If hybrid: ask how much weight each dimension deserves.
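
The measurability questions above reduce to two yes/no answers, which can be sketched as a function. `Strategy`, `GoalAnswers`, and `chooseStrategy` are hypothetical names used only for illustration; the hybrid case is left out because it combines these strategies rather than replacing them.

```typescript
// Hypothetical sketch of the measurability decision; not part of sindri.
type Strategy = "direct-measurement" | "leading-indicator" | "tf-checklist";

interface GoalAnswers {
  // Can the result be expressed as a number (ms, %, count, KB...)?
  measurableAsNumber: boolean;
  // Is that number available right after a change? Ignored when not measurable.
  availableImmediately: boolean;
}

function chooseStrategy(answers: GoalAnswers): Strategy {
  // Subjective goals are decomposed into a true/false checklist.
  if (!answers.measurableAsNumber) return "tf-checklist";
  // Slow metrics need an earlier signal that stands in for the real one.
  return answers.availableImmediately ? "direct-measurement" : "leading-indicator";
}
```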
Write the function. Patterns:
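// Direct measurement
// benchmark() stands in for your own timing code and returns milliseconds.
export function evaluate(): number {
  const ms = benchmark()
  return 1000 / ms // higher is better
}

// T/F checklist (for subjective criteria)
// llm.checkAll() stands in for a call that answers each checklist item with true/false.
export async function evaluate(): Promise<number> {
  const checks = await llm.checkAll(output, checklist)
  return checks.filter(Boolean).length / checks.length
}

// Hybrid
// measure() and checklist() stand in for the two patterns above; weights come from the user.
export async function evaluate(): Promise<number> {
  return measure() * 0.6 + await checklist() * 0.4
}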
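
As a more concrete take on the hybrid pattern, here is a minimal sketch of weighting several dimensions. It assumes every dimension is already scaled to the range 0..1, which the patterns above do not guarantee (a raw value such as `1000 / ms` would dominate the sum otherwise). `Dimension`, `combine`, and `checklistScore` are hypothetical names.

```typescript
// Hypothetical helpers for a weighted hybrid score; not part of sindri.
interface Dimension {
  score: number;  // assumed to be normalized to 0..1
  weight: number; // relative importance agreed with the user
}

// Fraction of checklist items answered "yes".
function checklistScore(answers: boolean[]): number {
  if (answers.length === 0) return 0;
  return answers.filter(Boolean).length / answers.length;
}

// Weighted average; dividing by the total weight means weights need not sum to 1.
function combine(dimensions: Dimension[]): number {
  const totalWeight = dimensions.reduce((sum, d) => sum + d.weight, 0);
  if (totalWeight <= 0) {
    throw new Error("weights must sum to a positive number");
  }
  const weighted = dimensions.reduce((sum, d) => sum + d.score * d.weight, 0);
  return weighted / totalWeight;
}
```

A 60/40 split like the one above would then read `combine([{ score: speed, weight: 0.6 }, { score: checklistScore(answers), weight: 0.4 }])`, where `speed` and `answers` are whatever your own measurement and checklist produce.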
Show the user and get explicit confirmation before moving on.
If the metric requires time to accumulate (ad CTR, A/B test, SEO), recommend a cycle interval.
Ask: "How long does it take for meaningful data to come in after a change?"
Set schedule in config.yaml (seconds between cycles):
Tell the user to use /sindri cycle periodically instead of /sindri loop.
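
Because the schedule is expressed in seconds, it can help to convert the user's answer ("about a day", "six hours") explicitly. The helper below is a hypothetical sketch; `Interval` and `toScheduleSeconds` are illustrative names, and how the resulting value is written into config.yaml is not shown.

```typescript
// Hypothetical conversion helper for the schedule value; not part of sindri.
interface Interval {
  days?: number;
  hours?: number;
  minutes?: number;
}

const SECONDS_PER_MINUTE = 60;
const SECONDS_PER_HOUR = 60 * SECONDS_PER_MINUTE;
const SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR;

function toScheduleSeconds({ days = 0, hours = 0, minutes = 0 }: Interval): number {
  return days * SECONDS_PER_DAY + hours * SECONDS_PER_HOUR + minutes * SECONDS_PER_MINUTE;
}
```

A metric that needs a full day of traffic would use `toScheduleSeconds({ days: 1 })`, which is 86400 seconds.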
Fill in the Domain Context section of .sindri/agents.md:
Tell the user:
sindri is ready. Start the loop:
/sindri loop (continuous) or /sindri cycle (one at a time)
Start or resume the continuous experiment loop.
- Does .sindri/ exist? If not, run /sindri init first.
- Does .sindri/evaluate.ts contain real logic (not just return 0)? If not, tell the user.
- Read .sindri/agents.md: this is your complete operating manual.

Run exactly ONE experiment cycle and stop.
For delayed feedback domains where data needs time to accumulate between cycles.
- Does .sindri/ exist? If not, run /sindri init first.
- Does .sindri/evaluate.ts contain real logic? If not, tell the user.
- Read .sindri/agents.md.
- Read .sindri/results/<branch>.jsonl for history.

npx claudepluginhub taehyeon-kim/sindri --plugin sindri
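
Both the loop and cycle flows begin by checking that .sindri/evaluate.ts contains real logic. One rough way to approximate that check is a text heuristic like the sketch below. `looksLikeStub` is a hypothetical helper, not something sindri provides, and it can be fooled (for example by `//` inside a string literal), so reading the file remains the reliable check.

```typescript
// Hypothetical heuristic: does this evaluate.ts source only ever return 0?
function looksLikeStub(source: string): boolean {
  // Drop block and line comments so a commented-out "return 0" is ignored.
  const code = source
    .replace(/\/\*[\s\S]*?\*\//g, "")
    .replace(/\/\/.*$/gm, "");

  // Collect the expression after each `return`, up to ';', '}' or end of line.
  const returnPattern = /\breturn\b([^;\n}]*)/g;
  const returned: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = returnPattern.exec(code)) !== null) {
    returned.push((match[1] ?? "").trim());
  }

  // No return at all, or every return is the literal 0: treat as a stub.
  if (returned.length === 0) return true;
  return returned.every((expr) => expr === "0");
}
```

If this reports a stub, the flow above says to tell the user rather than start experimenting against a metric that never changes.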