musts

AI agents are fast at editing code. They are less reliable at knowing when
verification is actually finished.

musts gives your repository a local, enforceable definition of done:

The task is not done until musts validate is empty.

Instead of hoping the agent remembers every build, test, UI check, and
architecture rule, you declare those checks next to the code they protect.
When files change, musts validate reports the exact validation tasks still
pending. The agent runs them, records evidence, and repeats until the report
is clean.
Get Started
1. Install the CLI
# Homebrew (macOS / Linux)
brew install bitomule/tap/musts
# Cargo (from crates.io)
cargo install musts --locked
# Precompiled binaries
cargo binstall musts # or download directly from GitHub Releases
2. Create your first MUSTS.yml
Put a MUSTS.yml at the root of your repo:
checks:
  test:
    uses: cargo/test
Now ask what still needs to be validated:
musts validate
If code covered by that manifest has changed, musts returns a concrete task
for the agent to complete. After running the requested command, the agent
records evidence:
cargo test --workspace 2>&1 | tee /tmp/musts-cargo-test.log
musts evidence cargo-test-root \
  --text "cargo test --workspace passed" \
  --asset /tmp/musts-cargo-test.log
musts validate
When musts validate is empty, the repo has fresh evidence for the current
workspace state.
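The run-then-record step above can be wrapped in a small shell helper. This is only a sketch: run_and_record is a hypothetical function, not part of musts, and it assumes the musts evidence command accepts the check ID, --text, and --asset exactly as in the example above. It writes the log to a temporary file outside the workspace and submits evidence only when the command succeeds.

```shell
# Sketch only: run_and_record is a hypothetical helper, not part of musts.
# Usage: run_and_record <check-id> <description> <command> [args...]
run_and_record() {
  check_id="$1"
  description="$2"
  shift 2

  # Keep the log outside the workspace so capturing it changes no repo files.
  log="$(mktemp "${TMPDIR:-/tmp}/musts-evidence.XXXXXX")" || return 1

  # Run the requested command, saving stdout and stderr; stop if it fails.
  if ! "$@" >"$log" 2>&1; then
    echo "command failed, no evidence recorded; see $log" >&2
    return 1
  fi

  # Same arguments as the example above: check ID, --text, --asset.
  musts evidence "$check_id" --text "$description" --asset "$log"
}
```

With this helper, the earlier example would become run_and_record cargo-test-root "cargo test --workspace passed" cargo test --workspace, followed by musts validate. Unlike the tee pipeline, the helper checks the command's own exit status before recording anything.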
3. Tell your agent to obey the loop
For Claude Code, install the plugin. It bundles the musts skill and a
Stop hook that runs musts validate whenever Claude tries to finish a turn:
/plugin marketplace add bitomule/musts
/plugin install musts@musts
See docs/claude-code-plugin.md for install,
update, uninstall, and private-fork details.
For other agents, add the rule to your AGENTS.md, CLAUDE.md, or equivalent
repo instructions:
Before declaring a code change done, run `musts validate`.
Treat every reported task as required. Run the task, capture evidence outside
the workspace, submit it with `musts evidence`, and repeat until
`musts validate` is empty.
The CLI is agent-agnostic. Anything that can run shell commands can participate
in the loop.
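For agents without a plugin, the same rule can be approximated with a plain shell gate. The sketch below is an assumption-laden illustration: musts_gate is a hypothetical wrapper, and it assumes that a clean report means musts validate exits successfully and prints nothing. Check the actual exit codes and output of your installed version before relying on it.

```shell
# Sketch only: musts_gate is a hypothetical wrapper, not part of musts.
# Assumption: "empty" means `musts validate` succeeds and prints nothing.
musts_gate() {
  # Clean: the command succeeded and produced no output.
  if pending="$(musts validate 2>&1)" && [ -z "$pending" ]; then
    return 0
  fi

  # Otherwise show the report so the agent knows what is still required.
  printf 'musts validate still reports pending work:\n%s\n' "$pending" >&2
  return 1
}
```

A git hook, CI step, or agent wrapper script could source this function and call musts_gate before allowing the agent to report the task as done.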
What You Can Encode
musts is not limited to "run the test suite". A check can represent any
validation rule your repo needs before an agent is allowed to stop.
Build and test checks
checks:
  fmt:
    uses: cargo/fmt
  clippy:
    uses: cargo/clippy
  test:
    uses: cargo/test
Targeted build checks
checks:
  app-build:
    uses: bazel/build
    with:
      target: //App:App
Product or architecture contracts
Use the built-in agent capability when the validation is a judgement call
that needs a human-readable answer rather than a command exit code:
checks:
  usecase-shape:
    uses: agent
    paths:
      - "Sources/App/UseCases/**"
    with:
      facts:
        - "Every use case has exactly one public entry point."
        - "The entry point name describes the user action, not implementation detail."
        - "No use case reaches across module boundaries except through declared ports."
When a matching use case changes, musts validate asks the agent to verify
those facts and submit a text explanation. That makes repo-specific rules
visible, repeatable, and hard to forget.
UI and device checks
musts can also gate flows that need screenshots, videos, JSON reports, or
other assets. Built-in and third-party capabilities decide what evidence they
need; the agent should follow the "evidence:" and "submit:" lines in the
musts validate report.
How The Loop Works