From starry-harness
This skill should be used when the user asks to "find bugs in StarryOS", "hunt bugs", "test syscalls", "discover vulnerabilities", "test starry", "fix syscall", "compare with Linux", "run syscall test", "check Linux compatibility", or wants to systematically discover, test, and fix StarryOS kernel bugs using Linux comparison testing. Supersedes the older test-starry skill.
Slash command: /starry-harness:hunt-bugs
Systematic workflow for discovering, testing, and fixing bugs in the StarryOS kernel by comparing behavior against real Linux. This is the core engineering loop of the starry-harness plugin.
Six-phase cycle: Discover → Test → Compare → Analyze → Fix → Report. Each phase produces artifacts that feed the next, building a growing knowledge base.
Identify candidate syscalls to test by scanning for suspicious patterns in kernel source.
Automated pattern scan: search os/StarryOS/kernel/src/syscall/ for:
- Ok(0) or Err(LinuxError::ENOSYS) without real logic
- _ => {} or _ => Ok(0) catch-alls
Man page cross-reference: for each suspect syscall:
bash ${CLAUDE_PLUGIN_ROOT}/scripts/man-lookup.sh <syscall>
Check the registry: read os/StarryOS/tests/known.json to skip already-tested syscalls. Focus on fresh targets or known-buggy syscalls that haven't been fixed yet.
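The pattern scan described above could be sketched roughly as follows. This is a minimal illustration, not the plugin's actual scanner: the function names `scan_source` and `scan_tree` are hypothetical, and the regexes only approximate the four patterns listed, so expect false positives that need manual review.

```python
import re
from pathlib import Path

# Regex approximations of the suspicious patterns listed above.
# The labels are illustrative names, not part of the harness.
PATTERNS = {
    "returns Ok(0)": re.compile(r"\bOk\(0\)"),
    "returns ENOSYS": re.compile(r"Err\(LinuxError::ENOSYS\)"),
    "empty catch-all": re.compile(r"_\s*=>\s*\{\s*\}"),
    "catch-all Ok(0)": re.compile(r"_\s*=>\s*Ok\(0\)"),
}


def scan_source(text):
    """Return (line_number, label, stripped_line) for each suspicious line."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, label, line.strip()))
    return hits


def scan_tree(root):
    """Scan every .rs file under root and return {path: hits} for files with hits."""
    results = {}
    for path in sorted(Path(root).rglob("*.rs")):
        hits = scan_source(path.read_text(encoding="utf-8", errors="replace"))
        if hits:
            results[str(path)] = hits
    return results
```

Calling `scan_tree("os/StarryOS/kernel/src/syscall")` from the repository root would list candidate lines per file. A hit is only a lead: a handler that legitimately returns Ok(0) will match too, so each candidate still needs the man page cross-reference.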
Prioritize targets by:
references/workflow.md
Generate a C test case using the starry_test.h harness.
Test case location: os/StarryOS/tests/cases/test_<syscall>.c
Structure:
#include "starry_test.h"
#include <sys/...> // relevant POSIX headers

TEST_BEGIN("syscall_name")

TEST("normal_operation") {
    // Happy path from man page
    EXPECT_OK(result);
} TEND

TEST("error_EINVAL") {
    // Invalid arguments per man page
    EXPECT_ERRNO(result, -1, EINVAL);
} TEND

TEST("edge_case_from_manpage") {
    // Specific edge case documented in man page
} TEND

TEST_END
Rules for good tests:
Run the test on Linux FIRST, then StarryOS. Linux must pass before StarryOS results are trusted. If the test fails on Linux, the test itself is buggy — fix the test, do not proceed to StarryOS.
Step 1 — Linux baseline (MANDATORY FIRST):
bash ${CLAUDE_PLUGIN_ROOT}/scripts/linux-ref-test.sh os/StarryOS/tests/cases/test_<name>.c /tmp/linux-ref.txt
Inspect the output. Every test must PASS. If any FAIL → the test has a bug (wrong assertion, wrong ABI, wrong expected value). Fix it before continuing.
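The "every test must PASS" gate can be automated with a small check like the one below. It is a sketch under a stated assumption: the result file format is not specified here, so the code assumes each test line contains the word PASS or FAIL. The function name `baseline_ok` is hypothetical.

```python
import re

# Assumption: each test line in the result file contains PASS or FAIL as a word.
FAIL_RE = re.compile(r"\bFAIL\b")
PASS_RE = re.compile(r"\bPASS\b")


def baseline_ok(output):
    """Return (ok, failures) for a Linux reference run.

    ok is True only if at least one test passed and none failed, so an
    empty or truncated result file does not count as a clean baseline.
    """
    lines = output.splitlines()
    failures = [line for line in lines if FAIL_RE.search(line)]
    passes = [line for line in lines if PASS_RE.search(line)]
    return (len(passes) > 0 and not failures), failures
```

Feeding it the contents of /tmp/linux-ref.txt gives a yes/no answer plus the failing lines to fix in the test before moving on to StarryOS.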
Step 2 — StarryOS (only after Linux passes):
bash ${CLAUDE_PLUGIN_ROOT}/scripts/pipeline.sh <name> --arch riscv64
Step 3 — Compare:
diff /tmp/linux-ref.txt os/StarryOS/tests/results/test_<name>.txt
Why Linux-first matters: A test that passes on StarryOS is not proof of correctness, because the test itself may contain a bug that mirrors a StarryOS bug (e.g., the test and StarryOS both assuming 5 syscall args when Linux expects 6). Running on Linux first catches such test bugs before they produce false negatives.
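For a comparison that is easier to post-process than raw diff output, the two result files can be compared in Python. The sketch below uses the standard library's difflib and makes no assumption about the result format beyond it being line-oriented. The function name `divergences` is hypothetical.

```python
import difflib


def divergences(linux_text, starry_text):
    """Return a list of (linux_lines, starry_lines) pairs that differ.

    Each pair holds the lines from the Linux reference and the
    corresponding lines from the StarryOS run for one differing region.
    """
    linux_lines = linux_text.splitlines()
    starry_lines = starry_text.splitlines()
    matcher = difflib.SequenceMatcher(None, linux_lines, starry_lines, autojunk=False)
    result = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            result.append((linux_lines[i1:i2], starry_lines[j1:j2]))
    return result
```

An empty list means StarryOS matched the Linux baseline line for line. Each non-empty pair is one divergence to carry into the analysis phase.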
ABI cross-check: Before writing tests for a syscall, run the ABI checker to verify StarryOS reads the correct number of arguments:
python3 ${CLAUDE_PLUGIN_ROOT}/scripts/abi-check.py
If the target syscall has an ABI mismatch, fix the arg count in the kernel dispatch before writing tests.
Man page vs kernel ABI warning: Man pages document the C library API, not the raw kernel ABI. For syscalls with complex argument passing (preadv2, pwritev2, mmap on 32-bit, etc.), check the Linux kernel source (SYSCALL_DEFINE macros) or musl source (src/*/) for the actual ABI. The harness's man-lookup.sh is a starting point, not the final authority on argument layout.
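As an illustration of why the raw ABI can differ from the man page, consider preadv2. The C library call takes a single offset, while the kernel entry point, as far as I can tell from the Linux source, takes that offset as two unsigned long halves (pos_l, pos_h), giving six arguments instead of five. The sketch below models that split and the recombination. It is a simplified model for reasoning about tests, not kernel code, and should be verified against the SYSCALL_DEFINE6 definition for the target architecture.

```python
U64_MASK = 0xFFFFFFFFFFFFFFFF


def split_offset(offset, long_bits):
    """Split a non-negative file offset into (pos_l, pos_h) for a given word size."""
    mask = (1 << long_bits) - 1
    half = long_bits // 2
    pos_l = offset & mask
    # Two half-width shifts avoid shifting by the full word size in C.
    pos_h = ((offset >> half) >> half) & mask
    return pos_l, pos_h


def join_offset(pos_l, pos_h, long_bits):
    """Recombine the halves into a 64-bit offset.

    Models two half-width left shifts in a 64-bit type: with 64-bit
    longs the high half is shifted out entirely and only pos_l counts.
    """
    half = long_bits // 2
    return (((pos_h << half) << half) | pos_l) & U64_MASK
```

With 32-bit longs both halves matter, so a kernel that reads only five arguments would lose the high half or misread flags. With 64-bit longs pos_h is ignored, which is why such a bug can stay hidden on riscv64 unless a test checks the flags argument.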
For each test that diverges from Linux:
- os/StarryOS/kernel/src/syscall/: find the relevant file and function
- mod.rs through the handler
- os/StarryOS/tests/known.json with findings
Implement the fix, then run it through the adaptive review pipeline. Do NOT skip any step. Do NOT report a fix as "done" until the pipeline converges. See evolve/references/review-pipeline.md for the full protocol.
Minimum rounds (always, non-negotiable):
- cargo xtask clippy --package starry-kernel and cargo fmt
Additional rounds for P0/P1 bugs:
8. Independent re-derivation: Dispatch a separate agent (or Codex if available) with ONLY the bug description + man page (NOT the proposed fix). Compare the independently-derived fix against the proposed one.
9. If fixes disagree → dispatch a reconciliation agent to synthesize, then re-review
10. Record review rounds in strategy.json reviews section with confidence level
Only report the fix after:
Generate structured artifacts for every bug found and fixed.
- docs/starry-reports/bugs/BUG-NNN-<syscall>.md using template from references/workflow.md
- bash ${CLAUDE_PLUGIN_ROOT}/scripts/journal-entry.sh BUG "<title>" "<body>"
- fixed, buggy, broken, or stub

| Resource | Path |
|---|---|
| Syscall handlers | os/StarryOS/kernel/src/syscall/ |
| Test harness header | os/StarryOS/tests/cases/starry_test.h |
| Test sources | os/StarryOS/tests/cases/test_*.c |
| Test results | os/StarryOS/tests/results/ |
| Known bugs registry | os/StarryOS/tests/known.json |
| Pipeline | ${CLAUDE_PLUGIN_ROOT}/scripts/pipeline.sh --arch <arch> |
| Bug reports | docs/starry-reports/bugs/ |
| Work journal | docs/starry-reports/journal.md |

- references/workflow.md: Detailed phase procedures, bug report template, known.json schema
- references/syscall-patterns.md: Common bug patterns in syscall implementations with examples from this codebase
Before presenting results to the user, self-check:
npx claudepluginhub josephjoshua/starry-harness --plugin starry-harness