From starry-harness
This skill should be used when the user asks to "audit the kernel", "review kernel internals", "find concurrency bugs", "check for deadlocks", "find memory leaks", "audit scheduler", "review memory management", "stress test", "find races", "check lock ordering", "audit signal handling", "find improvements", "review kernel architecture", or wants to go beyond syscall testing to analyze StarryOS kernel internals for bugs, performance issues, and improvement opportunities.
Slash command
/starry-harness:audit-kernel
Systematic workflow for analyzing kernel internals beyond syscall correctness. Covers the scheduler, memory manager, concurrency primitives, signal delivery, filesystem, and process lifecycle — areas where there is no man page to compare against.
Every finding MUST be grounded in verifiable evidence. Consult references/verification-discipline.md for the full protocol. The short version:
Verification Tiers (only report tier 1-5):
| Tier | Evidence | Example |
|---|---|---|
| 1 | Executable — test produces different result | "SMP=1 passes, SMP=4 deadlocks" |
| 2 | Source-level proof — code visibly wrong | "Line 220 calls read_at in a write path" |
| 3 | Property violation — measurable invariant broken | "RSS grew 50MB after 1000 fork+exit cycles" |
| 4 | Differential — behavior changes with config | "Works with 128MB RAM, crashes at 64MB" |
| 5 | Linux source comparison — StarryOS diverges from Linux impl | "Linux's do_mremap handles X; StarryOS doesn't" |
Never report tier 6-7 (pattern guesses, LLM reasoning) as findings. If a code-reading suspicion arises, write a test first to elevate it to tier 1-4 before reporting.
Unlike hunt-bugs (which starts from man page specs), kernel auditing starts from code reading and must escalate to executable evidence:
- references/kernel-audit-areas.md
- ${CLAUDE_PLUGIN_ROOT}/scripts/stress-test.sh with SMP sweeping
- evolve/references/review-pipeline.md): self-check → kernel-reviewer → independent re-derivation (for P0/P1) → regression check. Do NOT claim a fix is done until confidence is "high".
Every test goes through this validation chain before its results are trusted:
Write test → Run on Linux (MUST pass) → Run on StarryOS → Compare
                  ↓ fails                                    ↓ differs
             Fix the test                                Real bug found
             (test was wrong)                            (report it)
If a test fails on Linux, the test has a bug — fix the test, do not report a kernel bug.
If a test passes on both Linux and StarryOS, the hypothesis was wrong — document it and move on. Not finding a bug is a valid result.
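The decision rule in this chain can be written down as a small helper. This is a minimal sketch, not part of the harness: `classify_result` is a hypothetical name, and it assumes "Compare" can be reduced to comparing exit codes (a real comparison may also need to diff test output).

```shell
# classify_result LINUX_RC STARRY_RC   (hypothetical helper, not a harness script)
# Maps the two exit codes onto the three outcomes of the validation chain.
classify_result() {
    linux_rc=$1
    starry_rc=$2
    if [ "$linux_rc" -ne 0 ]; then
        echo "test-bug"      # failed on Linux: the test is wrong, fix the test
    elif [ "$starry_rc" -ne 0 ]; then
        echo "kernel-bug"    # passes on Linux, differs on StarryOS: report it
    else
        echo "no-bug"        # passes on both: hypothesis was wrong, document it
    fi
}

classify_result 0 1   # prints "kernel-bug"
classify_result 1 1   # prints "test-bug"
```

The Linux result is checked first on purpose: a failure on Linux invalidates the test regardless of what StarryOS does.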
Mutation validation (for high-stakes findings): Write a small standalone program that simulates the suspected bug. Verify the test catches the simulated bug. This proves the test is actually capable of detecting the class of bug being looked for.
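As an illustration of mutation validation, the sketch below checks a write/read-back property against a correct writer and against a deliberately broken one. All names here (`check_roundtrip`, `good_writer`, `buggy_writer`) are hypothetical, and the "bug" is a simulated truncating write, not a known StarryOS defect.

```shell
# check_roundtrip WRITER
# Exit 0 if WRITER, invoked as `WRITER <src> <dst>`, reproduces src in dst intact.
check_roundtrip() (
    dir=$(mktemp -d) || exit 2
    # Deterministic pattern so that a failure is reproducible.
    awk 'BEGIN { for (i = 0; i < 4096; i++) printf "line-%06d\n", i }' > "$dir/src"
    "$@" "$dir/src" "$dir/dst" && cmp -s "$dir/src" "$dir/dst"
    rc=$?
    rm -rf "$dir"
    exit "$rc"
)

# Correct implementation of the operation under test.
good_writer() { cp "$1" "$2"; }

# Simulated bug: silently drops the last line (a truncating write).
buggy_writer() { sed '$d' "$1" > "$2"; }

check_roundtrip good_writer  && echo "good writer: pass"
check_roundtrip buggy_writer || echo "buggy writer: caught"
```

If the check also passed for `buggy_writer`, the test would be unable to detect this class of bug, and a pass on StarryOS would carry no information.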
Concurrency bugs are non-deterministic. Use controlled amplification to make them manifest reliably. Full techniques in references/concurrency-reproduction.md. The key tools:
StarryOS has a built-in lockdep in components/kspin/. Enable it by adding features = ["lockdep"] to the ax-kspin dependency in os/StarryOS/kernel/Cargo.toml. It catches AB/BA deadlocks, recursive acquisitions, and out-of-order unlocks on the first occurrence — no stress testing needed. See references/concurrency-reproduction.md section 0 for full details.
bash ${CLAUDE_PLUGIN_ROOT}/scripts/stress-test.sh <test_name> --runs 100 --smp 1,2,4
If SMP=1 passes and SMP=4 fails → concurrency bug confirmed (tier 1 evidence).
Run the test 100+ times. Even a 1% failure rate proves the bug exists. The failure rate itself is the metric — report it as "fails N/100 runs at SMP=4."
The stress-test script uses a configurable timeout. If QEMU doesn't exit within the timeout, the test deadlocked. Report the timeout count across runs.
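The counting described above can be sketched as follows. This is not the implementation of stress-test.sh, whose internals and output format are not shown here; `run_repeated` is a hypothetical helper, and it assumes GNU coreutils `timeout`, which exits with status 124 when it kills a command for exceeding the limit.

```shell
# run_repeated RUNS TIMEOUT_SECS CMD [ARGS...]   (hypothetical helper)
# Runs CMD RUNS times and reports failures and timeouts separately.
run_repeated() (
    runs=$1
    limit=$2
    shift 2
    fails=0
    timeouts=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        rc=0
        timeout "$limit" "$@" >/dev/null 2>&1 || rc=$?
        if [ "$rc" -eq 124 ]; then
            timeouts=$((timeouts + 1))   # did not exit in time: suspected deadlock
        elif [ "$rc" -ne 0 ]; then
            fails=$((fails + 1))         # exited in time, but with an error
        fi
        i=$((i + 1))
    done
    echo "fails $fails/$runs, timeouts $timeouts/$runs"
)

run_repeated 5 1 true      # prints "fails 0/5, timeouts 0/5"
run_repeated 5 1 false     # prints "fails 5/5, timeouts 0/5"
run_repeated 2 1 sleep 5   # prints "fails 0/2, timeouts 2/2"
```

Keeping timeouts separate from ordinary failures matters because they point to different bug classes: a hang suggests deadlock or livelock, while a fast failure suggests a wrong result or a crash.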
When a specific race is suspected between two code points:
- axtask::yield_now() at point A in the kernel source
bash ${CLAUDE_PLUGIN_ROOT}/scripts/stress-test.sh <test_name> --memory 128M
Reduced memory forces more page faults, OOM paths, and allocation failures.
StarryOS has built-in allocation tracking with DWARF backtraces. Enable the memtrack feature in os/StarryOS/kernel/Cargo.toml to get /dev/memtrack — read it before and after a workload to find leaked allocations with exact backtraces showing where they were allocated. See references/kernel-audit-areas.md for details.
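A before/after comparison of two /dev/memtrack snapshots might look like the sketch below. The record format of /dev/memtrack is not described here, so this assumes, purely for illustration, that each live allocation appears as one text line that stays identical between reads; `leak_candidates` is a hypothetical helper name.

```shell
# leak_candidates BEFORE_FILE AFTER_FILE   (hypothetical helper)
# Prints records present in AFTER_FILE but not in BEFORE_FILE.
# ASSUMPTION: one allocation record per line, stable across snapshots.
leak_candidates() (
    dir=$(mktemp -d) || exit 2
    sort "$1" > "$dir/before"
    sort "$2" > "$dir/after"
    # comm -13 suppresses lines unique to file 1 and lines common to both,
    # leaving only lines unique to file 2.
    comm -13 "$dir/before" "$dir/after"
    rm -rf "$dir"
)

# Intended use on the target (paths are placeholders):
#   cat /dev/memtrack > /tmp/before
#   ./workload
#   cat /dev/memtrack > /tmp/after
#   leak_candidates /tmp/before /tmp/after
```

Records that survive the diff are only candidates: allocations made by long-lived caches would also show up, so each one still needs its backtrace inspected.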
For kernel internals without specs, test properties — invariants that must always hold:
| Property | Test Design | Detects |
|---|---|---|
| No memory leak | Enable memtrack, read /dev/memtrack before/after N fork+exit cycles | Memory leaks (with backtraces) |
| No zombie accumulation | Count processes before/after fork+wait cycles | Process leaks |
| Scheduler fairness | N threads each count iterations; ratio should be ~1:1 | Starvation |
| Lock-free progress | Thread makes progress within bounded time | Deadlock/livelock |
| Signal delivery | Signal arrives within 1 scheduling period | Signal loss |
| COW correctness | Parent and child see correct data after fork+write | COW bugs |
| File data integrity | Write pattern → read back → compare | FS corruption |
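As one example of turning a row of this table into a pass/fail check, the scheduler fairness property can be evaluated from per-thread iteration counts. This is a minimal sketch: `fairness_ok` is a hypothetical helper, and the tolerance passed as the first argument is an assumption, since the table only says the ratio should be roughly 1:1.

```shell
# fairness_ok MAX_RATIO COUNT...   (hypothetical helper)
# Exit 0 if every thread made progress and max(COUNT)/min(COUNT) <= MAX_RATIO.
fairness_ok() {
    max_ratio=$1
    shift
    printf '%s\n' "$@" | awk -v limit="$max_ratio" '
        { v = $1 + 0 }                       # force numeric interpretation
        NR == 1 || v < min { min = v }
        NR == 1 || v > max { max = v }
        END { exit !(min > 0 && max / min <= limit + 0) }'
}

fairness_ok 1.5 1000 980 1010 && echo "fair"
fairness_ok 1.5 1000 10       || echo "starvation suspected"
```

A count of zero fails the check outright, because a thread that never ran is starvation regardless of the chosen ratio.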
See references/kernel-audit-areas.md for the full catalog. Summary:
| Subsystem | Key Source Paths | What to Audit |
|---|---|---|
| Scheduler | components/axsched/, os/arceos/modules/axtask/ | Fairness, starvation, SMP load balance |
| Memory | components/starry-vm/, os/arceos/modules/axmm/ | Leaks, COW, page fault handling, OOM |
| Concurrency | components/kspin/, os/arceos/modules/axsync/ | Lock ordering, deadlock, atomicity |
| Signals | components/starry-signal/, os/StarryOS/kernel/src/task/signal.rs | Delivery races, masking, nested signals |
| Process | components/starry-process/, os/StarryOS/kernel/src/task/ | Zombie leaks, orphan reparenting, exec races |
| Filesystem | os/arceos/modules/axfs/, components/rsext4/ | Data integrity, concurrent access, crash safety |
| Networking | os/arceos/modules/axnet/, components/starry-smoltcp/ | TCP state machine, buffer leaks, connection lifecycle |
| Script | Purpose |
|---|---|
| ${CLAUDE_PLUGIN_ROOT}/scripts/stress-test.sh | Multi-run SMP-sweeping test runner with deadlock detection |
| ${CLAUDE_PLUGIN_ROOT}/scripts/linux-ref-test.sh | Docker Linux test runner for test validation |
| ${CLAUDE_PLUGIN_ROOT}/scripts/journal-entry.sh | Journal entry for findings |
- references/verification-discipline.md: Full anti-hallucination protocol, tier definitions, test correctness chain
- references/kernel-audit-areas.md: Detailed audit catalog for each kernel subsystem with specific code paths and what to look for
- references/concurrency-reproduction.md: Techniques for reproducing races, deadlocks, and non-deterministic bugs
Before presenting results to the user, self-check:
npx claudepluginhub josephjoshua/starry-harness --plugin starry-harness