
audit-kernel

This skill should be used when the user asks to "audit the kernel", "review kernel internals", "find concurrency bugs", "check for deadlocks", "find memory leaks", "audit scheduler", "review memory management", "stress test", "find races", "check lock ordering", "audit signal handling", "find improvements", "review kernel architecture", or wants to go beyond syscall testing to analyze StarryOS kernel internals for bugs, performance issues, and improvement opportunities.

Invocation


Slash command: /starry-harness:audit-kernel

User invocable

Model invocable

Inline context

Default effort

Context Preview


Systematic workflow for analyzing kernel internals beyond syscall correctness. Covers the scheduler, memory manager, concurrency primitives, signal delivery, filesystem, and process lifecycle — areas where there is no man page to compare against.

Supporting Files

references/concurrency-reproduction.md

references/kernel-audit-areas.md

references/verification-discipline.md

SKILL.md

150 lines · ~2.3k tokens

Stats

Language: Python

Stars: 0

Maintenance: Excellent

Last Commit: Apr 19, 2026


StarryOS Kernel Internal Audit

Anti-Hallucination Discipline

Every finding MUST be grounded in verifiable evidence. Consult references/verification-discipline.md for the full protocol. The short version:

Verification tiers (report only tiers 1-5):

Tier	Evidence	Example
1	Executable — test produces different result	"SMP=1 passes, SMP=4 deadlocks"
2	Source-level proof — code visibly wrong	"Line 220 calls read_at in a write path"
3	Property violation — measurable invariant broken	"RSS grew 50MB after 1000 fork+exit cycles"
4	Differential — behavior changes with config	"Works with 128MB RAM, crashes at 64MB"
5	Linux source comparison — StarryOS diverges from Linux impl	"Linux's do_mremap handles X; StarryOS doesn't"

Never report tiers 6-7 (pattern guesses, LLM reasoning) as findings. If a code-reading suspicion arises, write a test first to elevate it to tiers 1-4 before reporting.

Workflow: Suspect → Test → Prove → Report

Unlike hunt-bugs (which starts from man page specs), kernel auditing starts from code reading and must escalate to executable evidence:

Select subsystem — Pick from the catalog in references/kernel-audit-areas.md
Read source — Understand the implementation, identify suspect patterns
Form hypothesis — "I suspect X could cause Y under condition Z"
Design test — Write a C test that creates condition Z and checks for Y
Validate test on Linux — Run on Docker Linux first. The test MUST pass. If it fails, the test is wrong.
Mutation check (optional but recommended) — Temporarily simulate the suspected bug in a toy program and verify the test catches it
Run on StarryOS — Execute via the QEMU pipeline
For concurrency bugs — Use ${CLAUDE_PLUGIN_ROOT}/scripts/stress-test.sh with SMP sweeping
Classify evidence tier — Only report if tier 1-5
If proposing a fix → run the MANDATORY adaptive review pipeline (see evolve/references/review-pipeline.md): self-check → kernel-reviewer → independent re-derivation (for P0/P1) → regression check. Do NOT claim a fix is done until confidence is "high".
Report — Bug report + journal entry + update strategy.json reviews section
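The design-test and Linux-validation steps depend on a test whose verdict is unambiguous on both platforms. Below is a minimal C skeleton, assuming the QEMU pipeline and linux-ref-test.sh treat exit status 0 as pass. That convention, the `CHECK` macro, and the `report` helper are illustrative assumptions, not part of the plugin.

```c
#include <stdio.h>

/* Counters shared by CHECK and report (hypothetical harness, not the plugin's). */
static int checks_run;
static int checks_failed;

/* Record one check; print a FAIL line with its location if cond is false. */
#define CHECK(cond, what)                                             \
    do {                                                              \
        checks_run++;                                                 \
        if (!(cond)) {                                                \
            checks_failed++;                                          \
            printf("FAIL: %s (%s:%d)\n", (what), __FILE__, __LINE__); \
        }                                                             \
    } while (0)

/* Print a one-line verdict and return it as the process exit status. */
static int report(const char *test_name)
{
    printf("%s: %s (%d/%d checks passed)\n", test_name,
           checks_failed ? "FAIL" : "PASS",
           checks_run - checks_failed, checks_run);
    return checks_failed ? 1 : 0;
}
```

A test's `main` would set up condition Z, `CHECK` for symptom Y, and end with `return report("test_name");`, keeping the hypothesis in a comment so the journal entry can quote it.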

Test Correctness Protocol

Every test goes through this validation chain before its results are trusted:

Write test → Run on Linux (MUST pass) → Run on StarryOS → Compare
                  ↓ fails                       ↓ differs
            Fix the test                   Real bug found
            (test was wrong)               (report it)

If a test fails on Linux, the test has a bug — fix the test, do not report a kernel bug.

If a test passes on both Linux and StarryOS, the hypothesis was wrong — document it and move on. Not finding a bug is a valid result.

Mutation validation (for high-stakes findings): Write a small standalone program that simulates the suspected bug. Verify the test catches the simulated bug. This proves the test is actually capable of detecting the class of bug being looked for.
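As an illustration of mutation validation for the "File data integrity" property, the sketch below simulates silent corruption entirely in user memory: it fills a buffer with a deterministic pattern, confirms the checker reports no mismatch, flips one bit, and confirms the checker reports exactly that byte. The pattern generator (a simple linear congruential step) and all function names are hypothetical choices for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Deterministic byte pattern derived from a seed (LCG step, mod 2^32). */
static void fill_pattern(uint8_t *buf, size_t len, uint32_t seed)
{
    uint32_t x = seed;
    for (size_t i = 0; i < len; i++) {
        x = x * 1103515245u + 12345u;
        buf[i] = (uint8_t)(x >> 16);
    }
}

/* Index of the first byte that differs from the pattern, or -1 if none. */
static long first_mismatch(const uint8_t *buf, size_t len, uint32_t seed)
{
    uint32_t x = seed;
    for (size_t i = 0; i < len; i++) {
        x = x * 1103515245u + 12345u;
        if (buf[i] != (uint8_t)(x >> 16))
            return (long)i;
    }
    return -1;
}

/* Mutation check: returns 1 only if the checker passes clean data and
   pinpoints a simulated one-bit corruption at flip_at. */
static int mutation_caught(size_t len, size_t flip_at)
{
    uint8_t buf[4096];

    if (len > sizeof buf || flip_at >= len)
        return 0;
    fill_pattern(buf, len, 42);
    if (first_mismatch(buf, len, 42) != -1)
        return 0;               /* false positive: the checker is broken */
    buf[flip_at] ^= 0x01;       /* the simulated bug */
    return first_mismatch(buf, len, 42) == (long)flip_at;
}
```

Only after `mutation_caught` returns 1 does a clean pass of the real write/read-back test on StarryOS carry weight.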

Concurrency Bug Reproduction

Concurrency bugs are non-deterministic. Use controlled amplification to make them manifest reliably. Full techniques in references/concurrency-reproduction.md. The key tools:

Runtime Lockdep (use first — deterministic)

StarryOS has a built-in lockdep in components/kspin/. Enable it by adding features = ["lockdep"] to the ax-kspin dependency in os/StarryOS/kernel/Cargo.toml. It catches AB/BA deadlocks, recursive acquisitions, and out-of-order unlocks on the first occurrence — no stress testing needed. See references/concurrency-reproduction.md section 0 for full details.

SMP Sweeping

bash ${CLAUDE_PLUGIN_ROOT}/scripts/stress-test.sh <test_name> --runs 100 --smp 1,2,4

If SMP=1 passes and SMP=4 fails → concurrency bug confirmed (tier 1 evidence).
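An SMP sweep needs a test whose failure depends on real parallelism. One hedged example is a contended-mutex counter: with glibc or musl, `pthread_mutex_lock` enters the kernel (via the futex syscall) only when threads collide, which happens far more often with several CPUs. A lost wakeup shows up as a hang caught by the timeout, and a wrong count points at broken mutual exclusion. Thread and iteration counts are arbitrary, and the function names are hypothetical.

```c
#include <pthread.h>

#define MAX_THREADS 16

static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static long counter;            /* protected by counter_lock */
static int iters_per_thread;

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < iters_per_thread; i++) {
        pthread_mutex_lock(&counter_lock);   /* contended: may sleep in futex */
        counter++;
        pthread_mutex_unlock(&counter_lock); /* may issue a futex wake */
    }
    return NULL;
}

/* Returns the final count (expected nthreads * iters), or -1 on bad input or
   thread-creation failure. A lost wakeup never returns; the timeout catches it. */
static long contention_test(int nthreads, int iters)
{
    pthread_t tids[MAX_THREADS];
    int started = 0;

    if (nthreads < 1 || nthreads > MAX_THREADS || iters < 0)
        return -1;
    counter = 0;
    iters_per_thread = iters;
    while (started < nthreads &&
           pthread_create(&tids[started], NULL, worker, NULL) == 0)
        started++;
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    return started == nthreads ? counter : -1;
}
```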

Repeat Amplification

Run the test 100+ times. Even a 1% failure rate proves the bug exists. The failure rate itself is the metric — report it as "fails N/100 runs at SMP=4."
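How many runs are enough depends on the failure rate. Assuming runs are independent (an idealization, since identical QEMU runs may not be), a bug that fails with probability p per run is seen at least once in n runs with probability 1 - (1 - p)^n. At p = 0.01 that is only about 63% for 100 runs. A small calculator, with hypothetical function names:

```c
#define MAX_RUNS 1000000

/* Probability that a bug failing with probability p per independent run
   appears at least once in `runs` runs: 1 - (1 - p)^runs. */
static double detect_prob(double p, int runs)
{
    double miss = 1.0;          /* probability that every run passed */
    for (int i = 0; i < runs; i++)
        miss *= 1.0 - p;
    return 1.0 - miss;
}

/* Smallest run count whose detection probability reaches `confidence`,
   or -1 if inputs are out of range or more than MAX_RUNS would be needed. */
static int runs_needed(double p, double confidence)
{
    double miss = 1.0;
    int runs = 0;

    if (p <= 0.0 || p > 1.0 || confidence <= 0.0 || confidence >= 1.0)
        return -1;
    while (1.0 - miss < confidence) {
        if (runs == MAX_RUNS)
            return -1;
        miss *= 1.0 - p;
        runs++;
    }
    return runs;
}
```

At a 1% failure rate, 95% confidence takes 299 runs, so a clean 100-run sweep does not rule out a race that rare.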

Timeout Deadlock Detection

The stress-test script uses a configurable timeout. If QEMU does not exit within it, count the run as hung (most likely a deadlock or livelock). Report the timeout count across runs.
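The QEMU-level timeout catches a wedged kernel. A complementary in-guest watchdog can help separate the two cases: a hang confined to the test task (for example a lost wakeup while the rest of the kernel keeps running) can often still be interrupted by SIGALRM, while a wedged kernel cannot. A sketch, assuming StarryOS implements `alarm()` and `sigaction()` (worth validating on Linux first, per the test correctness protocol); exit status 2 as the watchdog code and the helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define WATCHDOG_EXIT 2         /* hypothetical "test task hung" status */

static void on_alarm(int sig)
{
    static const char msg[] = "WATCHDOG: test task hung\n";
    ssize_t n;

    (void)sig;
    /* Only async-signal-safe calls in a handler: write() and _exit(). */
    n = write(STDOUT_FILENO, msg, sizeof msg - 1);
    (void)n;
    _exit(WATCHDOG_EXIT);
}

/* Arm a one-shot SIGALRM watchdog; returns 0 on success, -1 on error. */
static int install_watchdog(unsigned seconds)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;
    alarm(seconds);
    return 0;
}

/* Run body() in a child under the watchdog. Returns the child's exit status
   (WATCHDOG_EXIT if it hung), or -1 on fork/wait failure or abnormal death. */
static int run_with_watchdog(int (*body)(void), unsigned seconds)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (install_watchdog(seconds) != 0)
            _exit(1);
        _exit(body());
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Running the body in a forked child lets the parent print a verdict even when the body never returns. On Linux, `pause` is a convenient stand-in for a body that never returns: `run_with_watchdog(pause, 1)` yields 2 after one second. If neither the watchdog line nor a verdict appears before the QEMU timeout, the hang is kernel-wide.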

Yield Injection (for targeted reproduction)

When a specific race is suspected between two code points:

Insert axtask::yield_now() at point A in the kernel source
This gives up the CPU at point A; if another task is runnable, it runs inside the window, which makes the race far more likely to trigger
Run the test — if the bug manifests, the race is confirmed
Remove the yield and document the race window

Memory Pressure

bash ${CLAUDE_PLUGIN_ROOT}/scripts/stress-test.sh <test_name> --memory 128M

Reduced memory forces more page faults, OOM paths, and allocation failures.
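A pressure workload should both grow memory and check that data survives the extra faults. The sketch maps anonymous memory in fixed chunks up to a cap, touches one byte per page, then re-reads every page. MAP_FAILED is treated as an expected outcome; with lazy allocation the shortage may instead appear at touch time as a fault or OOM kill, which is itself a path worth observing. The cap, chunk size, and function name are assumptions.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#define MAX_CHUNKS 1024

/* Map up to max_bytes as chunk-sized anonymous mappings, touch one byte per
   page, re-verify every page, and unmap. Returns the bytes mapped and
   verified, or -1 on bad arguments or a data mismatch. */
static long pressure_test(size_t max_bytes, size_t chunk)
{
    long page = sysconf(_SC_PAGESIZE);
    volatile unsigned char *maps[MAX_CHUNKS]; /* volatile: real memory reads */
    size_t n = 0;
    int bad = 0;

    if (page <= 0 || chunk == 0 || chunk % (size_t)page != 0)
        return -1;
    while (n < MAX_CHUNKS && (n + 1) * chunk <= max_bytes) {
        volatile unsigned char *p = mmap(NULL, chunk, PROT_READ | PROT_WRITE,
                                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
            break;              /* allocation failure: expected, stop growing */
        maps[n++] = p;
        for (size_t off = 0; off < chunk; off += (size_t)page)
            p[off] = (unsigned char)(n + off / (size_t)page); /* fault it in */
    }
    for (size_t i = 0; i < n && !bad; i++)
        for (size_t off = 0; off < chunk; off += (size_t)page)
            if (maps[i][off] != (unsigned char)(i + 1 + off / (size_t)page)) {
                bad = 1;        /* a page lost or swapped its contents */
                break;
            }
    for (size_t i = 0; i < n; i++)
        munmap((void *)maps[i], chunk);
    return bad ? -1 : (long)(n * chunk);
}
```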

Memtrack (deterministic memory leak detection)

StarryOS has built-in allocation tracking with DWARF backtraces. Enable the memtrack feature in os/StarryOS/kernel/Cargo.toml to get /dev/memtrack — read it before and after a workload to find leaked allocations with exact backtraces showing where they were allocated. See references/kernel-audit-areas.md for details.
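A snapshot around a fork+exit workload might look like the following. It treats the contents of /dev/memtrack as opaque and copies them verbatim so the before/after files can be diffed on the host. It assumes /dev/memtrack can be read like a regular file and that the guest has a writable /tmp; the snapshot paths and function names are hypothetical.

```c
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

/* Copy src to dst verbatim; returns bytes copied, or -1 on any error. */
static long copy_file(const char *src, const char *dst)
{
    char buf[4096];
    size_t n;
    long total = 0;
    FILE *in = fopen(src, "rb");
    FILE *out;

    if (in == NULL)
        return -1;
    out = fopen(dst, "wb");
    if (out == NULL) {
        fclose(in);
        return -1;
    }
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            total = -1;
            break;
        }
        total += (long)n;
    }
    if (ferror(in))
        total = -1;
    fclose(in);
    if (fclose(out) != 0)
        total = -1;
    return total;
}

/* Workload: n fork+exit cycles, each child reaped; 0 if all exited cleanly. */
static int fork_exit_cycles(int n)
{
    for (int i = 0; i < n; i++) {
        int status;
        pid_t pid = fork();

        if (pid < 0)
            return -1;
        if (pid == 0)
            _exit(0);
        if (waitpid(pid, &status, 0) != pid ||
            !WIFEXITED(status) || WEXITSTATUS(status) != 0)
            return -1;
    }
    return 0;
}
```

In `main`: `copy_file("/dev/memtrack", "/tmp/memtrack.before")`, then `fork_exit_cycles(1000)`, then `copy_file("/dev/memtrack", "/tmp/memtrack.after")`. Reading the device and using stdio may allocate on their own, so a baseline pair taken with zero cycles helps separate harness noise from a real leak.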

Property-Based Tests

For kernel internals without specs, test properties — invariants that must always hold:

Property	Test Design	Detects
No memory leak	Enable memtrack, read /dev/memtrack before/after N fork+exit cycles	Memory leaks (with backtraces)
No zombie accumulation	Count processes before/after fork+wait cycles	Process leaks
Scheduler fairness	N threads each count iterations; ratio should be ~1:1	Starvation
Lock-free progress	Thread makes progress within bounded time	Deadlock/livelock
Signal delivery	Signal arrives within 1 scheduling period	Signal loss
COW correctness	Parent and child see correct data after fork+write	COW bugs
File data integrity	Write pattern → read back → compare	FS corruption
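As one concrete instance of the table, the COW row can be checked with a single fork: both processes start from the same heap buffer and global, the child overwrites its copies, and each side verifies it sees only its own writes. A sketch, assuming an exit status of 0 means pass; the buffer size and names are arbitrary. Reads and writes go through `volatile` so the compiler cannot fold the checks away.

```c
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile int cow_global;

/* Returns 0 if parent and child each see only their own writes after fork. */
static int cow_test(void)
{
    const size_t len = 4096;
    char *heap = malloc(len);
    volatile char *vh = heap;   /* volatile: keep the compiler from folding checks */
    int status;
    pid_t pid;

    if (heap == NULL)
        return 1;
    for (size_t i = 0; i < len; i++)
        vh[i] = 'P';            /* parent's data before fork */
    cow_global = 1;

    pid = fork();
    if (pid < 0) {
        free(heap);
        return 1;
    }
    if (pid == 0) {
        /* Child: writes must land in a private copy, never in the parent's pages. */
        for (size_t i = 0; i < len; i++)
            vh[i] = 'C';
        cow_global = 2;
        _exit(vh[0] == 'C' && vh[len - 1] == 'C' && cow_global == 2 ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) != pid) {
        free(heap);
        return 1;
    }
    /* Parent: none of the child's writes may be visible here. */
    int ok = WIFEXITED(status) && WEXITSTATUS(status) == 0 &&
             vh[0] == 'P' && vh[len - 1] == 'P' && cow_global == 1;
    free(heap);
    return ok ? 0 : 1;
}
```

A `main` that returns `cow_test()` is enough; wrapping the call in a loop turns it into a repeat-amplification test.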

Kernel Subsystem Audit Areas

See references/kernel-audit-areas.md for the full catalog. Summary:

Subsystem	Key Source Paths	What to Audit
Scheduler	`components/axsched/`, `os/arceos/modules/axtask/`	Fairness, starvation, SMP load balance
Memory	`components/starry-vm/`, `os/arceos/modules/axmm/`	Leaks, COW, page fault handling, OOM
Concurrency	`components/kspin/`, `os/arceos/modules/axsync/`	Lock ordering, deadlock, atomicity
Signals	`components/starry-signal/`, `os/StarryOS/kernel/src/task/signal.rs`	Delivery races, masking, nested signals
Process	`components/starry-process/`, `os/StarryOS/kernel/src/task/`	Zombie leaks, orphan reparenting, exec races
Filesystem	`os/arceos/modules/axfs/`, `components/rsext4/`	Data integrity, concurrent access, crash safety
Networking	`os/arceos/modules/axnet/`, `components/starry-smoltcp/`	TCP state machine, buffer leaks, connection lifecycle

Agents

kernel-reviewer — Read-only source analysis, identifies suspect patterns (tier 5-6)
linux-comparator — Runs tests on both platforms, provides tier 1 evidence
bug-triager — Classifies findings into competition categories

Key Scripts

Script	Purpose
`${CLAUDE_PLUGIN_ROOT}/scripts/stress-test.sh`	Multi-run SMP-sweeping test runner with deadlock detection
`${CLAUDE_PLUGIN_ROOT}/scripts/linux-ref-test.sh`	Docker Linux test runner for test validation
`${CLAUDE_PLUGIN_ROOT}/scripts/journal-entry.sh`	Journal entry for findings

Additional Resources

Reference Files

references/verification-discipline.md — Full anti-hallucination protocol, tier definitions, test correctness chain
references/kernel-audit-areas.md — Detailed audit catalog for each kernel subsystem with specific code paths and what to look for
references/concurrency-reproduction.md — Techniques for reproducing races, deadlocks, and non-deterministic bugs

Before Finishing

Before presenting results to the user, self-check:

Every finding is tier 1-5. Any code-reading suspicion without a test is marked "pending hypothesis."
If a fix was proposed: the full review pipeline was run (self-check, kernel-reviewer, independent re-derivation for P0/P1, regression check).
Results are recorded: known.json, journal, strategy.json all updated.
If anything is incomplete, finish it before responding.
