Skill

hunt-bugs

This skill should be used when the user asks to "find bugs in StarryOS", "hunt bugs", "test syscalls", "discover vulnerabilities", "test starry", "fix syscall", "compare with Linux", "run syscall test", "check Linux compatibility", or wants to systematically discover, test, and fix StarryOS kernel bugs using Linux comparison testing. Supersedes the older test-starry skill.

Invocation

Slash command: /starry-harness:hunt-bugs

User invocable · Model invocable · Inline context · Default effort

Last commit: Apr 19, 2026

StarryOS Bug Hunting Harness

Systematic workflow for discovering, testing, and fixing bugs in the StarryOS kernel by comparing behavior against real Linux. This is the core engineering loop of the starry-harness plugin.

Workflow Overview

Six-phase cycle: Discover → Test → Compare → Analyze → Fix → Report. Each phase produces artifacts that feed the next, building a growing knowledge base.

Phase 1: Discovery

Identify candidate syscalls to test by scanning for suspicious patterns in kernel source.

Automated pattern scan. Search os/StarryOS/kernel/src/syscall/ for:

- Stubs: functions returning Ok(0) or Err(LinuxError::ENOSYS) without real logic
- Copy-paste: adjacent handlers with near-identical structure (preadv/pwritev, etc.)
- TODO/FIXME/HACK comments indicating incomplete implementations
- Missing flag handling: match arms with _ => {} or _ => Ok(0) catch-alls
- Ignored arguments: function parameters that are never read
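The pattern scan above could be approximated with a small script. The following is a minimal sketch, not part of the harness: the regular expressions and the function names `scan_text` and `scan_tree` are assumptions made for illustration, and they cover only the textual patterns (copy-paste handlers and ignored arguments need real code analysis).

```python
import re
from pathlib import Path

# Hypothetical regexes approximating the textual patterns listed above.
# They are heuristics: a handler may legitimately return Ok(0).
PATTERNS = {
    "stub_ok0": re.compile(r"\bOk\(0\)"),
    "stub_enosys": re.compile(r"Err\(LinuxError::ENOSYS\)"),
    "todo": re.compile(r"\b(TODO|FIXME|HACK)\b"),
    "catch_all": re.compile(r"_\s*=>\s*(\{\s*\}|Ok\(0\))"),
}


def scan_text(text):
    """Return (line_number, pattern_name) pairs for every suspicious line."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits


def scan_tree(root):
    """Scan every .rs file under root and return {path: hits} for files with hits."""
    results = {}
    for path in sorted(Path(root).rglob("*.rs")):
        hits = scan_text(path.read_text(encoding="utf-8", errors="replace"))
        if hits:
            results[str(path)] = hits
    return results
```

Calling `scan_tree("os/StarryOS/kernel/src/syscall")` would yield candidate lines to inspect by hand; every hit is a lead to check against the man page, not a confirmed bug.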

Man page cross-reference. For each suspect syscall:

- Fetch the man page: bash ${CLAUDE_PLUGIN_ROOT}/scripts/man-lookup.sh <syscall>
- Compare documented behavior against the implementation
- List the specific requirements the kernel does NOT implement

Check the registry — Read os/StarryOS/tests/known.json to skip already-tested syscalls. Focus on fresh targets or known-buggy syscalls that haven't been fixed yet.
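As an illustration of the registry check, here is a minimal sketch that filters a candidate list against known.json. The schema used here (a mapping from syscall name to an object with a `status` field) is an assumption; the real schema is described in references/workflow.md and may differ. The helper names are hypothetical.

```python
import json

# Statuses treated as "tested but still worth revisiting" (assumed values).
UNFIXED = {"buggy", "broken", "stub"}


def select_targets(candidates, known):
    """Keep syscalls that are absent from the registry or recorded as unfixed.

    `known` is assumed to look like {"preadv": {"status": "buggy"}, ...}.
    """
    targets = []
    for name in candidates:
        entry = known.get(name)
        if entry is None or entry.get("status") in UNFIXED:
            targets.append(name)
    return targets


def load_known(path="os/StarryOS/tests/known.json"):
    """Load the registry from disk."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```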

Prioritize targets by:

- Use by target applications (Nginx, Python, etc.); see references/workflow.md
- Likely severity (data corruption > wrong errno > missing feature)
- Fix difficulty (quick wins first to build momentum)
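One way to turn these criteria into an ordering is a lexicographic sort key. This is only a sketch: the document does not say how the three criteria are weighted against each other, so the strict ordering below, the `Candidate` type, and the 1-to-5 `fix_effort` scale are all assumptions.

```python
from dataclasses import dataclass

# Lower rank sorts first, following "data corruption > wrong errno > missing feature".
SEVERITY_RANK = {"data corruption": 0, "wrong errno": 1, "missing feature": 2}


@dataclass
class Candidate:
    syscall: str
    used_by_target_app: bool
    severity: str
    fix_effort: int  # hypothetical estimate: 1 (quick win) to 5 (hard)


def prioritize(candidates):
    """Sort by target-app usage, then severity, then estimated fix effort."""
    return sorted(
        candidates,
        key=lambda c: (
            not c.used_by_target_app,  # False sorts before True, so used syscalls lead
            SEVERITY_RANK[c.severity],
            c.fix_effort,
        ),
    )
```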

Phase 2: Test Generation

Generate a C test case using the starry_test.h harness.

Test case location: os/StarryOS/tests/cases/test_<syscall>.c

Structure:

#include "starry_test.h"
#include <sys/...>  // relevant POSIX headers

TEST_BEGIN("syscall_name")

TEST("normal_operation") {
    // Happy path from man page
    EXPECT_OK(result);
} TEND

TEST("error_EINVAL") {
    // Invalid arguments per man page
    EXPECT_ERRNO(result, -1, EINVAL);
} TEND

TEST("edge_case_from_manpage") {
    // Specific edge case documented in man page
} TEND

TEST_END

Rules for good tests:

- One TEST block per distinct behavior from the man page
- Test both success and every documented error code
- Test flag combinations (e.g., MAP_PRIVATE|MAP_ANONYMOUS)
- Test boundary values (0, -1, SIZE_MAX, page-unaligned addresses)
- Name tests descriptively so PASS/FAIL output is self-documenting
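To keep the "one TEST block per documented error code" rule mechanical, a skeleton can be generated from the ERRORS section of the man page. The sketch below is an assumption about how one might do that, not a harness script: `render_skeleton` is a hypothetical helper, and it only emits the macros shown in the structure above (TEST_BEGIN, TEST, TEND, EXPECT_ERRNO, TEST_END), leaving the test bodies as TODOs.

```python
def render_skeleton(syscall, errnos):
    """Render a C test skeleton with one TEST block per documented errno."""
    lines = [
        '#include "starry_test.h"',
        "",
        f'TEST_BEGIN("{syscall}")',
        "",
        'TEST("normal_operation") {',
        "    // TODO: happy path from the man page",
        "} TEND",
        "",
    ]
    for name in errnos:
        lines += [
            f'TEST("error_{name}") {{',
            f"    // TODO: trigger {name} as described in the man page",
            f"    // EXPECT_ERRNO(result, -1, {name});",
            "} TEND",
            "",
        ]
    lines.append("TEST_END")
    return "\n".join(lines) + "\n"
```

For example, `render_skeleton("preadv", ["EBADF", "EINVAL"])` produces a file with three TEST blocks whose names already match the descriptive-naming rule.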

Phase 3: Linux Comparison

Run the test on Linux FIRST, then StarryOS. Linux must pass before StarryOS results are trusted. If the test fails on Linux, the test itself is buggy — fix the test, do not proceed to StarryOS.

Step 1 — Linux baseline (MANDATORY FIRST):

bash ${CLAUDE_PLUGIN_ROOT}/scripts/linux-ref-test.sh os/StarryOS/tests/cases/test_<name>.c /tmp/linux-ref.txt

Inspect the output. Every test must PASS. If any FAIL → the test has a bug (wrong assertion, wrong ABI, wrong expected value). Fix it before continuing.

Step 2 — StarryOS (only after Linux passes):

bash ${CLAUDE_PLUGIN_ROOT}/scripts/pipeline.sh <name> --arch riscv64

Step 3 — Compare:

diff /tmp/linux-ref.txt os/StarryOS/tests/results/test_<name>.txt

Why Linux-first matters: A test validated only on StarryOS can contain a mistake that happens to match a StarryOS bug (e.g., both assume 5 syscall args when Linux expects 6), so it passes for the wrong reason and hides the kernel bug. Running on Linux first catches such test bugs before they produce false negatives.
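The Linux-first gate and the comparison step can be sketched as a single function. This assumes each result line has the form `PASS <test name>` or `FAIL <test name>`; the actual output format of starry_test.h is not shown here, so treat the parser and both function names as hypothetical.

```python
def parse_results(text):
    """Map test name -> "PASS" or "FAIL" (assumed line format: "<STATUS> <name>")."""
    results = {}
    for line in text.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) == 2 and parts[0] in ("PASS", "FAIL"):
            results[parts[1].strip()] = parts[0]
    return results


def compare(linux_text, starry_text):
    """Return the names of tests where StarryOS diverges from Linux.

    Raises ValueError if any test fails on Linux, because the test itself
    is then suspect and StarryOS results must not be trusted.
    """
    linux = parse_results(linux_text)
    linux_failures = sorted(n for n, status in linux.items() if status == "FAIL")
    if linux_failures:
        raise ValueError(f"fix the test first, Linux failures: {linux_failures}")
    starry = parse_results(starry_text)
    # A test missing from the StarryOS output also counts as a divergence.
    return sorted(n for n in linux if starry.get(n) != "PASS")
```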

ABI cross-check: Before writing tests for a syscall, run the ABI checker to verify StarryOS reads the correct number of arguments:

python3 ${CLAUDE_PLUGIN_ROOT}/scripts/abi-check.py

If the target syscall has an ABI mismatch, fix the arg count in the kernel dispatch before writing tests.

Man page vs kernel ABI warning: Man pages document the C library API, not the raw kernel ABI. For syscalls with complex argument passing (preadv2, pwritev2, mmap on 32-bit, etc.), check the Linux kernel source (SYSCALL_DEFINE macros) or musl source (src/*/) for the actual ABI. The harness's man-lookup.sh is a starting point, not the final authority on argument layout.
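To make the library-versus-kernel gap concrete: the C library's preadv takes four arguments, while the kernel entry point is declared with SYSCALL_DEFINE5 because the offset is split into low and high halves. A minimal sketch of extracting argument counts from kernel source follows. The excerpt is abridged from memory of the Linux kernel's fs/read_write.c and should be checked against the real tree; `handler_counts` stands for whatever the StarryOS dispatch actually reads and is a hypothetical input.

```python
import re

# Matches e.g. "SYSCALL_DEFINE5(preadv," and captures the arity and the name.
SYSCALL_DEFINE = re.compile(r"SYSCALL_DEFINE(\d)\(\s*(\w+)")

# Abridged excerpt for illustration only; verify against the kernel source.
KERNEL_EXCERPT = """
SYSCALL_DEFINE5(preadv, unsigned long, fd, const struct iovec __user *, vec,
        unsigned long, vlen, unsigned long, pos_l, unsigned long, pos_h)
SYSCALL_DEFINE6(preadv2, unsigned long, fd, const struct iovec __user *, vec,
        unsigned long, vlen, unsigned long, pos_l, unsigned long, pos_h,
        rwf_t, flags)
"""


def kernel_arg_counts(source):
    """Return {syscall name: argument count} from SYSCALL_DEFINEn macros."""
    return {name: int(count) for count, name in SYSCALL_DEFINE.findall(source)}


def abi_mismatches(kernel_counts, handler_counts):
    """Return {name: (expected, actual)} where the handler reads the wrong arity."""
    return {
        name: (kernel_counts[name], actual)
        for name, actual in handler_counts.items()
        if name in kernel_counts and kernel_counts[name] != actual
    }
```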

Phase 4: Root Cause Analysis

For each test that diverges from Linux:

1. Locate the handler: find the relevant file and function under os/StarryOS/kernel/src/syscall/
2. Read the code path: trace from the syscall dispatch in mod.rs through the handler
3. Identify the divergence: compare the code logic against the man page requirement
4. Classify the bug: dispatch the bug-triager agent to categorize it as Concurrency, Memory, Safety, Semantic, or Correctness
5. Record in known.json: update os/StarryOS/tests/known.json with the findings

Phase 5: Fix (MANDATORY review pipeline)

Implement the fix, then run it through the adaptive review pipeline. Do NOT skip any step. Do NOT report a fix as "done" until the pipeline converges. See evolve/references/review-pipeline.md for the full protocol.

Minimum rounds (always, non-negotiable):

1. Write the fix in the relevant kernel source file
2. Self-check: re-read the fix against the man page and the test output. Does it address the root cause?
3. Dispatch the kernel-reviewer agent (fresh context, no anchoring) to verify proper Rust idioms, code reuse, safety, and API consistency
4. If kernel-reviewer finds critical issues, revise the fix and restart from step 2
5. Re-run the test via the StarryOS pipeline to verify the fix
6. Re-run the Linux comparison to confirm behavior now matches
7. Run regression checks: cargo xtask clippy --package starry-kernel and cargo fmt

Additional rounds for P0/P1 bugs:

8. Independent re-derivation: dispatch a separate agent (or Codex if available) with ONLY the bug description and man page (NOT the proposed fix). Compare the independently derived fix against the proposed one.
9. If the fixes disagree, dispatch a reconciliation agent to synthesize them, then re-review
10. Record the review rounds in the strategy.json reviews section with a confidence level

Only report the fix after:

- All minimum rounds pass (steps 1-7)
- For P0/P1: at least one independent re-derivation (step 8)
- Confidence is "high" (all rounds agree, 0 regressions)
- If confidence is "medium" or "low", flag for human review; do NOT claim fixed
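The reporting conditions above amount to a small decision function. The sketch below restates them in code; the function name, its parameters, and the three return strings are illustrative assumptions rather than anything defined by the plugin.

```python
def fix_verdict(priority, minimum_rounds_passed, rederivation_done, confidence):
    """Decide how a fix may be reported, following the conditions above.

    priority: e.g. "P0", "P1", "P2"
    minimum_rounds_passed: True if steps 1-7 all passed
    rederivation_done: True if an independent re-derivation (step 8) was run
    confidence: "high", "medium", or "low"
    """
    if not minimum_rounds_passed:
        return "not done"
    if priority in ("P0", "P1") and not rederivation_done:
        return "not done"
    if confidence != "high":
        return "needs human review"
    return "fixed"
```

Only the final branch permits claiming the bug is fixed; every other outcome means the pipeline has not converged yet or a human has to look at it.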

Phase 6: Report

Generate structured artifacts for every bug found and fixed.

- Bug report: write to docs/starry-reports/bugs/BUG-NNN-<syscall>.md using the template from references/workflow.md
- Journal entry: run bash ${CLAUDE_PLUGIN_ROOT}/scripts/journal-entry.sh BUG "<title>" "<body>"
- Update known.json: mark the syscall status as fixed, buggy, broken, or stub
- Update strategy.json: record review rounds and confidence in the reviews section
- Update triage: if multiple bugs were found, dispatch the bug-triager agent for re-prioritization
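Choosing the next BUG-NNN number can be done by scanning the existing report names. This is a sketch under two assumptions: that NNN is a zero-padded three-digit sequence number, and that numbering simply continues from the highest existing report. `next_bug_report_name` is a hypothetical helper, not a harness script.

```python
import re

# Matches names such as "BUG-007-mmap.md" and captures the number.
BUG_FILE = re.compile(r"BUG-(\d+)-.+\.md$")


def next_bug_report_name(existing_names, syscall):
    """Return the next report file name, e.g. "BUG-008-futex.md"."""
    numbers = []
    for name in existing_names:
        match = BUG_FILE.match(name)
        if match:
            numbers.append(int(match.group(1)))
    next_number = max(numbers, default=0) + 1
    return f"BUG-{next_number:03d}-{syscall}.md"
```

In practice the list of names could come from `os.listdir("docs/starry-reports/bugs")`; files that do not match the pattern are ignored.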

Key File Locations

| Resource | Path |
|---|---|
| Syscall handlers | `os/StarryOS/kernel/src/syscall/` |
| Test harness header | `os/StarryOS/tests/cases/starry_test.h` |
| Test sources | `os/StarryOS/tests/cases/test_*.c` |
| Test results | `os/StarryOS/tests/results/` |
| Known bugs registry | `os/StarryOS/tests/known.json` |
| Pipeline | `${CLAUDE_PLUGIN_ROOT}/scripts/pipeline.sh --arch <arch>` |
| Bug reports | `docs/starry-reports/bugs/` |
| Work journal | `docs/starry-reports/journal.md` |

Additional Resources

Reference Files

references/workflow.md — Detailed phase procedures, bug report template, known.json schema
references/syscall-patterns.md — Common bug patterns in syscall implementations with examples from this codebase

Related Skills

audit-kernel — For bugs beyond syscalls (scheduler, memory, concurrency, signals). Use audit-kernel when the bug is in kernel internals rather than syscall behavior.

Agents

linux-comparator — Docker Linux test runner + structured comparison
kernel-reviewer — Code quality review for kernel changes
bug-triager — Bug classification and prioritization

Before Finishing

Before presenting results to the user, self-check:

Every bug finding is backed by tier 1-5 evidence. Any tier 6-7 suspicion is marked "pending hypothesis."
If a fix was proposed: self-check, kernel-reviewer, and regression-check were all run. For P0/P1: independent re-derivation was attempted.
Results are recorded: known.json updated, journal entry written, strategy.json updated.
If any of these are incomplete, finish them before responding — do not present partial work as done.
