From session-orchestrator
Runs modular quality probes adapted to the project's tech stack. Discovers issues via /discovery or session-end, then creates VCS tickets for confirmed problems.
Slash command: /session-orchestrator:discovery (model: sonnet)
Two modes of operation:
- Standalone (/discovery [scope]): Full 6-phase flow with interactive triage (Phases 0-6)
- Embedded (discovery-on-close: true): Phases 0-4 only, returns structured findings to session-end

The scope argument accepts: all (default), code, infra, ui, arch, session, audit, vault, or a comma-separated list such as code,session.
Read skills/_shared/bootstrap-gate.md and execute the gate check. If the gate is CLOSED, invoke skills/bootstrap/SKILL.md and wait for completion before proceeding. If the gate is OPEN, continue to Phase 1.
Read and parse Session Config per skills/_shared/config-reading.md. Store result as $CONFIG.
Discovery-relevant fields (parse these specifically):
- discovery-on-close, discovery-probes, discovery-exclude-paths, discovery-severity-threshold, discovery-confidence-threshold, discovery-parallelism
- test-command, typecheck-command, lint-command
- pencil, vcs, cross-repos, stale-issue-days

Detect the project's tech stack via marker file checks. Use Glob and run checks in parallel:
| Marker File(s) | Activates |
|---|---|
| package.json | JS/TS probes |
| tsconfig.json | TypeScript probes |
| requirements.txt / pyproject.toml | Python probes |
| Dockerfile / docker-compose.yml | Container probes |
| vercel.json / .vercel/ | Vercel probes |
| .github/workflows/ | GitHub CI probes |
| .gitlab-ci.yml | GitLab CI probes |
| supabase/ | Supabase probes |
| next.config.* / nuxt.config.* | SSR probes |
| tailwind.config.* | Tailwind probes |
| Pencil in Session Config | design-drift probe |
| .orchestrator/bootstrap.lock | harness-audit probe |
| .vault.yaml OR Session Config vault-integration.enabled: true | vault probes |
| package.json / requirements.txt / Cargo.toml AND Session Config slopcheck.enabled: true AND slopcheck.sources includes "discovery" | supply-chain probe (skills/discovery/probes-supply-chain.md) |
- If discovery-probes is set in config, intersect with that list
- If a scope argument was passed, restrict to that category

The audit probe activates when bootstrap.lock is present OR when discovery-probes config explicitly lists audit.
The vault probe activates when .vault.yaml is present in the repo root OR when vault-integration.enabled: true in Session Config OR when discovery-probes config explicitly lists vault.
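The marker-file rows of the table above can be sketched as a lookup. This is a minimal illustration, not the skill's implementation: the function name detectProbes, the string labels, and the shape of the entries argument (repo-relative paths, directories ending in "/") are assumptions. The config-gated rows (design-drift, vault, supply-chain) are left out because they depend on Session Config rather than on files alone.

```javascript
// Hypothetical sketch: map marker files to the probe groups they activate.
// `entries` is assumed to be a list of repo-relative paths already collected
// (for example via Glob); directories are written with a trailing "/".
const MARKERS = [
  { test: (e) => e === "package.json", probes: "JS/TS" },
  { test: (e) => e === "tsconfig.json", probes: "TypeScript" },
  { test: (e) => e === "requirements.txt" || e === "pyproject.toml", probes: "Python" },
  { test: (e) => e === "Dockerfile" || e === "docker-compose.yml", probes: "Container" },
  { test: (e) => e === "vercel.json" || e === ".vercel/", probes: "Vercel" },
  { test: (e) => e === ".github/workflows/", probes: "GitHub CI" },
  { test: (e) => e === ".gitlab-ci.yml", probes: "GitLab CI" },
  { test: (e) => e === "supabase/", probes: "Supabase" },
  { test: (e) => /^(next|nuxt)\.config\./.test(e), probes: "SSR" },
  { test: (e) => e.startsWith("tailwind.config."), probes: "Tailwind" },
];

function detectProbes(entries) {
  const active = [];
  for (const { test, probes } of MARKERS) {
    // A probe group activates as soon as any one of its markers is present.
    if (entries.some(test)) active.push(probes);
  }
  return active;
}

console.log(detectProbes(["package.json", "tsconfig.json", "next.config.mjs"]));
```

Keeping the markers in one table-like array mirrors the document's table, so adding a stack means adding one row rather than another branch.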
Default exclude paths (always apply):
node_modules/, .git/, dist/, build/, .next/, .nuxt/, coverage/

Add any paths from discovery-exclude-paths in Session Config.
VCS Reference: Detect the VCS platform per the "VCS Auto-Detection" section of the gitlab-ops skill.
Report: "Discovery: [N] probes active across [categories]. Stack: [detected]. Threshold: [severity]."
When since_ref is set (passed from the /discovery --since <git-ref> invocation):
1. Call changedFilesSince(since_ref) from scripts/lib/discovery/helpers.mjs.
2. If the result is [] (no files changed since the ref), emit:
   No files changed since <since_ref>. Skipping discovery.
   and exit with status 0. Do NOT fall back to a full-repo scan.
3. Otherwise, pass the changedFiles context to each probe agent below.

Probe exemptions: The vault-staleness probe and the harness-audit probe are EXEMPT from --since filtering. They always scan the full repository because their analysis targets metadata (vault narrative staleness, bootstrap lock state) that is not file-diff-gated. This exemption is advisory: no code enforcement is applied in this wave. The probe agents will naturally read whole-repo state; the changedFiles context they receive from --since is informational and does not restrict their glob/grep scope.
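The --since gate can be sketched as a small decision function. This is only an illustration under assumptions: planSinceRun and its return shape are hypothetical, and changedFiles stands for whatever changedFilesSince(since_ref) returned (that helper's implementation is not shown in this document).

```javascript
// Hypothetical sketch of the --since gate.
// `changedFiles` is assumed to be the array returned by changedFilesSince().
function planSinceRun(sinceRef, changedFiles) {
  if (sinceRef == null) {
    // No --since flag: run a normal full-repo discovery with no diff context.
    return { skip: false, context: null };
  }
  if (changedFiles.length === 0) {
    // Empty diff: stop here instead of falling back to a full-repo scan.
    return {
      skip: true,
      message: `No files changed since ${sinceRef}. Skipping discovery.`,
    };
  }
  // Non-empty diff: hand the list to every probe agent as informational context.
  return { skip: false, context: changedFiles };
}

console.log(planSinceRun("main", []).message);
```

The exempt probes receive the same context as the others; since the exemption is advisory, nothing in this sketch treats them differently.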
Dispatch probe agents IN PARALLEL using the Agent tool. Group by category (max $CONFIG['discovery-parallelism'] agents, default 5):
Cursor IDE: No Agent() tool available. Run probes sequentially within the current session — one category at a time. Complete each category's analysis before moving to the next.
- Vault probes (skills/discovery/probes-vault.md): invoke skills/discovery/probes/vault-staleness.mjs and skills/discovery/probes/vault-narrative-staleness.mjs directly via node. Each probe returns {findings, metrics, duration_ms}. The runner reports FINDING: blocks per finding and appends summary records to .orchestrator/metrics/vault-staleness.jsonl and vault-narrative-staleness.jsonl.
- Supply-chain probe (skills/discovery/probes-supply-chain.md): invokes skills/discovery/probes/supply-chain-slopcheck.mjs directly via node. Gated: only activates when slopcheck.enabled: true AND "discovery" is in slopcheck.sources (Session Config). The probe returns {findings, summary}. SLOP findings surface as critical, ASSUMED as medium, and LEGITIMATE packages generate no finding. See probes-supply-chain.md for invocation details and classification reference.

Each agent receives:
- probes-intro.md (confidence scoring reference) AND the category-specific probes-<category>.md file for this agent's category (include the actual grep commands/patterns in the prompt)
- If since_ref was provided and changedFiles is non-empty: the changedFiles array (informational context for per-probe filtering; enforcement of per-probe filtering is deferred to W3)
- The output format for each finding:
FINDING:
probe: <probe_name>
category: <category>
severity: <critical|high|medium|low>
file_path: <absolute path>
line_number: <number>
matched_text: <exact text from tool output>
title: <short title for the finding>
description: <1-2 sentence description>
recommended_fix: <concrete fix suggestion>
- "If a probe's activation condition is not met, skip it with: SKIPPED: <probe_name> -- "
- "If a probe command fails, skip it with: FAILED: <probe_name> -- "
- "Do NOT fabricate findings. Only report what tool output confirms."
CRITICAL: run_in_background: false for all agents.
Skip categories with no activated probes (don't dispatch empty agents).
After all probe agents complete:
Collect all FINDING: blocks from agent outputs into a unified findings list.
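Collecting the blocks could look roughly like the parser below. It is a sketch under assumptions, not the skill's own code: parseFindings is a hypothetical name, each field is assumed to fit on a single line, and a block is assumed to end at the first line that is not a known field (a blank line, a SKIPPED: or FAILED: notice, or other agent prose).

```javascript
// Hypothetical sketch: extract FINDING: blocks from one agent's raw output.
const FIELDS = new Set([
  "probe", "category", "severity", "file_path", "line_number",
  "matched_text", "title", "description", "recommended_fix",
]);

function parseFindings(output) {
  const findings = [];
  let current = null; // the block currently being filled, if any
  for (const raw of output.split("\n")) {
    const line = raw.trim();
    if (line === "FINDING:") {
      current = {};
      findings.push(current);
      continue;
    }
    if (!current) continue; // text outside any block is ignored
    const m = /^([a-z_]+):\s*(.*)$/.exec(line);
    if (m && FIELDS.has(m[1])) {
      // line_number is stored as a number, everything else as a string.
      current[m[1]] = m[1] === "line_number" ? Number(m[2]) : m[2];
    } else {
      current = null; // any unrecognised line closes the block
    }
  }
  return findings;
}

const sample = "FINDING:\nprobe: security-basics\nseverity: high\nline_number: 18\n";
console.log(parseFindings(sample));
```

Running this over every agent's output and concatenating the results gives the unified list that the verification step then works through.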
For EACH finding:
- Read the file at file_path:line_number using the Read tool
- Confirm that matched_text appears at or near that line (+/-3 lines tolerance)

For each verified finding, assign a confidence score (0-100) based on three factors:
| Factor | Low (+0) | Medium (+10) | High (+20) |
|---|---|---|---|
| Pattern specificity | Generic match (URL, TODO) | Moderate (orphaned annotation, magic number) | Specific (API key regex, eval(), SQL injection) |
| File context | Test fixture, example, seed data, docs | Utility, config, scripts | Production source, API handler, middleware |
| Historical signal | Previously dismissed as false positive | No prior data (first occurrence) | Recurring issue (confirmed in learnings.jsonl) |
Scoring rules:
- Findings with severity critical get a minimum confidence of 70; they are NEVER auto-deferred

Threshold: Read discovery-confidence-threshold from Session Config. If not configured, use the default of 60.
Annotate each finding with its confidence score for Phase 5 presentation.
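The factor table and the critical floor can be combined as in the sketch below. The starting score is not stated in this document, so it is passed in as a required base argument rather than guessed; confidenceScore and its parameter names are hypothetical, and the value 40 in the usage line is purely illustrative.

```javascript
// Hypothetical sketch of the three-factor confidence score.
// Points per factor level come from the table: Low +0, Medium +10, High +20.
const POINTS = { low: 0, medium: 10, high: 20 };

function confidenceScore({ base, specificity, fileContext, history, severity }) {
  // `base` is an assumption supplied by the caller; the document does not give it.
  let score = base + POINTS[specificity] + POINTS[fileContext] + POINTS[history];
  score = Math.max(0, Math.min(100, score)); // keep within the 0-100 range
  if (severity === "critical") {
    score = Math.max(score, 70); // critical findings never drop below 70
  }
  return score;
}

// Illustrative call with an assumed base of 40.
console.log(confidenceScore({
  base: 40, specificity: "high", fileContext: "high", history: "medium", severity: "high",
}));
```

Because the default threshold is 60 and the critical floor is 70, a critical finding stays above the default threshold however weak its other signals are.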
Two findings are duplicates if:
- Same file_path AND

Keep the higher-severity finding. Merge descriptions.
- Remove findings below discovery-severity-threshold from Session Config.
- Remove findings with confidence below discovery-confidence-threshold (default: 60). Log filtered-out findings: "Auto-dismissed N low-confidence findings (below threshold [T]). Use discovery-confidence-threshold: 0 to see all."

Group remaining findings by category for Phase 5 presentation.
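A minimal sketch of the two filters and the grouping step follows. It assumes that severities order as low < medium < high < critical and that the severity threshold is one of those four names; applyThresholds and its return shape are hypothetical.

```javascript
// Hypothetical sketch: severity filter, confidence filter, then group by category.
const SEVERITY_RANK = { low: 0, medium: 1, high: 2, critical: 3 };

function applyThresholds(findings, { severityThreshold, confidenceThreshold = 60 }) {
  const byCategory = {};
  let autoDismissed = 0;
  for (const f of findings) {
    // Drop anything less severe than the configured threshold.
    if (SEVERITY_RANK[f.severity] < SEVERITY_RANK[severityThreshold]) continue;
    // Count low-confidence findings so the log line can report them.
    if (f.confidence < confidenceThreshold) {
      autoDismissed += 1;
      continue;
    }
    (byCategory[f.category] ??= []).push(f);
  }
  const log = autoDismissed > 0
    ? `Auto-dismissed ${autoDismissed} low-confidence findings (below threshold ${confidenceThreshold}).`
    : null;
  return { byCategory, autoDismissed, log };
}

console.log(applyThresholds(
  [{ severity: "high", confidence: 72, category: "code" }],
  { severityThreshold: "medium" },
));
```

Findings removed by the severity filter are not counted as auto-dismissed here, since the log message in the text refers only to low-confidence findings.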
If in embedded mode (called from session-end): STOP HERE. Return structured findings to the caller using this schema:
Embedded mode return schema:
{
"findings": [
{"probe": "string", "category": "string", "severity": "critical|high|medium|low", "confidence": 0-100, "file": "string", "line": number, "description": "string", "recommendation": "string"}
],
"stats": {
"probes_run": number,
"findings_raw": number,
"findings_verified": number,
"false_positives": number,
"user_dismissed": 0,
"issues_created": 0,
"by_category": {"<category>": {"findings": number, "actioned": 0}}
}
}
Present both as structured data in your final output. Do not proceed to Phase 5.
Before auto-defer and before presenting any findings for triage, load the persistent discovery triage state and filter findings through it:
Call loadTriageState() from scripts/lib/discovery/triage-state.mjs (uses default path .orchestrator/metrics/discovery-triage.jsonl). Returns an empty Map if the file does not exist — no error.
Call filterFindings({ findings: verifiedFindings, stateMap }) to partition findings into three buckets:
- toShow: state is open, reopened, or no prior state entry (new findings; present for user triage)
- suppressed: state is dismissed or accepted-as-known (skip silently)
- tracked: state is promoted-to-#NNN (issue already filed; show as informational)

Emit a one-line state banner before the summary table:
Triage state: [N suppressed] suppressed (dismissed/accepted-as-known), [N tracked] tracked in existing issues. Presenting [N toShow] findings.
Omit the banner entirely if all three counts are zero (first run).
Render tracked findings as informational lines in the summary — NOT as interactive triage items:
[INFO] Finding "<title>" (<file_path>) is tracked in #<issue_id> — not re-triaged.
Continue Phase 5 triage using only toShow findings. The suppressed bucket requires no user interaction.
After the user completes triage (Steps 1-4 below), append state changes to .orchestrator/metrics/discovery-triage.jsonl via appendTriageEntry() from triage-state.mjs:
- Issue created: { fingerprint, state: 'promoted-to-#<issue_id>', issue_id: <N>, timestamp, session_id }
- Dismissed: { fingerprint, state: 'dismissed', user_decision: '<reason>', timestamp, session_id }
- Remaining findings: one { fingerprint, state: 'open', ... } entry per finding (so they re-appear next run if not yet promoted)

Before presenting findings for triage, separate by confidence threshold:
/discovery --include-deferred."

Present findings using AskUserQuestion -- NEVER plain text options. On Codex CLI where AskUserQuestion is unavailable, present as numbered Markdown lists.
Include confidence scores in the presentation:
[CRITICAL] (confidence: 85) hardcoded-values: API key found in src/config.ts:42
[HIGH] (confidence: 72) security-basics: eval() usage in src/utils/parser.ts:18
[MEDIUM] (confidence: 61) orphaned-annotations: TODO without issue in src/lib/auth.ts:55
Present a findings overview table:
## Discovery Results
Probes run: [N] | Findings verified: [N] | False positives discarded: [N]
| Category | Critical | High | Medium | Low | Total |
|----------|----------|------|--------|-----|-------|
| Code | ... | ... | ... | ... | ... |
| Infra | ... | ... | ... | ... | ... |
| UI | ... | ... | ... | ... | ... |
| Arch | ... | ... | ... | ... | ... |
| Session | ... | ... | ... | ... | ... |
For each Critical or High finding, use AskUserQuestion (on Codex CLI where AskUserQuestion is unavailable, present as numbered Markdown lists):
AskUserQuestion({
questions: [{
question: "<finding title>\n\n<file_path>:<line_number>\n```\n<matched_text with +/-3 lines context>\n```\n\n<description>\n\nRecommended fix: <recommended_fix>",
header: "<severity>",
options: [
{ label: "Create issue (<severity>)", description: "Create a priority:<severity> issue for this finding" },
{ label: "Adjust priority", description: "Create issue with different priority" },
{ label: "Dismiss -- intentional", description: "This is by design, skip" },
{ label: "Dismiss -- false positive", description: "Detection was wrong, skip" }
]
}]
})
If user selects "Adjust priority", ask which priority with another AskUserQuestion. On Codex CLI where AskUserQuestion is unavailable, present as numbered Markdown lists.
Group remaining findings by category. For each category with medium/low findings (on Codex CLI where AskUserQuestion is unavailable, present as numbered Markdown lists):
AskUserQuestion({
questions: [{
question: "[N] medium/low findings in [category]:\n\n1. [title] -- [file_path]:[line] ([severity])\n2. [title] -- [file_path]:[line] ([severity])\n...",
header: "[Category]",
options: [
{ label: "Accept all (Recommended)", description: "Create issues for all [N] findings" },
{ label: "Review individually", description: "Walk through each finding one by one" },
{ label: "Dismiss all", description: "Skip all medium/low findings in this category" }
]
}]
})
If "Review individually" selected, walk through each like Step 2.
Before creating any issues (on Codex CLI where AskUserQuestion is unavailable, present as numbered Markdown lists):
AskUserQuestion({
questions: [{
question: "Ready to create [N] issues?\n\n- [X] critical\n- [Y] high\n- [Z] medium\n- [W] low",
header: "Confirm",
options: [
{ label: "Create all [N] issues", description: "Proceed with issue creation" },
{ label: "Review list first", description: "Show full list before creating" },
{ label: "Cancel", description: "Do not create any issues" }
]
}]
})
VCS Reference: Detect the VCS platform per the "VCS Auto-Detection" section of the gitlab-ops skill. Use CLI commands per the "Common CLI Commands" section.
For each approved finding:
- Use the issue template from issue-templates.md
- Labels: type:discovery + priority:<level> + area:<inferred from category/filepath> + status:ready
- GitLab: glab issue create --title "[Discovery] <title>" --label "type:discovery,priority:<level>,area:<area>,status:ready" --description "<body>"
- GitHub: gh issue create --title "[Discovery] <title>" --label "type:discovery,priority:<level>,area:<area>,status:ready" --body "<body>"

## Discovery Report
### Summary
- Probes run: [N] across [categories]
- Raw findings: [N]
- Verified: [N] (false positives discarded: [M])
- User approved: [N]
- Issues created: [N]
### Created Issues
| # | Title | Priority | Area | Probe |
|---|-------|----------|------|-------|
| <IID> | <title> | <priority> | <area> | <probe> |
### Dismissed Findings
- [N] dismissed as intentional
- [M] dismissed as false positive
### Recommendations
- [suggestions based on finding patterns]
- Respect discovery-severity-threshold -- filter before presenting to user
- Always exclude node_modules/, .git/, dist/, build/, .next/, .nuxt/, coverage/

This phase runs only in standalone mode. Embedded mode returns findings to the caller.
After Phase 6 (Issue Creation) completes, prepare discovery statistics for session metrics:
Count totals from the triage results:
- probes_run: number of probes that were activated and executed
- findings_raw: total findings before verification
- findings_verified: findings that passed Phase 4.2 verification
- false_positives: findings discarded during verification
- user_dismissed: findings the user declined during Phase 5 triage
- issues_created: issues created in Phase 6
- by_category: per-category breakdown of findings and actioned items

Report stats summary:
Discovery stats: [probes_run] probes, [findings_raw] raw → [findings_verified] verified ([false_positives] false positives). User dismissed [user_dismissed]. Created [issues_created] issues.
These stats are available for session-end to include in sessions.jsonl under the discovery_stats field. The discovery skill does NOT write to sessions.jsonl directly — session-end handles that.
Persistent triage state prevents re-presenting the same finding on every /discovery run. State is stored in an append-only JSONL file and keyed by a stable fingerprint.
Location: .orchestrator/metrics/discovery-triage.jsonl (gitignored via .orchestrator/metrics/*.jsonl pattern — machine-local, never committed)
Format: One JSON object per line:
{"fingerprint":"aabb1122ccdd3344","state":"dismissed","user_decision":"intentional — debug log","timestamp":"2026-05-17T10:00:00.000Z","session_id":"deep-2"}
{"fingerprint":"eeff5566aabb7788","state":"promoted-to-#119","issue_id":119,"timestamp":"2026-05-17T10:01:00.000Z","session_id":"deep-2"}
computeFingerprint({probe, file, severity, ruleId}) → 16-char hex (sha256 prefix).
line_number is intentionally excluded — it drifts on refactoring without the underlying issue changing. A finding is considered "the same" as long as the probe, file path, severity, and ruleId match.
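A fingerprint with these properties could be computed as sketched below. The document only specifies the inputs and the 16-character sha256 prefix, so how the four fields are joined before hashing is an assumption, and fingerprintSketch is a hypothetical stand-in for the real computeFingerprint.

```javascript
// require() keeps this sketch runnable as a plain script; the real module is
// described as pure ESM and would use an import statement instead.
const { createHash } = require("node:crypto");

// Hypothetical sketch: stable 16-char hex fingerprint from four fields.
function fingerprintSketch({ probe, file, severity, ruleId }) {
  // Joining with a NUL separator is an assumption; it avoids ambiguity
  // between e.g. ("ab", "c") and ("a", "bc").
  const key = [probe, file, severity, ruleId].join("\u0000");
  return createHash("sha256").update(key).digest("hex").slice(0, 16);
}

const base = { probe: "security-basics", file: "src/utils/parser.ts", severity: "high", ruleId: "eval" };
// line_number is not read, so a moved line yields the same fingerprint.
console.log(fingerprintSketch({ ...base, line_number: 18 }) === fingerprintSketch({ ...base, line_number: 95 }));
```

Because severity is part of the key, re-scoring a finding from medium to high produces a new fingerprint, which is what lets a previously dismissed finding resurface.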
| State | Meaning |
|---|---|
| open | Actively needs triage or was explicitly marked for re-review |
| dismissed | User dismissed as intentional or false positive; suppressed on future runs |
| accepted-as-known | Known issue, accepted without creating a VCS issue; suppressed on future runs |
| reopened | Previously suppressed but re-surfaced by user decision; shown again |
| promoted-to-#NNN | VCS issue created; shown informational ("tracked in #NNN") on future runs |
On each /discovery run, Phase 5 loads the state file and partitions findings before presenting them:
- open or reopened → shown for triage
- dismissed or accepted-as-known → suppressed (silent; no user interaction needed)
- promoted-to-#NNN → informational line only ("tracked in #NNN")

A suppressed finding re-appears only if its fingerprint changes, i.e., the probe, file path, severity, or ruleId changes. There is no TTL on dismissed state.
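The state-aware filter can be sketched in two steps: fold the append-only JSONL into a map, then partition findings by their latest state. Both function names are hypothetical, and two points are assumptions rather than documented behavior: that the last entry for a fingerprint wins, and that each finding already carries a fingerprint property.

```javascript
// Hypothetical sketch: reduce an append-only JSONL log to the latest entry
// per fingerprint (assumes later lines override earlier ones).
function latestStates(jsonlText) {
  const map = new Map();
  for (const line of jsonlText.split("\n")) {
    if (line.trim() === "") continue; // tolerate blank and trailing lines
    const entry = JSON.parse(line);
    map.set(entry.fingerprint, entry);
  }
  return map;
}

// Hypothetical sketch: partition findings into the three buckets.
function partitionByState(findings, stateMap) {
  const toShow = [];
  const suppressed = [];
  const tracked = [];
  for (const finding of findings) {
    const state = stateMap.get(finding.fingerprint)?.state;
    if (state === "dismissed" || state === "accepted-as-known") {
      suppressed.push(finding);
    } else if (typeof state === "string" && state.startsWith("promoted-to-#")) {
      tracked.push(finding);
    } else {
      toShow.push(finding); // open, reopened, or no prior entry
    }
  }
  return { toShow, suppressed, tracked };
}

const log = '{"fingerprint":"aabb1122ccdd3344","state":"dismissed"}\n';
console.log(partitionByState([{ fingerprint: "aabb1122ccdd3344" }], latestStates(log)));
```

Treating a missing entry the same as open means a first run, where the state file does not exist yet, shows every finding.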
scripts/lib/discovery/triage-state.mjs — pure ESM, Node stdlib only. Exports:
- computeFingerprint({probe, file, severity, ruleId}): string
- loadTriageState(stateFilePath?): Promise<Map<fingerprint, entry>>
- appendTriageEntry(stateFilePath, entry): Promise<void>
- filterFindings({findings, stateMap}): {toShow, suppressed, tracked}

- Respect the scope argument: with code scope, don't scan infrastructure
- Do not write to sessions.jsonl directly; session-end handles metrics persistence