Skill

griot-compact

Run the learnings validation pipeline using griot-* subagents. Bills against pooled Claude Code subscription tokens (no ANTHROPIC_API_KEY required). Use ONLY when explicitly invoked via /griot-compact. Processes every unprocessed session-note in learnings/session-notes/, runs the four-judge tier-based panel via parallel Agent calls, mediates consensus through `bin/griot mediate-panel`, attempts up to config.rewrite.max_attempts rewrites on UNCHANGED/REGRESSED outcomes, and escalates stuck notes to griot-operator for diagnosis. After per-note processing, re-runs every existing rollup entry through a single-judge pass-check to detect regressions, then composes a markdown PR body summarizing the run (delivered via the skill response — copy-paste into a GitHub PR). All JSONL writes go through `bin/griot operator-checks log-intervention`.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/griot:griot-compact

User invocable

Model invocation disabled

Inline context

Default effort

Tool Access

This skill is limited to the following tools:

ReadWriteBashAgent

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Drive the learnings validation pipeline using griot-* subagents and

SKILL.md

782 lines · ~7.9k tokens(exceeds 5k compaction limit)

Stats

LanguageTypeScript

Parent stars0

MaintenanceExcellent

Last CommitMay 20, 2026

Actions

View Source View Plugin View on GitHub View README

Stats

Actions

Learnings Compact

Drive the learnings validation pipeline using griot-* subagents and the deterministic verbs under bin/griot. The skill itself is the orchestrator — there is no Node script delegating orchestration. Each LLM call is a subagent spawn; each parsing/tally/threshold step is a Bash invocation of a bin/griot verb; each JSONL append is a Bash invocation of bin/griot operator-checks log-intervention.

This skill is idempotent: if there are no unprocessed session-notes AND learnings/rollup.json is empty or missing, it does nothing.

Prerequisites

All inputs are repo-local. No environment variables required.
The pipeline runs on pooled Claude Code subscription tokens via subagent dispatch. There is no @anthropic-ai/sdk dependency.
learnings/config.yaml carries the panel composition, consensus thresholds, tiebreak rule, and per-role tier assignments. Read it at startup; do not re-read mid-run.

Substrate this skill composes

Five subagent role files under .claude/agents/: griot-base (shared stance), griot-judge, griot-rubric-author, griot-rewriter, griot-debate-summarizer, griot-operator.
Tier-based config in learnings/config.yaml.
bin/griot mediate-panel — pipes judge outputs through to get parsed verdicts, tally, threshold check, tier-split detection, tiebreak application.
bin/griot operator-checks — two modes: verify-rubric (rubric tampering detection, called between rewrite attempts) and log-intervention (sole writer for all JSONL appends in this pipeline: operator-log.jsonl, bench-history.jsonl, and regressions.jsonl).

The pipeline also spawns general-purpose subagents for the test subject (control and treatment outputs in §2 and §3) and for the regression suite's pass-check call (§3). These are not griot-* roles — they are plain Claude calls with role-specific system prompts and tier overrides at spawn time.

Procedure

1. Pre-flight

Read learnings/config.yaml. Extract:
- judges: array of {id, tier} (4 entries: 2 opus, 1 sonnet, 1 haiku)
- consensus: {round_1_blind, round_2_debate}
- tiebreak: {rule, top_tier}
- agents: {rubric_author.tier, rewriter.tier, debate_summarizer.tier, operator.tier}
- test_subject.tier
- rewrite.max_attempts
List learnings/session-notes/, excluding the archived/ subdir.
If there are no unprocessed notes AND learnings/rollup.json is empty or missing, tell the user "nothing to compact" and stop.
Otherwise echo the list of unprocessed notes to the user, then proceed.

Format-detection error path (substrate-cli Phase 4 rollup whiteboard skeptic Finding 1): if learnings/rollup.md is present AND learnings/rollup.json is missing, this session predates the Phase 4 rollup cutover. Stop, log an intervention via operator-checks.ts log-intervention with category format_predates_phase_4_rollup_cutover, and emit the error "session predates Phase 4 rollup cutover; restart session to pick up new skill body and run node .claude/scripts/migrate-rollup-md-to-json.ts" to the user. Do NOT fall back to writing into rollup.md — silent fallback risks splitting writes across two formats.

2. Per-note pipeline

For each unprocessed session-note (folder under learnings/session-notes/), run the steps below in order. Process notes serially; the parallelism is within each note's panel rounds, not across notes.

Step pre-A — Classification routing

Read the note's state.json sidecar via the Read tool and parse it. The classification field drives routing. The state.json sidecar was introduced in substrate-cli Phase 4; before that, classification lived as YAML frontmatter on learning.md. The migration script shipped with Phase 4 converts old-format folders to the new shape.

Format-detection error path (substrate-cli Phase 4 whiteboard skeptic Finding 1): if state.json is missing in the note folder AND learning.md starts with a --- YAML frontmatter block, this session predates the Phase 4 cutover and is running the old skill body against new on-disk state (or vice versa). Stop processing this note, log an intervention via operator-checks.ts log-intervention with category format_predates_phase_4_cutover (record: {note_slug, status: "format mismatch"}), and emit the error "session predates Phase 4 cutover; restart session to pick up new skill body and run .claude/scripts/migrate-session-notes.ts" to the user. Do NOT silently fall back to frontmatter parsing — silent fallback risks splitting writes across two formats.

If state.json is missing AND learning.md has no frontmatter, this is a pre-migration folder that needs the migration script run; emit the same error.

Branch on the classification value:

recurring — frequency-evidenced. The four-judge panel adds nothing the frequency counter hasn't already validated. Skip Steps A–B entirely. Go directly to Step C with synthetic verdict IMPROVED and proceed with the IMPROVED outcome (assign next L-NNN, push entry to learnings/rollup.json, archive).
generator-antipattern — run the full Step A + Step B panel exactly as today. The only divergence is in Step C's IMPROVED outcome: promote to learnings/rollup.json as an entry with classification: "AP", using an AP-NNN identifier scheme (compute the next ID as max(existing AP-NNN) + 1, starting at AP-001). See Step C's IMPROVED branch for the details. Post- Phase-4-rollup, antipattern entries live alongside learning entries in the same JSON array, distinguished by their classification field; the rendering layer (bin/griot use --as=llm) groups them under a ## Project antipatterns section in the prose output.
catalog-gap | evaluator-conflict | sanctioned-exception → not-yet-supported in v1. Skip Steps A–C entirely. Do NOT archive the note. Add it to a separate run-summary list under ## Not-yet-supported classifications so the PR-body output is honest about what was skipped and why.
unclassified → continue with the historical flow (Step A + Step B + Step C as today). All correction-text captures (--from-checkin path) record this classification in state.json.

This routing block reads state.json ONLY. It does not modify the note. The classification field is data the CLI authored when capturing; the SKILL trusts it.

Step A — Rubric (one-time per note)

If the note's rubric.md is missing, spawn griot-rubric-author via the Agent tool with model matching agents.rubric_author.tier from config.

The brief contains the contents of the note's correction.md and wrong.md only — deliberately NOT learning.md. The rubric must be derived from what the correction demands, not from any prose proposing a fix.

The subagent's response ends with a fenced ```rubric block containing a JSON array of 2–3 binary assertions. Extract the block and parse it. If extraction fails (no block, malformed JSON, wrong array shape), log a JSONL intervention via operator-checks.ts log-intervention with category rubric_author_failed and stop processing this note (proceed to the next session-note in the outer loop).

Otherwise write the rubric as learnings/session-notes/<note>/rubric.md in this canonical format:

# Rubric

_Immutable. Written by griot-rubric-author with fresh context. Any
attempt to modify this file is a hard violation._

- assertion 1
- assertion 2
- assertion 3 (optional)

Capture the full text of the rubric.md file you just wrote (or the existing one if it was already present) into the variable expected_rubric for the rest of this note's pipeline. This is the ground-truth rubric the rewriter is forbidden from modifying; attempt > 1 verifies on-disk against this captured value.

Initialize an empty attempts_history list. It accumulates one entry per attempt: {attempt, learning_text, verdicts}. The griot-operator consumes this list if the note ends up stuck.

Step B — Attempt loop

Iterate attempt from 1 to config.rewrite.max_attempts. Each iteration runs the full panel sequence (B.3 through B.6) and ends with a decision (B.7) to either exit the loop with a final verdict or continue to the next attempt.

B.1 — Verify rubric integrity (attempt > 1 only)

Skip on the first attempt — Step A just wrote the rubric.

On attempt > 1, invoke operator-checks.ts verify-rubric via Bash with stdin JSON:

griot operator-checks verify-rubric <<'INPUT'
{"rubric_path": "learnings/session-notes/<note>/rubric.md",
 "expected": "<full text of expected_rubric, JSON-escaped>"}
INPUT

Branch on the result:

Stdout {"ok": true} → continue to B.2.
Stdout {"ok": false, "actual": <text>} → the rubric on disk has been modified since Step A. Log a JSONL intervention via operator-checks.ts log-intervention with category rubric_tampered (record includes note_slug, attempt, short excerpts of expected and actual). Exit the attempt loop with synthetic verdict RUBRIC_TAMPERED.
Script-level error (non-zero exit) → the rubric file is missing or unreadable. Log an intervention with category rubric_tampered and a note in the record explaining "rubric file missing or unreadable". Exit the attempt loop with synthetic verdict RUBRIC_TAMPERED.

B.2 — Spawn rewriter (attempt > 1 only)

Skip on the first attempt — the candidate learning under evaluation is the one already in the note's learning.md.

On attempt > 1, spawn griot-rewriter via the Agent tool with model matching agents.rewriter.tier. The brief contains, in order:

Attempt number and config.rewrite.max_attempts.
Origin prompt (note's prompt.md).
Correction (note's correction.md).
Current learning text (note's learning.md as-is from the prior attempt).
Immutable rubric (expected_rubric from Step A).
Last panel reasoning: each non-errored judge's judge_id, tier, verdict, and reasoning from the prior attempt's final round.

The subagent's response ends with a fenced ```learning block containing the revised learning body. Extract the block. If extraction fails (no block, wrong shape), log a JSONL intervention with category rewriter_failed and exit the attempt loop with synthetic verdict REWRITER_FAILED.

Otherwise overwrite the note's learning.md with the revised text. Steps B.3 onward will use this updated learning.

B.3 — Generate control + treatment

In a single tool-use message, spawn two general-purpose subagents in parallel, both with model matching test_subject.tier:

Control: prompt is the contents of the note's prompt.md. System-style framing: "You are Claude, helping with a software task in the aart.camp repo." (no learning injected).
Treatment: prompt is the same prompt.md. System-style framing: "You are Claude, helping with a software task in the aart.camp repo. Before answering, consider this validated learning and apply it if relevant: <learning> plus <contents of learning.md> plus </learning>."

Capture both subagents' full text outputs as control_output and treatment_output.

If either subagent errors or returns no usable text, log a JSONL intervention via operator-checks.ts log-intervention with category test_subject_failed, the note slug, attempt number, and error text. Exit the attempt loop with synthetic verdict TEST_SUBJECT_FAILED.

B.4 — Round 1 panel

In a single tool-use message, spawn 4 griot-judge subagents in parallel — one per entry in the judges array. Each spawn's model parameter matches that judge's tier.

The brief for each judge contains, in this order:

Origin prompt (note's prompt.md)
Correction (note's correction.md)
Candidate learning (note's learning.md, as updated by B.2 on attempt > 1)
Rubric (expected_rubric)
Control output (from B.3)
Treatment output (from B.3)

The judge's response ends with a fenced ```verdict block. Capture each judge's full text as raw_output[i], paired with judges[i].id and judges[i].tier, preserving order.

B.5 — Mediate round 1

Pipe the four raw outputs through bin/griot mediate-panel via Bash with stdin JSON:

griot mediate-panel <<'INPUT'
{
  "round_num": 1,
  "verdicts": [
    {"judge_id": "opus-A", "tier": "opus", "raw_output": "..."},
    {"judge_id": "opus-B", "tier": "opus", "raw_output": "..."},
    {"judge_id": "sonnet", "tier": "sonnet", "raw_output": "..."},
    {"judge_id": "haiku", "tier": "haiku", "raw_output": "..."}
  ],
  "config": {
    "consensus": {"round_1_blind": 4, "round_2_debate": 3},
    "tiebreak": {"rule": "top_tier_consensus", "top_tier": "opus"}
  }
}
INPUT

Parse the JSON output. Save the parsed verdicts array into attempts_history for this attempt. If threshold_met is true, this attempt's verdict is consensus_verdict — proceed to B.7. If threshold_met is false, continue to B.6.

B.6 — Round 2 panel (only when round 1 didn't reach unanimity)

Spawn griot-debate-summarizer with model matching agents.debate_summarizer.tier. The brief contains:
- The round number that just completed (1).
- For each non-errored judge from round 1: their judge_id, tier, verdict, and reasoning text. The summarizer's response ends with a fenced ```summary block. Extract the summary text.
Spawn 4 griot-judge subagents again (same as B.4), but append the summary to each judge's brief under a clearly- delimited "## Other judges' reasoning from the previous round" section. Each judge re-evaluates with full knowledge of the prior round's positions.
Pipe the round 2 outputs through mediate-panel.ts with round_num: 2. Save the parsed verdicts array into attempts_history for this attempt (overwriting the round 1 entry — the latest round is what the rewriter and operator consume).
Branch on the result:
- threshold_met == true → this attempt's verdict is consensus_verdict. Proceed to B.7.
- threshold_met == false AND tiebreak_applied == true → this attempt's verdict is tiebreak_verdict. Proceed to B.7.
- threshold_met == false AND tiebreak_applied == false → no consensus and no tiebreak fired. Log a JSONL intervention via operator-checks.ts log-intervention with category no_consensus, the note slug, attempt number, and both rounds' tallies. Exit the attempt loop with synthetic verdict NO_CONSENSUS.

B.7 — Decide attempt outcome

Branch on the verdict for this attempt (set in B.5 or B.6):

IMPROVED → exit the attempt loop with verdict IMPROVED. Record attempt_count (= current attempt) for the run summary.
DID_NOT_REPRODUCE → exit the attempt loop with verdict DID_NOT_REPRODUCE.
NO_CONSENSUS → already handled in B.6 (loop exited).
UNCHANGED or REGRESSED:
- If attempt < config.rewrite.max_attempts, increment attempt and continue to the next iteration (back to B.1).
- If attempt == config.rewrite.max_attempts, exit the attempt loop with synthetic verdict STUCK_LEARNING. The attempts_history list now contains every attempt; pass it through to Step C.

Step C — Outcome handling

Branch on the final verdict reached when the attempt loop exited.

IMPROVED:

Branch on classification (read in Step pre-A):

Promotion now writes to learnings/rollup.json (post-Phase-4-rollup cutover): the rollup is a JSON array of entries. Promotion reads the array, computes the next ID, pushes a new entry object, and writes the array back. The legacy markdown-append path is removed.

generator-antipattern → promote to the rollup as an AP-NNN entry:
1. Read learnings/rollup.json via the Read tool (default to [] if missing). Find the maximum existing AP-NNN ID across all entries with classification: "AP". The new ID is one more, zero-padded to three digits. If no AP-NNN entries exist yet, the first ID is AP-001. The AP-NNN and L-NNN namespaces are independent — do not conflate.
2. Push a new entry to the array with the shape:
```
{
  "id": "AP-<NNN>",
  "title": "<short title derived from learning.md>",
  "classification": "AP",
  "promoted": "<today's date in YYYY-MM-DD>",
  "origin": "<note's slug>",
  "body": "<full contents of note's learning.md, verbatim>",
  "rubric": null,
  "evaluator": "<state.json `evaluator` field>",
  "code": "<state.json `code` field>"
}
```
3. Write the updated array back to learnings/rollup.json via the Write tool (JSON.stringify(entries, null, 2) with a trailing newline — matches the migration script's output shape).
4. Move the session-note folder to archived/ (same as the learnings path).
5. Mark this note's outcome as "promoted as AP-NNN on attempt N (generator-antipattern)" for the run summary.
recurring | unclassified → promote to the rollup as an L-NNN entry:
1. Read learnings/rollup.json via the Read tool (default to [] if missing). Find the maximum existing L-NNN ID across all entries with classification: "L"; the new ID is one more, zero-padded to three digits. If no L-NNN entries exist yet, the first ID is L-001.
2. Push a new entry to the array with the shape:
```
{
  "id": "L-<NNN>",
  "title": "<short title derived from learning.md>",
  "classification": "L",
  "promoted": "<today's date in YYYY-MM-DD>",
  "origin": "<note's slug>",
  "body": "<full contents of note's learning.md, verbatim — post-Phase 4, learning.md is pure prose>",
  "rubric": [<expected_rubric criteria as an array of strings>, or null for `recurring` notes (panel skipped)]
}
```
  The "short title" is the first line of learning.md if it reads as a title, or a 3–5 word distillation of the learning's first sentence otherwise.
3. Write the updated array back to learnings/rollup.json via the Write tool.
4. Move the session-note folder from learnings/session-notes/<note>/ to learnings/session-notes/archived/<note>/ via Bash.
5. Mark this note's outcome as "promoted as L-NNN on attempt N" (or "promoted as L-NNN (recurring, no panel)" for the recurring branch) for the run summary.

DID_NOT_REPRODUCE:

Log a JSONL intervention via operator-checks.ts log-intervention with category did_not_reproduce, the note slug, attempt count, and the round tallies.
Move the session-note folder to archived/.
Mark this note's outcome as "skipped (did not reproduce)".

NO_CONSENSUS:

Already logged in B.6 — no additional log here.
Do not archive — the note is left in place for human review.
Mark this note's outcome as "escalated (no consensus)" for the run summary.

RUBRIC_TAMPERED or REWRITER_FAILED or TEST_SUBJECT_FAILED:

Already logged in B.1 / B.2 / B.3 — no additional log here.
Do not archive — the note is left in place for human review.
Mark this note's outcome as escalated (<reason>) (e.g. escalated (rubric tampered)).

STUCK_LEARNING:

Spawn griot-operator via the Agent tool with model matching agents.operator.tier. The brief contains:
- Immutable rubric (expected_rubric).
- Origin prompt (note's prompt.md).
- Correction (note's correction.md).
- Full attempts_history: for each attempt, the learning text that was tried and the per-judge verdicts (with pass/fail counts on control and treatment if available).
The operator's response ends with a fenced ```diagnosis block containing JSON {category, notes} where category is one of:
- same_assertion_fails_every_attempt
- different_assertions_fail_each_attempt
- control_and_treatment_always_identical
- other Extract and parse the block.
Log a JSONL intervention via operator-checks.ts log-intervention with category stuck_learning, record shape {note_slug, attempts_count: max_attempts, diagnosis: <parsed object>}.
Do not archive — the note is left in place for human review.
Mark this note's outcome as escalated (stuck: <diagnosis.category>).

3. Regression suite

After all session-notes are processed, re-run every existing rollup entry through a single-judge pass-check to detect regressions: entries that were passing in the prior run and now fail. The check is intentionally lighter than the per-note panel — one treatment generation plus one binary pass-check per entry — to keep cost reasonable as the rollup grows.

If learnings/rollup.json is empty or missing, skip §3 entirely (set the regression counters to 0 and current_passing_ids to the empty set).

Step R.1 — Read prior passing set

Read the most recent line from learnings/bench-history.jsonl (if the file exists). Parse it as JSON and extract the passing_ids field (default empty array if missing or unparseable). This prior_passing_ids set drives "new failure" detection.

If learnings/bench-history.jsonl does not exist, treat prior_passing_ids as the empty set — every entry is "new" on the first run, so no regression flags fire.

Step R.2 — Parse rollup

Read learnings/rollup.json via the Read tool and parse the array. Each entry is already structured as {id, title, classification, promoted, origin, body, rubric, ...}. For each entry, capture {id, origin_slug: entry.origin, rubric: entry.rubric}. The pre-Phase-4-rollup markdown-parse step is gone — the rollup is now machine-format JSON.

Initialize counters: regressions_passing = 0, regressions_failing = 0, regressions_new = 0. Initialize current_passing_ids as an empty list. Initialize new_regression_records as an empty list (used in §4).

Step R.3 — Per-entry pass check

Process entries serially (rate-limit hygiene; the rollup can grow large). For each entry:

Read the archived prompt. Look for learnings/session-notes/archived/<origin_slug>/prompt.md. If it does not exist, skip this entry — it cannot be regression-tested without an origin prompt. Do not increment any counter; the entry simply doesn't appear in this run's regression record.
Generate treatment with full rollup. Spawn ONE general-purpose subagent with model matching test_subject.tier. System-style framing: "You are Claude. Apply the validated learnings below where relevant: <stdout of bin/griot use --as=llm>" (the CLI renders rollup.json to LLM prose; pipe its output into the framing). User prompt: the archived prompt.md. Capture the output as treatment_output. If the spawn errors or returns no text, log a JSONL intervention via operator-checks.ts log-intervention with category regression_check_failed and skip this entry.
Run the pass check. Spawn ONE general-purpose subagent with model: opus (the SDK pinned this to top-tier; we keep it). System-style framing:

"You evaluate whether a Claude output passes every assertion in a rubric. Binary judgement, no hedging. End your response with a fenced regression-check block containing JSON {all_pass: boolean, failed_assertions: string[], reasoning: string}."

User message contains the entry's rubric and the treatment output, clearly labelled. The subagent's response ends with a fenced ```regression-check block. Extract and parse the block. If extraction fails, log a JSONL intervention via log-intervention with category regression_check_failed (record includes learning_id) and skip this entry.
Record the result.
- If all_pass: true → increment regressions_passing and add id to current_passing_ids.
- If all_pass: false:
  - Increment regressions_failing.
  - If id was in prior_passing_ids → this is a new regression. Log a JSONL intervention via operator-checks.ts log-intervention:
```
griot operator-checks log-intervention <<'INPUT'
{"log_path": "learnings/regressions.jsonl",
 "record": {
   "ts": "<ISO 8601 timestamp>",
   "category": "regression",
   "learning_id": "L-NNN",
   "failed_assertions": ["..."],
   "reasoning": "...",
   "was_passing_in_prior_run": true
 }}
INPUT
```
    Increment regressions_new and append a summary record {id, failed_count: <length>, reasoning} to new_regression_records for §4.
  - If id was NOT in prior_passing_ids → chronic failure. Do NOT log (avoids per-run noise on entries that have never passed). The chronic failure still counts toward regressions_failing.

After all entries are processed, the regression counters and current_passing_ids are ready for §4 (PR body) and §5 (bench-history append).

4. Compose nightly PR body

Compose a markdown blob synthesizing the run's results and emit it to the user via the skill response. No file write — the chat output is the delivery channel.

Blob shape (always include every section, with (none) in empty lists for predictable structure):

# Nightly /griot-compact run — <today's date YYYY-MM-DD>

## Run summary
- Notes processed: <N>
- Promoted: <N>
- Did not reproduce: <N>
- Escalated: <N> (stuck: <S>, rubric tampered: <T>, rewriter
  failed: <RF>, test subject failed: <TSF>, no consensus: <NC>)
- Rollup entries checked: <regressions_total>
  (passing: <regressions_passing>, failing:
  <regressions_failing>, new regressions: <regressions_new>)

## Promoted learnings
- L-NNN — "<short title>" (from <slug>) — attempt <M>
[...for each note marked promoted; or "(none)"]

## Escalations needing review
- <slug>: <reason>
[...for each escalation; or "(none)"]

## New regressions
- L-NNN — <count> failed assertions
[...for each entry in new_regression_records; or "(none)"]

## Run metadata
\`<bench-history record JSON, single line>\`

Where regressions_total = regressions_passing + regressions_failing (entries successfully processed; entries skipped for missing prompts are not counted). The bench-history record JSON in the ## Run metadata section is the same record that §5 will append — it can be assembled in working state here and reused.

After emitting the blob, add a one-line pointer below it: "Copy the section above into a GitHub PR. Diagnosis detail is in learnings/operator-log.jsonl and learnings/regressions.jsonl."

5. End-of-run housekeeping

Append the run record to learnings/bench-history.jsonl via operator-checks.ts log-intervention:

griot operator-checks log-intervention <<'INPUT'
{"log_path": "learnings/bench-history.jsonl",
 "record": {
   "ts": "<ISO 8601 timestamp>",
   "notes_processed": <int>,
   "notes_promoted": <int>,
   "notes_did_not_reproduce": <int>,
   "notes_no_consensus": <int>,
   "notes_escalated": <int>,
   "regressions_total": <int>,
   "regressions_passing": <int>,
   "regressions_failing": <int>,
   "regressions_new": <int>,
   "passing_ids": ["L-001", "L-002", "..."],
   "run_kind": "compact"
 }}
INPUT

notes_escalated is the sum of STUCK_LEARNING, RUBRIC_TAMPERED, REWRITER_FAILED, and TEST_SUBJECT_FAILED outcomes. (NO_CONSENSUS has its own counter.) passing_ids is the sorted array form of current_passing_ids from §3 — the next run reads this field to decide which failures are NEW vs chronic.

This append is the skill's last action. The PR body in §4 is already in the user's chat; §5 just persists the record.

Do not

Do not invoke this skill automatically. The user triggers it.
Do not edit rubric.md files post-write. They are immutable after griot-rubric-author writes them. Tampering is detected programmatically by operator-checks.ts verify-rubric between rewrite attempts; a tampered rubric exits the attempt loop and escalates.
Do not skip the rubric integrity check on attempt > 1. The check is the only thing standing between the rewriter and rubric drift; bypassing it defeats the immutability guarantee.
Do not hand-promote a learning to rollup.json. Only IMPROVED via the judge panel gets in. Round-1 unanimity OR round-2 supermajority OR round-2 tiebreak — anything else does not promote.
Do not append to learnings/operator-log.jsonl or learnings/bench-history.jsonl directly. Every JSONL write goes through operator-checks.ts log-intervention. No raw >> redirects, no echo to JSONL paths.
Do not import @anthropic-ai/sdk or call the Anthropic API directly. The dependency was removed in Phase 3; the only path is pooled tokens via subagent dispatch.
Do not iterate the attempt loop past config.rewrite.max_attempts. The loop's exit conditions are exhaustive: a final verdict at any attempt that isn't UNCHANGED or REGRESSED exits immediately; UNCHANGED/REGRESSED exits at max_attempts with STUCK_LEARNING.
Do not skip the regression suite when rollup.json has entries. The pass-check is what detects model drift between runs; an empty rollup is the only valid skip condition.
Do not log a chronic regression failure (a rollup entry that was failing the prior run and is still failing now). Only NEW failures — entries that flipped from passing to failing — warrant a record in regressions.jsonl. Chronic failures count toward regressions_failing but do not log.
Do not write the nightly PR body to disk. The user's chat output is the delivery channel; persisting to a file is out of scope for this skill.
Do not use the full 4-judge panel for regression checks. The regression suite uses a single general-purpose subagent for treatment generation and a single general-purpose subagent (opus tier) for the pass-check. The full panel is reserved for per-note judgement; using it for regression doubles+ the cost without proportional signal.

After a successful run

The post-run summary is delivered via the skill response — copy-paste into a GitHub PR. Trend / cost / judge-calibration reporting was previously offered by the /griot-report wrapper skill, which was removed in substrate-cli Phase 1 once bin/griot made the wrapping unnecessary; future reporting can read bench-history.jsonl and operator-log.jsonl directly.

griot-compact

Invocation

Tool Access

Context Preview

SKILL.md

griot-compact

Invocation

Tool Access

Context Preview

SKILL.md

Learnings Compact

Prerequisites

Substrate this skill composes

Procedure

1. Pre-flight

2. Per-note pipeline

Step pre-A — Classification routing

Step A — Rubric (one-time per note)

Step B — Attempt loop

B.1 — Verify rubric integrity (attempt > 1 only)

B.2 — Spawn rewriter (attempt > 1 only)

B.3 — Generate control + treatment

B.4 — Round 1 panel

B.5 — Mediate round 1

B.6 — Round 2 panel (only when round 1 didn't reach unanimity)

B.7 — Decide attempt outcome

Step C — Outcome handling

3. Regression suite

Step R.1 — Read prior passing set

Step R.2 — Parse rollup

Step R.3 — Per-entry pass check

4. Compose nightly PR body

5. End-of-run housekeeping

Do not

After a successful run

Similar Skills

Learnings Compact

Prerequisites

Substrate this skill composes

Procedure

1. Pre-flight

2. Per-note pipeline

Step pre-A — Classification routing

Step A — Rubric (one-time per note)

Step B — Attempt loop

B.1 — Verify rubric integrity (attempt > 1 only)

B.2 — Spawn rewriter (attempt > 1 only)

B.3 — Generate control + treatment

B.4 — Round 1 panel

B.5 — Mediate round 1

B.6 — Round 2 panel (only when round 1 didn't reach unanimity)

B.7 — Decide attempt outcome

Step C — Outcome handling

3. Regression suite

Step R.1 — Read prior passing set

Step R.2 — Parse rollup

Step R.3 — Per-entry pass check

4. Compose nightly PR body

5. End-of-run housekeeping

Do not

After a successful run

Similar Skills