From nuclear-grade
Records an agent run's tool calls, decisions, inputs, outputs, token use, and approval steps as repeatable evidence for debugging, auditing, cost review, or release decisions.
Slash command: /nuclear-grade:recording-what-an-agent-did
A "does it work" check proves what an agent produced, not how it got there. Sometimes how it got there matters: for debugging, for auditing, for reviewing cost, or for defending a release decision. This skill says what to record about the run, how much detail to capture, and how to link it into the packet's `trace.md` and `verification.md` as evidence someone else could reproduce.
`basis.md` granted.
`basis.md` and `plan.md` -> step-level trace rows, decision-point and approval records, and a token/delay summary in `trace.md`/`verification.md`, linked to `ship.md`.
`ship.md` decision relies on.
`ship.md`; a power breach or unexpected side effect escalates to pause/incident.
`basis.md` (the scope the run was meant to stay in, the allowed actions, and the stop conditions).
`plan.md` (the planned order of steps).
pass, gap, fail, or not applicable.
`verification.md` that the evidence supports. When a tracing platform already holds the run, link to its export rather than copying it (an OpenAI trace, a LangSmith run, a Claude Code session log, a GitHub Actions run, a local command transcript, or an MCP/tool-call export) and record the link and its trace id in `trace.md`. The packet holds the link and the verified facts; the platform holds the raw spans.
`trace.md` or `verification.md`: step, action, inputs, outputs, evidence status.
`ship.md`.
`verification.md`.
`python tools/ng.py validate .nuclear/changes/<slug>` passes.
`basis.md`.
Trace this agent run and produce clear evidence.
Inputs:
- packet: .nuclear/changes/<slug>/
- execution source: <log / transcript / tool-call export>
- authority scope: <basis.md section or inline>
- token/latency data available: <yes/no>
- approval gates exercised: <list or none>
For each consequential step (tool call, file edit, command run, API call,
approval gate):
- Name the action and the tool.
- Record the inputs (shortened) and the output or result.
- Set an evidence status: pass, gap, fail, or not applicable.
- At decision points: record the choice made, the limit applied, and the authority check.
- For approval gates: record the reviewer, the date, and the decision.
Return:
- trace rows for trace.md: step, action, inputs, outputs, evidence status.
- the decision-point records.
- a summary of token use and latency (if available).
- a run summary: steps within scope, steps uncertain, and gaps.
- a link from each trace row to the claim in verification.md it supports.
This skill is an original run-evidence workflow for AI agents, influenced by W&B Weave trace-tree observability (span-per-call, auto-logging, audit lineage), the NVIDIA NeMo Agent Toolkit profiling model (token, latency, and cost captured per step), and OpenTelemetry distributed tracing concepts (structured spans, parent-child relationships, reproducible records), all mapped as supporting context in docs/00-standards-foundation/source-map.md. It does not create formal audit assurance, security certification, compliance, or regulatory adequacy. A run trace is a focused engineering record, not a formal audit trail.
npx claudepluginhub flyfission/nuclear-grade-context-engineering --plugin nuclear-grade