Skill

agent-introspector

Self-introspection, self-debug, reasoning analysis, decision trace, internal state inspection. Use when diagnosing agent behavior or unexpected reasoning.

Invocation

Slash command: /agent-platform-pro:agent-introspector

Context Preview

This skill allows an AI agent to systematically examine its own reasoning process, decision chain, tool calls, and context state. Inspired by the Agent Introspection & Debugging approach of ECC, the agent maintains a log of everything it does, analyzes it transparently, and detects/resolves errors, inefficiencies, and contradictions. It operates on the principle of "knowing oneself."

SKILL.md

259 lines · ~3.5k tokens

Stats

Language: Python

Parent stars: 0

Maintenance: Good

Last commit: May 25, 2026

Agent Introspector — Agent Introspection & Debugging

Overview

Core Principle: Trust no claim without evidence. Every decision, tool call, and inference must be auditable and verifiable.

The Iron Law

DO NOT ACCEPT ANY DECISION, CLAIM, OR TOOL CALL WITHOUT AN EXPLICIT SOURCE.
DO NOT ASSUME A CLAIM IS CORRECT UNTIL PROVED WITH SOURCE FILES, LOGS, OR TOOL OUTPUTS.

When to Use

Always use when:

The agent produces unexpected or erroneous output.
You need to understand how the agent arrived at a particular decision.
You suspect inefficiencies or redundant loops in tool calls.
You suspect hallucinations (made-up information).
You want to optimize token usage.
You suspect important information has been lost in the context window.
You receive repetitive errors on the same issue.
You are investigating why a task takes longer than expected.

Especially use when:

The agent claims to be "sure" but provides no source — it is likely hallucinating.
You tell yourself "this time is different" while repeating the same pattern as before.
You feel something is going wrong in the middle of a multi-step, complex task.
Under time pressure, you feel the urge to rush through, thinking "it must be correct anyway."

Skip ONLY when:

You are at the very beginning of a task and no tool calls have been made yet.
The agent has not yet made a single decision.
The operations are simple, one-step, and completely unquestionable.

Phase 1: Decision Trace Analysis

Before proceeding: Uncover the entire chain of a specific decision made by the agent.

Define the Decision — Which decision are you auditing? Define it clearly: "I am examining decision X because of Y."
Rewind the Reasoning Chain — List all intermediate steps leading to the decision. For each step: what information did it rely on, what assumption was made, and what alternative was evaluated?
Identify Weak Links — Assign a confidence score (1-10) to each step in the chain. Mark steps scored below 7 as red flags.
Re-evaluate the Decision — If there are weak links, what would have happened if a different decision had been made at that point? Simulate the alternative path.

QUERY FORMAT:
Decision: [description]
Reasoning Chain:
  Step 1: [information/source] → [inference] (confidence: X/10)
  Step 2: [information/source] → [inference] (confidence: X/10)
  ...
Weak Links: [list]
Alternative Scenario: [what could have been done differently?]
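
The query format above can be held in a small data structure so that weak links are flagged mechanically. This is a minimal sketch, not part of the skill itself: the `Step` class, the `weak_links` helper, and the sample chain are hypothetical names and data chosen for illustration. The threshold of 7 is taken from the "Identify Weak Links" step.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One link in a reasoning chain (hypothetical structure for illustration)."""
    source: str       # information the step relied on
    inference: str    # what was concluded from it
    confidence: int   # 1-10, as in the query format


def weak_links(chain: list[Step], threshold: int = 7) -> list[Step]:
    """Return the steps scored below the threshold (the red flags)."""
    return [step for step in chain if step.confidence < threshold]


# Made-up example chain: one sourced step, one recalled from memory.
chain = [
    Step("config.yaml read via tool", "service listens on port 8080", 9),
    Step("no source, recalled from memory", "port 8080 is free on the host", 4),
]

for step in weak_links(chain):
    print(f"RED FLAG: {step.inference} (confidence: {step.confidence}/10)")
```

A step scored exactly 7 is not flagged, since the text marks only scores below 7.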

Phase 2: Tool Call Audit

Before proceeding: Systematically review all tool calls made by the agent.

List All Tool Calls — For each call: which tool, which inputs, which outputs, and how long did it take?
Detect Redundant Calls — Are there calls requesting the same data twice, returning outputs that are never used, or repeating an already completed operation?
Mark Inefficient Calls — Could the same result have been achieved with fewer calls? Could more specific parameters have been used?
Analyze Failed Calls — Why did calls returning errors fail? Was the error message interpreted correctly? Was the retry strategy appropriate?

Efficiency Metrics:

Call success rate: X%
Percentage of redundant calls: X%
Average call duration: X seconds
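
The three metrics above can be computed from a simple call log. The sketch below is only an illustration under stated assumptions: the `ToolCall` record, the `audit` function, and the tool names are hypothetical, and "redundant" is narrowed here to an exact repeat of the same tool with the same inputs. Unused outputs and repeated operations, which the text also counts as redundant, would need extra bookkeeping.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    """One logged tool call (hypothetical record layout)."""
    tool: str
    inputs: str
    ok: bool
    seconds: float


def audit(calls: list[ToolCall]) -> dict[str, float]:
    """Compute the three efficiency metrics for a non-empty call log."""
    total = len(calls)
    # Assumption: a call is redundant if an identical (tool, inputs) pair occurred before.
    seen = Counter((c.tool, c.inputs) for c in calls)
    redundant = sum(count - 1 for count in seen.values())
    return {
        "success_rate_pct": 100 * sum(c.ok for c in calls) / total,
        "redundant_pct": 100 * redundant / total,
        "avg_seconds": sum(c.seconds for c in calls) / total,
    }


# Made-up log for illustration.
calls = [
    ToolCall("read_file", "src/app.py", True, 0.4),
    ToolCall("read_file", "src/app.py", True, 0.4),   # same request repeated
    ToolCall("run_tests", "tests/", False, 3.0),
    ToolCall("run_tests", "tests/ -k login", True, 1.2),
]
print(audit(calls))
```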

Phase 3: Context Window Inspection

Before proceeding: Determine what is currently in the agent's context, what has been compacted/summarized, and what is missing.

Dump Active Context Content — What information is currently in the active memory/context?
Detect Compacted/Deleted Content — What information has been compacted or summarized out of the context in previous steps? Could an important detail have been lost?
Identify Missing Information — Is there information missing from the context that is required to make a correct decision?
Context Refresh Strategy — Which tool call should be made to retrieve the missing information? Or is the summary sufficient?

CONTEXT STATUS REPORT:
Active context size: ~X tokens
Number of compacted details: X
Critical details that might be missing: [list]
Refresh recommendation: [which tool/required step]

Phase 4: Token Usage Breakdown

Before proceeding: Break down token consumption by phase, tool, and file.

Token Consumption by Phase — How many tokens were spent in each phase/subtask?
Token Consumption by Tool — Which tool consumed the most tokens? Is it input-heavy or output-heavy?
Token Consumption by File — What is the token cost of the files read/written?
Optimization Opportunities — Which calls can be consolidated? Which files can be read using only specific line ranges?

TOKEN REPORT:
Total tokens: X (input: X, output: X)
Most expensive tool: [name] — X tokens
Most expensive phase: [name] — X tokens
Savings potential: ~X tokens (Y%)
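
If per-call token counts are available, the breakdown by phase and by tool is a grouping exercise. The following is a rough sketch: the record layout, the `totals_by` helper, and all numbers are invented for illustration, and how token counts are actually obtained depends on the platform.

```python
from collections import defaultdict

# Each record: (phase, tool, input_tokens, output_tokens). Values are made up.
usage = [
    ("decision trace", "read_file", 1200, 150),
    ("decision trace", "grep", 300, 80),
    ("tool audit", "read_file", 2500, 200),
]


def totals_by(records, key_index):
    """Sum input + output tokens grouped by one column (0 = phase, 1 = tool)."""
    totals = defaultdict(int)
    for record in records:
        totals[record[key_index]] += record[2] + record[3]
    return dict(totals)


by_phase = totals_by(usage, 0)
by_tool = totals_by(usage, 1)
total = sum(by_phase.values())
top_tool = max(by_tool, key=by_tool.get)

print(f"Total tokens: {total}")
print(f"Most expensive tool: {top_tool} ({by_tool[top_tool]} tokens)")
```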

Phase 5: Hallucination Detection

Before proceeding: Cross-verify every claim made by the agent using source files, tool outputs, and logs.

List Claims — Write down every concrete claim made by the agent one by one (e.g., "Function X exists in the file", "API Y returns Z").
Verify Sources — For each claim: open the source file, find the relevant line, and check if it matches the claim.
Consistency Check — Do claims contradict one another? Are two different things being said about the same topic in different places?
Verification Report — List the claims that passed or failed verification. Write corrective recommendations for failed claims.

HALLUCINATION REPORT:
Total claims: X
Verified: X
Refuted: X
Unverified (insufficient sources): X
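
One kind of claim mentioned above, "Function X exists in the file", can be checked mechanically when the file is Python source. The sketch below uses the standard-library `ast` module; the `defined_functions` helper, the sample source, and the claim list are hypothetical and stand in for whatever the agent actually asserted.

```python
import ast

# Stand-in for a file the agent made claims about.
source = '''
def load_config(path):
    return open(path).read()
'''


def defined_functions(code: str) -> set[str]:
    """Collect the names of all functions defined anywhere in the code."""
    tree = ast.parse(code)
    return {
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }


# Each entry stands for the claim "function <name> exists in the file".
claims = ["load_config", "save_config"]
found = defined_functions(source)
report = {name: ("verified" if name in found else "refuted") for name in claims}
print(report)
```

Claims that cannot be checked against any source belong in the "Unverified" count rather than being marked refuted.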

Phase 6: Circular Reasoning Detection

Before proceeding: Check for circular or looping logic in the decision-making process.

Look for Repetitive Patterns — Has the same tool call been made twice? Has the same decision been audited twice?
Extract Response Dependency Graph — Does decision A depend on B, B depend on C, and C depend back on A?
Define a Loop-Breaking Strategy — If a loop exists, at which step can it be broken using external input (user confirmation, new tool call, static data)?
Document the Loop — The loop's starting point, number of repetitions, and token cost.
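
The dependency check described above (A depends on B, B on C, C back on A) is cycle detection in a directed graph. Here is a minimal depth-first sketch, assuming the dependencies have already been written down as a dictionary; the `find_cycle` name and the graph layout are illustrative choices, not something the skill prescribes.

```python
from typing import Optional


def find_cycle(graph: dict[str, list[str]]) -> Optional[list[str]]:
    """Return one dependency cycle as a list of nodes, or None if there is none."""
    visiting: list[str] = []   # nodes on the current depth-first path
    done: set[str] = set()     # nodes fully explored without finding a cycle

    def visit(node: str) -> Optional[list[str]]:
        if node in visiting:
            # The path has returned to a node it already contains: close the loop.
            return visiting[visiting.index(node):] + [node]
        if node in done:
            return None
        visiting.append(node)
        for dep in graph.get(node, []):
            cycle = visit(dep)
            if cycle:
                return cycle
        visiting.pop()
        done.add(node)
        return None

    for start in graph:
        cycle = visit(start)
        if cycle:
            return cycle
    return None


# Decision A depends on B, B on C, and C back on A.
deps = {"A": ["B"], "B": ["C"], "C": ["A"]}
print(find_cycle(deps))
```

The returned path shows where the loop starts and ends, which helps when picking the step at which to break it with external input.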

Phase 7: Confidence Scoring

Before proceeding: Assign a confidence score to each key claim/decision made by the agent and identify weak evidence.

Label Claims by Confidence Level — High (8-10), Medium (5-7), Low (1-4)
Identify Weak Evidence — Why are low-confidence claims low? Insufficient sources, forced inferences, or ambiguity?
Confidence Escalation Plan — For each low-confidence claim: what additional evidence is required, and which tool call can collect it?
Overall Confidence Report — What is the overall confidence score of the decision chain?
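
The confidence bands above translate directly into a lookup. In this sketch the `label` function and the sample claims are hypothetical. Taking the minimum score as the overall confidence is an assumption (the chain is treated as only as strong as its weakest claim), since the text does not say how the overall score should be derived.

```python
def label(score: int) -> str:
    """Map a 1-10 score to the bands High (8-10), Medium (5-7), Low (1-4)."""
    if not 1 <= score <= 10:
        raise ValueError(f"score must be 1-10, got {score}")
    if score >= 8:
        return "High"
    if score >= 5:
        return "Medium"
    return "Low"


# Made-up claims and scores for illustration.
scores = {
    "tests pass on main": 9,
    "bug is in the parser": 6,
    "fix is backward compatible": 3,
}

labels = {claim: label(score) for claim, score in scores.items()}
# Assumption: overall confidence is the weakest claim's score.
overall = min(scores.values())
needs_evidence = [claim for claim, level in labels.items() if level == "Low"]

print(labels)
print(f"Overall confidence: {overall}/10; escalate: {needs_evidence}")
```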

Phase 8: Alternative Path Analysis

Before proceeding: Examine the alternatives that were evaluated alongside the selected path.

List Evaluated Alternatives — What alternatives did the agent evaluate? Create a pros/cons list for each.
Determine Elimination Rationale — For each alternative: by what criterion was it eliminated? Was this criterion objective? Was there bias?
Simulate Alternative Performance — What would have happened if the alternative had been implemented? Would it be faster, or consume fewer tokens?
Alternative Report — Is there an alternative that should have been chosen? If so, why not switch to it now?

Phase 9: Performance Profiling

Before proceeding: Measure the agent's performance in terms of time, resources, and success metrics.

Duration by Task — How long did each task/subtask take?
Tool Efficiency — How many tool calls were made per task? What is the success rate?
Bottleneck Analysis — In which phase was the most time spent? Why?
Improvement Recommendations — What changes can be made to increase performance?

PERFORMANCE REPORT:
Total duration: X mins
Average duration per task: X mins
Average tool calls per task: X
Overall success rate: X%
Slowest phase: [name] — X mins
Bottleneck explanation: [reason]
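
Durations for a report like the one above have to be recorded while the work happens. One possible way, sketched here with the standard library only, is a small context manager that accumulates wall-clock time per phase. The `timed` helper and the phase names are hypothetical, and the `time.sleep` calls stand in for real work.

```python
import time
from contextlib import contextmanager

durations: dict[str, float] = {}


@contextmanager
def timed(phase: str):
    """Add the wall-clock seconds spent inside the with-block to the phase total."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        durations[phase] = durations.get(phase, 0.0) + elapsed


with timed("decision trace"):
    time.sleep(0.01)   # placeholder for real work
with timed("tool audit"):
    time.sleep(0.05)   # placeholder for real work

slowest = max(durations, key=durations.get)
print(f"Slowest phase: {slowest} ({durations[slowest]:.3f} s)")
```

Because the time is recorded in a `finally` clause, a phase that raises an exception still shows up in the totals.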

Phase 10: State Dump & Final Verification

Before proceeding: Export the entire internal state of the agent and perform a final verification.

Perform Full State Dump — Gather all decisions, context history, tool call records, and token usage into a single report.
Verify Commitments — What promises did the agent make to the user (editing a file, calling an API, writing a test)? Did it fulfill them all?
Completeness Check — Was the initial goal achieved? Are there missing pieces?
Traceability Report — Does every output contain a reference to which input/task it is based on?

Verification checklist before finishing:

Red Flags — Stop and Audit

If you catch yourself thinking:

"No need for such a detailed review, I'm sure it's correct."
"I checked the same thing again, it will yield the same result."
"I don't have time, I'll fix it now and inspect later."
"This much hallucination is not important, the user won't notice."
"There was no alternative anyway, no other choice."
"This is not a loop, I'm just repeating it to be careful."
"Token usage doesn't matter as long as the problem is solved."
"This information was already in the context, no need to re-verify."

ALL OF THE ABOVE MEAN: STOP. RETURN TO THE RELEVANT PHASE.

Signals Your Human Partner Warns You With

Pay close attention to these remarks:

"Where did you get this from?" — You made a claim without showing a source; return to Phase 5.
"You said this before, but you still haven't fixed it." — Circular reasoning or faulty correction; return to Phase 6.
"Why did this take so long?" — Performance profiling might have been skipped; return to Phase 9.
"There is no such thing in this file." — Hallucination; return to Phase 5.
"We tried this before." — You skipped alternative paths; return to Phase 8.
"I don't understand at which step you made this decision." — The decision trace is not clear; return to Phase 1.
"Don't you think this was a bit too many calls?" — Tool call audit skipped; return to Phase 2.

When you see these: STOP. Return to the phase named after each signal.

Common Rationalizations

Excuse	Reality
"I already know it's correct, no need to check."	Information is just a claim until verified. Agent memory can make mistakes.
"I'll skip this to save time and come back to it later."	The cost of fixing is usually 10 times the cost of prevention. Do it now.
"This was just a repeated check, nothing major."	Repetitive checks are the biggest indicator of circular reasoning. Question why it is repeating.
"Hallucination is a small detail, it doesn't affect the workflow."	Small hallucinations grow into major bugs. They are the weak links in the chain.
"Tokens are cheap, so what if we spend too many?"	Token cost is not just money; it is context window waste and performance degradation.
"The user doesn't want this much detail."	The user wants correct results. Correct results are guaranteed through introspection.
"There was no alternative, so I chose this."	There are always at least two alternatives. If you couldn't find them, you didn't think enough.

Related Skills

debugging-engineer — Systematic error-finding from the outside; introspector calls this skill to fix detected errors
thinking-patterns — Analyzes the thought patterns used by the agent; useful for determining which thought model was used during introspection
bias-detector — Detects cognitive biases in decisions; used in conjunction with confidence scoring and alternative path analysis
context-manager — Provides optimized strategies for context window management; apply these strategies during context window inspection
tester — Tests outputs produced by the agent; should be called for code/tests corrected after hallucination detection
observer — Observes and reports agent behaviors; used to regularly report introspection findings

Self-Review

After completing the process of this skill:

Scope Check: Have you completed each phase (decision trace, tool audit, context inspection, token analysis, hallucination detection, loop detection, confidence scoring, alternative analysis, performance profile, state dump)?
Edge Case Check: Have you handled unexpected scenarios (failed tool calls, missing context, contradictory claims)?
Quality Check: Do the outputs meet the standard of the Iron Law? Is every claim traceable to an explicit source?
Improvement Loop: Did you note action items from this introspection to apply in the next task?
User Trust: Is the report clear enough for the user to say "okay, I trust this"?
