Skill

reflexion

Implements a reflexion loop for self-correcting deliverables: implement, validate (tests, security, accessibility), self-critique, and retry for up to 3 iterations. Applies to code, content, AI tools, and services.

code-quality


Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/mycelium:reflexion

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Self-correcting implementation loop from the n-trax pattern.

SKILL.md

91 lines · ~1.2k tokens

Stats

Language: Python

Parent stars: 27

Parent forks: 3

Maintenance: Excellent

Last Commit: May 9, 2026


Reflexion Skill

Self-correcting implementation loop from the n-trax pattern.

Workflow

Iteration Loop (max 3)

Step 1: Implement

Create the deliverable according to the specification/acceptance criteria.
- Software: write code. Content: write/produce content. AI tool: write prompts/configs. Service: document workflow.
Follow ${CLAUDE_PLUGIN_ROOT}/harness/engineering-principles.md (principles apply to all product types).
Apply patterns from patterns.md.
Check corrections.md for relevant past mistakes.

Step 2: Validate

Software: Run tests, linter, type checker, security scan, accessibility checks (if UI).
- Security validation (OWASP): Check input validation, output encoding, parameterized queries, no hardcoded secrets, authentication/authorization patterns, dependency vulnerabilities. Reference OWASP Top 10:2025 categories for each check.
Content: Review against learning objectives/editorial standards, check accessibility (captions, alt text), fact-check claims.
AI tool: Run eval test cases, red-team testing, bias assessment.
Service: Walk through the service blueprint end-to-end, verify documentation completeness.
All: Verify acceptance criteria.
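
One of the security checks above, parameterized queries, can be illustrated with a minimal sketch using Python's standard `sqlite3` module. The `users` table, its columns, and the `find_user` helper are hypothetical names chosen for illustration; they are not part of the skill.

```python
import sqlite3


def find_user(conn: sqlite3.Connection, name: str) -> list:
    """Look up a user by name using a bound parameter, not string formatting."""
    # The ? placeholder keeps user input out of the SQL text itself.
    cursor = conn.execute("SELECT id, name FROM users WHERE name = ?", (name,))
    return cursor.fetchall()


# Hypothetical in-memory table, created only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

rows = find_user(conn, "alice")                 # matches exactly one row
injected = find_user(conn, "alice' OR '1'='1")  # treated as a literal name, matches nothing
```

Because the input is bound as data, the injection attempt is compared as a plain string and returns no rows.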

Step 3: Self-Critique

Review the implementation against the following (select items relevant to product_type):

Engineering principles: DRY, KISS, YAGNI, SoC (apply to all product types)
Security: Input validation, output encoding, no secrets, parameterized queries (software, ai_tool)
Accessibility: Semantic HTML, keyboard nav, contrast, screen reader (software); captions, transcripts, alt text (content)
Edge cases: What happens with unexpected input? Empty? Adversarial? (software, ai_tool)
Error handling / user recovery: Are errors handled gracefully? Can users recover? (software, service)
Quality: Factual accuracy, style consistency, source attribution (content); eval scores, safety scores (ai_tool)
Naming / clarity: Do names reveal intent? Would a new reader understand this? (all)
Completeness: Is anything missing that the user would expect? (all)
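
Selecting the items relevant to a product_type could be sketched as a simple lookup. The mapping below is transcribed from the list above; the `CHECKLIST` and `critique_items` names are hypothetical, and "all" is assumed to mean the four product types named in this skill.

```python
from typing import Dict, List, Tuple

# Assumption: "all" expands to the four product types used in this skill.
ALL_TYPES: Tuple[str, ...] = ("software", "content", "ai_tool", "service")

# Checklist item -> product types it applies to (transcribed from the list above).
CHECKLIST: Dict[str, Tuple[str, ...]] = {
    "Engineering principles": ALL_TYPES,
    "Security": ("software", "ai_tool"),
    "Accessibility": ("software", "content"),
    "Edge cases": ("software", "ai_tool"),
    "Error handling / user recovery": ("software", "service"),
    "Quality": ("content", "ai_tool"),
    "Naming / clarity": ALL_TYPES,
    "Completeness": ALL_TYPES,
}


def critique_items(product_type: str) -> List[str]:
    """Return the checklist items that apply to the given product type."""
    if product_type not in ALL_TYPES:
        raise ValueError(f"unknown product_type: {product_type!r}")
    return [item for item, types in CHECKLIST.items() if product_type in types]
```

For example, `critique_items("service")` skips the security, accessibility, edge-case, and quality items, which the list above scopes to other product types.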

Step 4: Decide

If all validations pass AND self-critique finds no issues: DONE
If issues found AND iteration < 3: FIX and return to Step 1
If iteration = 3 AND issues remain: ESCALATE with documented issues
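
The four steps can be sketched as a small driver loop. This is a minimal illustration under stated assumptions, not the skill's actual implementation: `reflexion_loop`, `LoopResult`, and the three callables are hypothetical names, and issues are modeled as plain strings.

```python
from dataclasses import dataclass, field
from typing import Callable, List

MAX_ITERATIONS = 3  # the skill caps the loop at three iterations


@dataclass
class LoopResult:
    status: str       # "DONE" or "ESCALATE"
    iterations: int   # number of iterations that actually ran
    history: List[List[str]] = field(default_factory=list)  # issues found per iteration


def reflexion_loop(
    implement: Callable[[List[str]], object],
    validate: Callable[[object], List[str]],
    critique: Callable[[object], List[str]],
) -> LoopResult:
    """Run implement -> validate -> self-critique until clean or the cap is hit."""
    history: List[List[str]] = []
    issues: List[str] = []
    for iteration in range(1, MAX_ITERATIONS + 1):
        deliverable = implement(issues)  # Step 1: receives the prior issues to fix
        # Steps 2 and 3: self-critique always runs, even when validation passes.
        issues = validate(deliverable) + critique(deliverable)
        history.append(issues)
        if not issues:  # Step 4: all checks passed
            return LoopResult("DONE", iteration, history)
    # Step 4: cap reached with issues remaining
    return LoopResult("ESCALATE", MAX_ITERATIONS, history)


# Toy run: a hypothetical deliverable that is fixed on the second attempt.
result = reflexion_loop(
    implement=lambda issues: "v2" if issues else "v1",
    validate=lambda d: [] if d == "v2" else ["test_login fails"],
    critique=lambda d: [],
)
```

In the toy run, the first iteration reports one issue and the second reports none, so the loop finishes with status DONE after two iterations. The per-iteration `history` is kept so that an escalation can document what each attempt found.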

Escalation Protocol

When max iterations reached without full resolution:

Document what was attempted in each iteration.
Document remaining issues with severity assessment.
Recommend: fix now (blocking) vs. fix later (non-blocking) vs. accept risk.
Update corrections.md with learnings.
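
An escalation record covering the first three items might look like the sketch below. The `RemainingIssue` and `render_escalation` names, the severity labels, and the plain-text layout are assumptions for illustration; the skill does not prescribe a report format.

```python
from dataclasses import dataclass
from typing import List

# Labels mirroring the three recommendation options in the protocol.
RECOMMENDATIONS = ("fix now", "fix later", "accept risk")


@dataclass
class RemainingIssue:
    description: str
    severity: str        # assumed scale, e.g. "low", "medium", "high"
    recommendation: str  # must be one of RECOMMENDATIONS

    def __post_init__(self) -> None:
        if self.recommendation not in RECOMMENDATIONS:
            raise ValueError(f"unknown recommendation: {self.recommendation!r}")


def render_escalation(attempts: List[str], issues: List[RemainingIssue]) -> str:
    """Format the escalation as plain text: attempts first, then open issues."""
    lines = ["Escalation report", "", "Attempts:"]
    lines += [f"  Iteration {i}: {summary}" for i, summary in enumerate(attempts, start=1)]
    lines += ["", "Remaining issues:"]
    lines += [
        f"  [{issue.severity}] {issue.description} -> {issue.recommendation}"
        for issue in issues
    ]
    return "\n".join(lines)
```

Validating the recommendation at construction time keeps every remaining issue tied to one of the three outcomes the protocol allows.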

Verification Modes

The validate step in the reflexion loop should use the appropriate verification mode:

Rules-Based (deterministic)

Linters, formatters, schema validators, type checkers
Pass/fail is unambiguous — no judgment needed
Always run first — fastest and cheapest
Examples: eslint, mypy, yamllint, YAML schema validation against ${CLAUDE_PLUGIN_ROOT}/engine/canvas-guidance.yml

Computational (deterministic)

Test runners, build systems, security scanners
Requires executing code — slower than rules-based
Results are objective but may need interpretation (flaky tests)
Examples: pytest, npm test, cargo test, OWASP Dependency-Check

Inferential (probabilistic)

LLM-as-judge, peer review, heuristic evaluation
Used when rules-based and computational verification are insufficient
Results require confidence scoring — never treat as definitive
Examples: /mycelium:devils-advocate, /mycelium:usability-check, auto-dogfood evaluation, design review
The auto-dogfood system is an inferential verification loop

Order: Always attempt rules-based → computational → inferential. Only escalate to the next mode when the previous mode cannot verify the property in question.
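
The ordering rule can be sketched as a chain of verifiers tried cheapest first. Everything here is hypothetical: the `Verdict` type, the three stand-in verifiers, and the fixed 0.7 confidence are illustrative placeholders for real linters, test runners, and LLM judges.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Verdict:
    mode: str          # "rules-based", "computational", or "inferential"
    passed: bool
    confidence: float  # 1.0 for deterministic modes, below 1.0 for inferential


# A verifier returns a Verdict, or None when it cannot verify the property.
Verifier = Callable[[str], Optional[Verdict]]


def rules_based(prop: str) -> Optional[Verdict]:
    # Stand-in for a linter or schema validator.
    if prop == "yaml is well-formed":
        return Verdict("rules-based", True, 1.0)
    return None


def computational(prop: str) -> Optional[Verdict]:
    # Stand-in for a test runner or security scanner.
    if prop == "unit tests pass":
        return Verdict("computational", True, 1.0)
    return None


def inferential(prop: str) -> Optional[Verdict]:
    # Stand-in for LLM-as-judge; the confidence value is illustrative only.
    return Verdict("inferential", True, 0.7)


def verify(prop: str, verifiers: List[Verifier]) -> Optional[Verdict]:
    """Try verifiers in order and stop at the first one that can decide."""
    for verifier in verifiers:
        verdict = verifier(prop)
        if verdict is not None:
            return verdict
    return None


CHAIN = [rules_based, computational, inferential]
naming = verify("names reveal intent", CHAIN)
```

Here `naming` falls through to the inferential verifier because neither deterministic stand-in can decide it, and its confidence below 1.0 signals that the result should not be treated as definitive.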

Source: Trivedy (Anatomy of an Agent Harness, LangChain blog). Three-mode taxonomy adapted from Böckeler (Harness Engineering, martinfowler.com — computational vs inferential distinction). Note: harnesses continue to matter even as models improve — they engineer systems around model intelligence, not just patch deficiencies.

Rules

Each iteration must show measurable improvement over the previous one.
If the same issue recurs across iterations, investigate root cause rather than patching symptoms.
Never skip the self-critique step, even if tests pass.
Log the reflexion loop outcome in delivery-journal.md.
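
The first two rules could be checked mechanically, as in the sketch below. It assumes that "measurable improvement" means a strictly shrinking set of open issues and that issues carry stable string identifiers across iterations; both are assumptions, and the function names are hypothetical.

```python
from typing import List, Set


def shows_improvement(history: List[Set[str]]) -> bool:
    """True if every iteration has strictly fewer open issues than the one before."""
    return all(len(later) < len(earlier) for earlier, later in zip(history, history[1:]))


def recurring_issues(history: List[Set[str]]) -> Set[str]:
    """Issues that survive from one iteration into the next (root-cause candidates)."""
    recurring: Set[str] = set()
    for earlier, later in zip(history, history[1:]):
        recurring |= earlier & later
    return recurring


# Hypothetical issue identifiers across three iterations.
history = [{"sql-injection", "missing-alt-text", "flaky-test"},
           {"sql-injection", "flaky-test"},
           {"flaky-test"}]
```

With this history, the issue count shrinks each round, yet `flaky-test` recurs in every iteration, which is the signal to investigate its root cause instead of patching it again.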

Theory Citations

Reflexion pattern (Shinn et al.)
Clean Code (Martin)
OWASP secure coding
WCAG 2.1 AA
