From plugin-creator
Provides a knowledge reference for Autonomous Refinement Loop (ARL) research: prerequisites, failure categories, and 10 machine-verifiable gates for human-out-of-the-loop (HOOTL) AI execution. Use when designing or evaluating autonomous agent loops and gate conditions.
Slash command: /plugin-creator:arlopus
**Autonomous Refinement Loop (ARL)** is pattern research into what an AI assistant needs — in information, tools, verification mechanisms, access to external resources, and knowledge of past failures — to produce outcomes that match the user's vision without requiring the human to be a synchronous blocking gate during execution.
The foundational question:
What determines whether an AI can produce a satisfactory outcome for a given piece of work, and how do we ensure those prerequisites are met before and during execution?
ARL is not a process to run. It is a body of research that informs how processes (like SAM) should be designed, and what conditions enable autonomous execution.
SOURCE: Autonomous Refinement Loop
This document provides:
When working on improvements, refinement, or autonomous execution tasks, consult this reference to understand what prerequisites must be in place, what could fail without them, and how the gates work together.
DESIGN GOAL — The concept describes the desired outcome of ARL research.
HOOTL means achieving human-in-the-loop outcome quality with human-out-of-the-loop execution.
Breaking this down:
The quality bar is HOOTL success: the artifact meets the same acceptance criteria as if a human reviewed every intermediate step, but the human did not have to be present synchronously to approve that work.
SOURCE: HOOTL: Human Out Of The Loop
DESIGN GOAL — This architecture describes how HOOTL execution is designed. The research body is observed and evidence-backed. The execution model and observation layer are design goals being researched.
The empirical findings from cross-framework analysis:
How HOOTL execution works in practice:
Passive agents monitoring execution in real-time:
agentskill-kaizen is the current implementation of this layer in post-hoc mode (mining historical transcripts after sessions complete). The ARL vision extends it to real-time observation during execution.
SOURCE: Three Layers of ARL
The three together form a research cycle: ARL hypothesizes, SAM applies, agentskill-kaizen validates.
SOURCE: Relationship Triangle
ARL researches what it takes to move AI-human interactions from blocking, high-friction exchanges to asynchronous, low-friction ones. The spectrum, from worst to best:
The goal is moving as many interactions as possible to level 4. Level 4 is HOOTL: the human gets quality outcomes without being a synchronous blocking gate.
SOURCE: Interaction Spectrum
These gates formalize the machine-verifiable conditions that replace human judgment at key points in an iterative refinement loop.
| Gate | What It Checks | When It Fires | What Failure Looks Like |
|---|---|---|---|
| R1: Information Completeness | Sufficient context to operate loop without escalation | Loop entry, re-entry after escalation | Loop proceeds with gaps, agent fills missing information with hallucinated content, produces fluent but wrong artifacts |
| R2: Loop Detection | Oscillating, stalling, or exceeding resource bounds | Start of each iteration before assessment | Loop runs indefinitely without converging, fix A breaks B repeatedly, same findings recurring |
| R3: Validity Filtering | Findings have verifiable evidence (file:line citations) | After assessment, before planning | False positives consume iteration budget, regressions introduced, phantom issues trigger changes |
| R4: Plan Quality | Plan internally consistent, addresses actual findings | After planning, before implementation | Inconsistent plan proceeds, addresses wrong findings, changes must be reverted |
| R5: Purpose Anchor | Artifact still serves original stated purpose | Captured at iteration 0, checked each iteration | After N iterations, artifact optimized for assessment metrics but no longer serves original use case |
| R6: Content-Loss Detection | All semantic units preserved after changes | After implementation, before next iteration | Refactoring removes sections deemed "redundant", no gate catches removal, human discovers loss later |
| R7: Convergence Tracking | Findings decreasing, stable, or alternating across iterations | Each iteration boundary after assessment | Loop cannot determine progress, fixes trivial issues indefinitely, or oscillates without converging |
| R8: Proportionality Check | Proposed fix proportional to finding severity | During plan quality gate (R4) | Low-severity finding triggers high-scope change that introduces risk without proportional benefit |
| R9: Downstream Impact | All references still resolve after changes | After implementation, alongside R6 | Refactoring renames a file, breaks three other components linking to old path, not detected until runtime |
| R10: Split Justification | New component independently viable, not just parent-dependent | When plan proposes splitting content into separate artifacts | Component split into three pieces, two only used from parent, adds navigation complexity without value |
SOURCE: The 10 Gates
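As an illustration of how one of these gates might be made machine-verifiable, the following is a minimal sketch of R7 (Convergence Tracking). It assumes the loop records one integer finding count per iteration. The function name `classify_trend`, the three-iteration minimum, and the extra `increasing` and `mixed` labels are assumptions made for this example, not part of the ARL source.

```python
from typing import List, Sequence


def classify_trend(counts: Sequence[int]) -> str:
    """Classify per-iteration finding counts (hypothetical R7 helper).

    Returns one of: 'insufficient-data', 'stable', 'decreasing',
    'increasing', 'alternating', 'mixed'.
    """
    # Assumption: fewer than three data points cannot show a trend.
    if len(counts) < 3:
        return "insufficient-data"

    # Change in finding count between consecutive iterations.
    deltas: List[int] = [b - a for a, b in zip(counts, counts[1:])]

    if all(d == 0 for d in deltas):
        return "stable"
    if all(d <= 0 for d in deltas):
        return "decreasing"
    if all(d >= 0 for d in deltas):
        return "increasing"

    # Alternating: ignoring flat steps, every change reverses direction.
    directions = [d > 0 for d in deltas if d != 0]
    if all(x != y for x, y in zip(directions, directions[1:])):
        return "alternating"
    return "mixed"
```

A loop could treat `decreasing` as progress and `stable` as a candidate stopping point, and could route `alternating` to the R2 oscillation check. How each label maps to continue, stop, or escalate is a design decision this sketch leaves open.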
| Coverage Level | R-Requirements | What Exists Today |
|---|---|---|
| Import directly | R1, R3, R4 | RT-ICA (SAM), GAN-inspired validation (Octocode/BMAD), 7-dimension plan checking (GSD) |
| Partial coverage | R2, R5, R9 | Bounded iteration count (GSD), objective injection (Ralph), downstream impact analysis (Octocode) |
| Build from scratch | R6, R7, R8, R10 | No framework provides content-loss detection, convergence tracking, proportionality checks, or split justification |
The 4 build-from-scratch requirements all emerge specifically from iterative refinement — they are invisible to single-pass pipeline designs.
SOURCE: Gate Coverage Across Existing Frameworks
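Because R6 (Content-Loss Detection) falls in the build-from-scratch group, a rough sketch may help show what such a gate could look like. It assumes the artifact is a Markdown document and that headings are an acceptable stand-in for "semantic units". Both are simplifications made for this example, and the helper names `heading_units` and `lost_units` are hypothetical.

```python
import re
from typing import Set

# Matches ATX-style Markdown headings such as "## Setup".
_HEADING = re.compile(r"^#{1,6}\s+(.*\S)\s*$")


def heading_units(markdown: str) -> Set[str]:
    """Collect lower-cased heading texts as a crude proxy for semantic units."""
    units: Set[str] = set()
    for line in markdown.splitlines():
        match = _HEADING.match(line)
        if match:
            units.add(match.group(1).lower())
    return units


def lost_units(before: str, after: str) -> Set[str]:
    """Return headings present before the change but missing afterwards."""
    return heading_units(before) - heading_units(after)


before = "# Guide\n\n## Setup\nInstall it.\n\n## Redundant Notes\nDetails.\n"
after = "# Guide\n\n## Setup\nInstall it.\n"
removed = lost_units(before, after)  # {'redundant notes'}
```

A non-empty result would fail the gate and block the next iteration. This sketch does not skip fenced code blocks, so comment lines beginning with `#` inside code would be counted as headings, and a renamed heading would be reported as lost. A real gate would need a richer notion of semantic unit.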
A human gate can potentially be replaced by machine-verifiable conditions when ALL of the following hold:
When ANY of these conditions fails, the gate requires either human judgment or a more sophisticated verification mechanism (adversarial review, cross-examination between independent agents, or escalation).
Evidence status: These 4 conditions were synthesized from cross-framework evidence. They correlate with gates classified as eliminable, but have not been tested as a predictive model.
SOURCE: Decision Tree
Seven patterns discovered across all 6 frameworks that apply to any autonomous development system:
SOURCE: Universal Principles
The same type of human judgment can be eliminable in one context and irreducible in another. The determining factors are:
| Scope Clarity | Goal Measurability | Data Enumeration | Human Gates Eliminable? |
|---|---|---|---|
| High (specific tool, platform, use case) | High (binary pass/fail, checklist) | High (official docs, known examples) | Yes — Autonomous loop feasible |
| Medium (domain-specific best practices) | Medium (scoring with weights) | Medium (reference examples, community patterns) | Partial — Autonomous with periodic human checkpoints |
| Low (general improvement, meta-goals) | Low (subjective, emergent criteria) | Low (unknown what "complete" means) | No — Requires human at each decision point |
This means ARL cannot be applied uniformly. A scope-classification step must precede any attempt at autonomous operation.
SOURCE: Key Findings
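The scope-classification step described above could be sketched as a small lookup. The table only defines rows where all three factors share the same rating, so this example assumes that mixed ratings fall back to the weakest factor. That conservative rule, along with the names `Rating` and `classify_scope`, is an assumption of this sketch and not something the source specifies.

```python
from enum import IntEnum


class Rating(IntEnum):
    """Ordinal rating for each factor (hypothetical encoding)."""
    LOW = 0
    MEDIUM = 1
    HIGH = 2


# Verdicts paraphrased from the table's last column.
VERDICTS = {
    Rating.HIGH: "yes: autonomous loop feasible",
    Rating.MEDIUM: "partial: autonomous with periodic human checkpoints",
    Rating.LOW: "no: requires human at each decision point",
}


def classify_scope(
    scope_clarity: Rating,
    goal_measurability: Rating,
    data_enumeration: Rating,
) -> str:
    """Map the three factors to a verdict.

    Assumption: the weakest factor decides, so one LOW rating is enough
    to require a human at each decision point.
    """
    weakest = min(scope_clarity, goal_measurability, data_enumeration)
    return VERDICTS[weakest]
```

For example, a task with high scope clarity and high goal measurability but only medium data enumeration would come back as `partial` under this rule.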
Complete detail on each gate, framework patterns, and prerequisites:
npx claudepluginhub jamie-bitflight/claude_skills --plugin plugin-creator