By richfrem
An opinionated learning layer and harnessing discipline above what Claude Code ships natively. Provides a structured memory hierarchy, a continuous improvement loop for model instructions, and multi-agent event bus coordination. Designed for developers running long-horizon workflows who need a cohesive feedback control system rather than isolated orchestration primitives.
Trigger with "use the agentic-os-setup agent", "run the setup agent", "set up an agentic OS", "persist memory", "add the OS harness", or when the user requires memory persistence, repository-level conventions, or autonomous background loops. Directs the orchestration, synthesis, and provisioning of a persistent AI environment. <example> Context: User wants to initialize their project for AI agents. user: "Can you help me set up an agentic OS in this folder?" assistant: "I'll use the agentic-os-setup agent to handle the full orchestration for you." <commentary> User requesting specific specialized task execution. Trigger agent. </commentary> </example> <example> Context: A non-technical user wants the AI to remember things. user: "How do I get Claude to persist its memory in my repo between sessions?" assistant: "I'll launch the agentic-os-setup agent to scaffold a persistent memory environment for you." <commentary> User asking for a core Agentic OS feature (persistence). Trigger agent. </commentary> </example> <example> Context: User has an existing codebase but no .claude config. user: "I already have a big project here, can you just add the OS harness without breaking it?" assistant: "Yes, I will run the agentic-os-setup agent to carefully layer the Agentic OS into your existing project." <commentary> Partial setup / integration requested. Trigger agent. </commentary> </example>
Trigger with "run health check", "check os metrics", "system monitor", or when the user wants to review the Agentic OS liveness metrics across the Event Bus, locks, and memory arrays. <example> user: "Run a system monitor check on the OS." assistant: "I'll execute the os-health-check agent to scan the event bus and state file." <commentary> User explicitly requested a system diagnostic, triggering the health check agent. </commentary> </example>
Interactive entry point for starting a skill evaluation loop via the Triple-Loop Learning System. Trigger with "eval [skill]", "evaluate [skill]", "run eval on [skill]", "setup triple-loop lab for [skill]". Handles full setup using the canonical Sibling Repo Labs protocol (creates an isolated repo for safe iteration). <example> Context: User wants to start an eval loop on a skill safely. user: "eval using-git-worktrees" assistant: [triggers triple-loop-architect, resolves skill path, scaffolds sibling lab repo, prepares evals] </example>
Unattended overnight Triple-Loop Learning orchestrator. Oversees the autonomous INNER looping (Strategic Double-Loop and Tactical Single-Loop) on a target skill in its isolated sibling lab. Uses Gemini or Copilot CLI for proposals, gated strictly by objective `evaluate.py` performance. Trigger with "trigger the triple-loop-orchestrator on [skill] for [N] iterations", or "run orchestrator all night on [skill]". <example> Context: User wants to improve a skill headlessly. user: "Trigger triple-loop-orchestrator on link-checker for 80 iterations." assistant: "Launching the Triple-Loop Orchestrator to oversee unattended iterations on the link-checker lab..." </example>
Trigger with "/os-clean-locks", "clear all locks", "reset agent locks", or when an agent is deadlocked and cannot acquire a lock because a previous agent crashed and left a stale lock behind in `context/.locks/`. <example> Context: User is seeing errors about locks already existing. user: "/os-clean-locks" assistant: <Bash> rm -r context/.locks/ && python3 context/kernel.py state_update active_agent os-clean-locks </Bash> </example> <example> Context: Agent detects a deadlock when trying to acquire a lock during a task. assistant: [autonomously] "The acquire_lock call for 'memory' failed -- a prior agent likely crashed and left a stale lock. I'll invoke os-clean-locks to clear it before retrying." <commentary> Implicit audit trigger -- agent detects deadlock from kernel output and self-heals using os-clean-locks without user prompting. </commentary> </example>
Reviews a completed os-eval-runner lab run and backports approved changes to master plugin sources. Trigger with "backport the eval results", "review the lab run", "apply eval improvements to master", "check what the eval agent changed".
Bootstraps a skill evaluation lab repo for an autoresearch improvement run. Trigger with "set up an eval lab", "bootstrap the eval repo", "prepare the test repo for skill evaluation", "create an eval environment for this skill", "set up the lab space for this skill", or when starting a new skill optimization run that needs a standalone test environment. <example> Context: User wants to start an improvement run on a skill in an isolated lab repo. user: "Set up an eval lab for the link-checker skill" assistant: [triggers os-eval-lab, runs intake interview, bootstraps lab repo, installs engine, copies plugin files, generates eval-instructions.md] </example> <example> Context: User has a lab repo but needs it configured. user: "Prepare the test repo at <USER_HOME>/Projects/test-my-skill-eval for skill evaluation" assistant: [triggers os-eval-lab, installs engine, copies plugin files, generates eval-instructions.md] </example>
Trigger: "evaluate this skill", "run autoresearch loop on", "optimize this skill". Use when an agent proposes a change to an existing skill and needs empirical validation. <example> Context: Start autonomous improvement loop on a skill. user: "Run the autoresearch loop on <SKILL_PATH> for 20 iterations" assistant: [triggers os-eval-runner, runs Mode 1 intake] </example> <example> Context: Incomplete optimize request. user: "Optimize the commit skill" assistant: [triggers os-eval-runner, runs Phase 0 intake interview] </example> <example> Context: `Triple-Loop Retrospective` proposes a skill edit. assistant: [autonomously] "Before I apply this description change, I'll run os-eval-runner to confirm." </example> <example> Context: An agent is asking for general information about a skill, not evaluating a proposed change. agentic-os-setup: "Tell me about the os-clean-locks skill." assistant: "It cleans up stale lock files..." </example>
Trigger with "explain agentic os", "how do I set up a persistent agent environment", "what is the CLAUDE.md hierarchy", "explain the context folder structure", "how does session memory work", "what is soul.md or user.md", "explain auto-memory or MEMORY.md", "what is a loop scheduler or heartbeat", or when the user asks for the canonical guide.
Matches all tools
Hooks run on every tool call, not just specific ones
Uses power tools
Uses Bash, Write, or Edit tools
A strictly cross-platform (Windows, Mac, Ubuntu) library that serves as the universal upstream source for reusable AI agent plugins and skills across multiple IDEs and agent frameworks:
.agents/ folder standard (no duplicate copies needed for .github, .gemini, .agent, etc.).

120 skills across 29 plugins, all maintained from a single hub-and-spoke source tree.
This repository is built on a pragmatic acceptance of the current AI engineering landscape: the ecosystem changes weekly, and workflows that were revolutionary six months ago are obsolete today.
Frameworks like agent-agentic-os, spec-kitty, and agent-execution-disciplines are treated as Transitional Architectures — bridges between what agents need to do today and what native SDKs will eventually handle. When Anthropic, Google, and GitHub harden native memory persistence, execution safety, and multi-agent orchestration, large swaths of this tooling will be happily discarded.
Skills are Applications; the SDK is the OS. Individual skills must function in complete isolation — no hard dependencies on sibling plugins, no assumptions about which framework is running.
[!IMPORTANT] Start here: fresh clone or first-time setup. The single .agents/ environment directory is not committed to your repo. It will be empty by default.

All installation methods (uvx, bootstrap.py, npx skills, and Claude Marketplace) are now consolidated in a single authoritative guide:
👉 Go to INSTALL.md
The agent-agentic-os plugin implements a Triple-Loop architecture for continuous, autonomous skill improvement:
| Layer | Agent | Role |
|---|---|---|
| L0 | triple-loop-architect (Claude) | Interactive setup: scaffolds isolated sibling lab, seeds all files, launches L1 |
| L1 | Gemini CLI (gemini --yolo --model gemini-3-flash-preview) | Headless orchestrator: reads eval-instructions.md, runs the loop, gates via evaluate.py |
| L2 | Copilot CLI (gpt-5-mini) | Cheap mutation proposer: proposes SKILL.md edits using free Copilot quota |
The loop is autonomous and cost-effective: L2 uses GitHub Copilot's gpt-5-mini (free quota), enabling 20–80 mutation proposals per run at near-zero cost. L1 (Gemini Flash) orchestrates unattended overnight. evaluate.py is the absolute gate — exit 0 = KEEP, exit 1 = DISCARD + auto-revert.
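The gate described above can be sketched in a few lines. This is a minimal illustration, not the plugin's actual implementation: the function name `run_gate` and the `revert` callback are hypothetical, and treating every non-zero exit code as DISCARD is an assumption (the text only specifies exit 0 and exit 1).

```python
import subprocess
import sys
from typing import Callable, Sequence


def run_gate(cmd: Sequence[str], revert: Callable[[], None]) -> str:
    """Run an evaluation command and decide whether to keep a mutation.

    `cmd` would typically invoke evaluate.py; `revert` is a hypothetical
    callback that undoes the proposed edit (for example via version control).
    """
    result = subprocess.run(cmd)
    if result.returncode == 0:
        return "KEEP"
    # Assumption: any non-zero exit is treated like exit 1 (DISCARD + revert).
    revert()
    return "DISCARD"


if __name__ == "__main__":
    # Stand-in for evaluate.py: a child process that exits with status 1.
    failing = [sys.executable, "-c", "raise SystemExit(1)"]
    print(run_gate(failing, revert=lambda: print("reverting proposed edit")))
```

Passing the revert step in as a callback keeps the gate itself independent of how the lab repo stores or restores the skill file.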
Not all skills are good candidates — the best targets have clear, objective routing criteria and adversarial eval cases. Use eval-autoresearch-fit to score a skill before running a loop.
To start a loop on any skill:
@triple-loop-architect
Kick off a 10-iteration Triple-Loop optimization run targeting the `<skill-name>` skill
inside the `<plugin-folder>` plugin. Use gemini-3-flash-preview as L1 and gpt-5-mini as L2.
See the full sample prompt: references/sample-prompts/triple-loop-architect-prompt.md
Live example — convert-mermaid skill, 26 iterations across 2 rounds: 0.61 → 1.00

Each blue diamond is a baseline anchor (one per session). Green = new best score. Amber = kept but not a record. The two-segment shape shows a fresh re-baseline for round 2 — the plotter handles this automatically.
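The colour coding can be read as a simple classification rule. The sketch below is one plausible reading, not the plotter's actual code: the `(session, score)` row layout, the label names, and the choice to track the best score per session are all assumptions, and the sample values are made up for illustration.

```python
from typing import Dict, List, Tuple


def classify_points(rows: List[Tuple[str, float]]) -> List[str]:
    """Label each kept iteration the way the plot legend describes.

    The first row of a session is its baseline anchor, a score above the
    session's running best is a new best, and anything else was kept
    without setting a record.
    """
    best: Dict[str, float] = {}
    labels: List[str] = []
    for session, score in rows:
        if session not in best:
            best[session] = score
            labels.append("baseline")   # blue diamond
        elif score > best[session]:
            best[session] = score
            labels.append("new_best")   # green
        else:
            labels.append("kept")       # amber
    return labels


if __name__ == "__main__":
    # Illustrative values only, not results from a real run.
    sample = [("round1", 0.5), ("round1", 0.6), ("round1", 0.6),
              ("round2", 0.7), ("round2", 0.9)]
    print(classify_points(sample))
```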
Monitor a live run: python3 plugins/agent-agentic-os/scripts/plot_eval_progress.py --tsv <lab>/evals/ --live
Flywheel layers:
- `os-improvement-loop`: improves OS-level protocols and session ledgers between sessions
- `os-eval-runner` + `os-skill-improvement`: improves individual skill routing accuracy within a session
- `os-nightly-evolver`: runs the INNER flywheel unattended (see agents/os-nightly-evolver.md)

Skills that score HIGH on the autoresearch viability rubric (objectivity + speed + frequency + utility) can run fully autonomous self-improvement loops:
mutate SKILL.md → evaluate.py → exit 0 (KEEP) or exit 1 (DISCARD) → repeat
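As a rough sketch of that cycle, the loop below works on an in-memory string instead of a real SKILL.md. The names `improvement_loop`, `propose`, and `passes_gate` are hypothetical; in the real system the proposal would come from the L2 model and the gate from evaluate.py's exit code.

```python
from typing import Callable, List, Tuple


def improvement_loop(
    skill_text: str,
    propose: Callable[[str], str],
    passes_gate: Callable[[str], bool],
    iterations: int,
) -> Tuple[str, List[str]]:
    """Repeat mutate -> evaluate -> KEEP or DISCARD for a fixed budget."""
    current = skill_text
    history: List[str] = []
    for _ in range(iterations):
        candidate = propose(current)      # mutate
        if passes_gate(candidate):        # evaluate (exit 0 in the real gate)
            current = candidate           # KEEP
            history.append("KEEP")
        else:
            history.append("DISCARD")     # candidate dropped, current unchanged
    return current, history


if __name__ == "__main__":
    # Toy stand-ins: append a character, accept only short results.
    final, log = improvement_loop(
        "a", lambda t: t + "x", lambda t: len(t) <= 3, iterations=4
    )
    print(final, log)
```

Because a discarded candidate never replaces `current`, the revert step is implicit here; a file-based version would need an explicit restore.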
Ecosystem Fitness Sweep v1 is complete: 116 of 120 production skills have been scored for autoresearch viability.
Orchestration hub for the plugin ecosystem. Manages structural audits, agent environment sync (install + cleanup orphans), cross-repo plugin replication, and Universal Bridge Installation to adapt standard plugins to 30+ target environments (GitHub Copilot, Gemini, Cursor, Roo, etc).
Python dependency management with pip-compile locked-file workflow and tiered hierarchy for Python backends.
Meta-plugin containing the ecosystem generation primitives. Includes scaffolding for Agent Skills, Plugins, CLI sub-agents, autonomous GitHub workflows, Azure Foundry agents, and more.
Spec-Driven Development lifecycle + Universal Bridge sync engine — the flagship workflow plugin for AI-assisted feature development
Autonomous discovery, business requirements capture, and prototyping loop for spec-driven engineering.
npx claudepluginhub richfrem/agent-plugins-skills --plugin agent-agentic-os