Universal Agent Plugins & Skills Ecosystem
Project Overview
A strictly cross-platform (Windows, Mac, Ubuntu) library that serves as the universal upstream source for reusable AI agent plugins and skills across multiple IDEs and agent frameworks:
- Claude Code, GitHub Copilot, Gemini CLI, Antigravity, Roo Code, Windsurf, Cursor, and other compliant integrations.
- Now universally supporting the single .agents/ folder standard (no duplicate copies needed for .github, .gemini, .agent, etc.).
120 skills across 29 plugins — all maintained from a single hub-and-spoke source tree.
Core Philosophy: Transitional Architectures & Decoupled Skills
This repository is built on a pragmatic acceptance of the current AI engineering landscape: the ecosystem changes weekly, and workflows that were revolutionary six months ago are obsolete today.
Frameworks like agent-agentic-os, spec-kitty, and agent-execution-disciplines are treated as Transitional Architectures — bridges between what agents need to do today and what native SDKs will eventually handle. When Anthropic, Google, and GitHub harden native memory persistence, execution safety, and multi-agent orchestration, large swaths of this tooling will be happily discarded.
Skills are Applications; the SDK is the OS. Individual skills must function in complete isolation — no hard dependencies on sibling plugins, no assumptions about which framework is running.
Installation
[!IMPORTANT]
Start here — fresh clone or first-time setup. The single .agents/ environment directory is not committed to your repo. It will be empty by default.
All installation methods (uvx, bootstrap.py, npx skills, and Claude Marketplace) are now consolidated in a single authoritative guide:
Architecture Highlights
Triple-Loop Autonomous Skill Improvement
The agent-agentic-os plugin implements a Triple-Loop architecture for continuous, autonomous skill improvement:
| Layer | Agent | Role |
|---|---|---|
| L0 | triple-loop-architect (Claude) | Interactive setup: scaffolds isolated sibling lab, seeds all files, launches L1 |
| L1 | Gemini CLI (gemini --yolo --model gemini-3-flash-preview) | Headless orchestrator: reads eval-instructions.md, runs the loop, gates via evaluate.py |
| L2 | Copilot CLI (gpt-5-mini) | Cheap mutation proposer: proposes SKILL.md edits using free Copilot quota |
The loop is autonomous and cost-effective: L2 uses GitHub Copilot's gpt-5-mini (free quota), enabling 20–80 mutation proposals per run at near-zero cost. L1 (Gemini Flash) orchestrates unattended overnight. evaluate.py is the absolute gate — exit 0 = KEEP, exit 1 = DISCARD + auto-revert.
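The gate described above can be sketched as a small wrapper around the evaluator's exit code. This is a minimal illustration, not the plugin's actual code: the function name `gate_mutation`, its parameters, and the in-memory backup used for the revert are all assumptions, and the source only specifies exit 0 and exit 1, so treating every other non-zero code as a discard is also an assumption.

```python
import subprocess
from pathlib import Path


def gate_mutation(skill_path: Path, proposed_text: str, eval_cmd: list[str]) -> str:
    """Apply a proposed SKILL.md edit, run the evaluator, then keep or revert.

    Hypothetical helper: the real loop may revert through version control
    rather than an in-memory copy of the file.
    """
    original_text = skill_path.read_text(encoding="utf-8")
    skill_path.write_text(proposed_text, encoding="utf-8")

    # The evaluator's exit code is the only signal consulted.
    result = subprocess.run(eval_cmd)
    if result.returncode == 0:
        return "KEEP"

    # Assumption: any non-zero exit is handled like exit 1 (DISCARD + revert).
    skill_path.write_text(original_text, encoding="utf-8")
    return "DISCARD"
```

A call might look like `gate_mutation(Path("SKILL.md"), new_text, ["python3", "evaluate.py"])`, where the evaluator's arguments depend on how the lab is laid out.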
Not all skills are good candidates — the best targets have clear, objective routing criteria and adversarial eval cases. Use eval-autoresearch-fit to score a skill before running a loop.
To start a loop on any skill:
@triple-loop-architect
Kick off a 10-iteration Triple-Loop optimization run targeting the `<skill-name>` skill
inside the `<plugin-folder>` plugin. Use gemini-3-flash-preview as L1 and gpt-5-mini as L2.
See the full sample prompt: references/sample-prompts/triple-loop-architect-prompt.md
Live example — convert-mermaid skill, 26 iterations across 2 rounds: 0.61 → 1.00

Each blue diamond is a baseline anchor (one per session). Green = new best score. Amber = kept but not a record. The two-segment shape shows a fresh re-baseline for round 2 — the plotter handles this automatically.
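The colour legend above amounts to a simple classification of kept iterations. The sketch below shows one way that classification could work; it is not taken from plot_eval_progress.py. The `EvalRow` fields, the label strings, and the choice to track the best score separately per session (so that a re-baseline starts a fresh segment) are assumptions made for illustration.

```python
from typing import NamedTuple


class EvalRow(NamedTuple):
    session: str  # hypothetical session identifier
    score: float  # evaluation score of a kept iteration


def classify_points(rows: list[EvalRow]) -> list[str]:
    """Label each row as 'baseline', 'new_best', or 'kept', in input order."""
    labels: list[str] = []
    best_by_session: dict[str, float] = {}
    for row in rows:
        if row.session not in best_by_session:
            # First row seen for a session is its baseline anchor (blue diamond).
            best_by_session[row.session] = row.score
            labels.append("baseline")
        elif row.score > best_by_session[row.session]:
            # Strictly better than the session's best so far (green).
            best_by_session[row.session] = row.score
            labels.append("new_best")
        else:
            # Kept, but not a record (amber).
            labels.append("kept")
    return labels
```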
Monitor a live run: python3 plugins/agent-agentic-os/scripts/plot_eval_progress.py --tsv <lab>/evals/ --live
Flywheel layers:
- OUTER flywheel (os-improvement-loop): improves OS-level protocols and session ledgers between sessions
- INNER flywheel (os-eval-runner + os-skill-improvement): improves individual skill routing accuracy within a session
- Overnight (os-nightly-evolver): runs the INNER flywheel unattended; see agents/os-nightly-evolver.md
Karpathy Autoresearch Loop
Skills that score HIGH on the autoresearch viability rubric (objectivity + speed + frequency + utility) can run fully autonomous self-improvement loops:
mutate SKILL.md → evaluate.py → exit 0 (KEEP) or exit 1 (DISCARD) → repeat
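The control flow of that one-line loop can be sketched in memory, with the mutation and evaluation steps passed in as plain callables. The names `run_loop`, `mutate`, and `evaluate` are hypothetical, and the `evaluate` callable here stands in for the exit code of evaluate.py rather than invoking it.

```python
from typing import Callable


def run_loop(
    skill_text: str,
    mutate: Callable[[str], str],    # proposes an edited SKILL.md text
    evaluate: Callable[[str], int],  # returns 0 to keep, non-zero to discard
    iterations: int,
) -> tuple[str, list[str]]:
    """Run a fixed number of mutate/evaluate rounds; return final text and log."""
    current = skill_text
    history: list[str] = []
    for _ in range(iterations):
        candidate = mutate(current)
        if evaluate(candidate) == 0:
            # Accepted: the candidate becomes the base for the next mutation.
            current = candidate
            history.append("KEEP")
        else:
            # Rejected: the next mutation starts again from the last kept text.
            history.append("DISCARD")
    return current, history
```

Because a discarded candidate never replaces `current`, the returned text is always the last version that passed the evaluator.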
Ecosystem Fitness Sweep v1 is complete: 116 of the 120 production skills have been scored for autoresearch viability. Results: