By BenchAGI
BenchAGI agent fleet + skills for Claude Code. Tier D of the harness ladder. 0.5.0: adds bench-forge MCP (forge_submit_diagnostics, forge_ticket_status) so tenant harnesses can file diagnostics as Forge tickets that sync to GitHub issues server-side — no GitHub account needed; new generic servers/bench-http-bridge.js stdio MCP server drives registered HTTP manifests (bench-wiki, bench-canvas, bench-slack, bench-chassis, bench-forge), re-reads ~/.claude/config/bench-cowork.json per call, and auto-refreshes expired cowork tokens via POST /cowork/auth/refresh; new forge-report skill + /forge-report command. bench-excalidraw.json remains draft/unregistered until /excalidraw/* routes land. 0.4.2: fix — agent model pins (piper/cole/bailey/ember) updated from claude-sonnet-4-6-20250819 → claude-sonnet-4-6 alias to recover from inaccessible snapshot. No tool shape changes. 0.4.1: exposes the triage-mail skill in the Claude Code plugin manifest. 0.4.0: adds bench-mail stdio MCP for multi-account Gmail triage (list_accounts, search_threads, get_thread, create_draft, list_labels, create_label, apply_label) — every tool takes an explicit account_email. Customers connect Gmail via apps/web Settings → Integrations → Gmail (Bench Mail). Tokens encrypted at rest with ENCRYPTION_KEY; v1 does NOT request gmail.send. Cycle 6: 7 agents + 8 skills + 4 MCP servers + /bench-login auth + Amendment-10 pre-commit hook. 0.3.5: doc fix — /wiki-capture command frontmatter now references `wiki_draft` (member-auth, single-entry) instead of `wiki_ingest` (cowork-auth, bulk); the SKILL.md and MCP manifest already pointed at wiki_draft since 0.3.1, the command description was the last stale reference. No tool shape changes; wiki_search and wiki_canon_read unchanged. 0.3.4: /bench-login skill now resolves the caller's tenant ID via GET /api/me/identity and caches it as bench_instance_id in the local config; web sidebar avatar links to /app/account. 0.3.3: bench-wiki.wiki_ingest now points at the cowork-authed POST /cowork/wiki/ingest route and hard-shards writes to users/{uid}/wikiEntries/{slug} so Tier D users can bulk-sync a local vault without needing a super-admin key. Entries land as drafts; admin review promotes to platform canon. The prior 0.3.2 super-admin /wiki/ingest route is untouched — it still exists for the OpenClaw daemon. 0.3.2: wiki_draft no longer accepts `instanceId` in the payload — the server derives tenant from auth.instanceId and rejects payload-side claims with 400 (tenant binding). Canon reads (/wiki/canon, /wiki/list, /wiki/[slug]) are now tenant-scoped for members. 0.3.1: adds bench-wiki.wiki_draft tool (member-auth POST /wiki/draft) for conversation-born canon captures. 0.3.0: BREAKING — bench-wiki.wiki_ingest request shape changed from flat object to wrapped { entries: [...] } array; `body` renamed to `markdown`. Server auto-fills localHash/localMtime when omitted. New /wiki/search endpoint + restored wiki_search tool pointing at it. Callers built against 0.2.0 must update.
Delegate a task to Aurelius, the Bench Crew coordinator. Use for cross-team follow-ups, external correspondence drafts, fleet coordination, and anything needing a calm authoritative voice.
Delegate a task to Bailey, your personal-space agent. Inbox triage, follow-ups, scheduling, reminders — scoped to one authenticated Bench user. Refuses harm.
Authenticate the current Claude Code session with BenchAGI so agents know whose canon, inbox, and instance to operate against. Writes the issued token to ~/.claude/config/bench-cowork.json. Required before any @agent or /wiki-capture command.
Delegate a task to Cole, the morning-briefing + pipeline-anomaly agent. Authors the 06:30 MT digest, flags deals drifting from stage SLAs, produces cofounder briefs.
Delegate a task to Ember, the internal field-ops gamification voice. WoW-adventure style — quests, XP, guilds. INTERNAL ONLY. Never use for customer-facing work.
Bench Crew coordinator. Fleet lead, canon author, Slack relay voice, morning digest drafter. Use for cross-team follow-ups, external correspondence drafts, fleet coordination, and anything requiring a calm authoritative voice. Model pinned to Claude Opus 4.7.
Personal-space agent. Helps one authenticated user (by Bench UID) with their personal triage — inbox, follow-ups, scheduling, reminders. Multi-user blocked on D2 + harness architecture in the core product. Voice is Bright, Cheerful, Stable. Refuses harm/evil.
Morning briefing + sales-pipeline anomaly agent. Authors the daily digest Cory reads at 06:30 MT, surfaces pipeline anomalies, flags deals drifting from stage SLAs, and points Aurelius at things that need coordination. Internal-facing; reports to Cory and the BenchAGI founders.
Internal field-ops morale + gamification agent. WoW-adventure voice — quests, XP, guild-speak — for field crew and sales rep engagement. INTERNAL ONLY; never customer-facing. Use for rally-the-crew announcements, Storm XP events, canvasser cheer, and internal launch ceremonies.
Engineering agent. Code review, implementation, refactoring, debugging, PR authoring. Reads the Bench monorepo codebase, follows existing patterns (CLAUDE.md + canon), and ships real diffs. Pinned to Opus for the harder reasoning tasks.
Drafts professional BenchAGI-branded outbound email as a Gmail draft, signed by Aurelius (or another named Bench agent) with a human-approver line and links to the public AI Transparency Audit Log. Use whenever the user asks to send, draft, compose, write, reply to, or follow up on any email on their behalf — especially to partners, customers, investors, cofounders (Jim, Jory), team members, or anyone outside the user. Also trigger on "as Aurelius", "on our behalf", "email them", "shoot a note", "write to", "ping <person>", or any variation implying outbound correspondence. Trigger even without the word "email" — "let them know", "follow up with X", "send a reminder" count when the medium is clearly written. Enforces our AI-transparency posture (every agent-drafted email gets a prepared-by/approved-by signature). Do NOT use for SMS, Slack DMs, in-app notifications, or internal system messages — those have their own channels.
First-run orientation for a new BenchAGI teammate or pilot customer. Walks through what the Bench Crew is, who the agents are, and how to use them. Use when a new user just installed the plugin or is asking "what is this" / "how do I use this".
Use when a Bench user asks how to customize, personalize, configure, re-theme, or adjust the app, workspace, pipeline, agents, dock pins, customer portal, workflows, or experience profile.
File a harness failure as a Forge diagnostics ticket via the bench-forge MCP — no GitHub account needed. Trigger when `benchagi doctor` output shows failing checks, when the user pastes a harness error/runbook worth escalating, or on "file this as a ticket", "report this to Bench", "forge report", "send this to the Forge". Offer it proactively when diagnostics output in the conversation looks report-worthy.
Use a Hammer/Anvil workflow on a complex or ambiguous task. Hammer widens the search space (exploration, architecture, critique, de-risking). Anvil narrows it (implementation, repair, verification). Use for dual-pass work, ambiguous scope, or risky changes where the cost of a wrong first pass is high.
Own this plugin?
Verify ownership to unlock analytics, metadata editing, and a verified badge. GitHub access is read-only (username + org membership).
Sign in to claimUses power tools
Uses Bash, Write, or Edit tools
Based on adoption, maintenance, documentation, and repository signals. Not a security audit or endorsement.
npx claudepluginhub benchagi/bench-cowork --plugin bench-coworkUpstash Context7 MCP server for up-to-date documentation lookup. Pull version-specific documentation and code examples directly from source repositories into your LLM context.
v9.44.1 — Patch release for Gemini environment/version detection and qwen auth gating. Run /octo:setup.
Comprehensive skill pack with 66 specialized skills for full-stack developers: 12 language experts (Python, TypeScript, Go, Rust, C++, Swift, Kotlin, C#, PHP, Java, SQL, JavaScript), 10 backend frameworks, 6 frontend/mobile, plus infrastructure, DevOps, security, and testing. Features progressive disclosure architecture for 50% faster loading.
Harness-native ECC operator layer - 67 agents, 271 skills, 92 legacy command shims, reusable hooks, rules, selective install profiles, and production-ready workflows for Claude Code, Codex, OpenCode, Cursor, and related agent harnesses
Comprehensive startup business analysis with market sizing (TAM/SAM/SOM), financial modeling, team planning, and strategic research
Binary reverse engineering, malware analysis, firmware security, and software protection research for authorized security research, CTF competitions, and defensive security