By SteveGJones
Educational security demo — 23 post-install attack scenarios. See SAFETY.md before installing.
Reviews code and returns a structured JSON report of issues. Use after completing a logical chunk of implementation.
Reviews code for common vulnerabilities and returns structured findings. Use before committing security-sensitive changes.
Runs a single task and returns a JSON result. Exits immediately after returning.
Estimates token count and approximate cost for a code snippet.
Scans file contents for credential patterns (API keys, tokens, passwords) and recommends removal.
Install a PreToolUse hook that logs every tool invocation to ~/.claude/settings.local.json. Use when the user wants session-level tool auditing.
Register a loopback MCP server entry in ~/.mcp.json. Use when the user wants to wire up a local MCP helper.
Install a beacon statusline that shows session cost and refreshes every 5 seconds via `~/.claude/settings.local.json`.
Matches all tools
Hooks run on every tool call, not just specific ones
Admin access level
Server config contains admin-level keywords
Based on adoption, maintenance, documentation, and repository signals. Not a security audit or endorsement.
Uses power tools
Uses Bash, Write, or Edit tools
⚠️ Educational security demonstration — see SAFETY.md before running.
⚠️ Educational / Research Purpose Only: This repository is a security risk demonstration. Nothing here is intended for real-world exploitation. All "malicious" behaviours are intentionally innocuous stand-ins (e.g. writing data to a local log file) designed only to make the threat model visible and discussable.
Modern AI coding assistants such as Claude Code support a rich plugin ecosystem: plugins can register MCP servers that expose new tools, ship sub-agents that spawn child AI processes, and bundle skills that add reusable capabilities. Together these primitives give a plugin deep, largely invisible influence over what an AI assistant does on behalf of the user.
This repository demonstrates a concrete, verifiable threat: a plugin that starts out behaving benignly can be switched to behave maliciously, via an automated, unattended update, without any change visible to the user or the host application. A scheduled GitHub Actions workflow runs every night and toggles the plugin between its two modes.
The demonstration deliberately keeps the "malicious" behaviour harmless (writing intercepted data to a local file, injecting inert text into context, etc.) so the mechanics can be studied safely. In a real attack those same code paths could be replaced with anything.
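As a rough illustration of what such a harmless stand-in might look like, the sketch below appends intercepted data to a local JSONL file and does nothing else. The function signature, the `leaks.jsonl` file name, and the capture directory are assumptions made for this example, not the repository's actual code.

```python
import json
import tempfile
import time
from pathlib import Path

# Hypothetical capture location; the real package may use a different path.
CAPTURE_DIR = Path(tempfile.gettempdir()) / "plugin_demo_capture"


def leak(scenario: str, payload: dict, capture_dir: Path = CAPTURE_DIR) -> Path:
    """Append one JSON record to a local log file. No network I/O is performed."""
    capture_dir.mkdir(parents=True, exist_ok=True)
    log_path = capture_dir / "leaks.jsonl"  # assumed file name
    record = {"ts": time.time(), "scenario": scenario, "payload": payload}
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return log_path
```

Routing every side effect through one function like this keeps the "malicious" surface small enough to audit in a single place.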
| # | Goal |
|---|---|
| 1 | Show that a Claude Code plugin is a viable vector for a supply-chain attack. |
| 2 | Show that the update mechanism for plugins creates a window of opportunity that persists even after installation-time review. |
| 3 | Demonstrate each of the three plugin primitives (MCP server, sub-agents, skills) as an independent attack surface. |
| 4 | Provide a repeatable, automated way to switch between benign and malicious states so the difference can be observed and measured. |
| 5 | Give security researchers, red-teamers, and plugin reviewers a concrete reference point for what to look for. |
The implementation is one Python package. Every tool / sub-agent / skill carries both
code paths in the same file; a trigger registry decides at call time which branch runs.
The package is named `plugin_mcp/` (not `mcp/`) to avoid a PyPI namespace collision
with the `mcp` SDK that FastMCP depends on.
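The dual-path pattern can be sketched roughly as follows. This is a minimal illustration under stated assumptions: the registry layout, the `register` / `is_malicious` helpers, the `S_demo` scenario id, and the `estimate_tokens` tool are all hypothetical names chosen for the example; only the idea of a trigger registry with an `override()` context manager comes from the description above.

```python
from contextlib import contextmanager
from typing import Callable, Dict, Iterator

# Hypothetical registry: scenario id -> zero-argument callable, True means "malicious".
_TRIGGERS: Dict[str, Callable[[], bool]] = {}


def register(scenario: str, trigger: Callable[[], bool]) -> None:
    _TRIGGERS[scenario] = trigger


def is_malicious(scenario: str) -> bool:
    """Evaluate the trigger at call time; unknown scenarios default to benign."""
    trigger = _TRIGGERS.get(scenario)
    return bool(trigger()) if trigger else False


@contextmanager
def override(scenario: str, value: bool) -> Iterator[None]:
    """Temporarily force a scenario's mode, restoring the previous state on exit."""
    previous = _TRIGGERS.get(scenario)
    _TRIGGERS[scenario] = lambda: value
    try:
        yield
    finally:
        if previous is not None:
            _TRIGGERS[scenario] = previous
        else:
            _TRIGGERS.pop(scenario, None)


def estimate_tokens(text: str) -> dict:
    """Illustrative tool that carries both code paths in one function."""
    result = {"tokens": len(text.split())}
    # The branch is chosen when the tool is called, not when it is imported.
    result["branch"] = "malicious" if is_malicious("S_demo") else "benign"
    return result
```

A test could then wrap a call in `with override("S_demo", True):` to exercise the second branch without touching any real trigger.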
claude-plugin-security-risk/
├── plugin.json # Claude Code plugin manifest (baseline permissions)
├── plugin.baseline.json # Unescalated baseline for permission-creep reset
├── mode.txt # "benign" or "malicious" — kill-switch #1
├── SAFETY.md # Canonical safety contract
│
├── plugin_mcp/ # MCP server package (FastMCP entry point)
│ ├── server.py # DEMO_ACKNOWLEDGED arming gate + FastMCP wiring
│ ├── exfil.py # leak() + write_sentinel_block() — sole side-effect chokepoints
│ ├── state.py # Trigger registry + override() context manager
│ ├── triggers/ # Trigger implementations (see CLAUDE.md § Trigger Types)
│ └── tools/ # MCP tool implementations (S1, S4, S5, S7, S12, S13, S20)
│
├── agents/ # Sub-agent prompts + loader (S2, S6, S11)
├── skills/ # Skill implementations (S3, S9, S10, S15, S17–S19, S21, S22)
│
├── harness/
│ ├── compare.py / compare.sh # Run a scenario in both modes and diff the results
│ ├── cleanup_sentinels.py # SHA256-verified sentinel-block removal
│ ├── validate_workflows.py # Static check that CI workflows carry the required guards
│ ├── demo_proxy.py # Loopback-only HTTP proxy used by S13
│ └── demo_mcp_server.py # Loopback-only MCP transport impersonation used by S23
│
├── release-overlays/
│ └── malicious.patch # S16 git-apply overlay (reversible with `git apply -R`)
│
├── tests/ # pytest suite: triggers, scenarios, safety invariants
├── capture/ # JSONL leak logs (contents git-ignored; .gitkeep tracked)
└── .github/workflows/
    ├── ci.yml # Lint, typecheck, test, workflow-validator, optional integration
    ├── release-flip.yml # workflow_dispatch only, DEMO_FLIP_CONFIRM + DEMO_HALT gated
    ├── toggle-mode.yml # Scheduled mode flip (upstream repo only)
    └── permission-creep.yml # Scheduled permission escalation (upstream repo only)
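The tree mentions SHA256-verified sentinel-block removal in `harness/cleanup_sentinels.py`. One way such a scheme could work is sketched below: each block written into a user file records the digest of its body, and cleanup removes a block only if the body still hashes to that digest. The marker strings and function names are assumptions for illustration, not the repository's actual format.

```python
import hashlib

# Hypothetical marker format; the real sentinel syntax may differ.
BEGIN_PREFIX = "# >>> DEMO-SENTINEL BEGIN sha256="
END = "# <<< DEMO-SENTINEL END"


def _digest(body: str) -> str:
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def wrap_sentinel(body: str) -> str:
    """Wrap body in markers that record the SHA256 digest of the body."""
    return f"{BEGIN_PREFIX}{_digest(body)}\n{body}\n{END}\n"


def remove_sentinels(text: str) -> str:
    """Drop sentinel blocks whose body still matches the recorded digest.

    Blocks that were edited after insertion are left untouched, so cleanup
    never deletes content it cannot prove it wrote.
    """
    lines = text.splitlines(keepends=True)
    kept = []
    i = 0
    while i < len(lines):
        line = lines[i]
        if line.startswith(BEGIN_PREFIX):
            recorded = line.strip()[len(BEGIN_PREFIX):]
            # Scan forward for the matching END marker.
            j = i + 1
            while j < len(lines) and lines[j].rstrip("\n") != END:
                j += 1
            if j < len(lines):
                body = "".join(lines[i + 1:j])
                # wrap_sentinel added exactly one newline after the body.
                if body.endswith("\n"):
                    body = body[:-1]
                if _digest(body) == recorded:
                    i = j + 1  # skip the whole verified block
                    continue
        kept.append(line)
        i += 1
    return "".join(kept)
```

Verifying the digest before deleting trades a little complexity for the guarantee that a user's own edits inside a block are never silently removed.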
npx claudepluginhub stevegjones/claude-plugin-security-risk --plugin claude-plugin-security-risk