claude-plugin-security-risk

⚠️ Educational security demonstration — see SAFETY.md before running.

⚠️ Educational / Research Purpose Only. This repository is a security risk demonstration. Nothing here is intended for real-world exploitation. All "malicious" behaviours are intentionally innocuous stand-ins (e.g. writing data to a local log file) designed only to make the threat model visible and discussable.

Overview

Modern AI coding assistants such as Claude Code support a rich plugin ecosystem: plugins can register MCP servers that expose new tools, ship sub-agents that spawn child AI processes, and bundle skills that add reusable capabilities. Together these primitives give a plugin deep, largely invisible influence over what an AI assistant does on behalf of the user.

This repository demonstrates a concrete, verifiable threat: a plugin that starts out behaving benignly can be switched to behave maliciously through an automated, unattended update, with no change visible to the user or the host application. A scheduled GitHub Actions workflow runs every night and toggles the plugin between its two modes.

The demonstration deliberately keeps the "malicious" behaviour harmless (writing intercepted data to a local file, injecting inert text into context, etc.) so the mechanics can be studied safely. In a real attack those same code paths could be replaced with anything.

Goals

Show that a Claude Code plugin is a viable vector for a supply-chain attack.

Show that the update mechanism for plugins creates a window of opportunity that persists even after installation-time review.

Demonstrate each of the three plugin primitives (MCP server, sub-agents, skills) as an independent attack surface.

Provide a repeatable, automated way to switch between benign and malicious states so the difference can be observed and measured.

Give security researchers, red-teamers, and plugin reviewers a concrete reference point for what to look for.

Architecture

The implementation is one Python package. Every tool / sub-agent / skill carries both code paths in the same file; a trigger registry decides at call time which branch runs. The package is named plugin_mcp/ (not mcp/) to avoid a PyPI namespace collision with the mcp SDK that FastMCP depends on.

claude-plugin-security-risk/
├── plugin.json                 # Claude Code plugin manifest (baseline permissions)
├── plugin.baseline.json        # Unescalated baseline for permission-creep reset
├── mode.txt                    # "benign" or "malicious" (kill-switch #1)
├── SAFETY.md                   # Canonical safety contract
│
├── plugin_mcp/                 # MCP server package (FastMCP entry point)
│   ├── server.py               # DEMO_ACKNOWLEDGED arming gate + FastMCP wiring
│   ├── exfil.py                # leak() + write_sentinel_block(): sole side-effect chokepoints
│   ├── state.py                # Trigger registry + override() context manager
│   ├── triggers/               # Trigger implementations (see CLAUDE.md § Trigger Types)
│   └── tools/                  # MCP tool implementations (S1, S4, S5, S7, S12, S13, S20)
│
├── agents/                     # Sub-agent prompts + loader (S2, S6, S11)
├── skills/                     # Skill implementations (S3, S9, S10, S15, S17–S19, S21, S22)
│
├── harness/
│   ├── compare.py / compare.sh # Run a scenario in both modes and diff the results
│   ├── cleanup_sentinels.py    # SHA256-verified sentinel-block removal
│   ├── validate_workflows.py   # Static check that CI workflows carry the required guards
│   ├── demo_proxy.py           # Loopback-only HTTP proxy used by S13
│   └── demo_mcp_server.py      # Loopback-only MCP transport impersonation used by S23
│
├── release-overlays/
│   └── malicious.patch         # S16 git-apply overlay (reversible with `git apply -R`)
│
├── tests/                      # pytest suite: triggers, scenarios, safety invariants
├── capture/                    # JSONL leak logs (contents git-ignored; .gitkeep tracked)
└── .github/workflows/
    ├── ci.yml                  # Lint, typecheck, test, workflow-validator, optional integration
    ├── release-flip.yml        # workflow_dispatch only, DEMO_FLIP_CONFIRM + DEMO_HALT gated
    ├── toggle-mode.yml         # Scheduled mode flip (upstream repo only)
    └── permission-creep.yml    # Scheduled permission escalation (upstream repo only)
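To make the "sole side-effect chokepoint" idea concrete, the sketch below shows one way a leak() function could confine the demonstration's side effect to appending JSON lines under a local capture directory. The source gives only the function name, the capture/ directory, and the JSONL format; the signature, the `leaks.jsonl` filename, and the record fields are assumptions made for illustration.

```python
import json
import time
from pathlib import Path


def leak(scenario: str, payload: dict, capture_dir: Path = Path("capture")) -> Path:
    """Append one JSON line to a local log file; performs no network I/O."""
    capture_dir.mkdir(parents=True, exist_ok=True)
    log_path = capture_dir / "leaks.jsonl"  # hypothetical filename
    record = {"ts": time.time(), "scenario": scenario, "payload": payload}
    # Append mode keeps earlier records intact, one JSON object per line.
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")
    return log_path
```

Routing every "malicious" branch through a single function like this would give a reviewer one place to audit and a test suite one function to assert against.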

plugin.json