Scans AI-agent, LLM, MCP-server, and tool-calling code for security vulnerabilities using ast-grep rules, then triages findings with concrete fixes. Catches prompt injection, hardcoded credentials, denial-of-wallet, tool poisoning, SSRF, and more.
Slash command
/agent-security:agent-security-review [path-to-code]
Run the raxIT **agent-security-review** pack (an [ast-grep](https://ast-grep.github.io/) ruleset) against AI-agent / LLM / MCP code, then turn the matches into a prioritized report with the architectural fix for each finding.
This deliberately pairs two things: a deterministic scanner (fast, reproducible, consistent) and your own review (catches what patterns can't). Treat the scanner as a floor, not a ceiling — ast-grep matches fixed patterns, so it reliably finds the shapes it knows but silently misses variants and whole classes it has no rule for. A report that only repeats scanner output will quietly miss real, obvious vulnerabilities. Your job is to run the scanner and read the code, then produce a report the user can trust.
One thing this pack is not: a general-purpose SAST tool. Every rule targets an agent failure mode — prompt injection, tool dispatch, agent loops, MCP, memory, agent identity. That focus is the point, and it means the most important judgment call you make is whether the target is even an agentic codebase (see step 3). On a plain library or a model-training repo, a clean scan is the correct answer, not a gap to paper over.
The rules are pulled live from GitHub so the checks stay current as the pack grows, with a bundled copy as an offline fallback. More on why below.
The scan needs the ast-grep CLI. Check with command -v ast-grep. If it's missing, install it with whatever package manager the user has:
- `brew install ast-grep` (macOS)
- `npm i -g @ast-grep/cli`
- `pip install ast-grep-cli` (or `uv tool install ast-grep-cli`)
- `cargo install ast-grep --locked`

Installing software changes the user's machine, so on an unfamiliar or shared setup, say what you're about to run before you run it.
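The availability check can be sketched in Python as follows. This is a minimal illustration, not part of the skill: the `suggest_install` helper and the `INSTALL_COMMANDS` table are hypothetical names, and the commands are copied from the install options listed here. It only reports which commands are usable; it never runs them, so the user can be told first.

```python
import shutil

# Install commands from the skill text, keyed by the package-manager binary
# that must be present for the command to work.
INSTALL_COMMANDS = {
    "brew": "brew install ast-grep",
    "npm": "npm i -g @ast-grep/cli",
    "pip": "pip install ast-grep-cli",
    "uv": "uv tool install ast-grep-cli",
    "cargo": "cargo install ast-grep --locked",
}


def suggest_install(which=shutil.which):
    """Return None if ast-grep is on PATH, else the usable install commands.

    `which` is injectable so the logic can be exercised without touching PATH.
    """
    if which("ast-grep"):
        return None
    return [cmd for manager, cmd in INSTALL_COMMANDS.items() if which(manager)]
```

An empty list means no known package manager was found, in which case asking the user how they prefer to install is probably the safest next step.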
Resolve the target path first: if the skill was invoked as /agent-security-review <path>, the path is $ARGUMENTS. If $ARGUMENTS is empty, use the path the user named, or the current directory (.) as the default.
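The precedence described here is small enough to write down. The sketch below assumes the raw `$ARGUMENTS` string and any path the user mentioned are passed in as plain values; `resolve_target` is a hypothetical helper name, not something the skill ships.

```python
from typing import Optional


def resolve_target(arguments: str, named_path: Optional[str] = None) -> str:
    """Pick the scan target: $ARGUMENTS first, then a user-named path, then '.'."""
    arguments = arguments.strip()
    if arguments:
        return arguments
    if named_path:
        return named_path
    return "."
```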
bash ${CLAUDE_SKILL_DIR}/scripts/scan.sh <path-to-code> > findings.json
scan.sh fetches the latest rules from GitHub (main), caches them locally, and falls back to the bundled copy in references/rules-vendored/ if GitHub is unreachable. It writes ast-grep's JSON findings to stdout.
For a reproducible scan (CI, audits), pin a commit or tag so the result can't drift:
bash ${CLAUDE_SKILL_DIR}/scripts/scan.sh <path-to-code> --pin <sha-or-tag> > findings.json
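The fetch, cache, and fallback behavior can be pictured with the sketch below. It is not the contents of scan.sh, whose internals are not shown here. The `choose_rules_dir` function, the injected `fetch` callable, and the per-ref cache layout are all assumptions made for illustration, as is the choice to prefer a stale cache over the bundled copy.

```python
from pathlib import Path
from typing import Callable, Optional


def choose_rules_dir(
    fetch: Callable[[str, Path], None],
    cache_dir: Path,
    vendored_dir: Path,
    pin: Optional[str] = None,
) -> Path:
    """Return the rules directory a scan would use.

    fetch(ref, dest) is a hypothetical downloader that raises OSError when
    GitHub is unreachable. It is injected so the logic runs offline.
    """
    ref = pin or "main"      # a pinned sha or tag replaces the moving branch
    dest = cache_dir / ref   # assumed layout: one cache directory per ref
    try:
        fetch(ref, dest)     # refresh the local cache for this ref
        return dest
    except OSError:
        # Assumption: a cache left by an earlier run beats the bundled copy.
        if dest.is_dir():
            return dest
        return vendored_dir
```

One design question this surfaces: with a pin, silently falling back to the bundled rules would defeat reproducibility, so a stricter variant might fail instead of falling back when `pin` is set.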
This step exists because the pack is agent-specific. Forcing an agent-security narrative onto code that has no agents produces a confident, useless report — the worst outcome. So classify the target first, then scope the review to match. Use what you can see: dependencies (pyproject.toml / requirements.txt / package.json), imports, and the actual entry points — not the repo's marketing.
Sort the target into one of three buckets:
Agentic — it builds or runs an agent / LLM app / MCP server / tool-calling system. Signals: an agent framework (LangChain/LangGraph, OpenAI Agents SDK, CrewAI, LlamaIndex, AutoGen/AG2, Anthropic SDK with tools), an MCP server/client, a tool/function dispatch loop, retrieval + generation, or prompts assembled from untrusted input. → Full review. Every component applies; do the scan and the manual pass.
AI-adjacent, not an agent — it touches LLMs but isn't an agent: a model-training / fine-tuning library, an inference kernel, an eval harness, a thin SDK wrapper, a prompt-template collection. → Scoped review. State plainly that the agent ruleset mostly doesn't apply and why. Report only what genuinely transfers (hardcoded secrets, eval/exec on external input, SSRF, unsafe deserialization of model files are universal). Do not manufacture agent-architecture findings (no "missing gateway auth" on a library that has no gateway).
Not AI at all — a plain library, CLI, web app, or data pipeline with no LLM/agent surface. → Out of scope. Say so in two sentences, note that the scan found nothing agent-specific (it won't have), and suggest a general SAST tool (Semgrep, CodeQL, Bandit) instead. Stop. A short honest "this isn't what the pack is for" beats a padded report every time.
When you're genuinely unsure, say which bucket you picked and why, and let the user redirect. Misclassifying down (treating an agent as a library) is worse than asking.
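A first-pass version of this three-bucket sort could look like the sketch below. The package names in both sets are illustrative guesses at how the frameworks named above appear in a dependency file, not an authoritative list, and `classify` is a hypothetical helper. Dependency names alone are only a hint; imports and entry points still need to be read before settling on a bucket.

```python
# Illustrative dependency names only (assumptions, not a complete catalog).
AGENT_SIGNALS = {
    "langchain", "langgraph", "openai-agents", "crewai",
    "llama-index", "autogen", "ag2", "mcp",
}
AI_ADJACENT_SIGNALS = {"torch", "transformers", "openai", "anthropic", "vllm"}


def classify(dependencies):
    """Sort a target into 'agentic', 'ai-adjacent', or 'not-ai'.

    Agent signals are checked first, so a repo showing both kinds of signal
    is treated as agentic rather than classified down.
    """
    names = {dep.lower() for dep in dependencies}
    if names & AGENT_SIGNALS:
        return "agentic"
    if names & AI_ADJACENT_SIGNALS:
        return "ai-adjacent"
    return "not-ai"
```

Checking agent signals first reflects the point that misclassifying down is the more costly error.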
uv run ${CLAUDE_SKILL_DIR}/scripts/report.py findings.json
# or: ... | uv run ${CLAUDE_SKILL_DIR}/scripts/report.py -
report.py groups the scanner's matches by severity and surfaces the "Architectural fix:" each rule carries. That's the deterministic floor. Now make the report worth trusting:
- Read the code for what patterns miss: LLM output reaching dangerous sinks (`eval`, SQL, shell, unsafe/raw-HTML rendering), SSRF and path traversal, secrets written to logs, unbounded agent loops, and tool dispatch with no authorization. If you find one the scanner didn't, include it and label it manual.
- Down-rank test and example code (`tests/`, `evals/`, `examples/`, `conftest.py`): an env-var read or ungated call in a test isn't shipped attack surface. Note it, don't rank it as a real finding.
- Drop non-secrets: `CORS_ALLOW_ORIGINS`, a release tag, or a feature flag flagged as a "module-scope secret." Check the name and value, not just the `X = os.environ[...]` shape.
- Drop empty or placeholder values: `api_key = ""` or a bare prefix like `"hf_"` is not a hardcoded credential.
- Fix wrong framework attribution (e.g. a finding that mentions `@function_tool` when the code is LangChain `@tool`). The risk class is usually still right; correct the attribution and move on.
- Read exit codes correctly: `scan.sh` and `report.py` both exit non-zero when an error-severity finding exists. That's the gate working, not a crash. Don't report a successful scan as a tool failure.

Use this exact skeleton so every report is consistent and skimmable. Lead with the verdict and the applicability call; put the table before the prose.
# Agent Security Review — <target>
**Applicability:** <Agent ✓ (full ruleset) | AI-adjacent (scoped) | Not an agent app (out of scope)> — <one clause why>
**Verdict:** <one line — e.g. "1 high, 3 medium; fix H1 before deploy" or "No agent-security issues; not an agentic codebase">
**Scope of this review:** ast-grep (<N> rules) — <X> matches · manual pass: <files/areas read, or "n/a — out of scope">
<!-- If "Not an agent app": write 2–3 sentences on why the pack doesn't apply + what (if anything) universal was checked, then STOP. No findings table of invented issues. -->
## Findings
| # | Sev | Rule / class | Location | Source | Confidence |
|---|-----|--------------|----------|--------|------------|
| H1 | 🔴 high | prompt-injection-sink | report.py:74 | manual | high |
| M1 | 🟠 med | sign.mcp-tool-without-allowlist | agents.py:148 | scanner | med |
Legend — Severity: 🔴 high/error · 🟠 medium/warning · 🟡 low/info. Source: scanner | manual.
## Details
<one block per finding, ordered by severity>
### H1 · prompt-injection-sink · report.py:74
- **What:** the actual data flow in *this* code (e.g. scraped page → LLM synthesis → `st.markdown(unsafe_allow_html=True)`)
- **Why it matters:** concrete impact tied to their threat model
- **Fix:** the architectural change, not just a patch
- **Coverage note:** scanner matched | scanner missed (pack blind spot worth a rule)
## What's solid
- controls the code gets right (keep the report honest and balanced)
## Coverage & gaps
- what the scan covered, what the manual pass covered, what was **not** looked at, and known pack blind spots
## Priority actions
1. highest-impact fix first …
For an out-of-scope target the report is just the header block plus the short explanation — that is a complete, correct report.
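For a target that is in scope, the findings table can be generated mechanically once the triage is done. The sketch below is one way to do that, assuming each triaged finding is a dict with the hypothetical keys `severity`, `rule`, `location`, `source`, and `confidence`. The `L` prefix for low-severity IDs is also an assumption, since the skeleton only shows `H1` and `M1`.

```python
# Scanner severity -> (ID prefix, table label), following the skeleton's legend.
SEVERITY = {
    "error": ("H", "🔴 high"),
    "warning": ("M", "🟠 med"),
    "info": ("L", "🟡 low"),  # "L" prefix is an assumption
}
ORDER = ["error", "warning", "info"]


def render_rows(findings):
    """Return markdown table rows, highest severity first, numbered per severity.

    Findings with a severity outside ORDER are skipped in this sketch.
    """
    rows = []
    for severity in ORDER:
        prefix, label = SEVERITY[severity]
        matching = [f for f in findings if f["severity"] == severity]
        for number, f in enumerate(matching, start=1):
            rows.append(
                f"| {prefix}{number} | {label} | {f['rule']} | "
                f"{f['location']} | {f['source']} | {f['confidence']} |"
            )
    return rows
```

Fed the two sample findings from the skeleton, this produces the same two rows shown in its table.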
To fail a pipeline only on real problems:
bash ${CLAUDE_SKILL_DIR}/scripts/scan.sh <path> > findings.json
uv run ${CLAUDE_SKILL_DIR}/scripts/report.py findings.json --gate # exits 1 if any error-severity finding
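The gate's decision rule is simple to state in code. The sketch below is not taken from report.py; it assumes each finding is a dict with a `severity` field, as in ast-grep's JSON output, and `gate_exit_code` is a hypothetical name.

```python
def gate_exit_code(findings):
    """Return 1 if any finding has error severity, otherwise 0.

    Warnings and info-level findings never fail the pipeline in this sketch.
    """
    has_error = any(f.get("severity") == "error" for f in findings)
    return 1 if has_error else 0
```

A CI step that sees exit code 1 here should read it as "blocking finding present", not as a tool crash.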
The pack grows — new agent frameworks, new attack classes. Pulling main on each run means every scan uses the newest checks without re-installing the skill, and the repo stays the single source of truth. The local cache plus the bundled fallback keep it working offline and on the network-less Claude API; --pin keeps a given scan reproducible when freshness matters less than determinism. ast-grep rules are declarative YAML (data, not executable code), so fetching them at runtime is low-risk.
The pack's rules span seven components:

- agent: unbounded loops, denial-of-wallet, God-Agent sprawl, Rule-of-Two
- identity: hardcoded credentials, plaintext DB, ambient identity / Confused Deputy
- control-flow: LLM output to sinks/control flow, secrets in logs, ungated tool dispatch
- mcp: tool poisoning, prompt injection in tool descriptions, SSRF, missing allowlists
- memory: shared-store poisoning, tainted-read exfiltration, unauthorized writes
- skills: lethal-trifecta tool combos, permission bypass, missing descriptions
- gateway: unauthenticated endpoints, gateway bypass, missing policy engine

Python and TypeScript today; the pack is expanding.
The exact, current rule count lives in the catalog — don't quote a hardcoded number, link the catalog: https://github.com/raxITlabs/agent-security-review/blob/main/docs/RULES.md
- report.py tolerates both the pretty array and the streaming (--json=stream) shapes, and reads from a file or stdin.
- metadata (with --include-metadata, which scan.sh sets) carries the component and architectural shift, so no external lookup is needed.
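Handling both output shapes comes down to telling a single JSON array apart from one JSON object per line. The sketch below shows one way to do that and then group matches by severity. It is an illustration under stated assumptions, not report.py's actual code: `load_findings` and `group_by_severity` are hypothetical names, and the `severity` and `ruleId` fields are assumed to be present as in ast-grep's JSON output.

```python
import json
from collections import defaultdict


def load_findings(text):
    """Parse scanner output given as a JSON array or as one object per line."""
    text = text.strip()
    if not text:
        return []
    if text.startswith("["):
        return json.loads(text)  # pretty or compact array
    # Streaming shape: every non-empty line is its own JSON object.
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def group_by_severity(findings):
    """Bucket findings by their severity field ('unknown' if it is missing)."""
    groups = defaultdict(list)
    for finding in findings:
        groups[finding.get("severity", "unknown")].append(finding)
    return dict(groups)
```

Reading from a file or stdin then only changes where `text` comes from, for example `sys.stdin.read()` when the argument is `-`.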
npx claudepluginhub raxitlabs/agent-security-review --plugin agent-security