From secure-sdlc-agents
Analyzes security risks in AI/LLM features including prompt injection, excessive agency, RAG systems, agents, and output handling per OWASP Top 10 for LLMs.
How this skill is triggered — by the user, by Claude, or both
Slash command
/secure-sdlc-agents:ai-securityThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
This skill applies structured security analysis to AI and LLM-powered features.
This skill applies structured security analysis to AI and LLM-powered features. The threat categories here — prompt injection, excessive agency, output misuse, supply chain — did not exist before 2023 and are still being misunderstood by most developers shipping AI features today.
Working assumption: every model is a trust boundary, not a trusted component. Model outputs must be treated as untrusted user input to every downstream system.
Reference framework: OWASP Top 10 for LLMs 2025 (LLM01–LLM10).
Before finding vulnerabilities, enumerate:
| Question | Why it matters |
|---|---|
| Who sends input to the model? | Determines direct injection risk |
| What external sources feed the prompt context? | Determines indirect injection risk |
| What tools / functions can the model invoke? | Determines excessive agency blast radius |
| What happens to the model's output? | Determines output handling risk |
| Is user PII sent to a third-party API? | Determines data leakage and legal risk |
| Where does the model or its weights come from? | Determines supply chain risk |
Input trust classification:
| Input Source | Trust Level | Injection Risk |
|---|---|---|
| Authenticated user (UI) | LOW | Direct prompt injection |
| Public / unauthenticated user | UNTRUSTED | Direct + jailbreak attempts |
| Retrieved document (RAG) | UNTRUSTED | Indirect prompt injection |
| Tool / function call result | MEDIUM | Injection via external API response |
| Database query result | MEDIUM | Injection via poisoned records |
| Web scraping / search | UNTRUSTED | Indirect injection |
Mitigations to verify:
Excessive agency is the most dangerous risk for agentic systems. A model tricked via prompt injection into misusing its tool access can exfiltrate data, delete records, or send external requests — all without the user's knowledge.
Review checklist:
Key principle: model outputs are untrusted input. Validate before acting. Require explicit human confirmation for destructive or high-value operations.
| Model output used as… | Risk | Required mitigation |
|---|---|---|
| Rendered in HTML / DOM | Stored XSS | DOMPurify, output encoding |
| Executed as code | Remote code execution | Never execute model output directly |
| Inserted into SQL queries | SQL injection | Parameterise all queries; validate schema |
| Used in HTTP requests | SSRF | Validate and allowlist URLs from model output |
| Passed to shell commands | Command injection | Never pass model output to shell |
| Used as a file path | Path traversal | Validate against allowlist of permitted paths |
| Used for access control decisions | Privilege escalation | Never use model output for authorisation alone |
Supply chain:
Data leakage:
## AI Security Review: [Feature Name]
### Attack Surface Summary
[Inputs, model access, tools available, output usage]
### Threat Findings
| ID | OWASP LLM Category | Severity | Description | Mitigation |
|----|--------------------|----------|-------------|------------|
| AI-001 | LLM01: Prompt Injection | HIGH | [Description] | [Concrete fix] |
### Mitigations Required Before Release
[Priority list with owners and references]
### Accepted Risks
[Any risks accepted with justification and approver]
| Excuse | Counter |
|---|---|
| "The model won't do harmful things — it's aligned" | Alignment is not a security boundary. Prompt injection bypasses alignment systematically. |
| "Our users are trusted — no injection risk" | Indirect injection comes from retrieved documents, not users. Malicious content in your RAG source is an injection vector. |
| "We validate the model output in the UI" | XSS prevention in the UI is correct but insufficient. Validate at every trust boundary, not just display. |
| "It's a read-only agent — no write tools" | Is it truly read-only? Check every tool definition. HTTP GET requests can trigger side effects in external systems. |
| "We use a well-known model — supply chain is fine" | Supply chain risk includes fine-tunes, LoRA adapters, embedding models, and model API intermediaries — not just the base model. |
| "We'll add rate limiting later" | LLM cost exhaustion attacks (LLM10) are cheaper than traditional DoS. Rate limit before you ship. |
Do not close this review until:
docs/threat-model.md)grc-analyst for GDPR/compliance implicationsnpx claudepluginhub kaademos/secure-sdlc-agents --plugin secure-sdlc-agentsReviews AI/LLM applications for security risks including prompt injection, RAG security, agent permissioning, jailbreaks, data leakage, and model supply chain threats.
Audit applications for AI prompt injection, agent security, and LLM permission boundary vulnerabilities. Use when securing AI features or agents.
Mitigate prompt injection risks in LLM-based systems. Use when designing, building, or reviewing AI systems that accept user prompts, or when evaluating model safety for deployment.