From agentic-ai-security
Test what an AI agent will actually do without human confirmation, including under injected-goal / prompt-injection scenarios, to validate its autonomy and approval boundaries. Use on an authorized agent to confirm excessive-agency controls hold in practice.
How this skill is triggered — by the user, by Claude, or both
Slash command
/agentic-ai-security:autonomy-boundary-testThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Evidence on whether the agent's autonomy limits hold: that high-impact,
Evidence on whether the agent's autonomy limits hold: that high-impact, irreversible, or externally-visible actions require human approval and cannot be reached unintentionally or via injected goals.
rag-security:retrieval-poisoning-test).
Does content-borne instruction reach a tool call?A results table: case · trigger · expected gate · actual behavior · evidence ·
mitigation. Confirmed boundary failures → security-reporting:finding (high+ when
irreversible/external actions execute without approval).
Test in a sandbox with mock tools so a "passing" attack doesn't actually send the email, make the payment, or delete the data. The most serious finding is any content-borne instruction (case 3) reaching a real action — that's prompt injection turned into agency.
Provides CDSS development patterns for drug interaction checking, dose validation, clinical scoring (NEWS2, qSOFA), and alert classification integrated into EMR workflows.
npx claudepluginhub jassics/awesome-claude-security --plugin agentic-ai-security