From ai-safety
Review or design the content-safety guardrails of an AI system — input/output classifiers, refusal and safe-completion behavior, escalation/human handoff, and coverage across harm categories, languages, and modalities. Use when assessing or building the safety controls around a model.
How this skill is triggered — by the user, by Claude, or both
Slash command
/ai-safety:guardrail-review
The summary Claude sees in its skill listing, used to decide when to auto-load this skill
An assessment (or design) of the guardrail stack: what's blocked, how well, where the gaps are, and whether it balances under- vs over-blocking.
… llm-security).
… harm-modeling, across languages, and across modalities (image/audio/doc; cross-ref multimodal-security). Gaps usually hide in non-English and non-text.
… safety-red-team)? Is there logging, drift monitoring, and an update process?
A guardrail review: layer · coverage · gaps · severity · recommendation, plus a target layered design. Validate changes with safety-evaluation and safety-red-team.
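The review shape listed above (layer · coverage · gaps · severity · recommendation) can be held in a small record so findings sort and render consistently. This is only a sketch: the `Finding` class, the `SEVERITY_ORDER` scale, and the `render` helper are hypothetical names chosen for illustration and are not defined by the skill.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# Hypothetical severity scale; the skill itself does not define one.
SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}


@dataclass(frozen=True)
class Finding:
    layer: str              # e.g. "input classifier", "escalation"
    coverage: str           # what the layer currently covers
    gaps: Tuple[str, ...]   # known holes; empty tuple if none were found
    severity: str           # must be a key of SEVERITY_ORDER
    recommendation: str


def sort_findings(findings: Iterable[Finding]) -> List[Finding]:
    """Order findings most severe first (stable within a severity level)."""
    return sorted(findings, key=lambda f: SEVERITY_ORDER[f.severity])


def render(findings: Iterable[Finding]) -> str:
    """Render one line per finding in the five-column review shape."""
    lines = []
    for f in sort_findings(findings):
        gaps = ", ".join(f.gaps) if f.gaps else "none found"
        lines.append(" · ".join([f.layer, f.coverage, gaps, f.severity, f.recommendation]))
    return "\n".join(lines)
```

Sorting by severity before rendering keeps the most urgent gaps at the top of the review, whatever order the layers were inspected in.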
Guardrails are defense-in-depth, not a single classifier: combine input, output, refusal, escalation, and monitoring. The two most common gaps are non-English/non-text coverage and over-refusal that quietly breaks legitimate use.
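As a rough illustration of that layering, the sketch below runs one request through an input check, the model, and an output check, and records each verdict for monitoring. Everything here is assumed for the example: `guarded_reply`, the `Decision` record, the three verdict strings, and the canned refusal and handoff messages are hypothetical, and real classifiers would replace the callables passed in.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical canned responses for the refusal and handoff paths.
REFUSAL = "Sorry, I can't help with that."
HANDOFF = "Connecting you with a person."


@dataclass
class Decision:
    action: str                      # "allow", "refuse", or "escalate"
    reply: str
    log: List[str] = field(default_factory=list)  # per-layer verdicts, for monitoring


def guarded_reply(
    prompt: str,
    model: Callable[[str], str],
    input_check: Callable[[str], str],
    output_check: Callable[[str], str],
) -> Decision:
    """Apply input and output checks around a model call.

    Each check returns "allow", "block", or "escalate" (an assumed contract).
    """
    log: List[str] = []

    def resolve(stage: str, verdict: str):
        # Record the verdict, then map non-allow verdicts to a final decision.
        log.append(f"{stage}:{verdict}")
        if verdict == "block":
            return Decision("refuse", REFUSAL, log)
        if verdict == "escalate":
            return Decision("escalate", HANDOFF, log)
        return None

    stopped = resolve("input", input_check(prompt))
    if stopped:
        return stopped           # the model is never called on a blocked input

    draft = model(prompt)
    stopped = resolve("output", output_check(draft))
    if stopped:
        return stopped           # the draft is withheld from the user

    return Decision("allow", draft, log)
```

Keeping the verdict log on every decision, including allowed ones, is what makes over-refusal measurable: refusal rates per layer can be compared against a set of known-legitimate prompts.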
npx claudepluginhub jassics/awesome-claude-security --plugin ai-safety