Use when LLM inputs or outputs must be validated for safety, policy compliance, schema conformance, or content appropriateness before they reach users or downstream systems. Apply when LLM responses could contain harmful content, PII leakage, prompt injection, off-topic responses, or policy violations. Covers input validation, output validation, content filtering, and prompt injection defence.
Slash command: /llm-patterns:guardrails
- Users can inject instructions into prompts via their input (prompt injection)
Guardrails are a Chain of Responsibility applied to AI safety. Each guard inspects the input or output and can pass, modify, or block it. The guards are independent, composable, and ordered by cost (cheapest first).
User input → [Input Guards] → LLM → [Output Guards] → User response
                   │                       │
                   ▼                       ▼
             Block/modify            Block/modify
              before LLM            before delivery
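The guards in this document return a `GuardResult` and are typed as `Guard`, but neither is defined in the source. The following is a minimal sketch of what they might look like, assuming the three actions named above (pass, modify, block) and the constructor names the guards call. The `details` field and the `Protocol` definition are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Literal, Protocol


@dataclass(frozen=True)
class GuardResult:
    """Outcome of one guard: pass the content on, rewrite it, or stop the request."""

    action: Literal["pass", "modify", "block"]
    reason: str | None = None
    modified: str | None = None  # replacement content when action == "modify"
    details: dict[str, Any] = field(default_factory=dict)  # assumed: extra audit data

    @classmethod
    def pass_through(cls) -> GuardResult:
        return cls(action="pass")

    @classmethod
    def modify(cls, reason: str, modified: str) -> GuardResult:
        return cls(action="modify", reason=reason, modified=modified)

    @classmethod
    def block(cls, reason: str, **details: Any) -> GuardResult:
        # Extra keyword arguments (for example pattern=...) are kept for audit logs.
        return cls(action="block", reason=reason, details=details)


class Guard(Protocol):
    """Structural type: any object with a matching check() method is a guard."""

    def check(self, content: str, **context: Any) -> GuardResult: ...
```

Using a `Protocol` rather than a base class keeps the guards independent: a guard only needs a `check` method with this shape, not a shared parent.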
Validate and sanitise user input before it reaches the LLM.
Detect attempts to override the system prompt or inject new instructions.
import re


class PromptInjectionGuard:
    """Detects common prompt injection patterns in user input."""

    # Matched against the lowercased input, so the patterns are lowercase.
    INJECTION_PATTERNS = [
        r"ignore (all |any )?(previous|prior|above) (instructions|prompts)",
        r"you are now",
        r"new instructions:",
        r"system:\s",
        r"<\|.*\|>",  # common delimiter injection
    ]

    def check(self, user_input: str, **_context) -> GuardResult:
        # **_context: extra pipeline context is accepted and ignored.
        lowered = user_input.lower()
        for pattern in self.INJECTION_PATTERNS:
            if re.search(pattern, lowered):
                return GuardResult.block(
                    reason="Potential prompt injection detected",
                    pattern=pattern,
                )
        return GuardResult.pass_through()
Note: regex-based detection catches obvious cases. For production systems, use a classifier or a dedicated content moderation API as a second layer.
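One way to layer the two approaches is sketched below: the regex screen runs first, and only inputs that survive it are sent to the classifier. The `scorer` callable, its 0.0 to 1.0 return range, and the `threshold` default are all hypothetical; a real classifier or moderation API will have its own interface.

```python
import re
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical signature: returns the estimated probability (0.0 to 1.0)
# that the text is a prompt injection attempt.
InjectionScorer = Callable[[str], float]


def layered_injection_check(
    user_input: str,
    patterns: Sequence[str],
    scorer: InjectionScorer,
    threshold: float = 0.8,
) -> Tuple[str, Optional[str]]:
    """Return ("block", reason) or ("pass", None)."""
    lowered = user_input.lower()
    # Layer 1: cheap regex screen; a hit short-circuits the expensive call.
    for pattern in patterns:
        if re.search(pattern, lowered):
            return "block", f"regex match: {pattern}"
    # Layer 2: only inputs that survive the regex screen reach the classifier.
    score = scorer(user_input)
    if score >= threshold:
        return "block", f"classifier score {score:.2f} >= {threshold}"
    return "pass", None
```

Running the regex layer first follows the cost ordering described earlier: obvious attacks never incur a classifier call.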
Prevent excessively long inputs that waste tokens or could be used for context stuffing.
class InputLengthGuard:
    def __init__(self, max_tokens: int = 4000):
        self._max = max_tokens

    def check(self, user_input: str, **_context) -> GuardResult:
        # **_context: extra pipeline context is accepted and ignored.
        tokens = count_tokens(user_input)
        if tokens > self._max:
            return GuardResult.block(
                reason=f"Input too long: {tokens} tokens (max {self._max})"
            )
        return GuardResult.pass_through()
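`count_tokens` is not defined in the source. As a placeholder, the sketch below estimates the count with the common rule of thumb of roughly four characters per token for English text. This is an assumption, not a tokenizer: for an accurate limit, use the tokenizer that matches the target model.

```python
import math


def count_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate based on character length; NOT a real tokenizer."""
    if not text:
        return 0
    # Round up so a partial token still counts against the limit.
    return math.ceil(len(text) / chars_per_token)
```

Because the estimate can be off in either direction (code and non-English text often tokenize less efficiently), it is safer to set the guard's limit somewhat below the model's real budget when using a heuristic like this.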
Restrict the LLM to its intended domain — prevent off-topic queries from consuming resources.
class TopicGuard:
    def __init__(self, allowed_topics: list[str], classifier: TopicClassifier):
        self._allowed = allowed_topics
        self._classifier = classifier

    def check(self, user_input: str, **_context) -> GuardResult:
        # **_context: extra pipeline context is accepted and ignored.
        topic = self._classifier.classify(user_input)
        if topic not in self._allowed:
            return GuardResult.block(
                reason=f"Off-topic query (classified as '{topic}')"
            )
        return GuardResult.pass_through()
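`TopicClassifier` is left undefined in the source; the guard only requires an object with a `classify(text)` method that returns a topic label. A deliberately simple stand-in, assuming keyword matching is good enough for a first pass, might look like this. The class name, keyword sets, and `"other"` fallback label are all hypothetical.

```python
import re
from typing import Dict, Set


class KeywordTopicClassifier:
    """Hypothetical stand-in for TopicClassifier: picks the topic with the most keyword hits."""

    def __init__(self, keywords: Dict[str, Set[str]], fallback: str = "other"):
        self._keywords = keywords
        self._fallback = fallback

    def classify(self, text: str) -> str:
        words = set(re.findall(r"[a-z']+", text.lower()))
        best_topic, best_hits = self._fallback, 0
        for topic, vocab in self._keywords.items():
            hits = len(words & vocab)
            if hits > best_hits:
                best_topic, best_hits = topic, hits
        # No keyword matched at all: return the fallback so TopicGuard blocks it.
        return best_topic


clf = KeywordTopicClassifier({
    "billing": {"invoice", "refund", "charge"},
    "support": {"error", "crash", "login"},
})
print(clf.classify("I need a refund for this invoice"))
```

A keyword classifier is cheap enough to sit early in the pipeline; an embedding-based or LLM-based classifier would be more accurate but belongs later in the cost ordering.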
Validate LLM output before it reaches the user or downstream systems.
Scan for personally identifiable information that should not be in the response.
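import re


class PIIGuard:
    """Detects PII patterns in LLM output and redacts every match."""

    PII_PATTERNS = {
        "email": r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}",
        "phone": r"\b\d{3}[-.]?\d{3}[-.]?\d{4}\b",
        "ssn": r"\b\d{3}-\d{2}-\d{4}\b",
        "credit_card": r"\b\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}\b",
    }

    def check(self, output: str, **_context) -> GuardResult:
        # **_context: extra pipeline context is accepted and ignored.
        redacted = output
        found = []
        # Scan for every PII type; returning on the first hit would leave
        # the other types unredacted in the delivered text.
        for pii_type, pattern in self.PII_PATTERNS.items():
            redacted, count = re.subn(
                pattern, f"[REDACTED {pii_type.upper()}]", redacted
            )
            if count:
                found.append(pii_type)
        if found:
            return GuardResult.modify(
                reason=f"PII detected: {', '.join(found)}",
                modified=redacted,
            )
        return GuardResult.pass_through()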
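The `credit_card` pattern matches any run of 16 digits, so order numbers or tracking IDs can be redacted by mistake. One possible refinement, sketched here as an assumption rather than part of the original guard, is to redact only candidates that pass the Luhn checksum used by payment card numbers. The function names are hypothetical.

```python
import re

# Same pattern as the credit_card entry in PII_PATTERNS.
CARD_PATTERN = re.compile(r"\b\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}\b")


def passes_luhn(number: str) -> bool:
    """Return True if the digits in `number` satisfy the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not digits:
        return False
    total = 0
    # Walk right to left, doubling every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def redact_card_numbers(text: str) -> str:
    """Redact 16-digit sequences only when they pass the Luhn check."""

    def replace(match: "re.Match[str]") -> str:
        candidate = match.group(0)
        return "[REDACTED CREDIT_CARD]" if passes_luhn(candidate) else candidate

    return CARD_PATTERN.sub(replace, text)
```

The trade-off is recall: a mistyped card number fails the checksum and is left in place, so systems that prefer over-redaction may want to keep the plain regex.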
Verify that the LLM's claims are supported by the retrieved context.
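class GroundingGuard:
    def check(self, output: str, context: str = "", **_extra) -> GuardResult:
        current = output
        unsupported = []
        # Check every claim; stopping at the first unsupported one would
        # leave later unsupported claims without a disclaimer.
        for claim in extract_factual_claims(output):
            if not is_supported_by_context(claim, context):
                unsupported.append(claim)
                current = add_disclaimer(current, claim)
        if unsupported:
            return GuardResult.modify(
                reason=f"Unsupported claims: {unsupported}",
                modified=current,
            )
        return GuardResult.pass_through()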
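The three helpers used by `GroundingGuard` (`extract_factual_claims`, `is_supported_by_context`, `add_disclaimer`) are not defined in the source. The sketch below fills them in with a naive word-overlap heuristic, purely to show the shape of the contract. The sentence splitting, the 0.6 overlap threshold, and the disclaimer wording are assumptions; a production system would more likely use an entailment model or an LLM judge.

```python
import re
from typing import List, Set


def extract_factual_claims(output: str) -> List[str]:
    """Naive claim extraction: treat each non-question sentence as one claim."""
    sentences = re.split(r"(?<=[.!?])\s+", output.strip())
    return [s for s in sentences if s and not s.endswith("?")]


def _content_words(text: str) -> Set[str]:
    # Keep words longer than three characters as a crude stop-word filter.
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if len(w) > 3}


def is_supported_by_context(claim: str, context: str, min_overlap: float = 0.6) -> bool:
    """Treat a claim as supported if enough of its content words appear in the context."""
    claim_words = _content_words(claim)
    if not claim_words:
        return True  # nothing checkable in this sentence
    overlap = len(claim_words & _content_words(context)) / len(claim_words)
    return overlap >= min_overlap


def add_disclaimer(output: str, claim: str) -> str:
    """Append a note flagging the unverified claim."""
    return (
        f"{output}\n\nNote: the statement '{claim}' could not be verified "
        "against the source documents."
    )
```

Word overlap cannot detect a claim that reuses the context's vocabulary but contradicts it, which is why this guard is usually the most expensive one in a real pipeline.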
Ensure structured output matches the expected schema (complements structured-generation).
import json

from pydantic import BaseModel, ValidationError


class SchemaGuard:
    def __init__(self, schema: type[BaseModel]):
        self._schema = schema

    def check(self, output: str, **_context) -> GuardResult:
        # **_context: extra pipeline context is accepted and ignored.
        try:
            parsed = json.loads(output)
            self._schema.model_validate(parsed)
            return GuardResult.pass_through()
        except (json.JSONDecodeError, ValidationError) as e:
            return GuardResult.block(
                reason=f"Output does not match schema: {e}"
            )
Compose guards into a pipeline. Order by cost — cheap regex guards first, expensive classifier guards last.
class GuardPipeline:
    def __init__(self, guards: list[Guard]):
        self._guards = guards  # ordered: cheapest first

    def run(self, content: str, **context) -> GuardPipelineResult:
        current = content
        for guard in self._guards:
            # Every guard receives the same keyword context, so each check()
            # must accept (and may ignore) extra keyword arguments.
            result = guard.check(current, **context)
            if result.action == "block":
                # Stop at the first block; later (costlier) guards never run.
                return GuardPipelineResult(
                    blocked=True,
                    reason=result.reason,
                    guard=guard.__class__.__name__,
                )
            if result.action == "modify":
                current = result.modified
                # Log the modification for audit
                log_modification(guard.__class__.__name__, result.reason)
        return GuardPipelineResult(blocked=False, content=current)
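# Compose the pipeline
input_guards = GuardPipeline([
    InputLengthGuard(max_tokens=4000),
    PromptInjectionGuard(),
    TopicGuard(
        allowed_topics=["support", "billing", "product"],
        # TopicGuard requires a classifier; topic_classifier is a placeholder
        # name for your own TopicClassifier instance.
        classifier=topic_classifier,
    ),
])
output_guards = GuardPipeline([
    PIIGuard(),
    SchemaGuard(schema=SupportResponse),
    GroundingGuard(),
])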
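To show how the two pipelines fit around the model call, here is a sketch of a request handler that follows the flow in the diagram: input guards, then the LLM, then output guards. `handle_request`, `call_llm`, `REFUSAL`, and the `PipelineResult` dataclass are hypothetical names; the only things taken from the source are the `run(content, **context)` call shape, the `blocked` and `content` result fields, and the `context` keyword that `GroundingGuard` expects.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Protocol


@dataclass
class PipelineResult:
    """Hypothetical mirror of GuardPipelineResult from the pipeline code."""

    blocked: bool
    content: Optional[str] = None
    reason: Optional[str] = None


class Pipeline(Protocol):
    def run(self, content: str, **context: Any) -> PipelineResult: ...


REFUSAL = "Sorry, I can't help with that request."


def handle_request(
    user_input: str,
    input_guards: Pipeline,
    output_guards: Pipeline,
    call_llm: Callable[[str], str],
    retrieved_context: str = "",
) -> str:
    # 1. Input guards run before any tokens are spent on the model.
    checked_input = input_guards.run(user_input)
    if checked_input.blocked:
        return REFUSAL

    # 2. The model sees the (possibly modified) input, not the raw one.
    raw_output = call_llm(checked_input.content or "")

    # 3. Output guards run before delivery; the retrieved context is passed
    #    through as a keyword so a grounding guard can use it.
    checked_output = output_guards.run(raw_output, context=retrieved_context)
    if checked_output.blocked:
        return REFUSAL
    return checked_output.content or ""
```

Returning a fixed refusal rather than the guard's `reason` avoids telling an attacker which pattern tripped the guard; the reason is better kept in server-side logs.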
structured-generation instead.
npx claudepluginhub entelligentsia/skillforge --plugin llm-patterns