From dspy-api-skills
Adds guardrails to verify and validate AI outputs before they reach users, using Pydantic validation and DSPy's Refine (iterative self-correction) or BestOfN (sampling best candidate).
Slash command: /dspy-api-skills:ai-checking-outputs
Guide the user through adding verification and guardrails so bad AI outputs never reach users. The pattern: generate, check, fix or reject.
Ask the user:
The simplest way to add checks combines Pydantic for structure and dspy.Refine for iterative self-correction. Define a reward function that returns a float (1.0 = pass, 0.0 = fail), then wrap the module:

    import dspy

    class GenerateResponse(dspy.Signature):
        question: str = dspy.InputField()
        answer: str = dspy.OutputField()

    class CheckedResponder(dspy.Module):
        def __init__(self):
            self.respond = dspy.ChainOfThought(GenerateResponse)

        def forward(self, question):
            return self.respond(question=question)

    def response_quality_reward(args: dict, pred: dspy.Prediction) -> float:
        answer = pred.answer or ""
        word_count = len(answer.split())
        # Hard requirements: return 0.0 if violated
        if len(answer) == 0:
            return 0.0
        if word_count > 200:
            return 0.0
        # Soft preferences reduce the score
        score = 1.0
        if "i don't know" in answer.lower():
            score -= 0.3
        if any(w in answer.lower() for w in ["definitely", "absolutely", "100%"]):
            score -= 0.2
        return max(score, 0.0)

    # Wrap with Refine: retries up to N times, feeding back the reward signal
    checked = dspy.Refine(CheckedResponder(), N=3, reward_fn=response_quality_reward, threshold=0.8)
    result = checked(question="What is the boiling point of water?")

dspy.Refine retries the module up to N times, passing reward feedback to guide self-correction. dspy.BestOfN runs the module up to N times independently and returns the first candidate that meets the threshold or, if none does, the one with the highest reward. Use it when you want diversity rather than iterative refinement.
DSPy validates typed outputs automatically:

    from typing import Literal
    from pydantic import BaseModel, Field

    class Response(BaseModel):
        answer: str = Field(min_length=1, max_length=500)
        confidence: float = Field(ge=0.0, le=1.0)
        category: str

    class MySignature(dspy.Signature):
        question: str = dspy.InputField()
        response: Response = dspy.OutputField()

Pydantic catches malformed JSON, out-of-range values, and wrong types before your code ever sees them.
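To see what that validation looks like in isolation, the sketch below feeds a few hand-written payloads to the same Response model using plain Pydantic, with no LM call involved. The payloads and the check_payload helper are illustrative assumptions, not part of DSPy.

```python
from pydantic import BaseModel, Field, ValidationError


class Response(BaseModel):
    answer: str = Field(min_length=1, max_length=500)
    confidence: float = Field(ge=0.0, le=1.0)
    category: str


def check_payload(payload: dict) -> list:
    """Return the names of fields that fail validation (empty list = valid)."""
    try:
        Response(**payload)
    except ValidationError as exc:
        # Each error carries a "loc" tuple; its first element is the field name.
        return sorted({str(err["loc"][0]) for err in exc.errors()})
    return []


# Hypothetical payloads standing in for parsed LM output
good = {"answer": "100 C at sea level", "confidence": 0.9, "category": "science"}
out_of_range = {"answer": "100 C", "confidence": 1.7, "category": "science"}
wrong_type = {"answer": "", "confidence": "high", "category": "science"}

print(check_payload(good))          # []
print(check_payload(out_of_range))  # ['confidence']
print(check_payload(wrong_type))    # ['answer', 'confidence']
```

Structural failures like these never need a reward function; reward functions are for the checks a schema cannot express.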

    import re

    class ExtractContact(dspy.Signature):
        text: str = dspy.InputField()
        email: str = dspy.OutputField()
        phone: str = dspy.OutputField()

    class ContactExtractor(dspy.Module):
        def __init__(self):
            self.extract = dspy.ChainOfThought(ExtractContact)

        def forward(self, text):
            return self.extract(text=text)

    def contact_format_reward(args: dict, pred: dspy.Prediction) -> float:
        # Email must look like name@domain.tld; phone must contain at least 10 digits
        email_ok = bool(re.match(r"[^@]+@[^@]+\.[^@]+", pred.email or ""))
        phone_digits = len(re.sub(r"\D", "", pred.phone or ""))
        phone_ok = phone_digits >= 10
        if not email_ok or not phone_ok:
            return 0.0
        return 1.0

    validated = dspy.Refine(ContactExtractor(), N=3, reward_fn=contact_format_reward, threshold=1.0)
    result = validated(text="Call me at 555-1234567 or email [email protected]")


    # AnswerFromDocs is not defined in the original; this minimal signature is an assumption
    class AnswerFromDocs(dspy.Signature):
        """Answer the question using only the given context."""
        context: list[str] = dspy.InputField(desc="Source documents")
        question: str = dspy.InputField()
        answer: str = dspy.OutputField()

    class VerifyFacts(dspy.Signature):
        """Check if the answer is supported by the given context."""
        context: list[str] = dspy.InputField(desc="Source documents")
        answer: str = dspy.InputField(desc="Generated answer to verify")
        is_supported: bool = dspy.OutputField(desc="Is the answer fully supported by the context?")
        unsupported_claims: list[str] = dspy.OutputField(desc="Claims not found in context")

    # Build the verifier once at module level so every reward call reuses it
    verify_module = dspy.Predict(VerifyFacts)

    class GroundedResponder(dspy.Module):
        def __init__(self):
            self.retrieve = dspy.Retrieve(k=5)
            self.answer = dspy.ChainOfThought(AnswerFromDocs)

        def forward(self, question):
            context = self.retrieve(question).passages
            pred = self.answer(context=context, question=question)
            # Return the retrieved passages too, so the reward function can see them
            return dspy.Prediction(context=context, answer=pred.answer)

    def faithfulness_reward(args: dict, pred: dspy.Prediction) -> float:
        # args only holds the call's keyword arguments (here: question),
        # so the context is read from the prediction instead
        check = verify_module(context=pred.context, answer=pred.answer)
        return 1.0 if check.is_supported else 0.0

    grounded = dspy.Refine(GroundedResponder(), N=3, reward_fn=faithfulness_reward, threshold=1.0)
    result = grounded(question="What did the report say about Q3 revenue?")


    # AnswerQuestion is not defined in the original; this minimal signature is an assumption
    class AnswerQuestion(dspy.Signature):
        question: str = dspy.InputField()
        answer: str = dspy.OutputField()

    class CompareAnswers(dspy.Signature):
        """Check if two independently generated answers agree."""
        question: str = dspy.InputField()
        answer_a: str = dspy.InputField()
        answer_b: str = dspy.InputField()
        agree: bool = dspy.OutputField(desc="Do the answers substantially agree?")
        discrepancy: str = dspy.OutputField(desc="What they disagree on, if anything")

    class CrossCheckedAnswer(dspy.Module):
        def __init__(self):
            self.answer_b = dspy.ChainOfThought(AnswerQuestion)
            self.compare = dspy.ChainOfThought(CompareAnswers)

        def forward(self, question, answer_a):
            # Generate a second answer independently, then compare it with the first
            b = self.answer_b(question=question)
            return self.compare(question=question, answer_a=answer_a, answer_b=b.answer)

    def consistency_reward(args: dict, pred: dspy.Prediction) -> float:
        # 1.0 only when the two answers substantially agree
        return 1.0 if pred.agree else 0.0

    # Use BestOfN to re-run the cross-check and keep a run where the answers agree
    # (N=3 is an illustrative choice)
    cross_checked = dspy.BestOfN(CrossCheckedAnswer(), N=3, reward_fn=consistency_reward, threshold=1.0)


    import re

    BLOCKED_PATTERNS = [
        r"\b(password|secret|api.?key)\b",
        r"\b\d{3}-\d{2}-\d{4}\b",  # SSN pattern
    ]

    class SafeResponder(dspy.Module):
        def __init__(self):
            self.respond = dspy.ChainOfThought(GenerateResponse)

        def forward(self, question):
            return self.respond(question=question)

    def safety_reward(args: dict, pred: dspy.Prediction) -> float:
        answer = pred.answer or ""
        # Any blocked pattern is a hard failure
        for pattern in BLOCKED_PATTERNS:
            if re.search(pattern, answer, re.IGNORECASE):
                return 0.0
        return 1.0

    safe_responder = dspy.Refine(SafeResponder(), N=3, reward_fn=safety_reward, threshold=1.0)
    result = safe_responder(question="Tell me about our API authentication setup")


    class SafetyCheck(dspy.Signature):
        """Check if the response is safe and appropriate."""
        question: str = dspy.InputField()
        response: str = dspy.InputField()
        is_safe: bool = dspy.OutputField()
        concern: str = dspy.OutputField(desc="Safety concern if not safe, empty if safe")

    # Instantiate the judge once; the reward function below reuses it
    safety_judge = dspy.Predict(SafetyCheck)

    class SafetyCheckedResponder(dspy.Module):
        def __init__(self):
            self.respond = dspy.ChainOfThought(GenerateResponse)

        def forward(self, question):
            return self.respond(question=question)

    def ai_safety_reward(args: dict, pred: dspy.Prediction) -> float:
        check = safety_judge(question=args["question"], response=pred.answer)
        return 1.0 if check.is_safe else 0.0

    safe_checked = dspy.Refine(SafetyCheckedResponder(), N=3, reward_fn=ai_safety_reward, threshold=1.0)

For high-stakes outputs, use dspy.BestOfN to generate multiple independent candidates and keep the highest-scoring one:

    class GenerateAnswer(dspy.Signature):
        question: str = dspy.InputField()
        answer: str = dspy.OutputField()

    class AnswerModule(dspy.Module):
        def __init__(self):
            self.generate = dspy.ChainOfThought(GenerateAnswer)

        def forward(self, question):
            return self.generate(question=question)

    def answer_quality_reward(args: dict, pred: dspy.Prediction) -> float:
        answer = pred.answer or ""
        if len(answer) == 0:
            return 0.0
        word_count = len(answer.split())
        if word_count > 200:
            return 0.0
        # Reward substantive answers: the score grows with length and caps at 50 words
        score = min(word_count / 50.0, 1.0)
        return score

    # Run up to 5 attempts; return the first that scores >= 0.5, else the highest-scoring one
    best = dspy.BestOfN(AnswerModule(), N=5, reward_fn=answer_quality_reward, threshold=0.5)
    result = best(question="Explain why the sky is blue")

Use BestOfN when diversity matters — it does not feed one attempt's feedback into the next. Use Refine when you want iterative self-correction with prior context.
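dspy.Refine:
- Runs the module and scores the output with reward_fn
- If the score is below threshold, it retries, passing the reward signal back as feedback context
- Runs up to N times total
- Returns the best-scoring result (raises only if fail_count is set and too many attempts fail)

dspy.BestOfN:
- Runs the module up to N times independently (no shared feedback between runs)
- Scores each output with reward_fn
- Returns the first output that meets threshold, or the best-scoring one otherwise (unless fail_count is hit)

This is why reward functions must be specific: a score of 0.3 vs 0.9 signals very different things to Refine's retry logic. Return 0.0 for hard failures, partial scores for soft preferences.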
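The run, score, and select behaviour described above can be pictured with a small pure-Python loop. This is a simplified sketch, not DSPy's implementation: pick_best, length_reward, and the canned candidates are hypothetical, and the real modules also handle LM sampling, feedback generation, and fail_count.

```python
from typing import Callable, Optional, Tuple


def pick_best(
    generate: Callable[[int], str],
    reward_fn: Callable[[str], float],
    n: int,
    threshold: float,
) -> Tuple[Optional[str], float, int]:
    """Run up to n attempts and stop early once one reaches the threshold.

    Returns (best_output, best_score, attempts_used).
    """
    best_output, best_score = None, float("-inf")
    attempts = 0
    for attempt in range(n):
        attempts += 1
        output = generate(attempt)
        score = reward_fn(output)
        if score > best_score:
            best_output, best_score = output, score
        if score >= threshold:
            break  # good enough, no need to spend more calls
    return best_output, best_score, attempts


# Canned "model outputs" stand in for LM calls (hypothetical data).
candidates = ["", "Maybe 90 degrees?", "Water boils at 100 degrees Celsius at sea level."]


def length_reward(text: str) -> float:
    # Hard failure for empty output; otherwise more words score higher, capped at 1.0.
    if not text:
        return 0.0
    return min(len(text.split()) / 8.0, 1.0)


strict = pick_best(lambda i: candidates[i], length_reward, n=3, threshold=0.8)
lenient = pick_best(lambda i: candidates[i], length_reward, n=3, threshold=0.3)
print(strict)   # third candidate, score 1.0, 3 attempts
print(lenient)  # second candidate, score 0.375, 2 attempts
```

Lowering the threshold trades quality for fewer calls: the lenient run stops at the second attempt, while the strict run spends all three.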
When combined with optimization (/ai-improving-accuracy), the model learns to satisfy reward functions on the first try, reducing retries in production.
Do not reach for Refine/BestOfN when simpler approaches work. Consider /ai-following-rules for declarative constraint enforcement, or /ai-improving-accuracy for systematic prompt optimization that reduces the need for post-hoc checking.
| Check | When to use | How |
|---|---|---|
| Non-empty output | Always | return 0.0 in reward_fn if len(answer) == 0 |
| Length limits | User-facing text | return 0.0 if word count exceeds N |
| Valid format | Structured output | Pydantic model + reward_fn format check |
| Grounded in sources | RAG / doc search | Verification signature inside reward_fn |
| No sensitive data | Any user-facing output | Regex patterns in reward_fn |
| Safe content | Public-facing apps | AI safety judge inside reward_fn |
| Consistent | Critical decisions | Cross-check two generations in reward_fn |
| High quality | High-stakes outputs | dspy.BestOfN with quality reward_fn |
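Several rows of the table can be combined into one reward function. The sketch below is one possible arrangement, assuming the prediction exposes an answer field. SimpleNamespace stands in for dspy.Prediction so the example runs without an LM, combined_reward is a hypothetical name, and the 200-word limit and SSN pattern are borrowed from the earlier examples.

```python
import re
from types import SimpleNamespace

MAX_WORDS = 200
BLOCKED_PATTERNS = [r"\b\d{3}-\d{2}-\d{4}\b"]  # SSN-like pattern


def combined_reward(args: dict, pred) -> float:
    """Hard checks return 0.0; soft preferences only lower the score."""
    answer = getattr(pred, "answer", None) or ""
    if not answer.strip():
        return 0.0  # non-empty output
    if len(answer.split()) > MAX_WORDS:
        return 0.0  # length limit
    if any(re.search(p, answer) for p in BLOCKED_PATTERNS):
        return 0.0  # no sensitive data
    score = 1.0
    if "i don't know" in answer.lower():
        score -= 0.3  # soft preference: avoid non-answers
    # Always hand back a float, never a bool
    return float(max(score, 0.0))


args = {"question": "What is the boiling point of water?"}
print(combined_reward(args, SimpleNamespace(answer="100 degrees Celsius at sea level.")))  # 1.0
print(combined_reward(args, SimpleNamespace(answer="")))                        # 0.0
print(combined_reward(args, SimpleNamespace(answer="My SSN is 123-45-6789.")))  # 0.0
print(combined_reward(args, SimpleNamespace(answer="I don't know, sorry.")))
```

Ordering the hard checks first keeps the function cheap: any LM-based judge would go last, so it only runs on outputs that already passed the free checks.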
| Approach | Bad output rate (typical) | Notes |
|---|---|---|
| Typed signature only | ~15-25% format errors | Pydantic retries handle most |
| + Refine with reward_fn (N=3) | ~2-5% | Iterative feedback fixes most remaining |
| + BestOfN (N=5) with quality reward | ~1-3% | Best for creative/high-stakes tasks |
| + AI-as-judge in reward_fn | < 1% | Highest quality, 2x LM cost per attempt |
Exact numbers depend on task difficulty and model capability. Measure your baseline first with dspy.Evaluate before adding checks.
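The quantity behind the table is simple: the share of outputs whose reward falls below the threshold. The sketch below computes it over a list of already-collected answers. It does not use dspy.Evaluate; bad_output_rate, non_empty_reward, and the sample data are hypothetical, and in practice the outputs would come from running the unchecked module over a dev set.

```python
from typing import Callable, Sequence


def bad_output_rate(
    outputs: Sequence[str],
    reward_fn: Callable[[str], float],
    threshold: float,
) -> float:
    """Fraction of outputs scoring below the threshold (0.0 if there are none)."""
    if not outputs:
        return 0.0
    failures = sum(1 for text in outputs if reward_fn(text) < threshold)
    return failures / len(outputs)


def non_empty_reward(text: str) -> float:
    return 1.0 if text.strip() else 0.0


# Hypothetical baseline outputs collected before adding any checks
baseline_outputs = ["Paris", "", "42", "   "]
rate = bad_output_rate(baseline_outputs, non_empty_reward, threshold=1.0)
print(f"{rate:.0%} of baseline outputs fail the check")  # 50% of baseline outputs fail the check
```

Re-running the same measurement after wrapping the module shows whether the added checks are worth their extra LM calls.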
- dspy.Refine and dspy.BestOfN expect a float from reward_fn(args, pred). Returning True/False or raising an exception will cause unexpected behavior. Always return 0.0 for failure and 1.0 (or a partial score) for success.
- The args dict passed to reward_fn holds the keyword arguments the module was called with. Access them by name: args["question"], args["context"]. Don't assume positional order.
- Use dspy.Refine when the model can improve given feedback from prior attempts (iterative self-correction). Use dspy.BestOfN when you want independent samples with no cross-contamination, e.g., creative generation where you want diverse outputs.
- Instantiating dspy.Predict or dspy.ChainOfThought inside the reward function creates a new module object on every call. Instantiate verification modules once at the module or class level and reference them in the closure.

Install any skill:
npx skills add lebsral/DSPy-Programming-not-prompting-LMs-skills --skill <name>
- /ai-stopping-hallucinations: citation enforcement, faithfulness verification, grounding AI in facts
- /ai-following-rules: defining and enforcing content policies, format rules, and business constraints
- /ai-building-pipelines: wire checks into multi-step systems
- /ai-making-consistent: output consistency (not correctness)
- /ai-testing-safety: stress-test your guardrails with adversarial attacks
- /ai-scoring: evaluate human work against criteria
- /ai-improving-accuracy: measure and improve quality systematically
- /dspy-refine: deep-dive on iterative refinement with reward functions
- /dspy-best-of-n: deep-dive on sampling N candidates and picking the best
- Install /ai-do if you do not have it. It routes any AI problem to the right skill and is the fastest way to work: npx skills add lebsral/DSPy-Programming-not-prompting-LMs-skills --skill ai-do