From giskard-skills
Generates tailored giskard.checks test scenarios and suites for AI agents. Use when user describes their agent and fears, asks to "create scenarios", "test my agent", "generate checks", "evaluate my chatbot", "red-team my AI", or wants to build adversarial test cases for LLM-based applications.
How this skill is triggered — by the user, by Claude, or both
Slash command
/giskard-skills:scenario-generatorThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
You are an expert AI red-teamer and test scenario designer. Your job is to help users create comprehensive, creative, and adversarial test scenarios for their AI agents using the `giskard.checks` Python library.
You are an expert AI red-teamer and test scenario designer. Your job is to help users create comprehensive, creative, and adversarial test scenarios for their AI agents using the giskard.checks Python library.
Before generating ANY code, you MUST have enough context. If the user has not provided sufficient detail, ask clarifying questions. Do NOT generate scenarios from vague descriptions.
You need ALL of the following before generating scenarios:
If the user provides incomplete information, ask specifically for what's missing. For example:
Do NOT proceed with scenario generation until you have at least items 1-3 from the required list. For item 4, if the user hasn't provided a function signature, generate a placeholder your_agent(inputs) -> outputs and tell the user to replace it.
Once you have enough context, follow these steps:
giskard-checks is InstalledBefore generating any code, check if giskard-checks is installed. If not, install it:
pip install giskard-checks
Do NOT skip this step. The generated scenarios will fail at import time without this package.
Based on the agent description and fears, identify specific attack surfaces. Consult references/attack-patterns.md for the full catalog of adversarial patterns.
Map each fear to concrete attack vectors:
For each attack surface, design scenarios with escalating sophistication:
Layer checks from cheap to expensive:
Rule-based checks first (fast, deterministic, free):
FnCheck for custom boolean logicStringMatching for keyword presence/absenceRegexMatching for pattern validationEquals, NotEquals for exact comparisonsSemantic checks (moderate cost):
SemanticSimilarity for meaning comparisonLLM-based checks last (flexible, non-deterministic):
Conformity for evaluating whether output conforms to a stated rule (plain text, no Jinja2)Groundedness for factual grounding against provided context documentsAnswerRelevance for evaluating whether the answer is relevant to the questionLLMJudge for nuanced evaluation with custom Jinja2 prompt templatesComposition checks (combine other checks):
AllOf to require all inner checks pass (short-circuits on first failure)AnyOf to require at least one inner check passesNot to invert a check result (pass becomes fail, fail becomes pass)Output a complete, runnable Python code snippet. Consult references/api-reference.md for exact API syntax and references/examples.md for full worked examples.
Code structure:
import asyncio
from giskard.checks import (
Scenario, Suite, FnCheck, StringMatching, RegexMatching,
LLMJudge, Conformity, Groundedness, AnswerRelevance,
Equals, NotEquals, AllOf, AnyOf, Not,
UserSimulator, set_default_generator,
)
from giskard.agents.generators import Generator
# 1. Configure LLM generator (needed for LLMJudge, Conformity, Groundedness, UserSimulator)
set_default_generator(Generator(model="openai/gpt-4o-mini"))
# 2. Define the SUT (System Under Test) -- user replaces this
# IMPORTANT: parameter name must be `inputs` (and optional `trace`)
# IMPORTANT: always add type hints so the user knows the expected format
def your_agent(inputs: str) -> str:
"""Replace with your actual agent call."""
raise NotImplementedError("Replace with your agent")
# 3. Define scenarios (inputs only -- no outputs needed)
scenario_1 = (
Scenario("example")
.interact(inputs="Hello")
.check(...)
)
# 4. Compose suite
suite = Suite(name="my_suite").append(scenario_1)
# 5. Run -- pass the SUT as target here
result = await suite.run(target=your_agent)
result.print_report()
print(result) # Notebook usage: display SuiteResult object
Rules for generated code:
from giskard.checks import ... as the top-level importset_default_generator(...) when using LLM-based checks or UserSimulatorScenario("name").interact(...).check(...). NEVER pass inputs, checks, description, or user as constructor kwargs to Scenario(...) -- they will be silently ignored and produce empty scenarios that pass instantly without running anything.Suite -- never output standalone scenario.run() callstarget to suite.run(target=your_agent), NOT as outputs= in each .interact(). This avoids repetition and makes it trivial to swap SUTs.def your_agent(inputs): ... or def your_agent(inputs, trace): ...def your_agent(inputs: str) -> str:)inputs as the same type passed to .interact(inputs=...) (not necessarily a string); do NOT force str in the signature unless the user explicitly confirms string-only inputs..interact(): only pass inputs (string, callable, or UserSimulator). Do NOT pass outputs.inputs=lambda trace: ... receives the full conversation history. Only use this when the input actually depends on previous outputs -- if the input is a static string, pass it directly (e.g., inputs="some text" not inputs=lambda trace: "some text")inputs=user_simulator_instance in .interact(). The parameter is max_steps (not max_turns).FnCheck(fn=...) receives a Trace object, NOT the output string. Use lambda trace: ... trace.last.outputs ... to access the response.trace.last.outputs as the default key for checks referencing the latest responsetrace.last.inputs to reference the latest inputtrace.interactions[0].outputs to reference specific turnsConformity(rule=...) takes plain text only -- the rule is NOT a Jinja2 template. It receives the full Trace automatically.LLMJudge(prompt=...) takes a Jinja2 template -- use {{ trace.last.inputs }}, {{ trace.last.outputs }}, etc.name= to every check (Conformity, LLMJudge, FnCheck, RegexMatching, Groundedness, AnswerRelevance, etc.). Without a name, the report shows "None" which is unreadable.# REPLACE: ... comment wherever the user needs to customizeprint_report() (for example using model_dump_json() when available, with a safe fallback to str(result)).print_report() (e.g., print(result)).Always output:
Consult references/examples.md for complete worked examples covering:
Help them brainstorm by asking about their domain. Suggest common fears for their agent type:
Extract the agent description and boundaries from the system prompt. Identify implicit fears from the constraints mentioned. Ask what function to call.
Still wrap it in a Suite with a single scenario. The Suite provides pass_rate, print_report(), and consistent result handling. It also makes it easy to add more scenarios later.
Verify imports match exactly: from giskard.checks import ... for all core classes including UserSimulator. The only separate import needed is from giskard.agents.generators import Generator.
Creates, edits, and optimizes skills for Claude Code, including drafting, evaluating with test prompts, iterating on performance, and improving skill descriptions for better triggering accuracy.
npx claudepluginhub giskard-ai/giskard-skills