From evaluate-koog
Sets up Dokimos evaluations for Koog AI agents in Kotlin, with the agent acting as the system under test or as the judge, using ExactMatchEvaluator, LLMJudgeEvaluator, or the Kotlin DSL.
Slash command
/evaluate-koog:evaluate-koog
Set up Dokimos evaluation for a Koog AI agent. The user will describe their agent and evaluation goals via `$ARGUMENTS`.
dokimos-koog/src/main/kotlin/dev/dokimos/koog/KoogSupport.kt
dokimos-koog/src/test/kotlin/dev/dokimos/koog/
dev.dokimos:dokimos-koog
Before writing code, read KoogSupport.kt to understand the available utilities.
KoogSupport.kt provides:
asJudge(agentCall: suspend (String) -> String): wraps any suspend function into a JudgeLM
asJudge(agent: () -> AIAgent<String, String>): wraps a Koog agent factory into a JudgeLM
AIAgent.runBlocking(input, context): extension to run a Koog agent synchronously
To evaluate a Koog agent as the system under test:
val agent: () -> AIAgent<String, String> = { createMyAgent() }
val task = Task { example ->
    val input = example.inputs()["input"] as String
    val output = agent().runBlocking(input)
    mapOf("output" to output)
}
val result = Experiment.builder()
    .name("Koog Agent Evaluation")
    .dataset(dataset)
    .task(task)
    .evaluator(ExactMatchEvaluator.builder().build())
    .build()
    .run()
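The flow above (a task that returns a map with an "output" key, scored by exact match) can be pictured with a small stand-alone sketch. It uses only the Kotlin standard library. `SketchExample`, `sketchTask`, `exactMatch`, and `passRate` are hypothetical names invented for illustration, not Dokimos types, and the real ExactMatchEvaluator may compare values differently.

```kotlin
// Hypothetical stand-in for a dataset example; NOT a Dokimos type.
data class SketchExample(val inputs: Map<String, Any>, val expected: String)

// Mirrors the task shape above: read "input", return a map with an "output" key.
fun sketchTask(example: SketchExample, agent: (String) -> String): Map<String, Any> {
    val input = example.inputs["input"] as String
    return mapOf("output" to agent(input))
}

// Exact match in its simplest form: equality on the "output" value (an assumption).
fun exactMatch(outputs: Map<String, Any>, expected: String): Boolean =
    outputs["output"] == expected

// Fraction of examples whose output exactly matches the expected value.
fun passRate(examples: List<SketchExample>, agent: (String) -> String): Double =
    examples.count { exactMatch(sketchTask(it, agent), it.expected) }.toDouble() / examples.size

fun main() {
    val examples = listOf(
        SketchExample(mapOf("input" to "2+2"), "4"),
        SketchExample(mapOf("input" to "capital of France"), "Paris"),
    )
    // A fake agent that only knows one answer.
    val fakeAgent: (String) -> String = { q -> if (q == "2+2") "4" else "unknown" }
    println(passRate(examples, fakeAgent)) // prints 0.5
}
```

The point of the sketch is the contract: whatever the agent returns has to land under the key the evaluator reads, otherwise every example fails regardless of answer quality.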
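To use a Koog agent as the judge, wrap it with asJudge:
// Option 1: wrap a suspend call
val judge = asJudge { prompt -> myAgent().run(prompt) }
// Option 2: wrap an agent factory instead (keep only one of the two declarations)
// val judge = asJudge { createMyAgent() }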
val evaluator = LLMJudgeEvaluator.builder()
    .name("helpfulness")
    .judge(judge)
    .criteria("Is the response helpful and accurate?")
    .evaluationParams(listOf(
        EvalTestCaseParam.INPUT,
        EvalTestCaseParam.ACTUAL_OUTPUT,
        EvalTestCaseParam.EXPECTED_OUTPUT
    ))
    .threshold(0.7)
    .build()
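To make the asJudge idea concrete, here is a minimal sketch of how a suspend function might be bridged to a blocking judge interface. It is not the code in KoogSupport.kt. `SketchJudge` and `sketchAsJudge` are hypothetical names, the single-method shape of the interface is an assumption, and the sketch assumes kotlinx-coroutines is on the classpath.

```kotlin
import kotlinx.coroutines.runBlocking

// Hypothetical stand-in for a judge interface; the real JudgeLM may differ.
fun interface SketchJudge {
    fun generate(prompt: String): String
}

// Bridges a suspend function to the blocking interface by running it to completion.
fun sketchAsJudge(agentCall: suspend (String) -> String): SketchJudge =
    SketchJudge { prompt -> runBlocking { agentCall(prompt) } }

fun main() {
    // A fake suspend "agent" that just echoes the prompt.
    val judge = sketchAsJudge { prompt -> "echo:$prompt" }
    println(judge.generate("Is this helpful?")) // prints echo:Is this helpful?
}
```

Blocking the calling thread is usually acceptable in a test or evaluation run, but the same bridge would be a poor fit inside code that is already running on a coroutine dispatcher.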
If the user has dokimos-kotlin as a dependency, use the DSL:
val result = experiment {
    name = "Koog Agent Eval"
    dataset = Dataset.fromJson(Path.of("datasets/qa.json"))
    task { example ->
        val output = agent().runBlocking(example.input())
        mapOf("output" to output)
    }
    evaluator(ExactMatchEvaluator.builder().build())
}
The user needs dokimos-koog:
<dependency>
    <groupId>dev.dokimos</groupId>
    <artifactId>dokimos-koog</artifactId>
    <version>${dokimos.version}</version>
</dependency>
Koog itself is a provided-scope dependency, so the user must bring their own version.
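For Gradle users, the same setup might look like the build.gradle.kts fragment below. This is a sketch under assumptions: the `dokimosVersion` and `koogVersion` properties are hypothetical and would need to be defined in gradle.properties, the `ai.koog:koog-agents` coordinates for Koog are assumed rather than taken from this document, and `testImplementation` is only one plausible scope.

```kotlin
// build.gradle.kts (sketch; property names and the Koog coordinates are assumptions)
val dokimosVersion: String by project
val koogVersion: String by project

dependencies {
    // Dokimos integration for Koog agents
    testImplementation("dev.dokimos:dokimos-koog:$dokimosVersion")
    // Koog is not pulled in transitively, so declare the version you want explicitly
    testImplementation("ai.koog:koog-agents:$koogVersion")
}
```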
$ARGUMENTS what the Koog agent does and how it's constructed
KoogSupport utilities
npx claudepluginhub dokimos-dev/dokimos --plugin evaluate-koog