Skill

evaluate-koog

Sets up Dokimos evaluations for Koog AI agents in Kotlin, with the agent as the system under test or as the judge, using ExactMatchEvaluator, LLMJudgeEvaluator, or the Kotlin DSL.

Kotlin

ai-ml

testing

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/evaluate-koog:evaluate-koog

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Set up Dokimos evaluation for a Koog AI agent. The user will describe their agent and evaluation goals via `$ARGUMENTS`.

SKILL.md

105 lines · ~777 tokens

Stats

Language: Java

Parent stars: 21

Parent forks: 3

Maintenance: Fair

Last Commit: Feb 21, 2026

Evaluate Koog Agent

Set up Dokimos evaluation for a Koog AI agent. The user will describe their agent and evaluation goals via $ARGUMENTS.

Where things live

Koog support: dokimos-koog/src/main/kotlin/dev/dokimos/koog/KoogSupport.kt
Koog tests: dokimos-koog/src/test/kotlin/dev/dokimos/koog/
Maven dependency: dev.dokimos:dokimos-koog

Before writing code, read KoogSupport.kt to understand the available utilities.

Key functions

KoogSupport.kt provides:

asJudge(agentCall: suspend (String) -> String) — wraps any suspend function into a JudgeLM
asJudge(agent: () -> AIAgent<String, String>) — wraps a Koog agent factory into a JudgeLM
AIAgent.runBlocking(input, context) — extension to run a Koog agent synchronously
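
To make the first overload concrete, the sketch below shows one way a suspend function could be adapted to a blocking judge interface. `SketchJudge`, `runSuspendBlocking`, and `asSketchJudge` are hypothetical stand-ins, not Dokimos types; the real `JudgeLM` and `asJudge` are defined in KoogSupport.kt and may differ. The bridge uses only the Kotlin standard library so the example is self-contained; in a real project, `runBlocking` from kotlinx.coroutines would be the usual choice.

```kotlin
import java.util.concurrent.CountDownLatch
import kotlin.coroutines.Continuation
import kotlin.coroutines.CoroutineContext
import kotlin.coroutines.EmptyCoroutineContext
import kotlin.coroutines.startCoroutine

// Hypothetical stand-in for a blocking judge interface (not the real JudgeLM).
fun interface SketchJudge {
    fun generate(prompt: String): String
}

// Starts the suspend block and blocks the caller until it completes.
fun <T> runSuspendBlocking(block: suspend () -> T): T {
    val done = CountDownLatch(1)
    var outcome: Result<T>? = null
    block.startCoroutine(object : Continuation<T> {
        override val context: CoroutineContext = EmptyCoroutineContext
        override fun resumeWith(result: Result<T>) {
            outcome = result
            done.countDown()
        }
    })
    done.await()
    return outcome!!.getOrThrow()
}

// Wraps a suspend (String) -> String call into the blocking stand-in interface.
fun asSketchJudge(agentCall: suspend (String) -> String): SketchJudge =
    SketchJudge { prompt -> runSuspendBlocking { agentCall(prompt) } }

fun main() {
    val judge = asSketchJudge { prompt -> "score for: $prompt" }
    println(judge.generate("Is this helpful?"))
}
```

The point of the adapter is that evaluators can call the judge synchronously while the agent itself stays a suspend function.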

Setting up evaluation

Using a Koog agent as the system under test

val agent: () -> AIAgent<String, String> = { createMyAgent() }

val task = Task { example ->
    val input = example.inputs()["input"] as String
    val output = agent().runBlocking(input)
    mapOf("output" to output)
}

val result = Experiment.builder()
    .name("Koog Agent Evaluation")
    .dataset(dataset)
    .task(task)
    .evaluator(ExactMatchEvaluator.builder().build())
    .build()
    .run()
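
Conceptually, the experiment runs the task on each dataset example and passes the output to each evaluator. The sketch below mimics that flow with hypothetical stand-ins (`SketchExample`, `runSketch`, and a plain lambda in place of the Koog agent). It is not Dokimos' implementation, and it assumes that an exact match means string equality with an expected output.

```kotlin
// Hypothetical stand-in for a dataset row; Dokimos' real example type may differ.
data class SketchExample(val input: String, val expected: String)

// Runs the task on every example and returns the fraction of exact matches.
fun runSketch(dataset: List<SketchExample>, task: (String) -> String): Double {
    if (dataset.isEmpty()) return 0.0
    val matches = dataset.count { task(it.input) == it.expected }
    return matches.toDouble() / dataset.size
}

fun main() {
    val dataset = listOf(
        SketchExample("2+2", "4"),
        SketchExample("capital of France", "Paris"),
    )
    // A canned lambda standing in for agent().runBlocking(input).
    val fakeAgent: (String) -> String = { if (it == "2+2") "4" else "Lyon" }
    println(runSketch(dataset, fakeAgent)) // prints 0.5
}
```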

Using a Koog agent as a judge

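// Option 1: wrap any suspend function that maps a prompt to a response
val judge = asJudge { prompt -> myAgent().run(prompt) }
// Option 2 (use instead of option 1): wrap an agent factory
// val judge = asJudge { createMyAgent() }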

val evaluator = LLMJudgeEvaluator.builder()
    .name("helpfulness")
    .judge(judge)
    .criteria("Is the response helpful and accurate?")
    .evaluationParams(listOf(
        EvalTestCaseParam.INPUT,
        EvalTestCaseParam.ACTUAL_OUTPUT,
        EvalTestCaseParam.EXPECTED_OUTPUT
    ))
    .threshold(0.7)
    .build()
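
The `threshold(0.7)` call presumably sets the minimum judge score that counts as a pass. Below is a minimal sketch of that rule, assuming scores in [0, 1] and an inclusive comparison. Both are assumptions, and `SketchVerdict` and `verdict` are hypothetical names rather than Dokimos API.

```kotlin
// Hypothetical result type: the judge's score plus the pass/fail decision.
data class SketchVerdict(val score: Double, val passed: Boolean)

// Assumed rule: a score at or above the threshold passes.
fun verdict(score: Double, threshold: Double = 0.7): SketchVerdict {
    require(score in 0.0..1.0) { "score must be in [0, 1], got $score" }
    return SketchVerdict(score, passed = score >= threshold)
}

fun main() {
    println(verdict(0.82).passed) // prints true
    println(verdict(0.55).passed) // prints false
}
```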

Kotlin DSL (with dokimos-kotlin)

If the user has dokimos-kotlin as a dependency, use the DSL:

val result = experiment {
    name = "Koog Agent Eval"
    dataset = Dataset.fromJson(Path.of("datasets/qa.json"))
    task { example ->
        val output = agent().runBlocking(example.input())
        mapOf("output" to output)
    }
    evaluator(ExactMatchEvaluator.builder().build())
}

Dependencies

The user needs dokimos-koog:

<dependency>
    <groupId>dev.dokimos</groupId>
    <artifactId>dokimos-koog</artifactId>
    <version>${dokimos.version}</version>
</dependency>

Koog itself is a provided-scope dependency — the user must bring their own version.
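
For Gradle users, a rough Kotlin DSL equivalent might look like the fragment below. The `dev.dokimos:dokimos-koog` coordinate comes from the Maven snippet above. The property names `dokimosVersion` and `koogVersion`, the `testImplementation` configuration, and the Koog coordinate `ai.koog:koog-agents` are assumptions to check against the projects' own documentation.

```kotlin
// build.gradle.kts (sketch)

// Assumed to be defined in gradle.properties; the names are placeholders.
val dokimosVersion: String by project
val koogVersion: String by project

dependencies {
    // Coordinate taken from the Maven snippet above.
    testImplementation("dev.dokimos:dokimos-koog:$dokimosVersion")

    // Koog is provided-scope in dokimos-koog, so it must be declared explicitly.
    // "ai.koog:koog-agents" is an assumed coordinate.
    testImplementation("ai.koog:koog-agents:$koogVersion")
}
```

Use `implementation` instead of `testImplementation` if the evaluations live in main source sets.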

Steps

1. Understand from $ARGUMENTS what the Koog agent does and how it's constructed
2. Determine if the agent is the system under test, the judge, or both
3. Create a dataset appropriate for the agent's domain
4. Wire up the evaluation using KoogSupport utilities
5. Write tests in Kotlin using MockK for mocking
