From evaluate-spring-ai
Sets up Dokimos evaluation for Spring AI apps including ChatClient, RAG pipelines, and advisor chains. Use for Spring Boot LLM testing and benchmarking.
Slash command
/evaluate-spring-ai:evaluate-spring-ai
Set up Dokimos evaluation for a Spring AI application. The user will describe their application and evaluation goals via `$ARGUMENTS`.
- dokimos-spring-ai/src/main/java/dev/dokimos/springai/SpringAiSupport.java
- dokimos-examples/src/main/java/dev/dokimos/examples/springai/
- dokimos-examples/src/main/java/dev/dokimos/examples/springai/tutorial/
- dev.dokimos:dokimos-spring-ai

Before writing code, read SpringAiSupport.java to understand the available utilities.
SpringAiSupport provides:
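- `asJudge(ChatClient.Builder)`: wraps a Spring AI ChatClient.Builder into a JudgeLM
- `asJudge(ChatModel)`: wraps a ChatModel directly into a JudgeLM
- `toTestCase(EvaluationRequest)`: converts Spring AI's EvaluationRequest to a Dokimos EvalTestCase
- `toEvaluationResponse(EvalResult)`: converts a Dokimos EvalResult back to a Spring AI EvaluationResponse

A basic ChatClient evaluation written as a Spring Boot test:

import java.nio.file.Path;
import java.util.List;
import java.util.Map;

import org.junit.jupiter.api.Test;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;

import dev.dokimos.springai.SpringAiSupport;
// The Dokimos core types used below (Task, JudgeLM, Experiment, Dataset, ...)
// also need imports; their packages are not shown in this document.

@SpringBootTest
class MyChatEvaluationTest {

    @Autowired
    private ChatClient.Builder chatClientBuilder;

    @Test
    void evaluateChatbot() {
        ChatClient chatClient = chatClientBuilder.build();

        // The task under evaluation: send each dataset input to the chat client.
        Task task = example -> {
            String response = chatClient.prompt()
                    .user(example.input())
                    .call()
                    .content();
            return Map.of("output", response);
        };

        // Reuse the same ChatClient.Builder as the LLM judge.
        JudgeLM judge = SpringAiSupport.asJudge(chatClientBuilder);

        ExperimentResult result = Experiment.builder()
                .name("Chatbot Evaluation")
                .dataset(Dataset.fromJson(Path.of("src/test/resources/datasets/qa.json")))
                .task(task)
                .evaluator(LLMJudgeEvaluator.builder()
                        .name("answer-quality")
                        .judge(judge)
                        .criteria("Is the response helpful and accurate?")
                        .evaluationParams(List.of(
                                EvalTestCaseParam.INPUT,
                                EvalTestCaseParam.ACTUAL_OUTPUT,
                                EvalTestCaseParam.EXPECTED_OUTPUT))
                        .threshold(0.7)
                        .build())
                .build()
                .run();
    }
}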
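The evaluator above is configured with `.threshold(0.7)`. As a rough illustration of what a score threshold means for benchmarking, here is a minimal plain-Java sketch. It does not use the Dokimos API: `ThresholdSketch` and `ScoredResult` are hypothetical names, and the rule that a score equal to the threshold counts as a pass is an assumption, not confirmed Dokimos behavior.

```java
import java.util.List;

class ThresholdSketch {

    // Hypothetical stand-in for a scored judge verdict; not a Dokimos type.
    record ScoredResult(String name, double score) {}

    // Assumption: a score at or above the threshold counts as a pass.
    static boolean passes(ScoredResult result, double threshold) {
        return result.score() >= threshold;
    }

    // Fraction of results that pass; an empty list yields 0.0.
    static double passRate(List<ScoredResult> results, double threshold) {
        if (results.isEmpty()) {
            return 0.0;
        }
        long passed = results.stream()
                .filter(r -> passes(r, threshold))
                .count();
        return (double) passed / results.size();
    }

    public static void main(String[] args) {
        List<ScoredResult> results = List.of(
                new ScoredResult("q1", 0.9),
                new ScoredResult("q2", 0.7),
                new ScoredResult("q3", 0.4));
        System.out.println("pass rate: " + passRate(results, 0.7));
    }
}
```

A pass rate like this is one way to reduce many per-example scores to a single number that a test could assert on.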
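For a RAG pipeline, the task can return the retrieved context alongside the output:

Task task = example -> {
    String input = example.input();

    // The advisor retrieves documents and adds them to the prompt.
    // Newer Spring AI releases build it with QuestionAnswerAdvisor.builder(vectorStore).build().
    String response = chatClient.prompt()
            .user(input)
            .advisors(new QuestionAnswerAdvisor(vectorStore))
            .call()
            .content();

    // Separate search to record the context for evaluators. It can differ from
    // what the advisor retrieved if the two use different search settings.
    List<Document> docs = vectorStore.similaritySearch(input);
    List<String> context = docs.stream().map(Document::getText).toList();

    return Map.of("output", response, "context", context);
};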
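The RAG task above returns a map with an "output" entry and a "context" entry. The following self-contained sketch shows that same shape with stubbed retrieval and generation, so it runs without Spring AI or Dokimos. `RagTaskSketch`, `retrieve`, and `answer` are hypothetical names introduced for illustration. Retrieving once and passing the same list to both generation and the result map is a design choice of this sketch, meant to keep the recorded context identical to what the answer was based on.

```java
import java.util.List;
import java.util.Map;

class RagTaskSketch {

    // Hypothetical stand-in for a vector store similarity search.
    static List<String> retrieve(String question) {
        return List.of("Dokimos is an evaluation library for LLM applications.");
    }

    // Hypothetical stand-in for the model call; a real task would call the chat client.
    static String answer(String question, List<String> context) {
        return "Answer to '" + question + "' based on " + context.size() + " document(s)";
    }

    // Same contract as the task above: input in, map with "output" and "context" out.
    static Map<String, Object> runTask(String input) {
        List<String> context = retrieve(input);   // retrieve once
        String response = answer(input, context); // generate from that context
        return Map.of("output", response, "context", context);
    }

    public static void main(String[] args) {
        Map<String, Object> result = runTask("What is Dokimos?");
        System.out.println(result.get("output"));
        System.out.println(result.get("context"));
    }
}
```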
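To run a Dokimos evaluator on Spring AI evaluation types, convert in both directions:

// Spring AI request: user text, retrieved documents, and the model's response.
EvaluationRequest request = new EvaluationRequest(userText, documents, responseContent);
// Convert to a Dokimos test case and evaluate it.
EvalTestCase testCase = SpringAiSupport.toTestCase(request);
EvalResult result = evaluator.evaluate(testCase);
// Convert the Dokimos result back to Spring AI's response type.
EvaluationResponse response = SpringAiSupport.toEvaluationResponse(result);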
<dependency>
    <groupId>dev.dokimos</groupId>
    <artifactId>dokimos-spring-ai</artifactId>
    <version>${dokimos.version}</version>
</dependency>
Spring AI itself is a provided-scope dependency — the user must bring their own version.
$ARGUMENTS what the Spring AI application does
SpringAiSupport utilities
@SpringBootTest

npx claudepluginhub dokimos-dev/dokimos --plugin evaluate-spring-ai