By dokimos-dev
Scaffold custom Java evaluator classes for the Dokimos LLM evaluation framework to define metrics, scoring functions, and grading logic for LLM outputs. Use the builder pattern for simple evaluators, or build LLM-judged evaluators with JudgeLM.
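As a rough illustration of the builder-pattern idea mentioned above, here is a minimal, self-contained Java sketch. Every name in it (`SketchEvaluator`, `SketchResult`, `EvaluatorSketch`, `exactMatch`) is a hypothetical stand-in invented for this example. It does not use or reproduce the Dokimos API, so consult the project's own documentation for the real classes and signatures.

```java
import java.util.Objects;
import java.util.function.BiFunction;

// Hypothetical stand-in types for illustration only; NOT the Dokimos classes.
final class SketchResult {
    final String name;
    final double score;
    final boolean passed;

    SketchResult(String name, double score, boolean passed) {
        this.name = name;
        this.score = score;
        this.passed = passed;
    }
}

final class SketchEvaluator {
    private final String name;
    private final double threshold;
    private final BiFunction<String, String, Double> scorer;

    private SketchEvaluator(Builder b) {
        this.name = b.name;
        this.threshold = b.threshold;
        this.scorer = b.scorer;
    }

    static Builder builder() {
        return new Builder();
    }

    // Scores the actual output against the expected one and applies the threshold.
    SketchResult evaluate(String actual, String expected) {
        double score = scorer.apply(actual, expected);
        return new SketchResult(name, score, score >= threshold);
    }

    static final class Builder {
        private String name = "unnamed";
        private double threshold = 0.5;
        private BiFunction<String, String, Double> scorer;

        Builder name(String name) { this.name = name; return this; }
        Builder threshold(double threshold) { this.threshold = threshold; return this; }
        Builder scorer(BiFunction<String, String, Double> scorer) { this.scorer = scorer; return this; }

        SketchEvaluator build() {
            // Fail early if no scoring function was supplied.
            Objects.requireNonNull(scorer, "scorer must be set");
            return new SketchEvaluator(this);
        }
    }
}

final class EvaluatorSketch {
    // A simple deterministic metric: case-insensitive exact match after trimming.
    static SketchEvaluator exactMatch() {
        return SketchEvaluator.builder()
                .name("exact-match")
                .threshold(1.0)
                .scorer((actual, expected) ->
                        actual.trim().equalsIgnoreCase(expected.trim()) ? 1.0 : 0.0)
                .build();
    }

    public static void main(String[] args) {
        SketchResult r = exactMatch().evaluate("Paris", " paris ");
        System.out.println(r.name + " score=" + r.score + " passed=" + r.passed);
    }
}
```

An LLM-judged evaluator would follow the same shape, with the scoring function delegating to a judge model instead of a string comparison; how Dokimos wires that up through JudgeLM is not shown here.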
Set up evaluation of Spring AI applications using Dokimos. Provides judge creation and type conversion via SpringAiSupport, with @SpringBootTest integration for evaluations in CI.
Create evaluation datasets for the Dokimos LLM evaluation framework in JSON, CSV, or JSONL format. Supports simple and structured example formats with inputs, expected outputs, and metadata.
Set up evaluation of AI agents with tool call validation, correctness checks, task completion, and tool reliability using Dokimos. Framework-agnostic — works with any agent framework.
Set up evaluation of LangChain4j applications and RAG pipelines using Dokimos. Provides task and judge creation via LangChain4jSupport, with evaluators for faithfulness, contextual relevance, and hallucination.
Scaffold eval-driven tests using dokimos-junit. Creates JUnit parameterized tests with @DatasetSource and Assertions.assertEval() for running Dokimos evaluations as unit tests in CI.
npx claudepluginhub dokimos-dev/dokimos --plugin create-evaluator
Teaches AI coding agents to create promptfoo eval suites with deterministic assertions, provider configs, and best practices
Skills for building LLM evaluations: pipeline audit, error analysis, synthetic data generation, LLM-as-Judge design, evaluator validation, RAG evaluation, and annotation interfaces.
Skills for adding DeepEval evaluations, tracing, datasets, Confident AI reports, and iterative improvement loops to AI applications.
Agent Skills for NeMo Evaluator SDK
LLM Judges plugin