By dokimos-dev
Generate evaluation datasets for the Dokimos LLM evaluation framework by specifying inputs, expected outputs, and metadata, then export them to JSON, CSV, or JSONL. Useful for LLM testing, experiments, test data creation, and format conversion.
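As a rough illustration of the three export formats mentioned above, the sketch below serializes a small list of examples to JSON, JSONL, and CSV using only the Python standard library. The field names (`input`, `expected_output`, `metadata`) and the helper functions are hypothetical, chosen for illustration. They are not the Dokimos schema or API, which may differ.

```python
import csv
import io
import json

# Hypothetical field names; the real Dokimos dataset schema may differ.
examples = [
    {"input": "What is 2 + 2?", "expected_output": "4", "metadata": {"topic": "math"}},
    {"input": "Capital of France?", "expected_output": "Paris", "metadata": {"topic": "geography"}},
]


def to_json(rows):
    """Serialize the whole dataset as a single JSON array."""
    return json.dumps(rows, indent=2, ensure_ascii=False)


def to_jsonl(rows):
    """Serialize one compact JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in rows) + "\n"


def to_csv(rows):
    """Serialize as a flat table; nested metadata is stored as a JSON string."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=["input", "expected_output", "metadata"],
        lineterminator="\n",
    )
    writer.writeheader()
    for r in rows:
        writer.writerow({**r, "metadata": json.dumps(r["metadata"])})
    return buf.getvalue()
```

JSONL tends to suit large or streamed datasets because each line parses on its own. CSV is convenient for spreadsheets but has no native nesting, which is why this sketch stores the metadata as a JSON string in a single column.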
npx claudepluginhub dokimos-dev/dokimos --plugin create-dataset
Set up evaluation of Spring AI applications using Dokimos. Provides judge creation and type conversion via SpringAiSupport, with @SpringBootTest integration for evaluations in CI.
Scaffold a new Evaluator implementation for the Dokimos LLM evaluation framework. Creates evaluator classes extending BaseEvaluator with the builder pattern, supporting both simple evaluators and LLM-judged evaluators using JudgeLM.
Set up evaluation of AI agents with tool call validation, correctness checks, task completion, and tool reliability using Dokimos. Framework-agnostic — works with any agent framework.
Set up evaluation of LangChain4j applications and RAG pipelines using Dokimos. Provides task and judge creation via LangChain4jSupport, with evaluators for faithfulness, contextual relevance, and hallucination.
Scaffold eval-driven tests using dokimos-junit. Creates JUnit parameterized tests with @DatasetSource and Assertions.assertEval() for running Dokimos evaluations as unit tests in CI.
Scaffold a Dokimos Experiment that wires together a dataset, task, evaluators, and optional reporter. Supports parallelism, multiple runs for variance reduction, and server-based reporting.
Teaches AI coding agents to create promptfoo eval suites with deterministic assertions, provider configs, and best practices
Skills for adding DeepEval evaluations, tracing, datasets, Confident AI reports, and iterative improvement loops to AI applications.
Skills for building LLM evaluations: pipeline audit, error analysis, synthetic data generation, LLM-as-Judge design, evaluator validation, RAG evaluation, and annotation interfaces.
Synthetic data generation — composable blocks and YAML-defined flows for building LLM training datasets
Agent Skills for NeMo Evaluator SDK