By judgmentlabs
Enables AI agents to use Judgeval for LLM evaluation, logging, and observability. Provides correct API usage, working examples, and helper scripts for common operations.
Claude Code plugin for automatic tracing and observability with Judgeval.
claude plugin marketplace add JudgmentLabs/judgeval-claude-plugin
claude plugin install trace-claude-code@judgeval-claude-plugin
See trace-claude-code/SKILL.md for setup instructions.
After installing, run the setup script in your project directory:
bash ~/.claude/plugins/marketplaces/judgeval-claude-plugin/skills/trace-claude-code/setup.sh
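Before running setup.sh, you can confirm that both credentials it needs are exported in your shell. The sketch below assumes the setup reads JUDGMENT_API_KEY and JUDGMENT_ORG_ID from the environment. check_judgment_env is a hypothetical helper and is not part of the plugin.

```shell
# Hypothetical helper (not shipped with the plugin): report any Judgeval
# credential that is unset or empty, and return non-zero if one is missing.
check_judgment_env() {
  local status=0
  if [ -z "${JUDGMENT_API_KEY:-}" ]; then
    echo "missing: JUDGMENT_API_KEY" >&2
    status=1
  fi
  if [ -z "${JUDGMENT_ORG_ID:-}" ]; then
    echo "missing: JUDGMENT_ORG_ID" >&2
    status=1
  fi
  return "$status"
}

# Example usage (placeholder values; assumes setup.sh reads these from the environment):
# export JUDGMENT_API_KEY="your-api-key"
# export JUDGMENT_ORG_ID="your-org-id"
# check_judgment_env && bash ~/.claude/plugins/marketplaces/judgeval-claude-plugin/skills/trace-claude-code/setup.sh
```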
You'll need:
JUDGMENT_API_KEY - Get from Judgeval Settings
JUDGMENT_ORG_ID - Get from Organization Settings
Example trace structure:
Claude Code Session (root trace)
├── Turn 1: "Add error handling"
│ ├── LLM: claude-opus-4-5 (3.2s, 1,240 tokens)
│ ├── Read: src/app.ts
│ ├── Edit: src/app.ts
│ └── LLM: claude-opus-4-5 (1.8s, 890 tokens)
├── Turn 2: "Now run the tests"
│ ├── LLM: claude-opus-4-5
│ ├── Terminal: npm test
│ └── LLM: claude-opus-4-5
└── Turn 3: "Commit this"
└── ...
Captured data:
To test locally without the marketplace:
claude --plugin-dir /path/to/judgeval-claude-plugin
After plugin updates are released:
claude plugin marketplace update judgeval-claude-plugin
claude plugin update trace-claude-code@judgeval-claude-plugin
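You can wrap the two update commands in a small shell function so the plugin update runs only if the marketplace refresh succeeds. This is a sketch. update_judgeval_plugin is a hypothetical name, and the ordering mirrors the commands above.

```shell
# Hypothetical wrapper: refresh the marketplace listing first, then update the
# installed plugin. `&&` skips the second command if the first one fails.
update_judgeval_plugin() {
  claude plugin marketplace update judgeval-claude-plugin &&
    claude plugin update trace-claude-code@judgeval-claude-plugin
}
```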
License: MIT
Skills for working with Judgment — the continuous-improvement stack for agents. Add tracing, evaluations, code judges, MCP server workflows, and monitoring with best practices.
npx claudepluginhub judgmentlabs/judgeval-claude-plugin --plugin judgeval
Enables AI agents to use Braintrust for LLM evaluation, logging, and observability. Provides correct API usage, working examples, and helper scripts for common operations.
LLM observability tooling for agent development and Claude Code
OpenLit telemetry for Claude Code: sessions, tool calls, edit decisions, and cost rollups.
Claude Code skill pack for Langfuse LLM observability (24 skills)
Skills for adding DeepEval evaluations, tracing, datasets, Confident AI reports, and iterative improvement loops to AI applications.
Observability platform for Claude Code and Agent SDK — monitor, debug, and control AI coding agents