By Alex-Kopylov
General-purpose Langfuse integration for data exploration, dashboard management, prompt versioning, datasets, and experiments.
Use this agent when the user wants to explore or discover what data exists in their Langfuse project. This includes listing scores, traces, models, tags, prompts, datasets, or querying metrics. This agent is read-only and never modifies data. <example> Context: User wants to understand what data is available in their Langfuse project. user: "What scores do I have in Langfuse?" assistant: "I'll use the langfuse-data-explorer agent to enumerate all scores in your project." <commentary> User is asking about available data - this is a discovery task, not a modification task. </commentary> </example> <example> Context: User needs to know what models are being tracked. user: "Show me what models are being used and their costs" assistant: "Let me use the langfuse-data-explorer agent to list all models and their pricing information." <commentary> Querying model information is a read-only discovery operation. </commentary> </example> <example> Context: User wants to run an analytics query. user: "What's the average latency by model over the last 7 days?" assistant: "I'll use the langfuse-data-explorer agent to query the metrics API for latency data." <commentary> Running metrics queries is a read-only operation handled by this agent. </commentary> </example>
Use this agent when the user wants to create, browse, populate, or manage Langfuse datasets and dataset items. This includes creating datasets, adding items, designing item schemas, archiving items, or exploring what datasets exist. This agent handles the data preparation layer for experiments. <example> Context: User wants to see what datasets exist in their Langfuse project. user: "What datasets do I have in Langfuse?" assistant: "I'll use the langfuse-dataset-expert agent to list all datasets and their items." <commentary> Listing datasets and browsing their contents is a dataset management task. </commentary> </example> <example> Context: User wants to create a new evaluation dataset. user: "Create a dataset for testing my SSP generation pipeline" assistant: "I'll use the langfuse-dataset-expert agent to create the dataset and help design the item schema." <commentary> Creating datasets with appropriate schemas is the dataset expert's core responsibility. </commentary> </example> <example> Context: User wants to add items to an existing dataset. user: "Add test cases to my benchmark dataset" assistant: "I'll use the langfuse-dataset-expert agent to create the dataset items." <commentary> Adding items to datasets - whether manually or from traces - is a dataset management operation. </commentary> </example> <example> Context: User wants to populate a dataset from existing traces. user: "Take the last 10 traces and add them as dataset items" assistant: "I'll use the langfuse-dataset-expert agent to extract trace data and create dataset items from it." <commentary> Creating dataset items from production traces requires querying traces and inserting items. </commentary> </example> <example> Context: User wants to design the schema for dataset items. user: "What should my dataset items look like for HTML controls experiments?" assistant: "I'll use the langfuse-dataset-expert agent to help design the item input schema."
<commentary> Schema design guidance for dataset items is the dataset expert's domain. </commentary> </example>
Use this agent when the user wants to create, update, delete, list, inspect, or manage LLM-as-a-Judge evaluators in Langfuse. This agent operates directly on the Langfuse PostgreSQL database to manage eval_templates and job_configurations. <example> Context: User wants to see what evaluators exist. user: "List all my Langfuse evaluators" assistant: "I'll use the langfuse-eval-manager agent to query and list all evaluators in your project." <commentary> Listing evaluators requires querying eval_templates and job_configurations - use the eval manager. </commentary> </example> <example> Context: User wants to create a new evaluation criterion. user: "Create an evaluator that checks if the output contains markdown" assistant: "I'll use the langfuse-eval-manager agent to create that LLM-as-a-Judge evaluator." <commentary> Creating an evaluator involves inserting into eval_templates and job_configurations - use the eval manager. </commentary> </example> <example> Context: User wants to modify an evaluator's prompt or filters. user: "Update the factuality evaluator to also check for citations" assistant: "I'll use the langfuse-eval-manager agent to update that evaluator's template." <commentary> Updating an evaluator may require a new template version and job config changes - use the eval manager. </commentary> </example> <example> Context: User wants to activate or deactivate evaluators. user: "Activate the tone evaluator" or "Pause all evaluators" assistant: "I'll use the langfuse-eval-manager agent to toggle the evaluator status." <commentary> Toggling evaluator status involves updating job_configurations.status - use the eval manager. </commentary> </example> <example> Context: User wants to set up filters for an evaluator. user: "Set the relevance evaluator to only run on traces named 'chat-completion'" assistant: "I'll use the langfuse-eval-manager agent to configure the trace filter for that evaluator."
<commentary> Configuring evaluator filters requires discovering available filter values and updating job_configurations - use the eval manager. </commentary> </example>
Use this agent when the user wants to trigger experiments, browse dataset runs, analyze experiment results, compare runs, or configure remote experiment webhooks in Langfuse. This agent handles the experiment execution and analysis layer. <example> Context: User wants to trigger an experiment against a dataset. user: "Run an experiment on my test dataset" assistant: "I'll use the langfuse-experiment-manager agent to trigger the experiment." <commentary> Triggering experiments - whether via webhook or SDK - is the experiment manager's core responsibility. </commentary> </example> <example> Context: User wants to see what experiment runs exist. user: "Show me all experiment runs for the test dataset" assistant: "I'll use the langfuse-experiment-manager agent to list all dataset runs." <commentary> Browsing dataset runs and their metadata is an experiment management task. </commentary> </example> <example> Context: User wants to analyze results of an experiment run. user: "How did the latest experiment perform? Show me the scores." assistant: "I'll use the langfuse-experiment-manager agent to analyze the experiment results." <commentary> Deep analysis of experiment run results - scores, pass/fail rates, per-item details - is the experiment manager's domain. </commentary> </example> <example> Context: User wants to compare two experiment runs. user: "Compare the gpt-4o run against the gpt-4o-mini run" assistant: "I'll use the langfuse-experiment-manager agent to compare the two runs side by side." <commentary> Cross-run comparison requires joining run items, traces, and scores across multiple runs. </commentary> </example> <example> Context: User wants to set up remote experiment triggering from Langfuse UI. user: "Configure my dataset so I can trigger experiments from the Langfuse UI" assistant: "I'll use the langfuse-experiment-manager agent to set the remote experiment URL and payload on the dataset."
<commentary> Configuring the webhook URL for Custom Experiment triggers is experiment infrastructure setup. </commentary> </example>
Use this agent when the user wants to create, update, delete, or manage Langfuse dashboard widgets and dashboards. This agent writes directly to the Langfuse PostgreSQL database. <example> Context: User wants to create a new visualization. user: "Create a chart showing average score over time" assistant: "I'll use the langfuse-widget-manager agent to create that visualization for you." <commentary> Creating a widget is a write operation requiring the widget manager agent. </commentary> </example> <example> Context: User wants to modify an existing widget. user: "Update my cost chart to show a breakdown by model instead" assistant: "Let me use the langfuse-widget-manager agent to update that widget's configuration." <commentary> Modifying widget configuration is a CRUD operation handled by the widget manager. </commentary> </example> <example> Context: User wants to set up a complete dashboard. user: "Create a new dashboard with cost and latency charts" assistant: "I'll use the langfuse-widget-manager agent to create the dashboard and add the requested widgets." <commentary> Dashboard creation with widgets involves multiple write operations - use widget manager. </commentary> </example> <example> Context: User wants suggestions for what to visualize. user: "What visualizations should I create for my project?" assistant: "I'll use the langfuse-widget-manager agent to analyze your data and suggest relevant visualizations." <commentary> Suggesting widgets leads to creating them, so use the widget manager which has the suggest-widgets skill. </commentary> </example>
Use when the user wants to analyze experiment results, inspect scores from a dataset run, check pass/fail rates, review per-item outputs, or deep-dive into experiment performance. Trigger phrases: "analyze results", "experiment scores", "how did the experiment perform", "show results", "inspect run", "experiment analysis".
Use when the user wants to compare two or more experiment runs, detect regressions, see score deltas between runs, or evaluate model performance differences. Trigger phrases include "compare runs", "compare experiments", "diff runs", "regression check", "which run is better", "model comparison", "A/B comparison".
This skill should be used when the user wants to configure a Langfuse dataset for remote experiment triggering from the UI, set up a webhook URL, update the default experiment payload, or enable the Custom Experiment feature. Trigger phrases include "configure remote experiment", "set webhook URL", "enable custom experiment", "set up experiment trigger", "configure dataset webhook".
This skill should be used when the user wants to create a new Langfuse dataset, set up a dataset for benchmarking, or create a dataset with input/output schema validation. Trigger phrases include "create dataset", "new dataset", "set up dataset", "add dataset".
Use when the user wants to create a Langfuse LLM-as-a-Judge evaluator. Trigger phrases: "create evaluator", "add evaluator", "new evaluation", "set up evaluation criteria", "create judge". Handles prompt composition, schema validation, ID generation, SQL insertion into eval_templates and job_configurations, and post-creation verification.
Uses power tools
Uses Bash, Write, or Edit tools
Simple, robust, and versatile marketplace for agent plugins, forged for the chaos of the AI world.
It collects practical tools for Codex and Claude Code across LLM observability, API exploration, development workflows, assistant operations, research notes, cloud storage, local automation, runtime app verification, and job-search workflows.
flowchart TB
subgraph RowOne[" "]
direction LR
Langfuse["`**langfuse**
Trace exploration, datasets,
evaluators, dashboards, and experiments`"]:::langfuse
OpenAPITools["`**openapi-tools**
List and inspect OpenAPI endpoints
on running services`"]:::openapi
LLMApplicationDev["`**llm-application-dev**
Agent pattern selection and
schema-guided reasoning`"]:::llm
PythonDevWorkflow["`**python-dev-workflow**
Pytest, Redis test patterns,
Celery, and unit-test review agents`"]:::python
DevWorkflow["`**dev-workflow**
Commits, PRs, tickets, releases,
and review-comment workflows`"]:::dev
end
subgraph RowTwo[" "]
direction LR
WorkSessionTools["`**work-session-tools**
Daily notes, task tracking,
interviews, and team planning`"]:::session
AIAssistantOps["`**ai-assistant-ops**
Assistant setup audits, skill improvement,
harness adaptation, and Markdown cleanup`"]:::ops
OSTools["`**os-tools**
Local macOS automation utilities
for assistant workflows`"]:::os
CloudStorageTools["`**cloud-storage-tools**
User-file storage workflows for
Dropbox, Drive, OneDrive, and MEGA`"]:::storage
JobHuntToolkit["`**job-hunt-toolkit**
Versioned job applications with
resume tailoring and PDF checks`"]:::job
end
subgraph RowThree[" "]
direction LR
RunAndVerifyApp["`**run-and-verify-app**
Launch apps, verify runtime behavior,
and record run recipes`"]:::runverify
MermaidDiagrams["`**mermaid-diagrams**
Mermaid generation,
syntax references, and linting`"]:::mermaid
end
Langfuse ~~~ OpenAPITools ~~~ LLMApplicationDev ~~~ PythonDevWorkflow ~~~ DevWorkflow
WorkSessionTools ~~~ AIAssistantOps ~~~ Research ~~~ OSTools ~~~ CloudStorageTools ~~~ JobHuntToolkit
RowOne ~~~ RowTwo
RowTwo ~~~ RowThree
RunAndVerifyApp ~~~ MermaidDiagrams
classDef langfuse fill:#dff7ff,stroke:#0284c7,stroke-width:2px,color:#0f172a;
classDef openapi fill:#dcfce7,stroke:#16a34a,stroke-width:2px,color:#052e16;
classDef llm fill:#fef3c7,stroke:#d97706,stroke-width:2px,color:#451a03;
classDef python fill:#ede9fe,stroke:#7c3aed,stroke-width:2px,color:#2e1065;
classDef dev fill:#fee2e2,stroke:#dc2626,stroke-width:2px,color:#450a0a;
classDef runverify fill:#e0f2fe,stroke:#0369a1,stroke-width:2px,color:#082f49;
classDef session fill:#ccfbf1,stroke:#0f766e,stroke-width:2px,color:#042f2e;
classDef ops fill:#fce7f3,stroke:#db2777,stroke-width:2px,color:#500724;
classDef research fill:#fef9c3,stroke:#ca8a04,stroke-width:2px,color:#422006;
classDef os fill:#e0e7ff,stroke:#4f46e5,stroke-width:2px,color:#1e1b4b;
classDef storage fill:#ecfccb,stroke:#65a30d,stroke-width:2px,color:#1a2e05;
classDef job fill:#ffedd5,stroke:#ea580c,stroke-width:2px,color:#431407;
classDef mermaid fill:#f0fdf4,stroke:#059669,stroke-width:2px,color:#052e16;
style RowOne fill:transparent,stroke:transparent,color:transparent;
style RowTwo fill:transparent,stroke:transparent,color:transparent;
style RowThree fill:transparent,stroke:transparent,color:transparent;
Use this README when you want to install the marketplace, install a plugin, or
find out what each plugin is for. Developer and maintenance notes live in
AGENTS.md.
Add the marketplace with the Codex CLI:
codex plugin marketplace add Alex-Kopylov/zweihander
Install a plugin:
codex plugin add langfuse@zweihander
List available plugins:
codex plugin list
Update the installed marketplace:
codex plugin marketplace upgrade zweihander
Add the marketplace from inside Claude Code:
/plugin marketplace add Alex-Kopylov/zweihander
Install a plugin:
/plugin install langfuse@zweihander
Update the installed marketplace:
/plugin marketplace update zweihander
For scripts or automation, use the non-interactive CLI:
claude plugin marketplace add Alex-Kopylov/zweihander
claude plugin install langfuse@zweihander
claude plugin marketplace update zweihander
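The three non-interactive commands above can be wrapped in a small script for automation. The following is a minimal sketch, not part of the marketplace itself: the helper names `plugin_ref`, `run`, and `install_plugins` and the `DRY_RUN` variable are hypothetical, and the script assumes the `claude` CLI is on the PATH when it is run for real. By default it only prints the commands it would execute.

```shell
#!/bin/sh
# Sketch only: wraps the non-interactive commands listed above.
# plugin_ref, run, install_plugins, and DRY_RUN are hypothetical names.
set -eu

MARKETPLACE_REPO="Alex-Kopylov/zweihander"
MARKETPLACE_NAME="zweihander"

# Build the install name in the plugin@marketplace form.
plugin_ref() {
  printf '%s@%s\n' "$1" "$MARKETPLACE_NAME"
}

# Print the command when DRY_RUN=1 (the default here); execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Add the marketplace, install each named plugin, then update the
# marketplace, mirroring the order of the commands above.
install_plugins() {
  run claude plugin marketplace add "$MARKETPLACE_REPO"
  for p in "$@"; do
    run claude plugin install "$(plugin_ref "$p")"
  done
  run claude plugin marketplace update "$MARKETPLACE_NAME"
}

install_plugins langfuse
```

Setting `DRY_RUN=0` in the environment makes the script execute the commands instead of printing them; additional plugin names can be passed to `install_plugins` as extra arguments.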
Plugin install names take the form plugin@zweihander, for example
langfuse@zweihander.
langfuse
Use when: you need to inspect Langfuse data, create or update evaluation assets, compare experiment runs, or manage dashboard widgets.
Skills
Generate and validate Mermaid diagrams with synced syntax references.
Skills for listing and inspecting OpenAPI endpoints on running services.
Python-specific pytest, Redis test patterns, Celery, and unit-test review/execution agents.
LLM application design, agent pattern selection, and schema-guided reasoning patterns.
Research knowledge-base and Obsidian vault workflows for agent-maintained notes.
npx claudepluginhub alex-kopylov/zweihander --plugin langfuse