langfuse

Name: langfuse
Author: alex-kopylov

General-purpose Langfuse integration for data exploration, dashboard management, prompt versioning, datasets, and experiments.

npx claudepluginhub alex-kopylov/zweihander --plugin langfuse

Popularity

Stars: above average (median 0, average 285)

Installs: median 0, average 1

What's Inside

Agents (5)

langfuse-data-explorer

/langfuse-data-explorer

Use this agent when the user wants to explore or discover what data exists in their Langfuse project. This includes listing scores, traces, models, tags, prompts, datasets, or querying metrics. This agent is read-only and never modifies data. <example> Context: User wants to understand what data is available in their Langfuse project. user: "What scores do I have in Langfuse?" assistant: "I'll use the langfuse-data-explorer agent to enumerate all scores in your project." <commentary> User is asking about available data - this is a discovery task, not a modification task. </commentary> </example> <example> Context: User needs to know what models are being tracked. user: "Show me what models are being used and their costs" assistant: "Let me use the langfuse-data-explorer agent to list all models and their pricing information." <commentary> Querying model information is a read-only discovery operation. </commentary> </example> <example> Context: User wants to run an analytics query. user: "What's the average latency by model over the last 7 days?" assistant: "I'll use the langfuse-data-explorer agent to query the metrics API for latency data." <commentary> Running metrics queries is a read-only operation handled by this agent. </commentary> </example>

langfuse-dataset-expert

/langfuse-dataset-expert

Use this agent when the user wants to create, browse, populate, or manage Langfuse datasets and dataset items. This includes creating datasets, adding items, designing item schemas, archiving items, or exploring what datasets exist. This agent handles the data preparation layer for experiments. <example> Context: User wants to see what datasets exist in their Langfuse project. user: "What datasets do I have in Langfuse?" assistant: "I'll use the langfuse-dataset-expert agent to list all datasets and their items." <commentary> Listing datasets and browsing their contents is a dataset management task. </commentary> </example> <example> Context: User wants to create a new evaluation dataset. user: "Create a dataset for testing my SSP generation pipeline" assistant: "I'll use the langfuse-dataset-expert agent to create the dataset and help design the item schema." <commentary> Creating datasets with appropriate schemas is the dataset expert's core responsibility. </commentary> </example> <example> Context: User wants to add items to an existing dataset. user: "Add test cases to my benchmark dataset" assistant: "I'll use the langfuse-dataset-expert agent to create the dataset items." <commentary> Adding items to datasets — whether manually or from traces — is a dataset management operation. </commentary> </example> <example> Context: User wants to populate a dataset from existing traces. user: "Take the last 10 traces and add them as dataset items" assistant: "I'll use the langfuse-dataset-expert agent to extract trace data and create dataset items from it." <commentary> Creating dataset items from production traces requires querying traces and inserting items. </commentary> </example> <example> Context: User wants to design the schema for dataset items. user: "What should my dataset items look like for HTML controls experiments?" assistant: "I'll use the langfuse-dataset-expert agent to help design the item input schema." 
<commentary> Schema design guidance for dataset items is the dataset expert's domain. </commentary> </example>

langfuse-eval-manager

/langfuse-eval-manager

Use this agent when the user wants to create, update, delete, list, inspect, or manage LLM-as-a-Judge evaluators in Langfuse. This agent operates directly on the Langfuse PostgreSQL database to manage eval_templates and job_configurations. <example> Context: User wants to see what evaluators exist. user: "List all my Langfuse evaluators" assistant: "I'll use the langfuse-eval-manager agent to query and list all evaluators in your project." <commentary> Listing evaluators requires querying eval_templates and job_configurations — use the eval manager. </commentary> </example> <example> Context: User wants to create a new evaluation criterion. user: "Create an evaluator that checks if the output contains markdown" assistant: "I'll use the langfuse-eval-manager agent to create that LLM-as-a-Judge evaluator." <commentary> Creating an evaluator involves inserting into eval_templates and job_configurations — use the eval manager. </commentary> </example> <example> Context: User wants to modify an evaluator's prompt or filters. user: "Update the factuality evaluator to also check for citations" assistant: "I'll use the langfuse-eval-manager agent to update that evaluator's template." <commentary> Updating an evaluator may require a new template version and job config changes — use the eval manager. </commentary> </example> <example> Context: User wants to activate or deactivate evaluators. user: "Activate the tone evaluator" or "Pause all evaluators" assistant: "I'll use the langfuse-eval-manager agent to toggle the evaluator status." <commentary> Toggling evaluator status involves updating job_configurations.status — use the eval manager. </commentary> </example> <example> Context: User wants to set up filters for an evaluator. user: "Set the relevance evaluator to only run on traces named 'chat-completion'" assistant: "I'll use the langfuse-eval-manager agent to configure the trace filter for that evaluator." 
<commentary> Configuring evaluator filters requires discovering available filter values and updating job_configurations — use the eval manager. </commentary> </example>

langfuse-experiment-manager

/langfuse-experiment-manager

Use this agent when the user wants to trigger experiments, browse dataset runs, analyze experiment results, compare runs, or configure remote experiment webhooks in Langfuse. This agent handles the experiment execution and analysis layer. <example> Context: User wants to trigger an experiment against a dataset. user: "Run an experiment on my test dataset" assistant: "I'll use the langfuse-experiment-manager agent to trigger the experiment." <commentary> Triggering experiments — whether via webhook or SDK — is the experiment manager's core responsibility. </commentary> </example> <example> Context: User wants to see what experiment runs exist. user: "Show me all experiment runs for the test dataset" assistant: "I'll use the langfuse-experiment-manager agent to list all dataset runs." <commentary> Browsing dataset runs and their metadata is an experiment management task. </commentary> </example> <example> Context: User wants to analyze results of an experiment run. user: "How did the latest experiment perform? Show me the scores." assistant: "I'll use the langfuse-experiment-manager agent to analyze the experiment results." <commentary> Deep analysis of experiment run results — scores, pass/fail rates, per-item details — is the experiment manager's domain. </commentary> </example> <example> Context: User wants to compare two experiment runs. user: "Compare the gpt-4o run against the gpt-4o-mini run" assistant: "I'll use the langfuse-experiment-manager agent to compare the two runs side by side." <commentary> Cross-run comparison requires joining run items, traces, and scores across multiple runs. </commentary> </example> <example> Context: User wants to set up remote experiment triggering from Langfuse UI. user: "Configure my dataset so I can trigger experiments from the Langfuse UI" assistant: "I'll use the langfuse-experiment-manager agent to set the remote experiment URL and payload on the dataset." 
<commentary> Configuring the webhook URL for Custom Experiment triggers is experiment infrastructure setup. </commentary> </example>

langfuse-widget-manager

/langfuse-widget-manager

Use this agent when the user wants to create, update, delete, or manage Langfuse dashboard widgets and dashboards. This agent writes directly to the Langfuse PostgreSQL database. <example> Context: User wants to create a new visualization. user: "Create a chart showing average score over time" assistant: "I'll use the langfuse-widget-manager agent to create that visualization for you." <commentary> Creating a widget is a write operation requiring the widget manager agent. </commentary> </example> <example> Context: User wants to modify an existing widget. user: "Update my cost chart to show a breakdown by model instead" assistant: "Let me use the langfuse-widget-manager agent to update that widget's configuration." <commentary> Modifying widget configuration is a CRUD operation handled by the widget manager. </commentary> </example> <example> Context: User wants to set up a complete dashboard. user: "Create a new dashboard with cost and latency charts" assistant: "I'll use the langfuse-widget-manager agent to create the dashboard and add the requested widgets." <commentary> Dashboard creation with widgets involves multiple write operations — use widget manager. </commentary> </example> <example> Context: User wants suggestions for what to visualize. user: "What visualizations should I create for my project?" assistant: "I'll use the langfuse-widget-manager agent to analyze your data and suggest relevant visualizations." <commentary> Suggesting widgets leads to creating them, so use the widget manager which has the suggest-widgets skill. </commentary> </example>

Skills (27)

analyze-experiment-results

/analyze-experiment-results

Use when the user wants to analyze experiment results, inspect scores from a dataset run, check pass/fail rates, review per-item outputs, or deep-dive into experiment performance. Trigger phrases: "analyze results", "experiment scores", "how did the experiment perform", "show results", "inspect run", "experiment analysis".

compare-experiments

/compare-experiments

Use when the user wants to compare two or more experiment runs, detect regressions, see score deltas between runs, or evaluate model performance differences. Trigger phrases include "compare runs", "compare experiments", "diff runs", "regression check", "which run is better", "model comparison", "A/B comparison".

configure-remote-experiment

/configure-remote-experiment

This skill should be used when the user wants to configure a Langfuse dataset for remote experiment triggering from the UI, set up a webhook URL, update the default experiment payload, or enable the Custom Experiment feature. Trigger phrases include "configure remote experiment", "set webhook URL", "enable custom experiment", "set up experiment trigger", "configure dataset webhook".

create-dataset

/create-dataset

This skill should be used when the user wants to create a new Langfuse dataset, set up a dataset for benchmarking, or create a dataset with input/output schema validation. Trigger phrases include "create dataset", "new dataset", "set up dataset", "add dataset".

create-evaluator

/create-evaluator

Use when the user wants to create a Langfuse LLM-as-a-Judge evaluator. Trigger phrases: "create evaluator", "add evaluator", "new evaluation", "set up evaluation criteria", "create judge". Handles prompt composition, schema validation, ID generation, SQL insertion into eval_templates and job_configurations, and post-creation verification.

Stats

Version: 0.1.2

Released: May 30, 2026

Language: Python

Stars: 1

Maintenance: Good

Last Commit: May 29, 2026

Added: Jun 1, 2026


Available In

zweihander (5)

Safety Signals

Caution

Uses power tools

Uses Bash, Write, or Edit tools

README

🗡️ Zweihander

Simple, robust, and versatile marketplace for agent plugins, forged for the chaos of the AI world.

It collects practical tools for Codex and Claude Code across LLM observability, API exploration, development workflows, assistant operations, research notes, cloud storage, local automation, runtime app verification, and job-search workflows.

Plugin Catalog

flowchart TB

subgraph RowOne[" "]
  direction LR
  Langfuse["`**langfuse**
  Trace exploration, datasets,
  evaluators, dashboards, and experiments`"]:::langfuse
  OpenAPITools["`**openapi-tools**
  List and inspect OpenAPI endpoints
  on running services`"]:::openapi
  LLMApplicationDev["`**llm-application-dev**
  Agent pattern selection and
  schema-guided reasoning`"]:::llm
  PythonDevWorkflow["`**python-dev-workflow**
  Pytest, Redis test patterns,
  Celery, and unit-test review agents`"]:::python
  DevWorkflow["`**dev-workflow**
  Commits, PRs, tickets, releases,
  and review-comment workflows`"]:::dev
end

subgraph RowTwo[" "]
  direction LR
  WorkSessionTools["`**work-session-tools**
  Daily notes, task tracking,
  interviews, and team planning`"]:::session
  AIAssistantOps["`**ai-assistant-ops**
  Assistant setup audits, skill improvement,
  harness adaptation, and Markdown cleanup`"]:::ops
  OSTools["`**os-tools**
  Local macOS automation utilities
  for assistant workflows`"]:::os
  CloudStorageTools["`**cloud-storage-tools**
  User-file storage workflows for
  Dropbox, Drive, OneDrive, and MEGA`"]:::storage
  JobHuntToolkit["`**job-hunt-toolkit**
  Versioned job applications with
  resume tailoring and PDF checks`"]:::job
end

subgraph RowThree[" "]
  direction LR
  RunAndVerifyApp["`**run-and-verify-app**
  Launch apps, verify runtime behavior,
  and record run recipes`"]:::runverify
  MermaidDiagrams["`**mermaid-diagrams**
  Mermaid generation,
  syntax references, and linting`"]:::mermaid
end

Langfuse ~~~ OpenAPITools ~~~ LLMApplicationDev ~~~ PythonDevWorkflow ~~~ DevWorkflow
WorkSessionTools ~~~ AIAssistantOps ~~~ Research ~~~ OSTools ~~~ CloudStorageTools ~~~ JobHuntToolkit
RowOne ~~~ RowTwo
RowTwo ~~~ RowThree
RunAndVerifyApp ~~~ MermaidDiagrams

classDef langfuse fill:#dff7ff,stroke:#0284c7,stroke-width:2px,color:#0f172a;
classDef openapi fill:#dcfce7,stroke:#16a34a,stroke-width:2px,color:#052e16;
classDef llm fill:#fef3c7,stroke:#d97706,stroke-width:2px,color:#451a03;
classDef python fill:#ede9fe,stroke:#7c3aed,stroke-width:2px,color:#2e1065;
classDef dev fill:#fee2e2,stroke:#dc2626,stroke-width:2px,color:#450a0a;
classDef runverify fill:#e0f2fe,stroke:#0369a1,stroke-width:2px,color:#082f49;
classDef session fill:#ccfbf1,stroke:#0f766e,stroke-width:2px,color:#042f2e;
classDef ops fill:#fce7f3,stroke:#db2777,stroke-width:2px,color:#500724;
classDef research fill:#fef9c3,stroke:#ca8a04,stroke-width:2px,color:#422006;
classDef os fill:#e0e7ff,stroke:#4f46e5,stroke-width:2px,color:#1e1b4b;
classDef storage fill:#ecfccb,stroke:#65a30d,stroke-width:2px,color:#1a2e05;
classDef job fill:#ffedd5,stroke:#ea580c,stroke-width:2px,color:#431407;
classDef mermaid fill:#f0fdf4,stroke:#059669,stroke-width:2px,color:#052e16;
style RowOne fill:transparent,stroke:transparent,color:transparent;
style RowTwo fill:transparent,stroke:transparent,color:transparent;
style RowThree fill:transparent,stroke:transparent,color:transparent;

Notes for Users

Use this README when you want to install the marketplace, install a plugin, or choose what each plugin is for. Developer and maintenance notes live in AGENTS.md.

Quick Install

Codex

Add the marketplace:

codex plugin marketplace add Alex-Kopylov/zweihander

Install a plugin:

codex plugin add langfuse@zweihander

List available plugins:

codex plugin list

Update the installed marketplace:

codex plugin marketplace upgrade zweihander

Claude Code

Add the marketplace from inside Claude Code:

/plugin marketplace add Alex-Kopylov/zweihander

Install a plugin:

/plugin install langfuse@zweihander

Update the installed marketplace:

/plugin marketplace update zweihander

For scripts or automation, use the non-interactive CLI:

claude plugin marketplace add Alex-Kopylov/zweihander
claude plugin install langfuse@zweihander
claude plugin marketplace update zweihander
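
For automation, the non-interactive commands above can be wrapped in a small script. The following is a minimal sketch, not part of the plugin or the marketplace: the function names `build_install_commands` and `run_install` are hypothetical, and the script assumes the `claude` CLI is on `PATH` when run with `dry_run=False`. It builds the same two commands shown above (add the marketplace, then install one plugin) and only prints them by default.

```python
import shlex
import subprocess
from typing import List

# Values taken from the README commands above.
MARKETPLACE_REPO = "Alex-Kopylov/zweihander"
MARKETPLACE_NAME = "zweihander"


def build_install_commands(plugin: str) -> List[List[str]]:
    """Return argv lists that add the marketplace and install one plugin.

    Hypothetical helper: `plugin` is a bare plugin name such as "langfuse".
    """
    if not plugin or "@" in plugin or any(ch.isspace() for ch in plugin):
        raise ValueError(f"expected a bare plugin name, got {plugin!r}")
    return [
        ["claude", "plugin", "marketplace", "add", MARKETPLACE_REPO],
        ["claude", "plugin", "install", f"{plugin}@{MARKETPLACE_NAME}"],
    ]


def run_install(plugin: str, dry_run: bool = True) -> List[str]:
    """Print each command; execute it only when dry_run is False."""
    rendered = []
    for argv in build_install_commands(plugin):
        line = shlex.join(argv)
        rendered.append(line)
        print(line)
        if not dry_run:
            # Assumes the `claude` CLI is installed; raises on a non-zero exit.
            subprocess.run(argv, check=True)
    return rendered


if __name__ == "__main__":
    # Dry run: prints the two commands without executing them.
    run_install("langfuse")
```

Passing argv lists to `subprocess.run` rather than a shell string avoids quoting problems if a plugin name ever contains unusual characters. The dry-run default makes it easy to review the commands before a CI job executes them.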

How to Use

1. Add this marketplace to Codex or Claude Code.
2. Pick a plugin from the catalog below.
3. Install the plugin with `plugin@zweihander`, for example `langfuse@zweihander`.
4. Ask the assistant naturally for the workflow you want. The installed plugin contributes skills, agents, or both.

Plugins

`langfuse`

Use when: you need to inspect Langfuse data, create or update evaluation assets, compare experiment runs, or manage dashboard widgets.

Skills
