By jbaham2
Expert system for Langfuse setup, observability, prompt management, evaluation, and monitoring. Bundles distilled knowledge, skills, agents, commands, and hooks.
Plan or operate a self-hosted Langfuse deployment — sizing, scaling, backups, upgrades, security.
Design and set up a Langfuse evaluation — method choice, scores, datasets/experiments, online/offline.
Check a Langfuse connection/project is healthy — auth, ingestion, and recent activity.
Build or query Langfuse monitoring — dashboards, metrics (cost/latency/quality/volume), alerting.
Guide Langfuse onboarding and setup — deployment choice, keys, first trace, prod-readiness.
Designs a complete Langfuse evaluation strategy for an LLM application — choosing methods, defining scores, shaping datasets/experiments, and planning online + offline evaluation. Use when the user asks to "design an eval strategy", "figure out how to evaluate my agent/RAG/chatbot", "what should I measure and how", or wants a structured evaluation plan rather than ad-hoc scoring. Explores the codebase when available to ground the plan in the actual application.
Diagnoses why a Langfuse setup isn't working — traces not appearing, auth/connection errors, region/key mismatches, SDK init problems, flush-on-exit issues. Use when the user says "my traces aren't showing up", "Langfuse auth is failing", "can't connect to Langfuse", "is my Langfuse setup working", or a first-trace check fails. Read-only: it diagnoses and recommends, never mutates data.
Reviews Langfuse traces/observations at scale to find, classify, and quantify failure modes, then recommends fixes and regression test cases. Use when the user says "review my traces", "what's going wrong in production", "analyze failures in my Langfuse traces", "find error patterns", or wants systematic error analysis rather than reading traces one by one. Read-only: it analyzes and recommends, never mutates data.
Operating a self-hosted Langfuse deployment — architecture, sizing, scaling, backups, upgrades, and security. Use whenever the user is running or planning to run Langfuse on their own infrastructure: "operate / run self-hosted Langfuse", "deploy Langfuse on Kubernetes / Docker / AWS / GCP / Azure", "Langfuse sizing / resource requirements", "scale Langfuse / ingestion throughput", "back up Langfuse", "upgrade Langfuse / background migrations", "Langfuse SSO / encryption / VPC / air-gapped", or "Langfuse production deployment". Owns HOW to run self-hosted Langfuse well; the self-host-vs-Cloud and tier decision lives in the `langfuse-setup` skill, and exact configs in live docs.
Designs and runs LLM evaluation with Langfuse — the strategy and workflow layer for scoring quality, building datasets, and running experiments. Use whenever the user is evaluating LLM output quality with Langfuse: "evaluate my LLM app", "which eval method should I use", "set up LLM-as-a-judge", "create a dataset / run an experiment", "score my traces", "offline vs online evaluation", "test prompt changes before deploying", "build a regression test set", or interpreting experiment results. Owns eval STRATEGY and the datasets/experiments/scores workflow; defers judge calibration and CI/CD experiment code to the vendored `langfuse` skill, and exact SDK code to live docs.
Monitors and analyzes LLM application data already in Langfuse — dashboards, metrics, and alerting for cost, latency, quality, and volume. Use whenever the user wants to observe or report on production Langfuse data: "monitor my LLM app", "build a Langfuse dashboard", "track cost / latency / quality over time", "Langfuse metrics API", "score analytics", "set up a spend alert", "alert me when costs spike", "dashboard for production monitoring", or interpreting usage/cost/quality trends. Owns operating-the-data (dashboards/metrics/alerting); defers instrumentation to the vendored `langfuse` skill and score/evaluator design to the `langfuse-evaluation` skill.
Orchestrates Langfuse adoption decisions and production-readiness — the planning the official `langfuse` skill doesn't cover. Use whenever the user is deciding HOW to adopt Langfuse or whether their setup is ready: "set up Langfuse", "Langfuse Cloud or self-host", "which Langfuse region", "configure Langfuse keys/env", "is my Langfuse setup production ready", "Langfuse prod checklist", "my traces aren't showing up", or planning a Langfuse rollout. Defers instrumentation CODE to the vendored `langfuse` skill — this skill owns the decisions, order, and verification around it.
Interact with Langfuse and access its documentation. Use when needing to (1) query or modify Langfuse data programmatically via the CLI — traces, prompts, datasets, scores, sessions, and any other API resource, (2) look up Langfuse documentation, concepts, integration guides, or SDK usage, or (3) understand how any Langfuse feature works. This skill covers CLI-based API access (via npx) and multiple documentation retrieval methods.
Modifies files
Hook triggers on file write and edit operations
External network access
Connects to servers outside your machine
Uses power tools
Uses Bash, Write, or Edit tools
A Claude Code plugin that turns Claude into an expert at Langfuse — instrumentation, setup, evaluation, monitoring, and self-hosting — for Langfuse Cloud and self-hosted deployments, across the Python and JS/TS SDKs and 50+ framework integrations.
It bundles 5 skills, 6 commands, 3 agents, 1 safety hook, and both Langfuse MCP servers, designed on one principle: distill durable judgment, fetch the facts live. The plugin carries the decisions and workflows (which eval method when, how to size a deployment, how to read an experiment); it fetches exact, version-sensitive code from the live Langfuse docs at runtime so it never goes stale.
| Skill | What it owns |
|---|---|
| `langfuse` (vendored, official) | Instrumentation code, the langfuse-cli, live-docs access, prompt migration, judge calibration, CI/CD experiment gates, error analysis, SDK upgrades |
| `langfuse-setup` | Adoption decisions (Cloud vs self-host, region), onboarding sequence, first-trace verification, production-readiness checklist |
| `langfuse-evaluation` | Eval strategy: methods, scores model, datasets/experiments, LLM-as-a-judge, code evaluators, human annotation, RAG/agent/multi-turn/external-pipeline evals, experiment interpretation |
| `langfuse-monitoring` | Dashboards, the Metrics API, score analytics, alerting (Spend-Alerts vs app cost) |
| `langfuse-deployment` | Self-hosting: architecture & sizing, scaling, backups/upgrades/migrations, security & SSO |
| Command | Does |
|---|---|
| `/lf-setup` | Guides onboarding: deployment choice, keys, first trace, prod-readiness |
| `/lf-eval` | Designs/sets up an evaluation: method, scores, datasets/experiments, online/offline |
| `/lf-monitor` | Builds or queries monitoring: dashboards, metrics, alerting |
| `/lf-deploy` | Plans/operates a self-hosted deployment: sizing, scaling, backups, security |
| `/lf-trace-review` | Reviews traces to find, classify, and quantify failure modes |
| `/lf-health-check` | Checks a connection/project: auth, ingestion, recent activity |
| Agent | Does | Access |
|---|---|---|
| `eval-designer` | Explores your codebase and produces a concrete, app-specific evaluation plan | read + reasoning |
| `setup-doctor` | Diagnoses why a setup isn't working (traces missing, auth, region mismatch, flush) | read-only |
| `trace-reviewer` | Builds a quantified failure taxonomy from your traces and proposes fixes + regression cases | read-only |
A non-blocking PostToolUse check that warns if you accidentally hardcode a Langfuse secret key
(sk-lf-…) in an edited file. High-signal, never blocks your edit.
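The kind of check described above can be sketched in a few lines of shell. This is an illustration only, not the plugin's actual hook: the function name `warn_on_langfuse_secret` and the key pattern (the `sk-lf-` prefix followed by at least eight key characters) are assumptions made for the example.

```shell
# Hypothetical helper: warn, but never fail, when a file contains something
# that looks like a hardcoded Langfuse secret key.
warn_on_langfuse_secret() {
  # Assumed key shape: "sk-lf-" followed by 8+ letters, digits, or hyphens.
  # A placeholder such as "sk-lf-..." does not match, so templates stay quiet.
  if grep -Eq 'sk-lf-[A-Za-z0-9-]{8,}' -- "$1" 2>/dev/null; then
    echo "warning: possible hardcoded Langfuse secret key in $1" >&2
  fi
  # Non-blocking: always report success so the edit is never rejected.
  return 0
}
```

Calling `warn_on_langfuse_secret path/to/edited-file` prints a warning to stderr when the pattern matches and returns 0 either way, which is what keeps the check advisory rather than blocking.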
MCP servers (.mcp.json):
- langfuse-docs: public docs search/retrieval (no auth). 3 tools.
- langfuse-data-platform: authenticated access to your project's prompts, traces, observations, scores, datasets, evaluators, metrics, and getHealth. 61 tools.

Prerequisites: Langfuse (cloud.langfuse.com, or self-hosted) and a project API key pair (pk-lf-… / sk-lf-…) from Project Settings → API Keys.

Local (development / trying it out):
git clone https://github.com/jbaham2/claude-langfuse-plugin
claude --plugin-dir /path/to/claude-langfuse-plugin
Via a marketplace (shareable install):
/plugin marketplace add jbaham2/claude-langfuse-plugin
/plugin install claude-langfuse-plugin
(Exact marketplace commands can vary by Claude Code version — see the Claude Code plugin docs.)
Copy the template and fill it in:
cp skills/langfuse-setup/assets/.env.example .env
Set in your shell or .env:
export LANGFUSE_PUBLIC_KEY="pk-lf-..."
export LANGFUSE_SECRET_KEY="sk-lf-..."
export LANGFUSE_BASE_URL="https://cloud.langfuse.com" # your region or self-host URL
export LANGFUSE_HOST="$LANGFUSE_BASE_URL" # some tools read HOST instead
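Before launching Claude Code it can help to confirm that the variables are actually set. A minimal sketch, assuming the three variable names listed above; the helper name `check_langfuse_env` is hypothetical and not part of the plugin.

```shell
# Hypothetical helper: list any required Langfuse variables that are unset
# or empty, and return non-zero if at least one is missing.
check_langfuse_env() {
  missing=0
  for name in LANGFUSE_PUBLIC_KEY LANGFUSE_SECRET_KEY LANGFUSE_BASE_URL; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

check_langfuse_env || echo "set the variables listed above first" >&2
```

The function only checks for presence; it does not validate the key pair against the server.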
The authenticated MCP uses Basic Auth. Generate the token and export it before launching Claude
Code (the value is interpolated into .mcp.json at load):
export LANGFUSE_MCP_AUTH="$(printf '%s:%s' "$LANGFUSE_PUBLIC_KEY" "$LANGFUSE_SECRET_KEY" | base64 | tr -d '\n')" # tr removes the line wraps that GNU base64 inserts every 76 characters
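A Basic Auth token has to be a single line, and some `base64` implementations wrap long output, so it is worth checking the token once. The sketch below builds the token from a key pair and decodes it back; the helper name `langfuse_basic_token` and the example keys are hypothetical, and `base64 -d` is the GNU/BusyBox spelling of the decode flag (older macOS versions use `-D`).

```shell
# Hypothetical helper: build a single-line Basic Auth token from
# a public key ($1) and a secret key ($2).
langfuse_basic_token() {
  # tr strips any line wraps that base64 adds to long output.
  printf '%s:%s' "$1" "$2" | base64 | tr -d '\n'
}

token="$(langfuse_basic_token "pk-lf-example" "sk-lf-example")"
# Decoding should give back the original "public:secret" pair.
printf '%s' "$token" | base64 -d   # prints pk-lf-example:sk-lf-example
```

If the decoded value does not match the pair you exported, or the token contains a line break, the MCP server will most likely reject the Authorization header.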
Set the right region endpoint in .mcp.json (the url of langfuse-data-platform):

npx claudepluginhub jbaham2/claude-langfuse-plugin --plugin claude-langfuse-plugin