Skill

langfuse-monitoring

Monitors and analyzes LLM application data already in Langfuse — dashboards, metrics, and alerting for cost, latency, quality, and volume. Use whenever the user wants to observe or report on production Langfuse data: "monitor my LLM app", "build a Langfuse dashboard", "track cost / latency / quality over time", "Langfuse metrics API", "score analytics", "set up a spend alert", "alert me when costs spike", "dashboard for production monitoring", or interpreting usage/cost/quality trends. Owns operating-the-data (dashboards/metrics/alerting); defers instrumentation to the vendored `langfuse` skill and score/evaluator design to the `langfuse-evaluation` skill.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/claude-langfuse-plugin:langfuse-monitoring

User invocable

Model invocable

Inline context

Default effort

Tool Access

This skill is limited to the following tools:

WebFetch(domain:langfuse.com)
Bash(curl *langfuse.com/*)

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

This skill covers *operating the data* once it's in Langfuse: visualizing it (dashboards),

Supporting Files

references/alerting.md
references/dashboards.md
references/metrics-api.md
references/score-analytics.md

SKILL.md

75 lines · ~1k tokens

Stats

Language: Python

Stars: 0

Maintenance: Good

Last Commit: Jun 16, 2026

Langfuse Monitoring

This skill covers operating the data once it's in Langfuse: visualizing it (dashboards), extracting it (metrics API), analyzing eval scores (score analytics), and alerting. It does not cover emitting the data (that's instrumentation — vendored langfuse skill) or designing the scores (that's the langfuse-evaluation skill).

Operating principles

Distill judgment, fetch facts. This skill carries what to monitor and how to structure it. For exact UI steps and API schemas, fetch live docs by appending .md to the page URL (e.g. https://langfuse.com/docs/metrics/features/metrics-api.md).
You can only monitor what you instrumented. Cost, latency, userId, tags, release/version must be on the traces to slice by them. If a needed dimension is missing, the fix is upstream (instrumentation → vendored langfuse skill), not here.
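
The `.md`-append convention above can be sketched as a small URL helper. This is a minimal illustration, not part of the skill: the function name `docs_markdown_url` is hypothetical, and inside the skill the resulting URL would be passed to WebFetch or curl, the only tools it is allowed to use.

```python
from urllib.parse import urlsplit, urlunsplit


def docs_markdown_url(page_url: str) -> str:
    """Return the Markdown variant of a docs page URL by appending `.md`.

    Hypothetical helper: it only builds the URL and performs no network call.
    """
    parts = urlsplit(page_url)
    path = parts.path.rstrip("/")  # tolerate a trailing slash
    if not path.endswith(".md"):
        path += ".md"
    # Drop query string and fragment so the suffix lands on the page path.
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


if __name__ == "__main__":
    print(docs_markdown_url("https://langfuse.com/docs/metrics/features/metrics-api"))
```

Stripping the fragment first matters because a link copied from the browser often ends in `#section`, and appending `.md` after the fragment would not change the page that is requested.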

Workflow

1. Frame the monitoring goal

Identify the metric family — cost, latency, quality, or volume — and the dimensions to slice by (trace name/feature, user, model, tags, release). See references/dashboards.md for the metric/dimension model.
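
One way to make this framing step explicit is to write the goal down as a small validated record before touching any dashboard or API. The sketch below is illustrative only: `MonitoringGoal` and the dimension identifiers are hypothetical names that mirror the families and dimensions listed above, not Langfuse field names.

```python
from dataclasses import dataclass

# The four metric families and the slicing dimensions named in this step.
# The identifiers are illustrative labels, not Langfuse API field names.
METRIC_FAMILIES = {"cost", "latency", "quality", "volume"}
DIMENSIONS = {"trace_name", "user", "model", "tags", "release"}


@dataclass(frozen=True)
class MonitoringGoal:
    """A monitoring goal: one metric family plus the dimensions to slice by."""

    metric_family: str
    dimensions: tuple[str, ...] = ()

    def __post_init__(self) -> None:
        if self.metric_family not in METRIC_FAMILIES:
            raise ValueError(f"unknown metric family: {self.metric_family!r}")
        unknown = set(self.dimensions) - DIMENSIONS
        if unknown:
            raise ValueError(f"unknown dimensions: {sorted(unknown)}")


if __name__ == "__main__":
    goal = MonitoringGoal("cost", ("model", "release"))
    print(goal)
```

A goal that fails validation here usually means the question is still too vague, or that it needs a dimension the traces do not carry yet.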

2. Pick the surface

Visual, shareable, exploratory → dashboards. Start with the curated dashboards (Latency/Cost/Usage), then build custom widgets. See references/dashboards.md.
Programmatic / scheduled / billing / embedded elsewhere → Metrics API (use v2). See references/metrics-api.md.
Eval scores specifically (distributions, trends, judge-vs-human agreement) → score analytics. See references/score-analytics.md.
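
For the programmatic surface, a request can be sketched as a JSON query built from the parts of the query model (view, metrics, dimensions, filters, timeDimension). Everything specific below is an assumption to verify against the live docs before use: the endpoint path `/api/public/v2/metrics`, the `query` URL parameter, the inner field names (`measure`, `aggregation`, `field`, `granularity`, `fromTimestamp`, `toTimestamp`), and the example view and measure names. The helper names are hypothetical, and the code only builds the URL without sending a request.

```python
import json
from urllib.parse import urlencode

# Assumption: path of the v2 metrics endpoint; confirm in the live docs.
METRICS_V2_PATH = "/api/public/v2/metrics"


def build_metrics_query(view, measure, aggregation, dimensions,
                        from_ts, to_ts, granularity="day"):
    """Assemble a query dict following the view/metrics/dimensions/filters/
    timeDimension model. Inner field names are assumptions, not verified schema."""
    return {
        "view": view,
        "metrics": [{"measure": measure, "aggregation": aggregation}],
        "dimensions": [{"field": name} for name in dimensions],
        "filters": [],
        "timeDimension": {"granularity": granularity},
        "fromTimestamp": from_ts,
        "toTimestamp": to_ts,
    }


def metrics_url(host, query):
    """Build the GET URL, passing the query as a JSON-encoded parameter."""
    return f"{host.rstrip('/')}{METRICS_V2_PATH}?{urlencode({'query': json.dumps(query)})}"


if __name__ == "__main__":
    query = build_metrics_query(
        view="observations",            # assumed view name
        measure="totalCost",            # assumed measure name
        aggregation="sum",
        dimensions=["providedModelName"],  # assumed dimension name
        from_ts="2026-06-01T00:00:00Z",
        to_ts="2026-06-08T00:00:00Z",
    )
    print(metrics_url("https://cloud.langfuse.com", query))
```

Keeping query construction separate from the HTTP call makes the query easy to log, diff, and reuse across scheduled jobs.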

3. Build the standard dashboards

For a new production app, stand up the three durable dashboards (production health, cost optimization, quality/UX) from references/dashboards.md.

4. Set up alerting

Disambiguate first: Spend Alerts track your Langfuse Cloud bill, not your application's LLM cost. For app-level cost/latency/quality alerts, build a check driven by the Metrics API. See references/alerting.md.
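
The application-level check can be sketched as a pure function that takes rows already fetched from the Metrics API and flags any that exceed a threshold. This is a minimal sketch under stated assumptions: the row shape (a list of dicts), the value key `totalCost_sum`, and the dimension key `name` are hypothetical and depend on the query you send. `Breach` and `find_breaches` are illustrative names, not part of Langfuse.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Breach:
    """One dimension value whose observed metric exceeded the threshold."""

    dimension_value: str
    observed: float
    threshold: float


def find_breaches(rows, value_key, threshold, dimension_key="name"):
    """Return a Breach for every row whose metric value exceeds `threshold`.

    `rows` is assumed to be a list of dicts taken from a metrics response;
    the key names are hypothetical and depend on the query that was sent.
    """
    breaches = []
    for row in rows:
        raw = row.get(value_key)
        if raw is None:
            continue  # no value for this row, nothing to compare
        observed = float(raw)  # tolerate numbers serialized as strings
        if observed > threshold:
            label = str(row.get(dimension_key, "(all)"))
            breaches.append(Breach(label, observed, threshold))
    return breaches


if __name__ == "__main__":
    # Made-up rows for illustration only.
    sample = [
        {"name": "chat", "totalCost_sum": "12.5"},
        {"name": "search", "totalCost_sum": 3.0},
    ]
    for breach in find_breaches(sample, "totalCost_sum", threshold=10.0):
        print(f"ALERT {breach.dimension_value}: {breach.observed} > {breach.threshold}")
```

Run on a schedule (cron, CI job, or similar), a check like this would send its breaches to whatever notification channel you already use. Because the comparison logic is a pure function, it can be unit-tested without Langfuse credentials.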

Bundled resources

references/dashboards.md — metrics & dimensions, curated vs custom dashboards, the widget model, and the three standard dashboards to build (health / cost / quality).
references/metrics-api.md — programmatic metrics: use v2, the query model (view/metrics/dimensions/filters/timeDimension), and v1→v2 migration gotchas.
references/score-analytics.md — zero-config eval-score analysis: distributions, trends, and judge-vs-human agreement metrics (MAE/RMSE, Cohen's Kappa/F1).
references/alerting.md — Spend Alerts (Cloud billing) vs application-level alerting (Metrics API + your own check); how to alert on cost/latency/quality.

Hand-off map

Need	Where
Dashboards, metrics extraction, score analytics, alerting	this skill
Emitting cost/latency/userId/tags on traces (instrumentation)	vendored `langfuse` skill
Designing the scores/evaluators being monitored	`langfuse-evaluation` skill
Onboarding-time spend-alert setup / production-readiness checklist	`langfuse-setup` skill
Formal judge calibration (vs lightweight score analytics)	vendored `langfuse` skill `judge-calibration.md`
Exact dashboard UI / metrics API schema	live docs (`.md`-append)
