Skill

LLM Dashboard

Open the MultiLLM dashboard showing sessions, token usage, costs, and backend status. Use when the user asks about LLM usage, costs, dashboard, or wants to see model statistics.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/multillm:llm-dashboard

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Open the MultiLLM dashboard and provide a usage summary.

SKILL.md

59 lines · ~778 tokens

Stats

Language: Shell

Parent stars: 0

Maintenance: Good

Last Commit: Mar 17, 2026

LLM Dashboard

Open the MultiLLM dashboard and provide a usage summary.

If the user asks for an hourly check, use hours=1 unless they specify another short window.
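The HOURS placeholder in the commands under Steps stands for this window. As a minimal sketch, assuming the gateway listens on localhost:8080 as in those commands and that the window is given in whole hours, the two request URLs could be built as follows. The names `BASE_URL` and `build_urls` are hypothetical and not part of MultiLLM.

```python
from urllib.parse import urlencode

# Assumption: base address copied from the curl commands in this skill.
BASE_URL = "http://localhost:8080"


def build_urls(hours=1, limit=10):
    """Return (dashboard_url, sessions_url) for a window given in hours.

    hours defaults to 1, matching the hourly-check rule above.
    """
    hours = int(hours)
    if hours < 1:
        raise ValueError("hours must be a positive integer")
    dashboard = f"{BASE_URL}/api/dashboard?{urlencode({'hours': hours})}"
    sessions = f"{BASE_URL}/api/sessions?{urlencode({'hours': hours, 'limit': limit})}"
    return dashboard, sessions


if __name__ == "__main__":
    for url in build_urls(hours=3):
        print(url)
```

Defaulting `hours` to 1 keeps the hourly check the zero-argument case, and any other short window is passed explicitly.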

Steps

Fetch the token usage summary, replacing HOURS with the window length in hours:

curl -s 'http://localhost:8080/api/dashboard?hours=HOURS' | python3 -c "
import sys, json
d = json.load(sys.stdin)
t = d.get('totals') or {}
# Treat missing or null counters as 0 so the format specs below cannot fail
tin = t.get('total_input') or 0
tout = t.get('total_output') or 0
cread = t.get('total_cache_read_input') or 0
cwrite = t.get('total_cache_creation_input') or 0
total = tin + tout + cread + cwrite
reqs = t.get('total_requests') or 0
sessions = d.get('session_count') or 0
cost = t.get('total_cost') or 0
derived = d.get('derived') or {}
print(f'Window:   {d.get(\"hours\", \"HOURS\")}h')
print(f'Requests: {reqs:,}  |  Sessions: {sessions:,}  |  Tokens: {total:,} ({tin:,} in / {tout:,} out / {cread:,} cache read / {cwrite:,} cache write)  |  Cost: \${cost:.4f}')
print(f'Rates:    {(derived.get(\"avg_requests_per_session\") or 0):.2f} req/session  |  {(derived.get(\"avg_tokens_per_request\") or 0):.1f} tok/req  |  \${(derived.get(\"avg_cost_per_request\") or 0):.6f}/req')
# Per-model breakdown, limited to the first 10 entries
for m in (d.get('by_model') or [])[:10]:
    cr = m.get('cache_read_input_tokens') or 0
    cw = m.get('cache_creation_input_tokens') or 0
    tok = (m.get('input_tokens') or 0) + (m.get('output_tokens') or 0) + cr + cw
    mreqs = m.get('requests') or 0
    print(f'  {str(m.get(\"model_alias\") or \"?\"):25s} {mreqs:4d} reqs  {tok:>10,} tok  cr {cr:>8,}  cw {cw:>8,}  {(tok / mreqs) if mreqs else 0:>8.1f} tok/req  \${(m.get(\"cost_usd\") or 0):.4f}')
"
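The arithmetic in this one-liner can be easier to follow as a plain function. The sketch below assumes the same field names the command reads (`totals`, `session_count`, and the four `total_*` token counters). The function name `summarize_totals` is hypothetical. The rates are recomputed locally and may differ from the gateway's own `derived` values if it defines them differently.

```python
def summarize_totals(payload):
    """Collapse a /api/dashboard payload into the numbers the summary needs."""
    totals = payload.get("totals") or {}

    def count(key):
        # Missing or null counters count as zero.
        return totals.get(key) or 0

    tokens = {
        "input": count("total_input"),
        "output": count("total_output"),
        "cache_read": count("total_cache_read_input"),
        "cache_write": count("total_cache_creation_input"),
    }
    total_tokens = sum(tokens.values())
    requests = count("total_requests")
    sessions = payload.get("session_count") or 0
    cost = count("total_cost")
    return {
        "tokens": tokens,
        "total_tokens": total_tokens,
        "requests": requests,
        "sessions": sessions,
        "cost": cost,
        # Rates recomputed locally; zero denominators yield 0.0 instead of an error.
        "requests_per_session": requests / sessions if sessions else 0.0,
        "tokens_per_request": total_tokens / requests if requests else 0.0,
        "cost_per_request": cost / requests if requests else 0.0,
    }
```

Keeping the four token classes separate until the final sum lets cache reads and cache writes be reported on their own as well as in the total.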

Fetch recent sessions, replacing HOURS with the same window:

curl -s 'http://localhost:8080/api/sessions?hours=HOURS&limit=10' | python3 -c "
import sys, json
from datetime import datetime
for s in json.load(sys.stdin):
    started = datetime.fromtimestamp(s['started_at']).strftime('%b %d %H:%M')
    # Null or missing fields fall back to empty or zero values
    models = ', '.join(s.get('models_used') or [])
    tok = (s.get('total_input_tokens') or 0) + (s.get('total_output_tokens') or 0)
    reqs = s.get('total_requests') or 0
    print(f'  {started} [{s.get(\"project\") or \"?\"}] {reqs} reqs, {tok:,} tok, {(tok / reqs) if reqs else 0:.1f} tok/req, \${(s.get(\"total_cost_usd\") or 0):.4f} -- {models}')
"
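The per-session line can also be produced by a small standalone function, which is easier to test than the inline command. This is a sketch, assuming the session fields used above and that `started_at` is a Unix timestamp in seconds. The name `format_session` and the fallback strings are hypothetical.

```python
from datetime import datetime


def format_session(session):
    """Render one /api/sessions entry as a single summary line."""
    started_at = session.get("started_at")
    # Assumption: started_at is a Unix timestamp in seconds (local time is used).
    started = (
        datetime.fromtimestamp(started_at).strftime("%b %d %H:%M")
        if started_at is not None
        else "unknown start"
    )
    tokens = (session.get("total_input_tokens") or 0) + (session.get("total_output_tokens") or 0)
    requests = session.get("total_requests") or 0
    cost = session.get("total_cost_usd") or 0
    models = ", ".join(session.get("models_used") or []) or "no models recorded"
    per_request = tokens / requests if requests else 0
    project = session.get("project") or "unknown project"
    return (
        f"{started} [{project}] {requests} reqs, {tokens:,} tok, "
        f"{per_request:.1f} tok/req, ${cost:.4f} -- {models}"
    )
```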

Present a formatted summary of:
- Total requests, tokens, and estimated costs per model
- Derived rates: requests per session, tokens per request, and cost per request
- Cache-aware token classes: input, output, cache read, and cache write
- Active backends and available models
- Recent sessions with duration and models used
- Hourly rates when the user asks for a short window such as 1h, 3h, 6h, or 12h
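The skill does not say how hourly rates are computed. One plausible reading is a simple average of the window totals over the window length, sketched below. The function name `hourly_rates` and its result keys are hypothetical.

```python
def hourly_rates(total_requests, total_tokens, total_cost, hours):
    """Average the window totals over the window length in hours."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return {
        "requests_per_hour": total_requests / hours,
        "tokens_per_hour": total_tokens / hours,
        "cost_per_hour": total_cost / hours,
    }


if __name__ == "__main__":
    # Example: 120 requests, 60,000 tokens, and $1.50 over a 3h window.
    print(hourly_rates(120, 60_000, 1.5, 3))
```

For a 1h window the rates equal the totals, so they mainly add information for the 3h, 6h, and 12h cases.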

Direct the user to the full dashboard: http://localhost:8080/dashboard

If the gateway is not running, tell the user to start it: python -m multillm.gateway
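
A failed connection is the usual sign that the gateway is not running. The sketch below uses only the Python standard library to turn that failure into the start hint. The names `load_dashboard`, `DASHBOARD_API`, and `START_HINT` are hypothetical, and the 5-second timeout is an arbitrary choice.

```python
import json
import urllib.error
import urllib.request

DASHBOARD_API = "http://localhost:8080/api/dashboard"  # endpoint used in the steps above
START_HINT = "Gateway not reachable. Start it with: python -m multillm.gateway"


def load_dashboard(hours=1, opener=urllib.request.urlopen):
    """Return (payload, None) on success or (None, hint) if the request fails."""
    url = f"{DASHBOARD_API}?hours={int(hours)}"
    try:
        with opener(url, timeout=5) as response:
            return json.load(response), None
    except (urllib.error.URLError, ConnectionError, TimeoutError):
        # Refused connections, timeouts, and HTTP error statuses all land here;
        # this sketch does not distinguish between them.
        return None, START_HINT


if __name__ == "__main__":
    payload, hint = load_dashboard(hours=1)
    print(hint if payload is None else payload.get("totals"))
```

Passing the opener as a parameter makes it possible to exercise the fallback path without a running gateway.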
