Flightdeck is a self-hosted observability and control plane for production and coding agents.

Every LLM call, MCP event, and tool call your agents make streams to the dashboard as it happens, surfaced as a per-agent timeline and as a live fleet-wide feed.

Set token budgets, MCP allow/block rules, and live directives on your production agents.

Coding agents attach via the Claude Code plugin in this repo.

Production agents import the flightdeck-sensor Python package in their entrypoint and call init() + patch(); no other code changes are needed.

Live fleet view: every agent on a shared timeline, streaming events as the agents run.

Agents dashboard: every agent in your fleet with token, latency, error, and cost trends, plus a per-agent swimlane and feed.

Events search: filter every LLM call, tool use, and policy event by agent, type, framework, and MCP server.

Quickstart

Prerequisites: Docker Engine 28+ with Compose v2. Python 3.10+ for the sensor path; Claude Code for the plugin path.

Start the stack:

git clone https://github.com/flightdeckhq/flightdeck
cd flightdeck
make dev

Dashboard at http://localhost:4000. The dev stack seeds a test token tok_dev automatically.

Coding agents (Claude Code)

Launch Claude Code, then install the plugin from this repo's marketplace inside the REPL:

/plugin marketplace add flightdeckhq/flightdeck
/plugin install flightdeck@flightdeck-plugins

That's all a local stack needs: the plugin defaults to http://localhost:4000 with the dev token tok_dev, so the Claude Code session shows up in the fleet view within seconds. Unlike the Python sensor, which keeps capture_prompts=False until you opt in, the plugin captures tool inputs and LLM call content by default, so the Prompts tab is populated without extra setup.

To point the plugin at a different stack (production, a remote dev server, etc.), export these environment variables in the shell before launching claude. The plugin reads them at every SessionStart:

export FLIGHTDECK_SERVER="https://flightdeck.example.com"
export FLIGHTDECK_TOKEN="ftd_..."
claude
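The precedence described above (environment variables win, otherwise the local dev defaults apply) can be sketched as follows. This is an illustration of the documented behavior, not the plugin's source: the function name resolve_flightdeck_config is hypothetical, and treating an empty variable as unset is an assumption.

```python
import os

# Local-stack defaults as documented for the plugin.
DEFAULT_SERVER = "http://localhost:4000"
DEFAULT_TOKEN = "tok_dev"


def resolve_flightdeck_config(env=None):
    """Hypothetical helper: return (server, token) for a new session.

    Environment variables take precedence; an empty value is treated as
    unset (assumption) so the local defaults still apply.
    """
    env = os.environ if env is None else env
    server = env.get("FLIGHTDECK_SERVER") or DEFAULT_SERVER
    token = env.get("FLIGHTDECK_TOKEN") or DEFAULT_TOKEN
    return server, token


print(resolve_flightdeck_config({}))
print(resolve_flightdeck_config({
    "FLIGHTDECK_SERVER": "https://flightdeck.example.com",
    "FLIGHTDECK_TOKEN": "ftd_example",  # placeholder, not a real token
}))
```

Because the values are read at every SessionStart, changing the exports and relaunching claude is enough to switch stacks.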

To use a local checkout instead of the marketplace: claude --plugin-dir /path/to/flightdeck/plugin.

Production agents

Install the sensor and point your agent at it:

pip install flightdeck-sensor

import flightdeck_sensor

flightdeck_sensor.init(
    server="http://localhost:4000/ingest",
    token="tok_dev",
)
flightdeck_sensor.patch()

# Your existing agent code. Nothing changes.
import anthropic
client = anthropic.Anthropic()
client.messages.create(model="claude-sonnet-4-6", ...)

The agent shows up in the fleet view within seconds.
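The "no other code changes" property comes from patching SDK methods in place. Below is a minimal, generic monkey-patching sketch of that idea, assuming a wrapper that times each call and emits one event. It is not flightdeck_sensor's implementation: instrument, FakeMessages, the in-memory events list, and the llm_call event name are all hypothetical, and no real SDK or network call is involved.

```python
import functools
import time

events = []  # hypothetical in-memory sink; a real sensor would ship events to a server


def instrument(cls, method_name, emit):
    """Replace cls.method_name with a wrapper that emits one event per call.

    Generic sketch, not the sensor's code. Returns the original function so
    the patch can be reverted.
    """
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        start = time.perf_counter()
        try:
            result = original(self, *args, **kwargs)
        except Exception as exc:
            # Record the failure, then re-raise so caller behavior is unchanged.
            emit({"type": "llm_error", "method": method_name,
                  "error": type(exc).__name__})
            raise
        emit({
            "type": "llm_call",  # hypothetical event name
            "method": method_name,
            "model": kwargs.get("model"),  # only captured when passed by keyword
            "latency_ms": (time.perf_counter() - start) * 1000,
        })
        return result

    setattr(cls, method_name, wrapper)
    return original


class FakeMessages:
    """Stand-in for an SDK resource class; makes no network calls."""

    def create(self, model, **kwargs):
        return {"model": model, "content": "hello"}


messages = FakeMessages()  # created BEFORE patching
instrument(FakeMessages, "create", events.append)
messages.create(model="claude-sonnet-4-6", max_tokens=16)
print(events[0]["type"], events[0]["model"])  # llm_call claude-sonnet-4-6
```

Python resolves methods through the class, so patching the class also affects the instance created before the patch. That general property is what lets an init-then-patch design leave agent code untouched. Whether the sensor patches at exactly this level is not stated here.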

To run the sensor from source instead of PyPI: pip install -e sensor/ from the repo root.

Playground

Working examples for every supported framework live in playground/. Each script costs a few cents per run and exercises the sensor against real LLM APIs.

make playground-anthropic    # Anthropic direct
make playground-openai       # OpenAI direct
make playground-langchain    # LangChain + ChatAnthropic / ChatOpenAI
make playground-langgraph    # LangGraph agent loops
make playground-llamaindex   # LlamaIndex
make playground-crewai       # CrewAI multi-agent
make playground-mcp          # MCP tool calls
make playground-policies     # token policy enforcement

make playground-all          # everything (~$0.50/run)

Each script self-skips when its API keys aren't set, so make playground-all runs cleanly on any machine and exercises only the providers you have credentials for. The flavor field on each session names the playground script that produced it, so you can find those sessions on the dashboard. See playground/README.md for the full matrix.
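A self-skip guard of the kind described above might look like this minimal sketch. The helper names are hypothetical, and the key list is an assumption: ANTHROPIC_API_KEY is the variable the Anthropic Python SDK reads by default, but the playground scripts' actual key names and skip mechanism may differ.

```python
import os
import sys

# Assumption: this script needs only the Anthropic SDK's default key variable.
REQUIRED_KEYS = ("ANTHROPIC_API_KEY",)


def missing_keys(required, env=None):
    """Return the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


def main() -> None:
    missing = missing_keys(REQUIRED_KEYS)
    if missing:
        # Skip with a note instead of failing, so a batch run keeps going
        # and the process still exits with status 0.
        print(f"SKIP: {', '.join(missing)} not set", file=sys.stderr)
        return
    print("keys present; the example would call the real API here")


if __name__ == "__main__":
    main()
```

Returning normally, rather than raising, is what keeps an aggregate target from stopping at the first provider without credentials.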

Coverage

LLM SDKs

| Provider | Chat | Embeddings | Streaming | Errors |
|---|---|---|---|---|
| Anthropic | `messages.create`, `messages.stream`, `beta.messages.*` (sync + async) | route via litellm to Voyage | sync + async | 14-entry `llm_error` taxonomy |
| OpenAI | `chat.completions.create`, `responses.create` (sync + async) | `embeddings.create` (sync + async) | sync + async | same |
| litellm | `litellm.completion`, `litellm.acompletion` (chat path only) | `litellm.embedding`, `litellm.aembedding` | sync only | same |

Streaming events expose payload.streaming = {ttft_ms, chunk_count, inter_chunk_ms, final_outcome, abort_reason}. Mid-stream aborts emit llm_error{error_type="stream_error"} with partial-chunk and partial-token data.
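The streaming fields can be derived from chunk arrival times. The sketch below shows one way to compute them, assuming timestamps from a monotonic clock. Only the field names come from the payload above; the StreamStats class, its methods, the "completed"/"aborted" outcome values, the "connection_reset" reason, and keeping inter_chunk_ms as a raw list of gaps (rather than a summary statistic) are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StreamStats:
    """Hypothetical accumulator for the payload.streaming fields.

    Timestamps are seconds from any monotonic clock (e.g. time.perf_counter()).
    """

    start: float  # when the request was sent
    chunk_times: List[float] = field(default_factory=list)

    def on_chunk(self, t: float) -> None:
        self.chunk_times.append(t)

    def finish(self, final_outcome: str = "completed",
               abort_reason: Optional[str] = None) -> dict:
        times = self.chunk_times
        # Time to first token: request start to first chunk (None if none arrived).
        ttft_ms = (times[0] - self.start) * 1000 if times else None
        # Gap between each pair of consecutive chunks, kept as a raw list (assumption).
        gaps = [(b - a) * 1000 for a, b in zip(times, times[1:])]
        return {
            "ttft_ms": ttft_ms,
            "chunk_count": len(times),
            "inter_chunk_ms": gaps,
            "final_outcome": final_outcome,
            "abort_reason": abort_reason,
        }


# Deterministic timestamps instead of a live stream.
stats = StreamStats(start=0.0)
for t in (0.25, 0.30, 0.40):
    stats.on_chunk(t)
payload = stats.finish()

# A mid-stream abort still reports the partial chunk data it collected.
aborted = StreamStats(start=0.0)
aborted.on_chunk(0.1)
partial = aborted.finish("aborted", "connection_reset")  # hypothetical values
```

Finalizing through one method for both outcomes means an abort yields the same field set as a clean finish, just with fewer chunks.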
