Flightdeck is a self-hosted observability and control plane for production and coding agents.
Every LLM call, MCP event, and tool call your agents make streams to the dashboard as it happens, surfaced as a per-agent timeline and as a live fleet-wide feed.
Set token budgets, MCP allow/block rules, and live directives on your production agents.
Coding agents attach via the Claude Code plugin in this repo.
Production agents add the flightdeck-sensor Python package to their entrypoint: init() + patch(), no other code changes.



Quickstart
Prerequisites: Docker Engine 28+ with Compose v2. Python 3.10+ for the sensor path; Claude Code for the plugin path.
Start the stack:
git clone https://github.com/flightdeckhq/flightdeck
cd flightdeck
make dev
Dashboard at http://localhost:4000. The dev stack seeds a test token tok_dev automatically.
Coding agents (Claude Code)
Launch Claude Code, then install the plugin from this repo's marketplace inside the REPL:
/plugin marketplace add flightdeckhq/flightdeck
/plugin install flightdeck@flightdeck-plugins
That's it for a local stack. The plugin defaults to http://localhost:4000 with the dev token tok_dev, so the Claude Code session shows up in the fleet view within seconds. Tool inputs and LLM call content are captured by default, so the Prompts tab is populated without extra setup. The Python sensor, by contrast, keeps capture_prompts=False until you opt in.
To point the plugin at a different stack (production, a remote dev server, etc.), export the env vars in the shell before launching claude. The plugin reads them at every SessionStart:
export FLIGHTDECK_SERVER="https://flightdeck.example.com"
export FLIGHTDECK_TOKEN="ftd_..."
claude
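The lookup order described above (environment variable first, local dev default otherwise) can be sketched in Python. This is only an illustration: `resolve_plugin_config` is a hypothetical helper, not part of the plugin, and treating an empty variable as unset is an assumption. The plugin's actual implementation may differ.

```python
import os

# Defaults taken from the quickstart: the local dev stack and its seeded token.
DEFAULT_SERVER = "http://localhost:4000"
DEFAULT_TOKEN = "tok_dev"


def resolve_plugin_config(env=None):
    """Hypothetical helper: pick server/token from the environment.

    Falls back to the local dev defaults when a variable is missing.
    Assumption: an empty string counts as "not set".
    """
    env = os.environ if env is None else env
    return {
        "server": env.get("FLIGHTDECK_SERVER") or DEFAULT_SERVER,
        "token": env.get("FLIGHTDECK_TOKEN") or DEFAULT_TOKEN,
    }
```

Calling `resolve_plugin_config()` with neither variable exported returns the local defaults; exporting both overrides them.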
To use a local checkout instead of the marketplace: claude --plugin-dir /path/to/flightdeck/plugin.
Production agents
Install the sensor and point your agent at it:
pip install flightdeck-sensor
import flightdeck_sensor
flightdeck_sensor.init(
    server="http://localhost:4000/ingest",
    token="tok_dev",
)
flightdeck_sensor.patch()
# Your existing agent code. Nothing changes.
import anthropic
client = anthropic.Anthropic()
client.messages.create(model="claude-sonnet-4-6", ...)
The agent shows up in the fleet view within seconds.
To run the sensor from source instead of PyPI: pip install -e sensor/ from the repo root.
Playground
Working examples for every supported framework live in playground/. Each script costs a few cents per run and exercises the sensor against real LLM APIs.
make playground-anthropic # Anthropic direct
make playground-openai # OpenAI direct
make playground-langchain # LangChain + ChatAnthropic / ChatOpenAI
make playground-langgraph # LangGraph agent loops
make playground-llamaindex # LlamaIndex
make playground-crewai # CrewAI multi-agent
make playground-mcp # MCP tool calls
make playground-policies # token policy enforcement
make playground-all # everything (~$0.50/run)
Each script self-skips when its API keys aren't set, so make playground-all runs cleanly on any box and only exercises what you have credentials for. The flavor field on each session names the playground script that produced it, so you can find them on the dashboard. See playground/README.md for the full matrix.
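The self-skip behaviour can be pictured with a small sketch. It assumes the check is a plain environment-variable lookup; `missing_keys` and `run_or_skip` are hypothetical names, and the real playground scripts may do this differently.

```python
import os


def missing_keys(required, env=None):
    """Return the names in `required` that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


def run_or_skip(required, body, env=None):
    """Hypothetical wrapper: run `body` only when every key is present.

    Returns 0 in both cases, so a skipped script does not fail the
    surrounding `make` target.
    """
    missing = missing_keys(required, env)
    if missing:
        print(f"skipping: {', '.join(missing)} not set")
        return 0
    body()
    return 0
```

A script for the Anthropic path might call `run_or_skip(["ANTHROPIC_API_KEY"], main)`, where `main` holds the actual LLM calls.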
Coverage
LLM SDKs
| Provider | Chat | Embeddings | Streaming | Errors |
|---|---|---|---|---|
| Anthropic | messages.create, messages.stream, beta.messages.* (sync + async) | route via litellm to Voyage | sync + async | 14-entry llm_error taxonomy |
| OpenAI | chat.completions.create, responses.create (sync + async) | embeddings.create (sync + async) | sync + async | same |
| litellm | litellm.completion, litellm.acompletion (chat path only) | litellm.embedding, litellm.aembedding | sync only | same |
Streaming events expose payload.streaming = {ttft_ms, chunk_count, inter_chunk_ms, final_outcome, abort_reason}. Mid-stream aborts emit llm_error{error_type="stream_error"} with partial-chunk and partial-token data.
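To make the streaming fields concrete, here is a minimal sketch of how such metrics could be collected around any chunk iterator. It is not the sensor's implementation: `timed_stream` is a hypothetical helper, reporting `inter_chunk_ms` as a list of gaps is an assumption (the sensor may report a summary instead), and the `final_outcome` and `abort_reason` values used here are illustrative.

```python
import time


def timed_stream(chunks, on_done, clock=time.monotonic):
    """Yield chunks unchanged and report timing metrics when the stream ends.

    `on_done` receives a dict shaped like payload.streaming.
    """
    start = clock()
    last = start
    ttft_ms = None          # time to first chunk, in milliseconds
    gaps = []               # gaps between consecutive chunks, in milliseconds
    count = 0
    outcome, reason = "completed", None   # assumed value names
    try:
        for chunk in chunks:
            now = clock()
            if ttft_ms is None:
                ttft_ms = (now - start) * 1000.0
            else:
                gaps.append((now - last) * 1000.0)
            last = now
            count += 1
            yield chunk
    except GeneratorExit:
        # The consumer stopped reading before the stream finished.
        outcome, reason = "aborted", "consumer_closed"
        raise
    except Exception as exc:
        # The underlying stream failed mid-way.
        outcome, reason = "aborted", type(exc).__name__
        raise
    finally:
        on_done({
            "ttft_ms": ttft_ms,
            "chunk_count": count,
            "inter_chunk_ms": gaps,
            "final_outcome": outcome,
            "abort_reason": reason,
        })
```

Wrapping a stream as `for chunk in timed_stream(stream, print): ...` prints one metrics dict once the stream completes, fails, or is abandoned.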
Frameworks