Expert guidance for instrumenting AI agents in production. Covers LLM call tracing, multi-agent coordination, tool execution, token/cost tracking, and evaluation quality for frameworks like LangChain, Claude Agent SDK, and custom agent loops.
Explores agent codebases to understand architecture, detect existing telemetry, and identify instrumentation opportunities
Reviews code changes for observability quality: anti-patterns, missing context, and naming conventions
Trace agent decision-making, tool selection, and reasoning chains
Instrument error handling, retries, fallbacks, and failure patterns
Instrument evaluation metrics, quality scores, and feedback loops
Instrument safety checks, content filters, and guardrails for agent outputs
Instrument human approval workflows, feedback loops, and escalations
Expert guidance for instrumenting AI agents in production. This Claude Code plugin provides best practices, hooks against anti-patterns, and ready-to-use templates for observing multi-agent systems and workflows.
/instrument for planning and /audit for assessment
# From Claude Code
/plugin install agent-observability@caleb-davis-plugins
# Development mode
claude --plugin-dir /path/to/agent-observability
/audit
Scans your codebase for:
/instrument
Creates a tiered implementation plan:
/instrument langgraph --vendor=langfuse
/instrument langchain --vendor=langsmith
/instrument crewai
| Skill | Priority | Triggers |
|---|---|---|
| instrumentation-planning | P1 | "what should I measure", "observability strategy" |
| llm-call-tracing | P1 | "trace LLM calls", "token tracking" |
| tool-call-tracking | P1 | "instrument tools", "tool execution spans" |
| multi-agent-coordination | P1 | "multi-agent tracing", "agent handoffs" |
| token-cost-tracking | P1 | "track tokens", "cost monitoring" |
| prompt-versioning | P1 | "prompt A/B testing", "prompt versions", "compare prompts" |
| guardrails-safety | P1 | "guardrails", "safety checks", "PII detection" |
| decision-tracing | P1 | "agent decisions", "tool selection", "why did agent" |
| production-eval-strategy | P1 | "production evaluation", "sampling strategy", "regression detection" |
| memory-rag-instrumentation | P2 | "RAG tracing", "retrieval instrumentation" |
| human-in-the-loop | P2 | "human approval tracking", "feedback loops" |
| error-retry-tracking | P2 | "error tracking", "retry instrumentation" |
| evaluation-quality | P2 | "agent evaluation", "quality metrics" |
| session-conversation-tracking | P2 | "session tracking", "multi-turn tracing" |
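To make the llm-call-tracing, tool-call-tracking, and token-cost-tracking skills concrete, here is a minimal standard-library sketch of the kind of data such instrumentation might capture. It is not the plugin's implementation: the `span` helper, the `record_usage` function, the attribute names, and the prices in `PRICE_PER_1K` are all hypothetical and chosen only for illustration. A real setup would typically hand this data to a tracing backend instead of an in-memory list.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical per-1K-token prices in USD; real prices depend on provider and model.
PRICE_PER_1K = {"example-model": {"input": 0.003, "output": 0.015}}

# In-memory sink standing in for a real tracing backend (assumption).
RECORDED = []


@dataclass
class Span:
    name: str
    attributes: dict = field(default_factory=dict)
    duration_s: float = 0.0
    error: Optional[str] = None


@contextmanager
def span(name, **attributes):
    """Record the duration, attributes, and any error for one unit of work."""
    s = Span(name=name, attributes=dict(attributes))
    start = time.perf_counter()
    try:
        yield s
    except Exception as exc:
        # Keep the exception type on the span, then let the error propagate.
        s.error = type(exc).__name__
        raise
    finally:
        s.duration_s = time.perf_counter() - start
        RECORDED.append(s)


def record_usage(s, model, input_tokens, output_tokens):
    """Attach token counts and an estimated cost to a span; return the cost."""
    price = PRICE_PER_1K[model]
    cost = (input_tokens * price["input"] + output_tokens * price["output"]) / 1000
    s.attributes.update(
        model=model,
        input_tokens=input_tokens,
        output_tokens=output_tokens,
        cost_usd=cost,
    )
    return cost


# Usage: wrap a (simulated) LLM call and record its token usage.
with span("llm.call", agent="planner") as llm_span:
    record_usage(llm_span, "example-model", input_tokens=1200, output_tokens=300)
```

Keeping the agent name, model, token counts, and cost on the same span means per-agent or per-model cost can later be aggregated from the recorded spans alone.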
The plugin's hooks warn you about:
| Framework | Detection | Guide |
|---|---|---|
| LangChain | from langchain | references/frameworks/langchain.md |
| LangGraph | from langgraph | references/frameworks/langgraph.md |
| Claude Agent SDK | from claude_agent_sdk | references/frameworks/claude-agent-sdk.md |
| OpenAI Agents | from openai | references/frameworks/openai-agents-sdk.md |
| CrewAI | from crewai | references/frameworks/crewai.md |
| AutoGen | from autogen | references/frameworks/autogen.md |
| Semantic Kernel | semantic_kernel | references/frameworks/semantic-kernel.md |
| Haystack | from haystack | references/frameworks/haystack.md |
| Pydantic AI | from pydantic_ai | references/frameworks/pydantic-ai.md |
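The table above can be read as a lookup from an import pattern to a guide path. The sketch below illustrates that reading with a plain substring match per line. How the plugin actually performs detection is not stated here, so the `detect_frameworks` function and the matching strategy are assumptions; only the patterns and guide paths are taken from the table.

```python
# Framework -> (detection pattern, guide path), copied from the table above.
FRAMEWORK_GUIDES = {
    "LangChain": ("from langchain", "references/frameworks/langchain.md"),
    "LangGraph": ("from langgraph", "references/frameworks/langgraph.md"),
    "Claude Agent SDK": ("from claude_agent_sdk", "references/frameworks/claude-agent-sdk.md"),
    "OpenAI Agents": ("from openai", "references/frameworks/openai-agents-sdk.md"),
    "CrewAI": ("from crewai", "references/frameworks/crewai.md"),
    "AutoGen": ("from autogen", "references/frameworks/autogen.md"),
    "Semantic Kernel": ("semantic_kernel", "references/frameworks/semantic-kernel.md"),
    "Haystack": ("from haystack", "references/frameworks/haystack.md"),
    "Pydantic AI": ("from pydantic_ai", "references/frameworks/pydantic-ai.md"),
}


def detect_frameworks(source):
    """Return {framework: guide path} for each pattern found in the source text.

    Hypothetical helper: a naive substring match on each line, so a pattern
    inside a comment or string literal also counts as a hit.
    """
    found = {}
    for line in source.splitlines():
        for framework, (pattern, guide) in FRAMEWORK_GUIDES.items():
            if pattern in line:
                found[framework] = guide
    return found


# Usage: scan a small snippet of agent code.
sample = "from langgraph.graph import StateGraph\nimport semantic_kernel as sk\nimport os\n"
guides = detect_frameworks(sample)
```

A stricter version could parse imports with the `ast` module instead of matching substrings, at the cost of only working on syntactically valid Python files.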
npx claudepluginhub nexus-labs-automation/agent-observability