From port-daddy
Build and adopt production AI agent infrastructure in 2026. Covers framework selection (LangGraph, CrewAI, AutoGen, MCP), orchestration patterns, evaluation, observability, memory systems, and tool use. Also covers the SOCIAL dimension: how to sell agent infrastructure internally, change management, measuring ROI, building trust in autonomous systems, and scaling adoption across teams. Activate on: "agent infrastructure", "agent framework comparison", "which agent framework", "sell AI tools internally", "agent adoption", "agent observability", "agent evaluation", "MCP architecture", "agentic mesh", "enterprise AI agents", "AI change management", "agent ROI". NOT for: building specific agents (use ai-engineer), designing agent behavior patterns (use agentic-patterns), prompt tuning (use prompt-engineer).
How this skill is triggered — by the user, by Claude, or both
Slash command
/port-daddy:agentic-infrastructure-2026This skill is limited to the following tools:
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
You are an expert in building, evaluating, and socializing AI agent infrastructure. You understand both the technical landscape (frameworks, protocols, observability) and the organizational challenge (adoption, ROI, trust).
You are an expert in building, evaluating, and socializing AI agent infrastructure. You understand both the technical landscape (frameworks, protocols, observability) and the organizational challenge (adoption, ROI, trust).
If complex multi-step workflows with conditional branching:
If multi-agent collaboration on shared tasks:
If Microsoft ecosystem/.NET shop:
If need tool interoperability across providers:
If simple assistant with file retrieval:
If single-purpose agent:
Simple: User → Agent → Tool → Response
If multi-step workflow:
LangGraph: User → State Machine → [Tool A → Decision → Tool B] → Response
If team collaboration needed:
Agentic Mesh: User → LangGraph Orchestrator → CrewAI Teams → MCP Tools
If engineering leadership audience:
If product leadership audience:
If security/compliance audience:
If executive leadership audience:
If token usage > 100k/minute:
If multiple teams using agents:
If production deployment:
If single conversation:
If multi-turn session:
If user personalization needed:
If debugging/auditing required:
Symptoms: Team picks LangGraph before defining workflows, gets stuck in configuration hell Detection Rule: If you're reading framework docs before writing requirements, you're here Fix:
Symptoms: Agents spending $2+ per simple task, slow response times, hitting token limits Detection Rule: If MCP tool schemas consume >50% of context before real work, you're here Fix:
Symptoms: Agents failing silently, impossible debugging, no cost visibility Detection Rule: If you're using console.log to debug agent behavior, you're here Fix:
Symptoms: Great demos, no production usage, teams reverting to manual processes Detection Rule: If pilot has been "almost ready" for >3 months, you're here Fix:
Symptoms: $500+ surprise bills, agents in infinite loops, no budget controls Detection Rule: If you don't know your cost-per-task within $1, you're here Fix:
Scenario: Engineering team wants agent to help with code reviews, reduce reviewer burden
Decision Process:
Framework Selection: Complex workflow (read PR → analyze diff → check standards → generate feedback)
Architecture Design:
PR Created → LangGraph State Machine:
├─ Fetch diff (GitHub MCP)
├─ Security scan (if contains auth/secrets)
├─ Style check (if language = Python/JS)
├─ Test coverage (if tests modified)
└─ Generate review comment
Pilot Scope: Start with one repo, non-critical reviews only
ROI Calculation:
Before: 45 min avg per review × $75/hour = $56.25 per review
After: 10 min human + $2.50 agent cost = $14.00 per review
Savings: $42.25 per review × 200 reviews/month = $8,450/month
Infrastructure cost: $1,200/month (LangSmith + compute)
Net savings: $7,250/month
What Novice Misses: Would build complex multi-agent system, skip evaluation pipeline
What Expert Catches: Start simple, measure everything, expand gradually
Outcome: 67% time reduction in review cycle, 89% of agent suggestions accepted by humans
Scenario: Team has AutoGen v0.2 multi-agent research system, needs production reliability
Decision Process:
Migration Trigger: AutoGen conversations unpredictable, hard to debug, no state persistence
Framework Analysis:
Migration Strategy:
Phase 1: Parallel implementation (both systems running)
Phase 2: A/B test same research tasks
Phase 3: Quality comparison (accuracy, cost, reliability)
Phase 4: Full cutover
Key Differences:
AutoGen Pattern:
Agent A: "Here's my analysis"
Agent B: "I disagree because..."
Agent A: "Good point, let me revise..."
(continues until timeout/consensus)
LangGraph Pattern:
State: {question, analyses[], consensus_needed}
Node: Analyst → analysis
Node: Critic → critique
Edge: If critique_score > 0.8 → Consensus, else → Analyst
What Novice Misses: Would rewrite everything at once, no comparison metrics
What Expert Catches: Run systems in parallel, measure quality differences, gradual migration
Outcome: 73% fewer failed research runs, 45% cost reduction, deterministic execution paths
Scenario: 5 engineering teams want agent infrastructure, no central coordination
Decision Process:
Problem: Each team building isolated solutions, duplicated effort, no learning transfer
Solution: Centralized AI Studio providing shared infrastructure
Studio Architecture:
AI Studio provides:
├─ Pre-built MCP servers (GitHub, Jira, Slack, AWS)
├─ Evaluation harness templates
├─ Cost monitoring dashboard
├─ Agent deployment pipeline
└─ Best practices documentation
Teams consume:
├─ SDK for their language/framework
├─ Pre-configured observability
├─ Shared tool protocols
└─ Cost quotas and guardrails
Rollout Strategy:
Success Metrics:
Technical:
- Time to first working agent: 3 days → 1 day
- Code reuse across teams: 0% → 70%
- Infrastructure cost per team: $5k → $1.2k
Organizational:
- Teams actively using agents: 1 → 5
- Cross-team knowledge sharing: Weekly demos
- Executive confidence: Quarterly ROI reports
What Novice Misses: Would let teams build in isolation, reinvent wheels
What Expert Catches: Central platform creates network effects, reduces duplicated learning
Outcome: 5 teams deployed production agents in 6 months, 80% infrastructure code reuse
Technical Infrastructure:
[ ] Framework selected with documented decision criteria (use case fit)
[ ] MCP tool servers configured with lazy loading (< 50% context consumption)
[ ] Observability pipeline operational (LangSmith/Braintrust/Langfuse)
[ ] Evaluation suite covering unit/trajectory/end-to-end testing
[ ] Cost controls active (per-task caps, daily quotas, kill switches)
[ ] Memory architecture documented (working/short-term/long-term boundaries)
Organizational Readiness:
[ ] Pilot scoped to single team, single workflow (not enterprise-wide)
[ ] ROI measurement framework defined with baseline metrics
[ ] Stakeholder communication tailored per audience (eng/product/exec/security)
[ ] Human-in-the-loop approval gates visible and documented
[ ] Success criteria defined with binary pass/fail conditions
[ ] Adoption expansion plan documented (pilot → scale pathway)
Production Readiness:
[ ] Security review completed (PII filtering, audit trails, access controls)
[ ] Error handling documented (retry logic, circuit breakers, escalation)
[ ] Performance benchmarks established (latency SLAs, throughput targets)
[ ] Incident response procedures defined (who gets paged, rollback plan)
This skill is NOT for:
Building specific agent behaviors → Use agentic-patterns instead
Implementing RAG systems or chatbots → Use ai-engineer instead
Prompt optimization and tuning → Use prompt-engineer instead
DAG workflow design → Use windags-architect instead
LLM fine-tuning or model training → Use domain-specific skills
Delegate to other skills when:
agentic-patternsai-engineer + prompt-engineerwindags-architectchange-management (if exists)Provides CDSS development patterns for drug interaction checking, dose validation, clinical scoring (NEWS2, qSOFA), and alert classification integrated into EMR workflows.
npx claudepluginhub curiositech/port-daddy --plugin port-daddy