From agentic-qe-fleet
Displays AQE inference cost analysis comparing local (ruvllm) and cloud (anthropic, openai, openrouter) providers with savings estimates. Supports --period, --provider, --format, --detailed, --reset options.
How this command is triggered — by the user, by Claude, or both
Slash command
/agentic-qe-fleet:aqe-costsThe summary Claude sees in its command listing — used to decide when to auto-load this command
--- name: aqe-costs description: Display inference cost analysis and savings from local vs cloud providers --- # AQE Inference Costs Display comprehensive inference cost analysis showing local vs cloud inference costs and estimated savings. ## Usage ## Options | Option | Type | Default | Description | |--------|------|---------|-------------| | `--period` | string | `24h` | Time period: 1h, 24h, 7d, 30d, all | | `--provider` | string | - | Filter by provider: ruvllm, anthropic, openrouter, openai | | `--format` | string | `text` | Output format: text, json | | `--...
Display comprehensive inference cost analysis showing local vs cloud inference costs and estimated savings.
aqe costs [options]
# or
/aqe-costs [options]
| Option | Type | Default | Description |
|---|---|---|---|
--period | string | 24h | Time period: 1h, 24h, 7d, 30d, all |
--provider | string | - | Filter by provider: ruvllm, anthropic, openrouter, openai |
--format | string | text | Output format: text, json |
--detailed | boolean | false | Show detailed per-request breakdown |
--reset | boolean | false | Reset cost tracking data |
aqe costs
Displays cost summary for the last 24 hours with savings analysis.
aqe costs --period 7d
Shows cost trends and savings over the past 7 days.
aqe costs --provider ruvllm
Displays costs for local ruvllm inference only.
aqe costs --detailed
Shows per-request cost breakdown with agent and task attribution.
aqe costs --format json > costs.json
Exports cost data in JSON format for integration with monitoring dashboards.
aqe costs --reset
Clears all tracked cost data (useful for testing or new billing periods).
// Use Claude Code's Task tool for cost monitoring
Task("Monitor inference costs", `
Analyze AQE inference costs and provide recommendations:
- Check cost trends over the past 24 hours
- Identify high-cost agents or tasks
- Calculate savings from local inference
- Recommend optimizations to reduce cloud costs
Store findings in memory: aqe/costs/analysis/{timestamp}
`, "qe-quality-gate")
// Daily cost report generation
[Single Message]:
Task("Generate cost report", "Create daily inference cost summary", "qe-quality-gate")
Task("Analyze cost trends", "Identify cost optimization opportunities", "qe-quality-gate")
TodoWrite({ todos: [
{content: "Fetch cost data from tracker", status: "in_progress", activeForm: "Fetching data"},
{content: "Calculate savings metrics", status: "in_progress", activeForm: "Calculating savings"},
{content: "Generate recommendations", status: "pending", activeForm: "Generating recommendations"},
{content: "Store report in memory", status: "pending", activeForm: "Storing report"}
]})
Inference Cost Report
====================
Period: 2025-12-15T00:00:00Z to 2025-12-15T23:59:59Z
Overall Metrics:
Total Requests: 1,248
Total Tokens: 3,456,789
Total Cost: $5.2340
Requests/Hour: 52.0
Cost/Hour: $0.2181
Cost Savings Analysis:
Actual Cost: $5.2340
Cloud Baseline Cost: $18.7650
Total Savings: $13.5310 (72.1%)
Local Requests: 892 (71.5%)
Cloud Requests: 356 (28.5%)
By Provider:
🏠 ruvllm:
Requests: 892
Tokens: 2,234,567
Cost: $0.0000
Avg Cost/Request: $0.000000
Top Model: meta-llama/llama-3.1-8b-instruct
☁️ anthropic:
Requests: 245
Tokens: 891,234
Cost: $4.5678
Avg Cost/Request: $0.018644
Top Model: claude-sonnet-4-6
☁️ openrouter:
Requests: 111
Tokens: 330,988
Cost: $0.6662
Avg Cost/Request: $0.006002
Top Model: meta-llama/llama-3.1-70b-instruct
{
"timestamp": "2025-12-15T23:59:59Z",
"period": {
"start": "2025-12-15T00:00:00Z",
"end": "2025-12-15T23:59:59Z"
},
"overall": {
"totalRequests": 1248,
"totalTokens": 3456789,
"totalCost": 5.234,
"requestsPerHour": 52.0,
"costPerHour": 0.2181
},
"savings": {
"actualCost": 5.234,
"cloudBaselineCost": 18.765,
"totalSavings": 13.531,
"savingsPercentage": 72.1,
"localRequestPercentage": 71.5,
"cloudRequestPercentage": 28.5,
"localRequests": 892,
"cloudRequests": 356,
"totalRequests": 1248
},
"byProvider": {
"ruvllm": {
"provider": "ruvllm",
"providerType": "local",
"requestCount": 892,
"inputTokens": 1489711,
"outputTokens": 744856,
"totalTokens": 2234567,
"totalCost": 0,
"avgCostPerRequest": 0,
"topModel": "meta-llama/llama-3.1-8b-instruct",
"modelCounts": {
"meta-llama/llama-3.1-8b-instruct": 892
}
},
"anthropic": {
"provider": "anthropic",
"providerType": "cloud",
"requestCount": 245,
"inputTokens": 594156,
"outputTokens": 297078,
"totalTokens": 891234,
"totalCost": 4.5678,
"avgCostPerRequest": 0.018644,
"topModel": "claude-sonnet-4-6",
"modelCounts": {
"claude-sonnet-4-6": 187,
"claude-haiku-4-5-20251001": 58
}
},
"openrouter": {
"provider": "openrouter",
"providerType": "cloud",
"requestCount": 111,
"inputTokens": 220659,
"outputTokens": 110329,
"totalTokens": 330988,
"totalCost": 0.6662,
"avgCostPerRequest": 0.006002,
"topModel": "meta-llama/llama-3.1-70b-instruct",
"modelCounts": {
"meta-llama/llama-3.1-70b-instruct": 111
}
}
}
}
Inference Cost Report (Detailed)
================================
Period: 2025-12-15T00:00:00Z to 2025-12-15T23:59:59Z
Recent Requests (Last 20):
[2025-12-15T23:58:45Z] ruvllm/meta-llama/llama-3.1-8b-instruct
Agent: qe-test-generator
Tokens: 1,234 input / 567 output = 1,801 total
Cost: $0.0000
[2025-12-15T23:57:23Z] anthropic/claude-sonnet-4-6
Agent: qe-quality-gate
Task: quality-check-456
Tokens: 3,456 input / 1,789 output = 5,245 total
Cost: $0.0372
[2025-12-15T23:56:12Z] ruvllm/meta-llama/llama-3.1-8b-instruct
Agent: qe-test-executor
Task: test-run-789
Tokens: 876 input / 432 output = 1,308 total
Cost: $0.0000
... (17 more)
Provider Summary:
🏠 Local (ruvllm, onnx): 892 requests (71.5%)
☁️ Cloud (anthropic, openrouter, openai): 356 requests (28.5%)
Cost Optimization Recommendations:
✓ Excellent local inference usage (71.5%)
✓ Saving $13.53 per day vs full cloud inference
💡 Consider migrating more quality-gate checks to local inference
💡 Estimated monthly savings: $405.93
# Retrieve stored cost data
npx claude-flow@alpha memory retrieve --key "aqe/costs/tracker-data"
# Retrieve previous cost reports
npx claude-flow@alpha memory retrieve --key "aqe/costs/reports/latest"
# Store cost report
npx claude-flow@alpha memory store \
--key "aqe/costs/reports/${timestamp}" \
--value '{"totalCost": 5.234, "savings": 13.531}'
# Store cost optimization recommendations
npx claude-flow@alpha memory store \
--key "aqe/costs/recommendations" \
--value '[{"action": "migrate-to-local", "potentialSavings": 13.53}]'
import { getInferenceCostTracker } from 'agentic-qe/core/metrics';
const tracker = getInferenceCostTracker();
// Track local inference (free)
tracker.trackRequest({
provider: 'ruvllm',
model: 'meta-llama/llama-3.1-8b-instruct',
tokens: {
inputTokens: 1000,
outputTokens: 500,
totalTokens: 1500,
},
agentId: 'qe-test-generator',
taskId: 'task-123',
});
// Track cloud inference
tracker.trackRequest({
provider: 'anthropic',
model: 'claude-sonnet-4-6',
tokens: {
inputTokens: 2000,
outputTokens: 1000,
totalTokens: 3000,
},
agentId: 'qe-quality-gate',
});
import { getInferenceCostTracker, formatCostReport } from 'agentic-qe/core/metrics';
const tracker = getInferenceCostTracker();
// Get report for last 24 hours
const report = tracker.getCostReport();
// Format as text
const textReport = formatCostReport(report);
console.log(textReport);
// Get savings
console.log(`Total savings: $${report.savings.totalSavings.toFixed(2)}`);
console.log(`Savings rate: ${report.savings.savingsPercentage.toFixed(1)}%`);
Route routine tasks to local inference:
Potential Savings: Up to 90% cost reduction
Reserve cloud inference for:
Balance: Quality vs Cost
Implement fallback strategy:
// Try local first, fallback to cloud if needed
async function generateTests(spec) {
try {
return await localInference(spec);
} catch (err) {
return await cloudInference(spec);
}
}
Result: Optimal cost-quality balance
Regular cost reviews:
# Weekly review
aqe costs --period 7d --detailed
# Identify high-cost agents
# Migrate eligible workloads to local
Target: >70% local inference ratio
Anthropic Claude Sonnet 4.5 (January 2025):
OpenRouter (99% savings vs Claude):
OpenAI GPT-4 Turbo:
aqe costs
# Quick check of daily costs and savings
aqe costs --period 30d --format json > monthly-costs.json
# Export for finance review
aqe costs --detailed
# Identify high-cost agents and tasks for optimization
aqe costs --period 1h --format json
# Track costs per CI/CD pipeline run
⚠️ No inference requests tracked in the specified period.
Use 'aqe costs --period all' to see all-time data.
Solution: Inference tracking may need to be enabled.
❌ Error: Invalid period '5y'
Valid periods: 1h, 24h, 7d, 30d, all
Solution: Use a supported time period.
⚠️ Warning: No requests found for provider 'unknown'
Available providers: ruvllm, anthropic, openrouter, openai, onnx
Solution: Check provider name spelling.
/aqe-fleet-status - View agent status with cost attribution/aqe-execute - Track execution costs/aqe-generate - Track generation costs/aqe-report - Include cost analysis in quality reports/aqe-fleet-status - Fleet health and status/aqe-report - Quality reports/aqe-benchmark - Performance benchmarkingnpx claudepluginhub proffesor-for-testing/agentic-qe --plugin agentic-qe-fleet/costAggregates cost and capacity health data: LLM router savings, run-rate, cost-per-deploy, and ROI. Also supports per-feature and per-agent cost analysis.
/costsBreaks down session costs by LLM provider and workflow, reading usage logs from ~/.claude-octopus/. Outputs ASCII tables with token counts, query counts, and estimated costs.
/costTracks and manages AI build costs per feature, task, and project. Shows spending breakdowns, model usage, cost trends, and optimization recommendations.
/cost-captureCaptures AI tool cost snapshot: guides through Anthropic/OpenAI dashboards, records spend/tokens, compares to previous, checks budget, updates MODEL_ROUTING.md, writes observability file.
/cost-trackerDisplays current session cost estimate, top cost drivers, optimization suggestions, and budget guidance for task types.