From builder-ai
Use before launching any LLM feature or when monthly API costs are growing unexpectedly. Requires token count measurement, call volume analysis, and cost projection at 10× scale. Blocks "it's cheap enough now" completions.
Slash command: `/builder-ai:ai-cost-audit`
```
EVERY LLM FEATURE HAS A COST TRAJECTORY. DISCOVER IT BEFORE 10× SCALE DISCOVERS YOU.
```
"It's cheap enough now" is a claim about current volume, not future volume.
"The API has reasonable pricing" is not a projection.
Token counts + call volume + cost at 10× scale IS a cost audit.
Trigger:
Do not estimate. Count:
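```python
import tiktoken

# cl100k_base is an OpenAI encoding (GPT-3.5/GPT-4 era models). Claude and newer
# models use different tokenizers, so treat these counts as an approximation
# there and prefer the provider's own token counts where available.
enc = tiktoken.get_encoding("cl100k_base")

def count_tokens(text: str) -> int:
    return len(enc.encode(text))

# Measure each segment separately (each variable is a string sampled from your own traffic)
print("System prompt:", count_tokens(system_prompt))
print("Avg context:", count_tokens(avg_context_sample))
print("Avg user message:", count_tokens(avg_user_message_sample))
print("Avg output:", count_tokens(avg_output_sample))
```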
Get real samples from logs or representative test data — not the "hello world" example.
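Averaging over many logged calls, instead of one hand-picked sample, can be sketched as below. This is a minimal illustration, not the skill's own tooling: `crude_count` is a stand-in word counter used only so the block runs without dependencies, and the sample records and their field names are hypothetical. Pass your real `count_tokens` function as `count` in practice.

```python
from statistics import mean
from typing import Callable, Dict, List

def crude_count(text: str) -> int:
    # Stand-in for illustration only: counts whitespace-separated words,
    # not real tokens. Replace with your model's tokenizer.
    return len(text.split())

def average_segment_tokens(
    samples: List[Dict[str, str]],
    count: Callable[[str], int] = crude_count,
) -> Dict[str, float]:
    """Average count per segment (system, user, output, ...) across sampled calls."""
    if not samples:
        raise ValueError("need at least one sampled call")
    segments = samples[0].keys()
    return {seg: mean(count(s[seg]) for s in samples) for seg in segments}

# Hypothetical sampled calls; in practice, pull these from request logs.
samples = [
    {"system": "You are a helpful assistant.",
     "user": "Summarise this report",
     "output": "Here is a summary"},
    {"system": "You are a helpful assistant.",
     "user": "Translate to French please now",
     "output": "Voici la traduction demandée"},
]

averages = average_segment_tokens(samples)
print(averages)
```

Keeping the tokenizer as a parameter means the same averaging code works whichever provider's counting function you plug in.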
Calls per user session: N
Sessions per day: M
Background/batch calls per day: K
Retry rate: R% (from logs or estimate)
Total calls per day: ((N × M) + K) × (1 + R/100)
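A minimal sketch of the volume arithmetic, assuming the retry rate applies to every call, interactive and background alike, and that each retry is billed as a full call. The function name and the example numbers are hypothetical.

```python
def total_calls_per_day(
    calls_per_session: float,
    sessions_per_day: float,
    background_calls: float,
    retry_rate_pct: float,
) -> float:
    """Billable calls per day, including retries."""
    base_calls = calls_per_session * sessions_per_day + background_calls
    return base_calls * (1 + retry_rate_pct / 100)

# Hypothetical volume: 4 calls/session, 2,000 sessions/day, 500 batch calls, 5% retries.
calls = total_calls_per_day(4, 2_000, 500, retry_rate_pct=5)
print(f"Calls per day: {calls:,.0f}")
```

If your logs show that retries only affect one path (for example, batch jobs), apply the multiplier to that term alone and say so in the audit.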
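```python
COST_PER_1K_INPUT = 0.003   # $/1k tokens; replace with actual model pricing
COST_PER_1K_OUTPUT = 0.015  # $/1k tokens; replace with actual model pricing

def cost_per_call(input_tokens, output_tokens):
    return (input_tokens / 1000 * COST_PER_1K_INPUT
            + output_tokens / 1000 * COST_PER_1K_OUTPUT)

# avg_input = system prompt + context + user message tokens; avg_output and
# calls_per_day come from the token and volume measurements above.
daily_cost = cost_per_call(avg_input, avg_output) * calls_per_day
monthly_cost = daily_cost * 30
```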
Scale is rarely gradual; launches cause spikes. Always project at 10×:
| Scale | Calls/day | Monthly cost | Verdict |
|---|---|---|---|
| Current (1×) | N | $X | Baseline |
| 10× | 10N | $10X | Must be under budget |
| 100× | 100N | $100X | Directional awareness |
If 10× monthly cost exceeds your budget threshold, optimise before scaling.
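The projection and budget check can be sketched as follows. The prices are the same placeholders used in this document, and the token counts, call volume, and budget are hypothetical; substitute your measured values.

```python
from typing import List, Tuple

PRICE_IN_PER_1K = 0.003   # $/1k input tokens (placeholder, not real pricing)
PRICE_OUT_PER_1K = 0.015  # $/1k output tokens (placeholder, not real pricing)

def monthly_cost(input_tokens: float, output_tokens: float,
                 calls_per_day: float, days: int = 30) -> float:
    per_call = (input_tokens / 1000 * PRICE_IN_PER_1K
                + output_tokens / 1000 * PRICE_OUT_PER_1K)
    return per_call * calls_per_day * days

def project(input_tokens: float, output_tokens: float, calls_per_day: int,
            budget: float, multipliers=(1, 10, 100)) -> List[Tuple[int, int, float, bool]]:
    """Return (multiplier, calls/day, monthly cost, within budget?) per scale."""
    rows = []
    for m in multipliers:
        scaled_calls = calls_per_day * m
        cost = monthly_cost(input_tokens, output_tokens, scaled_calls)
        rows.append((m, scaled_calls, cost, cost <= budget))
    return rows

# Hypothetical inputs: 4,200 input tokens, 400 output tokens, 10,000 calls/day, $20k budget.
rows = project(4200, 400, calls_per_day=10_000, budget=20_000)
for m, calls, cost, ok in rows:
    verdict = "within budget" if ok else "OVER BUDGET"
    print(f"{m:>4}x  {calls:>10,} calls/day  ${cost:>12,.0f}/month  {verdict}")
```

With these made-up numbers the feature fits the budget today and breaks it at 10×, which is the case the audit exists to catch.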
Rank by token contribution:
1. Context injection (RAG chunks): 3,200 tokens avg (70% of tokens per call)
2. System prompt: 800 tokens (17% of tokens per call)
3. User message: 200 tokens (4% of tokens per call)
4. Output: 400 tokens (9% of tokens per call)
Optimise the top driver first. The others compound on the same multiplier.
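Token share and cost share can differ when output tokens are priced higher than input tokens, so it can help to rank by cost as well. A minimal sketch, using the example token counts from the list and the placeholder prices from this document (not real pricing):

```python
PRICE_IN_PER_1K = 0.003   # $/1k input tokens (placeholder)
PRICE_OUT_PER_1K = 0.015  # $/1k output tokens (placeholder)

# segment name -> (average tokens, price per 1k tokens for that segment)
segments = {
    "context": (3200, PRICE_IN_PER_1K),
    "system": (800, PRICE_IN_PER_1K),
    "user": (200, PRICE_IN_PER_1K),
    "output": (400, PRICE_OUT_PER_1K),
}

def rank_by_cost(segments: dict) -> list:
    """Return (segment, share of per-call cost) pairs, largest share first."""
    costs = {name: tokens / 1000 * price for name, (tokens, price) in segments.items()}
    total = sum(costs.values())
    shares = [(name, cost / total) for name, cost in costs.items()]
    return sorted(shares, key=lambda pair: pair[1], reverse=True)

ranking = rank_by_cost(segments)
for name, share in ranking:
    print(f"{name:<8} {share:.1%}")
```

Under these assumed prices, output moves up to second place: it is under a tenth of the tokens but roughly a third of the per-call cost. Context injection is still the top driver.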
| Lever | Typical Saving | Apply When |
|---|---|---|
| Prompt caching | 50–90% on cached input | System prompt > 500 tokens and stable |
| Reduce top_k in RAG | 20–50% input token reduction | Context injection is top cost driver |
| Downgrade model tier | 3–10× cost reduction | Quality meets threshold at lower tier (see model-benchmarking) |
| Semantic caching | 30–70% call reduction | High repeat query rate |
| Pipeline splitting | 5–10× cost reduction | Frontier model doing pre-processing tasks |
| Output cap | 10–30% output cost | max_tokens not set or set too high |
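Before committing to a lever, a quick what-if calculation shows how much of the bill it can move. The sketch below models one lever, halving `top_k` in RAG, under the assumption that retrieved chunks are roughly equal in size so context tokens scale linearly with `top_k`. The helper names, token counts, and prices are illustrative placeholders.

```python
PRICE_IN_PER_1K = 0.003   # $/1k input tokens (placeholder)
PRICE_OUT_PER_1K = 0.015  # $/1k output tokens (placeholder)

def per_call_cost(tokens: dict) -> float:
    input_tokens = tokens["system"] + tokens["context"] + tokens["user"]
    return (input_tokens / 1000 * PRICE_IN_PER_1K
            + tokens["output"] / 1000 * PRICE_OUT_PER_1K)

def with_lever(tokens: dict, segment: str, factor: float) -> dict:
    """Return a copy with one segment scaled by factor (0.5 halves it)."""
    adjusted = dict(tokens)
    adjusted[segment] = tokens[segment] * factor
    return adjusted

baseline = {"system": 800, "context": 3200, "user": 200, "output": 400}

# Assumption: halving top_k halves the injected context tokens.
halved_top_k = with_lever(baseline, "context", 0.5)

saving = 1 - per_call_cost(halved_top_k) / per_call_cost(baseline)
print(f"Estimated saving per call: {saving:.1%}")
```

Here a 38% cut in input tokens yields roughly a quarter off the per-call cost, because output tokens are untouched. The same fraction applies to the monthly bill at any call volume, provided answer quality holds at the lower `top_k`.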
Store in cost-audit/<feature>/<date>.md:
## AI Cost Audit — <feature> — <date>
Token breakdown (averages from N samples):
System prompt: X tokens
Context: Y tokens
User message: Z tokens
Output: W tokens
Total per call: T tokens
Call volume: N calls/day (M sessions × P calls + K background)
Retry rate: R%
Cost at current volume: $X/month
Cost at 10× volume: $10X/month
Top driver: <context injection — Y tokens, Z%>
Reductions applied:
1. <lever> — estimated saving: X%
2. <lever> — estimated saving: Y%
Post-optimization projection: $X'/month at 10×
These thoughts mean no cost audit was done — stop:
When ai-cost-audit is satisfied, state it like this:
Cost audit complete.
Avg input: N tokens (system: A, context: B, user: C)
Avg output: M tokens
Calls/day: D (retry rate: R%)
Current cost: $X/month
Cost at 10× scale: $Y/month (budget threshold: $Z/month ✓/⚠️)
Top cost driver: <driver> (N% of input cost)
Reductions planned: <levers> — projected saving: X%
Post-reduction projection at 10×: $Y'/month
Token samples from: <N representative calls from logs / test data — not estimated>
Stored: cost-audit/<feature>/<date>.md ✓
LLM API costs are invisible until they aren't. A feature that costs $200/month at launch costs $20,000/month after a growth event. The audit takes two hours. The surprise does not.
npx claudepluginhub rbraga01/a-team --plugin builder-ai