Controls LLM API costs by routing models by task complexity, tracking budgets with immutable data structures, retrying only transient errors, and caching prompt prefixes.
Slash command: /everything-claude-code:cost-aware-llm-pipeline
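Patterns for controlling LLM API costs while maintaining quality. Combines model routing, budget tracking, retry logic, and prompt caching into a composable pipeline.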
Automatically select a cheaper model for simple tasks and reserve the expensive models for complex ones.

    MODEL_SONNET = "claude-sonnet-4-6"
    MODEL_HAIKU = "claude-haiku-4-5-20251001"

    _SONNET_TEXT_THRESHOLD = 10_000  # characters
    _SONNET_ITEM_THRESHOLD = 30  # number of items


    def select_model(
        text_length: int,
        item_count: int,
        force_model: str | None = None,
    ) -> str:
        """Select a model based on task complexity."""
        if force_model is not None:
            return force_model
        if text_length >= _SONNET_TEXT_THRESHOLD or item_count >= _SONNET_ITEM_THRESHOLD:
            return MODEL_SONNET  # complex task
        return MODEL_HAIKU  # simple task (3-4x cheaper)
Track cumulative spend with frozen dataclasses. Each API call returns a new tracker; existing state is never mutated.

    from dataclasses import dataclass


    @dataclass(frozen=True, slots=True)
    class CostRecord:
        model: str
        input_tokens: int
        output_tokens: int
        cost_usd: float


    @dataclass(frozen=True, slots=True)
    class CostTracker:
        budget_limit: float = 1.00
        records: tuple[CostRecord, ...] = ()

        def add(self, record: CostRecord) -> "CostTracker":
            """Return a new tracker with the record appended (never mutates self)."""
            return CostTracker(
                budget_limit=self.budget_limit,
                records=(*self.records, record),
            )

        @property
        def total_cost(self) -> float:
            return sum(r.cost_usd for r in self.records)

        @property
        def over_budget(self) -> bool:
            return self.total_cost > self.budget_limit
Retry only on transient errors. Fail fast on authentication or bad-request errors.

    import time

    from anthropic import (
        APIConnectionError,
        InternalServerError,
        RateLimitError,
    )

    _RETRYABLE_ERRORS = (APIConnectionError, RateLimitError, InternalServerError)
    _MAX_RETRIES = 3


    def call_with_retry(func, *, max_retries: int = _MAX_RETRIES):
        """Retry only on transient errors; fail fast on everything else."""
        for attempt in range(max_retries):
            try:
                return func()
            except _RETRYABLE_ERRORS:
                if attempt == max_retries - 1:
                    raise
                time.sleep(2 ** attempt)  # exponential backoff
        # AuthenticationError, BadRequestError, etc. → raised immediately
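
A fixed `2 ** attempt` delay can make many clients retry in lockstep. A common variation, shown here as a sketch rather than as the skill's own code, adds random jitter. It also takes the retryable exception types and the sleep function as parameters, so it can be exercised without network access. `TransientError`, `retry_with_jitter`, and the parameter names are hypothetical.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Stand-in for a retryable API error (hypothetical, for illustration)."""


def retry_with_jitter(
    func: Callable[[], T],
    *,
    retryable: tuple[type[BaseException], ...],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying only `retryable` errors with jittered exponential backoff."""
    if max_retries < 1:
        raise ValueError("max_retries must be at least 1")
    for attempt in range(max_retries):
        try:
            return func()
        except retryable:
            if attempt == max_retries - 1:
                raise  # out of attempts: surface the last error
            # Full jitter: wait a random time up to base_delay * 2**attempt.
            sleep(random.uniform(0, base_delay * 2 ** attempt))
    raise AssertionError("unreachable")


calls = {"count": 0}
delays: list[float] = []


def flaky() -> str:
    """Fails twice with a transient error, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("temporary failure")
    return "ok"


result = retry_with_jitter(flaky, retryable=(TransientError,), sleep=delays.append)
```

Passing `delays.append` as the sleep function records the chosen delays instead of waiting. Any exception type not listed in `retryable` propagates on the first attempt.
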
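Mark long system prompts as cacheable so that repeated requests read the prefix from cache at a reduced rate instead of reprocessing it each time.

    messages = [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": system_prompt,
                    "cache_control": {"type": "ephemeral"},  # cache this block
                },
                {
                    "type": "text",
                    "text": user_input,  # variable part
                },
            ],
        }
    ]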
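
The pipeline function in this skill calls `build_cached_messages`, which the source does not define. A minimal sketch, assuming it simply wraps the message structure above in a function (the signature is a guess based on how it is called):

```python
from typing import Any


def build_cached_messages(system_prompt: str, user_input: str) -> list[dict[str, Any]]:
    """Sketch: return one user message whose stable prefix is marked cacheable."""
    return [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": system_prompt,
                    # Marks the end of the cacheable prefix.
                    "cache_control": {"type": "ephemeral"},
                },
                # Variable part: placed after the cached block so it can change per request.
                {"type": "text", "text": user_input},
            ],
        }
    ]


messages = build_cached_messages("You are a careful extraction assistant.", "Invoice #1")
```

The stable text goes first because caching matches on the prompt prefix: anything that changes between requests should come after the block carrying `cache_control`. Prompts shorter than the provider's minimum cacheable length are not cached.
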
Combine all four techniques in a single pipeline function:

    def process(text: str, config: Config, tracker: CostTracker) -> tuple[Result, CostTracker]:
        # estimated_items, client, and system_prompt are assumed to be defined elsewhere.
        # 1. Route to a model
        model = select_model(len(text), estimated_items, config.force_model)

        # 2. Check the budget
        if tracker.over_budget:
            raise BudgetExceededError(tracker.total_cost, tracker.budget_limit)

        # 3. Call with retry + caching
        response = call_with_retry(lambda: client.messages.create(
            model=model,
            max_tokens=1024,  # required by the Messages API; 1024 is an illustrative value
            messages=build_cached_messages(system_prompt, text),
        ))

        # 4. Track cost (immutably)
        record = CostRecord(model=model, input_tokens=..., output_tokens=..., cost_usd=...)
        tracker = tracker.add(record)

        return parse_result(response), tracker
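
Because `process` returns a new tracker instead of mutating the old one, the caller has to thread it through each call. The sketch below shows a batch loop doing that, with a stubbed `fake_process` standing in for the real API call. The stub, its flat per-call cost, the simplified `Tracker`, and `run_batch` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tracker:
    """Minimal stand-in for CostTracker: a budget and the per-call costs so far."""
    budget_limit: float
    costs: tuple[float, ...] = ()

    def add(self, cost_usd: float) -> "Tracker":
        return Tracker(self.budget_limit, (*self.costs, cost_usd))

    @property
    def over_budget(self) -> bool:
        return sum(self.costs) > self.budget_limit


def fake_process(text: str, tracker: Tracker) -> tuple[str, Tracker]:
    """Stub for the pipeline: pretends every call costs a flat $0.40."""
    return text.upper(), tracker.add(0.40)


def run_batch(texts: list[str], tracker: Tracker) -> tuple[list[str], Tracker]:
    """Process texts in order, stopping once the budget has been exceeded."""
    results: list[str] = []
    for text in texts:
        if tracker.over_budget:
            break  # stop before spending more
        result, tracker = fake_process(text, tracker)  # rebind to the new tracker
        results.append(result)
    return results, tracker


results, final_tracker = run_batch(["a", "b", "c", "d"], Tracker(budget_limit=1.00))
```

As in the pipeline above, the budget is checked before each call, so total spend can overshoot the limit by at most one call. Here the third call pushes spend past $1.00 and the fourth item is skipped.
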
| Model | Input ($/1M tokens) | Output ($/1M tokens) | Relative cost |
|---|---|---|---|
| Haiku 4.5 | $1.00 | $5.00 | 1x |
| Sonnet 4.6 | $3.00 | $15.00 | ~3x |
| Opus 4.5 | $5.00 | $25.00 | ~5x |
npx claudepluginhub aaione/everything-claude-code-zh