Optimizes LLM API costs for Claude/GPT calls via task-complexity model routing, immutable budget tracking, narrow transient-error retries, and prompt caching. For batch tasks with budget limits.
Slash command: /everything-claude-code:cost-aware-llm-pipeline
Patterns for controlling LLM API costs while maintaining quality. Combines model routing, budget tracking, retry logic, and prompt caching into one reusable pipeline.
Automatically select a cheaper model for simple tasks and reserve the expensive model for complex ones.
MODEL_SONNET = "claude-sonnet-4-6"
MODEL_HAIKU = "claude-haiku-4-5-20251001"
_SONNET_TEXT_THRESHOLD = 10_000  # character-count threshold
_SONNET_ITEM_THRESHOLD = 30  # item-count threshold

def select_model(
    text_length: int,
    item_count: int,
    force_model: str | None = None,
) -> str:
    """Select a model based on task complexity."""
    if force_model is not None:
        return force_model
    if text_length >= _SONNET_TEXT_THRESHOLD or item_count >= _SONNET_ITEM_THRESHOLD:
        return MODEL_SONNET  # complex task
    return MODEL_HAIKU  # simple task (3-4x cheaper)
Track cumulative spend with frozen dataclasses. Each API call produces a new tracker; the original state is never mutated.
from dataclasses import dataclass

@dataclass(frozen=True, slots=True)
class CostRecord:
    model: str
    input_tokens: int
    output_tokens: int
    cost_usd: float

@dataclass(frozen=True, slots=True)
class CostTracker:
    budget_limit: float = 1.00
    records: tuple[CostRecord, ...] = ()

    def add(self, record: CostRecord) -> "CostTracker":
        """Return a new tracker with the record appended (never mutates self)."""
        return CostTracker(
            budget_limit=self.budget_limit,
            records=(*self.records, record),
        )

    @property
    def total_cost(self) -> float:
        return sum(r.cost_usd for r in self.records)

    @property
    def over_budget(self) -> bool:
        return self.total_cost > self.budget_limit
Retry only on transient errors. Fail fast on authentication errors and bad requests.
import time

from anthropic import (
    APIConnectionError,
    InternalServerError,
    RateLimitError,
)

_RETRYABLE_ERRORS = (APIConnectionError, RateLimitError, InternalServerError)
_MAX_RETRIES = 3

def call_with_retry(func, *, max_retries: int = _MAX_RETRIES):
    """Retry only on transient errors; any other error is raised immediately."""
    for attempt in range(max_retries):
        try:
            return func()
        except _RETRYABLE_ERRORS:
            if attempt == max_retries - 1:
                raise  # out of attempts: surface the last transient error
            time.sleep(2 ** attempt)  # exponential backoff: 1s, 2s, ...
    # AuthenticationError, BadRequestError, etc. are not caught above and propagate immediately
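The retry behavior can be exercised without network access or the `anthropic` package. In the sketch below, `TransientError`, `PermanentError`, and `Flaky` are hypothetical stand-ins. The `retryable` and `sleep` parameters are also additions for testability and are not part of the original function.

```python
import time


class TransientError(Exception):
    """Stand-in for APIConnectionError / RateLimitError / InternalServerError."""


class PermanentError(Exception):
    """Stand-in for AuthenticationError / BadRequestError."""


def call_with_retry(func, *, max_retries=3, retryable=(TransientError,), sleep=time.sleep):
    """Retry func on retryable errors with exponential backoff; let anything else propagate."""
    for attempt in range(max_retries):
        try:
            return func()
        except retryable:
            if attempt == max_retries - 1:
                raise
            sleep(2 ** attempt)


class Flaky:
    """Callable that fails a fixed number of times, then succeeds."""

    def __init__(self, failures, error):
        self.failures = failures
        self.error = error
        self.calls = 0

    def __call__(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise self.error("simulated failure")
        return "ok"


delays = []  # record the requested backoff delays instead of actually sleeping

flaky = Flaky(failures=2, error=TransientError)
result = call_with_retry(flaky, sleep=delays.append)
print(result, flaky.calls, delays)

fatal = Flaky(failures=1, error=PermanentError)
try:
    call_with_retry(fatal, sleep=delays.append)
except PermanentError:
    print("permanent error surfaced after", fatal.calls, "call")
```

The transient failure is retried with growing delays, while the permanent one surfaces on the first attempt with no sleep. That keeps a misconfigured key or malformed request from burning time and budget.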
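Cache long system prompts so that the repeated prefix is served from the cache at a reduced rate instead of being processed in full on every request.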
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": system_prompt,
                "cache_control": {"type": "ephemeral"},  # cache this block
            },
            {
                "type": "text",
                "text": user_input,  # variable part
            },
        ],
    }
]
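The pipeline function calls a `build_cached_messages` helper that the source does not define. One plausible shape, assuming it simply wraps the message structure shown here, is sketched below; the prompt text is a placeholder.

```python
def build_cached_messages(system_prompt: str, user_input: str) -> list[dict]:
    """Put the stable prompt first and mark it cacheable; append the variable input after it."""
    return [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": system_prompt,
                    "cache_control": {"type": "ephemeral"},
                },
                {"type": "text", "text": user_input},
            ],
        }
    ]


shared_prompt = "You are a careful summarizer. " * 200  # stand-in for a long, stable prompt
first = build_cached_messages(shared_prompt, "Summarize document A.")
second = build_cached_messages(shared_prompt, "Summarize document B.")

# The cacheable block is identical across requests; only the trailing block differs.
same_prefix = first[0]["content"][0] == second[0]["content"][0]
print(same_prefix)
```

Ordering matters: the cache matches on the prefix, so the stable content has to come before anything that varies per request. Very short prompts may fall below the model's minimum cacheable length and see no benefit.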
Combine all four techniques in a single pipeline function:
def process(text: str, config: Config, tracker: CostTracker) -> tuple[Result, CostTracker]:
    # 1. Route to a model
    model = select_model(len(text), estimated_items, config.force_model)
    # 2. Check the budget before spending anything
    if tracker.over_budget:
        raise BudgetExceededError(tracker.total_cost, tracker.budget_limit)
    # 3. Call with retry and caching
    response = call_with_retry(lambda: client.messages.create(
        model=model,
        max_tokens=1024,  # required by the Messages API; value chosen for illustration
        messages=build_cached_messages(system_prompt, text),
    ))
    # 4. Track cost (immutable pattern)
    record = CostRecord(model=model, input_tokens=..., output_tokens=..., cost_usd=...)
    tracker = tracker.add(record)
    return parse_result(response), tracker
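The source does not show how a caller threads the tracker through a batch, and `BudgetExceededError` is raised but never defined. The sketch below fills both gaps under stated assumptions. The `process` stub makes no API call and charges a made-up flat cost, the tracker is simplified to a running total, and `run_batch` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostTracker:
    budget_limit: float
    spent: float = 0.0  # simplified: a running total instead of a tuple of records

    def add(self, cost_usd: float) -> "CostTracker":
        return CostTracker(self.budget_limit, self.spent + cost_usd)

    @property
    def over_budget(self) -> bool:
        return self.spent > self.budget_limit


class BudgetExceededError(RuntimeError):
    def __init__(self, total_cost: float, budget_limit: float):
        super().__init__(f"spent ${total_cost:.2f} of ${budget_limit:.2f} budget")
        self.total_cost = total_cost
        self.budget_limit = budget_limit


def process(text: str, tracker: CostTracker) -> tuple[str, CostTracker]:
    """Stub for the pipeline step: no API call, fixed made-up cost per item."""
    if tracker.over_budget:
        raise BudgetExceededError(tracker.spent, tracker.budget_limit)
    return text.upper(), tracker.add(0.5)


def run_batch(texts: list[str], tracker: CostTracker) -> tuple[list[str], CostTracker]:
    """Thread the tracker through the batch; stop cleanly once the budget is exceeded."""
    results = []
    for text in texts:
        try:
            result, tracker = process(text, tracker)
        except BudgetExceededError:
            break
        results.append(result)
    return results, tracker


results, final = run_batch(["a", "b", "c", "d", "e"], CostTracker(budget_limit=1.0))
print(results, final.spent)
```

The budget check runs before each call and uses a strict `>`, so the total can overshoot the limit by up to one call's cost. A caller that needs a hard cap would have to check an estimated cost before the call instead.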
| Model | Input ($/1M tokens) | Output ($/1M tokens) | Relative cost |
|---|---|---|---|
| Haiku 4.5 | $0.80 | $4.00 | 1x |
| Sonnet 4.6 | $3.00 | $15.00 | ~4x |
| Opus 4.5 | $15.00 | $75.00 | ~19x |
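The `cost_usd` field of a `CostRecord` can be derived from token counts and per-million-token rates. Below is a minimal sketch using the rates exactly as listed in this table; `estimate_cost_usd` and the request size are hypothetical.

```python
# (input $/1M tokens, output $/1M tokens), copied from the table above
PRICING_USD_PER_MTOK = {
    "Haiku 4.5": (0.80, 4.00),
    "Sonnet 4.6": (3.00, 15.00),
    "Opus 4.5": (15.00, 75.00),
}


def estimate_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Linear cost model: tokens times rate, with rates quoted per million tokens."""
    input_rate, output_rate = PRICING_USD_PER_MTOK[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


# Hypothetical request: 10,000 input tokens and 1,000 output tokens.
costs = {m: estimate_cost_usd(m, 10_000, 1_000) for m in PRICING_USD_PER_MTOK}
baseline = costs["Haiku 4.5"]
for model, cost in costs.items():
    print(f"{model}: ${cost:.4f} ({cost / baseline:.2f}x)")
```

This ignores the separate rates that apply to cache writes and cache reads, so it overstates the input cost of requests that hit a cached prefix.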
npx claudepluginhub xu-xiang/everything-claude-code-zh