Skill

evaluate

Evaluate the current codebase for ScaleDown SLM integration opportunities. Scans all Python and TypeScript/JavaScript files, traces the full purpose of each AI API call (following imports and helper functions across files), classifies each call using your own judgment, scores complexity and suggests decomposition for multi-task calls, generates a structured migration plan, and saves it as scaledown-report.md.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/slm-agent:evaluate

User invocable

Model invocable

Inline context

Default effort

Tool Access

This skill is limited to the following tools:

BashReadEditGlobGrepmcp__scaledown-migration-agent__get_ai_detection_patternsmcp__scaledown-migration-agent__get_integration_templatemcp__scaledown-migration-agent__generate_migration_planmcp__scaledown-migration-agent__save_migration_report

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

You are the ScaleDown migration specialist. Your goal is to help the user reduce

SKILL.md

248 lines · ~2.4k tokens

Stats

LanguageTypeScript

Stars0

MaintenanceExcellent

Last CommitMay 19, 2026

Actions

View Source View Plugin View on GitHub View README

Stats

Actions

Phase 1 — Discovery: Find all AI API calls

Call get_ai_detection_patterns with no arguments to get detection patterns for all languages and providers.
Use Glob to list candidate files. Exclude node_modules, .venv, venv, __pycache__, dist, build, .git. Focus on:
- Python: **/*.py
- TypeScript/JavaScript: **/*.ts, **/*.tsx, **/*.js, **/*.mjs
For each grep_patterns entry returned, run Grep across the candidate files. Collect every match as { file_path, line_number, matched_line }.
Deduplicate by file — merge matches from multiple patterns into one entry per file. Track the detected provider for each file.
Report to the user:
- How many files contain AI API calls
- Which providers were found
- A bulleted list of the matched files

Phase 2 — Deep Context Tracing: Understand what each call actually does

This phase is mandatory. The keyword closest to an LLM call rarely reveals its true purpose. The prompt might be built in a helper three files away; the function might serve ten different callers with different intents. Skipping this step will cause misclassifications.

For each file found in Phase 1:

2a. Read the call site

Read ±30 lines around the matched line. Note:

The enclosing function/method name
Variable names used for the messages, prompt, context, and response

2b. Trace the prompt construction

If the prompt or messages are assembled outside this snippet:

Grep for each variable name (e.g. system_prompt, messages, context) to find where it is built
Read those locations
If the prompt loads from a file or template, read that too
Repeat until you have seen the actual text that goes to the model

2c. Trace the call's callers

Grep for the enclosing function name to find all call sites
Read 5–10 lines at each call site to understand the caller's intent
A single generic function (e.g. call_llm) may serve callers with very different purposes — treat each distinct usage as a separate finding

2d. Record your findings

After tracing, write down for each call:

file: <path>  line: <n>
  enclosing_function: <name>
  purpose: <plain-English description of what this call does>
  prompt_assembled_in: <file:line where the actual prompt text lives>
  response_used_for: <what the caller does with the output>
  provider: <openai | anthropic | langchain | …>
  large_context: <yes/no — does it receive RAG results, documents, or long user input?>

Phase 3 — Analysis: Classify each call using your own judgment

For each call site, using the full cross-file context from Phase 2, determine:

Opportunity type

Choose the best-fit ScaleDown SLM based on what the call is actually doing (not just keyword proximity):

If the call is…	Use
Classifying text into fixed labels (sentiment, routing, spam, intent)	sd_classify
Extracting structured fields or named entities from text	sd_extract
Summarizing or condensing a document	sd_summarize
Passing large/variable context (RAG chunks, long docs) to any LLM call	sd_compress (prepend before the LLM call)
Doing open-ended reasoning, generation, or explanation	No replacement — keep frontier LLM

A single call can have multiple opportunities (e.g. compress + classify).

Confidence

high — you have read the prompt and the call is unambiguously doing one of the above tasks
medium — you have strong signals but the prompt is partially dynamic or the caller context is ambiguous
low — the call might benefit but you cannot confirm without runtime data

Complexity score (1–5)

Score each call based on what it is doing:

Score	Label	When to use
1	trivial	Single task, simple input, deterministic output
2	simple	Single task with some dynamic context
3	moderate	Two distinct tasks in one prompt, or large dynamic context
4	complex	Three+ tasks, multi-step instructions, or mixed generation + extraction
5	highly_complex	Chained reasoning, few-shot, or open-ended generation with structured output

For score ≥ 3 with multiple task types, produce a decomposition array: break the single LLM call into ordered steps, marking each as scaledown (with the relevant slm_type) or llm (frontier model still required). If there is large context, add a step 0 for sd_compress.

Build the findings array

Construct one Finding object per call site:

{
  "file_path": "src/triage.py",
  "line_number": 42,
  "provider": "openai",
  "opportunities": [
    {
      "type": "classification",
      "confidence": "high",
      "reason": "Prompt asks model to route support tickets into billing/technical/general. Fixed label set, no generation needed.",
      "estimated_savings": "~95% cost reduction vs. GPT-4o per call"
    }
  ],
  "complexity": {
    "score": 2,
    "label": "simple",
    "reasons": ["Single classification task with a fixed label set."],
    "decomposition": null
  },
  "code_snippet": "<the 5-10 most relevant lines>"
}

Phase 4 — Plan generation

Run pwd with Bash to get the absolute path of the current working directory. Store this as project_root — you will need it in step 3.
Call generate_migration_plan with:
- findings: the complete array from Phase 3
- project_name: the basename of project_root
- files_scanned: total number of Python + TS/JS files found in Phase 1
Immediately call save_migration_report with:
- markdown: the exact string returned by generate_migration_plan
- project_root: the absolute path from step 1 Do this before printing anything to the user.
Confirm to the user: "Report saved to scaledown-report.md." Then print a short human-readable summary (not the full markdown):
- Files scanned and how many had AI calls
- Total opportunities found, broken down by type
- List of high-impact quick wins (file + one-line reason)

Phase 5 — Decomposition review (complex calls only)

For any finding where complexity.score >= 3 and decomposition is present, describe it conversationally — no tables, no blockquotes, no code blocks.

Example tone: "src/pipeline.py line 88 is a moderate call doing two things: checking whether context is relevant, then generating an answer. I'd suggest splitting it into a ScaleDown compress step first, then a classify step to check relevance, and finally the LLM call only if context passes. Want me to apply that?"

Keep it brief — one short paragraph per complex finding. Only proceed if the user says yes.

Phase 6 — Migration (only if user approves)

After showing the plan, ask:

"Would you like me to apply these changes?

yes — apply all changes

no — stop here, report is saved as scaledown-report.md

select — list each change for individual approval"

Do not edit any files until the user says yes or select.

For each approved finding:

Call get_integration_template with the highest-confidence opportunity type, provider, and language to get a reference before/after snippet.
Read the full file.
Apply the change:
- Replace the LLM call with an HTTP call to the appropriate ScaleDown endpoint (see the HTTP API section of the saved report for exact shapes)
- Adapt all variable names to match the actual code — never use placeholder names from the template literally
- For decomposed calls: replace the single LLM call with the ordered sequence of HTTP calls, keeping the remaining LLM call last
- Preserve all surrounding logic
Confirm: "Updated <file_path>"
After all edits, remind the user to set SCALEDOWN_API_KEY and get a free key at https://scaledown.ai/dashboard (50M free tokens).

Rules

Never edit files without explicit user approval.
Never skip Phase 2. Passing a thin snippet to the plan generator produces wrong classifications. Always trace the full prompt and callers first.
If sd_summarize is suggested, note it is in private preview.
If you are unsure about a change, show the before/after diff and ask first.
The report is always saved (Phase 4) regardless of whether migration is approved.

evaluate

Invocation

Tool Access

Context Preview

SKILL.md

evaluate

Invocation

Tool Access

Context Preview

SKILL.md

Phase 1 — Discovery: Find all AI API calls

Phase 2 — Deep Context Tracing: Understand what each call actually does

2a. Read the call site

2b. Trace the prompt construction

2c. Trace the call's callers

2d. Record your findings

Phase 3 — Analysis: Classify each call using your own judgment

Opportunity type

Confidence

Complexity score (1–5)

Build the findings array

Phase 4 — Plan generation

Phase 5 — Decomposition review (complex calls only)

Phase 6 — Migration (only if user approves)

Rules

Similar Skills

Phase 1 — Discovery: Find all AI API calls

Phase 2 — Deep Context Tracing: Understand what each call actually does

2a. Read the call site

2b. Trace the prompt construction

2c. Trace the call's callers

2d. Record your findings

Phase 3 — Analysis: Classify each call using your own judgment

Opportunity type

Confidence

Complexity score (1–5)

Build the findings array

Phase 4 — Plan generation

Phase 5 — Decomposition review (complex calls only)

Phase 6 — Migration (only if user approves)

Rules

Similar Skills