Use this skill WHENEVER the user attaches, pastes, drops, or otherwise shares one or more images (screenshots, photos, scans of documents/licenses/certificates, UI captures, terminal/log captures, architecture or network diagrams, ER diagrams, charts) and wants them understood, described, transcribed, or used as the basis for any follow-up work. Trigger it even when the user does not say the word "image", e.g. "вот скрин ошибки, помоги" ("here's a screenshot of the error, help"), "разбери эту схему" ("walk me through this diagram"), "что тут на картинке" ("what's in this picture"), "по этому документу заведи задачу" ("open a task based on this document"), or when they simply drop a picture with a terse instruction. The skill reads every image meticulously so nothing is missed, builds a self-contained HTML report (images embedded as base64, with a detailed recognition write-up under each) for the user to verify, and then keeps the recognized content in context to solve whatever the user dictates next. Do NOT skip this skill just because the image "looks simple": the whole point is to catch the small detail a glance would miss.
Slash command: /image-tools:image-recognition
The user's workflow: they hand you images, you read them with extreme care, you produce a verification report they can eyeball, they confirm (or correct) what you saw, and only then do you act on the content. This skill exists because decisions built on a misread image are silently wrong — a single transposed digit in an error code, a checkbox that was actually unchecked, an arrow pointing the other way in a diagram, or a license number off by one character can derail everything downstream. The verification report is the safety net: it lets the user catch a miss before work is built on top of it.
Follow these steps in order. Do not start solving the user's actual task until the recognition report exists and the user has had a chance to verify it.
Use the file paths the user provided or that the harness attached. If it's ambiguous which files are the images in question, ask. Handle one image or many.
Open each image with the Read tool; it renders the image visually at full resolution.
Read the whole image in one pass first. Claude's vision handles dense, high-resolution
screenshots, tables, and mockups well, so a single attentive pass usually captures even
small text. Read it the way you'd proofread a contract, not the way you'd skim a
thumbnail: see every element, understand the composition (how things are laid out
and how they relate), and miss nothing.
Don't reflexively slice the image. Cropping or zooming into a region costs real time and tokens, so reach for it only when a specific spot is genuinely too small or blurry to read with confidence after the full-resolution pass — a tiny toggle, a truncated hostname, the fine print inside a stamp. Crop that one spot, not the whole image. Meticulousness lives in your attention and systematic scanning (use the checklist below), not in mechanically chopping every screenshot into pieces — that just burns budget without seeing more.
Transcribe text and numbers exactly as shown — verbatim, including apparent mistakes.
Copy them character-for-character: keep typos (Postgress with a double s), unusual
casing, odd spacing, mixed Cyrillic/Latin (РОСС RU.0001.11АВ29), and truncation (keep
the …). Do not "correct", normalize, translate, or expand them. This matters because
the user is often verifying exactly those details — silently fixing a typo or tidying a
code hides what is really on screen, which defeats the entire purpose of the report.
Paraphrase is where the decisive detail dies: "an error about a license" is useless; the
literal ERROR 4012: license key ZX-7731-XX expired 2026-05-18 is what the user needs.
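Mixed alphabets are the hardest case to catch by eye, because Cyrillic А and Latin A render identically. A small hypothetical helper like the one below (not part of the bundled script; all names are illustrative) can flag tokens whose letters come from more than one script, so the write-up can point this out explicitly while the transcription itself stays untouched.

```python
import unicodedata


def scripts_in(token: str) -> set[str]:
    """Return the scripts (e.g. 'LATIN', 'CYRILLIC') of the letters in token."""
    found = set()
    for ch in token:
        if ch.isalpha():
            # unicodedata.name gives e.g. 'CYRILLIC CAPITAL LETTER A';
            # its first word is a rough proxy for the script.
            found.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return found


def mixed_script_tokens(text: str) -> list[str]:
    """Return whitespace-separated tokens whose letters come from more than one script."""
    return [tok for tok in text.split() if len(scripts_in(tok)) > 1]


# "РОСС RU.0001.11АВ29" written with escapes so the Cyrillic А (U+0410) and В (U+0412) are unambiguous.
cert = "\u0420\u041e\u0421\u0421 RU.0001.11\u0410\u041229"
print(mixed_script_tokens(cert))  # flags only the second token
```

The helper only reports; it never rewrites a token, which keeps the transcription verbatim.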
Use the recognition checklist below so you scan systematically instead of fixating on whatever caught your eye first.
For each image, write a thorough description in Russian (the language the user works
in). It should be detailed enough that someone who can't see the image could rebuild a
faithful mental picture from your words alone. Lead with the overall composition, then
enumerate the elements. Keep all transcribed text/numbers verbatim, set in code or
quotes so they stand out.
Hand the images and your write-ups to the bundled script — it embeds each image as
base64 (a self-contained file the user can open with no external dependencies) and
renders your descriptions underneath. See "Building the report" below for the exact
spec format and command. The report goes into <project-root>/docs/image-recognition/.
Do not auto-open the file. Reply with the clickable absolute path and a short note asking the user to check whether everything was recognized correctly and nothing small was missed. Then wait for their confirmation or corrections.
Once verified (and corrected, if needed), keep everything you recognized in working memory. When the user dictates a task, operate on that recognized data — don't make them re-explain what's in the images. If the user corrects a detail during verification, update your understanding accordingly before proceeding.
Scan for all of these. Not every category applies to every image; skip the ones that genuinely don't, but don't skip a category just because it's tedious.
Always:
Screenshots / UI captures: window or app title; menus and breadcrumbs; buttons and their state (enabled/disabled/active); every form field and its current value; checkbox and radio states (checked vs unchecked — state matters); table headers and cell contents; error/warning/info banners with their full text and any code; the element that's selected or focused.
Terminal / log captures: transcribe the visible output verbatim, preserving the sequence; note the command if shown; call out stack traces, error codes, file paths, timestamps, and exit statuses.
Documents / scans (licenses, certificates, invoices, contracts): transcribe all text and numbers verbatim — license/certificate numbers, registration IDs, dates, sums, counterparty and organization names, product names, validity periods. Note stamps, seals, signatures, logos, letterheads, and any handwriting. Describe table structure and contents. Flag anything illegible.
Diagrams / schemas / ER diagrams / network maps: enumerate every node/box with its exact label; every connection/edge/arrow with its direction and any label on it; groupings, clusters, swimlanes, zones; the legend; and the overall topology or flow in plain words ("requests enter at A, fan out to B and C, both write to D").
Charts / graphs: chart type; axis titles, units, and ranges; each series with its label and color; notable values, peaks, trends; the legend; any annotations.
Write a spec JSON, then run the script. The script reads each image, base64-embeds it, renders your markdown description beneath it, and writes a single self-contained HTML file.
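As a rough illustration of what "base64-embeds" means here, the sketch below turns an image file into a data: URI and places it in an <img> tag under an escaped title. It is an assumption about the general technique, not the actual code of build_report.py; image_to_data_uri and render_item are hypothetical names, and description_html stands in for the already-rendered markdown description.

```python
import base64
import html
import mimetypes
from pathlib import Path


def image_to_data_uri(path: str) -> str:
    """Read an image file and return it as a base64 data: URI usable in <img src=...>."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"not a recognized image type: {path}")
    payload = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{payload}"


def render_item(image_path: str, title: str, description_html: str) -> str:
    """Render one report item: escaped title, embedded image, then the description HTML."""
    safe_title = html.escape(title)
    return (
        f"<section><h2>{safe_title}</h2>\n"
        f'<img src="{image_to_data_uri(image_path)}" alt="{safe_title}">\n'
        f"{description_html}</section>"
    )
```

Because the image bytes live inside the HTML, the report still works when moved or opened offline. The trade-off is that base64 grows the data by about one third, which is one reason these files get large.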
Spec format (items are rendered in order):
{
  "title": "Распознавание изображений — <короткий контекст>",
  "items": [
    {
      "image": "/absolute/path/to/screenshot.png",
      "title": "Картинка 1 — скриншот ошибки выпуска лицензии",
      "description": "Markdown-описание. Поддерживаются заголовки (#, ##), списки (- ...),\n**жирный**, *курсив*, `код`. Дословный текст с картинки бери в `обратные кавычки`."
    }
  ]
}
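If you assemble the spec programmatically instead of by hand, a minimal sketch could look like this. write_spec is a hypothetical helper: it checks that each item carries the three keys used in the spec format and that image paths are absolute, then writes the file as UTF-8 JSON.

```python
import json
from pathlib import Path


def write_spec(spec_path: str, title: str, items: list[dict]) -> None:
    """Write a report spec; each item needs "image" (absolute path), "title", and "description"."""
    for item in items:
        missing = {"image", "title", "description"} - set(item)
        if missing:
            raise ValueError(f"item is missing keys: {sorted(missing)}")
        if not Path(item["image"]).is_absolute():
            raise ValueError(f"image path must be absolute: {item['image']}")
    spec = {"title": title, "items": items}
    # ensure_ascii=False keeps Cyrillic readable in the file instead of \uXXXX escapes.
    Path(spec_path).write_text(json.dumps(spec, ensure_ascii=False, indent=2), encoding="utf-8")
```

ensure_ascii=False matters for Russian titles and descriptions: with the default, json.dumps writes them as \uXXXX escapes, which still parse correctly but are unreadable when you inspect the spec.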
Command (run from anywhere; paths are absolute):
python3 "$SKILL_DIR/scripts/build_report.py" \
--spec /path/to/spec.json \
--output "<project-root>/docs/image-recognition/recognition-$(date +%Y%m%d-%H%M%S).html"
$SKILL_DIR is this skill's directory. The script prints the absolute path of the HTML
it wrote — relay that to the user. If --output is omitted, it defaults to
./docs/image-recognition/recognition-<timestamp>.html under the current working
directory.
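The timestamped name from the command can be reproduced in Python when you need the path before running the script, for instance to place the spec file next to the report. This is a sketch assuming project_root is known; default_report_path and spec_path_for are hypothetical names. Unlike the script's fallback, which resolves against the current working directory, it resolves against the given root.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def default_report_path(project_root: str, now: Optional[datetime] = None) -> Path:
    """Return <project-root>/docs/image-recognition/recognition-<YYYYmmdd-HHMMSS>.html, absolute."""
    moment = now if now is not None else datetime.now()
    # Same format string as `date +%Y%m%d-%H%M%S` in the shell command.
    stamp = moment.strftime("%Y%m%d-%H%M%S")
    return Path(project_root).resolve() / "docs" / "image-recognition" / f"recognition-{stamp}.html"


def spec_path_for(report: Path) -> Path:
    """Place a .spec-<timestamp>.json file in the same folder as the report."""
    stamp = report.stem.removeprefix("recognition-")
    return report.with_name(f".spec-{stamp}.json")
```

For the reply to the user, default_report_path(".").as_uri() gives a file:// URI that many terminals and editors render as a clickable link; the plain absolute path from str() works as well.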
A convenient place for the spec file is next to the report (e.g. write it to
<project-root>/docs/image-recognition/.spec-<timestamp>.json). It's a harmless build
artifact; no need to delete it.
Reports live in <project-root>/docs/image-recognition/. Create the folder if it
doesn't exist. Each batch of images produces a new timestamped HTML file — don't
overwrite earlier reports. These files embed images as base64 and can get large; if you'd
rather not track them in git, add docs/image-recognition/ to .gitignore — but
leave that version-control decision to the user.
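Creating the folder and keeping earlier reports intact can be sketched as follows. reserve_report_path is hypothetical, and the numeric suffix is an added safeguard for two batches that finish within the same second, which the timestamp alone would not distinguish. The sketch deliberately leaves .gitignore alone, since that decision stays with the user.

```python
from pathlib import Path


def reserve_report_path(report_dir: str, stem: str) -> Path:
    """Create report_dir if needed and return a <stem>.html path that does not overwrite a report."""
    folder = Path(report_dir)
    folder.mkdir(parents=True, exist_ok=True)
    candidate = folder / f"{stem}.html"
    n = 2
    while candidate.exists():
        # Same-second batches would collide; add a suffix instead of overwriting.
        candidate = folder / f"{stem}-{n}.html"
        n += 1
    return candidate
```

The exists() check is not atomic. For a single user running one batch at a time, that is usually acceptable.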
npx claudepluginhub torchingloom/claude-tools --plugin image-tools