From gpt-image-2
Use when generating, editing, composing, or iterating on images — illustrations for reports/web, posters with Chinese or English typography, pitch-deck slides, UI mockups, infographics, pixel art, game sprites, character reference sheets, app icons, logo concepts, photoreal product shots, or photo edits with precise local changes. Symptoms include the user asking for "图", "图片", "插图", "海报", "封面", "图标", "ppt素材", "游戏素材", "改图", "修图", "logo", "draw me", "make an image", "生成一张", or attaching an image they want modified. Calls the gpt-image-2 model (April 2026 release; near-perfect text rendering including Chinese, custom resolutions up to 3840px, precise edits with preserve/change pattern) via OpenAI-compatible /images/generations and /images/edits endpoints.
Slash command: /gpt-image-2:gpt-image-2
GPT Image 2 — released April 2026 — is the first generation that handles long instructional prompts cleanly, renders text correctly (Chinese / Japanese / Korean too), supports custom resolutions (max side < 3840px, ratio ≤ 3:1), and does precise local edits via the `change ONLY X / keep Y exactly` pattern.
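The resolution limits quoted above can be checked locally before spending a generation. This is a minimal sketch, assuming the limits are exactly as stated here (longest side strictly below 3840px, long-to-short ratio at most 3:1). `is_valid_size` is a hypothetical helper, not part of scripts/gpt_image.py, and references/api.md may list further constraints.

```python
def is_valid_size(size: str, max_side: int = 3840, max_ratio: float = 3.0) -> bool:
    """Check a "WIDTHxHEIGHT" string against the limits quoted above.

    Assumption: longest side strictly below max_side, long/short ratio <= max_ratio.
    """
    try:
        width, height = (int(part) for part in size.lower().split("x"))
    except ValueError:
        return False  # not of the form "<int>x<int>"
    if width <= 0 or height <= 0:
        return False
    longest, shortest = max(width, height), min(width, height)
    return longest < max_side and longest <= max_ratio * shortest
```

For example, `is_valid_size("1536x864")` passes, while `"3000x900"` fails on ratio and `"3840x2160"` fails on the strict side limit.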
This skill teaches you the prompt grammar, the workflow, and gives you a zero-dep CLI that handles auth, parallel batching, retries, and file IO.
Always go through scripts/gpt_image.py. Don't curl the API directly — the script handles auth, retries, b64-vs-URL responses, multipart for edits, parallel single-image calls when batching, and writes files to disk for you. The user can't see images that only exist in the API response.
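To illustrate why the b64-vs-URL handling matters, here is a rough sketch of writing one response entry to disk. It assumes the host returns the usual OpenAI-style `data` items carrying either a `b64_json` or a `url` field. `save_image` is a hypothetical name and this is not the code in scripts/gpt_image.py.

```python
import base64
import urllib.request
from pathlib import Path


def save_image(item: dict, out_path: str) -> Path:
    """Write one entry of the response's "data" list to disk.

    Handles both shapes an OpenAI-compatible host may return:
    {"b64_json": "..."} (inline base64) or {"url": "..."} (download link).
    """
    if item.get("b64_json"):
        payload = base64.b64decode(item["b64_json"])
    elif item.get("url"):
        with urllib.request.urlopen(item["url"], timeout=60) as resp:
            payload = resp.read()
    else:
        raise ValueError("response item has neither b64_json nor url")
    path = Path(out_path)
    path.parent.mkdir(parents=True, exist_ok=True)  # create ./out/ etc. if needed
    path.write_bytes(payload)
    return path
```

Either way the result ends up as a file the user can open, which is the point of going through the script.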
Two environment variables:
export OPENAI_IMAGE_API_KEY="sk-..." # required
export OPENAI_IMAGE_BASE_URL="https://jmrai.net/v1" # default; override only if user has a different host
If OPENAI_IMAGE_API_KEY is missing, the script exits with a clear message — ask the user for it, then suggest they add the export to ~/.zshrc.
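The lookup described above might be sketched as follows. This is an illustration under assumptions, not the script's actual code: `resolve_config` and `DEFAULT_BASE_URL` are hypothetical names, and the exact exit message is a guess.

```python
import os

DEFAULT_BASE_URL = "https://jmrai.net/v1"  # default host named above


def resolve_config(env=None) -> tuple:
    """Return (api_key, base_url) from the two environment variables."""
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_IMAGE_API_KEY", "").strip()
    if not api_key:
        # Fail fast with a clear message instead of sending an unauthenticated request.
        raise SystemExit("ERROR: set OPENAI_IMAGE_API_KEY")
    base_url = env.get("OPENAI_IMAGE_BASE_URL", "").strip() or DEFAULT_BASE_URL
    return api_key, base_url.rstrip("/")
```

Passing a plain dict as `env` makes the behaviour easy to check without touching the real environment.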
Use the absolute path so it works from any working directory:
SKILL_DIR="$HOME/.claude/skills/gpt-image-2" # adjust if installed elsewhere
GPT_IMG="python3 $SKILL_DIR/scripts/gpt_image.py"
Claude Code's Bash tool runs non-interactive shells that do NOT auto-source ~/.zshrc. If the user has the credentials in their shell rc and you get ERROR: set OPENAI_IMAGE_API_KEY, prefix every call with a source:
source ~/.zshrc 2>/dev/null && python3 $SKILL_DIR/scripts/gpt_image.py generate -p "..." -o ./out.png
Or, if the user provided the key in-conversation, pass it inline (don't write it to disk yourself):
OPENAI_IMAGE_API_KEY="sk-..." OPENAI_IMAGE_BASE_URL="https://jmrai.net/v1" \
python3 $SKILL_DIR/scripts/gpt_image.py generate -p "..." -o ./out.png
Test once at the start of a session: source ~/.zshrc && echo "${OPENAI_IMAGE_API_KEY:0:10}". If that prints a key prefix, you can safely use the source-prefix pattern for the rest of the session.
$GPT_IMG generate \
-p "<prompt>" \
--size 1536x864 \
-o ./hero.png
Defaults: --quality high (cost is identical across tiers on this host), --size 1024x1024.
$GPT_IMG edit \
-i ./input.png \
-p "Edit the input image: change ONLY <X>. Preserve exactly: <Y>. Do not: <Z>." \
-o ./edited.png
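The prompt shape used in the edit command above can be assembled programmatically when edits are chained. A small sketch, with `build_edit_prompt` as a hypothetical helper that only reproduces the three-part wording shown:

```python
def build_edit_prompt(change, preserve, forbid=()):
    """Assemble an edit prompt in the change-only / preserve / do-not shape."""
    if not change or not preserve:
        raise ValueError("name both the change and what must stay untouched")
    parts = [
        f"Edit the input image: change ONLY {change}.",
        f"Preserve exactly: {', '.join(preserve)}.",
    ]
    if forbid:  # the "Do not" clause is optional
        parts.append(f"Do not: {', '.join(forbid)}.")
    return " ".join(parts)
```

Requiring a non-empty `preserve` list is a deliberate choice here: it forces the caller to state what must stay fixed rather than re-describing the whole image.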
White pixels in the mask = region to regenerate; transparent = keep.
$GPT_IMG edit -i ./photo.png --mask ./mask.png \
-p "Fill the masked area with continuation of the cobblestone street, matching perspective and lighting." \
-o ./inpainted.png
-n N fires N parallel single-image requests. Faster wall-clock than serial, and works regardless of host n>1 support.
$GPT_IMG generate -p "..." -n 4 --concurrency 4 -o ./out
# writes out-1.png … out-4.png
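The batching behaviour above (N independent single-image requests, numbered output files) can be pictured roughly like this. It is a sketch only: `generate_one` stands in for one API request, and `batch_paths` / `run_batch` are hypothetical names, not functions from scripts/gpt_image.py.

```python
from concurrent.futures import ThreadPoolExecutor


def batch_paths(prefix: str, n: int, ext: str = "png") -> list:
    """Output names for -n N: <prefix>-1.<ext> ... <prefix>-N.<ext>."""
    return [f"{prefix}-{i}.{ext}" for i in range(1, n + 1)]


def run_batch(generate_one, prompt: str, prefix: str, n: int, concurrency: int = 4):
    """Fire n independent single-image calls, at most `concurrency` at a time.

    `generate_one(prompt, out_path)` is a placeholder for one API request.
    Returns (out_path, error) pairs in order; error is None on success,
    so one failed request does not discard the other results.
    """
    def attempt(path):
        try:
            generate_one(prompt, path)
            return path, None
        except Exception as exc:  # keep going; report the failure per file
            return path, exc

    with ThreadPoolExecutor(max_workers=max(1, concurrency)) as pool:
        return list(pool.map(attempt, batch_paths(prefix, n)))
```

Because each request asks for a single image, this pattern does not depend on the host accepting n>1 in one call.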
Multiple input images in one call are not supported on this host; don't pass -i more than once.
Transparent backgrounds are not supported (--background transparent returns HTTP 400). For sprites / icons / cutouts, use a chroma-key color in the prompt (solid magenta #FF00FF background) and remove it client-side afterwards with rembg or ImageMagick (see references/post-process.md).
Run python3 $SKILL_DIR/scripts/gpt_image.py --help (or <subcommand> --help) for every flag.
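The chroma-key idea can be shown on raw pixel values. This is only an illustration of the thresholding step, operating on plain RGB tuples. Real files would go through rembg or ImageMagick as described in references/post-process.md, and both `key_out` and the tolerance of 60 are assumptions made for the example.

```python
def key_out(pixels, key=(255, 0, 255), tolerance=60):
    """Turn near-key RGB pixels transparent; return RGBA tuples.

    `pixels` is an iterable of (r, g, b) tuples. A pixel counts as background
    when its summed per-channel distance from `key` is within `tolerance`.
    """
    result = []
    for r, g, b in pixels:
        distance = abs(r - key[0]) + abs(g - key[1]) + abs(b - key[2])
        alpha = 0 if distance <= tolerance else 255
        result.append((r, g, b, alpha))
    return result
```

A nonzero tolerance matters because generated backgrounds are rarely a perfectly flat #FF00FF. Set it too high and magenta-tinted parts of the subject disappear as well.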
| Use case | Size | Aspect |
|---|---|---|
| PPT slide / web hero / YouTube thumbnail | 1536x864 (true 16:9) or 1792x1024 | 16:9 / 7:4 |
| Square: app icon, social, logo, character portrait | 1024x1024 | 1:1 |
| High-res square / print poster | 2048x2048 | 1:1 |
| Mobile poster, story, vertical infographic, book cover | 1024x1536 | 2:3 |
| Tall mobile-first hero | 1024x1792 | 9:16ish |
| Wide cinematic banner | 1792x1024 | 16:9ish |
Wrong aspect ratio = wasted generation. Decide BEFORE writing the prompt.
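The aspect column in the table above can be double-checked by reducing each size to its simplest ratio. A minimal sketch, with `aspect_ratio` as a hypothetical helper:

```python
from math import gcd


def aspect_ratio(size: str) -> str:
    """Reduce "WIDTHxHEIGHT" to its simplest W:H ratio."""
    width, height = (int(part) for part in size.lower().split("x"))
    divisor = gcd(width, height)
    return f"{width // divisor}:{height // divisor}"
```

`aspect_ratio("1536x864")` gives `16:9`, while `1792x1024` reduces to `7:4` and `1024x1792` to `4:7`, which is why the table marks those two as only approximately 16:9 and 9:16.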
Write the prompt following references/prompting.md.
Save output next to the user's project, not in /tmp.
After every generation, before reporting success to the user, Read the PNG yourself and verify it against the prompt.
If anything fails, iterate. Change ONE dimension (per references/prompting.md §6) and regenerate. Don't ship the user a bad result and ask them to verify — your vision is faster than their patience.
If you fired multiple variants (-n 4), pick the best one yourself before showing the user. Don't dump four files on them and ask them to choose unless they explicitly want options.
Read these on demand — don't preload them.
| User wants | Read this |
|---|---|
| Prompt-writing technique, style vocabulary, intent-first framework, anti-patterns to avoid | references/prompting.md |
| Concrete templates: pitch decks, UI mockups, posters, character sheets, infographics, logos, photoreal, edits | references/use-cases.md |
| Compress, resize, convert, combine, add text post-hoc | references/post-process.md |
| Full parameter list, error codes, custom-resolution constraints | references/api.md |
The full guide is in references/prompting.md; the irreducible minimum:
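For Chinese text, name the typeface, e.g. 黑体 (Heiti), 楷书 (Kaishu), 思源黑体 (Source Han Sans).
Default to --quality high (cost is the same on this host). Use lower tiers only for fast throwaway iteration.
Does the prompt contain 4K, 8K, masterpiece, trending on artstation, ultra detailed? If yes, delete them.
| Mistake | Fix |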
|---|---|
| "Make me an image of a cat" | Open with intent, add style, composition, lighting. See prompting.md. |
| Generating in /tmp | Save next to user's project so they can find it. |
| Not Reading the result yourself | Visual self-verify EVERY time before showing the user. |
| Showing all -n 4 variants without picking | Pick the best one yourself unless user asked for options. |
| Asking "what style?" when the user said "a poster" | Make a strong first attempt, then iterate. |
| Curl-ing the API directly | Use the script — handles retries, b64, multipart, parallel. |
| Forgetting Chinese text in quotes | Quote it exactly or model writes nonsense. |
| Stacking magic words | "Ultra detailed 8K masterpiece" makes outputs WORSE on GPT Image 2. |
| Re-describing the whole image in an edit | Describe ONLY the change + what to preserve. |
| Trying to compose two input images in one /edits call | Not supported on this host — chain two edit calls. |
These rules guide quality, not creativity. When the user says "surprise me" or leaves room:
Fire several variants in parallel (-n 4 --concurrency 4): same wall-clock as one.
A result is ready to show the user when:
npx claudepluginhub muselinn/gpt-image-2