Skill

azure-gpt-image

Generate or edit images with Azure OpenAI gpt-image-2 (also works with gpt-image-1/1.5/1-mini). Use this skill whenever the user wants to create, render, illustrate, mock up, or edit an image, or asks for icons, posters, hero shots, product shots, avatars, concept art, banners, thumbnails, or any visual asset — and whenever they mention gpt-image, gpt-image-2, DALL·E, Azure OpenAI image, or "generate/make/produce an image" on Azure. Also use when the user wants to inpaint, mask out, or modify part of an existing image, or composite multiple images. Always invoke this skill before reaching for any other image-generation approach in an Azure environment.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/azure-gpt-image:azure-gpt-image


Supporting Files

evals/evals.json
references/api.md
scripts/azure-image.ts

SKILL.md

127 lines · ~2k tokens

Stats

Language: TypeScript

Stars: 0

Maintenance: Good

Last Commit: May 15, 2026


Azure gpt-image-2 (generations + edits)

Use the bundled CLI at scripts/azure-image.ts to generate or edit images via Azure OpenAI. It uses Node's built-in fetch/FormData — no npm install needed. The shebang invokes npx -y tsx, so the first run downloads tsx (cached afterward).

Setup

The CLI reads credentials from environment variables:

| Variable | Required | Default | Notes |
| --- | --- | --- | --- |
| `AZURE_OPENAI_ENDPOINT` | yes | none | e.g. `https://my-resource.openai.azure.com` |
| `AZURE_OPENAI_API_KEY` | yes | none | Resource key from the Azure portal. Sent as both `Authorization: Bearer <key>` (Foundry endpoints) and `api-key: <key>` (classic Azure OpenAI) so both endpoint families work. |
| `AZURE_OPENAI_DEPLOYMENT` | no | `gpt-image-2` | Your deployment name |
| `AZURE_OPENAI_API_VERSION` | no | `preview` | The new v1 endpoint accepts `preview` |

If a variable is missing, the script exits non-zero with a clear message — surface that error to the user rather than guessing. Don't try to read keys from .env files or other locations; the user is expected to have them in their shell environment.
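The required/optional split and the defaults described above can be sketched as follows. This is only an illustration of the behavior, not the bundled CLI's actual code; `AzureImageConfig` and `loadConfig` are hypothetical names.

```typescript
// Hypothetical config shape; field names are illustrative, not taken from the CLI.
interface AzureImageConfig {
  endpoint: string;
  apiKey: string;
  deployment: string;
  apiVersion: string;
}

const REQUIRED_VARS = ["AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_API_KEY"] as const;

function loadConfig(env: Record<string, string | undefined>): AzureImageConfig {
  // Collect every missing required variable so the error names all of them at once.
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variable(s): ${missing.join(", ")}`);
  }
  return {
    endpoint: env.AZURE_OPENAI_ENDPOINT as string,
    apiKey: env.AZURE_OPENAI_API_KEY as string,
    // Optional variables fall back to the documented defaults.
    deployment: env.AZURE_OPENAI_DEPLOYMENT || "gpt-image-2",
    apiVersion: env.AZURE_OPENAI_API_VERSION || "preview",
  };
}
```

A caller would typically invoke `loadConfig(process.env)`, print the error message, and exit non-zero if it throws.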

Rate limits. Azure resources have per-minute caps (commonly 5–10 requests/minute on lower tiers). Don't fan out parallel image calls — sequence them with a few seconds between each. If you get HTTP 429, wait 30–60s and retry.
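If several images are needed, the sequencing and 429 handling described above might look like the sketch below. The names `RateLimitError` and `runSequentially` are hypothetical, and the specific values (3 s gap, 45 s wait, 2 retries) are assumptions chosen from within the ranges mentioned above.

```typescript
// Hypothetical error type carrying an HTTP status; the real CLI may report errors differently.
class RateLimitError extends Error {
  status: number;
  constructor(status: number, message: string) {
    super(message);
    this.status = status;
  }
}

interface SequenceOptions {
  gapMs: number;       // pause between consecutive calls
  retryWaitMs: number; // pause after an HTTP 429 before retrying
  maxRetries: number;  // assumed retry budget per task
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs tasks strictly one after another; never in parallel.
async function runSequentially<T>(
  tasks: Array<() => Promise<T>>,
  opts: SequenceOptions = { gapMs: 3000, retryWaitMs: 45000, maxRetries: 2 },
): Promise<T[]> {
  const results: T[] = [];
  for (let i = 0; i < tasks.length; i++) {
    if (i > 0) await sleep(opts.gapMs);
    for (let attempt = 0; ; attempt++) {
      try {
        results.push(await tasks[i]());
        break;
      } catch (err) {
        const rateLimited = err instanceof RateLimitError && err.status === 429;
        // Anything other than a 429, or an exhausted retry budget, is surfaced to the caller.
        if (!rateLimited || attempt >= opts.maxRetries) throw err;
        await sleep(opts.retryWaitMs);
      }
    }
  }
  return results;
}
```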

When the user asks for an image, do this

Read the prompt carefully. If anything important is ambiguous (subject, style, aspect ratio, where to save), ask one short clarifying question — but only if it really matters. Reasonable defaults are: 1024x1024, quality=high, n=1, output saved as ./image.png in the current working directory.

Run the script via Bash. Do not redirect or discard stderr, so the user sees the timing and usage output:

~/.claude/skills/azure-gpt-image/scripts/azure-image.ts generate \
  --prompt "<carefully crafted prompt>" \
  --size 1024x1024 --quality high --out ./out.png

The stdout of the script is the path(s) of the written file(s) — quote them back to the user so they can find the result.
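When the CLI is driven from code rather than typed by hand, the argument list and the stdout handling can be sketched as below. `buildGenerateArgs` and `parseWrittenPaths` are hypothetical helpers; the flags are the ones shown in the command above, and the one-path-per-line stdout format is an assumption.

```typescript
interface GenerateOptions {
  prompt: string;
  size?: string;
  quality?: "low" | "medium" | "high" | "auto";
  out?: string;
}

// Builds the argv for the `generate` subcommand, applying the defaults stated above.
function buildGenerateArgs(opts: GenerateOptions): string[] {
  return [
    "generate",
    "--prompt", opts.prompt,
    "--size", opts.size ?? "1024x1024",
    "--quality", opts.quality ?? "high",
    "--out", opts.out ?? "./image.png",
  ];
}

// Assumption: the script prints one written file path per line on stdout.
function parseWrittenPaths(stdout: string): string[] {
  return stdout
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}
```

Passing the array to `execFile` from `node:child_process` (rather than building a shell string) avoids quoting problems when the prompt contains quotes or `$`.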

Crafting prompts

gpt-image-2 rewards specific, visually concrete prompts. Describe:

Subject — what is in the image
Composition / framing — close-up, wide shot, overhead, eye level, rule-of-thirds
Lighting — soft daylight, golden hour, studio softbox, neon
Style / medium — photorealistic, watercolor, isometric 3D, line art, etc.
Color / mood — pastel palette, high contrast, moody, vibrant
Negative cues sparingly — instead of "no text", describe what you do want

If the user gives a brief prompt, you may expand it before sending. If they give a detailed prompt, pass it through unchanged.

Sizes

For gpt-image-2, the most useful common sizes are:

1024x1024 — fastest square
1536x1024 — landscape
1024x1536 — portrait
2048x2048 — large square (slower, more tokens)

Arbitrary sizes are allowed if all of the following hold:

both edges are multiples of 16,
total pixels are 655,360–8,294,400,
aspect ratio ≤ 3:1,
longest edge ≤ 3840 px.

If the user asks for a non-conforming size, round to the nearest legal size and tell them what you used.
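The four constraints above can be checked mechanically before calling the API. The following is a minimal sketch; `checkSize` is a hypothetical helper, not part of the bundled CLI, and it only validates a size rather than rounding to the nearest legal one.

```typescript
interface SizeCheck {
  ok: boolean;
  problems: string[];
}

// Validates a requested size against the arbitrary-size rules listed above.
function checkSize(width: number, height: number): SizeCheck {
  if (!Number.isInteger(width) || !Number.isInteger(height) || width <= 0 || height <= 0) {
    return { ok: false, problems: ["width and height must be positive integers"] };
  }
  const problems: string[] = [];
  if (width % 16 !== 0 || height % 16 !== 0) {
    problems.push("both edges must be multiples of 16");
  }
  const pixels = width * height;
  if (pixels < 655_360 || pixels > 8_294_400) {
    problems.push("total pixels must be between 655,360 and 8,294,400");
  }
  const longEdge = Math.max(width, height);
  const shortEdge = Math.min(width, height);
  // Integer comparison avoids floating-point error at exactly 3:1.
  if (longEdge > 3 * shortEdge) {
    problems.push("aspect ratio must be at most 3:1");
  }
  if (longEdge > 3840) {
    problems.push("longest edge must be at most 3840 px");
  }
  return { ok: problems.length === 0, problems };
}
```

For example, `checkSize(1000, 1000)` fails only the multiple-of-16 rule, so 1008x1008 or 992x992 would be reasonable substitutes to report back to the user.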

Quality

low → fastest draft (good for exploration), medium → balanced, high → final assets, auto → model decides. Default to high unless the user is clearly iterating quickly. Token costs grow steeply with quality — for batch generation (--n 4+), consider medium first.

Editing and inpainting

When the user gives you an existing image and asks to change it, use edit:

~/.claude/skills/azure-gpt-image/scripts/azure-image.ts edit \
  --image ./photo.png --prompt "replace the sky with a sunset" \
  --out ./photo-sunset.png

For precise localized edits, supply a PNG mask with the same dimensions as the image and an alpha channel in which transparent pixels (alpha=0) mark the editable region. Opaque pixels are kept unchanged. The mask can be made in Photoshop or GIMP, or programmatically (e.g. with a small sharp or Pillow script).
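One way to build such a mask without installing an image library is to write the PNG by hand with Node's built-in `zlib`. This is a minimal sketch that assumes a single rectangular editable region; `maskPixels` and `encodeRgbaPng` are hypothetical names. The flag that passes the mask to the CLI is not stated here, so none is shown.

```typescript
import { deflateSync } from "node:zlib";

interface Rect { x: number; y: number; w: number; h: number; }

// RGBA pixels: alpha 0 inside `hole` (editable), alpha 255 elsewhere (kept).
function maskPixels(width: number, height: number, hole: Rect): Buffer {
  const rgba = Buffer.alloc(width * height * 4); // zero-filled: black, fully transparent
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      const inHole = x >= hole.x && x < hole.x + hole.w && y >= hole.y && y < hole.y + hole.h;
      rgba[(y * width + x) * 4 + 3] = inHole ? 0 : 255;
    }
  }
  return rgba;
}

// Standard CRC-32 lookup table, required by the PNG chunk format.
const CRC_TABLE = new Uint32Array(256).map((_, n) => {
  let c = n;
  for (let k = 0; k < 8; k++) c = c & 1 ? 0xedb88320 ^ (c >>> 1) : c >>> 1;
  return c >>> 0;
});

function crc32(bytes: Uint8Array): number {
  let c = 0xffffffff;
  for (const b of bytes) c = CRC_TABLE[(c ^ b) & 0xff] ^ (c >>> 8);
  return (c ^ 0xffffffff) >>> 0;
}

// A PNG chunk is: 4-byte length, 4-byte type, data, CRC-32 over type + data.
function chunk(type: string, data: Buffer): Buffer {
  const body = Buffer.concat([Buffer.from(type, "ascii"), data]);
  const length = Buffer.alloc(4);
  length.writeUInt32BE(data.length);
  const crc = Buffer.alloc(4);
  crc.writeUInt32BE(crc32(body));
  return Buffer.concat([length, body, crc]);
}

function encodeRgbaPng(width: number, height: number, rgba: Buffer): Buffer {
  const ihdr = Buffer.alloc(13); // compression, filter, interlace bytes stay 0
  ihdr.writeUInt32BE(width, 0);
  ihdr.writeUInt32BE(height, 4);
  ihdr[8] = 8; // bit depth
  ihdr[9] = 6; // colour type 6 = RGBA
  // Prefix every scanline with filter type 0 (none), as the PNG format requires.
  const stride = width * 4;
  const raw = Buffer.alloc((stride + 1) * height);
  for (let y = 0; y < height; y++) {
    rgba.copy(raw, y * (stride + 1) + 1, y * stride, (y + 1) * stride);
  }
  const signature = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);
  return Buffer.concat([
    signature,
    chunk("IHDR", ihdr),
    chunk("IDAT", deflateSync(raw)),
    chunk("IEND", Buffer.alloc(0)),
  ]);
}
```

Writing the result with `writeFileSync("mask.png", encodeRgbaPng(w, h, maskPixels(w, h, hole)))` from `node:fs` produces a mask whose dimensions must match the source image.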

--input-fidelity high preserves features (faces, logos, fine details) of the original more strongly. Use it for face edits and brand work; skip it for generic background changes.

To composite multiple sources into one image, repeat --image:

azure-image.ts edit --image hero.png --image product.png \
  --prompt "place the product on the desk in the hero shot" --out composite.png

Input images must be PNG or JPG, < 50 MB each.
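The input rules and the repeated `--image` flag can be sketched as below. `checkInputImage` and `buildEditArgs` are hypothetical helpers. Checking the file extension is only a proxy for the real format, accepting `.jpeg` alongside `.jpg` is an assumption, and so is reading "50 MB" as 50,000,000 bytes.

```typescript
import { extname } from "node:path";

// Assumption: "50 MB" is read as decimal megabytes, the stricter interpretation.
const MAX_INPUT_BYTES = 50 * 1000 * 1000;

interface InputImage {
  path: string;
  sizeBytes: number;
}

// Returns a problem description, or null when the image looks acceptable.
function checkInputImage(image: InputImage): string | null {
  const ext = extname(image.path).toLowerCase();
  if (![".png", ".jpg", ".jpeg"].includes(ext)) {
    return `${image.path}: input must be PNG or JPG`;
  }
  if (image.sizeBytes >= MAX_INPUT_BYTES) {
    return `${image.path}: input must be under 50 MB`;
  }
  return null;
}

// Builds the argv for `edit`, repeating --image once per source file.
function buildEditArgs(images: string[], prompt: string, out: string, highFidelity = false): string[] {
  if (images.length === 0) throw new Error("edit needs at least one input image");
  const args = ["edit"];
  for (const image of images) args.push("--image", image);
  args.push("--prompt", prompt, "--out", out);
  if (highFidelity) args.push("--input-fidelity", "high");
  return args;
}
```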

Batches

--n 2 through --n 10 generates multiple variations in one call. With --n > 1, pass --out-dir ./out/ instead of --out; the script writes image_0.png, image_1.png, ... If the user asks for "a few options" or "show me variations", default to --n 4.
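The choice between `--out` and `--out-dir`, and the file names to report back, can be sketched as below. `batchOutputArgs` and `expectedBatchFiles` are hypothetical helpers, and the `.png` extension assumes the default output format.

```typescript
// Picks the output flags: a single image uses --out, a batch uses --n with --out-dir.
function batchOutputArgs(n: number, out: string, outDir: string): string[] {
  if (!Number.isInteger(n) || n < 1 || n > 10) {
    throw new RangeError("n must be an integer from 1 to 10");
  }
  return n === 1 ? ["--out", out] : ["--n", String(n), "--out-dir", outDir];
}

// Lists the files a batch is expected to write: image_0.png, image_1.png, ...
function expectedBatchFiles(n: number, outDir: string): string[] {
  const dir = outDir.replace(/\/+$/, ""); // drop trailing slashes before joining
  return Array.from({ length: n }, (_, i) => `${dir}/image_${i}.png`);
}
```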

Streaming (and the 408 gateway timeout)

Azure's API Management (APIM) gateway closes idle connections after ~60–90 seconds. Long-running renders (--quality high, especially edit --input-fidelity high) regularly exceed that, and the request fails with HTTP 408 even though the model is still working. The documented fix is to stream: SSE events keep the connection warm, and the same final image arrives via the completed event.

The CLI handles this automatically: on any HTTP 408, it retries once with --stream enabled. You don't need to opt in. If you see warn: HTTP 408 from gateway — retrying with --stream, that's the recovery — don't downgrade quality.

You can still pass --stream up front for renders you expect to be slow (typically quality=high edits). Add --save-partials to persist the intermediate frames (<stem>.partial_0.png, etc.) if you want to show progress; otherwise the CLI only writes the final image.
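The retry-once-with-streaming behavior described above follows a simple pattern, sketched here for illustration. This is not the CLI's actual code; `GatewayError` and `withStreamFallback` are hypothetical names, and `attempt` stands in for whatever function performs the request.

```typescript
// Hypothetical error type carrying the HTTP status of a failed request.
class GatewayError extends Error {
  status: number;
  constructor(status: number, message: string) {
    super(message);
    this.status = status;
  }
}

// Runs `attempt` once; on HTTP 408 from a non-streaming call, retries exactly once with streaming.
async function withStreamFallback<T>(
  attempt: (stream: boolean) => Promise<T>,
  streamUpFront = false,
): Promise<T> {
  try {
    return await attempt(streamUpFront);
  } catch (err) {
    const timedOut = err instanceof GatewayError && err.status === 408;
    // A 408 while already streaming means the render itself is stuck, so do not loop.
    if (!timedOut || streamUpFront) throw err;
    console.warn("HTTP 408 from gateway: retrying once with streaming enabled");
    return attempt(true);
  }
}
```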

Output format

--output-format png (default) for graphics, transparency-capable, lossless
--output-format jpeg with --output-compression 80 for photos / web use

WebP is not supported on Azure (only on OpenAI direct).

Error handling

The script exits non-zero with the API's error body on failure. Common ones:

contentFilter — prompt or output was flagged. Reword more neutrally and try again. Do not bypass safety filters or coach the user around them.
DeploymentNotFound — AZURE_OPENAI_DEPLOYMENT doesn't match a deployment in this resource. Ask the user to confirm the name.
401/403 — bad/expired API key or wrong resource.
408 — apim gateway timeout on a long render. The CLI auto-retries once with --stream and that almost always succeeds. If it 408s a second time, the render itself is hanging; reduce --quality or --n, or try again later.
429 — rate limited (commonly 10 RPM on this resource). Wait 30–60s and retry. Don't issue parallel calls.

When an error fires, show the user the raw message (don't paraphrase loosely) and propose a concrete next step.
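The mapping from a failure to a concrete next step can be sketched as a small lookup. `CliFailure` and `nextStep` are hypothetical; how the status and error code are extracted from the script's output is an assumption, since the exact error body format is not described here.

```typescript
// Hypothetical, already-parsed view of a failed run.
interface CliFailure {
  status?: number; // HTTP status, if known
  code?: string;   // API error code, if present in the error body
  message: string; // raw message, shown to the user unchanged
}

// Suggests a next step for the common failures listed above.
function nextStep(failure: CliFailure): string {
  if (failure.code === "contentFilter") {
    return "Reword the prompt more neutrally and try again.";
  }
  if (failure.code === "DeploymentNotFound") {
    return "Confirm that AZURE_OPENAI_DEPLOYMENT matches a deployment in this resource.";
  }
  switch (failure.status) {
    case 401:
    case 403:
      return "Check that the API key is valid and belongs to this resource.";
    case 408:
      return "Reduce --quality or --n, or try again later.";
    case 429:
      return "Wait 30 to 60 seconds, then retry without parallel calls.";
    default:
      return "No known remedy; report the raw error message as-is.";
  }
}

// Keeps the raw message intact and appends the suggestion on its own line.
function describeFailure(failure: CliFailure): string {
  return `${failure.message}\nNext step: ${nextStep(failure)}`;
}
```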

Reference

See references/api.md for the full Azure REST contract — endpoint shapes, parameter table, response schema. Read it if the user asks about the API directly, wants to call it without this CLI, or needs a feature the CLI doesn't expose.
