From gemini-media
This skill should be used when the user asks to "generate an image", "create an image", "make a picture", "draw something", "edit an image", "modify this image", "change the background", "text to image", "generate with Gemini", "create a visual", "refine the image", "continue editing", "make it more", "add something to this image", or needs AI image generation, image editing, or multi-turn image refinement using the Gemini API.
How this skill is triggered — by the user, by Claude, or both
Slash command
/gemini-media:generate-imageThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Wrap the Gemini image generation REST API to produce, edit, and iteratively refine images via a Python script (stdlib only, no pip dependencies). Support text-to-image generation, image editing with reference images, multi-turn conversational editing, Google Search grounding, and automatic thinking mode. All output is saved to `./generated-images/` and auto-opened on macOS.
Wrap the Gemini image generation REST API to produce, edit, and iteratively refine images via a Python script (stdlib only, no pip dependencies). Support text-to-image generation, image editing with reference images, multi-turn conversational editing, Google Search grounding, and automatic thinking mode. All output is saved to ./generated-images/ and auto-opened on macOS.
Before any generation, verify the environment:
$GEMINI_API_KEY is set. If missing, instruct the user:
export GEMINI_API_KEY='your-key-here'python3 is available (Python 3.7+). The script uses only stdlib modules — no pip install needed.When the user requests a new image (no existing session or explicitly new subject):
Restate the user's request as a clear generation prompt. If the request is vague, ask for clarification before proceeding.
Use API defaults unless the user explicitly requests specific settings. Only pass --aspect-ratio and --resolution flags when the user asks for them. When omitted, the API applies its own per-model defaults (typically 1:1 aspect ratio and 1K resolution).
If the user asks about available options:
ALLOW_ALL, ALLOW_ADULT (default), ALLOW_NONE--seed and/or low --temperature for reproducible resultsMap model choices:
gemini-3.1-flash-image-previewgemini-3-pro-image-previewgemini-2.5-flash-imageAssess prompt complexity using the rubric in references/advanced-features.md. Count signal categories (multiple subjects, spatial words, text rendering, photo-realism, named styles, technical rendering, complex composition). Map the score:
--thinking-level none--thinking-level minimal--thinking-level highNote: Thinking is only supported on Flash 3.1 and Pro 3. For Flash 2.5, always use --thinking-level none.
CRITICAL — Thinking vs. Resolution incompatibility: When thinkingConfig is present in the API request, the Gemini API silently ignores the imageSize parameter, producing images at a lower default resolution (~1376x768 for 16:9). If the user requests a specific resolution (2K, 4K, or any explicit size like "Full HD"), you MUST use --thinking-level none to ensure the resolution is respected. The generate_image.py script enforces this automatically — if both --resolution and a non-none --thinking-level are provided, it forces thinking to none and logs a warning.
Enable --grounding when the prompt references real-world information: current events, real people, specific brands, named locations, or factual content. Otherwise omit.
python3 "${CLAUDE_PLUGIN_ROOT}/skills/generate-image/scripts/generate_image.py" generate \
--prompt "the final prompt" \
--model "gemini-3.1-flash-image-preview" \
--thinking-level none \
--output-dir "./generated-images"
Add --aspect-ratio "RATIO" and/or --resolution "RES" only if the user explicitly requested them. Add --grounding if grounding was decided.
The script outputs JSON to stdout. Parse it and report:
When the user provides file paths to existing images for editing or as reference:
test -f).--input-image for each file (up to 10 for Flash 3.1, 6 for Pro 3):python3 "${CLAUDE_PLUGIN_ROOT}/skills/generate-image/scripts/generate_image.py" generate \
--prompt "editing instruction" \
--input-image "/path/to/image1.jpg" \
--input-image "/path/to/image2.png" \
--model "gemini-3.1-flash-image-preview" \
--output-dir "./generated-images"
The script handles base64 encoding internally.
For iterative refinement of a previously generated image:
Before each generation, check for an active session:
python3 "${CLAUDE_PLUGIN_ROOT}/skills/generate-image/scripts/generate_image.py" session status
The output is JSON with exists, turn_count, last_prompt, and updated_at.
python3 "${CLAUDE_PLUGIN_ROOT}/skills/generate-image/scripts/generate_image.py" session reset
updated_at), ask: "Continue editing the previous image or start a new one?"On the first generation that should start a session:
python3 "${CLAUDE_PLUGIN_ROOT}/skills/generate-image/scripts/generate_image.py" session \
create --model "gemini-3.1-flash-image-preview"
Add --aspect-ratio and --resolution only if the user explicitly requested them.
Then invoke generate_image.py generate with the --session-file:
python3 "${CLAUDE_PLUGIN_ROOT}/skills/generate-image/scripts/generate_image.py" generate \
--prompt "initial prompt" \
--session-file "~/.cache/claude-generate-image/.session.json" \
--output-dir "./generated-images"
Do NOT re-ask settings — inherit from the session. Invoke:
python3 "${CLAUDE_PLUGIN_ROOT}/skills/generate-image/scripts/generate_image.py" generate \
--prompt "refinement instruction" \
--session-file "~/.cache/claude-generate-image/.session.json" \
--output-dir "./generated-images"
New input images can be added with --input-image on any turn.
| Model | Status | Best For | Limitations |
|---|---|---|---|
gemini-3.1-flash-image-preview (default) | Preview | Fast iteration, all features, Image Search grounding | Preview only |
gemini-3-pro-image-preview | Preview | Final deliverables, complex scenes, highest quality | No Image Search, no extreme ratios, no 512px |
gemini-2.5-flash-image | GA | Stable, production-ready generation | No thinking, no extreme ratios, no 512px |
Always use Flash 3.1 by default. Only switch models when the user explicitly requests a specific model (e.g., "use Pro 3", "use Flash 2.5"). Do not infer model choice from prompt complexity or quality keywords. Note: switching models mid-session requires a session reset.
Important: responseModalities must always be ["TEXT", "IMAGE"]. Image-only output ["IMAGE"] is not supported.
| Exit Code | Meaning | Action |
|---|---|---|
| 0 | Success | Report image path and text |
| 10 | Missing $GEMINI_API_KEY or dependency | Tell user what to set/install |
| 11 | Invalid input (bad path, unsupported format, >14 images) | Report the specific validation error |
| 20 | HTTP 400 — content policy or bad request | Show API error message, suggest rephrasing |
| 21 | HTTP 401/403 — auth failure | "API key is invalid or expired" |
| 22 | HTTP 429 — rate limited | Wait 10 seconds, retry once automatically. If still failing, tell user to wait. |
| 23 | HTTP 500+ — server error | Retry once automatically. If still failing, report. |
| 30 | No image in response | "Model didn't return an image — try rephrasing the prompt" |
On exit codes 22 and 23, retry the same command once before reporting failure.
generate_image.py generateCore API caller. Flags:
--prompt (required) — generation or editing prompt--model — model ID (default: gemini-3.1-flash-image-preview)--aspect-ratio — aspect ratio (optional; API default when omitted)--resolution — image size: 512px, 1K, 2K, 4K (optional; API default when omitted)--thinking-level — none, minimal, high (default: none). Not supported on Flash 2.5.--grounding — enable Google Search + Image Search grounding--person-generation — ALLOW_ALL, ALLOW_ADULT, or ALLOW_NONE--output-mime-type — image/png (default) or image/jpeg--compression-quality N — JPEG quality (1-100)--seed N — seed for deterministic generation--temperature F — creativity control (0.0-2.0)--input-image PATH — input image file (repeatable, max 14)--session-file PATH — session file for multi-turn--output-dir DIR — output directory (default: ./generated-images)generate_image.py sessionSession lifecycle. Subcommands: create, append, read, reset, status, set-last-output. See references/advanced-features.md for session schema.
references/api-reference.md — Full Gemini REST API schema: endpoint, request/response format, all aspect ratios and resolutions, error codes, MIME types.references/advanced-features.md — Thinking auto-detection rubric, thought signature handling, session schema, grounding attribution, model-specific behaviors, edge cases.npx claudepluginhub christian-schlichtherle/cs7-claude-plugins --plugin gemini-mediaGenerates AI images from text prompts, edits images, and composes from multiple references using Gemini models. Supports t2i, i2i, product mockups, and stickers.
Generates images from text, edits images with references, performs product placement, style transfer, and multi-image composition using OpenAI DALL-E or Google Gemini.
Generates and edits images via OpenRouter using FLUX and Gemini models. Use for photos, illustrations, artwork, and visual assets, not technical diagrams.