From sc-skills
Generates and edits images via Gemini API Python SDK. For text-to-image, editing, style transfers, logos, stickers, mockups, multi-turn refinement, and image composition.
Slash command: `/sc-skills:sc-gemini-imagegen`
Generate and edit images using Google's Gemini API. The SDK reads `GOOGLE_API_KEY` by default, with `GEMINI_API_KEY` as a fallback. Alternatively, pass a key explicitly to `genai.Client(api_key=...)`.
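The key lookup order described above can be mirrored in a small helper so that a missing key fails early with a clear message. This is a minimal sketch, not the SDK's own logic: `resolve_api_key` and `make_client` are hypothetical names, and treating an empty variable as unset is an assumption.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return GOOGLE_API_KEY if set, otherwise GEMINI_API_KEY, otherwise None."""
    # Assumption: an empty string counts as "unset" so it cannot mask the fallback.
    return env.get("GOOGLE_API_KEY") or env.get("GEMINI_API_KEY") or None


def make_client(api_key: Optional[str] = None):
    """Build a client from an explicit key or from the environment."""
    # Imported lazily so resolve_api_key() stays usable without the SDK installed.
    from google import genai

    key = api_key or resolve_api_key()
    if key is None:
        raise RuntimeError("Set GOOGLE_API_KEY or GEMINI_API_KEY, or pass api_key=...")
    return genai.Client(api_key=key)
```

Calling `make_client()` with no arguments behaves like `genai.Client()` when a key is present in the environment, but raises before any request is made when it is not.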
| Model | Codename | Best For |
|---|---|---|
| `gemini-2.5-flash-image` | Nano Banana | Most use cases; fast, good quality (default) |
| `gemini-3-pro-image-preview` | Nano Banana Pro | High-res (2K/4K), Google Search grounding, precise text |
| `gemini-3.1-flash-image-preview` | Nano Banana 2 | High volume, extended aspect ratios, 512 size |
Start with gemini-2.5-flash-image. Upgrade to Pro for high-res output or search grounding.
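The selection guidance above can be written down as a small function. This is only a sketch of one reasonable reading of the table: `pick_model` and its flag names are hypothetical, and the priority order (Pro needs first, then high volume) is an assumption rather than an official rule.

```python
DEFAULT_MODEL = "gemini-2.5-flash-image"
PRO_MODEL = "gemini-3-pro-image-preview"
FLASH_31_MODEL = "gemini-3.1-flash-image-preview"


def pick_model(
    high_res: bool = False,
    search_grounding: bool = False,
    precise_text: bool = False,
    high_volume: bool = False,
) -> str:
    """Map requirements to a model name, following the table above."""
    # Pro is listed for high-res output, search grounding, and precise text.
    if high_res or search_grounding or precise_text:
        return PRO_MODEL
    # 3.1 Flash is listed for high-volume workloads.
    if high_volume:
        return FLASH_31_MODEL
    # Everything else starts with the default model.
    return DEFAULT_MODEL
```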
Aspect ratios:
- All models: 1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9
- 3.1 Flash only: 1:4, 4:1, 1:8, 8:1

Image sizes:
- All models: 1K (default), 2K, 4K
- 3.1 Flash only: 512
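Checking options locally before a request can catch an unsupported combination early. The sketch below encodes only the values listed above; `check_image_options` is a hypothetical helper, and the API itself remains the authority on what each model accepts.

```python
COMMON_RATIOS = {"1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9"}
FLASH_31_RATIOS = {"1:4", "4:1", "1:8", "8:1"}
COMMON_SIZES = {"1K", "2K", "4K"}
FLASH_31_SIZES = {"512"}
FLASH_31_MODEL = "gemini-3.1-flash-image-preview"


def check_image_options(model: str, aspect_ratio: str, image_size: str) -> None:
    """Raise ValueError if the ratio or size is not listed for the given model."""
    extended = model == FLASH_31_MODEL
    ratios = COMMON_RATIOS | (FLASH_31_RATIOS if extended else set())
    sizes = COMMON_SIZES | (FLASH_31_SIZES if extended else set())
    if aspect_ratio not in ratios:
        raise ValueError(f"aspect_ratio {aspect_ratio!r} is not listed for {model}")
    if image_size not in sizes:
        raise ValueError(f"image_size {image_size!r} is not listed for {model}")
```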
from google import genai
from google.genai import types

client = genai.Client()  # Reads GOOGLE_API_KEY (or GEMINI_API_KEY fallback)

response = client.models.generate_content(
    model="gemini-2.5-flash-image",
    contents="Your prompt here",
)

for part in response.parts:
    if part.text is not None:
        print(part.text)
    elif part.inline_data is not None:
        image = part.as_image()
        image.save("output.jpg")  # save() takes path only, writes raw bytes
Note: response_modalities is optional. Omit it to let the model decide. Set ['IMAGE'] for image-only output, or ['TEXT', 'IMAGE'] for interleaved text and images.
response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=prompt,
    config=types.GenerateContentConfig(
        image_config=types.ImageConfig(
            aspect_ratio="16:9",
            image_size="2K",
        ),
    ),
)
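The `response_modalities` note and the `image_config` example above can be combined in one place. The sketch below builds a plain dictionary rather than `types.GenerateContentConfig`; it assumes the SDK accepts a dict with the same field names for `config`, and `build_config` is a hypothetical helper.

```python
from typing import Any, Dict, Optional


def build_config(
    image_only: bool = False,
    aspect_ratio: Optional[str] = None,
    image_size: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a config dict, leaving out every option the caller did not set."""
    config: Dict[str, Any] = {}
    if image_only:
        # Omitting response_modalities lets the model decide; ['IMAGE'] forces image-only.
        config["response_modalities"] = ["IMAGE"]
    image_config: Dict[str, str] = {}
    if aspect_ratio is not None:
        image_config["aspect_ratio"] = aspect_ratio
    if image_size is not None:
        image_config["image_size"] = image_size
    if image_config:
        config["image_config"] = image_config
    return config
```

Under that assumption, the result would be passed as `config=build_config(image_only=True, aspect_ratio="16:9", image_size="2K")` in the `generate_content` call.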
Chat mode is recommended for editing. The SDK handles thought signatures automatically across turns.
from google import genai
from PIL import Image

client = genai.Client()
image = Image.open("input.png")
chat = client.chats.create(model="gemini-2.5-flash-image")

# First edit
response = chat.send_message(["Add a sunset to this scene", image])
for i, part in enumerate(response.candidates[0].content.parts):
    if part.text is not None:
        print(part.text)
    elif part.inline_data is not None:
        edited = part.as_image()  # SDK image wrapper, distinct from the PIL input above
        edited.save(f"edited_{i}.jpg")

# Continue refining
response = chat.send_message("Make the colors warmer")
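A longer editing session is the same pattern repeated: send a prompt on the same chat, then save whatever images come back. The sketch below wraps that loop; `image_bytes_from` and `refine` are hypothetical helpers, and the `.jpg` extension assumes the JPEG output mentioned elsewhere on this page.

```python
from typing import Iterable, List


def image_bytes_from(response) -> List[bytes]:
    """Collect raw image bytes from every inline-data part of a response."""
    parts = response.candidates[0].content.parts
    return [p.inline_data.data for p in parts if p.inline_data is not None]


def refine(chat, prompts: Iterable[str], prefix: str = "refined") -> List[str]:
    """Send each prompt in order on one chat and save any returned images."""
    saved: List[str] = []
    for turn, prompt in enumerate(prompts, start=1):
        response = chat.send_message(prompt)
        for i, data in enumerate(image_bytes_from(response)):
            path = f"{prefix}_{turn}_{i}.jpg"
            with open(path, "wb") as fh:
                fh.write(data)  # raw bytes as returned; no format conversion
            saved.append(path)
    return saved
```

With the `chat` object created above, `refine(chat, ["Make the colors warmer", "Add light fog"])` would apply both edits in sequence and return the saved file paths.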
PIL Image objects, base64 bytes, and file URIs (via client.files.upload()) all work as image inputs.
Generate images informed by real-time data. Requires Pro model.
response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents="Visualize today's weather in Tokyo as an infographic",
    config=types.GenerateContentConfig(
        image_config=types.ImageConfig(
            aspect_ratio="16:9",
            image_size="1K",
        ),
        tools=[types.Tool(google_search=types.GoogleSearch())],
    ),
)
Image search grounding (searching for reference images) is only available on gemini-3.1-flash-image-preview.
Combine elements from multiple sources. Pass PIL Image objects directly in the contents list.
from PIL import Image

response = client.models.generate_content(
    model="gemini-2.5-flash-image",
    contents=[
        "Create a group photo of these people in an office",
        Image.open("person1.png"),
        Image.open("person2.png"),
        Image.open("person3.png"),
    ],
)
Limits differ by model:
Include camera details: lens type, lighting, angle, mood.
"A photorealistic close-up portrait, 85mm lens, soft golden hour light, shallow depth of field"
Specify style explicitly:
"A kawaii-style sticker of a happy red panda, bold outlines, cel-shading, white background"
Be explicit about font style and placement:
"Create a logo with text 'Daily Grind' in clean sans-serif, black and white, coffee bean motif"
Describe lighting setup and surface:
"Studio-lit product photo on polished concrete, three-point softbox setup, 45-degree angle"
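The tips above share a shape: a subject followed by explicit style, camera, lighting, and surface details. A small builder can keep prompts consistent across calls. This is an illustrative sketch; `build_prompt` and its parameter names are hypothetical, and the ordering of details is a convention chosen here, not something the API requires.

```python
from typing import Optional


def build_prompt(
    subject: str,
    style: Optional[str] = None,
    camera: Optional[str] = None,
    lighting: Optional[str] = None,
    background: Optional[str] = None,
) -> str:
    """Join the subject and any provided details into one comma-separated prompt."""
    pieces = [subject, style, camera, lighting, background]
    # Skip details that were not supplied or are blank.
    return ", ".join(p.strip() for p in pieces if p and p.strip())
```

For example, `build_prompt("A photorealistic close-up portrait", camera="85mm lens", lighting="soft golden hour light")` produces a prompt in the same style as the photorealism example above.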
The API returns JPEG in practice. image.save(path) writes raw bytes from the API response. It takes only a path string (no format kwarg).
# Save as-is (JPEG bytes from the API)
image.save("output.jpg")
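Because `save()` writes the bytes unchanged, the file extension is only a label. A defensive option is to check the leading bytes before choosing one. The sketch below uses the standard JPEG, PNG, and WebP signatures; `sniff_image_extension` is a hypothetical helper, not part of the SDK.

```python
def sniff_image_extension(data: bytes) -> str:
    """Guess a file extension from the leading magic bytes of an image."""
    if data.startswith(b"\xff\xd8\xff"):
        return ".jpg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return ".png"
    # WebP is a RIFF container with "WEBP" at byte offset 8.
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return ".webp"
    return ".bin"  # unknown format: keep the bytes, avoid a misleading extension
```

Given `data = part.inline_data.data`, the output path could then be built as `"output" + sniff_image_extension(data)` and the bytes written directly.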
To convert formats, use PIL on the raw bytes:
from PIL import Image
import io

for part in response.parts:
    if part.inline_data is not None:
        pil_img = Image.open(io.BytesIO(part.inline_data.data))
        pil_img.save("output.png")  # PIL handles the conversion
- `save(path)` writes raw bytes; no format kwarg exists. Use PIL for format conversion.
- `response_modalities` is optional; omit it to let the model decide the output format.
- `image_config` (only modality config)
- A `person_generation` parameter exists on `ImageConfig` for controlling person depiction in outputs.

npx claudepluginhub kylesnowschwartz/simpleclaude --plugin sc-skills