Skill

ai-studio-prototype-review

From skillry-optional-specialist

Use when you need to review AI studio prototypes, prompt tooling, agent playgrounds, and rapid product experiments.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/skillry-optional-specialist:80-ai-studio-prototype-review

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Review AI studio prototypes — built in Google AI Studio, OpenAI Playground, Anthropic Console, Claude.ai Projects, Vertex AI Studio, Amazon Bedrock console, or similar environments — for reproducibility gaps, API key exposure, rate limit risks, prompt completeness, data handling issues, and the delta between prototype behavior and what a production implementation would actually require. Prevent...

SKILL.md

155 lines · ~3.8k tokens

Stats

Language: Python

Parent stars: 0

Maintenance: Excellent

Last Commit: Jun 1, 2026

AI Studio Prototype Review

Purpose

When to use

A prototype built in an AI studio is being evaluated before a decision to productionize it.
Prototype demo results cannot be consistently reproduced and you need to identify why.
API credentials used in the prototype need review for scope and rotation risk before the prototype is shared externally.
A stakeholder wants a concrete estimate of what it would take to move from prototype to production.
Rate limits or token costs are becoming a concern during prototype testing at scale.
The prototype is being used in a presentation or shared with non-engineers who may inadvertently trigger rate limits or expose credentials.

When not to use

The system is already in production — use prompt-systems-review or ai-security-review for production reviews.
The prototype is a local Python script or Jupyter notebook, not built in an AI studio environment.
The question is about model evaluation methodology rather than prototype configuration (use llm-evaluation-review).
The prototype is purely exploratory research with no path to productionization and no real data involved.

Procedure

Document the prototype configuration. Record everything needed to reproduce the prototype exactly:

Platform: which AI studio (Google AI Studio / OpenAI Playground / Anthropic Console / other)
Model: exact name and version (gpt-4o-2024-08-06, claude-3-5-sonnet-20241022 — not just "GPT-4o" or "Claude")
System prompt: full text (or indicate if it is the platform default)
Temperature, top-p, max output tokens
Tool or function definitions enabled, if any
Any example conversations saved and used to influence responses

This configuration is the baseline for the review. If it cannot be reproduced from these parameters, the prototype is not ready for any downstream decision.
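A minimal sketch of a configuration record, assuming the review is tracked in Python. The class name `PrototypeConfig`, its field names, and the helper `missing_fields` are illustrative and not part of any studio's export format.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class PrototypeConfig:
    """Hypothetical record of the settings needed to reproduce a prototype."""
    platform: Optional[str] = None
    model: Optional[str] = None             # exact versioned name
    system_prompt: Optional[str] = None     # full text, or "PLATFORM_DEFAULT"
    temperature: Optional[float] = None
    top_p: Optional[float] = None
    max_output_tokens: Optional[int] = None
    tools: list = field(default_factory=list)                  # empty = none enabled
    example_conversations: list = field(default_factory=list)  # empty = none saved


def missing_fields(config):
    """Return the names of settings that were never recorded (still None)."""
    return [f.name for f in fields(config) if getattr(config, f.name) is None]


config = PrototypeConfig(
    platform="OpenAI Playground",
    model="gpt-4o-2024-08-06",
    temperature=0.7,
)
print(missing_fields(config))  # ['system_prompt', 'top_p', 'max_output_tokens']
```

Any non-empty result means the baseline is incomplete and the reproducibility tests should wait until the gaps are filled.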

Audit API key scope and exposure. Identify which API key is in use. Verify:

The key is scoped to minimum required permissions (read-only generation if the prototype only calls completion endpoints)
The key is not visible in any shared URL parameters, browser history, presentation screenshots, or recorded demo videos
The key has a usage limit set: a monthly token budget or request count cap that would prevent unexpected overage charges
The key has an expiry date or rotation schedule
If the key is in a shared notebook or tool used by multiple people, it must be rotated after the review because it is effectively compromised

Flag any key without a usage limit: an uncapped key used in a prototype that goes viral can generate thousands of dollars in charges within hours.
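One way to check shared notes, chat exports, or transcripts for keys passed as URL parameters is sketched below. The parameter names in `SUSPECT_PARAMS` are assumptions and should be adjusted to the tools actually in use. The function reports only the host and parameter name so the key itself is not copied into the review report.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Assumed parameter names; extend for the tools used by the prototype.
SUSPECT_PARAMS = {"api_key", "apikey", "key", "token", "access_token"}
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


def find_keys_in_urls(text):
    """Return (host, parameter_name) pairs for URLs that carry a suspect parameter."""
    findings = []
    for url in URL_PATTERN.findall(text):
        parts = urlsplit(url)
        for name in parse_qs(parts.query):
            if name.lower() in SUSPECT_PARAMS:
                findings.append((parts.netloc, name))
    return findings


notes = "Demo link: https://tool.example.com?api_key=sk-proj-abcdef (shared in chat)"
print(find_keys_in_urls(notes))  # [('tool.example.com', 'api_key')]
```

A text scan like this does not cover screenshots or recordings, which still need a manual check.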

Test reproducibility. Select 5-10 representative prompts that were used in the demo. Run each at least 3 times with the same configuration. Record:

Is the output schema consistent across runs (same fields, same structure)?
Are key facts and numbers consistent (or does the model produce different values on different runs)?
Is response length within a predictable range?
At temperature > 0.5, expect variability; document which aspects vary and confirm the variation is acceptable for the use case

A prototype that relied on a specific lucky output from a high-temperature run is not a reproducible prototype and is not ready for productionization evaluation.
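The schema-consistency check above can be sketched as follows, assuming the prototype is expected to return JSON and that the raw outputs from repeated runs have been saved as strings. The helper names are illustrative. Only structure (keys and value types) is compared, so differing values do not count as a schema change.

```python
import json


def schema_signature(value):
    """Reduce a parsed JSON value to its structure: keys and type names."""
    if isinstance(value, dict):
        return {key: schema_signature(val) for key, val in value.items()}
    if isinstance(value, list):
        # Use the first element as representative of the list's item type.
        return [schema_signature(value[0])] if value else []
    return type(value).__name__


def schema_consistent(raw_outputs):
    """True if every output parses as JSON and all share the same structure."""
    signatures = []
    for raw in raw_outputs:
        try:
            signatures.append(schema_signature(json.loads(raw)))
        except json.JSONDecodeError:
            return False
    return all(sig == signatures[0] for sig in signatures)


runs = [
    '{"total": 42, "items": ["a"]}',
    '{"total": 40, "items": ["b", "c"]}',
    '{"result": {"total": 42, "items": ["a"]}}',  # wrapper object appears
]
print(schema_consistent(runs[:2]))  # True
print(schema_consistent(runs))      # False
```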

Identify model-specific behaviors. Note any behavior in the prototype that depends on the specific model version. Confirm:

Is this exact model version available via the production API, or only in the studio UI?
Is this model in general availability, or preview/experimental (preview models can be retired or changed on short notice)?
What is the model's deprecation policy and timeline?

AI studio UIs frequently expose preview and experimental models before they are available on stable production endpoints. A prototype built on gemini-2.0-flash-thinking-exp will break when that model is retired. If the prototype uses a preview model, the productionization plan must account for model substitution and include a re-validation step.
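As a first-pass triage, a model name can be screened for markers that often indicate preview status. This is a heuristic sketch only: the marker list is an assumption, naming conventions differ between providers, and the provider's published model list and deprecation page remain the authority.

```python
# Assumed substrings that often indicate a non-GA model; not an official list.
PREVIEW_MARKERS = ("-exp", "-preview", "experimental", "-beta")


def looks_like_preview(model_name):
    """Heuristic: True if the model name contains a preview-style marker."""
    name = model_name.lower()
    return any(marker in name for marker in PREVIEW_MARKERS)


for model in ("gemini-2.0-flash-thinking-exp", "gpt-4-vision-preview", "gpt-4o-2024-08-06"):
    print(model, looks_like_preview(model))
```

A `False` result does not prove general availability; it only means the name carries no obvious warning sign.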

Review the system prompt for production readiness. Apply these checks to the system prompt:

Is it versioned? (A prompt in an AI studio UI with no version history cannot be rolled back)
Does it specify an output schema or format?
Are there instruction conflicts? (Test with edge-case inputs that could trigger conflicting instructions)
Does it rely on the AI studio's default context additions — like today's date, user name, or session metadata — that will not be present when the API is called directly?
Does it have a linked eval dataset that validates it produces the correct output format?

AI studio platforms often add implicit context that is not present when the same prompt is called via the raw API. What works in the studio may produce different output when called from code.

Assess rate limit exposure. Calculate the expected per-minute token consumption for the intended test or demo load:

tokens_per_minute = (average_prompt_tokens + average_output_tokens) × number_of_concurrent_users × requests_per_user_per_minute

Compare against the API key's rate limit tiers:

Requests per minute (RPM)
Tokens per minute (TPM)
Tokens per day (TPD)

If a stakeholder demo involves multiple people using the prototype simultaneously, calculate the token consumption for that scenario. Document what the user experience will be when the rate limit is hit (error message, blank response, retry behavior).
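A worked example of the per-minute demand calculation, as a sketch. All token counts and limits below are hypothetical; substitute the measured averages from the prototype and the actual limits for the key's tier.

```python
def demand_per_minute(avg_prompt_tokens, avg_output_tokens,
                      concurrent_users, requests_per_user_per_minute):
    """Return (requests per minute, tokens per minute) for a given load."""
    requests = concurrent_users * requests_per_user_per_minute
    tokens = (avg_prompt_tokens + avg_output_tokens) * requests
    return requests, tokens


def check_limits(requests, tokens, rpm_limit, tpm_limit):
    """Compare per-minute demand against the key's RPM and TPM limits."""
    return {"rpm_ok": requests <= rpm_limit, "tpm_ok": tokens <= tpm_limit}


# Hypothetical demo load: 15 users, 3 requests each per minute.
requests, tokens = demand_per_minute(1200, 400, 15, 3)
print(requests, tokens)  # 45 72000
print(check_limits(requests, tokens, rpm_limit=60, tpm_limit=40_000))
# {'rpm_ok': True, 'tpm_ok': False}
```

In this example the request count fits within the limit but the token throughput does not, which is why RPM and TPM should be checked separately.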

Identify productionization gaps. For each capability demonstrated in the prototype, document what is needed in a production implementation that the prototype does not have:

Capability	Prototype approach	Production requirement	Effort
Error handling	None — errors show as blank	Retry with backoff, user-friendly error message	Medium
Input validation	Accepts any input	Schema validation, length limits, content filtering	Medium
Output parsing	Human reads the output	Structured JSON parser with schema validation	Small
Latency	Single query, acceptable wait	Streaming responses or async queuing for concurrent users	Large
Cost	Pay per query, no tracking	Budget caps, per-user quotas, cost dashboards	Medium
Authentication	Shared API key	Per-user authentication, credential isolation	Large

Check for data handling issues. Confirm whether any real user data, PII, confidential business information, or regulated data was input into the AI studio during prototyping. AI studios retain conversation history for varying periods and may use it for model improvement depending on the platform's data agreement:

Google AI Studio: on the free tier, submitted data may be used for Google product and model improvement; paid usage is covered by different terms
OpenAI Playground: data use depends on the organization's data-sharing settings and agreement; confirm whether sharing for model improvement is enabled
Anthropic Console: check current data handling terms for the active tier
Vertex AI Studio / Amazon Bedrock: enterprise agreements generally provide stronger data protection

These policies change over time, so verify the terms in force when the data was entered. If real sensitive data was used, document what data, when, which platform, and what the platform's retention and use policy is for that tier.

Produce a go/no-go recommendation. Based on the review findings:

PROTOTYPE READY: configuration is reproducible, model version is production-available, no data handling violations, system prompt is production-ready, productionization gaps are documented with effort estimates
PROTOTYPE CONDITIONAL: specific issues must be resolved (listed) before any productionization investment
PROTOTYPE INVALID: fundamental design or data problem — rebuild before investing in productionization
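The three verdicts can be expressed as a small decision function. This is a sketch under stated assumptions: the `Findings` fields mirror the READY criteria listed above, and `fundamental_problem` is a hypothetical flag set by the reviewer's judgment, not something that can be derived automatically.

```python
from dataclasses import dataclass


@dataclass
class Findings:
    """Hypothetical summary of review results."""
    reproducible: bool
    model_production_available: bool
    data_violation: bool
    prompt_production_ready: bool
    gaps_documented: bool
    fundamental_problem: bool = False  # reviewer judgment call


def recommend(f):
    """Return (verdict, list of issues to resolve)."""
    if f.fundamental_problem:
        return "PROTOTYPE INVALID", ["fundamental design or data problem"]
    issues = []
    if not f.reproducible:
        issues.append("configuration is not reproducible")
    if not f.model_production_available:
        issues.append("model version is not production-available")
    if f.data_violation:
        issues.append("unresolved data handling violation")
    if not f.prompt_production_ready:
        issues.append("system prompt is not production-ready")
    if not f.gaps_documented:
        issues.append("productionization gaps lack effort estimates")
    if issues:
        return "PROTOTYPE CONDITIONAL", issues
    return "PROTOTYPE READY", []


print(recommend(Findings(True, False, False, True, True)))
# ('PROTOTYPE CONDITIONAL', ['model version is not production-available'])
```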

Checklist

Common issues & anti-patterns

Demo model not available on production API. The prototype uses gemini-2.0-flash-thinking-exp or gpt-4-vision-preview. The team plans to ship using that model. Experimental and preview models can be retired on short notice, sometimes within weeks of release. Always confirm that the exact model version is available on the production API endpoint before making any productionization decision. If it is not available, the prototype must be re-validated on a stable model.

High-temperature cherry-picking. Temperature is set to 1.2 during prototyping to get impressive, creative outputs. The demo shows one outstanding response. In testing, 70% of outputs at temperature 1.2 are inconsistent, off-format, or hallucinatory. The prototype "worked" because the team ran it 30 times and showed the best result. Document temperature in the configuration record and run each prompt multiple times during the review.

API key in the demo URL. The prototype uses a browser-based integration tool that accepts an API key as a URL parameter for convenience. The demo URL https://tool.example.com?api_key=sk-proj-abcdef is shared in the meeting chat, in the recording, and in a screenshot in the meeting notes. The key is now compromised and must be rotated immediately. Never use URL parameters for API keys.

No output schema means no parser. The prototype works because a human reads the output and extracts the relevant information by eye. The team plans to productionize. The prompt produces slightly different JSON structures — sometimes with a wrapper object, sometimes without — depending on the phrasing of the input. Writing a parser that handles all variations is a significant engineering effort. Productionization requires both a defined schema and a prompt that reliably produces it — this must be tested as part of the prototype, not deferred to the productionization phase.

Real patient or user data in the studio. The team builds a medical notes summarization prototype in Google AI Studio using de-identified but still sensitive clinical notes "just to see if it works." Google AI Studio's free tier data policy allows using conversation data for model improvement. The prototype conversation containing clinical notes is now subject to that policy. Review data handling policy before any prototype work with any real-world data, including data that has been de-identified.

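Rate limit math not done before a multi-user demo. Fifteen stakeholders are invited to test the prototype simultaneously. The API key has a 60 RPM rate limit. Each person makes 3 requests in the first 2 minutes. That is 45 requests over 2 minutes, well within the limit. But in minute 3, everyone submits at once, each firing 3 requests: 45 requests arrive within a few seconds. On paper that is still under 60 requests per minute, but providers may enforce the limit over shorter windows (a 60 RPM limit can behave like roughly 1 request per second), so much of the burst is rejected. The demo fails for a large share of users during the most important demonstration of the project. Calculate rate limit exposure, including burst behavior, before any multi-user demo.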

Platform-injected context not accounted for. Environments such as the Anthropic Console and Claude.ai Projects may inject the current date, user name, or project-specific context into the model's context automatically; the exact behavior varies by product and tier. The prototype relies on the model knowing today's date (injected by the platform) to perform date calculations. When the same prompt is called via the raw API without that injection, the date calculations fail. Test every prototype by calling the same prompt via the raw API (not the studio UI) to identify dependencies on platform-injected context.
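Once such a dependency is found, one possible remedy is to make the context explicit in the prompt sent from code. The sketch below assumes the dependency is on the current date and user name; the function name and the label wording are illustrative, and the result should be re-validated against the prototype's test prompts.

```python
from datetime import date


def with_explicit_context(system_prompt, today=None, user_name=None):
    """Append context the studio may have injected implicitly, so raw API calls match."""
    today = today or date.today()
    lines = [system_prompt.rstrip(), "", f"Current date: {today.isoformat()}"]
    if user_name:
        lines.append(f"User name: {user_name}")
    return "\n".join(lines)


prompt = with_explicit_context("You schedule follow-ups.", today=date(2026, 6, 1))
print(prompt)
```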

Required output

Produce an AI studio prototype review report with:

Prototype configuration — platform, exact model version, all parameters, system prompt status (full / summarized / default)
API key assessment — scope, exposure risk (clean/compromised), usage limit status, rotation date
Reproducibility results — prompts tested, runs per prompt, schema consistency verdict, key fact consistency verdict
Model version status — preview/experimental/GA, production API availability, deprecation timeline
System prompt readiness — versioned (yes/no), output schema (yes/no), platform-context dependencies found (yes/no), key issues
Rate limit assessment — current limits, calculated demand for demo and production loads, risk level (safe/at risk/will fail)
Productionization gap table — capability, prototype approach, production requirement, effort estimate (S/M/L/XL)
Data handling findings — whether real/sensitive data was used, platform tier, data policy summary, risk level
Go/no-go recommendation — READY / CONDITIONAL (conditions listed with acceptance criteria) / INVALID (reason and rebuild guidance)

Safety

Do not enter real PII, regulated data, or confidential business information into any AI studio environment during the review process itself.
If you discover the prototype was built using real user data, treat this as a data handling incident and escalate before completing the review.
Rotate any API key confirmed to be exposed in shared URLs, screenshots, presentation materials, or recorded demos — document the exposure scope before rotation.
Do not recommend productionizing a prototype that has an unresolved data handling violation, even under time pressure from stakeholders.
Do not use the prototype's AI studio session to test adversarial inputs or injection scenarios — use a separate isolated session with a restricted API key.
