From adk-streaming
Use this skill to build an ADK 2.0 vision streaming agent — live camera / screen frames analyzed in real-time via Gemini Live. Triggers on: "ADK vision agent", "ADK camera streaming", "ADK live video", "screen sharing agent ADK", "ADK frame analysis", "real-time vision ADK", "ADK live image input". Generates an agent that ingests JPEG frames over a streaming session and responds with text/audio commentary.
Slash command
/adk-streaming:vision-streaming-agent
Stream live camera or screen frames to a Gemini Live agent for real-time analysis.
```python
from google.adk.agents import LlmAgent
from google.adk.streaming import StreamingMode

root_agent = LlmAgent(
    name="vision_coach",
    model="gemini-2.5-flash-live",
    instruction=(
        "You are watching the user via their camera. "
        "Provide concise feedback as they perform the task. "
        "Only speak when there's something useful to say — don't narrate."
    ),
    streaming_mode=StreamingMode.BIDI,
    response_modalities=["AUDIO"],
    input_video_config={
        "frame_rate": 1,  # 1 fps is plenty for most tasks
        "max_resolution": "720p",
    },
)
```
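```javascript
// Assumes `ws` is an already-open WebSocket to the agent backend and that
// `arrayBufferToBase64` is a helper defined elsewhere on the page.
const video = document.querySelector("video");
const stream = await navigator.mediaDevices.getUserMedia({ video: true });
video.srcObject = stream;
await video.play(); // needed unless the <video> element has the autoplay attribute

const canvas = document.createElement("canvas");
const ctx = canvas.getContext("2d");

// Capture one frame per second and send it as a base64-encoded JPEG.
setInterval(() => {
  if (!video.videoWidth) return; // video metadata not loaded yet
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;
  ctx.drawImage(video, 0, 0);
  canvas.toBlob(blob => {
    if (!blob) return; // toBlob passes null when encoding fails
    blob.arrayBuffer().then(buf => {
      ws.send(JSON.stringify({
        type: "video_frame",
        mime: "image/jpeg",
        data_b64: arrayBufferToBase64(buf),
      }));
    });
  }, "image/jpeg", 0.7); // 0.7 is the JPEG quality setting
}, 1000);
```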
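
On the receiving side, each message sent by the browser code above has to be parsed and decoded back into JPEG bytes before it can be forwarded to the live session. The following is a minimal sketch of that step using only the Python standard library. The function name `decode_video_frame` and the validation rules are assumptions made for illustration and are not part of ADK; how the decoded bytes are handed to the agent is deliberately left out.

```python
import base64
import binascii
import json

# JPEG data starts with the SOI marker FF D8, followed by another FF byte.
JPEG_MAGIC = b"\xff\xd8\xff"


def decode_video_frame(message: str) -> bytes:
    """Parse one JSON "video_frame" message and return the raw JPEG bytes.

    Hypothetical helper: the field names (type, mime, data_b64) mirror the
    browser code in this document; everything else is an assumption.
    Raises ValueError for any malformed or unexpected message.
    """
    payload = json.loads(message)  # JSONDecodeError is a ValueError subclass
    if not isinstance(payload, dict):
        raise ValueError("message must be a JSON object")
    if payload.get("type") != "video_frame":
        raise ValueError(f"unexpected message type: {payload.get('type')!r}")
    if payload.get("mime") != "image/jpeg":
        raise ValueError(f"unsupported mime type: {payload.get('mime')!r}")
    try:
        data = base64.b64decode(payload["data_b64"], validate=True)
    except (KeyError, TypeError, binascii.Error) as exc:
        raise ValueError("missing or invalid data_b64 field") from exc
    if not data.startswith(JPEG_MAGIC):
        raise ValueError("payload does not look like a JPEG image")
    return data
```

Rejecting bad frames before they reach the model keeps a misbehaving client from consuming frame budget on data the agent cannot use.
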
| FPS | Use case | Cost |
|---|---|---|
| 0.5 | Static scene check | $ |
| 1 | Coaching, inspection | $$ |
| 2-4 | Sports, fast motion | $$$ |
| 5+ | Real-time gameplay | $$$$ |
Gemini Live charges per frame; start low and bump up if needed.
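
Because cost scales with the number of frames sent, a quick frame-count estimate can help when picking a rate from the table. The sketch below shows only that arithmetic and assumes a constant frame rate for the whole session. This document gives no per-frame price, so none is assumed here, and the function name `frames_per_session` is illustrative.

```python
import math


def frames_per_session(fps: float, duration_seconds: float) -> int:
    """Number of frames sent at a constant rate over one session."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    if duration_seconds < 0:
        raise ValueError("duration_seconds must not be negative")
    # Round up so a partial interval still counts as one frame sent.
    return math.ceil(fps * duration_seconds)


if __name__ == "__main__":
    ten_minutes = 10 * 60
    for fps in (0.5, 1, 2, 4, 5):
        frames = frames_per_session(fps, ten_minutes)
        print(f"{fps} fps for 10 min: {frames} frames")
```

Moving from 1 fps to 4 fps quadruples the frame count for the same session length, which is why a low starting rate is the safer default.
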
```python
LlmAgent(
    ...,
    response_modalities=["AUDIO"],
    input_audio_config={...},
    input_video_config={...},
)
```
The user speaks while the camera streams, and the agent integrates both inputs.
adk web event log)

- audio-streaming-agent to add voice
- gemini-live-bootstrap for the underlying transport setup

npx claudepluginhub healthcare-ai-consulting-llc/adk-2-toolkit --plugin adk-streaming