From dora-skills
Generates text and understands images using Qwen2.5 and InternVL language models within dora dataflow pipelines.
Slash command: /dora-skills:hub-llm
> LLMs and Vision-Language Models for text generation and image understanding
| Node | Install | Description | Type |
|---|---|---|---|
| dora-qwen | pip install dora-qwen | Qwen2.5 text LLM | LLM |
| dora-qwen2-5-vl | pip install dora-qwen2-5-vl | Qwen2.5-VL multimodal | VLM |
| dora-internvl | pip install dora-internvl | InternVL multimodal | VLM |
Qwen2.5 large language model for text generation.
- id: llm
  build: pip install dora-qwen
  path: dora-qwen
  inputs:
    text: input/text
  outputs:
    - text
Input: StringArray with prompt text
Output: StringArray with generated response
# Sending prompt
node.send_output("text", pa.array(["What is the capital of France?"]))
# Receiving response
response = event["value"][0].as_py()
Qwen2.5-VL vision-language model for image understanding.
- id: vlm
  build: pip install dora-qwen2-5-vl
  path: dora-qwen2-5-vl
  inputs:
    image:
      source: camera/image
      queue_size: 1 # Process latest image only
    text: whisper/text # Question/prompt
  outputs:
    - text # Response
  env:
    DEFAULT_QUESTION: "Describe the image in a very short sentence."
Inputs:
image: UInt8Array with metadata
metadata = {"width": 640, "height": 480, "encoding": "bgr8"}
text: StringArray with question (optional)
Output:
# text: StringArray
response = event["value"][0].as_py()
metadata = {"primitive": "text"} # for dora-rerun
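The image metadata above determines how large the UInt8Array should be. A minimal sketch of that arithmetic, assuming the frame arrives as a flat, row-major buffer with three bytes per pixel; the helper names and the `rgb8` entry are hypothetical and are not part of dora-qwen2-5-vl:

```python
# Hypothetical helpers, not part of dora-qwen2-5-vl: they check that a flat
# uint8 buffer matches the size implied by the image metadata.

# Bytes per pixel for the encodings assumed in this sketch.
# Only "bgr8" appears in the source; "rgb8" is an added assumption.
BYTES_PER_PIXEL = {"bgr8": 3, "rgb8": 3}


def expected_num_bytes(metadata: dict) -> int:
    """Return the buffer length implied by width, height, and encoding."""
    channels = BYTES_PER_PIXEL[metadata["encoding"]]
    return metadata["width"] * metadata["height"] * channels


def matches_metadata(buffer: bytes, metadata: dict) -> bool:
    """True when the flat buffer has exactly the expected number of bytes."""
    return len(buffer) == expected_num_bytes(metadata)


metadata = {"width": 640, "height": 480, "encoding": "bgr8"}
frame = bytes(expected_num_bytes(metadata))  # all-zero placeholder frame

print(expected_num_bytes(metadata))       # 921600
print(matches_metadata(frame, metadata))  # True
```

A check like this can catch a mismatch between the camera's configured resolution and the metadata sent alongside the frame before the buffer is reshaped.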
InternVL vision-language model.
- id: internvl
  build: pip install dora-internvl
  path: dora-internvl
  inputs:
    image: camera/image
    text: input/text
  outputs:
    - text
nodes:
  # Camera
  - id: camera
    build: pip install opencv-video-capture
    path: opencv-video-capture
    inputs:
      tick: dora/timer/millis/100
    outputs:
      - image
    env:
      IMAGE_WIDTH: 640
      IMAGE_HEIGHT: 480
  # Microphone
  - id: microphone
    build: pip install dora-microphone
    path: dora-microphone
    inputs:
      tick: dora/timer/millis/100
    outputs:
      - audio
  # Voice activity detection
  - id: vad
    build: pip install dora-vad
    path: dora-vad
    inputs:
      audio: microphone/audio
    outputs:
      - audio
  # Speech to text
  - id: whisper
    build: pip install dora-distil-whisper
    path: dora-distil-whisper
    inputs:
      input: vad/audio
    outputs:
      - text
    env:
      TARGET_LANGUAGE: english
  # Vision Language Model
  - id: vlm
    build: pip install dora-qwen2-5-vl
    path: dora-qwen2-5-vl
    inputs:
      image:
        source: camera/image
        queue_size: 1
      text: whisper/text
    outputs:
      - text
    env:
      DEFAULT_QUESTION: "What do you see in this image?"
  # Text to speech
  - id: tts
    build: pip install dora-kokoro-tts
    path: dora-kokoro-tts
    inputs:
      text: vlm/text
    outputs:
      - audio
  # Speaker
  - id: speaker
    build: pip install dora-pyaudio
    path: dora-pyaudio
    inputs:
      audio: tts/audio
  # Visualization
  - id: rerun
    build: pip install dora-rerun
    path: dora-rerun
    inputs:
      image: camera/image
      vlm_response:
        source: vlm/text
        metadata:
          primitive: "text"
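Every input in the dataflow above names its source as `node/output`, so a mistyped node id leaves an input unconnected. The sketch below checks that wiring on a trimmed copy of the pipeline written out as Python dicts rather than parsed from YAML. The helper names are hypothetical and this is not a dora API; treating any source that starts with `dora/` as built-in is also an assumption based on the `dora/timer/millis/100` entries.

```python
# Hypothetical check, not a dora API: find inputs whose source does not
# match any declared "node/output" pair.


def input_source(spec):
    """An input is either a 'node/output' string or a mapping with 'source'."""
    return spec["source"] if isinstance(spec, dict) else spec


def dangling_inputs(nodes):
    """Return (node_id, input_name, source) for every unresolved input."""
    declared = {
        f"{node['id']}/{output}"
        for node in nodes
        for output in node.get("outputs", [])
    }
    problems = []
    for node in nodes:
        for name, spec in node.get("inputs", {}).items():
            source = input_source(spec)
            if source.startswith("dora/"):  # assumed built-in (e.g. timers)
                continue
            if source not in declared:
                problems.append((node["id"], name, source))
    return problems


# Trimmed copy of the voice-driven VLM pipeline (rerun node omitted).
nodes = [
    {"id": "camera", "inputs": {"tick": "dora/timer/millis/100"}, "outputs": ["image"]},
    {"id": "microphone", "inputs": {"tick": "dora/timer/millis/100"}, "outputs": ["audio"]},
    {"id": "vad", "inputs": {"audio": "microphone/audio"}, "outputs": ["audio"]},
    {"id": "whisper", "inputs": {"input": "vad/audio"}, "outputs": ["text"]},
    {
        "id": "vlm",
        "inputs": {
            "image": {"source": "camera/image", "queue_size": 1},
            "text": "whisper/text",
        },
        "outputs": ["text"],
    },
    {"id": "tts", "inputs": {"text": "vlm/text"}, "outputs": ["audio"]},
    {"id": "speaker", "inputs": {"audio": "tts/audio"}},
]

print(dangling_inputs(nodes))  # []

# A deliberately mistyped source ("vml" instead of "vlm") is reported.
broken = nodes[:-2] + [{"id": "tts", "inputs": {"text": "vml/text"}, "outputs": ["audio"]}]
print(dangling_inputs(broken))  # [('tts', 'text', 'vml/text')]
```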
nodes:
  # Terminal input
  - id: terminal
    build: pip install terminal-input
    path: terminal-input
    outputs:
      - text
  # LLM
  - id: llm
    build: pip install dora-qwen
    path: dora-qwen
    inputs:
      text: terminal/text
    outputs:
      - text
  # Visualization
  - id: rerun
    build: pip install dora-rerun
    path: dora-rerun
    inputs:
      user_input:
        source: terminal/text
        metadata:
          primitive: "text"
      llm_response:
        source: llm/text
        metadata:
          primitive: "text"
nodes:
  - id: camera
    build: pip install opencv-video-capture
    path: opencv-video-capture
    inputs:
      tick: dora/timer/millis/100
    outputs:
      - image
  - id: yolo
    build: pip install dora-yolo
    path: dora-yolo
    inputs:
      image: camera/image
    outputs:
      - bbox
  - id: vlm
    build: pip install dora-qwen2-5-vl
    path: dora-qwen2-5-vl
    inputs:
      image:
        source: camera/image
        queue_size: 1
    outputs:
      - text
    env:
      DEFAULT_QUESTION: "Describe what you see and any notable objects."
  - id: rerun
    build: pip install dora-rerun
    path: dora-rerun
    inputs:
      image: camera/image
      detections: yolo/bbox
      description: vlm/text
import pyarrow as pa
# Send text prompt
text = "What is in this image?"
node.send_output("text", pa.array([text]), {"primitive": "text"})
# Receive text (inside the node's event loop)
if event["type"] == "INPUT":
    text = event["value"][0].as_py()
    print(f"Received: {text}")
For VLMs processing images, use queue_size: 1 to process only the latest frame:
inputs:
  image:
    source: camera/image
    queue_size: 1 # Drop old frames, process latest only
This prevents a backlog of stale frames from building up when inference is slower than the camera's frame rate.
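The effect of the queue size can be illustrated with a small simulation. This is a sketch of drop-oldest queue behaviour in general, not dora's implementation: `collections.deque(maxlen=...)` stands in for the input queue, and the function name and parameters are hypothetical.

```python
from collections import deque


def frames_processed(num_frames: int, frames_per_inference: int, queue_size: int):
    """Simulate a camera feeding a slow consumer through a drop-oldest queue.

    The consumer finishes one inference every `frames_per_inference` frames
    and then takes the oldest frame still waiting in the queue.
    Returns a list of (frame_used, newest_frame_available) pairs.
    """
    queue = deque(maxlen=queue_size)  # appending to a full deque drops the oldest item
    processed = []
    for frame in range(num_frames):
        queue.append(frame)
        if (frame + 1) % frames_per_inference == 0:
            processed.append((queue.popleft(), frame))
    return processed


def lag(pairs):
    """How many frames behind the newest frame each inference was."""
    return [newest - used for used, newest in pairs]


# Camera produces 9 frames; inference takes as long as 3 frames.
latest_only = frames_processed(9, 3, queue_size=1)
buffered = frames_processed(9, 3, queue_size=10)

print(latest_only, lag(latest_only))  # [(2, 2), (5, 5), (8, 8)] [0, 0, 0]
print(buffered, lag(buffered))        # [(0, 2), (1, 5), (2, 8)] [2, 4, 6]
```

With a queue size of 1 each inference runs on the newest frame, while the larger queue falls further behind on every step.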
npx claudepluginhub zhanghandong/dora-skills --plugin dora-skills