Skill

dspy-adapters-multimodal

Guides selection of DSPy adapters (Chat, JSON, XML, TwoStep) and handling of image, audio, and file inputs using typed primitives.

Python

ai-ml

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/dspy-skills:dspy-adapters-multimodal

User invocable

Model invocable

Inline context

Default effort

Tool Access

This skill is limited to the following tools:

Read, Write, Glob, Grep

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Choose an adapter deliberately and model image, audio, and file inputs with DSPy's typed primitives.

SKILL.md

108 lines · ~898 tokens

Stats

Language: Python

Stars: 78

Forks: 10

Maintenance: Good

Last Commit: Jun 1, 2026

DSPy Adapters and Multimodal I/O

Goal

Choose an adapter deliberately and model image, audio, and file inputs with DSPy's typed primitives.

Adapter Selection

Adapter	Use it for
`dspy.ChatAdapter()`	Default, human-readable field markers, broad model compatibility
`dspy.JSONAdapter()`	Structured JSON output and native function calling where supported
`dspy.XMLAdapter()`	XML-tagged fields when XML is easier for the target LM to follow
`dspy.TwoStepAdapter(extraction_model=...)`	A separate extraction pass, run by a second LM passed as `extraction_model`, when parsing needs extra help
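To make the difference between the output styles concrete, here is a minimal sketch of what "parsing fields out of a completion" looks like for each one. It uses only the standard library. The three parser functions are hypothetical stand-ins written for illustration, not DSPy's own parsers; the only details taken from DSPy are the `[[ ## field ## ]]` markers of ChatAdapter, the JSON object of JSONAdapter, and the per-field tags of XMLAdapter.

```python
import json
import re

# Illustrative stand-ins only: these are NOT DSPy's parsers.
CHAT_MARKER = re.compile(r"\[\[ ## (\w+) ## \]\]")
XML_FIELD = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)


def parse_chat_style(text: str) -> dict:
    """Split on [[ ## field ## ]] markers and map each field to its text."""
    # With one capture group, split() yields [preamble, name1, body1, name2, body2, ...].
    parts = CHAT_MARKER.split(text)
    fields = {}
    for name, body in zip(parts[1::2], parts[2::2]):
        if name != "completed":  # end-of-output marker, not a real field
            fields[name] = body.strip()
    return fields


def parse_json_style(text: str) -> dict:
    """Treat the whole completion as one JSON object keyed by field name."""
    return json.loads(text)


def parse_xml_style(text: str) -> dict:
    """Collect every <field>value</field> pair found in the completion."""
    return {name: body.strip() for name, body in XML_FIELD.findall(text)}


chat_output = "[[ ## answer ## ]]\nParis\n\n[[ ## completed ## ]]"
json_output = '{"answer": "Paris"}'
xml_output = "<answer>Paris</answer>"

print(parse_chat_style(chat_output))
print(parse_json_style(json_output))
print(parse_xml_style(xml_output))
```

All three calls recover the same `answer` field; what changes is the format the LM has to produce, which is the main thing to weigh when picking an adapter for a given model.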

Configure globally or for a limited scope:

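import dspy

# Global default: every module uses JSONAdapter unless overridden.
dspy.configure(
    lm=dspy.LM("openai/gpt-4o-mini"),
    adapter=dspy.JSONAdapter(),
)

# Scoped override: XMLAdapter applies only inside this block.
with dspy.context(adapter=dspy.XMLAdapter()):
    result = dspy.Predict("question -> answer")(question="What is DSPy?")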

Native Function Calling

JSONAdapter enables native function calling by default. ChatAdapter keeps text parsing by default. Override either behavior explicitly:

chat_native = dspy.ChatAdapter(use_native_function_calling=True)
json_manual = dspy.JSONAdapter(use_native_function_calling=False)

DSPy falls back to manual parsing when the configured LM does not support native function calling.
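The interaction between the adapter flag and the LM's capability can be sketched as a small decision function. This is an assumption-laden illustration of the behavior described above, not DSPy's code: `ToolCallPlan` and `plan_tool_calls` are hypothetical names, and `lm_supports_native` stands in for whatever capability check the library performs internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCallPlan:
    mode: str    # "native" or "text"
    reason: str  # human-readable explanation of the choice


def plan_tool_calls(use_native_function_calling: bool, lm_supports_native: bool) -> ToolCallPlan:
    """Hypothetical helper: decide how tool calls would be produced and parsed."""
    if not use_native_function_calling:
        return ToolCallPlan("text", "adapter is configured for text parsing")
    if not lm_supports_native:
        # Mirrors the documented fallback when the LM lacks native support.
        return ToolCallPlan("text", "LM lacks native function calling; falling back")
    return ToolCallPlan("native", "adapter and LM both support native calls")


# JSONAdapter-like default (flag on) against an LM without native support.
print(plan_tool_calls(use_native_function_calling=True, lm_supports_native=False))
```

Native calls are used only when both the flag and the LM allow it; every other combination ends in text parsing.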

Image Inputs

class DescribeImage(dspy.Signature):
    image: dspy.Image = dspy.InputField()
    description: str = dspy.OutputField()

describe = dspy.Predict(DescribeImage)
result = describe(image=dspy.Image("./diagram.png"))

Pass a local path, HTTP URL, bytes, PIL image, or existing data URI directly to dspy.Image(...).
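For reference, a data URI is just a MIME type plus a base64 payload. The sketch below builds one from a local file with the standard library, which can help when debugging what is actually sent to a provider. `to_data_uri` and `is_data_uri` are hypothetical helper names; `dspy.Image(...)` handles local paths on its own, and its internal encoding may differ in detail.

```python
import base64
import mimetypes
from pathlib import Path


def to_data_uri(path: str) -> str:
    """Encode a local file as data:<mime>;base64,<payload>."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None:
        raise ValueError(f"Cannot infer a MIME type for {path!r}")
    payload = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{payload}"


def is_data_uri(value: str) -> bool:
    """Cheap check for an already-encoded base64 data URI."""
    return value.startswith("data:") and ";base64," in value
```

A value that already passes `is_data_uri` needs no further encoding, which matches the "existing data URI" input case listed above.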

Audio and File Inputs

class SummarizeAudio(dspy.Signature):
    audio: dspy.Audio = dspy.InputField()
    summary: str = dspy.OutputField()

audio = dspy.Audio.from_file("./meeting.wav")
result = dspy.Predict(SummarizeAudio)(audio=audio)
print(result.summary)  # output fields are attributes of the returned Prediction

class SummarizeFile(dspy.Signature):
    file: dspy.File = dspy.InputField()
    summary: str = dspy.OutputField()

document = dspy.File.from_path("./research.pdf")
result = dspy.Predict(SummarizeFile)(file=document)
print(result.summary)  # output fields are attributes of the returned Prediction

Provider capabilities vary. Verify that the selected model accepts the media type before deployment.
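One way to make that verification explicit is a pre-flight check that runs before any request is sent. The sketch below is an assumption, not part of DSPy: the model names and the `MODEL_CAPABILITIES` table are hypothetical placeholders that you would fill in from your provider's documentation, and `media_kind` classifies files by guessed MIME type only.

```python
import mimetypes

# Hypothetical capability table; replace with facts from your provider's docs.
MODEL_CAPABILITIES = {
    "example/text-only-model": set(),
    "example/vision-model": {"image"},
    "example/omni-model": {"image", "audio", "file"},
}


def media_kind(path: str) -> str:
    """Classify a path as 'image', 'audio', or generic 'file' by MIME type."""
    mime, _ = mimetypes.guess_type(path)
    if mime and mime.startswith("image/"):
        return "image"
    if mime and mime.startswith("audio/"):
        return "audio"
    return "file"


def check_supported(model: str, path: str) -> str:
    """Return the media kind, or raise if the model is unknown or cannot accept it."""
    if model not in MODEL_CAPABILITIES:
        raise KeyError(f"No capability entry for model {model!r}")
    kind = media_kind(path)
    if kind not in MODEL_CAPABILITIES[model]:
        raise ValueError(f"{model!r} is not listed as accepting {kind} inputs")
    return kind
```

Failing fast here gives a clearer error than a provider-side rejection at request time.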

Best Practices

Start with ChatAdapter; switch only for a measured reason.
Use typed signatures for structured output.
Test adapter behavior against the exact production model.
Avoid the deprecated Image.from_file() and Image.from_url() helpers; call dspy.Image(...) directly.
Keep local file handling and uploaded file IDs within provider policy.
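A "measured reason" for switching adapters can be as simple as a parse success rate over completions collected from the production model. The sketch below is a minimal, library-free way to compute one; `parse_success_rate` is a hypothetical helper, and the sample completions are made up for illustration rather than taken from any real model.

```python
import json
from typing import Callable, Iterable, Set


def parse_success_rate(
    completions: Iterable[str],
    parser: Callable[[str], object],
    required_fields: Set[str],
) -> float:
    """Fraction of completions that parse to a dict containing every required field."""
    items = list(completions)
    if not items:
        raise ValueError("Need at least one completion to measure")
    ok = 0
    for text in items:
        try:
            fields = parser(text)
        except Exception:
            continue  # a parse failure counts against the adapter
        if isinstance(fields, dict) and required_fields <= set(fields):
            ok += 1
    return ok / len(items)


# Made-up completions standing in for logged production outputs.
samples = [
    '{"answer": "Paris"}',
    "not json",
    '{"other": 1}',
    '{"answer": "x", "extra": 2}',
]
print(parse_success_rate(samples, json.loads, {"answer"}))
```

Running the same samples through a parser for each candidate format gives a like-for-like number to compare before changing the default.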

Related Skills

Design signatures: dspy-signature-designer
Build tool agents: dspy-react-agent-builder

Official Documentation

Adapters guide: https://dspy.ai/learn/programming/adapters/
Tools guide: https://dspy.ai/learn/programming/tools/
XMLAdapter API: https://dspy.ai/api/adapters/XMLAdapter/
Image API: https://dspy.ai/api/primitives/Image/
Audio API: https://dspy.ai/api/primitives/Audio/
