From nanoclaw-skills
Adds image vision to NanoClaw WhatsApp agents: downloads attachments, resizes with Sharp, base64-encodes, and sends to Claude as multimodal content blocks.
Slash command
/nanoclaw-skills:add-image-vision
Adds the ability for NanoClaw agents to see and understand images sent via WhatsApp. Images are downloaded, resized with sharp, saved to the group workspace, and passed to the agent as base64-encoded multimodal content blocks.
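To make the last step of that pipeline concrete, here is a minimal shell sketch of what a base64-encoded image content block looks like. It is an illustration only: the skill itself does this work in src/image.ts (with resizing via sharp, which is omitted here), and the function name image_block is hypothetical.

```shell
# Hypothetical helper: print a base64 image content block for one file.
# $1 = path to an image file, $2 = media type (defaults to image/jpeg).
image_block() {
  # Encode the file and strip line wraps so the data is a single string.
  data=$(base64 < "$1" | tr -d '\n')
  printf '{"type":"image","source":{"type":"base64","media_type":"%s","data":"%s"}}\n' \
    "${2:-image/jpeg}" "$data"
}
```

Calling image_block photo.jpg prints one JSON object that can sit next to text blocks in a message's content array.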
src/image.ts exists: skip to Phase 3 if already applied
sharp is installable (native bindings require build tools)
Prerequisite: WhatsApp must be installed first (skill/whatsapp merged). This skill modifies WhatsApp channel files.
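The "already applied" check above could be scripted roughly as follows. This is a sketch, not part of the skill: the function name image_vision_applied is hypothetical, and it only tests for the presence of src/image.ts under a given repository root.

```shell
# Hypothetical helper: succeeds when src/image.ts exists under the given root.
# $1 = repository root (defaults to the current directory).
image_vision_applied() {
  [ -f "${1:-.}/src/image.ts" ]
}

if image_vision_applied .; then
  echo "src/image.ts found: skip to Phase 3"
else
  echo "src/image.ts not found: continue with the merge"
fi
```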
git remote -v
If whatsapp is missing, add it:
git remote add whatsapp https://github.com/qwibitai/nanoclaw-whatsapp.git
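The check-then-add step can also be folded into one idempotent helper. The sketch below assumes a Git version that provides git remote get-url; the function name ensure_remote is hypothetical.

```shell
# Hypothetical helper: add a remote only when it is not configured yet.
# $1 = remote name, $2 = remote URL.
ensure_remote() {
  if git remote get-url "$1" >/dev/null 2>&1; then
    echo "remote $1 already configured"
  else
    git remote add "$1" "$2"
  fi
}
```

Running ensure_remote whatsapp https://github.com/qwibitai/nanoclaw-whatsapp.git from the repository root leaves an existing whatsapp remote untouched and adds it otherwise.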
git fetch whatsapp skill/image-vision
git merge whatsapp/skill/image-vision || {
  # On conflict, take the incoming package-lock.json and finish the merge
  git checkout --theirs package-lock.json
  git add package-lock.json
  git merge --continue
}
This merges in:
src/image.ts (image download, resize via sharp, base64 encoding)
src/image.test.ts (8 unit tests)
src/channels/whatsapp.ts
src/index.ts and src/container-runner.ts
container/agent-runner/src/index.ts
sharp npm dependency in package.json
If the merge reports conflicts, resolve them by reading the conflicted files and understanding the intent of both sides.
npm install
npm run build
npx vitest run src/image.test.ts
All tests must pass and build must be clean before proceeding.
Rebuild the container (agent-runner changes need a rebuild):
./container/build.sh
Sync agent-runner source to group caches:
for dir in data/sessions/*/agent-runner-src/; do
  # Skip the unexpanded pattern when no session cache exists yet
  [ -d "$dir" ] || continue
  cp container/agent-runner/src/*.ts "$dir"
done
Restart the service:
launchctl kickstart -k gui/$(id -u)/com.nanoclaw
tail -50 groups/*/logs/container-*.log
Run npm ls sharp to verify that sharp is installed.