From claude-skills
Extracts transcripts from short-form video URLs and files using Supadata MCP (primary) or local whisper (fallback). Use when the user shares a video URL from Instagram Reels, Facebook videos/reels, Threads video posts, TikTok, Twitter/X, or uploads any video file (mp4, mov, webm). Trigger on "transcribe this reel", "extract transcript from reel", "what does this reel say", "transcribe this video", or any instagram.com/reel/*, facebook.com/watch/*, facebook.com/reel/*, threads.net/*/post/*, tiktok.com/*, x.com/*/status/* URL. Pairs with knowledge-capture to push transcripts to the GitHub repo.
How this skill is triggered — by the user, by Claude, or both
Slash command
/claude-skills:reel-transcriptThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Extract transcripts from Instagram Reels, Facebook videos, Threads posts, TikToks, X/Twitter videos, and uploaded video files.
Extract transcripts from Instagram Reels, Facebook videos, Threads posts, TikToks, X/Twitter videos, and uploaded video files.
ffmpeg, faster-whisper, yt-dlp in the containerinstagram.com/reel/* or instagram.com/p/*)facebook.com/watch/*, facebook.com/reel/*, fb.watch/*)threads.net/@*/post/*)tiktok.com/@*/video/*)x.com/*/status/*, twitter.com/*/status/*)URL provided -> Go to Step 2 (Supadata path) File uploaded -> Go to Step 4 (Local whisper path)
Use the Supadata:supadata_metadata tool to get video context:
Tool: Supadata:supadata_metadata
Parameters:
url: "<video_url>"
This returns: platform, author (username, displayName), description, tags, duration, stats (views, likes, comments), thumbnail URL, and creation date.
Use the Supadata:supadata_transcript tool:
Tool: Supadata:supadata_transcript
Parameters:
url: "<video_url>"
By default (no text param), this returns timestamped segments. Set text: true only when you want flat text.
Parameters:
url (required): The video URLtext (optional): Set to true to get plain text string. Omit or set false to get timestamped segments.chunkSize (optional): Max characters per segment. Higher values merge small segments together. Cannot split below the platform's native caption granularity (~5s segments).lang (optional): Language code (e.g., "en", "vi") to request a specific language transcriptmode (optional): Set to "auto" to force AI transcription when platform captions are unavailableResponse formats:
Timestamped segments (default, text omitted or false):
{
"lang": "en",
"availableLangs": ["en"],
"content": [
{"text": "First sentence...", "offset": 142, "duration": 4681},
{"text": "Second sentence...", "offset": 4883, "duration": 4761}
]
}
offset: start time in milliseconds from video startduration: segment length in millisecondsPlain text (text: true):
{
"lang": "en",
"availableLangs": ["en"],
"content": "The full transcript text..."
}
Async (longer videos, returns a job ID):
{
"jobId": "abc123"
}
If you get a jobId, poll with Supadata:supadata_check_transcript_status:
Tool: Supadata:supadata_check_transcript_status
Parameters:
id: "<jobId>"
Poll every 3-5 seconds until status is "completed" or "failed".
Credit usage: 1 credit per transcript. AI-generated transcripts (when no captions exist) cost 2 credits per minute of video. The free tier has 100 credits/month.
If Supadata fails, fall through to Step 4 (local whisper). Common failure reasons:
For uploaded files or when Supadata fails:
If the input is a URL and Supadata failed, try yt-dlp:
yt-dlp --no-check-certificates -f "best[ext=mp4]" -o "/tmp/reel-%(id)s.%(ext)s" "URL" 2>&1
Note: yt-dlp works for TikTok, X/Twitter, Reddit. It will fail (403) for all Meta platforms (Instagram, Facebook, Threads).
If the input is an uploaded file, find it at /mnt/user-data/uploads/.
Run the transcription script:
python3 /home/claude/reel-transcribe.py <video_path> [model_size]
If the script doesn't exist, create it (see "Helper Script" section below).
If both Supadata and local download fail (Meta URLs without file upload): Tell the user: "I couldn't fetch the transcript remotely. Save the video from the app and upload it here, and I'll transcribe it locally."
Default: use timestamped segments (omit text param). Format with human-readable timestamps:
**Transcript** (@username on Instagram, 14.7s)
[0:00] First sentence of the reel...
[0:04] Second sentence continues here...
[0:09] And so on through the video...
---
Language: en | Words: ~52 | Duration: 14.7s
Tags: #letthem #melrobbins
Convert offset from milliseconds to M:SS format: Math.floor(offset/60000) for minutes, Math.floor((offset%60000)/1000) for seconds.
When to use text: true instead:
Include metadata when available (author, platform, duration, tags).
If the user says "capture this" or "save this", use the Github MCP Worker:push_note tool:
Tool: Github MCP Worker:push_note
Parameters:
title: "[Reel topic/summary]"
content: "[formatted transcript with metadata and source attribution]"
tags: ["reel", "platform-name", "topic-tag"]
source: "[Platform] video by @[username]"
If /home/claude/reel-transcribe.py doesn't exist, create it:
#!/usr/bin/env python3
"""
Reel/video transcription pipeline (local fallback).
Input: video file path (mp4, mov, etc.)
Output: JSON with transcript segments + full text
"""
import sys
import json
import subprocess
import tempfile
import os
def extract_audio(video_path, audio_path):
cmd = [
"ffmpeg", "-i", video_path,
"-vn", "-acodec", "pcm_s16le",
"-ar", "16000", "-ac", "1", "-y",
audio_path
]
result = subprocess.run(cmd, capture_output=True, text=True)
if result.returncode != 0:
raise RuntimeError(f"ffmpeg failed: {result.stderr}")
def transcribe(audio_path, model_size="tiny"):
from faster_whisper import WhisperModel
model = WhisperModel(model_size, device="cpu", compute_type="int8")
segments, info = model.transcribe(audio_path, beam_size=5)
result_segments = []
full_text_parts = []
for segment in segments:
result_segments.append({
"start": round(segment.start, 2),
"end": round(segment.end, 2),
"text": segment.text.strip()
})
full_text_parts.append(segment.text.strip())
return {
"language": info.language,
"language_probability": round(info.language_probability, 3),
"duration_seconds": round(info.duration, 1),
"segment_count": len(result_segments),
"word_count": len(" ".join(full_text_parts).split()),
"segments": result_segments,
"full_text": " ".join(full_text_parts)
}
def main():
if len(sys.argv) < 2:
print(json.dumps({"error": "Usage: python3 reel-transcribe.py <video_path> [model_size]"}))
sys.exit(1)
video_path = sys.argv[1]
model_size = sys.argv[2] if len(sys.argv) > 2 else "tiny"
if not os.path.exists(video_path):
print(json.dumps({"error": f"File not found: {video_path}"}))
sys.exit(1)
with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
audio_path = tmp.name
try:
extract_audio(video_path, audio_path)
result = transcribe(audio_path, model_size)
print(json.dumps(result, indent=2))
finally:
if os.path.exists(audio_path):
os.unlink(audio_path)
if __name__ == "__main__":
main()
Supadata also supports AI-powered structured extraction from videos. Use Supadata:supadata_extract to pull structured data beyond just the transcript:
Tool: Supadata:supadata_extract
Parameters:
url: "<video_url>"
prompt: "Extract the main advice, who is speaking, and any books or resources mentioned"
schema: {
"type": "object",
"properties": {
"speaker": { "type": "string" },
"main_advice": { "type": "string" },
"resources_mentioned": { "type": "array", "items": { "type": "string" } },
"tone": { "type": "string", "enum": ["motivational", "educational", "casual", "professional"] }
}
}
This is async. Use Supadata:supadata_check_extract_status to poll for results.
No speech detected: Supadata returns minimal or empty content. Tell the user the video appears to have no speech (music-only, ambient, etc.).
Non-English audio: Supadata auto-detects language. Report the detected language. Use lang parameter if the user wants a specific language.
Supadata rate limit: Free tier is 1 req/sec, paid is 10/sec. If rate-limited, wait briefly and retry, or fall back to local whisper.
Private or deleted videos: Supadata can only access public videos. Tell the user the video isn't accessible and suggest uploading the file directly.
Long videos (>5 min): Supadata returns a jobId for async processing. Poll supadata_check_transcript_status until complete. For local fallback on long videos, suggest base model over tiny for better accuracy.
| Layer | Platform | Status |
|---|---|---|
| Supadata (primary) | Instagram, Facebook, Threads, TikTok, X, YouTube | Works for public videos |
| yt-dlp (fallback download) | TikTok, X, Reddit | Works |
| yt-dlp (fallback download) | Instagram, Facebook, Threads | Blocked (403) |
| Local whisper (fallback transcription) | Any uploaded file | Works |
This skill pairs with:
Github MCP Worker:push_note)Creates, edits, and optimizes skills for Claude Code, including drafting, evaluating with test prompts, iterating on performance, and improving skill descriptions for better triggering accuracy.
npx claudepluginhub dwarvesf/claude-skills