Extract plaintext WebVTT transcripts and detect slide transitions from Microsoft Stream (on SharePoint) video recordings. Use when the user wants to download, extract, or retrieve a transcript/captions from a Microsoft Stream video, Teams meeting recording, or SharePoint-hosted video. Also supports detecting slide changes and capturing screenshots. Triggers: "get the transcript", "download transcript", "extract captions", "stream transcript", "meeting recording transcript", "detect slides", "capture slides", "slide transitions". Requires the Playwright MCP server for browser authentication.
Slash command: /productivity:stream-transcript
Extract plaintext WebVTT transcripts and detect visual slide transitions from Microsoft Stream videos hosted on SharePoint.
The CDN endpoint (_api_cached/.../cdnmedia/transcripts) returns AES-encrypted transcripts, but the REST API (_api/v2.1) returns them in plaintext. Slide detection compares frames on a canvas inside the Playwright browser context.
A run is complete when:
- the .vtt file exists on disk at the expected output path
- it starts with the WEBVTT header
- it has been verified with head and wc -c

Use the Playwright MCP server (headless: false), not Chrome extension tools: the user needs to see the browser window to complete SharePoint SSO authentication.

Step 1: Get the Stream video URL from the user. It will look like:
https://{tenant}-my.sharepoint.com/personal/{user}/_layouts/15/stream.aspx?id={filePath}
If the user provides a driveId and itemId directly, skip to Step 3.
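The id query parameter in that URL carries the file path (typically a server-relative path such as /personal/...), which is the kind of value a path-based lookup like GetFileByServerRelativeUrl works with. A minimal sketch using the standard URL API; the name parseStreamUrl and the example URL are made up for illustration:

```javascript
// Hypothetical helper: split a Stream URL into its SharePoint host and the
// file path carried in the `id` query parameter.
function parseStreamUrl(streamUrl) {
  const url = new URL(streamUrl);
  // URLSearchParams.get() percent-decodes the value (%2F becomes "/").
  const filePath = url.searchParams.get("id");
  if (!url.pathname.endsWith("/_layouts/15/stream.aspx") || !filePath) {
    throw new Error("Not a stream.aspx?id=... URL: " + streamUrl);
  }
  return { host: url.host, filePath };
}

// Example with a made-up tenant, user, and file:
const parsed = parseStreamUrl(
  "https://contoso-my.sharepoint.com/personal/jane_contoso_com/_layouts/15/stream.aspx?id=%2Fpersonal%2Fjane_contoso_com%2FDocuments%2FRecordings%2Fdemo.mp4"
);
console.log(parsed.host); // contoso-my.sharepoint.com
console.log(parsed.filePath); // /personal/jane_contoso_com/Documents/Recordings/demo.mp4
```

Comparing this value with the filePath that the resolver returns is a quick way to confirm it resolved the right recording.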
Step 2: Use Playwright to navigate to the Stream URL with headless: false so the user can authenticate.
playwright_navigate({ url: streamUrl, headless: false, timeout: 90000 })
Wait for the page to load, then inject scripts/resolve-drive-item.js and resolve the IDs:
// After injecting the script via playwright_evaluate:
await window.ResolveDriveItem.run()
// Returns JSON: { driveId, itemId, filePath, method }
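The first resolution method listed below scans the page's JavaScript for drives/xxx/items/xxx patterns. A minimal sketch of such a scan, assuming the IDs appear as plain path segments (possibly with JSON-escaped slashes); findDriveItem is a hypothetical name, not the function inside resolve-drive-item.js:

```javascript
// Hypothetical sketch of a "drives/xxx/items/xxx" pattern scan.
function findDriveItem(text) {
  // Page JS often embeds URLs in JSON strings with escaped slashes ("\/").
  const normalized = text.replace(/\\\//g, "/");
  // Assumption: IDs run until the next "/", quote, "?", "#", "&", or whitespace.
  const m = /drives\/([^\/"'?#&\s]+)\/items\/([^\/"'?#&\s]+)/.exec(normalized);
  return m ? { driveId: m[1], itemId: m[2] } : null;
}

// In the browser, one possible input is the text of all inline scripts:
// findDriveItem([...document.scripts].map((s) => s.textContent).join("\n"))
```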
The script tries four resolution methods in order of reliability:
- drives/xxx/items/xxx patterns in page JS
- GetFileByServerRelativeUrl for partial metadata
- _spPageContextInfo hydration data

Step 3: Call the transcript listing endpoint from the authenticated browser:
(async () => {
// Replace with the driveId and itemId resolved above
const driveId = '<driveId>', itemId = '<itemId>';
const r = await fetch(
`/_api/v2.1/drives/${driveId}/items/${itemId}/media/transcripts`,
{ credentials: 'include', headers: { 'Accept': 'application/json' } }
);
if (!r.ok) return `Listing failed: HTTP ${r.status}`;
const data = await r.json();
return JSON.stringify((data.value || []).map(t => ({
id: t.id,
displayName: t.displayName,
languageTag: t.languageTag,
size: t.size,
temporaryDownloadUrl: t.temporaryDownloadUrl
})));
})();
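The listing can return more than one transcript (for example, one per language). A hypothetical pickTranscript helper, assuming each entry carries the languageTag field selected above, shows one way to choose the transcriptId for the next step:

```javascript
// Hypothetical helper: pick one transcript from the parsed listing.
// Preference order (an arbitrary choice): exact languageTag match, then a
// primary-language match ("en" from "en-US"), then the first entry.
function pickTranscript(transcripts, preferredTag = "en-US") {
  if (!Array.isArray(transcripts) || transcripts.length === 0) return null;
  const want = preferredTag.toLowerCase();
  const primary = want.split("-")[0];
  const tagOf = (t) => String(t.languageTag || "").toLowerCase();
  return (
    transcripts.find((t) => tagOf(t) === want) ||
    transcripts.find((t) => tagOf(t).split("-")[0] === primary) ||
    transcripts[0]
  );
}
```

For example, pickTranscript(JSON.parse(listingResult), "en-US").id would give the transcriptId, where listingResult is the JSON string returned by the snippet above.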
Step 4, primary approach: fetch via the REST API (most reliable; returns plaintext VTT):
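(async () => {
// Replace with the IDs from the previous steps; transcriptId is the listing's id
const driveId = '<driveId>', itemId = '<itemId>', transcriptId = '<transcriptId>';
const r = await fetch(
`/_api/v2.1/drives/${driveId}/items/${itemId}/media/transcripts/${transcriptId}/content`,
{ credentials: 'include', headers: { 'Accept': '*/*' } }
);
if (!r.ok) return `Fetch failed: HTTP ${r.status}`;
// Returns plaintext WebVTT (text/vtt content type)
const vtt = await r.text();
// Keep the text in the page and return only its length, not the content
window.__transcript = vtt;
return 'Transcript fetched: ' + vtt.length + ' characters';
})();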
Why not temporaryDownloadUrl? The temporaryDownloadUrl from Step 3 usually points to a /streamContent endpoint that returns AES-encrypted content, not plaintext VTT. The REST API /content endpoint above is the reliable plaintext path.
Fallback: temporaryDownloadUrl with curl (only if the REST API fails):
If the REST API returns errors, try the temporaryDownloadUrl with curl. It expires quickly, so use it immediately after Step 3. Then confirm with head -1 that the file starts with WEBVTT:
curl -s -o "<output_path>/transcript.vtt" "<temporaryDownloadUrl>"
head -1 "<output_path>/transcript.vtt"
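Whichever path produced the file, the check is the same: plaintext WebVTT begins with WEBVTT, optionally after a UTF-8 byte-order mark, while encrypted /streamContent output will not. A sketch of that check (isWebVtt is a hypothetical helper; the header rule follows the W3C WebVTT specification):

```javascript
// Hypothetical check: does this text begin with a valid WebVTT header line?
function isWebVtt(text) {
  // Allow an optional UTF-8 byte-order mark before the signature.
  const body = text.startsWith("\uFEFF") ? text.slice(1) : text;
  // "WEBVTT" alone, or followed by a space/tab and free text, then a line end.
  return /^WEBVTT(?:[ \t][^\r\n]*)?(?:\r\n|\r|\n|$)/.test(body);
}
```

In Node.js, isWebVtt(fs.readFileSync(path, "utf8")) applies it to a saved file; Node keeps the BOM when decoding that way, which is why the function strips it.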
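Step 5: Save the transcript to disk. When saving from browser memory, do not pass the raw text through the conversation (it wastes context tokens). Instead:
- Check the start of the transcript with window.__transcript.substring(0, 100)
- Encode it as UTF-8-safe base64 with btoa(unescape(encodeURIComponent(window.__transcript)))
- Return the base64 string via playwright_evaluate
- Decode it and write the file with Python:
import base64

# Collect the base64 strings, in order, from the tool-result JSON files
# plus any chunk files created with the Write tool
chunks = []  # e.g. [part_1, part_2, ...]
b64_all = "".join(chunks)
decoded = base64.b64decode(b64_all).decode("utf-8")
# newline="" writes the VTT line endings unchanged
with open("transcript.vtt", "w", encoding="utf-8", newline="") as f:
    f.write(decoded)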
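If the base64 string is too large to return in one playwright_evaluate call, it can be split in the browser and the pieces concatenated before decoding, as the Python comment about chunk files implies. A sketch, assuming a chunk size of 40,000 characters (an arbitrary choice) and a hypothetical helper name toBase64Chunks; for well-formed strings it yields the same base64 as the btoa(unescape(encodeURIComponent(...))) idiom:

```javascript
// Hypothetical helper: UTF-8 base64-encode a string and split the result.
function toBase64Chunks(text, chunkSize = 40000) {
  // Keep boundaries on 4-character base64 quanta so each chunk also decodes
  // on its own; joining the chunks in order rebuilds the full string.
  const size = Math.max(4, chunkSize - (chunkSize % 4));
  const bytes = new TextEncoder().encode(text); // UTF-8 bytes
  let binary = "";
  for (const b of bytes) binary += String.fromCharCode(b);
  const b64 = btoa(binary);
  const chunks = [];
  for (let i = 0; i < b64.length; i += size) chunks.push(b64.slice(i, i + size));
  return chunks;
}
```

In the page, something like window.__b64 = toBase64Chunks(window.__transcript), followed by one evaluate call per window.__b64[i], keeps each tool result small.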
See references/api_endpoints.md for the full API endpoint reference.
Use _api/v2.1 (not _api_cached). The _api_cached/.../cdnmedia/transcripts endpoint returns AES-encrypted content that requires a client-side key, so always use the REST API path instead.

Slide detection: detect visual changes in Teams recordings and capture screenshots of each slide or screen state.
Inject scripts/slide-detector.js via playwright_evaluate, then run:
// Start scan
window.SlideDetector.run({ coarseInterval: 60, threshold: 10 })
.then(r => { window._result = r; window._done = true; });
// Poll progress (0-100)
window.SlideDetector.progress
// Abort if needed
window.SlideDetector.abort()
Results land in window._sdResult (data URLs are stored separately in window._sdScreenshots). Save the screenshots to disk with scripts/screenshot-saver.js (see below).

| Parameter | Default | Description |
|---|---|---|
| coarseInterval | 60 | Seconds between samples. Use 60 to avoid a Stream session timeout. |
| threshold | 10 | Minimum diff score that counts as a change. 10 works well for slides. |
| captureRegion | {x:0, y:0, w:0.75, h:1.0} | Left 75% of the frame (the slide area in Teams presenter mode). |
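The table does not say how samples are taken or how the diff score is computed. The sketch below assumes the simplest reading: one frame every coarseInterval seconds, a mean absolute RGB difference (0-255) restricted to captureRegion, and a change wherever consecutive samples score at least threshold. All function names are hypothetical and the metric is an assumption, so the real slide-detector.js may score frames differently:

```javascript
// Times (seconds) at which a coarse pass would grab a frame.
function sampleTimes(durationSec, coarseInterval = 60) {
  const times = [];
  for (let t = 0; t < durationSec; t += coarseInterval) times.push(t);
  return times;
}

// Convert a fractional captureRegion into pixel bounds, clamped to the frame.
function regionToPixels(region, width, height) {
  const clamp = (v, max) => Math.min(max, Math.max(0, Math.round(v)));
  return {
    x0: clamp(region.x * width, width),
    y0: clamp(region.y * height, height),
    x1: clamp((region.x + region.w) * width, width),
    y1: clamp((region.y + region.h) * height, height),
  };
}

// Assumed metric: mean absolute RGB difference inside the region.
// a and b are RGBA byte arrays, e.g. ImageData.data from getImageData().
function diffScore(a, b, width, height, region = { x: 0, y: 0, w: 0.75, h: 1.0 }) {
  const { x0, y0, x1, y1 } = regionToPixels(region, width, height);
  let total = 0;
  let count = 0;
  for (let y = y0; y < y1; y++) {
    for (let x = x0; x < x1; x++) {
      const i = (y * width + x) * 4; // alpha at i + 3 is ignored
      for (let c = 0; c < 3; c++) total += Math.abs(a[i + c] - b[i + c]);
      count += 3;
    }
  }
  return count > 0 ? total / count : 0;
}

// Flag each gap between consecutive samples whose score reaches threshold.
// frames[i] is the RGBA array captured at times[i].
function flagChanges(times, frames, width, height, threshold = 10, region) {
  const changes = [];
  for (let i = 1; i < frames.length; i++) {
    const score = diffScore(frames[i - 1], frames[i], width, height, region);
    if (score >= threshold) changes.push({ from: times[i - 1], to: times[i], score });
  }
  return changes;
}
```

Frames are plain arrays here so the logic runs outside a browser; in the page they would come from drawing the video onto a canvas and calling getImageData. Seeking a video element is asynchronous, so a real scan would wait for each seek to finish before capturing.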
After saving screenshots to disk, use sub-agents (Task tool with subagent_type: "general-purpose") to classify and describe each slide. This avoids loading all images into the main context.
Launch one sub-agent per batch of ~5 screenshots. Each sub-agent should:
- Return, for each screenshot: isContent, contentType (slide, screen-share, gallery-view, transition-artifact, duplicate), slide title, brief description, and whether it duplicates an earlier slide

After all sub-agents complete, merge their results into the final manifest. Filter out entries where isContent is false (webcam-only frames) or contentType is transition-artifact.
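The merge-and-filter rule above is small enough to state in code. A sketch, assuming each sub-agent returns an array of objects with isContent and contentType fields and that batches are passed in launch order (mergeManifest is a hypothetical name):

```javascript
// Hypothetical merge step: flatten per-batch results into one manifest and
// drop webcam-only frames and transition artifacts. Other fields pass through.
function mergeManifest(batches) {
  return batches
    .flat()
    .filter((e) => e.isContent !== false && e.contentType !== "transition-artifact");
}
```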
See references/slide_detection.md for the full LLM post-processing prompt template, field definitions, and enhanced output format.
After slide detection completes, inject scripts/screenshot-saver.js and loop:
// 1. Inject screenshot-saver.js via playwright_evaluate
// 2. Initialize:
window.ScreenshotSaver.init() // -> { status: "ready", total: N }
// 3. Loop: call next(), then screenshot:
window.ScreenshotSaver.next() // -> { filename, index, total, done }
// playwright_screenshot({ selector: "#ss-img", savePng: true, downloadsDir: "<dir>", name: "<filename_stem>" })
// 4. Repeat until done === true
// 5. Cleanup:
window.ScreenshotSaver.destroy()
After all screenshots are saved, rename to strip Playwright's timestamp suffix:
Get-ChildItem -Path "<output_dir>" -Filter "change_*-202*Z.png" | ForEach-Object {
$newName = $_.Name -replace '-\d{4}-\d{2}-\d{2}T\d{2}-\d{2}-\d{2}-\d+Z', ''
Rename-Item $_.FullName $newName
}
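On macOS or Linux without PowerShell, the same rename can be done with Node.js. This sketch reuses the PowerShell regex; stripTimestamp and renameScreenshots are hypothetical names, and outputDir is whatever directory was passed as downloadsDir:

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Same pattern as the PowerShell -replace; /g mirrors -replace replacing all matches.
const TIMESTAMP_SUFFIX = /-\d{4}-\d{2}-\d{2}T\d{2}-\d{2}-\d{2}-\d+Z/g;

function stripTimestamp(name) {
  return name.replace(TIMESTAMP_SUFFIX, "");
}

function renameScreenshots(outputDir) {
  for (const name of fs.readdirSync(outputDir)) {
    // Rough equivalent of -Filter "change_*-202*Z.png"
    if (!/^change_.*-202.*Z\.png$/.test(name)) continue;
    const newName = stripTimestamp(name);
    if (newName !== name) {
      fs.renameSync(path.join(outputDir, name), path.join(outputDir, newName));
    }
  }
}
```

For example, stripTimestamp("change_003-2024-05-01T10-22-33-123Z.png") returns "change_003.png" (the filename is made up).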
| Script | Purpose | Key API |
|---|---|---|
| slide-detector.js | Detect visual transitions in video | SlideDetector.run(opts), .progress, .abort() |
| screenshot-saver.js | Display and capture transition PNGs | ScreenshotSaver.init(), .next(), .showTransition(i), .destroy() |
| resolve-drive-item.js | Extract driveId/itemId from Stream page | ResolveDriveItem.run() |
npx claudepluginhub dstreefkerk/claude-skills --plugin productivity