From claude-commands
Validates browser UI changes by extracting text from the DOM via Chrome Superpowers auto-capture or by reading screenshots with Claude's built-in vision, with Tesseract OCR as a fallback.
Slash command: `/claude-commands:browser-testing-ocr-validation`
**CRITICAL**: Always validate browser screenshots with OCR when testing UI changes, especially for bugs where it is uncertain whether a visual element is actually present.
Primary method: DOM text extraction (`page.md`). Chrome Superpowers automatically captures `page.md` on every DOM-changing action (navigate, click, type, select, eval). Because the text is read directly from the DOM rather than recognized from pixels, it is more accurate than image OCR.
```text
/tmp/chrome-session-{id}/{action-number}-{action}-{timestamp}/
├── page.md           # DOM text extraction (primary OCR)
├── page.html         # Full rendered DOM
├── screenshot.png    # Visual page state
└── console-log.txt   # Browser console output
```
```
# 1. Navigate (triggers auto-capture)
mcp__chrome-superpower__use_browser(action="navigate", payload="https://example.com")

# 2. Read the auto-captured page.md for text validation
Read(file_path="/tmp/chrome-session-xxx/001-navigate-yyy/page.md")
# Contains all text content extracted from the DOM
```
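Because every action gets its own capture directory, picking the right `page.md` can be scripted instead of copying paths by hand. The sketch below assumes directory names follow `{action-number}-{action}-{timestamp}` exactly (numeric action number, lowercase action name); the exact timestamp format is not documented here, so it is matched loosely. `list_captures` and `latest_page_text` are hypothetical helpers, not part of Chrome Superpowers.

```python
import os
import re
from typing import List, NamedTuple, Optional

# Assumed naming scheme: {action-number}-{action}-{timestamp}, e.g. "001-navigate-yyy".
CAPTURE_DIR = re.compile(r"^(?P<num>\d+)-(?P<action>[a-z]+)-(?P<ts>.+)$")


class Capture(NamedTuple):
    number: int  # parsed action number, used for ordering
    action: str  # navigate, click, type, select, eval, ...
    path: str    # full path of the capture directory


def list_captures(session_dir: str) -> List[Capture]:
    """Return the session's capture directories ordered by action number."""
    captures = []
    for name in os.listdir(session_dir):
        full = os.path.join(session_dir, name)
        match = CAPTURE_DIR.match(name)
        if match and os.path.isdir(full):
            captures.append(Capture(int(match["num"]), match["action"], full))
    return sorted(captures)


def latest_page_text(session_dir: str, action: Optional[str] = None) -> str:
    """Read page.md from the newest capture, optionally only for one action type."""
    captures = [c for c in list_captures(session_dir)
                if action is None or c.action == action]
    if not captures:
        raise FileNotFoundError(f"no matching capture in {session_dir}")
    with open(os.path.join(captures[-1].path, "page.md"), encoding="utf-8") as f:
        return f.read()
```

For example, `latest_page_text("/tmp/chrome-session-xxx", action="click")` returns the DOM text captured after the most recent click. Sorting on the parsed integer keeps `10-...` after `9-...` even if a session's action numbers are not zero-padded.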
Advantages:
Alternative method: Claude's built-in vision. Claude is a multimodal model that can directly see and analyze images, so when you use the Read tool on a PNG file in Claude Code, Claude sees the image content directly.
```
# Read the image file; Claude sees it directly
Read(file_path="/tmp/screenshots/test-screenshot.png")
# Claude will describe: user emails, button text, form content, UI state, etc.
```
Advantages over external OCR:
Fallback method: Tesseract OCR. Use Tesseract as an additional method for cross-validation when needed.
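When a `page.md` capture and a Tesseract pass exist for the same moment, the cross-check can be made mechanical. A minimal sketch, assuming both texts have already been read into strings; `cross_validate` and its verdict labels are hypothetical names. A phrase found in the DOM but not in the OCR text may exist in markup without being visibly rendered, or OCR may simply have misread it, so that verdict calls for a closer look rather than an automatic pass.

```python
import re
from typing import Dict, Iterable


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace, since OCR output wraps lines unpredictably."""
    return re.sub(r"\s+", " ", text).strip().lower()


def cross_validate(expected: Iterable[str], dom_text: str, ocr_text: str) -> Dict[str, str]:
    """Classify each expected phrase by which source (DOM text, OCR text) contains it."""
    dom, ocr = normalize(dom_text), normalize(ocr_text)
    verdicts: Dict[str, str] = {}
    for phrase in expected:
        needle = normalize(phrase)
        in_dom, in_ocr = needle in dom, needle in ocr
        if in_dom and in_ocr:
            verdicts[phrase] = "confirmed"  # both sources agree the text is there
        elif in_dom:
            verdicts[phrase] = "dom-only"   # in markup, but not read from the pixels
        elif in_ocr:
            verdicts[phrase] = "ocr-only"   # on screen, but absent from the DOM text
        else:
            verdicts[phrase] = "missing"    # neither source has it
    return verdicts
```

Matching is plain substring search after normalization, so very short phrases can match inside longer words; prefer distinctive phrases such as the full message text.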
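On this system, Tesseract is installed at `/opt/homebrew/bin/tesseract` (the default Homebrew prefix on Apple Silicon Macs). Verify the binary, then install the Python bindings (`pytesseract`) and Pillow: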
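```bash
which tesseract   # Should return /opt/homebrew/bin/tesseract
tesseract --version
# On an externally managed Python (PEP 668), install into a virtualenv instead of --user
python3 -m pip install --user pillow pytesseract
```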
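`pytesseract` does not bundle Tesseract; it runs the `tesseract` binary found on `PATH` unless `pytesseract.pytesseract.tesseract_cmd` points elsewhere. A process that does not inherit your shell's `PATH` may therefore raise `TesseractNotFoundError` even though Homebrew installed the binary. The sketch below, with a hypothetical `find_tesseract` helper, prefers the Homebrew path from this document and falls back to a `PATH` lookup.

```python
import os
import shutil
from typing import Optional

# Homebrew path from this document; other systems will differ.
HOMEBREW_TESSERACT = "/opt/homebrew/bin/tesseract"


def find_tesseract(preferred: str = HOMEBREW_TESSERACT) -> Optional[str]:
    """Return an executable tesseract path, or None if none can be found."""
    if os.path.isfile(preferred) and os.access(preferred, os.X_OK):
        return preferred
    # Fall back to whatever `tesseract` is on PATH (None if there is none).
    return shutil.which("tesseract")
```

Before running OCR, check that the result is not `None` and assign it with `pytesseract.pytesseract.tesseract_cmd = path`; `pytesseract.get_tesseract_version()` then confirms the binary actually runs.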
```
# Using Chrome Superpowers MCP
mcp__chrome-superpower__use_browser(action="screenshot", payload="/tmp/screenshots/test-name.png")
```
```bash
python3 - <<'PY'
from PIL import Image
import pytesseract

image_path = "/tmp/screenshots/test-name.png"
img = Image.open(image_path)
text = pytesseract.image_to_string(img)
print(text)
PY
```
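Screenshots of small or low-contrast UI text can OCR poorly. A common remedy, sketched here as an assumption rather than a tuned recipe, is to convert to grayscale, upscale, and stretch contrast before calling `pytesseract.image_to_string`. `preprocess_for_ocr` and its default `scale=2` are illustrative choices, not part of Pillow or pytesseract.

```python
from PIL import Image, ImageOps


def preprocess_for_ocr(img: Image.Image, scale: int = 2) -> Image.Image:
    """Grayscale, upscale, and stretch contrast before handing an image to Tesseract."""
    gray = ImageOps.grayscale(img)
    # Small UI fonts are often easier for Tesseract to read once enlarged.
    big = gray.resize((gray.width * scale, gray.height * scale), Image.LANCZOS)
    # Stretch the histogram so light-gray text separates from the background.
    return ImageOps.autocontrast(big)
```

Usage: `text = pytesseract.image_to_string(preprocess_for_ocr(Image.open(image_path)))`. Larger scale factors trade speed for legibility; compare the output against the unprocessed image before adopting a setting.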
```bash
python3 - <<'PY'
from PIL import Image
import pytesseract

image_path = "/tmp/screenshots/after-send.png"
img = Image.open(image_path)
text = pytesseract.image_to_string(img)

# Check for specific content
if "expected user message" in text:
    print("✅ SUCCESS: Message IS visible")
else:
    print("❌ FAILURE: Message NOT visible")

print("\n=== Full OCR Output ===")
print(text)
PY
```
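An exact substring check like `"expected user message" in text` fails if Tesseract wraps the message across lines, changes its case, or misreads a character (for example `m` as `rn`). A hedged alternative is a case- and whitespace-insensitive fuzzy match built on the standard library's `difflib.SequenceMatcher`; `fuzzy_contains` is a hypothetical helper and the 0.85 threshold is an assumption to tune.

```python
import difflib


def fuzzy_contains(haystack: str, needle: str, threshold: float = 0.85) -> bool:
    """True if some run of words in `haystack` closely matches `needle`."""
    words = haystack.lower().split()
    target_words = needle.lower().split()
    if not target_words:
        return True  # an empty needle trivially matches
    target = " ".join(target_words)
    n = len(target_words)
    # Slide a window of n words across the OCR text and score each window.
    for i in range(len(words) - n + 1):
        window = " ".join(words[i:i + n])
        if difflib.SequenceMatcher(None, window, target).ratio() >= threshold:
            return True
    return False
```

For visibility bugs, keep the threshold high: a loose match can turn a genuinely missing message into a false pass.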
Problem: Messages appeared to disappear after clicking Send, but visual inspection of screenshots was unreliable.
Solution: Used OCR to definitively prove the bug:
```bash
python3 - <<'PY'
from PIL import Image
import pytesseract

image_path = "/tmp/screenshots/after-send-before-response.png"
img = Image.open(image_path)
text = pytesseract.image_to_string(img)

# Look for the user's message text
if "verify bug fix test" in text or "VERIFY FIX" in text:
    print("✅ Message IS visible (optimistic UI working)")
else:
    print("❌ Message NOT visible (bug confirmed)")

print("\nOCR Output:")
print(text)
PY
```
Result: the message text did not appear in the OCR output, confirming the bug was still present even though the screenshot "looked fine" to human eyes.
MANDATORY for:
OPTIONAL for:
```bash
#!/bin/bash
# Test: Verify user message stays visible after Send
#
# Steps 1-3 are Chrome Superpowers MCP tool calls made by the agent;
# they cannot be run from this script:
#   1. Navigate and type the message
#   2. Click Send
#   3. Immediately capture a screenshot to /tmp/screenshots/after-send.png

# 4. OCR validation: the Python exit status becomes the script's result
python3 - <<'PY'
import sys

from PIL import Image
import pytesseract

img = Image.open("/tmp/screenshots/after-send.png")
text = pytesseract.image_to_string(img)

if "expected message text" in text:
    print("✅ TEST PASSED")
    sys.exit(0)
else:
    print("❌ TEST FAILED")
    print(text)
    sys.exit(1)
PY
```
Add OCR validation to Cypress/Playwright tests:
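```javascript
// Cypress example. 'ocrScreenshot' is not built into Cypress: it is a custom
// task that must be registered with on('task', ...) in the config's
// setupNodeEvents and resolve with the OCR text for the given screenshot.
cy.task('ocrScreenshot', '/tmp/screenshots/test.png').then((text) => {
  expect(text).to.include('expected message')
})
```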
Golden Rule: If you're testing whether UI elements are visible in a browser, ALWAYS use OCR to validate. Don't trust your eyes or memory; trust the OCR output.
Install: `npx claudepluginhub jleechanorg/claude-commands --plugin claude-commands`