From business-intelligence-skills
Fetches any URL via Chrome CDP and renders JavaScript-heavy pages to clean markdown. Supports auto-capture on page load or wait mode for login-required pages.
How this skill is triggered — by the user, by Claude, or both
Slash command
/business-intelligence-skills:canghe-url-to-markdownThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Fetches any URL via Chrome CDP and converts HTML to clean markdown.
Fetches any URL via Chrome CDP and converts HTML to clean markdown.
Important: All scripts are located in the scripts/ subdirectory of this skill.
Agent Execution Instructions:
SKILL_DIR${SKILL_DIR}/scripts/<script-name>.ts${SKILL_DIR} in this document with the actual pathScript Reference:
| Script | Purpose |
|---|---|
scripts/main.ts | CLI entry point for URL fetching |
Use Bash to check EXTEND.md existence (priority order):
# Check project-level first
test -f .canghe-skills/canghe-url-to-markdown/EXTEND.md && echo "project"
# Then user-level (cross-platform: $HOME works on macOS/Linux/WSL)
test -f "$HOME/.canghe-skills/canghe-url-to-markdown/EXTEND.md" && echo "user"
┌────────────────────────────────────────────────────────┬───────────────────┐ │ Path │ Location │ ├────────────────────────────────────────────────────────┼───────────────────┤ │ .canghe-skills/canghe-url-to-markdown/EXTEND.md │ Project directory │ ├────────────────────────────────────────────────────────┼───────────────────┤ │ $HOME/.canghe-skills/canghe-url-to-markdown/EXTEND.md │ User home │ └────────────────────────────────────────────────────────┴───────────────────┘
┌───────────┬───────────────────────────────────────────────────────────────────────────┐ │ Result │ Action │ ├───────────┼───────────────────────────────────────────────────────────────────────────┤ │ Found │ Read, parse, apply settings │ ├───────────┼───────────────────────────────────────────────────────────────────────────┤ │ Not found │ Use defaults │ └───────────┴───────────────────────────────────────────────────────────────────────────┘
EXTEND.md Supports: Default output directory | Default capture mode | Timeout settings
# Auto mode (default) - capture when page loads
npx -y bun ${SKILL_DIR}/scripts/main.ts <url>
# Wait mode - wait for user signal before capture
npx -y bun ${SKILL_DIR}/scripts/main.ts <url> --wait
# Save to specific file
npx -y bun ${SKILL_DIR}/scripts/main.ts <url> -o output.md
| Option | Description |
|---|---|
<url> | URL to fetch |
-o <path> | Output file path (default: auto-generated) |
--wait | Wait for user signal before capturing |
--timeout <ms> | Page load timeout (default: 30000) |
| Mode | Behavior | Use When |
|---|---|---|
| Auto (default) | Capture on network idle | Public pages, static content |
Wait (--wait) | User signals when ready | Login-required, lazy loading, paywalls |
Wait mode workflow:
--wait → script outputs "Press Enter when ready"YAML front matter with url, title, description, author, published, captured_at fields, followed by converted markdown content.
url-to-markdown/<domain>/<slug>.md
<slug>: From page title or URL path (kebab-case, 2-6 words)<slug>-YYYYMMDD-HHMMSS.md| Variable | Description |
|---|---|
URL_CHROME_PATH | Custom Chrome executable path |
URL_DATA_DIR | Custom data directory |
URL_CHROME_PROFILE_DIR | Custom Chrome profile directory |
Troubleshooting: Chrome not found → set URL_CHROME_PATH. Timeout → increase --timeout. Complex pages → try --wait mode.
Custom configurations via EXTEND.md. See Preferences section for paths and supported options.
npx claudepluginhub freestylefly/canghe-skills --plugin content-skillsFetches any URL and converts to clean markdown using baoyu-fetch CLI (Chrome CDP with site-specific adapters). Built-in support for X/Twitter, YouTube transcripts, Hacker News, and generic pages. Handles login/CAPTCHA via interaction wait modes.
Extracts clean Markdown from any URL using ezycopy CLI. Handles JS-rendered pages with headless Chrome, retries on failure, and auto-installs tool if needed.
Extracts clean markdown from any URL, including JavaScript-rendered SPAs. Supports concurrent scraping, JS wait times, and content filtering.