rawdoc

Fetch web pages as clean markdown for AI coding agents.

Single Go binary. One dependency (x/net/html). Fetches HTML, strips noise, outputs markdown. Works as a CLI, MCP server, and Claude Code plugin.

Install

Claude Code Plugin (recommended)

/install-plugin RandomCodeSpace/rawdoc

Adds the /rawdoc and /rawdoc-crawl slash commands plus the rawdoc_fetch and rawdoc_crawl MCP tools. A setup hook builds the binary automatically, which requires Go 1.25+.

CLI

go install github.com/RandomCodeSpace/rawdoc@latest

MCP Server

rawdoc --serve

Runs as a JSON-RPC stdio server implementing the Model Context Protocol. Exposes rawdoc_fetch and rawdoc_crawl tools. See Manual MCP Setup below for configuration.

What It Does

Fetches HTML via plain HTTP with browser-like headers
Strips noise — scripts, styles, navbars, footers, ads, cookie banners, hidden elements
Extracts main content using site-specific selectors or readability scoring
Converts to clean markdown (headings, code blocks, tables, lists)
Crawls linked pages when given a depth > 0

Token reduction versus raw HTML can reach 95%+, though it varies by page. Works on server-rendered sites; JS-only SPAs, which build their content in the browser, are not supported.

Usage

# Single page → stdout
rawdoc https://kubernetes.io/docs/concepts/workloads/pods/

# Just the code blocks
rawdoc https://www.baeldung.com/spring-kafka --code-only

# JSON output with metadata
rawdoc https://pkg.go.dev/fmt -f json

# YAML output
rawdoc https://pkg.go.dev/fmt -f yaml

# Save to file
rawdoc https://example.com -o docs.md

# Crawl docs to a directory (depth=2, max 50 pages)
rawdoc https://kubernetes.io/docs/concepts/workloads/ -d 2 -o ~/docs/k8s/

# Verbose — see fetch decisions and token stats
rawdoc https://www.baeldung.com/spring-kafka -v

# MCP server mode (stdio JSON-RPC)
rawdoc --serve

Verbose Output

[tier1] https://pkg.go.dev/fmt → fetching
[stats] input: 139.2KB (35634 tokens) → output: 43.5KB (11135 tokens) | 69% saved
[output] wrote json to docs.json

All verbose output goes to stderr. stdout stays clean for piping.
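The stats line can be checked with simple arithmetic. The example's figures are consistent with estimating tokens as bytes / 4 (35634 × 4 = 142,536 bytes ≈ 139.2 KB, taking 1 KB = 1024 bytes), and 1 − 11135 / 35634 ≈ 0.688, which rounds to 69%. The Go sketch below reproduces the line under that bytes / 4 assumption. The helper names are hypothetical, and rawdoc's actual estimator and rounding are not documented here.

```go
package main

import (
	"fmt"
	"math"
)

// estimateTokens approximates a token count as bytes/4. This is an assumption
// that happens to match the example stats line, not rawdoc's documented method.
func estimateTokens(nBytes int) int {
	return nBytes / 4
}

// savedPercent returns the rounded percentage of input tokens removed.
func savedPercent(inTokens, outTokens int) int {
	return int(math.Round(100 * (1 - float64(outTokens)/float64(inTokens))))
}

func main() {
	// Byte counts chosen so that bytes/4 gives the example's token counts.
	inBytes, outBytes := 142536, 44540
	in, out := estimateTokens(inBytes), estimateTokens(outBytes)
	fmt.Printf("input: %.1fKB (%d tokens) -> output: %.1fKB (%d tokens) | %d%% saved\n",
		float64(inBytes)/1024, in, float64(outBytes)/1024, out, savedPercent(in, out))
}
```

Run as-is, it prints the same sizes, token counts, and 69% figure as the example.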

Flags

Output

Flag	Default	Description
`-o, --output`	stdout	Output file, or output directory in crawl mode
`-f, --format`	`markdown`	`markdown` `text` `json` `yaml`
`--code-only`	—	Extract only code blocks
`--no-links`	—	Strip link URLs, keep text only
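A rough sketch of what --code-only keeps, written in Go as a post-processing pass over markdown output: collect the bodies of fenced code blocks and drop everything else. The codeBlocks helper is hypothetical; rawdoc may instead take code directly from pre/code elements during extraction.

```go
package main

import (
	"fmt"
	"strings"
)

// fence is built at runtime only so this example can sit inside a fenced block.
var fence = strings.Repeat("`", 3)

// codeBlocks returns the bodies of fenced code blocks in a markdown string.
// An unterminated block at the end of the input is dropped.
func codeBlocks(md string) []string {
	var blocks, cur []string
	inside := false
	for _, line := range strings.Split(md, "\n") {
		if strings.HasPrefix(strings.TrimSpace(line), fence) {
			if inside {
				// Closing fence: store the collected block body.
				blocks = append(blocks, strings.Join(cur, "\n"))
				cur = nil
			}
			inside = !inside
			continue
		}
		if inside {
			cur = append(cur, line)
		}
	}
	return blocks
}

func main() {
	md := "# Title\n\nIntro text.\n\n" + fence + "go\nfmt.Println(\"hi\")\n" + fence + "\n\nMore text.\n"
	for i, b := range codeBlocks(md) {
		fmt.Printf("block %d:\n%s\n", i+1, b)
	}
}
```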

Crawling

Flag	Default	Description
`-d, --depth`	`0`	Crawl depth (0 = single page)
`-c, --concurrency`	`5`	Parallel fetches
`--max-pages`	`50`	Page limit
`--delay`	`1s`	Delay between requests
`--include`	—	URL path glob to include
`--exclude`	—	URL path glob to exclude
`--sitemap`	—	Parse sitemap.xml for URL discovery
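How --include and --exclude might combine, sketched with Go's path.Match against the URL path: a link must match the include glob (when one is given) and must not match the exclude glob. The allowed helper and these semantics are assumptions. In particular, path.Match's * never crosses a /, so rawdoc's real glob syntax may behave differently for patterns that span several path segments.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
)

// allowed reports whether a URL's path passes optional include/exclude globs.
// Hypothetical semantics: empty pattern = not set; * does not cross "/".
func allowed(rawURL, include, exclude string) (bool, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return false, err
	}
	if include != "" {
		ok, err := path.Match(include, u.Path)
		if err != nil || !ok {
			return false, err
		}
	}
	if exclude != "" {
		hit, err := path.Match(exclude, u.Path)
		if err != nil || hit {
			return false, err
		}
	}
	return true, nil
}

func main() {
	for _, link := range []string{
		"https://kubernetes.io/docs/concepts/workloads/pods/",
		"https://kubernetes.io/docs/concepts/workloads/controllers/",
		"https://kubernetes.io/docs/tasks/",
	} {
		ok, err := allowed(link, "/docs/concepts/workloads/*/", "/docs/*/workloads/controllers/")
		if err != nil {
			panic(err)
		}
		fmt.Println(ok, link)
	}
}
```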

HTTP

Flag	Default	Description
`--timeout`	`15s`	Per-request timeout
`--max-time`	`10m`	Total runtime ceiling
`--max-retries`	`3`	Per-URL retries with exponential backoff
`--header K=V`	—	Extra header (repeatable)

Info

Flag	Default	Description
`-v, --verbose`	—	Fetch log and token stats to stderr
`-q, --quiet`	—	Suppress all stderr
`--serve`	—	Run as MCP stdio server
`--version`	—	Print version

Crawl Mode

rawdoc https://kubernetes.io/docs/concepts/workloads/ -d 2 --max-pages 50 -o ~/docs/k8s/

Writes one .md file per page plus an index.md:

~/docs/k8s/
├── index.md
├── workloads.md
├── workloads-pods.md
├── workloads-controllers-deployment.md
└── ...
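The file names in this tree can be reproduced by taking each page's path relative to the directory that contains the start URL (here /docs/concepts/), trimming slashes, and joining the remaining segments with -. The Go sketch below does exactly that. It is reverse-engineered from the example, not taken from rawdoc's source, and the fallback name for a bare site root is hypothetical.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
	"strings"
)

// fileName maps a crawled page URL to a flat .md name relative to the
// directory that contains the start URL's path (hypothetical reconstruction).
func fileName(startURL, pageURL string) (string, error) {
	s, err := url.Parse(startURL)
	if err != nil {
		return "", err
	}
	p, err := url.Parse(pageURL)
	if err != nil {
		return "", err
	}
	startPath := s.Path
	if startPath == "" {
		startPath = "/"
	}
	// For /docs/concepts/workloads/ the base directory is /docs/concepts/.
	base := path.Dir(path.Clean(startPath))
	if !strings.HasSuffix(base, "/") {
		base += "/"
	}
	rel := strings.Trim(strings.TrimPrefix(p.Path, base), "/")
	if rel == "" {
		rel = "page" // hypothetical fallback for the site root
	}
	return strings.ReplaceAll(rel, "/", "-") + ".md", nil
}

func main() {
	start := "https://kubernetes.io/docs/concepts/workloads/"
	for _, page := range []string{
		start,
		"https://kubernetes.io/docs/concepts/workloads/pods/",
		"https://kubernetes.io/docs/concepts/workloads/controllers/deployment/",
	} {
		name, err := fileName(start, page)
		if err != nil {
			panic(err)
		}
		fmt.Println(name)
	}
}
```

It prints workloads.md, workloads-pods.md, and workloads-controllers-deployment.md, matching the tree.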

Stays on the same domain and respects the --include/--exclude globs and the --max-pages limit.

Output Formats

Format	Description
`markdown`	Headings, code blocks, tables, lists (default)
`text`	Plain text, no markup
`json`	Structured: url, title, content, code_blocks, fetch_tier, token count
`yaml`	Same fields as JSON
`--code-only`	Flag rather than an -f value: keeps only the page's fenced code blocks

Site-Specific Selectors

Built-in content selectors for Baeldung, Docusaurus, GitBook, ReadTheDocs, MkDocs, Spring.io, GitHub, MDN, pkg.go.dev, StackOverflow, Medium, Dev.to, Confluence, and Notion.

Falls back to readability scoring when no selector matches.
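For pages without a matching selector, a readability-style fallback scores candidate elements and keeps the best one. The Go sketch below uses a simplified heuristic loosely modeled on Mozilla's Readability.js: 1 point, plus 1 per comma, plus 1 per 100 characters (up to 3), scaled by 1 minus link density. rawdoc's actual scoring is not described here and likely differs, and the candidate type with its pre-computed counts is a hypothetical stand-in for a parsed DOM.

```go
package main

import (
	"fmt"
	"math"
)

// candidate holds hypothetical pre-computed stats for one block element.
type candidate struct {
	Name    string
	TextLen int // characters of visible text
	LinkLen int // characters of that text inside <a> elements
	Commas  int
}

// score: 1 + commas + min(floor(len/100), 3), scaled by (1 - link density).
func score(c candidate) float64 {
	if c.TextLen == 0 {
		return 0
	}
	s := 1 + float64(c.Commas) + math.Min(math.Floor(float64(c.TextLen)/100), 3)
	linkDensity := float64(c.LinkLen) / float64(c.TextLen)
	return s * (1 - linkDensity)
}

// best returns the highest-scoring candidate, or false if there are none.
func best(cs []candidate) (candidate, bool) {
	var top candidate
	topScore, found := math.Inf(-1), false
	for _, c := range cs {
		if s := score(c); s > topScore {
			top, topScore, found = c, s, true
		}
	}
	return top, found
}

func main() {
	cs := []candidate{
		{Name: "nav", TextLen: 400, LinkLen: 380},
		{Name: "article", TextLen: 2000, LinkLen: 100, Commas: 12},
		{Name: "footer", TextLen: 150, LinkLen: 90, Commas: 1},
	}
	if c, ok := best(cs); ok {
		fmt.Println("main content:", c.Name)
	}
}
```

A link-heavy navbar scores near zero even when it has plenty of text, which is why link density is applied as a multiplier rather than a small penalty.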
