From agent-browser
This skill should be used when the user asks to "browse a web page", "open a URL in the browser", "click an element", "fill a form field", "take a screenshot of a page", "inspect page elements", "get a page snapshot", "interact with a website", "automate a browser", "scrape a web page", "use agent-browser", or mentions the agent-browser CLI. Provides the ref-based browser automation workflow for AI agents.
Slash command: `/agent-browser:agent-browser`
agent-browser is a CLI that gives AI agents direct control over headless browsers. It uses a **ref-based** interaction model: every interactive element receives a unique reference identifier (`@e1`, `@e2`, etc.) derived from the accessibility tree. This enables deterministic, token-efficient browser automation without fragile CSS selectors.
Ensure agent-browser is installed and the browser binary is available:
```shell
# Check installation
agent-browser --version

# Install globally
npm install -g agent-browser
agent-browser install

# On Linux, include system dependencies
agent-browser install --with-deps
```
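The install steps above can be wrapped in a check so that a setup script installs only when the CLI is absent. This is a minimal sketch, not part of agent-browser: the helper names `is_missing` and `ensure_agent_browser` are hypothetical, and it assumes `npm` is already on `PATH`.

```shell
# Hypothetical helper: succeeds (exit 0) when the named command is NOT on PATH.
is_missing() {
  ! command -v "$1" >/dev/null 2>&1
}

# Hypothetical helper: install the CLI and browser binary only when absent,
# then print the installed version as a sanity check.
ensure_agent_browser() {
  if is_missing agent-browser; then
    npm install -g agent-browser
    if [ "$(uname -s)" = "Linux" ]; then
      agent-browser install --with-deps   # Linux: include system dependencies
    else
      agent-browser install
    fi
  fi
  agent-browser --version
}
```

Calling `ensure_agent_browser` at the top of an automation script keeps repeated runs from reinstalling the package.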
Every browser interaction follows a strict cycle:
open URL → snapshot → read refs → interact → re-snapshot → repeat
Interact only through refs read from the latest snapshot (for example, `click @e1`, `fill @e2 "text"`).

**Critical rule:** Refs are invalidated after navigation or significant DOM changes. Always take a fresh snapshot before interacting with elements after any page change.
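One pass through the cycle might look like the sketch below. The `ab` wrapper, the `AB_DRY_RUN` variable, and the `search_cycle` function are hypothetical names introduced for illustration, and `@e1`/`@e2` are placeholder refs: real refs must be read from the snapshot output of the page at hand.

```shell
# Hypothetical wrapper: with AB_DRY_RUN set, print the command instead of
# running it, so the sequence can be reviewed without launching a browser.
ab() { ${AB_DRY_RUN:+echo} agent-browser "$@"; }

# One pass through the cycle. $1 is the URL, $2 is the text to enter.
# @e1 and @e2 are placeholders, not refs any real page is guaranteed to have.
search_cycle() {
  ab open "$1"        # 1. open URL
  ab snapshot -i      # 2. snapshot, then read refs from its output
  ab fill @e1 "$2"    # 3. interact using refs from that snapshot
  ab click @e2
  ab snapshot -i      # 4. re-snapshot: the click may have changed the page
}
```

With `AB_DRY_RUN=1` set, `search_cycle https://example.com "query"` prints the five commands in order rather than executing them.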
| Command | Syntax | Purpose |
|---|---|---|
| `open` | `agent-browser open <url>` | Navigate to URL |
| `snapshot` | `agent-browser snapshot [-i] [-c] [-d N] [-s "sel"]` | Get accessibility tree with refs |
| `click` | `agent-browser click @eN [--new-tab]` | Click element |
| `dblclick` | `agent-browser dblclick @eN` | Double-click element |
| `fill` | `agent-browser fill @eN "text"` | Clear field and type text |
| `type` | `agent-browser type @eN "text"` | Type without clearing |
| `focus` | `agent-browser focus @eN` | Focus element |
| `select` | `agent-browser select @eN "option"` | Select dropdown option |
| `check` | `agent-browser check @eN` | Check checkbox |
| `uncheck` | `agent-browser uncheck @eN` | Uncheck checkbox |
| `hover` | `agent-browser hover @eN` | Hover over element |
| `press` | `agent-browser press <key>` | Press keyboard key |
| `scroll` | `agent-browser scroll <dir> <amount>` | Scroll page |
| `screenshot` | `agent-browser screenshot [--annotate]` | Capture screenshot |
| `set headers` | `agent-browser set headers '{...}'` | Set request headers |
The snapshot command is the primary inspection tool. Choose the right mode:
- `snapshot -i`: Interactive elements only (buttons, inputs, links). Use as default.
- `snapshot -i -C` (uppercase C): Include cursor-interactive elements (divs with onclick). Use when expected elements are missing from `-i`. Note: `-C` (cursor-interactive) is distinct from `-c` (compact).
- `snapshot -c`: Compact output. Reduces token usage on large pages.
- `snapshot -d N`: Limit tree depth to N levels. Use for deeply nested DOMs.
- `snapshot -s "selector"`: Scope to a CSS selector. Focus on a specific page section.
- `snapshot --annotate`: Overlay numbered labels `[N]` on a screenshot matching refs `@eN`.

Combine flags freely: `agent-browser snapshot -i -c -d 3` for compact, shallow, interactive-only output.
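The advice to start narrow and widen only when an element is missing can be sketched as a small escalation loop. This assumes the snapshot output is plain text that can be searched with `grep`, which this document does not confirm; the `ab` wrapper, `AB_DRY_RUN`, and `find_in_snapshot` are hypothetical names.

```shell
# Hypothetical wrapper: with AB_DRY_RUN set, print the command instead of
# running it.
ab() { ${AB_DRY_RUN:+echo} agent-browser "$@"; }

# Try progressively broader snapshots until the output mentions the label.
# Prints the first matching snapshot; returns 1 if no mode contains it.
find_in_snapshot() {
  label=$1
  for flags in "-i" "-i -C" ""; do
    # $flags is intentionally unquoted so "-i -C" splits into two arguments.
    snap=$(ab snapshot $flags)
    if printf '%s\n' "$snap" | grep -qF -- "$label"; then
      printf '%s\n' "$snap"
      return 0
    fi
  done
  return 1
}
```

The loop order mirrors the modes above: interactive-only first, then cursor-interactive, then the full tree as a last resort, so the cheapest snapshot that contains the target is the one that gets read.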
- `fill @eN "text"`: Clears existing content, then types. Use for replacing field values.
- `type @eN "text"`: Appends without clearing. Use for adding to existing content.
- `click @eN`: Standard click. If the page navigates, re-snapshot immediately.
- `click @eN --new-tab`: Opens in a new tab. The original page state is preserved.
- `press Enter`: Submit forms.
- `press Tab` / `press Shift+Tab`: Navigate focus.
- `press Escape`: Close modals or dialogs.
- `press Control+a`, `press Control+c`: Key combinations.
- `scroll down 3`: Scroll down 3 viewport heights.
- `scroll up 1`: Scroll up 1 viewport height.
- `screenshot`: Capture the current viewport for visual inspection.
- `screenshot --annotate`: Overlay `[N]` labels on interactive elements. Use to visually confirm ref assignments before performing destructive actions.

Always re-snapshot after:

- Navigation (`open`, clicking a link)

Safe to skip re-snapshot after:

- `screenshot` (read-only)
- `hover` (unless hover triggers a dropdown or tooltip DOM change)
- `focus` (usually no DOM change)

When uncertain, snapshot. The cost of an extra snapshot is far less than using a stale ref.
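The re-snapshot rule can be made mechanical with a thin wrapper that snapshots after every command not on the safe-to-skip list. This is a sketch under stated assumptions: `ab`, `AB_DRY_RUN`, and `act` are hypothetical names, and treating `snapshot` itself as safe to skip is an addition made here to avoid a redundant second snapshot.

```shell
# Hypothetical wrapper: with AB_DRY_RUN set, print the command instead of
# running it.
ab() { ${AB_DRY_RUN:+echo} agent-browser "$@"; }

# Run one agent-browser command, then re-snapshot unless the command is
# on the safe-to-skip list.
act() {
  ab "$@"
  case $1 in
    snapshot|screenshot|hover|focus) ;;  # read-only or usually no DOM change
    *) ab snapshot -i ;;                 # when uncertain, snapshot
  esac
}
```

For example, `act click @e1` issues the click followed by `snapshot -i`, while `act hover @e3` issues only the hover. If a hover is known to open a dropdown, call `ab snapshot -i` explicitly afterwards.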
Set custom headers for authenticated pages:
```shell
agent-browser set headers '{"Authorization": "Bearer <token>", "Cookie": "session=abc123"}'
```
Headers persist for the browser session and apply to matching domains. Set headers before navigating to authenticated pages.
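To keep secrets out of scripts and shell history, the header JSON can be built from an environment variable and set before the first navigation. A minimal sketch: `API_TOKEN`, `ab`, `AB_DRY_RUN`, and `open_authenticated` are hypothetical names, and the token is assumed to contain no double quotes or backslashes that would break the JSON.

```shell
# Hypothetical wrapper: with AB_DRY_RUN set, print the command instead of
# running it.
ab() { ${AB_DRY_RUN:+echo} agent-browser "$@"; }

# Set the Authorization header from $API_TOKEN, then open the URL in $1.
# Headers are set first so the initial request is already authenticated.
open_authenticated() {
  # Abort with a message if the (hypothetical) API_TOKEN variable is unset.
  : "${API_TOKEN:?set API_TOKEN first}"
  ab set headers "{\"Authorization\": \"Bearer ${API_TOKEN}\"}"
  ab open "$1"
}
```

The ordering matters more than the helper itself: `set headers` runs before `open`, matching the guidance above.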
- Default to `snapshot -i`: interactive elements are usually sufficient. Expand to `-i -C` or a full snapshot only when needed.
- Use `screenshot --annotate` to confirm the target element before delete, submit, or other irreversible actions.
- Use `-c` (compact) and `-d N` (depth limit) for large pages. Scope with `-s "selector"` for targeted sections.

For detailed command documentation with all flags and usage examples:
- `references/commands.md`: Complete command reference with flag details and worked examples