From agent-webbridge
Drive the user's REAL Chrome — multiple profiles with their LIVE logins, and MULTIPLE TABS PER PROFILE, all IN PARALLEL — through agent-webbridge. Clean-room, open-source (MIT), no account, no telemetry. Automates the user's actual Chrome with their real logged-in sessions (not headless/scrape like Playwright or Firecrawl). Use for any task needing a real browser across one or more logged-in Chrome profiles: multi-account workflows, acting as the user across several accounts at once, or driving N tabs in one profile concurrently.
How this skill is triggered — by the user, by Claude, or both
Slash command
/agent-webbridge:agent-webbridgeThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Clean-room, open-source (MIT) browser automation for AI agents. Drives the user's **actual**
Clean-room, open-source (MIT) browser automation for AI agents. Drives the user's actual
Chrome — multiple profiles with their live logins, and multiple tabs per profile, all
in parallel. A lightweight Node daemon runs per profile on a deterministic hashed port; a
router on http://127.0.0.1:10086 proxies /command to the right daemon by a top-level
"profile" field. macOS-first (Google Chrome).
agent-webbridge is clean-room and standalone: its own daemon (src/daemon/, only runtime
dep is ws) and its own MV3 Chrome extension (agent-webbridge-extension/, stable id
ifodkkbkmngjlkhiphcjmbceeolhpfeo). No closed-source dependency, no account, no
telemetry, and no curl|bash installer. It is localhost-only: the extension connects
only to a daemon you run on 127.0.0.1; no data leaves the machine and no remote server is ever
contacted.
Use this whenever a task needs a real browser with the user's real logins, especially across more than one account or with several pages in flight at once:
chrome.debugger per tab, so N tabs in one profile run in parallel
(the killer feature vs typical browser bridges, which funnel every call through one
global "current tab").Prefer this over headless/scrape tools when login state, real cookies, multiple simultaneous profiles, or per-profile tab parallelism matter.
awb status
Then act on the result:
daemonUp: true and extensionConnected: true for the profile(s) you want — healthy.
Proceed with the tool calls below.Full, copy-pasteable setup is in AGENTS.md. Quick version:
npm i -g agent-webbridge # daemon + `awb` CLI
awb setup "Work" "Personal" # install the extension (guided Load unpacked) + bring the fleet up
awb connect "Work" "Personal" # (re)point each extension at its daemon (zero clicks; closes Chrome)
awb up "Work" "Personal" # start each profile's daemon + router on :10086, open windows
awb status # verify: extensionConnected:true per profile
awb down # tear down the fleet
(Pre-publish, run node bin/awb.mjs <cmd> from the repo. Run awb doctor first for a read-only
environment self-check.)
The extension installs via Chrome's built-in "Load unpacked" — there is no Chrome Web
Store listing, and none is needed. awb setup <profile…> automates everything except the one
click Chrome reserves for a human. When you (the agent) run it, guide the user through that
click and poll for success rather than waiting blindly:
awb setup "<profile>". It prints the exact extension folder and opens
chrome://extensions.awb check "<profile>" --json. For each profile it reports developerMode, loaded,
enabled, daemonUp, connected, a ready boolean, and a single nextStep hint. Relay
nextStep to the user and loop until ready: true. (awb setup also polls and continues on
its own once the load lands, then connects + brings the fleet up.)Zero-click connect: awb connect points each profile's extension at its daemon by writing
local_url directly into the extension's on-disk storage.local — no popup, no clicks. It
quits Chrome to write (LevelDB is single-writer) and the value persists, so later awb up runs
just reconnect. If Chrome is already quit, awb up alone does the write for you.
POST to the router on http://127.0.0.1:10086/command. The body is the normal command body
plus a top-level "profile" field (name / email / directory) selecting the target profile.
Omit "profile" to hit the default profile. Every command also carries a top-level "session"
naming the current task — see Sessions.
curl -s -X POST http://127.0.0.1:10086/command \
-H 'Content-Type: application/json' \
-d '{"action":"navigate","args":{"url":"https://mail.google.com"},"session":"s1","profile":"Work"}'
Parallelism is cross-profile (N profiles) × per-tab (N tabs/profile): fire several
/command requests at once — across different profiles, or at different tabs within the same
profile — and they run concurrently.
For a large worklist (hundreds–thousands of items — scrape N companies, check N accounts, fill N forms), drive it as a fleet fan-out:
awb status / awb check --json / awb profiles): use
2 profiles if at least two are set up + connected, otherwise fall back to 1 profile.
Within each profile keep 5 concurrent tabs. If the user asks for a specific
tabs-per-profile number, use that instead (per-profile parallelism is proven safe up to
10 tabs; 5 is the gentler default).awb up "P1" "P2" …), split the worklist into one chunk per profile, and within each
profile keep up to M /command requests (M tabs) in flight at once — e.g.
5 profiles × 10 tabs = 50 items processed concurrently. Per-tab parallelism is real
(chrome.debugger is attached per tab), so tabs in one profile don't block each other.awb down,
machine sleep) then resumes where it stopped instead of redoing everything — and you can
safely stop a run to re-tune concurrency, then relaunch.13 tools, full parity, all verified live in a real browser:
| Tool | Args | Returns | Note |
|---|---|---|---|
navigate | url, newTab(bool), group_title | {success, url, tabId} | First call opens a tab — see Tabs. group_title sets the group's visible label |
find_tab | url, active(bool) | {success, url, tabId} | Select an already-open tab as the current one — see Tabs |
snapshot | — | {url, title, tree} with @e refs | Accessibility tree (text) — use this to read page content and locate elements |
click | selector (@e ref or CSS) | {success, tag, text} | Synthetic el.click() |
fill | selector, value | {success, tag, mode} | Works on <input>/<textarea> AND [contenteditable] (ProseMirror/Lexical/Slate). mode is "value" or "contenteditable" |
evaluate | code (supports async/await) | {type, value} | |
screenshot | format(png|jpeg), quality(0-100), optional selector (@e/CSS), optional path | {format, path, sizeBytes, mimeType} | Returns a file path, not base64 — see Screenshots |
network | cmd(start|stop|list|detail), filter, requestId | request/response data | Capture requests/responses |
upload | selector, files(string[]) | {success, fileCount} | |
save_as_pdf | paper_format, landscape, scale, print_background, optional path | {path, sizeBytes, mimeType, pageTitle} | Render current page → PDF, returns a file path — see Save as PDF |
list_tabs | — | {success, tabs:[{tabId, url, title, active, groupTitle}]} | Inspect tabs in the current session |
close_tab | — | {success, closed: bool} | Close the current tab in the session |
close_session | — | {success, closed: int} | Close all tabs in the session — closed is the count. See Sessions |
Single-tab tools (snapshot, click, fill, screenshot, save_as_pdf) act on the current
tab — the one you most recently opened with navigate or selected with find_tab. Because the
daemon attaches chrome.debugger per tab, multiple tabs in the same profile can be driven
concurrently from parallel /command requests.
newTab:true when pages should coexist (comparing, cross-referencing,
or running in parallel); omit it to send the current tab to a new URL.find_tab to make a tab you already opened the current
one again. Pass the tab's full URL — take it from list_tabs or the earlier navigate
result. A bare root domain may miss a www. tab, so prefer the exact URL. active:true picks
the tab the user is currently viewing; otherwise the leftmost match wins.find_tab returns "no open tab found", the page isn't open — navigate with newTab:true
instead.curl -s -X POST http://127.0.0.1:10086/command \
-d '{"action":"find_tab","args":{"url":"https://www.example.com","active":true},"session":"research","profile":"Work"}'
Every command carries a top-level session naming the current task — see Sessions.
Add a top-level profile to target a specific profile (omit for the default). The examples in
later sections omit them only for brevity; in real calls always include session (and profile
when driving a non-default profile).
curl -s -X POST http://127.0.0.1:10086/command \
-H 'Content-Type: application/json' \
-d '{"action":"navigate","args":{"url":"https://example.com","newTab":true,"group_title":"My task"},"session":"my-task","profile":"Work"}'
One task = one session = one tab group. A session collects every tab this task opens into a
single Chrome tab group, so the user sees one group representing "what the agent is doing right
now". Pass it as a top-level field of the request body (not inside args).
Rules:
camping-research,
phone-compare.group_title is the human-readable label shown on the group in the browser. Pass it on the
first navigate; later calls in the same session don't need it.awb groups, or list_tabs and read each tab's groupTitle). If a group for this task
already exists (agent:<session>), reuse that exact session name instead of starting a new
one. The daemon already re-joins a group by session name across service-worker restarts —
fragmentation comes from the caller picking a new name, not from the fleet.probe-x, probe-y, …) scatters the work across many groups — and orphans them if a worker
crashes mid-run. (One task = one group still holds, no matter how many workers.)close_session
when done and on error/early-exit paths. List strays with awb groups and prune any with
awb close <session> (the name shown after agent:).# First tab of the task: set session + human-readable label
curl -s -X POST http://127.0.0.1:10086/command \
-d '{"action":"navigate","args":{"url":"https://www.google.com/search?q=tents","newTab":true,"group_title":"Camping gear research"},"session":"camping-research","profile":"Work"}'
# Another SITE, SAME task → same session → joins the same group automatically
curl -s -X POST http://127.0.0.1:10086/command \
-d '{"action":"navigate","args":{"url":"https://www.example.com/search?q=tents","newTab":true},"session":"camping-research","profile":"Work"}'
# Every later command carries the same session
curl -s -X POST http://127.0.0.1:10086/command \
-d '{"action":"snapshot","args":{},"session":"camping-research","profile":"Work"}'
When the task is finished and the user no longer needs these pages, close_session clears the
whole group. If they might want to look further, deliver your answer first and leave the tabs
open — closing too eagerly throws away work the user can still see.
The daemon writes the image to disk and returns {format, path, sizeBytes, mimeType} — never
base64, since the model can't read raw image bytes. Take the .path and open it with the Read
tool to actually see it.
# Default: PNG of the visible viewport, daemon picks a temp path
curl ... -d '{"action":"screenshot","args":{}}'
# Options (each independent): JPEG quality, element-only via @e/CSS selector, custom output path
curl ... -d '{"action":"screenshot","args":{"format":"jpeg","quality":60}}'
curl ... -d '{"action":"screenshot","args":{"selector":"@e123"}}'
A caller-supplied path is honored verbatim (parent dirs created, existing file overwritten) —
use a unique name to avoid clobbering. save_as_pdf follows the same rule.
snapshot returns interactive elements with stable @e refs based on semantic role/name.
Use them directly with click/fill — they survive CSS class hash changes that break
manually-written selectors.
Fall back to evaluate (JS) only when:
@e ref in the snapshothref)JSON.stringify(data) — never add null, 2 formatting. Indentation and
newlines can inflate the response several times over, causing truncation during transmission.evaluate calls share the page's JS realm — re-declaring the same const/let across two
calls throws SyntaxError. Wrap in an IIFE for a fresh scope:
(() => { const x = ...; return x; })().fillfill handles native inputs and contenteditable. Pass selector (CSS or @e ref) + value:
| Target | What fill does | Returned mode |
|---|---|---|
<input> / <textarea> | Sets .value via native setter, fires input/change. | "value" |
[contenteditable] (ProseMirror / TipTap / Lexical / Slate / Quill etc.) | Focuses, selects all existing content, calls document.execCommand('insertText', ...) which fires beforeinput/input with inputType:'insertText' and data:value. | "contenteditable" |
| Other element | Best-effort .value + events. | "value" |
fill is clear-and-insert: existing content is replaced. For "append to existing text", read
the current value via evaluate, concatenate, then fill with the result.
There's no separate "press Enter" tool. To submit a form, click the submit button directly (its
@e ref or selector). To dispatch a key event programmatically (e.g. Escape to close a modal):
{"action":"evaluate","args":{"code":"document.activeElement.dispatchEvent(new KeyboardEvent('keydown',{key:'Escape',bubbles:true}))"}}
save_as_pdf renders the current page to PDF and returns the file path. All args optional:
paper_format: letter (default) | a4 | legal | a3 | tabloidlandscape: false (default)scale: 1.0 (default), range [0.1, 2.0]print_background: true (default) — keep background colorspath: caller-supplied output path; if absent, daemon picks a default under OS temp dir using
the page title as the filenamepath semantics match screenshot: written verbatim, parent dirs auto-created, existing files
overwritten.
event.isTrusted (some banking portals, captcha challenges)
reject fill and click because both go through DOM-level synthetic events
(isTrusted=false). This is a product boundary, not a bug.fill, click, evaluate, and snapshot operate on the top frame.
If a target element lives in a same-page iframe from a different origin, navigate to the
iframe's URL directly instead.:10086 is reserved for the router — it is never assigned to a profile.127.0.0.1 daemon you run. No remote
server is ever contacted, no analytics, no account.>=18.Guides creation, editing, and verification of skills for AI coding agents using test-driven development with subagent scenarios. Use when authoring or debugging skills.
npx claudepluginhub jeet-dhandha/agent-webbridge --plugin agent-webbridge