ato-mcp
Local search and retrieval over the Australian Taxation Office legal corpus.
Ships as a local MCP command with plugin metadata, a Rust binary, and a
one-shot corpus download. The MCP command starts or reuses one local HTTP
backend so the SQLite corpus and semantic model are shared per user.
Retrieval infrastructure, not tax advice. Verify cited ATO material and
apply professional judgment before relying on an answer.
What you get
- A pre-built local corpus of ~158k ATO documents and ~467k chunks, queryable
with hybrid BM25 + Granite vector search.
- Live retrieval for ATO documents the corpus doesn't carry.
- Statutory-definition lookup with an ordinary-meaning fallback.
- All of the above as MCP tools the agent can call directly.
Tools
| Tool | Purpose |
|---|---|
| search | Hybrid semantic-plus-lexical search over the corpus. Defaults exclude edited private advice and pre-2000 non-legislation content. |
| get_chunks | Fetch chunk bodies by chunk_id, with optional neighbour context. [doc:X] markers point into the corpus and resolve via get_chunks / get_doc_anchors; [fetch:URI] markers point outside the corpus and resolve via fetch. |
| get_doc_anchors | In-document anchors, related documents, historical-version URLs, and reverse citations for a corpus document. |
| get_definition | Statutory definitions with a labelled ordinary-meaning fallback. |
| get_asset | Resolve a retained image data-asset-ref to an MCP image content item plus caption. |
| fetch | Live-fetch an ATO document by URI: ato:<doc_id>[?pit=...&view=...]. Returns chunks of the same shape as get_chunks. |
| stats | Index version, counts, and default search policy. |
Document bodies are exposed as cleaned HTML fragments so agents navigate the
source structure directly. Search chunks are plain text derived from that
HTML; heading paths live in metadata, links and images contribute only their
visible text.
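
The marker convention from the Tools table can be illustrated with a small client-side sketch. It assumes markers appear literally as `[doc:X]` and `[fetch:URI]` with no whitespace or closing bracket inside the target; the exact marker grammar is not specified here, and `route_markers` and the sample identifiers are hypothetical.

```python
import re

# Assumed marker grammar: "[doc:<target>]" or "[fetch:<target>]", where the
# target contains no whitespace and no closing bracket.
MARKER_RE = re.compile(r"\[(doc|fetch):([^\]\s]+)\]")

# Which tools resolve which marker kind, as described in the Tools table.
TOOLS_BY_KIND = {
    "doc": ("get_chunks", "get_doc_anchors"),  # points into the corpus
    "fetch": ("fetch",),                       # points outside the corpus
}


def route_markers(text):
    """Return (kind, target, tools) for each marker found in text, in order."""
    return [
        (kind, target, TOOLS_BY_KIND[kind])
        for kind, target in MARKER_RE.findall(text)
    ]


# Hypothetical chunk text; the identifiers are made up for illustration.
sample = "See [doc:EXAMPLE-1] and the newer [fetch:ato:EXAMPLE-2?view=full]."
routes = route_markers(sample)
```

An agent would call one of the listed tools with the extracted target; how the target maps to tool arguments is left to the tool schemas.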
Install For An Agent
Agent flow from a fresh checkout:
git clone https://github.com/gunba/ato-mcp.git
claude plugin install ./ato-mcp
Pi uses the same MCP command through pi-mcp-adapter. Install the Pi MCP
adapter once, then install this checkout as a Pi package so the ATO skills are
available in Pi:
pi install npm:pi-mcp-adapter
pi install ./ato-mcp
The repository .mcp.json registers ato-mcp mcp for project-local MCP
clients that read standard MCP config. For user-global Pi access from any
project, add the same mcpServers.ato entry to ~/.config/mcp/mcp.json or
~/.pi/agent/mcp.json.
The plugin/package registers ato-mcp mcp as a stdio MCP command. On MCP startup it
starts or reuses a local loopback HTTP backend, records the chosen endpoint in
the user data dir, and proxies MCP messages to that backend. There is no
first-run port edit and no required session restart for a generated URL.
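
The start-or-reuse behaviour can be pictured with a rough sketch. This is not the binary's implementation: the endpoint file name, its JSON shape, and the function names are assumptions made for illustration, and the real proxy also has to launch the backend process, which is omitted here.

```python
import json
import socket
from pathlib import Path


def backend_reachable(host, port, timeout=0.2):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def reuse_or_pick_endpoint(endpoint_file, is_reachable=backend_reachable):
    """Return (endpoint, reused).

    Reuses the recorded endpoint when it still answers; otherwise picks a
    free loopback port and records it. Starting the backend is not shown.
    """
    endpoint_file = Path(endpoint_file)
    if endpoint_file.exists():
        try:
            recorded = json.loads(endpoint_file.read_text())
            if is_reachable(recorded["host"], recorded["port"]):
                return recorded, True
        except (ValueError, KeyError, TypeError):
            pass  # unreadable or stale record: fall through and pick anew

    # Binding to port 0 asks the OS for any free port on the loopback address.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.bind(("127.0.0.1", 0))
        port = probe.getsockname()[1]

    endpoint = {"host": "127.0.0.1", "port": port}
    endpoint_file.write_text(json.dumps(endpoint))
    return endpoint, False
```

Because the endpoint is recorded rather than configured, a second MCP session for the same user can find the running backend without any port edit.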
The binary location and corpus location are separate. Enterprise installs may
place ato-mcp.exe under the user's local app-data area. The installer must
choose one corpus data directory and use it consistently for ato-mcp update,
ato-mcp mcp, the backend server, and verification commands.
Default mode: leave ATO_MCP_DATA_DIR unset, and the corpus installs into the
default user data directory. Portable/co-located mode: set ATO_MCP_DATA_DIR
to a stable data directory next to the binary for every ato-mcp command and
server start. Do not run update with a non-default data dir and later start
mcp or serve without the same setting.
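
An installer script can guard against a mismatched data directory before starting anything. The sketch below only models the rule stated above; the helper names are hypothetical, the default directory is passed in because its real location is not given here, and treating an empty `ATO_MCP_DATA_DIR` as unset is an assumption.

```python
from pathlib import Path

ENV_VAR = "ATO_MCP_DATA_DIR"


def resolve_data_dir(env, default_dir):
    """Return the data dir a command would use under the given environment.

    Assumption: an empty or whitespace-only value counts as unset.
    """
    override = env.get(ENV_VAR, "").strip()
    return Path(override) if override else Path(default_dir)


def same_data_dir(update_env, serve_env, default_dir):
    """True when `update` and `mcp`/`serve` would see the same corpus dir."""
    return (
        resolve_data_dir(update_env, default_dir)
        == resolve_data_dir(serve_env, default_dir)
    )


# Placeholder default; the actual default user data directory may differ.
default = "/home/user/.local/share/ato-mcp"
portable = {ENV_VAR: "/opt/ato-mcp/data"}
mismatch = not same_data_dir(portable, {}, default)
```

In a real installer, `env` would be `os.environ` as seen by each command, and a mismatch would be reported before the server starts.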
After install or update, the agent verifies:
ato-mcp stats
ato-mcp search "research and development tax incentive eligibility" --k 1
If the corpus is missing, the agent explains the large one-time download,
runs the update, restarts the MCP host or ato-mcp mcp, and verifies again:
ato-mcp update
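
The verify, update, verify-again loop can be scripted. The sketch below uses only the commands shown above and assumes a non-zero exit status signals a missing or broken corpus, which is an assumption about the CLI rather than documented behaviour. The function names are hypothetical, and restarting the MCP host is left to the caller.

```python
import subprocess

VERIFY_COMMANDS = [
    ["stats"],
    ["search", "research and development tax incentive eligibility", "--k", "1"],
]


def run_cli(args):
    """Run an ato-mcp subcommand and return its exit status."""
    return subprocess.run(["ato-mcp", *args]).returncode


def verify_corpus(run=run_cli):
    """True when every verification command exits with status 0 (assumed)."""
    return all(run(args) == 0 for args in VERIFY_COMMANDS)


def ensure_corpus(run=run_cli):
    """Verify, run the one-time download if needed, then verify again."""
    if verify_corpus(run):
        return "ok"
    if run(["update"]) != 0:
        return "update-failed"
    # The MCP host or `ato-mcp mcp` still needs a restart at this point.
    return "ok-after-update" if verify_corpus(run) else "still-failing"
```

Passing `run` as a parameter keeps the flow testable without the binary installed; an agent should still explain the download size to the user before calling `ensure_corpus`.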
The plugin includes two agent skills:
- ato-mcp-server: small research skill loaded for ordinary ATO/tax queries.
- setup-ato-mcp: detailed install, timeout, port, and corpus-update recovery
skill loaded only when setup or repair is needed.
For manual MCP clients, register ato-mcp mcp as the stdio MCP command. Do
not configure ato-mcp serve as a stdio MCP command; it is the backend HTTP
server.
{
"mcpServers": {
"ato": {
"command": "ato-mcp",
"args": ["mcp"]
}
}
}
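
For a user-global config that already lists other servers, the entry above can be merged in rather than pasted over the file. This is a minimal sketch: `add_ato_server` is a hypothetical helper, and it assumes the target file is plain JSON with a top-level `mcpServers` object.

```python
import json
from pathlib import Path

# The same entry as the JSON snippet above.
ATO_ENTRY = {"command": "ato-mcp", "args": ["mcp"]}


def add_ato_server(config_path):
    """Add or overwrite mcpServers.ato, keeping any other servers intact."""
    path = Path(config_path)
    config = json.loads(path.read_text()) if path.exists() else {}
    config.setdefault("mcpServers", {})["ato"] = dict(ATO_ENTRY)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

For example, `add_ato_server(Path.home() / ".config" / "mcp" / "mcp.json")` targets one of the user-global paths mentioned earlier.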
Updates
ato-mcp update
Full corpus replacement: the binary finds the newest release that includes a
corpus manifest.json, downloads ato.db.zst, verifies its sha256, and
atomic-renames it into the live data dir. The MCP server reads its corpus
once at startup, so restart the MCP client and the local backend process for a
new corpus to take effect.
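
The verify-then-rename step follows a common pattern, sketched here in Python rather than the binary's Rust. It is an illustration under assumptions: the function names are hypothetical, the expected digest is taken to be a hex string from the manifest, and zstd decompression is omitted because its place in the sequence is not stated here.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def install_verified(staged_path, live_path, expected_sha256):
    """Move staged_path over live_path only if its digest matches.

    os.replace is atomic when both paths are on the same filesystem, so the
    download should be staged inside the data directory itself.
    """
    actual = sha256_of(staged_path)
    if actual != expected_sha256.lower():
        raise ValueError(f"sha256 mismatch: expected {expected_sha256}, got {actual}")
    os.replace(staged_path, live_path)
```

A failed check leaves the live file untouched, so a corrupt download never replaces a working corpus.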
When a newer corpus is published, the server's initialize instructions notify
the agent, which surfaces the suggestion to the user and runs the update when
the user agrees.
Search defaults