Teach the system to recognize a new supplier invoice format, or finetune an existing one. Receives the PDF path as argument (e.g. /learn-document path/to/file.pdf). If no parser exists, creates one from scratch. If a parser exists but has low confidence or wrong values, enters review/finetune mode to fix the regex patterns. Triggers on: learn document, teach document, new supplier, fix parser, finetune, corrigir parser, afinar parser, aprender documento, ensinar documento.
Slash command: /supplier-invoice-service:learn-document
Create or finetune a deterministic parser plugin from a PDF file, test it, get user validation, and register it in Supabase.
Before calling ANY MCP tool, verify the server is reachable by checking if parse_invoice exists as an available tool.
If the tool is NOT available, a missing token is the first thing to check: the plugin configures the MCP server via plugin.json, but the server needs the REQUEST_MCP_TOKEN environment variable set in ~/.claude/settings.json.
Run the automatic token setup:
Check if the token already exists: read ~/.claude/settings.json and look for env.REQUEST_MCP_TOKEN
If it exists → the issue is something else (server down, plugin not enabled). Tell the user to check /mcp.
If it does NOT exist → ask the user using AskUserQuestion: "Para usar o MCP server request, preciso do teu token de autenticação. Qual é o token?"
Once the user provides the token, save it to ~/.claude/settings.json:
Read the current settings (start from {} if it doesn't exist)
Merge {"env": {"REQUEST_MCP_TOKEN": "<TOKEN>"}} into the existing JSON (preserve all other settings)
Tell the user: "Token guardado! Reinicia o Claude Code para ativar (claude de novo neste terminal)."
Stop immediately — do NOT attempt any MCP operations until the user restarts.
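The check-and-merge steps above can be sketched in Python as follows. This is a minimal sketch, not the plugin's own code: has_token and save_token are hypothetical helper names, and the settings path is passed in explicitly so the logic can be tried against a scratch file.

```python
import json
from pathlib import Path


def _load(settings_path: Path) -> dict:
    """Load the settings JSON, treating a missing or empty file as {}."""
    if not settings_path.exists():
        return {}
    raw = settings_path.read_text(encoding="utf-8").strip()
    return json.loads(raw or "{}")


def has_token(settings_path: Path) -> bool:
    """Return True if env.REQUEST_MCP_TOKEN is present and non-empty."""
    return bool(_load(settings_path).get("env", {}).get("REQUEST_MCP_TOKEN"))


def save_token(settings_path: Path, token: str) -> dict:
    """Merge the token into the settings file, preserving every other key."""
    settings = _load(settings_path)
    settings.setdefault("env", {})["REQUEST_MCP_TOKEN"] = token
    settings_path.parent.mkdir(parents=True, exist_ok=True)
    settings_path.write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")
    return settings
```

Merging through setdefault leaves any other entries under env, and all other top-level settings, untouched.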
Do NOT attempt to parse, create, update, or perform any operation without the MCP server running.
The token lives in ~/.claude/settings.json → env.REQUEST_MCP_TOKEN (used by the plugin MCP server automatically and for HTTP uploads).
Never use curl, $(), or python3 -c inline blobs — all trigger permission prompts in Claude Code. Always use the upload script.
If the token key does NOT exist in settings, run the token setup from the Pre-flight check section above.
The plugin includes a standalone upload script at scripts/upload.py (relative to the plugin root). Find it with:
find ~/.claude -path "*/supplier-invoice-service/scripts/upload.py" -print -quit 2>/dev/null
This script reads the token from settings automatically and uploads via Python urllib — no curl, no permission prompts.
The MCP server runs remotely. PDFs must be uploaded first via HTTP, then parsed via MCP tool.
python3 /path/to/supplier-invoice-service/scripts/upload.py /path/to/fatura.pdf
Output: fatura.pdf\t<file_id> (tab-separated filename and file_id).
parse_invoice(file_id="<file_id>")
NEVER use pdf_path — the server is remote and cannot access local files. Always use the upload → file_id flow.
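To hand the upload result to parse_invoice, the tab-separated output has to be split into filename and file_id. A minimal sketch, assuming one "filename<TAB>file_id" pair per output line; parse_upload_output is a hypothetical helper, not part of the plugin, and the file_id value in the usage note is made up.

```python
def parse_upload_output(stdout: str) -> dict[str, str]:
    """Map each uploaded filename to its file_id (one tab-separated pair per line)."""
    uploads = {}
    for line in stdout.splitlines():
        if not line.strip():
            continue  # skip blank lines
        filename, sep, file_id = line.partition("\t")
        if not sep or not file_id.strip():
            raise ValueError(f"unexpected upload output: {line!r}")
        uploads[filename] = file_id.strip()
    return uploads
```

For example, parse_upload_output("fatura.pdf\tabc-123\n") yields {"fatura.pdf": "abc-123"}, and the value is what goes into parse_invoice(file_id=...).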
/learn-document <path/to/file.pdf>
The ARGUMENTS passed to this skill contain the PDF file path.
Upload and parse the PDF first (upload → file_id → parse_invoice), then decide:
| Result | Action |
|---|---|
| No match | → Create workflow |
| Confidence < 1.0 | → Finetune workflow |
| Confidence = 1.0 but values wrong | → Finetune workflow |
| Confidence = 1.0, values correct | → Nothing to do |
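The decision table can be read as a small routing function. This is only an illustration: choose_workflow and its parameters are hypothetical names, and values_correct stands for the outcome of reviewing the parsed values, which parse_invoice does not return by itself.

```python
from typing import Optional


def choose_workflow(matched: bool, confidence: Optional[float], values_correct: bool) -> str:
    """Return 'create', 'finetune' or 'none' following the decision table."""
    if not matched:
        return "create"      # no parser recognised the document
    if confidence is None or confidence < 1.0:
        return "finetune"    # some key fields were not extracted
    if not values_correct:
        return "finetune"    # full confidence, but the values are wrong
    return "none"            # nothing to do
```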
| Operation | MCP Tool |
|---|---|
| Parse a PDF | parse_invoice(file_id) — upload first via HTTP |
| View parser source | get_parser_source(name) |
| Create new parser | create_parser(name, source) |
| Update existing parser | update_parser(name, source) |
| Disable a parser | disable_parser(name) |
| Re-enable a parser | enable_parser(name) |
Upload the PDF (upload script → file_id), then run parse_invoice(file_id=...). If no match, the raw extracted text is returned. Capture it.
Identify: supplier name, NIF/VAT, invoice number pattern, date format, period (if applicable), monetary values, currency, VAT exemption notes.
```python
import re

from .base import InvoiceParser


class SupplierParser(InvoiceParser):
    """Deterministic parser for Supplier invoices."""

    NIF = "123456789"

    def can_parse(self, text: str, filename: str = "") -> bool:
        t = text.lower()
        # Strip spaces before matching the NIF (OCR inserts spaces)
        return "supplier keyword" in t or self.NIF in text.replace(" ", "")

    def parse(self, text: str, filename: str = "") -> dict:
        result = self.empty_result()
        result["fornecedor"] = "Supplier"
        result["nif_fornecedor"] = self.NIF
        result["ficheiro"] = filename
        result["moeda"] = "EUR"
        warnings = []

        # Invoice number ("+" so that a marker like "n.º" is consumed whole)
        m = re.search(r"Fatura\s+n[.ºo°]+\s*(\S+)", text)
        if m:
            result["numero"] = m.group(1)
        else:
            warnings.append("numero não encontrado")

        # Date, normalised to DD-MM-YYYY
        m = re.search(r"Data[:\s]+(\d{2})[/.-](\d{2})[/.-](\d{4})", text)
        if m:
            result["data_emissao"] = f"{m.group(1)}-{m.group(2)}-{m.group(3)}"
        else:
            warnings.append("data_emissao não encontrada")

        # Subtotal, IVA, Total — adapt regex to supplier layout
        # ...

        # Confidence: share of key fields that were filled
        campos_chave = ["numero", "data_emissao", "subtotal", "total"]
        preenchidos = sum(1 for c in campos_chave if result[c] is not None)
        result["confidence"] = round(preenchidos / len(campos_chave), 2)
        result["warnings"] = warnings
        return result
```
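As a worked example of the confidence calculation: with three of the four key fields filled, the score is 3 / 4 = 0.75, which routes the document to the finetune workflow. The sketch below isolates that calculation; confidence_score and the sample values are hypothetical, and it uses .get() so that a missing key counts as unfilled.

```python
def confidence_score(result: dict, key_fields: list[str]) -> float:
    """Fraction of key fields that are not None, rounded to 2 decimals."""
    filled = sum(1 for field in key_fields if result.get(field) is not None)
    return round(filled / len(key_fields), 2)


key_fields = ["numero", "data_emissao", "subtotal", "total"]
sample = {
    "numero": "FT2024/123",      # made-up values for illustration
    "data_emissao": "15-03-2024",
    "subtotal": 100.0,
    "total": None,               # total was not extracted
}
print(confidence_score(sample, key_fields))  # 0.75
```

Because the test is "is not None", a field set to 0.0 still counts as filled, so a full score does not by itself prove the values are right.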
Use create_parser(name, source). If the parser already exists, use update_parser(name, source), which auto-archives the previous version.
Upload and run parse_invoice(file_id=...) again. Verify JSON output.
MANDATORY: Show results and ask with AskUserQuestion:
If user rejects → ask what's wrong, fix, update, re-test, ask again.
parse_invoice(file_id=...): note wrong/missing fields
get_parser_source(name)
pdftotext <file.pdf> - to dump the raw text and debug the regex
update_parser(name, source)
parse_invoice(file_id=...) to re-test
Handle PT format 1.234,56 and EN format 1,234.56 correctly
Dates: DD-MM-YYYY

```python
@staticmethod
def _parse_pt(val: str) -> float:
    """Parse PT format: 1.234,56 → float"""
    if not val or not any(c.isdigit() for c in val):
        return 0.0
    return float(val.replace(".", "").replace(",", "."))

@staticmethod
def _parse_dot(val: str) -> float:
    """Parse EN format: 1,234.56 → float"""
    # Same guard as _parse_pt: empty or non-digit input yields 0.0
    if not val or not any(c.isdigit() for c in val):
        return 0.0
    return float(val.replace(",", ""))
```
Mixed-format heuristic (OCR): a dot followed by at most 2 digits is an EN decimal point; a dot followed by more than 2 digits is a PT thousands separator.
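The heuristic can be sketched as a single helper that inspects the digits after the last dot. parse_amount is a hypothetical name, not one of the parser's methods. Two assumptions go beyond the text: a value with no dot (such as 1234,56) is treated as PT format, and amounts are non-negative, since any sign or currency symbol is stripped.

```python
import re


def parse_amount(val: str) -> float:
    """Parse an OCR'd amount that may use PT (1.234,56) or EN (1,234.56) separators."""
    cleaned = re.sub(r"[^\d.,]", "", val or "")  # keep digits and separators only
    if not any(c.isdigit() for c in cleaned):
        return 0.0
    last_dot = cleaned.rfind(".")
    if last_dot != -1:
        digits_after = re.match(r"\d*", cleaned[last_dot + 1:]).group(0)
        if len(digits_after) <= 2:
            # Dot followed by at most 2 digits: EN decimal point.
            integer_part = re.sub(r"[.,]", "", cleaned[:last_dot])
            return float(f"{integer_part or '0'}.{digits_after or '0'}")
    # Otherwise dots are PT thousands separators and the comma is the decimal mark.
    return float(cleaned.replace(".", "").replace(",", "."))
```

With this sketch, "1.234,56" and "1,234.56" both parse to 1234.56, while "1.234" is read as 1234.0 because three digits follow the dot.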
can_parse: add common OCR variants
Use text.replace(" ", "") when matching NIFs (OCR inserts spaces)
Use [\s\S]*? instead of .*? when label and value span multiple lines
Guard _parse_pt/_parse_dot against empty/non-digit values
subtotal = total
€ patterns on OCR receipts: can match the share capital (capital social)
Prefer the computed total (subtotal + iva) over the OCR'd total
IVA rates: {6, 13, 23}%
subtotal = total / 1.23
-layout flag or match values by pattern
ATCUD: keep the part after the hyphen (-). E.g. JUB5974F-000012335 → 000012335
ATCUD regex: A.?T?\s*CUD[;:\s]+([A-Za-z0-9]+)\s*-\s*(\d+)
_date_from_filename(filename) extracts the date from xxx-DDMMYYYY.pdf

```python
campos_chave = ["numero", "data_emissao", "subtotal", "total"]
preenchidos = sum(1 for c in campos_chave if result[c] is not None)
result["confidence"] = round(preenchidos / len(campos_chave), 2)
```
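Two of the tips above, the ATCUD pattern and the date taken from a filename, can be tried in isolation. This is a sketch under assumptions: atcud_sequence and date_from_filename are hypothetical stand-ins (the real _date_from_filename is not shown in this document), and the DD-MM-YYYY output format is assumed from the date handling in the parser template.

```python
import re
from datetime import datetime
from typing import Optional

# OCR-tolerant ATCUD pattern quoted in the tips above.
ATCUD_RE = re.compile(r"A.?T?\s*CUD[;:\s]+([A-Za-z0-9]+)\s*-\s*(\d+)")


def atcud_sequence(text: str) -> Optional[str]:
    """Return the digits after the hyphen of an ATCUD, or None if absent."""
    m = ATCUD_RE.search(text)
    return m.group(2) if m else None


def date_from_filename(filename: str) -> Optional[str]:
    """Extract DD-MM-YYYY from a name shaped like xxx-DDMMYYYY.pdf."""
    m = re.search(r"-(\d{2})(\d{2})(\d{4})\.pdf$", filename, re.IGNORECASE)
    if not m:
        return None
    day, month, year = m.groups()
    try:
        datetime(int(year), int(month), int(day))  # reject impossible dates
    except ValueError:
        return None
    return f"{day}-{month}-{year}"
```

For instance, atcud_sequence("ATCUD: JUB5974F-000012335") returns "000012335", and date_from_filename("fatura-15032024.pdf") returns "15-03-2024".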
npx claudepluginhub request-labs/request-plugins-marketplace --plugin supplier-invoice-service