Extracts structured FF&E product specs from PDFs such as price books, fact sheets, and spec sheets using PyMuPDF, and structures them into standardized Google Sheets, CSV, or markdown schedules.
Slash command: /06-materials-research:product-spec-pdf-parser
This skill is limited to the following tools:
Extract structured FF&E data from product PDF files — price books, fact sheets, configurator sheets, and spec sheets. Uses PyMuPDF for text extraction and Claude's reasoning to parse wildly varying PDF layouts into a standardized schedule.
The user provides PDFs in one of these ways:
.pdf files)
Also ask (or use defaults):
expand (one row per variant/SKU, default) or summarize (comma-separated variants in one row)
Products are written to the master Google Sheet using the same 33-column schema as all product skills; PDF-specific fields that have no dedicated column go in the Notes column. When writing to CSV, use the same column order.
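A minimal sketch of the two modes, assuming each parsed product is held as a plain dict with a hypothetical "variants" list. The function name and key names are illustrative, not the master schema's column names:

```python
from typing import Any, Dict, List


def to_rows(product: Dict[str, Any], mode: str = "expand") -> List[Dict[str, str]]:
    """Turn one parsed product into schedule rows.

    expand    -> one row per variant/SKU (the default)
    summarize -> a single row with all variants joined by commas
    """
    # Everything except the (hypothetical) variants list is copied onto each row
    base = {k: str(v) for k, v in product.items() if k != "variants"}
    variants = [str(v) for v in product.get("variants", [])]
    if not variants:
        return [base]
    if mode == "expand":
        return [{**base, "variant": v} for v in variants]
    if mode == "summarize":
        return [{**base, "variant": ", ".join(variants)}]
    raise ValueError(f"unknown mode: {mode!r}")
```

For a product with variants Diamond and Black, expand yields two rows while summarize yields one row whose variant reads "Diamond, Black".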
Read ../../schema/product-schema.md (relative to this SKILL.md) for the full column reference, field formats, and category vocabulary. Read ../../schema/sheet-conventions.md for CRUD patterns with MCP tools.
Skill-specific column values:
- Source: pdf-parser
- saved
PDFs contain fields that don't have dedicated master columns. Append these to Notes, using " | " as the delimiter:
- Variant: Diamond, Black
- Price adder: +$130 (PostureFit SL)
- Origin: Sweden
- Source: alphabeta-fact-sheet.pdf
Example Notes cell: Variant: Diamond, Black | Origin: Sweden | Source: alphabeta-fact-sheet.pdf
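A small helper for building the Notes cell, assuming the PDF-only fields are collected as a key/value mapping before the row is written. build_notes is a hypothetical name; it simply joins "Key: value" pairs with " | ":

```python
from typing import Mapping


def build_notes(extras: Mapping[str, str], existing: str = "") -> str:
    """Append PDF-only fields to a Notes cell as "Key: value" pairs joined by " | "."""
    # Keep any Notes text already on the row, then add the non-empty extra fields
    parts = [existing.strip()] if existing.strip() else []
    parts += [f"{key}: {value}" for key, value in extras.items() if value]
    return " | ".join(parts)
```

Called with Variant, Origin, and Source entries, it reproduces the example cell above.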
Different PDF types require different approaches:
expand vs summarize mode
Parse the user's input to identify PDF file(s) and output preferences.
.pdf files and report count
Default to expand unless the user says otherwise.
Use PyMuPDF (fitz) to extract text from each PDF. Run this Python script via Bash:
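```python
import json
import os
import sys

import fitz  # PyMuPDF

# Usage: python <script>.py path/to/file.pdf
pdf_path = sys.argv[1]
doc = fitz.open(pdf_path)
pages = []
for i, page in enumerate(doc):
    # Plain-text extraction, one entry per page (page numbers are 1-indexed)
    text = page.get_text()
    pages.append({"page": i + 1, "text": text})
doc.close()
print(json.dumps({
    "filename": os.path.basename(pdf_path),
    "total_pages": len(pages),
    "pages": pages,
}))
```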
For each PDF, extract all pages and save the JSON output.
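The page-batching rule used in the next step (all pages at once up to 20 pages, otherwise 10-page chunks) could be applied to the saved JSON roughly as follows. page_batches and load_batches are hypothetical helper names, and the JSON shape is the one printed by the extraction script:

```python
import json
from typing import Dict, List


def page_batches(pages: List[Dict], small_limit: int = 20, chunk_size: int = 10) -> List[List[Dict]]:
    """Group extracted pages: one batch if <= small_limit pages, else fixed-size chunks."""
    if len(pages) <= small_limit:
        return [pages]
    return [pages[i:i + chunk_size] for i in range(0, len(pages), chunk_size)]


def load_batches(json_path: str) -> List[List[Dict]]:
    # Reads the saved extraction output (keys: filename, total_pages, pages)
    with open(json_path, encoding="utf-8") as f:
        data = json.load(f)
    return page_batches(data["pages"])
```

A 25-page price book, for example, becomes three batches of 10, 10, and 5 pages.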
Read the extracted text and identify all products, variants, and specifications. This is the core intelligence step — Claude reasons over the text to structure it.
For small PDFs (≤20 pages): Process all pages at once.
For large PDFs (>20 pages): Process in chunks of 10 pages at a time. After each chunk:
Parsing instructions:
Show a summary markdown table with the parsed products. Include:
Ask: "Does this look correct? Should I adjust anything before saving?"
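For the review table, a sketch of rendering parsed rows as markdown. Which columns to show is left to the caller; the function name and the example column names are hypothetical:

```python
from typing import Dict, List, Sequence


def markdown_table(rows: List[Dict[str, str]], columns: Sequence[str]) -> str:
    """Render parsed rows as a markdown table for user review."""
    def cell(value: object) -> str:
        # Escape pipes and flatten newlines so a value cannot break the table layout
        return str(value).replace("|", "\\|").replace("\n", " ")

    lines = ["| " + " | ".join(columns) + " |",
             "| " + " | ".join("---" for _ in columns) + " |"]
    for row in rows:
        # Missing fields render as empty cells
        lines.append("| " + " | ".join(cell(row.get(c, "")) for c in columns) + " |")
    return "\n".join(lines)
```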
Ask the user (if not already specified): "Where should I save this?"
Options:
./ffe-pdf-parse-YYYY-MM-DD.csv)
When saving to CSV, use the CSV header from ../../schema/product-schema.md.
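A sketch of the CSV option using Python's csv module, assuming the header list has already been read from ../../schema/product-schema.md. The function name and the example column names are hypothetical:

```python
import csv
from datetime import date
from typing import Dict, List, Optional, Sequence


def write_schedule_csv(rows: List[Dict[str, str]], header: Sequence[str],
                       path: Optional[str] = None) -> str:
    """Write rows in schema column order; defaults to ./ffe-pdf-parse-YYYY-MM-DD.csv."""
    if path is None:
        path = f"./ffe-pdf-parse-{date.today().isoformat()}.csv"
    with open(path, "w", newline="", encoding="utf-8") as f:
        # Missing keys are written as empty strings (restval); keys not in the
        # header raise ValueError (DictWriter's default extrasaction="raise")
        writer = csv.DictWriter(f, fieldnames=list(header), restval="")
        writer.writeheader()
        writer.writerows(rows)
    return path
```

Raising on unknown keys is deliberate here: a misspelled column name fails loudly instead of silently dropping data.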
Append rows to the master Google Sheet using the same 33-column schema. Set Clipped At to the current timestamp and Source to pdf-parser. PDF-specific data (variant, price adder, country of origin, source filename) goes in the Notes column.
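The per-row stamping described above might look like this sketch. The ISO 8601 UTC timestamp format is an assumption (the schema file defines the actual field format), finalize_row is a hypothetical name, and the Google Sheets append itself is left to the MCP tools:

```python
from datetime import datetime, timezone
from typing import Dict, Mapping


def finalize_row(row: Dict[str, str], pdf_extras: Mapping[str, str]) -> Dict[str, str]:
    """Set the skill-specific columns on a row before it is appended to the master sheet."""
    out = dict(row)
    # Assumed format: ISO 8601 in UTC, e.g. 2024-01-01T12:00:00+00:00
    out["Clipped At"] = datetime.now(timezone.utc).isoformat(timespec="seconds")
    out["Source"] = "pdf-parser"
    # PDF-only fields (variant, price adder, origin, source file) go into Notes, " | "-delimited
    extras = [f"{k}: {v}" for k, v in pdf_extras.items() if v]
    existing = out.get("Notes", "").strip()
    out["Notes"] = " | ".join(([existing] if existing else []) + extras)
    return out
```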
After processing, always report:
Parsed: X products from Y PDF(s)
- filename.pdf: N products extracted
- filename2.pdf: M products extracted
Issues: [list any problems]
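A sketch that formats this report from per-file product counts. The function name and the "none" placeholder for an empty issue list are illustrative choices:

```python
from typing import Dict, List


def parse_report(counts: Dict[str, int], issues: List[str]) -> str:
    """Format the end-of-run summary: totals, per-file counts, and issues."""
    total = sum(counts.values())
    lines = [f"Parsed: {total} products from {len(counts)} PDF(s)"]
    # One line per input file, in the order the files were processed
    lines += [f"- {name}: {n} products extracted" for name, n in counts.items()]
    lines.append("Issues: " + ("; ".join(issues) if issues else "none"))
    return "\n".join(lines)
```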
npx claudepluginhub alpacalabsllc/skills-for-architects --plugin 06-materials-research