From data-liberation
Front door for website migration: detects platform, inventories content, then asks the user which reconstruct path to take (blocks+products or theme replication) before extracting and dispatching the matching sub-skill.
How this skill is triggered — by the user, by Claude, or both
Slash command
/data-liberation:liberateThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
The single front door for the whole migration pipeline. It captures the site **once** (detect → discover → extract → capture → products), then asks which reconstruct path to take and **dispatches the matching sub-skill inline** (shared context):
The single front door for the whole migration pipeline. It captures the site once (detect → discover → extract → capture → products), then asks which reconstruct path to take and dispatches the matching sub-skill inline (shared context):
replicate-with-blocks — project the source onto editable WordPress core blocks + WooCommerce. Best launchpad for a redesign.replicate-theme — carry the source markup near-verbatim + scope its own CSS into a high-fidelity, non-block-editable theme.Each sub-skill owns its own reconstruct → install → QA → report; this skill owns capture + the path decision. Idempotent: re-running /liberate <url> on an already-captured site skips straight to the path question (so you can try the other path later with zero re-capture).
Headless extraction-only (CI/batch): data-liberation <url> runs steps 1–5 (capture only). The reconstruct (blocks or theme) is agent-only, via the dispatched sub-skill.
/liberate <url> ── front door, shared context ────────────────────────────────
│
├─ idempotent check: extraction already on disk? (.discovery-complete / session.json stage / output.wxr + html/* + manifest.json)
│ ├─ YES → load cached inventory ──────────────────────────────────────┐
│ └─ NO → 1 detect → discover platform · sitemap · features · archetype inventory (CHEAP)
│ │
│ ┌────────────────────────────────────────────────────────────────────────┘
│ ▼
├─ CONFIRM + PATH CHECKPOINT ◀── MANDATORY HARD STOP, BEFORE EXTRACTION — never skip, never auto-select
│ show discovery inventory + scope/cost estimate + a platform-informed recommendation,
│ then ALWAYS AskUserQuestion: blocks+products vs theme replication.
│ The operator's answer is the ONLY thing that authorizes the rest of the run.
│ Nothing expensive (extract/capture) runs until they have answered.
│ ▼
├─ EXTRACTION (deterministic — only after the path is chosen):
│ 2 extract pages/posts/products content + media refs
│ 3 media: dedup+upload → uploaded WP-library URLs (reused downstream)
│ 4 capture desktop+mobile screenshots · palette/type/breakpoints · html/<slug>.html
│ 5 products → products.csv WooCommerce import format
│ ▼
└─ RECONSTRUCT — dispatch the chosen sub-skill INLINE (shared context):
blocks+products → replicate-with-blocks (core blocks + WooCommerce + QA ladder → run-report.json)
theme → replicate-theme (carry-and-scope islands + scoped CSS → compare → run-report-carry.json)
Capture is still shared across both paths, so the "try the other path with zero re-capture" property holds: re-running /liberate <url> after a full run hits the idempotent check, loads the cached inventory, and lands you straight back on the path question.
Each sub-skill owns its own reconstruct → install → QA → report, plus its own budget guard (checkBudget in src/lib/replicate/budget-guard.ts) and run-report (buildRunReport in src/lib/replicate/run-report.ts). This skill's deliverable is the captured output directory + the dispatch; the chosen sub-skill produces the replica + its run-report*.json.
Call liberate_paths({ url }) to resolve the output directory (siteDir). Do not hardcode output/<site>/ relative to cwd — the default output base is now ~/Studio/_liberations/<host>. If extraction is already complete — any of .discovery-complete, a session.json stage past extraction, or all of output.wxr + html/*.html + screenshots/manifest.json present in the resolved siteDir — skip Steps 1 and 3–6, load the cached inventory (session.json / discovery output), and jump straight to the Step 2 — Confirm + path checkpoint below. Otherwise run Step 1, hit the checkpoint, then run Steps 3–6. (For a partial capture, prefer resume: true; see Resuming.)
liberate_detect to identify the platform.liberate_discover to inventory the site. Show the counts and platform features to the operator.
platformFeatures flags: stores, bookings, forms, members areas, scheduling, forums, events.transferable: true (e.g. stores) are handled during extraction.transferable: false include a wpRecommendation (suggested WP plugin).This checkpoint is NOT optional and NOT skippable, and it fires RIGHT HERE — immediately after discovery, before extraction/capture. You ask while the operator is still present and paying attention; that is the whole point of placing it before the long deterministic extraction, not after. You MUST stop and ask the operator to choose the reconstruct path via AskUserQuestion before running Step 3 (extract) or anything downstream. There is no "default path." You do not auto-select on the operator's behalf, no matter how strong the inventory signal is — a recommendation is a hint inside the question, never a decision. Starting extraction (or dispatching a sub-skill) without having asked this question is a defect. The only thing that authorizes the rest of the run is the operator's answer to this AskUserQuestion.
Red flags — if you catch yourself thinking any of these, STOP and ask:
Show the discovery inventory (pages · archetypes · products · platform features) and a scope/cost/time estimate. Make a platform-informed recommendation (mark it (Recommended) as the first option), then call AskUserQuestion with these two options:
Recommendation examples: a store with many products → recommend (1); a fixed-layout Wix marketing site, no store, where pixel-fidelity matters → recommend (2). The operator's selection is the sole go-ahead (this replaces the old proceed/confirm gate). Only after they answer do you run Steps 3–6 (extraction/capture) and then Dispatch.
Call liberate_extract with an appropriate outputDir. Narrate per-URL progress.
0 pages: "No extractable pages found at <url>. The site may be behind auth or bot-protection — try CDP/admin extraction (/diagnose)." Stop.
Media references are deduped and uploaded to the WP media library. Uploaded URLs are the canonical media references used everywhere downstream (specs, templates, post_content).
Runs automatically during or after extract: desktop+mobile screenshots, palette.json / typography.json / breakpoints.json, and html/<slug>.html per URL. Clustering in the blocks path runs off the already-saved html/<slug>.html — no re-navigation.
Capture scope depends on the path you just chose (a second reason the checkpoint fires before capture, not after):
html/<slug>.html and live-fetches the rest on a cache miss, so partial capture is fine. Sample across archetypes (homepage + a few of each kind), not the first N sitemap URLs — and make sure the homepage is in the sample.html/<slug>.html, so capture full HTML for every page in the carry set up front. A sample forces a re-capture once replicate-theme runs. For a blog-dominant site where the theme replica carries only the custom/marketing pages (posts/news staying native), capture those custom pages in full and skip per-post capture — but you must know that before capturing, which is why the path is chosen first.Default concurrency: 6. Configure via --screenshots-concurrency N or --concurrency N.
If products were extracted, compile products.jsonl → products.csv (WooCommerce import format). Report: "Also extracted N products → products.csv."
Both reconstruct sub-skills are disable-model-invocation: true by design — they only ever run from this front door (post-capture), never spontaneously. That means the Skill tool cannot invoke them: a Skill({ skill: 'replicate-theme' }) call is rejected with cannot be used with Skill tool due to disable-model-invocation. So dispatch = Read the chosen sub-skill's SKILL.md and execute its workflow inline in this same shared context (each sub-skill reads the resolved output directory from disk — use the siteDir returned by liberate_paths — and owns its own install → QA → report):
skills/replicate-with-blocks/SKILL.mdskills/replicate-theme/SKILL.mdThe reconstruct phase (clustering, foundations, theme, build, validate, install, visual-QA for blocks; carry-and-scope + compare for theme) lives entirely in the dispatched sub-skill — this front door ends here.
| Stage | State | Response |
|---|---|---|
| extraction | 0 pages | Stop + "No extractable pages found at <url>. Try CDP/admin extraction (/diagnose)." |
| extraction | adapter fail | Log + pointer to /diagnose |
| checkpoint | operator picks a path | Dispatch the chosen sub-skill (replicate-with-blocks / replicate-theme) inline |
| reconstruct | gate fail · clusters failed · QA divergence · budget ceiling | Owned by the dispatched sub-skill — see its SKILL.md (replicate-with-blocks's validate-artifacts + QA-ladder gates + budget guard; replicate-theme's parity compare) |
Progress is the agent's own narration — no Ink TUI in agent mode. The headless extraction CLI keeps its existing Ink surfaces (discover.tsx, screenshot.tsx).
Each reconstruct path emits its own report — run-report.json from replicate-with-blocks, run-report-carry.json from replicate-theme (each carries a mode field). The blocks-path run-report.json is verdict-first; read top-down to answer "is this good?":
verdict — overall ✓ / ⚠ / ✗ + per-archetype.summary — clusters built/failed · pages composed/misfit · responsive pass/fail · sections divergent/accepted · pages unverified · provenance flags · fallback/low-confidence pages · est. cost/usage.details[] — per-cluster + per-page status, gate results, QA notes, operator-accepted divergences (with proof).The theme-path run-report-carry.json is parity-compare shaped — see replicate-theme.
If the user asks to resume (e.g. "resume", "continue", "it crashed"):
outputDir is derived from it.liberate_extract with resume: true for extraction; session.json tracks stage so capture resumes where it stopped. Reconstruct resume (per-cluster build status, etc.) is the chosen sub-skill's concern..discovery-complete exists), skip straight to the Step 2 — Confirm + path checkpoint (the idempotent path) and offer to run a reconstruct path. If only discovery completed and the run stopped before the checkpoint, re-run discovery (cheap) and ask the path question — extraction must not start until the operator has chosen.The resume flag causes extraction to:
extraction-log.jsonl)These guarantees are enforced by the reconstruct sub-skills (mainly replicate-with-blocks's validate-artifacts + QA gates; alt-text + copyright apply to both paths):
columns/group) and is flagged in the run-report — never silently forced into a wrong-specific template.compose-page-blocks are labeled low-confidence / fallback in the run-report.style.css "Benchmark reference only — not for publication." header.If you encounter something notable during extraction — a new API endpoint, a platform quirk, a workaround for blocked content, a better extraction technique — add an entry to DISCOVERIES.md at the top of the repo.
After extraction completes, always run liberate_verify on the output directory. This checks:
Report the verification results and flag anything that needs attention before importing.
Squarespace sites benefit significantly from admin extraction via CDP. Without it, you only get public content — no drafts, no unlisted pages, and Squarespace 7.1 fluid engine sites often return empty content from the ?format=json API.
Guide the user through admin setup:
google-chrome --remote-debugging-port=9222
(On macOS: /Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222)--cdp-port 9222 (CLI) or cdpPort: 9222 (MCP).The admin session gives the adapter access to:
__NEXT_DATA__ hydration payloads on 7.1 sitesAlways offer CDP-based extraction for Squarespace. Public-only extraction works but produces lower quality results.
Extraction uses Playwright (headless browser) to intercept Wix's internal API calls and extract window globals. This is slower but captures content that isn't available via HTTP alone. Large sites may take several minutes.
Webflow requires a Webflow API token. Ask the user for their token and pass it via --token (CLI) or the token parameter (MCP).
Shopify has two extraction tiers. Always offer the richer one first and fall back only if the user can't produce an Admin API token.
Tier 1 — Public JSON API (no credentials)
Works for any public Shopify storefront. Pulls pages, blog posts, and products via the public /pages.json, /blogs.json, and /products.json endpoints plus HTML fallback for theme-rendered content. No token needed. Product data is limited to what the public API exposes — you lose compareAtPrice sale semantics, real stock policy, cost of goods, variant images, and collections.
Tier 2 — Admin GraphQL (richer product data)
When the user has admin access to their store, offer to use Shopify's Admin GraphQL API. This yields:
compareAtPrice → proper sale/regular price mapping on simple + variable productsinventoryPolicy + inventoryItem.tracked → real stock status (oversell-aware)inventoryItem.unitCost → cost of goods written to meta:_wc_cog_costinventoryItem.measurement.weight → unit-normalized weight (kg)meta:_yoast_wpseo_title / _yoast_wpseo_metadesc)Guide the user through admin setup:
read_products (required)read_inventory (for cost-of-goods + stock)read_online_store_pages / read_online_store_navigation (for pages)read_content (for blog articles)adminToken (MCP) or via the adapter opts. You do not need to ask the user for the shop domain — liberate_discover auto-detects the *.myshopify.com hostname from the storefront HTML (Shopify.shop JS global) and stores it as inventory.shopDomain, even for sites served on custom domains.When to use which tier:
shop.brand.com) → Tier 2 still works because of auto-detection; do NOT ask them for the myshopify.com subdomain manually unless the detector failedIf liberate_discover did not populate inventory.shopDomain (rare — the site may be behind Cloudflare or heavy bot protection that blocks HTML fetch), ask the user directly:
"I couldn't auto-detect the myshopify.com subdomain. Can you paste the URL you see when you log into your Shopify admin? It looks like https://admin.shopify.com/store/<name> — the <name> is what I need."
Pass the admin-resolved value as shopDomain alongside adminToken.
GraphQL failures fall back to Tier 1 automatically — if the token is wrong or the scopes are insufficient, the adapter logs a warning and continues with the public JSON path, so the user's extraction still produces output.
Public-crawl adapter for GoDaddy's legacy Websites & Marketing platform (also called "Go Daddy Website Builder" in page sources). Not to be confused with the newer Airo AI Builder.
GoDaddy offers no data export from W+M — this adapter rescues content by crawling the public site. Detection looks for the Go Daddy Website Builder generator meta tag, the img1.wsimg.com/isteam/ CDN pattern, and the X-SiteId header.
Discovery fetches the three standard W+M sub-sitemaps individually so blog posts can be tagged precisely (W+M's /news,-updates/f/<slug> URL shape doesn't match the generic classifier):
sitemap.website.xml — pagessitemap.blog.xml — blog postssitemap.ols.xml — products (v1.1, not yet implemented)Blog post bodies are hydrated client-side from a window._BLOG_DATA JSON blob. The adapter parses this blob and converts the Draft.js ContentState (post.fullContent) into HTML — preserving paragraphs, headings, lists, blockquotes, code blocks, links, and images. Title, publish date, categories, and featured image are also pulled from _BLOG_DATA rather than HTML meta tags (higher fidelity).
Pages use DOM-based extraction: strip HEADER_SECTION, FOOTER_*, cookie banners, and the first-section title/image widgets (*_SECTION_TITLE_RENDERED, *_IMAGE_RENDERED0) which would otherwise duplicate the <wp:post_title> and media attachment.
v1 limitations: No GoDaddy Online Store (OLS) product extraction yet — sites with a store are flagged, but products need a real store URL for testing before v1.1 ships.
products.csv (WooCommerce format) and products.jsonl are also produced.liberate_extract's contentStatus default ('draft'). When building a replica/preview (the design phase — a Studio replica whose nav must resolve), pass contentStatus: 'publish' to liberate_extract/liberate_extract_one so imported pages/posts are live instead of 404ing. Attachments always use WP's inherit regardless.importAuthors: true to create WP user accounts per author, or importAuthors: false (default) to assign all content to the authenticated user. Ask before importing.liberate_setup first, then call liberate_import with REST API credentials. If the environment provides an import skill (e.g. import-liberated-data), use delegate: true with both liberate_setup and liberate_import.npx claudepluginhub automattic/data-liberation-agent --plugin data-liberationExports WordPress pages, posts, and custom posts to local portable packages with builder data, media, and markdown. Imports to another site with ID remapping and auto-backup before edits.
Rebuilds a WordPress block theme from an extracted site's HTML, preserving layout, structure, and visual design via spec-driven reconstruction and pattern generation.
Executes Webflow CMS migrations from WordPress, Contentful, Strapi, CSV/JSON using Data API v2 bulk endpoints, data mapping, validation, and strangler fig pattern.