From cheerio-consistency
Write, review, refactor, or debug Node.js code that parses HTML with Cheerio (cheerio.load, $ selections, scraping, server-side DOM manipulation) using one canonical idiom set. Use this skill whenever code extracts data from HTML strings in Node, builds a scraper on fetched pages, transforms markup server-side, or when the user hits "$(...).click is not a function", .map returning a cheerio object instead of an array, undefined attr results, ESM/CJS import confusion, or expects JavaScript on the page to run. Trigger it even when the user just says "parse this HTML in Node" or "get all the links from this page" — without saying the word "Cheerio."
Slash command
/cheerio-consistency:cheerio-consistency
Cheerio is stable, but it occupies a confusing niche: a jQuery-like API over a static
parse tree, with no browser underneath. Generated code drifts across that boundary —
calling .click(), waiting for JS-rendered content, mishandling the raw elements that
each/map callbacks receive, and tripping over the 1.0 ESM/CJS import split. This skill
pins the canonical idiom set for Cheerio 1.x.
| Always | Never | Why |
|---|---|---|
| `import * as cheerio from "cheerio"` then `const $ = cheerio.load(html)` | `const $ = require("cheerio")(html)` default-call style | There is no default export; `load` is the entry point in both module systems. |
| treat the tree as static data | `.click()`, `.trigger()`, waiting for content | No events, no rendering, no script execution exist; JS-rendered content needs a browser tool. |
| rewrap in callbacks: `$items.each((i, el) => $(el).find(...))` | calling cheerio methods on the bare `el` | Callbacks receive raw parse-tree nodes; only `$(el)` has the API. |
| `$items.map((i, el) => $(el).text()).get()` | forgetting `.get()` (or `.toArray()`) | `.map` returns a cheerio collection; `.get()` unwraps to a plain array. |
| `$el.attr("href") ?? fallback` | assuming `attr` returns `null` | Missing attributes yield `undefined`; empty selections too. |
| `$el.text()` for all matched text, `$el.html()` knowingly first-element-only inner HTML | confusing the two | `.text()` concatenates across the set; `.html()` reads only the first match. |
| scope with `$container.find("a")` or `$("a", container)` | re-querying the whole doc inside loops | Context queries express intent and avoid cross-card bleed. |
| `$.html()` to serialize the document | `$("html").html()` reconstruction | `$.html()` is the canonical full-document serializer. |
| `cheerio.load(xml, { xml: true })` for XML | parsing XML in HTML mode | HTML mode lowercases tags and "fixes" structure, so XPath-ish queries silently miss. |
| check `$sel.length` before single-element logic | `$sel.first().text() === ""` ambiguity | Empty selections don't throw; they return empty strings/`undefined` that flow onward. |

House style for a scrape:
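```js
import * as cheerio from "cheerio";

// `html` is the fetched page markup (a string).
const $ = cheerio.load(html);

const products = $("ul.products > li.product")
  .map((_, el) => {
    // `el` is a raw node; rewrap it and scope every query to this card.
    const $card = $(el);
    return {
      name: $card.find("a.title").text().trim(),
      url: $card.find("a.title").attr("href"),
      price: $card.find(".price").text().trim(),
      sku: $card.attr("data-sku"),
    };
  })
  .get(); // unwrap the cheerio collection into a plain array
```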
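
The record above stores `href` exactly as it appears in the markup and only trims text. This document also recommends resolving URLs with `new URL(href, baseUrl).href` and normalizing whitespace per field, so a minimal sketch of two post-processing helpers follows. The names `absoluteUrl` and `cleanText` and the example base URL are hypothetical, not part of the original house style.

```javascript
// Hypothetical helpers; the names and the example base URL are illustrative only.

// Resolve a possibly relative href against the page URL.
// A missing attribute (undefined) or an unparseable value stays undefined.
function absoluteUrl(href, baseUrl) {
  if (href === undefined) return undefined;
  try {
    return new URL(href, baseUrl).href;
  } catch {
    return undefined;
  }
}

// Collapse runs of whitespace (newlines, tabs, indentation) into single spaces.
function cleanText(text) {
  return text.replace(/\s+/g, " ").trim();
}

const baseUrl = "https://shop.example/catalog/page-2";
const resolved = absoluteUrl("/item/42?ref=list", baseUrl);
const alreadyAbsolute = absoluteUrl("https://cdn.example/x.png", baseUrl);
const cleaned = cleanText("  Blue\n      Widget \t");
```

In the scrape above, the `url` field could then become `absoluteUrl($card.find("a.title").attr("href"), pageUrl)`, where `pageUrl` is assumed to be the URL the markup was fetched from.
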
- When the fetched markup lacks the data (SPA shells), every selector "works" and returns nothing. Inspect the raw string first; switch to Playwright/Puppeteer when the data is rendered client-side.
- `.map` without `.get()` passes a cheerio object to `JSON.stringify`/array code, often serializing as `{}` instead of erroring.
- `.each` early exit is `return false` (jQuery convention), not `break`.
- `:visible` and `:hidden` don't exist (no layout); `:contains("text")` is supported; selector matching is case-sensitive in XML mode.
- `.text()` includes `<script>`/`<style>` contents when present in the subtree; remove them (`$("script, style").remove()`) before whole-page text extraction.
- Resolve relative hrefs with `new URL(href, baseUrl).href` before storing.
- Entities are decoded when reading (`&amp;` → `&`); when serializing back out, cheerio re-encodes, so don't double-decode with extra libraries.
- `.text()` preserves source whitespace; `.trim()`/normalize per field, or join structured pieces deliberately rather than splitting big text blobs.

Target Cheerio 1.x stable (1.0+). The 1.0 release (after the long rc series)
reorganized entry points: named exports (load, fromURL), ESM+CJS dual support, parse5
default HTML parsing with htmlparser2 available via xml/options. Code from the
0.x/rc era (require("cheerio").load, default-import styles) mostly still runs in CJS —
but write the modern form. fromURL exists for convenience fetching; in scrapers prefer
explicit fetch + load so headers/errors are yours.
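
As a rough sketch of the explicit fetch step, assuming Node 18+ where `fetch` is a global: the helper name `fetchHtml`, the injectable `fetchImpl` option, the user-agent string, and the error message format are all hypothetical choices for illustration.

```javascript
// Hypothetical helper: fetch a page explicitly so headers and failures stay visible.
// `fetchImpl` defaults to the global fetch and can be swapped out in tests.
async function fetchHtml(url, { fetchImpl = fetch, headers = {} } = {}) {
  const res = await fetchImpl(url, {
    // Placeholder user-agent; caller-supplied headers override it.
    headers: { "user-agent": "example-scraper/0.1", ...headers },
  });
  if (!res.ok) {
    // Surface HTTP failures instead of parsing an error page as data.
    throw new Error(`GET ${url} failed with status ${res.status}`);
  }
  return res.text();
}
```

The returned string is what gets passed to `cheerio.load(html)`; keeping the fetch separate also makes it easy to inspect the raw markup before writing selectors.
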
- `load` once per document; select with CSS, scope per item card, rewrap raw elements.
- Unwrap results to plain arrays (`.map(...).get()`); resolve URLs; trim text fields.
- Mutate with the manipulation methods (`addClass`, `attr`, `remove`, `replaceWith`), then serialize with `$.html()`.
- Check for bare `el` method calls, missing `.get()`, doc-wide queries in loops, XML parsed in HTML mode, unresolved relative URLs.

For the selector-support matrix, traversal/manipulation API reference, and load-option
details, read references/cheerio-patterns.md.
npx claudepluginhub guidogl/cheerio-consistency --plugin cheerio-consistency