From firecrawl-pack
Scrape single pages or crawl sites into LLM-ready markdown via Firecrawl JS library. Handles sync/async jobs, depth limits, path filters, JS rendering.
How this skill is triggered — by the user, by Claude, or both
Slash command
/firecrawl-pack:firecrawl-core-workflow-aThis skill is limited to the following tools:
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
Primary workflow for Firecrawl: convert websites into clean LLM-ready markdown. Covers single-page scraping with `scrapeUrl`, multi-page crawling with `crawlUrl`, async crawl jobs with polling, and content processing pipelines.
Primary workflow for Firecrawl: convert websites into clean LLM-ready markdown. Covers single-page scraping with scrapeUrl, multi-page crawling with crawlUrl, async crawl jobs with polling, and content processing pipelines.
@mendable/firecrawl-js installedFIRECRAWL_API_KEY environment variable setimport FirecrawlApp from "@mendable/firecrawl-js";
const firecrawl = new FirecrawlApp({
apiKey: process.env.FIRECRAWL_API_KEY!,
});
// Scrape a single page to clean markdown
const result = await firecrawl.scrapeUrl("https://docs.example.com/api", {
formats: ["markdown"],
onlyMainContent: true, // strips nav, footer, sidebars
waitFor: 2000, // wait 2s for JS to render
});
if (result.success) {
console.log("Title:", result.metadata?.title);
console.log("Source:", result.metadata?.sourceURL);
console.log("Markdown:", result.markdown?.substring(0, 200));
}
// Crawl a site — Firecrawl follows links, renders JS, returns all pages
const crawlResult = await firecrawl.crawlUrl("https://docs.example.com", {
limit: 50, // max pages to crawl
maxDepth: 3, // link depth from start URL
includePaths: ["/docs/*", "/api/*"], // only these paths
excludePaths: ["/blog/*", "/changelog/*"],
allowBackwardLinks: false, // only crawl child paths
scrapeOptions: {
formats: ["markdown"],
onlyMainContent: true,
},
});
console.log(`Crawled ${crawlResult.data?.length} pages`);
for (const page of crawlResult.data || []) {
console.log(` ${page.metadata?.sourceURL}: ${page.markdown?.length} chars`);
}
// Start an async crawl job — returns immediately with job ID
const job = await firecrawl.asyncCrawlUrl("https://docs.example.com", {
limit: 500,
scrapeOptions: { formats: ["markdown"] },
});
console.log(`Crawl started: ${job.id}`);
// Poll for completion with backoff
let pollInterval = 2000;
let status = await firecrawl.checkCrawlStatus(job.id);
while (status.status === "scraping") {
console.log(`Progress: ${status.completed}/${status.total} pages`);
await new Promise(r => setTimeout(r, pollInterval));
pollInterval = Math.min(pollInterval * 1.5, 30000);
status = await firecrawl.checkCrawlStatus(job.id);
}
if (status.status === "completed") {
console.log(`Done: ${status.data?.length} pages scraped`);
} else {
console.error("Crawl failed:", status.error);
}
import { writeFileSync, mkdirSync } from "fs";
function processResults(pages: any[], outputDir: string) {
mkdirSync(outputDir, { recursive: true });
const manifest = pages.map((page, i) => {
const url = page.metadata?.sourceURL || `page-${i}`;
const slug = new URL(url).pathname
.replace(/\//g, "_")
.replace(/^_|_$/g, "") || "index";
const filename = `${slug}.md`;
// Clean markdown: collapse whitespace, remove JS links
const content = (page.markdown || "")
.replace(/\n{3,}/g, "\n\n")
.replace(/\[.*?\]\(javascript:.*?\)/g, "")
.trim();
writeFileSync(`${outputDir}/${filename}`, content);
return { url, filename, chars: content.length };
});
writeFileSync(`${outputDir}/manifest.json`, JSON.stringify(manifest, null, 2));
return manifest;
}
manifest.json with URL-to-file mapping| Error | Cause | Solution |
|---|---|---|
Empty markdown | JS content not rendered | Increase waitFor to 5000ms |
429 Too Many Requests | Rate limit hit | Back off, reduce concurrency |
| Crawl returns few pages | URL filters too strict | Widen includePaths patterns |
402 Payment Required | Credits exhausted | Check balance, reduce limit |
| Partial crawl results | Site blocks bot on some pages | Use scrapeUrl for failed URLs individually |
const result = await firecrawl.scrapeUrl("https://example.com", {
formats: ["markdown", "html", "links"],
onlyMainContent: true,
});
console.log("Markdown:", result.markdown?.length);
console.log("HTML:", result.html?.length);
console.log("Links:", result.links?.length);
const job = await firecrawl.asyncCrawlUrl("https://docs.example.com", {
limit: 100,
scrapeOptions: { formats: ["markdown"] },
webhook: {
url: "https://api.yourapp.com/webhooks/firecrawl",
events: ["completed", "page"],
},
});
console.log(`Crawl ${job.id} started — webhook will fire on completion`);
For structured data extraction, see firecrawl-core-workflow-b.
npx claudepluginhub jeremylongshore/claude-code-plugins-plus-skills --plugin firecrawl-packScrapes single pages or crawls sites using Firecrawl v2.5 API to LLM-ready markdown and structured data. Handles JS rendering, bot bypass, browser automation for dynamic content extraction.
Scrapes URLs to markdown/HTML/JSON, crawls websites for multi-page extraction, searches the web, maps sites, and extracts structured data using Firecrawl MCP tools.
Scrapes web pages and websites using Firecrawl API, converting to clean markdown. Handles JavaScript rendering, anti-bot protection, paywalled content, and dynamic sites for articles, blogs, docs.