From foundation
User-viewable, user-invocable guidance for searching and reading Unstructured Foundation documents. Use when the user asks to find, filter, summarize, or retrieve processed files from connected sources.
Slash command
/foundation:search
This skill is user-viewable and user-invocable. Use it as the retrieval workflow for natural-language requests about searchable documents in Unstructured Foundation.
Keep this file safe for users to read. Prefer product-facing language, avoid exposing private implementation notes, and mention raw IDs only when they are needed for a developer/debugging request.
If the user asks whether files are ready, what is searchable, or why search has no results, call pipeline_processing_status before making a claim.
Map status to the next step:
- no_sources: no sources are connected yet; suggest /foundation:connect.
- never_run: sources are connected, but files have not been processed yet.
- running: files are still being processed; include progress counts when available.
- failed: processing hit a problem; summarize the status without retrying automatically.
- ready: documents are searchable; continue with search or retrieval.
When a search returns no useful matches, check readiness unless this was already done in the same flow. Do not tell the user that documents do not exist until the relevant processed data is ready.
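The status mapping above can be sketched as a small lookup. This is an illustration only: `next_step` and `can_search` are hypothetical helper names, and the sketch assumes the status arrives as one of the five plain strings listed above, which the source does not confirm about the actual response shape of pipeline_processing_status.

```python
# Hypothetical helpers; not part of the Foundation tools.
# Assumes the readiness status is available as a plain string.
NEXT_STEP = {
    "no_sources": "No sources are connected yet; suggest /foundation:connect.",
    "never_run": "Sources are connected, but files have not been processed yet.",
    "running": "Files are still being processed; include progress counts when available.",
    "failed": "Processing hit a problem; summarize the status without retrying automatically.",
    "ready": "Documents are searchable; continue with search or retrieval.",
}


def next_step(status: str) -> str:
    """Return the guidance for a readiness status, or raise on an unknown value."""
    try:
        return NEXT_STEP[status]
    except KeyError:
        raise ValueError(f"unrecognized status: {status!r}") from None


def can_search(status: str) -> bool:
    """Only the 'ready' status means documents are searchable."""
    return status == "ready"
```

Raising on an unknown status, rather than guessing, keeps the caller from making a claim about readiness that the status does not support.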
asset_query
Use asset_query to find documents. Required parameters are:
- text: the user's query.
- search_in: a list containing exactly one current search surface.
Current search_in values:
- document_summary: generated document summaries.
- document_title: document names or titles.
- document_text: full processed document text.
- topics: generated topic labels.
- ner: generated entity labels.
Examples:
asset_query(text="quarterly revenue", search_in=["document_text"])
asset_query(text="vendor contracts", search_in=["document_summary"])
asset_query(text="Acme Corp", search_in=["ner"])
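The two required parameters can be checked before a call is made. The sketch below is a hedged illustration: `asset_query_args` is a hypothetical helper, and it only assembles and validates arguments against the rules stated above; it does not call the real tool.

```python
# Hypothetical argument builder; the surface names come from the list above.
SEARCH_SURFACES = (
    "document_summary",
    "document_title",
    "document_text",
    "topics",
    "ner",
)


def asset_query_args(text: str, search_in: list) -> dict:
    """Validate the required asset_query parameters and return them as a dict."""
    if not text:
        raise ValueError("text is required")
    if len(search_in) != 1:
        raise ValueError("search_in must contain exactly one search surface")
    if search_in[0] not in SEARCH_SURFACES:
        raise ValueError(f"unknown search surface: {search_in[0]!r}")
    return {"text": text, "search_in": list(search_in)}
```

For example, `asset_query_args("Acme Corp", ["ner"])` passes, while a list with two surfaces is rejected.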
Choose the surface by intent:
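- document_text: the request is about content in the full processed text.
- document_title: the request is about a document name or title.
- document_summary: the request is about what a document covers, as captured in its generated summary.
- topics: the request is about a generated topic label.
- ner: the request names an entity, such as Acme Corp.
Query syntax: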
- Keywords: vendor contracts.
- Boolean operators: revenue AND forecast or contract OR agreement.
- Wildcard: acqui*.
- Exact phrase: "master services agreement".
- Use text="*" when the user wants filtered browsing rather than a keyword search, such as all PDFs from a data source or all documents modified since a date.
Useful optional filters include lineage_data_source, mime_types, created_at, modified_at, first_seen_at, last_materialized_at, metadata_filters, limit, offset, and strict.
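The query forms and filter names above can be assembled with a few string helpers. This is a minimal sketch under stated assumptions: every function name here is hypothetical, the helpers only build strings and dictionaries, and the value format of each filter (for example, that limit takes an integer) is assumed rather than confirmed by the source.

```python
# Hypothetical helpers for composing asset_query text and optional filters.
BROWSE_ALL = "*"  # text="*" means filtered browsing, not a keyword search

OPTIONAL_FILTERS = frozenset({
    "lineage_data_source", "mime_types", "created_at", "modified_at",
    "first_seen_at", "last_materialized_at", "metadata_filters",
    "limit", "offset", "strict",
})


def phrase(words: str) -> str:
    """Exact phrase: wrap in double quotes (embedded quotes are not escaped)."""
    return f'"{words}"'


def all_of(*terms: str) -> str:
    """Boolean AND across terms."""
    return " AND ".join(terms)


def any_of(*terms: str) -> str:
    """Boolean OR across terms."""
    return " OR ".join(terms)


def prefix(stem: str) -> str:
    """Wildcard match on a word stem."""
    return f"{stem}*"


def with_filters(args: dict, **filters) -> dict:
    """Return a copy of args with filters added; reject unlisted filter names."""
    unknown = set(filters) - OPTIONAL_FILTERS
    if unknown:
        raise ValueError(f"unknown filters: {sorted(unknown)}")
    return {**args, **filters}
```

For instance, `with_filters({"text": BROWSE_ALL, "search_in": ["document_title"]}, limit=10)` yields the arguments for a browse-style request; the integer limit is an assumed value format.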
Use public language when explaining filters. Say "data source" to the user, but distinguish source type from source instance when choosing filters:
- lineage_data_source scopes to a source lineage or connector type, such as Google Drive, Dropbox, Slack, S3, or another source type, when that exact lineage value is known.
- metadata_filters={"platform_workflow_id": "<source-id>"} scopes to one configured source instance. The <source-id> is the source ID returned by pipeline_list_sources and corresponds to that connector instance, not to the connector type.
Filter guidance:
- Use lineage_data_source for named source lineages or connector types, such as all Google Drive documents or all Slack documents, when the exact lineage value is known.
- Use mime_types for requested file types, such as PDFs, slides, spreadsheets, or plain text.
- Use modified_at when the user asks what changed, was updated, or was modified in a time range.
- Use created_at when the user asks what was added or created in a time range.
- Use first_seen_at when the user asks what Foundation first ingested during a time range.
- Use last_materialized_at when the user asks what Foundation refreshed, reprocessed, or made newly searchable during a time range.
- Use metadata_filters only when the user asks for a specific metadata field/value or when a previous tool response exposed the exact field/value to reuse.
- Use metadata_filters={"platform_workflow_id": "<source-id>"} to scope a search to a single connected source instance. The <source-id> is the source ID returned by pipeline_list_sources; the same value works directly as the platform_workflow_id metadata filter. This is an instance-level filter, not a type-level filter. Prefer it when the user has multiple sources of the same connector type, for example two different Google Drives, and wants just one of them. Use lineage_data_source when the user names a connector type broadly.
- Use text="*" plus filters for requests like "show all PDFs from Dropbox" or "what was modified since Monday" when no keyword is provided.
asset_doc_id
Search results include asset_doc_id values such as adid:<uuid>. Use asset_doc_id for all document follow-up calls.
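Because follow-up calls depend on the asset_doc_id, it can help to check that a value has the adid:<uuid> shape before reusing it. The sketch below is an assumption-laden illustration: `is_asset_doc_id` is a hypothetical helper, and it assumes the <uuid> part uses the canonical 8-4-4-4-12 hexadecimal form, which the source does not spell out.

```python
import re

# Assumed shape: "adid:" followed by a canonical hyphenated UUID.
_ADID_PATTERN = re.compile(
    r"adid:[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
    r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def is_asset_doc_id(value) -> bool:
    """Return True when value looks like adid:<uuid> under the assumed format."""
    return isinstance(value, str) and _ADID_PATTERN.fullmatch(value) is not None
```

A check like this catches a bare UUID or a truncated placeholder such as "adid:..." before it reaches a retrieval call.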
For full processed text:
asset_get_doc_text(asset_doc_id="adid:...")
For generated views:
asset_get_artifact(asset_doc_id="adid:...", artifact_kind="document_summary")
asset_get_artifact(asset_doc_id="adid:...", artifact_kind="topics")
asset_get_artifact(asset_doc_id="adid:...", artifact_kind="ner")
Use generated views when the user asks for a summary, topics, entities, or a quick understanding of a file. Use full text when the user asks detailed questions, wants evidence from a document, or needs content that may not appear in a generated view.
Do not teach or use alternate generated-view lookup modes. Use asset_doc_id from asset_query, then call asset_get_doc_text or asset_get_artifact.
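The choice between a generated view and full text can be sketched as a small dispatcher. This is only an illustration: `retrieval_call` is a hypothetical helper that returns the tool name and arguments to use, without calling either tool.

```python
# Generated views named in this guidance; used as artifact_kind values.
GENERATED_VIEWS = ("document_summary", "topics", "ner")


def retrieval_call(asset_doc_id: str, view=None):
    """Pick the follow-up call for a document.

    view=None means the user needs full text (detailed questions or evidence);
    otherwise view must be one of the generated views.
    Returns a (tool_name, arguments) pair.
    """
    if view is None:
        return ("asset_get_doc_text", {"asset_doc_id": asset_doc_id})
    if view not in GENERATED_VIEWS:
        raise ValueError(f"unknown generated view: {view!r}")
    return (
        "asset_get_artifact",
        {"asset_doc_id": asset_doc_id, "artifact_kind": view},
    )
```

Both branches take the asset_doc_id from an earlier asset_query result, which matches the single lookup path described above.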
For "search my documents for X":
1. Call pipeline_processing_status.
2. Call asset_query(text=X, search_in=["document_text"]).
3. Call asset_get_doc_text for top matches when the user needs the answer, not just search results.
For "summarize this/the latest/the matching document":
1. Call asset_query to identify the document if no asset_doc_id is already known.
2. Call asset_get_artifact(asset_doc_id=..., artifact_kind="document_summary").
3. If no usable summary is available, call asset_get_doc_text and summarize from the text.
For "what is searchable right now" or broad corpus counts:
1. Call pipeline_processing_status first.
2. If describe_corpus is available, use it for corpus-wide counts.
3. Use describe_corpus(group_by="lineage_data_source") when the user asks for a connector/source-type breakdown or asks which source types have searchable documents.
4. Use describe_corpus(group_by="platform_workflow_id") when the user asks for a breakdown by specific connected source instance. Map source IDs to source names with pipeline_list_sources when possible; share raw source IDs only when the user asks for IDs or debugging details.
For "what changed since [date/time]":
1. Call pipeline_processing_status(since=...) first, using the user's date or time.
2. Call asset_query with text="*" and the matching modified_at, created_at, first_seen_at, or last_materialized_at filter.
For "summarize what changed since [date/time]":
1. Call pipeline_processing_status(since=...) first.
2. Call asset_query(text="*", search_in=["document_summary"], modified_at=...), or use created_at=..., first_seen_at=..., or last_materialized_at=... instead, according to the user's wording.
3. Use each returned asset_doc_id with asset_get_artifact(asset_doc_id=..., artifact_kind="document_summary") for concise follow-up retrieval.
4. Call asset_get_doc_text(asset_doc_id=...) only when the summary is missing, too thin, or the user asks for evidence.
For "what is searchable by data source":
1. Call pipeline_processing_status first.
2. Call describe_corpus(group_by="lineage_data_source").
3. For a breakdown by specific source instance, call pipeline_list_sources and describe_corpus(group_by="platform_workflow_id"); map source IDs to source names when explaining results.
For "search only [source type]" where the user names a connector/source type such as Google Drive, Slack, Dropbox, or S3:
1. Call asset_query with lineage_data_source set to the requested source value.
For "search only this specific source" where the user means one configured source instance, not a connector/source type:
1. Call pipeline_list_sources to identify the source the user means and obtain its source ID.
2. Call asset_query with metadata_filters={"platform_workflow_id": "<source-id>"}, combined with a real text query or text="*" for filtered browsing.
Prefer public terms: Unstructured Foundation, sources, connected sources, data source, processed files, searchable documents, document text, summary, topics, entities, generated view.
Avoid exposing private implementation notes. It is okay to mention exact MCP tool names, parameters, or returned fields when answering a developer/integrator question or when the user asks how the retrieval workflow works.
npx claudepluginhub unstructured-io/unstructured-foundation-marketplace --plugin foundation