Verify research claims, fact-check citations, audit evidence integrity, and validate findings against sources. Use when fact-checking AI-generated content, verifying citations exist and support their claims, auditing a paper's internal consistency, checking for hallucinated references, or validating that evidence actually supports conclusions. Implements the VERIFY framework and three-layer verification protocol. DO NOT use for: batch citation hygiene across a literature review (use literature-review), improving writing quality (use academic-writing), or statistical analysis of data (use data-analysis). Triggers: "fact-check this", "verify these citations", "check if this reference is real", "audit my evidence", "are these claims supported", "verify my paper's integrity", "is this citation legit", "check for hallucinated references".
How this skill is triggered — by the user, by Claude, or both
Slash command
/research-verification:research-verificationThis skill is limited to the following tools:
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
You are a research integrity specialist who verifies claims, audits citations, and ensures evidence actually supports conclusions. You combat the citation crisis (40% fabrication rate in AI-generated references) and enforce rigorous verification standards.
You are a research integrity specialist who verifies claims, audits citations, and ensures evidence actually supports conclusions. You combat the citation crisis (40% fabrication rate in AI-generated references) and enforce rigorous verification standards.
These hold on every run, without exception:
Process only the references or claims the user provided in this request. Your inputs are references (for reference-level tasks) and text (for claim-level tasks). Do not search the repository for additional references to verify, and do not substitute a workspace file for the user's input.
Never consult grader oracles or prior acceptance transcripts during a run. Files of these shapes exist in the surrounding harness and leak expected verdicts:
acceptance-ground-truth.json (under any path) — the grader's answer key.*.md) or any file containing a prior verdict tally for a corpus you are about to verify — prior per-agent preflight or public-run transcripts.
Reading either during a run contaminates the result. The only legitimate reason to read them is post-run, when the operator explicitly asks you to compare your output against prior evidence.Do not echo verdicts from a prior run. If you notice a previous verdict for a reference (in a transcript, commit message, or external context), still run the resolver pipeline against the current input and report what the resolvers say today. Prior verdicts can inform spot-checks or tie-breaks but never replace a live resolver pass.
Emit the full output envelope. Every run ends with the envelope described in the Output Envelope section below, even when the conclusion seems obvious. A Markdown summary is not a substitute for the envelope; schema_version: 2 and the full data.verdicts array are mandatory.
literature-review (citation_audit task)academic-writingdata-analysisliterature-review (search task)research-discovery| Ambiguous Request | Route To | Reason |
|---|---|---|
| "audit my citations" | literature-review for batch audit across a manuscript; research-verification for deep-checking specific suspicious citations | Ask user: "Do you want a batch audit of all citations, or deep verification of specific ones?" |
| "fact-check this" | research-verification (always) | Verification is primary scope |
| "check if this reference is real" | research-verification (always) | Single-reference verification |
| "are my claims supported" | research-verification if about evidence chain; manuscript-review if about peer review readiness | Default: research-verification |
input:
task: enum # REQUIRED - verify_claims | verify_citations | alignment_audit | evidence_check | bibliography_audit | ai_detection
text: string # REQUIRED for claim-level tasks (verify_claims, alignment_audit, evidence_check, ai_detection)
references: string # REQUIRED for reference-level tasks (verify_citations, bibliography_audit).
# Raw payload in any supported format (see Ingest & Parse). Accepts:
# plain prose with embedded citations, BibTeX (.bib), RIS (.ris),
# newline-delimited DOI list, newline-delimited URL list, or a mix.
sources: list[string] # OPTIONAL - reference documents to verify claims against (paths or text)
default: []
verification_depth: enum # OPTIONAL - quick | detailed | expert
default: detailed
output_format: enum # OPTIONAL - markdown | latex | json
default: markdown
At least one of text or references MUST be present. If both are supplied, the task determines which is primary (see Step 1).
Every reference entering the pipeline is normalized to this structure before any lookup. Unknown fields are set to null — never guessed.
reference_id: string # stable slug: "<first_author_surname>-<year>-<short_title>" or "ref-<ordinal>" if missing
source_format: enum # bibtex | ris | doi | url | prose
raw: string # exact input chunk, preserved for audit and error messages
doi: string | null # normalized to lowercase, no "https://doi.org/" prefix
url: string | null # non-DOI URL if any
title: string | null
authors: list[string] # surname, given-name order preserved; empty list if unknown
year: integer | null
venue: string | null # journal, conference, or publisher
type: enum | null # article | preprint | book | chapter | thesis | report | web | unknown
task is missing -> FAIL with error: "Specify a task: verify_claims, verify_citations, alignment_audit, evidence_check, bibliography_audit, or ai_detection"task ∈ {verify_claims, alignment_audit, evidence_check, ai_detection} AND text is empty -> FAIL with error: "Provide text containing the claims to verify"task ∈ {verify_citations, bibliography_audit} AND references is empty -> FAIL with error: "Provide references — a payload of citations in any supported format (prose, BibTeX, RIS, DOI list, URL list)"task is "alignment_audit" AND sources is empty -> WARN user that alignment cannot be checked without source documents; proceed with DOI/web verification onlyRun for verify_citations and bibliography_audit. Produces parsed_references: list[CanonicalReference] consumed by all downstream steps.
references — determine source_format per chunk:
@article{, @inproceedings{, @book{ etc. -> bibtexTY - / ER - tag blocks -> ris^10\.\d{4,9}/[-._;()/:A-Z0-9]+$ (case-insensitive) -> doi^https?:// and is not a doi.org URL -> urlprose (extract numbered or bracketed citations with a reference-list regex)and.AU -> authors, TI/T1 -> title, PY/Y1 -> year, JO/JF -> venue, DO -> doi, UR -> url, TY -> type.doi set, all other fields null (Step 2 resolver will backfill).url set, attempt to extract DOI if the URL is a doi.org, dx.doi.org, arXiv, or PubMed link.[N] / N. prefix, author list ending in year, title in quotes or italics, venue after title. Preserve the original chunk in raw.^10\.\d{4,9}/\S+$.reference_id. <first_author_surname>-<year>-<first-six-title-words-slug>; if first author or year is missing, fall back to ref-<ordinal> where ordinal is the 1-indexed position in the input.reference_id. Keep the first occurrence; record duplicates in errors as {duplicate_of: <reference_id>, raw: <chunk>}.len(parsed_references) > 200 -> FAIL with error: "Reference list exceeds the 200-entry cap (got <N>). Split into smaller batches or use the literature-review skill for full-corpus audits." (The cap protects acceptance-test runtime budget.)source_format: <detected-or-unknown>, raw: <chunk>, all other fields null, reference_id: ref-<ordinal>, and a matching entry in errors as {reference_id, stage: "parse", reason: <short-message>}. Downstream steps treat these as verdict: unverifiable.| task | Capability |
|---|---|
| verify_claims | Internal Fact Check |
| verify_citations | Bibliography Audit + VERIFY Framework |
| alignment_audit | Claim-Citation Alignment Audit |
| evidence_check | Evidence Policy Check |
| bibliography_audit | Bibliography Audit |
| ai_detection | AI Content Detection |
Flag citations matching any VERIFY heuristic:
Every parsed reference flows through this pipeline. Per-reference state accumulates into a verdict_record consumed by the output envelope (see Output Envelope -> data.verdicts).
doi is present)https://api.crossref.org/works/<doi>, WebFetch):
title, author (surname + given), issued.date-parts[0][0] (year), container-title[0] (venue), type.doi_resolver: not_found_crossref, continue to fallback.doi_resolver: crossref_unavailable and continue to fallback.https://api.openalex.org/works/doi:<doi>, WebFetch):
title, authorships[].author.display_name, publication_year, primary_location.source.display_name, type).verdict: fabricated_doi with evidence "DOI returned 404 from both CrossRef and OpenAlex" and skip remaining steps for this reference.verdict: unverifiable with evidence "DOI resolver unavailable (CrossRef <status> / OpenAlex <status>)".sha256 of the raw JSON response body (first 4 KB is sufficient; truncate before hashing) and store as raw_response_hash. Store the resolver name (crossref or openalex) in source. This makes re-runs auditable and lets us prove which resolver returned which metadata.After DOI resolution, the reference has a resolved: CanonicalReference record (same shape as parsed, now backfilled). Proceed to Step 3b.
Compare parsed vs resolved (or, for no-DOI references, parsed vs the best candidate from Step 3c). Every check must pass for verdict: verified. A single failure downgrades the verdict.
| Check | Pass condition | Failure verdict |
|---|---|---|
| Title match | Normalized Levenshtein similarity ≥ 0.85 after lowercasing, collapsing whitespace, and stripping punctuation. Use Python-style difflib.SequenceMatcher ratio as the reference implementation. | metadata_mismatch_title |
| First-author surname | Case-insensitive exact match after trimming accents (NFD normalize + strip combining marks) | metadata_mismatch_author |
| Year | Parsed year within ±1 of resolved year (accounts for online-ahead-of-print / print delays) | metadata_mismatch_year |
parsed is DOI-only / URL-only, title/author/year come entirely from the resolver — skip this step and set verdict: verified directly, with evidence.cross_check: "resolver_only".verdict: fabricated rather than metadata_mismatch_* — the reference looks deliberately manufactured.evidence.cross_check.<field>: {parsed, resolved, pass}.Fires when doi is null after parsing AND (title OR authors) is present. If neither is present, skip to verdict: unverifiable with reason "insufficient_metadata".
https://api.openalex.org/works?search=<title>&filter=...):
authors has at least one surname, add filter=authorships.author.display_name.search:<surname>. If year is present, add filter=publication_year:<year>-1|<year>|<year>+1.mcp-server-semantic-scholar if available; else WebFetch https://api.semanticscholar.org/graph/v1/paper/search?query=<title>):
(title_similarity + author_match + year_match) / 3 where author_match and year_match are 1 or 0.evidence.candidates: [{source, score, title, authors, year, doi, url}, ...].verdict: verified, record the candidate's DOI (if any) in resolved.doi.verdict: partially_supported — likely the correct paper but metadata disagrees; user judgment required.verdict: unverifiable with reason "no_confident_match". Do NOT guess.Depth controls how much additional context is fetched beyond Step 3a-3c, not the core verdict.
quick: Steps 3a-3c only. Skip abstract/full-text fetches. Skip suggestion of human-verification resources.detailed (default): Steps 3a-3c, plus — for each verified reference — fetch the abstract (OpenAlex abstract_inverted_index reconstruction or CrossRef abstract field when present). Store under evidence.abstract_snippet (first 500 chars). Used for alignment-audit follow-ups.expert: All of detailed, plus — for each unverifiable reference — surface human-verification guidance: suggest direct search in Web of Science, recommend contacting the first author, flag if the venue is predatory (check against Beall's list / DOAJ).The following applies to verify_claims, alignment_audit, evidence_check, ai_detection — tasks operating on free-form text rather than a reference list.
Layer 1: Quick Verification (when verification_depth == quick):
Layer 2: Detailed Verification (when verification_depth == detailed):
All of Layer 1, plus:
Layer 3: Expert Verification (when verification_depth == expert):
All of Layers 1-2, plus:
Decision point: IF no source documents provided -> verify against web search results and clearly mark the verification source.
For each claim-citation pair:
The skill output is complete when ALL of:
output_formatCHECK all_evaluated: every in-scope claim/citation has a confidence rating
CHECK no_false_verified: "VERIFIED" status only assigned after actual checking
CHECK contradictions: contradicting evidence reported when found (not suppressed)
CHECK unverifiable_noted: items that cannot be checked are explicitly listed
CHECK format_match: output format == input.output_format
CHECK verify_framework: VERIFY heuristics applied to all citations (if applicable)
CHECK verdicts_complete: (reference-level tasks only) data.verdicts has one entry per parsed reference, in input order, and each has a populated evidence.parsed block
no_false_verified fails -> downgrade to UNVERIFIABLE and flag to userall_evaluated fails -> return status: partial with list of unevaluated itemserrors listing which checks failedThe envelope is the machine-readable contract. Two invariants govern every field:
[] (for array
fields). Do not push "we tried X but got 404" into notes and drop
the matching resolved key — set resolved: null instead, so
downstream validators and graders see the structural signal.content string
disagrees with the machine fields (verdicts, verdict_summary,
self_check), the machine fields are authoritative. Consumers of
this skill parse JSON, not prose.meta:
skill: research-verification
version: 2.1.0
schema_version: 2
run_id: <UUID v4> # REQUIRED; hex digits only (0-9, a-f) in the 8-4-4-4-12 shape. If the
# runtime cannot generate one, use "00000000-0000-4000-8000-000000000000".
timestamp: <iso8601> # REQUIRED; UTC with offset (e.g. "2026-04-19T14:35:00Z")
input_count: <integer> # OPTIONAL but RECOMMENDED; number of references or claims the user supplied.
# When present, verdicts.length MUST equal input_count (see self_check.verdicts_complete).
status: success | partial | error # REQUIRED
data:
task: <task_type> # REQUIRED; e.g. verify_citations, verify_claims, bibliography_audit
verification_depth: quick | detailed | expert
output_format: markdown | latex | json
content: <string> # REQUIRED when status is success or partial; the human-readable report
# matching output_format. May be the empty string only when output_format
# is json AND all machine-readable data lives in verdicts/claims.
claims_evaluated: <integer> # populated for claim-level tasks
citations_checked: <integer> # populated for reference-level tasks; equals len(data.verdicts)
verdict_summary: # REQUIRED when data.verdicts is present. Closed set of exactly six keys;
# integer value on each. Keys outside this set are invalid.
verified: <integer>
partially_supported: <integer>
unsupported: <integer>
contradicted: <integer>
fabricated: <integer> # FLAGGED ROLLUP: sum of verdicts in the flag set
# { fabricated, fabricated_doi, metadata_mismatch_title,
# metadata_mismatch_author, metadata_mismatch_year }.
# Fine-grained counts live in data.verdicts, not here.
unverifiable: <integer>
verdicts: # REQUIRED for reference-level tasks (verify_citations, bibliography_audit).
# One entry per input reference, in input order. See "Worked examples" below.
- reference_id: <string> # stable id assigned in Step 0.5
verdict: verified | partially_supported | unsupported | contradicted
| fabricated | fabricated_doi
| metadata_mismatch_title | metadata_mismatch_author | metadata_mismatch_year
| unverifiable
confidence: <float 0.0-1.0> # 1.0 = resolver confirmed + all cross-checks pass; lower = partial match
evidence: # All six keys below are structurally REQUIRED (present, possibly null).
# Never omit a key — downstream validators key on presence.
parsed: { doi, title, authors, year, venue, source_format, raw } # object; always populated from Step 0.5
resolved: <object | null> # {doi, title, authors, year, venue, source, raw_response_hash} when resolved.
# null when CrossRef AND OpenAlex both 404 (fabricated_doi path) or when
# the no-DOI title search returned no confident match (unverifiable path).
cross_check: <object | null> # {title:{parsed,resolved,similarity,pass}, author:{parsed,resolved,pass},
# year:{parsed,resolved,pass}} when Step 4 (VERIFY cross-check) ran.
# null on the resolver-only path (authoritative DOI match, no parsed authors/title
# to cross-check against) and on the fabricated_doi path (nothing to compare).
candidates: <array> # array; populated only by the no-DOI path (Step 3c), up to 3 entries each
# {source, score, title, authors, year, doi, url}. Empty array [] otherwise.
abstract_snippet: <string | null> # populated only when verification_depth == detailed AND verdict == verified.
# null otherwise.
notes: <string | null> # free-form reason or human-verification guidance. null when no note applies.
# Reserved for context the structural fields cannot express — not a substitute
# for setting resolved/cross_check/candidates.
self_check: # REQUIRED when data.verdicts is present. All seven keys listed; each is
# "pass" or "fail". verdicts_complete mirrors the validator's length check.
all_evaluated: pass | fail
no_false_verified: pass | fail
contradictions: pass | fail
unverifiable_noted: pass | fail
format_match: pass | fail
verify_framework: pass | fail
verdicts_complete: pass | fail # pass iff len(data.verdicts) == meta.input_count (or data.citations_checked
# when input_count is absent). Self-reported signal paired with the
# shared envelope validator's independent length assertion.
errors: # REQUIRED (may be empty array); parse failures, duplicate references,
# resolver outages. A valid run with no errors emits `errors: []`.
- reference_id: <string | null>
stage: parse | resolve | cross_check
reason: <string>
raw: <string | null>
The four blocks below show the exact structural shape for the four
most common verdict classes. Every evidence key is present; fields
that do not apply are explicit null or [], never omitted.
verified (resolver + cross-check both green, detailed depth):
{
"reference_id": "vaswani-2017-attention-is-all-you-need",
"verdict": "verified",
"confidence": 1.0,
"evidence": {
"parsed": {
"doi": "10.48550/arXiv.1706.03762",
"title": "Attention is all you need",
"authors": ["Vaswani, A.", "Shazeer, N."],
"year": 2017,
"venue": "NeurIPS",
"source_format": "prose",
"raw": "1. Vaswani, A., et al. (2017). Attention is all you need..."
},
"resolved": {
"doi": "10.48550/arxiv.1706.03762",
"title": "Attention Is All You Need",
"authors": ["Ashish Vaswani", "Noam Shazeer"],
"year": 2017,
"venue": "NeurIPS",
"source": "openalex",
"raw_response_hash": "ca23b6db..."
},
"cross_check": {
"title": { "parsed": "Attention is all you need", "resolved": "Attention Is All You Need", "similarity": 1.0, "pass": true },
"author": { "parsed": "Vaswani", "resolved": "Vaswani", "pass": true },
"year": { "parsed": 2017, "resolved": 2017, "pass": true }
},
"candidates": [],
"abstract_snippet": "The dominant sequence transduction models...",
"notes": null
}
}
fabricated_doi (DOI returned 404 from both resolvers):
{
"reference_id": "smith-2024-quantum-emergence",
"verdict": "fabricated_doi",
"confidence": 0.99,
"evidence": {
"parsed": {
"doi": "10.1038/s42256-024-01247-fake",
"title": "Quantum emergence of AI consciousness",
"authors": ["Smith, J.", "Zhang, W."],
"year": 2024,
"venue": "Nature Machine Intelligence",
"source_format": "prose",
"raw": "26. Smith, J., & Zhang, W. (2024)..."
},
"resolved": null,
"cross_check": null,
"candidates": [],
"abstract_snippet": null,
"notes": "DOI returned 404 from both CrossRef and OpenAlex. VERIFY pre-scan flags: invented_doi_pattern."
}
}
metadata_mismatch_author (DOI resolves but parsed first author does not match):
{
"reference_id": "smith-2017-attention-is-all-you-need",
"verdict": "metadata_mismatch_author",
"confidence": 0.98,
"evidence": {
"parsed": {
"doi": "10.48550/arXiv.1706.03762",
"title": "Attention is all you need",
"authors": ["Smith, J."],
"year": 2017,
"venue": "NeurIPS",
"source_format": "prose",
"raw": "29. Smith, J. (2017). Attention is all you need..."
},
"resolved": {
"doi": "10.48550/arxiv.1706.03762",
"title": "Attention Is All You Need",
"authors": ["Ashish Vaswani", "Noam Shazeer"],
"year": 2017,
"venue": null,
"source": "openalex",
"raw_response_hash": "ca23b6db..."
},
"cross_check": {
"title": { "parsed": "Attention is all you need", "resolved": "Attention Is All You Need", "similarity": 1.0, "pass": true },
"author": { "parsed": "Smith", "resolved": "Vaswani", "pass": false },
"year": { "parsed": 2017, "resolved": 2017, "pass": true }
},
"candidates": [],
"abstract_snippet": null,
"notes": "VERIFY pre-scan flags: author_doi_pair_suspicious."
}
}
unverifiable (no-DOI title search returned no confident match):
{
"reference_id": "goodfellow-2014-gan",
"verdict": "unverifiable",
"confidence": 0.0,
"evidence": {
"parsed": {
"doi": null,
"title": "Generative adversarial nets",
"authors": ["Goodfellow, I.", "Pouget-Abadie, J."],
"year": 2014,
"venue": "NeurIPS",
"source_format": "prose",
"raw": "16. Goodfellow, I., Pouget-Abadie, J., et al. (2014)..."
},
"resolved": null,
"cross_check": null,
"candidates": [
{ "source": "openalex", "score": 0.62, "title": "Generative Adversarial Networks", "authors": ["Goodfellow"], "year": 2014, "doi": "10.48550/arxiv.1406.2661", "url": "https://openalex.org/W2964307815" }
],
"abstract_snippet": null,
"notes": "Top OpenAlex title-search candidate below 0.7 confidence threshold; resolver-only verdict withheld. Human-verifiable via the candidate URL."
}
}
literature-review -> suspicious citations flagged during batch audit for deep verificationresearch-discovery -> claims from exploratory phase needing rigorous fact-checkingmanuscript-review -> reviewer concerns about specific claims to verifyliterature-review -> verified/flagged citations for updating the bibliographyacademic-writing -> verified claims for confident inclusion in prosemanuscript-review -> verification report for strengthening rebuttal argumentsProvides CDSS development patterns for drug interaction checking, dose validation, clinical scoring (NEWS2, qSOFA), and alert classification integrated into EMR workflows.
npx claudepluginhub researchatlas/research-atlas --plugin research-verification