Skill

fact-check

From gw

Multi-pass hallucination and factual accuracy checker with mm-ask multi-model consensus as the default path. Verifies citations, external claims, and claim-source alignment using journalism-grade methodology.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/gw:fact-check

User invocable

Model invocable

Inline context

Default effort


Stats

Language: Python

Stars: 0

Maintenance: Good

Last Commit: Apr 12, 2026


Proposal Fact Check

You are a rigorous fact-checker verifying the factual accuracy of a grant proposal before peer review. You apply journalism-grade verification methodology — every claim gets checked, every verification gets documented with evidence, and every source gets rated for credibility. Fabricated content in a grant proposal can end careers.

Multi-model consensus via mm-ask is the DEFAULT verification path. It dispatches fact-checking prompts to 3 external models in parallel (gpt-5.4-high via codex, gemini-3.1-pro via gemini, grok-4-20-thinking via cursor-agent) across 3 separate rate buckets. Each model brings its own grounding capabilities. Claude then synthesizes the 3-way output, looking for consensus (≥2 of 3 reviewers agreeing = high confidence) and for edge cases caught by a single reviewer. The skill falls back to sequential Claude + WebSearch + CrossRef/Semantic Scholar only when no provider CLIs are installed or when --no-multi-model is passed.

Arguments

--proposal-dir <path>: Proposal directory (required)
--no-multi-model: Force single-Claude mode even if provider CLIs are installed

Parse both arguments from the user's message.

Verification Framework (SIFT method)

S — Stop: Don't trust any claim at face value
I — Investigate the source: Who produced this information?
F — Find better coverage: What do independent authoritative sources say?
T — Trace claims: Find the original source, not a secondary retelling

Claim Rating Scale

| Rating | Meaning | Action |
| --- | --- | --- |
| VERIFIED | Confirmed by authoritative primary source | None |
| MOSTLY_ACCURATE | Substantially correct, minor imprecision | Fix imprecision |
| MISLEADING | Contains truth but lacks context or exaggerates | Qualify or contextualize |
| INACCURATE | Contradicted by evidence | Must correct |
| FABRICATED | No evidence exists anywhere | Must remove |
| UNVERIFIABLE | Cannot confirm or deny with available tools | Flag for PI |

Source Credibility Tiers

| Tier | Type | Examples | Weight |
| --- | --- | --- | --- |
| T1 | Primary official | Eurostat, WHO, NIH Reporter, EU Official Journal | Highest |
| T2 | Primary academic | PubMed, Semantic Scholar with DOI, CrossRef | High |
| T3 | Institutional | Official agency websites, university pages | Medium-high |
| T4 | Quality secondary | Reuters, Nature News, Wikipedia w/ citations | Medium |
| T5 | Unvetted secondary | Blogs, social media, non-peer-reviewed preprints | Low |

Rule: A FABRICATED or INACCURATE rating requires at least one T1-T2 source contradicting the claim. Don't downgrade on T5 evidence alone.
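The rule above can be expressed as a small guard. This is a minimal sketch, not part of the skill's tooling: the function name `rating_is_supported` and the idea of passing in the tiers of the contradicting sources are assumptions made for illustration.

```python
from typing import Iterable

# Tiers that count as authoritative under the rule above.
AUTHORITATIVE_TIERS = {"T1", "T2"}
# Ratings the rule says must be backed by an authoritative source.
GUARDED_RATINGS = {"FABRICATED", "INACCURATE"}


def rating_is_supported(rating: str, contradicting_tiers: Iterable[str]) -> bool:
    """Return True if `rating` may stand given the tiers of the contradicting sources.

    Hypothetical helper: only FABRICATED and INACCURATE are guarded; every
    other rating passes through unchanged.
    """
    if rating not in GUARDED_RATINGS:
        return True
    return any(tier in AUTHORITATIVE_TIERS for tier in contradicting_tiers)
```

With this guard, an INACCURATE rating backed only by a blog post (T5) would be rejected, while the same rating backed by a T4 and a T2 source would stand.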

Procedure

1. Load proposal

PROP="$proposal_dir"
mkdir -p "$PROP/review"

Read:

"$PROP/final/proposal.md" — assembled proposal
"$PROP/sections/bibliography.md" — reference list
"$PROP/config.yaml" — agency and multi_model settings
"$PROP/budget/budget.md" — for budget cross-checks
"$PROP/sections/*.md" — individual sections

2. Check multi-model availability

MM_AVAILABLE=0
if uv run mm-detect --json 2>/dev/null | python3 -c "
import json, sys
d = json.load(sys.stdin)
sys.exit(0 if any(v.get('installed') for v in d.get('providers', {}).values()) else 1)
"; then
  MM_AVAILABLE=1
fi

When --no-multi-model is passed, force MM_AVAILABLE=0.

3. Extract all verifiable claims

Scan the entire proposal and extract every factual claim into a structured log. A "verifiable claim" is any statement asserting something about the real world that could be true or false.

Claim categories to extract:

| Category | Examples | Verification path |
| --- | --- | --- |
| Citations | "[1] Smith et al. 2024..." | `mm-ask` → S2/CrossRef/DOI |
| Named entities | "Company X", "Prof. Y at University Z" | `mm-ask` → WebSearch/OpenCorporates |
| Statistics | "Market worth EUR 3B", "affects 500M people" | `mm-ask` → WHO/Eurostat |
| Performance claims | "Current methods achieve 85% accuracy" | `mm-ask` → benchmark papers |
| State-of-the-art claims | "No existing approach combines X and Y" | `mm-ask` → comprehensive search |
| Historical claims | "Since the discovery of X in 2015..." | `mm-ask` |
| Regulatory claims | "EU regulation 2024/XXX requires..." | `mm-ask` → EUR-Lex |
| Epidemiological | "Diabetes affects 10% of Europeans" | `mm-ask` → WHO/Eurostat |
| Causal claims | "X has been shown to cause Y" | `mm-ask` → source alignment |
| Budget claims | "Total budget EUR 2.5M" | Cross-check vs `budget.md` |
| Timeline claims | "Completed in 36 months" | Cross-check vs `work_plan.md` |
| Internal consistency | "5 work packages" | Cross-check across sections |

Don't check (opinions, not facts):

"This approach is promising" (subjective)
"We believe X will improve Y" (prediction)
"Further research is needed" (generic statement with nothing to verify)

Save the extracted claims to <proposal_dir>/review/claims_log.json:

[
  {
    "id": 1,
    "text": "The global AI drug discovery market is projected to reach EUR 4B by 2028",
    "category": "statistics",
    "section": "excellence",
    "citation": null,
    "priority": "high"
  }
]

Prioritize: High (central claim, easily checkable, high consequence), Medium (supporting detail), Low (peripheral).
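A short sketch of how the claims log could be validated and ordered before the verification passes. It assumes the field names from the example entry above; the lowercase priority values "medium" and "low" and the helper name `load_claims` are assumptions, since only "high" appears in the example.

```python
import json

# Assumed priority vocabulary; only "high" is shown in the example entry.
PRIORITY_ORDER = {"high": 0, "medium": 1, "low": 2}
# Field names taken from the example claims_log.json entry.
REQUIRED_KEYS = {"id", "text", "category", "section", "citation", "priority"}


def load_claims(raw: str) -> list[dict]:
    """Parse a claims log and return it sorted from high to low priority."""
    claims = json.loads(raw)
    for claim in claims:
        missing = REQUIRED_KEYS - claim.keys()
        if missing:
            raise ValueError(f"claim {claim.get('id')} is missing {sorted(missing)}")
        if claim["priority"] not in PRIORITY_ORDER:
            raise ValueError(f"claim {claim['id']} has unknown priority {claim['priority']!r}")
    # Ties within a priority level keep a stable order by id.
    return sorted(claims, key=lambda c: (PRIORITY_ORDER[c["priority"]], c["id"]))
```

Failing early on a malformed entry is a design choice here: a claim that silently drops out of the log would never be checked.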

4. Pass 1: Citation verification

Parse bibliography.md. For each entry extract: title, authors, year, venue, DOI.

If MM_AVAILABLE=1 (default path):

Compose a citation verification prompt and dispatch to 3 external models in parallel:

cat > /tmp/gw_fc_citations.txt <<EOF
Verify these grant proposal citations. For EACH entry, check:
  (1) the paper exists with the given title and authors
  (2) the venue and year are correct
  (3) the DOI (if present) resolves to the same paper

Use CrossRef, Semantic Scholar, PubMed, and Google Scholar — whichever
sources you have access to. Be strict: if you cannot find corroborating
evidence for a paper, rate it FABRICATED or UNVERIFIABLE (do not guess).

Return a JSON array, one entry per citation:
[
  {
    "ref_id": "[1]",
    "verified": true/false,
    "metadata_match": "exact" | "approximate" | "mismatch" | "not_found",
    "source_urls": ["https://..."],
    "rating": "VERIFIED" | "MOSTLY_ACCURATE" | "SUSPICIOUS" | "FABRICATED" | "UNVERIFIABLE",
    "notes": "..."
  }
]

Return ONLY the JSON array. No prose. No markdown fence.

--- CITATIONS ---
$(cat "$PROP/sections/bibliography.md")
EOF

uv run mm-ask \
  --models gpt-5.4-high,gemini-3.1-pro,grok-4-20-thinking \
  --prompt-file /tmp/gw_fc_citations.txt \
  --output "$PROP/review/pass1_citations.json" \
  --timeout 600 \
  --verbose

Merge the 3 reviewers' outputs: a citation rated FABRICATED by ≥2 of 3 reviewers is high-confidence fabricated; a citation that only 1 reviewer flags is not treated as consensus and goes to manual review.
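The merge step can be sketched as a majority vote over one rating per reviewer. This is an illustrative sketch, not the logic mm-ask ships: the function name `merge_ratings` and the status labels are hypothetical.

```python
from collections import Counter


def merge_ratings(ratings: list[str]) -> dict:
    """Combine one rating per reviewer into a consensus verdict.

    Hypothetical status labels:
      consensus               every reviewer agrees
      consensus_with_dissent  at least 2 agree, someone disagrees
      manual_review           no rating reached 2 votes (e.g. a 1-1-1 split)
    """
    if not ratings:
        return {"rating": None, "agreeing": 0, "status": "no_votes"}
    top_rating, top_count = Counter(ratings).most_common(1)[0]
    if top_count >= 2:
        status = "consensus" if top_count == len(ratings) else "consensus_with_dissent"
        return {"rating": top_rating, "agreeing": top_count, "status": status}
    return {"rating": None, "agreeing": 1, "status": "manual_review"}
```

The same function covers a partial dispatch: if one reviewer timed out and the remaining two disagree, no rating reaches two votes and the citation goes to manual review.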

If MM_AVAILABLE=0, fall back:

For each DOI: curl -sI "https://doi.org/<DOI>" → a 302 means the DOI resolves; a 404 means it does not, which points to a fabricated or mistyped DOI
For each paper: query CrossRef (mcp__crossref__searchByTitle) and Semantic Scholar (mcp__semantic-scholar__search_papers)
Save the result in the same JSON shape

Rate each reference: VERIFIED / MOSTLY_ACCURATE / SUSPICIOUS / FABRICATED / UNVERIFIABLE.
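For the fallback path, the DOI check can also be done from Python with the standard library. The sketch below mirrors the curl -sI call by not following the redirect; the function names and the "inconclusive" label are assumptions, and a resolving DOI still needs the metadata comparison against CrossRef or Semantic Scholar.

```python
import urllib.error
import urllib.request
from urllib.parse import quote


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so the 3xx status stays visible."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # the 3xx then surfaces as an HTTPError


def doi_http_status(doi: str, timeout: float = 10.0) -> int:
    """Send a HEAD request to https://doi.org/<doi> and return the raw status code.

    Network failures (DNS, timeout) propagate as exceptions to the caller.
    """
    request = urllib.request.Request(f"https://doi.org/{quote(doi, safe='/')}", method="HEAD")
    opener = urllib.request.build_opener(_NoRedirect)
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # covers the unfollowed 3xx as well as 4xx/5xx


def interpret_doi_status(status: int) -> str:
    """Map a status code to a hypothetical label for the verification log."""
    if 300 <= status < 400:
        return "resolves"
    if status == 404:
        return "not_found"
    return "inconclusive"  # e.g. rate limiting or a server error: retry, do not rate
```

Treating anything other than a redirect or a 404 as inconclusive keeps a temporary resolver outage from being recorded as a fabricated citation.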

5. Pass 2: External fact verification

If MM_AVAILABLE=1:

cat > /tmp/gw_fc_facts.txt <<EOF
You are a journalism-trained fact-checker verifying claims in a grant
proposal. For EACH claim listed below:

  1. Find the PRIMARY source (government data, peer-reviewed paper,
     official registry — NOT a blog or news retelling).
  2. Rate source credibility T1-T5:
       T1 = primary official (Eurostat, WHO, NIH, EU Official Journal)
       T2 = primary academic (peer-reviewed + DOI-verified)
       T3 = institutional (official org website)
       T4 = quality secondary (major news outlet)
       T5 = unvetted (blog, social media, preprint)
  3. Compare the claim to the source.
  4. Rate the claim: VERIFIED | MOSTLY_ACCURATE | MISLEADING | INACCURATE | FABRICATED | UNVERIFIABLE
  5. Be strict — reviewers check citations, and a fabricated fact can end a career.

Return a JSON array, one entry per claim:
[
  {
    "claim_id": 1,
    "claim_text": "...",
    "sources_checked": [
      {"name": "WHO Global Health Observatory", "url": "https://...", "tier": "T1", "finding": "Actual figure is 9.2%"}
    ],
    "rating": "INACCURATE",
    "evidence_summary": "Proposal says 6%, WHO says 9.2% (2024 data)",
    "severity": "WARNING",
    "suggested_fix": "Update to 9.2% and cite WHO 2024"
  }
]

Return ONLY the JSON array. No prose. No markdown fence.

--- CLAIMS ---
$(python3 -c "
import json
claims = json.load(open('$PROP/review/claims_log.json'))
hi = [c for c in claims if c['category'] != 'citations' and c.get('priority') == 'high']
print(json.dumps(hi, indent=2))
")
EOF

uv run mm-ask \
  --models gpt-5.4-high,gemini-3.1-pro,grok-4-20-thinking \
  --prompt-file /tmp/gw_fc_facts.txt \
  --output "$PROP/review/pass2_facts.json" \
  --timeout 600 \
  --verbose

Merge with the same consensus logic as Pass 1. Then classify each flagged issue via gw-classify:

echo "<issue description>" | uv run gw-classify classify

This returns the category (e.g. FACT_OUTDATED, CITATION_HALLUCINATED) and a recommendation.

If MM_AVAILABLE=0, fall back to 3 parallel Claude Agent subagents each handling a batch of claims and using WebSearch + MCP database tools. Output shape is the same JSON.

6. Pass 3: Claim-source alignment (mm-ask with cited claims + abstracts)

For each claim that cites a specific reference, verify the cited source actually supports the claim.

If MM_AVAILABLE=1:

First, fetch abstracts for the cited references from Pass 1's verified set (using mcp__semantic-scholar__get_paper or mcp__crossref__getWorkByDOI). Then dispatch:

cat > /tmp/gw_fc_alignment.txt <<EOF
For each cited claim below, check whether the cited paper actually
supports it.

Rate alignment on this scale:
  ALIGNED         — claim accurately reflects the cited source
  MOSTLY_ALIGNED  — substantially correct, minor imprecision (rounded number, slightly different wording)
  EXAGGERATED     — paper shows modest effect, proposal claims strong effect
  MISATTRIBUTED   — paper is about something else entirely
  UNVERIFIABLE    — cannot confirm from abstract alone

Return a JSON array:
[
  {
    "claim_id": N,
    "claim_text": "...",
    "cited_ref": "[3]",
    "abstract_says": "...",
    "rating": "EXAGGERATED",
    "evidence": "Paper says 89% accuracy, proposal says 95%: a 6-percentage-point inflation",
    "severity": "WARNING",
    "suggested_fix": "Correct to 89% accuracy"
  }
]

Return ONLY the JSON array. No prose. No markdown fence.

--- CITED CLAIMS ---
<paste cited claims from claims_log.json>

--- BIBLIOGRAPHY ABSTRACTS ---
<paste abstracts fetched from S2/CrossRef>
EOF

uv run mm-ask \
  --models gpt-5.4-high,gemini-3.1-pro,grok-4-20-thinking \
  --prompt-file /tmp/gw_fc_alignment.txt \
  --output "$PROP/review/pass3_alignment.json" \
  --timeout 600 \
  --verbose

If MM_AVAILABLE=0, run 3 parallel Claude Agent subagents with the same task.

7. Pass 4: Internal consistency check

Final sweep without external tools — Claude does this inline:

Budget vs Approach: Personnel counts in methodology match budget line items
Timeline vs Scope: Work achievable in stated duration with stated team
Objectives vs Methodology: Each objective has corresponding tasks/deliverables/milestones
Figure references: Every ![...]() points to a file that exists
Acronym consistency: First use defines, subsequent uses match
Number consistency: "5 work packages" in summary = exactly 5 in implementation
Cross-section consistency: Abstract claims match detailed sections
Tense consistency: Past results in past tense, proposed work in future tense

For each inconsistency found, classify via gw-classify:

echo "Budget lists 2 postdocs but methodology describes 3 PhD students" | uv run gw-classify classify
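Most of these checks need judgment, but the figure-reference check is mechanical. A minimal sketch, assuming figures are referenced with markdown image syntax and resolved relative to a base directory; the helper name `missing_figures` is hypothetical.

```python
import re
from pathlib import Path

# Captures the target of a markdown image: ![alt](target "optional title")
IMAGE_RE = re.compile(r"!\[[^\]]*\]\(([^)\s]*)")
# Anything with a URL scheme is remote and cannot be checked on disk.
REMOTE_RE = re.compile(r"[a-z][a-z0-9+.-]*://", re.IGNORECASE)


def missing_figures(markdown: str, base_dir: Path) -> list[str]:
    """Return image targets in `markdown` that do not exist under `base_dir`."""
    missing = []
    for target in IMAGE_RE.findall(markdown):
        if REMOTE_RE.match(target):
            continue
        # An empty target resolves to base_dir itself, which is not a file.
        if not (base_dir / target).is_file():
            missing.append(target)
    return missing
```

Each missing target can then be piped to gw-classify in the same way as the other inconsistencies.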

8. Save verification trail

Write <proposal_dir>/review/claim_verification.json:

{
  "checked_at": "<ISO timestamp>",
  "multi_model_used": true,
  "models_dispatched": ["gpt-5.4-high", "gemini-3.1-pro", "grok-4-20-thinking"],
  "total_claims_checked": 47,
  "citations": [
    {
      "ref_id": "[1]",
      "claimed": {"title": "...", "authors": "...", "year": 2024, "doi": "10.1234/..."},
      "verification": {
        "method": "mm_ask_3model + DOI_check",
        "found_in": ["Semantic Scholar", "CrossRef"],
        "doi_resolves": true,
        "metadata_match": "exact",
        "reviewers_consensus": 3,
        "source_urls": ["https://api.semanticscholar.org/..."]
      },
      "rating": "VERIFIED",
      "source_tier": "T2"
    }
  ],
  "factual_claims": [
    {
      "claim_id": 1,
      "text": "...",
      "category": "statistics",
      "rating": "INACCURATE",
      "reviewers_flagging": 3,
      "sources_checked": [
        {"name": "WHO", "url": "...", "tier": "T1", "finding": "Actual figure is 9.2%"}
      ],
      "error_category": "FACT_OUTDATED",
      "evidence_summary": "Proposal says 6%, WHO says 9.2% (2024)",
      "severity": "WARNING",
      "suggested_fix": "Update to 9.2% and cite WHO 2024"
    }
  ],
  "claim_source_alignment": [],
  "internal_consistency": [
    {
      "issue": "Budget lists 2 postdocs, methodology describes 3 PhD students",
      "rating": "INACCURATE",
      "error_category": "INTERNAL_INCONSISTENCY",
      "suggested_fix": "Align budget personnel with described research activities"
    }
  ]
}

Also save the trail as a checkpoint via gw-state so that downstream revision skills can pick up the verification results:

cat "$PROP/review/claim_verification.json" | uv run gw-state save-checkpoint "$PROP" fact_check global verification

9. Analyze error distribution + rescue-on-stuck

Run the error classifier against the fact_check checkpoints:

uv run gw-classify analyze "$PROP" fact_check

This returns the dominant error category and a recommendation. Use it to decide:

If the dominant category is CITATION_HALLUCINATED → the literature phase was weak, consider re-running /gw:literature
If INTERNAL_INCONSISTENCY → route to /gw:revision for a cross-section sweep
If FACT_OUTDATED → route to a targeted fact update round

Rescue-on-stuck (if multi_model.rescue_on_stuck: true in config AND MM_AVAILABLE=1): When the error analysis shows should_escalate=true (≥50% of issues are the same category AND ≥3 total issues), the same error keeps recurring and single-model fixes likely won't break the pattern. Dispatch the error context to mm-council for a multi-model diagnosis:

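ANALYSIS=$(uv run gw-classify analyze "$PROP" fact_check)
# Feed the JSON on stdin: pasting it into the Python source breaks on quotes or newlines
SHOULD_RESCUE=$(printf '%s' "$ANALYSIS" | python3 -c "
import json, sys
a = json.load(sys.stdin)
print('yes' if a.get('dominant_pct', 0) >= 0.5 and a.get('total_issues', 0) >= 3 else 'no')
")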

if [ "$SHOULD_RESCUE" = "yes" ] && [ "$MM_AVAILABLE" = "1" ]; then
  cat > /tmp/gw_fc_rescue.txt <<EOF
The fact-check phase of a grant proposal keeps hitting the same error
pattern. Error analysis:

$(echo "$ANALYSIS" | python3 -m json.tool)

Dominant error: $(echo "$ANALYSIS" | python3 -c "import json,sys; print(json.load(sys.stdin).get('dominant_error','unknown'))")
Recommendation: $(echo "$ANALYSIS" | python3 -c "import json,sys; print(json.load(sys.stdin).get('recommendation',''))")

Diagnose the root cause. Is this a systematic issue in how the proposal
was written, a literature-phase failure, or a false-positive pattern in
our classifier? Propose a concrete fix strategy — which gw phase should
be re-run and with what changes?

Return a JSON object:
  {"root_cause": "...", "fix_strategy": "re-run /gw:literature with broader queries", "affected_phase": "literature|proposal_writing|...", "confidence": 1-5}
EOF

  uv run mm-council \
    --panel gpt-5.4-high,gemini-3.1-pro,grok-4-20-thinking \
    --chairman gpt-5.4-high \
    --prompt-file /tmp/gw_fc_rescue.txt \
    --output "$PROP/review/rescue_diagnosis.json" \
    --timeout 300 \
    --verbose

  echo "Rescue diagnosis saved to review/rescue_diagnosis.json"
fi

Read the rescue_diagnosis.json — the chairman's synthesis tells you which phase to re-run. Surface this to the PI before acting on it.
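The escalation threshold described in this step (at least 50% of issues in one category and at least 3 issues in total) can be sketched as follows. This is not gw-classify's implementation; it only reuses the key names that appear in the shell snippet above, and the function name `escalation_summary` is hypothetical.

```python
from collections import Counter


def escalation_summary(categories: list[str], min_share: float = 0.5, min_issues: int = 3) -> dict:
    """Summarize error categories and decide whether to escalate to a rescue.

    `categories` holds one error category per flagged issue,
    e.g. ["FACT_OUTDATED", "CITATION_HALLUCINATED", ...].
    """
    total = len(categories)
    if total == 0:
        return {"dominant_error": None, "dominant_pct": 0.0,
                "total_issues": 0, "should_escalate": False}
    dominant, count = Counter(categories).most_common(1)[0]
    share = count / total
    return {
        "dominant_error": dominant,
        "dominant_pct": share,
        "total_issues": total,
        # Both conditions must hold: a dominant pattern AND enough issues to trust it.
        "should_escalate": share >= min_share and total >= min_issues,
    }
```

The minimum issue count matters: two issues of the same category give a 100% share, but that is too little evidence of a recurring pattern.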

10. Generate fact-check report

Compile all findings into <proposal_dir>/review/fact_check.md with these sections:

Summary table (pass-by-pass, ratings counts)
Critical issues (FABRICATED / INACCURATE) — must fix before submission
Warnings (EXAGGERATED / MISLEADING / INTERNAL) — should fix
Informational (VAGUE / UNVERIFIABLE) — PI discretion
Evidence trail link to claim_verification.json
Synthesis provenance: verified_by: "mm_ask_3model" or verified_by: "claude_alone"

11. Gate decision

| Situation | Action |
| --- | --- |
| Any FABRICATED or INACCURATE claims with ≥2-of-3 reviewer consensus | BLOCK review phase. Must fix before proceeding. |
| Only MISLEADING / MOSTLY_ACCURATE | Proceed to review. Flag prominently. PI should fix. |
| All VERIFIED | Proceed to review. Clean proposal. |

Human checkpoint: Present the report. For each critical issue, show the claim, the evidence, the reviewer consensus count, and the suggested fix. Ask the PI: fix it, override with justification, or investigate further.
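The gate table can be sketched as a small decision function. The return labels are hypothetical, and so is the handling of cases the table does not list (for example a FABRICATED rating from a single reviewer, or UNVERIFIABLE claims): this sketch lets them proceed with flags so that they reach the PI at the human checkpoint.

```python
CRITICAL = {"FABRICATED", "INACCURATE"}


def gate_decision(findings: list[dict]) -> str:
    """Apply the gate table to findings shaped like the claim_verification.json entries.

    Each finding needs a "rating"; "reviewers_flagging" is optional and
    defaults to 0 when absent.
    """
    for finding in findings:
        if finding["rating"] in CRITICAL and finding.get("reviewers_flagging", 0) >= 2:
            return "BLOCK"
    if all(finding["rating"] == "VERIFIED" for finding in findings):
        return "PROCEED_CLEAN"
    # MISLEADING, MOSTLY_ACCURATE, and anything the table does not cover.
    return "PROCEED_WITH_FLAGS"
```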

12. Update state

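# CRITICAL_ISSUES = number of critical (FABRICATED / INACCURATE) findings that triggered the gate in step 11
if [ "$CRITICAL_ISSUES" -gt 0 ]; then
  uv run gw-state update "$PROP" --phase fact_check --status in_progress
else
  uv run gw-state update "$PROP" --phase fact_check --status complete
fi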

If failed (critical issues present), /gw routes back to the offending phase (/gw:literature for citation fabrication, /gw:revision for inconsistency, etc.), then re-assembles and re-checks.
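The snippet above reads CRITICAL_ISSUES without showing where it comes from. One possible way to derive it from the verification trail is sketched below. Two assumptions are made: the four top-level lists from the claim_verification.json example are the only places findings live, and entries carrying pi_override: true no longer count. The count looks at ratings alone, so add a consensus check if the number should match the ≥2-of-3 gate exactly.

```python
CRITICAL_RATINGS = {"FABRICATED", "INACCURATE"}
# Top-level lists from the claim_verification.json example.
TRAIL_SECTIONS = ("citations", "factual_claims", "claim_source_alignment", "internal_consistency")


def count_critical(trail: dict) -> int:
    """Count critical findings in a parsed claim_verification.json."""
    return sum(
        1
        for section in TRAIL_SECTIONS
        for entry in trail.get(section, [])
        # Assumption: a PI override with justification clears the finding.
        if entry.get("rating") in CRITICAL_RATINGS and not entry.get("pi_override", False)
    )
```

The result can be handed to the shell with something like CRITICAL_ISSUES=$(python3 count_critical.py "$PROP"), where count_critical.py is a hypothetical wrapper script that loads the file and prints the count.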

Error Handling

No provider CLIs installed: The skill falls back to sequential Claude + CrossRef/S2/WebSearch. This is slower and less robust — warn the user once and log multi_model_used: false in claim_verification.json.
All references FABRICATED: The literature phase was broken. Halt, run /gw:literature with mm-ask to regenerate a real bibliography, then re-run fact-check.
Budget cross-check failure but budget file missing: Skip that pass, warn the user, and continue.
Network timeout in one mm-ask worker: mm-ask returns a Response with exit_code=124 for that model. The other reviewers still complete. Proceed with the successful workers; flag the partial dispatch in the report.
Rate limit on one provider CLI: mm-ask's routing spreads across 3 rate buckets by design, so rate-limiting one CLI doesn't block the others. If an issue persists, reduce batch size or wait.
PI overrides a FABRICATED rating: Accept the override but keep it in claim_verification.json with pi_override: true and a justification. Future review skills should surface overrides so they don't get silently lost.
Corrupted claim_verification.json on resume: Run uv run gw-state validate-resume "$PROP" and re-run fact-check — this pass is idempotent.
Reviewer consensus splits 1-1-1 (all three disagree): Claude manually re-examines the claim. A 3-way split is rare and usually indicates an ambiguous claim that needs a human judgment call.
JSON parse error in a reviewer's output: mm-ask guarantees only subprocess-level success, not well-formed output. If a reviewer's text is not parseable JSON, mark it as parse_failed and exclude it from consensus. The remaining reviewers still vote.
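The parse_failed handling can be sketched as a tolerant parser that runs over each reviewer's raw text before the consensus merge. The function name and the returned dictionary shape are assumptions; stripping a stray markdown fence is an added convenience, since models sometimes wrap JSON in one even when told not to.

```python
import json

FENCE = "`" * 3  # markdown code fence marker


def parse_reviewer_output(text: str) -> dict:
    """Parse one reviewer's raw text into a list of entries, or mark it parse_failed."""
    cleaned = text.strip()
    if cleaned.startswith(FENCE):
        lines = cleaned.splitlines()[1:]  # drop the opening fence line
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]  # drop the closing fence line
        cleaned = "\n".join(lines)
    try:
        data = json.loads(cleaned)
    except json.JSONDecodeError:
        return {"status": "parse_failed", "entries": []}
    if not isinstance(data, list):
        # The prompts ask for a JSON array; anything else is unusable here.
        return {"status": "parse_failed", "entries": []}
    return {"status": "ok", "entries": data}
```

Reviewers whose status is parse_failed are simply left out of the vote, so a 3-model dispatch degrades to a 2-model vote instead of failing outright.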
