From kivi-claude-skills
AI research pipeline that orchestrates HF Papers API, web search, and content extraction into actionable practitioner briefings. Teaches research thinking — problem decomposition, query diversification, strategic paper reading, and cross-source synthesis. Use when the user wants to research an AI/ML topic, understand a technique, survey recent papers, find state-of-the-art approaches, or bridge academic findings to practical implementation. Triggers include "context-research", "research this AI topic", "what papers exist on", "survey recent work on", "literature review", "SOTA for", "what's the latest research on".
Slash command
/kivi-claude-skills:context-research
Transform a problem statement into an actionable practitioner briefing by searching HF Papers, reading actual methodology, complementing with web findings, and synthesizing across sources.
problem → query expansion (5 angles) → parallel HF paper search
→ triage (relevance x recency) → deep read (methodology + results)
→ web complement (implementations, blogs, code)
→ synthesis → docs/research/context-<slug>-<YYYY-MM-DD>.md
- /context-research <problem>: Full 6-phase pipeline (5 queries, top 5 papers, web complement)
- /context-research quick <problem>: Fast mode (3 queries, top 3 papers, skip web complement)
- /context-research deep <problem>: Deep mode (5 queries, top 8 papers, full web + implementation search)

Output path: docs/research/context-<slug>-<YYYY-MM-DD>.md
Example: docs/research/context-kv-cache-compression-2026-03-23.md
Create the docs/research/ directory if it doesn't exist.
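The naming convention above can be sketched in Python. The helper names `slugify` and `report_path` are hypothetical, and the slug rule (lowercase, runs of non-alphanumerics collapsed to hyphens) is an assumption inferred from the example filename rather than something the skill specifies.

```python
import re
from datetime import date
from pathlib import Path


def slugify(problem: str) -> str:
    """Lowercase the text and collapse runs of non-alphanumerics into hyphens (assumed rule)."""
    return re.sub(r"[^a-z0-9]+", "-", problem.lower()).strip("-")


def report_path(problem: str, on: date, root: Path = Path("docs/research")) -> Path:
    """Build <root>/context-<slug>-<YYYY-MM-DD>.md, creating the directory if it is missing."""
    root.mkdir(parents=True, exist_ok=True)
    return root / f"context-{slugify(problem)}-{on.isoformat()}.md"
```

Under these assumptions, calling `report_path("KV cache compression", date(2026, 3, 23))` produces the example path shown above.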
This is the core of the skill. These are cognitive strategies, not API calls. Internalize them before executing any phase.
Before generating any search query, decompose the user's problem into 5 dimensions:
| Dimension | Question | Example ("KV cache compression for long-context LLMs") |
|---|---|---|
| Core concept | What is the fundamental technique? | KV cache compression |
| Upstream | What does this build on? | Attention mechanisms, memory-efficient transformers |
| Downstream | Where is this applied? | Long-context inference, deployment cost reduction |
| Alternatives | What else solves this problem? | Sparse attention, sliding window, linear attention |
| Limitations | What are known failure modes? | Quality degradation at high compression ratios |
This map drives query expansion. Skip it and you'll search with tunnel vision.
Generate exactly 5 keyword variants. Each must surface papers the others would miss:
| Variant | Strategy | Example |
|---|---|---|
| Q1 | Exact technique — the canonical term | "KV cache compression" |
| Q2 | Broader category — the parent field | "efficient transformer inference memory" |
| Q3 | Upstream method — the technique it builds on | "attention head pruning quantization" |
| Q4 | Downstream application — the use case | "long context window LLM serving deployment" |
| Q5 | Alternative approach — competing methods | "sparse attention linear attention mechanisms" |
Anti-pattern: "KV cache compression", "compressing KV caches", "KV cache size reduction", "reduce KV cache memory", "smaller KV caches" — these are rephrasings that return the same papers.
For quick mode: generate Q1, Q2, Q5 only (3 queries).
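One way to keep the five variants distinct is to store them by role rather than as a flat list. The sketch below is illustrative only: `QueryPlan` is a hypothetical name, and the example strings are copied from the table above.

```python
from dataclasses import dataclass


@dataclass
class QueryPlan:
    exact: str        # Q1: canonical term
    broader: str      # Q2: parent field
    upstream: str     # Q3: technique it builds on
    downstream: str   # Q4: use case
    alternative: str  # Q5: competing methods

    def queries(self, mode: str = "default") -> list[str]:
        """Return the search queries for a mode; quick mode keeps Q1, Q2, and Q5."""
        if mode == "quick":
            return [self.exact, self.broader, self.alternative]
        return [self.exact, self.broader, self.upstream, self.downstream, self.alternative]


plan = QueryPlan(
    exact="KV cache compression",
    broader="efficient transformer inference memory",
    upstream="attention head pruning quantization",
    downstream="long context window LLM serving deployment",
    alternative="sparse attention linear attention mechanisms",
)
```

Naming each slot makes it harder to fall into the rephrasing anti-pattern, because every field has to answer a different question about the problem.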
Score each paper 1-5 on these dimensions:
| Signal | Weight | How to assess |
|---|---|---|
| Title match | High | Does the title directly address the user's problem? |
| Recency | Medium | < 6 months = 5, 6-12 months = 4, 1-2 years = 3, older = 2 |
| Citation/upvote signal | Medium | HF Papers upvotes, if visible |
| Reproducibility | High | Links to code, models, or datasets = strong signal |
| Methodology fit | Varies | Method paper vs. survey vs. position paper — match to user intent |
Composite score = weighted average. Papers scoring >= 3.5 proceed to deep read.
Exception: A foundational paper (pre-2024) with a low recency score but 5 on everything else should still proceed. Recency is a signal, not a filter.
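The scoring rule can be made concrete with a small sketch. The table gives only qualitative weights (High, Medium, Varies), so the numeric weights below are assumptions chosen for illustration, as are the signal names and the choice to score exactly 12 months as 4.

```python
# Numeric weights are an assumption: the table only says High / Medium / Varies.
WEIGHTS = {
    "title_match": 3.0,      # High
    "recency": 2.0,          # Medium
    "upvotes": 2.0,          # Medium
    "reproducibility": 3.0,  # High
    "methodology_fit": 2.0,  # Varies; fixed at 2.0 here for illustration
}
THRESHOLD = 3.5  # papers at or above this composite proceed to deep read


def recency_score(age_months: float) -> int:
    """Map paper age to the recency score from the table (12 months counted as 4)."""
    if age_months < 6:
        return 5
    if age_months <= 12:
        return 4
    if age_months <= 24:
        return 3
    return 2


def composite(scores: dict[str, float]) -> float:
    """Weighted average of the per-signal scores (each on the 1-5 scale)."""
    total = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    return total / sum(WEIGHTS.values())


foundational = {
    "title_match": 5,
    "recency": recency_score(30),  # older than 2 years
    "upvotes": 5,
    "reproducibility": 5,
    "methodology_fit": 5,
}
```

With these assumed weights, the foundational paper still clears the threshold despite its low recency score, which is the behavior the exception above asks for.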
Do NOT read papers linearly. Use this priority order:
Skip: Related Work (you already have your own survey), Introduction (redundant with abstract), Conclusion (redundant with results).
For each paper, extract exactly these fields:
If you can't fill these fields, you haven't read deeply enough.
RESEARCH STRATEGY
Problem: <user's problem>
Decomposition:
Core: <...>
Upstream: <...>
Downstream: <...>
Alternatives: <...>
Limitations: <...>
Search Queries:
Q1 (exact): <...>
Q2 (broader): <...>
Q3 (upstream): <...>
Q4 (downstream): <...>
Q5 (alternative): <...>
Launch concurrent Agent sub-agents, each executing a single paper_search call:
Agent 1: paper_search(query="<Q1>", results_limit=10)
Agent 2: paper_search(query="<Q2>", results_limit=10)
Agent 3: paper_search(query="<Q3>", results_limit=10)
Agent 4: paper_search(query="<Q4>", results_limit=10)
Agent 5: paper_search(query="<Q5>", results_limit=10)
Each agent should return: paper title, paper ID (arxiv ID or HF paper ID), date, authors, abstract snippet, upvotes (if available), and any linked repos.
| Mode | Agents | results_limit |
|---|---|---|
| quick | 3 | 5 |
| default | 5 | 10 |
| deep | 5 | 15 |
Important: Use the huggingface-skills:hugging-face-paper-publisher skill's paper_search via the HF MCP tool. Each sub-agent prompt should instruct it to call paper_search and return structured results.
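In the skill itself the fan-out is performed by Agent sub-agents calling the `paper_search` MCP tool. As an analogy only, the same shape (one search per query, run concurrently, with a mode-dependent `results_limit`) can be sketched in plain Python. `run_searches` and `fake_search` are hypothetical names, and `fake_search` is a stand-in that returns placeholder records instead of calling any real API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# results_limit per mode, copied from the table above.
RESULTS_LIMIT = {"quick": 5, "default": 10, "deep": 15}

SearchFn = Callable[[str, int], list[dict]]


def run_searches(queries: list[str], mode: str, search: SearchFn) -> list[dict]:
    """Run one search per query concurrently and return all results in query order."""
    limit = RESULTS_LIMIT[mode]
    with ThreadPoolExecutor(max_workers=len(queries)) as pool:
        batches = pool.map(lambda q: search(q, limit), queries)
        return [paper for batch in batches for paper in batch]


def fake_search(query: str, results_limit: int) -> list[dict]:
    """Hypothetical stand-in for paper_search; returns placeholder records only."""
    return [{"id": f"{query}-{i}", "title": f"{query} paper {i}"} for i in range(results_limit)]
```

In default mode, five queries at `results_limit=10` yield up to 50 raw results before deduplication.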
| Mode | Top N |
|---|---|
| quick | 3 |
| default | 5 |
| deep | 8 |
TRIAGE RESULTS
────────────────────────────────────────────────────────────────
# Score Date Code? Title
1 4.2 2026-02 Yes KV Cache Compression via Learned...
2 3.8 2025-11 Yes Efficient Long-Context Inference...
3 3.7 2026-01 No Sparse Attention for Production...
4 3.6 2025-09 Yes Dynamic Token Pruning in...
5 3.5 2024-12 Yes A Survey of Memory-Efficient...
────────────────────────────────────────────────────────────────
Searched: 50 papers | Unique: 32 | Selected: 5 for deep read
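The step from searched to unique to selected can be sketched as below. How the 3.5 threshold interacts with the per-mode Top N is not spelled out in the skill, so applying the threshold first and then truncating is an assumption; `select_for_deep_read` and the `id` / `score` record keys are hypothetical.

```python
# Top N per mode, copied from the table above.
TOP_N = {"quick": 3, "default": 5, "deep": 8}


def select_for_deep_read(
    papers: list[dict], mode: str = "default", threshold: float = 3.5
) -> list[dict]:
    """Deduplicate by paper ID, drop papers below the threshold, return the top N by score."""
    unique: dict[str, dict] = {}
    for paper in papers:
        best = unique.get(paper["id"])
        # The same paper can surface from several queries; keep its best-scoring record.
        if best is None or paper["score"] > best["score"]:
            unique[paper["id"]] = paper
    ranked = sorted(unique.values(), key=lambda p: p["score"], reverse=True)
    return [p for p in ranked if p["score"] >= threshold][: TOP_N[mode]]
```

Deduplicating before ranking matters because diversified queries are expected to overlap on the most central papers.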
For each selected paper:
Fetch content: Try in order:
1. WebFetch on https://arxiv.org/html/<arxiv-id> (HTML version, best for extraction)
2. defuddle parse "https://arxiv.org/abs/<arxiv-id>" -m -j (abstract page)
3. WebFetch on https://huggingface.co/papers/<arxiv-id> for community context

If paper links to HF models/datasets: use hub_repo_details(repo_ids=[<linked-repo>], include_readme=true) to understand the implementation
Apply Strategic Reading (Section 1.4): extract methodology, key results, ablation findings
Apply Practitioner Lens (Section 1.5): fill the 5 extraction fields for each paper
Do NOT summarize the abstract and call it "reading". You must extract information from the methodology and results sections that is NOT in the abstract.
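The fetch order for a paper can be sketched as a fallback chain. Reading "try in order" as stop-at-first-success is an assumption, and the single `fetch` callable is a hypothetical abstraction over the different tools (WebFetch, defuddle) named above.

```python
from typing import Callable, Optional

Fetcher = Callable[[str], Optional[str]]


def fetch_sources(arxiv_id: str) -> list[str]:
    """URLs to try for a paper, in the priority order listed above."""
    return [
        f"https://arxiv.org/html/{arxiv_id}",
        f"https://arxiv.org/abs/{arxiv_id}",
        f"https://huggingface.co/papers/{arxiv_id}",
    ]


def fetch_first(arxiv_id: str, fetch: Fetcher) -> Optional[tuple[str, str]]:
    """Return (url, content) for the first source that yields content, else None."""
    for url in fetch_sources(arxiv_id):
        content = fetch(url)
        if content:
            return url, content
    return None
```

Whatever the fetch mechanism, the extracted text still has to cover the methodology and results sections, so an abstract-only page should be treated as a partial result rather than a completed read.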
Skip entirely in quick mode.
Use WebSearch with targeted queries:
Default mode (3 searches):
1. "<core technique>" blog tutorial explained: practitioner explanations
2. "<core technique>" github implementation: code repos
3. "<core technique>" production deployment experience: real-world reports

Deep mode (5 searches, adds):
4. "<core technique>" benchmark comparison evaluation: comparative analysis
5. "<core technique>" limitations failure cases: known issues
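The query patterns above lend themselves to a small builder. This is a sketch, with `web_queries` as a hypothetical helper name; the suffix strings are copied from the list above.

```python
DEFAULT_SUFFIXES = [
    "blog tutorial explained",           # practitioner explanations
    "github implementation",             # code repos
    "production deployment experience",  # real-world reports
]
DEEP_EXTRA_SUFFIXES = [
    "benchmark comparison evaluation",   # comparative analysis
    "limitations failure cases",         # known issues
]


def web_queries(core_technique: str, mode: str = "default") -> list[str]:
    """Build the WebSearch queries for a mode; quick mode skips the web complement."""
    if mode == "quick":
        return []
    suffixes = DEFAULT_SUFFIXES + (DEEP_EXTRA_SUFFIXES if mode == "deep" else [])
    return [f'"{core_technique}" {suffix}' for suffix in suffixes]
```

Quoting the core technique keeps the search anchored on the exact phrase while the unquoted suffix steers the type of result.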
For the best 2-3 web results, use defuddle parse "<url>" -m -j to extract clean content. Read the extracted content and pull out:
Generate the output document using the template below. Save to docs/research/context-<slug>-<YYYY-MM-DD>.md.
The synthesis is not a concatenation of summaries. It must:
# Context Research: <Problem Statement>
Date: <YYYY-MM-DD>
Mode: <default|quick|deep>
## Executive Summary
<3-5 sentences answering: What is the current state of this problem? What is the
most promising approach right now? What should a practitioner do TODAY? This is the
only section many readers will read — make it count.>
## Key Findings
### 1. <Finding Title>
- **Paper**: <title> (<date>) [<arxiv-id>]
- **Core Insight**: <one sentence, plain language>
- **Key Result**: <headline metric with context>
- **Applicability**: <when this works / when it doesn't>
- **Complexity**: <drop-in | architecture change | retraining required>
- **Code**: [<repo-name>](<url>) | No public implementation
### 2. <Finding Title>
...
## Landscape Overview
| Approach | Paper | Date | Key Metric | Code | Complexity |
|----------|-------|------|-----------|------|------------|
| ... | ... | ... | ... | ... | ... |
## Methodology Deep Dive
<For the top 2-3 most relevant papers, explain the methodology in practitioner
terms. Focus on "what would I need to change in my code?" not "the authors
formulate an optimization objective." Include architecture diagrams or algorithm
steps if extracted from the paper.>
### <Paper 1 Title>
<Methodology explanation>
<Key algorithmic steps>
<Implementation considerations>
### <Paper 2 Title>
...
## Practical Implications
- **Try first**: <the lowest-risk, highest-reward approach>
- **Try if X**: <conditional recommendations based on constraints>
- **Avoid**: <approaches that look promising but have hidden costs>
- **Open questions**: <what you'd need to validate yourself>
- **Infrastructure needs**: <tooling/compute/data changes required>
## Web Complement
### Implementations Found
- [<Repo Name>](<url>) — <what it implements, stars, last updated>
### Practitioner Reports
- <Summary of real-world deployment experiences, linking to source>
### Tutorials & Explanations
- [<Title>](<url>) — <what it covers>
## Research Gaps
<What questions remain unanswered? Where is the field converging vs. diverging?
What experiments would settle open debates? What would the user need to benchmark
themselves?>
## Sources
### Papers
- [<Title>](<HF Papers URL>) — arxiv:<id> (<date>)
### Web Sources
- [<Title>](<URL>) — accessed <YYYY-MM-DD>
In quick mode, aim for speed: skip web complement, lighter synthesis, 3 queries only.
Example query: "GQA grouped query attention" github pytorch
Install: npx claudepluginhub phoxiao/kivi-claude-skills --plugin kivi-claude-skills