Use when co-authoring a specific section of a benchmark / evaluation paper (introduction, related work, methodology, results, discussion, limitations, conclusion) at frontier-lab register (METR, DeepMind, Anthropic, OpenAI). Triggers on "help me write the [section]", "let's work on the [section]", "frontier paper quality", "METR style", "DeepMind style", "sound like Anthropic", "make this read like a real paper", page-budget mentions ("max 2 pages", "we can't fit everything"), bulleted-finding requests, and any request to mimic how Anthropic, OpenAI, or DeepMind benchmark papers express ideas. Also triggers when the user supplies a whole draft or asks to "walk the draft", "go through every section", or "review my paper section by section" (document-walk mode). Pairs with `seshat:using-frontier-lexicon` (diction) by adding the missing stage of rhetorical-move mining from real frontier paragraphs in the matching section type. Do not use for pure word-level de-slopping (that is the lexicon skill alone), and do not use when the user has no claim or content for the section yet.
Slash command: `/seshat:co-authoring-frontier-paper-sections`
Frontier-paper-quality prose comes from mimicking sentence-level rhetorical moves taken from real frontier papers in the same kind of section you are writing, not from polishing your own first draft. The companion skill seshat:using-frontier-lexicon teaches diction (word-level register); this skill adds structure, format, rhetoric mining from real paragraphs, and per-paragraph co-authorship. Apply it section by section. Two modes: single-section (the user names one section; go straight to it) and document walk (the user supplies a whole draft; inventory the sections, then loop the five phases over each one in turn).
| Situation | Trigger |
|---|---|
| User picks a specific section to co-author | "let's work on the introduction", "help me write related work", "the discussion section needs work" |
| User has content / claims and asks how to phrase them | "I know what I want to say, I just don't know how" |
| User asks for METR / DeepMind / Anthropic / OpenAI register | "frontier paper quality", "sound like a real paper" |
| User has page-budget constraints | "max 2 pages", "what should be in the appendix" |
| User wants skim-readable structure | "readers will only see the bolded text", "METR-style bullets" |
When NOT to use:

- Pure word-level de-slopping: use `seshat:using-frontier-lexicon` directly.
- The user has no claim or content for the section yet.

| Phase | What happens | Output |
|---|---|---|
| 1. Structure | Confirm the section's role in the paper; cut to page budget; pick format | Section spine + cut decisions |
| 2. Headlines | Write skim-readable hooks (bolded labels for results-style sections, paragraph-purpose openers for narrative sections) | Skim-readable contribution |
| 3. Diction | Lexicon search for native templates per claim or paragraph intent | Word-level register |
| 4. Rhetoric mining | Read real frontier paragraphs from the same section type you are writing; extract sentence-level moves | Move catalogue + templates |
| 5. Co-authorship | Per-paragraph multiple-choice phrasings; user picks; mimic moves | Final prose |
Phase 4 is the load-bearing phase. Diction-only mining (Phase 3 alone) produces "well-worded" prose that still reads as undergraduate. Rhetoric mining is what closes the gap to frontier register.
Section type matters in Phase 4. A results-section paragraph and an introduction paragraph have different rhetorical moves. Mine paragraphs from the same section type as the one you are writing (e.g., for an Introduction, mine introductions; for Limitations, mine limitations sections).
Default when the user supplies a whole draft or asks to work through the paper. The five phases are unchanged; the walk adds an outer loop and an agenda.
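The inventory step of a document walk can be sketched in shell. This is a minimal sketch, assuming the draft is a Markdown file whose sections start with `#`-style headings; `section_agenda` is a hypothetical helper name and is not part of seshat.

```shell
# Hypothetical helper: list the section headings of a Markdown draft,
# numbered in document order, as a starting agenda for a document walk.
# ASSUMPTION: the draft uses ATX headings ("# Title", "## Title").
# Heading-like lines inside code fences would also match.
section_agenda() {
  draft="$1"
  # Keep only heading lines, strip the leading '#' markers, then number them.
  grep -E '^#{1,6} ' "$draft" \
    | sed -E 's/^#{1,6} +//' \
    | awk '{ printf "%d. %s\n", NR, $0 }'
}
```

Calling `section_agenda draft.md` prints one numbered line per heading; the user can then confirm the order before the five phases run on the first section.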
Walk-mode rules:
This is the methodology most users miss. The lexicon teaches diction; real paragraphs teach argumentation. Both stages are required.
Stage 1 (`seshat:using-frontier-lexicon`). Invoke `lex search "<intent>" --json` to find native templates for the kind of claim the current paragraph needs to make.
What this stage gives you:

- `we evaluate`, `we observe`, `we find`, `we argue`, `we hypothesise`, `we caution`.
- `we introduce`, `we contribute`, `motivated by`, `we frame X as Y`.
- `prior work has shown`, `in contrast to`, `extends`, `complementary to`.
- `we evaluate`, `we measure`, `we score`, `we use the following protocol`.
- `we find`, `we observe`, `the strongest model`, `different models lead`.
- `is consistent with`, `suggests`, `taken together`.
- `we caution`, `we do not claim`, `the panel is`, `confidence intervals`.
- `paper_id` values pointing at papers that already use the register your section needs. Harvest those for Stage 2.

What this stage does NOT give you:
The corpus ships with this plugin at `$CLAUDE_PLUGIN_ROOT/parsed/{anthropic,deepmind,openai}/*.md` (341 parsed papers; longer-form research web articles also live in `web-research/` and feed the lexicon, but Stage 2 paragraph mining uses the section-structured papers in `parsed/`). Outside a plugin session `CLAUDE_PLUGIN_ROOT` is unset; resolve paths from the seshat checkout root instead (the directory containing `parsed/` and `bin/lex`). You do not read all 341 papers. The hard part is selecting the 2-3 papers whose same-section paragraphs match the shape of the claim you are writing.
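The path-resolution rule can be sketched as a small shell function. This is an illustrative sketch, not seshat's own tooling: `resolve_seshat_root` is a hypothetical name, and the fallback assumes the current directory is the seshat checkout root.

```shell
# Hypothetical helper: print the directory that holds the corpus and the lex CLI.
# Inside a plugin session CLAUDE_PLUGIN_ROOT is set; otherwise fall back to the
# current directory (ASSUMPTION: you are standing in the seshat checkout root).
resolve_seshat_root() {
  root="${CLAUDE_PLUGIN_ROOT:-$PWD}"
  # The root is described as the directory containing parsed/ and bin/lex.
  if [ -d "$root/parsed" ] && [ -e "$root/bin/lex" ]; then
    printf '%s\n' "$root"
  else
    printf 'not a seshat root: %s\n' "$root" >&2
    return 1
  fi
}
```

A typical use would be `ROOT=$(resolve_seshat_root) && ls "$ROOT/parsed/"`, so that later discovery commands work the same way inside and outside a plugin session.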
Discovery rule. For each paragraph, pick a candidate set of papers using whichever of these is fastest. Re-run discovery per paragraph; do not memorise a shortlist.
- Take the `paper_id` values from your Stage 1 search and open `$CLAUDE_PLUGIN_ROOT/<paper_id>.md` (a `paper_id` is the corpus-relative path, e.g. `parsed/anthropic/<stem>`; `web-research/` ids resolve the same way, but prefer `parsed/` sources here since Stage 2 mines section-structured papers). Jump to the section type you are writing.
- Run `grep -rli "<keyword>" "$CLAUDE_PLUGIN_ROOT/parsed/"` for shape-specific keywords. Examples (templates, not a fixed list):
  - "we introduce", "we present", "motivated by", "in this work"
  - "in contrast", "unlike prior", "complementary", "extends"
  - "we evaluate", "we score", "we use the following", "protocol"
  - "the best performing", "different models", "excelled in particular"
  - "fails to meet", "headroom", "the strongest model"
  - "taken together", "is consistent with", "suggests that"
  - "we caution", "we do not claim", "confidence intervals", "panel size"
- Run `ls "$CLAUDE_PLUGIN_ROOT/parsed/<lab>/"` and skim filenames for benchmark-eval papers with a similar evaluation regime, then jump straight to the section you are writing.

Selection criterion. A paper is a good Stage-2 source if its paragraph in the same section type contains the shape of argument you need. Match on argumentation, not domain. A safety-eval paper can teach a math-eval paper how to phrase a threshold-gap result. A vision-eval paper can teach a telecom-eval paper how to phrase a per-model rotation. A robotics paper can teach a language paper how to phrase a limitations paragraph.
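When several keywords describe the same claim shape, the `grep` discovery can be run once per keyword and the hits counted per file. The sketch below is one way to narrow the candidate set; `rank_candidates` is a hypothetical helper, and a high hit count is only a hint, since the final choice still depends on reading the paragraphs.

```shell
# Hypothetical helper: rank corpus files by how many of the given keywords they contain.
# Usage: rank_candidates <corpus_dir> <keyword>...
rank_candidates() {
  corpus="$1"
  shift
  for keyword in "$@"; do
    # -r recurse, -l print matching file names only, -i ignore case, -F literal string.
    # "|| true" keeps the loop going when a keyword has no matches.
    grep -rliF -- "$keyword" "$corpus" || true
  done | sort | uniq -c | sort -rn | head -n 3
}
```

For example, `rank_candidates "$CLAUDE_PLUGIN_ROOT/parsed/" "the best performing" "excelled in particular"` prints up to three files, each prefixed by the number of keywords it matched. Re-run it with new keywords for each paragraph instead of reusing the output.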
Anti-pattern: shortlist memorisation. Do not memorise a fixed shortlist of "good" papers. The shortlist that worked for the last claim is not necessarily right for the next claim, and the shortlist that worked for one section type is not necessarily right for another. The discovery commands are the artefact, not the discovered papers.
For each selected paper, extract:

- A template with `[BRACKETS]` for the user's specifics, capturing the move structure.

These were extracted from frontier benchmark papers and recur across section types and claim shapes. Use them as starting points; mine more for the section and paragraph at hand.
| Move | One-line description | Verbatim corpus example |
|---|---|---|
| Lead with the surprising fact; hedge LAST | State the result before any qualifiers; let pivots (while, but, despite) carry complications | "Claude Opus 4.1 was the best performing model ... excelling in particular on aesthetics ... while GPT-5 excelled in particular on accuracy." |
| Contrastive connectors | while, despite, though, but chain multi-dimensional results into one argument | "gpt-5-mini performs nearly as well across the tasks despite being a far smaller model." |
| Mechanism after observation | State what you saw, then in the next clause or sentence state why | "Small distilled models ... only slightly underperform the base model ... model intelligence appears more important in output accuracy than model size." |
| Quantify the gap with a ratio or multiplier | Replace direction words (much, significantly) with ratios (25x, reduced by a third) | "GPT-4.1 nano outperforming August 2024's GPT-4o model despite being 25x cheaper." |
| Reframe the negative as headroom | Failure becomes test coverage for the next generation | "the strongest model, o3, achieves a score of 0.32 (compared to its score of 0.60 on HealthBench overall), providing headroom for the next generation of models." |
These five moves came from a results-section worked example; equivalent moves exist for introduction (e.g., frame the gap your work fills), related work (position by contrast, not survey), methodology (operational specifics, no framing language), discussion (aggregate findings into one synthesis sentence), limitations (name the confound, do not minimise it). Mine for them in the same section type you are writing.
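The "quantify the gap" move needs only one division. The sketch below uses the two HealthBench scores quoted in the table above; `gap_ratio` is a hypothetical helper introduced for illustration.

```shell
# Hypothetical helper: express one score as a multiple of another, to two decimals.
# The second argument must be non-zero.
gap_ratio() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# Hard-subset score relative to the overall score, from the quoted example.
gap_ratio 0.32 0.60   # prints 0.53
```

With the ratio in hand, the sentence can say the subset score is roughly half the overall score instead of "much lower". The inputs must come from the user's data; the helper only does the arithmetic.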
- `paper_id` values).
- `[BRACKETS]` template).
- `furthermore`, `it is important to note`, `comprehensive`, `robust`); parentheticals not dashes for asides.

| Error | Why it happens | Fix |
|---|---|---|
| Blending several sections into one undifferentiated pass | Wider scope feels efficient | Use the document walk: agenda first, then one section at a time with confirmation between sections. |
| Asking every section's questions up front | Fewer round-trips feels polite | Questions belong inside each section's phases; the agenda only confirms order and global defaults. |
| Reusing one paper shortlist across the whole walk | Discovery feels done | Re-run Stage 1/2 discovery per section, matched to its section type. |
| Mining results paragraphs while writing an intro | Paper feels like one register | Section types have different moves. Mine same-section paragraphs. |
| Skipping Stage 2 (rhetoric mining) | Lexicon feels sufficient | Lexicon = words, not argumentation. Read real paragraphs. |
| Reusing the same shortlist of papers across every claim | Past success feels reusable | Re-run discovery per claim. Different shapes need different sources. |
| Pasting example sentences from the corpus | Templates feel ready-made | Use for register only. Write your own sentence. |
| Drafting prose then asking for redlines | Faster to write than to ask | Violates co-authorship. Ask intent → propose options → user picks. |
| Treating content as a flat list | Easier than ranking | Vitality test first. Cut to thesis. Rest goes elsewhere. |
| Em dashes for parentheticals | Default LLM rhythm | Use commas, parentheses, or semicolons. |
| Inflated diction (comprehensive, robust, powerful) | Feels paper-like | Replace with the actual coverage / stress test / observation. |
| Inventing numbers or citations to make prose concrete | Plausibility pressure | Refuse per lexicon red flags. Ask the user. |
| Single-best-replacement queries to lex | Treating the lexicon as a thesaurus | lex returns a ranked list. Read the examples. |
| Headlining a bullet with a 12-word sentence | "I need to fit the nuance" | The label is 5-6 words. The nuance lives in the body. |
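
Three rows of the table above (em dashes, inflated diction, over-long labels) are mechanical enough to check with a script before a paragraph goes back to the user. This is a rough sketch: `lint_draft` and `label_fits` are hypothetical helpers, and the word list is copied from this document as a starting point, not a complete lexicon.

```shell
# Hypothetical lint pass over a draft file.
lint_draft() {
  draft="$1"
  em_dash=$(printf '\342\200\224')   # UTF-8 bytes of the em dash character
  # Em dashes used for asides: print matching lines with their line numbers.
  grep -nF -- "$em_dash" "$draft" | sed 's/^/em-dash: /' || true
  # Inflated diction and filler connectives, case-insensitive substring match
  # (so "robust" also flags "robustness").
  grep -niE 'furthermore|it is important to note|comprehensive|robust|powerful' "$draft" \
    | sed 's/^/diction: /' || true
}

# Hypothetical check for a bolded label: succeeds only if it has at most six words.
label_fits() {
  [ "$(printf '%s\n' "$1" | wc -w)" -le 6 ]
}
```

Each flagged line is a prompt to rewrite, not an automatic failure: the fix for "comprehensive" is to name the actual coverage, which no script can supply.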
End-to-end flow for one paragraph in a Results section. The same flow applies to Introduction, Related Work, Methodology, Discussion, Limitations; the discovery keywords change, but the phases do not.
Raw finding (from data): GPT-5.5 wins 4 capabilities, Qwen 3.6 Max wins 3, Gemini 3.1 Pro wins 2, Gemini 3.1 Flash Lite wins 1; the aggregate-mean leader and the per-capability leader differ.
Bolded label (5-6 words): Different models lead on different capabilities.
Stage 1 discovery (lexicon):
"$CLAUDE_PLUGIN_ROOT/bin/lex" search "different models lead on different capabilities" --json
The top template hit is `Different X [verb] different Y`. Harvest the top `paper_id` values for Stage 2. Different invocations may surface different `paper_id` values; that is expected, since the goal is discovery, not memorisation.
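Harvesting can be scripted by piping the search output through a small filter. The sketch below rests on a loud assumption: that the JSON contains string fields named `"paper_id"`. The actual output shape of `lex search --json` is not shown in this document, and `paper_paths` is a hypothetical helper. A real JSON parser would be more robust than pattern matching.

```shell
# Hypothetical helper: read JSON on stdin, pull out paper_id strings, and print
# the corresponding corpus file paths under the given root.
# ASSUMPTION: results carry a string field named "paper_id".
paper_paths() {
  root="$1"
  grep -oE '"paper_id"[[:space:]]*:[[:space:]]*"[^"]*"' \
    | sed -E 's/.*:[[:space:]]*"([^"]*)"$/\1/' \
    | sort -u \
    | while IFS= read -r id; do
        # A paper_id is the corpus-relative path, so the file is <root>/<paper_id>.md.
        printf '%s/%s.md\n' "$root" "$id"
      done
}
```

Usage would look like `"$CLAUDE_PLUGIN_ROOT/bin/lex" search "<intent>" --json | paper_paths "$CLAUDE_PLUGIN_ROOT"`, which yields a de-duplicated list of files to open for Stage 2.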
Stage 2 discovery (paragraph mining, same section type):
grep -rli "the best performing" "$CLAUDE_PLUGIN_ROOT/parsed/openai/"
grep -rli "excelled in particular" "$CLAUDE_PLUGIN_ROOT/parsed/"
Open the resulting candidates plus the paper_id from Stage 1. Jump to the results section of each (not abstract, not discussion). Read the surrounding paragraphs. Extract the rhetorical move:
Lead with the overall winner. Immediately complicate with `while`. Give concrete examples in parentheses. Close with the implication, not the hedge.
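Jumping to the right section of each candidate can also be scripted. The sketch below assumes the parsed papers mark sections with `#`-style Markdown headings, which this document does not confirm; `print_section` is a hypothetical helper.

```shell
# Hypothetical helper: print one section of a Markdown paper, including its subsections.
# ASSUMPTION: sections start with '#'-style headings; adjust the pattern otherwise.
# Usage: print_section <paper.md> <section name>
print_section() {
  paper="$1"
  name="$2"
  awk -v name="$name" '
    /^#+ / {
      level = length($1)   # number of leading # characters
      # A heading at the same or a higher level ends the current section.
      if (printing && level <= start_level) printing = 0
      # A heading that mentions the section name (case-insensitive) starts one.
      if (!printing && index(tolower($0), tolower(name)) > 0) {
        printing = 1
        start_level = level
      }
    }
    printing { print }
  ' "$paper"
}
```

Running `print_section <paper>.md results` prints only the results section, which keeps the reading focused on same-section paragraphs and away from the abstract or discussion.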
Final paragraph (after co-authorship):
Different models lead on different capabilities. GPT-5.5 wins 4 of the 10 capabilities (safety, healing, intent-to-configuration, wireless degradation), while Qwen 3.6 Max wins 3 (math, network optimisation, network deployment), Gemini 3.1 Pro wins 2 (knowledge, RCA), and Gemini 3.1 Flash Lite wins 1 (budget scheduling). The overall-mean leader is not the best on any agentic capability; the aggregate ordering does not predict per-capability outcomes.
Moves used:
- `while` connectives chain the four wins.

Generalisation note. A different section (e.g., Limitations) on the same finding would mine limitations paragraphs and apply different moves (e.g., name the confound, do not minimise it). The discovery commands and the move catalogue are the reusable artefacts; the specific papers and moves change with the section type.
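Before a sentence such as "wins 4 of the 10 capabilities" goes into the paper, the counts can be recomputed from the per-capability winners. The sketch below uses the capability and winner pairs listed in the final paragraph of the worked example; the helper names are hypothetical.

```shell
# Capability/winner pairs copied from the final paragraph of the worked example,
# one tab-separated pair per line.
worked_example_pairs() {
  printf '%s\t%s\n' \
    'safety' 'GPT-5.5' \
    'healing' 'GPT-5.5' \
    'intent-to-configuration' 'GPT-5.5' \
    'wireless degradation' 'GPT-5.5' \
    'math' 'Qwen 3.6 Max' \
    'network optimisation' 'Qwen 3.6 Max' \
    'network deployment' 'Qwen 3.6 Max' \
    'knowledge' 'Gemini 3.1 Pro' \
    'RCA' 'Gemini 3.1 Pro' \
    'budget scheduling' 'Gemini 3.1 Flash Lite'
}

# Hypothetical helper: count wins per model, highest count first.
tally_wins() {
  awk -F'\t' '{ wins[$2]++ } END { for (m in wins) printf "%d\t%s\n", wins[m], m }' \
    | sort -rn
}

worked_example_pairs | tally_wins
```

The output lists 4, 3, 2, and 1 wins for the four models, matching the counts in the paragraph; a mismatch at this point means the prose, not the data, needs to change.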
Diction-only mining fixes the words but not the structure of an argument. A paragraph that says we evaluate instead of we leverage is closer to frontier register, but it can still bury the headline, hedge first, and stack observations without a connective. Real frontier papers do the opposite, and they do it differently per section: an Introduction frames a gap; a Methodology recites operational specifics; a Results section leads with the surprising fact; a Limitations section names confounds first. Those moves are not in the lexicon; they live in the paragraphs themselves, and they vary by section type. This skill makes that second stage explicit and tells you to mine same-section paragraphs.
npx claudepluginhub visual-snow/seshat --plugin seshat