Skill

co-authoring-frontier-paper-sections

Use when co-authoring a specific section of a benchmark / evaluation paper (introduction, related work, methodology, results, discussion, limitations, conclusion) at frontier-lab register (METR, DeepMind, Anthropic, OpenAI). Triggers on "help me write the [section]", "let's work on the [section]", "frontier paper quality", "METR style", "DeepMind style", "sound like Anthropic", "make this read like a real paper", page-budget mentions ("max 2 pages", "we can't fit everything"), bulleted-finding requests, and any request to mimic how Anthropic, OpenAI, or DeepMind benchmark papers express ideas. Also triggers when the user supplies a whole draft or asks to "walk the draft", "go through every section", or "review my paper section by section" (document-walk mode). Pairs with `seshat:using-frontier-lexicon` (diction) by adding the missing stage of rhetorical-move mining from real frontier paragraphs in the matching section type. Do not use for pure word-level de-slopping (that is the lexicon skill alone), and do not use when the user has no claim or content for the section yet.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/seshat:co-authoring-frontier-paper-sections



SKILL.md

203 lines · ~5.1k tokens (exceeds 5k compaction limit)

Stats

Language: HTML

Stars: 0

Maintenance: Excellent

Last Commit: Jun 17, 2026


Co-Authoring Frontier-Paper Sections

Overview

Frontier-paper-quality prose comes from mimicking sentence-level rhetorical moves taken from real frontier papers in the same kind of section you are writing, not from polishing your own first draft. The companion skill seshat:using-frontier-lexicon teaches diction (word-level register); this skill adds structure, format, rhetoric mining from real paragraphs, and per-paragraph co-authorship. Apply it section by section. Two modes: single-section (the user names one section; go straight to it) and document walk (the user supplies a whole draft; inventory the sections, then loop the five phases over each one in turn).

When to Use

Situation	Trigger
User picks a specific section to co-author	"let's work on the introduction", "help me write related work", "the discussion section needs work"
User has content / claims and asks how to phrase them	"I know what I want to say, I just don't know how"
User asks for METR / DeepMind / Anthropic / OpenAI register	"frontier paper quality", "sound like a real paper"
User has page-budget constraints	"max 2 pages", "what should be in the appendix"
User wants skim-readable structure	"readers will only see the bolded text", "METR-style bullets"

When NOT to use:

User wants only word-level de-slopping; use seshat:using-frontier-lexicon directly.
User has no claims, content, or findings for the section yet; clarify what the section needs to argue first.
User wants you to invent results, citations, or numbers; refuse per the lexicon's red-flag rules.
User asks for a single combined pass over the whole draft without per-section questions; that defeats co-authorship. Use document-walk mode instead, which still works one section at a time.

Core Pattern: Five Phases (Apply Per Section)

Phase	What happens	Output
1. Structure	Confirm the section's role in the paper; cut to page budget; pick format	Section spine + cut decisions
2. Headlines	Write skim-readable hooks (bolded labels for results-style sections, paragraph-purpose openers for narrative sections)	Skim-readable contribution
3. Diction	Lexicon search for native templates per claim or paragraph intent	Word-level register
4. Rhetoric mining	Read real frontier paragraphs from the same section type you are writing; extract sentence-level moves	Move catalogue + templates
5. Co-authorship	Per-paragraph multiple-choice phrasings; user picks; mimic moves	Final prose

Phase 4 is the load-bearing phase. Diction-only mining (Phase 3 alone) produces "well-worded" prose that still reads as undergraduate. Rhetoric mining is what closes the gap to frontier register.

Section type matters in Phase 4. A results-section paragraph and an introduction paragraph have different rhetorical moves. Mine paragraphs from the same section type as the one you are writing (e.g., for an Introduction, mine introductions; for Limitations, mine limitations sections).

Document Walk Mode

Default when the user supplies a whole draft without naming a section, or asks to work through the paper. The five phases are unchanged; the walk adds an outer loop and an agenda.

1. Inventory. Read the full draft. List every top-level section in order with approximate word count and current format (continuous paragraphs vs bulleted findings). Note missing canonical sections (e.g. no Discussion or Conclusion); they go to the wrap-up as open items, not onto the agenda.
2. Agenda. Present the inventory as a table and confirm via multiple-choice: walk all sections in order / reorder / skip listed sections. Ask the two global defaults once: register (METR / DeepMind / Anthropic / blend) and page budget for the whole paper. These pre-fill Phase 1 per section; the user can override either per section.
3. Loop. For each agenda section, announce progress ("Section k of N: <section name>"), then run Phases 1-5 exactly as in the Core Pattern, asking the per-section questions as usual. Discovery granularity is unchanged by the walk: re-run Stage 1 / Stage 2 per paragraph within each section, and never carry a shortlist from one section to the next.
4. Confirm before advancing. After Phase 5, show the section's final text and ask: apply and continue / revise this section / pause the walk. Never start the next section while the current one is unconfirmed.
5. Wrap-up. Close with a summary table: per section, what changed, what was cut to the appendix, and any open items the user deferred.
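
The inventory step can be sketched in Python. This is a minimal sketch, assuming the draft is Markdown with top-level sections marked by a single "#"; the function names and the bullet-majority heuristic for detecting format are illustrative assumptions, not part of the skill.

```python
import re

# Canonical section names taken from the skill description.
CANONICAL = ["introduction", "related work", "methodology", "results",
             "discussion", "limitations", "conclusion"]

def inventory(draft: str) -> list[dict]:
    """Split a draft on top-level '# ' headings and summarise each section."""
    sections = []
    current = None
    for line in draft.splitlines():
        heading = re.match(r"^#\s+(.*\S)\s*$", line)
        if heading:
            current = {"title": heading.group(1), "lines": []}
            sections.append(current)
        elif current is not None:
            current["lines"].append(line)

    rows = []
    for sec in sections:
        body = [l for l in sec["lines"] if l.strip()]
        words = sum(len(l.split()) for l in body)  # approximate word count
        bullets = sum(1 for l in body if re.match(r"^\s*([-*]|\d+\.)\s", l))
        # Assumption: a section is "bulleted" when most non-blank lines are list items.
        fmt = ("bulleted findings" if body and bullets * 2 > len(body)
               else "continuous paragraphs")
        rows.append({"section": sec["title"], "words": words, "format": fmt})
    return rows

def missing_canonical(rows: list[dict]) -> list[str]:
    """Canonical sections absent from the draft; these become wrap-up open items."""
    present = {r["section"].lower() for r in rows}
    return [name for name in CANONICAL if name not in present]
```

The rows map directly onto the agenda table; the output of missing_canonical is reported at wrap-up rather than added to the agenda.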

Walk-mode rules:

Do not batch questions across sections. Questions belong inside each section's phases; the agenda only confirms order and the two global defaults.
Do not silently reuse one section's Phase answers for the next. Only the confirmed global defaults (register, page budget) carry across.
Do not mine one shared paper shortlist for the whole walk. Each section re-runs discovery for its own section type.

The Two-Stage Mining Pattern

This is the methodology most users miss. The lexicon teaches diction; real paragraphs teach argumentation. Both stages are required.

Stage 1: Diction (use `seshat:using-frontier-lexicon`)

Invoke lex search "<intent>" --json to find native templates for the kind of claim the current paragraph needs to make.

What this stage gives you:

- Word-level register; native verb choices (we evaluate, we observe, we find, we argue, we hypothesise, we caution).
- Template shapes tied to the section type. Examples by section:
  - Introduction: we introduce, we contribute, motivated by, we frame X as Y.
  - Related work: prior work has shown, in contrast to, extends, complementary to.
  - Methodology: we evaluate, we measure, we score, we use the following protocol.
  - Results: we find, we observe, the strongest model, different models lead.
  - Discussion: is consistent with, suggests, taken together.
  - Limitations: we caution, we do not claim, the panel is, confidence intervals.
- A list of paper_id values pointing at papers that already use the register your section needs. Harvest those for Stage 2.

What this stage does NOT give you:

Argumentation flow.
Whether to lead with the surprising fact or the hedge.
Where to put the mechanism within the paragraph.
How to chain claims into a coherent paragraph.
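
Harvesting paper_id values from a Stage 1 search can be sketched as follows. The JSON schema returned by lex search is not documented here, so this sketch assumes only that hits carry a string-valued "paper_id" key somewhere in the payload; the function name and the sample payload are hypothetical.

```python
import json

def harvest_paper_ids(lex_json: str) -> list[str]:
    """Collect unique paper_id strings from a JSON payload, in first-seen order."""
    found: list[str] = []

    def walk(node):
        # Walk the whole structure so the sketch does not depend on a fixed schema.
        if isinstance(node, dict):
            for key, value in node.items():
                if key == "paper_id" and isinstance(value, str):
                    if value not in found:
                        found.append(value)
                else:
                    walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(json.loads(lex_json))
    return found

# Hypothetical payload shape, for illustration only.
sample = '{"results": [{"paper_id": "parsed/openai/a"}, {"paper_id": "parsed/openai/a"}]}'
print(harvest_paper_ids(sample))  # ['parsed/openai/a']
```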

Stage 2: Rhetoric (read real paragraphs from the corpus)

The corpus ships with this plugin at $CLAUDE_PLUGIN_ROOT/parsed/{anthropic,deepmind,openai}/*.md (341 parsed papers). Longer-form research web articles also live in web-research/ and feed the lexicon, but Stage 2 paragraph mining uses the section-structured papers in parsed/. Outside a plugin session CLAUDE_PLUGIN_ROOT is unset; resolve paths from the seshat checkout root instead (the directory containing parsed/ and bin/lex). You do not need to read the whole corpus: the hard part is selecting the 2-3 papers whose same-section paragraphs match the shape of the claim you are writing.

Discovery rule. For each paragraph, pick a candidate set of papers using whichever of these is fastest. Re-run discovery per paragraph; do not memorise a shortlist.

- Lexicon-driven discovery. Take the paper_id values from your Stage 1 search and open $CLAUDE_PLUGIN_ROOT/<paper_id>.md (a paper_id is the corpus-relative path, e.g. parsed/anthropic/<stem>; web-research/ ids resolve the same way, but prefer parsed/ sources here since Stage 2 mines section-structured papers). Jump to the section type you are writing.
- Grep-driven discovery. Run grep -rli "<keyword>" "$CLAUDE_PLUGIN_ROOT/parsed/" for shape-specific keywords. Examples (templates, not a fixed list):
  - introduction motivation hook → "we introduce", "we present", "motivated by", "in this work"
  - related-work positioning → "in contrast", "unlike prior", "complementary", "extends"
  - methodology protocol → "we evaluate", "we score", "we use the following", "protocol"
  - results rotation claim → "the best performing", "different models", "excelled in particular"
  - results threshold gap → "fails to meet", "headroom", "the strongest model"
  - discussion synthesis → "taken together", "is consistent with", "suggests that"
  - limitations / caveats → "we caution", "we do not claim", "confidence intervals", "panel size"
- Title-driven discovery. ls "$CLAUDE_PLUGIN_ROOT/parsed/<lab>/" and skim filenames for benchmark-eval papers with a similar evaluation regime, then jump straight to the section you are writing.
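
The path resolution and grep-driven discovery above can be sketched in Python. This is a sketch under stated assumptions: the helper names are hypothetical, and ranking files by the number of distinct keywords matched is an addition of this sketch (grep -rli only lists files that match a single pattern).

```python
import os
from pathlib import Path

def corpus_root(checkout_root: str = ".") -> Path:
    """Use CLAUDE_PLUGIN_ROOT when set; otherwise fall back to the checkout root."""
    return Path(os.environ.get("CLAUDE_PLUGIN_ROOT") or checkout_root)

def paper_path(root: Path, paper_id: str) -> Path:
    """A paper_id is a corpus-relative path without the .md suffix."""
    return root / f"{paper_id}.md"

def grep_discovery(root: Path, keywords: list[str]) -> list[tuple[Path, int]]:
    """Rank papers under parsed/ by how many keywords they contain (case-insensitive)."""
    hits = []
    for path in sorted((root / "parsed").rglob("*.md")):
        text = path.read_text(encoding="utf-8", errors="ignore").lower()
        score = sum(1 for kw in keywords if kw.lower() in text)
        if score:
            hits.append((path, score))
    # Stable sort: ties keep alphabetical path order.
    hits.sort(key=lambda item: -item[1])
    return hits
```

For a limitations paragraph, a call such as grep_discovery(corpus_root(), ["we caution", "panel size"]) returns candidates to open and read. As with the shell commands, re-run it per paragraph rather than keeping its output as a shortlist.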

Selection criterion. A paper is a good Stage-2 source if its paragraph in the same section type contains the shape of argument you need. Match on argumentation, not domain. A safety-eval paper can teach a math-eval paper how to phrase a threshold-gap result. A vision-eval paper can teach a telecom-eval paper how to phrase a per-model rotation. A robotics paper can teach a language paper how to phrase a limitations paragraph.

Anti-pattern: shortlist memorisation. Do not memorise a fixed shortlist of "good" papers. The shortlist that worked for the last claim is not necessarily right for the next claim, and the shortlist that worked for one section type is not necessarily right for another. The discovery commands are the artefact, not the discovered papers.

For each selected paper, extract:

2-3 verbatim paragraph examples from the same section type whose argumentation matches.
The rhetorical moves (2-4 sentences abstracting the argumentation).
A template with [BRACKETS] for the user's specifics, capturing the move structure.
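
A [BRACKETS] template can be treated as a fill-in form that refuses to guess. The sketch below is illustrative: the slot names and the fill_template helper are hypothetical, and the values come from the worked example's raw finding.

```python
import re

SLOT = re.compile(r"\[([A-Z][A-Z0-9_ ]*)\]")

def fill_template(template: str, specifics: dict[str, str]) -> str:
    """Fill [SLOT] placeholders; raise rather than invent a value the user did not supply."""
    missing = [slot for slot in SLOT.findall(template) if slot not in specifics]
    if missing:
        raise KeyError(f"user has not supplied: {', '.join(missing)}")
    return SLOT.sub(lambda m: specifics[m.group(1)], template)

template = "[LEADER] wins [N] of the [TOTAL] capabilities, while [RUNNER_UP] wins [M]."
values = {"LEADER": "GPT-5.5", "N": "4", "TOTAL": "10",
          "RUNNER_UP": "Qwen 3.6 Max", "M": "3"}
print(fill_template(template, values))
# GPT-5.5 wins 4 of the 10 capabilities, while Qwen 3.6 Max wins 3.
```

Raising on a missing slot mirrors the rule against inventing specifics: the gap goes back to the user as a question.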

Quick Reference: Five Reusable Moves

These were extracted from frontier benchmark papers and recur across section types and claim shapes. Use them as starting points; mine more for the section and paragraph at hand.

Move	One-line description	Verbatim corpus example
Lead with the surprising fact; hedge LAST	State the result before any qualifiers; let pivots (`while`, `but`, `despite`) carry complications	"Claude Opus 4.1 was the best performing model ... excelling in particular on aesthetics ... while GPT-5 excelled in particular on accuracy."
Contrastive connectors	`while`, `despite`, `though`, `but` chain multi-dimensional results into one argument	"gpt-5-mini performs nearly as well across the tasks despite being a far smaller model."
Mechanism after observation	State what you saw, then in the next clause or sentence state why	"Small distilled models ... only slightly underperform the base model ... model intelligence appears more important in output accuracy than model size."
Quantify the gap with a ratio or multiplier	Replace direction words (`much`, `significantly`) with ratios (`25x`, `reduced by a third`)	"GPT-4.1 nano outperforming August 2024's GPT-4o model despite being 25x cheaper."
Reframe the negative as headroom	Failure becomes test coverage for the next generation	"the strongest model, o3, achieves a score of 0.32 (compared to its score of 0.60 on HealthBench overall), providing headroom for the next generation of models."

These five moves came from a results-section worked example; equivalent moves exist for introduction (e.g., frame the gap your work fills), related work (position by contrast, not survey), methodology (operational specifics, no framing language), discussion (aggregate findings into one synthesis sentence), limitations (name the confound, do not minimise it). Mine for them in the same section type you are writing.

Implementation Steps

1. Pick the mode. If the user names one section, work on that section alone, even when they also supply the whole draft (read the rest for context only). Enter Document Walk Mode only when no section is named: inventory, agenda, then steps 2-11 per section in turn.
2. Confirm the user has claims, content, or findings to put in the section. If not, surface that gap before continuing.
3. Surface the section's role and page budget. Multiple-choice question. Pre-recommend with reasons.
4. Pick format. Multiple-choice question: continuous paragraphs (DeepMind register) vs bulleted findings with bolded labels (METR register, mostly results / contributions / limitations). Show short mocks.
5. Run the vitality test on content. A claim, paragraph, or finding is vital if it is load-bearing for the section's role OR independently quotable. Anything else moves to the appendix or a later section. Show the cut as a table so the user can disagree before committing.
6. Write skim-readable hooks, one per surviving paragraph or bullet. For results-style sections, use 5-6 word bolded labels. For narrative sections (introduction, discussion), use paragraph-purpose openers (the first sentence is the hook).
7. For each paragraph, run Stage 1 (lexicon search; harvest paper_id values).
8. For each paragraph, run Stage 2 (open the corpus papers; jump to the same section type; extract 2-3 verbatim paragraph examples; abstract 2-4 rhetorical moves; build a [BRACKETS] template).
9. Co-author paragraph by paragraph. Propose 2-3 phrasing options, pre-recommend one with reasons, let the user pick. Never draft prose then ask for redlines.
10. Apply project style rules. For OTBench: no em dashes (use semicolons or commas); no LLM filler (furthermore, it is important to note, comprehensive, robust); parentheticals not dashes for asides.
11. Verify each paragraph against source data. Every number cited must trace to a findings file; refuse to invent any specific the user did not supply.
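
The final verification step can be partly mechanised. This is a minimal sketch, assuming the findings are available as plain text; untraced_numbers is a hypothetical helper. It compares numbers as literal strings, so "0.6" and "0.60" count as different, and a flagged number is a prompt to ask the user, not proof of an error.

```python
import re

NUMBER = re.compile(r"\d+(?:\.\d+)?")

def untraced_numbers(paragraph: str, findings: str) -> list[str]:
    """Numbers that appear in the paragraph but nowhere in the findings text."""
    known = set(NUMBER.findall(findings))
    flagged: list[str] = []
    for number in NUMBER.findall(paragraph):
        if number not in known and number not in flagged:
            flagged.append(number)
    return flagged

findings = "GPT-5.5 wins 4 capabilities, Qwen 3.6 Max wins 3"
paragraph = "GPT-5.5 wins 4 of the 10 capabilities, while Qwen 3.6 Max wins 3."
print(untraced_numbers(paragraph, findings))  # ['10']
```

Here "10" is flagged because it is a derived total rather than a figure stated in the findings; confirm derived numbers with the user before keeping them.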

Common Mistakes

Error	Why it happens	Fix
Blending several sections into one undifferentiated pass	Wider scope feels efficient	Use the document walk: agenda first, then one section at a time with confirmation between sections.
Asking every section's questions up front	Fewer round-trips feels polite	Questions belong inside each section's phases; the agenda only confirms order and global defaults.
Reusing one paper shortlist across the whole walk	Discovery feels done	Re-run Stage 1/2 discovery per section, matched to its section type.
Mining results paragraphs while writing an intro	Paper feels like one register	Section types have different moves. Mine same-section paragraphs.
Skipping Stage 2 (rhetoric mining)	Lexicon feels sufficient	Lexicon = words, not argumentation. Read real paragraphs.
Reusing the same shortlist of papers across every claim	Past success feels reusable	Re-run discovery per claim. Different shapes need different sources.
Pasting example sentences from the corpus	Templates feel ready-made	Use for register only. Write your own sentence.
Drafting prose then asking for redlines	Faster to write than to ask	Violates co-authorship. Ask intent → propose options → user picks.
Treating content as a flat list	Easier than ranking	Vitality test first. Cut to thesis. Rest goes elsewhere.
Em dashes for parentheticals	Default LLM rhythm	Use commas, parentheses, or semicolons.
Inflated diction (`comprehensive`, `robust`, `powerful`)	Feels paper-like	Replace with the actual coverage / stress test / observation.
Inventing numbers or citations to make prose concrete	Plausibility pressure	Refuse per lexicon red flags. Ask the user.
Single-best-replacement queries to `lex`	Treating the lexicon as a thesaurus	`lex` returns a ranked list. Read the examples.
Headlining a bullet with a 12-word sentence	"I need to fit the nuance"	The label is 5-6 words. The nuance lives in the body.
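
Three of the mistakes above (em dashes, inflated diction, over-long labels) are mechanical enough to check automatically. The sketch below is an assumption-laden illustration: style_flags is a hypothetical helper, and the filler list only repeats the words this document names.

```python
import re

# Filler and inflated diction named in this document; extend per project.
FILLER = ["furthermore", "it is important to note", "comprehensive", "robust", "powerful"]

def style_flags(text: str, label: str = "") -> list[str]:
    """Return style problems found in a paragraph and, optionally, its bolded label."""
    flags = []
    if "\u2014" in text:  # U+2014 is the em dash
        flags.append("em dash")
    lowered = text.lower()
    for phrase in FILLER:
        # Whole-word match, so "robustness" is not flagged as "robust".
        if re.search(rf"\b{re.escape(phrase)}\b", lowered):
            flags.append(f"filler: {phrase}")
    if label and len(label.split()) > 6:
        flags.append("label longer than 6 words")
    return flags

print(style_flags("Furthermore, we run a comprehensive sweep."))
# ['filler: furthermore', 'filler: comprehensive']
```

A check like this catches surface slips only; it cannot tell whether a paragraph buries its headline or hedges first.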

Worked Example (Results Section)

End-to-end flow for one paragraph in a Results section. The same flow applies to Introduction, Related Work, Methodology, Discussion, Limitations; the discovery keywords change, but the phases do not.

Raw finding (from data): GPT-5.5 wins 4 capabilities, Qwen 3.6 Max wins 3, Gemini 3.1 Pro wins 2, Gemini 3.1 Flash Lite wins 1; the aggregate-mean leader and the per-capability leader differ.

Bolded label (5-6 words): Different models lead on different capabilities.

Stage 1 discovery (lexicon):

"$CLAUDE_PLUGIN_ROOT/bin/lex" search "different models lead on different capabilities" --json

The top template hit is Different X [verb] different Y. Harvest the top paper_id values for Stage 2. Different invocations may surface different paper_id values; that is expected, since the aim is discovery, not memorisation.

Stage 2 discovery (paragraph mining, same section type):

grep -rli "the best performing" "$CLAUDE_PLUGIN_ROOT/parsed/openai/"
grep -rli "excelled in particular" "$CLAUDE_PLUGIN_ROOT/parsed/"

Open the resulting candidates plus the paper_id from Stage 1. Jump to the results section of each (not abstract, not discussion). Read the surrounding paragraphs. Extract the rhetorical move:

Lead with the overall winner. Immediately complicate with while. Give concrete examples in parentheses. Close with the implication, not the hedge.

Final paragraph (after co-authorship):

Different models lead on different capabilities. GPT-5.5 wins 4 of the 10 capabilities (safety, healing, intent-to-configuration, wireless degradation), while Qwen 3.6 Max wins 3 (math, network optimisation, network deployment), Gemini 3.1 Pro wins 2 (knowledge, RCA), and Gemini 3.1 Flash Lite wins 1 (budget scheduling). The overall-mean leader is not the best on any agentic capability; the aggregate ordering does not predict per-capability outcomes.

Moves used:

Lead with the claim (the bolded label is the lead).
Contrastive while connectives chain the four wins.
Concrete capability names in parentheses.
Implication in the final clause (the aggregate ordering does not predict per-capability outcomes).
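
The tallies in the final paragraph can be cross-checked against their own parentheticals. This is a small sketch, assuming the "wins N (a, b, c)" phrasing used above; check_win_tallies is a hypothetical helper and would need adjusting for other phrasings.

```python
import re

def check_win_tallies(paragraph: str) -> list[tuple[int, int]]:
    """Return (stated_wins, names_listed) for every 'wins N ... (a, b, c)' group."""
    pairs = []
    for match in re.finditer(r"wins (\d+)[^()]*\(([^)]*)\)", paragraph):
        stated = int(match.group(1))
        listed = len([name for name in match.group(2).split(",") if name.strip()])
        pairs.append((stated, listed))
    return pairs

paragraph = (
    "GPT-5.5 wins 4 of the 10 capabilities (safety, healing, "
    "intent-to-configuration, wireless degradation), while Qwen 3.6 Max wins 3 "
    "(math, network optimisation, network deployment), Gemini 3.1 Pro wins 2 "
    "(knowledge, RCA), and Gemini 3.1 Flash Lite wins 1 (budget scheduling)."
)
pairs = check_win_tallies(paragraph)
print(pairs)                               # [(4, 4), (3, 3), (2, 2), (1, 1)]
print(sum(stated for stated, _ in pairs))  # 10
```

Each stated count matches the number of capabilities listed, and the counts sum to the 10 capabilities the paragraph claims.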

Generalisation note. A different section (e.g., Limitations) on the same finding would mine limitations paragraphs and apply different moves (e.g., name the confound, do not minimise it). The discovery commands and the move catalogue are the reusable artefacts; the specific papers and moves change with the section type.

Why this skill exists

Diction-only mining fixes the words but not the structure of an argument. A paragraph that says we evaluate instead of we leverage is closer to frontier register, but it can still bury the headline, hedge first, and stack observations without a connective. Real frontier papers do the opposite, and they do it differently per section: an Introduction frames a gap; a Methodology recites operational specifics; a Results section leads with the surprising fact; a Limitations section names confounds first. Those moves are not in the lexicon; they live in the paragraphs themselves, and they vary by section type. This skill makes that second stage explicit and tells you to mine same-section paragraphs.
