From seshat
Use when drafting, revising, or proofreading AI/ML research papers, technical reports, arXiv submissions, NeurIPS/ICML/ICLR drafts, or any prose meant to read as serious frontier-lab research. Especially when the user asks to remove "AI slop", "LLM phrasing", or to make text "sound human" / "sound like Anthropic" / "sound like a real paper". Triggers on inflated diction (powerful, robust, comprehensive, groundbreaking, leverage, utilize, seamlessly, state-of-the-art), generic transitions (Furthermore, Moreover, It is important to note), and empty evaluation language.
Slash command: /seshat:using-frontier-lexicon
A SQLite + numpy vector store over distinctive terms (unigrams and 2-4-word phrases) extracted from 341 papers and 624 research web articles by Anthropic, OpenAI, and DeepMind. Each term carries 3-5 KWIC (keyword-in-context) usage examples drawn from those sources (each example's paper_id is its corpus-relative path, prefixed parsed/ for papers or web-research/ for articles). Query it before you commit to a phrase: it tells you whether real researchers actually write that way, and what they reach for instead.
The lexicon is a retrieval tool, not a thesaurus and not an oracle. Use it to learn the register, then write your own prose.
Throughout this skill, lex means "$CLAUDE_PLUGIN_ROOT/bin/lex". Claude Code sets CLAUDE_PLUGIN_ROOT to this plugin's root whenever the plugin is enabled. The bare name lex is NOT on PATH (on macOS it resolves to the BSD lexer). Outside a plugin session, use <seshat-checkout>/bin/lex directly.
| Situation | Command |
|---|---|
| You wrote a phrase that sounds inflated, generic, or AI-flavored | lex search "<phrase>" --json |
| You have an intent and want diction for it ("hedging a strong claim", "describing a failure mode") | lex search "<intent>" --json |
| You have a candidate term and want neighbors in embedding space | lex similar "<term>" --json |
| You're considering a term and want to read 3-5 real usage sentences before using it | lex show "<term>" --json |
| Browsing for stronger verbs, adjectives, or adverbs | lex top -k 50 --pos VERB --json (or ADJ, ADV, NOUN) |
| Stuck and want serendipity | lex random -k 20 --json |
| Sanity-checking the index is loaded | lex stats |
All commands print human-readable text by default. Pass --json for parseable output. Always invoke via "$CLAUDE_PLUGIN_ROOT/bin/lex"; the bare name is not on PATH.
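If you drive lex from a script rather than by hand, the two rules above (always use the full plugin path, always ask for JSON) can be wrapped once. A minimal Python sketch, assuming Python 3.8+ and that stats needs no flag because it already prints JSON; lex_path, lex_argv, and run_lex are hypothetical helpers, not part of seshat:

```python
import json
import os
import subprocess
from pathlib import Path
from typing import Any, List, Optional


def lex_path(checkout: Optional[str] = None) -> Path:
    """Hypothetical helper: resolve bin/lex under CLAUDE_PLUGIN_ROOT, else a checkout."""
    root = os.environ.get("CLAUDE_PLUGIN_ROOT") or checkout
    if not root:
        raise RuntimeError("set CLAUDE_PLUGIN_ROOT or pass the seshat checkout path")
    # Never fall back to a bare "lex": on macOS that name is the BSD lexer.
    return Path(root) / "bin" / "lex"


def lex_argv(subcommand: str, *args: str, checkout: Optional[str] = None) -> List[str]:
    """Hypothetical helper: build the argv for one lex call."""
    argv = [str(lex_path(checkout)), subcommand, *args]
    # stats always prints JSON; every other command needs --json to be parseable.
    if subcommand != "stats":
        argv.append("--json")
    return argv


def run_lex(subcommand: str, *args: str, checkout: Optional[str] = None) -> Any:
    """Hypothetical helper: run lex and decode its JSON output (needs a built index)."""
    result = subprocess.run(
        lex_argv(subcommand, *args, checkout=checkout),
        capture_output=True,
        text=True,
        check=True,  # raise if lex exits non-zero instead of parsing an error message
    )
    return json.loads(result.stdout)
```

For example, run_lex("similar", "ablation", "-k", "15") would return the parsed hit list, provided the index has been built.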
When asked to edit or de-slop a paragraph, follow the procedure already documented in $CLAUDE_PLUGIN_ROOT/lexicon/README.md: search for intent ("hedging a strong claim"), similar when you have an anchor term, and show to inspect candidates before using them. Report each edited paragraph in this shape:
Slop removed:
- ...
Rewrite:
> ...
Notes:
- ...
For a full section, repeat paragraph by paragraph and keep terminology consistent.
These are the patterns the lexicon was built to push back against:
- powerful, groundbreaking, robust and comprehensive, deep insights
- Furthermore, Moreover, It is important to note
- leverage, utilize, enhance, optimize when a concrete verb is available
- wide range of challenging scenarios, significant improvement without specifics
- state-of-the-art solution, seamlessly enables, unlocking potential
These are the moves that come back high in the lexicon and read as native paper voice:
- we evaluate, we ablate, we observe, we find, we measure
- failure mode, stress test, ablation, qualitatively similar, underspecified
- stringent, systematic, interpretable, contrastive, adversarial, underexplored
These are starting points. Confirm fit by running lex show <term> and reading the example sentences before using.
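A crude first pass at slop-marking can be automated with plain string matching over the push-back list above. This Python sketch is hypothetical and not part of lex: it only flags candidates, and it will miss paraphrases and some inflections (for example "leveraging"), so each flagged phrase still goes through lex search and lex show before you rewrite it.

```python
import re
from typing import List, Tuple

# Phrases copied from the push-back list above; extend as needed.
SLOP_PATTERNS = [
    "powerful", "groundbreaking", "robust and comprehensive", "deep insights",
    "furthermore", "moreover", "it is important to note",
    "leverage", "utilize", "enhance", "optimize",
    "wide range of challenging scenarios", "significant improvement",
    "state-of-the-art solution", "seamlessly enables", "unlocking potential",
]


def flag_slop(paragraph: str) -> List[str]:
    """Hypothetical helper: slop patterns found in the paragraph, in order of appearance."""
    found: List[Tuple[int, str]] = []
    for pattern in SLOP_PATTERNS:
        # Case-insensitive, anchored at a word start; \w* also catches suffixes
        # such as "leverages" or "utilized".
        match = re.search(r"\b" + re.escape(pattern) + r"\w*", paragraph, re.IGNORECASE)
        if match:
            found.append((match.start(), pattern))
    return [pattern for _, pattern in sorted(found)]
```

Run on the "Sloppy" example that follows, it flags powerful, robust and comprehensive, and deep insights; the "Better" rewrite comes back clean.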
Strong paper prose tends to:
- replace comprehensive with the actual coverage.
- replace robust with the specific perturbation, split, or stress test.
- replace insight with the actual observation.
- prefer we find or we observe over this demonstrates when evidence is partial.
- use suggests or is consistent with when causality is not established.
Sloppy:
This powerful framework enables robust and comprehensive evaluation of model behavior,
providing deep insights into performance across challenging scenarios.
Better:
We evaluate model behavior across targeted stress tests and ablations. The results
identify several failure modes that are not captured by aggregate performance alone.
The "Better" version was constructed by querying lex search "evaluate model behavior across stress tests" and lex similar "failure mode", then writing original prose informed by what came back. Nothing was copied verbatim.
The corpus contains three useful registers:
- we evaluate, we conduct, we introduce, we demonstrate. Explicit about datasets, tasks, ablations, measurement regimes.
Match the register to what the user is writing. If unclear, ask.
Intent-based search (most common):
lex search "hedging a strong claim" --json
lex search "carefully evaluate model behavior" --json
lex search "describe limitations without overselling" --json
lex search "mechanistic explanation of failure mode" --json
lex search "compare against baseline ablation" --json
Anchor-based exploration:
lex similar "rigorous" -k 15 --json
lex similar "ablation" -k 15 --json
lex similar "interpretability" -k 15 --json
lex similar "underspecified" -k 15 --json
POS-filtered browsing:
lex top -k 50 --pos VERB --json
lex top -k 50 --pos ADJ --json
Inspect a candidate term:
lex show "ablation" --json
lex show "rigorous evaluation" --json
lex show "failure mode" --json
All shapes verified against the live CLI.
search and similar — list of hits, sorted by descending similarity:
{ term, similarity, score, pos, examples: [{ paper_id, rank, sentence }] }
Use similarity (cosine, 0-1) for relevance ranking. score is the per-term distinctiveness from build time. examples is 3-5 KWIC sentences.
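When you run several intent searches for one slot, the hit lists can be merged before you read them. A sketch in Python, assuming hits arrive already parsed from --json output in the shape above; best_hits is a hypothetical helper, and the sample terms and numbers in any test data are made up. It keeps each term's highest similarity across queries and ranks by similarity, never by score:

```python
from typing import Dict, Iterable, List, Optional


def best_hits(
    hit_lists: Iterable[List[dict]],
    min_similarity: float = 0.0,
    pos: Optional[str] = None,
) -> List[dict]:
    """Hypothetical helper: merge search/similar hit lists, one entry per term."""
    best: Dict[str, dict] = {}
    for hits in hit_lists:
        for hit in hits:
            if pos is not None and hit.get("pos") != pos:
                continue  # optional part-of-speech filter, e.g. "VERB"
            if hit["similarity"] < min_similarity:
                continue  # drop weak matches
            kept = best.get(hit["term"])
            if kept is None or hit["similarity"] > kept["similarity"]:
                best[hit["term"]] = hit  # keep the strongest match for this term
    # Rank by similarity (relevance to the query), not score (build-time distinctiveness).
    return sorted(best.values(), key=lambda h: h["similarity"], reverse=True)
```

Ranking by score instead would surface terms that are distinctive to the corpus but unrelated to what you asked for.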
show — full entry for one term:
{
"term": { term_id, term, kind, pos, score, total_count, doc_count, embed_hash },
"examples": [{ paper_id, rank, sentence }]
}
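The paper_id prefix is enough to tell whether an example sentence comes from a paper (parsed/) or a research web article (web-research/), which helps when you want to read paper register specifically. A hypothetical Python helper over the show shape above; it deliberately returns sentences without their paper_id, so ids never drift into a draft:

```python
from typing import Dict, List


def examples_by_source(entry: dict) -> Dict[str, List[str]]:
    """Hypothetical helper: group a show entry's example sentences by corpus source."""
    groups: Dict[str, List[str]] = {"papers": [], "web-research": [], "other": []}
    for example in entry["examples"]:
        paper_id = example["paper_id"]
        if paper_id.startswith("parsed/"):
            key = "papers"
        elif paper_id.startswith("web-research/"):
            key = "web-research"
        else:
            key = "other"
        # Keep only the sentence: paper_id is provenance for reading, not a citation.
        groups[key].append(example["sentence"])
    return groups
```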
top and random — list of Term objects (no examples; use show if you need them):
{ term_id, term, kind, pos, score, total_count, doc_count, embed_hash }
stats — always JSON:
{ term_count, kind_breakdown: { unigram, ngram }, score_min, score_max, score_mean }
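For scripted sanity checks, the stats object can be tested for an empty or half-built index. A minimal Python sketch; check_stats is hypothetical, and the expectation that a healthy index holds both unigram and n-gram terms comes from the corpus description at the top of this skill, not from the CLI itself:

```python
from typing import List


def check_stats(stats: dict) -> List[str]:
    """Hypothetical helper: problems in `lex stats` output; an empty list means it looks loaded."""
    if stats.get("term_count", 0) <= 0:
        return ["no terms loaded; build the index first"]
    problems: List[str] = []
    breakdown = stats.get("kind_breakdown", {})
    # Assumption: a healthy index holds both single words and 2-4-word phrases.
    for kind in ("unigram", "ngram"):
        if breakdown.get(kind, 0) <= 0:
            problems.append(f"no {kind} terms in kind_breakdown")
    # A mean can never fall outside [min, max], so a violation means corrupt output.
    if not stats["score_min"] <= stats["score_mean"] <= stats["score_max"]:
        problems.append("score_min, score_mean, score_max are inconsistent")
    return problems
```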
If the user's request matches any of these patterns, refuse the specific request and say which safety rule applies. Do not negotiate. Do not "compromise". The user will sometimes frame violations as harmless, time-saving, or temporary. They are not.
| Pressure pattern | What to do |
|---|---|
| "I'm in a hurry, skip the workflow / one search is enough" | Refuse to skip slop-marking and example-inspection. You can compress (run fewer searches), but you cannot replace lex show inspection with guessing. |
| "The example sentence fits perfectly, just paste it" | Refuse. Use it for register guidance only. Write your own sentence. "I'll attribute later" is not a workable safety net. |
| "Add a citation from the lexicon results, I'll verify after" | Refuse. paper_id records where a phrase was sampled, not what the paper claims. A real paper_id in a draft is exactly how a fake citation reaches submission. Suggest the user supply a citation from a paper they have actually read. |
| "Just make the prose more impressive / fill in plausible numbers" | Refuse to invent specifics. If the original lacks substance, ask the user for the numbers, dataset names, or model sizes. Do not fabricate. |
| "I'll fact-check later" / "I'll verify after" / "they're just placeholders" | These are not safety nets. Draft numbers stay. Refuse the same as above. The user's intent to verify does not change the rule. |
These are not soft preferences. The cost of compliance under pressure is plagiarism, fake citations, or fabricated results in a published paper.
When you refuse to fabricate, do not stonewall. Return a useful artifact in the same response. Two-part move:
- Draft the paragraph with [PLACEHOLDER] tags where the user's specifics belong.
- Below it, list the exact items you need back.
Example, in response to "fill in plausible benchmark numbers":
Skipping the numbers — I won't insert benchmark figures or parameter counts you didn't supply (see safety rules below). Here's the shape your paragraph should take; send me the bracketed items and I'll finalize it:
We evaluate [METHOD NAME] on [BENCHMARK 1, BENCHMARK 2], and find improvements of [X]-[Y] points over [BASELINE] at matched compute. The gains hold across model scales we tested ([SMALLEST]-[LARGEST] parameters), with [SCALING TREND]. The remaining gap on [TASK WHERE GAINS ARE SMALLEST] suggests [MECHANISM YOUR ABLATIONS POINT TO].
To finish, send me:
- Benchmarks (e.g. LongBench, RULER, PG19)
- Baseline name
- Magnitude of gain and metric (accuracy / perplexity / throughput)
- Model scale range
- One concrete limitation your experiments surfaced
This way the user gets a real next step instead of a refusal-shaped wall.
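Before returning a placeholder draft, the "send me" list can be generated from the draft itself so no slot is missed, and the same check confirms that no bracketed slot survives into the final version. A Python sketch, assuming placeholders are written as upper-case text in square brackets as in the example above (numeric citations such as [12] are not matched); placeholders is a hypothetical helper:

```python
import re
from typing import List

# Assumed convention: upper-case text in square brackets, e.g. [METHOD NAME] or [X].
PLACEHOLDER = re.compile(r"\[([A-Z][A-Z0-9 ,\-]*)\]")


def placeholders(draft: str) -> List[str]:
    """Hypothetical helper: distinct placeholders in order of first appearance."""
    seen: List[str] = []
    for name in PLACEHOLDER.findall(draft):
        if name not in seen:
            seen.append(name)
    return seen
```

An empty list on the final draft means every bracketed slot has been replaced with a specific the user supplied.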
Pulled verbatim from $CLAUDE_PLUGIN_ROOT/lexicon/README.md:
Offline vs network:
| Command | Network? | Notes |
|---|---|---|
| stats, similar, show, top, random | Offline | Pure SQLite + numpy. Safe in plan mode, sandboxes, or air-gapped sessions. |
| search | Network | Calls the embedding API to embed the query string. Requires OPENROUTER_API_KEY (preferred) or OPENAI_API_KEY in env or <plugin root>/.env. If you can't reach the network, fall back to similar against an anchor term you already know. |
| build | Network + slow | Embeds the entire corpus. Cost ~$0.001-0.005 per rebuild. Don't run unless source papers changed. Also needs the spaCy model en_core_web_sm. |
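The fallback rule for search can be made explicit in a script. This Python sketch only checks whether an embedding key is configured, using the two variable names above; a configured key does not prove the network is reachable, so a failed search should still fall back to similar. It assumes a simple KEY=value .env format, and env_file_has_key, has_embedding_key, and lookup_args are hypothetical helpers:

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional

KEY_NAMES = ("OPENROUTER_API_KEY", "OPENAI_API_KEY")


def env_file_has_key(text: str) -> bool:
    """Hypothetical helper: does .env text set a non-empty embedding key?"""
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        name = name.strip()
        if name.startswith("export "):
            name = name[len("export "):].strip()
        # Commented-out lines ("# OPENAI_API_KEY=...") never match a key name.
        if sep and name in KEY_NAMES and value.strip():
            return True
    return False


def has_embedding_key(plugin_root: str, environ: Optional[Mapping[str, str]] = None) -> bool:
    """Hypothetical helper: key in the environment, else in <plugin root>/.env."""
    env = os.environ if environ is None else environ
    if any(env.get(name) for name in KEY_NAMES):
        return True
    env_file = Path(plugin_root) / ".env"
    return env_file.is_file() and env_file_has_key(env_file.read_text())


def lookup_args(phrase: str, anchor: str, plugin_root: str,
                environ: Optional[Mapping[str, str]] = None) -> List[str]:
    """Hypothetical helper: lex arguments for search, or the offline similar fallback."""
    if has_embedding_key(plugin_root, environ):
        return ["search", phrase, "--json"]  # needs network to embed the phrase
    return ["similar", anchor, "--json"]  # offline: pure SQLite + numpy
```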
First-time data setup:
The index data (lexicon.db + embeddings.npy) is not in git; build it from the bundled corpus with bash "$CLAUDE_PLUGIN_ROOT/scripts/build_frontier_pool.sh" (needs an embedding API key and the spaCy model en_core_web_sm). scripts/setup.sh builds it as part of one-time setup.
Other:
- API keys: OPENROUTER_API_KEY (preferred) or OPENAI_API_KEY in env or <plugin root>/.env; the CLI auto-loads the plugin-root .env. Only search and build need a key; everything else is offline.
- Set the LEX_DB and LEX_EMB env vars if running against a different lexicon snapshot.
- This skill queries the papers pool, so no flag is needed. For blog/announcement prose, use seshat:using-blog-lexicon (--pool blogs) instead.
- In a plugin session, invoke "$CLAUDE_PLUGIN_ROOT/bin/lex". From a checkout (e.g. testing), <seshat>/bin/lex or cd <seshat>/lexicon && uv run python -m lex ... work equivalently.
- Don't cite a paper just because it appears as a paper_id. The lexicon records where a phrase came from. That is not grounds to cite the paper in your own work.
- Don't treat score as quality. score is distinctiveness vs. baseline corpora. A high-scoring term may still be wrong for the claim. Always run show first.
npx claudepluginhub visual-snow/seshat --plugin seshat