From llm-guidelines
Apply the community guidelines for empirical software engineering studies involving LLMs to a paper draft and its supplementary material. Invoke when the user is preparing or reviewing a manuscript whose study falls into one of the seven types defined in `study-types/` (LLMs as annotators, judges, or subjects; LLMs for synthesis; studying LLM usage in software engineering; LLMs for new software engineering tools; benchmarking LLMs for software engineering tasks) and wants per-guideline feedback on what to report.
Slash command: `/llm-guidelines:llm-guidelines`
This skill helps a *paper author* (and, secondarily, a *reviewer*) check a draft manuscript and its supplementary material against the eight community guidelines for reporting empirical SE studies involving LLMs. The guidelines themselves live in `guidelines/`, indexed below; the scope, study-type taxonomy, and reporting checklist live alongside.
- checklist.md
- guidelines/declare-llm-usage-and-role.md
- guidelines/report-limitations-and-mitigations.md
- guidelines/report-model-version-configuration-and-customizations.md
- guidelines/report-session-traces.md
- guidelines/report-system-and-prompt-design.md
- guidelines/use-an-open-llm-as-a-baseline.md
- guidelines/use-human-validation-for-llm-outputs.md
- guidelines/use-suitable-baselines-benchmarks-and-metrics.md
- scope.md
- study-types/advantages-and-challenges.md
- study-types/benchmarking-llms-for-software-engineering-tasks.md
- study-types/llms-as-annotators.md
- study-types/llms-as-judges.md
- study-types/llms-as-subjects.md
- study-types/llms-as-tools-for-software-engineering-researchers.md
- study-types/llms-as-tools-for-software-engineers.md
- study-types/llms-for-new-software-engineering-tools.md
- study-types/llms-for-synthesis.md
- study-types/studying-llm-usage-in-software-engineering.md
Audience and tone. The default user is an author improving their own draft. Findings should be framed as suggestions for clearer reporting, not as violations. If a reviewer invokes this skill, treat the output as a starting point for review comments, not as a rejection rubric. The guidelines are community recommendations; incompleteness against them is not grounds for rejection.
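Invoke this skill when the user is preparing or reviewing a manuscript whose study falls into one of the types defined in study-types/ and wants per-guideline feedback on what to report.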
Do not invoke when the user is doing unrelated software engineering work that happens to mention LLMs.
The user typically provides one or more of:
- A LaTeX paper (.tex): read it directly and follow \input{}/\include{} for files inside the project tree. A flattened .tex works the same way.
- A PDF paper (.pdf): extract text using whatever tool is available in the environment (e.g., pdftotext, mutool draw -F txt, or a Python library). If no extractor is available, tell the user which one to install rather than failing silently.
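The two input paths above can be sketched in Python as follows. This is a minimal illustration rather than the skill's actual implementation: `flatten_tex`, `pick_pdf_extractor`, and the preference order in `PDF_TOOLS` are hypothetical, and the sketch assumes that include paths are relative to the project root (where LaTeX is normally run). It does not handle commented-out `\input` lines.

```python
import re
import shutil
from pathlib import Path

# Matches \input{name} and \include{name}.
INCLUDE_RE = re.compile(r"\\(?:input|include)\{([^}]+)\}")

# Command-line extractors named in the text; the order is an assumption.
PDF_TOOLS = ("pdftotext", "mutool")


def flatten_tex(entry, root, _seen=None):
    """Return the text of `entry` with in-tree \\input/\\include targets inlined."""
    root = Path(root).resolve()
    entry = Path(entry).resolve()
    seen = set() if _seen is None else _seen
    if entry in seen:  # guard against circular includes
        return ""
    seen.add(entry)
    text = entry.read_text(encoding="utf-8", errors="replace")

    def expand(match):
        # Assumption: include paths are relative to the project root.
        target = root / match.group(1).strip()
        if target.suffix != ".tex":
            target = target.with_name(target.name + ".tex")
        target = target.resolve()
        if not target.is_relative_to(root) or not target.is_file():
            return match.group(0)  # outside the tree or missing: leave as-is
        return flatten_tex(target, root, seen)

    return INCLUDE_RE.sub(expand, text)


def pick_pdf_extractor(which=shutil.which):
    """Return the first available extractor, or raise with an install hint."""
    for tool in PDF_TOOLS:
        if which(tool):
            return tool
    raise RuntimeError(
        "No PDF text extractor found; please install one of: " + ", ".join(PDF_TOOLS)
    )
```

Raising an error that names the candidate tools is one way to avoid failing silently; `Path.is_relative_to` requires Python 3.9 or later.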
When both are available, prefer the LaTeX source: it lets you spot LaTeX-specific artifacts (e.g., commented-out disclosures, \todo{} notes) that get lost in the PDF.

If no path is supplied, ask the user for the paper source path and (optionally) supplementary paths before proceeding.
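To illustrate why the LaTeX source is preferred, the sketch below scans a source string for the two artifact kinds mentioned above: commented-out text and `\todo{}` notes. The function name `find_latex_artifacts` and both regular expressions are hypothetical simplifications; for example, a `%` preceded by an escaped backslash (`\\%`) is not treated as a comment here.

```python
import re

# A "%" that is not escaped as "\%" starts a LaTeX comment.
COMMENT_RE = re.compile(r"(?<!\\)%(.*)$")
# \todo{...} with an optional [options] argument, as in the todonotes package.
TODO_RE = re.compile(r"\\todo(?:\[[^\]]*\])?\{([^}]*)\}")


def find_latex_artifacts(tex_source):
    """Return (line_number, kind, text) tuples for comments and \\todo{} notes."""
    hits = []
    for number, line in enumerate(tex_source.splitlines(), start=1):
        comment = COMMENT_RE.search(line)
        if comment and comment.group(1).strip():
            hits.append((number, "comment", comment.group(1).strip()))
        # Only look for \todo in the part of the line that is not commented out.
        visible = line[: comment.start()] if comment else line
        for todo in TODO_RE.finditer(visible):
            hits.append((number, "todo", todo.group(1).strip()))
    return hits
```

Each hit is a candidate for the Evidence or Gaps fields of a finding; whether a commented-out sentence is really a dropped disclosure still needs a human or model judgment.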
1. Resolve inputs. For a LaTeX paper, read the entry-point file and follow \input{}/\include{} for files inside the project tree. For a PDF paper, extract text with a tool available in the environment (pdftotext, mutool, pdfminer.six, etc.). For supplementary directories, start with the top-level README* or INDEX* and the directory layout to orient yourself, then read whatever the layout points at as relevant: prompts, traces, datasets, code, replication scripts. Drill deeper only where the structure suggests evidence for a guideline; do not try to read every file in a large package.
2. Identify the study type(s). Use study-types/ to classify the study. A single paper can fall under multiple types (e.g., a new tool that also benchmarks LLMs). Note the classification at the top of the report.
3. Consult scope.md to confirm the work is in scope (LLM use that materially affects the research method or its outcomes, not just LLM-assisted writing).
4. Per-guideline assessment. For each of the eight guidelines listed below, load the corresponding file in guidelines/ on demand, then for that guideline produce:
   - Status: one of covered, partial, not found, or not applicable (with a one-line reason if N/A).
   - Evidence: 1 to 3 short quotes or file pointers from the paper or supplementary material that support the status.
   - Gaps: bullet list of specific missing items (must/should-level), each phrased as an author-facing suggestion (e.g., "Consider naming the exact model version and access date in the methodology.").
   - Pointers: links to the relevant section(s) of the consulted guideline file.

   Apply the RFC 2119 levels from the guideline text: must items become "required for full reporting"; should items become "recommended". Do not invent severity levels not present in the guideline.
5. Cross-cutting concerns. After the per-guideline pass, scan checklist.md for any item that did not surface during step 4 and add it to a Checklist gaps section if missing.
6. Write the report. Save the assessment as llm-guidelines-report.md in the user's current working directory and print the same content to the console. Use the report template below.
7. Stop after the report. Do not modify the user's paper or supplementary material. If the user asks for follow-up edits, treat that as a new request.
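The per-guideline fields and the RFC 2119 wording from the assessment step, together with the checklist cross-check, can be modelled as a small data structure. The sketch below is one possible shape, not part of the skill itself: the class names `Gap` and `Finding`, the helper `checklist_gaps`, and the validation rules are hypothetical and only encode what the steps above state.

```python
from dataclasses import dataclass, field
from typing import Optional

STATUSES = ("covered", "partial", "not found", "not applicable")

# RFC 2119 keyword -> wording used in the report, as given in the steps above.
LEVEL_WORDING = {"must": "required for full reporting", "should": "recommended"}


@dataclass
class Gap:
    suggestion: str               # author-facing, e.g. "Consider naming ..."
    level: Optional[str] = None   # "must", "should", or None if no level is given

    def __post_init__(self):
        # Do not invent severity levels that the guideline does not use.
        if self.level is not None and self.level not in LEVEL_WORDING:
            raise ValueError(f"unknown level: {self.level!r}")

    def render(self):
        wording = LEVEL_WORDING.get(self.level)
        return f"{self.suggestion} ({wording})" if wording else self.suggestion


@dataclass
class Finding:
    guideline: str
    status: str
    evidence: list = field(default_factory=list)
    gaps: list = field(default_factory=list)
    reason: str = ""              # one-line reason, needed for "not applicable"

    def __post_init__(self):
        if self.status not in STATUSES:
            raise ValueError(f"unknown status: {self.status!r}")
        if self.status == "not applicable" and not self.reason:
            raise ValueError("'not applicable' needs a one-line reason")
        if len(self.evidence) > 3:
            raise ValueError("keep evidence to 1 to 3 quotes or pointers")


def checklist_gaps(checklist_items, surfaced_items):
    """Checklist items that never came up in the per-guideline pass, in order."""
    surfaced = set(surfaced_items)
    return [item for item in checklist_items if item not in surfaced]
```

Rejecting unknown statuses and levels at construction time keeps the report limited to the four statuses and two severity wordings the workflow allows.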
The eight guidelines are stored as standalone files under guidelines/. Load on demand (do not load all eight unless needed).
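Loading on demand can be sketched with a cached reader, so each guideline file is read at most once and only when first needed. The file names are taken from the file listing of this skill; the function `load_guideline` and its `base` parameter are hypothetical.

```python
from functools import lru_cache
from pathlib import Path

# The eight guideline files shipped under guidelines/.
GUIDELINE_FILES = (
    "declare-llm-usage-and-role.md",
    "report-limitations-and-mitigations.md",
    "report-model-version-configuration-and-customizations.md",
    "report-session-traces.md",
    "report-system-and-prompt-design.md",
    "use-an-open-llm-as-a-baseline.md",
    "use-human-validation-for-llm-outputs.md",
    "use-suitable-baselines-benchmarks-and-metrics.md",
)


@lru_cache(maxsize=None)
def load_guideline(name, base="guidelines"):
    """Read one guideline file the first time it is requested, then reuse it."""
    if name not in GUIDELINE_FILES:
        raise KeyError(f"not a known guideline file: {name}")
    return (Path(base) / name).read_text(encoding="utf-8")
```

A guideline judged not applicable from the paper's study type alone never has to be loaded at all.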
The study-type descriptions are stored under study-types/. Load only the files relevant to the study being assessed.
- scope.md: what is in and out of scope for the guidelines.
- checklist.md: the consolidated reporting checklist organized by paper section, with severity markers.

# LLM Guidelines Assessment
**Paper:** <path or title>
**Supplementary material:** <paths>
**Identified study type(s):** <one or more from study-types/>
**Skill version:** <VERSION>
> This report applies the community LLM reporting guidelines from
> https://llm-guidelines.org as a self-check for authors. It is not a
> rejection rubric; missing items are reporting gaps to consider, not
> grounds for rejection.
## Summary
<Two-to-four-sentence overall summary: what is reported well, what is missing.>
## Per-guideline findings
### Declare LLM Usage and Role
- Status: <covered | partial | not found | not applicable>
- Evidence: <quotes or pointers>
- Gaps: <author-facing bullets>
- Pointers: <links to guidelines/declare-llm-usage-and-role.md>
<... repeat for each of the remaining guidelines ...>
## Checklist gaps
<Items from checklist.md that were not surfaced above.>
## Notes for reviewers
<If invoked by a reviewer, surface 3 to 5 clarifying questions worth asking the
authors. Otherwise omit this section.>
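As a rough illustration of how the template could be filled and saved, the sketch below renders one per-guideline block and writes the report to llm-guidelines-report.md while also printing it. The helper names `render_finding` and `write_report` and the "; " separator between items are assumptions made for this example only.

```python
from pathlib import Path

REPORT_NAME = "llm-guidelines-report.md"


def render_finding(title, status, evidence, gaps, pointers):
    """Render one '### <guideline>' block in the shape of the template."""
    return "\n".join(
        [
            f"### {title}",
            f"- Status: {status}",
            f"- Evidence: {'; '.join(evidence)}",
            f"- Gaps: {'; '.join(gaps)}",
            f"- Pointers: {', '.join(pointers)}",
        ]
    )


def write_report(body, directory="."):
    """Save the report in `directory` and print the same content to the console."""
    path = Path(directory) / REPORT_NAME
    path.write_text(body, encoding="utf-8")
    print(body)
    return path
```

Writing and printing the same string keeps the console output and the saved file identical.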
must or should in the underlying guideline. If the guideline does not specify a level, use neutral language ("Consider...").
npx claudepluginhub se-uhd/llm-guidelines-skill --plugin llm-guidelines