Skill

llm-guidelines

Apply the community guidelines for empirical software engineering studies involving LLMs to a paper draft and its supplementary material. Invoke when the user is preparing or reviewing a manuscript whose study falls into one of the seven types defined in `study-types/` (LLMs as annotators, judges, synthesis, or subjects; studying LLM usage in software engineering; LLMs for new software engineering tools; benchmarking LLMs for software engineering tasks), and wants per-guideline feedback on what to report.



Supporting Files

- checklist.md
- guidelines/declare-llm-usage-and-role.md
- guidelines/report-limitations-and-mitigations.md
- guidelines/report-model-version-configuration-and-customizations.md
- guidelines/report-session-traces.md
- guidelines/report-system-and-prompt-design.md
- guidelines/use-an-open-llm-as-a-baseline.md
- guidelines/use-human-validation-for-llm-outputs.md
- guidelines/use-suitable-baselines-benchmarks-and-metrics.md
- scope.md
- study-types/advantages-and-challenges.md
- study-types/benchmarking-llms-for-software-engineering-tasks.md
- study-types/llms-as-annotators.md
- study-types/llms-as-judges.md
- study-types/llms-as-subjects.md
- study-types/llms-as-tools-for-software-engineering-researchers.md
- study-types/llms-as-tools-for-software-engineers.md
- study-types/llms-for-new-software-engineering-tools.md
- study-types/llms-for-synthesis.md
- study-types/studying-llm-usage-in-software-engineering.md


Last commit: May 6, 2026


LLM Guidelines for Empirical SE Studies

This skill helps a paper author (and, secondarily, a reviewer) check a draft manuscript and its supplementary material against the eight community guidelines for reporting empirical SE studies involving LLMs. The guidelines themselves live in guidelines/, indexed below; the scope, study-type taxonomy, and reporting checklist live alongside.

Audience and tone. The default user is an author improving their own draft. Findings should be framed as suggestions for clearer reporting, not as violations. If a reviewer invokes this skill, treat the output as a starting point for review comments, not as a rejection rubric. The guidelines are community recommendations; incompleteness against them is not grounds for rejection.

When to use

Invoke this skill when the user:

- Asks to check a paper (or paper plus supplementary material) against the LLM reporting guidelines.
- Is drafting the methodology, results, or limitations of an empirical study involving an LLM and wants reporting suggestions.
- Is preparing a replication package and wants to know what to include for a study involving an LLM.
- Is reviewing such a paper and asks for clarifying questions to consider.

Do not invoke when the user is doing unrelated software engineering work that happens to mention LLMs.

Inputs the user will provide

The user typically provides one or more of:

- A path to the paper. Both forms are supported:
  - A LaTeX source file (.tex): read it directly and follow \input{}/\include{} for files inside the project tree. A flattened .tex works the same way.
  - A PDF (.pdf): extract text using whatever tool is available in the environment (e.g., pdftotext, mutool draw -F txt, or a Python library). If no extractor is available, tell the user which one to install rather than failing silently. When both are available, prefer the LaTeX source: it lets you spot LaTeX-specific artifacts (e.g., commented-out disclosures, \todo{} notes) that get lost in the PDF.
- One or more pointers to supplementary material with prompts, traces, datasets, code, or replication packages. Each can be a local directory, a local repository, or a public URL (e.g., a GitHub repo or a Zenodo record); clone or fetch URLs as the user prefers.
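Following \input{}/\include{} inside the project tree can be sketched as below. This is a minimal illustration, not part of the skill itself: the function name `collect_tex_sources` is hypothetical, include paths are assumed to be relative to the entry file's directory, and comments are deliberately not stripped, so commented-out text stays visible to the reader of the files.

```python
import re
from pathlib import Path
from typing import List, Set

# Matches \input{file} and \include{file}; a missing .tex suffix is added below.
INCLUDE_RE = re.compile(r"\\(?:input|include)\{([^}]+)\}")


def collect_tex_sources(entry: Path) -> List[Path]:
    """Return the entry file plus every included file, in first-encounter order.

    Hypothetical helper: paths are resolved relative to the entry file's
    directory, and anything outside that directory tree is ignored.
    """
    root = entry.parent.resolve()
    seen: Set[Path] = set()
    ordered: List[Path] = []

    def visit(path: Path) -> None:
        path = path.resolve()
        # Skip repeats, missing files, and files outside the project tree.
        if path in seen or not path.is_file() or not path.is_relative_to(root):
            return
        seen.add(path)
        ordered.append(path)
        text = path.read_text(encoding="utf-8", errors="replace")
        for name in INCLUDE_RE.findall(text):
            target = root / name.strip()
            if target.suffix != ".tex":
                target = target.with_name(target.name + ".tex")
            visit(target)

    visit(entry)
    return ordered
```

Calling `collect_tex_sources(Path("paper/main.tex"))` (a made-up path) would return `main.tex` followed by each included file once, which gives a reading order for the paper without touching files elsewhere on disk.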

If no path is supplied, ask the user for the paper source path and (optionally) supplementary paths before proceeding.
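For the PDF form, one way to choose an extractor and to avoid failing silently is sketched below. The names `EXTRACTORS`, `pick_extractor`, and `install_hint` are hypothetical, and the preference order (pdftotext before mutool) is an assumption rather than something the skill prescribes.

```python
import shutil
from typing import Callable, List, Optional

# Candidate command lines in an assumed order of preference. "{pdf}" marks
# where the input path goes; the trailing "-" tells pdftotext to write the
# text to standard output.
EXTRACTORS = {
    "pdftotext": ["pdftotext", "{pdf}", "-"],
    "mutool": ["mutool", "draw", "-F", "txt", "{pdf}"],
}


def pick_extractor(
    pdf_path: str,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[List[str]]:
    """Return the command for the first extractor found on PATH, else None."""
    for tool, template in EXTRACTORS.items():
        if which(tool):
            return [pdf_path if part == "{pdf}" else part for part in template]
    return None


def install_hint() -> str:
    """Message to show the user when no extractor is available."""
    return "No PDF text extractor found. Install one of: " + ", ".join(EXTRACTORS)
```

The returned list can be passed to `subprocess.run(..., capture_output=True, text=True)`; when `pick_extractor` returns `None`, showing `install_hint()` to the user matches the instruction to name a tool to install.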

Workflow

1. Resolve inputs. For a LaTeX paper, read the entry-point file and follow \input{}/\include{} for files inside the project tree. For a PDF paper, extract text with a tool available in the environment (pdftotext, mutool, pdfminer.six, etc.). For supplementary directories, start with the top-level README* or INDEX* and the directory layout to orient yourself, then read whatever the layout points at as relevant: prompts, traces, datasets, code, replication scripts. Drill deeper only where the structure suggests evidence for a guideline; do not try to read every file in a large package.
2. Identify the study type(s). Use study-types/ to classify the study. A single paper can fall under multiple types (e.g., a new tool that also benchmarks LLMs). Note the classification at the top of the report.
3. Consult scope.md to confirm the work is in scope (LLM use that materially affects the research method or its outcomes, not just LLM-assisted writing).
4. Per-guideline assessment. For each of the eight guidelines listed below, load the corresponding file in guidelines/ on demand, then for that guideline produce:
   - Status: one of covered, partial, not found, or not applicable (with a one-line reason if N/A).
   - Evidence: 1 to 3 short quotes or file pointers from the paper or supplementary material that support the status.
   - Gaps: bullet list of specific missing items (must/should-level), each phrased as an author-facing suggestion (e.g., "Consider naming the exact model version and access date in the methodology.").
   - Pointers: links to the relevant section(s) of the consulted guideline file.

   Apply the RFC 2119 levels from the guideline text: must items become "required for full reporting"; should items become "recommended". Do not invent severity levels not present in the guideline.
5. Cross-cutting concerns. After the per-guideline pass, scan checklist.md for any item that did not surface during step 4, and add it to the Checklist gaps section if the paper does not address it.
6. Write the report. Save the assessment as llm-guidelines-report.md in the user's current working directory and print the same content to the console. Use the report template below.
7. Stop after the report. Do not modify the user's paper or supplementary material. If the user asks for follow-up edits, treat that as a new request.
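The per-guideline record produced in step 4 can be modelled roughly as follows. This is a sketch under stated assumptions: the `Finding` class and its field names are hypothetical, and requiring evidence only for the covered and partial statuses is one reading of the "1 to 3" rule, not something the workflow spells out.

```python
from dataclasses import dataclass, field
from typing import List

# The four status values named in step 4.
STATUSES = ("covered", "partial", "not found", "not applicable")


@dataclass
class Finding:
    """One per-guideline entry (hypothetical structure, for illustration)."""

    guideline: str
    status: str
    evidence: List[str] = field(default_factory=list)
    gaps: List[str] = field(default_factory=list)
    pointers: List[str] = field(default_factory=list)
    na_reason: str = ""

    def problems(self) -> List[str]:
        """Return a list of ways this entry departs from the step-4 shape."""
        issues = []
        if self.status not in STATUSES:
            issues.append(f"unknown status: {self.status!r}")
        if self.status == "not applicable" and not self.na_reason:
            issues.append("not applicable needs a one-line reason")
        # Assumption: only positive statuses need quotes or file pointers.
        if self.status in ("covered", "partial") and not 1 <= len(self.evidence) <= 3:
            issues.append("expected 1 to 3 evidence items")
        return issues
```

An empty list from `problems()` means the entry has the expected shape; it says nothing about whether the assessment itself is right.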

Guidelines index

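The eight guidelines are stored as standalone files under guidelines/. Load on demand (do not load all eight unless needed).

- guidelines/declare-llm-usage-and-role.md
- guidelines/report-model-version-configuration-and-customizations.md
- guidelines/report-system-and-prompt-design.md
- guidelines/report-session-traces.md
- guidelines/use-human-validation-for-llm-outputs.md
- guidelines/use-an-open-llm-as-a-baseline.md
- guidelines/use-suitable-baselines-benchmarks-and-metrics.md
- guidelines/report-limitations-and-mitigations.md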

Study-types index

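Stored under study-types/. Load only the files relevant to the study being assessed.

- study-types/advantages-and-challenges.md
- study-types/llms-as-tools-for-software-engineering-researchers.md
- study-types/llms-as-annotators.md
- study-types/llms-as-judges.md
- study-types/llms-for-synthesis.md
- study-types/llms-as-subjects.md
- study-types/llms-as-tools-for-software-engineers.md
- study-types/studying-llm-usage-in-software-engineering.md
- study-types/llms-for-new-software-engineering-tools.md
- study-types/benchmarking-llms-for-software-engineering-tasks.md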

Other resources

- scope.md: what is in and out of scope for the guidelines.
- checklist.md: the consolidated reporting checklist organized by paper section, with severity markers.

Report template

# LLM Guidelines Assessment

**Paper:** <path or title>
**Supplementary material:** <paths>
**Identified study type(s):** <one or more from study-types/>
**Skill version:** <VERSION>

> This report applies the community LLM reporting guidelines from
> https://llm-guidelines.org as a self-check for authors. It is not a
> rejection rubric; missing items are reporting gaps to consider, not
> grounds for rejection.

## Summary

<Two-to-four-sentence overall summary: what is reported well, what is missing.>

## Per-guideline findings

### Declare LLM Usage and Role
- Status: <covered | partial | not found | not applicable>
- Evidence: <quotes or pointers>
- Gaps: <author-facing bullets>
- Pointers: <links to guidelines/declare-llm-usage-and-role.md>

<... repeat for each of the remaining guidelines ...>

## Checklist gaps

<Items from checklist.md that were not surfaced above.>

## Notes for reviewers

<If invoked by a reviewer, surface 3 to 5 clarifying questions worth asking the
authors. Otherwise omit this section.>
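One per-guideline entry of the template can be filled in mechanically. The sketch below is illustrative only: `render_finding` is a hypothetical helper, and rendering gaps as nested bullets and joining evidence with semicolons are assumed formatting choices that the template leaves open.

```python
from typing import Sequence


def render_finding(
    title: str,
    status: str,
    evidence: Sequence[str],
    gaps: Sequence[str],
    pointer: str,
) -> str:
    """Render one '### <guideline>' block in the shape of the report template."""
    lines = [f"### {title}", f"- Status: {status}"]
    # Evidence items are short quotes or file pointers; join them on one line.
    lines.append("- Evidence: " + ("; ".join(evidence) if evidence else "none found"))
    if gaps:
        lines.append("- Gaps:")
        lines.extend(f"  - {gap}" for gap in gaps)
    else:
        lines.append("- Gaps: none")
    lines.append(f"- Pointers: {pointer}")
    return "\n".join(lines)
```

Joining the rendered blocks with blank lines under the "Per-guideline findings" heading gives the body of llm-guidelines-report.md.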

Constraints

- Cite the guidelines; do not paraphrase severity. When you write "required" or "recommended", make sure it matches a must or should in the underlying guideline. If the guideline does not specify a level, use neutral language ("Consider...").
- Do not infer non-disclosure. A missing detail in the paper does not mean the authors did something wrong; it means the report could state the detail explicitly.
- Stick to the guidelines. Only flag items the eight guideline files cover. Do not add critiques of aspects the guidelines do not address (e.g., the paper's theoretical contribution, novelty argument, narrative structure, or general writing quality), even if you notice them.
- Do not modify user files without an explicit follow-up request from the user.
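The severity constraint can be kept honest with a small lookup, sketched here. `LEVEL_LABELS` and `tag_gap` are hypothetical names; the only levels accepted are the two the guidelines use, and a suggestion with no level is passed through in its neutral wording.

```python
from typing import Optional

# The only two severity labels the report may use, keyed by RFC 2119 level.
LEVEL_LABELS = {
    "must": "required for full reporting",
    "should": "recommended",
}


def tag_gap(suggestion: str, level: Optional[str] = None) -> str:
    """Append the severity label that matches the guideline's own level.

    With no level, the suggestion is returned unchanged so that it keeps
    its neutral wording ("Consider...").
    """
    if level is None:
        return suggestion
    key = level.lower()
    if key not in LEVEL_LABELS:
        # Refuse to invent severities the guideline text does not contain.
        raise ValueError(f"unsupported level: {level!r}")
    return f"{suggestion} ({LEVEL_LABELS[key]})"
```

For example, `tag_gap("Consider naming the exact model version.", "must")` yields the suggestion followed by "(required for full reporting)", while an unknown level such as "critical" raises an error instead of producing a new severity.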
