Runs automated field tests on all registered GNOME extensions with ego-lint, diffs against baselines, classifies findings as TP/FP/borderline, and generates regression reports. Useful for extension quality assurance.
How this skill is triggered — by the user, by Claude, or both
Slash command
/gnome-extension-reviewer:ego-field-testThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Run batch ego-lint across all field test extensions with baseline diffing, finding classification, and regression reporting.
Run batch ego-lint across all field test extensions with baseline diffing, finding classification, and regression reporting.
--review: Run ego-review on ALL extensions via headless claude -p (~$0.50-2.00/ext, ~5-10 min each)--review-changed: Run ego-review only on extensions with changed lint results--review-dry-run: Hydrate review prompts without invoking claude -p (for testing templates)--parallel N: Max concurrent claude -p sessions (default: 3)--update-baselines: Save current lint results as new baselines after reviewbash scripts/field-test-runner.sh --no-fetch
If extensions aren't cached yet:
bash scripts/field-test-runner.sh
This runs ego-lint on all extensions in field-tests/manifest.yaml, produces JSON results per extension, diffs against baselines, and appends to history.
To also run ego-review (headless Claude):
# Review all extensions (~$5-20 total, ~20-30 min with parallel 3)
bash scripts/field-test-runner.sh --no-fetch --review
# Review only changed extensions (cheaper, faster)
bash scripts/field-test-runner.sh --no-fetch --review-changed
# Test prompt hydration without invoking Claude
bash scripts/field-test-runner.sh --no-fetch --review-dry-run --extension hara-hachi-bu
# Control concurrency
bash scripts/field-test-runner.sh --no-fetch --review --parallel 5
Read the summary:
field-tests/results/<latest-timestamp>/summary.json
For each extension, read:
field-tests/results/<latest-timestamp>/<name>.lint.json — full resultsfield-tests/results/<latest-timestamp>/<name>.diff.json — diff from baseline (if baseline exists)For each extension with unannotated_findings in the diff:
field-tests/annotations/<name>.yaml with the classificationClassification guide:
--review or --review-changed)The runner handles this automatically when --review or --review-changed is passed:
scripts/review-prompt.md with lint JSON, diff, and annotationsclaude -p with the hydrated prompt, --plugin-dir, --add-dir, 10-minute timeout, $4 budget cap--parallel N (default 3) using background subshells + poll loopfield-tests/results/<timestamp>/<name>.review.mdok, timeout, error, skipped, excluded, dry-run, no-report, noneUse --review-dry-run to test prompt hydration without invoking Claude. Hydrated prompts are saved to <name>.review-prompt.md.
Produce field-tests/reports/<date>-regression.md with:
field-tests/history.jsonl — FP count on approved extensions over last N runs--review)Update the "Latest Lint Results" and "Annotation Coverage" tables in field-tests/README.md to reflect the current run:
history.jsonl entriesannotations/*.yaml and the diff outputKeep the "Extension Catalog" and "Code Metrics" sections unchanged unless a new extension was added to the manifest.
For new false positives on EGO-approved extensions that are confirmed FP (not borderline):
Create a GitHub issue:
false-positiveFalse positive: R-XXXX-NN on <extension>--update-baselines)bash scripts/field-test-runner.sh --update-baselines --no-fetch
field-tests/
├── manifest.yaml # Extension sources (committed)
├── cache/ # Downloaded extensions (gitignored)
├── baselines/ # Golden JSON snapshots (committed)
├── annotations/ # Finding classifications (committed)
├── results/ # Timestamped run output (gitignored)
├── history.jsonl # Trend data (committed)
└── reports/ # Regression reports (committed)
scripts/
├── field-test-runner.sh # Bash orchestrator
├── parse-manifest.py # Manifest YAML → JSON
├── parse-lint-results.py # ego-lint output → JSON
├── diff-baselines.py # Baseline comparison
├── review-prompt.md # Review prompt template ({{PLACEHOLDER}} tokens)
└── hydrate-review-prompt.py # Template hydrator (lint JSON + diff + annotations)
findings:
- id: "R-SEC-22::dconf CLI spawn"
classification: tp
notes: "dconf import/export — legitimate but needs disclosure"
- id: "init/shell-modification::constructor"
classification: fp
notes: "Constructor called from enable(). Fixed in PR #21"
/ego-field-test — see immediate impact across all extensionsfield-tests/README.md results and annotation tables/ego-field-test --update-baselines to snapshot improved stateGuides creation, editing, and verification of skills for AI coding agents using test-driven development with subagent scenarios. Use when authoring or debugging skills.
npx claudepluginhub zvibaratz/gnome-extension-reviewer