Preflight UX
Preflight UX is an open toolkit for pre-ship UX risk review. It runs structured
persona panels against the same product surface, normalizes findings into a
shared issue taxonomy, and keeps the raw evidence needed to score predictions
against known product outcomes.
The project is designed for teams that want faster early critique without
mistaking model output for user research. A panel finding is a product-risk
hypothesis until it is validated by benchmark scoring, user observation,
telemetry, support data, or another real-world signal.
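As a rough illustration of the "hypothesis until validated" rule, a finding can carry its own evidence and derive its status from it. This is a minimal sketch, not the repository schema: the `Finding` class, its field names, and the signal identifiers are all hypothetical and simply mirror the validation sources listed above.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical signal names mirroring the validation sources named above.
VALIDATION_SIGNALS = {
    "benchmark_scoring",
    "user_observation",
    "telemetry",
    "support_data",
    "other_real_world_signal",
}


@dataclass
class Finding:
    """A panel finding; field names are illustrative, not the repo schema."""

    persona: str
    issue_class: str
    summary: str
    signals: set[str] = field(default_factory=set)

    def attach_signal(self, signal: str) -> None:
        # Only accept signals from the known list so typos do not count as evidence.
        if signal not in VALIDATION_SIGNALS:
            raise ValueError(f"unknown validation signal: {signal}")
        self.signals.add(signal)

    @property
    def status(self) -> str:
        # A finding stays a hypothesis until at least one signal backs it.
        return "validated" if self.signals else "hypothesis"
```

Deriving `status` from the attached signals, rather than storing it, means a finding cannot be marked validated without naming the evidence behind it.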
What It Contains
- A persona library for expert and user-type review lenses.
- An issue taxonomy for common pre-ship UX risks.
- Benchmark scaffolds for products with documented post-launch UX issues.
- JSON schemas for review briefs, panel runs, findings, benchmark metadata, and
scores.
- A small Python CLI for validation, benchmark scaffolding, run scaffolding,
scoring, and report generation.
- A deployable BYOK web UI for building review briefs, attaching screenshots,
running a panel through a user-provided model key, and exporting repo-ready
artifacts.
Current Status
This repository is an early public scaffold. Its claims are intentionally
limited to what the current evidence supports:
- The schemas, taxonomy, prompts, CLI, web UI, and structural checks are usable.
- Seed benchmark entries are draft examples until their launch surfaces and
scorer reviews are complete.
- Personas are not benchmark-validated yet.
- Reports should be treated as decision support, not as validated user research.
The next credibility step is to promote benchmark entries from draft to ready,
run panels against them, and publish hits, misses, false positives, and
persona-by-issue-class calibration notes.
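The hit, miss, and false-positive bookkeeping can be sketched as set arithmetic over issue classes. This is a simplified illustration that assumes exact matching on issue-class labels; the function names and the example labels are hypothetical, and the actual scorer may rely on reviewer judgment or a richer matching rule.

```python
from collections import Counter


def score_predictions(predicted, known):
    """Compare predicted issue classes with documented post-launch ones."""
    predicted_set, known_set = set(predicted), set(known)
    return {
        # Predicted and documented.
        "hits": sorted(predicted_set & known_set),
        # Documented but never predicted.
        "misses": sorted(known_set - predicted_set),
        # Predicted but not documented.
        "false_positives": sorted(predicted_set - known_set),
    }


def persona_calibration(findings, known):
    """Count outcomes per (persona, issue_class, outcome) triple."""
    known_set = set(known)
    table = Counter()
    for persona, issue_class in findings:
        outcome = "hit" if issue_class in known_set else "false_positive"
        table[(persona, issue_class, outcome)] += 1
    return dict(table)


if __name__ == "__main__":
    # Hypothetical labels, for illustration only.
    known = ["onboarding-friction", "privacy-defaults"]
    findings = [
        ("novice", "onboarding-friction"),
        ("expert", "pricing-confusion"),
        ("expert", "onboarding-friction"),
    ]
    print(score_predictions([c for _, c in findings], known))
    print(persona_calibration(findings, known))
```

Keeping the persona in the calibration key is what allows notes of the form "this persona is reliable for this issue class" once enough benchmark runs accumulate.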
Positioning
Preflight UX is adjacent to browser-agent usability testing, synthetic heuristic
evaluation, persona-conditioned UI/UX evaluation, and LLM simulation benchmarks.
It does not claim that LLM personas are new.
The project focuses on a different layer: an open, repo-native calibration loop
for deciding when synthetic UX critique is useful. The intended contribution is
the combination of shared issue classes, benchmark surfaces, normalized
predictions, scored misses and false positives, persona reliability notes, and
product-ready exports.
The open-source posture is part of the method. Prompts, schemas, taxonomy,
benchmark entries, redacted execution receipts, score files, and report
templates should be inspectable enough that contributors can challenge scoring
decisions and improve the calibration loop. Public run folders keep normalized
findings and redacted execution receipts; raw model transcripts are archived
outside the public artifact surface.
See docs/POSITIONING.md and docs/EVALUATION_PROTOCOL.md.
Quick Start
Validate the repository:
python3 tools/validate_repo.py
python3 -m uxpanel validate
Install As Agent Skills
Preflight UX also ships as a small skill plugin for Claude Code/Cowork and
Codex-style skill workflows.
Claude Code:
claude plugin marketplace add sparckix/preflight-ux
claude plugin install preflight-ux-review@preflight-ux
Codex:
codex plugin marketplace add sparckix/preflight-ux
codex plugin add preflight-ux-review@preflight-ux
Installed skills:
- catch-ux-risks: review a product surface for launch risks
- calibrate-ux-findings: compare findings against known failures
- write-ux-risk-report: turn findings into a product-ready report
The skill plugin is a distribution layer. The benchmark, CLI, schemas, and
scoring artifacts in this repo remain the source of truth.
The plugin includes a small helper script that locates this checkout from the
current workspace or PREFLIGHT_UX_REPO and delegates to python -m uxpanel
when repo-backed validation, scoring, or report generation is available.
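The lookup-and-delegate behavior described above could look roughly like the following. This is a sketch under stated assumptions, not the plugin's actual helper: the function names, the choice to check `PREFLIGHT_UX_REPO` before the workspace, and the use of a `uxpanel/` directory as the checkout marker are all guesses made for illustration.

```python
from __future__ import annotations

import os
import subprocess
import sys
from pathlib import Path


def find_repo(start: Path, env: dict[str, str]) -> Path | None:
    """Return a directory that looks like a Preflight UX checkout, or None."""
    candidates: list[Path] = []
    # Assumption: an explicit PREFLIGHT_UX_REPO wins over the workspace.
    if env.get("PREFLIGHT_UX_REPO"):
        candidates.append(Path(env["PREFLIGHT_UX_REPO"]))
    # Then walk from the workspace up to the filesystem root.
    candidates.extend([start, *start.parents])
    for candidate in candidates:
        # Assumption: a uxpanel/ package directory marks the checkout root.
        if (candidate / "uxpanel").is_dir():
            return candidate
    return None


def delegate(args: list[str]) -> int:
    """Run `python -m uxpanel <args>` inside the checkout, if one is found."""
    repo = find_repo(Path.cwd(), dict(os.environ))
    if repo is None:
        print("No Preflight UX checkout found; repo-backed commands are unavailable.")
        return 1
    command = [sys.executable, "-m", "uxpanel", *args]
    return subprocess.run(command, cwd=repo).returncode


if __name__ == "__main__":
    raise SystemExit(delegate(sys.argv[1:]))
```

Returning a nonzero code instead of raising when no checkout is found lets a skill fall back to review-only behavior without repo-backed validation, scoring, or report generation.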
Create a benchmark entry:
python3 -m uxpanel new-benchmark example-product-2026
Inspect benchmark readiness and scored runs:
python3 -m uxpanel benchmark-status
Create a run scaffold:
python3 -m uxpanel run \
  --surface benchmark/products/example-product-2026/surface.md \
  --surface-type benchmark \
  --panel panels/default.yaml \
  --run-id example-product-2026-seed
Create a baseline scaffold for comparison:
python3 -m uxpanel baseline \
  --surface benchmark/products/example-product-2026/surface.md \
  --surface-type benchmark \
  --kind generic-critique \
  --run-id example-product-2026-generic-baseline
Generate a Markdown report from a run:
python3 -m uxpanel report \
  --run runs/example-product-2026-seed/run.json \
  --out reports/example-product-2026-seed.md
Score a run against known issues:
python3 -m uxpanel score \
  --run runs/example-product-2026-seed/run.json \
  --benchmark benchmark/products/example-product-2026 \
  --out calibration/example-product-2026-seed.score.json
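When the same steps are repeated for several benchmark products, the run, report, and score commands can be assembled from a product slug and a run id. The sketch below only prints the commands; it reuses the flags and path layout shown in the examples above, and it assumes that the run scaffold lands at `runs/<run-id>/run.json` as those examples suggest. The `pipeline_commands` helper is hypothetical and not part of the CLI.

```python
import shlex
import sys


def pipeline_commands(product: str, run_id: str) -> list[list[str]]:
    """Build the run, report, and score commands for one benchmark product."""
    base = [sys.executable, "-m", "uxpanel"]
    product_dir = f"benchmark/products/{product}"
    # Assumption: the run scaffold is written to runs/<run-id>/run.json.
    run_json = f"runs/{run_id}/run.json"
    return [
        base + [
            "run",
            "--surface", f"{product_dir}/surface.md",
            "--surface-type", "benchmark",
            "--panel", "panels/default.yaml",
            "--run-id", run_id,
        ],
        base + ["report", "--run", run_json, "--out", f"reports/{run_id}.md"],
        base + [
            "score",
            "--run", run_json,
            "--benchmark", product_dir,
            "--out", f"calibration/{run_id}.score.json",
        ],
    ]


if __name__ == "__main__":
    # Print rather than execute: a scaffolded run still needs panel findings
    # before a report or score is meaningful.
    for command in pipeline_commands("example-product-2026", "example-product-2026-seed"):
        print(shlex.join(command))
```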