Skill

hiring-rubric-author

Build-an-X workflow that produces a per-role QA hiring rubric - takes a role description (manual QA / SDET / automation engineer / test lead / quality manager) plus the question bank from `interview-question-author` and emits a competency-anchored scoring rubric with 4-level behavioral anchors (no-hire / borderline / hire / strong-hire) per competency. Distinct from `interview-question-author` (sibling skill that produces the questions) and from `calibration-guide-author` (sibling that produces the gold-standard answer guide). Use after the question bank exists and before the first interview is scheduled - the rubric is what brings interviewer scoring into agreement.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/qa-hiring:hiring-rubric-author

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Without a rubric, two interviewers asking the same question produce different scores; the literature on [structured interviewing](https://en.wikipedia.org/wiki/Structured_interview) is clear that the *questions* alone are not sufficient - the scoring rubric is what converts them into a comparable signal. This skill produces the rubric half of the structured-interview pair.

SKILL.md

169 lines · ~3.3k tokens

Stats

Language: Python

Parent stars: 0

Maintenance: Excellent

Last Commit: Jun 7, 2026


hiring-rubric-author

Overview

Without a rubric, two interviewers asking the same question produce different scores; the literature on structured interviewing is clear that the questions alone are not sufficient - the scoring rubric is what converts them into a comparable signal. This skill produces the rubric half of the structured-interview pair.

Anchored rubrics outperform free-form scoring because the anchor descriptions at each level (no-hire / borderline / hire / strong-hire) constrain what each score means. An interviewer who reads "level 3: candidate explains the AAA pattern with a worked example and identifies one of: assertion strength, mocking pitfalls, or fixture coupling" cannot drift the score on tone or rapport - the anchor is concrete.

When to use

- The team has authored a question bank via interview-question-author and needs the matching rubric.
- An existing rubric is being recalibrated after a hiring round (the team's gold-standard answers have shifted as the role evolved).
- A team is adding a new competency dimension to an existing loop (e.g., adding "test data engineering" to an existing automation rubric).

Do not use this skill to:

- Author the questions - that is interview-question-author.
- Author the gold-standard model answers and common pitfalls - that is calibration-guide-author. The rubric scores; the calibration guide demonstrates.
- Score generic engineering / non-QA roles. The competency model is QA-specific.

Step 1 - Capture the inputs

Inputs:

| Input | Required? | Notes |
|---|---|---|
| Role + seniority | Yes | Same as the upstream question bank - manual QA / SDET / automation / test lead / quality manager × junior / mid / senior / staff+ |
| Question bank | Yes | The output of `interview-question-author`. Each question's competency tag drives the rubric's competency-by-question matrix. |
| Team's competency model | No | If absent, the skill falls back to the ISTQB-aligned default model in Step 2. |
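
The input contract above can be sketched as a small validated structure. This is a minimal illustration, not the skill's actual schema: the class names `Question` and `RubricInputs`, the field names, and the role / seniority identifiers are assumptions chosen to mirror the table.

```python
from dataclasses import dataclass
from typing import Optional

# Identifiers are assumptions that mirror the role and seniority lists in the table.
ROLES = {"manual-qa-engineer", "qa-automation-engineer", "sdet",
         "test-lead", "quality-manager"}
SENIORITIES = {"junior", "mid", "senior", "staff+"}


@dataclass(frozen=True)
class Question:
    qid: str                       # e.g. "Q3"
    competencies: tuple[str, ...]  # competency tags from the question bank


@dataclass(frozen=True)
class RubricInputs:
    role: str
    seniority: str
    question_bank: tuple[Question, ...]
    competency_model: Optional[tuple[str, ...]] = None  # None means "use Step 2 defaults"

    def __post_init__(self):
        if self.role not in ROLES:
            raise ValueError(f"unknown role: {self.role!r}")
        if self.seniority not in SENIORITIES:
            raise ValueError(f"unknown seniority: {self.seniority!r}")
        # The question bank is a hard requirement; a rubric without it drifts.
        if not self.question_bank:
            raise ValueError("question bank is required")
        untagged = [q.qid for q in self.question_bank if not q.competencies]
        if untagged:
            raise ValueError(f"questions without a competency tag: {untagged}")
```

Failing fast on an empty or untagged question bank keeps the competency-by-question matrix well defined before any anchors are written.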

Step 2 - Pick the competency dimensions

A QA hiring rubric scores against 5 - 8 competency dimensions. The default set (drawn from ISTQB Foundation Level v4.0 competencies and adapted to interviewable behaviour) per role:

manual-qa-engineer / qa-automation-engineer

- Test analysis & design - partitioning, boundary, decision-table reasoning per ISTQB technique.
- Defect lifecycle - defect vs failure distinction; bug-report quality; reproducibility.
- Test code conventions (automation only) - AAA structure, assertion strength, mocking discipline.
- Tooling depth - fluency with the team's primary toolchain (Playwright / Cypress / Selenium / pytest / JUnit / etc.).
- Communication - written bug reports; verbal hand-off to engineering.
- Domain reasoning - applies QA techniques to the team's domain (fintech / healthcare / consumer mobile).

sdet

- Test analysis & design.
- Test code conventions.
- Test framework / tool architecture - how to extend the team's framework; CI integration; flake budget.
- Production-quality coding - AAA, refactoring, naming, fixture cleanliness.
- System reasoning - service boundaries; what to test at which layer.
- Communication & collaboration.

test-lead

- Test strategy authoring - risk-based testing; the test pyramid as an argument, not a template.
- Stakeholder management - engineering, product, support, leadership.
- Hiring & coaching of QA team members - the candidate has done this themselves, not only observed it.
- Defect management at the team / cross-team layer.
- Tooling & CI ownership.
- Communication (written + verbal, exec-level).

quality-manager

- Quality strategy across releases / quarters.
- Risk-based prioritisation - the risk-matrix-recommender framing; data-informed decisions with traceability.
- Stakeholder communication, exec-level.
- Hiring & team development.
- Process / methodology fluency - agile, BDD, shift-left, shift-right, when each applies.
- Defect / escape management at the org layer.
- Compliance / regulated-industry framing (if applicable).

The skill emits the dimensions selected for the role; the team can add or remove dimensions before locking the rubric.
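
As a rough sketch, the default sets and the team's add / remove step could be represented as a lookup plus a bounds check. The dictionary keys, the shortened dimension labels, and the function name `select_dimensions` are illustrative assumptions; only the dimension lists and the 5 - 8 range come from the text above.

```python
# Default dimensions per role, transcribed from the lists above (labels shortened).
DEFAULT_DIMENSIONS = {
    "manual-qa-engineer": [
        "Test analysis & design", "Defect lifecycle", "Tooling depth",
        "Communication", "Domain reasoning",
    ],
    "qa-automation-engineer": [
        "Test analysis & design", "Defect lifecycle", "Test code conventions",
        "Tooling depth", "Communication", "Domain reasoning",
    ],
    "sdet": [
        "Test analysis & design", "Test code conventions",
        "Test framework / tool architecture", "Production-quality coding",
        "System reasoning", "Communication & collaboration",
    ],
    "test-lead": [
        "Test strategy authoring", "Stakeholder management",
        "Hiring & coaching", "Defect management (team / cross-team)",
        "Tooling & CI ownership", "Communication (exec-level)",
    ],
    "quality-manager": [
        "Quality strategy", "Risk-based prioritisation",
        "Stakeholder communication (exec-level)", "Hiring & team development",
        "Process / methodology fluency", "Defect / escape management (org)",
        "Compliance / regulated-industry framing",
    ],
}


def select_dimensions(role, add=(), remove=()):
    """Return the role's dimensions after team edits, enforcing the 5 - 8 range."""
    dims = [d for d in DEFAULT_DIMENSIONS[role] if d not in remove]
    for d in add:
        if d not in dims:  # ignore duplicates instead of double-scoring a dimension
            dims.append(d)
    if not 5 <= len(dims) <= 8:
        raise ValueError(f"{len(dims)} dimensions selected; expected 5 to 8")
    return dims
```

For example, `select_dimensions("qa-automation-engineer", add=["Test data engineering"])` returns the six automation defaults followed by the added dimension.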

Step 3 - Author the 4-level anchors per dimension

For each (competency × question) cell, the rubric needs four behavioural anchors. The anchor describes what the candidate said or did, not what the interviewer felt - this is the load-bearing principle that reduces interviewer noise.

### Test analysis & design — Q3 (Behavioral, STAR: late-defect catch)

| Score | Anchor (what the candidate said / did) |
|---|---|
| **1 — no hire** | Cannot articulate a partition / boundary / decision-table technique. Describes the catch as "I just got lucky." Or attributes the catch to a tool ("the linter caught it"). |
| **2 — borderline** | Names one ISTQB technique correctly but cannot apply it to the catch they describe. STAR is partial: missing Result or missing the candidate's specific Action (says "we" throughout). |
| **3 — hire** | Identifies the specific technique that caught the defect (e.g., "we had no negative test for the empty-cart case — equivalence partitioning would have flagged it"). STAR complete: situation, task, the candidate's specific action, measurable result + retro learning. |
| **4 — strong hire** | Generalises beyond the specific defect: identifies a systemic gap (e.g., "we had no convention requiring a negative test per public method; I added that to our `test-code-conventions` doc"), and ties the change to a measurable downstream improvement. |

**Probe-trigger:** If the candidate scores 2 on STAR completeness, probe for the missing component; do not deduct further on the second pass.
**Time-budget impact:** An answer that reaches a score of 4 typically takes 2 extra minutes; budget accordingly.

Each anchor is concrete enough that two interviewers reading the same transcript would arrive at the same score - that is the only test of the anchor's quality.
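
One way to catch weak anchors before a rubric is locked is a small lint pass over each (competency × question) cell. The sketch below is an assumption-laden illustration: the function name `lint_anchors` and the `FEELING_PHRASES` list are hypothetical, and a phrase list can only catch the most obvious interviewer-feeling wording, not vague anchors in general.

```python
LEVELS = {1: "no hire", 2: "borderline", 3: "hire", 4: "strong hire"}

# Hypothetical starter list; a team would extend this from its own past scorecards.
FEELING_PHRASES = ("i was impressed", "seemed confident", "i felt", "came across as")


def lint_anchors(anchors):
    """Return a list of problems for one cell; an empty list means the cell passes.

    `anchors` maps a level (1-4) to the anchor text for that level.
    """
    problems = []
    for level, label in LEVELS.items():
        text = anchors.get(level, "").strip()
        if not text:
            problems.append(f"level {level} ({label}): missing anchor")
            continue
        lowered = text.lower()
        for phrase in FEELING_PHRASES:
            if phrase in lowered:
                problems.append(
                    f"level {level} ({label}): interviewer-feeling phrase {phrase!r}"
                )
    unexpected = sorted(set(anchors) - set(LEVELS))
    if unexpected:
        problems.append(f"unexpected levels: {unexpected}")
    return problems
```

A cell with all four levels filled in and no flagged phrases returns an empty list; the real test of quality remains whether two interviewers reading the same transcript agree.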

Step 4 - Compute the role-level summary score

The rubric outputs a per-dimension score and a summary recommendation. The summary is not a simple average:

| Per-dimension scoring rule | Summary recommendation |
|---|---|
| All dimensions ≥ 3, ≥ 1 dimension at 4 | Strong hire |
| All dimensions ≥ 3 | Hire |
| 1 dimension at 2, all others ≥ 3 | Borderline - debrief required |
| ≥ 2 dimensions at 2, no 1s | No hire - competency gap |
| Any dimension at 1 | No hire - fundamental gap |

The summary refuses to average across competencies - a candidate weak in defect lifecycle and strong in tooling depth is not "average"; the role demands both. Per-dimension floors are the load-bearing constraint.
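
The floor-based rules can be sketched as a short function. This is a minimal illustration that assumes per-dimension scores arrive as a mapping from dimension name to an integer 1 - 4; the function name `summarize` and the returned label strings are assumptions, not a fixed output contract.

```python
def summarize(scores):
    """Map per-dimension scores (1-4) to a summary recommendation.

    Floors are checked first, so a single low dimension cannot be
    offset by high scores elsewhere. No average is ever computed.
    """
    if not scores:
        raise ValueError("at least one dimension score is required")
    values = list(scores.values())
    if any(v not in (1, 2, 3, 4) for v in values):
        raise ValueError("scores must be integers from 1 to 4")

    if 1 in values:
        return "No hire - fundamental gap"
    twos = values.count(2)
    if twos >= 2:
        return "No hire - competency gap"
    if twos == 1:
        return "Borderline - debrief required"
    # From here on every dimension is at 3 or 4.
    if 4 in values:
        return "Strong hire"
    return "Hire"
```

A candidate scoring 4, 4, 4, 3 and 1 averages 3.2, which would read as a hire under a mean-based summary; here the single 1 yields "No hire - fundamental gap".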

Step 5 - Emit the rubric

The output is a single markdown document with:

- Header: role, seniority, source question bank reference, competency dimensions, summary-rule table.
- Per-question scoring sections (one per question in the bank, scoring against each competency the question targets - typically 1 - 2 competencies per question).
- Summary recommendation rules.
- Hand-off block:

## HAND-OFF — required next steps

1. Pair with `calibration-guide-author` to produce gold-standard model answers and common pitfalls per question — without those, the anchors here are aspirational.
2. Run a calibration interview (one panel scores the same recorded interview together) before the first real candidate. Per the structured-interview research, calibration is the dominant variable in inter-rater agreement.
3. Lock the rubric at the start of the hiring round; mid-round changes invalidate prior candidates' scores.
4. After the round, run `defect-trend-narrator`-style retro on the rubric: which competencies discriminated; which were noise; which scored everyone at 3 (a sign the anchor is too generous).
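
For the post-round retro in item 4, a first pass at "which dimensions scored everyone the same" could look like the sketch below. It assumes each candidate's scores are a dict of dimension name to score; the function name `flat_dimensions` is hypothetical, and a flat dimension is only a prompt for review, not proof that the anchor is wrong.

```python
from collections import defaultdict


def flat_dimensions(round_scores):
    """Return {dimension: score} for dimensions where every candidate got the same score.

    `round_scores` is a list with one {dimension: score} dict per candidate.
    """
    if len(round_scores) < 2:
        raise ValueError("need at least two candidates to compare")
    seen = defaultdict(set)
    for candidate in round_scores:
        for dimension, score in candidate.items():
            seen[dimension].add(score)
    # A dimension with a single distinct score did not discriminate between candidates.
    return {dim: next(iter(vals)) for dim, vals in seen.items() if len(vals) == 1}
```

A dimension that comes back flat at 3 matches the "anchor is too generous" signal described above.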

Anti-patterns

| Anti-pattern | Why it fails | Fix |
|---|---|---|
| Free-text "1 - 5 score" with no anchors | The score is the interviewer's opinion, not a behavioural observation. | Step 3 anchors are mandatory; no anchorless dimensions. |
| Anchors that describe the interviewer's feeling ("I was impressed", "the candidate seemed confident") | These are tone signals, not behaviour. Interviewer noise is the dominant source. | Anchors describe what the candidate said or did verbatim. |
| Averaging dimension scores into a summary | Hides the load-bearing competency gaps. | Step 4's per-dimension floor; no averages. |
| Using the same rubric across seniority levels | On a shared rubric, a senior candidate scoring 3 may be showing only mid-level performance; the same number means different things at different seniorities. | Per-seniority anchors; junior-3 ≠ senior-3. |
| Rubrics with 10+ dimensions | Interviewer can't hold them all; scoring fragments. | Cap at 5 - 8 dimensions. |
| Rubric authored without the question bank | Anchors drift from the actual questions; scoring becomes generic. | Step 1 hard-requires the question bank as input. |
| "Cultural fit" as a dimension | Documented bias amplifier; legally fraught. | Use the team's Definition of Done / engineering values translated into behavioural anchors instead. |

Limitations

- The rubric is only as good as its anchors. Vague anchors produce inter-rater drift; concrete behavioural anchors take time to author and refine.
- Anchor-validation requires real candidate data. Until the rubric has been used through 5 - 10 interviews, its anchors are theoretical. Plan a calibration interview before the first real candidate.
- Rubrics drift over time. A rubric authored in 2024 may anchor on tools that are no longer the team's default. Re-author per hiring round, or at least review.
- No fairness audit. The skill does not check the rubric for bias against protected classes - that is the team's HR / legal review (out of marketplace scope).
- Weighting is uniform per dimension. Some teams want to weight tooling depth higher than communication; the skill emits unweighted scores and leaves weighting to the hiring manager. Custom weights can be applied post hoc to the per-dimension scores.

Hand-off targets

- Calibrate interviewers (gold-standard answers, common pitfalls) → calibration-guide-author.
- Author the question bank (upstream) → interview-question-author.
- Compliance review of the rubric → team's legal / HR review (out of marketplace scope).

References

- ISTQB Certified Tester Foundation Level v4.0 syllabus - the competency model adapted into the default dimensions per role: https://www.istqb.org/certifications/certified-tester-foundation-level
- ISTQB glossary - defect / failure distinction (load-bearing for the defect lifecycle dimension): https://glossary.istqb.org/en_US/term/defect-3
- Structured interview research - Levashina et al. 2014 (Personnel Psychology) on the validity uplift from structured rubrics + same questions / same order.
- STAR behavioral interviewing method - Situation / Task / Action / Result framework, used in the behavioural-question anchors: https://en.wikipedia.org/wiki/Situation,_task,_action,_result
- Bloom's taxonomy - the basis for the K1 - K4 cognitive levels used to align the rubric's anchor depth with the question's intended difficulty: https://en.wikipedia.org/wiki/Bloom%27s_taxonomy
- PractiTest 2026 State of Testing Report - hiring rubric authoring named as a high-adoption, low-risk AI use case for QA managers: https://www.practitest.com/state-of-testing/
- interview-question-author, calibration-guide-author - sibling skills that complete the structured-interview triple.
- risk-matrix-recommender - the data-informed-decisions-with-traceability framing this rubric inherits for the quality-manager role's risk-prioritisation dimension.
