Build-an-X workflow that produces a per-role QA hiring rubric - takes a role description (manual QA / SDET / automation engineer / test lead / quality manager) plus the question bank from `interview-question-author` and emits a competency-anchored scoring rubric with 4-level behavioral anchors (no-hire / borderline / hire / strong-hire) per competency. Distinct from `interview-question-author` (sibling skill that produces the questions) and from `calibration-guide-author` (sibling that produces the gold-standard answer guide). Use after the question bank exists and before the first interview is scheduled - the rubric is what brings interviewer scoring into agreement.
Slash command: `/qa-hiring:hiring-rubric-author`
Without a rubric, two interviewers asking the same question produce different scores; the literature on [structured interviewing](https://en.wikipedia.org/wiki/Structured_interview) is clear that the *questions* alone are not sufficient - the scoring rubric is what converts them into a comparable signal. This skill produces the rubric half of the structured-interview pair.
Anchored rubrics outperform free-form scoring because the anchor descriptions at each level (no-hire / borderline / hire / strong-hire) constrain what each score means. An interviewer who reads "level 3: candidate explains the AAA pattern with a worked example and identifies one of: assertion strength, mocking pitfalls, or fixture coupling" cannot drift the score on tone or rapport - the anchor is concrete.
`interview-question-author` and needs the matching rubric.
Do not use this skill to:
- `interview-question-author`.
- `calibration-guide-author`. The rubric scores; the calibration guide demonstrates.
Required:
| Input | Notes |
|---|---|
| Role + seniority | Same as the upstream question bank - manual QA / SDET / automation / test lead / quality manager × junior / mid / senior / staff+ |
| Question bank | The output of interview-question-author. Each question's competency tag drives the rubric's competency-by-question matrix. |
| Team's competency model | Optional. If absent, the skill falls back to the ISTQB-aligned default model in Step 2. |
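
The competency-by-question matrix mentioned in the table can be pictured as a grouping of question ids by competency tag. The sketch below assumes a hypothetical question-bank shape (a list of records with `id` and `competency` fields); the actual output format of `interview-question-author` is not specified here, so treat the field names as placeholders.

```python
from collections import defaultdict

# Hypothetical question-bank shape; the real format produced by
# interview-question-author may differ.
question_bank = [
    {"id": "Q1", "competency": "defect lifecycle"},
    {"id": "Q2", "competency": "tooling depth"},
    {"id": "Q3", "competency": "test analysis & design"},
    {"id": "Q4", "competency": "test analysis & design"},
]


def build_matrix(questions):
    """Group question ids by competency tag.

    Each (competency, question id) pair is one cell that needs anchors.
    """
    matrix = defaultdict(list)
    for question in questions:
        matrix[question["competency"]].append(question["id"])
    return dict(matrix)


matrix = build_matrix(question_bank)
for competency, question_ids in matrix.items():
    print(competency, "->", question_ids)
```

A competency with no questions never appears in the matrix, which makes untested dimensions easy to spot before anchors are written.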
A QA hiring rubric scores against 5 - 8 competency dimensions. The default set (drawn from ISTQB Foundation Level v4.0 competencies and adapted to interviewable behaviour) per role:
defect vs failure distinction; bug-report quality; reproducibility.
The skill emits the dimensions selected for the role; the team can add or remove dimensions before locking the rubric.
For each (competency × question) cell, the rubric needs four behavioural anchors. The anchor describes what the candidate said or did, not what the interviewer felt - this is the load-bearing principle that reduces interviewer noise.
### Test analysis & design — Q3 (Behavioral, STAR: late-defect catch)
| Score | Anchor (what the candidate said / did) |
|---|---|
| **1 — no hire** | Cannot articulate a partition / boundary / decision-table technique. Describes the catch as "I just got lucky." Or attributes the catch to a tool ("the linter caught it"). |
| **2 — borderline** | Names one ISTQB technique correctly but cannot apply it to the catch they describe. STAR is partial: missing Result or missing the candidate's specific Action (says "we" throughout). |
| **3 — hire** | Identifies the specific technique that caught the defect (e.g., "we had no negative test for the empty-cart case — equivalence partitioning would have flagged it"). STAR complete: situation, task, the candidate's specific action, measurable result + retro learning. |
| **4 — strong hire** | Generalises beyond the specific defect: identifies a systemic gap (e.g., "we had no convention requiring a negative test per public method; I added that to our `test-code-conventions` doc"), and ties the change to a measurable downstream improvement. |
**Probe-trigger:** If the candidate scores 2 on STAR completeness, probe for the missing component; do not deduct further on the second pass.
**Time-budget impact:** A score of 4 typically takes 2 extra minutes; budget accordingly.
Each anchor is concrete enough that two interviewers reading the same transcript would arrive at the same score - that is the only test of the anchor's quality.
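
One way to keep every cell honest about its four anchors is to model the cell as a small data structure that refuses to exist without all four levels. This is a minimal sketch, not the skill's actual output format; the class and field names are illustrative assumptions, and the anchor texts are shortened from the Q3 table above.

```python
from dataclasses import dataclass

# The four levels named in this document.
LEVELS = {1: "no hire", 2: "borderline", 3: "hire", 4: "strong hire"}


@dataclass(frozen=True)
class RubricCell:
    """One (competency x question) cell; names here are hypothetical."""

    competency: str
    question_id: str
    anchors: dict  # maps score (1-4) to the behavioural anchor text

    def __post_init__(self):
        # A cell must carry exactly one anchor per level, none of them blank.
        if set(self.anchors) != set(LEVELS):
            raise ValueError(
                f"{self.competency} / {self.question_id}: "
                f"anchors must cover exactly levels {sorted(LEVELS)}"
            )
        blank = [lvl for lvl, text in self.anchors.items() if not text.strip()]
        if blank:
            raise ValueError(f"blank anchor text at levels {sorted(blank)}")


cell = RubricCell(
    competency="Test analysis & design",
    question_id="Q3",
    anchors={
        1: "Cannot articulate a partition / boundary / decision-table technique.",
        2: "Names one ISTQB technique correctly but cannot apply it to the catch they describe.",
        3: "Identifies the specific technique that caught the defect.",
        4: "Generalises beyond the specific defect: identifies a systemic gap.",
    },
)
```

Validation of this kind checks only that anchors are present; whether two interviewers converge on the same score still has to be tested on a real transcript.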
The rubric outputs a per-dimension score and a summary recommendation. The summary is not a simple average:
| Per-dimension scoring rule | Summary recommendation |
|---|---|
| All dimensions ≥ 3, ≥ 1 dimension at 4 | Strong hire |
| All dimensions ≥ 3 | Hire |
| 1 dimension at 2, all others ≥ 3 | Borderline - debrief required |
| ≥ 2 dimensions at 2, no 1s | No hire - competency gap |
| Any dimension at 1 | No hire - fundamental gap |
The summary refuses to average across competencies - a candidate weak in defect lifecycle and strong in tooling depth is not "average"; the role demands both. Per-dimension floors are the load-bearing constraint.
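
The table's rules overlap (a strong hire also satisfies "all dimensions ≥ 3"), so an implementation has to pick an evaluation order. The sketch below is one reading of the table, checking the floors first; the function name and the dictionary input are assumptions for illustration.

```python
def summarize(scores):
    """Map per-dimension scores (1-4) to a summary recommendation.

    Floors are checked before anything else, so a single low dimension
    cannot be averaged away by high scores elsewhere.
    """
    values = list(scores.values())
    if not values or any(v not in (1, 2, 3, 4) for v in values):
        raise ValueError("every dimension needs a score from 1 to 4")

    if 1 in values:
        return "No hire - fundamental gap"
    twos = values.count(2)
    if twos >= 2:
        return "No hire - competency gap"
    if twos == 1:
        return "Borderline - debrief required"
    # From here on, every dimension is at 3 or 4.
    if 4 in values:
        return "Strong hire"
    return "Hire"


candidate = {"defect lifecycle": 2, "tooling depth": 4, "communication": 4}
print(summarize(candidate))  # prints "Borderline - debrief required"
```

The example candidate averages above 3, yet the single 2 in defect lifecycle forces a debrief, which is the behaviour the per-dimension floor is meant to guarantee.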
The output is a single markdown document with:
## HAND-OFF — required next steps
1. Pair with `calibration-guide-author` to produce gold-standard model answers and common pitfalls per question — without those, the anchors here are aspirational.
2. Run a calibration interview (one panel scores the same recorded interview together) before the first real candidate. Per the structured-interview research, calibration is the dominant variable in inter-rater agreement.
3. Lock the rubric at the start of the hiring round; mid-round changes invalidate prior candidates' scores.
4. After the round, run a `defect-trend-narrator`-style retro on the rubric: which competencies discriminated, which were noise, and which scored everyone at 3 (a sign the anchor is too generous).
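
The retro in item 4 can start from a per-competency summary of the round's scores. The sketch below is an assumption about how such a summary might look, not the output of `defect-trend-narrator`: it reports the spread of scores per competency and flags any competency where every candidate scored 3. Separating genuine discrimination from noise needs inter-rater data, which this sketch does not attempt.

```python
from statistics import pstdev


def retro(round_scores):
    """Summarise a hiring round.

    round_scores: list of per-candidate dicts {competency: score}.
    Assumes every candidate was scored on the same competencies.
    """
    report = {}
    for competency in round_scores[0]:
        column = [candidate[competency] for candidate in round_scores]
        report[competency] = {
            # Zero spread means the competency did not separate candidates.
            "spread": round(pstdev(column), 2),
            # Everyone at 3 suggests the level-3 anchor is too generous.
            "all_threes": all(score == 3 for score in column),
        }
    return report


round_scores = [
    {"defect lifecycle": 2, "communication": 3},
    {"defect lifecycle": 4, "communication": 3},
    {"defect lifecycle": 3, "communication": 3},
]
report = retro(round_scores)
for competency, stats in report.items():
    print(competency, stats)
```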
| Anti-pattern | Why it fails | Fix |
|---|---|---|
| Free-text "1 - 5 score" with no anchors | The score is the interviewer's opinion, not a behavioural observation. | Step 3 anchors are mandatory; no anchorless dimensions. |
| Anchors that describe the interviewer's feeling ("I was impressed", "the candidate seemed confident") | These are tone signals, not behaviour; interviewer noise is the dominant source of scoring disagreement. | Anchors describe what the candidate said or did verbatim. |
| Averaging dimension scores into a summary | Hides the load-bearing competency gaps. | Step 4's per-dimension floor; no averages. |
| Using the same rubric across seniority levels | A senior candidate at "score 3" is mid-level performance for that role; the absolute number means different things. | Per-seniority anchors; junior-3 ≠ senior-3. |
| Rubrics with 10+ dimensions | Interviewer can't hold them all; scoring fragments. | Cap at 5 - 8 dimensions. |
| Rubric authored without the question bank | Anchors drift from the actual questions; scoring becomes generic. | Step 1 hard-requires the question bank as input. |
| "Cultural fit" as a dimension | Documented bias amplifier; legally fraught. | Use the team's Definition of Done / engineering values translated into behavioural anchors instead. |
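
A few of the anti-patterns in the table are mechanical enough to check before the rubric is locked. The sketch below assumes a hypothetical rubric shape (`{dimension: {score: anchor text}}`) and a short phrase list taken from the examples in the table; a team would need to extend the list, and a phrase match is only a prompt for human review.

```python
# Phrases taken from the anti-pattern examples above; extend per team.
FEELING_PHRASES = ("i was impressed", "seemed confident")


def lint_rubric(rubric):
    """Return a list of problems found in {dimension: {score: anchor_text}}."""
    problems = []
    if not 5 <= len(rubric) <= 8:
        problems.append(f"{len(rubric)} dimensions (expected 5-8)")
    for dimension, anchors in rubric.items():
        if not anchors:
            problems.append(f"{dimension}: no anchors")
        for score, text in anchors.items():
            hits = [p for p in FEELING_PHRASES if p in text.lower()]
            if hits:
                problems.append(f"{dimension} level {score}: feeling language {hits}")
    return problems


demo = {
    "communication": {3: "I was impressed by the explanation."},
    "tooling depth": {},
}
for problem in lint_rubric(demo):
    print(problem)
```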
- tooling depth higher than communication; the skill emits unweighted scores and leaves weighting to the hiring manager. Custom weights can be applied post hoc to the per-dimension scores.
- `calibration-guide-author`.
- `interview-question-author`.
- defect lifecycle dimension): https://glossary.istqb.org/en_US/term/defect-3
- `interview-question-author`, `calibration-guide-author` - sibling skills that complete the structured-interview triple.
- `risk-matrix-recommender` - the data-informed-decisions-with-traceability framing this rubric inherits for the quality-manager role's risk-prioritisation dimension.
`npx claudepluginhub testland/qa --plugin qa-hiring`