From qa-test-data-privacy
Pure-reference catalog of personally identifiable information (PII) categories across GDPR, CCPA/CPRA, NIST SP 800-122, and HIPAA. Defines what counts as personal data under each regime, enumerates the explicit identifiers each regulator lists (GDPR Art. 4(1) and Art. 9 special categories; CPRA sensitive personal information; NIST direct-identifier vs linkable distinction; HIPAA Safe Harbor 18 identifiers), and maps overlapping fields across jurisdictions so a masking pipeline knows which regulator's rules apply. Use as the authoritative source when authoring or reviewing masking rules, classifying a dataset's risk level, or scoping which fields a PII detector must catch.
Slash command: /qa-test-data-privacy:pii-categories-reference
This skill is the **canonical category catalog** that downstream masking workflows (pii-masking-pipeline-builder) and detectors (presidio-pii-detection) reference for scope. It enumerates four regimes: GDPR, CCPA/CPRA, NIST SP 800-122, and HIPAA.
This is a pure reference - no execution steps. Workflow skills in this plugin consume it.
Definition (Article 4(1)): "any information relating to an identified or identifiable natural person ('data subject')" (gdpr-info.eu/art-4-gdpr/).
The article enumerates identifiers that make a person identifiable:
| Identifier class | Examples |
|---|---|
| Name | Given name, surname, full name, online aliases linked to the person |
| Identification number | National ID, passport, driver's licence, tax ID, employee ID |
| Location data | GPS coordinates, IP-derived city/region, cell-tower triangulation |
| Online identifier | IP address, cookie ID, device fingerprint, advertising ID (per Recital 30) |
| Physical/physiological factor | Height, weight, eye colour, fingerprint, gait |
| Genetic factor | DNA-derived information (further defined in Art. 4(13)) |
| Mental factor | Diagnosed mental-health conditions, IQ test results |
| Economic factor | Salary, credit score, transaction history, account balances |
| Cultural factor | Language, religion, ethnic background |
| Social factor | Marital status, family relationships, social-network connections |
Source: Article 4(1) GDPR (gdpr-info.eu/art-4-gdpr/).
Article 9(1) lists categories whose processing is prohibited by default unless one of the Article 9(2) exceptions applies:
A masking pipeline for an EU dataset must apply at least the broader Art. 4(1) rules, plus stricter rules for any field falling under Art. 9 (special categories carry higher fines and must be either redacted or fully anonymised, not merely pseudonymised).
"Pseudonymisation" (Art. 4(5)) means processing data so that it can no longer be attributed to a specific data subject without additional information that is kept separately. Pseudonymised data is still personal data under GDPR - it remains in scope.
Anonymised data (no longer linkable to a subject by any reasonably likely means, per Recital 26) falls out of GDPR scope. The masking pipeline must mark which output is which (data-masking-techniques-reference explains the techniques).
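A minimal sketch of how a pipeline might record that distinction per output column. The class names, column names, and technique labels are hypothetical, and whether a given output really counts as anonymised under Recital 26 is a judgement the sketch takes as input rather than decides.

```python
from dataclasses import dataclass
from enum import Enum


class GdprStatus(Enum):
    PSEUDONYMISED = "pseudonymised"  # still personal data (Art. 4(5))
    ANONYMISED = "anonymised"        # out of GDPR scope (Recital 26)


@dataclass(frozen=True)
class MaskingOutput:
    column: str          # hypothetical output column name
    technique: str       # hypothetical technique label
    status: GdprStatus   # assigned by a reviewer, not inferred by this code

    @property
    def in_gdpr_scope(self) -> bool:
        # Only fully anonymised output leaves GDPR scope.
        return self.status is GdprStatus.PSEUDONYMISED


outputs = [
    MaskingOutput("email_token", "keyed-hash", GdprStatus.PSEUDONYMISED),
    MaskingOutput("region_count", "aggregation", GdprStatus.ANONYMISED),
]

in_scope = [o.column for o in outputs if o.in_gdpr_scope]
print(in_scope)  # ['email_token']
```

Keeping the status next to the column makes it straightforward to audit which outputs still need GDPR controls.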
Definition (Cal. Civ. Code § 1798.140(v)(1), as amended by CPRA): "information that identifies, relates to, describes, is reasonably capable of being associated with, or could reasonably be linked, directly or indirectly, with a particular consumer or household" (oag.ca.gov/privacy/ccpa).
Statutory categories enumerated in § 1798.140(v)(1)(A) - (K):
| # | Category | Examples |
|---|---|---|
| A | Identifiers | Name, postal address, email, IP address, account name, SSN, driver's licence, passport |
| B | Customer records | Records covered by Cal. Civ. Code § 1798.80(e) - name, signature, education, employment, financial info, medical, health-insurance, whether paper or electronic, regardless of storage medium |
| C | Protected classifications | Race, religion, gender, sexual orientation, age, national origin, disability, marital status (under California or federal law) |
| D | Commercial information | Purchases, products considered, consuming history |
| E | Biometric information | Fingerprints, retina, hand prints, voice recordings, keystroke patterns |
| F | Internet/network activity | Browsing history, search history, interaction with a website or app |
| G | Geolocation data | Physical location, movements, especially "precise geolocation" (CPRA refinement) |
| H | Sensory data | Audio, electronic, visual, thermal, olfactory recordings |
| I | Professional/employment | Job titles, salaries, employment records |
| J | Education | Education records as defined in 20 USC § 1232g (FERPA) |
| K | Inferences | Profile drawn from any of A - J to predict preferences, characteristics, predispositions, behaviour |
CPRA added a subcategory of personal information requiring extra protection (Cal. Civ. Code § 1798.140(ae)):
Citation: "Sensitive Personal Information" (oag.ca.gov/privacy/ccpa).
Definition (citing OMB Memorandum 07-16, reproduced in NIST SP 800-122 Section 2.1): "information which can be used to distinguish or trace an individual's identity, such as their name, social security number, biometric records, etc., alone, or when combined with other personal or identifying information which is linked or linkable to a specific individual, such as date and place of birth, mother's maiden name, etc."
Citation: NIST SP 800-122:2010 §2.1, fetched from csrc.nist.gov/pubs/sp/800/122/final.
NIST 800-122 §2.2 introduces a crucial distinction:
A masking pipeline must also cover linkable fields: birth date alone rarely identifies a person, but birth date + ZIP + sex does (the Sweeney 87 % finding). Protecting only direct identifiers is not enough.
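The effect of combining linkable fields can be checked on a dataset directly. The sketch below measures what fraction of rows are unique for a chosen set of quasi-identifiers; the function name and the four toy rows are invented for illustration and say nothing about real population statistics.

```python
from collections import Counter


def unique_fraction(rows, quasi_identifiers):
    """Fraction of rows whose quasi-identifier combination appears exactly once."""
    if not rows:
        return 0.0
    keys = [tuple(row[q] for q in quasi_identifiers) for row in rows]
    counts = Counter(keys)
    unique = sum(1 for key in keys if counts[key] == 1)
    return unique / len(rows)


# Toy data (made up): three people share a birth date.
rows = [
    {"birth_date": "1980-01-02", "zip": "02139", "sex": "F"},
    {"birth_date": "1980-01-02", "zip": "02139", "sex": "M"},
    {"birth_date": "1980-01-02", "zip": "94110", "sex": "F"},
    {"birth_date": "1975-06-30", "zip": "94110", "sex": "F"},
]

print(unique_fraction(rows, ["birth_date"]))                # 0.25
print(unique_fraction(rows, ["birth_date", "zip", "sex"]))  # 1.0
```

A high unique fraction for a field combination is a signal that those fields belong in masking scope even though none of them is a direct identifier.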
NIST 800-122 §3 names six factors that drive the PII confidentiality impact level (low / moderate / high):
Masking aggressiveness scales with impact level.
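As a rough illustration of that scaling, a pipeline could key its masking choice on the impact level assigned to each field. The three levels come from the text above; the technique names and the mapping itself are assumptions made for this sketch, not something NIST SP 800-122 prescribes.

```python
from enum import IntEnum


class ImpactLevel(IntEnum):
    LOW = 1
    MODERATE = 2
    HIGH = 3


# Hypothetical policy table: stronger masking as the impact level rises.
MASKING_BY_IMPACT = {
    ImpactLevel.LOW: "partial-mask",
    ImpactLevel.MODERATE: "tokenise",
    ImpactLevel.HIGH: "full-redaction",
}


def plan_masking(field_levels):
    """Map each field to the masking technique for its assigned impact level."""
    return {field: MASKING_BY_IMPACT[level] for field, level in field_levels.items()}


# Impact levels here are example inputs; in practice a reviewer assigns them.
field_levels = {
    "newsletter_opt_in": ImpactLevel.LOW,
    "phone": ImpactLevel.MODERATE,
    "ssn": ImpactLevel.HIGH,
}

plan = plan_masking(field_levels)
print(plan["ssn"])  # full-redaction
```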
For health data (PHI), the HIPAA Privacy Rule defines two de-identification methods (Expert Determination, 45 CFR § 164.514(b)(1), and Safe Harbor, 45 CFR § 164.514(b)(2)). Safe Harbor requires removing all of these 18 identifiers (per HHS guidance, hhs.gov/hipaa/for-professionals/privacy/special-topics/de-identification):
A masking pipeline operating on health data must catch all 18. A detector configured only for GDPR's broader categories can miss HIPAA-required identifiers: for example, a medical record number is not named explicitly in GDPR Art. 4(1) (it falls under "identification number"), so a detector may not flag it without a HIPAA-specific recogniser.
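A HIPAA-specific recogniser can be as simple as a pattern for the local medical record number format. The sketch below uses only the standard library; the `MRN` prefix and the 6 to 10 digit length are assumptions, since MRN formats are institution-specific and must be replaced with the real format of the dataset.

```python
import re

# Assumed format: "MRN", optional separator(s), then 6-10 digits.
MRN_PATTERN = re.compile(r"\bMRN[-:\s]*(\d{6,10})\b", re.IGNORECASE)


def find_mrns(text):
    """Return every substring that matches the assumed MRN format."""
    return [match.group(0) for match in MRN_PATTERN.finditer(text)]


note = "Patient seen today, MRN-00123456, follow-up in 2 weeks."
print(find_mrns(note))  # ['MRN-00123456']
```

A pattern like this only finds values in the assumed format, so it complements review rather than replacing it.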
The fastest way to scope a masking pipeline is to enumerate fields present in the dataset and look up which regimes flag each:
| Field | GDPR Art. 4(1) | GDPR Art. 9 | CCPA/CPRA | CPRA SPI | NIST 800-122 | HIPAA Safe Harbor |
|---|---|---|---|---|---|---|
| Full name | ✓ | - | ✓ (A) | - | ✓ | ✓ (#1) |
| Email | ✓ | - | ✓ (A) | - | ✓ | ✓ (#6) |
| Phone | ✓ | - | ✓ (A) | - | ✓ | ✓ (#4) |
| SSN | ✓ | - | ✓ (A, B) | ✓ | ✓ | ✓ (#7) |
| Passport / driver's licence | ✓ | - | ✓ (A) | ✓ | ✓ | ✓ (#11) |
| IP address | ✓ (Recital 30) | - | ✓ (A) | - | linkable | ✓ (#15) |
| Cookie / device ID | ✓ | - | ✓ (A) | - | linkable | ✓ (#13) |
| Birth date | linkable | - | ✓ (A) | - | linkable | ✓ (#3 - months/days) |
| Precise geolocation | ✓ | - | ✓ (G) | ✓ | ✓ | ✓ (#2 - sub-state) |
| Race / ethnicity | ✓ | ✓ | ✓ (C) | ✓ | - | - |
| Religion | ✓ | ✓ | ✓ (C) | ✓ | - | - |
| Sexual orientation | ✓ | ✓ | ✓ (C) | ✓ | - | - |
| Health condition | ✓ | ✓ (Art. 4(15)) | ✓ (B) | ✓ | ✓ | - (covered by PHI rules) |
| Genetic data | ✓ | ✓ (Art. 4(13)) | ✓ (B) | ✓ | - | - |
| Biometric (face, fingerprint) | ✓ | ✓ (Art. 4(14)) | ✓ (E) | ✓ (if uniquely identifying) | ✓ | ✓ (#16, #17) |
| Account login + password | ✓ | - | ✓ (A) | ✓ | ✓ | ✓ (#10) |
| Credit-card / IBAN | ✓ | - | ✓ (A, D) | ✓ | ✓ | ✓ (#10) |
| Medical record number | ✓ | - (covered in B) | ✓ (B) | ✓ (health subset) | ✓ | ✓ (#8) |
| Browsing history | ✓ | - | ✓ (F) | - | ✓ | ✓ (#14) |
| Purchase records | ✓ | - | ✓ (D) | - | ✓ | - |
| Inferred profile / score | ✓ | - | ✓ (K) | - | linkable | - |
"linkable" = field alone may not identify, but combined with other fields it does (NIST §2.2).
| Confusion | Reality |
|---|---|
| "PII = SSN, name, email." | These are only a subset. GDPR personal data also includes online identifiers, location, biometrics, and inferences. Use the full Art. 4(1) list. |
| "If we pseudonymise, GDPR doesn't apply." | False. Pseudonymised data remains personal data under GDPR Art. 4(5); only full anonymisation removes it from scope. |
| "CCPA only covers consumers." | CCPA "consumer" includes employees and job applicants under CPRA (Cal. Civ. Code § 1798.140(i)). |
| "HIPAA only covers hospitals." | HIPAA covers covered entities (providers, plans, clearinghouses) and business associates. Business associates inherit HIPAA obligations via BAAs. |
| "Birth date alone isn't PII." | Per NIST §2.2 it's linkable - combined with ZIP + sex it identifies ~87 % of the US population (Sweeney 2000). Treat as PII. |
| "IP address isn't personal data." | GDPR Recital 30 lists IP addresses as online identifiers. CJEU Breyer (C-582/14) confirmed dynamic IPs are personal data when linkable. |
| "CPRA SPI is the same as GDPR Art. 9." | Overlaps but isn't identical - CPRA SPI explicitly includes government IDs + financial-account + login credentials that aren't in Art. 9. Map both lists separately. |
| Anti-pattern | Why it fails | Fix |
|---|---|---|
| Single-list scoping | Only catches one regime's identifiers; leaks the others. | Use the cross-jurisdiction map above as the union scope. |
| Treating PHI as "just sensitive PII" | HIPAA Safe Harbor has 18 specific identifiers - birth date months, vehicle IDs, certificate numbers - that GDPR lists don't enumerate. | Apply HIPAA Safe Harbor when the dataset is PHI. |
| Mapping CCPA to GDPR Art. 9 only | CPRA SPI includes financial + government identifiers Art. 9 doesn't. | Apply CPRA SPI as a separate scope layer. |
| Stopping at "direct identifiers" | NIST §2.2 says linkable info is PII. Date-of-birth + ZIP + sex re-identifies most individuals. | Include linkable fields in scope. |
| Pseudonymisation = anonymisation | GDPR Art. 4(5) keeps pseudonymised data personal. | Document which masking outputs are pseudonymised (in scope) vs anonymised (out of scope). |
| Ignoring inferred profiles | CCPA category K covers inferences. A "risk score" derived from PII is itself PII. | Treat inferred / derived fields the same as their sources. |
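Several of the fixes above (union scope, CPRA SPI as a separate layer, derived fields inheriting from their sources) can be combined in one lookup. The sketch below encodes four rows of the cross-jurisdiction map, taking only the ✓ cells; the column names, the `risk_score` lineage, and the function names are hypothetical, and the lineage is assumed to be acyclic.

```python
GDPR_4, GDPR_9 = "GDPR Art. 4(1)", "GDPR Art. 9"
CCPA, SPI = "CCPA/CPRA", "CPRA SPI"
NIST, HIPAA = "NIST 800-122", "HIPAA Safe Harbor"

# Subset of the cross-jurisdiction map (only cells marked with a check).
FIELD_REGIMES = {
    "full_name": {GDPR_4, CCPA, NIST, HIPAA},
    "ssn": {GDPR_4, CCPA, SPI, NIST, HIPAA},
    "religion": {GDPR_4, GDPR_9, CCPA, SPI},
    "purchase_records": {GDPR_4, CCPA, NIST},
}

# Hypothetical lineage: derived field -> source fields (assumed acyclic).
DERIVED_FROM = {"risk_score": ["ssn", "purchase_records"]}


def regimes_for(field):
    """Regimes flagging a field; derived fields inherit from their sources."""
    if field in FIELD_REGIMES:
        return set(FIELD_REGIMES[field])
    regimes = set()
    for source in DERIVED_FROM.get(field, []):
        regimes |= regimes_for(source)
    return regimes


def union_scope(fields):
    """Union of every regime that flags at least one of the given fields."""
    scope = set()
    for field in fields:
        scope |= regimes_for(field)
    return scope


# Fields that are CPRA SPI but not GDPR Art. 9: the two lists differ.
spi_only = sorted(f for f, r in FIELD_REGIMES.items() if SPI in r and GDPR_9 not in r)

print(sorted(union_scope(["religion", "risk_score"])))
print(spi_only)  # ['ssn']
```

Fields that appear in neither table return an empty set, which is a cue for manual review rather than proof that the field is out of scope.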
A detector (presidio-pii-detection) finds patterns that look like PII; it cannot guarantee category-completeness. A reviewer must spot-check.
synthetic-pii-generator - generates fake PII for test fixtures (different scope; this reference defines what to mask in existing data).
pii-masking-pipeline-builder, presidio-pii-detection, pii-leak-critic.
npx claudepluginhub testland/qa --plugin qa-test-data-privacy