From dstoic
Detects and anonymizes PII (SSNs, cards, emails, phones, names) and business data (companies, revenue, costs, pricing) in files using Scrubadub and spaCy NER. Supports a check-only mode or five anonymization strategies (mask/hash/pseudo/token/mixed); GDPR/HIPAA aware.
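To show what check-only detection produces, here is a minimal sketch that finds four of the structured PII types with regular expressions. It also uses a Luhn checksum to reject digit runs that cannot be card numbers. This is an assumption-laden stand-in, not the skill's detector: the skill uses Scrubadub and spaCy NER. The patterns, the `Finding` class and `detect()` are hypothetical. Names and business data are not covered because they need NER, not regexes.

```python
import re
from dataclasses import dataclass

# Hypothetical regex patterns: a simplified stand-in for the skill's
# Scrubadub + spaCy NER detectors. Names and business data are NOT covered.
PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\(?\b\d{3}\)?[-. ]\d{3}[-. ]\d{4}\b"),
    "card": re.compile(r"\b(?:\d[ -]?){12,18}\d\b"),  # 13-19 digits, optional separators
}


@dataclass
class Finding:
    kind: str
    text: str
    start: int
    end: int


def luhn_valid(number: str) -> bool:
    """Luhn checksum over the digits only (spaces and dashes ignored)."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def detect(text: str) -> list[Finding]:
    """Return findings sorted by position (check-only: the text is not modified)."""
    findings = []
    for kind, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            # Drop digit runs that fail the Luhn checksum (not valid card numbers).
            if kind == "card" and not luhn_valid(m.group()):
                continue
            findings.append(Finding(kind, m.group(), m.start(), m.end()))
    return sorted(findings, key=lambda f: f.start)


if __name__ == "__main__":
    sample = "Contact jane.doe@example.com or 555-123-4567. SSN 123-45-6789. Card 4111 1111 1111 1111."
    for f in detect(sample):
        print(f.kind, f.text)
```

A real check-only pass would also report entity types found by NER, such as person and organization names, which regexes cannot reliably catch.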
Slash command: /dstoic:anonymize-doc
Model: sonnet
This skill is limited to the following tools:
Detect and anonymize PII + sensitive business data using ML-powered detection (Scrubadub + spaCy NER).
SKILL_DIR="$(dirname "$(realpath "$0")" 2>/dev/null || echo /home/mat/dev/agent-skills/dstoic/skills/anonymize-doc)"
pip install -q -r "$SKILL_DIR/requirements.txt" && python -m spacy download -q en_core_web_sm 2>/dev/null
Question: "Check for PII/business data (detection only) or anonymize?"
Detection only: python "$SKILL_DIR/scripts/detect.py" <file_path>
To anonymize, ask for a strategy:
| Strategy | Use Case | Reversible | GDPR |
|---|---|---|---|
| mask | Max privacy, redaction | No | ✅ Full |
| hash | Analytics, tracking | No | ✅ Full |
| pseudo | Demos, case studies | Yes | ⚠️ Partial |
| token | Financial, vault-backed | Yes | ⚠️ Partial |
| mixed | Complex docs (auto per severity) | Mixed | ⚠️ Partial |
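The table's strategies differ mainly in what happens to the original value. The minimal sketch below illustrates each one under stated assumptions; it is not the code in anonymize.py. The HMAC key, the label formats and the `tok_` prefix are all hypothetical. So are the helpers `apply()` and `make_mixed()` and the `SEVERITY` mapping. The real severity tiers used by `mixed` are documented in reference.md.

```python
import hashlib
import hmac
import secrets

# Hypothetical severity tiers for "mixed"; the skill's real tiers are in reference.md.
SEVERITY = {"ssn": "high", "card": "high", "email": "medium", "phone": "medium", "name": "low"}


def mask(value: str, kind: str) -> str:
    """mask: replace with a type label; the original value is discarded."""
    return f"[{kind.upper()}]"


def hash_value(value: str, kind: str, key: bytes = b"demo-key") -> str:
    """hash: keyed SHA-256 (HMAC); equal inputs map to equal outputs, so counts/joins still work.
    The default key is for illustration only; real use needs a secret key."""
    digest = hmac.new(key, value.encode(), hashlib.sha256).hexdigest()
    return f"{kind}_{digest[:12]}"


class Pseudonymizer:
    """pseudo: consistent readable stand-ins (Name_1, Name_2); reversible via the mapping."""

    def __init__(self) -> None:
        self.mapping: dict[str, str] = {}
        self.counts: dict[str, int] = {}

    def __call__(self, value: str, kind: str) -> str:
        if value not in self.mapping:
            self.counts[kind] = self.counts.get(kind, 0) + 1
            self.mapping[value] = f"{kind.capitalize()}_{self.counts[kind]}"
        return self.mapping[value]


class TokenVault:
    """token: random opaque tokens; the originals live only in the vault."""

    def __init__(self) -> None:
        self._by_token: dict[str, str] = {}
        self._by_value: dict[str, str] = {}

    def __call__(self, value: str, kind: str) -> str:
        if value not in self._by_value:
            token = f"tok_{secrets.token_hex(8)}"
            self._by_value[value] = token
            self._by_token[token] = value
        return self._by_value[value]

    def reveal(self, token: str) -> str:
        """Reverse a token (only possible with access to the vault)."""
        return self._by_token[token]


def make_mixed(pseudo: Pseudonymizer):
    """mixed: pick a strategy per entity severity (hypothetical tier-to-strategy mapping)."""
    def mixed(value: str, kind: str) -> str:
        tier = SEVERITY.get(kind, "high")  # unknown types are treated as high (safest)
        if tier == "high":
            return mask(value, kind)
        if tier == "medium":
            return hash_value(value, kind)
        return pseudo(value, kind)
    return mixed


def apply(text: str, spans: list[tuple[int, int, str]], replace) -> str:
    """Replace (start, end, kind) spans right-to-left so earlier offsets stay valid."""
    for start, end, kind in sorted(spans, reverse=True):
        text = text[:start] + replace(text[start:end], kind) + text[end:]
    return text


if __name__ == "__main__":
    text = "Jane called 555-123-4567 about SSN 123-45-6789."
    spans = [(0, 4, "name"), (12, 24, "phone"), (35, 46, "ssn")]
    print(apply(text, spans, mask))                     # mask everything
    print(apply(text, spans, make_mixed(Pseudonymizer())))  # severity-based mix
```

Only `pseudo` and `token` keep a way back to the original, which matches the Reversible column. The cost is that the mapping or vault must then be protected as carefully as the source data.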
Run: python "$SKILL_DIR/scripts/anonymize.py" <file_path> --strategy <choice>
Outputs: <file>-anonymized.<ext> + <file>-audit-log.json
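The output naming can be sketched as follows, assuming `<file>` means the input's name without its extension. That reading is an assumption, and the real scripts may differ. The audit record layout is entirely hypothetical, because the actual schema of the audit log is not given here. This sketch records only counts per entity type, not the original values.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def output_paths(src: str) -> tuple[Path, Path]:
    """Derive <file>-anonymized.<ext> and <file>-audit-log.json next to the input."""
    p = Path(src)
    anonymized = p.with_name(f"{p.stem}-anonymized{p.suffix}")
    audit_log = p.with_name(f"{p.stem}-audit-log.json")
    return anonymized, audit_log


def build_audit(src: str, strategy: str, counts: dict[str, int]) -> dict:
    """Hypothetical audit record: entity counts per type, no original values."""
    return {
        "source": src,
        "strategy": strategy,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "counts": counts,
        "total": sum(counts.values()),
    }


if __name__ == "__main__":
    anonymized, audit_log = output_paths("docs/report.md")
    record = build_audit("docs/report.md", "mask", {"ssn": 2, "email": 1})
    print(anonymized, audit_log)
    print(json.dumps(record, indent=2))
```

With this naming, every audit log ends in `-audit-log.json`, so a single `*-audit-log.json` ignore pattern covers all of them.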
.gitignore includes *-audit-log.json, so audit logs stay out of version control.
See reference.md for entity types, severity tiers, and compliance details.
See examples.md for before/after samples.
Install: npx claudepluginhub digital-stoic-org/agent-skills --plugin dstoic