From cybersecurity-skills
Helps build, run, or improve a Security Operations Center, including alert triage, runbook authoring, escalation criteria, on-call structure, and SOC metrics (MTTD/MTTR). Invoked when the user mentions SOC, security operations, runbooks, escalation, or SOC staffing.
Slash command: /cybersecurity-skills:soc-operations
The operations layer above `siem-detection` (engineering rules) and `incident-triage` (response). This skill is about the *people and process* of running 24/7 alert triage — alert prioritization, runbook authoring, escalation, on-call hygiene, MTTD / MTTR, and the slow drift toward alert fatigue that kills SOCs.
Three modes: Build, Run, and Improve.
Cross-references: siem-detection (the rules that feed alerts to the SOC), incident-triage (the playbook for confirmed incidents), threat-hunting (proactive work between alert triage), breach-patterns (what attacks the SOC should be ready for).
| Model | When it fits | Pitfalls |
|---|---|---|
| Fully in-house | Mature security org, ≥ 5 dedicated analysts, regulated industry | Hard to staff 24/7 with under 8 people; on-call burnout |
| Fully outsourced (MSSP) | Smaller orgs, regulated requirements without internal staffing | MSSP context drift — they don't know your business; alert tuning slow |
| Hybrid (MSSP Tier 1, in-house Tier 2+) | Most common for growth-stage companies | Handoff complexity, "MSSP filtered it but didn't tell us" gaps |
For 24/7 in-house coverage, the math:
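As a rough sketch of that arithmetic: one seat staffed around the clock is 168 hours a week, which is then divided by the hours one analyst can realistically cover. The 40-hour week and the 20% allowance for leave, sickness, and training are assumptions made for this example, not figures from this document, and `analysts_needed` is a hypothetical helper.

```python
import math

HOURS_PER_WEEK = 24 * 7  # 168 hours of coverage per seat


def analysts_needed(seats=1, weekly_hours=40.0, unavailable_fraction=0.20):
    """Estimate headcount to keep `seats` analysts on shift around the clock.

    weekly_hours and unavailable_fraction (leave, sickness, training) are
    assumed inputs; replace them with your own contract and absence data.
    """
    productive_hours = weekly_hours * (1.0 - unavailable_fraction)
    return math.ceil(seats * HOURS_PER_WEEK / productive_hours)
```

Treat the result as a lower bound: it leaves no room for shift overlap, handoff time, or attrition.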
Standard tier model (adjust to taste):
siem-detection), threat hunting (see threat-hunting), incident response leadership (see incident-triage). Senior, expensive, the ones building the SOC's capability.
For small SOCs (≤ 4 analysts), the tiers collapse: every analyst does T1+T2 work, and T3 is part-time or contracted. Don't pretend you have tiers if you don't.
When ten alerts land in five minutes, what's the order?
The default ranking (tweak per environment):
Critical and High should never wait. Medium triages within shift (≤ 8 hours). Low and Info batched.
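A minimal sketch of a queue ordering built on those severity names. Sorting by severity and then oldest-first is an assumption made for this example, not the ranking from this document, and the `Alert` fields are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Severity names come from the text; lower rank is handled first.
SEVERITY_RANK = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3, "Info": 4}


@dataclass
class Alert:
    rule: str
    severity: str
    fired_at: datetime


def triage_order(alerts):
    """Sort by severity, then oldest first within the same severity."""
    return sorted(alerts, key=lambda a: (SEVERITY_RANK[a.severity], a.fired_at))


def needs_immediate_attention(alert):
    """Critical and High should never wait in the queue."""
    return alert.severity in ("Critical", "High")


def medium_overdue(alert, now, shift=timedelta(hours=8)):
    """Medium alerts should be triaged within the shift (8 hours)."""
    return alert.severity == "Medium" and now - alert.fired_at > shift
```

Environment-specific factors such as asset criticality would be added to the sort key tuple ahead of the timestamp.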
Every alert that fires more than 2-3 times needs a runbook. Without one, every analyst re-derives the response and quality varies.
Runbook structure:
# Runbook: [Alert name]
## Trigger: [exact SIEM rule / detection name]
## Owner: [team / person]
## Last reviewed: [date]
## Quick reference
- **What the alert means in plain English:** [one sentence]
- **Common false positives:** [list, with how to recognize]
- **Common true positives:** [list, with what to look for next]
## Triage steps
1. [Step 1 — specific query / action]
2. [Step 2]
3. [Decision point — true positive / false positive / escalate]
## False positive handling
- [How to close + what to document]
- [Whether to add to suppression list]
## True positive handling
- [Immediate containment if any]
- [Escalation to whom, with what info]
- [Link to incident-triage skill for full IR]
## Common pivots
- [Other data sources to check]
- [Other systems likely affected]
Runbooks live in version control or a versioned wiki — not in Slack DMs, not in individual analyst notes.
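Once runbooks are in version control, the structure above can be checked automatically. The sketch below assumes each runbook is a markdown file using the `##` headings from the template; `missing_runbook_parts` is a hypothetical helper, not part of any existing tool.

```python
import re

# Section headings taken from the runbook structure above.
REQUIRED_SECTIONS = [
    "Quick reference",
    "Triage steps",
    "False positive handling",
    "True positive handling",
    "Common pivots",
]
# Metadata headings, written as "## Owner: SOC team" in the template.
REQUIRED_FIELDS = ["Trigger", "Owner", "Last reviewed"]


def missing_runbook_parts(text):
    """Return the template headings that a runbook's markdown text lacks."""
    headings = {
        match.group(1).strip()
        for match in re.finditer(r"^##[ \t]+(.+)$", text, flags=re.MULTILINE)
    }
    missing = [s for s in REQUIRED_SECTIONS if s not in headings]
    for field in REQUIRED_FIELDS:
        # Metadata carries a value after the colon, so match on the prefix.
        if not any(h.startswith(field + ":") for h in headings):
            missing.append(field)
    return missing
```

Run as a pre-merge check, a non-empty return value would block a runbook that skips, say, its false-positive handling section.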
Escalation rules should be explicit, not "use your judgment." Judgment lives at Tier 3+; lower tiers need rules.
| Condition | Escalate to | How quickly |
|---|---|---|
| Active data egress observed | Tier 2 / on-call | Immediately |
| Privileged account behavior anomaly | Tier 2 | This shift |
| Multiple correlated alerts on one host | Tier 2 | This shift |
| Detection rule firing > 50× / hour with high TP rate | Tier 3 (engineering) | Next business day |
| Detection rule firing > 50× / hour with 100% FP | Tier 3 (engineering) | Next business day |
| Anything Tier 1 doesn't know how to handle | Tier 2 | After 30 minutes of triage |
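Explicit rules like these can be kept as data so that tooling and analysts read the same source. This is a sketch only: the targets and deadlines are copied from the table, while the dictionary keys and function names are hypothetical identifiers invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Escalation:
    target: str
    deadline: str


# Rows copied from the escalation table; keys are hypothetical labels.
ESCALATION_RULES = {
    "active_data_egress": Escalation("Tier 2 / on-call", "Immediately"),
    "privileged_account_anomaly": Escalation("Tier 2", "This shift"),
    "correlated_alerts_one_host": Escalation("Tier 2", "This shift"),
    "noisy_rule": Escalation("Tier 3 (engineering)", "Next business day"),
}

# Catch-all row: anything Tier 1 cannot classify goes to Tier 2.
DEFAULT = Escalation("Tier 2", "After 30 minutes of triage")


def escalate(condition):
    """Look up who to escalate to; unknown conditions fall back to Tier 2."""
    return ESCALATION_RULES.get(condition, DEFAULT)


def is_noisy(fires_per_hour):
    """The table's volume threshold: more than 50 firings per hour."""
    return fires_per_hour > 50
```

The fallback mirrors the last row of the table, so a Tier 1 analyst never ends up with a condition that has no owner.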
The single most preventable cause of breaches detected days late is "the night shift had something interesting and the morning shift never heard about it."
Handoff checklist (5–10 minutes per shift):
Write it down. Slack handoff channel, daily summary doc, ticketing system shift report — any format works as long as it's persistent and searchable.
| Metric | Definition | Target |
|---|---|---|
| MTTD | Mean Time To Detect — from event to alert firing | < 5 min for high-confidence rules; < 1 hr for behavioral |
| MTTR | Mean Time To Respond — from alert to triage decision | Tier 1: < 15 min for Critical; < 4 hr for Medium |
| MTTC | Mean Time To Contain — from confirmed incident to spread halted | < 1 hr for confirmed compromise (industry P50 is ~6 days; aspire higher) |
| TP rate per rule | True positives / total alerts | > 30% for any rule; tune or kill rules below |
| Alert volume per analyst per shift | Alerts per Tier 1 analyst per 8-hour shift | < 25 — above that, fatigue dominates |
| Coverage by ATT&CK tactic | Tactics with at least one detection rule | Aim for 100% Initial Access, Execution, Persistence, Defense Evasion, Credential Access |
| Runbook coverage | % of alerts that have runbooks | > 80% of alert volume |
| Time to runbook | New alert-type to runbook delivered | < 5 alerts of the new type |
Don't measure things you can't act on. "Number of alerts processed" is a vanity metric — it goes up with noisier rules.
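The first two definitions in the table, plus TP rate, reduce to simple arithmetic over alert timestamps. A minimal sketch, assuming each alert record carries the three timestamps below; the `AlertRecord` field names are hypothetical and would map to whatever the SIEM or ticketing export provides.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean


@dataclass
class AlertRecord:
    event_at: datetime    # when the underlying activity happened
    fired_at: datetime    # when the alert fired
    decided_at: datetime  # when the analyst reached a triage decision
    true_positive: bool


def mttd_minutes(records):
    """Mean Time To Detect: event to alert firing, in minutes."""
    return mean((r.fired_at - r.event_at).total_seconds() for r in records) / 60


def mttr_minutes(records):
    """Mean Time To Respond: alert firing to triage decision, in minutes."""
    return mean((r.decided_at - r.fired_at).total_seconds() for r in records) / 60


def tp_rate(records):
    """True positives divided by total alerts."""
    return sum(r.true_positive for r in records) / len(records)
```

Computing these per rule and per severity, rather than across the whole queue, is what lets them be compared against the targets in the table.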
The dominant SOC failure mode is alert fatigue from un-tuned rules. The loop that prevents it:
See siem-detection's tuning section for the engineering side; this is the operations side.
Symptoms your SOC is suffering:
When you see these: full-stop the new-rule pipeline and spend a cycle on tuning. Adding more rules to an over-firing SOC makes things worse.
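One way to decide what to tune during that cycle is to rank rules against the thresholds from the metrics table (TP rate of 30%, 25 alerts per Tier 1 analyst per shift). This is a sketch under assumptions: the input shape and both function names are hypothetical, and ranking by volume is a choice made for the example.

```python
from collections import Counter

MIN_TP_RATE = 0.30         # metrics table: tune or kill rules below this
MAX_ALERTS_PER_SHIFT = 25  # metrics table: above this, fatigue dominates


def tuning_candidates(outcomes):
    """Return rules below the TP-rate floor, noisiest first.

    outcomes: iterable of (rule_name, was_true_positive) pairs.
    """
    totals, hits = Counter(), Counter()
    for rule, was_tp in outcomes:
        totals[rule] += 1
        hits[rule] += int(was_tp)
    below = [r for r in totals if hits[r] / totals[r] < MIN_TP_RATE]
    # Highest volume first; rule name breaks ties for a stable order.
    return sorted(below, key=lambda r: (-totals[r], r))


def over_fatigue_threshold(alerts_in_shift, tier1_analysts):
    """True when per-analyst alert volume exceeds the table's limit."""
    return alerts_in_shift / tier1_analysts > MAX_ALERTS_PER_SHIFT
```

Working from the top of the list targets the rules that cost analysts the most time for the least signal.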
# SOC Assessment / Build Plan
## Organization: [name]
## Mode: Build / Run / Improve
## Date: [date]
### Current state (or proposed)
- Coverage hours, staffing model
- Tier structure
- Tools in use
- Alert volume / day
- Key metrics — MTTD, MTTR, TP rate distribution
### Findings (Improve mode) / Gaps (Build mode)
| Category | Issue / Gap | Severity |
|----------|-------------|----------|
### Recommendations
| Priority | Item | Owner | Timeline |
|----------|------|-------|----------|
### KPI dashboard proposal
[What to measure, how to source it, target thresholds]
### Runbook backlog
[Alert types that lack runbooks, ranked by volume]
incident-triage: don't continue planning work mid-fire
attack.mitre.org/resources/training
npx claudepluginhub briiirussell/cybersecurity-skills --plugin cybersecurity-skills