From prodsec-skills
Creates custom Semgrep rules for detecting security vulnerabilities, bug patterns, and code patterns. Includes testing and validation.
Slash command: /prodsec-skills:semgrep-rule-creator
Create production-quality Semgrep rules with proper testing and validation.
Ideal scenarios:
Do NOT use this skill for:
- … (use the static-analysis skill)

When writing Semgrep rules, reject these common shortcuts:
- Always run semgrep --test --config <rule-id>.yaml <rule-id>.<ext> to verify a rule. Untested rules have hidden false positives/negatives.

Too broad - matches everything, useless for detection:
# BAD: Matches any function call
pattern: $FUNC(...)
# GOOD: Specific dangerous function
pattern: eval(...)
Missing safe cases in tests - leads to undetected false positives:
# BAD: Only tests vulnerable case
# ruleid: my-rule
dangerous(user_input)
# GOOD: Include safe cases to verify no false positives
# ruleid: my-rule
dangerous(user_input)
# ok: my-rule
dangerous(sanitize(user_input))
# ok: my-rule
dangerous("hardcoded_safe_value")
Overly specific patterns - misses variations:
# BAD: Only matches exact format
pattern: os.system("rm " + $VAR)
# GOOD: Matches all os.system calls with taint tracking
mode: taint
pattern-sources:
- pattern: input(...)
pattern-sinks:
- pattern: os.system(...)
This workflow is strict - do not skip steps:
- … (languages: generic)
- No todook or todoruleid test annotations: todoruleid: <rule-id> and todook: <rule-id> annotations in test files (placeholders for future rule improvements) are forbidden.

This skill guides creation of Semgrep rules that detect security vulnerabilities and code patterns. Rules are created iteratively: analyze the problem, write tests first, analyze AST structure, write the rule, iterate until all tests pass, optimize the rule.
Approach selection:
Why prioritize taint mode? Pattern matching finds syntax but misses context. A pattern eval($X) matches both eval(user_input) (vulnerable) and eval("safe_literal") (safe). Taint mode tracks data flow, so it only alerts when untrusted data actually reaches the sink—dramatically reducing false positives for injection vulnerabilities.
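To make the contrast concrete, here is a deliberately tiny toy model in Python. It is a hypothetical sketch, not Semgrep's implementation: the Stmt type, the SOURCES/SANITIZERS/SINKS sets, and the one-argument statement format are assumptions made for illustration. The pattern-style check reports every eval call, while the taint-style check reports only the call whose argument came from input() without passing through sanitize().

```python
# Toy model only: a hypothetical illustration of the idea, NOT how Semgrep's
# taint engine is implemented. Each statement is "target = func(arg)".
from dataclasses import dataclass
from typing import Optional

SOURCES = {"input"}        # calls whose result is untrusted (assumed)
SANITIZERS = {"sanitize"}  # calls whose result is considered clean (assumed)
SINKS = {"eval"}           # calls that must not receive untrusted data


@dataclass
class Stmt:
    line: int
    target: Optional[str]  # variable assigned, or None for a bare call
    func: str              # function being called
    arg: str               # a variable name or a quoted literal


def pattern_findings(program: list[Stmt]) -> list[int]:
    """Like a bare `eval($X)` pattern: every sink call is reported."""
    return [s.line for s in program if s.func in SINKS]


def taint_findings(program: list[Stmt]) -> list[int]:
    """Report a sink call only if its argument carries taint."""
    tainted: set[str] = set()
    findings = []
    for s in program:
        arg_tainted = s.arg in tainted
        if s.func in SINKS and arg_tainted:
            findings.append(s.line)
        if s.target is None:
            continue
        if s.func in SOURCES or (arg_tainted and s.func not in SANITIZERS):
            tainted.add(s.target)      # new taint, or taint propagated
        else:
            tainted.discard(s.target)  # sanitized or built from clean data
    return findings


program = [
    Stmt(1, "data", "input", "'> '"),
    Stmt(2, None, "eval", "data"),         # vulnerable: tainted argument
    Stmt(3, None, "eval", "'1 + 1'"),      # safe: string literal
    Stmt(4, "clean", "sanitize", "data"),
    Stmt(5, None, "eval", "clean"),        # safe: sanitized value
]

print(pattern_findings(program))  # [2, 3, 5]
print(taint_findings(program))    # [2]
```

Semgrep's real analysis is far richer; the toy only shows why tracking data flow drops the safe-literal and sanitized findings that a purely syntactic pattern keeps.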
Iterating between approaches: It's okay to experiment. If you start with taint mode and it's not working well (e.g., taint doesn't propagate as expected, too many false positives/negatives), switch to pattern matching. Conversely, if pattern matching produces too many false positives on safe cases, try taint mode instead. The goal is a working rule—not rigid adherence to one approach.
Output structure - exactly 2 files in a directory named after the rule-id:
<rule-id>/
├── <rule-id>.yaml # Semgrep rule
└── <rule-id>.<ext> # Test file with ruleid/ok annotations
rules:
  - id: insecure-eval
    languages: [python]
    severity: HIGH
    message: User input passed to eval() allows code execution
    mode: taint
    pattern-sources:
      - pattern: request.args.get(...)
    pattern-sinks:
      - pattern: eval(...)
Test file (insecure-eval.py):
# ruleid: insecure-eval
eval(request.args.get('code'))
# ok: insecure-eval
eval("print('safe')")
Run tests (from rule directory): semgrep --test --config <rule-id>.yaml <rule-id>.<ext>
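If you create many rules, a small helper can produce the two-file layout described above. The sketch below is hypothetical: scaffold_rule, the stub contents, and the placeholder(...) pattern are illustrative assumptions, not part of Semgrep or this skill. The placeholder cases must be replaced by real vulnerable and safe code before the rule means anything.

```python
# Hypothetical helper, not part of Semgrep or of this skill: it only writes
# the two-file layout <rule-id>/<rule-id>.yaml + <rule-id>/<rule-id>.<ext>.
import re
from pathlib import Path

RULE_ID = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")  # lowercase words joined by hyphens

RULE_STUB = """rules:
  - id: {rule_id}
    languages: [{language}]
    severity: HIGH
    message: Describe the problem and how to fix it
    pattern: placeholder(...)
"""

# Placeholder cases; replace them with real vulnerable and safe code.
TEST_STUB = """{comment} ruleid: {rule_id}
placeholder(user_input)

{comment} ok: {rule_id}
other_call(user_input)
"""


def scaffold_rule(base: Path, rule_id: str, ext: str, language: str,
                  comment: str = "#") -> Path:
    """Create <base>/<rule-id>/ with a rule stub and an annotated test stub."""
    if not RULE_ID.fullmatch(rule_id):
        raise ValueError(f"rule-id must be lowercase with hyphens: {rule_id!r}")
    rule_dir = base / rule_id
    rule_dir.mkdir(parents=True, exist_ok=False)  # refuse to overwrite
    (rule_dir / f"{rule_id}.yaml").write_text(
        RULE_STUB.format(rule_id=rule_id, language=language))
    (rule_dir / f"{rule_id}.{ext}").write_text(
        TEST_STUB.format(rule_id=rule_id, comment=comment))
    return rule_dir
```

For example, scaffold_rule(Path("."), "insecure-eval", "py", "python") creates insecure-eval/insecure-eval.yaml and insecure-eval/insecure-eval.py; from there you would cd into the directory and run the semgrep --test command above.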
Detailed syntax and workflow are inlined below, taken from upstream references/quick-reference.md and references/workflow.md (see upstream Trail of Bits prodsec-skills for the companion files).
Copy this checklist and track progress:
Semgrep Rule Progress:
- [ ] Step 1: Analyze the Problem
- [ ] Step 2: Write Tests First
- [ ] Step 3: Analyze AST structure
- [ ] Step 4: Write the rule
- [ ] Step 5: Iterate until all tests pass (semgrep --test)
- [ ] Step 6: Optimize the rule (remove redundancies, re-test)
- [ ] Step 7: Final Run
Full step descriptions are in the "Inlined: workflow" section below.
REQUIRED: Before writing any rule, read all of these Semgrep documentation sources (fetch or open the URLs in a browser / via HTTP):
Inlined: quick reference (from references/quick-reference.md)

rules:
  - id: rule-id            # Unique identifier (lowercase, hyphens)
    languages: [python]    # Target language(s)
    severity: HIGH         # LOW, MEDIUM, HIGH, CRITICAL (ERROR/WARNING/INFO are legacy)
    message: Description   # Shown when rule matches
    pattern: code(...)     # OR use patterns/pattern-either/mode:taint
# 'pattern' is the basic unit of matching
pattern: foo(...)
# 'patterns' forms a logical AND - all must match
patterns:
- pattern: $X
- pattern-not: safe($X)
# 'pattern-either' forms a logical OR - any can match
pattern-either:
- pattern: foo(...)
- pattern: bar(...)
# 'pattern-regex' performs PCRE2 regex matching (multiline mode)
pattern-regex: ^foo.*bar$
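The effect of multiline mode can be previewed with Python's re module and re.MULTILINE. This is an analogy rather than Semgrep itself: Python's regex engine is not PCRE2, and the sketch assumes . does not match newlines (the default in both engines). With multiline mode, ^foo.*bar$ matches a whole line that starts with foo and ends with bar; without it, ^ only anchors at the start of the file.

```python
# Python's `re` is not PCRE2, but for this pattern the effect of multiline
# mode is the same: ^ and $ anchor at every line, and . does not cross newlines.
import re

source = "foo = 1\nfoo(x).bar\nx = foo; y = bar\n"

multiline = re.compile(r"^foo.*bar$", re.MULTILINE)
single = re.compile(r"^foo.*bar$")

print([m.group(0) for m in multiline.finditer(source)])  # ['foo(x).bar']
print(single.search(source))                             # None
```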
- $VAR - Metavariable, match a single expression
  - Names must be uppercase: $X, $FUNC, $VAR_1 (NOT $x, $var)
- $_ - Anonymous metavariable, matches but doesn't bind
- $...VAR - Ellipsis metavariable, match zero or more arguments
- ... - Ellipsis, match anything in between statements or expressions
- <... [pattern] ...> - Deep expression operator, match nested expression

Constrain metavariables to specific types (reduces false positives):
# C/C++ - match only int16_t parameters
pattern: (int16_t $X)
# C/C++ - match function with typed parameter
pattern: some_func((int $ARG))
# Java - match Logger type
pattern: (java.util.logging.Logger $LOGGER).log(...)
# Go - match pointer type (uses colon syntax)
pattern: ($READER : *zip.Reader).Open($INPUT)
# TypeScript - match specific type
pattern: ($X: DomSanitizer).sanitize(...)
# Use in taint mode to track only specific types as sources:
pattern-sources:
- pattern: (int $X) # Only int parameters are taint sources
- pattern: (int16_t $X) # Only int16_t parameters
- pattern: int $X = $INIT; # Local variable declarations
pattern-inside: |        # Must be inside this pattern
  def $FUNC(...):
    ...
pattern-not-inside: |    # Must NOT be inside this pattern
  with $CTX:
    ...
pattern-not: safe(...) # Exclude this pattern
pattern-not-regex: ^test_ # Exclude by regex
metavariable-regex:
  metavariable: $FUNC
  regex: (unsafe|dangerous).*
metavariable-pattern:
  metavariable: $ARG
  pattern: request.$X
metavariable-comparison:
  metavariable: $NUM
  comparison: $NUM > 1024
# In pattern matching mode: report finding on this metavariable only
focus-metavariable: $TARGET
# In taint mode: constrain where taint flows in sources, sinks, and sanitizers
pattern-sources:
  - patterns:
      - pattern: mutate_argument(&$REF_VAR)
      - focus-metavariable: $REF_VAR
    by-side-effect: only
rules:
  - id: taint-rule
    mode: taint
    languages: [python]
    severity: HIGH
    message: Tainted data reaches sink
    pattern-sources:
      - pattern: user_input()
      - pattern: request.args.get(...)
    pattern-sinks:
      - pattern: eval(...)
      - pattern: os.system(...)
    pattern-sanitizers:   # Optional
      - pattern: sanitize(...)
      - pattern: escape(...)
pattern-sources:
  - pattern: source(...)
    exact: true           # Only exact match is source (default: false)
    by-side-effect: true  # Taints by side effect (also accepts: only)
pattern-sanitizers:
  - pattern: sanitize($X)
    exact: true           # Only exact match (default: false)
    by-side-effect: true  # Sanitizes by side effect
pattern-sinks:
  - pattern: sink(...)
    exact: false          # Subexpressions are also sinks (default: true)
Only allowed annotations are ruleid: rule-id and ok: rule-id.
# ruleid: rule-id
vulnerable_code() # This line MUST match
# ok: rule-id
safe_code() # This line MUST NOT match
DO NOT use multi-line comments for test annotations, for example:
/* ruleid: ... */
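Because annotation mistakes silently break tests, a quick pre-check can catch them before running semgrep --test. The sketch below is a hypothetical helper for files that use # comments (for example Python); it is not part of Semgrep, and its regexes are assumptions that may also flag ordinary comments that happen to contain "ok:". It reports forbidden todoruleid/todook annotations, annotations that share a line with code or use /* ... */ comments, a mismatched rule id, and annotations not immediately followed by a code line.

```python
# Hypothetical pre-check (not a Semgrep feature) for test files that use '#'
# comments, such as Python. It flags the annotation mistakes described above.
import re

# Any comment that looks like an annotation, including forbidden todo variants.
ANNOTATION = re.compile(r"(#|//|/\*).*?\b(todoruleid|todook|ruleid|ok):")
# The only accepted form: the annotation alone on its own '#' line.
ALLOWED = re.compile(r"\s*# (ruleid|ok): ([\w-]+)\s*")


def lint_annotations(text: str, rule_id: str) -> list[str]:
    problems = []
    lines = text.splitlines()
    for lineno, line in enumerate(lines, start=1):
        found = ANNOTATION.search(line)
        if found is None:
            continue
        kind = found.group(2)
        if kind.startswith("todo"):
            problems.append(f"line {lineno}: {kind} annotations are forbidden")
            continue
        m = ALLOWED.fullmatch(line)
        if m is None:
            problems.append(f"line {lineno}: annotation must be alone on a '# {kind}:' line")
            continue
        if m.group(2) != rule_id:
            problems.append(f"line {lineno}: expected rule id {rule_id!r}, got {m.group(2)!r}")
        # lines is 0-based, so lines[lineno] is the line right after this one
        following = lines[lineno] if lineno < len(lines) else ""
        if not following.strip() or following.lstrip().startswith("#"):
            problems.append(f"line {lineno}: annotation must be immediately followed by code")
    return problems
```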
# Test rules
semgrep --test --config <rule-id>.yaml <rule-id>.<ext>
# Validate YAML syntax
semgrep --validate --config <rule-id>.yaml
# Run with dataflow traces (for taint mode rules)
semgrep --dataflow-traces --config <rule-id>.yaml <rule-id>.<ext>
# Dump AST to understand code structure
semgrep --dump-ast --lang <language> <rule-id>.<ext>
# Run single rule
semgrep --config <rule-id>.yaml <rule-id>.<ext>
# Run single pattern
semgrep --lang <language> --pattern <pattern> <rule-id>.<ext>
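Common pitfalls and debugging tips:
- ruleid: must be on the line IMMEDIATELY BEFORE the finding. No other text or code on that line.
- Avoid pattern: $X without constraints.
- Validate YAML syntax with semgrep --validate.
- Check the AST: semgrep --dump-ast --lang <language> <rule-id>.<ext>
- If taint is not flowing, use --dataflow-traces to see the flow.
- If there are too many matches, add pattern-not for safe cases, use pattern-inside to limit scope, or use metavariable-regex to filter.

Inlined: workflow (from references/workflow.md)

Detailed workflow for creating production-quality Semgrep rules.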
Before writing any code:
Taint mode is a powerful feature in Semgrep that can track the flow of data from one location to another. By using taint mode, you can:
Why test-first? Writing tests before the rule forces you to think about both vulnerable AND safe cases. Rules written without tests often have hidden false positives (matching safe cases) or false negatives (missing vulnerable variants). Tests make these visible immediately.
Create directory and test file with annotations (# ruleid:, # ok: only). See quick reference above for full syntax.
<rule-id>/
├── <rule-id>.yaml # Semgrep rule
└── <rule-id>.<ext> # Test file with ruleid/ok annotations
CRITICAL:
- The annotation (# ruleid: or # ok:) must be on the line IMMEDIATELY BEFORE the code. Semgrep reports findings on the line after the annotation.
- The annotation line must contain only the annotation (e.g., # ruleid: my-rule). No other text, comments, or code on the same line.

You must include test cases for:
Why analyze AST? Semgrep matches against the AST, not raw text. Code that looks similar may parse differently (e.g., foo.bar() vs foo().bar). The AST dump shows exactly what Semgrep sees, preventing patterns that fail due to unexpected tree structure. Understanding how exactly Semgrep parses code is crucial for writing precise patterns.
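Python's built-in ast module gives a quick, if imperfect, illustration of the foo.bar() vs foo().bar point. Note the assumption: CPython's AST is not the tree Semgrep builds, so semgrep --dump-ast remains the tool to use; this sketch only shows that the two snippets have different shapes.

```python
# Uses CPython's own `ast` module, whose tree is NOT Semgrep's AST; it only
# illustrates that similar-looking code can have different tree shapes.
import ast


def root(src: str) -> ast.expr:
    """Parse a single expression and return its top-level node."""
    return ast.parse(src, mode="eval").body


a = root("foo.bar()")   # a call whose callee is the attribute foo.bar
b = root("foo().bar")   # an attribute read on the result of foo()

print(type(a).__name__, type(a.func).__name__)   # Call Attribute
print(type(b).__name__, type(b.value).__name__)  # Attribute Call
```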
semgrep --dump-ast --lang <language> <rule-id>.<ext>
The example output helps you understand:
Choose the appropriate pattern operators and write the rule.
For pattern operator syntax (basic matching, scope operators, metavariable filters, focus), see Inlined: quick reference above.
semgrep --validate --config <rule-id>.yaml
cd <rule-directory>
semgrep --test --config <rule-id>.yaml <rule-id>.<ext>
1/1: ✓ All tests passed
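When scripting the iteration loop, a pass/fail signal can be more convenient than reading the output by eye. The wrapper below is a hypothetical sketch: it only runs the semgrep --test command shown above via subprocess and, as an assumption about how to judge success, requires both a zero exit status and the "All tests passed" summary in the output.

```python
# Hypothetical wrapper: it just shells out to the `semgrep` CLI with the
# exact command shown above and inspects the result; no Semgrep API is used.
import shutil
import subprocess
from pathlib import Path


def semgrep_test_cmd(rule_id: str, ext: str) -> list[str]:
    """Build: semgrep --test --config <rule-id>.yaml <rule-id>.<ext>"""
    return ["semgrep", "--test", "--config", f"{rule_id}.yaml", f"{rule_id}.{ext}"]


def rule_tests_pass(rule_dir: Path, rule_id: str, ext: str) -> bool:
    """Run the tests from the rule directory; require a clean summary."""
    if shutil.which("semgrep") is None:
        raise RuntimeError("semgrep is not installed or not on PATH")
    result = subprocess.run(semgrep_test_cmd(rule_id, ext), cwd=rule_dir,
                            capture_output=True, text=True, check=False)
    output = result.stdout + result.stderr
    # Require both a zero exit status and the "All tests passed" summary.
    return result.returncode == 0 and "All tests passed" in output
```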
If tests fail, check:
- … add a pattern-not exclusion

For taint mode rules, inspect the data flow:
semgrep --dataflow-traces --config <rule-id>.yaml <rule-id>.<ext>
Shows:
Refine the rule's patterns iteratively until the rule passes all tests with no missed or incorrect lines.
After every change, re-run the tests:
semgrep --test --config <rule-id>.yaml <rule-id>.<ext>
For debugging taint mode rules:
semgrep --dataflow-traces --config <rule-id>.yaml <rule-id>.<ext>
Verification checkpoint: Output MUST show "All tests passed". Only proceed when validation passes.
Verification checkpoint: Proceed to Step 6: Optimize the Rule when:
| Problem | Solution |
|---|---|
| Too many matches | Add pattern-not exclusions |
| Missing matches | Add pattern-either variants |
| Wrong line matched | Adjust focus-metavariable |
| Taint not flowing | Check sanitizers aren't too broad |
| Taint false positive | Add sanitizer pattern |
After all tests pass, remove redundant patterns: quote variants, ellipsis subsets, and alternatives that a single metavariable can consolidate.
Semgrep treats certain patterns as equivalent:
| Written | Also Matches | Reason |
|---|---|---|
| "string" | 'string' | Quote style normalized (in languages where both are equivalent) |
| func(...) | func(), func(a), func(a,b) | Ellipsis matches zero or more |
| func($X, ...) | func($X), func($X, a, b) | Trailing ellipsis is optional |
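The ellipsis rows of the table can be checked with a toy matcher over argument lists. This is a hypothetical illustration, not Semgrep's matching engine: "..." matches zero or more arguments, a $-prefixed name matches exactly one, and a repeated metavariable must bind the same value each time.

```python
# Toy matcher for argument lists only (hypothetical, not Semgrep's engine):
# "..." matches zero or more arguments, "$X" exactly one, anything else literally.
from typing import Optional


def match_args(pattern: list[str], args: list[str],
               bound: Optional[dict[str, str]] = None) -> bool:
    bound = {} if bound is None else bound
    if not pattern:
        return not args                      # both lists fully consumed
    head, rest = pattern[0], pattern[1:]
    if head == "...":                        # try consuming 0, 1, 2, ... args
        return any(match_args(rest, args[k:], bound) for k in range(len(args) + 1))
    if not args:
        return False
    if head.startswith("$"):                 # metavariable: bind or compare
        if head in bound and bound[head] != args[0]:
            return False
        return match_args(rest, args[1:], {**bound, head: args[0]})
    return head == args[0] and match_args(rest, args[1:], bound)


# func(...) matches func(), func(a), func(a, b)
print([match_args(["..."], a) for a in ([], ["a"], ["a", "b"])])             # [True, True, True]
# func($X, ...) matches func(a) and func(a, b, c) but not func()
print([match_args(["$X", "..."], a) for a in ([], ["a"], ["a", "b", "c"])])  # [False, True, True]
# A repeated metavariable must bind the same value each time
print(match_args(["$X", "$X"], ["a", "a"]), match_args(["$X", "$X"], ["a", "b"]))  # True False
```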
1. Quote Variants (depends on the language)
Before:
pattern-either:
- pattern: hashlib.new("md5", ...)
- pattern: hashlib.new('md5', ...)
After:
pattern-either:
- pattern: hashlib.new("md5", ...)
2. Ellipsis Subsets
Before:
pattern-either:
- pattern: dangerous($X, ...)
- pattern: dangerous($X)
- pattern: dangerous($X, $Y)
After:
pattern: dangerous($X, ...)
3. Consolidate with Metavariables
Before:
pattern-either:
- pattern: md5($X)
- pattern: sha1($X)
- pattern: sha256($X)
After:
patterns:
  - pattern: $FUNC($X)
  - metavariable-regex:
      metavariable: $FUNC
      regex: ^(md5|sha1|sha256)$
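The explicit ^ and $ in the consolidated regex matter. The Python re sketch below is an illustration only and does not reproduce Semgrep's own regex handling; it shows what an alternation without anchors would also accept: a left-anchored match lets md5sum through, and an unanchored search additionally lets hmac_sha256 through.

```python
# Illustration with Python's `re` (an assumption: Semgrep's own implicit
# anchoring for metavariable-regex is not modeled here). Explicit ^...$
# anchors make the intent unambiguous whichever way an engine anchors.
import re

names = ["md5", "sha1", "sha256", "sha512", "hmac_sha256", "md5sum"]

full = re.compile(r"^(md5|sha1|sha256)$")
bare = re.compile(r"(md5|sha1|sha256)")

print([n for n in names if full.search(n)])  # ['md5', 'sha1', 'sha256']
print([n for n in names if bare.match(n)])   # ['md5', 'sha1', 'sha256', 'md5sum']
print([n for n in names if bare.search(n)])  # ['md5', 'sha1', 'sha256', 'hmac_sha256', 'md5sum']
```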
… ... patterns

semgrep --test --config <rule-id>.yaml <rule-id>.<ext>
CRITICAL: Always re-run tests after optimization. Some "redundant" patterns may actually be necessary due to AST structure differences. If any test fails, revert the optimization that caused it.
Task complete ONLY when: All tests pass after optimization.
Run the Semgrep rule you created using: semgrep --config <rule-id>.yaml <rule-id>.<ext>.
Ensure that message:
Fix any message issues and re-run that Semgrep rule after each fix.