From prodsec-skills
Enforces rate limiting at API gateways to protect AI models from extraction attacks. Use when designing, building, or reviewing API gateways for inference or LLM endpoints.
Slash command: /prodsec-skills:rate-limiting
API gateways protecting AI models SHOULD implement rate limiting. This is a critical defense against attacks that require sending large volumes of requests to the models.
| Attack | Description |
|---|---|
| Model data extraction | Attempting to extract sensitive information the model learned during training |
| Training data extraction | Reconstructing training data from model responses |
| Token extraction | Stealing API tokens or credentials through repeated probing |
| Weight extraction | Reverse-engineering model weights through systematic queries |
| Prompt injection probing | Brute-forcing prompt injection payloads |
These attacks typically require sending thousands to millions of requests. Rate limiting makes them impractical or too slow to be worthwhile.
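To make the "too slow to be worthwhile" point concrete, the arithmetic can be sketched as below. The function name and the numbers (one million queries, 60 requests per minute) are illustrative assumptions, not figures from this skill.

```python
def days_to_complete(total_requests: int, limit_per_minute: int) -> float:
    """Days needed to send total_requests when capped at limit_per_minute."""
    if limit_per_minute <= 0:
        raise ValueError("limit_per_minute must be positive")
    minutes = total_requests / limit_per_minute
    return minutes / (60 * 24)


# Hypothetical scenario: a 1,000,000-query extraction attempt against a
# gateway that allows 60 requests per minute for a single principal.
print(round(days_to_complete(1_000_000, 60), 1))  # 11.6
```

The estimate covers a single principal only. An attacker who spreads requests across many accounts or IP addresses divides the time accordingly, which is one reason to combine several of the limiting strategies rather than rely on one.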
| Strategy | Use Case |
|---|---|
| Per-user/principal | Limit requests per authenticated user over a time window |
| Per-IP | Limit unauthenticated or pre-auth requests by source IP |
| Per-endpoint | Different limits for different model endpoints based on sensitivity |
| Adaptive/dynamic | Adjust limits based on detected anomalous patterns |
| Token-based | Limit based on input/output token consumption, not just request count |
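Several of these strategies can be combined in one limiter. The following is a minimal in-memory sketch, assuming a token bucket keyed by principal and endpoint, with each request charged by its token count rather than as a flat unit. The class name `GatewayLimiter`, the endpoint paths, and the limit values are hypothetical. A production gateway would more likely use its built-in rate-limiting plugin or a shared store, and adaptive limiting is not covered here.

```python
import time
from typing import Callable, Dict, Tuple


class GatewayLimiter:
    """Token bucket per (principal, endpoint), charged by request cost."""

    def __init__(
        self,
        endpoint_limits: Dict[str, Tuple[float, float]],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        # endpoint -> (bucket capacity, refill rate in units per second)
        self._limits = endpoint_limits
        self._clock = clock
        # (principal, endpoint) -> (remaining units, time of last update)
        self._buckets: Dict[Tuple[str, str], Tuple[float, float]] = {}

    def allow(self, principal: str, endpoint: str, cost: float = 1.0) -> bool:
        """Return True and deduct cost if the bucket has enough units left.

        Raises KeyError for an endpoint with no configured limit.
        """
        capacity, refill_per_sec = self._limits[endpoint]
        now = self._clock()
        key = (principal, endpoint)
        # A principal seen for the first time starts with a full bucket.
        tokens, last = self._buckets.get(key, (capacity, now))
        # Refill for the elapsed time, never exceeding the capacity.
        tokens = min(capacity, tokens + (now - last) * refill_per_sec)
        allowed = cost <= tokens
        if allowed:
            tokens -= cost
        self._buckets[key] = (tokens, now)
        return allowed


# Hypothetical limits: the chat endpoint is treated as more sensitive
# than the embeddings endpoint, so it gets a smaller budget.
limits = {"/v1/chat": (1000.0, 10.0), "/v1/embeddings": (5000.0, 50.0)}
limiter = GatewayLimiter(limits)

# cost is the request's input plus output token count (token-based limiting).
print(limiter.allow("user:alice", "/v1/chat", cost=800))      # True
print(limiter.allow("user:alice", "/v1/chat", cost=800))      # False
print(limiter.allow("ip:203.0.113.7", "/v1/chat", cost=800))  # True
```

The principal string can be an authenticated user ID or, for pre-auth traffic, the source IP, so the same structure covers both the per-user and per-IP rows. Passing `cost=1.0` for every request reduces it to plain request counting.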
When a client exceeds its limit, respond with 429 Too Many Requests and include a Retry-After header.
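As a rough illustration of the 429 response, the sketch below builds the status code, headers, and body as plain values that any web framework could send. The function name `rate_limited_response` and the JSON body shape are assumptions made for this example. Retry-After is given as a whole number of seconds, which is one of the two forms the HTTP specification allows.

```python
import json
import math
from http import HTTPStatus
from typing import Dict, Tuple


def rate_limited_response(seconds_until_allowed: float) -> Tuple[int, Dict[str, str], str]:
    """Build (status, headers, body) for a request rejected by the limiter."""
    # Round up so the client never retries too early; use at least 1 second.
    retry_after = max(1, math.ceil(seconds_until_allowed))
    status = HTTPStatus.TOO_MANY_REQUESTS
    headers = {
        "Retry-After": str(retry_after),
        "Content-Type": "application/json",
    }
    # Hypothetical error body; adapt it to the gateway's own error format.
    body = json.dumps({"error": "rate_limited", "retry_after_seconds": retry_after})
    return status.value, headers, body


status, headers, body = rate_limited_response(2.3)
print(status, headers["Retry-After"])  # 429 3
```

Keeping the error body generic avoids telling a probing client exactly which limit it hit or how the thresholds are configured.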