Skill

rate-limiting

Enforces rate limiting at API gateways to protect AI models from extraction attacks. Use when designing, building, or reviewing API gateways for inference or LLM endpoints.

security

api-development

Popularity

Stars

Forks

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/prodsec-skills:rate-limiting

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

API gateways protecting AI models SHOULD implement rate limiting. This is a critical defense against attacks that require sending large volumes of requests to the models.

SKILL.md

45 lines · ~557 tokens

Stats

LanguagePython

Stars40

Forks4

MaintenanceExcellent

Last CommitJun 13, 2026

Actions

View Source View Plugin View on GitHub View README

Rate Limiting at the API Gateway

Security Requirement

API gateways protecting AI models SHOULD implement rate limiting. This is a critical defense against attacks that require sending large volumes of requests to the models.

Attacks Mitigated by Rate Limiting

Attack	Description
Model data extraction	Attempting to extract sensitive information the model learned during training
Training data extraction	Reconstructing training data from model responses
Token extraction	Stealing API tokens or credentials through repeated probing
Weight extraction	Reverse-engineering model weights through systematic queries
Prompt injection probing	Brute-forcing prompt injection payloads

These attacks typically require sending thousands to millions of requests. Rate limiting makes them impractical or too slow to be worthwhile.

Rate Limiting Strategies

Strategy	Use Case
Per-user/principal	Limit requests per authenticated user over a time window
Per-IP	Limit unauthenticated or pre-auth requests by source IP
Per-endpoint	Different limits for different model endpoints based on sensitivity
Adaptive/dynamic	Adjust limits based on detected anomalous patterns
Token-based	Limit based on input/output token consumption, not just request count

Implementation Checklist

Define rate limits per principal, IP, and endpoint
Set limits low enough to impede extraction attacks but high enough for legitimate use
Return standard 429 Too Many Requests with Retry-After header
Log rate limit violations with principal identity for incident response
Consider adaptive rate limiting that tightens under detected attack patterns
Apply token-based rate limits for LLM endpoints (input + output tokens)
Monitor rate limit metrics for tuning and anomaly detection

rate-limiting

Popularity

Invocation

Context Preview

SKILL.md

rate-limiting

Popularity

Invocation

Context Preview

SKILL.md

Rate Limiting at the API Gateway

Security Requirement

Attacks Mitigated by Rate Limiting

Rate Limiting Strategies

Implementation Checklist

Similar Skills

Rate Limiting at the API Gateway

Security Requirement

Attacks Mitigated by Rate Limiting

Rate Limiting Strategies

Implementation Checklist

Similar Skills