What is Rate Limiting?
Rate Limiting is a mechanism that limits the number of API requests within a certain time period. It maintains service stability and protects systems from malicious use or excessive access due to bugs.
Why it’s needed: Without request limits, a single user could exhaust all system resources, or DDoS attacks could bring down services.
Purposes of Rate Limiting
| Purpose | Description |
|---|---|
| Service protection | Prevent downtime from overload |
| Fairness | Distribute resources fairly to all users |
| Abuse prevention | Deter scraping, brute force attacks |
| Cost management | Ensure infrastructure cost predictability |
Major Algorithms
1. Fixed Window
Resets the count at the start of each fixed time window (the example below assumes a limit of 100 requests per 1-minute window).
| Time | Requests | Status |
|---|---|---|
| 00:00-00:30 | 90 | ✓ |
| 00:30-00:59 | 10 (total 100) | ✓ |
| 01:00 | Counter reset | - |
| 01:00-01:30 | 100 | ✓ |
Problem: Up to double the limit can slip through in a short burst at window boundaries
| Time | Requests | Problem |
|---|---|---|
| 00:59 | 100 ✓ | |
| 01:00 | 100 ✓ | 200 requests in 2 seconds! |
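A minimal in-memory sketch of a fixed window counter (function and variable names here are illustrative, not from any particular library):

```javascript
// In-memory fixed window counter (single-process sketch).
const counters = new Map(); // "user:windowStart" -> request count

function allowFixedWindow(userId, limit = 100, windowSec = 60) {
  // Requests in the same window share one counter key;
  // a new window produces a new key, which acts as the implicit "reset".
  const windowStart = Math.floor(Date.now() / 1000 / windowSec);
  const key = `${userId}:${windowStart}`;
  const count = (counters.get(key) || 0) + 1;
  counters.set(key, count);
  return count <= limit; // stale keys would need eviction in real code
}
```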
2. Sliding Window Log
Records timestamps of each request and counts requests in the past N seconds.
Current time: 01:00:30, Window: Past 60 minutes (00:00:30-01:00:30)
| Timestamp | Status |
|---|---|
| 00:00:25 | Outside window (delete) |
| 00:00:35 | ✓ In window |
| 00:00:50 | ✓ In window |
| 01:00:10 | ✓ In window |
Advantage: Accurate rate limiting. Disadvantage: High memory usage (one timestamp is stored per request).
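A sketch of the log approach, assuming a single process and in-memory storage (names are illustrative):

```javascript
// Sliding window log: store a timestamp per request, count those still in the window.
const logs = new Map(); // userId -> array of request timestamps (ms)

function allowSlidingWindowLog(userId, limit = 100, windowMs = 60_000) {
  const now = Date.now();
  // Drop timestamps that have fallen out of the window, then count the rest.
  const log = (logs.get(userId) || []).filter((t) => t > now - windowMs);
  const allowed = log.length < limit;
  if (allowed) log.push(now);
  logs.set(userId, log);
  return allowed;
}
```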
3. Sliding Window Counter
An improved version of the fixed window: the rate is estimated from the current window's count plus a weighted count from the previous window.
| Window | Requests |
|---|---|
| Previous (00:00-00:59) | 80 |
| Current (01:00-01:59) | 30 |
| Current time | 01:00:20 (33% into window) |
Estimated requests = 80 × 0.67 + 30 = 83.6 (20 seconds into the current window, the sliding window still overlaps 67% of the previous window, so that window's count of 80 is weighted by 0.67)
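In code, the estimate translates into something like this in-memory sketch (illustrative names):

```javascript
// Sliding window counter: weight the previous window's count by the
// fraction of the sliding window that still overlaps it.
const windows = new Map(); // userId -> { windowStart, current, previous }

function allowSlidingWindowCounter(userId, limit = 100, windowSec = 60) {
  const nowSec = Date.now() / 1000;
  const windowStart = Math.floor(nowSec / windowSec);
  let w = windows.get(userId) || { windowStart, current: 0, previous: 0 };
  if (windowStart > w.windowStart) {
    // Roll over: the old current count becomes "previous" (or 0 if a window was skipped).
    w = { windowStart, current: 0, previous: windowStart - w.windowStart === 1 ? w.current : 0 };
  }
  const elapsed = (nowSec % windowSec) / windowSec; // e.g. 0.33 at 20s into a 60s window
  const estimated = w.previous * (1 - elapsed) + w.current; // 80 × 0.67 + 30 in the example
  const allowed = estimated < limit;
  if (allowed) w.current += 1;
  windows.set(userId, w);
  return allowed;
}
```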
4. Token Bucket
Tokens are added to a bucket at a constant rate, and each request consumes a token.
Configuration: Bucket capacity: 10 tokens, Refill rate: 1 token/second
| State | Tokens | Description |
|---|---|---|
| Initial | 10/10 | Full bucket |
| After 5 requests | 5/10 | 5 tokens consumed |
| After 3 seconds | 8/10 | 3 tokens refilled |
| Burst capacity | 8 | Up to 8 requests can be served at once |
Advantage: Handles bursts, memory efficient
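A lazy-refill sketch: instead of a timer adding tokens every second, tokens are credited from the elapsed time at check time (illustrative names):

```javascript
// Token bucket with lazy refill.
const buckets = new Map(); // userId -> { tokens, lastRefill }

function allowTokenBucket(userId, capacity = 10, refillPerSec = 1) {
  const now = Date.now();
  const b = buckets.get(userId) || { tokens: capacity, lastRefill: now };
  // Credit tokens for the time elapsed since the last check, capped at capacity.
  const elapsedSec = (now - b.lastRefill) / 1000;
  b.tokens = Math.min(capacity, b.tokens + elapsedSec * refillPerSec);
  b.lastRefill = now;
  const allowed = b.tokens >= 1;
  if (allowed) b.tokens -= 1; // each request consumes one token
  buckets.set(userId, b);
  return allowed;
}
```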
5. Leaky Bucket
Requests enter a fixed-size queue (the bucket) and are drained at a constant rate; when the bucket is full, new requests are rejected.
```mermaid
flowchart LR
    In["Inflow<br/>(variable)"] --> Bucket["Bucket<br/>(Queue)"] --> Out["Outflow<br/>(fixed rate)"]
```
Advantage: Stable output rate. Disadvantage: Doesn't handle bursts well.
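A rough sketch of the queue-and-drain behavior (the capacity, drain rate, and handleRequest function are assumed placeholders):

```javascript
// Leaky bucket: bounded queue drained at a fixed rate.
const queue = [];
const CAPACITY = 10;     // bucket size (assumed value)
const DRAIN_PER_SEC = 5; // fixed outflow rate (assumed value)

function enqueueRequest(request) {
  if (queue.length >= CAPACITY) return false; // bucket overflows: reject
  queue.push(request);
  return true;
}

// Drain at a constant rate no matter how bursty the inflow is.
setInterval(() => {
  const request = queue.shift();
  if (request) handleRequest(request); // handleRequest: your actual processing (placeholder)
}, 1000 / DRAIN_PER_SEC);
```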
Implementation Patterns
Response Headers
```http
HTTP/1.1 200 OK
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 45
X-RateLimit-Reset: 1640000000
```
Response When Limit Exceeded
```http
HTTP/1.1 429 Too Many Requests
Retry-After: 30
Content-Type: application/json

{
  "error": "rate_limit_exceeded",
  "message": "Rate limit exceeded. Please retry after 30 seconds.",
  "retry_after": 30
}
```
Redis Implementation Example
```javascript
import { createClient } from 'redis';

const redis = createClient();
await redis.connect();

// Fixed window counter backed by a shared Redis key.
// Note: INCR + EXPIRE here is not atomic; if the process dies between the
// two calls, the key never expires. Production code often wraps them in a
// Lua script or MULTI/EXEC.
async function checkRateLimit(userId, limit, windowSec) {
  const key = `ratelimit:${userId}`;
  const current = await redis.incr(key);
  if (current === 1) {
    // First request in this window: start the expiry timer.
    await redis.expire(key, windowSec);
  }
  if (current > limit) {
    const ttl = await redis.ttl(key); // seconds until the window resets
    return { allowed: false, retryAfter: ttl };
  }
  return { allowed: true, remaining: limit - current };
}
```
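Wiring this into a request path might look like the following sketch (assumes an Express-style app object, which is not in the original example):

```javascript
// Hypothetical Express middleware built on checkRateLimit above.
app.use(async (req, res, next) => {
  const result = await checkRateLimit(req.ip, 100, 60); // 100 requests/minute per IP
  res.set('X-RateLimit-Limit', '100');
  if (!result.allowed) {
    res.set('Retry-After', String(result.retryAfter));
    return res.status(429).json({
      error: 'rate_limit_exceeded',
      retry_after: result.retryAfter,
    });
  }
  res.set('X-RateLimit-Remaining', String(result.remaining));
  next();
});
```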
Rate Limit Granularity
User-based
| User | Limit |
|---|---|
| User A | 100 requests/minute |
| User B | 100 requests/minute |
IP Address-based
| IP Address | Limit |
|---|---|
| 192.168.1.1 | 100 requests/minute |
| 192.168.1.2 | 100 requests/minute |
Endpoint-based
| Endpoint | Limit | Note |
|---|---|---|
| GET /api/users | 100 requests/minute | |
| POST /api/users | 10 requests/minute | Stricter for creation |
Tiered
| Tier | Limit |
|---|---|
| Free | 100 requests/day |
| Pro | 10,000 requests/day |
| Enterprise | Unlimited |
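In practice these granularities often differ only in how the counter key is built; a sketch (the key formats and tier values are illustrative choices, not a prescribed scheme):

```javascript
// Key builders: any of these can feed the same checkRateLimit-style logic.
const userKey = (userId) => `ratelimit:user:${userId}`;
const ipKey = (ip) => `ratelimit:ip:${ip}`;
const endpointKey = (userId, method, path) => `ratelimit:${userId}:${method}:${path}`;

// Tiered limits: look the cap up by plan instead of hard-coding one number.
const TIER_LIMITS = { free: 100, pro: 10_000, enterprise: Infinity }; // requests/day
const limitFor = (user) => TIER_LIMITS[user.tier] ?? TIER_LIMITS.free;
```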
Considerations for Distributed Systems
Centralized
```mermaid
flowchart LR
    S1["Server 1"] --> Redis["Redis<br/>(shared counter)"]
    S2["Server 2"] --> Redis
    S3["Server 3"] --> Redis
```
Advantage: Accurate. Disadvantage: Latency of a round trip to Redis on every check.
Local Cache + Sync
```mermaid
flowchart LR
    S1["Server 1<br/>[Local counter]"] <-->|"Periodic sync"| S2["Server 2<br/>[Local counter]"]
```
Advantage: Low latency. Disadvantage: Some overrun must be tolerated between syncs.
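A rough sketch of the local-counter approach (the sync interval and shared key are assumptions; the redis client from the earlier example is reused):

```javascript
// Each server admits requests against its local count plus the last-known
// cluster-wide total, then periodically folds its delta into Redis.
let localCount = 0;
let sharedEstimate = 0; // cluster-wide count as of the last sync
const SYNC_INTERVAL_MS = 1000; // assumed sync interval

function allowLocal(limit) {
  if (sharedEstimate + localCount >= limit) return false; // may overrun between syncs
  localCount += 1;
  return true;
}

setInterval(async () => {
  const delta = localCount;
  localCount = 0;
  // INCRBY returns the new total, refreshing our estimate (window expiry omitted for brevity).
  sharedEstimate = await redis.incrBy('ratelimit:global', delta);
}, SYNC_INTERVAL_MS);
```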
Client-side Handling
Exponential Backoff
```javascript
// Sleep helper used between retries.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function fetchWithRetry(url, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    const response = await fetch(url);
    if (response.status === 429) {
      // Prefer the server's Retry-After header (a string of seconds);
      // otherwise back off exponentially: 1s, 2s, 4s, ...
      const retryAfter = Number(response.headers.get('Retry-After')) || Math.pow(2, i);
      await sleep(retryAfter * 1000);
      continue;
    }
    return response;
  }
  throw new Error('Rate limit exceeded after retries');
}
```
Summary
Rate limiting is an important mechanism for ensuring API stability and fairness. By selecting appropriate algorithms like token bucket or sliding window for your use case and setting limits at appropriate granularity, you can protect services while providing a good user experience.