Learning Notes #27 – Rate Limit Requests
Today I learned about rate limiting using Redis, which was new to me. Many blogs describe implementing Redis as a rate limiter, with the code written on the server side. Shouldn't that be offloaded to the gateway? Many such questions arose. In the meantime, I re-learned the rate limiting concept. In this blog I jot down notes on rate limiting, with examples using HAProxy (which can be used in production).
What is Rate Limiting?

Rate limiting refers to the process of limiting the number of requests a user, application, or IP address can make to a system in a given period. This mechanism is essential to:
- Protect against overload caused by high traffic or malicious activity.
- Mitigate Denial of Service (DoS) attacks.
- Prevent resource starvation due to unbalanced usage patterns.
- Ensure fair usage among all clients.
For example, a public API might limit each user to 100 requests per minute to avoid overwhelming the backend systems.
Types of Rate Limiting
Rate limiting can be implemented in various ways depending on the use case and infrastructure. Here are the most common types:
- Fixed Window
- Sliding Window
- Token Bucket
- Leaky Bucket
- Concurrent Rate Limiting
1. Fixed Window Rate Limiting

In this method, a fixed time window (e.g., 1 minute) is defined, and a request counter is maintained. If the number of requests exceeds the allowed limit within the window, subsequent requests are denied.
How It Works
- A counter is initialized at the start of the time window.
- Each incoming request increments the counter.
- If the counter exceeds the predefined limit, the request is rejected until the window resets.
Advantages
- Simple to implement.
- Effective for scenarios where traffic is predictable.
Disadvantages
- Burst traffic at the boundary of two windows can lead to uneven load, as a user can send the maximum requests at the end of one window and immediately at the start of the next.
Example: Allow 60 requests per minute. At the start of each new minute, the counter resets.
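The counter logic above can be sketched in Python. This is a minimal single-process illustration (the `FixedWindowLimiter` class name is my own; timestamps are passed in explicitly so the behavior is easy to follow):

```python
import time

class FixedWindowLimiter:
    """Allow at most `limit` requests per `window`-second fixed window."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.window_start = 0.0  # start time of the current window
        self.count = 0           # requests seen in the current window

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # A new window has begun: reset the counter
        if now - self.window_start >= self.window:
            self.window_start = now
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False
```

Note how the sketch exhibits the boundary problem described above: a client can spend the full limit at the end of one window and again immediately after the reset.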
Implementation:
HAProxy offers various methods to control traffic, including rate limiting. We can implement fixed window rate limiting in HAProxy using stick tables, which are in-memory tables used to store information about each connection. These stick tables can be configured to store request counts, track IP addresses, and enforce rate limits.
Step 1: Define the Stick Table
To track the number of requests for a given client, we define a stick table that holds the request count and sets the expiration for the time window.
```
backend my_backend
    stick-table type ip size 1m expire 60s store http_req_rate(60s)
```
Explanation:
- `type ip`: the stick table is keyed by client IP address.
- `size 1m`: the table holds up to 1 million entries.
- `expire 60s`: entries expire after 60 seconds (i.e., the fixed time window).
- `store http_req_rate(60s)`: stores each IP's request rate over the last 60 seconds.
Step 2: Apply Rate Limiting Based on the Stick Table
Next, you apply rate limiting based on the values stored in the stick table. The client must first be tracked in the table with `http-request track-sc0`; requests that exceed the allowed rate can then be rejected using the `http-request deny` directive.

```
frontend http_in
    bind *:80
    http-request track-sc0 src table my_backend
    acl too_many_requests sc_http_req_rate(0) gt 100
    http-request deny if too_many_requests
    default_backend my_backend
```

Explanation:
- `http-request track-sc0 src table my_backend`: tracks each client IP in the `my_backend` stick table so its request rate is counted.
- `acl too_many_requests sc_http_req_rate(0) gt 100`: defines an Access Control List (ACL) that matches when the tracked client's request rate exceeds 100 requests in the last 60 seconds.
- `http-request deny if too_many_requests`: if the ACL condition is met (i.e., the IP exceeds the rate limit), the request is denied.
2. Sliding Window Rate Limiting

This approach improves upon fixed windows by using a rolling window. Requests are tracked using timestamps, and the rate is calculated based on a dynamic window.
How It Works
- Each request is timestamped.
- A sliding window keeps track of all requests within a defined time frame.
- The system calculates the total requests in the window to determine whether a new request is allowed.
Advantages
- Reduces the impact of bursts near window boundaries.
- Provides a smoother throttling experience.
Disadvantages
- Slightly more complex to implement due to the need for maintaining and evaluating timestamps.
Example: Allow 60 requests over the last 60 seconds, calculated dynamically.
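The timestamp-tracking approach can be sketched in Python as follows (again a hypothetical single-process class, keeping every accepted request's timestamp in a queue):

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Allow at most `limit` requests over the trailing `window` seconds."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.timestamps = deque()  # timestamps of accepted requests

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Evict timestamps that have slid out of the window
        while self.timestamps and now - self.timestamps[0] >= self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False
```

Because old timestamps fall out of the window continuously, there is no hard boundary at which the whole budget resets, which is exactly what smooths out the bursts seen with fixed windows.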
Implementation:

In this scenario, you want to limit the number of requests that a client can make within a certain period of time, where the period is a sliding window. So, if you allow no more than 20 requests per client during the last 10 seconds, HAProxy counts requests over the trailing 10 seconds. Consider this HAProxy configuration:
```
frontend website
    bind :80
    stick-table type ipv6 size 100k expire 30s store http_req_rate(10s)
    http-request track-sc0 src
    http-request deny deny_status 429 if { sc_http_req_rate(0) gt 20 }
    default_backend servers
```
The `stick-table` directive in HAProxy creates a key-value store to track counters like HTTP request rates per client. The client's IP address is used as the key, and its request count is stored and aggregated against it. The `http-request track-sc0 src` line adds the client to the stick table, starting the count of their requests.
Records in the stick table expire after the inactivity period defined by the `expire` parameter, which helps free up space. Without an `expire` parameter, the oldest records are evicted once the table is full; in this example, the table holds up to 100,000 records (`size 100k`).
The `http-request deny` line enforces the rate limit and specifies the action to take when the limit is exceeded. In this case, clients are limited to 20 requests over the trailing 10 seconds, and any additional requests are denied with a 429 (Too Many Requests) status code until the rate falls below the threshold. Other actions can include forwarding to a dedicated backend or silently dropping the connection. The `sc_http_req_rate` fetch returns the client's current request rate.
You can adjust the time period or threshold, such as allowing up to 1,000 requests over 24 hours, by changing `http_req_rate(10s)` to `http_req_rate(24h)` and updating the deny line accordingly.
3. Token Bucket Algorithm

This algorithm uses a bucket to hold tokens, where each token grants permission to process one request. Tokens are replenished at a fixed rate. A request is processed only if a token is available; otherwise, it is rejected or delayed.
How It Works:
- A bucket holds a maximum number of tokens (capacity).
- Tokens are added to the bucket at a steady rate.
- When a request is received, a token is removed from the bucket.
- If the bucket is empty, the request is rejected or delayed until a token becomes available.
Advantages:
- Allows for short bursts of activity while maintaining overall limits.
- Efficient and widely used.
Disadvantages:
- Complex to set up in distributed systems.
Example: Refill 10 tokens per second, with a maximum bucket capacity of 100 tokens.
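A compact way to implement this is to compute the refill lazily, on each request, instead of running a background timer. Here is a sketch in Python under that assumption (the `TokenBucket` name is my own; times are injected for clarity):

```python
import time

class TokenBucket:
    """Refill `rate` tokens per second, up to `capacity`; one token per request."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity   # start with a full bucket
        self.last_refill = 0.0   # time of the last lazy refill

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Add the tokens accrued since the last check, capped at capacity
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

The capacity is what permits bursts: a client idle long enough to refill the bucket can fire up to `capacity` requests at once, while the long-run rate stays bounded by `rate`.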
4. Leaky Bucket Algorithm

This algorithm is similar to the token bucket but focuses on maintaining a consistent outflow of requests. Excess requests are queued and processed at a steady rate.
How It Works:
- Requests enter a queue (bucket).
- The system processes requests at a fixed rate.
- If the queue is full, additional requests are rejected or delayed.
Advantages:
- Ensures a constant request rate.
- Good for smoothing out traffic bursts.
Disadvantages:
- May introduce latency due to queuing.
Example: Process requests at a steady rate of 5 per second, regardless of the input rate.
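A sketch of the queue-and-drain behavior in Python (hypothetical `LeakyBucket` class; a real implementation would drain on a timer or in a worker loop, here `leak()` is called explicitly with an elapsed time):

```python
from collections import deque

class LeakyBucket:
    """Queue up to `capacity` requests; drain them at `rate` per second."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.queue = deque()

    def submit(self, request):
        # Reject when the bucket (queue) is full
        if len(self.queue) >= self.capacity:
            return False
        self.queue.append(request)
        return True

    def leak(self, seconds):
        """Process up to rate * seconds queued requests at the fixed outflow rate."""
        drained = []
        for _ in range(int(self.rate * seconds)):
            if not self.queue:
                break
            drained.append(self.queue.popleft())
        return drained
```

Unlike the token bucket, bursts are absorbed by the queue rather than served immediately, which is where the extra latency mentioned above comes from.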
5. Concurrent Rate Limiting
Limits the number of concurrent requests a user or system can make.
How It Works:
- The system tracks the number of active or ongoing requests for each user.
- If the active requests exceed the limit, new requests are rejected until one or more ongoing requests are completed.
Advantages:
- Useful for systems with high latency or long-lived connections.
- Prevents resource exhaustion from simultaneous requests.
Disadvantages:
- May require complex state management to track active requests.
Example: Allow a maximum of 10 simultaneous requests per user.
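The active-request tracking can be sketched as a counting guard in Python (hypothetical `ConcurrencyLimiter` class; in practice this is often just a semaphore, with one limiter per user):

```python
import threading

class ConcurrencyLimiter:
    """Allow at most `max_concurrent` in-flight requests at a time."""

    def __init__(self, max_concurrent):
        self.max_concurrent = max_concurrent
        self.active = 0
        self.lock = threading.Lock()

    def acquire(self):
        # Admit the request only if a concurrency slot is free
        with self.lock:
            if self.active < self.max_concurrent:
                self.active += 1
                return True
            return False

    def release(self):
        # Call when a request finishes, freeing its slot
        with self.lock:
            self.active -= 1
```

Note that unlike the other algorithms, nothing here depends on time: a slot is freed only when `release()` is called, which is why this scheme suits long-lived connections.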
References:
- https://www.haproxy.com/blog/four-examples-of-haproxy-rate-limiting
- https://medium.com/@m-elbably/rate-limiting-a-dynamic-distributed-rate-limiting-with-redis-339f9504200f
- https://irshitmukherjee55.hashnode.dev/rate-limiting-using-redis-golang-token-bucket-algorithm
- https://www.infoworld.com/article/2257527/how-to-use-redis-for-real-time-metering-applications.html
- https://docs.aws.amazon.com/waf/latest/developerguide/waf-rule-statement-type-rate-based-request-limiting.html