
Parotta Salna
Learning Notes #27 – Rate Limit Requests
2 January 2025 at 14:08

By: Mr.ParottaSalna

Today, I learned about rate limiting using Redis, which felt strange to me. Many blogs implement Redis as a rate limiter, with the logic written on the server side. Shouldn’t this be offloaded to the gateway? Many such questions arose. In the meantime, I re-learned the rate limiting concept. In this blog, I jot down notes on rate limiting, with examples using HAProxy (which can be used in production).

What is Rate Limiting?

Rate limiting refers to the process of limiting the number of requests a user, application, or IP address can make to a system in a given period. This mechanism is essential to protect systems from:

Overloading caused by high traffic or malicious activity.
Denial of Service (DoS) attacks.
Resource starvation due to unbalanced usage patterns.

It also ensures fair usage among all clients.

For example, a public API might limit each user to 100 requests per minute to avoid overwhelming the backend systems.

Types of Rate Limiting

Rate limiting can be implemented in various ways depending on the use case and infrastructure. Here are the most common types:

Fixed Window
Sliding Window
Token Bucket
Leaky Bucket
Concurrent Rate Limiting

1. Fixed Window Rate Limiting

In this method, a fixed time window (e.g., 1 minute) is defined, and a request counter is maintained. If the number of requests exceeds the allowed limit within the window, subsequent requests are denied.

How It Works

A counter is initialized at the start of the time window.
Each incoming request increments the counter.
If the counter exceeds the predefined limit, the request is rejected until the window resets.

Advantages

Simple to implement.
Effective for scenarios where traffic is predictable.

Disadvantages

Burst traffic at the boundary of two windows can lead to uneven load, as a user can send the maximum requests at the end of one window and immediately at the start of the next.

Example: Allow 60 requests per minute. At the start of each new minute, the counter resets.
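The counter logic above fits in a few lines of Python. This is a minimal single-process sketch for illustration, with no claim about how HAProxy or a Redis-based limiter works internally; the class FixedWindowLimiter and its allow() method are hypothetical names, and the current time can be passed in explicitly so the behaviour is easy to test. The usage at the end replays the boundary problem from the disadvantages list: with 60 requests per minute, a client gets 120 requests through in about one second.

```python
import time
from typing import Dict, Optional, Tuple


class FixedWindowLimiter:
    """Hypothetical fixed-window limiter: at most `limit` requests per `window` seconds per key."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        # key -> (index of the window the count belongs to, requests counted in it)
        self.counters: Dict[str, Tuple[int, int]] = {}

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        window_index = int(now // self.window)  # windows start at multiples of `window`
        index, count = self.counters.get(key, (window_index, 0))
        if index != window_index:
            count = 0  # a new window has started, so the counter resets
        if count >= self.limit:
            return False  # limit reached: reject until the window rolls over
        self.counters[key] = (window_index, count + 1)
        return True


limiter = FixedWindowLimiter(limit=60, window=60)
# 60 requests at t=59s, the end of the first window, are all allowed...
late = sum(limiter.allow("client-a", now=59.0) for _ in range(60))
# ...and so are 60 more at t=60s, the start of the next window.
early = sum(limiter.allow("client-a", now=60.0) for _ in range(60))
print(late, early)  # 60 60, i.e. 120 requests within about one second
```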

Implementation:

HAProxy offers various methods to control traffic, including rate limiting. We can implement fixed window rate limiting in HAProxy using stick tables, which are in-memory tables used to store information about each connection. These stick tables can be configured to store request counts, track IP addresses, and enforce rate limits.

Step 1: Define the Stick Table

To track the number of requests for a given client, we define a stick table that holds the request count and sets the expiration for the time window.


backend my_backend
    stick-table type ip size 1m expire 60s store http_req_rate(60s)

Explanation:

type ip: This means that the stick table will track client IP addresses.
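size 1m: This defines the maximum number of entries in the table (1m is roughly one million entries).
expire 60s: An entry is removed after 60 seconds without activity. This frees memory; it does not reset every client’s counter at a fixed 60-second boundary.
store http_req_rate(60s): This stores the request rate per IP over the last 60 seconds. HAProxy measures this rate over a rolling period, so in practice the limit behaves more like a sliding window than a strict fixed window.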

Step 2: Apply Rate Limiting Based on the Stick Table

Next, you apply rate limiting based on the values stored in the stick table. You can reject requests that exceed the allowed rate limit by using the http-request directive.


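frontend http_in
    bind *:80
    # Track each client IP in the stick table defined in my_backend
    http-request track-sc0 src table my_backend
    acl too_many_requests sc_http_req_rate(0) gt 100
    http-request deny deny_status 429 if too_many_requests
    default_backend my_backend

Explanation:

http-request track-sc0 src table my_backend: This starts tracking the client’s source IP in the my_backend stick table. Without a track-sc line, the table’s counters are never updated.
acl too_many_requests sc_http_req_rate(0) gt 100: This defines an Access Control List (ACL) that checks whether the request rate of the tracked IP (sticky counter 0) exceeds 100 requests in the last 60 seconds.
http-request deny deny_status 429 if too_many_requests: If the ACL condition is met (i.e., the IP exceeds the rate limit), the request is denied with 429 Too Many Requests instead of the default 403.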

2. Sliding Window Rate Limiting

This approach improves upon fixed windows by using a rolling window. Requests are tracked using timestamps, and the rate is calculated based on a dynamic window.

How It Works

Each request is timestamped.
A sliding window keeps track of all requests within a defined time frame.
The system calculates the total requests in the window to determine whether a new request is allowed.

Advantages

Reduces the impact of bursts near window boundaries.
Provides a smoother throttling experience.

Disadvantages

Slightly more complex to implement due to the need for maintaining and evaluating timestamps.

Example: Allow 60 requests over the last 60 seconds, calculated dynamically.
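A sliding window can be sketched by keeping a log of timestamps per client, following the steps above. Again this is a hypothetical single-process Python sketch (SlidingWindowLog is not a library class). Replaying the same boundary burst as in the fixed-window case shows the difference: the second batch is rejected because the first 60 requests are still inside the last 60 seconds.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class SlidingWindowLog:
    """Hypothetical sliding-window-log limiter: at most `limit` requests in the last `window` seconds."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        self.logs: Dict[str, Deque[float]] = defaultdict(deque)  # key -> timestamps of allowed requests

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        log = self.logs[key]
        # Drop timestamps that have slid out of the window (older than `window` seconds).
        while log and log[0] <= now - self.window:
            log.popleft()
        if len(log) >= self.limit:
            return False
        log.append(now)
        return True


limiter = SlidingWindowLog(limit=60, window=60)
late = sum(limiter.allow("client-a", now=59.0) for _ in range(60))
early = sum(limiter.allow("client-a", now=60.0) for _ in range(60))
print(late, early)  # 60 0
# At t=119s the requests from t=59s have left the window, so new ones are allowed again.
print(limiter.allow("client-a", now=119.0))  # True
```

The price of this precision is memory: one stored timestamp per allowed request per client, which is the timestamp bookkeeping mentioned in the disadvantages.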

Implementation:

Suppose you want to limit the number of requests a client can make within a sliding window, for example no more than 20 requests per client during the last 10 seconds. HAProxy then counts each client’s requests over the most recent 10 seconds. Consider this HAProxy configuration:


frontend website
    bind :80
    stick-table  type ipv6  size 100k  expire 30s  store http_req_rate(10s)
    http-request track-sc0 src
    http-request deny deny_status 429 if { sc_http_req_rate(0) gt 20 }
    default_backend servers

The stick-table directive in HAProxy creates a key-value store to track counters like HTTP request rates per client. The client’s IP address is used as the key, and its request count is stored and aggregated. The http-request track-sc0 src line adds the client to the stick table, starting the count of their requests.

Records in the stick table expire after a specified period of inactivity, as defined by the expire parameter, which helps free up space. The size parameter caps the number of records (here 100k, roughly 100,000); without an expire parameter, the oldest records are evicted once the table is full.

The http-request deny line enforces the rate limit and specifies the action when the limit is exceeded. In this case, each client is limited to 20 requests per 10 seconds (a rate, not 20 concurrent requests), and any additional requests are denied with a 429 status code until the rate falls back below the threshold. Other actions can include forwarding to a dedicated backend or silently dropping the connection. The sc_http_req_rate fetch returns the client’s current request rate.

You can adjust the time period or threshold. For example, to allow up to 1000 requests over 24 hours, change http_req_rate(10s) to http_req_rate(24h), raise expire to at least 24h so that entries are not evicted while the period is still running, and update the deny line to gt 1000.

3. Token Bucket Algorithm

This algorithm uses a bucket to hold tokens, where each token represents a request. Tokens are replenished at a fixed rate. A request is processed only if a token is available; otherwise, it is rejected or delayed.

How It Works:

A bucket holds a maximum number of tokens (capacity).
Tokens are added to the bucket at a steady rate.
When a request is received, a token is removed from the bucket.
If the bucket is empty, the request is rejected or delayed until a token becomes available.

Advantages:

Allows for short bursts of activity while maintaining overall limits.
Efficient and widely used.

Disadvantages:

Complex to set up in distributed systems.

Example: Refill 10 tokens per second, with a maximum bucket capacity of 100 tokens.
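The token bucket steps translate directly into code. In this hypothetical Python sketch, tokens are refilled lazily: instead of a background timer, each call adds the tokens earned since the previous call, capped at the capacity. The usage mirrors the example above (10 tokens per second, capacity 100).

```python
import time
from typing import Optional


class TokenBucket:
    """Hypothetical token bucket: refills `rate` tokens per second, holding at most `capacity`."""

    def __init__(self, rate: float, capacity: float, now: Optional[float] = None) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so an initial burst is allowed
        self.last = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Add the tokens earned since the last call, never exceeding the capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1  # each request consumes one token
            return True
        return False  # bucket empty: reject (a caller could also wait and retry)


bucket = TokenBucket(rate=10, capacity=100, now=0.0)
burst = sum(bucket.allow(now=0.0) for _ in range(150))
print(burst)  # 100: the full bucket absorbs a burst, then requests are rejected
# One second later, 10 tokens have been refilled.
later = sum(bucket.allow(now=1.0) for _ in range(20))
print(later)  # 10
```

Starting the bucket full is a design choice; starting it empty would disallow an initial burst.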

4. Leaky Bucket Algorithm

Similar to the token bucket but focuses on maintaining a consistent outflow of requests. Excess requests are queued and processed at a steady rate.

How It Works:

Requests enter a queue (bucket).
The system processes requests at a fixed rate.
If the queue is full, additional requests are rejected or delayed.

Advantages:

Ensures a constant request rate.
Good for smoothing out traffic bursts.

Disadvantages:

May introduce latency due to queuing.

Example: Process requests at a steady rate of 5 per second, regardless of the input rate.
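A leaky bucket can be sketched as a bounded queue drained at a fixed rate. In this hypothetical Python sketch, offer() admits or rejects a request and leak(now) returns the requests that may be processed by time now, at most rate per second; a real system would run the draining loop in a worker. The usage mirrors the example above (5 requests per second), with an assumed queue capacity of 10.

```python
from collections import deque
from typing import Any, Deque, List


class LeakyBucket:
    """Hypothetical leaky bucket: queues up to `capacity` requests and releases `rate` per second."""

    def __init__(self, rate: float, capacity: int, now: float = 0.0) -> None:
        self.rate = rate
        self.capacity = capacity
        self.queue: Deque[Any] = deque()
        self.last_leak = now
        self.credit = 0.0  # fractional release budget carried between leak() calls

    def offer(self, request: Any, now: float) -> bool:
        """Enqueue a request arriving at time `now`; reject it if the bucket is full."""
        if not self.queue:
            # The bucket was empty, so idle time must not turn into a burst of releases later.
            self.last_leak = now
            self.credit = 0.0
        if len(self.queue) >= self.capacity:
            return False
        self.queue.append(request)
        return True

    def leak(self, now: float) -> List[Any]:
        """Return the queued requests that may be processed by time `now`, at most `rate` per second."""
        self.credit += (now - self.last_leak) * self.rate
        self.last_leak = now
        released: List[Any] = []
        while self.queue and self.credit >= 1:
            released.append(self.queue.popleft())
            self.credit -= 1
        return released


bucket = LeakyBucket(rate=5, capacity=10)
accepted = sum(bucket.offer(f"req-{i}", now=0.0) for i in range(15))
print(accepted)  # 10: the other 5 are rejected because the queue is full
print(len(bucket.leak(now=1.0)))  # 5 released during the first second
print(len(bucket.leak(now=2.0)))  # the remaining 5 during the next second
```

Unlike the token bucket sketch, idle time does not build up a reserve: when a request arrives at an empty bucket, the release budget is reset, so a burst that follows an idle period still drains at the steady rate.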

5. Concurrent Rate Limiting

Limits the number of concurrent requests a user or system can make.

How It Works:

The system tracks the number of active or ongoing requests for each user.
If the active requests exceed the limit, new requests are rejected until one or more ongoing requests are completed.

Advantages:

Useful for systems with high latency or long-lived connections.
Prevents resource exhaustion from simultaneous requests.

Disadvantages:

May require complex state management to track active requests.

Example: Allow a maximum of 10 simultaneous requests per user.
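Concurrent limiting counts requests that are in flight rather than requests per time period. The hypothetical ConcurrencyLimiter below keeps a per-user counter guarded by a lock, which is enough for a single multi-threaded process; several servers would need a shared store for the counters. The slot() context manager makes sure a slot is released even if the request handler raises.

```python
import threading
from collections import defaultdict
from contextlib import contextmanager
from typing import DefaultDict, Iterator


class ConcurrencyLimiter:
    """Hypothetical per-user limit on simultaneous in-flight requests."""

    def __init__(self, max_in_flight: int) -> None:
        self.max_in_flight = max_in_flight
        self.active: DefaultDict[str, int] = defaultdict(int)  # user -> requests in flight
        self.lock = threading.Lock()

    def try_acquire(self, user: str) -> bool:
        with self.lock:
            if self.active[user] >= self.max_in_flight:
                return False
            self.active[user] += 1
            return True

    def release(self, user: str) -> None:
        with self.lock:
            self.active[user] -= 1
            if self.active[user] == 0:
                del self.active[user]  # drop idle users so the table does not grow forever

    @contextmanager
    def slot(self, user: str) -> Iterator[bool]:
        """Yield True if the request may run; an acquired slot is released when the block exits."""
        acquired = self.try_acquire(user)
        try:
            yield acquired
        finally:
            if acquired:
                self.release(user)


limiter = ConcurrencyLimiter(max_in_flight=10)
for _ in range(10):
    limiter.try_acquire("alice")      # ten requests are now in flight
print(limiter.try_acquire("alice"))  # False: the 11th simultaneous request is rejected
limiter.release("alice")             # one request completes...
print(limiter.try_acquire("alice"))  # True: ...so a new one is admitted
```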

Parotta Salna
Mastering Request Retrying in Python with Tenacity: A Developer’s Journey
7 September 2024 at 01:49

By: Mr.ParottaSalna

Meet Jafer, a talented developer (self-boast) working at a fast-growing tech company. His team is building an innovative app that fetches data from multiple third-party APIs in real time to provide users with up-to-date information.

Everything is going smoothly until one day, a spike in traffic causes their app to face a wave of “HTTP 500” and “Timeout” errors. Requests start failing left and right, and users are left staring at the dreaded “Data Unavailable” message.

Jafer realizes that he needs a way to make their app more resilient against these unpredictable network hiccups. That’s when he discovers Tenacity, a powerful Python library designed to help developers handle retries gracefully.

Join Jafer as he dives into Tenacity and learns how to turn his app from fragile to robust with just a few lines of code!

Step 0: Mock Flask API

from flask import Flask, jsonify, make_response
import random
import time

app = Flask(__name__)

# Scenario 1: Random server errors
@app.route('/random_error', methods=['GET'])
def random_error():
    if random.choice([True, False]):
        return make_response(jsonify({"error": "Server error"}), 500)  # Simulate a 500 error randomly
    return jsonify({"message": "Success"})

# Scenario 2: Timeouts
@app.route('/timeout', methods=['GET'])
def timeout():
    time.sleep(5)  # Simulate a long delay that can cause a timeout
    return jsonify({"message": "Delayed response"})

# Scenario 3: 404 Not Found error
@app.route('/not_found', methods=['GET'])
def not_found():
    return make_response(jsonify({"error": "Not found"}), 404)

# Scenario 4: Rate-limiting (simulated with a fixed chance)
@app.route('/rate_limit', methods=['GET'])
def rate_limit():
    if random.randint(1, 10) <= 3:  # 30% chance to simulate rate limiting
        return make_response(jsonify({"error": "Rate limit exceeded"}), 429)
    return jsonify({"message": "Success"})

# Scenario 5: Empty response
@app.route('/empty_response', methods=['GET'])
def empty_response():
    if random.choice([True, False]):
        return make_response("", 204)  # Simulate an empty response with 204 No Content
    return jsonify({"message": "Success"})

if __name__ == '__main__':
    app.run(host='localhost', port=5000, debug=True)

Save the code as mock_server.py and run the Flask app with the command,

python mock_server.py

Step 1: Introducing Tenacity

Jafer decides to start with the basics. He knows that Tenacity will allow him to retry failed requests without cluttering his codebase with complex loops and error handling. So, he installs the library,

pip install tenacity

With Tenacity ready, Jafer decides to tackle his first problem, retrying a request that fails due to server errors.

Step 2: Retrying on Exceptions

He writes a simple function that fetches data from an API and wraps it with Tenacity’s @retry decorator,

import requests
import logging
from tenacity import before_log, after_log
from tenacity import retry, stop_after_attempt, wait_fixed

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

@retry(stop=stop_after_attempt(3),
        wait=wait_fixed(2),
        before=before_log(logger, logging.INFO),
        after=after_log(logger, logging.INFO))
def fetch_random_error():
    response = requests.get('http://localhost:5000/random_error')
    response.raise_for_status()  # Raises an HTTPError for 4xx/5xx responses
    return response.json()
 
if __name__ == '__main__':
    try:
        data = fetch_random_error()
        print("Data fetched successfully:", data)
    except Exception as e:
        print("Failed to fetch data:", str(e))

This code will attempt the request up to 3 times, waiting 2 seconds between each try. Jafer feels confident that this will handle the occasional hiccup. However, he soon realizes that he needs more control over which exceptions trigger a retry.

Step 3: Handling Specific Exceptions

Jafer’s app sometimes receives a “404 Not Found” error, which should not be retried because the resource doesn’t exist. He modifies the retry logic to handle only certain exceptions,

import requests
import logging
from tenacity import before_log, after_log
from requests.exceptions import HTTPError, Timeout
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_fixed
 

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

@retry(stop=stop_after_attempt(3),
        wait=wait_fixed(2),
        retry=retry_if_exception_type((HTTPError, Timeout)),
        before=before_log(logger, logging.INFO),
        after=after_log(logger, logging.INFO))
def fetch_data():
    response = requests.get('http://localhost:5000/timeout', timeout=2)  # Set a short timeout to simulate failure
    response.raise_for_status()
    return response.json()

if __name__ == '__main__':
    try:
        data = fetch_data()
        print("Data fetched successfully:", data)
    except Exception as e:
        print("Failed to fetch data:", str(e))

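Now, the function retries only on HTTPError or Timeout; other exceptions fail immediately. One catch remains: raise_for_status() raises HTTPError for a 404 too, so this version still retries a “404 Not Found”. Filtering that out requires looking at the status code, not just the exception type. Still, Jafer’s app is starting to feel more resilient!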

Step 4: Implementing Exponential Backoff

A few days later, the team notices that they’re still getting rate-limited by some APIs. Jafer recalls the concept of exponential backoff, a strategy where the wait time between retries increases exponentially, reducing the load on the server and preventing further rate limiting.

He decides to implement it,

import requests
import logging
from tenacity import before_log, after_log
from tenacity import retry, stop_after_attempt, wait_exponential

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


@retry(stop=stop_after_attempt(5),
       wait=wait_exponential(multiplier=1, min=2, max=10),
       before=before_log(logger, logging.INFO),
       after=after_log(logger, logging.INFO))
def fetch_rate_limit():
    response = requests.get('http://localhost:5000/rate_limit')
    response.raise_for_status()
    return response.json()
 
if __name__ == '__main__':
    try:
        data = fetch_rate_limit()
        print("Data fetched successfully:", data)
    except Exception as e:
        print("Failed to fetch data:", str(e))

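With this code, Tenacity waits 2, 2, 4, and then 8 seconds between the five attempts: the delay grows as multiplier × 2^(attempt − 1) and is clamped to the 2 to 10 second range set by min and max, so after the first retry it doubles each time. Jafer’s app is now much less likely to be rate-limited!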
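The backoff schedule can be checked by hand. The standalone function below assumes that wait_exponential computes multiplier * 2 ** (attempt_number - 1) and clamps the result to the [min, max] range, which is our reading of recent Tenacity versions; check it against the version you install. exponential_wait is a hypothetical name, not part of Tenacity.

```python
def exponential_wait(attempt: int, multiplier: float = 1, minimum: float = 2, maximum: float = 10) -> float:
    """Assumed formula: multiplier * 2 ** (attempt - 1), clamped to [minimum, maximum]."""
    return max(minimum, min(multiplier * 2 ** (attempt - 1), maximum))


# With stop_after_attempt(5) there are four waits, one after each of attempts 1-4.
schedule = [exponential_wait(n) for n in range(1, 5)]
print(schedule)       # [2, 2, 4, 8]
print(sum(schedule))  # 16 seconds of sleeping in total, not counting request time
# A sixth attempt would be preceded by a capped wait: exponential_wait(5) is 10, not 16.
```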

Step 5: Retrying Based on Return Values

Jafer encounters another issue: some APIs occasionally return an empty response (204 No Content). These cases should also trigger a retry. Tenacity makes this easy with the retry_if_result feature,

import requests
import logging
from tenacity import before_log, after_log

from tenacity import retry, stop_after_attempt, retry_if_result

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
  

@retry(retry=retry_if_result(lambda x: x is None), stop=stop_after_attempt(3), before=before_log(logger, logging.INFO),
       after=after_log(logger, logging.INFO))
def fetch_empty_response():
    response = requests.get('http://localhost:5000/empty_response')
    if response.status_code == 204:
        return None  # Simulate an empty response
    response.raise_for_status()
    return response.json()
 
if __name__ == '__main__':
    try:
        data = fetch_empty_response()
        print("Data fetched successfully:", data)
    except Exception as e:
        print("Failed to fetch data:", str(e))

Now, the function retries when it receives an empty response, ensuring that users get the data they need.

Step 6: Combining Multiple Retry Conditions

But Jafer isn’t done yet. Some situations require combining multiple conditions. He wants to retry on HTTPError, Timeout, or a None return value. With Tenacity’s retry_any feature, he can do just that,

import requests
import logging
from tenacity import before_log, after_log

from requests.exceptions import HTTPError, Timeout
from tenacity import retry_any, retry, retry_if_exception_type, retry_if_result, stop_after_attempt
 
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

@retry(retry=retry_any(retry_if_exception_type((HTTPError, Timeout)),
                       retry_if_result(lambda x: x is None)),
       stop=stop_after_attempt(3),
       before=before_log(logger, logging.INFO),
       after=after_log(logger, logging.INFO))
def fetch_data():
    # A 2-second timeout against the 5-second /timeout endpoint raises Timeout, which triggers a retry
    response = requests.get("http://localhost:5000/timeout", timeout=2)
    if response.status_code == 204:
        return None
    response.raise_for_status()
    return response.json()

if __name__ == '__main__':
    try:
        data = fetch_data()
        print("Data fetched successfully:", data)
    except Exception as e:
        print("Failed to fetch data:", str(e))

This approach covers all his bases, making the app even more resilient!

Step 7: Logging and Tracking Retries

As the app scales, Jafer wants to keep an eye on how often retries happen and why. He decides to add logging,

import logging
import requests
from tenacity import before_log, after_log
from tenacity import retry, stop_after_attempt, wait_fixed

 
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
 
@retry(stop=stop_after_attempt(2), wait=wait_fixed(2),
       before=before_log(logger, logging.INFO),
       after=after_log(logger, logging.INFO))
def fetch_data():
    response = requests.get("http://localhost:5000/timeout", timeout=2)
    response.raise_for_status()
    return response.json()

if __name__ == '__main__':
    try:
        data = fetch_data()
        print("Data fetched successfully:", data)
    except Exception as e:
        print("Failed to fetch data:", str(e))

This logs a message before each attempt and after each failed attempt, giving Jafer full visibility into the retry process. Now, he can monitor the app’s behavior in production and quickly spot any patterns or issues.

The Happy Ending

With Tenacity, Jafer has transformed his app into a resilient powerhouse that gracefully handles intermittent failures. Users are happy, the servers are humming along smoothly, and Jafer’s team has more time to work on new features rather than firefighting network errors.

By mastering Tenacity, Jafer has learned that handling network failures gracefully can turn a fragile app into a robust and reliable one. Whether it’s dealing with flaky APIs, network blips, or rate limits, Tenacity is his go-to tool for retrying operations in Python.

So, the next time your app faces unpredictable network challenges, remember Jafer’s story and give Tenacity a try; you might just save the day!