❌

Normal view

There are new articles available, click to refresh the page.
Before yesterdayMain stream

Learning Notes #23 – Retry Pattern | Cloud Patterns

31 December 2024 at 17:34

Today, i refreshed Retry pattern. It handles transient failures ( network issues, throttling, or temporary unavailability of a service).

The Retry Pattern provides a structured approach to handle these failures gracefully, ensuring system reliability and fault tolerance. It is often used in conjunction with related patterns like the Circuit Breaker, which prevents repeated retries during prolonged failures, and the Bulkhead Pattern, which isolates system components to prevent cascading failures.

In this blog, i jot down my notes on Retry pattern for better understanding.

What is the Retry Pattern?

The Retry Pattern is a design strategy used to manage transient failures by retrying failed operations. Instead of immediately failing an operation after an error, the pattern retries it with an optional delay or backoff strategy. This is particularly useful in distributed systems where failures are often temporary.

Key Components of the Retry Pattern

  • Retry Logic: The mechanism that determines how many times to retry and under what conditions.
  • Backoff Strategy: A delay mechanism to space out retries. Common strategies include fixed, incremental, and exponential backoff.
  • Termination Policy: A limit on the number of retries or a timeout to prevent infinite retry loops.
  • Error Handling: A fallback mechanism to gracefully handle persistent failures after retries are exhausted.

Retry Pattern Strategies

1. Fixed Interval Retry

  • Retries are performed at regular intervals.
  • Example: Retry every 2 seconds for up to 5 attempts.

2. Incremental Backoff

  • Retry intervals increase linearly.
  • Example: Retry after 1, 2, 3, 4, and 5 seconds.

3. Exponential Backoff

  • Retry intervals grow exponentially, often with jitter to randomize delays.
  • Example: Retry after 1, 2, 4, 8, and 16 seconds.

4. Custom Backoff

  • Tailored to specific use cases, combining strategies or using domain-specific logic.

Implementing the Retry Pattern in Python with Tenacity

tenacity is a powerful Python library that simplifies the implementation of the Retry Pattern. It provides built-in support for various retry strategies, including fixed interval, incremental backoff, and exponential backoff with jitter.

Example with Fixed Interval Retry


from tenacity import retry, stop_after_attempt, wait_fixed

@retry(stop=stop_after_attempt(5), wait=wait_fixed(2))
def example_operation():
    print("Trying operation...")
    raise Exception("Transient error")

try:
    example_operation()
except Exception as e:
    print(f"Operation failed after retries: {e}")

Example with Exponential Backoff and Jitter


from tenacity import retry, stop_after_attempt, wait_exponential_jitter

@retry(stop=stop_after_attempt(5), wait=wwmultiplier=1, max=10))
def example_operation():
    print("Trying operation...")
    raise Exception("Transient error")

try:
    example_operation()
except Exception as e:
    print(f"Operation failed after retries: {e}")

Example with Custom Termination Policy


from tenacity import retry, stop_after_delay, wait_exponential

@retry(stop=stop_after_delay(10), wait=wait_exponential(multiplier=1))
def example_operation():
    print("Trying operation...")
    raise Exception("Transient error")

try:
    example_operation()
except Exception as e:
    print(f"Operation failed after retries: {e}")

Real-World Use Cases

  • API Rate Limiting – Retrying failed API calls when encountering HTTP 429 errors.
  • Database Operations – Retrying failed database queries due to deadlocks or transient connectivity issues.
  • File Uploads/Downloads – Retrying uploads or downloads in case of network interruptions.
  • Message Processing – Retries for message processing failures in systems like RabbitMQ or Kafka.

Mastering Request Retrying in Python with Tenacity: A Developer’s Journey

7 September 2024 at 01:49

Meet Jafer, a talented developer (self boast) working at a fast growing tech company. His team is building an innovative app that fetches data from multiple third-party APIs in realtime to provide users with up-to-date information.

Everything is going smoothly until one day, a spike in traffic causes their app to face a wave of β€œHTTP 500” and β€œTimeout” errors. Requests start failing left and right, and users are left staring at the dreaded β€œData Unavailable” message.

Jafer realizes that he needs a way to make their app more resilient against these unpredictable network hiccups. That’s when he discovers Tenacity a powerful Python library designed to help developers handle retries gracefully.

Join Jafer as he dives into Tenacity and learns how to turn his app from fragile to robust with just a few lines of code!

Step 0: Mock FLASK Api

from flask import Flask, jsonify, make_response
import random
import time

app = Flask(__name__)

# Scenario 1: Random server errors
@app.route('/random_error', methods=['GET'])
def random_error():
    if random.choice([True, False]):
        return make_response(jsonify({"error": "Server error"}), 500)  # Simulate a 500 error randomly
    return jsonify({"message": "Success"})

# Scenario 2: Timeouts
@app.route('/timeout', methods=['GET'])
def timeout():
    time.sleep(5)  # Simulate a long delay that can cause a timeout
    return jsonify({"message": "Delayed response"})

# Scenario 3: 404 Not Found error
@app.route('/not_found', methods=['GET'])
def not_found():
    return make_response(jsonify({"error": "Not found"}), 404)

# Scenario 4: Rate-limiting (simulated with a fixed chance)
@app.route('/rate_limit', methods=['GET'])
def rate_limit():
    if random.randint(1, 10) <= 3:  # 30% chance to simulate rate limiting
        return make_response(jsonify({"error": "Rate limit exceeded"}), 429)
    return jsonify({"message": "Success"})

# Scenario 5: Empty response
@app.route('/empty_response', methods=['GET'])
def empty_response():
    if random.choice([True, False]):
        return make_response("", 204)  # Simulate an empty response with 204 No Content
    return jsonify({"message": "Success"})

if __name__ == '__main__':
    app.run(host='localhost', port=5000, debug=True)

To run the Flask app, use the command,

python mock_server.py

Step 1: Introducing Tenacity

Jafer decides to start with the basics. He knows that Tenacity will allow him to retry failed requests without cluttering his codebase with complex loops and error handling. So, he installs the library,

pip install tenacity

With Tenacity ready, Jafer decides to tackle his first problem, retrying a request that fails due to server errors.

Step 2: Retrying on Exceptions

He writes a simple function that fetches data from an API and wraps it with Tenacity’s @retry decorator

import requests
import logging
from tenacity import before_log, after_log
from tenacity import retry, stop_after_attempt, wait_fixed

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

@retry(stop=stop_after_attempt(3),
        wait=wait_fixed(2),
        before=before_log(logger, logging.INFO),
        after=after_log(logger, logging.INFO))
def fetch_random_error():
    response = requests.get('http://localhost:5000/random_error')
    response.raise_for_status()  # Raises an HTTPError for 4xx/5xx responses
    return response.json()
 
if __name__ == '__main__':
    try:
        data = fetch_random_error()
        print("Data fetched successfully:", data)
    except Exception as e:
        print("Failed to fetch data:", str(e))

This code will attempt the request up to 3 times, waiting 2 seconds between each try. Jafer feels confident that this will handle the occasional hiccup. However, he soon realizes that he needs more control over which exceptions trigger a retry.

Step 3: Handling Specific Exceptions

Jafer’s app sometimes receives a β€œ404 Not Found” error, which should not be retried because the resource doesn’t exist. He modifies the retry logic to handle only certain exceptions,

import requests
import logging
from tenacity import before_log, after_log
from requests.exceptions import HTTPError, Timeout
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_fixed
 

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

@retry(stop=stop_after_attempt(3),
        wait=wait_fixed(2),
        retry=retry_if_exception_type((HTTPError, Timeout)),
        before=before_log(logger, logging.INFO),
        after=after_log(logger, logging.INFO))
def fetch_data():
    response = requests.get('http://localhost:5000/timeout', timeout=2)  # Set a short timeout to simulate failure
    response.raise_for_status()
    return response.json()

if __name__ == '__main__':
    try:
        data = fetch_data()
        print("Data fetched successfully:", data)
    except Exception as e:
        print("Failed to fetch data:", str(e))

Now, the function retries only on HTTPError or Timeout, avoiding unnecessary retries for a β€œ404” error. Jafer’s app is starting to feel more resilient!

Step 4: Implementing Exponential Backoff

A few days later, the team notices that they’re still getting rate-limited by some APIs. Jafer recalls the concept of exponential backoff a strategy where the wait time between retries increases exponentially, reducing the load on the server and preventing further rate limiting.

He decides to implement it,

import requests
import logging
from tenacity import before_log, after_log
from tenacity import retry, stop_after_attempt, wait_exponential

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


@retry(stop=stop_after_attempt(5),
       wait=wait_exponential(multiplier=1, min=2, max=10),
       before=before_log(logger, logging.INFO),
       after=after_log(logger, logging.INFO))
def fetch_rate_limit():
    response = requests.get('http://localhost:5000/rate_limit')
    response.raise_for_status()
    return response.json()
 
if __name__ == '__main__':
    try:
        data = fetch_rate_limit()
        print("Data fetched successfully:", data)
    except Exception as e:
        print("Failed to fetch data:", str(e))

With this code, the wait time starts at 2 seconds and doubles with each retry, up to a maximum of 10 seconds. Jafer’s app is now much less likely to be rate-limited!

Step 5: Retrying Based on Return Values

Jafer encounters another issue: some APIs occasionally return an empty response (204 No Content). These cases should also trigger a retry. Tenacity makes this easy with the retry_if_result feature,

import requests
import logging
from tenacity import before_log, after_log

from tenacity import retry, stop_after_attempt, retry_if_result

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
  

@retry(retry=retry_if_result(lambda x: x is None), stop=stop_after_attempt(3), before=before_log(logger, logging.INFO),
       after=after_log(logger, logging.INFO))
def fetch_empty_response():
    response = requests.get('http://localhost:5000/empty_response')
    if response.status_code == 204:
        return None  # Simulate an empty response
    response.raise_for_status()
    return response.json()
 
if __name__ == '__main__':
    try:
        data = fetch_empty_response()
        print("Data fetched successfully:", data)
    except Exception as e:
        print("Failed to fetch data:", str(e))

Now, the function retries when it receives an empty response, ensuring that users get the data they need.

Step 6: Combining Multiple Retry Conditions

But Jafer isn’t done yet. Some situations require combining multiple conditions. He wants to retry on HTTPError, Timeout, or a None return value. With Tenacity’s retry_any feature, he can do just that,

import requests
import logging
from tenacity import before_log, after_log

from requests.exceptions import HTTPError, Timeout
from tenacity import retry_any, retry, retry_if_exception_type, retry_if_result, stop_after_attempt
 
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

@retry(retry=retry_any(retry_if_exception_type((HTTPError, Timeout)), retry_if_result(lambda x: x is None)), stop=stop_after_attempt(3), before=before_log(logger, logging.INFO),
       after=after_log(logger, logging.INFO))
def fetch_data():
    response = requests.get("http://localhost:5000/timeout")
    if response.status_code == 204:
        return None
    response.raise_for_status()
    return response.json()

if __name__ == '__main__':
    try:
        data = fetch_data()
        print("Data fetched successfully:", data)
    except Exception as e:
        print("Failed to fetch data:", str(e))

This approach covers all his bases, making the app even more resilient!

Step 7: Logging and Tracking Retries

As the app scales, Jafer wants to keep an eye on how often retries happen and why. He decides to add logging,

import logging
import requests
from tenacity import before_log, after_log
from tenacity import retry, stop_after_attempt, wait_fixed

 
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
 
@retry(stop=stop_after_attempt(2), wait=wait_fixed(2),
       before=before_log(logger, logging.INFO),
       after=after_log(logger, logging.INFO))
def fetch_data():
    response = requests.get("http://localhost:5000/timeout", timeout=2)
    response.raise_for_status()
    return response.json()

if __name__ == '__main__':
    try:
        data = fetch_data()
        print("Data fetched successfully:", data)
    except Exception as e:
        print("Failed to fetch data:", str(e))

This logs messages before and after each retry attempt, giving Jafer full visibility into the retry process. Now, he can monitor the app’s behavior in production and quickly spot any patterns or issues.

The Happy Ending

With Tenacity, Jafer has transformed his app into a resilient powerhouse that gracefully handles intermittent failures. Users are happy, the servers are humming along smoothly, and Jafer’s team has more time to work on new features rather than firefighting network errors.

By mastering Tenacity, Jafer has learned that handling network failures gracefully can turn a fragile app into a robust and reliable one. Whether it’s dealing with flaky APIs, network blips, or rate limits, Tenacity is his go-to tool for retrying operations in Python.

So, the next time your app faces unpredictable network challenges, remember Jafer’s story and give Tenacity a try you might just save the day!

❌
❌