Mastering Request Retrying in Python with Tenacity: A Developer’s Journey

7 September 2024 at 01:49

Meet Jafer, a talented developer (self boast) working at a fast-growing tech company. His team is building an innovative app that fetches data from multiple third-party APIs in real time to provide users with up-to-date information.

Everything is going smoothly until one day, a spike in traffic causes their app to face a wave of “HTTP 500” and “Timeout” errors. Requests start failing left and right, and users are left staring at the dreaded “Data Unavailable” message.

Jafer realizes that he needs a way to make their app more resilient against these unpredictable network hiccups. That’s when he discovers Tenacity, a powerful Python library designed to help developers handle retries gracefully.

Join Jafer as he dives into Tenacity and learns how to turn his app from fragile to robust with just a few lines of code!

Step 0: Mock Flask API

from flask import Flask, jsonify, make_response
import random
import time

app = Flask(__name__)

# Scenario 1: Random server errors
@app.route('/random_error', methods=['GET'])
def random_error():
    if random.choice([True, False]):
        return make_response(jsonify({"error": "Server error"}), 500)  # Simulate a 500 error randomly
    return jsonify({"message": "Success"})

# Scenario 2: Timeouts
@app.route('/timeout', methods=['GET'])
def timeout():
    time.sleep(5)  # Simulate a long delay that can cause a timeout
    return jsonify({"message": "Delayed response"})

# Scenario 3: 404 Not Found error
@app.route('/not_found', methods=['GET'])
def not_found():
    return make_response(jsonify({"error": "Not found"}), 404)

# Scenario 4: Rate-limiting (simulated with a fixed chance)
@app.route('/rate_limit', methods=['GET'])
def rate_limit():
    if random.randint(1, 10) <= 3:  # 30% chance to simulate rate limiting
        return make_response(jsonify({"error": "Rate limit exceeded"}), 429)
    return jsonify({"message": "Success"})

# Scenario 5: Empty response
@app.route('/empty_response', methods=['GET'])
def empty_response():
    if random.choice([True, False]):
        return make_response("", 204)  # Simulate an empty response with 204 No Content
    return jsonify({"message": "Success"})

if __name__ == '__main__':
    app.run(host='localhost', port=5000, debug=True)

To run the Flask app, save the code to a file (app.py is used here only as an example name) and run:

python app.py

Step 1: Introducing Tenacity

Jafer decides to start with the basics. He knows that Tenacity will allow him to retry failed requests without cluttering his codebase with complex loops and error handling. So, he installs the library:

pip install tenacity

With Tenacity ready, Jafer decides to tackle his first problem: retrying a request that fails due to server errors.

Step 2: Retrying on Exceptions

He writes a simple function that fetches data from an API and wraps it with Tenacity’s @retry decorator:

import requests
import logging
from tenacity import before_log, after_log
from tenacity import retry, stop_after_attempt, wait_fixed

logger = logging.getLogger(__name__)

@retry(stop=stop_after_attempt(3),   # Give up after 3 attempts
       wait=wait_fixed(2),           # Wait 2 seconds between attempts
       before=before_log(logger, logging.INFO),
       after=after_log(logger, logging.INFO))
def fetch_random_error():
    response = requests.get('http://localhost:5000/random_error')
    response.raise_for_status()  # Raises an HTTPError for 4xx/5xx responses
    return response.json()

if __name__ == '__main__':
    try:
        data = fetch_random_error()
        print("Data fetched successfully:", data)
    except Exception as e:
        print("Failed to fetch data:", str(e))

This code will attempt the request up to 3 times, waiting 2 seconds between each try. Jafer feels confident that this will handle the occasional hiccup. However, he soon realizes that he needs more control over which exceptions trigger a retry.
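To make the decorator's behaviour concrete, here is a minimal hand-written sketch of roughly what "up to 3 attempts, 2 seconds apart" amounts to. It is not Tenacity's implementation: the helper name call_with_retries and the flaky function are made up for illustration, and the sketch re-raises the last error when attempts run out, whereas Tenacity by default raises its own RetryError unless reraise=True is passed.

```python
import time


def call_with_retries(func, max_attempts=3, wait_seconds=2.0):
    """Hypothetical helper: call func until it succeeds or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: let the last error propagate
            time.sleep(wait_seconds)  # fixed pause before the next attempt


calls = {"count": 0}


def flaky():
    """Made-up function that fails twice and then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated server error")
    return {"message": "Success"}


# wait_seconds=0 keeps the demo fast; the article's setting would be 2.
print(call_with_retries(flaky, wait_seconds=0), "after", calls["count"], "calls")
```

Writing this loop by hand around every request is exactly the clutter the decorator removes.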

Step 3: Handling Specific Exceptions

Jafer’s app sometimes receives a “404 Not Found” error, which should not be retried because the resource doesn’t exist. He modifies the retry logic to handle only certain exceptions:

import requests
import logging
from tenacity import before_log, after_log
from requests.exceptions import HTTPError, Timeout
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_fixed

logger = logging.getLogger(__name__)

# The attempt limit and wait time below are illustrative values
@retry(stop=stop_after_attempt(3),
       wait=wait_fixed(2),
       retry=retry_if_exception_type((HTTPError, Timeout)),
       before=before_log(logger, logging.INFO),
       after=after_log(logger, logging.INFO))
def fetch_data():
    response = requests.get('http://localhost:5000/timeout', timeout=2)  # Set a short timeout to simulate failure
    response.raise_for_status()  # Turn 4xx/5xx responses into HTTPError
    return response.json()

if __name__ == '__main__':
    try:
        data = fetch_data()
        print("Data fetched successfully:", data)
    except Exception as e:
        print("Failed to fetch data:", str(e))

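Now the function retries only when an HTTPError or Timeout is raised; any other exception propagates immediately. One caveat: raise_for_status() raises HTTPError for a 404 as well, so filtering by exception type alone does not skip 404 responses. Skipping them requires a check on the status code itself. Even so, Jafer’s app is starting to feel more resilient!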
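One way to keep 404s out of the retry loop is to decide per status code rather than per exception type. The sketch below is a standalone illustration: FakeHTTPError is a made-up stand-in so the example runs without a server, and is_retryable is a hypothetical name. With requests, the status code would be read from exc.response.status_code instead, and a predicate like this could be handed to Tenacity's retry_if_exception.

```python
class FakeHTTPError(Exception):
    """Made-up stand-in for an HTTP error that carries only a status code."""

    def __init__(self, status_code):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


def is_retryable(exc):
    """Return True for timeouts and for 5xx/429 responses, False otherwise."""
    if isinstance(exc, TimeoutError):
        return True
    status = getattr(exc, "status_code", None)
    return status is not None and (status >= 500 or status == 429)


print(is_retryable(FakeHTTPError(500)))  # True: server error, worth retrying
print(is_retryable(FakeHTTPError(429)))  # True: rate limited, retry later
print(is_retryable(FakeHTTPError(404)))  # False: the resource does not exist
```

The choice of which codes count as retryable (here 5xx and 429) is an assumption for this sketch; each API may call for a different set.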

Step 4: Implementing Exponential Backoff

A few days later, the team notices that they’re still getting rate-limited by some APIs. Jafer recalls the concept of exponential backoff, a strategy where the wait time between retries increases exponentially, reducing the load on the server and preventing further rate limiting.

He decides to implement it:

import requests
import logging
from tenacity import before_log, after_log
from tenacity import retry, stop_after_attempt, wait_exponential

logger = logging.getLogger(__name__)

# The attempt limit below is an illustrative value
@retry(stop=stop_after_attempt(5),
       wait=wait_exponential(multiplier=1, min=2, max=10),
       before=before_log(logger, logging.INFO),
       after=after_log(logger, logging.INFO))
def fetch_rate_limit():
    response = requests.get('http://localhost:5000/rate_limit')
    response.raise_for_status()  # A 429 raises HTTPError, which triggers the retry
    return response.json()

if __name__ == '__main__':
    try:
        data = fetch_rate_limit()
        print("Data fetched successfully:", data)
    except Exception as e:
        print("Failed to fetch data:", str(e))

With this code, the wait time grows exponentially with each retry, never dropping below 2 seconds or exceeding 10 seconds. Jafer’s app is now much less likely to be rate-limited!
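To see what such a schedule looks like, here is a small sketch that computes the waits by hand. It assumes the wait after attempt n is multiplier * 2^(n-1), clamped to the min/max bounds, which matches my reading of recent Tenacity releases but may differ in older versions. The function name exponential_waits is made up for illustration.

```python
def exponential_waits(attempts, multiplier=1, minimum=2, maximum=10, base=2):
    """Waits (in seconds) after attempts 1..attempts, clamped to [minimum, maximum]."""
    waits = []
    for attempt in range(1, attempts + 1):
        raw = multiplier * base ** (attempt - 1)  # 1, 2, 4, 8, 16, ...
        waits.append(max(minimum, min(raw, maximum)))
    return waits


print(exponential_waits(5))  # [2, 2, 4, 8, 10]
```

Under this assumption the first two waits are both 2 seconds, because the raw value of 1 second is raised to the minimum, and the fifth wait is capped at 10.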

Step 5: Retrying Based on Return Values

Jafer encounters another issue: some APIs occasionally return an empty response (204 No Content). These cases should also trigger a retry. Tenacity makes this easy with retry_if_result:

import requests
import logging
from tenacity import before_log, after_log

from tenacity import retry, stop_after_attempt, retry_if_result

logger = logging.getLogger(__name__)

@retry(retry=retry_if_result(lambda x: x is None),  # Retry whenever the function returns None
       stop=stop_after_attempt(3),
       before=before_log(logger, logging.INFO),
       after=after_log(logger, logging.INFO))
def fetch_empty_response():
    response = requests.get('http://localhost:5000/empty_response')
    if response.status_code == 204:
        return None  # Treat 204 No Content as an empty result
    return response.json()

if __name__ == '__main__':
    try:
        data = fetch_empty_response()
        print("Data fetched successfully:", data)
    except Exception as e:
        print("Failed to fetch data:", str(e))

Now, the function retries when it receives an empty response, ensuring that users get the data they need.

Step 6: Combining Multiple Retry Conditions

But Jafer isn’t done yet. Some situations require combining multiple conditions. He wants to retry on HTTPError, Timeout, or a None return value. With Tenacity’s retry_any, he can do just that:

import requests
import logging
from tenacity import before_log, after_log

from requests.exceptions import HTTPError, Timeout
from tenacity import retry_any, retry, retry_if_exception_type, retry_if_result, stop_after_attempt
logger = logging.getLogger(__name__)

@retry(retry=retry_any(retry_if_exception_type((HTTPError, Timeout)),
                       retry_if_result(lambda x: x is None)),
       stop=stop_after_attempt(3),
       before=before_log(logger, logging.INFO),
       after=after_log(logger, logging.INFO))
def fetch_data():
    # Without a timeout argument, requests never raises Timeout
    response = requests.get("http://localhost:5000/timeout", timeout=2)
    if response.status_code == 204:
        return None
    response.raise_for_status()  # Without this call, HTTPError is never raised
    return response.json()

if __name__ == '__main__':
    try:
        data = fetch_data()
        print("Data fetched successfully:", data)
    except Exception as e:
        print("Failed to fetch data:", str(e))

This approach covers all his bases, making the app even more resilient!
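The idea behind combining conditions can be sketched without any library: each condition looks at the outcome of an attempt (a result or an error) and the combined condition fires if any one of them does. The names below (retry_on_exception_type, retry_on_result, any_of) are hypothetical and only mirror the concept; they are not Tenacity's internals.

```python
def retry_on_exception_type(*types):
    """Predicate: retry when the attempt raised one of the given exception types."""
    return lambda result, error: error is not None and isinstance(error, types)


def retry_on_result(check):
    """Predicate: retry when the attempt returned a value that check() flags."""
    return lambda result, error: error is None and check(result)


def any_of(*predicates):
    """Combine predicates: retry if at least one of them says so."""
    return lambda result, error: any(p(result, error) for p in predicates)


should_retry = any_of(
    retry_on_exception_type(TimeoutError, ConnectionError),
    retry_on_result(lambda value: value is None),
)

print(should_retry(None, None))               # True: empty result
print(should_retry(None, TimeoutError()))     # True: retryable exception
print(should_retry({"message": "ok"}, None))  # False: good result
print(should_retry(None, ValueError()))       # False: unexpected exception
```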

Step 7: Logging and Tracking Retries

As the app scales, Jafer wants to keep an eye on how often retries happen and why. He decides to add logging:

import logging
import requests
from tenacity import before_log, after_log
from tenacity import retry, stop_after_attempt, wait_fixed

# Without a handler at INFO level, the before/after messages are not shown
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

@retry(stop=stop_after_attempt(2), wait=wait_fixed(2),
       before=before_log(logger, logging.INFO),
       after=after_log(logger, logging.INFO))
def fetch_data():
    response = requests.get("http://localhost:5000/timeout", timeout=2)
    return response.json()

if __name__ == '__main__':
    try:
        data = fetch_data()
        print("Data fetched successfully:", data)
    except Exception as e:
        print("Failed to fetch data:", str(e))

This logs a message before and after each attempt, giving Jafer visibility into the retry process. Now, he can monitor the app’s behavior in production and quickly spot any patterns or issues.

The Happy Ending

With Tenacity, Jafer has transformed his app into a resilient powerhouse that gracefully handles intermittent failures. Users are happy, the servers are humming along smoothly, and Jafer’s team has more time to work on new features rather than firefighting network errors.

By mastering Tenacity, Jafer has learned that handling network failures gracefully can turn a fragile app into a robust and reliable one. Whether it’s dealing with flaky APIs, network blips, or rate limits, Tenacity is his go-to tool for retrying operations in Python.

So, the next time your app faces unpredictable network challenges, remember Jafer’s story and give Tenacity a try. You might just save the day!

ntfy – To save you from un-noticed events

4 August 2024 at 05:19

Alex Pandian was the system administrator for a tech company, responsible for managing servers, maintaining network stability, and ensuring that everything ran smoothly.

With many scripts running daily and long-running processes that needed monitoring, Alex was constantly flooded with notifications.

Alex Pandian: “Every day, I have to go through dozens of emails and alerts just to find the ones that matter,”

Alex muttered while sipping coffee in the server room.

Alex Pandian: “There must be a better way to streamline all this information.”

Despite using several monitoring tools, the notifications from these systems were scattered and overwhelming. Alex needed a more efficient method to receive alerts only when crucial events occurred, such as script failures or the completion of resource-intensive tasks.

Determined to find a better system, Alex began searching online for a tool that could help consolidate and manage notifications.

After reading through countless forums and reviews, Alex stumbled upon a discussion about ntfy, a service praised for its simplicity and flexibility.

“This looks promising,” Alex thought, excited by the ability to publish and subscribe to notifications using a straightforward, topic-based system. The idea of having notifications sent directly to a phone or desktop without needing complex configurations was exactly what Alex was looking for.

Alex decided to consult with Sam, a fellow system admin known for their expertise in automation and monitoring.

Alex Pandian: “Hey Sam, have you ever used ntfy?”

Sam: “Absolutely. It’s a lifesaver for managing notifications. How do you plan to use it?”

Alex Pandian: “I’m thinking of using it for real-time alerts on script failures and long-running commands. Can you show me how it works?”

Sam: “Of course,” Sam replied with a smile, eager to guide Alex through setting up ntfy to improve workflow efficiency.

Together, Sam and Alex began configuring ntfy for Alex’s environment. They focused on setting up topics and integrating them with existing systems to ensure that important notifications were delivered promptly.

Step 1: Identifying Key Topics

Alex identified the main areas where notifications were needed:

  • script-failures: To receive alerts whenever a script failed.
  • command-completions: To notify when long-running commands finished.
  • server-health: For critical server health alerts.

Step 2: Subscribing to Topics

Sam showed Alex how to subscribe to these topics using ntfy on a mobile device and desktop. This ensured that Alex would receive notifications wherever they were, without having to constantly check email or dashboards.

# Subscribe to topics
ntfy subscribe script-failures
ntfy subscribe command-completions
ntfy subscribe server-health

Step 3: Automating Notifications

Sam explained how to use bash scripts and curl to send notifications to ntfy whenever specific events occurred.

“For example, if a script fails, you can automatically send an alert to the ‘script-failures’ topic,” Sam demonstrated.

# Notify on script failure (backup-script.sh is a placeholder script name)
./backup-script.sh || curl -d "Backup script failed!" ntfy.sh/script-failures

Alex was impressed by the simplicity and efficiency of this approach. “I can automate all of this?” Alex asked.

“Definitely,” Sam replied. “You can integrate it with cron jobs, monitoring tools, and more. It’s a great way to keep track of important events without getting bogged down by noise.”
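The same publish step can be done from a script without curl. The sketch below assumes the public ntfy.sh server and the Title and Priority headers from ntfy's publishing API; the helper name build_ntfy_request is made up for illustration. It only builds the request, so it runs without network access.

```python
import urllib.request


def build_ntfy_request(topic, message, base_url="https://ntfy.sh", title=None, priority=None):
    """Build (but do not send) a POST request that publishes message to a topic."""
    headers = {}
    if title:
        headers["Title"] = title
    if priority:
        headers["Priority"] = priority
    return urllib.request.Request(
        f"{base_url}/{topic}",
        data=message.encode("utf-8"),
        headers=headers,
        method="POST",
    )


request = build_ntfy_request("script-failures", "Backup script failed!", priority="high")
print(request.full_url)      # https://ntfy.sh/script-failures
print(request.get_method())  # POST
# To actually send it: urllib.request.urlopen(request, timeout=10)
```

Keeping the build step separate from the send step makes the helper easy to test and lets a self-hosted server be swapped in through base_url.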

With the basics in place, Alex began applying ntfy to various real-world scenarios, streamlining the notification process and improving overall efficiency.

Monitoring Script Failures

Alex set up automated alerts for critical scripts that ran daily, ensuring that any failures were immediately reported. This allowed Alex to address issues quickly, minimizing downtime and improving system reliability.

# Notify on critical script failure (critical-task.sh is a placeholder script name)
./critical-task.sh || curl -d "Critical task script failed!" ntfy.sh/script-failures

Tracking Long-Running Commands

Whenever Alex initiated a long-running command, such as a server backup or data migration, notifications were sent upon completion. This enabled Alex to focus on other tasks without constantly checking on progress.

# Notify on long-running command completion
long-command && curl -d "Long command completed successfully." ntfy.sh/command-completions

Server Health Alerts

To monitor server health, Alex integrated ntfy with existing monitoring tools, ensuring that any critical issues were immediately flagged.

# Send server health alert
curl -d "Server CPU usage is critically high!" ntfy.sh/server-health

As with any new tool, there were challenges to overcome. Alex encountered a few hurdles, but with Sam’s guidance, these were quickly resolved.

Challenge: Managing Multiple Notifications

Initially, Alex found it challenging to manage multiple notifications and ensure that only critical alerts were prioritized. Sam suggested using filters and priorities to focus on the most important messages.

# Subscribe with filters for high-priority alerts
ntfy subscribe script-failures --priority=high
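Filtering can also be done on the receiving side. The sketch below assumes messages arrive as one JSON object per line, as in ntfy's JSON stream, with an optional numeric priority field where 1 is min, 3 is default, and 5 is max. The function name high_priority_only and the sample lines are made up for illustration.

```python
import json

PRIORITY_LEVELS = {"min": 1, "low": 2, "default": 3, "high": 4, "max": 5}


def high_priority_only(json_lines, threshold="high"):
    """Yield messages whose priority is at or above the threshold."""
    minimum = PRIORITY_LEVELS[threshold]
    for line in json_lines:
        event = json.loads(line)
        if event.get("event") != "message":
            continue  # skip non-message events such as "open"
        # Assumption: a missing priority field means the default level (3).
        if event.get("priority", PRIORITY_LEVELS["default"]) >= minimum:
            yield event


sample = [
    '{"event": "open", "topic": "script-failures"}',
    '{"event": "message", "topic": "script-failures", "message": "nightly job ok"}',
    '{"event": "message", "topic": "script-failures", "message": "backup failed", "priority": 5}',
]
for event in high_priority_only(sample):
    print(event["message"])  # backup failed
```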

Challenge: Scheduling Notifications

Alex wanted to schedule notifications for regular maintenance tasks and reminders. Sam introduced Alex to the at command for scheduling one-off automated alerts; cron covers the recurring ones.

# Schedule notification for regular maintenance (your-topic is a placeholder topic name)
echo 'curl -d "Time for weekly server maintenance." ntfy.sh/your-topic' | at 8:00 AM next Saturday

Sam gave Alex some more examples:

Monitoring disk space

As a system administrator, you can use ntfy to receive alerts when disk space usage reaches a critical level. This helps prevent issues related to insufficient disk space.

# Check disk space and notify if usage is over 80%
# (your-topic is a placeholder topic name)
disk_usage=$(df / | grep / | awk '{ print $5 }' | sed 's/%//g')
if [ "$disk_usage" -gt 80 ]; then
  curl -d "Warning: Disk space usage is at ${disk_usage}%." ntfy.sh/your-topic
fi
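The same check can be written with Python's standard library, which avoids parsing df output. This is a sketch under assumptions: the function names are made up, the 80% threshold is taken from the example above, and the percentage may differ slightly from df, which accounts for reserved blocks.

```python
import shutil


def disk_usage_percent(path="/"):
    """Return used space on the filesystem holding path, as a whole percentage."""
    usage = shutil.disk_usage(path)
    return round(usage.used / usage.total * 100)


def disk_alert(percent, threshold=80):
    """Return a warning message when percent exceeds threshold, else None."""
    if percent > threshold:
        return f"Warning: Disk space usage is at {percent}%."
    return None


print(disk_alert(91))  # Warning: Disk space usage is at 91%.
print(disk_alert(42))  # None
print(disk_alert(disk_usage_percent(".")))  # depends on the machine it runs on
```

The returned message is what would be published to the topic; sending it is left out so the sketch runs offline.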

Alerting on Website Downtime

You can use ntfy to monitor the status of a website and receive notifications if it goes down.

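# Check website status and notify if it's down
# (the site URL and your-topic are placeholders)
website="https://example.com"
status_code=$(curl -o /dev/null -s -w "%{http_code}\n" "$website")

if [ "$status_code" -ne 200 ]; then
  curl -d "Alert: $website is down! Status code: $status_code." ntfy.sh/your-topic
fi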

Reminding for Daily Tasks

You can set up ntfy to send you daily reminders for important tasks, ensuring that you stay on top of your schedule.

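# Schedule reminders (at runs each job once; use cron for a true daily repeat)
# your-topic is a placeholder topic name
echo 'curl -d "Time to review your daily tasks!" ntfy.sh/your-topic' | at 9:00 AM
echo 'curl -d "Stand-up meeting at 10:00 AM." ntfy.sh/your-topic' | at 9:50 AM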

Alerting on High System Load

Monitor system load and receive notifications when it exceeds a certain threshold, allowing you to take action before it impacts performance.

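# Check system load and notify if it's high
# (the threshold value and your-topic are placeholders; assumes Linux uptime output)
threshold=2.0
load=$(uptime | awk -F'load average: ' '{ print $2 }' | cut -d, -f1)

if (( $(echo "$load > $threshold" | bc -l) )); then
  curl -d "Warning: System load is high: $load" ntfy.sh/your-topic
fi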
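Parsing uptime output is fragile because the field layout changes with how long the machine has been up. As a hedged alternative, the load average can be read directly with os.getloadavg(), which is available on Unix-like systems. The function name load_alert and the threshold of 4.0 are assumptions made for this sketch.

```python
import os


def load_alert(load, threshold):
    """Return a warning when the 1-minute load exceeds threshold, else None."""
    if load > threshold:
        return f"Warning: System load is high: {load:.2f}"
    return None


print(load_alert(6.5, threshold=4.0))  # Warning: System load is high: 6.50

if hasattr(os, "getloadavg"):  # not available on Windows
    try:
        one_minute, _, _ = os.getloadavg()
        print(load_alert(one_minute, threshold=4.0))
    except OSError:
        print("Load average is not obtainable on this system")
```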

Notify on Backup Completion

Receive a notification when a backup process completes, allowing you to verify its success.

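# Notify on backup completion
# (backup_command holds your backup command; your-topic is a placeholder topic name)
$backup_command && curl -d "Backup completed successfully." ntfy.sh/your-topic || curl -d "Backup failed!" ntfy.sh/your-topic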

Notifying on Container Events with Docker

Integrate ntfy with Docker to send alerts for specific container events, such as when a container stops unexpectedly.

# Notify on Docker container stop event
# (the container name and your-topic are placeholders)
container_name="my-container"
container_status=$(docker inspect -f '{{.State.Status}}' "$container_name")

if [ "$container_status" != "running" ]; then
  curl -d "Alert: Docker container $container_name has stopped." ntfy.sh/your-topic
fi

Integrating with CI/CD Pipelines

Use ntfy to notify you about the status of CI/CD pipeline stages, ensuring you stay informed about build successes or failures.

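# Example GitLab CI/CD YAML snippet
# (build_job and your-topic are placeholder names)
stages:
  - build

build_job:
  stage: build
  script:
    - make build
  # CI_JOB_STATUS is meant for use in after_script, which also runs when the build fails
  after_script:
    - |
      if [ "$CI_JOB_STATUS" = "success" ]; then
        curl -d "Build succeeded for commit $CI_COMMIT_SHORT_SHA." ntfy.sh/your-topic
      else
        curl -d "Build failed for commit $CI_COMMIT_SHORT_SHA." ntfy.sh/your-topic
      fi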

Notification on ssh login to server

Let’s try it with Docker:

FROM ubuntu:16.04
RUN apt-get update && apt-get install -y openssh-server
RUN mkdir /var/run/sshd
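# Set root password for SSH access (change 'password' to your desired password)
RUN echo 'root:password' | chpasswd
RUN sed -i 's/PermitRootLogin prohibit-password/PermitRootLogin yes/' /etc/ssh/sshd_config
RUN sed 's@session\s*required\s*pam_loginuid.so@session optional pam_loginuid.so@g' -i /etc/pam.d/sshd
# ntfy-ssh-login.sh is a placeholder name for the notification script
COPY ntfy-ssh-login.sh /usr/bin/ntfy-ssh-login.sh
RUN chmod +x /usr/bin/ntfy-ssh-login.sh
RUN echo "session optional pam_exec.so /usr/bin/ntfy-ssh-login.sh" >> /etc/pam.d/sshd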
RUN apt-get -y update; apt-get -y install curl
CMD ["/usr/sbin/sshd", "-D"]

Script to send the notification:

#!/bin/bash
# your-topic is a placeholder topic name
if [ "${PAM_TYPE}" = "open_session" ]; then
  echo "here"
  curl \
    -H prio:high \
    -H tags:warning \
    -d "SSH login: ${PAM_USER} from ${PAM_RHOST}" \
    ntfy.sh/your-topic
fi
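The decision made by the login script can be sketched in Python as well. pam_exec passes details such as PAM_TYPE, PAM_USER, and PAM_RHOST through environment variables; the function name ssh_login_message and the sample values are made up for illustration. Passing the environment in as a plain mapping keeps the logic testable outside of PAM.

```python
def ssh_login_message(env):
    """Build the alert text for a pam_exec session-open event, else None."""
    if env.get("PAM_TYPE") != "open_session":
        return None  # ignore close_session and other PAM phases
    user = env.get("PAM_USER", "unknown")
    host = env.get("PAM_RHOST", "unknown")
    return f"SSH login: {user} from {host}"


print(ssh_login_message({"PAM_TYPE": "open_session", "PAM_USER": "root", "PAM_RHOST": "10.0.0.5"}))
# SSH login: root from 10.0.0.5
print(ssh_login_message({"PAM_TYPE": "close_session", "PAM_USER": "root"}))
# None
```

In a real hook, os.environ would be passed in and the returned text published to the topic.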

With ntfy as an integral part of daily operations, Alex found a renewed sense of balance and control. The once overwhelming chaos of notifications was now a manageable stream of valuable information.

As Alex reflected on the journey, it was clear that ntfy had transformed not just the way notifications were managed, but also the overall approach to system administration.

In a world full of noise, ntfy had provided a clear and effective way to stay informed without distractions. For Alex, it was more than just a tool; it was a new way of managing systems efficiently.
