
Learning Notes #8 – SLI, SLA, SLO

25 December 2024 at 16:11

In this blog, I write about SLIs, SLAs, and SLOs. I got a refresher on the topic from a podcast episode, https://open.spotify.com/episode/2Ags7x1WrxaFLRd3KBU50K?si=vbYtW_YVQpOi8HwT9AOM1g, and this post is based on it.

In the world of service reliability and performance, the terms SLO, SLA, and SLI are often used interchangeably but have distinct meanings. This blog explains these terms in detail, their importance, and how they relate to each other with practical examples.

1. What are SLIs, SLOs, and SLAs?

Service Level Indicators (SLIs)

An SLI is a metric that quantifies the level of service provided by a system. It measures specific aspects of performance or reliability, such as response time, uptime, or error rate.

Example:

  • Percentage of successful HTTP requests over a time window.
  • Average latency of API responses.
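To make the two example SLIs concrete, here is a minimal sketch that computes them from a list of request records. The `Request` class, the sample data, and the choice to count only 5xx responses as failures are assumptions made for illustration; a real service would pull these numbers from its monitoring system.

```python
from dataclasses import dataclass


@dataclass
class Request:
    status: int        # HTTP status code returned to the client
    latency_ms: float  # time taken to serve the request


def success_rate(requests):
    """SLI: percentage of requests that did not fail with a server error."""
    if not requests:
        return 100.0
    # Assumption: only 5xx responses count as failures.
    ok = sum(1 for r in requests if r.status < 500)
    return 100.0 * ok / len(requests)


def average_latency_ms(requests):
    """SLI: mean latency over the time window."""
    if not requests:
        return 0.0
    return sum(r.latency_ms for r in requests) / len(requests)


# One (tiny) time window of traffic.
window = [
    Request(200, 120.0),
    Request(200, 80.0),
    Request(500, 400.0),
    Request(404, 40.0),
]
print(success_rate(window))        # 75.0
print(average_latency_ms(window))  # 160.0
```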

Service Level Objectives (SLOs)

An SLO is a target value or range for an SLI. It defines what “acceptable” performance or reliability looks like from the perspective of the service provider or user.

Example:

  • “99.9% of HTTP requests must succeed within 500ms.”
  • “The application should have 99.95% uptime per quarter.”
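An uptime SLO is easier to reason about once it is turned into a downtime budget. The sketch below does that arithmetic for the 99.95% per quarter example. The function name is made up for this post, and treating a quarter as 90 days is an assumption.

```python
def allowed_downtime_minutes(uptime_target_percent, period_days):
    """Minutes of downtime a period can absorb before the uptime SLO is missed."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (100 - uptime_target_percent) / 100


# Assumption: a quarter is treated as 90 days.
budget = allowed_downtime_minutes(99.95, 90)
print(f"{budget:.1f} minutes")  # 64.8 minutes
```

So a 99.95% quarterly target leaves roughly an hour of downtime for the whole quarter.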

Service Level Agreements (SLAs)

An SLA is a formal contract between a service provider and a customer that specifies the agreed-upon SLOs and the consequences of failing to meet them, such as penalties or compensations.

Example:

  • “If the uptime drops below 99.5% in a calendar month, the customer will receive a 10% credit on their monthly bill.”
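The SLA clause above is essentially a rule that maps a measured value to a consequence. A rough sketch of that rule follows; the function name and the sample bill of 200.0 are assumptions, while the 99.5% threshold and 10% credit come from the example.

```python
def monthly_credit(uptime_percent, monthly_bill,
                   threshold_percent=99.5, credit_percent=10):
    """Credit owed to the customer for one calendar month."""
    if uptime_percent < threshold_percent:
        return monthly_bill * credit_percent / 100
    return 0.0


print(monthly_credit(99.2, 200.0))  # 20.0
print(monthly_credit(99.7, 200.0))  # 0.0
```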

2. Relationship Between SLIs, SLOs, and SLAs

  • SLIs are the metrics measured.
  • SLOs are the goals or benchmarks derived from SLIs.
  • SLAs are agreements that formalize SLOs and include penalties or incentives.

SLI: Latency of API requests.
SLO: 95% of API requests should have latency under 200ms.
SLA: If latency exceeds the SLO for two consecutive weeks, the provider will issue service credits.
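The chain from measurement to target to contract can be sketched in a few lines. This is only an illustration: the function names and the weekly sample data are assumptions, and the SLI is expressed here as the percentage of requests under 200ms so that it lines up with the SLO.

```python
def latency_sli(latencies_ms, threshold_ms=200):
    """SLI: percentage of requests served in under threshold_ms."""
    fast = sum(1 for ms in latencies_ms if ms < threshold_ms)
    return 100.0 * fast / len(latencies_ms)


def meets_slo(sli_percent, target_percent=95.0):
    """SLO: at least target_percent of requests must be fast."""
    return sli_percent >= target_percent


def credits_due(weekly_slo_results):
    """SLA: credits are due once the SLO is missed two weeks in a row."""
    pairs = zip(weekly_slo_results, weekly_slo_results[1:])
    return any(not first and not second for first, second in pairs)


# Hypothetical latency samples (in ms) for three consecutive weeks.
week1 = [120] * 19 + [350]      # 95% under 200ms
week2 = [120] * 18 + [350] * 2  # 90% under 200ms
week3 = [120] * 17 + [350] * 3  # 85% under 200ms

results = [meets_slo(latency_sli(week)) for week in (week1, week2, week3)]
print(results)               # [True, False, False]
print(credits_due(results))  # True
```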

3. Practical Examples

Example 1: Web Hosting Service

  • SLI: Percentage of time the website is available.
  • SLO: The website must be available 99.9% of the time per month.
  • SLA: If uptime falls below 99.9%, the customer will receive a refund of 20% of their monthly fee.

Example 2: Cloud Storage Service

  • SLI: Time taken to retrieve a file from storage.
  • SLO: 95% of retrieval requests must complete within 300ms.
  • SLA: If retrieval times exceed 300ms for more than 5% of requests in a billing cycle, customers will get free additional storage for the next month.

Example 3: API Service

  • SLI: Error rate of API responses.
  • SLO: Error rate must be below 0.1% for all requests in a day.
  • SLA: If the error rate exceeds 0.1% for more than three days in a row, the customer is entitled to a credit worth 5% of their monthly subscription fee.
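Example 3 has a "days in a row" condition, which is slightly trickier to check than a single threshold. Here is a small sketch of one way to do it. The function names, the daily error rates, and the monthly fee of 100.0 are made-up values for illustration.

```python
def longest_breach_streak(daily_error_rates, limit_percent=0.1):
    """Longest run of consecutive days whose error rate exceeded the limit."""
    longest = current = 0
    for rate in daily_error_rates:
        current = current + 1 if rate > limit_percent else 0
        longest = max(longest, current)
    return longest


def api_credit(daily_error_rates, monthly_fee, credit_percent=5):
    """Credit owed when the limit is exceeded for more than three days in a row."""
    if longest_breach_streak(daily_error_rates) > 3:
        return monthly_fee * credit_percent / 100
    return 0.0


# Hypothetical daily error rates in percent.
rates = [0.05, 0.2, 0.3, 0.15, 0.12, 0.02]
print(longest_breach_streak(rates))  # 4
print(api_credit(rates, 100.0))      # 5.0
```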

Tool: Serial Activity – Remote SSH Manager

20 August 2024 at 02:16

Why was this tool created?

During our college days, we had a crash course on Machine Learning. Our coordinators arranged for an ML engineer to teach a three-day class. He insisted that we install the packages ourselves to get hands-on experience. Unfortunately, many of us were not sure how to install them, so we needed a way to install all the necessary packages on every machine.

In our lab, every machine had the same user account with the same password. So our thinking was: if we could automate the installation on one machine, it would be easy to repeat it on the rest (just a for-loop iterating from x.0.0.1 to x.0.0.255). That is how this tool was born.

Code:

#!/usr/bin/env python
import sys
import os.path
from multiprocessing.pool import ThreadPool

import paramiko

BASE_ADDRESS = "192.168.7."
USERNAME = "t1"
PASSWORD = "uni1"


def create_client(hostname):
    """Create a SSH connection to a given hostname."""
    ssh_client = paramiko.SSHClient()
    ssh_client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    ssh_client.connect(hostname=hostname, username=USERNAME, password=PASSWORD)
    ssh_client.invoke_shell()
    return ssh_client


def kill_computer(ssh_client):
    """Power off a computer."""
    ssh_client.exec_command("poweroff")


def install_python_modules(ssh_client):
    """Install the programs specified in requirements.txt"""
    ftp_client = ssh_client.open_sftp()

    # Move over get-pip.py
    local_getpip = os.path.expanduser("~/lab_freak/get-pip.py")
    remote_getpip = "/home/%s/Documents/get-pip.py" % USERNAME
    ftp_client.put(local_getpip, remote_getpip)

    # Move over requirements.txt
    local_requirements = os.path.expanduser("~/lab_freak/requirements.txt")
    remote_requirements = "/home/%s/Documents/requirements.txt" % USERNAME
    ftp_client.put(local_requirements, remote_requirements)

    ftp_client.close()

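    # Install pip and the desired modules. exec_command() returns as soon as
    # the command has started, so read its output to EOF to wait for it to
    # finish before running the next command.
    _, stdout, _ = ssh_client.exec_command("python %s --user" % remote_getpip)
    stdout.read()
    _, stdout, _ = ssh_client.exec_command("python -m pip install --user -r %s" % remote_requirements)
    stdout.read()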


def worker(action, hostname):
    try:
        ssh_client = create_client(hostname)

        if action == "kill":
            kill_computer(ssh_client)
        elif action == "install":
            install_python_modules(ssh_client)
        else:
            raise ValueError("Unknown action %r" % action)
    except Exception as e:
        # Report the actual error, not just the action name.
        print("Running %r on %r failed with %r" % (action, hostname, e))


def main():
    if len(sys.argv) < 2:
        print("USAGE: python kill.py ACTION")
        sys.exit(1)

    hostnames = [str(BASE_ADDRESS) + str(i) for i in range(30, 60)]

    with ThreadPool() as pool:
        pool.map(lambda hostname: worker(sys.argv[1], hostname), hostnames)


if __name__ == "__main__":
    main()

