RabbitMQ – All You Need To Know To Start Building Scalable Platforms

1 February 2025 at 02:39

  1. Introduction
  2. What is a Message Queue?
  3. So Problem Solved !!! Not Yet
  4. RabbitMQ: Installation
  5. RabbitMQ: An Introduction (Optional)
    1. What is RabbitMQ?
    2. Why Use RabbitMQ?
    3. Key Features and Use Cases
  6. Building Blocks of Message Broker
    1. Connection & Channels
    2. Queues – Message Store
    3. Exchanges – Message Distributor and Binding
  7. Producing, Consuming and Acknowledging
  8. Problem #1 – Task Queue for Background Job Processing
    1. Context
    2. Problem
    3. Proposed Solution
  9. Problem #2 – Broadcasting NEWS to all subscribers
    1. Problem
    2. Solution Overview
    3. Step 1: Producer (Publisher)
    4. Step 2: Consumers (Subscribers)
      1. Consumer 1: Mobile App Notifications
      2. Consumer 2: Email Alerts
      3. Consumer 3: Web Notifications
      4. How It Works
  10. Intermediate Resources
    1. Prefetch Count
    2. Request Reply Pattern
    3. Dead Letter Exchange
    4. Alternate Exchanges
    5. Lazy Queues
    6. Quorum Queues
    7. Change Data Capture
    8. Handling Backpressure in Distributed Systems
    9. Choreography Pattern
    10. Outbox Pattern
    11. Queue Based Loading
    12. Two Phase Commit Protocol
    13. Competing Consumer
    14. Retry Pattern
    15. Can We Use Database as a Queue
  11. Let’s Connect

Introduction

Let’s take the example of an online food ordering system like Swiggy or Zomato. Suppose a user places an order through the mobile app. If the application follows a synchronous approach, it would first send the order request to the restaurant’s system and then wait for confirmation. If the restaurant is busy, the app will have to keep waiting until it receives a response.

If the restaurant’s system crashes or temporarily goes offline, the order will fail, and the user may have to restart the process.

This approach leads to a poor user experience, increases the chances of failures, and makes the system less scalable, as multiple users waiting simultaneously can cause a bottleneck.

In a traditional synchronous communication model, one service directly interacts with another and waits for a response before proceeding. While this approach is simple and works for small-scale applications, it introduces several challenges, especially in systems that require high availability and scalability.

The main problems with synchronous communication include slow performance, system failures, and scalability issues. If the receiving service is slow or temporarily unavailable, the sender has no choice but to wait, which can degrade the overall performance of the application.

Moreover, if the receiving service crashes, the entire process fails, leading to potential data loss or incomplete transactions.

In this book, we are going to see how this problem can be solved with a message queue.

What is a Message Queue?

A message queue is a system that allows different parts of an application (or different applications) to communicate with each other asynchronously by sending and receiving messages.

It acts like a buffer or an intermediary where messages are stored until the receiving service is ready to process them.

How It Works

  1. A producer (sender) creates a message and sends it to the queue.
  2. The message sits in the queue until a consumer (receiver) picks it up.
  3. The consumer processes the message and removes it from the queue.

This process ensures that the sender does not have to wait for the receiver to be available, making the system faster, more reliable, and scalable.

Real-Life Example

Imagine a fast-food restaurant where customers place orders at the counter. Instead of waiting at the counter for their food, customers receive a token number and move aside. The kitchen prepares the order in the background, and when it’s ready, the token number is called for pickup.

In this analogy,

  • The counter is the producer (sending orders).
  • The queue is the token system (storing orders).
  • The kitchen is the consumer (processing orders).
  • The customer picks up the food when ready (message is consumed).

Similarly, in applications, a message queue helps decouple systems, allowing them to work at their own pace without blocking each other. RabbitMQ, Apache Kafka, and Redis are popular message queue systems used in modern software development. 🚀

So Problem Solved !!! Not Yet

It seems like the problem is solved, but the message life cycle in the queue still needs to be handled.

  • Message Routing & Binding (Optional) – How is a message routed? If an exchange is used, the message is routed based on predefined rules.
  • Message Storage (Queue Retention) – How long does a message stay in the queue? It remains queued until a consumer picks it up.
  • If the consumer successfully processes the message, it sends an acknowledgment (ACK), and the message is removed. If the consumer fails, the message requeues or moves to a dead-letter queue (DLQ).
  • Messages that fail multiple times, are not acknowledged, or expire may be moved to a Dead-Letter Queue for further analysis.
  • Messages stored only in memory can be lost if RabbitMQ crashes.
  • Messages not consumed within their TTL expire.
  • If a consumer fails to acknowledge a message, it may be redelivered and processed more than once.
  • Messages failing multiple times may be moved to a DLQ.
  • Too many messages in the queue due to slow consumers can cause system slowdowns.
  • Network failures can disrupt message delivery between producers, RabbitMQ, and consumers.
  • Messages with corrupt or bad data may cause repeated consumer failures.

To handle all the above problems, we need a stable, battle-tested, reliable tool. RabbitMQ is one such tool. In this book we will cover the basics of RabbitMQ.

RabbitMQ: Installation

For RabbitMQ installation, please refer to https://www.rabbitmq.com/docs/download. In this book we will use the RabbitMQ Docker image.

docker run -it --rm --name rabbitmq -p 5672:5672 -p 15672:15672 rabbitmq:4.0-management


RabbitMQ: An Introduction (Optional)

What is RabbitMQ?

Imagine you’re sending messages between friends, but instead of delivering them directly, you drop them in a mailbox, and your friend picks them up when they are ready. RabbitMQ acts like this mailbox, but for computer programs. It helps applications communicate asynchronously, meaning they don’t have to wait for each other to process data.

RabbitMQ is a message broker, which means it handles and routes messages between different parts of an application. It ensures that messages are delivered efficiently, even when some components are running at different speeds or go offline temporarily.

Why Use RabbitMQ?

Modern applications often consist of multiple services that need to exchange data. Sometimes, one service produces data faster than another can consume it. Instead of forcing the slower service to catch up or making the faster service wait, RabbitMQ allows the fast service to place messages in a queue. The slow service can then process them at its own pace.

Some key benefits of using RabbitMQ include,

  • Decoupling services: Components communicate via messages rather than direct calls, reducing dependencies.
  • Scalability: RabbitMQ allows multiple consumers to process messages in parallel.
  • Reliability: It supports message durability and acknowledgments, preventing message loss.
  • Flexibility: Works with many programming languages and integrates well with different systems.
  • Efficient Load Balancing: Multiple consumers can share the message load to prevent overload on a single component.

Key Features and Use Cases

RabbitMQ is widely used in different applications, including

  • Chat applications: Messages are queued and delivered asynchronously to users.
  • Payment processing: Orders are placed in a queue and processed sequentially.
  • Event-driven systems: Used for microservices communication and event notification.
  • IoT systems: Devices publish data to RabbitMQ, which is then processed by backend services.
  • Job queues: Background tasks such as sending emails or processing large files.

Building Blocks of Message Broker

Connection & Channels

In RabbitMQ, connections and channels are fundamental concepts for communication between applications and the broker,

Connections: A connection is a TCP link between a client (producer or consumer) and the RabbitMQ broker. Each connection consumes system resources and is relatively expensive to create and maintain.

Channels: A channel is a virtual communication path inside a connection. It allows multiple logical streams of data over a single TCP connection, reducing overhead. Channels are lightweight and preferred for performing operations like publishing and consuming messages.
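
A minimal sketch with pika (the Python client used throughout this book), opening one TCP connection and two lightweight channels over it:

import pika

# One TCP connection to the broker...
connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))

# ...carrying multiple lightweight channels
publish_channel = connection.channel()   # e.g., used for publishing
consume_channel = connection.channel()   # e.g., used for consuming

print(publish_channel.channel_number, consume_channel.channel_number)
connection.close()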

Queues – Message Store

A queue is a message buffer that temporarily holds messages until a consumer retrieves and processes them.

1. Queues operate on a FIFO (First In, First Out) basis, meaning messages are processed in the order they arrive (unless priorities or other delivery strategies are set).

2. Queues persist messages if they are declared as durable and the messages are marked as persistent, ensuring reliability even if RabbitMQ restarts.

3. Multiple consumers can subscribe to a queue, and messages can be distributed among them in a round-robin manner.

(Figures: messages consumed by multiple consumers, and broadcast to multiple queues.)

4. If no consumers are available, messages remain in the queue until a consumer connects.

Analogy: Think of a queue as a to-do list where tasks (messages) are stored until someone (a worker/consumer) picks them up and processes them.
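
As a quick sketch of point 2, assuming a channel opened as in the snippet above (the queue name is illustrative): declaring the queue durable and publishing with delivery_mode=2 marks messages persistent.

# Durable queue: survives a broker restart
channel.queue_declare(queue='task_queue_durable', durable=True)

# Persistent message: written to disk by the broker
channel.basic_publish(
    exchange='',
    routing_key='task_queue_durable',
    body='important task',
    properties=pika.BasicProperties(delivery_mode=2),  # 2 = persistent
)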

Exchanges – Message Distributor and Binding

An exchange is responsible for routing messages to one or more queues based on routing rules.

When a producer sends a message, it doesn’t go directly to a queue but first reaches an exchange, which decides where to forward it. 🔥

The blue line in the diagram is called a binding. A binding is the link between the exchange and the queue, guiding messages to the right place.

RabbitMQ supports different types of exchanges

Direct Exchange (direct)

  • Routes messages to queues based on an exact match between the routing key and the queue’s binding key.
  • Example: Sending messages to a specific queue based on a severity level (info, error, warning).


Fanout Exchange (fanout)

  • Routes messages to all bound queues, ignoring routing keys.
  • Example: Broadcasting notifications to multiple services at once.

Topic Exchange (topic)

  • Routes messages based on pattern matching using * (matches one word) and # (matches multiple words).
  • Example: Routing logs where log.info goes to one queue, log.error goes to another, and log.* captures all.
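
A minimal pika sketch of these bindings, assuming an open channel (exchange and queue names are illustrative):

channel.exchange_declare(exchange='logs_topic', exchange_type='topic')
channel.queue_declare(queue='error_logs')
channel.queue_declare(queue='all_logs')

channel.queue_bind(exchange='logs_topic', queue='error_logs', routing_key='log.error')
channel.queue_bind(exchange='logs_topic', queue='all_logs', routing_key='log.*')

# Delivered to both queues: exact match on log.error, wildcard match on log.*
channel.basic_publish(exchange='logs_topic', routing_key='log.error', body='disk full')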

Headers Exchange (headers)

  • Routes messages based on message headers instead of routing keys.
  • Example: Delivering messages based on metadata like device: mobile or region: US.
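
A minimal pika sketch, assuming an open channel; x-match: all means every listed header must match (any would mean at least one):

channel.exchange_declare(exchange='devices_headers', exchange_type='headers')
channel.queue_declare(queue='mobile_us_queue')
channel.queue_bind(
    exchange='devices_headers',
    queue='mobile_us_queue',
    arguments={'x-match': 'all', 'device': 'mobile', 'region': 'US'},
)

# Routed on headers; the routing key is ignored
channel.basic_publish(
    exchange='devices_headers',
    routing_key='',
    body='hello',
    properties=pika.BasicProperties(headers={'device': 'mobile', 'region': 'US'}),
)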

Analogy: An exchange is like a traffic controller that decides which road (queue) a vehicle (message) should take based on predefined rules.

Binding

A binding is a link between an exchange and a queue that defines how messages should be routed.

  • When a queue is bound to an exchange with a binding key, messages with a matching routing key are delivered to that queue.
  • A queue can have multiple bindings to different exchanges, allowing it to receive messages from multiple sources.

Example:

  • A queue named error_logs can be bound to a direct exchange with a binding key error.
  • Another queue, all_logs, can be bound to a topic exchange with the binding key # (a wildcard that matches everything) to receive all logs.

Analogy: A binding is like a GPS route guiding messages (vehicles) from the exchange (traffic controller) to the right queue (destination).

Producing, Consuming and Acknowledging

RabbitMQ follows the producer-exchange-queue-consumer model,

  • Producing messages (Publishing): A producer creates a message and sends it to RabbitMQ, which routes it to the correct queue.
  • Consuming messages (Subscribing): A consumer listens for messages from the queue and processes them.
  • Acknowledgment: The consumer sends an acknowledgment (ack) after successfully processing a message.
  • Durability: Ensures messages and queues survive RabbitMQ restarts.

Why do we need an Acknowledgement?

  1. Ensures message reliability – Prevents messages from being lost if a consumer crashes.
  2. Prevents message loss – Messages are redelivered if no ACK is received.
  3. Avoids unintentional message deletion – Messages stay in the queue until properly processed.
  4. Supports at-least-once delivery – Ensures every message is processed at least once.
  5. Enables load balancing – Distributes messages fairly among multiple consumers.
  6. Allows manual control – Consumers can acknowledge only after successful processing.
  7. Handles redelivery – Messages can be requeued and sent to another consumer if needed.
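
To make this concrete, a minimal consumer callback that acknowledges on success and negatively acknowledges on failure (process_message is a hypothetical function):

def callback(ch, method, properties, body):
    try:
        process_message(body)  # hypothetical processing step
        ch.basic_ack(delivery_tag=method.delivery_tag)  # success: remove from queue
    except Exception:
        # failure: requeue for another attempt (requeue=False would dead letter it instead)
        ch.basic_nack(delivery_tag=method.delivery_tag, requeue=True)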

Problem #1 – Task Queue for Background Job Processing

Context

A company runs an image processing application where users upload images that need to be resized, watermarked, and optimized before they can be served. Processing these images synchronously would slow down the user experience, so the company decides to implement an asynchronous task queue using RabbitMQ.

Problem

  • Users upload large images that require multiple processing steps.
  • Processing each image synchronously blocks the application, leading to slow response times.
  • High traffic results in queue buildup, making it challenging to scale the system efficiently.

Proposed Solution

1. Producer Service

  • Publishes image processing tasks to a RabbitMQ exchange (task_exchange).
  • Sends the image filename as the message body to the queue (image_queue).

2. Worker Consumers

  • Listen for new image processing tasks from the queue.
  • Process each image (resize, watermark, optimize, etc.).
  • Acknowledge completion to ensure no duplicate processing.

3. Scalability

  • Multiple workers can run in parallel to process images faster.

producer.py

import pika

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()

# Declare exchange and queue
channel.exchange_declare(exchange='task_exchange', exchange_type='direct')
channel.queue_declare(queue='image_queue')

# Bind queue to exchange
channel.queue_bind(exchange='task_exchange', queue='image_queue', routing_key='image_task')

# List of images to process
images = ["image1.jpg", "image2.jpg", "image3.jpg"]

for image in images:
    channel.basic_publish(exchange='task_exchange', routing_key='image_task', body=image)
    print(f" [x] Sent {image}")

connection.close()

consumer.py

import pika
import time

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()

# Declare exchange and queue
channel.exchange_declare(exchange='task_exchange', exchange_type='direct')
channel.queue_declare(queue='image_queue')

# Bind queue to exchange
channel.queue_bind(exchange='task_exchange', queue='image_queue', routing_key='image_task')

def process_image(ch, method, properties, body):
    print(f" [x] Processing {body.decode()}")
    time.sleep(2)  # Simulate processing time
    print(f" [x] Finished {body.decode()}")
    ch.basic_ack(delivery_tag=method.delivery_tag)

# Start consuming
channel.basic_consume(queue='image_queue', on_message_callback=process_image)
print(" [*] Waiting for image tasks. To exit press CTRL+C")
channel.start_consuming()

Problem #2 – Broadcasting NEWS to all subscribers

Problem

A news application wants to send breaking news alerts to all subscribers, regardless of their location or interest.

Use a fanout exchange (news_alerts_exchange) to broadcast messages to all connected queues, ensuring all users receive the alert.

🔹 Example

  • mobile_app_queue (for users receiving push notifications)
  • email_alert_queue (for users receiving email alerts)
  • web_notification_queue (for users receiving notifications on the website)

Solution Overview

  • We create a fanout exchange called news_alerts_exchange.
  • Multiple queues (mobile_app_queue, email_alert_queue, and web_notification_queue) are bound to this exchange.
  • A producer publishes messages to the exchange.
  • Each consumer listens to its respective queue and receives the alert.

Step 1: Producer (Publisher)

This script publishes a breaking news alert to the fanout exchange.

import pika

# Establish connection
connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()

# Declare a fanout exchange
channel.exchange_declare(exchange="news_alerts_exchange", exchange_type="fanout")

# Publish a message
message = "Breaking News: Major event happening now!"
channel.basic_publish(exchange="news_alerts_exchange", routing_key="", body=message)

print(f" [x] Sent: {message}")

# Close connection
connection.close()

Step 2: Consumers (Subscribers)

Each consumer listens to its respective queue and processes the alert.

Consumer 1: Mobile App Notifications

import pika

# Establish connection
connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()

# Declare exchange
channel.exchange_declare(exchange="news_alerts_exchange", exchange_type="fanout")

# Declare a named queue for this subscriber
queue_name = "mobile_app_queue"
channel.queue_declare(queue=queue_name)
channel.queue_bind(exchange="news_alerts_exchange", queue=queue_name)

# Callback function
def callback(ch, method, properties, body):
    print(f" [Mobile App] Received: {body.decode()}")

# Consume messages
channel.basic_consume(queue=queue_name, on_message_callback=callback, auto_ack=True)
print(" [*] Waiting for news alerts...")
channel.start_consuming()

Consumer 2: Email Alerts

import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()

channel.exchange_declare(exchange="news_alerts_exchange", exchange_type="fanout")

queue_name = "email_alert_queue"
channel.queue_declare(queue=queue_name)
channel.queue_bind(exchange="news_alerts_exchange", queue=queue_name)

def callback(ch, method, properties, body):
    print(f" [Email Alert] Received: {body.decode()}")

channel.basic_consume(queue=queue_name, on_message_callback=callback, auto_ack=True)
print(" [*] Waiting for news alerts...")
channel.start_consuming()

Consumer 3: Web Notifications

import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()

channel.exchange_declare(exchange="news_alerts_exchange", exchange_type="fanout")

queue_name = "web_notification_queue"
channel.queue_declare(queue=queue_name)
channel.queue_bind(exchange="news_alerts_exchange", queue=queue_name)

def callback(ch, method, properties, body):
    print(f" [Web Notification] Received: {body.decode()}")

channel.basic_consume(queue=queue_name, on_message_callback=callback, auto_ack=True)
print(" [*] Waiting for news alerts...")
channel.start_consuming()

How It Works

  1. The producer sends a news alert to the fanout exchange (news_alerts_exchange).
  2. All queues (mobile_app_queue, email_alert_queue, web_notification_queue) bound to the exchange receive the message.
  3. Each consumer listens to its queue and processes the alert.

This setup ensures all users receive the alert simultaneously across different platforms. 🚀

Intermediate Resources

Prefetch Count

Prefetch is a mechanism that defines how many messages can be delivered to a consumer at a time before the consumer sends an acknowledgment back to the broker. This ensures that the consumer does not get overwhelmed with too many unprocessed messages, which could lead to high memory usage and potential performance issues.
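
In pika this is a single call on the channel; a sketch, assuming the consumer channel from the earlier examples:

# Deliver at most one unacknowledged message to this consumer at a time
channel.basic_qos(prefetch_count=1)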

To Know More: https://parottasalna.com/2024/12/29/learning-notes-16-prefetch-count-rabbitmq/

Request Reply Pattern

The Request-Reply Pattern is a fundamental communication style in distributed systems, where a requester sends a message to a responder and waits for a reply. It’s widely used in systems that require synchronous communication, enabling the requester to receive a response for further processing.
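
A minimal requester-side sketch with pika, assuming a responder service listening on a hypothetical rpc_queue:

import pika
import uuid

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()

# Exclusive, broker-named queue to receive the reply on
result = channel.queue_declare(queue='', exclusive=True)
callback_queue = result.method.queue

channel.basic_publish(
    exchange='',
    routing_key='rpc_queue',  # hypothetical responder queue
    properties=pika.BasicProperties(
        reply_to=callback_queue,           # where the responder should reply
        correlation_id=str(uuid.uuid4()),  # lets the requester match reply to request
    ),
    body='compute this',
)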

To Know More: https://parottasalna.com/2024/12/28/learning-notes-15-request-reply-pattern-rabbitmq/

Dead Letter Exchange

A dead letter is a message that cannot be delivered to its intended queue or is rejected by a consumer. Common scenarios where messages are dead lettered include,

  1. Message Rejection: A consumer explicitly rejects a message without requeuing it.
  2. Message TTL (Time-To-Live) Expiry: The message remains in the queue longer than its TTL.
  3. Queue Length Limit: The queue has reached its maximum capacity, and new messages are dropped.
  4. Routing Failures: Messages that cannot be routed to any queue from an exchange.
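
A minimal pika sketch wiring a queue to a dead letter exchange; all names are illustrative, and the TTL argument is optional:

# The dead letter exchange and its queue
channel.exchange_declare(exchange='dlx_exchange', exchange_type='fanout')
channel.queue_declare(queue='dead_letter_queue')
channel.queue_bind(exchange='dlx_exchange', queue='dead_letter_queue')

# Main queue: rejected or expired messages are re-routed to the DLX
channel.queue_declare(
    queue='orders_queue',
    arguments={
        'x-dead-letter-exchange': 'dlx_exchange',
        'x-message-ttl': 60000,  # optional: expire messages after 60 seconds
    },
)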

To Know More: https://parottasalna.com/2024/12/28/learning-notes-14-dead-letter-exchange-rabbitmq/

Alternate Exchanges

An alternate exchange in RabbitMQ is a fallback exchange configured for another exchange. If a message cannot be routed to any queue bound to the primary exchange, RabbitMQ will publish the message to the alternate exchange instead. This mechanism ensures that undeliverable messages are not lost but can be processed in a different way, such as logging, alerting, or storing them for later inspection.
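
A minimal pika sketch, with illustrative names; messages published to orders_exchange that match no binding fall through to the fanout exchange:

# Fallback exchange and queue for unroutable messages
channel.exchange_declare(exchange='unroutable_exchange', exchange_type='fanout')
channel.queue_declare(queue='unroutable_queue')
channel.queue_bind(exchange='unroutable_exchange', queue='unroutable_queue')

# Primary exchange configured with the alternate exchange
channel.exchange_declare(
    exchange='orders_exchange',
    exchange_type='direct',
    arguments={'alternate-exchange': 'unroutable_exchange'},
)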

To Know More: https://parottasalna.com/2024/12/27/learning-notes-12-alternate-exchanges-rabbitmq/

Lazy Queues

  • Lazy Queues are designed to store messages primarily on disk rather than in memory.
  • They are optimized for use cases involving large message backlogs where minimizing memory usage is critical.
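
A sketch of declaring one with pika (illustrative name); note that on RabbitMQ 3.12+ classic queues already behave this way and the argument is ignored:

channel.queue_declare(
    queue='bulk_import_queue',
    durable=True,
    arguments={'x-queue-mode': 'lazy'},
)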

To Know More: https://parottasalna.com/2024/12/26/learning-notes-10-lazy-queues-rabbitmq/

Quorum Queues

  • Quorum Queues are distributed queues built on the Raft consensus algorithm.
  • They are designed for high availability, durability, and data safety by replicating messages across multiple nodes in a RabbitMQ cluster.
  • It is a replacement for classic mirrored queues.
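
A sketch of declaring one with pika (illustrative name); quorum queues must be durable:

channel.queue_declare(
    queue='orders_quorum_queue',
    durable=True,
    arguments={'x-queue-type': 'quorum'},
)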

To Know More: https://parottasalna.com/2024/12/25/learning-notes-9-quorum-queues-rabbitmq/

Change Data Capture

CDC stands for Change Data Capture. It’s a technique that listens to a database and captures every change that happens in it. These changes can then be sent to other systems to,

  • Keep data in sync across multiple databases.
  • Power real-time analytics dashboards.
  • Trigger notifications for certain database events.
  • Process data streams in real time.

To Know More: https://parottasalna.com/2025/01/19/learning-notes-63-change-data-capture-what-does-it-do/

Handling Backpressure in Distributed Systems

Backpressure occurs when a downstream system (consumer) cannot keep up with the rate of data being sent by an upstream system (producer). In distributed systems, this can arise in scenarios such as

  • A message queue filling up faster than it is drained.
  • A database struggling to handle the volume of write requests.
  • A streaming system overwhelmed by incoming data.

To Know More: https://parottasalna.com/2025/01/07/learning-notes-45-backpressure-handling-in-distributed-systems/

Choreography Pattern

In the Choreography Pattern, services communicate directly with each other via asynchronous events, without a central controller. Each service is responsible for a specific part of the workflow and responds to events produced by other services. This pattern allows for a more autonomous and loosely coupled system.

To Know More: https://parottasalna.com/2025/01/05/learning-notes-38-choreography-pattern-cloud-pattern/

Outbox Pattern

The Outbox Pattern is a proven architectural solution to the dual-write problem (writing to a database and publishing an event atomically), helping developers manage data consistency, especially when dealing with events, messaging systems, or external APIs.

To Know More: https://parottasalna.com/2025/01/03/learning-notes-31-outbox-pattern-cloud-pattern/

Queue Based Loading

The Queue-Based Loading Pattern leverages message queues to decouple and coordinate tasks between producers (such as applications or services generating data) and consumers (services or workers processing that data). By using queues as intermediaries, this pattern allows systems to manage workloads efficiently, ensuring seamless and scalable operation.

To Know More: https://parottasalna.com/2025/01/03/learning-notes-30-queue-based-loading-cloud-patterns/

Two Phase Commit Protocol

The Two-Phase Commit (2PC) protocol is a distributed algorithm used to ensure atomicity in transactions spanning multiple nodes or databases. Atomicity ensures that either all parts of a transaction are committed or none are, maintaining consistency in distributed systems.

To Know More: https://parottasalna.com/2025/01/03/learning-notes-29-two-phase-commit-protocol-acid-in-distributed-systems/

Competing Consumer

The competing consumer pattern involves multiple consumers that independently compete to process messages or tasks from a shared queue. This pattern is particularly effective in scenarios where the rate of incoming tasks is variable or high, as it allows multiple consumers to process tasks concurrently.

To Know More: https://parottasalna.com/2025/01/01/learning-notes-24-competing-consumer-messaging-queue-patterns/

Retry Pattern

The Retry Pattern is a design strategy used to manage transient failures by retrying failed operations. Instead of immediately failing an operation after an error, the pattern retries it with an optional delay or backoff strategy. This is particularly useful in distributed systems where failures are often temporary.
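
A minimal, generic Python sketch of the idea; the attempt limit and delays are illustrative:

import time

def with_retries(operation, max_attempts=5, base_delay=1.0):
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception as exc:
            if attempt == max_attempts:
                raise  # retries exhausted; surface the error
            delay = base_delay * (2 ** (attempt - 1))  # exponential backoff
            print(f"Attempt {attempt} failed ({exc}); retrying in {delay}s")
            time.sleep(delay)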

To Know More: https://parottasalna.com/2024/12/31/learning-notes-23-retry-pattern-cloud-patterns/

Can We Use Database as a Queue

Developers try to use their RDBMS as a way to do background processing or service communication. While this can often appear to 'get the job done', there are a number of limitations and concerns with this approach.

There are two divisions to any asynchronous processing: the service(s) that create processing tasks and the service(s) that consume and process these tasks accordingly.

To Know More: https://parottasalna.com/2024/06/15/can-we-use-database-as-queue-in-asynchronous-process/

Let’s Connect

Telegram: https://t.me/parottasalna/1

LinkedIn: https://www.linkedin.com/in/syedjaferk/

Whatsapp Channel: https://whatsapp.com/channel/0029Vavu8mF2v1IpaPd9np0s

Youtube: https://www.youtube.com/@syedjaferk

Github: https://github.com/syedjaferk/

Learning Notes #55 – API Keys and Tokens

14 January 2025 at 05:27

Tokens and API keys are foundational tools that ensure secure communication between systems. They enable authentication, authorization, and access control, facilitating secure data exchange.

What Are Tokens?

Tokens are digital objects that represent a specific set of permissions or claims. They are often used in authentication and authorization processes to verify a user’s identity or grant access to resources. Tokens can be time-bound and carry information like:

  1. User Identity: Information about the user or system initiating the request.
  2. Scope of Access: Details about what actions or resources the token permits.
  3. Validity Period: Start and expiry times for the token.

Common Types of Tokens:

  • JWT (JSON Web Tokens): Compact, URL-safe tokens containing a payload, signature, and header.
  • Opaque Tokens: Tokens without embedded information; they require validation against a server.
  • Refresh Tokens: Used to obtain a new access token when the current one expires.
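
A minimal sketch with the PyJWT library (an assumption: installed via pip install PyJWT), showing the three claim types listed above:

import datetime
import jwt  # PyJWT

SECRET = "demo-secret"  # never hardcode secrets in real systems

token = jwt.encode(
    {
        "sub": "user-123",          # user identity
        "scope": "read:articles",   # scope of access
        "exp": datetime.datetime.utcnow() + datetime.timedelta(minutes=15),  # validity period
    },
    SECRET,
    algorithm="HS256",
)

claims = jwt.decode(token, SECRET, algorithms=["HS256"])  # raises if expired or tampered with
print(claims["sub"], claims["scope"])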

What Are API Keys?

API keys are unique identifiers used to authenticate applications or systems accessing APIs. They are simple to use and act as a credential to allow systems to make authorized API calls.

Key Characteristics:

  • Static Credential: Unlike tokens, API keys do not typically expire unless explicitly revoked.
  • Simple to Use: They are easy to implement and often passed in headers or query parameters.
  • Application-Specific: Keys are tied to specific applications rather than user accounts.
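
For illustration, a sketch of passing an API key in a request header with the requests library; the endpoint, header name, and key are hypothetical:

import requests

API_KEY = "your-api-key-here"  # in practice, load from an environment variable or vault

response = requests.get(
    "https://api.example.com/v1/weather",  # hypothetical endpoint
    headers={"X-API-Key": API_KEY},
    params={"city": "Chennai"},
)
print(response.status_code)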

Functionalities and Usage

Both tokens and API keys enable secure interaction between systems, but their application depends on the scenario

1. Authentication

  • Tokens: Often used for user authentication in web apps and APIs.
    • Example: A JWT issued after login is included in subsequent API requests to validate the user’s session.
  • API Keys: Authenticate applications rather than users.
    • Example: A weather app uses an API key to fetch data from a weather API.

2. Authorization

  • Tokens: Define user-specific permissions and roles.
    • Example: A token allows read-only access to specific resources for a particular user.
  • API Keys: Grant access to predefined resources for the application.
    • Example: An API key allows access to public datasets but restricts write operations.

3. Rate Limiting and Monitoring

Both tokens and API keys can be used to

  • Enforce usage limits.
  • Monitor and log API usage for analytics and security.

Considerations for Secure Implementation

1. For Tokens

  • Use HTTPS: Always transmit tokens over HTTPS to prevent interception.
  • Implement Expiry: Set reasonable expiry times to minimize risks.
  • Adopt Refresh Tokens: Allow users to obtain new tokens securely when access tokens expire.
  • Validate Signatures: For JWTs, validate the signature to ensure the token’s integrity.

2. For API Keys

  • Restrict IP Usage: Limit the key’s use to specific IPs or networks.
  • Set Permissions: Assign the minimum required permissions for the API key.
  • Regenerate Periodically: Refresh keys periodically to mitigate risks.
  • Monitor Usage: Track API key usage for anomalies and revoke compromised keys promptly.

3. For Both

  • Avoid Hardcoding: Never embed tokens or keys in source code. Use environment variables or secure vaults.
  • Audit and Rotate: Regularly audit and rotate keys and tokens to maintain security.
  • Educate Users: Ensure users and developers understand secure handling practices.

Learning Notes #50 – Fixed Partition Pattern | Distributed Pattern

9 January 2025 at 16:51

Today, I learnt about fixed partitioning, which balances data among servers without large-scale data movement. In this blog, I jot down notes on how fixed partitioning helps in solving the problem.

This entire blog is inspired by https://www.linkedin.com/pulse/distributed-systems-design-pattern-fixed-partitions-retail-kumar-v-c34pc/?trackingId=DMovSwEZSfCzKZEKa7yJrg%3D%3D

Problem Statement

In a distributed key-value store system, data items need to be mapped to a set of cluster nodes to ensure efficient storage and retrieval. The system must satisfy the following requirements,

  1. Uniform Distribution: Data should be evenly distributed across all cluster nodes to avoid overloading any single node.
  2. Deterministic Mapping: Given a data item, the specific node responsible for storing it should be determinable without querying all the nodes in the cluster.

A common approach to achieve these goals is to use hashing with a modulo operation. For example, if there are three nodes in the cluster, the key is hashed, and the hash value modulo the number of nodes determines the node to store the data. However, this method has a critical drawback,

Rebalancing Issue: When the cluster size changes (e.g., nodes are added or removed), the mapping for most keys changes. This requires the system to move almost all the data to new nodes, leading to significant overhead in terms of time and resources, especially when dealing with large data volumes.
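
A quick Python sketch shows the scale of the problem; with plain modulo placement, growing a cluster from 3 to 4 nodes remaps roughly three quarters of all keys:

keys = range(10_000)

# Keys whose node assignment changes when the cluster grows from 3 to 4 nodes
moved = sum(1 for k in keys if hash(k) % 3 != hash(k) % 4)
print(f"{moved / 10_000:.0%} of keys change nodes")  # roughly 75%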

Challenge: How can we design a mapping mechanism that minimizes data movement during cluster size changes while maintaining uniform distribution and deterministic mapping?

Solution

There is a concept of Fixed Partitioning,

What Is Fixed Partitioning?

This pattern organizes data into a predefined number of fixed partitions that remain constant over time. Data is assigned to these partitions using a hashing algorithm, ensuring that the mapping of data to partitions is permanent. The system separates the fixed partitioning of data from the physical servers managing these partitions, enabling seamless scaling.

Key Features of Fixed Partitioning

  1. Fixed Number of Partitions
    • The number of partitions is determined during system initialization (e.g., 8 partitions).
    • Data is assigned to these partitions based on a consistent hashing algorithm.
  2. Stable Data Mapping
    • Each piece of data is permanently mapped to a specific partition.
    • This eliminates the need for large-scale data reshuffling when scaling the system.
  3. Adjustable Partition-to-Server Mapping
    • Partitions can be reassigned to different servers as the system scales.
    • Only the physical location of the partitions changes; the fixed mapping remains intact.
  4. Balanced Load Distribution
    • Partitions are distributed evenly across servers to balance the workload.
    • Adding new servers involves reassigning partitions without moving or reorganizing data within the partitions.

Naive Example

We have a banking system with transactions stored in 8 fixed partitions, distributed based on a customer’s account ID.


CREATE TABLE transactions (
    id SERIAL PRIMARY KEY,
    account_id INT NOT NULL,
    transaction_amount NUMERIC(10, 2) NOT NULL,
    transaction_date DATE NOT NULL
) PARTITION BY HASH (account_id);

1. Create Partition


DO $$
BEGIN
    FOR i IN 0..7 LOOP
        EXECUTE format(
            'CREATE TABLE transactions_p%s PARTITION OF transactions FOR VALUES WITH (modulus 8, remainder %s);',
            i, i
        );
    END LOOP;
END $$;

This creates 8 partitions (transactions_p0 to transactions_p7) based on the hash remainder of account_id modulo 8.

2. Inserting Data

When inserting data into the transactions table, PostgreSQL automatically places it into the correct partition based on the account_id.


INSERT INTO transactions (account_id, transaction_amount, transaction_date)
VALUES (12345, 500.00, '2025-01-01');

The hash of account_id 12345, taken modulo 8, determines the target partition (e.g., transactions_p5).

3. Querying Data

Querying the base table works transparently across all partitions


SELECT * FROM transactions WHERE account_id = 12345;

PostgreSQL automatically routes the query to the correct partition.

4. Scaling by Adding Servers

Initial Setup:

Suppose we have 4 servers managing the partitions,

  • Server 1: transactions_p0, transactions_p1
  • Server 2: transactions_p2, transactions_p3
  • Server 3: transactions_p4, transactions_p5
  • Server 4: transactions_p6, transactions_p7

Adding a New Server:

When a 5th server is added, we redistribute partitions,

  • Server 1: transactions_p0
  • Server 2: transactions_p1
  • Server 3: transactions_p2, transactions_p3
  • Server 4: transactions_p4
  • Server 5: transactions_p5, transactions_p6, transactions_p7

Partition Migration

  • During the migration, transactions_p5 is copied from Server 3 to Server 5.
  • Once the migration is complete, Server 5 becomes responsible for transactions_p5.

Benefits:

  1. Minimal Data Movement – When scaling, only the partitions being reassigned are copied to new servers. Data within partitions remains stable.
  2. Optimized Performance – Queries are routed directly to the relevant partition, minimizing scan times.
  3. Scalability – Adding servers is straightforward, as it involves reassigning partitions, not reorganizing data.

What happens when a new server is added, then? Don’t we need to copy the data?

When a partition is moved to a new server (e.g., partition_b from server_A to server_B), the data in the partition must be copied to the new server. However,

  1. The copying is limited to the partition being reassigned.
  2. No data within the partition is reorganized.
  3. Once the partition is fully migrated, the original copy is typically deleted.

For example, in PostgreSQL,

  • Export the partition: pg_dump -t partition_b -h server_A -U postgres > partition_b.sql
  • Import on New Server: psql -h server_B -U postgres -d mydb < partition_b.sql

Learning Notes #48 – Common Pitfalls in Event Driven Architecture

8 January 2025 at 15:04

Today, I came across Raul Junco's post on mistakes in Event Driven Architecture – https://www.linkedin.com/posts/raul-junco_after-years-building-event-driven-systems-activity-7278770394046631936-zu3-?utm_source=share&utm_medium=member_desktop. In this blog I am highlighting the same for future reference.

Event-driven architectures are awesome, but they come with their own set of challenges. Missteps can lead to unreliable systems, inconsistent data, and frustrated users. Let’s explore some of the most common pitfalls and how to address them effectively.

1. Duplication

Idempotent APIs – https://parottasalna.com/2025/01/08/learning-notes-47-idempotent-post-requests/

Events often get re-delivered due to retries or system failures. Without proper handling, duplicate events can,

  • Charge a customer twice for the same transaction: Imagine a scenario where a payment service retries a payment event after a temporary network glitch, resulting in a duplicate charge.
  • Cause duplicate inventory updates: For example, an e-commerce platform might update stock levels twice for a single order, leading to overestimating available stock.
  • Create inconsistent or broken system states: Duplicates can cascade through downstream systems, introducing mismatched or erroneous data.

Solution:

  • Assign unique IDs: Ensure every event has a globally unique identifier. Consumers can use these IDs to detect and discard duplicates.
  • Design idempotent processing: Structure your operations so they produce the same outcome even when executed multiple times. For instance, an API updating inventory could always set stock levels to a specific value rather than incrementing or decrementing.
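
A minimal idempotent-consumer sketch in Python; the in-memory set is a stand-in for a durable store such as a database table or Redis:

processed_ids = set()  # stand-in for a durable store

def handle_event(event_id, payload):
    # Discard duplicates using the event's globally unique ID
    if event_id in processed_ids:
        print(f"Skipping duplicate event {event_id}")
        return
    processed_ids.add(event_id)
    print(f"Processing event {event_id}: {payload}")

handle_event("evt-1", {"amount": 100})
handle_event("evt-1", {"amount": 100})  # redelivery: safely ignored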

2. Not Guaranteeing Order

Events can arrive out of order when distributed across partitions or queues. This can lead to

  • Processing a refund before the payment: If a refund event is processed before the corresponding payment event, the system might show a negative balance or fail to reconcile properly.
  • Breaking logic that relies on correct sequence: Certain workflows, such as assembling logs or transactional data, depend on a strict event order to function correctly.

Solution

  • Use brokers with ordering guarantees: Message brokers like Apache Kafka support partition-level ordering. Design your topics and partitions to align with entities requiring ordered processing (e.g., user or account ID).
  • Add sequence numbers or timestamps: Include metadata in events to indicate their position in a sequence. Consumers can use this data to reorder events if necessary, ensuring logical consistency.

3. The Dual Write Problem

Outbox Pattern: https://parottasalna.com/2025/01/03/learning-notes-31-outbox-pattern-cloud-pattern/

When writing to a database and publishing an event, one might succeed while the other fails. This can

  • Lose events: If the event is not published after the database write, downstream systems might remain unaware of critical changes, such as a new order or a status update.
  • Cause mismatched states: For instance, a transaction might be logged in a database but not propagated to analytical or monitoring systems, creating inconsistencies.

Solution

  • Use the Transactional Outbox Pattern: In this pattern, events are written to an β€œoutbox” table within the same database transaction as the main data write. A separate process then reads from the outbox and publishes events reliably.
  • Adopt Change Data Capture (CDC) tools: CDC tools like Debezium can monitor database changes and publish them as events automatically, ensuring no changes are missed.
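
A minimal sketch of the transactional outbox in Python with sqlite3 (schema and names are illustrative); the business row and the event are written in one local transaction, and a separate relay later publishes unpublished rows:

import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("CREATE TABLE outbox (id INTEGER PRIMARY KEY, payload TEXT, published INTEGER DEFAULT 0)")

# One local transaction covers both writes
with conn:
    conn.execute("INSERT INTO orders (status) VALUES ('created')")
    conn.execute("INSERT INTO outbox (payload) VALUES (?)",
                 (json.dumps({"event": "order_created"}),))

# A relay process reads unpublished rows and publishes them to the broker
for row_id, payload in conn.execute("SELECT id, payload FROM outbox WHERE published = 0").fetchall():
    print("publish to broker:", payload)  # stand-in for channel.basic_publish(...)
    conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
conn.commit()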

4. Non-Backward-Compatible Changes

Changing event schemas without considering existing consumers can break systems. For example:

  • Removing a field: A consumer relying on this field might encounter null values or fail altogether.
  • Renaming or changing field types: This can lead to deserialization errors or misinterpretation of data.

Solution:

  • Maintain versioned schemas: Introduce new schema versions incrementally and ensure consumers can continue using older versions during the transition.
  • Use schema evolution-friendly formats: Formats like Avro or Protobuf natively support schema evolution, allowing you to add fields or make other non-breaking changes easily.
  • Add adapters for compatibility: Build adapters or translators that transform events from new schemas to older formats, ensuring backward compatibility for legacy systems.

Learning Notes #40 – SAGA Pattern | Cloud Patterns

5 January 2025 at 17:08

Today, I learnt about the SAGA Pattern, followed by the Compensation, Orchestration, Choreography, and Two-Phase Commit patterns. SAGA is a combination of all the above. In this blog, I jot down notes on SAGA for my future self.

Modern software applications often require the coordination of multiple distributed services to perform complex business operations. In such systems, ensuring consistency and reliability can be challenging, especially when a failure occurs in one of the services. The SAGA design pattern offers a robust solution to manage distributed transactions while maintaining data consistency.

What is the SAGA Pattern?

The SAGA pattern is a distributed transaction management mechanism where a series of independent operations (or steps) are executed sequentially across multiple services. Each operation in the sequence has a corresponding compensating action to roll back changes if a failure occurs. This approach avoids the complexities of distributed transactions, such as two-phase commits, by breaking down the process into smaller, manageable units.

Key Characteristics

  1. Decentralized Control: Transactions are managed across services without a central coordinator.
  2. Compensating Transactions: Every operation has an undo or rollback mechanism.
  3. Asynchronous Communication: Services communicate asynchronously in most implementations, ensuring loose coupling.

Types of SAGA Patterns

There are two primary types of SAGA patterns:

1. Choreography-Based SAGA

  • In this approach, services communicate with each other directly to coordinate the workflow.
  • Each service knows which operation to trigger next after completing its own task.
  • If a failure occurs, each service initiates its compensating action to roll back changes.

Advantages:

  • Simple implementation.
  • No central coordinator required.

Disadvantages:

  • Difficult to manage and debug in complex workflows.
  • Tight coupling between services.

import pika

class RabbitMQHandler:
    def __init__(self, queue):
        self.connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
        self.channel = self.connection.channel()
        self.channel.queue_declare(queue=queue)
        self.queue = queue

    def publish(self, message):
        self.channel.basic_publish(exchange='', routing_key=self.queue, body=message)

    def consume(self, callback):
        self.channel.basic_consume(queue=self.queue, on_message_callback=callback, auto_ack=True)
        self.channel.start_consuming()

# Define services
class FlightService:
    def book_flight(self):
        print("Flight booked.")
        RabbitMQHandler('hotel_queue').publish("flight_booked")

class HotelService:
    def on_flight_booked(self, ch, method, properties, body):
        try:
            print("Hotel booked.")
            RabbitMQHandler('invoice_queue').publish("hotel_booked")
        except Exception:
            print("Failed to book hotel. Rolling back flight.")
            FlightService().cancel_flight()

    def cancel_flight(self):
        print("Flight booking canceled.")

# Set up services
flight_service = FlightService()
hotel_service = HotelService()

# Trigger the workflow first (in practice, producer and consumer run as separate processes)
flight_service.book_flight()

# Start consuming; start_consuming() blocks, so it must come last here
RabbitMQHandler('hotel_queue').consume(hotel_service.on_flight_booked)

2. Orchestration-Based SAGA

  • A central orchestrator service manages the workflow and coordinates between the services.
  • The orchestrator determines the sequence of operations and handles compensating actions in case of failures.

Advantages:

  • Clear control and visibility of the workflow.
  • Easier to debug and manage.

Disadvantages:

  • The orchestrator can become a single point of failure.
  • More complex implementation.
import pika

# Reuses the RabbitMQHandler class defined in the choreography example above
class Orchestrator:
    def __init__(self):
        self.rabbitmq = RabbitMQHandler('orchestrator_queue')

    def execute_saga(self):
        try:
            self.reserve_inventory()
            self.process_payment()
            self.generate_invoice()
        except Exception as e:
            print(f"Error occurred: {e}. Initiating rollback.")
            self.compensate()

    def reserve_inventory(self):
        print("Inventory reserved.")
        self.rabbitmq.publish("inventory_reserved")

    def process_payment(self):
        print("Payment processed.")
        self.rabbitmq.publish("payment_processed")

    def generate_invoice(self):
        print("Invoice generated.")
        self.rabbitmq.publish("invoice_generated")

    def compensate(self):
        print("Rolling back invoice.")
        print("Rolling back payment.")
        print("Rolling back inventory.")

# Trigger the workflow
Orchestrator().execute_saga()

How SAGA Works

  1. Transaction Initiation: The first operation is executed by one of the services.
  2. Service Communication: Subsequent services execute their operations based on the outcome of the previous step.
  3. Failure Handling: If an operation fails, compensating transactions are triggered in reverse order to undo any changes.
  4. Completion: Once all operations are successfully executed, the transaction is considered complete.

Benefits of the SAGA Pattern

  1. Improved Resilience: Allows partial rollbacks in case of failure.
  2. Scalability: Suitable for microservices and distributed systems.
  3. Flexibility: Works well with event-driven architectures.
  4. No Global Locks: Unlike traditional transactions, SAGA does not require global locking of resources.

Challenges and Limitations

  1. Complexity in Rollbacks: Designing compensating transactions for every operation can be challenging.
  2. Data Consistency: Achieving eventual consistency may require additional effort.
  3. Debugging Issues: Debugging failures in a distributed environment can be cumbersome.
  4. Latency: Sequential execution may increase overall latency.

When to Use the SAGA Pattern

  • Distributed systems where global ACID transactions are infeasible.
  • Microservices architectures with independent services.
  • Applications requiring high resilience and eventual consistency.

Real-World Applications

  1. E-Commerce Platforms: Managing orders, payments, and inventory updates.
  2. Travel Booking Systems: Coordinating flight, hotel, and car rental reservations.
  3. Banking Systems: Handling distributed account updates and transfers.
  4. Healthcare: Coordinating appointment scheduling and insurance claims.

Learning Notes #39 – Compensation Pattern | Cloud Pattern

5 January 2025 at 12:50

Today I learnt about the compensation pattern, which rolls back transactions when failures occur. In this blog I jot down notes on the compensation pattern and how it relates to the SAGA pattern.

Distributed systems often involve multiple services working together to perform a business operation. Ensuring data consistency and reliability across these services is challenging, especially in cases of failure. One solution is the use of compensation transactions, a mechanism designed to maintain consistency by reversing the effects of previous operations when errors occur.

What Are Compensation Transactions?

A compensation transaction is an operation that undoes the effect of a previously executed operation. Unlike traditional rollback mechanisms in centralized databases, compensation transactions are explicitly defined and executed in distributed systems to maintain consistency after a failure.

Key Characteristics

  • Explicit Definition: Compensation logic must be explicitly implemented.
  • Independent Execution: Compensation operations are separate from the main transaction.
  • Eventual Consistency: Ensures the system reaches a consistent state over time.
  • Asynchronous Nature: Often triggered asynchronously to avoid blocking main processes.

Why Are Compensation Transactions Important?

1. Handling Failures in Distributed Systems

In a distributed architecture, such as microservices, different services may succeed or fail independently. Compensation transactions allow partial rollbacks to maintain overall consistency.

2. Avoiding Global Locking

Traditional transactions with global locks (e.g., two-phase commits) are not feasible in distributed systems due to performance and scalability concerns. Compensation transactions provide a more flexible alternative.

3. Resilience and Fault Tolerance

Compensation mechanisms make systems more resilient by allowing recovery from failures without manual intervention.

How Compensation Transactions Work

  1. Perform Main Operations: Each service performs its assigned operation, such as creating a record or updating a database.
  2. Log Operations: Log actions and context to enable compensating transactions if needed.
  3. Detect Failure: Monitor the workflow for errors or failures in any service.
  4. Trigger Compensation: If a failure occurs, execute compensation transactions for all successfully completed operations to undo their effects.

Example Workflow

Imagine an e-commerce checkout process involving three steps

  • Step 1: Reserve inventory.
  • Step 2: Deduct payment.
  • Step 3: Confirm order.

If Step 3 fails, compensation transactions for Steps 1 and 2 might include

  • Releasing the reserved inventory.
  • Refunding the payment.

Design Considerations for Compensation Transactions

1. Idempotency

Ensure compensating actions are idempotent, meaning they can be executed multiple times without unintended side effects. This is crucial in distributed systems where retries are common.

2. Consistency Model

Adopt an eventual consistency model to align with the asynchronous nature of compensation transactions.

3. Error Handling

Design robust error-handling mechanisms for compensating actions, as these too can fail.

4. Service Communication

Use reliable communication protocols (e.g., message queues) to trigger and manage compensation transactions.

5. Isolation of Compensation Logic

Keep compensation logic isolated from the main business logic to maintain clarity and modularity.

Use Cases for Compensation Transactions

1. Financial Systems

  • Reversing failed fund transfers or unauthorized transactions.
  • Refunding payments in e-commerce platforms.

2. Travel and Booking Systems

  • Canceling a hotel reservation if flight booking fails.
  • Releasing blocked seats if payment is not completed.

3. Healthcare Systems

  • Undoing scheduled appointments if insurance validation fails.
  • Revoking prescriptions if a linked process encounters errors.

4. Supply Chain Management

  • Canceling shipment orders if inventory updates fail.
  • Restocking items if order fulfillment is aborted.

Challenges of Compensation Transactions

  1. Complexity in Implementation: Designing compensating logic for every operation can be tedious and error-prone.
  2. Performance Overhead: Logging operations and executing compensations can introduce latency.
  3. Partial Rollbacks: It may not always be possible to fully undo certain operations, such as sending emails or notifications.
  4. Failure in Compensating Actions: Compensation transactions themselves can fail, requiring additional mechanisms to handle such scenarios.

Best Practices

  1. Plan for Compensation Early: Design compensating transactions as part of the initial development process.
  2. Use SAGA Pattern: Combine compensation transactions with the SAGA pattern to manage distributed workflows effectively.
  3. Test Extensively: Simulate failures and test compensating logic under various conditions.
  4. Monitor and Log: Maintain detailed logs of operations and compensations for debugging and audits.

Learning Notes #37 – Orchestrator Pattern | Cloud Pattern

5 January 2025 at 11:16

Today, I learnt about the orchestrator pattern while I was learning about the SAGA pattern. It simplifies the coordination of multi-service workflows, making the system more efficient and easier to manage. In this blog I jot down notes on the Orchestrator Pattern for better understanding.

What is the Orchestrator Pattern?

The Orchestrator Pattern is a design strategy where a central orchestrator coordinates interactions between various services or components to execute a workflow.

Unlike the Choreography Pattern, where services interact with each other independently and are aware of their peers, the orchestrator acts as the central decision-maker, directing how and when services interact.

Key Features

  • Centralized control of workflows.
  • Simplified service communication.
  • Enhanced error handling and monitoring.

When to Use the Orchestrator Pattern

  • Complex Workflows: When multiple services or steps need to be executed in a defined sequence.
  • Error Handling: When failures in one step require recovery strategies or compensating transactions.
  • Centralized Logic: When you want to encapsulate business logic in a single place for easier maintenance.

Benefits of the Orchestrator Pattern

  1. Simplifies Service Communication: Services remain focused on their core functionality while the orchestrator manages interactions.
  2. Improves Scalability: Workflows can be scaled independently from services.
  3. Centralized Monitoring: Makes it easier to track the progress of workflows and debug issues.
  4. Flexibility: Changing a workflow involves modifying the orchestrator, not the services.

Example: Order Processing Workflow

Problem

A fictional e-commerce platform needs to process orders. The workflow involves:

  1. Validating the order.
  2. Reserving inventory.
  3. Processing payment.
  4. Notifying the user.

Each step is handled by a separate microservice.

Solution

We implement an orchestrator to manage this workflow. Let’s see how this works in practice.


import requests

class OrderOrchestrator:
    def __init__(self):
        self.services = {
            "validate_order": "http://order-service/validate",
            "reserve_inventory": "http://inventory-service/reserve",
            "process_payment": "http://payment-service/process",
            "notify_user": "http://notification-service/notify",
        }

    def execute_workflow(self, order_id):
        try:
            # Step 1: Validate Order
            self.call_service("validate_order", {"order_id": order_id})

            # Step 2: Reserve Inventory
            self.call_service("reserve_inventory", {"order_id": order_id})

            # Step 3: Process Payment
            self.call_service("process_payment", {"order_id": order_id})

            # Step 4: Notify User
            self.call_service("notify_user", {"order_id": order_id})

            print(f"Order {order_id} processed successfully!")
        except Exception as e:
            print(f"Error processing order {order_id}: {e}")

    def call_service(self, service_name, payload):
        url = self.services[service_name]
        response = requests.post(url, json=payload)
        if response.status_code != 200:
            raise Exception(f"{service_name} failed: {response.text}")

Key Tactics for Implementation

  1. Services vs. Serverless: Use serverless functions for steps that are triggered occasionally and don’t need always-on services, reducing costs.
  2. Recovery from Failures:
    • Retry Mechanism: Configure retries with limits and delays to handle transient failures.
    • Circuit Breaker Pattern: Detect and isolate failing services to allow recovery.
    • Graceful Degradation: Use fallbacks like cached results or alternate services to ensure continuity.
  3. Monitoring and Alerting:
    • Implement real-time monitoring with automated recovery strategies.
    • Set up alerts for exceptions and utilize logs for troubleshooting.
  4. Orchestration Service Failures:
    • Service Replication: Deploy multiple instances of the orchestrator for failover.
    • Data Replication: Ensure data consistency for seamless recovery.
    • Request Queues: Use queues to buffer requests during downtime and process them later.
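As a minimal sketch of the retry tactic above (an illustrative helper, not part of any real orchestrator library; the attempt count and delay are assumptions), the orchestrator's call_service can be wrapped in a bounded, fixed-delay retry loop:


import time

def call_service_with_retry(orchestrator, service_name, payload,
                            max_attempts=3, delay_seconds=2):
    # Retry transient failures a bounded number of times with a fixed delay.
    for attempt in range(1, max_attempts + 1):
        try:
            return orchestrator.call_service(service_name, payload)
        except Exception as error:
            if attempt == max_attempts:
                raise  # retries exhausted; let the orchestrator compensate
            print(f"{service_name} attempt {attempt} failed ({error}); retrying...")
            time.sleep(delay_seconds)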

Important Considerations

The primary goal of this architectural pattern is to decompose the entire business workflow into multiple services, making it more flexible and scalable. Due to this, it's crucial to analyze and comprehend the business processes in detail before implementation. A poorly defined and overly complicated business process will lead to a system that would be hard to maintain and scale.

Secondly, it's easy to fall into the trap of adding business logic into the orchestration service. Sometimes it's inevitable because certain functionalities are too small to create their separate service. But the risk here is that if the orchestration service becomes too intelligent and performs too much business logic, it can evolve into a monolithic application that also happens to talk to microservices. So, it's crucial to keep track of every addition to the orchestration service and ensure that its work remains within the boundaries of orchestration. Maintaining the scope of the orchestration service will prevent it from becoming a burden on the system, leading to decreased scalability and flexibility.

Why Use the Orchestration Pattern

The pattern comes with the following advantages

  • Orchestration makes it easier to understand, monitor, and observe the application, resulting in a better understanding of the core part of the system with less effort.
  • The pattern promotes loose coupling. Each downstream service exposes an API interface and is self-contained, without any need to know about the other services.
  • The pattern simplifies the business workflows and improves the separation of concerns. Each service participates in a long-running transaction without any need to know about it.
  • The orchestrator service can decide what to do in case of failure, making the system fault-tolerant and reliable.

Learning Notes #36 – Active Active / Active Passive Patterns | HA Patterns

4 January 2025 at 18:04

Today, I learnt about High Availability patterns, basically how to organize clusters for high availability. In this blog, I jot down notes on the Active-Active and Active-Passive patterns for better understanding.

Active-Active Configuration

In an Active-Active setup, all nodes in the cluster are actively processing requests. This configuration maximizes resource utilization and ensures high throughput. If one node fails, the remaining active nodes take over the load.

Example Scenario

Consider a web application with two servers:

  1. Server A: IP 192.168.1.10
  2. Server B: IP 192.168.1.11

Both servers handle incoming requests simultaneously. A load balancer distributes traffic between these servers to balance the load.

Pros and Cons

Pros:

  • Higher resource utilization.
  • Better scalability and performance.

Cons:

  • Increased complexity in handling data consistency and synchronization.
  • Potential for split-brain issues in certain setups.

Sample HAProxy config


defaults
    mode http
    timeout connect 5000ms
    timeout client 50000ms
    timeout server 50000ms

frontend http_front
    bind *:80
    default_backend http_back

backend http_back
    balance roundrobin
    server server_a 192.168.1.10:80 check
    server server_b 192.168.1.11:80 check

Active-Passive Configuration

In an Active-Passive setup, one node (Active) handles all the requests, while the other node (Passive) acts as a standby. If the active node fails, the passive node takes over.

Example Scenario

Using the same servers:

  1. Server A: IP 192.168.1.10 (Active)
  2. Server B: IP 192.168.1.11 (Passive)

Server B remains idle until Server A becomes unavailable, at which point Server B assumes the active role.

Pros and Cons

Pros:

  • Simplified consistency management.
  • Reliable failover mechanism.

Cons:

  • Underutilized resources (passive node is idle most of the time).
  • Slight delay during failover.

Sample HAProxy Config


defaults
    mode http
    timeout connect 5000ms
    timeout client 50000ms
    timeout server 50000ms

frontend http_front
    bind *:80
    default_backend http_back

backend http_back
    server server_a 192.168.1.10:80 check
    server server_b 192.168.1.11:80 check backup
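The backup flag on server_b is what makes this configuration Active-Passive: HAProxy routes traffic only to server_a while its health check passes, and fails over to server_b when it does not.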

Learning Notes #30 – Queue Based Loading | Cloud Patterns

3 January 2025 at 14:47

Today, I learnt about the Queue-Based Loading pattern, which helps manage intermittent peak load on a service via queues, essentially decoupling tasks from services. In this blog, I jot down notes on this pattern for my future self.

In today's digital landscape, applications are expected to handle large-scale operations efficiently. Whether it's processing massive data streams, ensuring real-time responsiveness, or integrating with multiple third-party services, scalability and reliability are paramount. One pattern that elegantly addresses these challenges is the Queue-Based Loading Pattern.

What Is the Queue-Based Loading Pattern?

The Queue-Based Loading Pattern leverages message queues to decouple and coordinate tasks between producers (such as applications or services generating data) and consumers (services or workers processing that data). By using queues as intermediaries, this pattern allows systems to manage workloads efficiently, ensuring seamless and scalable operation.

Key Components of the Pattern

  1. Producers: Producers are responsible for generating tasks or data. They send these tasks to a message queue instead of directly interacting with consumers. Examples include:
    • Web applications logging user activity.
    • IoT devices sending sensor data.
  2. Message Queue: The queue acts as a buffer, storing tasks until consumers are ready to process them. Popular tools for implementing queues include RabbitMQ, Apache Kafka, AWS SQS, and Redis.
  3. Consumers: Consumers retrieve messages from the queue and process them asynchronously. They are typically designed to handle tasks independently and at their own pace.
  4. Processing Logic: This is the core functionality that processes the tasks retrieved by consumers. For example, resizing images, sending notifications, or updating a database.

How It Works

  1. Task Generation: Producers push tasks to the queue as they are generated.
  2. Message Storage: The queue stores tasks in a structured manner (FIFO, priority-based, etc.) and ensures reliable delivery.
  3. Task Consumption: Consumers pull tasks from the queue, process them, and optionally acknowledge completion.
  4. Scalability: New consumers can be added dynamically to handle increased workloads, ensuring the system remains responsive.
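A minimal sketch of these four steps with RabbitMQ and pika, assuming a broker on localhost (the queue name and task body are illustrative):


import pika

# --- Producer: push a task to the queue ---
connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()
channel.queue_declare(queue='task_queue', durable=True)  # survive broker restarts
channel.basic_publish(
    exchange='',
    routing_key='task_queue',
    body='resize-image:42',
    properties=pika.BasicProperties(delivery_mode=2),  # persist the message
)
connection.close()

# --- Consumer: pull tasks and acknowledge on success ---
def handle_task(ch, method, properties, body):
    print(f"Processing {body.decode()}")
    ch.basic_ack(delivery_tag=method.delivery_tag)

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()
channel.queue_declare(queue='task_queue', durable=True)
channel.basic_qos(prefetch_count=1)  # hand each worker one task at a time
channel.basic_consume(queue='task_queue', on_message_callback=handle_task)
channel.start_consuming()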

Benefits of the Queue-Based Loading Pattern

  1. Decoupling: Producers and consumers operate independently, reducing tight coupling and improving system maintainability.
  2. Scalability: By adding more consumers, systems can easily scale to handle higher workloads.
  3. Fault Tolerance: If a consumer fails, messages remain in the queue, ensuring no data is lost.
  4. Load Balancing: Tasks are distributed evenly among consumers, preventing any single consumer from becoming a bottleneck.
  5. Asynchronous Processing: Consumers can process tasks in the background, freeing producers to continue generating data without delay.

Issues and Considerations

  1. Rate Limiting: Implement logic to control the rate at which services handle messages to prevent overwhelming the target resource. Test the system under load and adjust the number of queues or service instances to manage demand effectively.
  2. One-Way Communication: Message queues are inherently one-way. If tasks require responses, you may need to implement a separate mechanism for replies.
  3. Autoscaling Challenges: Be cautious when autoscaling consumers, as it can lead to increased contention for shared resources, potentially reducing the effectiveness of load leveling.
  4. Traffic Variability: Consider the variability of incoming traffic to avoid situations where tasks pile up faster than they are processed, creating a perpetual backlog.
  5. Queue Persistence: Ensure your queue is durable and capable of persisting messages. Crashes or system limits could lead to dropped messages, risking data loss.

Use Cases

  1. Email and Notification Systems: Sending bulk emails or push notifications without overloading the main application.
  2. Data Pipelines: Ingesting, transforming, and analyzing large datasets in real-time or batch processing.
  3. Video Processing: Queues facilitate tasks like video encoding and thumbnail generation.
  4. Microservices Communication: Ensures reliable and scalable communication between microservices.

Best Practices

  1. Message Durability: Configure your queue to persist messages to disk, ensuring they are not lost during system failures.
  2. Monitoring and Metrics: Use monitoring tools to track queue lengths, processing rates, and consumer health.
  3. Idempotency: Design consumers to handle duplicate messages gracefully.
  4. Error Handling and Dead Letter Queues (DLQs): Route failed messages to DLQs for later analysis and reprocessing.

Learning Notes #23 – Retry Pattern | Cloud Patterns

31 December 2024 at 17:34

Today, I refreshed the Retry pattern. It handles transient failures (network issues, throttling, or temporary unavailability of a service).

The Retry Pattern provides a structured approach to handle these failures gracefully, ensuring system reliability and fault tolerance. It is often used in conjunction with related patterns like the Circuit Breaker, which prevents repeated retries during prolonged failures, and the Bulkhead Pattern, which isolates system components to prevent cascading failures.

In this blog, I jot down my notes on the Retry pattern for better understanding.

What is the Retry Pattern?

The Retry Pattern is a design strategy used to manage transient failures by retrying failed operations. Instead of immediately failing an operation after an error, the pattern retries it with an optional delay or backoff strategy. This is particularly useful in distributed systems where failures are often temporary.

Key Components of the Retry Pattern

  • Retry Logic: The mechanism that determines how many times to retry and under what conditions.
  • Backoff Strategy: A delay mechanism to space out retries. Common strategies include fixed, incremental, and exponential backoff.
  • Termination Policy: A limit on the number of retries or a timeout to prevent infinite retry loops.
  • Error Handling: A fallback mechanism to gracefully handle persistent failures after retries are exhausted.

Retry Pattern Strategies

1. Fixed Interval Retry

  • Retries are performed at regular intervals.
  • Example: Retry every 2 seconds for up to 5 attempts.

2. Incremental Backoff

  • Retry intervals increase linearly.
  • Example: Retry after 1, 2, 3, 4, and 5 seconds.

3. Exponential Backoff

  • Retry intervals grow exponentially, often with jitter to randomize delays.
  • Example: Retry after 1, 2, 4, 8, and 16 seconds.

4. Custom Backoff

  • Tailored to specific use cases, combining strategies or using domain-specific logic.

Implementing the Retry Pattern in Python with Tenacity

tenacity is a powerful Python library that simplifies the implementation of the Retry Pattern. It provides built-in support for various retry strategies, including fixed interval, incremental backoff, and exponential backoff with jitter.

Example with Fixed Interval Retry


from tenacity import retry, stop_after_attempt, wait_fixed

@retry(stop=stop_after_attempt(5), wait=wait_fixed(2))
def example_operation():
    print("Trying operation...")
    raise Exception("Transient error")

try:
    example_operation()
except Exception as e:
    print(f"Operation failed after retries: {e}")

Example with Exponential Backoff and Jitter


from tenacity import retry, stop_after_attempt, wait_exponential_jitter

@retry(stop=stop_after_attempt(5), wait=wait_exponential_jitter(initial=1, max=10))
def example_operation():
    print("Trying operation...")
    raise Exception("Transient error")

try:
    example_operation()
except Exception as e:
    print(f"Operation failed after retries: {e}")

Example with Custom Termination Policy


from tenacity import retry, stop_after_delay, wait_exponential

@retry(stop=stop_after_delay(10), wait=wait_exponential(multiplier=1))
def example_operation():
    print("Trying operation...")
    raise Exception("Transient error")

try:
    example_operation()
except Exception as e:
    print(f"Operation failed after retries: {e}")
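For the fourth strategy above, custom backoff, tenacity lets you compose wait strategies with the + operator. A sketch combining a fixed base delay with random jitter (the values are illustrative):


from tenacity import retry, stop_after_attempt, wait_fixed, wait_random

# Custom backoff: 2s base delay plus 0-1s of random jitter per attempt.
@retry(stop=stop_after_attempt(5), wait=wait_fixed(2) + wait_random(0, 1))
def example_operation():
    print("Trying operation...")
    raise Exception("Transient error")

try:
    example_operation()
except Exception as e:
    print(f"Operation failed after retries: {e}")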

Real-World Use Cases

  • API Rate Limiting – Retrying failed API calls when encountering HTTP 429 errors.
  • Database Operations – Retrying failed database queries due to deadlocks or transient connectivity issues.
  • File Uploads/Downloads – Retrying uploads or downloads in case of network interruptions.
  • Message Processing – Retries for message processing failures in systems like RabbitMQ or Kafka.

Learning Notes #22 – Claim Check Pattern | Cloud Pattern

31 December 2024 at 17:03

Today, I learnt about the Claim Check pattern, which describes how to handle a large message in a queue. Every message broker has a defined message size limit; if our message exceeds that limit, it won't be delivered.

The Claim Check Pattern emerges as a pivotal architectural design to address challenges in managing large payloads in a decoupled and efficient manner. In this blog, I jot down notes on my learning for my future self.

What is the Claim Check Pattern?

The Claim Check Pattern is a messaging pattern used in distributed systems to manage large messages efficiently. Instead of transmitting bulky data directly between services, this pattern extracts and stores the payload in a dedicated storage system (e.g., object storage or a database).

A lightweight reference or "claim check" is then sent through the message queue, which the receiving service can use to retrieve the full data from the storage.

This pattern is inspired by the physical process of checking in luggage at an airport: you hand over your luggage, receive a claim check (a token), and later use it to retrieve your belongings.

How Does the Claim Check Pattern Work?

The process typically involves the following steps

  1. Data Submission The sender service splits a message into two parts:
    • Metadata: A small piece of information that provides context about the data.
    • Payload: The main body of data that is too large or sensitive to send through the message queue.
  2. Storing the Payload
    • The sender uploads the payload to a storage service (e.g., AWS S3, Azure Blob Storage, or Google Cloud Storage).
    • The storage service returns a unique identifier (e.g., a URL or object key).
  3. Sending the Claim Check
    • The sender service places the metadata and the unique identifier (claim check) onto the message queue.
  4. Receiving the Claim Check
    • The receiver service consumes the message from the queue, extracts the claim check, and retrieves the payload from the storage system.
  5. Processing
    • The receiver processes the payload alongside the metadata as required.

Use Cases

1. Media Processing Pipelines In video transcoding systems, raw video files can be uploaded to storage while metadata (e.g., video format and length) is passed through the message queue.

2. IoT Systems – IoT devices generate large datasets. Using the Claim Check Pattern ensures efficient transmission and processing of these data chunks.

3. Data Processing Workflows – In big data systems, datasets can be stored in object storage while processing metadata flows through orchestration tools like Apache Airflow.

4. Event-Driven Architectures – For systems using event-driven models, large event payloads can be offloaded to storage to avoid overloading the messaging layer.

Example with RabbitMQ

1. Sender Service


import json

import boto3
import pika

s3 = boto3.client('s3')
bucket_name = 'my-bucket'
object_key = 'data/large-file.txt'

s3.upload_file('large-file.txt', bucket_name, object_key)  # upload_file returns None on success
claim_check = f's3://{bucket_name}/{object_key}'

# Connect to RabbitMQ
connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()

# Declare a queue
channel.queue_declare(queue='claim_check_queue')

# Send the claim check
message = {
    'metadata': 'Some metadata',
    'claim_check': claim_check
}
channel.basic_publish(exchange='', routing_key='claim_check_queue', body=json.dumps(message))

connection.close()

2. Consumer


import json

import boto3
import pika

s3 = boto3.client('s3')

# Connect to RabbitMQ
connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()

# Declare a queue
channel.queue_declare(queue='claim_check_queue')

# Callback function to process messages
def callback(ch, method, properties, body):
    message = json.loads(body)  # parse the JSON payload safely (avoid eval on untrusted input)
    claim_check = message['claim_check']

    bucket_name, object_key = claim_check.replace('s3://', '').split('/', 1)
    s3.download_file(bucket_name, object_key, 'retrieved-large-file.txt')
    print("Payload retrieved and processed.")

# Consume messages
channel.basic_consume(queue='claim_check_queue', on_message_callback=callback, auto_ack=True)

print('Waiting for messages. To exit press CTRL+C')
channel.start_consuming()

References

  1. https://learn.microsoft.com/en-us/azure/architecture/patterns/claim-check
  2. https://medium.com/@dmosyan/claim-check-design-pattern-603dc1f3796d

Learning Notes #18 – Bulk Head Pattern (Resource Isolation) | Cloud Pattern

30 December 2024 at 17:48

Today, I learned about the Bulkhead pattern and how it makes a system resilient to failures and resource exhaustion. In this blog, I jot down notes on this pattern for better understanding.

In today’s world of distributed systems and microservices, resiliency is key to ensuring applications are robust and can withstand failures.

The Bulkhead Pattern is a design principle used to improve system resilience by isolating different parts of a system to prevent failure in one component from cascading to others.

What is the Bulkhead Pattern?

The term "bulkhead" originates from shipbuilding, where bulkheads are partitions that divide a ship into separate compartments. If one compartment is breached, the others remain intact, preventing the entire ship from sinking. Similarly, in software design, the Bulkhead Pattern isolates components or services so that a failure in one part does not bring down the entire system.

In software systems, bulkheads:

  • Isolate resources (e.g., threads, database connections, or network calls) for different components.
  • Limit the scope of failures.
  • Allow other parts of the system to continue functioning even if one part is degraded or completely unavailable.

Example

Consider an e-commerce application with a product-service that has two endpoints

  1. /product/{id} – This endpoint gives detailed information about a specific product, including ratings and reviews. It depends on the rating-service.
  2. /products – This endpoint provides a catalog of products based on search criteria. It does not depend on any external services.

Suppose product-service has a fixed pool of resources and is flooded with /product/{id} calls; those calls can monopolize the thread pool. This delays /products requests, causing users to experience slowness even though these requests are independent, and ultimately leads to resource exhaustion and failures.

With the bulkhead pattern, we can allocate separate clients and connection pools to isolate each service interaction. For example, we can give /product/{id} requests a connection pool of 10 and /products requests a separate connection pool of 5.

Even if /product/{id} requests are slow or encounter high traffic, /products requests remain unaffected.
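A minimal sketch of this bulkhead in Python, using two separate thread pools as the isolation boundary (the pool sizes and service URLs mirror the illustrative numbers above):


from concurrent.futures import ThreadPoolExecutor

import requests

# Two isolated pools: slow /product/{id} calls cannot starve /products of workers.
product_detail_pool = ThreadPoolExecutor(max_workers=10)
catalog_pool = ThreadPoolExecutor(max_workers=5)

def fetch_product(product_id):
    return requests.get(f"http://product-service/product/{product_id}", timeout=2)

def fetch_catalog(query):
    return requests.get("http://product-service/products",
                        params={"q": query}, timeout=2)

detail_future = product_detail_pool.submit(fetch_product, 42)
catalog_future = catalog_pool.submit(fetch_catalog, "phone")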

Scenarios Where the Bulkhead Pattern is Needed

  1. Microservices with Shared Resources – In a microservices architecture, multiple services might share limited resources such as database connections or threads. If one service experiences a surge in traffic or a failure, it can exhaust these shared resources, impacting all other services. Bulkheading ensures each service gets a dedicated pool of resources, isolating the impact of failures.
  2. Prioritizing Critical Workloads – In systems with mixed workloads (e.g., processing user transactions and generating reports), critical operations like transaction processing must not be delayed or blocked by less critical tasks. Bulkheading allocates separate resources to ensure critical tasks have priority.
  3. Third-Party API Integration – When an application depends on multiple external APIs, one slow or failing API can delay the entire application if not isolated. Using bulkheads ensures that issues with one API do not affect interactions with others.
  4. Multi-Tenant Systems – In SaaS applications serving multiple tenants, a single tenant's high resource consumption or failure should not degrade the experience for others. Bulkheads can segregate resources per tenant to maintain service quality.
  5. Cloud-Native Applications – In cloud environments, services often scale independently. A spike in one service’s load should not overwhelm shared backend systems. Bulkheads help isolate and manage these spikes.
  6. Event-Driven Systems – In event-driven architectures with message queues, processing backlogs for one type of event can delay others. By applying the Bulkhead Pattern, separate processing pipelines can handle different event types independently.

What are the Key Points of the Bulkhead Pattern? (Simplified)

  • Define Partitions – Think of a ship: it's divided into compartments (partitions) to keep water from flooding the whole vessel if one section is damaged. In software, these partitions are designed around how the application works and its technical needs.
  • Designing with Context – If you're using a design approach like DDD (Domain-Driven Design), make sure your bulkheads (partitions) match the business logic boundaries.
  • Choosing Isolation Levels – Decide how much isolation is needed. For example: Threads for lightweight tasks. Separate containers or virtual machines for more critical separations. Balance between keeping things separate and the costs or extra effort involved.
  • Combining Other Techniques – Bulkheads work even better with patterns like Retry, Circuit Breaker, Throttling.
  • Monitoring – Keep an eye on each partition’s performance. If one starts getting overloaded, you can adjust resources or change limits.

When Should You Use the Bulkhead Pattern?

  • To Isolate Critical Resources – If one part of your system fails, other parts can keep working. For example, you don't want search functionality to stop working because the reviews section is down.
  • To Prioritize Important Work – For example, make sure payment processing (critical) is separate from background tasks like sending emails.
  • To Avoid Cascading Failures – If one part of the system gets overwhelmed, it won’t drag down everything else.

When Should You Avoid It?

  • Complexity Isn't Needed – If your system is simple, adding bulkheads might just make it harder to manage.
  • Resource Efficiency is Critical – Sometimes, splitting resources into separate pools can mean less efficient use of those resources. If every thread, connection, or container is underutilized, this might not be the best approach.

Challenges and Best Practices

  1. Overhead: Maintaining separate resource pools can increase system complexity and resource utilization.
  2. Resource Sizing: Properly sizing the pools is critical to ensure resources are efficiently utilized without bottlenecks.
  3. Monitoring: Use tools to monitor the health and performance of each resource pool to detect bottlenecks or saturation.

References:

  1. AWS https://aws.amazon.com/blogs/containers/building-a-fault-tolerant-architecture-with-a-bulkhead-pattern-on-aws-app-mesh/
  2. Resilience https://resilience4j.readme.io/docs/bulkhead
  3. https://medium.com/nerd-for-tech/bulkhead-pattern-distributed-design-pattern-c673d5e81523
  4. Microsoft https://learn.microsoft.com/en-us/azure/architecture/patterns/bulkhead

Learning Notes #13 – Gateway Aggregator Pattern

27 December 2024 at 14:56

As part of cloud design patterns, today I learned about the Gateway Aggregation pattern. It feels like a motivation for GraphQL. In this blog, I write down my notes on the pattern for my future self.

In the world of microservices, applications are often broken down into smaller, independent services, each responsible for a specific functionality.

While this architecture promotes scalability and maintainability, it can complicate communication between services. The Gateway Aggregation Pattern emerges as a solution, enabling streamlined interactions between clients and services.

What is the Gateway Aggregation Pattern?

The Gateway Aggregation Pattern involves introducing a gateway layer to handle requests from clients. Instead of the client making multiple calls to different services, the gateway aggregates the data by making calls to the relevant services and then returning a unified response to the client.

This pattern is particularly useful for:

  • Reducing the number of round-trips between clients and services.
  • Simplifying client logic.
  • Improving performance by centralizing the communication and aggregation logic.

How It Works

  1. Client Request: The client sends a single request to the gateway.
  2. Gateway Processing: The gateway makes multiple requests to the required services, aggregates their responses, and applies any necessary transformation.
  3. Unified Response: The gateway sends a unified response back to the client.

This approach abstracts the complexity of service interactions from the client, improving the overall user experience.

Example Use Case

Imagine an e-commerce application where a client needs to display a product's details, reviews, and availability. Without a gateway, the client must call three different microservices

  1. Product Service: Provides details like name, description, and price.
  2. Review Service: Returns customer reviews and ratings.
  3. Inventory Service: Indicates product availability.

Using the Gateway Aggregation Pattern, the client makes a single request to the gateway. The gateway calls the three services, aggregates their responses, and returns a combined result, such as

{
  "product": {
    "id": "123",
    "name": "Smartphone",
    "description": "Latest model with advanced features",
    "price": 699.99
  },
  "reviews": [
    {
      "user": "Alice",
      "rating": 4,
      "comment": "Great product!"
    },
    {
      "user": "Bob",
      "rating": 5,
      "comment": "Excellent value for money."
    }
  ],
  "availability": {
    "inStock": true,
    "warehouse": "Warehouse A"
  }
}
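Setting tooling aside for a moment, the aggregation logic itself is just a fan-out and merge. A minimal Python sketch with requests and a thread pool (the service URLs are placeholders):


from concurrent.futures import ThreadPoolExecutor

import requests

SERVICES = {
    "product": "http://product-service/products/{id}",
    "reviews": "http://review-service/reviews/{id}",
    "availability": "http://inventory-service/availability/{id}",
}

def aggregate_product_view(product_id):
    # Fan out to the three services in parallel, then merge the responses.
    with ThreadPoolExecutor(max_workers=len(SERVICES)) as pool:
        futures = {
            name: pool.submit(requests.get, url.format(id=product_id), timeout=2)
            for name, url in SERVICES.items()
        }
        return {name: future.result().json() for name, future in futures.items()}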

Tools to implement Gateway Aggregation Pattern

1. Kong Gateway

Kong is a popular API gateway that supports custom plugins for advanced use cases like aggregation.

Example:

Implement a custom Lua plugin to fetch and aggregate data from multiple services.

Use Kong's Route and Upstream configurations to direct traffic.

2. GraphQL

GraphQL can act as a natural gateway by fetching and aggregating data from multiple sources.

const { ApolloServer, gql } = require('apollo-server');
const { RESTDataSource } = require('apollo-datasource-rest');

class ProductAPI extends RESTDataSource {
  constructor() {
    super();
    this.baseURL = 'http://product-service/';
  }
  async getProduct(id) {
    return this.get(`products/${id}`);
  }
}

class ReviewAPI extends RESTDataSource {
  constructor() {
    super();
    this.baseURL = 'http://review-service/';
  }
  async getReviews(productId) {
    return this.get(`reviews/${productId}`);
  }
}

const typeDefs = gql`
  type Product {
    id: ID!
    name: String
    description: String
    price: Float
  }

  type Review {
    user: String
    rating: Int
    comment: String
  }

  type AggregatedData {
    product: Product
    reviews: [Review]
  }

  type Query {
    aggregatedData(productId: ID!): AggregatedData
  }
`;

const resolvers = {
  Query: {
    aggregatedData: async (_, { productId }, { dataSources }) => {
      const product = await dataSources.productAPI.getProduct(productId);
      const reviews = await dataSources.reviewAPI.getReviews(productId);
      return { product, reviews };
    },
  },
};

const server = new ApolloServer({
  typeDefs,
  resolvers,
  dataSources: () => ({
    productAPI: new ProductAPI(),
    reviewAPI: new ReviewAPI(),
  }),
});

server.listen().then(({ url }) => {
  console.log(`Server ready at ${url}`);
});

By consolidating service calls and centralizing the aggregation logic, this pattern enhances performance and reduces complexity. Open-source tools like Express.js, Apache APISIX, Kong Gateway, and GraphQL make it easy to implement the pattern in diverse environments.

Learning Notes #11 – Sidecar Pattern | Cloud Patterns

26 December 2024 at 17:40

Today, I learnt about the Sidecar pattern. It is essentially about offloading common functionalities (logging, networking, and so on) into a companion container within a pod, to be used by the other apps in the pod.

It is not just about pods; it applies to other deployments as well. In this blog, I am going to curate the items I have learnt for my future self. It is a pattern, not a strict rule.

What is a Sidecar?

Imagine you're riding a motorbike, and you attach a little sidecar to carry your friend or groceries. The sidecar isn't part of the motorbike's engine or core mechanism, but it helps you achieve your goals, whether that's carrying more stuff or having a buddy ride along.

In the software world, a sidecar is a similar concept. It's a separate process or container that runs alongside a primary application. Like the motorbike's sidecar, it supports the main application by offloading or enhancing certain tasks without interfering with its core functionality.

Why Use a Sidecar?

In traditional applications, all responsibilities (logging, communication, monitoring, etc.) are bundled into the main application. This approach can make the application complex and harder to manage. Sidecars address this by handling auxiliary tasks separately, so the main application can focus on its primary purpose.

Here are some key reasons to use a sidecar

  1. Modularity: Sidecars separate responsibilities, making the system easier to develop, test, and maintain.
  2. Reusability: The same sidecar can be used across multiple services, and it is language agnostic.
  3. Scalability: You can scale the sidecar independently from the main application.
  4. Isolation: Sidecars provide a level of isolation, reducing the risk of one part affecting the other.

Real-Life Analogies

To make the concept clearer, here are some real-world analogies:

  1. Coffee Maker with a Milk Frother:
    • The coffee maker (main application) brews coffee.
    • The milk frother (sidecar) prepares frothed milk for your latte.
    • Both work independently but combine their outputs for a better experience.
  2. Movie Subtitles:
    • The movie (main application) provides the visuals and sound.
    • The subtitles (sidecar) add clarity for those who need them.
    • You can watch the movie with or without subtitles; they're optional but enhance the experience.
  3. A School with a Sports Coach:
    • The school (main application) handles education.
    • The sports coach (sidecar) focuses on physical training.
    • Both have distinct roles but contribute to the overall development of students.

Some Random Sidecar Ideas in Software

Let’s look at how sidecars are used in actual software scenarios

  1. Service Meshes (e.g., Istio, Linkerd):
    • A service mesh helps microservices communicate with each other reliably and securely.
    • The sidecar (proxy like Envoy) handles tasks like load balancing, encryption, and monitoring, so the main application doesn't have to.
  2. Logging and Monitoring:
    • Instead of the main application generating and managing logs, a sidecar can collect, format, and send logs to a centralized system like Elasticsearch or Splunk.
  3. Authentication and Security:
    • A sidecar can act as a gatekeeper, handling user authentication and ensuring that only authorized requests reach the main application.
  4. Data Caching:
    • If an application frequently queries a database, a sidecar can serve as a local cache, reducing database load and speeding up responses.
  5. Service Discovery:
    • Sidecars can aid in service discovery by automatically registering the main application with a registry service or load balancer, ensuring seamless communication in dynamic environments.

How Sidecars Work

In modern environments like Kubernetes, sidecars are often deployed as separate containers within the same pod as the main application. They share the same network and storage, making communication between the two seamless.

Here's a simplified workflow

  1. The main application focuses on its core tasks (e.g., serving a web page).
  2. The sidecar handles auxiliary tasks (e.g., compressing and encrypting logs).
  3. The two communicate over local connections within the pod.
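As a toy sketch of this workflow (the log path and shipping target are hypothetical), a logging sidecar can simply tail a file that the main application writes to a shared volume:


import time

LOG_PATH = "/var/log/app/app.log"  # shared volume mounted into both containers

def ship(line):
    # Placeholder: a real sidecar would forward to Elasticsearch, Loki, etc.
    print(f"shipping: {line.strip()}")

with open(LOG_PATH) as log_file:
    log_file.seek(0, 2)  # start at the end of the file, like `tail -f`
    while True:
        line = log_file.readline()
        if line:
            ship(line)
        else:
            time.sleep(0.5)  # no new log lines yet; poll again shortly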

Pros and Cons of Sidecars

Pros:

  • Simplifies the main application.
  • Encourages reusability and modular design.
  • Improves scalability and flexibility.
  • Enhances observability with centralized logging and metrics.
  • Facilitates experimentation; you can deploy or update sidecars independently.

Cons:

  • Adds complexity to deployment and orchestration.
  • Consumes additional resources (CPU, memory).
  • Requires careful design to avoid tight coupling between the sidecar and the main application.
  • Latency (you are adding another hop).

Do we always need to use sidecars?

No. Not at all.

a. When the extra hop between the parent application and the sidecar adds noticeable latency, reconsider.

b. If your application is small, reconsider.

c. When the sidecar would need to scale differently or independently from the parent application, reconsider.

Some other examples

1. Adding HTTPS to a Legacy Application

Consider a legacy web service that serves requests over unencrypted HTTP. We have a requirement to enhance the same legacy system to serve requests over HTTPS in the future.

The legacy app is configured to serve requests exclusively on localhost, which means only services that share the local network with the server are able to access the legacy application. In addition to the main container (the legacy app), we can add an Nginx sidecar container that runs in the same network namespace as the main container so that it can access the service running on localhost and terminate HTTPS in front of it.

2. For Logging (diagram from ByteByteGo omitted here)

Sidecars are not just technical solutions; they embody the principle of collaboration and specialization. By dividing responsibilities, they empower the main application to shine while ensuring auxiliary tasks are handled efficiently. Next time you hear about sidecars, you'll know they're more than just cool attachments for motorcycles; they're an essential part of scalable, maintainable software systems.

Also, do you feel it's closely related to the Adapter and Ambassador patterns? I do.

References:

  1. Hussein Nasser – https://www.youtube.com/watch?v=zcJWvhzkPsw&pp=ygUHc2lkZWNhcg%3D%3D
  2. Sudo Code – https://www.youtube.com/watch?v=QU5WcwuFpZU&pp=ygUPc2lkZWNhciBwYXR0ZXJu
  3. Software Dude – https://www.youtube.com/watch?v=poPUzN33Oug&pp=ygUPc2lkZWNhciBwYXR0ZXJu
  4. https://medium.com/nerd-for-tech/microservice-design-pattern-sidecar-sidekick-pattern-dbcea9bed783
  5. https://dzone.com/articles/sidecar-design-pattern-in-your-microservices-ecosy-1

Pattern Printing – Others

By: Sugirtha
21 October 2024 at 02:01
public class Patterns {

	public static void main(String[] args) {
		//PRINTING 1's and 0's IN SPECIFIC PATTERN
		int n=5;
		Patterns ptn = new Patterns();
		ptn.pattern1(n);
		System.out.println();
		ptn.pattern2(n);
		System.out.println();
		ptn.pattern3("SUGIRTHA");
	}
	private void pattern1(int n) {
		int val=0;
		for (int r=1; r<=n; r++) {
			val = r%2==0?0:1;
			for (int c=1; c<=n-r+1; c++) {
				System.out.print(" "+val);
				val=1-val;
			}
			System.out.println();
		}
	}
	private void pattern2(int n) {
		int val=1;
		for (int r=1; r<=n; r++) {
			for (int c=1; c<=n-r+1; c++) {
				System.out.print(" "+val);
			}
			val=1-val;
			System.out.println();
		}
	}
	private void pattern3(String name) {
		int n = name.length();
		for (int r=1; r<=n; r++) {
			for (int c=0; c<r; c++) {
				System.out.print(" "+name.charAt(c));  //TBD 
			}
			System.out.println();
		}
	}

}

OUTPUT:
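Tracing the code with n = 5 and the name "SUGIRTHA", the program prints:


 1 0 1 0 1
 0 1 0 1
 1 0 1
 0 1
 1

 1 1 1 1 1
 0 0 0 0
 1 1 1
 0 0
 1

 S
 S U
 S U G
 S U G I
 S U G I R
 S U G I R T
 S U G I R T H
 S U G I R T H A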

Patterns – Printing Name

By: Sugirtha
20 October 2024 at 15:03
public class PatternMyName {
	//  MY NAME - IN PATTERN PRINTING

	public static void main(String[] args) {
		PatternMyName pattern = new PatternMyName();
		int n=9;
		pattern.printLtr("S",n);
		pattern.printLtr("U",n);
		pattern.printLtr("G",n);
		pattern.printLtr("I",n);
		pattern.printLtr("R",n);
		pattern.printLtr("T",n);
		pattern.printLtr("H",n);
		pattern.printLtr("A",n);
	}
	private void printLtr(String ltr,int n) {
		int m=n/2 +1;
		switch(ltr) {
			case "S":
				for (int r=1; r<=n; r++) {
					for (int c=1; c<=n; c++) {
						if (r==1 || r==n || r==m || (c==1 && r<m) || (c==n && r>m)) {
							if ((c==1 && r==1) || (r==n && c==n) || (r==m && (c==1 || c==n))) System.out.print("  ");
							else System.out.print(" *");
						}
						else System.out.print("  ");
					}
					System.out.println();
				}
				System.out.println();
				break;
				
			case "U":
				for (int r=1; r<=n; r++) {
					for (int c=1; c<=n; c++) {
						if ((c==1 && r<n)|| (c>1 && c<n && r==n) || (r<n && c==n))
								System.out.print(" *");
						else System.out.print("  ");
					}
					System.out.println();
				}
				System.out.println();
				break;
			
			case "G":
				for (int r=1; r<=n; r++) {
					for (int c=1; c<=n; c++) {
						if ((r==1 && c!=1) || (c==1 && r!=1 && r!=n) || (r==n && c!=1 && c!=n) || (c==n && r>=m && r!=n) || (r==m && c>m)) System.out.print(" *");
						else System.out.print("  ");
					}
					System.out.println();
				}
				System.out.println();
				break;

			case "I":
				for (int r=1; r<=n; r++) {
					for (int c=1; c<=n; c++) {
						if ( c==m || r==1 || r==n) System.out.print(" *");
						else System.out.print("  ");
					}
					System.out.println();
				}
				System.out.println();
				break;
				
			case "R":
				for (int r=1; r<=n; r++) {
					for (int c=1; c<=n; c++) {
						if ( (c==1 && r!=1) || (r==1 && c!=n && c!=1) || (c==n && r<m && r!=1) || (r==m && c!=n) || (r>m && c==m+(r-m))) System.out.print(" *");
						else System.out.print("  ");
					}
					System.out.println();
				}
				System.out.println();
				break;
				
			case "T":
				for (int r=1; r<=n; r++) {
					for (int c=1; c<=n; c++) {
						if ( c==m || r==1 ) System.out.print(" *");
						else System.out.print("  ");
					}
					System.out.println();
				}
				System.out.println();
				break;
				
			case "H":
				for (int r=1; r<=n; r++) {
					for (int c=1; c<=n; c++) {
						if ( c==1 || c==n || r==m) System.out.print(" *");
						else System.out.print("  ");
					}
					System.out.println();
				}
				System.out.println();
				break;
			
			case "A":
				for (int r=1; r<=n; r++) {
					for (int c=1; c<=n; c++) {
						if ( (r<=m && (c==m-r+1 || c==m+r-1)) || r==m || (r>m && (c==1 || c==n))) System.out.print(" *");
						else System.out.print("  ");
					}
					System.out.println();
				}
				System.out.println();
				break;

		}
	}
}

OUTPUT: (omitted here; each letter of "SUGIRTHA" prints as a 9x9 grid of asterisks)