❌

Reading view

There are new articles available, click to refresh the page.

Learning Notes #39 – Compensation Pattern | Cloud Pattern

Today i learnt about compensation pattern, where it rollback a transactions when it face some failures. In this blog i jot down notes on compensating pattern and how it relates with SAGA pattern.

Distributed systems often involve multiple services working together to perform a business operation. Ensuring data consistency and reliability across these services is challenging, especially in cases of failure. One solution is the use of compensation transactions, a mechanism designed to maintain consistency by reversing the effects of previous operations when errors occur.

What Are Compensation Transactions?

A compensation transaction is an operation that undoes the effect of a previously executed operation. Unlike traditional rollback mechanisms in centralized databases, compensation transactions are explicitly defined and executed in distributed systems to maintain consistency after a failure.

Key Characteristics

  • Explicit Definition: Compensation logic must be explicitly implemented.
  • Independent Execution: Compensation operations are separate from the main transaction.
  • Eventual Consistency: Ensures the system reaches a consistent state over time.
  • Asynchronous Nature: Often triggered asynchronously to avoid blocking main processes.

Why Are Compensation Transactions Important?

1. Handling Failures in Distributed Systems

In a distributed architecture, such as microservices, different services may succeed or fail independently. Compensation transactions allow partial rollbacks to maintain overall consistency.

2. Avoiding Global Locking

Traditional transactions with global locks (e.g., two-phase commits) are not feasible in distributed systems due to performance and scalability concerns. Compensation transactions provide a more flexible alternative.

3. Resilience and Fault Tolerance

Compensation mechanisms make systems more resilient by allowing recovery from failures without manual intervention.

How Compensation Transactions Work

  1. Perform Main Operations: Each service performs its assigned operation, such as creating a record or updating a database.
  2. Log Operations: Log actions and context to enable compensating transactions if needed.
  3. Detect Failure: Monitor the workflow for errors or failures in any service.
  4. Trigger Compensation: If a failure occurs, execute compensation transactions for all successfully completed operations to undo their effects.

Example Workflow

Imagine an e-commerce checkout process involving three steps

  • Step 1: Reserve inventory.
  • Step 2: Deduct payment.
  • Step 3: Confirm order.

If Step 3 fails, compensation transactions for Steps 1 and 2 might include

  • Releasing the reserved inventory.
  • Refunding the payment.

Design Considerations for Compensation Transactions

1. Idempotency

Ensure compensating actions are idempotent, meaning they can be executed multiple times without unintended side effects. This is crucial in distributed systems where retries are common.

2. Consistency Model

Adopt an eventual consistency model to align with the asynchronous nature of compensation transactions.

3. Error Handling

Design robust error-handling mechanisms for compensating actions, as these too can fail.

4. Service Communication

Use reliable communication protocols (e.g., message queues) to trigger and manage compensation transactions.

5. Isolation of Compensation Logic

Keep compensation logic isolated from the main business logic to maintain clarity and modularity.

Use Cases for Compensation Transactions

1. Financial Systems

  • Reversing failed fund transfers or unauthorized transactions.
  • Refunding payments in e-commerce platforms.

2. Travel and Booking Systems

  • Canceling a hotel reservation if flight booking fails.
  • Releasing blocked seats if payment is not completed.

3. Healthcare Systems

  • Undoing scheduled appointments if insurance validation fails.
  • Revoking prescriptions if a linked process encounters errors.

4. Supply Chain Management

  • Canceling shipment orders if inventory updates fail.
  • Restocking items if order fulfillment is aborted.

Challenges of Compensation Transactions

  1. Complexity in Implementation: Designing compensating logic for every operation can be tedious and error-prone.
  2. Performance Overhead: Logging operations and executing compensations can introduce latency.
  3. Partial Rollbacks: It may not always be possible to fully undo certain operations, such as sending emails or notifications.
  4. Failure in Compensating Actions: Compensation transactions themselves can fail, requiring additional mechanisms to handle such scenarios.

Best Practices

  1. Plan for Compensation Early: Design compensating transactions as part of the initial development process.
  2. Use SAGA Pattern: Combine compensation transactions with the SAGA pattern to manage distributed workflows effectively.
  3. Test Extensively: Simulate failures and test compensating logic under various conditions.
  4. Monitor and Log: Maintain detailed logs of operations and compensations for debugging and audits.

Learning Notes #30 – Queue Based Loading | Cloud Patterns

Today, i learnt about Queue Based Loading pattern, which helps to manage intermittent peak load to a service via queues. Basically decoupling Tasks from Services. In this blog i jot down notes on this pattern for my future self.

In today’s digital landscape, applications are expected to handle large-scale operations efficiently. Whether it’s processing massive data streams, ensuring real-time responsiveness, or integrating with multiple third-party services, scalability and reliability are paramount. One pattern that elegantly addresses these challenges is the Queue-Based Loading Pattern.

What Is the Queue-Based Loading Pattern?

The Queue-Based Loading Pattern leverages message queues to decouple and coordinate tasks between producers (such as applications or services generating data) and consumers (services or workers processing that data). By using queues as intermediaries, this pattern allows systems to manage workloads efficiently, ensuring seamless and scalable operation.

Key Components of the Pattern

  1. Producers: Producers are responsible for generating tasks or data. They send these tasks to a message queue instead of directly interacting with consumers. Examples include:
    • Web applications logging user activity.
    • IoT devices sending sensor data.
  2. Message Queue: The queue acts as a buffer, storing tasks until consumers are ready to process them. Popular tools for implementing queues include RabbitMQ, Apache Kafka, AWS SQS, and Redis.
  3. Consumers: Consumers retrieve messages from the queue and process them asynchronously. They are typically designed to handle tasks independently and at their own pace.
  4. Processing Logic: This is the core functionality that processes the tasks retrieved by consumers. For example, resizing images, sending notifications, or updating a database.

How It Works

  1. Task Generation: Producers push tasks to the queue as they are generated.
  2. Message Storage: The queue stores tasks in a structured manner (FIFO, priority-based, etc.) and ensures reliable delivery.
  3. Task Consumption: Consumers pull tasks from the queue, process them, and optionally acknowledge completion.
  4. Scalability: New consumers can be added dynamically to handle increased workloads, ensuring the system remains responsive.

Benefits of the Queue-Based Loading Pattern

  1. Decoupling: Producers and consumers operate independently, reducing tight coupling and improving system maintainability.
  2. Scalability: By adding more consumers, systems can easily scale to handle higher workloads.
  3. Fault Tolerance: If a consumer fails, messages remain in the queue, ensuring no data is lost.
  4. Load Balancing: Tasks are distributed evenly among consumers, preventing any single consumer from becoming a bottleneck.
  5. Asynchronous Processing: Consumers can process tasks in the background, freeing producers to continue generating data without delay.

Issues and Considerations

  1. Rate Limiting: Implement logic to control the rate at which services handle messages to prevent overwhelming the target resource. Test the system under load and adjust the number of queues or service instances to manage demand effectively.
  2. One-Way Communication: Message queues are inherently one-way. If tasks require responses, you may need to implement a separate mechanism for replies.
  3. Autoscaling Challenges: Be cautious when autoscaling consumers, as it can lead to increased contention for shared resources, potentially reducing the effectiveness of load leveling.
  4. Traffic Variability: Consider the variability of incoming traffic to avoid situations where tasks pile up faster than they are processed, creating a perpetual backlog.
  5. Queue Persistence: Ensure your queue is durable and capable of persisting messages. Crashes or system limits could lead to dropped messages, risking data loss.

Use Cases

  1. Email and Notification Systems: Sending bulk emails or push notifications without overloading the main application.
  2. Data Pipelines: Ingesting, transforming, and analyzing large datasets in real-time or batch processing.
  3. Video Processing: Queues facilitate tasks like video encoding and thumbnail generation.
  4. Microservices Communication: Ensures reliable and scalable communication between microservices.

Best Practices

  1. Message Durability: Configure your queue to persist messages to disk, ensuring they are not lost during system failures.
  2. Monitoring and Metrics: Use monitoring tools to track queue lengths, processing rates, and consumer health.
  3. Idempotency: Design consumers to handle duplicate messages gracefully.
  4. Error Handling and Dead Letter Queues (DLQs): Route failed messages to DLQs for later analysis and reprocessing.

Learning Notes #22 – Claim Check Pattern | Cloud Pattern

Today, i learnt about claim check pattern, which tells how to handle a big message into the queue. Every message broker has a defined message size limit. If our message size exceeds the size, it wont work.

The Claim Check Pattern emerges as a pivotal architectural design to address challenges in managing large payloads in a decoupled and efficient manner. In this blog, i jot down notes on my learning for my future self.

What is the Claim Check Pattern?

The Claim Check Pattern is a messaging pattern used in distributed systems to manage large messages efficiently. Instead of transmitting bulky data directly between services, this pattern extracts and stores the payload in a dedicated storage system (e.g., object storage or a database).

A lightweight reference or β€œclaim check” is then sent through the message queue, which the receiving service can use to retrieve the full data from the storage.

This pattern is inspired by the physical process of checking in luggage at an airport: you hand over your luggage, receive a claim check (a token), and later use it to retrieve your belongings.

How Does the Claim Check Pattern Work?

The process typically involves the following steps

  1. Data Submission The sender service splits a message into two parts:
    • Metadata: A small piece of information that provides context about the data.
    • Payload: The main body of data that is too large or sensitive to send through the message queue.
  2. Storing the Payload
    • The sender uploads the payload to a storage service (e.g., AWS S3, Azure Blob Storage, or Google Cloud Storage).
    • The storage service returns a unique identifier (e.g., a URL or object key).
  3. Sending the Claim Check
    • The sender service places the metadata and the unique identifier (claim check) onto the message queue.
  4. Receiving the Claim Check
    • The receiver service consumes the message from the queue, extracts the claim check, and retrieves the payload from the storage system.
  5. Processing
    • The receiver processes the payload alongside the metadata as required.

Use Cases

1. Media Processing Pipelines In video transcoding systems, raw video files can be uploaded to storage while metadata (e.g., video format and length) is passed through the message queue.

2. IoT Systems – IoT devices generate large datasets. Using the Claim Check Pattern ensures efficient transmission and processing of these data chunks.

3. Data Processing Workflows – In big data systems, datasets can be stored in object storage while processing metadata flows through orchestration tools like Apache Airflow.

4. Event-Driven Architectures – For systems using event-driven models, large event payloads can be offloaded to storage to avoid overloading the messaging layer.

Example with RabbitMQ

1.Sender Service


import boto3
import pika

s3 = boto3.client('s3')
bucket_name = 'my-bucket'
object_key = 'data/large-file.txt'

response = s3.upload_file('large-file.txt', bucket_name, object_key)
claim_check = f's3://{bucket_name}/{object_key}'

# Connect to RabbitMQ
connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()

# Declare a queue
channel.queue_declare(queue='claim_check_queue')

# Send the claim check
message = {
    'metadata': 'Some metadata',
    'claim_check': claim_check
}
channel.basic_publish(exchange='', routing_key='claim_check_queue', body=str(message))

connection.close()

2. Consumer


import boto3
import pika

s3 = boto3.client('s3')

# Connect to RabbitMQ
connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()

# Declare a queue
channel.queue_declare(queue='claim_check_queue')

# Callback function to process messages
def callback(ch, method, properties, body):
    message = eval(body)
    claim_check = message['claim_check']

    bucket_name, object_key = claim_check.replace('s3://', '').split('/', 1)
    s3.download_file(bucket_name, object_key, 'retrieved-large-file.txt')
    print("Payload retrieved and processed.")

# Consume messages
channel.basic_consume(queue='claim_check_queue', on_message_callback=callback, auto_ack=True)

print('Waiting for messages. To exit press CTRL+C')
channel.start_consuming()

References

  1. https://learn.microsoft.com/en-us/azure/architecture/patterns/claim-check
  2. https://medium.com/@dmosyan/claim-check-design-pattern-603dc1f3796d

Learning Notes #13 – Gateway Aggregator Pattern

As part of cloud design patterns, today i learned about Gateway Aggregation Pattern. It seems like a motivation for GraphQL. In this blog, i write down the notes on Gateway Aggregation Pattern for my future self.

In the world of microservices, applications are often broken down into smaller, independent services, each responsible for a specific functionality.

While this architecture promotes scalability and maintainability, it can complicate communication between services. The Gateway Aggregation Pattern emerges as a solution, enabling streamlined interactions between clients and services.

What is the Gateway Aggregation Pattern?

The Gateway Aggregation Pattern involves introducing a gateway layer to handle requests from clients. Instead of the client making multiple calls to different services, the gateway aggregates the data by making calls to the relevant services and then returning a unified response to the client.

This pattern is particularly useful for:

  • Reducing the number of round-trips between clients and services.
  • Simplifying client logic.
  • Improving performance by centralizing the communication and aggregation logic.

How It Works

  1. Client Request: The client sends a single request to the gateway.
  2. Gateway Processing: The gateway makes multiple requests to the required services, aggregates their responses, and applies any necessary transformation.
  3. Unified Response: The gateway sends a unified response back to the client.

This approach abstracts the complexity of service interactions from the client, improving the overall user experience.

Example Use Case

Imagine an e-commerce application where a client needs to display a product’s details, reviews, and availability. Without a gateway, the client must call three different microservices

  1. Product Service: Provides details like name, description, and price.
  2. Review Service: Returns customer reviews and ratings.
  3. Inventory Service: Indicates product availability.

Using the Gateway Aggregation Pattern, the client makes a single request to the gateway. The gateway calls the three services, aggregates their responses, and returns a combined result, such as

{
  "product": {
    "id": "123",
    "name": "Smartphone",
    "description": "Latest model with advanced features",
    "price": 699.99
  },
  "reviews": [
    {
      "user": "Alice",
      "rating": 4,
      "comment": "Great product!"
    },
    {
      "user": "Bob",
      "rating": 5,
      "comment": "Excellent value for money."
    }
  ],
  "availability": {
    "inStock": true,
    "warehouse": "Warehouse A"
  }
}

Tools to implement Gateway Aggregation Pattern

1. Kong Gateway

Kong is a popular API gateway that supports custom plugins for advanced use cases like aggregation.

Example:

Implement a custom Lua plugin to fetch and aggregate data from multiple services.

Use Kong’s Route and Upstream configurations to direct traffic.

2. GraphQL

GraphQL can act as a natural gateway by fetching and aggregating data from multiple sources.

const { ApolloServer, gql } = require('apollo-server');
const { RESTDataSource } = require('apollo-datasource-rest');

class ProductAPI extends RESTDataSource {
  constructor() {
    super();
    this.baseURL = 'http://product-service/';
  }
  async getProduct(id) {
    return this.get(`products/${id}`);
  }
}

class ReviewAPI extends RESTDataSource {
  constructor() {
    super();
    this.baseURL = 'http://review-service/';
  }
  async getReviews(productId) {
    return this.get(`reviews/${productId}`);
  }
}

const typeDefs = gql`
  type Product {
    id: ID!
    name: String
    description: String
    price: Float
  }

  type Review {
    user: String
    rating: Int
    comment: String
  }

  type AggregatedData {
    product: Product
    reviews: [Review]
  }

  type Query {
    aggregatedData(productId: ID!): AggregatedData
  }
`;

const resolvers = {
  Query: {
    aggregatedData: async (_, { productId }, { dataSources }) => {
      const product = await dataSources.productAPI.getProduct(productId);
      const reviews = await dataSources.reviewAPI.getReviews(productId);
      return { product, reviews };
    },
  },
};

const server = new ApolloServer({
  typeDefs,
  resolvers,
  dataSources: () => ({
    productAPI: new ProductAPI(),
    reviewAPI: new ReviewAPI(),
  }),
});

server.listen().then(({ url }) => {
  console.log(`Server ready at ${url}`);
});

By consolidating service calls and centralizing the aggregation logic, this pattern enhances performance and reduces complexity. Open-source tools like Express.js, Apache APISIX, Kong Gateway, and GraphQL make it easy to implement the pattern in diverse environments.

Learning Notes #11 – Sidecar Pattern | Cloud Patterns

Today, I learnt about Sidecar Pattern. Its seems like offloading the common functionalities (logging, networking, …) aside within a pod to be used by other apps within the pod.

Its just not only about pods, but other deployments aswell. In this blog, i am going to curate the items i have learnt for my future self. Its a pattern, not an strict rule.

What is a Sidecar?

Imagine you’re riding a motorbike, and you attach a little sidecar to carry your friend or groceries. The sidecar isn’t part of the motorbike’s engine or core mechanism, but it helps you achieve your goalsβ€”whether it’s carrying more stuff or having a buddy ride along.

In the software world, a sidecar is a similar concept. It’s a separate process or container that runs alongside a primary application. Like the motorbike’s sidecar, it supports the main application by offloading or enhancing certain tasks without interfering with its core functionality.

Why Use a Sidecar?

In traditional applications, all responsibilities (logging, communication, monitoring, etc.) are bundled into the main application. This approach can make the application complex and harder to manage. Sidecars address this by handling auxiliary tasks separately, so the main application can focus on its primary purpose.

Here are some key reasons to use a sidecar

  1. Modularity: Sidecars separate responsibilities, making the system easier to develop, test, and maintain.
  2. Reusability: The same sidecar can be used across multiple services. And its language agnostic.
  3. Scalability: You can scale the sidecar independently from the main application.
  4. Isolation: Sidecars provide a level of isolation, reducing the risk of one part affecting the other.

Real-Life Analogies

To make the concept clearer, here are some real-world analogies:

  1. Coffee Maker with a Milk Frother:
    • The coffee maker (main application) brews coffee.
    • The milk frother (sidecar) prepares frothed milk for your latte.
    • Both work independently but combine their outputs for a better experience.
  2. Movie Subtitles:
    • The movie (main application) provides the visuals and sound.
    • The subtitles (sidecar) add clarity for those who need them.
    • You can watch the movie with or without subtitlesβ€”they’re optional but enhance the experience.
  3. A School with a Sports Coach:
    • The school (main application) handles education.
    • The sports coach (sidecar) focuses on physical training.
    • Both have distinct roles but contribute to the overall development of students.

Some Random Sidecar Ideas in Software

Let’s look at how sidecars are used in actual software scenarios

  1. Service Meshes (e.g., Istio, Linkerd):
    • A service mesh helps microservices communicate with each other reliably and securely.
    • The sidecar (proxy like Envoy) handles tasks like load balancing, encryption, and monitoring, so the main application doesn’t have to.
  2. Logging and Monitoring:
    • Instead of the main application generating and managing logs, a sidecar can collect, format, and send logs to a centralized system like Elasticsearch or Splunk.
  3. Authentication and Security:
    • A sidecar can act as a gatekeeper, handling user authentication and ensuring that only authorized requests reach the main application.
  4. Data Caching:
    • If an application frequently queries a database, a sidecar can serve as a local cache, reducing database load and speeding up responses.
  5. Service Discovery:
    • Sidecars can aid in service discovery by automatically registering the main application with a registry service or load balancer, ensuring seamless communication in dynamic environments.

How Sidecars Work

In modern environments like Kubernetes, sidecars are often deployed as separate containers within the same pod as the main application. They share the same network and storage, making communication between the two seamless.

Here’s a simplified workflow

  1. The main application focuses on its core tasks (e.g., serving a web page).
  2. The sidecar handles auxiliary tasks (e.g., compressing and encrypting logs).
  3. The two communicate over local connections within the pod.

Pros and Cons of Sidecars

Pros:

  • Simplifies the main application.
  • Encourages reusability and modular design.
  • Improves scalability and flexibility.
  • Enhances observability with centralized logging and metrics.
  • Facilitates experimentationβ€”you can deploy or update sidecars independently.

Cons:

  • Adds complexity to deployment and orchestration.
  • Consumes additional resources (CPU, memory).
  • Requires careful design to avoid tight coupling between the sidecar and the main application.
  • Latency (You are adding an another hop).

Do we always need to use sidecars

No. Not at all.

a. When there is a latency between the parent application and sidecar, then Reconsider.

b. If your application is small, then reconsider.

c. When you are scaling differently or independently from the parent application, then Reconsider.

Some other examples

1. Adding HTTPS to a Legacy Application

Consider a legacy web service which services requests over unencrypted HTTP. We have a requirement to enhance the same legacy system to service requests with HTTPS in future.

The legacy app is configured to serve request exclusively on localhost, which means that only services that share the local network with the server able to access legacy application. In addition to the main container (legacy app) we can add Nginx Sidecar container which runs in the same network namespace as the main container so that it can access the service running on localhost.

2. For Logging (Image from ByteByteGo)

Sidecars are not just technical solutions; they embody the principle of collaboration and specialization. By dividing responsibilities, they empower the main application to shine while ensuring auxiliary tasks are handled efficiently. Next time you hear about sidecars, you’ll know they’re more than just cool attachments for motorcycle they’re an essential part of scalable, maintainable software systems.

Also, do you feel its closely related to Adapter and Ambassador Pattern ? I Do.

References:

  1. Hussein Nasser – https://www.youtube.com/watch?v=zcJWvhzkPsw&pp=ygUHc2lkZWNhcg%3D%3D
  2. Sudo Code – https://www.youtube.com/watch?v=QU5WcwuFpZU&pp=ygUPc2lkZWNhciBwYXR0ZXJu
  3. Software Dude – https://www.youtube.com/watch?v=poPUzN33Oug&pp=ygUPc2lkZWNhciBwYXR0ZXJu
  4. https://medium.com/nerd-for-tech/microservice-design-pattern-sidecar-sidekick-pattern-dbcea9bed783
  5. https://dzone.com/articles/sidecar-design-pattern-in-your-microservices-ecosy-1

❌