Learning Notes #39 – Compensation Pattern | Cloud Pattern

Today I learnt about the compensation pattern, which rolls back a transaction when it runs into a failure. In this blog I jot down notes on the compensation pattern and how it relates to the SAGA pattern.

Distributed systems often involve multiple services working together to perform a business operation. Ensuring data consistency and reliability across these services is challenging, especially in cases of failure. One solution is the use of compensation transactions, a mechanism designed to maintain consistency by reversing the effects of previous operations when errors occur.

What Are Compensation Transactions?

A compensation transaction is an operation that undoes the effect of a previously executed operation. Unlike traditional rollback mechanisms in centralized databases, compensation transactions are explicitly defined and executed in distributed systems to maintain consistency after a failure.

Key Characteristics

  • Explicit Definition: Compensation logic must be explicitly implemented.
  • Independent Execution: Compensation operations are separate from the main transaction.
  • Eventual Consistency: Ensures the system reaches a consistent state over time.
  • Asynchronous Nature: Often triggered asynchronously to avoid blocking main processes.

Why Are Compensation Transactions Important?

1. Handling Failures in Distributed Systems

In a distributed architecture, such as microservices, different services may succeed or fail independently. Compensation transactions allow partial rollbacks to maintain overall consistency.

2. Avoiding Global Locking

Traditional transactions that hold locks across services (e.g., two-phase commit) are often impractical in large distributed systems: participants block while they wait for the coordinator, which hurts performance, scalability, and availability. Compensation transactions provide a more flexible alternative.

3. Resilience and Fault Tolerance

Compensation mechanisms make systems more resilient by allowing recovery from failures without manual intervention.

How Compensation Transactions Work

  1. Perform Main Operations: Each service performs its assigned operation, such as creating a record or updating a database.
  2. Log Operations: Log actions and context to enable compensating transactions if needed.
  3. Detect Failure: Monitor the workflow for errors or failures in any service.
  4. Trigger Compensation: If a failure occurs, execute compensation transactions for all successfully completed operations to undo their effects.

Example Workflow

Imagine an e-commerce checkout process involving three steps:

  • Step 1: Reserve inventory.
  • Step 2: Deduct payment.
  • Step 3: Confirm order.

If Step 3 fails, compensation transactions for Steps 1 and 2 might include:

  • Releasing the reserved inventory.
  • Refunding the payment.
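The checkout example can be sketched in a few lines of Python. This is only a minimal, in-memory illustration: the `run_with_compensation` helper, the `state` dictionary, and the step functions are names I made up for the sketch, and a real system would call remote services instead of local functions.

```python
from typing import Callable, List, Tuple

# Each step is (name, action, compensation). All names here are hypothetical.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_with_compensation(steps: List[Step]) -> List[str]:
    """Run steps in order; on failure, undo the completed steps in reverse order."""
    log: List[str] = []
    completed: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
        except Exception:
            log.append(f"failed:{name}")
            # Compensate only what actually succeeded, newest first.
            for done_name, _, compensate in reversed(completed):
                compensate()
                log.append(f"compensated:{done_name}")
            break
        log.append(f"done:{name}")
        completed.append(step)
    return log


# Toy state standing in for the inventory and payment services.
state = {"reserved": 0, "charged": 0}


def reserve_inventory() -> None:
    state["reserved"] += 1


def release_inventory() -> None:
    state["reserved"] -= 1


def deduct_payment() -> None:
    state["charged"] += 100


def refund_payment() -> None:
    state["charged"] -= 100


def confirm_order() -> None:
    # Simulate the Step 3 failure from the example.
    raise RuntimeError("order service unavailable")


checkout_steps: List[Step] = [
    ("reserve_inventory", reserve_inventory, release_inventory),
    ("deduct_payment", deduct_payment, refund_payment),
    ("confirm_order", confirm_order, lambda: None),
]

print(run_with_compensation(checkout_steps))
```

Note that the compensations run in reverse order (refund first, then release), which mirrors how the operations were applied.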

Design Considerations for Compensation Transactions

1. Idempotency

Ensure compensating actions are idempotent, meaning they can be executed multiple times without unintended side effects. This is crucial in distributed systems where retries are common.
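As a rough illustration of idempotency, the sketch below remembers which orders were already refunded, so a retried refund becomes a no-op. The `PaymentLedger` class and its fields are hypothetical and kept in memory; a real service would persist this record.

```python
class PaymentLedger:
    """Toy ledger whose refund operation is safe to call more than once."""

    def __init__(self) -> None:
        self.charged = {}        # order_id -> amount charged
        self.refunded = set()    # order_ids that were already refunded
        self.total_refunded = 0

    def charge(self, order_id: int, amount: int) -> None:
        self.charged[order_id] = amount

    def refund(self, order_id: int) -> bool:
        """Refund an order at most once; return True only when money moved."""
        if order_id not in self.charged or order_id in self.refunded:
            return False  # unknown order or duplicate request: do nothing
        self.refunded.add(order_id)
        self.total_refunded += self.charged[order_id]
        return True


ledger = PaymentLedger()
ledger.charge(101, 50)
ledger.refund(101)  # performs the refund
ledger.refund(101)  # retry: no second refund
```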

2. Consistency Model

Adopt an eventual consistency model to align with the asynchronous nature of compensation transactions.

3. Error Handling

Design robust error-handling mechanisms for compensating actions, as these too can fail.
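One possible way to handle a failing compensation is to retry it with exponential backoff and, if it still fails, park it for manual follow-up. The sketch below assumes this approach; `compensate_with_retry`, its parameters, and the `failed` list are hypothetical names, and the `sleep` argument is injectable only so the example can be exercised without real delays.

```python
import time
from typing import Callable, List, Optional


def compensate_with_retry(
    compensate: Callable[[], None],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    failed: Optional[List[str]] = None,
) -> bool:
    """Retry a compensating action with exponential backoff.

    Returns True on success. After the last failed attempt the error text
    is appended to `failed` (if given) so a human or another job can follow up.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            compensate()
            return True
        except Exception as exc:
            if attempt == max_attempts:
                if failed is not None:
                    failed.append(str(exc))
                return False
            # Delay doubles each time: base, 2*base, 4*base, ...
            sleep(base_delay * 2 ** (attempt - 1))
    return False
```

Because the compensation may run several times, this only works safely when the compensating action is idempotent.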

4. Service Communication

Use reliable communication protocols (e.g., message queues) to trigger and manage compensation transactions.

5. Isolation of Compensation Logic

Keep compensation logic isolated from the main business logic to maintain clarity and modularity.

Use Cases for Compensation Transactions

1. Financial Systems

  • Reversing failed fund transfers or unauthorized transactions.
  • Refunding payments in e-commerce platforms.

2. Travel and Booking Systems

  • Canceling a hotel reservation if flight booking fails.
  • Releasing blocked seats if payment is not completed.

3. Healthcare Systems

  • Undoing scheduled appointments if insurance validation fails.
  • Revoking prescriptions if a linked process encounters errors.

4. Supply Chain Management

  • Canceling shipment orders if inventory updates fail.
  • Restocking items if order fulfillment is aborted.

Challenges of Compensation Transactions

  1. Complexity in Implementation: Designing compensating logic for every operation can be tedious and error-prone.
  2. Performance Overhead: Logging operations and executing compensations can introduce latency.
  3. Partial Rollbacks: Some operations, such as sending emails or notifications, cannot be truly undone; the best available compensation is a follow-up action such as a correction message.
  4. Failure in Compensating Actions: Compensation transactions themselves can fail, requiring additional mechanisms to handle such scenarios.

Best Practices

  1. Plan for Compensation Early: Design compensating transactions as part of the initial development process.
  2. Use SAGA Pattern: Combine compensation transactions with the SAGA pattern to manage distributed workflows effectively.
  3. Test Extensively: Simulate failures and test compensating logic under various conditions.
  4. Monitor and Log: Maintain detailed logs of operations and compensations for debugging and audits.

Learning Notes #38 – Choreography Pattern | Cloud Pattern

Today I learnt about the Choreography pattern, where services communicate with each other through a message queue. In this blog, I jot down notes on the choreography pattern for my future self.

What is the Choreography Pattern?

In the Choreography Pattern, services communicate with each other through asynchronous events, usually published via a message broker, without a central controller. Each service is responsible for a specific part of the workflow and responds to events produced by other services. This pattern allows for a more autonomous and loosely coupled system.

Key Features

  • High scalability and independence of services.
  • Decentralized control.
  • Services respond to events they subscribe to.

When to Use the Choreography Pattern

  • Event-Driven Systems: When workflows can be modeled as events triggering responses.
  • High Scalability: When services need to operate independently and scale autonomously.
  • Loose Coupling: When minimizing dependencies between services is critical.

Benefits of the Choreography Pattern

  1. Decentralized Control: No central coordinator that can become a bottleneck or single point of failure, although the event broker itself must stay highly available.
  2. Increased Flexibility: Services can be added or modified without affecting others.
  3. Better Scalability: Services operate independently and scale based on their workloads.
  4. Resilience: The system can handle partial failures more gracefully, as services continue independently.

Example: E-Commerce Order Fulfillment

Problem

A fictional e-commerce platform needs to manage the following workflow:

  1. Accepting an order.
  2. Validating payment.
  3. Reserving inventory.
  4. Sending notifications to the customer.

Each step is handled by an independent service.

Solution

Using the Choreography Pattern, each service listens for specific events and publishes new events as needed. The workflow emerges naturally from the interaction of these services.

Implementation

Step 1: Define the Workflow as Events

  • OrderPlaced: Triggered when a customer places an order.
  • PaymentProcessed: Triggered after successful payment.
  • InventoryReserved: Triggered after reserving inventory.
  • NotificationSent: Triggered when the customer is notified.

Step 2: Implement Services

Each service subscribes to events and performs its task.

shared_utils.py

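import json

import pika

def publish_event(exchange, event_type, data):
    """Publish one event to a fanout exchange on the local RabbitMQ broker."""
    connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
    try:
        channel = connection.channel()
        # A fanout exchange copies each message to every queue bound to it.
        channel.exchange_declare(exchange=exchange, exchange_type='fanout')
        message = json.dumps({"event_type": event_type, "data": data})
        channel.basic_publish(exchange=exchange, routing_key='', body=message)
    finally:
        # Close the connection even if declaring or publishing fails.
        connection.close()

def subscribe_to_event(exchange, callback):
    """Bind a private queue to the exchange and block while consuming events."""
    connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
    channel = connection.channel()
    channel.exchange_declare(exchange=exchange, exchange_type='fanout')
    # An empty name asks the broker to generate one; exclusive=True removes the queue on disconnect.
    queue = channel.queue_declare('', exclusive=True).method.queue
    channel.queue_bind(exchange=exchange, queue=queue)
    # auto_ack=True acknowledges on delivery, so an event is lost if the handler crashes.
    channel.basic_consume(queue=queue, on_message_callback=callback, auto_ack=True)
    print(f"Subscribed to events on exchange '{exchange}'")
    channel.start_consuming()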

Order Service


from shared_utils import publish_event

def place_order(order_id, customer):
    print(f"Placing order {order_id} for {customer}")
    publish_event("order_exchange", "OrderPlaced", {"order_id": order_id, "customer": customer})

if __name__ == "__main__":
    # Simulate placing an order
    place_order(order_id=101, customer="John Doe")

Payment Service


import json
import time

from shared_utils import publish_event, subscribe_to_event

def handle_order_placed(ch, method, properties, body):
    event = json.loads(body)
    if event["event_type"] == "OrderPlaced":
        order_id = event["data"]["order_id"]
        print(f"Processing payment for order {order_id}")
        time.sleep(1)  # Simulate payment processing
        publish_event("payment_exchange", "PaymentProcessed", {"order_id": order_id})

if __name__ == "__main__":
    subscribe_to_event("order_exchange", handle_order_placed)

Inventory Service


import json
import time

from shared_utils import publish_event, subscribe_to_event

def handle_payment_processed(ch, method, properties, body):
    event = json.loads(body)
    if event["event_type"] == "PaymentProcessed":
        order_id = event["data"]["order_id"]
        print(f"Reserving inventory for order {order_id}")
        time.sleep(1)  # Simulate inventory reservation
        publish_event("inventory_exchange", "InventoryReserved", {"order_id": order_id})

if __name__ == "__main__":
    subscribe_to_event("payment_exchange", handle_payment_processed)

Notification Service


import json
import time

from shared_utils import subscribe_to_event

def handle_inventory_reserved(ch, method, properties, body):
    event = json.loads(body)
    if event["event_type"] == "InventoryReserved":
        order_id = event["data"]["order_id"]
        print(f"Notifying customer for order {order_id}")
        time.sleep(1)  # Simulate notification
        print(f"Customer notified for order {order_id}")

if __name__ == "__main__":
    subscribe_to_event("inventory_exchange", handle_inventory_reserved)

Step 3: Run the Workflow

  1. Start a RabbitMQ broker that is reachable on localhost (for example, in a Docker container).
  2. Start the subscriber services first, each in its own terminal, so that their queues exist before any event is published:
    • Notification Service: python notification_service.py
    • Inventory Service: python inventory_service.py
    • Payment Service: python payment_service.py
  3. Place an order by running the Order Service: python order_service.py. The workflow will propagate through the services as events are handled.
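To see the event chain without running a broker, the flow can be approximated with a tiny in-memory bus. This is only a sketch for experimenting: `InMemoryBus` and `wire_services` are hypothetical names, delivery is synchronous (unlike RabbitMQ), and the notification step publishes a NotificationSent event here even though the service above only prints.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class InMemoryBus:
    """Synchronous stand-in for a message broker (illustration only)."""

    def __init__(self) -> None:
        self.handlers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)
        self.history: List[str] = []  # event types in publication order

    def subscribe(self, event_type: str, handler: Callable[[dict], None]) -> None:
        self.handlers[event_type].append(handler)

    def publish(self, event_type: str, data: dict) -> None:
        self.history.append(event_type)
        for handler in list(self.handlers[event_type]):
            handler(data)


def wire_services(bus: InMemoryBus) -> None:
    """Each 'service' reacts to one event and emits the next; none knows the whole flow."""
    bus.subscribe("OrderPlaced", lambda d: bus.publish("PaymentProcessed", d))
    bus.subscribe("PaymentProcessed", lambda d: bus.publish("InventoryReserved", d))
    bus.subscribe("InventoryReserved", lambda d: bus.publish("NotificationSent", d))


bus = InMemoryBus()
wire_services(bus)
bus.publish("OrderPlaced", {"order_id": 101})
print(bus.history)
```

No function in this sketch describes the full workflow; the order of events falls out of the subscriptions, which is the core idea of choreography.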

Key Considerations

  1. Event Bus: Use an event broker like RabbitMQ, Kafka, or AWS SNS to manage communication between services.
  2. Event Versioning: Include versioning to handle changes in event formats over time.
  3. Idempotency: Ensure services handle repeated events gracefully to avoid duplication.
  4. Monitoring and Tracing: Use tools like OpenTelemetry to trace and debug distributed workflows.
  5. Error Handling:
    • Dead Letter Queues (DLQs) to capture failed events.
    • Retries with backoff for transient errors.

Advantages of the Choreography Pattern

  1. Loose Coupling: Services interact via events without direct knowledge of each other.
  2. Resilience: Failures in one service don’t block the entire workflow.
  3. High Autonomy: Services operate independently and can be deployed or scaled separately.
  4. Dynamic Workflows: Adding new services to the workflow requires subscribing them to relevant events.

Challenges of the Choreography Pattern

  1. Complex Debugging: Tracing errors across distributed services can be difficult.
  2. Event Storms: Poorly designed workflows may generate excessive events, overwhelming the system.
  3. Coordination Overhead: Decentralized logic can lead to inconsistent behavior if not carefully managed.

Orchestrator vs. Choreography: When to Choose?

  • Use Orchestrator Pattern when workflows are complex, require central control, or involve many dependencies.
  • Use Choreography Pattern when you need high scalability, loose coupling, or event-driven workflows.

Learning Notes #37 – Orchestrator Pattern | Cloud Pattern

Today I learnt about the Orchestrator pattern while I was learning about the SAGA pattern. It simplifies the coordination of workflows that span multiple services, making the system more efficient and easier to manage. In this blog I jot down notes on the Orchestrator Pattern for better understanding.

What is the Orchestrator Pattern?

The Orchestrator Pattern is a design strategy where a central orchestrator coordinates interactions between various services or components to execute a workflow.

Unlike the Choreography Pattern, where services react to each other's events independently with no central coordinator, the orchestrator acts as the central decision-maker, directing how and when services interact.

Key Features

  • Centralized control of workflows.
  • Simplified service communication.
  • Enhanced error handling and monitoring.

When to Use the Orchestrator Pattern

  • Complex Workflows: When multiple services or steps need to be executed in a defined sequence.
  • Error Handling: When failures in one step require recovery strategies or compensating transactions.
  • Centralized Logic: When you want to encapsulate business logic in a single place for easier maintenance.

Benefits of the Orchestrator Pattern

  1. Simplifies Service Communication: Services remain focused on their core functionality while the orchestrator manages interactions.
  2. Improves Scalability: Workflows can be scaled independently from services.
  3. Centralized Monitoring: Makes it easier to track the progress of workflows and debug issues.
  4. Flexibility: Changing a workflow involves modifying the orchestrator, not the services.

Example: Order Processing Workflow

Problem

A fictional e-commerce platform needs to process orders. The workflow involves:

  1. Validating the order.
  2. Reserving inventory.
  3. Processing payment.
  4. Notifying the user.

Each step is handled by a separate microservice.

Solution

We implement an orchestrator to manage this workflow. Let’s see how this works in practice.


import requests

class OrderOrchestrator:
    def __init__(self):
        self.services = {
            "validate_order": "http://order-service/validate",
            "reserve_inventory": "http://inventory-service/reserve",
            "process_payment": "http://payment-service/process",
            "notify_user": "http://notification-service/notify",
        }

    def execute_workflow(self, order_id):
        try:
            # Step 1: Validate Order
            self.call_service("validate_order", {"order_id": order_id})

            # Step 2: Reserve Inventory
            self.call_service("reserve_inventory", {"order_id": order_id})

            # Step 3: Process Payment
            self.call_service("process_payment", {"order_id": order_id})

            # Step 4: Notify User
            self.call_service("notify_user", {"order_id": order_id})

            print(f"Order {order_id} processed successfully!")
        except Exception as e:
            print(f"Error processing order {order_id}: {e}")

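    def call_service(self, service_name, payload):
        """POST the payload to a service and raise if it does not return HTTP 200."""
        url = self.services[service_name]
        # A timeout (5 seconds is an arbitrary choice) keeps one hung service from blocking the workflow.
        response = requests.post(url, json=payload, timeout=5)
        if response.status_code != 200:
            raise Exception(f"{service_name} failed: {response.text}")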
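The orchestrator above only reports a failure. One way to extend it, sketched below, is to let the orchestrator also trigger compensating calls for the steps that already succeeded. The `release_inventory` and `refund_payment` step names are hypothetical (the original service list has no such endpoints), and the service call is injected as a function so the sketch runs without any HTTP services.

```python
from typing import Callable, List, Optional, Tuple


class CompensatingOrchestrator:
    """Runs workflow steps in order and undoes completed steps on failure."""

    # (step, compensating step or None); compensating names are assumptions.
    STEPS: List[Tuple[str, Optional[str]]] = [
        ("validate_order", None),
        ("reserve_inventory", "release_inventory"),
        ("process_payment", "refund_payment"),
        ("notify_user", None),
    ]

    def __init__(self, call_service: Callable[[str, dict], None]) -> None:
        # call_service(step_name, payload) should raise on failure.
        self.call_service = call_service

    def execute_workflow(self, order_id: int) -> bool:
        payload = {"order_id": order_id}
        undo_stack: List[str] = []
        try:
            for step, undo in self.STEPS:
                self.call_service(step, payload)
                if undo is not None:
                    undo_stack.append(undo)
        except Exception:
            # Undo in reverse order; a failing compensation propagates to the caller.
            while undo_stack:
                self.call_service(undo_stack.pop(), payload)
            return False
        return True


calls: List[str] = []


def fake_call(step: str, payload: dict) -> None:
    """Stand-in for an HTTP call; the payment step always fails here."""
    calls.append(step)
    if step == "process_payment":
        raise RuntimeError("card declined")


CompensatingOrchestrator(fake_call).execute_workflow(101)
print(calls)
```

In this run the payment fails, so only the inventory reservation is compensated; the payment itself never succeeded and needs no refund.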

Key Tactics for Implementation

  1. Services vs. Serverless: Use serverless functions for steps that are triggered occasionally and don’t need always-on services, reducing costs.
  2. Recovery from Failures:
    • Retry Mechanism: Configure retries with limits and delays to handle transient failures.
    • Circuit Breaker Pattern: Detect and isolate failing services to allow recovery.
    • Graceful Degradation: Use fallbacks like cached results or alternate services to ensure continuity.
  3. Monitoring and Alerting:
    • Implement real-time monitoring with automated recovery strategies.
    • Set up alerts for exceptions and utilize logs for troubleshooting.
  4. Orchestration Service Failures:
    • Service Replication: Deploy multiple instances of the orchestrator for failover.
    • Data Replication: Ensure data consistency for seamless recovery.
    • Request Queues: Use queues to buffer requests during downtime and process them later.
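To make the circuit breaker idea from the list above more concrete, here is a minimal sketch. The `CircuitBreaker` class, its thresholds, and the injectable `clock` are my own assumptions for illustration; production code would more likely rely on an existing resilience library.

```python
import time
from typing import Any, Callable, Optional


class CircuitOpenError(Exception):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 3,
        reset_timeout: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means the circuit is closed

    def call(self, func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; call skipped")
            # Timeout elapsed: let one trial call through (half-open state).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

An orchestrator could wrap each service call in `breaker.call(...)` so that a service that keeps failing is skipped quickly instead of being hammered with requests.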

Important Considerations

The primary goal of this architectural pattern is to decompose the entire business workflow into multiple services, making it more flexible and scalable. Due to this, it’s crucial to analyze and comprehend the business processes in detail before implementation. A poorly defined and overly complicated business process will lead to a system that would be hard to maintain and scale.

Secondly, it’s easy to fall into the trap of adding business logic into the orchestration service. Sometimes it’s inevitable because certain functionalities are too small to create their separate service. But the risk here is that if the orchestration service becomes too intelligent and performs too much business logic, it can evolve into a monolithic application that also happens to talk to microservices. So, it’s crucial to keep track of every addition to the orchestration service and ensure that its work remains within the boundaries of orchestration. Maintaining the scope of the orchestration service will prevent it from becoming a burden on the system, leading to decreased scalability and flexibility.

Why Use the Orchestration Pattern

The pattern comes with the following advantages:

  • Orchestration makes it easier to understand, monitor, and observe the application, resulting in a better understanding of the core part of the system with less effort.
  • The pattern promotes loose coupling. Each downstream service exposes an API interface and is self-contained, without any need to know about the other services.
  • The pattern simplifies the business workflows and improves the separation of concerns. Each service participates in a long-running transaction without any need to know about it.
  • The orchestrator service can decide what to do in case of failure, making the system fault-tolerant and reliable.
