Learning Notes #39 – Compensation Pattern | Cloud Pattern

Today I learned about the compensation pattern, which rolls back a transaction when it encounters a failure. In this blog I jot down notes on the compensation pattern and how it relates to the SAGA pattern.

Distributed systems often involve multiple services working together to perform a business operation. Ensuring data consistency and reliability across these services is challenging, especially in cases of failure. One solution is the use of compensation transactions, a mechanism designed to maintain consistency by reversing the effects of previous operations when errors occur.

What Are Compensation Transactions?

A compensation transaction is an operation that undoes the effect of a previously executed operation. Unlike traditional rollback mechanisms in centralized databases, compensation transactions are explicitly defined and executed in distributed systems to maintain consistency after a failure.

Key Characteristics

  • Explicit Definition: Compensation logic must be explicitly implemented.
  • Independent Execution: Compensation operations are separate from the main transaction.
  • Eventual Consistency: Ensures the system reaches a consistent state over time.
  • Asynchronous Nature: Often triggered asynchronously to avoid blocking main processes.

Why Are Compensation Transactions Important?

1. Handling Failures in Distributed Systems

In a distributed architecture, such as microservices, different services may succeed or fail independently. Compensation transactions allow partial rollbacks to maintain overall consistency.

2. Avoiding Global Locking

Traditional transactions that hold global locks (e.g., two-phase commits) are often impractical in large distributed systems because of their performance and scalability costs. Compensation transactions provide a more flexible alternative.

3. Resilience and Fault Tolerance

Compensation mechanisms make systems more resilient by allowing recovery from failures without manual intervention.

How Compensation Transactions Work

  1. Perform Main Operations: Each service performs its assigned operation, such as creating a record or updating a database.
  2. Log Operations: Log actions and context to enable compensating transactions if needed.
  3. Detect Failure: Monitor the workflow for errors or failures in any service.
  4. Trigger Compensation: If a failure occurs, execute compensation transactions for all successfully completed operations to undo their effects.

Example Workflow

Imagine an e-commerce checkout process involving three steps:

  • Step 1: Reserve inventory.
  • Step 2: Deduct payment.
  • Step 3: Confirm order.

If Step 3 fails, compensation transactions for Steps 1 and 2 might include:

  • Releasing the reserved inventory.
  • Refunding the payment.
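
The checkout flow above can be sketched as a list of (name, action, compensation) entries that run in order. When a step fails, the compensations of the steps that already succeeded run in reverse order. This is a minimal in-memory sketch: `run_with_compensation`, the step functions, and the `state` dictionary are illustrative assumptions, not part of any library.

```python
from typing import Callable, List, Tuple

# (step name, action, compensating action)
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_with_compensation(steps: List[Step]) -> bool:
    """Run steps in order; on failure, undo completed steps in reverse order."""
    completed: List[Step] = []
    for step in steps:
        name, action, _compensate = step
        try:
            action()
            completed.append(step)
        except Exception as exc:
            print(f"{name} failed: {exc}; compensating")
            for done_name, _action, compensate in reversed(completed):
                compensate()
                print(f"compensated {done_name}")
            return False
    return True


# Hypothetical in-memory state standing in for the inventory and payment services.
state = {"stock": 5, "balance": 100, "log": []}


def reserve_inventory() -> None:
    state["stock"] -= 1


def release_inventory() -> None:
    state["stock"] += 1
    state["log"].append("release_inventory")


def deduct_payment() -> None:
    state["balance"] -= 30


def refund_payment() -> None:
    state["balance"] += 30
    state["log"].append("refund_payment")


def confirm_order() -> None:
    raise RuntimeError("order service unavailable")  # simulate the Step 3 failure


def no_op() -> None:
    pass


ok = run_with_compensation([
    ("reserve_inventory", reserve_inventory, release_inventory),
    ("deduct_payment", deduct_payment, refund_payment),
    ("confirm_order", confirm_order, no_op),
])
```

Running compensations in reverse order mirrors how the original steps depended on each other: the payment is refunded before the inventory is released.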

Design Considerations for Compensation Transactions

1. Idempotency

Ensure compensating actions are idempotent, meaning they can be executed multiple times without unintended side effects. This is crucial in distributed systems where retries are common.
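
One common way to get idempotency is to attach a unique key to each compensating request and ignore keys that were already handled. The sketch below assumes a hypothetical `RefundService` that keeps its processed keys in memory; a real service would likely need to store them durably alongside the balance update.

```python
from typing import Dict, Set


class RefundService:
    """Hypothetical payment service whose refund is safe to retry."""

    def __init__(self) -> None:
        self.balances: Dict[str, int] = {}
        self._processed: Set[str] = set()  # keys of refunds already applied

    def refund(self, idempotency_key: str, account: str, amount: int) -> bool:
        """Apply the refund once; return False if this key was already handled."""
        if idempotency_key in self._processed:
            return False  # duplicate delivery or retry: do nothing
        self.balances[account] = self.balances.get(account, 0) + amount
        self._processed.add(idempotency_key)
        return True


service = RefundService()
first = service.refund("refund-order-42", "alice", 30)
second = service.refund("refund-order-42", "alice", 30)  # retried message
```

The second call leaves the balance untouched, so a retried or duplicated compensation message does not refund the customer twice.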

2. Consistency Model

Adopt an eventual consistency model to align with the asynchronous nature of compensation transactions.

3. Error Handling

Design robust error-handling mechanisms for compensating actions, as these too can fail.
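
A possible shape for this, sketched under assumptions: retry the compensating action a bounded number of times with exponential backoff, and if it still fails, record it in a "dead letter" list for manual or scheduled follow-up. The helper `compensate_with_retry`, its parameters, and the two simulated actions are hypothetical names chosen for illustration.

```python
import time
from typing import Callable, List


def compensate_with_retry(
    name: str,
    compensate: Callable[[], None],
    dead_letters: List[str],
    max_attempts: int = 3,
    base_delay: float = 0.01,
) -> bool:
    """Try a compensating action a few times; record it if it keeps failing."""
    for attempt in range(1, max_attempts + 1):
        try:
            compensate()
            return True
        except Exception:
            if attempt < max_attempts:
                time.sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff
    dead_letters.append(name)  # needs manual or scheduled follow-up
    return False


attempts = {"count": 0}


def flaky_refund() -> None:
    """Fails twice, then succeeds (simulates a transient outage)."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("payment service unreachable")


def broken_release() -> None:
    """Always fails (simulates a service that stays down)."""
    raise ConnectionError("inventory service down")


dead_letters: List[str] = []
refund_ok = compensate_with_retry("refund_payment", flaky_refund, dead_letters)
release_ok = compensate_with_retry("release_inventory", broken_release, dead_letters)
```

Retrying like this is only safe when the compensating action is idempotent, which is why the two considerations go together.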

4. Service Communication

Use reliable communication protocols (e.g., message queues) to trigger and manage compensation transactions.

5. Isolation of Compensation Logic

Keep compensation logic isolated from the main business logic to maintain clarity and modularity.

Use Cases for Compensation Transactions

1. Financial Systems

  • Reversing failed fund transfers or unauthorized transactions.
  • Refunding payments in e-commerce platforms.

2. Travel and Booking Systems

  • Canceling a hotel reservation if flight booking fails.
  • Releasing blocked seats if payment is not completed.

3. Healthcare Systems

  • Undoing scheduled appointments if insurance validation fails.
  • Revoking prescriptions if a linked process encounters errors.

4. Supply Chain Management

  • Canceling shipment orders if inventory updates fail.
  • Restocking items if order fulfillment is aborted.

Challenges of Compensation Transactions

  1. Complexity in Implementation: Designing compensating logic for every operation can be tedious and error-prone.
  2. Performance Overhead: Logging operations and executing compensations can introduce latency.
  3. Partial Rollbacks: It may not always be possible to fully undo certain operations, such as sending emails or notifications.
  4. Failure in Compensating Actions: Compensation transactions themselves can fail, requiring additional mechanisms to handle such scenarios.

Best Practices

  1. Plan for Compensation Early: Design compensating transactions as part of the initial development process.
  2. Use SAGA Pattern: Combine compensation transactions with the SAGA pattern to manage distributed workflows effectively.
  3. Test Extensively: Simulate failures and test compensating logic under various conditions.
  4. Monitor and Log: Maintain detailed logs of operations and compensations for debugging and audits.

Learning Notes #37 – Orchestrator Pattern | Cloud Pattern

Today I learned about the orchestrator pattern while I was learning about the SAGA pattern. It simplifies the coordination of workflows that span multiple services, making the system more efficient and easier to manage. In this blog I jot down notes on the Orchestrator Pattern for better understanding.

What is the Orchestrator Pattern?

The Orchestrator Pattern is a design strategy where a central orchestrator coordinates interactions between various services or components to execute a workflow.

Unlike the Choreography Pattern, where services interact with each other independently and are aware of their peers, the orchestrator acts as the central decision-maker, directing how and when services interact.

Key Features

  • Centralized control of workflows.
  • Simplified service communication.
  • Enhanced error handling and monitoring.

When to Use the Orchestrator Pattern

  • Complex Workflows: When multiple services or steps need to be executed in a defined sequence.
  • Error Handling: When failures in one step require recovery strategies or compensating transactions.
  • Centralized Logic: When you want to encapsulate business logic in a single place for easier maintenance.

Benefits of the Orchestrator Pattern

  1. Simplifies Service Communication: Services remain focused on their core functionality while the orchestrator manages interactions.
  2. Improves Scalability: Workflows can be scaled independently from services.
  3. Centralized Monitoring: Makes it easier to track the progress of workflows and debug issues.
  4. Flexibility: Changing a workflow involves modifying the orchestrator, not the services.

Example: Order Processing Workflow

Problem

A fictional e-commerce platform needs to process orders. The workflow involves:

  1. Validating the order.
  2. Reserving inventory.
  3. Processing payment.
  4. Notifying the user.

Each step is handled by a separate microservice.

Solution

We implement an orchestrator to manage this workflow. Let's see how this works in practice.


import requests

class OrderOrchestrator:
    def __init__(self):
        self.services = {
            "validate_order": "http://order-service/validate",
            "reserve_inventory": "http://inventory-service/reserve",
            "process_payment": "http://payment-service/process",
            "notify_user": "http://notification-service/notify",
        }

    def execute_workflow(self, order_id):
        try:
            # Step 1: Validate Order
            self.call_service("validate_order", {"order_id": order_id})

            # Step 2: Reserve Inventory
            self.call_service("reserve_inventory", {"order_id": order_id})

            # Step 3: Process Payment
            self.call_service("process_payment", {"order_id": order_id})

            # Step 4: Notify User
            self.call_service("notify_user", {"order_id": order_id})

            print(f"Order {order_id} processed successfully!")
        except Exception as e:
            print(f"Error processing order {order_id}: {e}")

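    def call_service(self, service_name, payload):
        # Look up the endpoint and POST the payload as JSON.
        url = self.services[service_name]
        # The 5-second timeout is an assumed value; without one, a hung
        # service would block the workflow indefinitely.
        response = requests.post(url, json=payload, timeout=5)
        if response.status_code != 200:
            raise Exception(f"{service_name} failed: {response.text}")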

Key Tactics for Implementation

  1. Services vs. Serverless: Use serverless functions for steps that are triggered occasionally and don't need always-on services, reducing costs.
  2. Recovery from Failures:
    • Retry Mechanism: Configure retries with limits and delays to handle transient failures.
    • Circuit Breaker Pattern: Detect and isolate failing services to allow recovery.
    • Graceful Degradation: Use fallbacks like cached results or alternate services to ensure continuity.
  3. Monitoring and Alerting:
    • Implement real-time monitoring with automated recovery strategies.
    • Set up alerts for exceptions and utilize logs for troubleshooting.
  4. Orchestration Service Failures:
    • Service Replication: Deploy multiple instances of the orchestrator for failover.
    • Data Replication: Ensure data consistency for seamless recovery.
    • Request Queues: Use queues to buffer requests during downtime and process them later.
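
As an illustration of the Circuit Breaker tactic listed above, here is a minimal sketch. After a number of consecutive failures the breaker "opens" and rejects calls immediately; once a reset timeout has passed, it lets one trial call through. The class name, thresholds, and the injectable `clock` parameter are assumptions made for this example, not a specific library's API.

```python
import time
from typing import Callable, Optional


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means the circuit is closed

    def call(self, func: Callable[[], object]) -> object:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; call skipped")
            # Timeout elapsed: half-open, let one trial call through.
        try:
            result = func()
        except Exception:
            self.failures += 1
            # Open (or re-open after a failed trial call) and restart the timer.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result


# Demo with a fake clock so no real waiting is needed.
now = [0.0]
breaker = CircuitBreaker(failure_threshold=2, reset_timeout=10.0, clock=lambda: now[0])


def failing_call() -> None:
    raise ConnectionError("payment-service timeout")


outcomes = []
for _ in range(3):
    try:
        breaker.call(failing_call)
    except CircuitOpenError:
        outcomes.append("rejected")
    except ConnectionError:
        outcomes.append("failed")

now[0] = 11.0  # pretend the reset timeout has passed
recovered = breaker.call(lambda: "ok")
```

In the orchestrator, each downstream service could get its own breaker so that one failing service does not slow down calls to the others.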

Important Considerations

The primary goal of this architectural pattern is to decompose the entire business workflow into multiple services, making it more flexible and scalable. Due to this, it's crucial to analyze and comprehend the business processes in detail before implementation. A poorly defined and overly complicated business process will lead to a system that would be hard to maintain and scale.

Secondly, it's easy to fall into the trap of adding business logic into the orchestration service. Sometimes it's inevitable because certain functionalities are too small to create their separate service. But the risk here is that if the orchestration service becomes too intelligent and performs too much business logic, it can evolve into a monolithic application that also happens to talk to microservices. So, it's crucial to keep track of every addition to the orchestration service and ensure that its work remains within the boundaries of orchestration. Maintaining the scope of the orchestration service will prevent it from becoming a burden on the system, leading to decreased scalability and flexibility.

Why Use the Orchestration Pattern

The pattern comes with the following advantages:

  • Orchestration makes it easier to understand, monitor, and observe the application, resulting in a better understanding of the core part of the system with less effort.
  • The pattern promotes loose coupling. Each downstream service exposes an API interface and is self-contained, without any need to know about the other services.
  • The pattern simplifies the business workflows and improves the separation of concerns. Each service participates in a long-running transaction without any need to know about it.
  • The orchestrator service can decide what to do in case of failure, making the system fault-tolerant and reliable.

Learning Notes #29 – Two Phase Commit Protocol | ACID in Distributed Systems

Today I learned about the compensating transaction pattern, which led me to the two-phase commit protocol, a way to maintain the atomicity of a distributed transaction. Distributed transactions are hard.

In this blog, I jot down notes on the Two-Phase Commit protocol for better understanding.

The Two-Phase Commit (2PC) protocol is a distributed algorithm used to ensure atomicity in transactions spanning multiple nodes or databases. Atomicity ensures that either all parts of a transaction are committed or none are, maintaining consistency in distributed systems.

Why Two-Phase Commit?

In distributed systems, a transaction might involve several independent nodes, each maintaining its own database. Without a mechanism like 2PC, failures in one node can leave the system in an inconsistent state.

For example, consider an e-commerce platform where a customer places an order.

The transaction involves updating the inventory in one database, recording the payment in another, and generating a shipment request in a third system. If the payment database successfully commits but the inventory database fails, the system becomes inconsistent, potentially causing issues like double selling or incomplete orders. 2PC mitigates this by providing a coordinated protocol to commit or abort transactions across all nodes.

The Phases of 2PC

The protocol operates in two main phases:

1. Prepare Phase (Voting Phase)

The coordinator node initiates the transaction and prepares to commit it across all participating nodes.

  1. Request to Prepare: The coordinator sends a PREPARE request to all participant nodes.
  2. Vote: Each participant checks if it can commit the transaction (e.g., no constraints violated, resources available). It logs its decision (YES or NO) locally and sends its vote to the coordinator. If any participant votes NO, the transaction cannot be committed.

2. Commit Phase (Decision Phase)

Based on the votes received in the prepare phase, the coordinator decides the final outcome.

Commit Decision:

If all participants vote YES, the coordinator logs a COMMIT decision, sends COMMIT messages to all participants, and participants apply the changes and confirm with an acknowledgment.

Abort Decision:

If any participant votes NO, the coordinator logs an ABORT decision, sends ABORT messages to all participants, and participants roll back any changes made during the transaction.
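
The two phases can be sketched with in-memory objects. This is a simplified illustration, assuming a hypothetical `Participant` class and a `two_phase_commit` function acting as the coordinator. It leaves out the coordinator's own log, networking, timeouts, and crash recovery, all of which a real implementation needs.

```python
from typing import Dict, List


class Participant:
    """Hypothetical participant holding a small key-value store."""

    def __init__(self, name: str, can_commit: bool = True) -> None:
        self.name = name
        self.can_commit = can_commit
        self.data: Dict[str, str] = {}
        self.staged: Dict[str, str] = {}
        self.log: List[str] = []

    def prepare(self, changes: Dict[str, str]) -> bool:
        """Phase 1: stage the changes and vote YES (True) or NO (False)."""
        if not self.can_commit:
            self.log.append("VOTE NO")
            return False
        self.staged = dict(changes)
        self.log.append("VOTE YES")
        return True

    def commit(self) -> None:
        """Phase 2 (commit): make the staged changes visible."""
        self.data.update(self.staged)
        self.staged = {}
        self.log.append("COMMIT")

    def abort(self) -> None:
        """Phase 2 (abort): discard anything that was staged."""
        self.staged = {}
        self.log.append("ABORT")


def two_phase_commit(participants: List[Participant], changes: Dict[str, str]) -> str:
    # Phase 1: collect a vote from every participant.
    votes = [p.prepare(changes) for p in participants]
    # Phase 2: commit only if every vote was YES, otherwise abort everywhere.
    if all(votes):
        for p in participants:
            p.commit()
        return "COMMIT"
    for p in participants:
        p.abort()
    return "ABORT"


inventory, payment = Participant("inventory"), Participant("payment")
result_ok = two_phase_commit([inventory, payment], {"order-1": "placed"})

shipping = Participant("shipping", can_commit=False)
result_fail = two_phase_commit([inventory, shipping], {"order-2": "placed"})
```

In the second run, `inventory` voted YES and staged the change, but the NO vote from `shipping` makes the coordinator abort, so `order-2` never becomes visible on any participant.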

Implementation:

A simple implementation of 2PC could use RabbitMQ as the messaging medium between the coordinator and the participants.

In practice, we rarely need to write this from scratch, because existing tools already provide it:

1. Relational Databases

Many relational databases have built-in support for distributed transactions and 2PC.

  • PostgreSQL: Provides the participant side of 2PC through PREPARE TRANSACTION, COMMIT PREPARED, and ROLLBACK PREPARED; an external transaction manager acts as the coordinator.
  • MySQL: Supports XA transactions, which follow the 2PC protocol.
  • Oracle Database: Offers robust distributed transaction support using XA.
  • Microsoft SQL Server: Provides distributed transactions through MS-DTC.

2. Distributed Transaction Managers

These tools manage distributed transactions across multiple systems.

  • Atomikos: A popular Java-based transaction manager supporting JTA/XA for distributed systems.
  • Bitronix: Another lightweight transaction manager for Java applications supporting JTA/XA.
  • JBoss Transactions (Narayana): A robust Java transaction manager that supports 2PC, often used in conjunction with JBoss servers.

3. Message Brokers

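Some message brokers provide transactional capabilities, though not all of them implement 2PC.

  • RabbitMQ: Offers channel-level transactions (tx.select, tx.commit, tx.rollback) and publisher confirms. These are not 2PC on their own, but RabbitMQ can carry the messages between a coordinator and its participants.
  • Apache Kafka: Supports transactions, ensuring "exactly-once" semantics across producers and consumers.
  • ActiveMQ: Provides distributed transaction support through JTA integration.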

4. Workflow Engines

Workflow engines can orchestrate 2PC across distributed systems.

  • Apache Camel: Can coordinate 2PC transactions using its transaction policy.
  • Camunda: Provides BPMN-based orchestration that can include transactional boundaries.
  • Zeebe: Supports distributed transaction workflows in modern architectures.

Key Properties of 2PC

  1. Atomicity: Ensures all-or-nothing transaction behavior.
  2. Consistency: Guarantees system consistency across all nodes.
  3. Durability: Uses logs to ensure decisions survive node failures.
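
To illustrate the durability property, here is a minimal sketch of a coordinator decision log. It assumes a hypothetical JSON-lines file format and a `DecisionLog` class invented for this example. The decision is flushed to disk before any COMMIT or ABORT message would be sent, so a restarted coordinator can look it up again.

```python
import json
import os
import tempfile
from typing import Optional


class DecisionLog:
    """Append-only JSON-lines file holding coordinator decisions (hypothetical format)."""

    def __init__(self, path: str) -> None:
        self.path = path

    def record(self, tx_id: str, decision: str) -> None:
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps({"tx": tx_id, "decision": decision}) + "\n")
            f.flush()
            os.fsync(f.fileno())  # force the decision to disk before telling participants

    def lookup(self, tx_id: str) -> Optional[str]:
        """Return the last recorded decision for tx_id, or None if there is none."""
        if not os.path.exists(self.path):
            return None
        decision = None
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                entry = json.loads(line)
                if entry["tx"] == tx_id:
                    decision = entry["decision"]
        return decision


log_path = os.path.join(tempfile.mkdtemp(), "coordinator.log")
DecisionLog(log_path).record("tx-1", "COMMIT")

# Simulate a coordinator restart: a new object reads the same file.
recovered = DecisionLog(log_path)
decision_tx1 = recovered.lookup("tx-1")
decision_tx2 = recovered.lookup("tx-2")
```

After the restart, `tx-1` can be re-announced as COMMIT. A transaction with no logged decision, like `tx-2` here, never had a COMMIT sent, so a recovering coordinator could safely abort it.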

Challenges of 2PC

  1. Blocking Nature: If the coordinator fails after participants have voted YES, those participants stay blocked (often holding locks) until the coordinator recovers or a timeout or external recovery mechanism resolves the transaction.
  2. Performance Overhead: Multiple message exchanges and logging operations introduce latency.
  3. Single Point of Failure: The coordinator's failure can stall the entire transaction.
