
Learning Notes #48 – Common Pitfalls in Event Driven Architecture

8 January 2025 at 15:04

Today, I came across Raul Junco's post on mistakes in Event-Driven Architecture – https://www.linkedin.com/posts/raul-junco_after-years-building-event-driven-systems-activity-7278770394046631936-zu3-?utm_source=share&utm_medium=member_desktop. In this blog, I am highlighting the same for future reference.

Event-driven architectures are awesome, but they come with their own set of challenges. Missteps can lead to unreliable systems, inconsistent data, and frustrated users. Let’s explore some of the most common pitfalls and how to address them effectively.

1. Duplication

Idempotent APIs – https://parottasalna.com/2025/01/08/learning-notes-47-idempotent-post-requests/

Events often get re-delivered due to retries or system failures. Without proper handling, duplicate events can:

  • Charge a customer twice for the same transaction: Imagine a scenario where a payment service retries a payment event after a temporary network glitch, resulting in a duplicate charge.
  • Cause duplicate inventory updates: For example, an e-commerce platform might update stock levels twice for a single order, leading to incorrect stock counts.
  • Create inconsistent or broken system states: Duplicates can cascade through downstream systems, introducing mismatched or erroneous data.

Solution:

  • Assign unique IDs: Ensure every event has a globally unique identifier. Consumers can use these IDs to detect and discard duplicates.
  • Design idempotent processing: Structure your operations so they produce the same outcome even when executed multiple times. For instance, an API updating inventory could always set stock levels to a specific value rather than incrementing or decrementing. A minimal sketch of both ideas follows.
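Here is a small, illustrative sketch of duplicate detection plus idempotent processing. The event fields (event_id, sku, stock_level) and the in-memory set are assumptions for this example; a real consumer would persist seen IDs in a database or cache.

processed_ids = set()
stock_levels = {}

def handle_event(event):
    # Duplicate detection: skip events whose unique ID was already seen
    if event["event_id"] in processed_ids:
        return
    # Idempotent write: set stock to an absolute value instead of
    # incrementing, so reprocessing yields the same final state
    stock_levels[event["sku"]] = event["stock_level"]
    processed_ids.add(event["event_id"])

event = {"event_id": "evt-1", "sku": "A100", "stock_level": 42}
handle_event(event)
handle_event(event)  # duplicate delivery: state is unchanged
print(stock_levels)  # {'A100': 42}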

2. Not Guaranteeing Order

Events can arrive out of order when distributed across partitions or queues. This can lead to:

  • Processing a refund before the payment: If a refund event is processed before the corresponding payment event, the system might show a negative balance or fail to reconcile properly.
  • Breaking logic that relies on correct sequence: Certain workflows, such as assembling logs or transactional data, depend on a strict event order to function correctly.

Solution:

  • Use brokers with ordering guarantees: Message brokers like Apache Kafka support partition-level ordering. Design your topics and partitions to align with entities requiring ordered processing (e.g., user or account ID).
  • Add sequence numbers or timestamps: Include metadata in events to indicate their position in a sequence. Consumers can use this data to reorder events if necessary, ensuring logical consistency, as in the sketch below.
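A minimal sketch of sequence-number-based reordering, assuming each event carries an account_id and a per-account seq starting at 1 (both hypothetical field names). Events are buffered until the next expected one arrives.

buffers = {}        # account_id -> {seq: event}
next_expected = {}  # account_id -> next sequence number to apply

def apply(event):
    print(f"applied {event['type']} (seq {event['seq']})")

def on_event(event):
    acct = event["account_id"]
    buffers.setdefault(acct, {})[event["seq"]] = event
    expected = next_expected.setdefault(acct, 1)
    # Apply events only in sequence order, draining the buffer as gaps fill
    while expected in buffers[acct]:
        apply(buffers[acct].pop(expected))
        expected += 1
    next_expected[acct] = expected

# A refund (seq 2) arriving before its payment (seq 1) is held back
on_event({"account_id": 7, "seq": 2, "type": "refund"})
on_event({"account_id": 7, "seq": 1, "type": "payment"})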

3. The Dual Write Problem

Outbox Pattern: https://parottasalna.com/2025/01/03/learning-notes-31-outbox-pattern-cloud-pattern/

When writing to a database and publishing an event, one might succeed while the other fails. This can:

  • Lose events: If the event is not published after the database write, downstream systems might remain unaware of critical changes, such as a new order or a status update.
  • Cause mismatched states: For instance, a transaction might be logged in a database but not propagated to analytical or monitoring systems, creating inconsistencies.

Solution:

  • Use the Transactional Outbox Pattern: In this pattern, events are written to an “outbox” table within the same database transaction as the main data write. A separate process then reads from the outbox and publishes events reliably.
  • Adopt Change Data Capture (CDC) tools: CDC tools like Debezium can monitor database changes and publish them as events automatically, ensuring no changes are missed. A minimal outbox sketch follows.
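As a minimal sketch of the outbox idea (using SQLite purely for illustration; the table and field names are invented), the business row and the outbox row are written in one transaction, and a separate relay publishes pending rows:

import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
db.execute("CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT,"
           " payload TEXT, published INTEGER DEFAULT 0)")

def place_order(order_id):
    # The business write and the event write share one transaction:
    # either both rows are persisted or neither is
    with db:
        db.execute("INSERT INTO orders VALUES (?, 'PLACED')", (order_id,))
        db.execute("INSERT INTO outbox (payload) VALUES (?)",
                   (json.dumps({"type": "OrderPlaced", "order_id": order_id}),))

def relay_outbox(publish):
    # A separate relay process polls the outbox, publishes each
    # unpublished row, and marks it as sent afterwards
    rows = db.execute("SELECT id, payload FROM outbox WHERE published = 0").fetchall()
    for row_id, payload in rows:
        publish(payload)
        db.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    db.commit()

place_order(101)
relay_outbox(lambda message: print("published:", message))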

4. Non-Backward-Compatible Changes

Changing event schemas without considering existing consumers can break systems. For example:

  • Removing a field: A consumer relying on this field might encounter null values or fail altogether.
  • Renaming or changing field types: This can lead to deserialization errors or misinterpretation of data.

Solution:

  • Maintain versioned schemas: Introduce new schema versions incrementally and ensure consumers can continue using older versions during the transition.
  • Use schema evolution-friendly formats: Formats like Avro or Protobuf natively support schema evolution, allowing you to add fields or make other non-breaking changes easily.
  • Add adapters for compatibility: Build adapters or translators that transform events from new schemas to older formats, ensuring backward compatibility for legacy systems (see the adapter sketch below).
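A small, hypothetical example of such an adapter: suppose a v2 producer renamed amount to amount_cents, while legacy consumers still expect the v1 shape. The field names here are invented for illustration only.

def adapt_v2_to_v1(event):
    # Pass v1 events through untouched
    if event.get("schema_version", 1) < 2:
        return event
    # Rebuild the v1 shape from the v2 fields
    return {
        "schema_version": 1,
        "order_id": event["order_id"],
        "amount": event["amount_cents"] / 100.0,  # restore the old field
    }

v2_event = {"schema_version": 2, "order_id": 7, "amount_cents": 1999}
print(adapt_v2_to_v1(v2_event))  # {'schema_version': 1, 'order_id': 7, 'amount': 19.99}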

Learning Notes #41 – Shared Lock and Exclusive Locks | Postgres

6 January 2025 at 14:07

Today, I learnt about various locking mechanisms to prevent double updates. In this blog, I make notes on shared locks and exclusive locks for my future self.

What Are Locks in Databases?

Locks are mechanisms used by a DBMS to control access to data. They ensure that transactions are executed in a way that maintains the ACID (Atomicity, Consistency, Isolation, Durability) properties of the database. Locks can be classified into several types, including:

  • Shared Locks (S Locks): Allow multiple transactions to read a resource simultaneously but prevent any transaction from writing to it.
  • Exclusive Locks (X Locks): Allow a single transaction to modify a resource, preventing both reading and writing by other transactions.
  • Intent Locks: Used to signal the type of lock a transaction intends to acquire at a lower level.
  • Deadlock Prevention Locks: Special locks aimed at preventing deadlock scenarios.

Shared Lock

A shared lock is used when a transaction needs to read a resource (e.g., a database row or table) without altering it. Multiple transactions can acquire a shared lock on the same resource simultaneously. However, as long as one or more shared locks exist on a resource, no transaction can acquire an exclusive lock on that resource.


-- Transaction A: Acquire a shared lock on a row
BEGIN;
SELECT * FROM employees WHERE id = 1 FOR SHARE;
-- Transaction B: Acquire a shared lock on the same row
BEGIN;
SELECT * FROM employees WHERE id = 1 FOR SHARE;
-- Both transactions can read the row concurrently
-- Transaction C: Attempt to update the same row
BEGIN;
UPDATE employees SET salary = salary + 1000 WHERE id = 1;
-- Transaction C will be blocked until Transactions A and B release their locks

Key Characteristics of Shared Locks

1. Concurrent Reads

  • Shared locks allow multiple transactions to read the same resource at the same time.
  • This is ideal for operations like SELECT queries that do not modify data.

2. Write Blocking

  • While a shared lock is active, no transaction can modify the locked resource.
  • Prevents dirty writes and ensures read consistency.

3. Compatibility

  • Shared locks are compatible with other shared locks but not with exclusive locks.

When Are Shared Locks Used?

Shared locks are typically employed in read operations under certain isolation levels. For instance:

1. Read Committed Isolation Level:

  • Shared locks are held for the duration of the read operation.
  • Prevents dirty reads by ensuring the data being read is not modified by other transactions during the read.

2. Repeatable Read Isolation Level:

  • Shared locks are held until the transaction completes.
  • Ensures that the data read during a transaction remains consistent and unmodified.

3. Snapshot Isolation:

  • Shared locks may not be explicitly used, as the DBMS creates a consistent snapshot of the data for the transaction.

    Exclusive Locks

    An exclusive lock is used when a transaction needs to modify a resource. Only one transaction can hold an exclusive lock on a resource at a time. Note that in PostgreSQL, a plain SELECT is never blocked by a row-level exclusive lock, because readers work from an MVCC snapshot; only writes and locking reads (FOR SHARE / FOR UPDATE) are blocked.

    
    -- Transaction X: Acquire an exclusive row lock via an update
    BEGIN;
    UPDATE employees SET salary = salary + 1000 WHERE id = 2;
    -- Transaction Y: Attempt a locking read of the same row
    BEGIN;
    SELECT * FROM employees WHERE id = 2 FOR SHARE;
    -- Transaction Y will be blocked until Transaction X completes
    -- (a plain SELECT without FOR SHARE would not block, thanks to MVCC)
    -- Transaction Z: Attempt to update the same row
    BEGIN;
    UPDATE employees SET salary = salary + 500 WHERE id = 2;
    -- Transaction Z will also be blocked until Transaction X completes
    

    Key Characteristics of Exclusive Locks

    1. Write Operations: Exclusive locks are essential for operations like INSERT, UPDATE, and DELETE.

    2. Blocking Conflicting Access: While an exclusive lock is active, no other transaction can modify the resource or acquire a conflicting lock (in PostgreSQL, plain reads still proceed against a snapshot).

    3. Isolation: Ensures that changes made by one transaction are not visible to others until the transaction is complete.

      When Are Exclusive Locks Used?

      Exclusive locks are typically employed in write operations or any operation that modifies the database. For instance:

      1. Transactional Updates – A transaction that updates a row acquires an exclusive lock to ensure no other transaction can access or modify the row during the update.

      2. Table Modifications – When altering a table structure, the DBMS may place an exclusive lock on the entire table. The sketch below shows a row-level exclusive lock taken from application code.
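      As a hedged illustration of preventing a double update from application code (the motivation for this post), here is a minimal psycopg2 sketch; the connection string and the employees table are assumptions.

      import psycopg2

      conn = psycopg2.connect("dbname=test user=postgres")  # assumed DSN
      with conn:  # commits on success, rolls back on error
          with conn.cursor() as cur:
              # FOR UPDATE takes an exclusive row lock; a concurrent
              # transaction running the same statement blocks here until
              # this transaction commits or rolls back
              cur.execute("SELECT salary FROM employees WHERE id = %s FOR UPDATE", (1,))
              (salary,) = cur.fetchone()
              cur.execute("UPDATE employees SET salary = %s WHERE id = %s",
                          (salary + 1000, 1))
      conn.close()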

      Benefits of Shared and Exclusive Locks

      Benefits of Shared Locks

      1. Consistency in Multi-User Environments – Ensure that data being read is not altered by other transactions, preserving consistency.
      2. Concurrency Support – Allow multiple transactions to read data simultaneously, improving system performance.
      3. Data Integrity – Prevent dirty reads and writes, ensuring that operations yield reliable results.

      Benefits of Exclusive Locks

      1. Data Integrity During Modifications – Prevents other transactions from accessing data being modified, ensuring changes are applied safely.
      2. Isolation of Transactions – Ensures that modifications by one transaction are not visible to others until committed.

      Limitations and Challenges

      Shared Locks

      1. Potential for Deadlocks – Deadlocks can occur if two transactions simultaneously hold shared locks and attempt to upgrade to exclusive locks.
      2. Blocking Writes – Shared locks can delay write operations, potentially impacting performance in write-heavy systems.
      3. Lock Escalation – In systems with high concurrency, shared locks may escalate to table-level locks, reducing granularity and concurrency.

      Exclusive Locks

      1. Reduced Concurrency – Exclusive locks prevent other transactions from accessing the locked resource, which can lead to bottlenecks in highly concurrent systems.
      2. Risk of Deadlocks – Deadlocks can occur if two transactions attempt to acquire exclusive locks on resources held by each other.

      Lock Compatibility

      • Shared + Shared: compatible (both granted)
      • Shared + Exclusive: not compatible (the exclusive request waits)
      • Exclusive + Exclusive: not compatible (the second request waits)

      Learning Notes #40 – SAGA Pattern | Cloud Patterns

      5 January 2025 at 17:08

      Today, I learnt about the SAGA pattern, followed by the Compensation, Orchestration, and Choreography patterns and Two-Phase Commit. SAGA combines ideas from all of these. In this blog, I jot down notes on SAGA for my future self.

      Modern software applications often require the coordination of multiple distributed services to perform complex business operations. In such systems, ensuring consistency and reliability can be challenging, especially when a failure occurs in one of the services. The SAGA design pattern offers a robust solution to manage distributed transactions while maintaining data consistency.

      What is the SAGA Pattern?

      The SAGA pattern is a distributed transaction management mechanism where a series of independent operations (or steps) are executed sequentially across multiple services. Each operation in the sequence has a corresponding compensating action to roll back changes if a failure occurs. This approach avoids the complexities of distributed transactions, such as two-phase commits, by breaking down the process into smaller, manageable units.

      Key Characteristics

      1. Decentralized Control: Transactions are managed across services, often without a central coordinator.
      2. Compensating Transactions: Every operation has an undo or rollback mechanism.
      3. Asynchronous Communication: Services communicate asynchronously in most implementations, ensuring loose coupling.

      Types of SAGA Patterns

      There are two primary types of SAGA patterns:

      1. Choreography-Based SAGA

      • In this approach, services communicate with each other directly to coordinate the workflow.
      • Each service knows which operation to trigger next after completing its own task.
      • If a failure occurs, each service initiates its compensating action to roll back changes.

      Advantages:

      • Simple implementation.
      • No central coordinator required.

      Disadvantages:

      • Difficult to manage and debug in complex workflows.
      • Implicit coupling between services through the events they exchange.
      import pika
      
      class RabbitMQHandler:
          def __init__(self, queue):
              self.connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
              self.channel = self.connection.channel()
              self.channel.queue_declare(queue=queue)
              self.queue = queue
      
          def publish(self, message):
              self.channel.basic_publish(exchange='', routing_key=self.queue, body=message)
      
          def consume(self, callback):
              self.channel.basic_consume(queue=self.queue, on_message_callback=callback, auto_ack=True)
              self.channel.start_consuming()
      
      # Define services
      class FlightService:
          def book_flight(self):
              print("Flight booked.")
              RabbitMQHandler('hotel_queue').publish("flight_booked")
      
          def cancel_flight(self):
              # Compensating action, triggered by downstream failures
              print("Flight booking canceled.")
      
      class HotelService:
          def on_flight_booked(self, ch, method, properties, body):
              try:
                  print("Hotel booked.")
                  RabbitMQHandler('invoice_queue').publish("hotel_booked")
              except Exception:
                  print("Failed to book hotel. Rolling back flight.")
                  FlightService().cancel_flight()
      
      # Trigger the workflow by publishing the first event...
      FlightService().book_flight()
      
      # ...then start the hotel consumer. consume() blocks, so it must come
      # last here; in a real deployment each service runs as its own process.
      RabbitMQHandler('hotel_queue').consume(HotelService().on_flight_booked)
      

      2. Orchestration-Based SAGA

      • A central orchestrator service manages the workflow and coordinates between the services.
      • The orchestrator determines the sequence of operations and handles compensating actions in case of failures.

      Advantages:

      • Clear control and visibility of the workflow.
      • Easier to debug and manage.

      Disadvantages:

      • The orchestrator can become a single point of failure.
      • More complex implementation.
      import pika
      
      # Reuses the RabbitMQHandler class from the choreography example above.
      
      class Orchestrator:
          def __init__(self):
              self.rabbitmq = RabbitMQHandler('orchestrator_queue')
      
          def execute_saga(self):
              try:
                  self.reserve_inventory()
                  self.process_payment()
                  self.generate_invoice()
              except Exception as e:
                  print(f"Error occurred: {e}. Initiating rollback.")
                  self.compensate()
      
          def reserve_inventory(self):
              print("Inventory reserved.")
              self.rabbitmq.publish("inventory_reserved")
      
          def process_payment(self):
              print("Payment processed.")
              self.rabbitmq.publish("payment_processed")
      
          def generate_invoice(self):
              print("Invoice generated.")
              self.rabbitmq.publish("invoice_generated")
      
          def compensate(self):
              print("Rolling back invoice.")
              print("Rolling back payment.")
              print("Rolling back inventory.")
      
      # Trigger the workflow
      Orchestrator().execute_saga()
      

      How SAGA Works

      1. Transaction Initiation: The first operation is executed by one of the services.
      2. Service Communication: Subsequent services execute their operations based on the outcome of the previous step.
      3. Failure Handling: If an operation fails, compensating transactions are triggered in reverse order to undo any changes.
      4. Completion: Once all operations are successfully executed, the transaction is considered complete. A minimal sketch of the compensation step follows.
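      Here is a small, illustrative sketch of step 3: completed actions are recorded as the saga progresses, and on failure their compensations run in reverse order. The step names and the simulated failure are invented for this example.

      def run_saga(steps):
          completed = []
          try:
              for action, compensate in steps:
                  action()
                  completed.append(compensate)
          except Exception as exc:
              # Only steps that actually completed are compensated
              print(f"Step failed ({exc}); compensating in reverse order.")
              for compensate in reversed(completed):
                  compensate()
      
      def fail():
          raise RuntimeError("payment declined")
      
      run_saga([
          (lambda: print("reserve inventory"), lambda: print("release inventory")),
          (fail,                               lambda: print("refund payment")),
          (lambda: print("send invoice"),      lambda: print("cancel invoice")),
      ])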

      Benefits of the SAGA Pattern

      1. Improved Resilience: Allows partial rollbacks in case of failure.
      2. Scalability: Suitable for microservices and distributed systems.
      3. Flexibility: Works well with event-driven architectures.
      4. No Global Locks: Unlike traditional transactions, SAGA does not require global locking of resources.

      Challenges and Limitations

      1. Complexity in Rollbacks: Designing compensating transactions for every operation can be challenging.
      2. Data Consistency: Achieving eventual consistency may require additional effort.
      3. Debugging Issues: Debugging failures in a distributed environment can be cumbersome.
      4. Latency: Sequential execution may increase overall latency.

      When to Use the SAGA Pattern

      • Distributed systems where global ACID transactions are infeasible.
      • Microservices architectures with independent services.
      • Applications requiring high resilience and eventual consistency.

      Real-World Applications

      1. E-Commerce Platforms: Managing orders, payments, and inventory updates.
      2. Travel Booking Systems: Coordinating flight, hotel, and car rental reservations.
      3. Banking Systems: Handling distributed account updates and transfers.
      4. Healthcare: Coordinating appointment scheduling and insurance claims.

      Learning Notes #38 – Choreography Pattern | Cloud Pattern

      5 January 2025 at 12:21

      Today I learnt about the Choreography pattern, where services communicate with each other through a messaging queue. In this blog, I jot down notes on the choreography pattern for my future self.

      What is the Choreography Pattern?

      In the Choreography Pattern, services communicate directly with each other via asynchronous events, without a central controller. Each service is responsible for a specific part of the workflow and responds to events produced by other services. This pattern allows for a more autonomous and loosely coupled system.

      Key Features

      • High scalability and independence of services.
      • Decentralized control.
      • Services respond to events they subscribe to.

      When to Use the Choreography Pattern

      • Event-Driven Systems: When workflows can be modeled as events triggering responses.
      • High Scalability: When services need to operate independently and scale autonomously.
      • Loose Coupling: When minimizing dependencies between services is critical.

      Benefits of the Choreography Pattern

      1. Decentralized Control: No single point of failure or bottleneck.
      2. Increased Flexibility: Services can be added or modified without affecting others.
      3. Better Scalability: Services operate independently and scale based on their workloads.
      4. Resilience: The system can handle partial failures more gracefully, as services continue independently.

      Example: E-Commerce Order Fulfillment

      Problem

      A fictional e-commerce platform needs to manage the following workflow:

      1. Accepting an order.
      2. Validating payment.
      3. Reserving inventory.
      4. Sending notifications to the customer.

      Each step is handled by an independent service.

      Solution

      Using the Choreography Pattern, each service listens for specific events and publishes new events as needed. The workflow emerges naturally from the interaction of these services.

      Implementation

      Step 1: Define the Workflow as Events

      • OrderPlaced: Triggered when a customer places an order.
      • PaymentProcessed: Triggered after successful payment.
      • InventoryReserved: Triggered after reserving inventory.
      • NotificationSent: Triggered when the customer is notified.

      Step 2: Implement Services

      Each service subscribes to events and performs its task.

      shared_utils.py

      import pika
      import json
      
      def publish_event(exchange, event_type, data):
          connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
          channel = connection.channel()
          channel.exchange_declare(exchange=exchange, exchange_type='fanout')
          message = json.dumps({"event_type": event_type, "data": data})
          channel.basic_publish(exchange=exchange, routing_key='', body=message)
          connection.close()
      
      def subscribe_to_event(exchange, callback):
          connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
          channel = connection.channel()
          channel.exchange_declare(exchange=exchange, exchange_type='fanout')
          queue = channel.queue_declare('', exclusive=True).method.queue
          channel.queue_bind(exchange=exchange, queue=queue)
          channel.basic_consume(queue=queue, on_message_callback=callback, auto_ack=True)
          print(f"Subscribed to events on exchange '{exchange}'")
          channel.start_consuming()
      
      

      Order Service

      
      from shared_utils import publish_event
      
      def place_order(order_id, customer):
          print(f"Placing order {order_id} for {customer}")
          publish_event("order_exchange", "OrderPlaced", {"order_id": order_id, "customer": customer})
      
      if __name__ == "__main__":
          # Simulate placing an order
          place_order(order_id=101, customer="John Doe")
      

      Payment Service

      
      from shared_utils import publish_event, subscribe_to_event
      import json
      import time
      
      def handle_order_placed(ch, method, properties, body):
          event = json.loads(body)
          if event["event_type"] == "OrderPlaced":
              order_id = event["data"]["order_id"]
              print(f"Processing payment for order {order_id}")
              time.sleep(1)  # Simulate payment processing
              publish_event("payment_exchange", "PaymentProcessed", {"order_id": order_id})
      
      if __name__ == "__main__":
          subscribe_to_event("order_exchange", handle_order_placed)
      
      

      Inventory Service

      
      from shared_utils import publish_event, subscribe_to_event
      import json
      import time
      
      def handle_payment_processed(ch, method, properties, body):
          event = json.loads(body)
          if event["event_type"] == "PaymentProcessed":
              order_id = event["data"]["order_id"]
              print(f"Reserving inventory for order {order_id}")
              time.sleep(1)  # Simulate inventory reservation
              publish_event("inventory_exchange", "InventoryReserved", {"order_id": order_id})
      
      if __name__ == "__main__":
          subscribe_to_event("payment_exchange", handle_payment_processed)
      
      

      Notification Service

      
      from shared_utils import subscribe_to_event
      import json
      import time
      
      def handle_inventory_reserved(ch, method, properties, body):
          event = json.loads(body)
          if event["event_type"] == "InventoryReserved":
              order_id = event["data"]["order_id"]
              print(f"Notifying customer for order {order_id}")
              time.sleep(1)  # Simulate notification
              print(f"Customer notified for order {order_id}")
      
      if __name__ == "__main__":
          subscribe_to_event("inventory_exchange", handle_inventory_reserved)
      
      

      Step 3: Run the Workflow

      1. Start RabbitMQ (for local testing, docker run -d -p 5672:5672 rabbitmq:3 works).
      2. Run the services in the following order:
        • Notification Service: python notification_service.py
        • Inventory Service: python inventory_service.py
        • Payment Service: python payment_service.py
        • Order Service: python order_service.py
      3. Place an order by running the Order Service. The workflow will propagate through the services as events are handled.

      Key Considerations

      1. Event Bus: Use an event broker like RabbitMQ, Kafka, or AWS SNS to manage communication between services.
      2. Event Versioning: Include versioning to handle changes in event formats over time.
      3. Idempotency: Ensure services handle repeated events gracefully to avoid duplication (both of these are illustrated in the sketch after this list).
      4. Monitoring and Tracing: Use tools like OpenTelemetry to trace and debug distributed workflows.
      5. Error Handling:
        • Dead Letter Queues (DLQs) to capture failed events.
        • Retries with backoff for transient errors.
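      A minimal sketch of an event envelope that bakes in versioning and an idempotency key, two of the considerations above. The field names are illustrative, not a fixed contract.

      import json
      import uuid
      
      def make_event(event_type, data, version=1):
          return json.dumps({
              "event_id": str(uuid.uuid4()),  # unique ID for idempotent consumption
              "version": version,             # schema version for evolution
              "event_type": event_type,
              "data": data,
          })
      
      seen_ids = set()
      
      def handle(raw):
          event = json.loads(raw)
          if event["event_id"] in seen_ids:
              return  # duplicate delivery: ignore
          seen_ids.add(event["event_id"])
          print("handling", event["event_type"], "version", event["version"])
      
      message = make_event("OrderPlaced", {"order_id": 101})
      handle(message)
      handle(message)  # second delivery is skipped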

      Advantages of the Choreography Pattern

      1. Loose Coupling: Services interact via events without direct knowledge of each other.
      2. Resilience: Failures in one service don’t block the entire workflow.
      3. High Autonomy: Services operate independently and can be deployed or scaled separately.
      4. Dynamic Workflows: Adding new services to the workflow requires subscribing them to relevant events.

      Challenges of the Choreography Pattern

      1. Complex Debugging: Tracing errors across distributed services can be difficult.
      2. Event Storms: Poorly designed workflows may generate excessive events, overwhelming the system.
      3. Coordination Overhead: Decentralized logic can lead to inconsistent behavior if not carefully managed.

      Orchestrator vs. Choreography: When to Choose?

      • Use Orchestrator Pattern when workflows are complex, require central control, or involve many dependencies.
      • Use Choreography Pattern when you need high scalability, loose coupling, or event-driven workflows.

      Learning Notes #36 – Active Active / Active Passive Patterns | HA Patterns

      4 January 2025 at 18:04

      Today, I learnt about high availability patterns, basically how to set up clusters for high availability. In this blog, I jot down notes on the Active-Active and Active-Passive patterns for better understanding.

      Active-Active Configuration

      In an Active-Active setup, all nodes in the cluster are actively processing requests. This configuration maximizes resource utilization and ensures high throughput. If one node fails, the remaining active nodes take over the load.

      Example Scenario

      Consider a web application with two servers:

      1. Server 1: IP 192.168.1.10
      2. Server 2: IP 192.168.1.11

      Both servers handle incoming requests simultaneously. A load balancer distributes traffic between these servers to balance the load.

      Pros and Cons

      Pros:

      • Higher resource utilization.
      • Better scalability and performance.

      Cons:

      • Increased complexity in handling data consistency and synchronization.
      • Potential for split-brain issues in certain setups.

      Sample HAProxy config

      
      defaults
          mode http
          timeout connect 5000ms
          timeout client 50000ms
          timeout server 50000ms
      
      frontend http_front
          bind *:80
          default_backend http_back
      
      backend http_back
          balance roundrobin
          server server_a 192.168.1.10:80 check
          server server_b 192.168.1.11:80 check
      

      Active-Passive Configuration

      In an Active-Passive setup, one node (Active) handles all the requests, while the other node (Passive) acts as a standby. If the active node fails, the passive node takes over.

      Example Scenario

      Using the same servers:

      1. Server 1: IP 192.168.1.10 (Active)
      2. Server 2: IP 192.168.1.11 (Passive)
      
      Server 2 remains idle until Server 1 becomes unavailable, at which point Server 2 assumes the active role.

      Pros and Cons

      Pros:

      • Simplified consistency management.
      • Reliable failover mechanism.

      Cons:

      • Underutilized resources (passive node is idle most of the time).
      • Slight delay during failover.

      Sample HAProxy config

      
      defaults
          mode http
          timeout connect 5000ms
          timeout client 50000ms
          timeout server 50000ms
      
      frontend http_front
          bind *:80
          default_backend http_back
      
      backend http_back
          server server_a 192.168.1.10:80 check
          server server_b 192.168.1.11:80 check backup
      

      Learning Notes #35 – Durability in ACID | Postgres

      4 January 2025 at 12:47

      As part of my ACID series, I am refreshing the topic of durability. In this blog, I jot down notes on durability for better understanding.

      What is Durability?

      Durability ensures that the effects of a committed transaction are permanently saved to the database. This property prevents data loss by ensuring that committed transactions survive unexpected interruptions such as power outages, crashes, or system reboots.

      PostgreSQL achieves durability through a combination of

      • Write-Ahead Logging (WAL): Changes are written to a log file before they are applied to the database.
      • Checkpointing: Periodic snapshots of the database state.
      • fsync and Synchronous Commit: Ensures data is physically written to disk.

      How PostgreSQL Achieves Durability

      Miro Board for Postgres Architecture – https://miro.com/app/board/uXjVLD2T5os=/

      1. Write-Ahead Logging (WAL)

      PostgreSQL uses WAL to ensure durability. Before modifying the actual data, it writes the changes to a WAL file. This ensures that even if the system crashes, the database can be recovered by replaying the WAL logs.

      
      -- Inspect the WAL level ('replica' by default, which is sufficient for crash recovery)
      SHOW wal_level;
      

      2. Checkpoints

      A checkpoint is a mechanism where the database writes all changes to disk, ensuring the database’s state is up-to-date. Checkpoints reduce the time required for crash recovery by limiting the number of WAL files that need to be replayed.

      
      -- Force a manual checkpoint
      CHECKPOINT;
      

      3. Synchronous Commit

      By default, PostgreSQL ensures that changes are flushed to disk before a transaction is marked as committed. This is controlled by the synchronous_commit setting.

      
      -- Show current synchronous commit setting
      SHOW synchronous_commit;
      
      -- Change synchronous commit setting
      SET synchronous_commit = 'on';
      

      4. Backup and Replication

      To further ensure durability, PostgreSQL supports backups and replication. Logical and physical backups can be used to restore data in case of catastrophic failures.

      Practical Examples of Durability

      Example 1: Ensuring Transaction Durability

      
      BEGIN;
      
      -- Update an account balance
      UPDATE accounts SET balance = balance - 500 WHERE account_id = 1;
      
      -- Commit the transaction
      COMMIT;
      
      -- Crash the system now; the committed transaction will persist.
      

      Even if the database crashes immediately after the COMMIT, the changes will persist, as the transaction logs have already been written to disk.

      Example 2: WAL Recovery after Crash

      Suppose a crash occurs immediately after a transaction is committed.

      Scenario:

      
      BEGIN;
      INSERT INTO transactions (account_id, amount, transaction_type) VALUES (1, 500, 'credit');
      COMMIT;
      

      During the recovery process, PostgreSQL replays the WAL logs to restore the committed transactions.

      Example 3: Configuring Synchronous Commit

      Control durability settings based on performance and reliability needs.

      
      -- Use asynchronous commit for faster performance (risking durability)
      SET synchronous_commit = 'off';
      
      -- Perform a transaction
      BEGIN;
      UPDATE accounts SET balance = balance + 200 WHERE account_id = 2;
      COMMIT;
      
      -- Changes might be lost if the system crashes before the WAL is flushed.
      

      Trade-offs of Durability

      While durability ensures data persistence, it can affect database performance. For example:

      • Enforcing synchronous commits may slow down transactions.
      • Checkpointing can momentarily impact query performance due to disk I/O.

      For high-performance systems, durability settings can be fine-tuned based on the application’s tolerance for potential data loss.

      Durability and Other ACID Properties

      Durability works closely with the other ACID properties:

      1. Atomicity: Ensures the all-or-nothing nature of transactions.
      2. Consistency: Guarantees the database remains in a valid state after a transaction.
      3. Isolation: Prevents concurrent transactions from interfering with each other.

      Learning Notes #29 – Two Phase Commit Protocol | ACID in Distributed Systems

      3 January 2025 at 13:45

      Today, I learnt about the compensating transaction pattern, which led me to the Two-Phase Commit protocol, which helps maintain the atomicity of distributed transactions. Distributed transactions are hard.
      
      In this blog, I jot down notes on the Two-Phase Commit protocol for better understanding.

      The Two-Phase Commit (2PC) protocol is a distributed algorithm used to ensure atomicity in transactions spanning multiple nodes or databases. Atomicity ensures that either all parts of a transaction are committed or none are, maintaining consistency in distributed systems.

      Why Two-Phase Commit?

      In distributed systems, a transaction might involve several independent nodes, each maintaining its own database. Without a mechanism like 2PC, failures in one node can leave the system in an inconsistent state.

      For example, consider an e-commerce platform where a customer places an order.

      The transaction involves updating the inventory in one database, recording the payment in another, and generating a shipment request in a third system. If the payment database successfully commits but the inventory database fails, the system becomes inconsistent, potentially causing issues like double selling or incomplete orders. 2PC mitigates this by providing a coordinated protocol to commit or abort transactions across all nodes.

      The Phases of 2PC

      The protocol operates in two main phases

      1. Prepare Phase (Voting Phase)

      The coordinator node initiates the transaction and prepares to commit it across all participating nodes.

      1. Request to Prepare: The coordinator sends a PREPARE request to all participant nodes.
      2. Vote: Each participant checks if it can commit the transaction (e.g., no constraints violated, resources available). It logs its decision (YES or NO) locally and sends its vote to the coordinator. If any participant votes NO, the transaction cannot be committed.

      2. Commit Phase (Decision Phase)

      Based on the votes received in the prepare phase, the coordinator decides the final outcome.

      Commit Decision:

      If all participants vote YES, the coordinator logs a COMMIT decision, sends COMMIT messages to all participants, and participants apply the changes and confirm with an acknowledgment.

      Abort Decision:

      If any participant votes NO, the coordinator logs an ABORT decision, sends ABORT messages to all participants, and participants roll back any changes made during the transaction.

      Implementation:
      
      For a simple implementation of 2PC, we can try out the flow below, with a broker such as RabbitMQ acting as the medium between coordinator and participants.
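      The following is a minimal in-process simulation of the two phases; the participant names and the forced NO vote are invented for illustration, and the message transport (e.g., RabbitMQ) is abstracted away.

      class Participant:
          def __init__(self, name, can_commit=True):
              self.name = name
              self.can_commit = can_commit
      
          def prepare(self):
              # Phase 1: vote YES only if the local transaction can commit
              vote = "YES" if self.can_commit else "NO"
              print(f"{self.name}: vote {vote}")
              return self.can_commit
      
          def commit(self):
              print(f"{self.name}: committed")
      
          def abort(self):
              print(f"{self.name}: rolled back")
      
      def two_phase_commit(participants):
          votes = [p.prepare() for p in participants]   # Phase 1: voting
          if all(votes):                                # Phase 2: decision
              for p in participants:
                  p.commit()
          else:
              for p in participants:
                  p.abort()
      
      two_phase_commit([Participant("inventory_db"),
                        Participant("payments_db", can_commit=False)])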

      In practice, we need not write this from scratch; existing tools support it:

      1. Relational Databases

      Most relational databases have built-in support for distributed transactions and 2PC.

      • PostgreSQL: Supports two-phase commit natively with PREPARE TRANSACTION and COMMIT PREPARED (usable, for example, alongside foreign data wrappers in distributed setups).
      • MySQL: Supports XA transactions, which follow the 2PC protocol.
      • Oracle Database: Offers robust distributed transaction support using XA.
      • Microsoft SQL Server: Provides distributed transactions through MS-DTC.

      2. Distributed Transaction Managers

      These tools manage distributed transactions across multiple systems.

      • Atomikos: A popular Java-based transaction manager supporting JTA/XA for distributed systems.
      • Bitronix: Another lightweight transaction manager for Java applications supporting JTA/XA.
      • JBoss Transactions (Narayana): A robust Java transaction manager that supports 2PC, often used in conjunction with JBoss servers.

      3. Message Brokers

      Message brokers provide transaction capabilities with 2PC.

      • RabbitMQ: Provides transactional channels, which can serve as a building block for 2PC-style coordination.
      • Apache Kafka: Supports transactions, ensuring “exactly-once” semantics across producers and consumers.
      • ActiveMQ: Provides distributed transaction support through JTA integration.

      4. Workflow Engines

      Workflow engines can orchestrate 2PC across distributed systems.

      • Apache Camel: Can coordinate 2PC transactions using its transaction policy.
      • Camunda: Provides BPMN-based orchestration that can include transactional boundaries.
      • Zeebe: Supports distributed transaction workflows in modern architectures.

      Key Properties of 2PC

      1. Atomicity: Ensures all-or-nothing transaction behavior.
      2. Consistency: Guarantees system consistency across all nodes.
      3. Durability: Uses logs to ensure decisions survive node failures.

      Challenges of 2PC

      1. Blocking Nature: If the coordinator fails during the commit phase, participants must wait indefinitely unless a timeout or external mechanism is implemented.
      2. Performance Overhead: Multiple message exchanges and logging operations introduce latency.
      3. Single Point of Failure: The coordinator’s failure can stall the entire transaction.

      POTD #2 Search in a Row-Column sorted matrix | Geeks For Geeks

      22 December 2024 at 05:50

      Second day of POTD on Geeks For Geeks: https://www.geeksforgeeks.org/problems/search-in-a-matrix17201720/1.

      Given a 2D integer matrix mat[][] of size n x m, where every row and column is sorted in increasing order, and a number x, the task is to find whether element x is present in the matrix.

      Examples:

      
      Input: mat[][] = [[3, 30, 38],[20, 52, 54],[35, 60, 69]], x = 62
      Output: false
      Explanation: 62 is not present in the matrix, so output is false.
      
      
      Input: mat[][] = [[18, 21, 27],[38, 55, 67]], x = 55
      Output: true
      Explanation: 55 is present in the matrix.
      

      My Approach

      The question states that every row in the matrix is sorted in ascending order, so we can use binary search to look for the element in each row.
      
      So:
      
      1. Iterate over each row of the matrix.
      2. Search for the element in the row using binary search.

      # User function Template for python3
      
      class Solution:
      
          def binary_search(self, arr, x, start, stop):
              # Standard recursive binary search over arr[start..stop]
              if start > stop:
                  return False
              mid = (start + stop) // 2
              if arr[mid] == x:
                  return True
              elif arr[mid] > x:
                  return self.binary_search(arr, x, start, mid - 1)
              else:
                  return self.binary_search(arr, x, mid + 1, stop)
      
          def matSearch(self, mat, x):
              # Binary-search each sorted row; O(n log m) overall
              for arr in mat:
                  if self.binary_search(arr, x, 0, len(arr) - 1):
                      return True
              return False
      
      
      
