
Learning Notes #35 – Durability in ACID | Postgres

4 January 2025 at 12:47

As part of the ACID series, I am refreshing the topic of durability. In this blog, I jot down notes on durability for better understanding.

What is Durability?

Durability ensures that the effects of a committed transaction are permanently saved to the database. This property prevents data loss by ensuring that committed transactions survive unexpected interruptions such as power outages, crashes, or system reboots.

PostgreSQL achieves durability through a combination of:

  • Write-Ahead Logging (WAL): Changes are written to a log file before they are applied to the database.
  • Checkpointing: Periodic snapshots of the database state.
  • fsync and Synchronous Commit: Ensure data is physically written to disk before a commit is acknowledged.

How PostgreSQL Achieves Durability

Miro Board for Postgres Architecture – https://miro.com/app/board/uXjVLD2T5os=/

1. Write-Ahead Logging (WAL)

PostgreSQL uses WAL to ensure durability. Before modifying the actual data, it writes the changes to a WAL file. This ensures that even if the system crashes, the database can be recovered by replaying the WAL logs.


-- Enable WAL logging (default in PostgreSQL)
SHOW wal_level;
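The write-ahead rule can be sketched in a few lines. This is an illustration of the idea only, not PostgreSQL's actual WAL format; the file name and record layout here are made up for the sketch.

```python
import json
import os

# Minimal write-ahead-log sketch. The rule WAL enforces: append the change
# to the log and flush it to stable storage *before* touching the data.
LOG_PATH = "wal.log"   # hypothetical file name for this sketch
DATA = {}              # stands in for the table data


def commit(key, value):
    # 1. Write the intended change to the log first, and fsync it...
    with open(LOG_PATH, "a") as log:
        log.write(json.dumps({"key": key, "value": value}) + "\n")
        log.flush()
        os.fsync(log.fileno())  # force the record onto disk
    # 2. ...only then apply it to the data. A crash between these two
    # steps loses nothing: recovery replays the log.
    DATA[key] = value


commit("account:1", 500)
print(DATA["account:1"])  # 500
```

The ordering is the whole point: once the log record is durable, the in-memory change is recoverable no matter when the crash happens.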

2. Checkpoints

A checkpoint is a mechanism where the database writes all changes to disk, ensuring the database’s state is up-to-date. Checkpoints reduce the time required for crash recovery by limiting the number of WAL files that need to be replayed.


-- Force a manual checkpoint
CHECKPOINT;
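Why checkpoints shorten recovery can also be shown with a toy model (again an illustration, not PostgreSQL internals; the paths are hypothetical): once a snapshot of the state is safely on disk, the WAL records that preceded it are no longer needed for replay.

```python
import json
import os

WAL_PATH, STATE_PATH = "wal.log", "state.json"  # hypothetical paths


def checkpoint(state):
    # Persist a full snapshot of the current data...
    with open(STATE_PATH, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    # ...then truncate the WAL: recovery now starts from the snapshot
    # instead of replaying the whole log from the beginning.
    open(WAL_PATH, "w").close()


# Simulate some WAL traffic, then take a checkpoint.
with open(WAL_PATH, "w") as log:
    log.write('{"key": "a", "value": 1}\n')
checkpoint({"a": 1})
print(os.path.getsize(WAL_PATH))  # 0 -- nothing left to replay
```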

3. Synchronous Commit

By default, PostgreSQL ensures that changes are flushed to disk before a transaction is marked as committed. This is controlled by the synchronous_commit setting.


-- Show current synchronous commit setting
SHOW synchronous_commit;

-- Change synchronous commit setting
SET synchronous_commit = 'on';

4. Backup and Replication

To further ensure durability, PostgreSQL supports backups and replication. Logical and physical backups can be used to restore data in case of catastrophic failures.

Practical Examples of Durability

Example 1: Ensuring Transaction Durability


BEGIN;

-- Update an account balance
UPDATE accounts SET balance = balance - 500 WHERE account_id = 1;

-- Commit the transaction
COMMIT;

-- Crash the system now; the committed transaction will persist.

Even if the database crashes immediately after the COMMIT, the changes will persist, as the transaction logs have already been written to disk.

Example 2: WAL Recovery after Crash

Suppose a crash occurs immediately after a transaction is committed.

Scenario:


BEGIN;
INSERT INTO transactions (account_id, amount, transaction_type) VALUES (1, 500, 'credit');
COMMIT;

During the recovery process, PostgreSQL replays the WAL logs to restore the committed transactions.
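The replay step itself is simple in principle, as this sketch shows (illustrative only; real recovery works on binary WAL records, not JSON lines): the in-memory data is gone after the crash, but the surviving log is enough to rebuild every committed change.

```python
import json

# What survived on disk after the "crash": one record per committed change.
wal_records = [
    '{"key": "account:1", "value": 500}',
    '{"key": "account:2", "value": 200}',
]

data = {}                 # empty after the crash
for line in wal_records:  # replay committed records in order
    record = json.loads(line)
    data[record["key"]] = record["value"]

print(data)  # {'account:1': 500, 'account:2': 200}
```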

Example 3: Configuring Synchronous Commit

Control durability settings based on performance and reliability needs.


-- Use asynchronous commit for faster performance (risking durability)
SET synchronous_commit = 'off';

-- Perform a transaction
BEGIN;
UPDATE accounts SET balance = balance + 200 WHERE account_id = 2;
COMMIT;

-- Changes might be lost if the system crashes before the WAL is flushed.

Trade-offs of Durability

While durability ensures data persistence, it can affect database performance. For example:

  • Enforcing synchronous commits may slow down transactions.
  • Checkpointing can momentarily impact query performance due to disk I/O.

For high-performance systems, durability settings can be fine-tuned based on the application’s tolerance for potential data loss.

Durability and Other ACID Properties

Durability works closely with the other ACID properties:

  1. Atomicity: Ensures the all-or-nothing nature of transactions.
  2. Consistency: Guarantees the database remains in a valid state after a transaction.
  3. Isolation: Prevents concurrent transactions from interfering with each other.

Learning Notes #34 – Consistency (Correctness) in ACID | Postgres

4 January 2025 at 12:37

As part of the ACID series, I am refreshing on consistency. In this blog, I jot down notes on consistency (correctness) in the Postgres database.

What is Consistency?

Consistency ensures that a transaction brings the database from one valid state to another, adhering to predefined rules such as constraints, triggers, and relational integrity. If a transaction violates these rules, it is aborted, and the database remains unchanged. This guarantees that only valid data exists in the database.

Consistency works together with other ACID properties:

  • Atomicity ensures the β€œall-or-nothing” execution of a transaction.
  • Isolation ensures transactions don’t interfere with each other.
  • Durability guarantees committed transactions persist despite system failures.

Key Aspects of Consistency in PostgreSQL

  1. Constraints
    • Primary Key: Ensures uniqueness of rows.
    • Foreign Key: Maintains referential integrity.
    • Check Constraints: Enforces custom business rules.
    • Not Null: Ensures that specific columns cannot have null values.
  2. Triggers
    • Custom logic executed before or after specific database events.
  3. Rules
    • Enforce application-specific invariants on the database.
  4. Transactions
    • Changes are made in a controlled environment, ensuring consistency even in the event of errors or system failures.

Practical Examples of Consistency in PostgreSQL

1. Primary Key Constraint

Ensures that no two rows in a table have the same primary key value.


CREATE TABLE accounts (
    account_id SERIAL PRIMARY KEY,
    account_holder_name VARCHAR(255) NOT NULL,
    balance NUMERIC(15, 2) NOT NULL CHECK (balance >= 0)
);

-- Attempt to insert duplicate primary keys.
INSERT INTO accounts (account_id, account_holder_name, balance)
VALUES (1, 'Alice', 1000.00);

INSERT INTO accounts (account_id, account_holder_name, balance)
VALUES (1, 'Bob', 2000.00); -- This will fail.

2. Foreign Key Constraint

Enforces referential integrity between tables.


CREATE TABLE transactions (
    transaction_id SERIAL PRIMARY KEY,
    account_id INT NOT NULL REFERENCES accounts(account_id),
    amount NUMERIC(15, 2) NOT NULL,
    transaction_type VARCHAR(10) NOT NULL CHECK (transaction_type IN ('credit', 'debit')),
    transaction_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Attempt to insert a transaction for a non-existent account.
INSERT INTO transactions (account_id, amount, transaction_type)
VALUES (999, 500, 'credit'); -- This will fail.

3. Check Constraint

Validates custom business rules.


-- Ensure account balance cannot go negative.
INSERT INTO accounts (account_holder_name, balance)
VALUES ('Charlie', -500); -- This will fail due to the CHECK constraint.

4. Trigger for Business Logic

Ensures derived data or additional checks are implemented.


CREATE OR REPLACE FUNCTION enforce_minimum_balance()
RETURNS TRIGGER AS $$
BEGIN
    IF NEW.balance < 0 THEN
        RAISE EXCEPTION 'Balance cannot be negative';
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER check_balance_before_insert
BEFORE INSERT OR UPDATE ON accounts
FOR EACH ROW EXECUTE FUNCTION enforce_minimum_balance();

-- Attempt to update an account with a negative balance.
UPDATE accounts SET balance = -100 WHERE account_id = 1; -- This will fail.

5. Transactions to Maintain Consistency

A transaction groups multiple operations into a single unit, ensuring all succeed or none.


BEGIN;

-- Deduct from sender's account.
UPDATE accounts SET balance = balance - 500 WHERE account_id = 1;

-- Credit to receiver's account.
UPDATE accounts SET balance = balance + 500 WHERE account_id = 2;

-- If any operation fails, rollback the transaction.
COMMIT;

If the system crashes before the COMMIT, the database remains unchanged, ensuring consistency.
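The same transfer can be rehearsed end-to-end using SQLite as a lightweight stand-in for Postgres (an illustration only; the table and amounts are made up to match the example above). A failing statement inside the transaction rolls everything back, leaving both balances untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (account_id INTEGER PRIMARY KEY, "
    "balance NUMERIC NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES (1, 1000), (2, 0)")
conn.commit()

try:
    with conn:  # opens a transaction; rolls back if the block raises
        conn.execute(
            "UPDATE accounts SET balance = balance - 500 WHERE account_id = 1"
        )
        # This second update drives the balance negative, violating the
        # CHECK constraint -- the whole transaction is rolled back.
        conn.execute(
            "UPDATE accounts SET balance = balance - 9999 WHERE account_id = 1"
        )
except sqlite3.IntegrityError:
    pass  # rolled back; no partial transfer is visible

balances = dict(conn.execute("SELECT account_id, balance FROM accounts"))
print(balances)  # {1: 1000, 2: 0} -- unchanged
```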

How Consistency Works with Other ACID Properties

  1. With Atomicity: If any step in a transaction violates a constraint, the entire transaction is rolled back, ensuring that the database remains consistent.
  2. With Isolation: Concurrent transactions operate independently, preventing inconsistent states caused by interference.
  3. With Durability: Once a transaction is committed, its consistency guarantees persist even in the event of a crash.

Benefits of Consistency

  1. Data Integrity: Prevents invalid data from being stored.
  2. Application Reliability: Reduces the need for additional application-level checks.
  3. Simplified Maintenance: Developers can rely on the database to enforce business rules and relationships.
  4. Error Prevention: Constraints and triggers act as safeguards, catching mistakes early.

Learning Notes #29 – Two Phase Commit Protocol | ACID in Distributed Systems

3 January 2025 at 13:45

Today, I learnt about the compensating transaction pattern, which led me to the Two-Phase Commit protocol, which helps maintain the atomicity of distributed transactions. Distributed transactions are hard.

In this blog, I jot down notes on the Two-Phase Commit protocol for better understanding.

The Two-Phase Commit (2PC) protocol is a distributed algorithm used to ensure atomicity in transactions spanning multiple nodes or databases. Atomicity ensures that either all parts of a transaction are committed or none are, maintaining consistency in distributed systems.

Why Two-Phase Commit?

In distributed systems, a transaction might involve several independent nodes, each maintaining its own database. Without a mechanism like 2PC, failures in one node can leave the system in an inconsistent state.

For example, consider an e-commerce platform where a customer places an order.

The transaction involves updating the inventory in one database, recording the payment in another, and generating a shipment request in a third system. If the payment database successfully commits but the inventory database fails, the system becomes inconsistent, potentially causing issues like double selling or incomplete orders. 2PC mitigates this by providing a coordinated protocol to commit or abort transactions across all nodes.

The Phases of 2PC

The protocol operates in two main phases:

1. Prepare Phase (Voting Phase)

The coordinator node initiates the transaction and prepares to commit it across all participating nodes.

  1. Request to Prepare: The coordinator sends a PREPARE request to all participant nodes.
  2. Vote: Each participant checks if it can commit the transaction (e.g., no constraints violated, resources available). It logs its decision (YES or NO) locally and sends its vote to the coordinator. If any participant votes NO, the transaction cannot be committed.

2. Commit Phase (Decision Phase)

Based on the votes received in the prepare phase, the coordinator decides the final outcome.

Commit Decision:

If all participants vote YES, the coordinator logs a COMMIT decision, sends COMMIT messages to all participants, and participants apply the changes and confirm with an acknowledgment.

Abort Decision:

If any participant votes NO, the coordinator logs an ABORT decision, sends ABORT messages to all participants, and participants roll back any changes made during the transaction.
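The two phases above can be sketched as a small in-memory simulation. The `Participant` and `two_phase_commit` names are hypothetical, invented for this sketch; a real coordinator would also log its decisions durably and handle timeouts.

```python
class Participant:
    """One node in the distributed transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # whether this node will vote YES
        self.state = "idle"

    def prepare(self):
        # Phase 1: check local constraints/resources, log the vote, reply.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        # Phase 2: apply the changes permanently.
        self.state = "committed"

    def abort(self):
        # Phase 2: roll back anything done so far.
        self.state = "aborted"


def two_phase_commit(participants):
    # Phase 1 (voting): collect a YES/NO vote from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2 (decision): COMMIT only if every vote was YES, else ABORT.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"


inventory = Participant("inventory")
payments = Participant("payments", can_commit=False)  # votes NO
print(two_phase_commit([inventory, payments]))        # aborted
print(inventory.state)  # aborted -- no node commits unilaterally
```

Note how the inventory node, although perfectly able to commit, ends up aborted too: one NO vote dooms the whole transaction, which is exactly the all-or-nothing guarantee.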

Implementation:

For a simple implementation of 2PC, we can try out the flow below using RabbitMQ as the medium for the coordinator.

In practice, we need not write this from scratch; existing tools already provide it:

1. Relational Databases

Most relational databases have built-in support for distributed transactions and 2PC.

  • PostgreSQL: Implements distributed transactions using foreign data wrappers (FDWs) with PREPARE TRANSACTION and COMMIT PREPARED.
  • MySQL: Supports XA transactions, which follow the 2PC protocol.
  • Oracle Database: Offers robust distributed transaction support using XA.
  • Microsoft SQL Server: Provides distributed transactions through MS-DTC.

2. Distributed Transaction Managers

These tools manage distributed transactions across multiple systems.

  • Atomikos: A popular Java-based transaction manager supporting JTA/XA for distributed systems.
  • Bitronix: Another lightweight transaction manager for Java applications supporting JTA/XA.
  • JBoss Transactions (Narayana): A robust Java transaction manager that supports 2PC, often used in conjunction with JBoss servers.

3. Message Brokers

Message brokers provide transaction capabilities with 2PC.

  • RabbitMQ: Supports the 2PC protocol using transactional channels.
  • Apache Kafka: Supports transactions, ensuring β€œexactly-once” semantics across producers and consumers.
  • ActiveMQ: Provides distributed transaction support through JTA integration.

4. Workflow Engines

Workflow engines can orchestrate 2PC across distributed systems.

  • Apache Camel: Can coordinate 2PC transactions using its transaction policy.
  • Camunda: Provides BPMN-based orchestration that can include transactional boundaries.
  • Zeebe: Supports distributed transaction workflows in modern architectures.

Key Properties of 2PC

  1. Atomicity: Ensures all-or-nothing transaction behavior.
  2. Consistency: Guarantees system consistency across all nodes.
  3. Durability: Uses logs to ensure decisions survive node failures.

Challenges of 2PC

  1. Blocking Nature: If the coordinator fails during the commit phase, participants must wait indefinitely unless a timeout or external mechanism is implemented.
  2. Performance Overhead: Multiple message exchanges and logging operations introduce latency.
  3. Single Point of Failure: The coordinator’s failure can stall the entire transaction.
