Learning Notes #35 – Durability in ACID | Postgres

4 January 2025 at 12:47

As part of the ACID series, I am refreshing the topic of Durability. In this blog I jot down my notes on durability for better understanding.

What is Durability?

Durability ensures that the effects of a committed transaction are permanently saved to the database. This property prevents data loss by ensuring that committed transactions survive unexpected interruptions such as power outages, crashes, or system reboots.

PostgreSQL achieves durability through a combination of:

  • Write-Ahead Logging (WAL): Changes are written to a log file before they are applied to the database.
  • Checkpointing: Periodic snapshots of the database state.
  • fsync and Synchronous Commit: Ensures data is physically written to disk.

How PostgreSQL Achieves Durability

Miro Board for Postgres Architecture – https://miro.com/app/board/uXjVLD2T5os=/

1. Write-Ahead Logging (WAL)

PostgreSQL uses WAL to ensure durability. Before modifying the actual data, it writes the changes to a WAL file. This ensures that even if the system crashes, the database can be recovered by replaying the WAL logs.


-- Enable WAL logging (default in PostgreSQL)
SHOW wal_level;
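
To see WAL being generated in real time, you can query the current WAL insert position. The functions below exist in PostgreSQL 10 and newer (older releases used the pg_current_xlog_* names).


-- Current write-ahead log write position
SELECT pg_current_wal_lsn();

-- Name of the WAL segment file that contains that position
SELECT pg_walfile_name(pg_current_wal_lsn());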

2. Checkpoints

A checkpoint is a point at which PostgreSQL flushes all modified (dirty) data pages from memory to disk, ensuring the on-disk state is up to date. Checkpoints reduce the time required for crash recovery by limiting the amount of WAL that needs to be replayed.


-- Force a manual checkpoint
CHECKPOINT;
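
Automatic checkpoints are governed by a few standard settings; a quick way to inspect them (the values you see depend on your installation's configuration):


-- Maximum time between automatic checkpoints
SHOW checkpoint_timeout;

-- WAL volume that triggers a checkpoint before the timeout elapses
SHOW max_wal_size;

-- Fraction of the checkpoint interval over which the writes are spread
SHOW checkpoint_completion_target;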

3. Synchronous Commit

By default, PostgreSQL ensures that changes are flushed to disk before a transaction is marked as committed. This is controlled by the synchronous_commit setting.


-- Show current synchronous commit setting
SHOW synchronous_commit;

-- Change synchronous commit setting
SET synchronous_commit = 'on';
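
Besides on and off, synchronous_commit accepts intermediate levels (local, remote_write, remote_apply) that mainly matter when synchronous standbys are configured; a brief sketch:


-- Flush locally, but do not wait for any standby
SET synchronous_commit = 'local';

-- Strictest: wait until synchronous standbys have applied the commit
SET synchronous_commit = 'remote_apply';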

4. Backup and Replication

To further ensure durability, PostgreSQL supports backups and replication. Logical and physical backups can be used to restore data in case of catastrophic failures.
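
A quick way to check whether WAL archiving and streaming replication are active on a given server (pg_stat_replication only returns rows while standbys are connected):


-- Is continuous WAL archiving enabled?
SHOW archive_mode;

-- Connected streaming replicas and their sync state, if any
SELECT client_addr, state, sync_state FROM pg_stat_replication;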

Practical Examples of Durability

Example 1: Ensuring Transaction Durability


BEGIN;

-- Update an account balance
UPDATE accounts SET balance = balance - 500 WHERE account_id = 1;

-- Commit the transaction
COMMIT;

-- Crash the system now; the committed transaction will persist.

Even if the database crashes immediately after the COMMIT, the changes will persist, as the transaction logs have already been written to disk.

Example 2: WAL Recovery after Crash

Suppose a crash occurs immediately after a transaction is committed.

Scenario:


BEGIN;
INSERT INTO transactions (account_id, amount, transaction_type) VALUES (1, 500, 'credit');
COMMIT;

During the recovery process, PostgreSQL replays the WAL logs to restore the committed transactions.
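
Once the server is back up and WAL replay has finished, the committed row is simply there; a minimal check (assuming the transactions table from the scenario above):


-- The row inserted just before the crash survives recovery
SELECT account_id, amount, transaction_type
FROM transactions
WHERE account_id = 1;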

Example 3: Configuring Synchronous Commit

Control durability settings based on performance and reliability needs.


-- Use asynchronous commit for faster performance (risking durability)
SET synchronous_commit = 'off';

-- Perform a transaction
BEGIN;
UPDATE accounts SET balance = balance + 200 WHERE account_id = 2;
COMMIT;

-- Changes might be lost if the system crashes before the WAL is flushed.

Trade-offs of Durability

While durability ensures data persistence, it can affect database performance. For example:

  • Enforcing synchronous commits may slow down transactions.
  • Checkpointing can momentarily impact query performance due to disk I/O.

For high-performance systems, durability settings can be fine-tuned based on the application’s tolerance for potential data loss.
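
One common fine-tuning pattern is to relax durability only for transactions that can tolerate a small loss window, while leaving the server-wide default strict. A sketch (the audit_log table is hypothetical):


BEGIN;
-- Applies to this transaction only; the session and server defaults stay 'on'
SET LOCAL synchronous_commit = 'off';
INSERT INTO audit_log (event, created_at) VALUES ('page_view', NOW());
COMMIT;  -- returns without waiting for the WAL flush to complete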

Durability and Other ACID Properties

Durability works closely with the other ACID properties:

  1. Atomicity: Ensures the all-or-nothing nature of transactions.
  2. Consistency: Guarantees the database remains in a valid state after a transaction.
  3. Isolation: Prevents concurrent transactions from interfering with each other.

Learning Notes #28 – Unlogged Table in Postgres

2 January 2025 at 17:30

Today, as part of my daily reading, I came across https://raphaeldelio.com/2024/07/14/can-postgres-replace-redis-as-a-cache/ where they discuss using Postgres as a cache and compare it with Redis! I was surprised by the title, so I gave it a read. There I came across the concept of an UNLOGGED table, which can serve as fast storage for cache-like data. In this blog I jot down notes on unlogged tables for future reference.

Highly Recommended Links: https://martinheinz.dev/blog/105, https://raphaeldelio.com/2024/07/14/can-postgres-replace-redis-as-a-cache/, https://www.crunchydata.com/blog/postgresl-unlogged-tables

Unlogged tables offer unique benefits in scenarios where speed is paramount, and durability (the guarantee that data is written to disk and will survive crashes) is not critical.

What Are Unlogged Tables?

Postgres Architecture : https://miro.com/app/board/uXjVLD2T5os=/

In PostgreSQL, a table is a basic unit of data storage. By default, PostgreSQL ensures that data in regular tables is durable. This means that all data is written to the disk and will survive server crashes. However, in some situations, durability is not necessary. Unlogged tables are special types of tables in PostgreSQL where the database does not write data changes to the WAL (Write-Ahead Log).

The absence of WAL logging for unlogged tables makes them faster than regular tables because PostgreSQL doesn’t need to ensure data consistency across crashes for these tables. However, this also means that if the server crashes or the system is powered off, the data in unlogged tables is lost.

Key Characteristics of Unlogged Tables

  1. No Write-Ahead Logging (WAL) – By default, PostgreSQL writes changes to the WAL to ensure data durability. For unlogged tables, this step is skipped, making operations like INSERTs, UPDATEs, and DELETEs faster.
  2. No Durability – The absence of WAL means that unlogged tables lose their data if the database crashes or shuts down uncleanly. This makes them unsuitable for critical data.
  3. Faster Performance – Since WAL writes are skipped, unlogged tables are faster for data insertion and modification. This is beneficial for transient data that does not need to survive a crash.
  4. Support for Indexes and Constraints – Unlogged tables can have indexes and constraints like regular tables. However, the data in these tables is still non-durable.
  5. Automatic Cleanup – The contents of an unlogged table are automatically truncated after a crash or unclean shutdown. (Data does survive a clean restart, but there is no crash-safety guarantee, so unlogged tables should only hold data you can afford to lose.)

Drawbacks of Unlogged Tables

  1. Data Loss on Crash – The most significant disadvantage of unlogged tables is the loss of data in case of a crash or unclean shutdown. If the application depends on this data, unlogged tables are not appropriate.
  2. Not Suitable for Critical Applications – Applications that require data persistence (such as financial or inventory systems) should avoid using unlogged tables, as the risk of data loss outweighs any performance benefits.
  3. No Replication – Unlogged tables are not replicated in standby servers in a replication setup, as the data is not written to the WAL.

Creating an Unlogged Table

Creating an unlogged table is very straightforward in PostgreSQL. You simply need to add the UNLOGGED keyword when creating the table.


CREATE UNLOGGED TABLE temp_data (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    value INT
);

In this example, temp_data is an unlogged table. All operations performed on this table will not be logged to the WAL.
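
An existing table can also be switched between the two modes (PostgreSQL 9.5 and newer). Note that SET LOGGED has to write the whole table into the WAL, so it can be slow on large tables:


-- Promote the unlogged table to a regular, durable table
ALTER TABLE temp_data SET LOGGED;

-- Demote it back to unlogged (data becomes non-durable again)
ALTER TABLE temp_data SET UNLOGGED;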

When to Avoid Unlogged Tables?

  • If you are working with critical data that needs to be durable and persistent across restarts.
  • If your application requires data replication, as unlogged tables are not replicated in standby servers.
  • If your workload involves frequent crash scenarios where data loss cannot be tolerated.

Examples

1. Temporary Storage for Processing


CREATE UNLOGGED TABLE etl_staging (
    source_id INT,
    raw_data JSONB,
    processed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Insert raw data into the staging table
INSERT INTO etl_staging (source_id, raw_data)
VALUES 
    (1, '{"key": "value1"}'),
    (2, '{"key": "value2"}');

-- Perform transformations on the data
-- (assumes a destination table final_table(id, key, value) already exists)
INSERT INTO final_table (id, key, value)
SELECT source_id, 
       raw_data->>'key' AS key, 
       'processed_value' AS value
FROM etl_staging;

-- Clear the staging table
TRUNCATE TABLE etl_staging;

2. Caching


CREATE UNLOGGED TABLE user_sessions (
    session_id UUID PRIMARY KEY,
    user_id INT,
    last_accessed TIMESTAMP DEFAULT NOW()
);

-- Insert session data
-- Note: uuid_generate_v4() requires the uuid-ossp extension
-- (CREATE EXTENSION IF NOT EXISTS "uuid-ossp";), or use the built-in
-- gen_random_uuid() on PostgreSQL 13+
INSERT INTO user_sessions (session_id, user_id)
VALUES 
    (uuid_generate_v4(), 101),
    (uuid_generate_v4(), 102);

-- Update last accessed timestamp
UPDATE user_sessions
SET last_accessed = NOW()
WHERE session_id = 'some-session-id';

-- Delete expired sessions
DELETE FROM user_sessions WHERE last_accessed < NOW() - INTERVAL '1 hour';
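
To double-check which tables are unlogged, the relpersistence column in pg_class records each relation's persistence mode ('p' = permanent, 'u' = unlogged, 't' = temporary):


-- List all unlogged tables in the current database
SELECT relname
FROM pg_class
WHERE relpersistence = 'u' AND relkind = 'r';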
