❌

Normal view

There are new articles available, click to refresh the page.
Before yesterdayMain stream

Learning Notes #28 – Unlogged Table in Postgres

2 January 2025 at 17:30

Today, As part of daily reading, i came across https://raphaeldelio.com/2024/07/14/can-postgres-replace-redis-as-a-cache/ where they discussing about postgres as a cache ! and comparing it with redis !! I was surprised at the title so gave a read through. Then i came across a concept of UNLOGGED table which act as a fast retrieval as cache. In this blog i jot down notes on unlogged table for future reference.

Highly Recommended Links: https://martinheinz.dev/blog/105, https://raphaeldelio.com/2024/07/14/can-postgres-replace-redis-as-a-cache/, https://www.crunchydata.com/blog/postgresl-unlogged-tables

Unlogged tables offer unique benefits in scenarios where speed is paramount, and durability (the guarantee that data is written to disk and will survive crashes) is not critical.

What Are Unlogged Tables?

Postgres Architecture : https://miro.com/app/board/uXjVLD2T5os=/

In PostgreSQL, a table is a basic unit of data storage. By default, PostgreSQL ensures that data in regular tables is durable. This means that all data is written to the disk and will survive server crashes. However, in some situations, durability is not necessary. Unlogged tables are special types of tables in PostgreSQL where the database does not write data changes to the WAL (Write-Ahead Log).

The absence of WAL logging for unlogged tables makes them faster than regular tables because PostgreSQL doesn’t need to ensure data consistency across crashes for these tables. However, this also means that if the server crashes or the system is powered off, the data in unlogged tables is lost.

Key Characteristics of Unlogged Tables

  1. No Write-Ahead Logging (WAL) – By default, PostgreSQL writes changes to the WAL to ensure data durability. For unlogged tables, this step is skipped, making operations like INSERTs, UPDATEs, and DELETEs faster.
  2. No Durability – The absence of WAL means that unlogged tables will lose their data if the database crashes or if the server is restarted. This makes them unsuitable for critical data.
  3. Faster Performance – Since WAL writes are skipped, unlogged tables are faster for data insertion and modification. This can be beneficial for use cases where data is transient and doesn’t need to persist beyond the current session.
  4. Support for Indexes and Constraints – Unlogged tables can have indexes and constraints like regular tables. However, the data in these tables is still non-durable.
  5. Automatic Cleanup – When the PostgreSQL server restarts, the data in unlogged tables is automatically dropped. Therefore, unlogged tables only hold data during the current database session.

Drawbacks of Unlogged Tables

  1. Data Loss on Crash – The most significant disadvantage of unlogged tables is the loss of data in case of a crash or restart. If the application depends on this data, then using unlogged tables would not be appropriate.
  2. Not Suitable for Critical Applications – Applications that require data persistence (such as financial or inventory systems) should avoid using unlogged tables, as the risk of data loss outweighs any performance benefits.
  3. No Replication – Unlogged tables are not replicated in standby servers in a replication setup, as the data is not written to the WAL.

Creating an Unlogged Table

Creating an unlogged table is very straightforward in PostgreSQL. You simply need to add the UNLOGGED keyword when creating the table.


CREATE UNLOGGED TABLE temp_data (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    value INT
);

In this example, temp_data is an unlogged table. All operations performed on this table will not be logged to the WAL.

When to Avoid Unlogged Tables?

  • If you are working with critical data that needs to be durable and persistent across restarts.
  • If your application requires data replication, as unlogged tables are not replicated in standby servers.
  • If your workload involves frequent crash scenarios where data loss cannot be tolerated.

Examples

1. Temporary Storage for processing


CREATE UNLOGGED TABLE etl_staging (
    source_id INT,
    raw_data JSONB,
    processed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Insert raw data into the staging table
INSERT INTO etl_staging (source_id, raw_data)
VALUES 
    (1, '{"key": "value1"}'),
    (2, '{"key": "value2"}');

-- Perform transformations on the data
INSERT INTO final_table (id, key, value)
SELECT source_id, 
       raw_data->>'key' AS key, 
       'processed_value' AS value
FROM etl_staging;

-- Clear the staging table
TRUNCATE TABLE etl_staging;

2. Caching


CREATE UNLOGGED TABLE user_sessions (
    session_id UUID PRIMARY KEY,
    user_id INT,
    last_accessed TIMESTAMP DEFAULT NOW()
);

-- Insert session data
INSERT INTO user_sessions (session_id, user_id)
VALUES 
    (uuid_generate_v4(), 101),
    (uuid_generate_v4(), 102);

-- Update last accessed timestamp
UPDATE user_sessions
SET last_accessed = NOW()
WHERE session_id = 'some-session-id';

-- Delete expired sessions
DELETE FROM user_sessions WHERE last_accessed < NOW() - INTERVAL '1 hour';

Postgres – Write-Ahead Logging (WAL) in PostgreSQL

16 November 2024 at 07:06

Write-Ahead Logging (WAL) is a fundamental feature of PostgreSQL, ensuring data integrity and facilitating critical functionalities like crash recovery, replication, and backup.

This series of experimentation explores WAL in detail, its importance, how it works, and provides examples to demonstrate its usage.

What is Write-Ahead Logging (WAL)?

WAL is a logging mechanism where changes to the database are first written to a log file before being applied to the actual data files. This ensures that in case of a crash or unexpected failure, the database can recover and replay these logs to restore its state.

Your question is right !

Why do we need a WAL, when we do a periodic backup ?

Write-Ahead Logging (WAL) is critical even when periodic backups are in place because it complements backups to provide data consistency, durability, and flexibility in the following scenarios.

1. Crash Recovery

  • Why It’s Important: Periodic backups only capture the database state at specific intervals. If a crash occurs after the latest backup, all changes made since that backup would be lost.
  • Role of WAL: WAL ensures that any committed transactions not yet written to data files (due to PostgreSQL’s lazy-writing behavior) are recoverable. During recovery, PostgreSQL replays the WAL logs to restore the database to its last consistent state, bridging the gap between the last checkpoint and the crash.

Example:

  • Backup Taken: At 12:00 PM.
  • Crash Occurs: At 1:30 PM.
  • Without WAL: All changes after 12:00 PM are lost.
  • With WAL: All changes up to 1:30 PM are recovered.

2. Point-in-Time Recovery (PITR)

  • Why It’s Important: Periodic backups restore the database to the exact time of the backup. However, this may not be sufficient if you need to recover to a specific point, such as just before a mistake (e.g., accidental data deletion).
  • Role of WAL: WAL records every change, enabling you to replay transactions up to a specific time. This allows fine-grained recovery beyond what periodic backups can provide.

Example:

  • Backup Taken: At 12:00 AM.
  • Mistake Made: At 9:45 AM, an important table is accidentally dropped.
  • Without WAL: Restore only to 12:00 AM, losing 9 hours and 45 minutes of data.
  • With WAL: Restore to 9:44 AM, recovering all valid changes except the accidental drop.

3. Replication and High Availability

  • Why It’s Important: In a high-availability setup, replicas must stay synchronized with the primary database to handle failovers. Periodic backups cannot provide real-time synchronization.
  • Role of WAL: WAL enables streaming replication by transmitting logs to replicas, ensuring near real-time synchronization.

Example:

  • A primary database sends WAL logs to replicas as changes occur. If the primary fails, a replica can quickly take over without data loss.

4. Handling Incremental Changes

  • Why It’s Important: Periodic backups store complete snapshots of the database, which can be time-consuming and resource-intensive. They also do not capture intermediate changes.
  • Role of WAL: WAL allows incremental updates by recording only the changes made since the last backup or checkpoint. This is crucial for efficient data recovery and backup optimization.

5. Ensuring Data Durability

  • Why It’s Important: Even during normal operations, a database crash (e.g., power failure) can occur. Without WAL, transactions committed by users but not yet flushed to disk are lost.
  • Role of WAL: WAL ensures durability by logging all changes before acknowledging transaction commits. This guarantees that committed transactions are recoverable even if the system crashes before flushing the changes to data files.

6. Supporting Hot Backups

  • Why It’s Important: For large, active databases, taking a backup while the database is running can result in inconsistent snapshots.
  • Role of WAL: WAL ensures consistency by recording changes that occur during the backup process. When replayed, these logs synchronize the backup, ensuring it is valid and consistent.

7. Debugging and Auditing

  • Why It’s Important: Periodic backups are static snapshots and don’t provide a record of what happened in the database between backups.
  • Role of WAL: WAL contains a sequential record of all database modifications, which can help in debugging issues or auditing transactions.
FeaturePeriodic BackupsWrite-Ahead Logging
Crash RecoveryLimited to the last backupEnsures full recovery to the crash point
Point-in-Time RecoveryRestores only to the backup timeAllows recovery to any specific point
ReplicationNot supportedEnables real-time replication
EfficiencyFull snapshotIncremental changes
DurabilityRelies on backup frequencyGuarantees transaction durability

In upcoming sessions, we will all experiment each one of the failure scenarios for understanding.

❌
❌