Learning Notes #50 – Fixed Partition Pattern | Distributed Pattern

9 January 2025 at 16:51

Today, i learnt about fixed partition, where it handles about balancing the data among servers without high movement of data. In this blog, i jot down notes on how fixed partition helps in solving the problem.

This entire blog is inspired from https://www.linkedin.com/pulse/distributed-systems-design-pattern-fixed-partitions-retail-kumar-v-c34pc/?trackingId=DMovSwEZSfCzKZEKa7yJrg%3D%3D

Problem Statement

In a distributed key-value store system, data items need to be mapped to a set of cluster nodes to ensure efficient storage and retrieval. The system must satisfy the following requirements,

Uniform Distribution: Data should be evenly distributed across all cluster nodes to avoid overloading any single node.
Deterministic Mapping: Given a data item, the specific node responsible for storing it should be determinable without querying all the nodes in the cluster.

A common approach to achieve these goals is to use hashing with a modulo operation. For example, if there are three nodes in the cluster, the key is hashed, and the hash value modulo the number of nodes determines the node to store the data. However, this method has a critical drawback,

Rebalancing Issue: When the cluster size changes (e.g., nodes are added or removed), the mapping for most keys changes. This requires the system to move almost all the data to new nodes, leading to significant overhead in terms of time and resources, especially when dealing with large data volumes.

Challenge: How can we design a mapping mechanism that minimizes data movement during cluster size changes while maintaining uniform distribution and deterministic mapping?

Solution

There is a concept of Fixed Partitioning,

What Is Fixed Partitioning?

This pattern organizes data into a predefined number of fixed partitions that remain constant over time. Data is assigned to these partitions using a hashing algorithm, ensuring that the mapping of data to partitions is permanent. The system separates the fixed partitioning of data from the physical servers managing these partitions, enabling seamless scaling.

Key Features of Fixed Partitioning

Fixed Number of Partitions
- The number of partitions is determined during system initialization (e.g., 8 partitions).
- Data is assigned to these partitions based on a consistent hashing algorithm.
Stable Data Mapping
- Each piece of data is permanently mapped to a specific partition.
- This eliminates the need for large-scale data reshuffling when scaling the system.
Adjustable Partition-to-Server Mapping
- Partitions can be reassigned to different servers as the system scales.
- Only the physical location of the partitions changes; the fixed mapping remains intact.
Balanced Load Distribution
- Partitions are distributed evenly across servers to balance the workload.
- Adding new servers involves reassigning partitions without moving or reorganizing data within the partitions.

Naive Example

We have a banking system with transactions stored in 8 fixed partitions, distributed based on a customer’s account ID.


CREATE TABLE transactions (
    id SERIAL PRIMARY KEY,
    account_id INT NOT NULL,
    transaction_amount NUMERIC(10, 2) NOT NULL,
    transaction_date DATE NOT NULL
) PARTITION BY HASH (account_id);

1. Create Partition


DO $$
BEGIN
    FOR i IN 0..7 LOOP
        EXECUTE format(
            'CREATE TABLE transactions_p%s PARTITION OF transactions FOR VALUES WITH (modulus 8, remainder %s);',
            i, i
        );
    END LOOP;
END $$;

This creates 8 partitions (transactions_p0 to transactions_p7) based on the hash remainder of account_id modulo 8.

2. Inserting Data

When inserting data into the transactions table, PostgreSQL automatically places it into the correct partition based on the account_id.


INSERT INTO transactions (account_id, transaction_amount, transaction_date)
VALUES (12345, 500.00, '2025-01-01');

The hash of 12345 % 8 determines the target partition (e.g., transactions_p5).

3. Querying Data

Querying the base table works transparently across all partitions


SELECT * FROM transactions WHERE account_id = 12345;

PostgreSQL automatically routes the query to the correct partition.

4. Scaling by Adding Servers

Initial Setup:

Suppose we have 4 servers managing the partitions,

Server 1: transactions_p0, transactions_p1
Server 2: transactions_p2, transactions_p3
Server 3: transactions_p4, transactions_p5
Server 4: transactions_p6, transactions_p7

Adding a New Server:

When a 5th server is added, we redistribute partitions,

Server 1: transactions_p0
Server 2: transactions_p1
Server 3: transactions_p2, transactions_p3
Server 4: transactions_p4
Server 5: transactions_p5, transactions_p6, transactions_p7

Partition Migration

During the migration, transactions_p5 is copied from Server 3 to Server 5.
Once the migration is complete, Server 5 becomes responsible for transactions_p5.

Benefits:

Minimal Data Movement – When scaling, only the partitions being reassigned are copied to new servers. Data within partitions remains stable.
Optimized Performance – Queries are routed directly to the relevant partition, minimizing scan times.
Scalability – Adding servers is straightforward, as it involves reassigning partitions, not reorganizing data.

What happens when a new server is added then. Don’t we need to copy the data ?

When a partition is moved to a new server (e.g., partition_b from server_A to server_B), the data in the partition must be copied to the new server. However,

The copying is limited to the partition being reassigned.
No data within the partition is reorganized.
Once the partition is fully migrated, the original copy is typically deleted.

For example, in PostgreSQL,

Export the Partition pg_dump -t partition_b -h server_A -U postgres > partition_b.sql
Import on New Server: psql -h server_B -U postgres -d mydb < partition_b.sql

Reading view