Normal view

There are new articles available, click to refresh the page.

Before yesterdayMain stream

Parotta Salna
Learning Notes #21 – Sharding (database) with Postgres
31 December 2024 at 07:54

Learning Notes #21 – Sharding (database) with Postgres

By: Mr.ParottaSalna

31 December 2024 at 07:54

…contd. Early Morning today, i watched a video on partitioning and sharding. In that video, Arpit explained the limitation of Vertical Scaling and ways to infinite scale DB with Sharding and Partitioning. In this blog, i jot down notes on partioining with single node implementation with postgres for my future self.

As the volume of data grows, managing databases efficiently becomes critical and when we understood that vertical scaling has its limits, we have two common strategies to handle large datasets are partitioning and sharding. While they may sound similar, these techniques serve different purposes and are implemented differently. Let’s explore these concepts in detail.

What is Sharding?

Sharding is a type of database architecture where data is horizontally divided across multiple database instances, called shards. Each shard is an independent database, often hosted on separate servers. Sharding is commonly used to scale out databases.

How Sharding Works

Shard Key
- A shard key determines how data is distributed across shards.
- Example: A social media app might use user IDs as the shard key to ensure all data related to a user resides in the same shard.
Data Distribution
- Data is split horizontally; each shard contains a subset of the entire dataset.
- Shards can be distributed geographically or across servers for better performance.

Combining Partitioning and Sharding

In some advanced architectures, partitioning and sharding are combined. Here, partitioned data is further distributed across shards. Each shard can manage its partitions independently, providing both scalability and query optimization.

PostgreSQL Example with Citus

1. Install the Citus extension (https://www.citusdata.com/download/, https://github.com/citusdata/citus?tab=readme-ov-file#getting-started)

2. Create a distributed table with partitioning

CREATE TABLE orders (
    id SERIAL,
    customer_id INT,
    order_date DATE NOT NULL,
    PRIMARY KEY (id, order_date)
) PARTITION BY RANGE (order_date);

SELECT create_distributed_table('orders', 'customer_id');

CREATE TABLE orders_jan PARTITION OF orders
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

CREATE TABLE orders_feb PARTITION OF orders
    FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');

3. Add worker nodes to manage shards

SELECT master_add_node('worker1', 5432);
SELECT master_add_node('worker2', 5432);

4. Data placement

The data for each partition (e.g., orders_jan) is distributed across the shards (worker1, worker2) based on the shard key (customer_id).

5. Insert and query data

INSERT INTO orders (customer_id, order_date) VALUES (101, '2024-01-15');
SELECT * FROM orders WHERE customer_id = 101;

Parotta Salna
Learning Notes #20 – Partitioning (data) With Postgres
31 December 2024 at 06:55

Learning Notes #20 – Partitioning (data) With Postgres

Parotta Salna

By: Mr.ParottaSalna

31 December 2024 at 06:55

Early Morning today, i watched a video on partitioning and sharding. In that video, Arpit explained the limitation of Vertical Scaling and ways to infinite scale DB with Sharding and Partitioning. In this blog, i jot down notes on partioining with single node implementation with postgres for my future self.

What is Partitioning?

Partitioning involves dividing a large dataset into smaller, manageable segments, known as partitions. Each partition is stored separately but remains part of a single database instance. Partitioning is typically used to improve query performance and manageability.

Types of Partitioning

1. Range Partitioning

Data is divided based on ranges of a column’s values.
Example: A table storing customer orders might partition data by order date: January orders in one partition, February orders in another.

PostgreSQL Example

CREATE TABLE orders (
    id SERIAL,
    customer_id INT,
    order_date DATE NOT NULL,
    PRIMARY KEY (id, order_date) -- Include the partition key
) PARTITION BY RANGE (order_date);

CREATE TABLE orders_jan PARTITION OF orders
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

CREATE TABLE orders_feb PARTITION OF orders
    FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');

2. Hash Partitioning

A hash function determines the partition where a record will be stored.
Example: Orders can be distributed across partitions based on the hash of the customer ID.

Postgres Example

CREATE TABLE orders (
    id SERIAL ,
    customer_id INT,
    order_date DATE NOT NULL,
    PRIMARY KEY (id, customer_id)
) PARTITION BY HASH (customer_id, id);

CREATE TABLE orders_part_1 PARTITION OF orders
    FOR VALUES WITH (MODULUS 2, REMAINDER 0);

CREATE TABLE orders_part_2 PARTITION OF orders
    FOR VALUES WITH (MODULUS 2, REMAINDER 1);

3. List Partitioning

Data is divided based on a predefined list of values.
Example: A table storing sales data could partition based on regions: North, South, East, and West

Postgres Example

CREATE TABLE sales (
    id SERIAL ,
    region TEXT NOT NULL,
    amount NUMERIC,
    PRIMARY KEY (id, region)
) PARTITION BY LIST (region);

CREATE TABLE sales_north PARTITION OF sales
    FOR VALUES IN ('North');

CREATE TABLE sales_south PARTITION OF sales
    FOR VALUES IN ('South');

4. Composite Partitioning

Combines two or more partitioning strategies, such as range and list partitioning.
Example: A table partitioned by range on order date and sub-partitioned by list on region.

Postgres Example

CREATE TABLE orders (
    id SERIAL,
    customer_id INT,
    order_date DATE NOT NULL,
    region TEXT NOT NULL,
    PRIMARY KEY (id, order_date, region)
) PARTITION BY RANGE (order_date);

CREATE TABLE orders_2024 PARTITION OF orders
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01')
    PARTITION BY LIST (region);

CREATE TABLE orders_2024_north PARTITION OF orders_2024
    FOR VALUES IN ('North');

CREATE TABLE orders_2024_south PARTITION OF orders_2024
    FOR VALUES IN ('South');