Parotta Salna
📊 Learn PostgreSQL in Tamil: From Zero to 5★ on HackerRank in Just 10 Days
25 May 2025 at 12:42

By: Mr.ParottaSalna

PostgreSQL is one of the most powerful, stable, and open-source relational database systems trusted by global giants like Apple, Instagram, and Spotify. Whether you’re building a web application, managing enterprise data, or diving into analytics, understanding PostgreSQL is a skill that sets you apart.

But what if you could master it in just 10 days, in Tamil, with hands-on learning and a guaranteed 5★ rating on HackerRank as your goal?

Sounds exciting? Let’s dive in.

Why This Bootcamp?

This 10-day PostgreSQL Bootcamp in Tamil is designed to take you from absolute beginner to confident practitioner, with a curriculum built around real-world use cases, performance optimization, and daily challenge-driven learning.

Whether you’re a

Student trying to get into backend development
Developer wanting to upskill and crack interviews
Data analyst exploring SQL performance
Tech enthusiast curious about databases

…this bootcamp gives you the structured path you need.

What You’ll Learn

Over 10 days, we’ll cover

PostgreSQL installation & setup
PostgreSQL architecture and internals
Writing efficient SQL queries with proper formatting
Joins, CTEs, subqueries, and advanced querying
Indexing, query plans, and performance tuning
Transactions, isolation levels, and locking mechanisms
Schema design for real-world applications
Debugging techniques, tips, and best practices
Daily HackerRank challenges to track your progress
Solve 40+ HackerRank SQL challenges

Bootcamp Highlights

Language of instruction: Tamil
Format: Online, live and interactive
Daily live sessions with Q&A
Practice-oriented learning using HackerRank
Notes, cheat sheets, and shared resources
Access to community support and mentorship
Learn through real-world datasets and scenarios

Check our previous Postgres session

Details at a Glance

Duration: 10 Days
Language: Tamil
Format: Online, hands-on
Book Your Slot: https://topmate.io/parottasalna/1558376
Goal: Earn 5★ in PostgreSQL on HackerRank
Suitable for: Students, developers, DBAs, and tech enthusiasts

Why You Shouldn’t Miss This

Learn one of the most in-demand database systems in your native language
Structured learning path with practical tasks and daily targets
Build confidence to work on real projects and solve SQL challenges
Lifetime value from one affordable investment.

See you in the session!

Kalai Arasan
Update data on table
18 May 2024 at 19:50

By: kalaiarasanpandi

The UPDATE statement lets you change the values of one or more columns in one or more rows of a table.

UPDATE table_name
SET column1 = value1,
    column2 = value2,
    ...
WHERE condition;

In this syntax:

First, specify the name of the table whose data you want to update after the UPDATE keyword.
Second, specify the columns and their new values after the SET keyword. Columns that do not appear in the SET clause retain their original values.
Third, specify which rows to update in the condition of the WHERE clause.

Code:

UPDATE company SET salary = salary * 2
WHERE salary = 150000;

The WHERE clause is optional. If you omit the WHERE clause, the UPDATE statement updates every row in the table.

Code:

UPDATE company SET salary = salary * 2;
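To see the difference between the two statements, here is a minimal sketch in Python. It uses the built-in sqlite3 module as a stand-in so it runs without a PostgreSQL server (an assumption for illustration; the UPDATE behaviour shown is the same). The names and salaries in the company table are made up.

```python
import sqlite3

# Stand-in database: SQLite in memory, not PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE company (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO company (name, salary) VALUES (?, ?)",
    [("Asha", 150000), ("Bala", 90000), ("Chitra", 150000)],  # made-up rows
)

# With WHERE: only the rows matching the condition change.
with_where = conn.execute(
    "UPDATE company SET salary = salary * 2 WHERE salary = 150000"
).rowcount

# Without WHERE: every row in the table changes.
without_where = conn.execute("UPDATE company SET salary = salary * 2").rowcount

salaries = [row[0] for row in conn.execute("SELECT salary FROM company ORDER BY name")]
print(with_where, without_where, salaries)
```

The first statement touches two rows, the second touches all three, which is why it is worth double-checking the WHERE clause before running an UPDATE.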

Summary :

Use the UPDATE statement to update data in one or more columns of a table.
Specify a condition in the WHERE clause to determine which rows to update.
Use the RETURNING clause to return the updated rows from the UPDATE statement.

Reference :

https://www.postgresqltutorial.com/postgresql-tutorial/postgresql-update/

Kalai Arasan
Insert data on table
18 May 2024 at 19:32

By: kalaiarasanpandi

The INSERT statement inserts a new row into a table.

INSERT INTO table1(column1, column2, …)
           VALUES (value1, value2, …);

In this syntax:

First, specify the name of the table (table1) that you want to insert data into after the INSERT INTO keywords, followed by a list of comma-separated columns (column1, column2, ...).
Second, supply a list of comma-separated values in parentheses (value1, value2, ...) after the VALUES keyword. The column and value lists must be in the same order.
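The rule that the column list and the value list must line up can be checked with a small Python sketch. It uses the built-in sqlite3 module as a stand-in so it runs without a PostgreSQL server (an assumption for illustration), and the column types and sample values are made up.

```python
import sqlite3

# Stand-in database: SQLite in memory, not PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (column1 TEXT, column2 INTEGER)")

# Values are matched to columns by their position in the two lists.
conn.execute("INSERT INTO table1 (column1, column2) VALUES (?, ?)", ("pen", 10))

# Reordering the column list is fine as long as the values are reordered too.
conn.execute("INSERT INTO table1 (column2, column1) VALUES (?, ?)", (20, "book"))

rows = conn.execute("SELECT column1, column2 FROM table1 ORDER BY column2").fetchall()
print(rows)
```

Both rows end up with the text in column1 and the number in column2, because each value follows the column named at the same position.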

RETURNING clause

The INSERT statement has an optional RETURNING clause that returns information about the inserted row.

INSERT INTO table1(column1, column2, …)
VALUES (value1, value2, …)
RETURNING *;

RETURNING * returns every column of the inserted row.

You can also list specific columns after RETURNING instead of *.

Summary

Use PostgreSQL INSERT statement to insert a new row into a table.
Use the RETURNING clause to get the inserted rows.

Reference:

https://www.postgresqltutorial.com/postgresql-tutorial/postgresql-insert/

how to install PGAdmin in Linux-mint

Kalai Arasan

By: kalaiarasanpandi

18 May 2024 at 18:56

To install pgAdmin on Linux Mint, use the Ubuntu jammy repository.

1) Install the public key for the repository (if not done previously):

sudo curl https://www.pgadmin.org/static/packages_pgadmin_org.pub | sudo apt-key add

2) Create the repository configuration file: (jammy is important)

sudo sh -c 'echo "deb https://ftp.postgresql.org/pub/pgadmin/pgadmin4/apt/jammy pgadmin4 main" > /etc/apt/sources.list.d/pgadmin4.list && apt update'

3) Install pgAdmin for both desktop and web modes:

sudo apt install pgadmin4

4) Or install for desktop mode only:

sudo apt install pgadmin4-desktop

5) Or install for web mode only:

sudo apt install pgadmin4-web

6) After installing pgadmin4, configure the web server (if you installed pgadmin4-web):

sudo /usr/pgadmin4/bin/setup-web.sh

Reference :

Parotta Salna
🎯 PostgreSQL Zero to Hero with Parottasalna – 2 Day Bootcamp (FREE!) 🚀
2 March 2025 at 07:09

By: Mr.ParottaSalna

Databases power the backbone of modern applications, and PostgreSQL is one of the most powerful open-source relational databases trusted by top companies worldwide. Whether you’re a beginner or a developer looking to sharpen your database skills, this FREE bootcamp will take you from Zero to Hero in PostgreSQL!

What You’ll Learn?

PostgreSQL fundamentals & installation

Postgres Architecture
Writing optimized queries
Indexing & performance tuning
Transactions & locking mechanisms
Advanced joins, CTEs & subqueries
Real-world best practices & hands-on exercises

This intensive hands-on bootcamp is designed for developers, DBAs, and tech enthusiasts who want to master PostgreSQL from scratch and apply it in real-world scenarios.

Who Should Attend?

Beginners eager to learn databases
Developers & Engineers working with PostgreSQL
Anyone looking to optimize their SQL skills

Date: March 22, 23 -> (Moved to April 5, 6)
Time: Will be finalized later.
Location: Online
Cost: 100% FREE

RSVP Here

The session has not been conducted yet! The new schedule will be announced later.

Support us by subscribing on YouTube!

Prerequisite

Check out this playlist of our previous Postgres session: https://www.youtube.com/playlist?list=PLiutOxBS1Miy3PPwxuvlGRpmNo724mAlt

This bootcamp is completely FREE – Learn without any cost!

Spots are limited – RSVP now to reserve your seat!

VS Raj
About SQL
31 January 2025 at 16:15

By: vsraj80

SQL stands for Structured Query Language.

MySQL is a Relational Database Management System (RDBMS).

MySQL is free and open-source software.

MySQL client – front end
MySQL server – back end

Functions of the MySQL client

1. Validating the password and authenticating the user

2. Receiving input from the user, converting it into tokens, and sending it to the MySQL server

3. Returning the results from the MySQL server to the user

Functions of the MySQL server

The MySQL server receives requests from the client and returns the response after processing. It consists of 2 major parts

1. Management layer

a. Decoding the data

b. Validating and parsing (analyzing) the data

c. Sending the cached queries to the storage engine

2. Storage engine

a. Managing databases, tables, and indexes

b. Sending the data to other shared MySQL servers

Install MySQL on Ubuntu

sudo apt-get install mysql-server

To secure the installation, run

sudo mysql_secure_installation

It does the following

1. Removes anonymous users

2. Allows root login only from localhost

3. Removes the test database

MySQL Configuration options

/etc/mysql is the MySQL configuration directory

To Start MySQL

sudo service mysql start

To Stop MySQL

sudo service mysql stop

To Restart MySQL

sudo service mysql restart

MySQL Clients

Normally we use the mysql command-line client.

On Linux, we can also use the following GUI clients

MySQL Workbench

sudo apt-get install mysql-workbench

MySQL Navigator

sudo apt-get install mysql-navigator

EMMA

sudo apt-get install emma

phpMyAdmin

sudo aptitude install phpmyadmin

MySQL Admin

sudo apt-get install mysql-admin

Kinds of MySQL clients

1. GUI-based desktop applications

2. Web-based applications

3. Shell-based applications (text-only)

To connect the MySQL client to the server

mysql -u root -p

To connect with a particular host, user name, and database name

mysql -h <host> -u <username> -p <database>

If the host, user name, or password is not given, mysql uses the local server, the Unix user name, and no password for authentication.

To find more options for mysql

mysql -?

To disconnect the client from the server

exit

Pages 33 to 39 need to be read again and understood.

Build a Product Rental App in Node.js

Krishna

By: krishna

29 January 2025 at 10:47

Introduction

I created a website called Vinmeen that allows users to rent products for temporary needs at a low cost. The goal was to design a simple UI for users to easily rent things they need temporarily.

Technologies Used

Node.js & Express
Node Packages
- Express
- EJS
- Nodemailer
- Bcrypt
- Multer
- Sync-SQL
- MySQL
MySQL

What I Learned from This Project

This project helped me understand how dynamic websites work and how template rendering is done. I used EJS for rendering templates, MySQL for database handling, and Bcrypt for securely storing user passwords through hashing. I also learned how to send email notifications with OTP and rent requests, among other things.

Hosting

I hosted the site using two different services

Website Hosting – Render.com

Render provides free hosting for experimentation and student projects. This plan has minimal resources, but it’s great for learning and testing.

MySQL Database – Filess.io:

Filess.io offers a free MySQL database with a 10MB size limit and a maximum of 5 concurrent connections. It’s ideal for students and self-study projects, but not recommended for startups or businesses.

Links

Website : vinmeen
SourceCode(Github) : code

Parotta Salna
Learning Notes #51 – Postgres as a Queue using SKIP LOCKED
11 January 2025 at 06:56

By: Mr.ParottaSalna

Yesterday, I came across a blog post from inferable.ai, https://www.inferable.ai/blog/posts/postgres-skip-locked, which walks through using Postgres as a queue. In this blog, I jot down notes on using Postgres as a queue for future reference.

PostgreSQL is a robust relational database that can be used for more than just storing structured data. With the SKIP LOCKED feature introduced in PostgreSQL 9.5, you can efficiently turn a PostgreSQL table into a job queue for distributed processing.

Why Use PostgreSQL as a Queue?

Using PostgreSQL as a queue can be advantageous because,

Familiarity: If you’re already using PostgreSQL, there’s no need for an additional message broker.
Durability: PostgreSQL ensures ACID compliance, offering reliability for your job processing.
Simplicity: No need to manage another component like RabbitMQ or Kafka

Implementing a Queue with SKIP LOCKED

1. Create a Queue Table

To start, you need a table to store the jobs,


CREATE TABLE job_queue (
    id SERIAL PRIMARY KEY,
    job_data JSONB NOT NULL,
    status TEXT DEFAULT 'pending',
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

This table has the following columns,

id: A unique identifier for each job.
job_data: The data or payload for the job.
status: Tracks the job’s state (‘pending’, ‘in_progress’, or ‘completed’).
created_at: Timestamp of job creation.

2. Insert Jobs into the Queue

Adding jobs is straightforward,


INSERT INTO job_queue (job_data)
VALUES ('{"task": "send_email", "email": "user@example.com"}');

3. Fetch Jobs for Processing with SKIP LOCKED

Workers will fetch jobs from the queue using SELECT ... FOR UPDATE SKIP LOCKED to avoid contention,

WITH next_job AS (
    SELECT id, job_data
    FROM job_queue
    WHERE status = 'pending'
    FOR UPDATE SKIP LOCKED
    LIMIT 1
)
UPDATE job_queue
SET status = 'in_progress'
FROM next_job
WHERE job_queue.id = next_job.id
RETURNING job_queue.id, job_queue.job_data;

Key Points:

FOR UPDATE locks the selected row to prevent other workers from picking it up.
SKIP LOCKED ensures locked rows are skipped, enabling concurrent workers to operate without waiting.
LIMIT 1 processes one job at a time per worker.
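To get a feel for how FOR UPDATE and SKIP LOCKED interact, here is a toy in-memory model in Python. It is only a sketch of the behaviour described above, not how PostgreSQL implements row locks; the ToyQueue class and the worker names are made up for illustration.

```python
class ToyQueue:
    """In-memory toy model of a job table with row locks (not PostgreSQL itself)."""

    def __init__(self, job_ids):
        self.status = {job_id: "pending" for job_id in job_ids}
        self.locked_by = {}  # job id -> worker currently holding the row lock

    def claim(self, worker):
        # Mimics: SELECT ... WHERE status = 'pending' FOR UPDATE SKIP LOCKED LIMIT 1
        for job_id, status in self.status.items():
            if status != "pending":
                continue
            if job_id in self.locked_by:
                continue  # SKIP LOCKED: do not wait, move on to the next row
            self.locked_by[job_id] = worker  # FOR UPDATE: take the row lock
            return job_id
        return None

    def commit(self, worker, job_id):
        # Worker finished: the status change becomes visible and the lock is released.
        self.status[job_id] = "completed"
        del self.locked_by[job_id]

    def rollback(self, worker):
        # Worker failed: its locks vanish and its jobs are claimable again.
        for job_id in [j for j, w in self.locked_by.items() if w == worker]:
            del self.locked_by[job_id]


queue = ToyQueue([1, 2, 3])
first = queue.claim("worker_a")   # takes job 1
second = queue.claim("worker_b")  # job 1 is locked, so worker_b gets job 2
queue.rollback("worker_a")        # worker_a fails before committing
third = queue.claim("worker_c")   # job 1 is free again
queue.commit("worker_b", second)
print(first, second, third, queue.status)
```

No worker ever waits for another one: a locked job is simply invisible to the next claim, and a job whose worker rolled back becomes available again without any extra cleanup.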

4. Mark Jobs as Completed

Once a worker finishes processing a job, it should update the job’s status,


UPDATE job_queue
SET status = 'completed'
WHERE id = $1; -- Replace $1 with the job ID

5. Delete Old or Processed Jobs

To keep the table clean, you can periodically remove completed jobs,


DELETE FROM job_queue
WHERE status = 'completed' AND created_at < NOW() - INTERVAL '30 days';

Example Worker Implementation

Here’s an example of a worker implemented in Python using psycopg2


import time

import psycopg2
from psycopg2.extras import RealDictCursor

connection = psycopg2.connect("dbname=yourdb user=youruser")

while True:
    with connection.cursor(cursor_factory=RealDictCursor) as cursor:
        cursor.execute(
            """
            WITH next_job AS (
                SELECT id, job_data
                FROM job_queue
                WHERE status = 'pending'
                FOR UPDATE SKIP LOCKED
                LIMIT 1
            )
            UPDATE job_queue
            SET status = 'in_progress'
            FROM next_job
            WHERE job_queue.id = next_job.id
            RETURNING job_queue.id, job_queue.job_data;
            """
        )

        job = cursor.fetchone()
        if job:
            print(f"Processing job {job['id']}: {job['job_data']}")

            # Simulate job processing
            cursor.execute("UPDATE job_queue SET status = 'completed' WHERE id = %s", (job['id'],))

        else:
            print("No jobs available. Sleeping...")
            time.sleep(5)

    connection.commit()

Considerations

Transaction Isolation: Use the REPEATABLE READ or SERIALIZABLE isolation level cautiously; concurrent workers can then hit serialization failures that they must catch and retry.
Row Locking: SKIP LOCKED only skips rows locked by other transactions, not those locked within the same transaction.
Performance: Regularly archive or delete old jobs to prevent the table from growing indefinitely. Consider indexing the status column to improve query performance.
Fault Tolerance: Ensure that workers handle crashes or timeouts gracefully. Use a timeout mechanism to revert jobs stuck in the ‘in_progress’ state.
Scaling: Distribute workers across multiple nodes to handle a higher job throughput.
The SKIP LOCKED clause only applies to row-level locks – the required ROW SHARE table-level lock is still taken normally.
Using SKIP LOCKED provides an inconsistent view of the data by design. This is why it’s perfect for queue-like tables where we want to distribute work, but not suitable for general purpose work where consistency is required.

Parotta Salna
Learning Notes #50 – Fixed Partition Pattern | Distributed Pattern
9 January 2025 at 16:51

By: Mr.ParottaSalna

Today, I learnt about the fixed partition pattern, which balances data among servers without moving large amounts of data. In this blog, I jot down notes on how fixed partitioning helps solve the problem.

This entire blog is inspired by https://www.linkedin.com/pulse/distributed-systems-design-pattern-fixed-partitions-retail-kumar-v-c34pc/?trackingId=DMovSwEZSfCzKZEKa7yJrg%3D%3D

Problem Statement

In a distributed key-value store system, data items need to be mapped to a set of cluster nodes to ensure efficient storage and retrieval. The system must satisfy the following requirements,

Uniform Distribution: Data should be evenly distributed across all cluster nodes to avoid overloading any single node.
Deterministic Mapping: Given a data item, the specific node responsible for storing it should be determinable without querying all the nodes in the cluster.

A common approach to achieve these goals is to use hashing with a modulo operation. For example, if there are three nodes in the cluster, the key is hashed, and the hash value modulo the number of nodes determines the node to store the data. However, this method has a critical drawback,

Rebalancing Issue: When the cluster size changes (e.g., nodes are added or removed), the mapping for most keys changes. This requires the system to move almost all the data to new nodes, leading to significant overhead in terms of time and resources, especially when dealing with large data volumes.
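The size of the rebalancing problem is easy to measure. The sketch below is a rough Python illustration: the key names are made up, and md5 is used only as a convenient stable hash (an assumption, any well-mixed hash would behave the same way). It counts how many keys land on a different node when a cluster grows from 3 to 4 nodes.

```python
import hashlib


def stable_hash(key: str) -> int:
    # md5 is used only as a stable, well-mixed hash, not for security.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


def node_for(key: str, node_count: int) -> int:
    # The naive scheme: hash the key, then take it modulo the number of nodes.
    return stable_hash(key) % node_count


keys = [f"account-{i}" for i in range(10_000)]  # made-up keys
moved = sum(1 for k in keys if node_for(k, 3) != node_for(k, 4))
moved_fraction = moved / len(keys)
print(f"{moved_fraction:.0%} of keys change node when going from 3 to 4 nodes")
```

A key keeps its node only when its hash gives the same remainder modulo 3 and modulo 4, which is true for 3 out of every 12 hash values, so roughly three quarters of the data has to move for a single added node.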

Challenge: How can we design a mapping mechanism that minimizes data movement during cluster size changes while maintaining uniform distribution and deterministic mapping?

Solution

There is a concept of Fixed Partitioning,

What Is Fixed Partitioning?

This pattern organizes data into a predefined number of fixed partitions that remain constant over time. Data is assigned to these partitions using a hashing algorithm, ensuring that the mapping of data to partitions is permanent. The system separates the fixed partitioning of data from the physical servers managing these partitions, enabling seamless scaling.

Key Features of Fixed Partitioning

Fixed Number of Partitions
- The number of partitions is determined during system initialization (e.g., 8 partitions).
- Data is assigned to these partitions based on a deterministic hash of the key.
Stable Data Mapping
- Each piece of data is permanently mapped to a specific partition.
- This eliminates the need for large-scale data reshuffling when scaling the system.
Adjustable Partition-to-Server Mapping
- Partitions can be reassigned to different servers as the system scales.
- Only the physical location of the partitions changes; the fixed mapping remains intact.
Balanced Load Distribution
- Partitions are distributed evenly across servers to balance the workload.
- Adding new servers involves reassigning partitions without moving or reorganizing data within the partitions.
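The features above boil down to two separate lookups: key to partition (fixed forever) and partition to server (adjustable). Here is a minimal Python sketch of that idea. The server names, the two layouts, and the use of md5 as the hash are all assumptions for illustration.

```python
import hashlib

PARTITION_COUNT = 8  # fixed at initialization, never changes


def partition_for(key: str) -> int:
    # Key -> partition mapping; it depends only on the key and PARTITION_COUNT.
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % PARTITION_COUNT


# Partition -> server assignments (illustrative layouts, not from a real cluster).
four_servers = {0: "s1", 1: "s1", 2: "s2", 3: "s2", 4: "s3", 5: "s3", 6: "s4", 7: "s4"}
five_servers = {**four_servers, 5: "s5", 7: "s5"}  # hand two partitions to the new server

keys = [f"account-{i}" for i in range(10_000)]  # made-up keys
moved_partitions = [p for p in four_servers if four_servers[p] != five_servers[p]]
moved_keys = sum(1 for k in keys if partition_for(k) in moved_partitions)
moved_fraction = moved_keys / len(keys)
print(moved_partitions, f"{moved_fraction:.0%} of keys relocate")
```

Only the keys inside the two reassigned partitions change servers, about a quarter of the data, and no key ever changes its partition. Compare that with the naive modulo scheme, where most keys move when the node count changes.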

Naive Example

We have a banking system with transactions stored in 8 fixed partitions, distributed based on a customer’s account ID.


CREATE TABLE transactions (
    id SERIAL PRIMARY KEY,
    account_id INT NOT NULL,
    transaction_amount NUMERIC(10, 2) NOT NULL,
    transaction_date DATE NOT NULL
) PARTITION BY HASH (account_id);

1. Create Partition


DO $$
BEGIN
    FOR i IN 0..7 LOOP
        EXECUTE format(
            'CREATE TABLE transactions_p%s PARTITION OF transactions FOR VALUES WITH (modulus 8, remainder %s);',
            i, i
        );
    END LOOP;
END $$;

This creates 8 partitions (transactions_p0 to transactions_p7); a row goes to the partition whose number equals the hash of its account_id modulo 8.

2. Inserting Data

When inserting data into the transactions table, PostgreSQL automatically places it into the correct partition based on the account_id.


INSERT INTO transactions (account_id, transaction_amount, transaction_date)
VALUES (12345, 500.00, '2025-01-01');

PostgreSQL hashes the account_id 12345 and uses that hash modulo 8 to pick the target partition (e.g., transactions_p5).

3. Querying Data

Querying the base table works transparently across all partitions


SELECT * FROM transactions WHERE account_id = 12345;

PostgreSQL automatically routes the query to the correct partition.

4. Scaling by Adding Servers

Initial Setup:

Suppose we have 4 servers managing the partitions,

Server 1: transactions_p0, transactions_p1
Server 2: transactions_p2, transactions_p3
Server 3: transactions_p4, transactions_p5
Server 4: transactions_p6, transactions_p7

Adding a New Server:

When a 5th server is added, we reassign only as many partitions as the new server needs,

Server 1: transactions_p0, transactions_p1
Server 2: transactions_p2, transactions_p3
Server 3: transactions_p4
Server 4: transactions_p6
Server 5: transactions_p5, transactions_p7

Partition Migration

During the migration, transactions_p5 is copied from Server 3 to Server 5.
Once the migration is complete, Server 5 becomes responsible for transactions_p5.

Benefits:

Minimal Data Movement – When scaling, only the partitions being reassigned are copied to new servers. Data within partitions remains stable.
Optimized Performance – Queries are routed directly to the relevant partition, minimizing scan times.
Scalability – Adding servers is straightforward, as it involves reassigning partitions, not reorganizing data.

What happens when a new server is added, then? Don’t we need to copy the data?

When a partition is moved to a new server (e.g., partition_b from server_A to server_B), the data in the partition must be copied to the new server. However,

The copying is limited to the partition being reassigned.
No data within the partition is reorganized.
Once the partition is fully migrated, the original copy is typically deleted.

For example, in PostgreSQL,

Export the partition: pg_dump -t partition_b -h server_A -U postgres -d mydb > partition_b.sql
Import on the new server: psql -h server_B -U postgres -d mydb < partition_b.sql

Parotta Salna
Learning Notes #41 – Shared Lock and Exclusive Locks | Postgres
6 January 2025 at 14:07

By: Mr.ParottaSalna

Today, I learnt about various locking mechanisms that prevent double updates. In this blog, I make notes on shared locks and exclusive locks for my future self.

What Are Locks in Databases?

Locks are mechanisms used by a DBMS to control access to data. They ensure that transactions are executed in a way that maintains the ACID (Atomicity, Consistency, Isolation, Durability) properties of the database. Locks can be classified into several types, including

Shared Locks (S Locks): Allow multiple transactions to read a resource simultaneously but prevent any transaction from writing to it.
Exclusive Locks (X Locks): Allow a single transaction to modify a resource, preventing both reading and writing by other transactions.
Intent Locks: Used to signal the type of lock a transaction intends to acquire at a lower level.
Deadlock Prevention Locks: Special locks aimed at preventing deadlock scenarios.

Shared Lock

A shared lock is used when a transaction needs to read a resource (e.g., a database row or table) without altering it. Multiple transactions can acquire a shared lock on the same resource simultaneously. However, as long as one or more shared locks exist on a resource, no transaction can acquire an exclusive lock on that resource.


-- Transaction A: Acquire a shared lock on a row
BEGIN;
SELECT * FROM employees WHERE id = 1 FOR SHARE;
-- Transaction B: Acquire a shared lock on the same row
BEGIN;
SELECT * FROM employees WHERE id = 1 FOR SHARE;
-- Both transactions can read the row concurrently
-- Transaction C: Attempt to update the same row
BEGIN;
UPDATE employees SET salary = salary + 1000 WHERE id = 1;
-- Transaction C will be blocked until Transactions A and B release their locks

Key Characteristics of Shared Locks

1. Concurrent Reads

Shared locks allow multiple transactions to read the same resource at the same time.
This is ideal for operations like SELECT queries that do not modify data.

2. Write Blocking

While a shared lock is active, no transaction can modify the locked resource.
Prevents dirty writes and ensures read consistency.

3. Compatibility

Shared locks are compatible with other shared locks but not with exclusive locks.

When Are Shared Locks Used?

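Shared locks are typically employed in read operations under certain isolation levels in lock-based DBMSs. (PostgreSQL serves plain reads from MVCC snapshots at every isolation level and takes row-level shared locks only for explicit locking reads such as SELECT ... FOR SHARE.) For instance,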

1. Read Committed Isolation Level:

Shared locks are held for the duration of the read operation.
Prevents dirty reads by ensuring the data being read is not modified by other transactions during the read.

2. Repeatable Read Isolation Level:

Shared locks are held until the transaction completes.
Ensures that the data read during a transaction remains consistent and unmodified.

3. Snapshot Isolation:

Shared locks may not be explicitly used, as the DBMS creates a consistent snapshot of the data for the transaction.

Exclusive Locks

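An exclusive lock is used when a transaction needs to modify a resource. Only one transaction can hold an exclusive lock on a resource at a time, ensuring no other transaction can write to or lock the resource. In purely lock-based systems readers are blocked as well; PostgreSQL's MVCC lets plain SELECTs keep reading the last committed version.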


-- Transaction X: Acquire an exclusive lock to update a row
BEGIN;
UPDATE employees SET salary = salary + 1000 WHERE id = 2;
-- Transaction Y: Attempt to read the same row
BEGIN;
SELECT * FROM employees WHERE id = 2;
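-- In PostgreSQL, this plain SELECT is not blocked; it sees the last committed version of the row
-- A locking read (SELECT ... FOR SHARE or FOR UPDATE) would be blocked until Transaction X completes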
-- Transaction Z: Attempt to update the same row
BEGIN;
UPDATE employees SET salary = salary + 500 WHERE id = 2;
-- Transaction Z will also be blocked until Transaction X completes

Key Characteristics of Exclusive Locks

1. Write Operations: Exclusive locks are essential for operations like INSERT, UPDATE, and DELETE.

2. Blocking Reads and Writes: While an exclusive lock is active, no other transaction can modify or lock the resource. Lock-based systems also block reads; in PostgreSQL, plain reads are not blocked.

3. Isolation: Ensures that changes made by one transaction are not visible to others until the transaction is complete.

When Are Exclusive Locks Used?

Exclusive locks are typically employed in write operations or any operation that modifies the database. For instance:

1. Transactional Updates – A transaction that updates a row acquires an exclusive lock to ensure no other transaction can access or modify the row during the update.

2. Table Modifications – When altering a table structure, the DBMS may place an exclusive lock on the entire table.

Benefits of Shared and Exclusive Locks

Benefits of Shared Locks

Consistency in Multi-User Environments – Ensure that data being read is not altered by other transactions, preserving consistency.
Concurrency Support – Allow multiple transactions to read data simultaneously, improving system performance.
Data Integrity – Prevent dirty reads and writes, ensuring that operations yield reliable results.

Benefits of Exclusive Locks

Data Integrity During Modifications – Prevents other transactions from accessing data being modified, ensuring changes are applied safely.
Isolation of Transactions – Ensures that modifications by one transaction are not visible to others until committed.

Limitations and Challenges

Shared Locks

Potential for Deadlocks – Deadlocks can occur if two transactions simultaneously hold shared locks and attempt to upgrade to exclusive locks.
Blocking Writes – Shared locks can delay write operations, potentially impacting performance in write-heavy systems.
Lock Escalation – In systems with high concurrency, shared locks may escalate to table-level locks, reducing granularity and concurrency.

Exclusive Locks

Reduced Concurrency – Exclusive locks prevent other transactions from accessing the locked resource, which can lead to bottlenecks in highly concurrent systems.
Risk of Deadlocks – Deadlocks can occur if two transactions attempt to acquire exclusive locks on resources held by each other.

Lock Compatibility
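The compatibility rule from the sections above (shared locks coexist with each other, exclusive locks conflict with everything) can be written as a small matrix. The Python sketch below is a toy model of that two-lock scheme only; the names COMPATIBLE and can_grant are made up, and it does not cover PostgreSQL's full set of lock modes.

```python
# (requested, held) -> True when the requested lock can coexist with the held one.
COMPATIBLE = {
    ("shared", "shared"): True,
    ("shared", "exclusive"): False,
    ("exclusive", "shared"): False,
    ("exclusive", "exclusive"): False,
}


def can_grant(requested, held_locks):
    """Grant a lock only if it is compatible with every lock already held."""
    return all(COMPATIBLE[(requested, held)] for held in held_locks)


readers_pile_up = can_grant("shared", ["shared", "shared"])
writer_waits_for_reader = can_grant("exclusive", ["shared"])
reader_waits_for_writer = can_grant("shared", ["exclusive"])
writer_on_free_row = can_grant("exclusive", [])
print(readers_pile_up, writer_waits_for_reader, reader_waits_for_writer, writer_on_free_row)
```

Any number of shared locks can be granted together, while an exclusive lock is granted only when nothing else is held on the resource.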

Parotta Salna
Learning Notes #28 – Unlogged Table in Postgres
2 January 2025 at 17:30

By: Mr.ParottaSalna

Today, as part of my daily reading, I came across https://raphaeldelio.com/2024/07/14/can-postgres-replace-redis-as-a-cache/ which discusses using Postgres as a cache and compares it with Redis! I was surprised by the title, so I gave it a read. There I came across the concept of UNLOGGED tables, which can act as a fast cache. In this blog, I jot down notes on unlogged tables for future reference.

Highly Recommended Links: https://martinheinz.dev/blog/105, https://raphaeldelio.com/2024/07/14/can-postgres-replace-redis-as-a-cache/, https://www.crunchydata.com/blog/postgresl-unlogged-tables

Unlogged tables offer unique benefits in scenarios where speed is paramount, and durability (the guarantee that data is written to disk and will survive crashes) is not critical.

What Are Unlogged Tables?

Postgres Architecture : https://miro.com/app/board/uXjVLD2T5os=/

In PostgreSQL, a table is a basic unit of data storage. By default, PostgreSQL ensures that data in regular tables is durable. This means that all data is written to the disk and will survive server crashes. However, in some situations, durability is not necessary. Unlogged tables are special types of tables in PostgreSQL where the database does not write data changes to the WAL (Write-Ahead Log).

The absence of WAL logging for unlogged tables makes them faster than regular tables because PostgreSQL doesn’t need to ensure data consistency across crashes for these tables. However, this also means that if the server crashes or the system is powered off, the data in unlogged tables is lost.
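To make the trade-off concrete, here is a deliberately simplified toy model. It is not how PostgreSQL is implemented (there are no buffers, checkpoints, or data files here); `ToyTable` and its methods are hypothetical names used only to show why skipping the log makes writes cheaper and crashes destructive.

```python
class ToyTable:
    """Toy in-memory table; logged=False mimics an unlogged table."""

    def __init__(self, logged):
        self.logged = logged
        self.rows = {}  # volatile state, lost in a crash
        self.wal = []   # stands in for the durable write-ahead log

    def insert(self, key, value):
        if self.logged:
            # Logged tables record the change before applying it.
            self.wal.append((key, value))
        self.rows[key] = value

    def crash_and_recover(self):
        # A crash wipes the volatile state; recovery replays the log.
        self.rows = {}
        for key, value in self.wal:
            self.rows[key] = value


regular = ToyTable(logged=True)
unlogged = ToyTable(logged=False)
for table in (regular, unlogged):
    table.insert(1, "alice")
    table.insert(2, "bob")
    table.crash_and_recover()

print(regular.rows)
print(unlogged.rows)
```

The logged table does one extra append per write and gets its rows back after the simulated crash; the unlogged table skips that work and comes back empty.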

Key Characteristics of Unlogged Tables

No Write-Ahead Logging (WAL) – By default, PostgreSQL writes changes to the WAL to ensure data durability. For unlogged tables, this step is skipped, making operations like INSERTs, UPDATEs, and DELETEs faster.
No Durability – Without WAL, unlogged tables are not crash-safe: PostgreSQL automatically truncates them after a crash or an unclean shutdown. A clean shutdown and restart keeps the data. This makes them unsuitable for critical data.
Faster Performance – Since WAL writes are skipped, unlogged tables are faster for data insertion and modification. This can be beneficial for use cases where data is transient and can be rebuilt if it is lost.
Support for Indexes and Constraints – Unlogged tables can have indexes and constraints like regular tables. Indexes on an unlogged table are automatically unlogged too, so the data is still non-durable.
Automatic Cleanup – During crash recovery, PostgreSQL empties every unlogged table, because without WAL records it cannot tell whether the contents are consistent. Unlike temporary tables, unlogged tables are visible to all sessions and outlive the session that created them.

Drawbacks of Unlogged Tables

Data Loss on Crash – The most significant disadvantage of unlogged tables is the loss of data after a crash or unclean shutdown. If the application depends on this data, then using unlogged tables would not be appropriate.
Not Suitable for Critical Applications – Applications that require data persistence (such as financial or inventory systems) should avoid using unlogged tables, as the risk of data loss outweighs any performance benefits.
No Replication – Unlogged tables are not replicated in standby servers in a replication setup, as the data is not written to the WAL.

Creating an Unlogged Table

Creating an unlogged table is very straightforward in PostgreSQL. You simply need to add the UNLOGGED keyword when creating the table.


CREATE UNLOGGED TABLE temp_data (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    value INT
);

In this example, temp_data is an unlogged table. All operations performed on this table will not be logged to the WAL.

When to Avoid Unlogged Tables?

If you are working with critical data that needs to be durable and persistent across restarts.
If your application requires data replication, as unlogged tables are not replicated in standby servers.
If your workload involves frequent crash scenarios where data loss cannot be tolerated.

Examples

1. Temporary Storage for processing


CREATE UNLOGGED TABLE etl_staging (
    source_id INT,
    raw_data JSONB,
    processed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Insert raw data into the staging table
INSERT INTO etl_staging (source_id, raw_data)
VALUES 
    (1, '{"key": "value1"}'),
    (2, '{"key": "value2"}');

-- Perform transformations on the data
INSERT INTO final_table (id, key, value)
SELECT source_id, 
       raw_data->>'key' AS key, 
       'processed_value' AS value
FROM etl_staging;

-- Clear the staging table
TRUNCATE TABLE etl_staging;

2. Caching


CREATE UNLOGGED TABLE user_sessions (
    session_id UUID PRIMARY KEY,
    user_id INT,
    last_accessed TIMESTAMP DEFAULT NOW()
);

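-- Insert session data
-- gen_random_uuid() is built in from PostgreSQL 13; uuid_generate_v4() needs the uuid-ossp extension
INSERT INTO user_sessions (session_id, user_id)
VALUES 
    (gen_random_uuid(), 101),
    (gen_random_uuid(), 102);

-- Update last accessed timestamp (replace the placeholder with a real session UUID)
UPDATE user_sessions
SET last_accessed = NOW()
WHERE session_id = '00000000-0000-0000-0000-000000000000';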

-- Delete expired sessions
DELETE FROM user_sessions WHERE last_accessed < NOW() - INTERVAL '1 hour';

Sugi
Basic SQL Queries, Stored Proc, Function in PostgreSQL
2 January 2025 at 06:17

By: Sugirtha

DDL, DML, DQL Queries:

CREATE TABLE Employees (
    EmployeeID INTEGER PRIMARY KEY, 
    Name VARCHAR(50), 
    Age INTEGER, 
    DepartmentID INTEGER, 
    FOREIGN KEY (DepartmentID) REFERENCES Departments(DepartmentID)
);

INSERT INTO Employees(EmployeeID, Name, Age, DepartmentID) VALUES(1, 'Kavi', 32, 101), (2, 'Sugi', 30, 102);

UPDATE Employees SET age=31 WHERE Name='Nila';

DELETE FROM Employees WHERE Name='Nila';

SELECT e.*, d.DepartmentName 
FROM Employees e 
JOIN Departments d ON e.DepartmentID = d.DepartmentID;

-- Self join: assumes Employees also has a ManagerID column that references EmployeeID
SELECT e.Name AS Employee, m.Name AS Manager
FROM Employees e
JOIN Employees m ON e.ManagerID = m.EmployeeID;

`INNER JOIN`:

Returns only the rows where there is a match between the columns in both tables.
If no match is found, the row is not included in the result.
It’s the most common type of join.

`OUTER JOIN`:

Returns all rows from one or both tables, even if there is no match in the other table.
- LEFT OUTER JOIN (or just LEFT JOIN): Returns all rows from the left table, and the matched rows from the right table. If no match, the result will have NULL values for columns from the right table.
- RIGHT OUTER JOIN (or just RIGHT JOIN): Returns all rows from the right table, and the matched rows from the left table. If no match, the result will have NULL values for columns from the left table.
- FULL OUTER JOIN: Returns all rows from both tables. If there is no match, the result will have NULL values for the non-matching table’s columns.
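The difference between an inner and a left outer join is easiest to see with one unmatched row. This is a minimal sketch using Python's built-in `sqlite3` as a stand-in for PostgreSQL so that it runs without a server; the department names and the employee without a department are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Departments (DepartmentID INTEGER PRIMARY KEY, DepartmentName TEXT);
    CREATE TABLE Employees (
        EmployeeID INTEGER PRIMARY KEY,
        Name TEXT,
        DepartmentID INTEGER REFERENCES Departments(DepartmentID)
    );
    INSERT INTO Departments VALUES (101, 'Engineering'), (102, 'Sales');
    -- Nila has no department, so she has no match in Departments.
    INSERT INTO Employees VALUES (1, 'Kavi', 101), (2, 'Sugi', 102), (3, 'Nila', NULL);
""")

inner_rows = conn.execute("""
    SELECT e.Name, d.DepartmentName
    FROM Employees e
    INNER JOIN Departments d ON e.DepartmentID = d.DepartmentID
    ORDER BY e.EmployeeID
""").fetchall()

left_rows = conn.execute("""
    SELECT e.Name, d.DepartmentName
    FROM Employees e
    LEFT JOIN Departments d ON e.DepartmentID = d.DepartmentID
    ORDER BY e.EmployeeID
""").fetchall()

print(inner_rows)  # unmatched employee is dropped
print(left_rows)   # unmatched employee is kept with None (NULL) for the department
```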

GROUP BY:

Groups rows that have the same values in specified columns into summary rows (like finding the total count, sum, average, etc.).
It is typically used with aggregate functions such as COUNT(), SUM(), AVG(), MAX(), MIN().

HAVING:

Used to filter records after the GROUP BY has been applied.
It works similarly to the WHERE clause, but WHERE is used for filtering individual rows before grouping, while HAVING filters the grouped results.

SELECT DeptName, COUNT(*)
FROM Employees
GROUP BY DeptName;
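To see WHERE and HAVING working at different stages, the following sketch filters rows first and groups second. It uses Python's built-in `sqlite3` in place of PostgreSQL, and the sample rows and thresholds are assumptions chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (Name TEXT, Age INTEGER, DepartmentID INTEGER);
    INSERT INTO Employees VALUES
        ('Kavi', 32, 101), ('Sugi', 30, 101), ('Nila', 25, 102),
        ('Arun', 41, 103), ('Mani', 19, 103), ('Devi', 36, 103);
""")

rows = conn.execute("""
    SELECT DepartmentID, COUNT(*) AS headcount
    FROM Employees
    WHERE Age >= 21            -- row filter, applied before grouping
    GROUP BY DepartmentID
    HAVING COUNT(*) >= 2       -- group filter, applied after grouping
    ORDER BY DepartmentID
""").fetchall()

print(rows)
```

Department 103 has three employees, but one is removed by WHERE before counting, so its headcount is 2. Department 102 survives WHERE but is removed by HAVING because only one row remains in its group.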

DISTINCT:

Used to remove duplicate rows from the result set based on the specified columns.
If you specify only one column, it will return the distinct values of that column.
If you specify multiple columns, the combination of values in those columns will be considered to determine uniqueness.

SELECT DISTINCT DeptName FROM Employees;

SELECT DISTINCT DeptName, EmpName FROM Employees;

Difference between `DELETE` and `TRUNCATE`:

`DELETE`:

Removes rows one by one and logs each deletion, which can be slower for large datasets.
You can use a WHERE clause to specify which rows to delete.
Can be rolled back if you’re working within a transaction (assuming no COMMIT has been done).
Can fire triggers if there are any triggers defined on the table (for example, BEFORE DELETE or AFTER DELETE triggers).

`TRUNCATE`:

Removes all rows in the table in one go, without scanning them individually.
Does not support a WHERE clause, so it always deletes all rows.
It’s much faster than DELETE because it doesn’t log individual row deletions (but it does log the deallocation of the table’s data pages).
In PostgreSQL, TRUNCATE can be rolled back when it runs inside a transaction; some other databases commit it implicitly. It does not fire row-level DELETE triggers, although PostgreSQL supports statement-level ON TRUNCATE triggers.

UNION:

Combines the results of two or more queries.
Removes duplicates: Only unique rows are included in the final result.
It performs an extra deduplication step (a sort or a hash) to eliminate duplicates, which has a performance cost.

UNION ALL:

Also combines the results of two or more queries.
Keeps duplicates: All rows from the queries are included in the final result, even if they are the same.
It skips the deduplication step, which usually makes it faster than UNION.

SELECT EmpID, EmpName FROM Employees
UNION ALL
SELECT EmpID, EmpName FROM Contractors;

SELECT EmpID, EmpName FROM Employees
UNION 
SELECT EmpID, EmpName FROM Contractors;
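A small runnable comparison makes the duplicate handling visible. The sketch uses Python's built-in `sqlite3` rather than PostgreSQL, and the overlapping row (a person listed in both tables) is made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (EmpID INTEGER, EmpName TEXT);
    CREATE TABLE Contractors (EmpID INTEGER, EmpName TEXT);
    INSERT INTO Employees VALUES (1, 'Kavi'), (2, 'Sugi');
    INSERT INTO Contractors VALUES (2, 'Sugi'), (3, 'Nila');
""")

union_all_rows = conn.execute("""
    SELECT EmpID, EmpName FROM Employees
    UNION ALL
    SELECT EmpID, EmpName FROM Contractors
    ORDER BY EmpID
""").fetchall()

union_rows = conn.execute("""
    SELECT EmpID, EmpName FROM Employees
    UNION
    SELECT EmpID, EmpName FROM Contractors
    ORDER BY EmpID
""").fetchall()

print(union_all_rows)  # the shared row appears twice
print(union_rows)      # the shared row appears once
```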

COALESCE():

Returns the first non-NULL argument. For example, when some names in a SELECT are NULL, COALESCE can substitute another column's value or a default.
SELECT COALESCE(NULL, 'Hello', 'World');
-- Output: Hello

INSERT INTO users (name, nickname) VALUES
('Alice', NULL),
(NULL, 'Bob'),
(NULL, NULL);

SELECT id, COALESCE(name, nickname, 'Unknown') AS display_name FROM users;

NULLIF()

NULLIF(expression1, expression2)
Returns NULL if the two expressions are equal; otherwise returns the first expression, expression1.
SELECT NULLIF(10, 10); -- Output: NULL
SELECT NULLIF(10, 20); -- Output: 10
SELECT NULLIF(10, NULL); -- Output: 10
SELECT NULLIF(NULL, 10); -- Output: NULL
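A typical pairing of the two functions is a display-name fallback and a guard against division by zero. This sketch runs on Python's built-in `sqlite3` as a stand-in for PostgreSQL; note one difference, stated as a caveat: SQLite returns NULL for a plain division by zero, whereas PostgreSQL raises an error, which is why the `NULLIF` guard matters there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, nickname TEXT);
    INSERT INTO users (name, nickname) VALUES
        ('Alice', NULL), (NULL, 'Bob'), (NULL, NULL);
""")

# COALESCE picks the first non-NULL value per row.
display_names = [
    row[0]
    for row in conn.execute(
        "SELECT COALESCE(name, nickname, 'Unknown') FROM users ORDER BY id"
    )
]

# NULLIF turns a zero divisor into NULL, so the division yields NULL
# instead of failing (as it would in PostgreSQL).
ratio_ok, ratio_zero = conn.execute(
    "SELECT 10.0 / NULLIF(4, 0), 10.0 / NULLIF(0, 0)"
).fetchone()

print(display_names, ratio_ok, ratio_zero)
```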

IF Condition:

The IF statement is part of PL/pgSQL (it is used inside a function, procedure, or DO block, not in a plain SQL query) and runs different code depending on a condition.

IF condition THEN
    -- Code to execute if the condition is true
ELSIF condition THEN
    -- Code block to execute if another condition is true
ELSE
    -- Code to execute if the condition is false
END IF;

IF NOT FOUND THEN
    RAISE NOTICE 'Employee with ID % not found!', emp_id;
    emp_bonus := 0;
END IF;

CASE WHEN:

The CASE WHEN expression is used for conditional logic within a query (similar to IF but more flexible in SQL).

SELECT 
    name,
    salary,
    CASE 
        WHEN salary > 5000 THEN 'High Salary'
        WHEN salary BETWEEN 3000 AND 5000 THEN 'Average Salary'
        ELSE 'Low Salary'
    END AS salary_category
FROM employees;

FOR LOOP:

DO $$
BEGIN
    -- The loop variable i is declared automatically for an integer FOR loop.
    FOR i IN 1..5 LOOP
        -- Perform an action for each iteration (e.g., insert or update a record)
        INSERT INTO audit_log (action, timestamp)
        VALUES ('Employee update', NOW());
    END LOOP;
END;
$$;

-- Looping over query results; rec must be declared as RECORD in the DECLARE section
FOR rec IN SELECT column1, column2 FROM employees LOOP
    -- Code block using rec.column1, rec.column2
END LOOP;

RAISE – used to print messages (like System.out.println in Java) or to raise errors

RAISE NOTICE 'Employee: %, Salary: %', emp_name, emp_salary;
RAISE EXCEPTION 'An error occurred: %', error_message; -- Prints the message and aborts execution.
RAISE INFO 'Employee: %, Salary: %', emp_name, emp_salary;

Stored Procedures in SQL:

A stored procedure is a reusable block of SQL code that performs specific tasks. It is stored in the database and can be called as needed. Stored procedures are used for:

Modularizing complex SQL logic.
Improving performance by reducing network traffic.
Ensuring code reuse and security (by granting permissions to execute rather than to the tables directly).

Example:

A stored procedure to insert a new employee record:

CREATE PROCEDURE add_employee(emp_name VARCHAR, emp_salary NUMERIC)
LANGUAGE plpgsql AS 
$$ 
BEGIN 
  INSERT INTO employees (name, salary) VALUES (emp_name, emp_salary); 
END; 
$$;

Execution:

CALL add_employee('John Doe', 50000);

Functions in SQL:

A SQL function is a reusable block of SQL code that performs specific tasks. It is stored in the database and can be called as needed. It is similar to a procedure but returns a single value or table. Functions are typically used for computations or transformations.
Example: A function to calculate the yearly salary:

CREATE FUNCTION calculate_yearly_salary(monthly_salary NUMERIC)
RETURNS NUMERIC
LANGUAGE plpgsql AS 
$$
BEGIN
  RETURN monthly_salary * 12;
END;
$$;

Execution:

SELECT calculate_yearly_salary(5000);
Inside PL/pgSQL code (for example in a trigger function), use PERFORM calculate_yearly_salary(5000); when the result is not needed, or assign the result to a variable.
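The point that a function can be called from inside a query can be tried without a PostgreSQL server. The sketch below is an analogy, not the same mechanism: SQLite has no `CREATE FUNCTION`, so the function is written in Python and registered with `sqlite3`'s `create_function`. The second employee row is made-up sample data.

```python
import sqlite3


def calculate_yearly_salary(monthly_salary):
    # Same computation as the PL/pgSQL function: twelve monthly salaries.
    return monthly_salary * 12


conn = sqlite3.connect(":memory:")
# Register the Python function under a SQL name taking one argument.
conn.create_function("calculate_yearly_salary", 1, calculate_yearly_salary)
conn.executescript("""
    CREATE TABLE employees (name TEXT, salary NUMERIC);
    INSERT INTO employees VALUES ('John Doe', 5000), ('Kavi', 4000);
""")

rows = conn.execute("""
    SELECT name, calculate_yearly_salary(salary) AS yearly
    FROM employees
    ORDER BY name
""").fetchall()

print(rows)
```

Because the function returns a value, it can sit in the SELECT list and is evaluated once per row, which is exactly what a procedure invoked with CALL cannot do.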

Key Differences Between Procedures and Functions:

Return Type:

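Function: Returns a value or a set of rows (a function declared RETURNS void returns nothing useful).
Procedure: Has no return value, but can pass results back through INOUT parameters.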

Usage:

Function: Can be used in SQL queries (e.g., SELECT).
Procedure: Called using CALL, cannot be used in SQL queries.

Transaction Control:

Function: Cannot manage transactions.
Procedure: Can manage transactions (e.g., COMMIT, ROLLBACK).

Side Effects:

Function: Should not have side effects (e.g., modifying data).
Procedure: Can modify data and have side effects.

Calling Mechanism:

Procedure: Called using CALL procedure_name().

Function: Called within SQL expressions, like SELECT function_name().

TRIGGER:

A trigger is a special kind of stored procedure that automatically executes (or “fires”) when certain events occur in the database, such as INSERT, UPDATE, or DELETE. Triggers can be used to enforce business rules, validate data, or maintain audit logs.
Key Points:

Types of Triggers:

BEFORE Trigger: Fires before the actual operation (INSERT, UPDATE, DELETE).
AFTER Trigger: Fires after the actual operation.
INSTEAD OF Trigger: Used to override the standard operation. PostgreSQL supports INSTEAD OF triggers on views only, not on tables.

Trigger Actions: The trigger action can be an operation like logging data, updating related tables, or enforcing data integrity.
Trigger Events: A trigger can be set to fire on certain events, such as when a row is inserted, updated, or deleted.
Trigger Scope: Triggers can be defined to act on either a row (executing once for each affected row) or a statement (executing once for the entire statement).
A trigger can be created to log changes in a Users table whenever a record is updated, or it could prevent deleting a record if certain conditions aren’t met.

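Example (in PostgreSQL the trigger logic goes into a trigger function, which CREATE TRIGGER then attaches):

CREATE OR REPLACE FUNCTION log_employee_age_update()
RETURNS TRIGGER AS
$$
BEGIN
    IF OLD.Age <> NEW.Age THEN
        INSERT INTO EmployeeLogs (EmployeeID, OldAge, NewAge)
        VALUES (OLD.EmployeeID, OLD.Age, NEW.Age);
    END IF;
    RETURN NEW;
END;
$$
LANGUAGE plpgsql;

CREATE TRIGGER LogEmployeeAgeUpdate
AFTER UPDATE ON Employees
FOR EACH ROW
EXECUTE FUNCTION log_employee_age_update();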

Example:

CREATE OR REPLACE FUNCTION prevent_employee_delete()
RETURNS TRIGGER AS 
$$
BEGIN
-- Check if the employee is in a protected department (for example, department_id = 10)
  IF OLD.department_id = 10 THEN
     RAISE EXCEPTION 'Cannot delete employee in department 10';
  END IF;
  RETURN OLD;
END;
$$ 
LANGUAGE plpgsql;

-- Attach the function to a trigger
CREATE TRIGGER prevent_employee_delete_trigger
BEFORE DELETE ON Employees
FOR EACH ROW
EXECUTE FUNCTION prevent_employee_delete();

The following creates a trigger that logs the age and related details whenever a row in Employees is inserted, updated, or deleted:

CREATE OR REPLACE FUNCTION log_employee_changes()
RETURNS TRIGGER AS 
$$
BEGIN
-- Handle INSERT operation
  IF (TG_OP = 'INSERT') THEN
    INSERT INTO EmployeeChangeLog (EmployeeID, OperationType, NewAge,    ChangeTime)
    VALUES (NEW.EmployeeID, 'INSERT', NEW.Age, CURRENT_TIMESTAMP);
    RETURN NEW;
     -- Handle UPDATE operation
  ELSIF (TG_OP = 'UPDATE') THEN
    INSERT INTO EmployeeChangeLog (EmployeeID, OperationType, OldAge, NewAge, ChangeTime)
    VALUES (OLD.EmployeeID, 'UPDATE', OLD.Age, NEW.Age,  CURRENT_TIMESTAMP);
    RETURN NEW;
  -- Handle DELETE operation
  ELSIF (TG_OP = 'DELETE') THEN
    INSERT INTO EmployeeChangeLog (EmployeeID, OperationType, OldAge, ChangeTime)
    VALUES (OLD.EmployeeID, 'DELETE', OLD.Age, CURRENT_TIMESTAMP);
    RETURN OLD;
  END IF;
RETURN NULL;
END;
$$
LANGUAGE plpgsql;

CREATE TRIGGER log_employee_changes_trigger
AFTER INSERT OR UPDATE OR DELETE 
ON Employees
FOR EACH ROW
EXECUTE FUNCTION log_employee_changes();

Attaching the Trigger to the Employees Table

The CREATE TRIGGER statement above attaches the function to the Employees table, so that it fires on insert, update, and delete operations.

TG_OP: This is a special variable in PostgreSQL that holds the operation type (either INSERT, UPDATE, or DELETE).
NEW and OLD: These are references to the row being inserted or updated (NEW) or the row before it was updated or deleted (OLD).
EmployeeChangeLog: This table stores the details of the changes (employee ID, operation type, old and new values, timestamp). – Programmer defined.
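The audit-log pattern can be tried end to end with Python's built-in `sqlite3`. This is a sketch under a clear assumption: SQLite's trigger syntax differs from PostgreSQL's (the body is inline, and there is no `TG_OP`), so one trigger per operation replaces the single trigger function. The log table is also trimmed to four columns for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, Name TEXT, Age INTEGER);
    CREATE TABLE EmployeeChangeLog (
        EmployeeID INTEGER, OperationType TEXT, OldAge INTEGER, NewAge INTEGER
    );

    -- SQLite has no TG_OP, so each event gets its own trigger.
    CREATE TRIGGER log_insert AFTER INSERT ON Employees FOR EACH ROW
    BEGIN
        INSERT INTO EmployeeChangeLog VALUES (NEW.EmployeeID, 'INSERT', NULL, NEW.Age);
    END;
    CREATE TRIGGER log_update AFTER UPDATE ON Employees FOR EACH ROW
    BEGIN
        INSERT INTO EmployeeChangeLog VALUES (OLD.EmployeeID, 'UPDATE', OLD.Age, NEW.Age);
    END;
    CREATE TRIGGER log_delete AFTER DELETE ON Employees FOR EACH ROW
    BEGIN
        INSERT INTO EmployeeChangeLog VALUES (OLD.EmployeeID, 'DELETE', OLD.Age, NULL);
    END;

    INSERT INTO Employees VALUES (1, 'Kavi', 32);
    UPDATE Employees SET Age = 33 WHERE EmployeeID = 1;
    DELETE FROM Employees WHERE EmployeeID = 1;
""")

log = conn.execute("SELECT * FROM EmployeeChangeLog ORDER BY rowid").fetchall()
print(log)
```

Each statement produces one log row: NEW is populated for the insert, both OLD and NEW for the update, and only OLD for the delete.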

What happens when you omit `FOR EACH ROW`?

Statement-Level Trigger: The trigger will fire once per SQL statement, regardless of how many rows are affected. This means it won’t have access to the individual rows being modified.
- For example, if you run an UPDATE statement that affects 10 rows, the trigger will fire once (for the statement) rather than for each of those 10 rows.
No Access to Row-Specific Data: You won’t be able to use OLD or NEW values to capture the individual row’s data. The trigger will just execute as a whole, without row-specific actions.
With FOR EACH ROW: The trigger works on each row affected, and you can track specific changes (e.g., old vs new values).
Without FOR EACH ROW: The trigger fires once per statement and doesn’t have access to specific row data.

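CREATE OR REPLACE FUNCTION log_employees_update()
RETURNS TRIGGER AS
$$
BEGIN
    -- Runs once per statement; it won't track individual rows.
    INSERT INTO AuditLogs (EventDescription)
    VALUES ('Employees table updated');
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER LogEmployeesUpdate
AFTER UPDATE ON Employees
EXECUTE FUNCTION log_employees_update();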

NORMALIZATION:

1st NF:

Each column/attribute should hold an atomic (indivisible) value, i.e. only one value per cell.
Rows should not be repeated, i.e. every row is unique. A primary key is not strictly required at this stage.

2nd NF:

Must fulfill 1st NF. [Candidate key: the column or combination of columns (composite key) that makes each row unique.]
Every non-key column should depend on the whole candidate key, not on just one part of a composite key. For example, suppose the database is in denormalized form (before normalization, all values sit together in a single table) and the candidate key is (OrderId + ProductId). Non-key columns such as orderdate, orderedStatus, qty and item_price are not dependent on every part of the candidate key: some depend only on OrderId and not on ProductId. Likewise ProductName does not depend on the order, and customer details do not depend on ProductId. Only related items should live together in a table, so the table is split so that each attribute depends on the full candidate key of its own table.
So products go to a separate table, orders to a separate table, and customers to a separate table.
A primary key is created for each separated table, making sure that all non-key columns depend completely on that primary key. Foreign key relationships are then established to connect the tables.

3rd NF:

Must fulfill 2nd NF.
Remove transitive dependencies. In a denormalized table, a non-key column can be functionally dependent on another non-key column, which in turn depends on the key, so changing one value means changing every row that repeats it. To avoid that, separate the table; for example, salespeople's data is moved out of the orders table.

What is a Transitive Dependency? Let’s break this down with a simple example:
StudentID   Department   HODName
S001        IT           Dr. Rajan
S002        CS           Dr. Priya

Primary Key: StudentID
Non-prime attributes: Department, HODName

StudentID → Department (StudentID determines the department).
Department → HODName (Department determines the HOD name). So HODName depends on StudentID only indirectly, through Department.

This is a transitive dependency, and we need to remove it.

A transitive dependency means a non-prime attribute (not part of the candidate key) depends indirectly on the primary key through another non-prime attribute.
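Removing the transitive dependency from the student example means moving HODName into a table keyed by Department. The sketch below shows one possible decomposition using Python's built-in `sqlite3`; the third student (S003) and the replacement HOD name are made-up values added only to show the effect of an update.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- HODName now depends directly on the key of its own table.
    CREATE TABLE Departments (Department TEXT PRIMARY KEY, HODName TEXT);
    CREATE TABLE Students (
        StudentID TEXT PRIMARY KEY,
        Department TEXT REFERENCES Departments(Department)
    );
    INSERT INTO Departments VALUES ('IT', 'Dr. Rajan'), ('CS', 'Dr. Priya');
    INSERT INTO Students VALUES ('S001', 'IT'), ('S002', 'CS'), ('S003', 'IT');

    -- A change of HOD is now a single-row update.
    UPDATE Departments SET HODName = 'Dr. Kumar' WHERE Department = 'IT';
""")

rows = conn.execute("""
    SELECT s.StudentID, s.Department, d.HODName
    FROM Students s
    JOIN Departments d ON d.Department = s.Department
    ORDER BY s.StudentID
""").fetchall()

print(rows)
```

In the original single table, the same change would have to touch every IT student's row, and missing one would leave the data inconsistent.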

Reference: https://www.youtube.com/watch?v=rBPQ5fg_kiY and Learning with the help of chatGPT

Parotta Salna
Learning Notes #20 – Partitioning (data) With Postgres
31 December 2024 at 06:55

By: Mr.ParottaSalna

Early this morning, I watched a video on partitioning and sharding. In that video, Arpit explained the limitations of vertical scaling and how sharding and partitioning let a database scale beyond a single machine's limits. In this blog, I jot down notes on partitioning with a single-node implementation in Postgres for my future self.

As the volume of data grows, managing databases efficiently becomes critical. Once vertical scaling reaches its limits, two common strategies for handling large datasets are partitioning and sharding. While they may sound similar, these techniques serve different purposes and are implemented differently. Let’s explore these concepts in detail.

What is Partitioning?

Partitioning involves dividing a large dataset into smaller, manageable segments, known as partitions. Each partition is stored separately but remains part of a single database instance. Partitioning is typically used to improve query performance and manageability.

Types of Partitioning

1. Range Partitioning

Data is divided based on ranges of a column’s values.
Example: A table storing customer orders might partition data by order date: January orders in one partition, February orders in another.

PostgreSQL Example

CREATE TABLE orders (
    id SERIAL,
    customer_id INT,
    order_date DATE NOT NULL,
    PRIMARY KEY (id, order_date) -- Include the partition key
) PARTITION BY RANGE (order_date);

CREATE TABLE orders_jan PARTITION OF orders
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

CREATE TABLE orders_feb PARTITION OF orders
    FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');
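How a row finds its range partition can be sketched in a few lines of Python. This is only a conceptual model of the routing rule (inclusive lower bound, exclusive upper bound), not PostgreSQL's implementation; the `route` function is a hypothetical helper.

```python
from bisect import bisect_right
from datetime import date

# Lower bounds are inclusive and upper bounds exclusive, as in FROM ... TO.
PARTITIONS = [
    ("orders_jan", date(2024, 1, 1), date(2024, 2, 1)),
    ("orders_feb", date(2024, 2, 1), date(2024, 3, 1)),
]
LOWER_BOUNDS = [lower for _, lower, _ in PARTITIONS]


def route(order_date):
    """Return the partition name for order_date, or None if no range covers it."""
    index = bisect_right(LOWER_BOUNDS, order_date) - 1
    if index < 0:
        return None
    name, lower, upper = PARTITIONS[index]
    return name if lower <= order_date < upper else None


print(route(date(2024, 1, 31)))
print(route(date(2024, 2, 1)))
print(route(date(2024, 3, 1)))
```

Where this sketch returns `None`, PostgreSQL rejects the insert with an error unless a DEFAULT partition exists, so new partitions have to be created before data for a new month arrives.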

2. Hash Partitioning

A hash function determines the partition where a record will be stored.
Example: Orders can be distributed across partitions based on the hash of the customer ID.

Postgres Example

CREATE TABLE orders (
    id SERIAL ,
    customer_id INT,
    order_date DATE NOT NULL,
    PRIMARY KEY (id, customer_id)
) PARTITION BY HASH (customer_id);

CREATE TABLE orders_part_1 PARTITION OF orders
    FOR VALUES WITH (MODULUS 2, REMAINDER 0);

CREATE TABLE orders_part_2 PARTITION OF orders
    FOR VALUES WITH (MODULUS 2, REMAINDER 1);
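The MODULUS and REMAINDER pair can be read as "hash the key, divide by the modulus, and send the row to the partition that owns that remainder". The sketch below illustrates the idea with an important assumption: `zlib.crc32` is just a deterministic stand-in, since PostgreSQL uses its own internal hash functions, so actual row placement will differ.

```python
import zlib

MODULUS = 2
PARTITIONS = {0: "orders_part_1", 1: "orders_part_2"}  # remainder -> partition


def route(customer_id):
    # zlib.crc32 is only a deterministic stand-in; PostgreSQL uses its own
    # internal hash functions, so real row placement will differ.
    hash_value = zlib.crc32(str(customer_id).encode("utf-8"))
    return PARTITIONS[hash_value % MODULUS]


placement = {customer_id: route(customer_id) for customer_id in range(1, 9)}
for customer_id, partition in placement.items():
    print(customer_id, partition)
```

The useful property is that the same customer always lands in the same partition, while different customers spread across partitions without anyone having to define ranges or lists.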

3. List Partitioning

Data is divided based on a predefined list of values.
Example: A table storing sales data could partition based on regions: North, South, East, and West

Postgres Example

CREATE TABLE sales (
    id SERIAL ,
    region TEXT NOT NULL,
    amount NUMERIC,
    PRIMARY KEY (id, region)
) PARTITION BY LIST (region);

CREATE TABLE sales_north PARTITION OF sales
    FOR VALUES IN ('North');

CREATE TABLE sales_south PARTITION OF sales
    FOR VALUES IN ('South');

4. Composite Partitioning

Combines two or more partitioning strategies, such as range and list partitioning.
Example: A table partitioned by range on order date and sub-partitioned by list on region.

Postgres Example

CREATE TABLE orders (
    id SERIAL,
    customer_id INT,
    order_date DATE NOT NULL,
    region TEXT NOT NULL,
    PRIMARY KEY (id, order_date, region)
) PARTITION BY RANGE (order_date);

CREATE TABLE orders_2024 PARTITION OF orders
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01')
    PARTITION BY LIST (region);

CREATE TABLE orders_2024_north PARTITION OF orders_2024
    FOR VALUES IN ('North');

CREATE TABLE orders_2024_south PARTITION OF orders_2024
    FOR VALUES IN ('South');

Sugi
Query Optimization
29 December 2024 at 09:35

By: Sugirtha

Query Optimization:

Query Optimization is the process of improving the performance of a SQL query by reducing the amount of time and resources (like CPU, memory, and I/O) required to execute the query. The goal is to retrieve the desired data as quickly and efficiently as possible.

Important implementation of Query Optimization:

Indexing: Indexes on frequently used columns: Indexing columns that appear in WHERE, JOIN, or ORDER BY clauses can significantly improve performance. For example, if you’re querying a salary column frequently, indexing it can speed up those queries.
Composite indexes: If a query filters by multiple columns, a composite index on those columns might improve performance. For instance, INDEX (first_name, last_name) could be more efficient than two separate indexes on first_name and last_name.
Select only what you need: Instead of SELECT *, list the required columns, and use LIMIT to fetch only the required number of rows.
Optimizing JOIN Operations: Use appropriate join types: For example, avoid OUTER JOIN if INNER JOIN would suffice. Redundant or unnecessary joins increase query complexity and processing time.
Use of EXPLAIN to Analyze Query Plan: Running EXPLAIN before a query shows how the database plans to execute it. You can spot areas where indexes are not being used, unnecessary full table scans are happening, or joins are inefficient.

How to Implement Query Optimization:

1. Use Indexes:

Create indexes on columns that are frequently queried or used in JOIN, WHERE, or ORDER BY clauses. For example, if you frequently query a column like user_id, an index on user_id will speed up lookups. Use multi-column indexes for queries involving multiple columns.
CREATE INDEX idx_user_id ON users(user_id);

2. Rewrite Queries:

Avoid using SELECT * and instead select only the necessary columns.
Break complex queries into simpler ones and use temporary tables or Common Table Expressions (CTEs) if needed.
SELECT name, age FROM users WHERE age > 18;

3. Use Joins Efficiently:

Ensure that you are using the most efficient join type for your query (e.g., prefer INNER JOIN over OUTER JOIN when possible).
Join on indexed columns to speed up the process.

4. Optimize WHERE Clauses:

Make sure conditions in WHERE clauses are selective and reduce the number of rows as early as possible.
Use AND and OR operators appropriately to filter data early in the query.

5. Limit the Number of Rows:

Use the LIMIT clause when dealing with large datasets to fetch only a required subset of data.
Avoid retrieving unnecessary data from the database.

6. Avoid Subqueries When Possible:

Subqueries can be inefficient because they often lead to additional scans of the same data. Use joins instead of subqueries when possible.
If you must use subqueries, try to write them in a way that they don’t perform repeated calculations.

7. Analyze Execution Plans:

Use EXPLAIN to see how the database is executing your query. This will give you insights into whether indexes are being used, how tables are being scanned, etc.
Example:

EXPLAIN SELECT * FROM users WHERE age > 18;

8. Use Proper Data Types:

Choose the most efficient data types for your columns. For instance, use INTEGER for numeric values rather than VARCHAR, which takes more space and requires more processing.

9. Avoid Functions on Indexed Columns:

Using functions like UPPER(), LOWER(), or DATE() on indexed columns in WHERE clauses can prevent the database from using indexes effectively.
Instead, try to perform transformations outside the query or ensure indexes are used.
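The effect of wrapping an indexed column in a function can be observed in a query plan. The sketch below uses SQLite's `EXPLAIN QUERY PLAN` through Python's `sqlite3` as a stand-in; PostgreSQL's `EXPLAIN` output looks different, but the same idea applies, and PostgreSQL also supports indexes on expressions such as `upper(name)`. The `plan` helper and index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    CREATE INDEX idx_users_name ON users(name);
""")


def plan(sql):
    """Return the plan steps SQLite reports for a statement."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# Plain comparison on the indexed column: the index can be used.
plain_plan = plan("SELECT * FROM users WHERE name = 'Alice'")

# Function applied to the column: the index on name no longer matches.
wrapped_plan = plan("SELECT * FROM users WHERE upper(name) = 'ALICE'")

# An index on the expression itself lets the wrapped predicate use an index again.
conn.execute("CREATE INDEX idx_users_upper_name ON users(upper(name))")
expression_plan = plan("SELECT * FROM users WHERE upper(name) = 'ALICE'")

print(plain_plan)
print(wrapped_plan)
print(expression_plan)
```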

10. Database Configuration:

Ensure the database system is configured properly for the hardware it’s running on. For example, memory and cache settings can significantly affect query performance.

Example of Optimized Query:

Non-Optimized Query:

SELECT * FROM orders
WHERE customer_id = 1001
AND order_date > '2023-01-01';

This query might perform a full table scan if customer_id and order_date are not indexed.

Optimized Query:

CREATE INDEX idx_customer_order_date ON orders(customer_id, order_date);

SELECT order_id, order_date, total_amount
FROM orders
WHERE customer_id = 1001
AND order_date > '2023-01-01';

In this optimized version, an index on customer_id and order_date helps the database efficiently filter the rows without scanning the entire table.
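The before-and-after difference can be checked by asking the database for its plan. This sketch uses SQLite through Python's `sqlite3` instead of PostgreSQL, so the plan text is SQLite's own wording; the column types are assumptions, since the original does not show the orders table definition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        order_date TEXT,
        total_amount REAL
    )
""")

QUERY = """
    SELECT order_id, order_date, total_amount
    FROM orders
    WHERE customer_id = 1001 AND order_date > '2023-01-01'
"""


def plan():
    """Return SQLite's plan for QUERY as a single string."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY))


plan_before = plan()  # no suitable index yet: full table scan
conn.execute("CREATE INDEX idx_customer_order_date ON orders(customer_id, order_date)")
plan_after = plan()   # composite index covers both predicates

print(plan_before)
print(plan_after)
```

The column order in the composite index matters: the equality column (customer_id) comes first and the range column (order_date) second, so the index can narrow to one customer and then walk the dates.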

Reference : Learnt from ChatGPT

Sugi
SQL – Postgres – Few Advance Topics
29 December 2024 at 09:31

By: Sugirtha

The order of execution in a SQL query:

FROM and/or JOIN
WHERE
GROUP BY
HAVING
SELECT
DISTINCT
ORDER BY
LIMIT and/or OFFSET

Command Types:

References : Aysha Beevi

CAST()

CAST converts a value to another data type; the shorthand ::target_type does the same.

SELECT 'The current date is: ' || CURRENT_DATE::TEXT;
SELECT '2024-12-21'::DATE::TEXT;
SELECT CAST('2024-12-21' AS DATE);

|| --> Concatenation operator

DATE functions:

SELECT CURRENT_DATE; -- Output: 2024-12-21
SELECT CURRENT_TIME; -- Output: 09:15:34.123456+05:30
SELECT NOW(); -- Output: 2024-12-21 09:15:34.123456+05:30
SELECT AGE(TIMESTAMP '2020-01-01', TIMESTAMP '2010-01-01'); -- Output: 10 years
SELECT AGE(TIMESTAMP '1990-05-15'); -- Output: 34 years 7 mons 6 days (calculated from CURRENT_DATE)
SELECT EXTRACT(YEAR FROM NOW()); -- Output: 2024
SELECT EXTRACT(MONTH FROM CURRENT_DATE); -- Output: 12
SELECT EXTRACT(DAY FROM TIMESTAMP '2024-12-25 10:15:00'); -- Output: 25

The DATE_TRUNC() function truncates a date or timestamp to the specified precision. This means it “resets” smaller parts of the date/time to their starting values.
SELECT DATE_TRUNC('month', TIMESTAMP '2024-12-21 10:45:30');
-- Output: 2024-12-01 00:00:00 --> The 'month' precision resets the day to the 1st, and the time to 00:00:00.
SELECT DATE_TRUNC('year', TIMESTAMP '2024-12-21 10:45:30');
-- Output: 2024-01-01 00:00:00
SELECT DATE_TRUNC('day', TIMESTAMP '2024-12-21 10:45:30');
-- Output: 2024-12-21 00:00:00
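The "reset smaller parts" behaviour of DATE_TRUNC can be mirrored in Python to make the rule explicit. This is a hypothetical helper covering only four precisions, written as an illustration of the semantics rather than a full equivalent of the PostgreSQL function.

```python
from datetime import datetime


def date_trunc(precision, ts):
    """Reset every field smaller than `precision` to its starting value."""
    if precision == "year":
        return ts.replace(month=1, day=1, hour=0, minute=0, second=0, microsecond=0)
    if precision == "month":
        return ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    if precision == "day":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if precision == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported precision: {precision}")


ts = datetime(2024, 12, 21, 10, 45, 30)
for precision in ("year", "month", "day", "hour"):
    print(precision, date_trunc(precision, ts))
```

A common use is grouping: truncating every timestamp to 'month' gives all rows in the same month an identical value that GROUP BY can aggregate on.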

SELECT NOW() + INTERVAL '1 year';
-- Output: Current timestamp + 1 year
SELECT CURRENT_DATE - INTERVAL '30 days';
-- Output: Today's date - 30 days
SELECT NOW() + INTERVAL '2 hours';
-- Output: Current timestamp + 2 hours
SELECT NOW() + INTERVAL '1 year' + INTERVAL '3 months' - INTERVAL '15 days';

Window Functions

A window function performs a calculation across a set of rows (the window) related to the current row, without collapsing them into a single row. Common window functions include ROW_NUMBER(), RANK(), SUM(), AVG(), etc.

PARTITION BY: (Optional) Divides the result set into partitions to which the window function is applied. Each partition is processed separately.
ORDER BY: (Optional) Orders the rows in each partition before the window function is applied.

-- window_function can be RANK(), SUM(), etc.
window_function() OVER (
    PARTITION BY column_name(s)
    ORDER BY column_name(s)
);

SELECT 
    department_id,
    employee_id,
    salary,
    SUM(salary) OVER (PARTITION BY department_id ORDER BY salary) AS running_total
FROM employees;
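One detail of the running-total query is worth trying out: with only `ORDER BY salary`, rows that tie on salary are treated as peers and receive the same running total. The sketch below runs on Python's built-in `sqlite3` (window functions need SQLite 3.25 or newer), which follows the same default frame rule as PostgreSQL here; the sample rows and the `strict_running_total` column are additions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (employee_id INTEGER, department_id INTEGER, salary INTEGER);
    INSERT INTO employees VALUES
        (1, 101, 50000), (2, 101, 50000), (3, 101, 70000),
        (4, 102, 60000), (5, 102, 75000);
""")

rows = conn.execute("""
    SELECT department_id, employee_id, salary,
           -- Ties on salary are peers, so they share one running total.
           SUM(salary) OVER (PARTITION BY department_id ORDER BY salary)
               AS running_total,
           -- Adding a unique tie-breaker gives a strictly row-by-row total.
           SUM(salary) OVER (PARTITION BY department_id ORDER BY salary, employee_id)
               AS strict_running_total
    FROM employees
    ORDER BY department_id, salary, employee_id
""").fetchall()

for row in rows:
    print(row)
```

Employees 1 and 2 both show 100000 in `running_total`, while `strict_running_total` steps through 50000 and then 100000. The total restarts for department 102 because of PARTITION BY.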

CURSOR:

DO $$
DECLARE
    emp_name VARCHAR;
    emp_salary DECIMAL;
    emp_cursor CURSOR FOR SELECT name, salary FROM employees;
BEGIN
    OPEN emp_cursor;
    LOOP
        FETCH emp_cursor INTO emp_name, emp_salary;
        EXIT WHEN NOT FOUND; -- Exit the loop when no rows are left
        RAISE NOTICE 'Employee: %, Salary: %', emp_name, emp_salary;
    END LOOP;
    CLOSE emp_cursor;
END;
$$;

Basic Data Types in PostgreSQL

TEXT, VARCHAR, CHAR: Working with strings.
INTEGER, BIGINT, NUMERIC: Handling numbers.
DATE, TIMESTAMP: Date and time handling.

OVER CLAUSE

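In PostgreSQL, the OVER() clause is used in window functions to define a window of rows over which a function operates. The query below assigns a serial number (ROW_NUMBER) starting from 1 to rows ordered by salary descending and keeps the top 5. The alias row_num cannot be used in the WHERE clause of the same SELECT, so the numbering is done in a subquery.
SELECT name, row_num
FROM (
    SELECT name, ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num
    FROM employees
) numbered
WHERE row_num <= 5;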

RANK()

Partition the table records by department id, then within each partition order by salary descending and assign ranks 1, 2, 3… With RANK(), rows with the same salary share a rank, and the following rank is skipped.

SELECT department_id, name, salary,
RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rank
FROM employees;
Output:
department_id   name      salary   rank
101             Charlie   70,000   1
101             Alice     50,000   2
101             Frank     50,000   2
102             Eve       75,000   1
102             Bob       60,000   2
103             David     55,000   1
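The way ties are handled is the main difference between RANK(), DENSE_RANK(), and ROW_NUMBER(). The sketch below compares all three on department 101 using Python's built-in `sqlite3` (SQLite 3.25 or newer). One row is an assumption: 'Gopal' with a salary of 40,000 is a hypothetical extra employee, added so that the gap after the tie becomes visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (department_id INTEGER, name TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        (101, 'Charlie', 70000), (101, 'Alice', 50000), (101, 'Frank', 50000),
        (101, 'Gopal', 40000);  -- hypothetical extra row to expose the gap
""")

rows = conn.execute("""
    SELECT name, salary,
           RANK()       OVER (ORDER BY salary DESC)       AS rnk,
           DENSE_RANK() OVER (ORDER BY salary DESC)       AS dense_rnk,
           ROW_NUMBER() OVER (ORDER BY salary DESC, name) AS row_num
    FROM employees
    WHERE department_id = 101
    ORDER BY salary DESC, name
""").fetchall()

for row in rows:
    print(row)
```

After the tie at rank 2, RANK() jumps to 4, DENSE_RANK() continues with 3, and ROW_NUMBER() never repeats a number at all.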

Divides employees into 3 equally sized buckets by salary (NTILE(4) would give quartiles).
SELECT id, name, salary,
NTILE(3) OVER (ORDER BY salary DESC) AS bucket
FROM employees;
Output:
id   name      salary   bucket
5    Eve       75,000   1
3    Charlie   70,000   1
2    Bob       60,000   2
4    David     55,000   2
1    Alice     50,000   3
6    Frank     50,000   3
Retrieves the first name in each department based on descending salary.
SELECT department_id, name, salary,
FIRST_VALUE(name) OVER (PARTITION BY department_id ORDER BY salary DESC) AS top_earner
FROM employees;
Output:
department_id   name      salary   top_earner
101             Charlie   70,000   Charlie
101             Alice     50,000   Charlie
101             Frank     50,000   Charlie
102             Eve       75,000   Eve
102             Bob       60,000   Eve
103             David     55,000   David

Rows are first read from the table in FROM, and then the WHERE condition is applied.

Window functions are computed only after WHERE has run, so RANK() cannot be called directly in the WHERE clause. Its value must first be stored in a result set, either a subquery or a CTE (Common Table Expression), and only then can it be filtered. With a CTE, the CTE is evaluated first and kept as a temporary result set, and the main SELECT reads from that result set.
Below, the ranking is done in a subquery, which is evaluated first; the outer query then filters on its rank column.

Top earner in each department, with department, name and salary (using the employees table above):
SELECT department_id, name, salary
FROM (
SELECT department_id, name, salary,
RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rank
FROM employees
) ranked_employees
WHERE rank = 1;

Output:
department_id  name     salary
101            Charlie  70,000
102            Eve      75,000
103            David    55,000

The same kind of filter written with a CTE. The query below returns the second-highest salary:

WITH RankedSalaries AS (
SELECT salary, RANK() OVER (ORDER BY salary DESC) AS rank
FROM employees
)
SELECT salary
FROM RankedSalaries WHERE rank = 2;

Here, RankedSalaries is a temporary result set or CTE (Common Table Expression)
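
One caveat with filtering on rank = 2: if two employees share the top salary, RANK() assigns 1, 1, 3 and no row has rank 2. DENSE_RANK() does not skip ranks, so it is a common alternative for "second-highest" queries. The sketch below is an illustration, not part of the original notes: it runs the CTE on an in-memory SQLite database (3.25 or newer) through Python's sqlite3 module, and the function name second_highest, the alias rnk, and the added DISTINCT are assumptions.

```python
import sqlite3


def second_highest(salaries, rank_fn="RANK"):
    """Return the salaries ranked 2nd, using RANK or DENSE_RANK."""
    if rank_fn not in ("RANK", "DENSE_RANK"):
        raise ValueError("rank_fn must be 'RANK' or 'DENSE_RANK'")
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (salary INTEGER)")
    conn.executemany(
        "INSERT INTO employees VALUES (?)", [(s,) for s in salaries]
    )
    # rank_fn is checked against a fixed list above, so interpolating it is safe.
    sql = f"""
        WITH RankedSalaries AS (
            SELECT salary, {rank_fn}() OVER (ORDER BY salary DESC) AS rnk
            FROM employees
        )
        SELECT DISTINCT salary
        FROM RankedSalaries
        WHERE rnk = 2
    """
    rows = conn.execute(sql).fetchall()
    conn.close()
    return [row[0] for row in rows]


print(second_highest([75000, 70000, 60000]))
print(second_highest([75000, 75000, 70000]))
print(second_highest([75000, 75000, 70000], "DENSE_RANK"))
```

The first call returns [70000]. The second returns an empty list because the tie at the top pushes the next rank to 3, and the third call returns [70000] again because DENSE_RANK() numbers the distinct salaries 1, 2 without gaps.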

Reference: Learnt from ChatGPT and Picture from Ms.Aysha

04. Data Integrity

Vijayan S

By: Vijayan S

20 November 2024 at 12:13

Data Integrity

Data integrity is the process of ensuring that the data in a database is correct, accurate, and consistent. It improves the reliability of the database and prevents problems caused by incorrect information.

Main types of data integrity:

Domain Integrity:
- Every column must follow the data type assigned to it.
- For example, an age column accepts only numbers, not letters.
Entity Integrity:
- Every record in every table must have a unique primary key.
- For example, in a school's student register, the student roll number can serve as the primary key.
Referential Integrity:
- A foreign key in one table must refer to the primary key of another table.
- For example, the customer ID in a sales table is a foreign key and must match the customer ID primary key in the customer details table.
Tuple Integrity:
- Every record in every table must be unique.
- For example, in an employee table, no two employees can have the same employee ID.
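
These rules are normally enforced with constraints. The sketch below shows one possible mapping, with a PRIMARY KEY for entity and tuple integrity, a CHECK for a domain rule, and REFERENCES for referential integrity. It uses Python's built-in sqlite3 module with an in-memory SQLite database so that it runs without a server; the table layout, the age rule, and the helper names make_db and attempt are assumptions made for illustration. PostgreSQL supports the same constraint keywords and, unlike SQLite, also rejects a non-numeric value in an integer column by type alone.

```python
import sqlite3


def make_db():
    """Create two hypothetical tables with integrity constraints."""
    conn = sqlite3.connect(":memory:")
    # SQLite-specific switch; PostgreSQL enforces foreign keys without it.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE customers ("
        " customer_id INTEGER PRIMARY KEY,"        # entity / tuple integrity
        " age INTEGER NOT NULL CHECK (age >= 0))"  # domain rule
    )
    conn.execute(
        "CREATE TABLE sales ("
        " sale_id INTEGER PRIMARY KEY,"
        # referential integrity: every sale must point at an existing customer
        " customer_id INTEGER NOT NULL REFERENCES customers (customer_id))"
    )
    return conn


def attempt(conn, sql):
    """Run one INSERT and report whether a constraint rejected it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


conn = make_db()
print(attempt(conn, "INSERT INTO customers VALUES (1, 30)"))  # valid row
print(attempt(conn, "INSERT INTO customers VALUES (1, 41)"))  # duplicate key
print(attempt(conn, "INSERT INTO customers VALUES (2, -5)"))  # breaks CHECK
print(attempt(conn, "INSERT INTO sales VALUES (1, 99)"))      # unknown customer
print(attempt(conn, "INSERT INTO sales VALUES (1, 1)"))       # valid reference
```

Only the first and last inserts are accepted; the other three are rejected by the primary key, the CHECK constraint, and the foreign key respectively.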

Benefits of data integrity:

Improves data accuracy and reliability.
Prevents problems caused by incorrect information.
Improves database performance.
Increases data security.

Data integrity is one of the most important aspects of database management systems (DBMS). It ensures that the database works correctly and helps avoid data loss and problems caused by incorrect information.
