
🎯 PostgreSQL Zero to Hero with Parottasalna – 2 Day Bootcamp (FREE!) 🚀

2 March 2025 at 07:09

Databases power the backbone of modern applications, and PostgreSQL is one of the most powerful open-source relational databases trusted by top companies worldwide. Whether you’re a beginner or a developer looking to sharpen your database skills, this FREE bootcamp will take you from Zero to Hero in PostgreSQL!

What You’ll Learn

✅ PostgreSQL fundamentals & installation

✅ Postgres Architecture
✅ Writing optimized queries
✅ Indexing & performance tuning
✅ Transactions & locking mechanisms
✅ Advanced joins, CTEs & subqueries
✅ Real-world best practices & hands-on exercises

This intensive, hands-on bootcamp is designed for developers, DBAs, and tech enthusiasts who want to master PostgreSQL from scratch and apply it in real-world scenarios.

Who Should Attend?

🔹 Beginners eager to learn databases
🔹 Developers & Engineers working with PostgreSQL
🔹 Anyone looking to optimize their SQL skills

📅 Date: March 22, 23 -> (Moved to April 5, 6)
⏰ Time: Will be finalized later.
📍 Location: Online
💰 Cost: 100% FREE 🎉

🔗 RSVP Here

Prerequisite

  1. Check out this playlist of our previous Postgres sessions: https://www.youtube.com/playlist?list=PLiutOxBS1Miy3PPwxuvlGRpmNo724mAlt

🎉 This bootcamp is completely FREE – Learn without any cost! 🎉

💡 Spots are limited – RSVP now to reserve your seat!

Effortless Data Storage with LocalBase and IndexedDB

22 February 2025 at 04:42

IndexedDB is a powerful client-side database API for storing structured data in browsers. However, its API is complex, requiring transactions, object stores, and cursors to manage data. LocalBase simplifies IndexedDB by providing an intuitive, promise-based API.

In this blog, we’ll explore LocalBase, its features, and how to use it effectively in web applications.

What is LocalBase?

LocalBase is an easy-to-use JavaScript library that simplifies IndexedDB interactions. It provides a syntax similar to Firestore, making it ideal for developers familiar with Firebase.

✅ Key Features

  • Promise-based API
  • Simple CRUD operations
  • No need for manual transaction handling
  • Works seamlessly in modern browsers

Installation

You can install LocalBase via npm or include it directly via a script tag.

Using npm

npm install localbase

Using CDN

https://cdn.jsdelivr.net/npm/localbase/dist/localbase.min.js

Getting Started with LocalBase

First, initialize the database

let db = new Localbase('myDatabase')




Adding Data

You can add records to a collection,

db.collection('users').add({
  id: 1,
  name: 'John Doe',
  age: 30
})

Fetching Data

Retrieve all records from a collection

db.collection('users').get().then(users => {
  console.log(users)
})

Updating Data

To update a record

db.collection('users').doc({ id: 1 }).update({
  age: 31
})

Deleting Data

Delete a specific document

db.collection('users').doc({ id: 1 }).delete()

Or delete the entire collection

db.collection('users').delete()

Advanced LocalBase Functionalities

1. Updating Data in LocalBase

LocalBase allows updating specific fields in a document without overwriting the entire record.

Basic Update Example

db.collection('users').doc({ id: 1 }).update({
  age: 31
})

🔹 This updates only the age field while keeping other fields unchanged.

Updating Multiple Fields

db.collection('users').doc({ id: 1 }).update({
  age: 32,
  city: 'New York'
})

🔹 The city field is added, and age is updated.

Handling Non-Existing Documents

If the document doesn’t exist, LocalBase won’t create it automatically. You can handle this with .get()

db.collection('users').doc({ id: 2 }).get().then(user => {
  if (user) {
    db.collection('users').doc({ id: 2 }).update({ age: 25 })
  } else {
    db.collection('users').add({ id: 2, name: 'Alice', age: 25 })
  }
})

2. Querying and Filtering Data

You can fetch documents based on conditions.

Get All Documents in a Collection

db.collection('users').get().then(users => {
  console.log(users)
})

Get a Single Document

db.collection('users').doc({ id: 1 }).get().then(user => {
  console.log(user)
})

Filter with Conditions

db.collection('users').get().then(users => {
  let filteredUsers = users.filter(user => user.age > 25)
  console.log(filteredUsers)
})

🔹 Since LocalBase doesn’t support native where queries, you need to filter manually.

3. Handling Transactions

LocalBase handles transactions internally, so you don’t need to worry about opening and closing them. However, you should use .then() to ensure operations complete before the next action.

Example: Sequential Updates

db.collection('users').doc({ id: 1 }).update({ age: 32 }).then(() => {
  return db.collection('users').doc({ id: 1 }).update({ city: 'Los Angeles' })
}).then(() => {
  console.log('Update complete')
})

🔹 This ensures that the age field is updated before adding the city field.

4. Clearing and Deleting Data

Deleting a Single Document

db.collection('users').doc({ id: 1 }).delete()

Deleting an Entire Collection

db.collection('users').delete()

Clearing All Data

db.delete()

🔹 This removes everything from the database!

5. Using LocalBase in Real-World Scenarios

Offline Caching for a To-Do List

db.collection('tasks').add({ id: 1, title: 'Buy groceries', completed: false })

Later, when the app is online, you can sync it with a remote database.

User Preferences Storage

db.collection('settings').doc({ theme: 'dark' }).update({ fontSize: '16px' })

🔹 Stores user settings locally, ensuring a smooth UX.

LocalBase makes IndexedDB developer-friendly with


✅ Easy updates without overwriting entire documents
✅ Simple filtering with JavaScript functions
✅ Automatic transaction handling
✅ Efficient storage for offline-first apps

For more details, check out the official repository:
🔗 GitHub – LocalBase

Got inspired by 450dsa.com.

The Pros and Cons of LocalStorage in Modern Web Development

20 February 2025 at 16:26

Introduction

The Web Storage API is a set of mechanisms that enable browsers to store key-value pairs. Before HTML5, application data had to be stored in cookies and included in every server request. Web storage is intended to be far more user-friendly than using cookies.

Web storage is more secure, and large amounts of data can be stored locally, without affecting website performance.

There are 2 types of web storage,

  1. Local Storage
  2. Session Storage

We already have cookies. Why additional objects?

Unlike cookies, web storage objects are not sent to the server with each request. Because of that, we can store much more. Most modern browsers allow at least 5 megabytes of data (or more) and have settings to configure that.

Also unlike cookies, the server can’t manipulate storage objects via HTTP headers; everything’s done in JavaScript. The storage is bound to the origin (domain/protocol/port triplet). That is, different protocols or subdomains get different storage objects, and they can’t access each other’s data.

In this guide, you will learn/refresh about LocalStorage.

LocalStorage

The localStorage property of the window (browser window object) interface allows you to access a Storage object for the Document’s origin; the stored data is saved across browser sessions.

  1. Data is kept in local storage for a long time, with no expiration date. It could be retained for one day, one week, or even one year, as per the developer’s preference (data in local storage is maintained even if the browser is closed).
  2. Local storage only stores strings. So, if you intend to store objects, lists or arrays, you must convert them into a string using JSON.stringify().
  3. Local storage is available via the window.localStorage property.
  4. What’s interesting is that the data survives a page refresh (for sessionStorage) and even a full browser restart (for localStorage).

Functionalities

// setItem normal strings
window.localStorage.setItem("name", "goku");

// getItem 
const name = window.localStorage.getItem("name");
console.log("name from localstorage, "+name);

// Storing an Object without JSON stringify

const data = {
  "commodity":"apple",
  "price":43
};
window.localStorage.setItem('commodity', data);
var result = window.localStorage.getItem('commodity');
console.log("Retrived data without jsonified, "+ result);

// Storing an object after converting to JSON string. 
var jsonifiedString = JSON.stringify(data);
window.localStorage.setItem('commodity', jsonifiedString);
var result = window.localStorage.getItem('commodity');
console.log("Retrived data after jsonified, "+ result);

// remove item 
window.localStorage.removeItem("commodity");
var result = window.localStorage.getItem('commodity');
console.log("Data after removing the key "+ result);

//length
console.log("length of local storage " + window.localStorage.length);

// clear
window.localStorage.clear();
console.log("length of local storage - after clear " + window.localStorage.length);

When to use Local Storage

  1. Data stored in Local Storage can be easily accessed by third parties (any script running on the page).
  2. So it’s important that sensitive data is never stored in Local Storage.
  3. Local Storage can help in storing temporary data before it is pushed to the server.
  4. Always clear local storage once the operation is completed.

Where is local storage saved?

Windows

  • Firefox: C:\Users\\AppData\Roaming\Mozilla\Firefox\Profiles\\webappsstore.sqlite, %APPDATA%\Mozilla\Firefox\Profiles\\webappsstore.sqlite
  • Chrome: %LocalAppData%\Google\Chrome\User Data\Default\Local Storage\

Linux

  • Firefox: ~/.mozilla/firefox//webappsstore.sqlite
  • Chrome: ~/.config/google-chrome/Default/Local Storage/

Mac

  • Firefox: ~/Library/Application Support/Firefox/Profiles//webappsstore.sqlite, ~/Library/Mozilla/Firefox/Profiles//webappsstore.sqlite
  • Chrome: ~/Library/Application Support/Google/Chrome//Local Storage/, ~/Library/Application Support/Google/Chrome/Default/Local Storage/

Downsides of LocalStorage

The majority of local storage’s drawbacks aren’t really significant. You can still use it; your app will just run a little slower and you’ll experience a tiny developer inconvenience. Security, however, is a different matter. Knowing and understanding the security model of local storage is crucial, since it can affect your website in ways you might not have anticipated.

Local storage also has the drawback of being insecure, in every sense. Anyone who stores sensitive information in local storage, such as session data, user information, credit card information (even momentarily!), or anything else you wouldn’t want shared publicly on social media, is doing it incorrectly.

Local storage in the browser was never intended to be a secure store. It was intended to be a straightforward key/value store for strings only, one that programmers could use to build somewhat more complicated single-page apps.

General Preventions

  1. For example, if we are using third-party JavaScript libraries and they are injected with scripts that extract the storage objects, our storage data won’t be secure anymore. Therefore it’s not recommended to save sensitive data such as
    • Username/Password
    • Credit card info
    • JWT tokens
    • API keys
    • Personal info
    • Session ids
  2. Do not use the same origin for multiple web applications. Use subdomains instead, since otherwise the storage will be shared by all of them. Each subdomain gets its own unique localStorage, and applications on different subdomains can’t access each other’s data.
  3. Once some data are stored in Local storage, the developers don’t have any control over it until the user clears it. If you want the data to be removed once the session ends, use SessionStorage.
  4. Validate, encode and escape data read from browser storage
  5. Encrypt data before saving

Avoid Cache Pitfalls: Key Problems and Fixes

16 February 2025 at 09:22

Caching is an essential technique for improving application performance and reducing the load on databases. However, improper caching strategies can lead to serious issues.

I got inspired by this ByteByteGo post: https://www.linkedin.com/posts/bytebytego_systemdesign-coding-interviewtips-activity-7296767687978827776-Dizz

In this blog, we will discuss four common cache problems: Thundering Herd Problem, Cache Penetration, Cache Breakdown, and Cache Crash, along with their causes, consequences, and solutions.

  1. Thundering Herd Problem
    1. What is it?
    2. Example Scenario
    3. Solutions
  2. Cache Penetration
    1. What is it?
    2. Example Scenario
    3. Solutions
  3. Cache Breakdown
    1. What is it?
    2. Example Scenario
    3. Solutions
  4. Cache Crash
    1. What is it?
    2. Example Scenario
    3. Solutions

Thundering Herd Problem

What is it?

The Thundering Herd Problem occurs when a large number of keys in the cache expire at the same time. When this happens, all requests bypass the cache and hit the database simultaneously, overwhelming it and causing performance degradation or even a system crash.

Example Scenario

Imagine an e-commerce website where product details are cached for 10 minutes. If all the products’ cache expires at the same time, thousands of users sending requests will cause an overwhelming load on the database.

Solutions

  1. Staggered Expiration: Instead of setting a fixed expiration time for all keys, introduce a random expiry variation.
  2. Allow Only Core Business Queries: Limit direct database access only to core business data, while returning stale data or temporary placeholders for less critical data.
  3. Lazy Rebuild Strategy: Instead of all requests querying the database, the first request fetches data and updates the cache while others wait.
  4. Batch Processing: Queue multiple requests and process them in batches to reduce database load.

Cache Penetration

What is it?

Cache Penetration occurs when requests are made for keys that neither exist in the cache nor in the database. Since these requests always hit the database, they put excessive pressure on the system.

Example Scenario

A malicious user could attempt to query random user IDs that do not exist, forcing the system to repeatedly query the database and skip the cache.

Solutions

  1. Cache Null Values: If a key does not exist in the database, store a null value in the cache to prevent unnecessary database queries.
  2. Use a Bloom Filter: A Bloom filter helps check whether a key exists before querying the database. If the Bloom filter does not contain the key, the request is discarded immediately.
  3. Rate Limiting: Implement request throttling to prevent excessive access to non-existent keys.
  4. Data Prefetching: Predict and load commonly accessed data into the cache before it is needed.

Cache Breakdown

What is it?

Cache Breakdown is similar to the Thundering Herd Problem, but it occurs specifically when a single hot key (a frequently accessed key) expires. This results in a surge of database queries as all users try to retrieve the same data.

Example Scenario

A social media platform caches trending hashtags. If the cache expires, millions of users will query the same hashtag at once, hitting the database hard.

Solutions

  1. Never Expire Hot Keys: Keep hot keys permanently in the cache unless an update is required.
  2. Preload the Cache: Refresh the cache asynchronously before expiration by setting a background task to update the cache regularly.
  3. Mutex Locking: Ensure only one request updates the cache, while others wait for the update to complete.
  4. Double Buffering: Maintain a secondary cache layer to serve requests while the primary cache is being refreshed.

Cache Crash

What is it?

A Cache Crash occurs when the cache service itself goes down. When this happens, all requests fall back to the database, overloading it and causing severe performance issues.

Example Scenario

If a Redis instance storing session data for a web application crashes, all authentication requests will be forced to hit the database, leading to a potential outage.

Solutions

  1. Cache Clustering: Use a cluster of cache nodes instead of a single instance to ensure high availability.
  2. Persistent Storage for Cache: Enable persistence modes like Redis RDB or AOF to recover data quickly after a crash.
  3. Automatic Failover: Configure automated failover with tools like Redis Sentinel to ensure availability even if a node fails.
  4. Circuit Breaker Mechanism: Prevent the application from directly accessing the database if the cache is unavailable, reducing the impact of a crash.

A simplified sketch of such a circuit breaker in Python:

class CircuitBreaker:
    def __init__(self, failure_threshold=5):
        self.failure_count = 0
        self.failure_threshold = failure_threshold

    def call(self, func, *args, **kwargs):
        # Once too many consecutive failures are seen, stop hitting the backing service
        if self.failure_count >= self.failure_threshold:
            return "Service unavailable"
        try:
            result = func(*args, **kwargs)
            self.failure_count = 0  # a successful call resets the failure counter
            return result
        except Exception:
            self.failure_count += 1
            return "Error"

Caching is a powerful mechanism to improve application performance, but improper strategies can lead to severe bottlenecks. Problems like Thundering Herd, Cache Penetration, Cache Breakdown, and Cache Crash can significantly degrade system reliability if not handled properly.

About SQL

By: vsraj80
31 January 2025 at 16:15

SQL – Structured Query Language

RDBMS – Relational Database Management System

MySQL is free and open source software.

MySQL Client – front end; MySQL Server – back end

Functions of SQL Client

1. Validating the password and authenticating the user

2. Receiving input from the client end, converting it into tokens and sending them to the SQL server

3. Returning the results from the SQL server to the user

Functions of SQL Server

The SQL server receives the request from the client and returns the response after processing. It consists of 2 major parts:

1. Management Layer

a. Decoding the data

b. Validating and parsing (analyzing) the data

c. Sending the cached queries to the Storage Engine

2. Storage Engine

a. Managing databases, tables and indexes

b. Sending the data to other shared SQL servers

Install MySQL on Ubuntu

sudo apt-get install mysql-server

To secure the installation, configure as below

sudo mysql_secure_installation

1. Removes anonymous users

2. Allows root login only from the local host

3. Removes the test database

MySQL Configuration options

/etc/mysql is the MySQL configuration directory

To Start MySQL

sudo service mysql start

To Stop MySQL

sudo service mysql stop

To Restart MySQL

sudo service mysql restart

MySQL Clients

Normally we use MySQL from the command line.

But on Linux we can also access it through the following GUI tools.

MySQL Workbench

sudo apt-get install mysql-workbench

MySQL Navigator

sudo apt-get install mysql-navigator

EMMA

sudo apt-get install emma

phpMyAdmin

sudo aptitude install phpmyadmin

MySQL Admin

sudo apt-get install mysql-admin

Kinds of MySQL Clients

1. GUI-based desktop applications

2. Web-based applications

3. Shell-based applications (text-only applications)

To connect to the server with the MySQL client

mysql -u root -p

To connect with a particular host, user name, and database name

mysql -h <host> -u <user> -p <database>

Here -h sets the host, -u the user name, and -p prompts for the password; the last argument is the database to use.

If the host, user name, or password is not given, it will default to the local server, the current Unix user name, and no password for authentication.

To find more options about mysql

mysql -?

To disconnect the client from the server

exit

Pages 33 to 39 need to be read and understood again.

Learning Notes #63 – Change Data Capture. What does it do ?

19 January 2025 at 16:22

A few days back I came across the concept of CDC. It is like a notifier of database events: instead of polling, it makes each event available in a queue, which can be consumed by many consumers. In this blog, I try to explain the concepts and types in a theoretical manner.

You run a library. Every day, books are borrowed, returned, or new books are added. What if you wanted to keep a live record of all these activities so you always know the exact state of your library?

This is essentially what Change Data Capture (CDC) does for your databases. It’s a way to track changes (like inserts, updates, or deletions) in your database tables and send them to another system, like a live dashboard or a backup system. (Might be a bad example. Don’t lose hope. Continue …)

CDC is widely used in modern technology to power,

  • Real-Time Analytics: Live dashboards that show sales, user activity, or system performance.
  • Data Synchronization: Keeping multiple databases or microservices in sync.
  • Event-Driven Architectures: Triggering notifications, workflows, or downstream processes based on database changes.
  • Data Pipelines: Streaming changes to data lakes or warehouses for further processing.
  • Backup and Recovery: Incremental backups by capturing changes instead of full data dumps.

It’s a critical part of tools like Debezium, Kafka, and cloud services such as AWS Database Migration Service (DMS) and Azure Data Factory. CDC enables companies to move towards real-time data-driven decision-making.

What is CDC?

CDC stands for Change Data Capture. It’s a technique that listens to a database and captures every change that happens in it. These changes can then be sent to other systems to,

  • Keep data in sync across multiple databases.
  • Power real-time analytics dashboards.
  • Trigger notifications for certain database events.
  • Process data streams in real time.

In short, CDC ensures your data is always up-to-date wherever it’s needed.

Why is CDC Useful?

Imagine you have an online store. Whenever someone,

  • Places an order,
  • Updates their shipping address, or
  • Cancels an order,

you need these changes to be reflected immediately across,

  • The shipping system.
  • The inventory system.
  • The email notification service.

Instead of having all these systems constantly query the database (which is slow and inefficient, and one of the main reasons CDC exists), CDC automatically streams these changes to the relevant systems.

This means,

  1. Real-Time Updates: Systems receive changes instantly.
  2. Improved Performance: Your database isn’t overloaded with repeated queries.
  3. Consistency: All systems stay in sync without manual intervention.

How Does CDC Work?

Note: I haven’t yet tried all of these, but I have a conceptual feel for them.

CDC relies on tracking changes in your database. There are a few ways to do this,

1. Query-Based CDC

This method repeatedly checks the database for changes. For example:

  • Every 5 minutes, it queries the database: “What changed since my last check?”
  • Any new or modified data is identified and processed.

Drawbacks: This can miss changes if the timing isn’t right, and it’s not truly real-time (Long Polling).
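
A minimal sketch of such a polling query, assuming the table has an updated_at timestamp column that is set on every insert or update:

-- Assumes an updated_at column that the application (or a trigger) sets on every change
SELECT *
FROM orders
WHERE updated_at > '2025-01-19 16:00:00';  -- timestamp recorded at the previous poll

Each poll remembers the highest timestamp it has seen and uses it as the lower bound for the next run.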

2. Log-Based CDC

Most modern databases (like PostgreSQL or MySQL) keep logs of every operation. Log-based CDC listens to these logs and captures changes as they happen.

Advantages

  • It’s real-time.
  • It’s lightweight since it doesn’t query the database directly.
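
To see log-based CDC in action without extra tooling, PostgreSQL exposes its WAL through logical decoding. A quick sketch, assuming wal_level is set to logical (the slot name below is just an example):

-- 'cdc_demo_slot' is an arbitrary example slot name; test_decoding is a built-in output plugin
SELECT * FROM pg_create_logical_replication_slot('cdc_demo_slot', 'test_decoding');

-- After making any change to a table in this database, read it back from the WAL
SELECT * FROM pg_logical_slot_get_changes('cdc_demo_slot', NULL, NULL);

-- Drop the slot when done, so WAL is not retained indefinitely
SELECT pg_drop_replication_slot('cdc_demo_slot');

Tools like Debezium use the same mechanism, just with a richer output plugin and delivery to a message broker.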

3. Trigger-Based CDC

In this method, the database uses triggers to log changes into a separate table. Whenever a change occurs, a trigger writes a record of it.

Advantages: Simple to set up.

Drawbacks: Can slow down the database if not carefully managed.
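
A minimal sketch of trigger-based CDC, assuming a hypothetical orders table with an integer order_id column (the audit table, function, and trigger names are example names):

-- Change-log table that the trigger writes into
CREATE TABLE orders_audit (
    audit_id   SERIAL PRIMARY KEY,
    operation  TEXT NOT NULL,
    order_id   INT,
    changed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE OR REPLACE FUNCTION log_order_change()
RETURNS TRIGGER AS $$
BEGIN
    IF TG_OP = 'DELETE' THEN
        INSERT INTO orders_audit (operation, order_id) VALUES (TG_OP, OLD.order_id);
    ELSE
        INSERT INTO orders_audit (operation, order_id) VALUES (TG_OP, NEW.order_id);
    END IF;
    RETURN NULL;  -- return value of an AFTER trigger is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER orders_cdc_trigger
AFTER INSERT OR UPDATE OR DELETE ON orders
FOR EACH ROW EXECUTE FUNCTION log_order_change();

Downstream consumers can then read (and purge) orders_audit instead of polling the main table.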

Tools That Make CDC Easy

Several tools simplify CDC implementation. Some popular ones are,

  1. Debezium: Open-source and widely used for log-based CDC with databases like PostgreSQL, MySQL, and MongoDB.
  2. Striim: A commercial tool for real-time data integration.
  3. AWS Database Migration Service (DMS): A cloud-based CDC service.
  4. StreamSets: Another tool for real-time data movement.

These tools integrate with databases, capture changes, and deliver them to systems like RabbitMQ, Kafka, or cloud storage.

To help visualize CDC, think of,

  • Social Media Feeds: When someone likes or comments on a post, you see the update instantly. This is CDC in action.
  • Bank Notifications: Whenever you make a transaction, your bank app updates instantly. Another example of CDC.

In upcoming blogs, I will include a Debezium implementation of CDC.

Learning Notes #57 – Partial Indexing in Postgres

16 January 2025 at 14:36

Today, I learnt about partial indexing in Postgres and how it optimizes indexing to filter a subset of a table more efficiently. In this blog, I jot down notes on partial indexing.

Partial indexing in PostgreSQL is a powerful feature that provides a way to optimize database performance by creating indexes that apply only to a subset of a table’s rows. This selective indexing can result in reduced storage space, faster index maintenance, and improved query performance, especially when queries frequently involve filters or conditions that only target a portion of the data.

An index in PostgreSQL, like in other relational database management systems, is a data structure that improves the speed of data retrieval operations. However, creating an index on an entire table can sometimes be inefficient, especially when dealing with very large datasets where queries often focus on specific subsets of the data. This is where partial indexing becomes invaluable.

Unlike a standard index that covers every row in a table, a partial index only includes rows that satisfy a specified condition. This condition is defined using a WHERE clause when the index is created.

To understand the mechanics, let us consider a practical example.

Suppose you have a table named orders that stores details about customer orders, including columns like order_id, customer_id, order_date, status, and total_amount. If the majority of your queries focus on pending orders (those where the status is 'pending'), creating a partial index specifically for these rows can significantly improve performance.

Example 1:

Here’s how you can create such an index,

CREATE INDEX idx_pending_orders
ON orders (order_date)
WHERE status = 'pending';

In this example, the index idx_pending_orders includes only the rows where status equals pending. This means that any query that involves filtering by status = 'pending' and utilizes the order_date column will leverage this index. For instance, the following query would benefit from the partial index,

SELECT *
FROM orders
WHERE status = 'pending'
AND order_date > '2025-01-01';

The benefits of this approach are significant. By indexing only the rows with status = 'pending', the size of the index is much smaller compared to a full table index.

This reduction in size not only saves disk space but also speeds up the process of scanning the index, as there are fewer entries to traverse. Furthermore, updates or modifications to rows that do not meet the WHERE condition are excluded from index maintenance, thereby reducing the overhead of maintaining the index and improving performance for write operations.
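
To confirm that the planner actually uses the partial index, you can inspect the plan with EXPLAIN. With representative statistics, the plan should show an index scan (or bitmap index scan) on idx_pending_orders rather than a sequential scan over the whole table.

EXPLAIN
SELECT *
FROM orders
WHERE status = 'pending'
  AND order_date > '2025-01-01';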

Example 2:

Let us explore another example. Suppose your application frequently queries orders that exceed a certain total amount. You can create a partial index tailored to this use case,

CREATE INDEX idx_high_value_orders
ON orders (customer_id)
WHERE total_amount > 1000;

This index would optimize queries like the following,

SELECT *
FROM orders
WHERE total_amount > 1000
AND customer_id = 123;

The key advantage here is that the index only includes rows where total_amount > 1000. For datasets with a wide range of order amounts, this can dramatically reduce the number of indexed entries. Queries that filter by high-value orders become faster because the database does not need to sift through irrelevant rows.

Additionally, as with the previous example, index maintenance is limited to the subset of rows matching the condition, improving overall performance for insertions and updates.

Partial indexes are also useful for enforcing constraints in a selective manner. Consider a scenario where you want to ensure that no two active promotions exist for the same product. You can achieve this using a unique partial index

CREATE UNIQUE INDEX idx_unique_active_promotion
ON promotions (product_id)
WHERE is_active = true;

This index guarantees that only one row with is_active = true can exist for each product_id.

In conclusion, partial indexing in PostgreSQL offers a flexible and efficient way to optimize database performance by targeting specific subsets of data.

Learning Notes #51 – Postgres as a Queue using SKIP LOCKED

11 January 2025 at 06:56

Yesterday, I came across a blog from inferable.ai, https://www.inferable.ai/blog/posts/postgres-skip-locked, which walks through using Postgres as a queue. In this blog, I jot down notes on using Postgres as a queue for future reference.

PostgreSQL is a robust relational database that can be used for more than just storing structured data. With the SKIP LOCKED feature introduced in PostgreSQL 9.5, you can efficiently turn a PostgreSQL table into a job queue for distributed processing.

Why Use PostgreSQL as a Queue?

Using PostgreSQL as a queue can be advantageous because,

  • Familiarity: If you’re already using PostgreSQL, there’s no need for an additional message broker.
  • Durability: PostgreSQL ensures ACID compliance, offering reliability for your job processing.
  • Simplicity: No need to manage another component like RabbitMQ or Kafka

Implementing a Queue with SKIP LOCKED

1. Create a Queue Table

To start, you need a table to store the jobs,


CREATE TABLE job_queue (
    id SERIAL PRIMARY KEY,
    job_data JSONB NOT NULL,
    status TEXT DEFAULT 'pending',
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

This table has the following columns,

  • id: A unique identifier for each job.
  • job_data: The data or payload for the job.
  • status: Tracks the job’s state (‘pending’, ‘in_progress’, or ‘completed’).
  • created_at: Timestamp of job creation.

2. Insert Jobs into the Queue

Adding jobs is straightforward,


INSERT INTO job_queue (job_data)
VALUES ('{"task": "send_email", "email": "user@example.com"}');

3. Fetch Jobs for Processing with SKIP LOCKED

Workers will fetch jobs from the queue using SELECT ... FOR UPDATE SKIP LOCKED to avoid contention,

WITH next_job AS (
    SELECT id, job_data
    FROM job_queue
    WHERE status = 'pending'
    FOR UPDATE SKIP LOCKED
    LIMIT 1
)
UPDATE job_queue
SET status = 'in_progress'
FROM next_job
WHERE job_queue.id = next_job.id
RETURNING job_queue.id, job_queue.job_data;

Key Points:

  • FOR UPDATE locks the selected row to prevent other workers from picking it up.
  • SKIP LOCKED ensures locked rows are skipped, enabling concurrent workers to operate without waiting.
  • LIMIT 1 processes one job at a time per worker.

4. Mark Jobs as Completed

Once a worker finishes processing a job, it should update the job’s status,


UPDATE job_queue
SET status = 'completed'
WHERE id = $1; -- Replace $1 with the job ID

5. Delete Old or Processed Jobs

To keep the table clean, you can periodically remove completed jobs,


DELETE FROM job_queue
WHERE status = 'completed' AND created_at < NOW() - INTERVAL '30 days';

Example Worker Implementation

Here’s an example of a worker implemented in Python using psycopg2


import time

import psycopg2
from psycopg2.extras import RealDictCursor

connection = psycopg2.connect("dbname=yourdb user=youruser")

while True:
    with connection.cursor(cursor_factory=RealDictCursor) as cursor:
        cursor.execute(
            """
            WITH next_job AS (
                SELECT id, job_data
                FROM job_queue
                WHERE status = 'pending'
                FOR UPDATE SKIP LOCKED
                LIMIT 1
            )
            UPDATE job_queue
            SET status = 'in_progress'
            FROM next_job
            WHERE job_queue.id = next_job.id
            RETURNING job_queue.id, job_queue.job_data;
            """
        )

        job = cursor.fetchone()
        if job:
            print(f"Processing job {job['id']}: {job['job_data']}")

            # Simulate job processing
            cursor.execute("UPDATE job_queue SET status = 'completed' WHERE id = %s", (job['id'],))

        else:
            print("No jobs available. Sleeping...")
            time.sleep(5)

    connection.commit()

Considerations

  1. Transaction Isolation: Use the REPEATABLE READ or SERIALIZABLE isolation level cautiously to avoid unnecessary locks.
  2. Row Locking: SKIP LOCKED only skips rows locked by other transactions, not those locked within the same transaction.
  3. Performance: Regularly archive or delete old jobs to prevent the table from growing indefinitely. Consider indexing the status column to improve query performance.
  4. Fault Tolerance: Ensure that workers handle crashes or timeouts gracefully. Use a timeout mechanism to revert jobs stuck in the ‘in_progress’ state.
  5. Scaling: Distribute workers across multiple nodes to handle a higher job throughput.
  6. The SKIP LOCKED clause only applies to row-level locks – the required ROW SHARE table-level lock is still taken normally.
  7. Using SKIP LOCKED provides an inconsistent view of the data by design. This is why it’s perfect for queue-like tables where we want to distribute work, but not suitable for general purpose work where consistency is required.

Learning Notes #50 – Fixed Partition Pattern | Distributed Pattern

9 January 2025 at 16:51

Today, I learnt about fixed partitioning and how it balances data among servers without heavy movement of data. In this blog, I jot down notes on how fixed partitioning helps in solving the problem.

This entire blog is inspired from https://www.linkedin.com/pulse/distributed-systems-design-pattern-fixed-partitions-retail-kumar-v-c34pc/?trackingId=DMovSwEZSfCzKZEKa7yJrg%3D%3D

Problem Statement

In a distributed key-value store system, data items need to be mapped to a set of cluster nodes to ensure efficient storage and retrieval. The system must satisfy the following requirements,

  1. Uniform Distribution: Data should be evenly distributed across all cluster nodes to avoid overloading any single node.
  2. Deterministic Mapping: Given a data item, the specific node responsible for storing it should be determinable without querying all the nodes in the cluster.

A common approach to achieve these goals is to use hashing with a modulo operation. For example, if there are three nodes in the cluster, the key is hashed, and the hash value modulo the number of nodes determines the node to store the data. However, this method has a critical drawback,

Rebalancing Issue: When the cluster size changes (e.g., nodes are added or removed), the mapping for most keys changes. This requires the system to move almost all the data to new nodes, leading to significant overhead in terms of time and resources, especially when dealing with large data volumes.

Challenge: How can we design a mapping mechanism that minimizes data movement during cluster size changes while maintaining uniform distribution and deterministic mapping?

Solution

There is a concept of Fixed Partitioning,

What Is Fixed Partitioning?

This pattern organizes data into a predefined number of fixed partitions that remain constant over time. Data is assigned to these partitions using a hashing algorithm, ensuring that the mapping of data to partitions is permanent. The system separates the fixed partitioning of data from the physical servers managing these partitions, enabling seamless scaling.

Key Features of Fixed Partitioning

  1. Fixed Number of Partitions
    • The number of partitions is determined during system initialization (e.g., 8 partitions).
    • Data is assigned to these partitions based on a consistent hashing algorithm.
  2. Stable Data Mapping
    • Each piece of data is permanently mapped to a specific partition.
    • This eliminates the need for large-scale data reshuffling when scaling the system.
  3. Adjustable Partition-to-Server Mapping
    • Partitions can be reassigned to different servers as the system scales.
    • Only the physical location of the partitions changes; the fixed mapping remains intact.
  4. Balanced Load Distribution
    • Partitions are distributed evenly across servers to balance the workload.
    • Adding new servers involves reassigning partitions without moving or reorganizing data within the partitions.

Naive Example

We have a banking system with transactions stored in 8 fixed partitions, distributed based on a customer’s account ID.


CREATE TABLE transactions (
    id SERIAL PRIMARY KEY,
    account_id INT NOT NULL,
    transaction_amount NUMERIC(10, 2) NOT NULL,
    transaction_date DATE NOT NULL
) PARTITION BY HASH (account_id);

1. Create Partition


DO $$
BEGIN
    FOR i IN 0..7 LOOP
        EXECUTE format(
            'CREATE TABLE transactions_p%s PARTITION OF transactions FOR VALUES WITH (modulus 8, remainder %s);',
            i, i
        );
    END LOOP;
END $$;

This creates 8 partitions (transactions_p0 to transactions_p7) based on the hash remainder of account_id modulo 8.

2. Inserting Data

When inserting data into the transactions table, PostgreSQL automatically places it into the correct partition based on the account_id.


INSERT INTO transactions (account_id, transaction_amount, transaction_date)
VALUES (12345, 500.00, '2025-01-01');

The hash of account_id 12345, taken modulo 8, determines the target partition (e.g., transactions_p5).

3. Querying Data

Querying the base table works transparently across all partitions


SELECT * FROM transactions WHERE account_id = 12345;

PostgreSQL automatically routes the query to the correct partition.
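
If you want to check which physical partition a given row was routed to, you can select the tableoid system column alongside the data,

SELECT tableoid::regclass AS partition_name, *
FROM transactions
WHERE account_id = 12345;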

4. Scaling by Adding Servers

Initial Setup:

Suppose we have 4 servers managing the partitions,

  • Server 1: transactions_p0, transactions_p1
  • Server 2: transactions_p2, transactions_p3
  • Server 3: transactions_p4, transactions_p5
  • Server 4: transactions_p6, transactions_p7

Adding a New Server:

When a 5th server is added, we redistribute partitions,

  • Server 1: transactions_p0
  • Server 2: transactions_p1
  • Server 3: transactions_p2, transactions_p3
  • Server 4: transactions_p4
  • Server 5: transactions_p5, transactions_p6, transactions_p7

Partition Migration

  • During the migration, transactions_p5 is copied from Server 3 to Server 5.
  • Once the migration is complete, Server 5 becomes responsible for transactions_p5.

Benefits:

  1. Minimal Data Movement – When scaling, only the partitions being reassigned are copied to new servers. Data within partitions remains stable.
  2. Optimized Performance – Queries are routed directly to the relevant partition, minimizing scan times.
  3. Scalability – Adding servers is straightforward, as it involves reassigning partitions, not reorganizing data.

What happens when a new server is added, then? Don’t we need to copy the data?

When a partition is moved to a new server (e.g., partition_b from server_A to server_B), the data in the partition must be copied to the new server. However,

  1. The copying is limited to the partition being reassigned.
  2. No data within the partition is reorganized.
  3. Once the partition is fully migrated, the original copy is typically deleted.

For example, in PostgreSQL,

  • Export the partition: pg_dump -t partition_b -h server_A -U postgres > partition_b.sql
  • Import on the new server: psql -h server_B -U postgres -d mydb < partition_b.sql

Learning Notes #41 – Shared Lock and Exclusive Locks | Postgres

6 January 2025 at 14:07

Today, I learnt about various locking mechanisms used to prevent double updates. In this blog, I make notes on shared locks and exclusive locks for my future self.

What Are Locks in Databases?

Locks are mechanisms used by a DBMS to control access to data. They ensure that transactions are executed in a way that maintains the ACID (Atomicity, Consistency, Isolation, Durability) properties of the database. Locks can be classified into several types, including

  • Shared Locks (S Locks): Allow multiple transactions to read a resource simultaneously but prevent any transaction from writing to it.
  • Exclusive Locks (X Locks): Allow a single transaction to modify a resource, preventing both reading and writing by other transactions.
  • Intent Locks: Used to signal the type of lock a transaction intends to acquire at a lower level.
  • Deadlock Prevention Locks: Special locks aimed at preventing deadlock scenarios.

Shared Lock

A shared lock is used when a transaction needs to read a resource (e.g., a database row or table) without altering it. Multiple transactions can acquire a shared lock on the same resource simultaneously. However, as long as one or more shared locks exist on a resource, no transaction can acquire an exclusive lock on that resource.


-- Transaction A: Acquire a shared lock on a row
BEGIN;
SELECT * FROM employees WHERE id = 1 FOR SHARE;
-- Transaction B: Acquire a shared lock on the same row
BEGIN;
SELECT * FROM employees WHERE id = 1 FOR SHARE;
-- Both transactions can read the row concurrently
-- Transaction C: Attempt to update the same row
BEGIN;
UPDATE employees SET salary = salary + 1000 WHERE id = 1;
-- Transaction C will be blocked until Transactions A and B release their locks

Key Characteristics of Shared Locks

1. Concurrent Reads

  • Shared locks allow multiple transactions to read the same resource at the same time.
  • This is ideal for operations like SELECT queries that do not modify data.

2. Write Blocking

  • While a shared lock is active, no transaction can modify the locked resource.
  • Prevents dirty writes and ensures read consistency.

3. Compatibility

  • Shared locks are compatible with other shared locks but not with exclusive locks.

When Are Shared Locks Used?

Shared locks are typically employed in read operations under certain isolation levels. For instance,

1. Read Committed Isolation Level:

  • Shared locks are held for the duration of the read operation.
  • Prevents dirty reads by ensuring the data being read is not modified by other transactions during the read.

2. Repeatable Read Isolation Level:

  • Shared locks are held until the transaction completes.
  • Ensures that the data read during a transaction remains consistent and unmodified.

3. Snapshot Isolation:

  • Shared locks may not be explicitly used, as the DBMS creates a consistent snapshot of the data for the transaction.

    Exclusive Locks

    An exclusive lock is used when a transaction needs to modify a resource. Only one transaction can hold an exclusive lock on a resource at a time, ensuring no other transactions can read or write to the locked resource.

    
    -- Transaction X: Acquire an exclusive lock to update a row
    BEGIN;
    UPDATE employees SET salary = salary + 1000 WHERE id = 2;
    -- Transaction Y: Attempt to read the same row
    BEGIN;
    SELECT * FROM employees WHERE id = 2;
    -- Transaction Y will be blocked until Transaction X completes
    -- Transaction Z: Attempt to update the same row
    BEGIN;
    UPDATE employees SET salary = salary + 500 WHERE id = 2;
    -- Transaction Z will also be blocked until Transaction X completes
    

    Key Characteristics of Exclusive Locks

    1. Write Operations: Exclusive locks are essential for operations like INSERT, UPDATE, and DELETE.

    2. Blocking Reads and Writes: While an exclusive lock is active, no other transaction can read or write to the resource.

    3. Isolation: Ensures that changes made by one transaction are not visible to others until the transaction is complete.

      When Are Exclusive Locks Used?

      Exclusive locks are typically employed in write operations or any operation that modifies the database. For instance:

      1. Transactional Updates – A transaction that updates a row acquires an exclusive lock to ensure no other transaction can access or modify the row during the update.

      2. Table Modifications – When altering a table structure, the DBMS may place an exclusive lock on the entire table.
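
      Besides the implicit exclusive locks taken by UPDATE and DELETE, a transaction can take an exclusive row lock explicitly with SELECT ... FOR UPDATE. A small sketch on the same employees table used above,

    BEGIN;
    -- Lock the row exclusively; other transactions can neither modify it nor lock it (FOR SHARE or FOR UPDATE) until we finish
    SELECT * FROM employees WHERE id = 2 FOR UPDATE;
    -- ... perform application-side checks here ...
    UPDATE employees SET salary = salary + 1000 WHERE id = 2;
    COMMIT;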

      Benefits of Shared and Exclusive Locks

      Benefits of Shared Locks

      1. Consistency in Multi-User Environments – Ensure that data being read is not altered by other transactions, preserving consistency.
      2. Concurrency Support – Allow multiple transactions to read data simultaneously, improving system performance.
      3. Data Integrity – Prevent dirty reads and writes, ensuring that operations yield reliable results.

      Benefits of Exclusive Locks

      1. Data Integrity During Modifications – Prevents other transactions from accessing data being modified, ensuring changes are applied safely.
      2. Isolation of Transactions – Ensures that modifications by one transaction are not visible to others until committed.

      Limitations and Challenges

      Shared Locks

      1. Potential for Deadlocks – Deadlocks can occur if two transactions simultaneously hold shared locks and attempt to upgrade to exclusive locks.
      2. Blocking Writes – Shared locks can delay write operations, potentially impacting performance in write-heavy systems.
      3. Lock Escalation – In systems with high concurrency, shared locks may escalate to table-level locks, reducing granularity and concurrency.

      Exclusive Locks

      1. Reduced Concurrency – Exclusive locks prevent other transactions from accessing the locked resource, which can lead to bottlenecks in highly concurrent systems.
      2. Risk of Deadlocks – Deadlocks can occur if two transactions attempt to acquire exclusive locks on resources held by each other.

      Lock Compatibility

      Learning Notes #39 – Compensation Pattern | Cloud Pattern

      5 January 2025 at 12:50

      Today I learnt about the compensation pattern, which rolls back transactions when they run into failures. In this blog I jot down notes on the compensation pattern and how it relates to the SAGA pattern.

      Distributed systems often involve multiple services working together to perform a business operation. Ensuring data consistency and reliability across these services is challenging, especially in cases of failure. One solution is the use of compensation transactions, a mechanism designed to maintain consistency by reversing the effects of previous operations when errors occur.

      What Are Compensation Transactions?

      A compensation transaction is an operation that undoes the effect of a previously executed operation. Unlike traditional rollback mechanisms in centralized databases, compensation transactions are explicitly defined and executed in distributed systems to maintain consistency after a failure.

      Key Characteristics

      • Explicit Definition: Compensation logic must be explicitly implemented.
      • Independent Execution: Compensation operations are separate from the main transaction.
      • Eventual Consistency: Ensures the system reaches a consistent state over time.
      • Asynchronous Nature: Often triggered asynchronously to avoid blocking main processes.

      Why Are Compensation Transactions Important?

      1. Handling Failures in Distributed Systems

      In a distributed architecture, such as microservices, different services may succeed or fail independently. Compensation transactions allow partial rollbacks to maintain overall consistency.

      2. Avoiding Global Locking

      Traditional transactions with global locks (e.g., two-phase commits) are not feasible in distributed systems due to performance and scalability concerns. Compensation transactions provide a more flexible alternative.

      3. Resilience and Fault Tolerance

      Compensation mechanisms make systems more resilient by allowing recovery from failures without manual intervention.

      How Compensation Transactions Work

      1. Perform Main Operations: Each service performs its assigned operation, such as creating a record or updating a database.
      2. Log Operations: Log actions and context to enable compensating transactions if needed.
      3. Detect Failure: Monitor the workflow for errors or failures in any service.
      4. Trigger Compensation: If a failure occurs, execute compensation transactions for all successfully completed operations to undo their effects.

      Example Workflow

      Imagine an e-commerce checkout process involving three steps

      • Step 1: Reserve inventory.
      • Step 2: Deduct payment.
      • Step 3: Confirm order.

      If Step 3 fails, compensation transactions for Steps 1 and 2 might include

      • Releasing the reserved inventory.
      • Refunding the payment.

      Design Considerations for Compensation Transactions

      1. Idempotency

      Ensure compensating actions are idempotent, meaning they can be executed multiple times without unintended side effects. This is crucial in distributed systems where retries are common.

      2. Consistency Model

      Adopt an eventual consistency model to align with the asynchronous nature of compensation transactions.

      3. Error Handling

      Design robust error-handling mechanisms for compensating actions, as these too can fail.

      4. Service Communication

      Use reliable communication protocols (e.g., message queues) to trigger and manage compensation transactions.

      5. Isolation of Compensation Logic

      Keep compensation logic isolated from the main business logic to maintain clarity and modularity.

      Use Cases for Compensation Transactions

      1. Financial Systems

      • Reversing failed fund transfers or unauthorized transactions.
      • Refunding payments in e-commerce platforms.

      2. Travel and Booking Systems

      • Canceling a hotel reservation if flight booking fails.
      • Releasing blocked seats if payment is not completed.

      3. Healthcare Systems

      • Undoing scheduled appointments if insurance validation fails.
      • Revoking prescriptions if a linked process encounters errors.

      4. Supply Chain Management

      • Canceling shipment orders if inventory updates fail.
      • Restocking items if order fulfillment is aborted.

      Challenges of Compensation Transactions

      1. Complexity in Implementation: Designing compensating logic for every operation can be tedious and error-prone.
      2. Performance Overhead: Logging operations and executing compensations can introduce latency.
      3. Partial Rollbacks: It may not always be possible to fully undo certain operations, such as sending emails or notifications.
      4. Failure in Compensating Actions: Compensation transactions themselves can fail, requiring additional mechanisms to handle such scenarios.

      Best Practices

      1. Plan for Compensation Early: Design compensating transactions as part of the initial development process.
      2. Use SAGA Pattern: Combine compensation transactions with the SAGA pattern to manage distributed workflows effectively.
      3. Test Extensively: Simulate failures and test compensating logic under various conditions.
      4. Monitor and Log: Maintain detailed logs of operations and compensations for debugging and audits.

      Learning Notes #34 – Consistency (Correctness) in ACID | Postgres

      4 January 2025 at 12:37

      As part of the ACID series, I am refreshing my knowledge of consistency. In this blog, I jot down notes on consistency (correctness) in the Postgres database.

      What is Consistency?

      Consistency ensures that a transaction brings the database from one valid state to another, adhering to predefined rules such as constraints, triggers, and relational integrity. If a transaction violates these rules, it is aborted, and the database remains unchanged. This guarantees that only valid data exists in the database.

      Consistency works together with other ACID properties:

      • Atomicity ensures the “all-or-nothing” execution of a transaction.
      • Isolation ensures transactions don’t interfere with each other.
      • Durability guarantees committed transactions persist despite system failures

      Key Aspects of Consistency in PostgreSQL

      1. Constraints
        • Primary Key: Ensures uniqueness of rows.
        • Foreign Key: Maintains referential integrity.
        • Check Constraints: Enforces custom business rules.
        • Not Null: Ensures that specific columns cannot have null values.
      2. Triggers
        • Custom logic executed before or after specific database events.
      3. Rules
        • Enforce application-specific invariants on the database.
      4. Transactions
        • Changes are made in a controlled environment, ensuring consistency even in the event of errors or system failures.

      Practical Examples of Consistency in PostgreSQL

      1. Primary Key Constraint

      Ensures that no two rows in a table have the same primary key value.

      
      CREATE TABLE accounts (
          account_id SERIAL PRIMARY KEY,
          account_holder_name VARCHAR(255) NOT NULL,
          balance NUMERIC(15, 2) NOT NULL CHECK (balance >= 0)
      );
      
      -- Attempt to insert duplicate primary keys.
      INSERT INTO accounts (account_id, account_holder_name, balance)
      VALUES (1, 'Alice', 1000.00);
      
      INSERT INTO accounts (account_id, account_holder_name, balance)
      VALUES (1, 'Bob', 2000.00); -- This will fail.
      

      2. Foreign Key Constraint

      Enforces referential integrity between tables.

      
      CREATE TABLE transactions (
          transaction_id SERIAL PRIMARY KEY,
          account_id INT NOT NULL REFERENCES accounts(account_id),
          amount NUMERIC(15, 2) NOT NULL,
          transaction_type VARCHAR(10) NOT NULL CHECK (transaction_type IN ('credit', 'debit')),
          transaction_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP
      );
      
      -- Attempt to insert a transaction for a non-existent account.
      INSERT INTO transactions (account_id, amount, transaction_type)
      VALUES (999, 500, 'credit'); -- This will fail.
      

      3. Check Constraint

      Validates custom business rules.

      
      -- Ensure account balance cannot go negative.
      INSERT INTO accounts (account_holder_name, balance)
      VALUES ('Charlie', -500); -- This will fail due to the CHECK constraint.
      

      4. Trigger for Business Logic

      Ensures derived data or additional checks are implemented.

      
      CREATE OR REPLACE FUNCTION enforce_minimum_balance()
      RETURNS TRIGGER AS $$
      BEGIN
          IF NEW.balance < 0 THEN
              RAISE EXCEPTION 'Balance cannot be negative';
          END IF;
          RETURN NEW;
      END;
      $$ LANGUAGE plpgsql;
      
      CREATE TRIGGER check_balance_before_insert
      BEFORE INSERT OR UPDATE ON accounts
      FOR EACH ROW EXECUTE FUNCTION enforce_minimum_balance();
      
      -- Attempt to update an account with a negative balance.
      UPDATE accounts SET balance = -100 WHERE account_id = 1; -- This will fail.
      

      5. Transactions to Maintain Consistency

      A transaction groups multiple operations into a single unit, ensuring all succeed or none.

      
      BEGIN;
      
      -- Deduct from sender's account.
      UPDATE accounts SET balance = balance - 500 WHERE account_id = 1;
      
      -- Credit to receiver's account.
      UPDATE accounts SET balance = balance + 500 WHERE account_id = 2;
      
      -- If any statement fails, the transaction is aborted and nothing is applied; otherwise, commit.
      COMMIT;
      

      If the system crashes before the COMMIT, the database remains unchanged, ensuring consistency.

      How Consistency Works with Other ACID Properties

      1. With Atomicity: If any step in a transaction violates a constraint, the entire transaction is rolled back, ensuring that the database remains consistent.
      2. With Isolation: Concurrent transactions operate independently, preventing inconsistent states caused by interference.
      3. With Durability: Once a transaction is committed, its consistency guarantees persist even in the event of a crash.

      Benefits of Consistency

      1. Data Integrity: Prevents invalid data from being stored.
      2. Application Reliability: Reduces the need for additional application-level checks.
      3. Simplified Maintenance: Developers can rely on the database to enforce business rules and relationships.
      4. Error Prevention: Constraints and triggers act as safeguards, catching mistakes early.

      Learning Notes #28 – Unlogged Table in Postgres

      2 January 2025 at 17:30

      Today, as part of my daily reading, I came across https://raphaeldelio.com/2024/07/14/can-postgres-replace-redis-as-a-cache/ where they discuss using Postgres as a cache and compare it with Redis! I was surprised by the title, so I gave it a read-through. There I came across the concept of the UNLOGGED table, which acts as fast, cache-like storage. In this blog I jot down notes on unlogged tables for future reference.

      Highly Recommended Links: https://martinheinz.dev/blog/105, https://raphaeldelio.com/2024/07/14/can-postgres-replace-redis-as-a-cache/, https://www.crunchydata.com/blog/postgresl-unlogged-tables

      Unlogged tables offer unique benefits in scenarios where speed is paramount, and durability (the guarantee that data is written to disk and will survive crashes) is not critical.

      What Are Unlogged Tables?

      Postgres Architecture : https://miro.com/app/board/uXjVLD2T5os=/

      In PostgreSQL, a table is a basic unit of data storage. By default, PostgreSQL ensures that data in regular tables is durable. This means that all data is written to the disk and will survive server crashes. However, in some situations, durability is not necessary. Unlogged tables are special types of tables in PostgreSQL where the database does not write data changes to the WAL (Write-Ahead Log).

      The absence of WAL logging for unlogged tables makes them faster than regular tables because PostgreSQL doesn’t need to ensure data consistency across crashes for these tables. However, this also means that if the server crashes or the system is powered off, the data in unlogged tables is lost.

      Key Characteristics of Unlogged Tables

      1. No Write-Ahead Logging (WAL) – By default, PostgreSQL writes changes to the WAL to ensure data durability. For unlogged tables, this step is skipped, making operations like INSERTs, UPDATEs, and DELETEs faster.
2. No Durability – Because changes are not written to the WAL, an unlogged table loses its data after a crash or unclean shutdown. This makes it unsuitable for critical data.
3. Faster Performance – Since WAL writes are skipped, unlogged tables are faster for data insertion and modification. This is beneficial for use cases where data is transient and does not need to survive a crash.
4. Support for Indexes and Constraints – Unlogged tables can have indexes and constraints like regular tables, but those indexes are unlogged as well, so everything remains non-durable.
5. Automatic Cleanup – After a crash or unclean shutdown, PostgreSQL automatically truncates unlogged tables, so their contents should be treated as disposable.

      Drawbacks of Unlogged Tables

1. Data Loss on Crash – The most significant disadvantage of unlogged tables is the loss of data after a crash or unclean shutdown. If the application depends on this data, unlogged tables are not appropriate.
2. Not Suitable for Critical Applications – Applications that require data persistence (such as financial or inventory systems) should avoid unlogged tables, as the risk of data loss outweighs any performance benefit.
3. No Replication – Unlogged tables are not replicated to standby servers in a replication setup, as their changes are never written to the WAL.

      Creating an Unlogged Table

      Creating an unlogged table is very straightforward in PostgreSQL. You simply need to add the UNLOGGED keyword when creating the table.

      
      CREATE UNLOGGED TABLE temp_data (
          id SERIAL PRIMARY KEY,
          name VARCHAR(100),
          value INT
      );
      
      

In this example, temp_data is an unlogged table. None of the operations performed on this table are logged to the WAL.
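
An existing table can also be switched between the two modes with ALTER TABLE (supported since PostgreSQL 9.5). Note that SET LOGGED rewrites the table and generates WAL for its current contents, so it can be slow on large tables.

-- Convert an existing unlogged table into a regular, durable table.
ALTER TABLE temp_data SET LOGGED;

-- Convert it back to unlogged, trading durability for speed.
ALTER TABLE temp_data SET UNLOGGED;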

      When to Avoid Unlogged Tables?

      • If you are working with critical data that needs to be durable and persistent across restarts.
      • If your application requires data replication, as unlogged tables are not replicated in standby servers.
      • If your workload involves frequent crash scenarios where data loss cannot be tolerated.

      Examples

      1. Temporary Storage for processing

      
      CREATE UNLOGGED TABLE etl_staging (
          source_id INT,
          raw_data JSONB,
          processed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
      );
      
      -- Insert raw data into the staging table
      INSERT INTO etl_staging (source_id, raw_data)
      VALUES 
          (1, '{"key": "value1"}'),
          (2, '{"key": "value2"}');
      
      -- Perform transformations on the data
      INSERT INTO final_table (id, key, value)
      SELECT source_id, 
             raw_data->>'key' AS key, 
             'processed_value' AS value
      FROM etl_staging;
      
      -- Clear the staging table
      TRUNCATE TABLE etl_staging;
      
      

      2. Caching

      
      CREATE UNLOGGED TABLE user_sessions (
          session_id UUID PRIMARY KEY,
          user_id INT,
          last_accessed TIMESTAMP DEFAULT NOW()
      );
      
      -- Insert session data
      INSERT INTO user_sessions (session_id, user_id)
      VALUES 
          (uuid_generate_v4(), 101),
          (uuid_generate_v4(), 102);
      
      -- Update last accessed timestamp
      UPDATE user_sessions
      SET last_accessed = NOW()
      WHERE session_id = 'some-session-id';
      
      -- Delete expired sessions
      DELETE FROM user_sessions WHERE last_accessed < NOW() - INTERVAL '1 hour';
      

      Learning Notes #21 – Sharding (database) with Postgres

      31 December 2024 at 07:54

…contd. Early morning today, I watched a video on partitioning and sharding. In that video, Arpit explained the limitations of vertical scaling and the ways sharding and partitioning let a database scale further. In this blog, I jot down notes on sharding with Postgres (using Citus) for my future self.

As the volume of data grows, managing databases efficiently becomes critical. Once we accept that vertical scaling has its limits, two common strategies for handling large datasets are partitioning and sharding. While they may sound similar, these techniques serve different purposes and are implemented differently. Let’s explore these concepts in detail.

      What is Sharding?

      Sharding is a type of database architecture where data is horizontally divided across multiple database instances, called shards. Each shard is an independent database, often hosted on separate servers. Sharding is commonly used to scale out databases.

      How Sharding Works

      1. Shard Key
        • A shard key determines how data is distributed across shards.
        • Example: A social media app might use user IDs as the shard key to ensure all data related to a user resides in the same shard.
      2. Data Distribution
        • Data is split horizontally; each shard contains a subset of the entire dataset.
        • Shards can be distributed geographically or across servers for better performance.

      Combining Partitioning and Sharding

      In some advanced architectures, partitioning and sharding are combined. Here, partitioned data is further distributed across shards. Each shard can manage its partitions independently, providing both scalability and query optimization.

      PostgreSQL Example with Citus

      1. Install the Citus extension (https://www.citusdata.com/download/, https://github.com/citusdata/citus?tab=readme-ov-file#getting-started)
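
A minimal setup sketch, assuming the Citus package for your platform has already been installed from the links above (exact package names vary by OS and Postgres version):

-- postgresql.conf: load the Citus library at server start.
-- shared_preload_libraries = 'citus'

-- After restarting PostgreSQL, enable the extension in your database:
CREATE EXTENSION citus;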

      
      

      2. Create a distributed table with partitioning

      CREATE TABLE orders (
          id SERIAL,
          customer_id INT,
          order_date DATE NOT NULL,
          PRIMARY KEY (id, order_date)
      ) PARTITION BY RANGE (order_date);
      
      SELECT create_distributed_table('orders', 'customer_id');
      
      CREATE TABLE orders_jan PARTITION OF orders
          FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
      
      CREATE TABLE orders_feb PARTITION OF orders
          FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');
      

      3. Add worker nodes to manage shards

      SELECT master_add_node('worker1', 5432);
      SELECT master_add_node('worker2', 5432);
      

      4. Data placement

      The data for each partition (e.g., orders_jan) is distributed across the shards (worker1, worker2) based on the shard key (customer_id).

      5. Insert and query data

      INSERT INTO orders (customer_id, order_date) VALUES (101, '2024-01-15');
      SELECT * FROM orders WHERE customer_id = 101;
      

      Learning Notes #20 – Partitioning (data) With Postgres

      31 December 2024 at 06:55

Early morning today, I watched a video on partitioning and sharding. In that video, Arpit explained the limitations of vertical scaling and the ways sharding and partitioning let a database scale further. In this blog, I jot down notes on partitioning, with a single-node implementation in Postgres, for my future self.

As the volume of data grows, managing databases efficiently becomes critical. Once we accept that vertical scaling has its limits, two common strategies for handling large datasets are partitioning and sharding. While they may sound similar, these techniques serve different purposes and are implemented differently. Let’s explore these concepts in detail.

      What is Partitioning?

      Partitioning involves dividing a large dataset into smaller, manageable segments, known as partitions. Each partition is stored separately but remains part of a single database instance. Partitioning is typically used to improve query performance and manageability.

      Types of Partitioning

      1. Range Partitioning

      • Data is divided based on ranges of a column’s values.
      • Example: A table storing customer orders might partition data by order date: January orders in one partition, February orders in another.

      PostgreSQL Example

      CREATE TABLE orders (
          id SERIAL,
          customer_id INT,
          order_date DATE NOT NULL,
          PRIMARY KEY (id, order_date) -- Include the partition key
      ) PARTITION BY RANGE (order_date);
      
      CREATE TABLE orders_jan PARTITION OF orders
          FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
      
      CREATE TABLE orders_feb PARTITION OF orders
          FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');
      
      

      2. Hash Partitioning

      • A hash function determines the partition where a record will be stored.
      • Example: Orders can be distributed across partitions based on the hash of the customer ID.

      Postgres Example

      CREATE TABLE orders (
          id SERIAL ,
          customer_id INT,
          order_date DATE NOT NULL,
          PRIMARY KEY (id, customer_id)
      ) PARTITION BY HASH (customer_id, id);
      
      CREATE TABLE orders_part_1 PARTITION OF orders
          FOR VALUES WITH (MODULUS 2, REMAINDER 0);
      
      CREATE TABLE orders_part_2 PARTITION OF orders
          FOR VALUES WITH (MODULUS 2, REMAINDER 1);
      

      3. List Partitioning

      • Data is divided based on a predefined list of values.
      • Example: A table storing sales data could partition based on regions: North, South, East, and West

      Postgres Example

      CREATE TABLE sales (
          id SERIAL ,
          region TEXT NOT NULL,
          amount NUMERIC,
          PRIMARY KEY (id, region)
      ) PARTITION BY LIST (region);
      
      CREATE TABLE sales_north PARTITION OF sales
          FOR VALUES IN ('North');
      
      CREATE TABLE sales_south PARTITION OF sales
          FOR VALUES IN ('South');
      

      4. Composite Partitioning

      • Combines two or more partitioning strategies, such as range and list partitioning.
      • Example: A table partitioned by range on order date and sub-partitioned by list on region.

      Postgres Example

      CREATE TABLE orders (
          id SERIAL,
          customer_id INT,
          order_date DATE NOT NULL,
          region TEXT NOT NULL,
          PRIMARY KEY (id, order_date, region)
      ) PARTITION BY RANGE (order_date);
      
      CREATE TABLE orders_2024 PARTITION OF orders
          FOR VALUES FROM ('2024-01-01') TO ('2025-01-01')
          PARTITION BY LIST (region);
      
      CREATE TABLE orders_2024_north PARTITION OF orders_2024
          FOR VALUES IN ('North');
      
      CREATE TABLE orders_2024_south PARTITION OF orders_2024
          FOR VALUES IN ('South');
      
      

04. Data Integrity

      By: Vijayan S
      20 November 2024 at 12:13

Data Integrity

Data integrity is the process of ensuring that the data in a database is correct, accurate, and consistent. It improves the reliability of the database and prevents problems caused by incorrect information.

The main types of data integrity (a short SQL sketch follows the list):

1. Domain Integrity:

  • Every column must follow the data type assigned to it.
  • For example, an age column can only accept numbers, not letters.
2. Entity Integrity:

  • Every record in a table must have a unique primary key.
  • For example, in a school’s student register, the student roll number can serve as the primary key.
3. Referential Integrity:

  • A foreign key in one table must refer to the primary key of another table.
  • For example, the customer ID foreign key in a sales table must match the customer ID primary key in the customer details table.
4. Tuple Integrity:

  • Every record in a table must be unique.
  • For example, in an employee table, two employees cannot have the same employee ID.
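
As a minimal sketch, these four kinds of integrity map onto familiar SQL constraints (the table and column names below are purely illustrative):

CREATE TABLE customers (
    customer_id SERIAL PRIMARY KEY              -- entity integrity: a unique identifier per row
);

CREATE TABLE sales (
    sale_id     SERIAL PRIMARY KEY,             -- tuple integrity: no duplicate records
    customer_id INT NOT NULL
        REFERENCES customers(customer_id),      -- referential integrity
    amount      NUMERIC(10, 2)
        CHECK (amount >= 0),                    -- domain integrity: only valid values
    sale_date   DATE NOT NULL                   -- domain integrity: the correct data type
);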

Benefits of data integrity:

• Improves the accuracy and reliability of data.
• Prevents problems caused by incorrect information.
• Improves database performance.
• Increases data security.

Data integrity is one of the most important aspects of database management systems (DBMS). It ensures that the database functions correctly and helps avoid data loss and the problems caused by incorrect information.

03. What is the Relational Database Model? (RDBMS)

      By: Vijayan S
      20 November 2024 at 12:03

1. What is an RDBMS?

The Relational Database Model (RDBMS) is a method of storing and managing data by relating pieces of data to one another. It stores data in the form of Tables and connects them through Relationships. This helps manage data efficiently and flexibly.

Tables: The basic unit in which data is stored.
Rows: Each data record is a row.
Columns: Each data attribute is a column.
Primary Key: A column, or set of columns, that uniquely identifies each row.
Foreign Key: A column, or set of columns, in one table that refers to the primary key of another table.

Relationships

• One-to-One: Each row in one table is related to exactly one row in another table.
• One-to-Many: Each row in one table is related to many rows in another table.
• Many-to-Many: Many rows in one table are related to many rows in another table.

• Normalization: A process used to store data efficiently and to protect data consistency.

• Indexing: A data structure that helps search data quickly (see the short sketch after this list).

• Views: A logical structure that shows a portion of the database from a particular point of view.

• Stored Procedures: A way to store frequently used SQL commands in one place and reuse them.

• Triggers: Events that run automatically in response to changes in the database.
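
A small sketch of an index and a view on a hypothetical students table (the table and column names are illustrative):

-- An index speeds up lookups on the indexed column.
CREATE INDEX idx_students_class ON students (class);

-- A view is a saved query that can be used like a table.
CREATE VIEW senior_students AS
SELECT student_id, student_name
FROM students
WHERE age >= 16;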

Example

Let’s build a database for a school:

      • Students Table: StudentID (Primary Key), StudentName, Age, Class
      • Courses Table: CourseID (Primary Key), CourseName, TeacherName
      • StudentCourses Table: StudentID (Foreign Key), CourseID (Foreign Key)

In this database, a student can enrol in several courses, and a course can have several students. This is an example of a Many-to-Many relationship.
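
A minimal sketch of this schema in SQL (the column types are assumptions):

CREATE TABLE students (
    student_id   SERIAL PRIMARY KEY,
    student_name VARCHAR(100) NOT NULL,
    age          INT,
    class        VARCHAR(20)
);

CREATE TABLE courses (
    course_id    SERIAL PRIMARY KEY,
    course_name  VARCHAR(100) NOT NULL,
    teacher_name VARCHAR(100)
);

-- The junction table models the Many-to-Many relationship.
CREATE TABLE student_courses (
    student_id INT REFERENCES students(student_id),
    course_id  INT REFERENCES courses(course_id),
    PRIMARY KEY (student_id, course_id)
);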

02. What is a DBMS?

      By: Vijayan S
      20 November 2024 at 10:08

1. What is a DBMS?

A Database Management System (DBMS) is a piece of software used to manage, store, retrieve, and modify a database. It acts as an intermediary between the database and its users, making data easy to access and keeping it secure.

2. Components of a DBMS

A DBMS has the following major components:

a. Database Engine

• This is the core of the DBMS.
• Query Processing: executes the queries submitted by users.
• Transaction Management: follows the ACID properties (Atomicity, Consistency, Isolation, Durability).

Atomicity is a key aspect of the ACID properties (Atomicity, Consistency, Isolation, Durability). It treats a transaction as a single, indivisible unit of work. Atomicity guarantees the following two things:

1. A transaction must complete in full, or it must be as though it never happened at all.

2. If any failure occurs while the transaction is running (for example, a system crash, a network issue, or an invalid operation), the whole transaction is rolled back and the database returns to its previous state.

Atomicity example
Imagine a bank transferring $500 from Account A to Account B:

$500 must be debited from Account A.
$500 must be credited to Account B.
For atomicity to hold:

Both steps (the debit and the credit) must complete successfully. If either one fails, no change at all should be made to the database.
If a system crash occurs (for example, after $500 has been debited from Account A but before it has been credited to Account B), a rollback must return Account A to its previous state.
Without atomicity, the following could happen:

$500 is deducted from Account A but never added to Account B.
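
A minimal sketch of this transfer as a single transaction (the table and column names are assumptions):

BEGIN;

-- Debit Account A.
UPDATE accounts SET balance = balance - 500 WHERE account_name = 'A';

-- Credit Account B.
UPDATE accounts SET balance = balance + 500 WHERE account_name = 'B';

-- Both updates become permanent together, or not at all.
COMMIT;

-- If either statement fails, or the session crashes before COMMIT,
-- PostgreSQL rolls the whole transaction back; ROLLBACK aborts it explicitly.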


      b. Database Schema

• Defines how the data is structured (for example: tables, fields, relationships).

      c. Data Definition Language (DDL)

• The language used to define the database schema.
• Example:
      CREATE TABLE Customers (ID INT, Name VARCHAR(50), Age INT);
      

      d. Data Manipulation Language (DML)

• Helps Insert, Update, Delete, and Select data (a short sketch follows below):
  • INSERT: add new data.
  • UPDATE: modify existing data.
  • DELETE: remove unwanted data.
  • SELECT: search for and retrieve data.
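
A few DML statements against the Customers table defined in the DDL example above (the sample values are made up):

INSERT INTO Customers (ID, Name, Age) VALUES (1, 'Asha', 28);

UPDATE Customers SET Age = 29 WHERE ID = 1;

SELECT Name, Age FROM Customers WHERE Age > 25;

DELETE FROM Customers WHERE ID = 1;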

      e. Metadata

• Information about the data (e.g., structure, constraints).

      f. Database Users

• End-users: people who use the database through a GUI or an application.
• DBA (Database Administrator): people who maintain security and performance.
• Developers: people who build applications on top of the database.

      g. Query Processor

• Translates user queries into commands the database can understand.

      h. Transaction Management

• Manages transactions safely:
  • Atomicity: the whole transaction happens, or none of it does.
  • Consistency: ensures the data remains correct.
  • Isolation: one user’s transaction must not affect another’s.
  • Durability: the data is guaranteed to be saved.

      3. DBMS Architecture

A DBMS can be organised according to the architectures shown below:

      a. 1-Tier Architecture

• The DBMS and the database reside on the same machine. Suitable for single-user applications.

      b. 2-Tier Architecture

• The client works directly against the DBMS server. Used in small and medium-sized setups.

      c. 3-Tier Architecture

It consists of three layers:

1. Presentation Layer: GUI or web interfaces.
2. Application Layer: business logic (runs on the server).
3. Database Layer: the database and the DBMS.

4. Types of DBMS

      a. Relational DBMS (RDBMS)

• Organises data into tables of rows and columns.
• Operated using SQL.
• Examples: MySQL, PostgreSQL, Oracle DB.
• Advantages: simple querying, a precise structure, data integrity.


      b. Hierarchical DBMS

• Organises data in a tree-like structure.
• Example: IBM's IMS.
• Usage: file systems.


      c. Network DBMS

• Organises data as a graph and supports multiple parent-child relationships.
• Example: Integrated Data Store (IDS).


      d. Object-Oriented DBMS

• Stores data in the form of objects.
• Examples: db4o, ObjectDB.

      e. NoSQL DBMS

• Has a flexible schema; well suited to large volumes of unstructured data.
      • Types:
        • Document-based (MongoDB)
        • Key-Value Stores (Redis)
        • Column Stores (Cassandra)
        • Graph Databases (Neo4j).

5. Key Features of a DBMS

• Data Independence: changes in the data structure do not affect applications.
• Data Security: provides authentication, access control, and similar safeguards.
• Multi-user Support: supports simultaneous access by many users.
• Backup and Recovery: helps protect data.
• Data Consistency: data accuracy is protected through constraints such as primary keys and foreign keys.

6. Advantages of a DBMS

1. Reduces Redundancy: reduces data duplication.
2. Ensures Data Integrity: data stays accurate.
3. Improves Accessibility: data can be searched easily.
4. Enhances Collaboration: many people can access the data at the same time.
5. Automates Backup: prevents data loss.
6. Scalability: runs efficiently even as data grows.

7. Applications of a DBMS

a. Banking Systems

• Manages accounts, transactions, and customer data.
• Ensures data security.

b. E-Commerce

• Manages inventory, orders, and customer profiles.
• Example: online platforms such as Amazon and Flipkart.

c. Healthcare

• Stores patient records, appointments, and prescriptions.
• Protects data confidentiality.

d. Education

• Manages student records, courses, and grades.

e. Telecommunications

• Call records, billing, and customer management.

8. Challenges of Using a DBMS

• Cost: installation and maintenance can be expensive.
• Complexity: requires skilled administrators.
• Performance: can sometimes be slower than plain file systems.
• Scalability Issues: some older DBMSs cannot handle very large volumes of data.

01. How did the database come about, and what is its need?

      By: Vijayan S
      18 November 2024 at 16:15

How did databases come about?
Databases are technologies that emerged in the 1960s and 1970s. In the beginning, information was commonly copied and stored on paper or by electronic means. However, storing large amounts of information, and accessing and managing it easily, was difficult. This is why databases evolved.

The evolution of databases:

1. Storing data with traditional programs:
  In the early days, data was stored on paper or in table-like structures. This led to data problems and difficulties in retrieving data.

2. Hierarchical and Network Models (1960s-1970s):
  These introduced structured collections of data. Although they were used for a while and had some success, they ultimately made data hard to access easily.

3. Relational Database Model (1970s):
  In 1970, the computer scientist Edgar F. Codd introduced the relational database model. It made it easy to define relationships between pieces of data and to access data using a language called SQL (Structured Query Language). In this model, data is stored in tables, and relationships can be created between them.

4. Modern Databases (after the 1990s):
  In the 1990s, the major relational databases (MySQL, PostgreSQL, Oracle) appeared, new kinds of databases such as NoSQL (MongoDB, Cassandra) were born, and databases kept evolving into new forms to match the growing volume of data and its requirements.

Why databases are needed
The need for databases is enormous, because they help us store, access, maintain, protect, and analyse data easily. Databases are now used across many fields, including technology, business, education, government, and healthcare, and they play a key role in all of them. The main needs they address are explained in detail below:

1. Data storage and organisation

Data storage: Databases make it possible to store many millions of records in one place in an organised way. This is especially important for large organisations, websites, and business systems.

Organisation: Because data in a database is organised into tables or other structures, it is easy to work with.

2. Data access and retrieval

Fast access: Data stored in a database can be accessed quickly and easily. Business platforms, bank accounts, and online services all place heavy demands on data access.

Large-scale retrieval: Databases help retrieve large volumes of data quickly.

3. Data Consistency

A single source of data: The ability to coordinate updates to the same data spread across many places is essential, and databases provide it.

Uniform data maintenance: Databases support data normalization, which avoids data errors and redundancies.

4. Data security and permissions

User security: Information in a database is protected through backups and encryption, so users can access the data only with the proper login permissions.

User permissions: By managing permissions, specific data is made available only to specific users.

5. Backup and Recovery

Errors and data loss: When a system fails or misbehaves, databases can recover data through their backup and recovery mechanisms.

Operational continuity: If data is lost in other ways, the database can restore it, so organisations and applications can keep running.

6. Concurrency Control

Access by many users: When many users access data at the same time, the database handles it through concurrency-control mechanisms, giving each user the right permissions and a correct view of the data.

Coordination between users: In some cases, several users modifying the same data at the same time can cause a conflict. Databases have the ability to resolve such conflicts.

7. Data analysis and reporting

Analysing application data: Databases make it easy to analyse stored data for business or research purposes. Using languages such as SQL, users can pose a wide range of queries and analyse the data.

Reports and listings: Businesses and organisations can use databases to produce monthly reports and listings based on their data.

8. Handling large volumes of data (Scalability)

Large volumes of data: Databases can usually handle very large amounts of data with ease. This is especially useful for websites, very large organisations, and e-commerce platforms that must maintain data for many users.

Scaling in size and speed: Databases provide the scalability needed to handle very large volumes of data easily.

9. Structures and Relationships

Data relationships: Relationships are created between the items of data in a database. This helps join multiple tables for integration and analysis.

Large data sets: Data with many kinds of relationships, such as One-to-Many and Many-to-Many, can be modelled.

10. Cloud Databases
Access from anywhere: Today, using cloud databases, data can be accessed easily from anywhere. They offer high availability and security.

Some databases:

      Relational Databases (RDBMS): MySQL, PostgreSQL, Oracle, SQL Server.

      NoSQL Databases: MongoDB, Cassandra, Firebase, CouchDB.

      Cloud Databases: Amazon RDS, Google Cloud SQL, Microsoft Azure SQL.

      In-memory Databases: Redis, Memcached.
