
About SQL

By: vsraj80
31 January 2025 at 16:15

Structured Query Language

Relational Database Management System

MySQL is free, open-source software.

MySQL Client – front end
MySQL Server – back end

Functions of the MySQL Client

  1. Validating the password and authenticating the user

  2. Receiving input from the user, converting it into tokens, and sending it to the MySQL server

  3. Returning the results from the MySQL server to the user

Functions of the MySQL Server

The MySQL server receives requests from clients and returns responses after processing. It consists of two major parts:

1. Management Layer

a. Decoding the data

b. Validating and parsing (analyzing) the data

c. Sending the cached queries to the Storage Engine

2. Storage Engine

a. Managing databases, tables, and indexes

b. Sending the data to other shared MySQL servers

Install MySQL on Ubuntu

sudo apt-get install mysql-server

To secure the installation, run:

sudo mysql_secure_installation

This:

1. Removes anonymous users

2. Allows root login only from localhost

3. Removes the test database

MySQL Configuration options

/etc/mysql is the MySQL configuration directory

To Start MySQL

sudo service mysql start

To Stop MySQL

sudo service mysql stop

To Restart MySQL

sudo service mysql restart

MySQL Clients

Normally we use MySQL from the command line.

But on Linux we can also access it through the following GUI clients.

MySQL Work Bench

sudo apt-get install mysql-workbench

MySQL Navigator

sudo apt-get install mysql-navigator

EMMA

sudo apt-get install emma

phpMyAdmin

sudo aptitude install phpmyadmin

MySQL Admin

sudo apt-get install mysql-admin

Kinds of MySQL Clients

1. GUI-based desktop applications

2. Web-based applications

3. Shell-based applications (text-only)

To connect to the server with the MySQL client:

mysql -u root -p

To connect with a particular host, user name, and database name:

mysql -h <host> -u <user> -p <database>

If the host/username/password are not given, it defaults to the local server, the current Linux user name, and no password for authentication.
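
For example, connecting as a hypothetical user appuser to a database appdb on the local host:

mysql -h localhost -u appuser -p appdb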

To find more options for mysql:

mysql -?

To disconnect the client from the server:

exit

Note to self: pages 33 to 39 need to be read and understood again.

Build a Product Rental App in Node.js

By: krishna
29 January 2025 at 10:47

Introduction

I created a website called Vinmeen that allows users to rent products for temporary needs at a low cost. The goal was to design a simple UI for users to easily rent things they need temporarily.

Technologies Used

  • Node.js & Express
  • Node Packages
    • Express
    • EJS
    • Nodemailer
    • Bcrypt
    • Multer
    • Sync-SQL
    • MySQL
  • MySQL

What I Learned from This Project

This project helped me understand how dynamic websites work and how template rendering is done. I used EJS for rendering templates, MySQL for database handling, and Bcrypt for securely storing user passwords through hashing. I also learned how to send email notifications with OTP and rent requests, among other things.

Hosting

I hosted the site using two different services:

Website Hosting – Render.com

Render provides free hosting for experimentation and student projects. This plan has minimal resources, but it’s great for learning and testing.

MySQL Database – Filess.io:

Filess.io offers a free MySQL database with a 10MB size limit and a maximum of 5 concurrent connections. It’s ideal for students and self-study projects, but not recommended for startups or businesses.


Learning Notes #51 – Postgres as a Queue using SKIP LOCKED

11 January 2025 at 06:56

Yesterday, I came across a blog from Inferable.ai (https://www.inferable.ai/blog/posts/postgres-skip-locked) that walks through using Postgres as a queue. In this post, I jot down notes on using Postgres as a queue for future reference.

PostgreSQL is a robust relational database that can be used for more than just storing structured data. With the SKIP LOCKED feature introduced in PostgreSQL 9.5, you can efficiently turn a PostgreSQL table into a job queue for distributed processing.

Why Use PostgreSQL as a Queue?

Using PostgreSQL as a queue can be advantageous because,

  • Familiarity: If you’re already using PostgreSQL, there’s no need for an additional message broker.
  • Durability: PostgreSQL ensures ACID compliance, offering reliability for your job processing.
  • Simplicity: No need to manage another component like RabbitMQ or Kafka

Implementing a Queue with SKIP LOCKED

1. Create a Queue Table

To start, you need a table to store the jobs,


CREATE TABLE job_queue (
    id SERIAL PRIMARY KEY,
    job_data JSONB NOT NULL,
    status TEXT DEFAULT 'pending',
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

This table has the following columns,

  • id: A unique identifier for each job.
  • job_data: The data or payload for the job.
  • status: Tracks the job’s state (‘pending’, ‘in_progress’, or ‘completed’).
  • created_at: Timestamp of job creation.

2. Insert Jobs into the Queue

Adding jobs is straightforward,


INSERT INTO job_queue (job_data)
VALUES ('{"task": "send_email", "email": "user@example.com"}');

3. Fetch Jobs for Processing with SKIP LOCKED

Workers will fetch jobs from the queue using SELECT ... FOR UPDATE SKIP LOCKED to avoid contention,

WITH next_job AS (
    SELECT id, job_data
    FROM job_queue
    WHERE status = 'pending'
    FOR UPDATE SKIP LOCKED
    LIMIT 1
)
UPDATE job_queue
SET status = 'in_progress'
FROM next_job
WHERE job_queue.id = next_job.id
RETURNING job_queue.id, job_queue.job_data;

Key Points:

  • FOR UPDATE locks the selected row to prevent other workers from picking it up.
  • SKIP LOCKED ensures locked rows are skipped, enabling concurrent workers to operate without waiting.
  • LIMIT 1 processes one job at a time per worker.

4. Mark Jobs as Completed

Once a worker finishes processing a job, it should update the job’s status,


UPDATE job_queue
SET status = 'completed'
WHERE id = $1; -- Replace $1 with the job ID

5. Delete Old or Processed Jobs

To keep the table clean, you can periodically remove completed jobs,


DELETE FROM job_queue
WHERE status = 'completed' AND created_at < NOW() - INTERVAL '30 days';
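
One way to run this cleanup on a schedule, assuming the pg_cron extension is available (my addition, not from the original blog):


SELECT cron.schedule(
    'purge-completed-jobs',  -- job name
    '0 3 * * *',             -- every day at 03:00
    $$DELETE FROM job_queue
      WHERE status = 'completed'
        AND created_at < NOW() - INTERVAL '30 days'$$
);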

Example Worker Implementation

Here’s an example of a worker implemented in Python using psycopg2


import time

import psycopg2
from psycopg2.extras import RealDictCursor

# Connect to the database (adjust the DSN for your environment)
connection = psycopg2.connect("dbname=yourdb user=youruser")

while True:
    with connection.cursor(cursor_factory=RealDictCursor) as cursor:
        cursor.execute(
            """
            WITH next_job AS (
                SELECT id, job_data
                FROM job_queue
                WHERE status = 'pending'
                FOR UPDATE SKIP LOCKED
                LIMIT 1
            )
            UPDATE job_queue
            SET status = 'in_progress'
            FROM next_job
            WHERE job_queue.id = next_job.id
            RETURNING job_queue.id, job_queue.job_data;
            """
        )

        job = cursor.fetchone()
        if job:
            print(f"Processing job {job['id']}: {job['job_data']}")

            # Simulate job processing
            cursor.execute("UPDATE job_queue SET status = 'completed' WHERE id = %s", (job['id'],))

        else:
            print("No jobs available. Sleeping...")
            time.sleep(5)

    connection.commit()

Considerations

  1. Transaction Isolation: Use the REPEATABLE READ or SERIALIZABLE isolation level cautiously to avoid unnecessary locks.
  2. Row Locking: SKIP LOCKED only skips rows locked by other transactions, not those locked within the same transaction.
  3. Performance: Regularly archive or delete old jobs to prevent the table from growing indefinitely. Consider indexing the status column to improve query performance.
  4. Fault Tolerance: Ensure that workers handle crashes or timeouts gracefully. Use a timeout mechanism to revert jobs stuck in the ‘in_progress’ state (see the sketch after this list).
  5. Scaling: Distribute workers across multiple nodes to handle a higher job throughput.
  6. The SKIP LOCKED clause only applies to row-level locks – the required ROW SHARE table-level lock is still taken normally.
  7. Using SKIP LOCKED provides an inconsistent view of the data by design. This is why it’s perfect for queue-like tables where we want to distribute work, but not suitable for general purpose work where consistency is required.
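
Points 3 and 4 can be made concrete with a little SQL. Below is a minimal sketch, my own addition: a partial index to speed up fetching pending work, and a requeue statement that assumes a started_at column set when a worker claims a job.


-- Partial index so workers can locate pending jobs quickly
CREATE INDEX idx_job_queue_pending
    ON job_queue (created_at)
    WHERE status = 'pending';

-- Requeue jobs claimed over 10 minutes ago but never completed
-- (assumes a started_at TIMESTAMP column, set when a job is claimed)
UPDATE job_queue
SET status = 'pending'
WHERE status = 'in_progress'
  AND started_at < NOW() - INTERVAL '10 minutes';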

Learning Notes #50 – Fixed Partition Pattern | Distributed Pattern

9 January 2025 at 16:51

Today, I learnt about fixed partitions, which balance data among servers without heavy data movement. In this blog, I jot down notes on how fixed partitioning helps solve the problem.

This entire blog is inspired by https://www.linkedin.com/pulse/distributed-systems-design-pattern-fixed-partitions-retail-kumar-v-c34pc/?trackingId=DMovSwEZSfCzKZEKa7yJrg%3D%3D

Problem Statement

In a distributed key-value store system, data items need to be mapped to a set of cluster nodes to ensure efficient storage and retrieval. The system must satisfy the following requirements,

  1. Uniform Distribution: Data should be evenly distributed across all cluster nodes to avoid overloading any single node.
  2. Deterministic Mapping: Given a data item, the specific node responsible for storing it should be determinable without querying all the nodes in the cluster.

A common approach to achieve these goals is to use hashing with a modulo operation. For example, if there are three nodes in the cluster, the key is hashed, and the hash value modulo the number of nodes determines the node to store the data. However, this method has a critical drawback,

Rebalancing Issue: When the cluster size changes (e.g., nodes are added or removed), the mapping for most keys changes. This requires the system to move almost all the data to new nodes, leading to significant overhead in terms of time and resources, especially when dealing with large data volumes.

Challenge: How can we design a mapping mechanism that minimizes data movement during cluster size changes while maintaining uniform distribution and deterministic mapping?

Solution

There is a concept of Fixed Partitioning,

What Is Fixed Partitioning?

This pattern organizes data into a predefined number of fixed partitions that remain constant over time. Data is assigned to these partitions using a hashing algorithm, ensuring that the mapping of data to partitions is permanent. The system separates the fixed partitioning of data from the physical servers managing these partitions, enabling seamless scaling.

Key Features of Fixed Partitioning

  1. Fixed Number of Partitions
    • The number of partitions is determined during system initialization (e.g., 8 partitions).
    • Data is assigned to these partitions based on a consistent hashing algorithm.
  2. Stable Data Mapping
    • Each piece of data is permanently mapped to a specific partition.
    • This eliminates the need for large-scale data reshuffling when scaling the system.
  3. Adjustable Partition-to-Server Mapping
    • Partitions can be reassigned to different servers as the system scales.
    • Only the physical location of the partitions changes; the fixed mapping remains intact.
  4. Balanced Load Distribution
    • Partitions are distributed evenly across servers to balance the workload.
    • Adding new servers involves reassigning partitions without moving or reorganizing data within the partitions.

Naive Example

We have a banking system with transactions stored in 8 fixed partitions, distributed based on a customer’s account ID.


CREATE TABLE transactions (
    id SERIAL PRIMARY KEY,
    account_id INT NOT NULL,
    transaction_amount NUMERIC(10, 2) NOT NULL,
    transaction_date DATE NOT NULL
) PARTITION BY HASH (account_id);

1. Create Partition


DO $$
BEGIN
    FOR i IN 0..7 LOOP
        EXECUTE format(
            'CREATE TABLE transactions_p%s PARTITION OF transactions FOR VALUES WITH (modulus 8, remainder %s);',
            i, i
        );
    END LOOP;
END $$;

This creates 8 partitions (transactions_p0 to transactions_p7) based on the hash remainder of account_id modulo 8.

2. Inserting Data

When inserting data into the transactions table, PostgreSQL automatically places it into the correct partition based on the account_id.


INSERT INTO transactions (account_id, transaction_amount, transaction_date)
VALUES (12345, 500.00, '2025-01-01');

The hash of account_id 12345, taken modulo 8, determines the target partition (e.g., transactions_p5).

3. Querying Data

Querying the base table works transparently across all partitions


SELECT * FROM transactions WHERE account_id = 12345;

PostgreSQL automatically routes the query to the correct partition.
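
Partition pruning can be verified with EXPLAIN (my addition, not from the original post); the plan should touch only the matching partition:


EXPLAIN SELECT * FROM transactions WHERE account_id = 12345;
-- Expect the plan to scan a single partition (e.g., transactions_p5)
-- rather than all eight.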

4. Scaling by Adding Servers

Initial Setup:

Suppose we have 4 servers managing the partitions,

  • Server 1: transactions_p0, transactions_p1
  • Server 2: transactions_p2, transactions_p3
  • Server 3: transactions_p4, transactions_p5
  • Server 4: transactions_p6, transactions_p7

Adding a New Server:

When a 5th server is added, we redistribute partitions,

  • Server 1: transactions_p0
  • Server 2: transactions_p1
  • Server 3: transactions_p2, transactions_p3
  • Server 4: transactions_p4
  • Server 5: transactions_p5, transactions_p6, transactions_p7

Partition Migration

  • During the migration, transactions_p5 is copied from Server 3 to Server 5.
  • Once the migration is complete, Server 5 becomes responsible for transactions_p5.

Benefits:

  1. Minimal Data Movement – When scaling, only the partitions being reassigned are copied to new servers. Data within partitions remains stable.
  2. Optimized Performance – Queries are routed directly to the relevant partition, minimizing scan times.
  3. Scalability – Adding servers is straightforward, as it involves reassigning partitions, not reorganizing data.

What happens when a new server is added? Don’t we need to copy the data?

When a partition is moved to a new server (e.g., partition_b from server_A to server_B), the data in the partition must be copied to the new server. However,

  1. The copying is limited to the partition being reassigned.
  2. No data within the partition is reorganized.
  3. Once the partition is fully migrated, the original copy is typically deleted.

For example, in PostgreSQL,

  • Export the partition: pg_dump -t partition_b -h server_A -U postgres > partition_b.sql
  • Import on the new server: psql -h server_B -U postgres -d mydb < partition_b.sql

Learning Notes #41 – Shared Lock and Exclusive Locks | Postgres

6 January 2025 at 14:07

Today, I learnt about various locking mechanisms used to prevent double updates. In this blog, I make notes on shared locks and exclusive locks for my future self.

What Are Locks in Databases?

Locks are mechanisms used by a DBMS to control access to data. They ensure that transactions are executed in a way that maintains the ACID (Atomicity, Consistency, Isolation, Durability) properties of the database. Locks can be classified into several types, including

  • Shared Locks (S Locks): Allow multiple transactions to read a resource simultaneously but prevent any transaction from writing to it.
  • Exclusive Locks (X Locks): Allow a single transaction to modify a resource, preventing both reading and writing by other transactions.
  • Intent Locks: Used to signal the type of lock a transaction intends to acquire at a lower level.
  • Deadlock Prevention Locks: Special locks aimed at preventing deadlock scenarios.

Shared Lock

A shared lock is used when a transaction needs to read a resource (e.g., a database row or table) without altering it. Multiple transactions can acquire a shared lock on the same resource simultaneously. However, as long as one or more shared locks exist on a resource, no transaction can acquire an exclusive lock on that resource.


-- Transaction A: Acquire a shared lock on a row
BEGIN;
SELECT * FROM employees WHERE id = 1 FOR SHARE;
-- Transaction B: Acquire a shared lock on the same row
BEGIN;
SELECT * FROM employees WHERE id = 1 FOR SHARE;
-- Both transactions can read the row concurrently
-- Transaction C: Attempt to update the same row
BEGIN;
UPDATE employees SET salary = salary + 1000 WHERE id = 1;
-- Transaction C will be blocked until Transactions A and B release their locks

Key Characteristics of Shared Locks

1. Concurrent Reads

  • Shared locks allow multiple transactions to read the same resource at the same time.
  • This is ideal for operations like SELECT queries that do not modify data.

2. Write Blocking

  • While a shared lock is active, no transaction can modify the locked resource.
  • Prevents dirty writes and ensures read consistency.

3. Compatibility

  • Shared locks are compatible with other shared locks but not with exclusive locks.

When Are Shared Locks Used?

Shared locks are typically employed in read operations under certain isolation levels. For instance,

1. Read Committed Isolation Level:

  • Shared locks are held for the duration of the read operation.
  • Prevents dirty reads by ensuring the data being read is not modified by other transactions during the read.

2. Repeatable Read Isolation Level:

  • Shared locks are held until the transaction completes.
  • Ensures that the data read during a transaction remains consistent and unmodified.

3. Snapshot Isolation:

  • Shared locks may not be explicitly used, as the DBMS creates a consistent snapshot of the data for the transaction.

    Exclusive Locks

    An exclusive lock is used when a transaction needs to modify a resource. Only one transaction can hold an exclusive lock on a resource at a time, ensuring no other transactions can read or write to the locked resource.

    
    -- Transaction X: Acquire an exclusive lock to update a row
    BEGIN;
    UPDATE employees SET salary = salary + 1000 WHERE id = 2;
    -- Transaction Y: Attempt to read the same row
    BEGIN;
    SELECT * FROM employees WHERE id = 2;
    -- Transaction Y will be blocked until Transaction X completes
    -- Transaction Z: Attempt to update the same row
    BEGIN;
    UPDATE employees SET salary = salary + 500 WHERE id = 2;
    -- Transaction Z will also be blocked until Transaction X completes
    

    Key Characteristics of Exclusive Locks

    1. Write Operations: Exclusive locks are essential for operations like INSERT, UPDATE, and DELETE.

    2. Blocking Reads and Writes: While an exclusive lock is active, no other transaction can read or write to the resource.

    3. Isolation: Ensures that changes made by one transaction are not visible to others until the transaction is complete.

      When Are Exclusive Locks Used?

      Exclusive locks are typically employed in write operations or any operation that modifies the database. For instance:

      1. Transactional Updates – A transaction that updates a row acquires an exclusive lock to ensure no other transaction can access or modify the row during the update.

      2. Table Modifications – When altering a table structure, the DBMS may place an exclusive lock on the entire table.

      Benefits of Shared and Exclusive Locks

      Benefits of Shared Locks

      1. Consistency in Multi-User Environments – Ensure that data being read is not altered by other transactions, preserving consistency.
      2. Concurrency Support – Allow multiple transactions to read data simultaneously, improving system performance.
      3. Data Integrity – Prevent dirty reads and writes, ensuring that operations yield reliable results.

      Benefits of Exclusive Locks

      1. Data Integrity During Modifications – Prevents other transactions from accessing data being modified, ensuring changes are applied safely.
      2. Isolation of Transactions – Ensures that modifications by one transaction are not visible to others until committed.

      Limitations and Challenges

      Shared Locks

      1. Potential for Deadlocks – Deadlocks can occur if two transactions simultaneously hold shared locks and attempt to upgrade to exclusive locks.
      2. Blocking Writes – Shared locks can delay write operations, potentially impacting performance in write-heavy systems.
      3. Lock Escalation – In systems with high concurrency, shared locks may escalate to table-level locks, reducing granularity and concurrency.

      Exclusive Locks

      1. Reduced Concurrency – Exclusive locks prevent other transactions from accessing the locked resource, which can lead to bottlenecks in highly concurrent systems.
      2. Risk of Deadlocks – Deadlocks can occur if two transactions attempt to acquire exclusive locks on resources held by each other.

      Lock Compatibility
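
      Shared and exclusive locks combine as follows:

      Held \ Requested | Shared (S) | Exclusive (X)
      Shared (S)       | Compatible | Blocked
      Exclusive (X)    | Blocked    | Blocked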

      Learning Notes #28 – Unlogged Table in Postgres

      2 January 2025 at 17:30

      Today, as part of my daily reading, I came across https://raphaeldelio.com/2024/07/14/can-postgres-replace-redis-as-a-cache/, which discusses using Postgres as a cache and compares it with Redis! I was surprised by the title, so I gave it a read. There I came across the concept of the UNLOGGED table, which enables fast, cache-like retrieval. In this blog I jot down notes on unlogged tables for future reference.

      Highly Recommended Links: https://martinheinz.dev/blog/105, https://raphaeldelio.com/2024/07/14/can-postgres-replace-redis-as-a-cache/, https://www.crunchydata.com/blog/postgresl-unlogged-tables

      Unlogged tables offer unique benefits in scenarios where speed is paramount, and durability (the guarantee that data is written to disk and will survive crashes) is not critical.

      What Are Unlogged Tables?

      Postgres Architecture : https://miro.com/app/board/uXjVLD2T5os=/

      In PostgreSQL, a table is a basic unit of data storage. By default, PostgreSQL ensures that data in regular tables is durable. This means that all data is written to the disk and will survive server crashes. However, in some situations, durability is not necessary. Unlogged tables are special types of tables in PostgreSQL where the database does not write data changes to the WAL (Write-Ahead Log).

      The absence of WAL logging for unlogged tables makes them faster than regular tables because PostgreSQL doesn’t need to ensure data consistency across crashes for these tables. However, this also means that if the server crashes or the system is powered off, the data in unlogged tables is lost.

      Key Characteristics of Unlogged Tables

      1. No Write-Ahead Logging (WAL) – By default, PostgreSQL writes changes to the WAL to ensure data durability. For unlogged tables, this step is skipped, making operations like INSERTs, UPDATEs, and DELETEs faster.
      2. No Durability – The absence of WAL means that unlogged tables will lose their data if the database crashes or shuts down uncleanly. This makes them unsuitable for critical data.
      3. Faster Performance – Since WAL writes are skipped, unlogged tables are faster for data insertion and modification. This can be beneficial for use cases where data is transient and doesn’t need to persist beyond the current session.
      4. Support for Indexes and Constraints – Unlogged tables can have indexes and constraints like regular tables. However, the data in these tables is still non-durable.
      5. Automatic Cleanup – After a crash or unclean shutdown, PostgreSQL automatically truncates unlogged tables. Their data does survive a clean restart, but it should still be treated as disposable.

      Drawbacks of Unlogged Tables

      1. Data Loss on Crash – The most significant disadvantage of unlogged tables is the loss of data in case of a crash or unclean shutdown. If the application depends on this data, then using unlogged tables would not be appropriate.
      2. Not Suitable for Critical Applications – Applications that require data persistence (such as financial or inventory systems) should avoid using unlogged tables, as the risk of data loss outweighs any performance benefits.
      3. No Replication – Unlogged tables are not replicated in standby servers in a replication setup, as the data is not written to the WAL.

      Creating an Unlogged Table

      Creating an unlogged table is very straightforward in PostgreSQL. You simply need to add the UNLOGGED keyword when creating the table.

      
      CREATE UNLOGGED TABLE temp_data (
          id SERIAL PRIMARY KEY,
          name VARCHAR(100),
          value INT
      );
      
      

      In this example, temp_data is an unlogged table. All operations performed on this table will not be logged to the WAL.
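
      A related detail, not covered in the links above: an existing table can be switched between logged and unlogged with ALTER TABLE, which rewrites the table:

      -- Make the unlogged table durable again
      ALTER TABLE temp_data SET LOGGED;

      -- Or strip WAL logging from a regular table
      ALTER TABLE temp_data SET UNLOGGED;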

      When to Avoid Unlogged Tables?

      • If you are working with critical data that needs to be durable and persistent across restarts.
      • If your application requires data replication, as unlogged tables are not replicated in standby servers.
      • If your workload involves frequent crash scenarios where data loss cannot be tolerated.

      Examples

      1. Temporary Storage for processing

      
      CREATE UNLOGGED TABLE etl_staging (
          source_id INT,
          raw_data JSONB,
          processed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
      );
      
      -- Insert raw data into the staging table
      INSERT INTO etl_staging (source_id, raw_data)
      VALUES 
          (1, '{"key": "value1"}'),
          (2, '{"key": "value2"}');
      
      -- Perform transformations on the data
      INSERT INTO final_table (id, key, value)
      SELECT source_id, 
             raw_data->>'key' AS key, 
             'processed_value' AS value
      FROM etl_staging;
      
      -- Clear the staging table
      TRUNCATE TABLE etl_staging;
      
      

      2. Caching

      
      CREATE UNLOGGED TABLE user_sessions (
          session_id UUID PRIMARY KEY,
          user_id INT,
          last_accessed TIMESTAMP DEFAULT NOW()
      );
      
      -- Insert session data
      INSERT INTO user_sessions (session_id, user_id)
      VALUES 
          (uuid_generate_v4(), 101),
          (uuid_generate_v4(), 102);
      
      -- Update last accessed timestamp
      UPDATE user_sessions
      SET last_accessed = NOW()
      WHERE session_id = 'some-session-id';
      
      -- Delete expired sessions
      DELETE FROM user_sessions WHERE last_accessed < NOW() - INTERVAL '1 hour';
      

      Basic SQL Queries, Stored Proc, Function in PostgreSQL

      By: Sugirtha
      2 January 2025 at 06:17

      DDL, DML, DQL Queries:

      CREATE TABLE Employees (
          EmployeeID INTEGER PRIMARY KEY, 
          Name VARCHAR(50), 
          Age INTEGER, 
          DepartmentID INTEGER, 
          FOREIGN KEY (DepartmentID) REFERENCES Departments(DepartmentID)
      );
      
      INSERT INTO Employees (EmployeeID, Name, Age, DepartmentID) VALUES (1, 'Kavi', 32, 101), (2, 'Sugi', 30, 102);
      
      UPDATE Employees SET age=31 WHERE Name='Nila';
      
      DELETE FROM Employees WHERE Name='Nila';
      
      SELECT e.*, d.DepartmentName 
      FROM Employees e 
      JOIN Departments d ON e.DepartmentID = d.DepartmentID;
      

      SELECT e.EmpName AS Employee, m.EmpName AS Manager
      FROM Employees e
      JOIN Employees m ON e.ManagerID = m.EmpID;
      

      INNER JOIN:

      • Returns only the rows where there is a match between the columns in both tables.
      • If no match is found, the row is not included in the result.
      • It’s the most common type of join.

      OUTER JOIN:

      • Returns all rows from one or both tables, even if there is no match in the other table.
        • LEFT OUTER JOIN (or just LEFT JOIN): Returns all rows from the left table, and the matched rows from the right table. If no match, the result will have NULL values for columns from the right table.
        • RIGHT OUTER JOIN (or just RIGHT JOIN): Returns all rows from the right table, and the matched rows from the left table. If no match, the result will have NULL values for columns from the left table.
        • FULL OUTER JOIN: Returns all rows from both tables. If there is no match, the result will have NULL values for the non-matching table’s columns.

      GROUP BY:

      • Groups rows that have the same values in specified columns into summary rows (like finding the total count, sum, average, etc.).
      • It is typically used with aggregate functions such as COUNT(), SUM(), AVG(), MAX(), MIN().

      HAVING:

      • Used to filter records after the GROUP BY has been applied.
      • It works similarly to the WHERE clause, but WHERE is used for filtering individual rows before grouping, while HAVING filters the grouped results.
      SELECT DeptName, COUNT(*)
      FROM Employees
      GROUP BY DeptName
      HAVING COUNT(*) > 1;
      

      DISTINCT:

      • Used to remove duplicate rows from the result set based on the specified columns.
      • If you specify only one column, it will return the distinct values of that column.
      • If you specify multiple columns, the combination of values in those columns will be considered to determine uniqueness.
      SELECT DISTINCT DeptName FROM Employees;
      
      SELECT DISTINCT DeptName, EmpName FROM Employees;
      

      Difference between DELETE and TRUNCATE:

      DELETE:

      • Removes rows one by one and logs each deletion, which can be slower for large datasets.
      • You can use a WHERE clause to specify which rows to delete.
      • Can be rolled back if you’re working within a transaction (assuming no COMMIT has been done).
      • Can fire triggers if there are any triggers defined on the table (for example, BEFORE DELETE or AFTER DELETE triggers).

      TRUNCATE:

      • Removes all rows in the table in one go, without scanning them individually.
      • Does not support a WHERE clause, so it always deletes all rows.
      • It’s much faster than DELETE because it doesn’t log individual row deletions (but it does log the deallocation of the table’s data pages).
      • Cannot be rolled back in most databases (unless in a transaction, depending on the DBMS), and there are no triggers involved.

      UNION:

      • Combines the results of two or more queries.
      • Removes duplicates: Only unique rows are included in the final result.
      • It performs a sort operation to eliminate duplicates, which can have a slight performance cost.

      UNION ALL:

      • Also combines the results of two or more queries.
      • Keeps duplicates: All rows from the queries are included in the final result, even if they are the same.
      • It doesn’t perform the sort operation, which usually makes it faster than UNION.
      SELECT EmpID, EmpName FROM Employees
      UNION ALL
      SELECT EmpID, EmpName FROM Contractors;
      
      SELECT EmpID, EmpName FROM Employees
      UNION 
      SELECT EmpID, EmpName FROM Contractors;
      

      COALESCE():

      Takes the first non-NULL value. For example, in a SELECT statement where some names are NULL, a default value or another column’s value can be used instead.
      SELECT COALESCE(NULL, 'Hello', 'World');
      Output: Hello

      INSERT INTO users (name, nickname) VALUES
      ('Alice', NULL),
      (NULL, 'Bob'),
      (NULL, NULL);

      SELECT id, COALESCE(name, nickname, 'Unknown') AS display_name FROM users;

      NULLIF()

      NULLIF(expression1, expression2)
      Returns NULL if both expressions are equal; otherwise returns the first expression (expression1).
      SELECT NULLIF(10, 10);   -- Output: NULL
      SELECT NULLIF(10, 20);   -- Output: 10
      SELECT NULLIF(10, NULL); -- Output: 10
      SELECT NULLIF(NULL, 10); -- Output: NULL

      IF Condition:

      The IF statement is used to check conditions and execute SQL code accordingly.

      IF condition THEN
          -- Code to execute if the condition is true
      ELSIF condition THEN
          -- Code block to execute if another condition is true
      ELSE
          -- Code to execute if the condition is false
      END IF;
      
      IF NOT FOUND THEN
          RAISE NOTICE 'Employee with ID % not found!', emp_id;
          emp_bonus := 0;
      END IF;
      

      CASE WHEN:

      The CASE WHEN expression is used for conditional logic within a query (similar to IF but more flexible in SQL).

      SELECT 
          name,
          salary,
          CASE 
              WHEN salary > 5000 THEN 'High Salary'
              WHEN salary BETWEEN 3000 AND 5000 THEN 'Average Salary'
              ELSE 'Low Salary'
          END AS salary_category
      FROM employees;
      

      FOR LOOP:

      DO $$
      DECLARE 
          i INT;
      BEGIN
          FOR i IN 1..5 LOOP
              -- Perform an action for each iteration (e.g., insert or update a record)
              INSERT INTO audit_log (action, timestamp) 
              VALUES ('Employee update', NOW());
          END LOOP;
      END $$;
      

      -- rec must be declared as RECORD in the DECLARE section
      FOR rec IN SELECT column1, column2 FROM employees LOOP
          -- Code block using rec.column1, rec.column2
      END LOOP;
      

      RAISE – used for printing messages (like System.out.println in Java)

      RAISE NOTICE 'Employee: %, Salary: %', emp_name, emp_salary;
      RAISE EXCEPTION 'An error occurred: %', error_message; -- This prints the message and halts execution.
      RAISE INFO 'Employee: %, Salary: %', emp_name, emp_salary;

      Stored Procedures in SQL:

      A stored procedure is a reusable block of SQL code that performs specific tasks. It is stored in the database and can be called as needed. Stored procedures are used for:

      • Modularizing complex SQL logic.
      • Improving performance by reducing network traffic.
      • Ensuring code reuse and security (by granting permissions to execute rather than to the tables directly).

      Example:

      A stored procedure to insert a new employee record:

      CREATE PROCEDURE add_employee(emp_name VARCHAR, emp_salary NUMERIC)
      LANGUAGE plpgsql AS 
      $$ 
      BEGIN 
        INSERT INTO employees (name, salary) VALUES (emp_name, emp_salary); 
      END; 
      $$;
      

      Execution:

      CALL add_employee('John Doe', 50000);

      Functions in SQL:

      A SQL function is a reusable block of SQL code that performs specific tasks. It is stored in the database and can be called as needed. It is similar to a procedure but returns a single value or table. Functions are typically used for computations or transformations.
      Example: A function to calculate the yearly salary:

      CREATE FUNCTION calculate_yearly_salary(monthly_salary NUMERIC)
      RETURNS NUMERIC
      LANGUAGE plpgsql AS 
      $$
      BEGIN
        RETURN monthly_salary * 12;
      END;
      $$;
      

      Execution:

      SELECT calculate_yearly_salary(5000); (Inside PL/pgSQL code, such as a trigger function, use PERFORM calculate_yearly_salary(5000); when the return value is not needed.)

      Key Differences Between Procedures and Functions:

      Return Type:

      • Function: Always returns a value.
      • Procedure: Does not return a value.

      Usage:

      • Function: Can be used in SQL queries (e.g., SELECT).
      • Procedure: Called using CALL, cannot be used in SQL queries.

      Transaction Control:

      • Function: Cannot manage transactions.
      • Procedure: Can manage transactions (e.g., COMMIT, ROLLBACK).

      Side Effects:

      • Function: Should not have side effects (e.g., modifying data).
      • Procedure: Can modify data and have side effects.

      Calling Mechanism:

      Procedure: Called using CALL procedure_name().

      Function: Called within SQL expressions, like SELECT function_name().

      TRIGGER:

      A trigger is a special kind of stored procedure that automatically executes (or “fires”) when certain events occur in the database, such as INSERT, UPDATE, or DELETE. Triggers can be used to enforce business rules, validate data, or maintain audit logs.
      Key Points:

      Types of Triggers:

      • BEFORE Trigger: Fires before the actual operation (INSERT, UPDATE, DELETE).
      • AFTER Trigger: Fires after the actual operation.
      • INSTEAD OF Trigger: Used to override the standard operation, useful in views. (In PostgreSQL, INSTEAD OF triggers are supported on views only.)

      • Trigger Actions: The trigger action can be an operation like logging data, updating related tables, or enforcing data integrity.
      • Trigger Events: A trigger can be set to fire on certain events, such as when a row is inserted, updated, or deleted.
      • Trigger Scope: Triggers can be defined to act on either a row (executing once for each affected row) or a statement (executing once for the entire statement).
      • A trigger can be created to log changes in a Users table whenever a record is updated, or it could prevent deleting a record if certain conditions aren’t met.

      Example (inline trigger body, MySQL-style syntax; PostgreSQL attaches a trigger function instead, as the next example shows):

      CREATE TRIGGER LogEmployeeAgeUpdate
      AFTER UPDATE ON Employees
      FOR EACH ROW
      BEGIN
          IF OLD.Age <> NEW.Age THEN
              INSERT INTO EmployeeLogs (EmployeeID, OldAge, NewAge)
              VALUES (OLD.EmployeeID, OLD.Age, NEW.Age);
          END IF;
      END;
      

      Example:

      CREATE OR REPLACE FUNCTION prevent_employee_delete()
      RETURNS TRIGGER AS 
      $$
      BEGIN
      -- Check if the employee is in a protected department (for example, department_id = 10)
        IF OLD.department_id = 10 THEN
           RAISE EXCEPTION 'Cannot delete employee in department 10';
        END IF;
        RETURN OLD;
      END;
      $$ 
      LANGUAGE plpgsql;
      
      -- Attach the function to a trigger
      CREATE TRIGGER prevent_employee_delete_trigger
      BEFORE DELETE ON Employees
      FOR EACH ROW
      EXECUTE FUNCTION prevent_employee_delete();
      

      Create a trigger that logs age and related changes whenever an insert, update, or delete occurs on employee rows:

      CREATE OR REPLACE FUNCTION log_employee_changes()
      RETURNS TRIGGER AS 
      $$
      BEGIN
      -- Handle INSERT operation
        IF (TG_OP = 'INSERT') THEN
          INSERT INTO EmployeeChangeLog (EmployeeID, OperationType, NewAge,    ChangeTime)
          VALUES (NEW.EmployeeID, 'INSERT', NEW.Age, CURRENT_TIMESTAMP);
          RETURN NEW;
           -- Handle UPDATE operation
        ELSIF (TG_OP = 'UPDATE') THEN
          INSERT INTO EmployeeChangeLog (EmployeeID, OperationType, OldAge, NewAge, ChangeTime)
          VALUES (OLD.EmployeeID, 'UPDATE', OLD.Age, NEW.Age,  CURRENT_TIMESTAMP);
          RETURN NEW;
        -- Handle DELETE operation
        ELSIF (TG_OP = 'DELETE') THEN
          INSERT INTO EmployeeChangeLog (EmployeeID, OperationType, OldAge, ChangeTime)
          VALUES (OLD.EmployeeID, 'DELETE', OLD.Age, CURRENT_TIMESTAMP);
          RETURN OLD;
        END IF;
      RETURN NULL;
      END;
      $$
      LANGUAGE plpgsql;
      
      CREATE TRIGGER log_employee_changes_trigger
      AFTER INSERT OR UPDATE OR DELETE 
      ON Employees
      FOR EACH ROW
      EXECUTE FUNCTION log_employee_changes();
      

      Step 3: Attach the Trigger to the Employees Table

      Now that we have the function, we can attach it to the Employees table to log changes. We’ll create a trigger that fires on insert, update, and delete operations.

      TG_OP: This is a special variable in PostgreSQL that holds the operation type (either INSERT, UPDATE, or DELETE).
      NEW and OLD: These are references to the row being inserted or updated (NEW) or the row before it was updated or deleted (OLD).
      EmployeeChangeLog: This table stores the details of the changes (employee ID, operation type, old and new values, timestamp). – Programmer defined.

      What happens when you omit FOR EACH ROW?

      1. Statement-Level Trigger: The trigger will fire once per SQL statement, regardless of how many rows are affected. This means it won’t have access to the individual rows being modified.
        • For example, if you run an UPDATE statement that affects 10 rows, the trigger will fire once (for the statement) rather than for each of those 10 rows.
      2. No Access to Row-Specific Data: You won’t be able to use OLD or NEW values to capture the individual row’s data. The trigger will just execute as a whole, without row-specific actions.
      3. With FOR EACH ROW: The trigger works on each row affected, and you can track specific changes (e.g., old vs new values).Without FOR EACH ROW: The trigger fires once per statement and doesn’t have access to specific row data.
      -- MySQL-style inline trigger body, for illustration; in PostgreSQL the
      -- equivalent is a statement-level trigger created with
      -- FOR EACH STATEMENT EXECUTE FUNCTION some_logging_function();
      CREATE TRIGGER LogEmployeeAgeUpdate
      AFTER UPDATE ON Employees
      BEGIN
          -- Performs some operation, but it won't track individual rows.
          INSERT INTO AuditLogs (EventDescription)
          VALUES ('Employees table updated');
      END;
      

      NORMALIZATION:

      1st NF:
      1. Each column/attribute should hold an atomic (indivisible) value, i.e., only one value.
      2. Rows should not repeat, i.e., every row is unique; a primary key is not strictly required at this stage.
      2nd NF:
      1. Must fulfill 1st NF. [candidate key (a composite key that establishes uniqueness)]
      2. All non-candidate-key columns should be fully dependent on every attribute/column of the composite candidate key. For example, if the DB is in denormalized form (i.e., before normalization, all tables and values sit together in one table) and the candidate key is (OrderID + ProductID), then non-key columns (those not part of the candidate key) such as OrderDate, OrderStatus, Qty, and ItemPrice depend only on OrderID, not on ProductID; ProductName depends only on ProductID, not on the order; likewise, customer details do not depend on ProductID. Only related items should live together in one table, so the table is partitioned based on these dependencies.
        So products go to a separate table, orders to a separate table, and customers to a separate table.
      3. A primary key is created for each separated table, ensuring that all non-key columns depend completely on it. Foreign key relationships are then established to connect all the tables.
      3rd NF:
      1. Must fulfill everything up to 2nd NF.
      2. Remove transitive dependencies (a non-key column is functionally dependent on another non-key column rather than directly on the key, so changing one value would disturb other rows sharing that column value). To avoid this, separate the table; for example, salespeople’s data is split out of the orders table.

      What is a transitive dependency? Let’s break this down with a simple example:

      StudentID | Department | HODName
      S001      | IT         | Dr. Rajan
      S002      | CS         | Dr. Priya

      Primary Key: StudentID
      Non-prime attributes: Department, HODName

      StudentID → Department (StudentID determines the department).
      Department → HODName (Department determines the HOD name). HODName therefore depends on StudentID only indirectly, through Department, when it should be determined by StudentID alone.

      This is a transitive dependency, and we need to remove it.

      A transitive dependency means a non-prime attribute (not part of the candidate key) depends indirectly on the primary key through another non-prime attribute.
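
      A minimal sketch of the fix, splitting the table so HODName depends only on its own key (using the example’s names):

      CREATE TABLE Departments (
          Department TEXT PRIMARY KEY,
          HODName    TEXT
      );

      CREATE TABLE Students (
          StudentID  TEXT PRIMARY KEY,
          Department TEXT REFERENCES Departments(Department)
      );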

      Reference: https://www.youtube.com/watch?v=rBPQ5fg_kiY and Learning with the help of chatGPT

      Learning Notes #20 – Partitioning (data) With Postgres

      31 December 2024 at 06:55

      Early this morning, I watched a video on partitioning and sharding. In that video, Arpit explained the limitations of vertical scaling and ways to scale a DB near-infinitely with sharding and partitioning. In this blog, I jot down notes on partitioning, with a single-node Postgres implementation, for my future self.

      As the volume of data grows, managing databases efficiently becomes critical, and since vertical scaling has its limits, two common strategies for handling large datasets are partitioning and sharding. While they may sound similar, these techniques serve different purposes and are implemented differently. Let’s explore these concepts in detail.

      What is Partitioning?

      Partitioning involves dividing a large dataset into smaller, manageable segments, known as partitions. Each partition is stored separately but remains part of a single database instance. Partitioning is typically used to improve query performance and manageability.

      Types of Partitioning

      1. Range Partitioning

      • Data is divided based on ranges of a column’s values.
      • Example: A table storing customer orders might partition data by order date: January orders in one partition, February orders in another.

      PostgreSQL Example

      CREATE TABLE orders (
          id SERIAL,
          customer_id INT,
          order_date DATE NOT NULL,
          PRIMARY KEY (id, order_date) -- Include the partition key
      ) PARTITION BY RANGE (order_date);
      
      CREATE TABLE orders_jan PARTITION OF orders
          FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
      
      CREATE TABLE orders_feb PARTITION OF orders
          FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');
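
      One detail worth noting (my addition, not from the video): an insert whose order_date falls outside every declared range is rejected unless a DEFAULT partition (available since PostgreSQL 11) is attached to catch it:

      CREATE TABLE orders_default PARTITION OF orders DEFAULT;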
      
      

      2. Hash Partitioning

      • A hash function determines the partition where a record will be stored.
      • Example: Orders can be distributed across partitions based on the hash of the customer ID.

      Postgres Example

      CREATE TABLE orders (
          id SERIAL ,
          customer_id INT,
          order_date DATE NOT NULL,
          PRIMARY KEY (id, customer_id)
      ) PARTITION BY HASH (customer_id, id);
      
      CREATE TABLE orders_part_1 PARTITION OF orders
          FOR VALUES WITH (MODULUS 2, REMAINDER 0);
      
      CREATE TABLE orders_part_2 PARTITION OF orders
          FOR VALUES WITH (MODULUS 2, REMAINDER 1);
      

      3. List Partitioning

      • Data is divided based on a predefined list of values.
      • Example: A table storing sales data could partition based on regions: North, South, East, and West

      Postgres Example

      CREATE TABLE sales (
          id SERIAL ,
          region TEXT NOT NULL,
          amount NUMERIC,
          PRIMARY KEY (id, region)
      ) PARTITION BY LIST (region);
      
      CREATE TABLE sales_north PARTITION OF sales
          FOR VALUES IN ('North');
      
      CREATE TABLE sales_south PARTITION OF sales
          FOR VALUES IN ('South');
      

      4. Composite Partitioning

      • Combines two or more partitioning strategies, such as range and list partitioning.
      • Example: A table partitioned by range on order date and sub-partitioned by list on region.

      Postgres Example

      CREATE TABLE orders (
          id SERIAL,
          customer_id INT,
          order_date DATE NOT NULL,
          region TEXT NOT NULL,
          PRIMARY KEY (id, order_date, region)
      ) PARTITION BY RANGE (order_date);
      
      CREATE TABLE orders_2024 PARTITION OF orders
          FOR VALUES FROM ('2024-01-01') TO ('2025-01-01')
          PARTITION BY LIST (region);
      
      CREATE TABLE orders_2024_north PARTITION OF orders_2024
          FOR VALUES IN ('North');
      
      CREATE TABLE orders_2024_south PARTITION OF orders_2024
          FOR VALUES IN ('South');
      
      

      Query Optimization

      By: Sugirtha
      29 December 2024 at 09:35

      Query Optimization:

      Query Optimization is the process of improving the performance of a SQL query by reducing the amount of time and resources (like CPU, memory, and I/O) required to execute the query. The goal is to retrieve the desired data as quickly and efficiently as possible.

      Important implementation of Query Optimization:

      1. Indexing: Indexes on frequently used columns: indexing columns that are part of the WHERE, JOIN, or ORDER BY clauses can significantly improve performance. For example, if you’re querying a salary column frequently, indexing it can speed up those queries.
        Composite indexes: If a query filters by multiple columns, a composite index on those columns might improve performance. For instance, INDEX (first_name, last_name) could be more efficient than two separate indexes on first_name and last_name.
      2. Instead of SELECT * FROM, select only the required columns, and use LIMIT for the required number of rows.
      3. Optimizing JOIN Operations: Use appropriate join types: For example, avoid OUTER JOIN if INNER JOIN would suffice. Redundant or unnecessary joins increase query complexity and processing time.
      4. Use of EXPLAIN to Analyze Query Plan:
        Running EXPLAIN before a query allows you to understand how the database is executing it. You can spot areas where indexes are not being used, unnecessary full table scans are happening, or joins are inefficient.

      How to Implement Query Optimization:

      1. Use Indexes:
      • Create indexes on columns that are frequently queried or used in JOIN, WHERE, or ORDER BY clauses. For example, if you frequently query a column like user_id, an index on user_id will speed up lookups. Use multi-column indexes for queries involving multiple columns.
      • CREATE INDEX idx_user_id ON users(user_id);

      2. Rewrite Queries:

      • Avoid using SELECT * and instead select only the necessary columns.
      • Break complex queries into simpler ones and use temporary tables or Common Table Expressions (CTEs) if needed (see the sketch after this list).
      • SELECT name, age FROM users WHERE age > 18;
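
      A small illustration of the CTE suggestion above (hypothetical users table):

      WITH adults AS (
          SELECT id, name, age
          FROM users
          WHERE age > 18
      )
      SELECT name
      FROM adults
      ORDER BY age DESC;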

      3. Use Joins Efficiently:

      • Ensure that you are using the most efficient join type for your query (e.g., prefer INNER JOIN over OUTER JOIN when possible).
      • Join on indexed columns to speed up the process.

      4. Optimize WHERE Clauses:

      • Make sure conditions in WHERE clauses are selective and reduce the number of rows as early as possible.
      • Use AND and OR operators appropriately to filter data early in the query.

      5. Limit the Number of Rows:

      • Use the LIMIT clause when dealing with large datasets to fetch only a required subset of data.
      • Avoid retrieving unnecessary data from the database.

      6. Avoid Subqueries When Possible:

      • Subqueries can be inefficient because they often lead to additional scans of the same data. Use joins instead of subqueries when possible.
      • If you must use subqueries, try to write them in a way that they don’t perform repeated calculations.

      7. Analyze Execution Plans:

      • Use EXPLAIN to see how the database is executing your query. This will give you insights into whether indexes are being used, how tables are being scanned, etc.
      • Example:
      1. EXPLAIN SELECT * FROM users WHERE age > 18;

      8. Use Proper Data Types:

      1. Choose the most efficient data types for your columns. For instance, use INTEGER for numeric values rather than VARCHAR, which takes more space and requires more processing.

      9. Avoid Functions on Indexed Columns:

      1. Using functions like UPPER(), LOWER(), or DATE() on indexed columns in WHERE clauses can prevent the database from using indexes effectively.
      2. Instead, try to perform transformations outside the query or ensure indexes are used.

      10. Database Configuration:

      1. Ensure the database system is configured properly for the hardware it’s running on. For example, memory and cache settings can significantly affect query performance.

      Example of Optimized Query:

      Non-Optimized Query:

      SELECT * FROM orders
      WHERE customer_id = 1001
      AND order_date > '2023-01-01';

      This query might perform a full table scan if customer_id and order_date are not indexed.

      Optimized Query:

      CREATE INDEX idx_customer_order_date ON orders(customer_id, order_date);
      
      SELECT order_id, order_date, total_amount
      FROM orders
      WHERE customer_id = 1001
      AND order_date > '2023-01-01';

      In this optimized version, an index on customer_id and order_date helps the database efficiently filter the rows without scanning the entire table.

      Reference : Learnt from ChatGPT

      SQL – Postgres – Few Advance Topics

      By: Sugirtha
      29 December 2024 at 09:31

      The order of execution in a SQL query:

      FROM and/or JOIN
      WHERE
      GROUP BY
      HAVING
      SELECT
      DISTINCT
      ORDER BY
      LIMIT and/or OFFSET

      Command Types:
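
      The standard SQL command categories are:

      • DDL (Data Definition Language): CREATE, ALTER, DROP, TRUNCATE
      • DML (Data Manipulation Language): INSERT, UPDATE, DELETE
      • DQL (Data Query Language): SELECT
      • DCL (Data Control Language): GRANT, REVOKE
      • TCL (Transaction Control Language): COMMIT, ROLLBACK, SAVEPOINT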

      References : Aysha Beevi

      CAST()

      CAST is used to typecast; alternatively, the :: operator converts to the target data type.

      SELECT 'The current date is: ' || CURRENT_DATE::TEXT;
      SELECT '2024-12-21'::DATE::TEXT;
      SELECT CAST('2024-12-21' AS DATE);

      || --> concatenation operator

      DATE functions:

      SELECT CURRENT_DATE; -- Output: 2024-12-21
      SELECT CURRENT_TIME; -- Output: 09:15:34.123456+05:30
      SELECT NOW(); -- Output: 2024-12-21 09:15:34.123456+05:30
      SELECT AGE('2020-01-01', '2010-01-01'); -- Output: 10 years 0 mons 0 days
      SELECT AGE('1990-05-15'); -- Output: 34 years 7 mons 6 days (calculated from NOW())
      SELECT EXTRACT(YEAR FROM NOW()); -- Output: 2024
      SELECT EXTRACT(MONTH FROM CURRENT_DATE); -- Output: 12
      SELECT EXTRACT(DAY FROM TIMESTAMP '2024-12-25 10:15:00'); -- Output: 25

      The DATE_TRUNC() function truncates a date or timestamp to the specified precision. This means it "resets" smaller parts of the date/time to their starting values.
      SELECT DATE_TRUNC('month', TIMESTAMP '2024-12-21 10:45:30');
      -- Output: 2024-12-01 00:00:00 --> the 'month' precision resets the day to the 1st and the time to 00:00:00.
      SELECT DATE_TRUNC('year', TIMESTAMP '2024-12-21 10:45:30');
      -- Output: 2024-01-01 00:00:00
      SELECT DATE_TRUNC('day', TIMESTAMP '2024-12-21 10:45:30');
      -- Output: 2024-12-21 00:00:00

      SELECT NOW() + INTERVAL '1 year';
      -- Output: current timestamp + 1 year
      SELECT CURRENT_DATE - INTERVAL '30 days';
      -- Output: today's date - 30 days
      SELECT NOW() + INTERVAL '2 hours';
      -- Output: current timestamp + 2 hours
      SELECT NOW() + INTERVAL '1 year' + INTERVAL '3 months' - INTERVAL '15 days';

      Window Functions

      This is the function that will operate over the specified window. Common window functions include ROW_NUMBER(), RANK(), SUM(), AVG(), etc

      PARTITION BY: (Optional) Divides the result set into partitions to which the window function is applied; each partition is processed separately. ORDER BY: (Optional) Orders the rows in each partition before the window function is applied.

      window_function() OVER ( -- RANK(), SUM(), etc. can come in window_function
          PARTITION BY column_name(s)
          ORDER BY column_name(s)
       );
      
      SELECT 
          department_id,
          employee_id,
          salary,
          SUM(salary) OVER (PARTITION BY department_id ORDER BY salary) AS running_total
      FROM employees;
      

      CURSOR:

      DO $$
      DECLARE
          emp_name VARCHAR;
          emp_salary DECIMAL;
          emp_cursor CURSOR FOR SELECT name, salary FROM employees;
      BEGIN
          OPEN emp_cursor;
          LOOP
              FETCH emp_cursor INTO emp_name, emp_salary;
              EXIT WHEN NOT FOUND; -- Exit the loop when no rows are left
              RAISE NOTICE 'Employee: %, Salary: %', emp_name, emp_salary;
          END LOOP;
          CLOSE emp_cursor;
      END $$;

      Basic Data Types in PostgreSQL

      TEXT, VARCHAR, CHAR: Working with strings.
      INTEGER, BIGINT, NUMERIC: Handling numbers.
      DATE, TIMESTAMP: Date and time handling.
      

      OVER CLAUSE

      In PostgreSQL, the OVER() clause is used in window functions to define a window of rows over which a function operates. Here we create a serial number (ROW_NUMBER) starting from 1, with rows already ordered by salary descending. Note that the alias row_num cannot be referenced directly in WHERE, so the window function is wrapped in a subquery:
      SELECT name, row_num
      FROM (
          SELECT name, ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num
          FROM employees
      ) t
      WHERE row_num <= 5;

      RANK()

      Partition the table records based on the dept id, then inside each partition order by salary desc with ranks 1, 2, 3, and so on. In RANK(), if two rows have the same salary, the rank repeats.

      SELECT department_id, name, salary,
      RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rank
      FROM employees;

      Output:

      department_id | name    | salary | rank
      101           | Charlie | 70,000 | 1
      101           | Alice   | 50,000 | 2
      101           | Frank   | 50,000 | 2
      102           | Eve     | 75,000 | 1
      102           | Bob     | 60,000 | 2
      103           | David   | 55,000 | 1

      • NTILE(3) divides employees into 3 equal salary buckets:
        SELECT id, name, salary,
        NTILE(3) OVER (ORDER BY salary DESC) AS bucket
        FROM employees;

        id | name    | salary | bucket
        5  | Eve     | 75,000 | 1
        3  | Charlie | 70,000 | 1
        2  | Bob     | 60,000 | 2
        4  | David   | 55,000 | 2
        1  | Alice   | 50,000 | 3
        6  | Frank   | 50,000 | 3
      • FIRST_VALUE retrieves the first name in each department based on descending salary:
        SELECT department_id, name, salary,
        FIRST_VALUE(name) OVER (PARTITION BY department_id ORDER BY salary DESC) AS top_earner
        FROM employees;

        Output:

        department_id | name    | salary | top_earner
        101           | Charlie | 70,000 | Charlie
        101           | Alice   | 50,000 | Charlie
        101           | Frank   | 50,000 | Charlie
        102           | Eve     | 75,000 | Eve
        102           | Bob     | 60,000 | Eve
        103           | David   | 55,000 | David

First the FROM clause is evaluated, then the WHERE condition is applied.

• You cannot call RANK() (or reference its alias) directly in the WHERE clause; it must first be stored in a result set, and only from there can you filter on it. That is why the CTE (Common Table Expression) containing the window function is executed first and stored in a temporary result set, and the SELECT then reads from that result set.
• Below, we placed the window function in a subquery, so it gets executed first and its result is used by the outer query.

Top earner in each department, with name and salary (consider the employees table above):
      SELECT department_id, name, salary
      FROM (
      SELECT department_id, name, salary,
      RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rank
      FROM employees
      ) ranked_employees
      WHERE rank = 1;

      department_id name salary
      101 Charlie 70,000
      102 Eve 75,000
      103 David 55,000

Result set – here RankedSalaries is the result set (this query returns the second-highest salary):

      WITH RankedSalaries AS (
      SELECT salary, RANK() OVER (ORDER BY salary DESC) AS rank
      FROM employees
      )
      SELECT salary
      FROM RankedSalaries WHERE rank = 2;

      Here, RankedSalaries is a temporary result set or CTE (Common Table Expression)

      Reference: Learnt from ChatGPT and Picture from Ms.Aysha

04. Data Integrity (தரவு ஒருங்கிணைவு)

By: Vijayan S
20 November 2024 at 12:13

Data Integrity

Data integrity is the process of ensuring that the data in a database is correct, accurate, and consistent. It improves the reliability of the database and prevents problems caused by incorrect information.

Major types of data integrity:

1. Domain Integrity:

  • Every column must conform to the data type assigned to it.
  • For example, an age column can only accept numbers; letters cannot be entered.
2. Entity Integrity:

  • Every record in every table must have a unique primary key.
  • For example, in a school's student register, the student roll number can be the primary key.
3. Referential Integrity:

  • A foreign key in one table must refer to the primary key of another table.
  • For example, the customer ID foreign key in a sales table must match the customer ID primary key of the customer details table.
4. Tuple Integrity:

  • Every record in every table must be unique.
  • For example, in an employee table, two employees cannot have the same employee ID.
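
In SQL, these integrity rules are enforced with constraints. A minimal sketch (table and column names are illustrative, not from the article):

CREATE TABLE customers (
    customer_id INT PRIMARY KEY,                    -- entity integrity: unique, non-null key
    age         INT CHECK (age BETWEEN 0 AND 150)   -- domain integrity: valid numbers only
);

CREATE TABLE sales (
    sale_id     INT PRIMARY KEY,                    -- each sale record is unique
    customer_id INT NOT NULL
        REFERENCES customers(customer_id)           -- referential integrity
);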

Benefits of data integrity:

• Improves data accuracy and reliability.
• Prevents problems caused by incorrect information.
• Improves database performance.
• Increases data security.

Data integrity is one of the most important aspects of database management systems (DBMS). It ensures the correct functioning of the database and helps avoid data loss and problems caused by incorrect information.

03. What is the Relational Database Model? (RDBMS)

By: Vijayan S
20 November 2024 at 12:03

1. What is an RDBMS?

The Relational Database Model (RDBMS) is a method of storing and managing data by relating pieces of data to one another. It stores data in the form of tables and links them through relationships. This helps manage data efficiently and flexibly.

Tables: the basic unit in which data is stored.
Rows: each data record is a row.
Columns: each data attribute is a column.
Primary Key: a column (or set of columns) that uniquely identifies each row.
Foreign Key: a column (or set of columns) that refers to the primary key of another table.

Relationships

• One-to-One: each row in one table is related to exactly one row in another table.
• One-to-Many: each row in one table is related to many rows in another table.
• Many-to-Many: many rows in one table are related to many rows in another table.

• Normalization: a process used to store data efficiently and protect data consistency.

• Indexing: a data structure that helps search data quickly.

• Views: a logical construct that shows a part of the database from a particular angle.

• Stored Procedures: a facility to store frequently used SQL commands in one place and reuse them.

• Triggers: events that run automatically in response to changes in the database.

Example

Let's build a school's database:

• Students Table: StudentID (Primary Key), StudentName, Age, Class
• Courses Table: CourseID (Primary Key), CourseName, TeacherName
• StudentCourses Table: StudentID (Foreign Key), CourseID (Foreign Key)

In this database, one student can join many courses, and one course can have many students. This is an example of a Many-to-Many relationship.
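
A minimal sketch of this schema in SQL (the article names only the columns; the types are assumptions):

CREATE TABLE Students (
    StudentID   INT PRIMARY KEY,
    StudentName VARCHAR(100),
    Age         INT,
    Class       VARCHAR(20)
);

CREATE TABLE Courses (
    CourseID    INT PRIMARY KEY,
    CourseName  VARCHAR(100),
    TeacherName VARCHAR(100)
);

-- The junction table realizes the Many-to-Many relationship.
CREATE TABLE StudentCourses (
    StudentID INT REFERENCES Students(StudentID),
    CourseID  INT REFERENCES Courses(CourseID),
    PRIMARY KEY (StudentID, CourseID)
);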

02. What is a DBMS?

By: Vijayan S
20 November 2024 at 10:08

1. What is a DBMS?

A Database Management System (DBMS) is software used to manage, store, retrieve, and modify a database. It acts as an intermediary between the database and its users, helping to make data easy to access and keeping it secure.

2. Components of a DBMS

A DBMS has the following major components:

a. Database Engine

• This is the core of the DBMS.
• Query Processing: executes the queries users submit.
• Transaction Management: follows the ACID properties (Atomicity, Consistency, Isolation, Durability).

Atomicity is a key aspect of the ACID properties (Atomicity, Consistency, Isolation, Durability). It treats a transaction as a single, indivisible operation. Atomicity guarantees the following two points:

1. A transaction must complete in full, or it must be as if it never happened at all.

2. If any failure occurs while the transaction is running (for example, a system crash, a network issue, or an invalid operation), the transaction is rolled back completely and the database returns to its previous state.

Atomicity example
Imagine a bank transferring $500 from Account A to Account B:

$500 must be debited from Account A.
$500 must be credited to Account B.
For atomicity to hold:

Both of these steps (debit and credit) must complete successfully. If either one fails, no change should occur in the database.
If a system crash occurs (for example, after $500 is debited from Account A but before it is credited to Account B), a rollback must return Account A to its previous state.
Without atomicity, situations like this can arise:

$500 is deducted from Account A but never added to Account B.
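
Expressed in SQL, the transfer is wrapped in a transaction so that both updates succeed together or not at all; a minimal sketch (the accounts table is an assumed example):

BEGIN;

UPDATE accounts SET balance = balance - 500 WHERE account_id = 'A';  -- debit
UPDATE accounts SET balance = balance + 500 WHERE account_id = 'B';  -- credit

-- COMMIT makes both changes permanent together; if anything fails before
-- this point, ROLLBACK (or a crash) undoes the debit as well.
COMMIT;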


b. Database Schema

• Defines how the data is structured (for example: tables, fields, relationships).

c. Data Definition Language (DDL)

• The language that defines the database schema.
• Example:
CREATE TABLE Customers (ID INT, Name VARCHAR(50), Age INT);


d. Data Manipulation Language (DML)

• Used to Insert, Update, Delete, and Select data, as shown below:
  • INSERT: add new information.
  • UPDATE: modify existing information.
  • DELETE: remove unwanted information.
  • SELECT: search for and retrieve data.
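
One statement of each kind, run against the Customers table defined above (the values are illustrative):

INSERT INTO Customers (ID, Name, Age) VALUES (1, 'Kumar', 30);  -- add a row
UPDATE Customers SET Age = 31 WHERE ID = 1;                     -- modify a row
SELECT Name, Age FROM Customers WHERE Age > 25;                 -- retrieve rows
DELETE FROM Customers WHERE ID = 1;                             -- remove a row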

e. Metadata

• Information about the data (e.g., structure, constraints).

f. Database Users

• End-users: those who use the database through a GUI or an application.
• DBA (Database Administrator): those who maintain security and improve performance.
• Developers: those who build applications on top of the database.

g. Query Processor

• Converts user queries into commands the database understands.

h. Transaction Management

• Manages transactions safely:
  • Atomicity: the whole transaction or nothing.
  • Consistency: ensures the data stays correct.
  • Isolation: one user's transaction must not affect another's.
  • Durability: data is saved for certain.

3. DBMS Architecture

A DBMS can be set up in the architectures shown below:

a. 1-Tier Architecture

• The DBMS and the database reside on the same machine. Suited to single-user applications.

b. 2-Tier Architecture

• The client operates directly on top of the DBMS server. Used in small and medium setups.

c. 3-Tier Architecture

Made up of three layers:

1. Presentation Layer: GUI or web interfaces.
2. Application Layer: business logic (runs on the server).
3. Database Layer: the database and the DBMS.

4. Types of DBMS

a. Relational DBMS (RDBMS)

• Organizes data in tables as rows and columns.
• Operated through SQL.
• Examples: MySQL, PostgreSQL, Oracle DB.
• Advantages: simple querying, precise structure, data integrity.


b. Hierarchical DBMS

• Organizes data as a tree-like structure.
• Example: IBM's IMS.
• Usage: file systems.


c. Network DBMS

• Organizes data as a graph and supports multiple parent-child relationships.
• Example: Integrated Data Store (IDS).


d. Object-Oriented DBMS

• Stores data in the form of objects.
• Examples: db4o, ObjectDB.

e. NoSQL DBMS

• Has a flexible schema; suited to large-scale and unstructured data.
• Types:
  • Document-based (MongoDB)
  • Key-Value Stores (Redis)
  • Column Stores (Cassandra)
  • Graph Databases (Neo4j).

5. Key Features of a DBMS

• Data Independence: changes to the data structure do not affect applications.
• Data Security: provides authentication, access control, and so on.
• Multi-user Support: supports simultaneous access by many users.
• Backup and Recovery: helps protect the data.
• Data Consistency: the accuracy of the data is protected through constraints such as primary keys and foreign keys.

6. Advantages of a DBMS

1. Reduces Redundancy: lowers data duplication.
2. Ensures Data Integrity: the data stays accurate.
3. Improves Accessibility: data can be searched easily.
4. Enhances Collaboration: many people can access the data at the same time.
5. Automates Backup: prevents data loss.
6. Scalability: works efficiently even as the data grows.

7. Applications of a DBMS

a. Banking Systems

• Manages accounts, transactions, and user data.
• Ensures data security.

b. E-Commerce

• Manages inventory, orders, and customer profiles.
• Example: online platforms such as Amazon and Flipkart.

c. Healthcare

• Stores patient records, appointments, and prescriptions.
• Protects data confidentiality.

d. Education

• Manages student records, courses, and grades.

e. Telecommunications

• Call records, billing, customer management.

8. Challenges of Using a DBMS

• Cost: installation and maintenance can be expensive.
• Complexity: skilled administrators are required.
• Performance: can sometimes be slower than file systems.
• Scalability Issues: some older DBMSs cannot handle very large volumes of data.

01. How did the database come about, and what is its need?

By: Vijayan S
18 November 2024 at 16:15

How did the database come about?
Databases are technologies that emerged in the 1960s-1970s. In the beginning, it was common to copy and store information on paper or by electronic means. But storing large amounts of information, and accessing and managing it easily, was difficult. This is why databases evolved.

The evolution of databases:

1. Data storage using traditional methods:
  Initially, data was stored on paper or in structures such as tables. This led to data problems and difficulties in retrieving data.

2. Hierarchical and Network Models (1960s-1970s):
  These introduced structured collections of data. Although they were used for some time and had their successes, they ultimately made data hard to access easily.

3. Relational Database Model (1970s):
  The computer scientist Edgar F. Codd introduced the relational database model in 1970. It made it easy to define relationships between pieces of data and to access data using a language called SQL (Structured Query Language). In this model, data is stored in tables, and relationships can be created between them.

4. Modern Databases (after the 1990s):
  In the 1990s, the major databases (MySQL, PostgreSQL, Oracle) appeared, and new kinds of databases such as NoSQL (MongoDB, Cassandra) were born; databases evolved into new forms to match the growing volume of data and its needs.

The need for databases
The need for databases is enormous, because they help store, access, maintain, protect, and analyze data easily. Today databases are used across many fields, such as technology, business, education, politics, and medicine, where they play a major role. The main needs for databases are explained in detail below:

1. Data storage and organization

Data storage: databases make it possible to organize and store many crores of records in one place. This is especially important for large organizations, websites, and business systems.

Organization: because data is organized in databases as tables or structures, it is easy to handle.

2. Data access and retrieval

Fast access: data stored in databases can be accessed quickly and easily. Commerce platforms, bank accounts, and internet services all have high demands for data access.

Large-scale retrieval: databases help retrieve data in large volumes, quickly.

3. Data Consistency

Using one version of the data: the ability to coordinate updates to the same data spread across many places is important for databases.

Uniform data maintenance: databases perform data normalization, so data errors and redundancies are avoided.

4. Data security and permissions

User security: in databases, information is protected through backups and encryption. Therefore, users can access the data only with permission to log in.

User permissions: when managing permissions for users, specific data is available only to specific users.

5. Backup and Recovery

Data errors and losses: when a computer fails or malfunctions, databases can recover data through their backup and recovery mechanisms.

Operational continuity: if data losses occur in other ways, the database can recover them automatically, so organizations and applications keep running.

6. Concurrency Control

Access by many users: when many users access data at the same time, the database handles it through concurrency-control techniques, giving each user individual permissions and the correct data sets.

Coordination between users: in some situations, if several users change the same data at the same time, a conflict can arise. Databases have the ability to resolve this conflict.

7. Data analysis and reports

Analysis of usage data: databases make it easy to analyze the data they contain for business or research purposes. Using languages such as SQL, users can pose a variety of queries and analyze the data.

Reports and listings: businesses and organizations can use databases to produce monthly reports and listings based on many kinds of data.

8. Handling large volumes of data (Scalability)

Large volumes of data: databases can usually handle very large amounts of data with ease. This helps maintain data easily, especially for websites, very large organizations, or e-commerce platforms with many users.

Scaling in quality and speed: databases provide the ability (scalability) to handle high volumes of data with ease.

9. Collections and Relationships

Data relationships: relationships are created between the details held in databases. This helps join several data sets (tables) for integration and analysis.

Large data sets: data with many kinds of relationships, such as One-to-Many and Many-to-Many, can be modeled.

10. Cloud Databases
Access from anywhere: today, using cloud databases, data can be accessed easily from anywhere. They offer high availability and security.

Some databases:

      Relational Databases (RDBMS): MySQL, PostgreSQL, Oracle, SQL Server.

      NoSQL Databases: MongoDB, Cassandra, Firebase, CouchDB.

      Cloud Databases: Amazon RDS, Google Cloud SQL, Microsoft Azure SQL.

      In-memory Databases: Redis, Memcached.

      PostgreSQL Tasks:

      By: Sakthivel
      4 September 2024 at 18:03

I created a table named “movies”.
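
The tasks below use the columns movie, actor, and director (salary_in_crore is added in Task 11). A hypothetical schema reconstructed from the queries:

CREATE TABLE movies (
    movie    VARCHAR(100),
    actor    VARCHAR(100),
    director VARCHAR(100)
);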

TASK 1 : Show details for movies in which Vijay acted.

SELECT * FROM movies WHERE actor='vijay';


TASK 2 : Show Vijay's movie names only.

SELECT movie FROM movies WHERE actor='vijay';


TASK 3 : Show details where lokesh kanagaraj directed the movie and the actor's name starts with 'k' and ends with a vowel [a,e,i,o,u].

SELECT * FROM movies WHERE director='lokesh kanagaraj' AND actor LIKE 'k%' AND actor SIMILAR TO '%[aeiou]';


TASK 4 : Show the movie name only, where lokesh kanagaraj directed the movie and the actor's name starts with 'k' and ends with a vowel [a,e,i,o,u].

SELECT movie FROM movies WHERE director='lokesh kanagaraj' AND actor LIKE 'k%' AND actor SIMILAR TO '%[aeiou]';


TASK 5 : Show the movie name and actor name only, where lokesh kanagaraj directed the movie and the actor's name starts with 'k' and ends with a vowel [a,e,i,o,u].

SELECT movie,actor FROM movies WHERE director='lokesh kanagaraj' AND actor LIKE 'k%' AND actor SIMILAR TO '%[aeiou]';


TASK 6 : Show the details where the director's name starts and ends with the same letter.

SELECT * FROM movies WHERE LEFT(director, 1)=RIGHT(director, 1);


TASK 7 : Show the movie name only, where the director's name starts and ends with the same letter.

SELECT movie FROM movies WHERE LEFT(director, 1)=RIGHT(director, 1);


TASK 8 : Show the director name only, where the director's name starts and ends with the same letter.

SELECT director FROM movies WHERE LEFT(director, 1)=RIGHT(director, 1);


TASK 9 : Show the movie name where the actor's name has exactly five characters.

SELECT movie FROM movies WHERE LENGTH(actor)=5;


TASK 10 : Show the movie name and actor name where the actor's name has exactly five characters.

SELECT movie,actor FROM movies WHERE LENGTH(actor)=5;

      TASK 11 : Add a column salary_in_crore and values.

      ALTER TABLE movies ADD COLUMN salary_in_crore INT;

UPDATE movies SET salary_in_crore =100 WHERE actor = 'vijay';

UPDATE movies SET salary_in_crore =70 WHERE actor = 'kamal';

UPDATE movies SET salary_in_crore =90 WHERE actor = 'ajith';

UPDATE movies SET salary_in_crore =40 WHERE actor = 'karthi';

UPDATE movies SET salary_in_crore =110 WHERE actor = 'rajini';

UPDATE movies SET salary_in_crore =50 WHERE actor = 'dhanush';

UPDATE movies SET salary_in_crore =5 WHERE actor = 'soori';

      SELECT * FROM movies;


TASK 12 : Show actor name and salary, ordered by salary from high to low.

      SELECT actor, salary_in_crore FROM movies ORDER BY salary_in_crore DESC;


TASK 13 : Show the top 3 actors based on salary.

      SELECT DISTINCT actor, salary_in_crore FROM movies ORDER BY salary_in_crore DESC LIMIT 3;


To import a .CSV/Excel file into a PostgreSQL table:

      By: Sakthivel
      4 September 2024 at 15:11

      Insert values From a .csv file

      Step 1: Create a .csv or .xlsx file

      Step 2: Insert values from a .csv or .xlsx file

Step 3: Copy the file and paste it into the location where PostgreSQL is installed:

-> “/var/lib/postgresql”

      Step 4: Go to the terminal and connect the database.

      Step 5: Create table in the database with the columns given in the file header.

Step 6: Enter the following query in psql


-> COPY table_name FROM '/var/lib/postgresql/file_name.csv' WITH CSV HEADER;


      Here, “movies” is the table name and the location where the “movies.csv” file is present.

      9 values copied

      Step 7: Then, write a select query to fetch all the rows.

The data has been successfully imported into the PostgreSQL database from the CSV or Excel file.
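
If the file cannot be placed under the server's data directory, psql's client-side \copy meta-command is an alternative: it takes the same options but reads the file from the machine where psql runs, so no server file permissions are needed:

\copy movies FROM 'movies.csv' WITH CSV HEADER;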


      Different Database Models

      23 August 2024 at 01:50

      Database models define the structure, relationships, and operations that can be performed on a database. Different database models are used based on the specific needs of an application or organization. Here are the most common types of database models:

      1. Hierarchical Database Model

      • Structure: Data is organized in a tree-like structure with a single root, where each record has a single parent but can have multiple children.
      • Usage: Best for applications with a clear hierarchical relationship, like organizational structures or file systems.
      • Example: IBM’s Information Management System (IMS).
      • Advantages: Fast access to data through parent-child relationships.
      • Disadvantages: Rigid structure; difficult to reorganize or restructure.

      2. Network Database Model

      • Structure: Data is organized in a graph structure, where each record can have multiple parent and child records, forming a network of relationships.
      • Usage: Useful for complex relationships, such as in telecommunications or transportation networks.
      • Example: Integrated Data Store (IDS).
      • Advantages: Flexible representation of complex relationships.
      • Disadvantages: Complex design and navigation; can be difficult to maintain.

      3. Relational Database Model

      • Structure: Data is organized into tables (relations) where each table consists of rows (records) and columns (fields). Relationships between tables are managed through keys.
      • Usage: Widely used in various applications, including finance, retail, and enterprise software.
      • Example: MySQL, PostgreSQL, Oracle Database, Microsoft SQL Server.
      • Advantages: Simplicity, data integrity, flexibility in querying through SQL.
      • Disadvantages: Can be slower for very large datasets or highly complex queries.

      4. Object-Oriented Database Model

      • Structure: Data is stored as objects, similar to objects in object-oriented programming. Each object contains both data and methods for processing the data.
      • Usage: Suitable for applications that require the modeling of complex data and relationships, such as CAD, CAM, and multimedia databases.
      • Example: db4o, ObjectDB.
      • Advantages: Seamless integration with object-oriented programming languages, reusability of objects.
      • Disadvantages: Complexity, not as widely adopted as relational databases.

      5. Document-Oriented Database Model

      • Structure: Data is stored in document collections, with each document being a self-contained piece of data often in JSON, BSON, or XML format.
      • Usage: Ideal for content management systems, real-time analytics, and big data applications.
      • Example: MongoDB, CouchDB.
      • Advantages: Flexible schema design, scalability, ease of storing hierarchical data.
      • Disadvantages: May require denormalization, leading to potential data redundancy.

      6. Key-Value Database Model

      • Structure: Data is stored as key-value pairs, where each key is unique, and the value can be a string, number, or more complex data structure.
      • Usage: Best for applications requiring fast access to simple data, such as caching, session management, and real-time analytics.
      • Example: Redis, DynamoDB, Riak.
      • Advantages: High performance, simplicity, scalability.
      • Disadvantages: Limited querying capabilities, lack of complex relationships.

      7. Column-Family Database Model

      • Structure: Data is stored in columns rather than rows, with each column family containing a set of columns that are logically related.
      • Usage: Suitable for distributed databases, handling large volumes of data across multiple servers.
      • Example: Apache Cassandra, HBase.
      • Advantages: High write and read performance, efficient storage of sparse data.
      • Disadvantages: Complexity in design and maintenance, not as flexible for ad-hoc queries.

      8. Graph Database Model

      • Structure: Data is stored as nodes (entities) and edges (relationships) forming a graph. Each node represents an object, and edges represent the relationships between objects.
      • Usage: Ideal for social networks, recommendation engines, fraud detection, and any scenario where relationships between entities are crucial.
      • Example: Neo4j, Amazon Neptune.
      • Advantages: Efficient traversal and querying of complex relationships, flexible schema.
      • Disadvantages: Not as efficient for operations on large sets of unrelated data.

      9. Multimodel Database

      • Structure: Supports multiple data models (e.g., relational, document, graph) within a single database engine.
      • Usage: Useful for applications that require different types of data storage and querying mechanisms.
      • Example: ArangoDB, Microsoft Azure Cosmos DB.
      • Advantages: Flexibility, ability to handle diverse data requirements within a single system.
      • Disadvantages: Complexity in management and optimization.

      10. Time-Series Database Model

      • Structure: Specifically designed to handle time-series data, where each record is associated with a timestamp.
      • Usage: Best for applications like monitoring, logging, and real-time analytics where data changes over time.
      • Example: InfluxDB, TimescaleDB.
      • Advantages: Optimized for handling and querying large volumes of time-stamped data.
      • Disadvantages: Limited use cases outside of time-series data.

      11. NoSQL Database Model

      • Structure: An umbrella term for various non-relational database models, including key-value, document, column-family, and graph databases.
      • Usage: Ideal for handling unstructured or semi-structured data, and scenarios requiring high scalability and flexibility.
      • Example: MongoDB, Cassandra, Couchbase, Neo4j.
      • Advantages: Flexibility, scalability, high performance for specific use cases.
      • Disadvantages: Lack of standardization, potential data consistency challenges.

      Summary

      Each database model serves different purposes, and the choice of model depends on the specific requirements of the application, such as data structure, relationships, performance needs, and scalability. While relational databases are still the most widely used, NoSQL and specialized databases have become increasingly important for handling diverse data types and large-scale applications.

What is a Relational Database and PostgreSQL?

      21 August 2024 at 01:56

In the city of Data, the citizens relied heavily on organizing their information. The city was home to many different types of data: numbers, names, addresses, and even some exotic types like images and documents. But as the city grew, so did the complexity of managing all this information.

      One day, the city’s leaders called a meeting to discuss how best to handle the growing data. They were split between two different systems

      1. the old and trusted Relational Database Management System (RDBMS)
      2. the new, flashy NoSQL databases.

      Enter Relational Databases:

      Relational databases were like the city’s libraries. They had rows of neatly organized shelves (tables) where every book (data entry) was placed according to a specific category (columns).

      Each book had a unique ID (primary key) so that anyone could find it quickly. These libraries had been around for decades, and everyone knew how to use them.

      The RDBMS was more than just a library. It enforced rules (constraints) to ensure that no book went missing, was duplicated, or misplaced. It even allowed librarians (queries) to connect different books using relationships (joins).

      If you wanted to find all the books by a particular author that were published in the last five years, the RDBMS could do it in a heartbeat.

      The Benefits of RDBMS:

      The citizens loved the RDBMS because it was:

      1. Organized: Everything was in its place, and data was easy to find.
      2. Reliable: The rules ensured data integrity, so they didn’t have to worry about inconsistencies.
      3. Powerful: It could handle complex queries, making it easy to get insights from their data.
      4. Secure: Access to the data could be controlled, keeping it safe from unauthorized users.

      The Rise of NoSQL:

      But then came the NoSQL databases, which were more like vast, sprawling warehouses. These warehouses didn’t care much about organization; they just stored everything in a big open space. You could toss in anything, and it would accept it—no need for strict categories or relationships. This flexibility appealed to the tech-savvy citizens who wanted to store newer, more diverse types of data like social media posts, images, and videos.

      NoSQL warehouses were fast. They could handle enormous amounts of data without breaking a sweat and were perfect for real-time applications like chat systems and analytics.

      The PostgreSQL Advantage:

      PostgreSQL was a superstar in the world of RDBMS. It combined the organization and reliability of traditional relational databases with some of the flexibility of NoSQL. It allowed citizens to store structured data in tables while also offering support for unstructured data types like JSON. This made PostgreSQL a versatile choice, bridging the gap between the old and new worlds.
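
For instance (a small sketch, not from the original story), a table can hold structured columns and a JSONB document side by side:

CREATE TABLE citizens (
    id      SERIAL PRIMARY KEY,
    name    VARCHAR(100),
    profile JSONB  -- unstructured data alongside structured columns
);

INSERT INTO citizens (name, profile)
VALUES ('Asha', '{"hobbies": ["reading", "chess"], "city": "Data"}');

-- Query inside the JSON document with the ->> operator.
SELECT name FROM citizens WHERE profile ->> 'city' = 'Data';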

      For installing postgres : https://www.postgresql.org/download/

      The Dilemma: PostgreSQL vs. NoSQL:

      The city faced a dilemma. Should they stick with PostgreSQL, which offered the best of both worlds, or fully embrace NoSQL for its speed and flexibility? The answer wasn’t simple. It depended on what the city valued more: the structured, reliable nature of PostgreSQL or the unstructured, flexible approach of NoSQL.

      For applications that required strict data integrity and complex queries, PostgreSQL was the way to go. But for projects that needed to handle massive amounts of unstructured data quickly, NoSQL was the better choice.

      Conclusion:

      In the end, the city of Data realized that there was no one-size-fits-all solution. They decided to use PostgreSQL for applications where data relationships and integrity were crucial, and NoSQL for those that required speed and flexibility with diverse data types.

      And so, the citizens of Data lived happily, managing their information with the right tools for the right tasks, knowing that both systems had their place in the ever-growing city.

      Setting up your own High Availability managed WordPress hosting using Amazon RDS

      1 January 2023 at 05:18

      Hosting your own WordPress website is interesting right!! Ok, come on let’s do it!!

      We are going to do this practical from Scratch. From the Creation of our Own VPC, Subnets, Internet Gateway, Route tables to Deployment of WordPress.

      Here, we are going to use Amazon Web Service’s RDS service for hosting our own WordPress site. Before that, let’s take a look at a basic introduction to RDS service.

      Amazon Relational Database Service is a distributed relational database service by Amazon Web Services (AWS). It is a web service running in the cloud designed to simplify the setup, operation, and scaling of a relational database for use in applications. Administration processes like patching the database software, backing up databases and enabling point-in-time recovery are managed automatically.

      Features of AWS RDS

      • Lower administrative burden. Easy to use
      • Performance. General Purpose (SSD) Storage
      • Scalability. Push-button compute scaling
      • Availability and durability. Automated backups
      • Security. Encryption at rest and in transit
      • Manageability. Monitoring and metrics
      • Cost-effectiveness. Pay only for what you use

      Ok, let’s jump onto the practical part!!

We will do this practical from scratch. Since it will be big, we divided it into 5 small parts, namely

      • Creating a MySQL database with RDS
      • Creating an EC2 instance
      • Configuring your RDS database
      • Configuring WordPress on EC2
      • Deployment of WordPress website

      Creating a MySQL database with RDS

Before that, we have two pieces of prep work: creating a Virtual Private Cloud (VPC) with subnets, and creating security groups. These are important because, for a reliable connection between WordPress and the MySQL database, both should be located in the same VPC and should have the same security group.

Instances are launched only in subnets; moreover, RDS launches your MySQL database on an EC2 instance of its own, one we cannot see since it is fully managed by AWS.

      VPC Dashboard

      We are going to create our own VPC. For that, we have to specify IP range and CIDR. We specified IP and CIDR as 192.168.0.0/16.

What is CIDR? I explained this in my previous blog in great detail. You can refer here.

      Lets come to the point. After specifying the IP range and CIDR, enter your VPC name.

      Now, VPC is successfully created with our specified details.

      Next step is to launch the subnet in the above VPC.

      Subnet Dashboard

For creating subnets, you have to specify in which VPC the subnet should be launched. We already have our own VPC named “myvpc123”.

And then we have to specify the subnet's IP range and CIDR. Please note that the subnet's range should fall within the VPC's range; it must not exceed it.

To achieve high availability, we have to launch a minimum of two subnets, so that Amazon RDS will launch its database across two subnets; if one subnet fails, it won't cause any trouble.

      Now, two Subnets with their specified range of IPs and CIDR are launched successfully inside our own VPC and they are available.

      Next step is to create a security group in order to secure the WordPress and MySQL databases. Note that both should have the same Security Group or else it won’t connect.

      For creating a Security Group, we have to specify which VPC it should be launched and adding a Description is mandatory.

      Then we have to specify inbound rules, for making this practical simple, we are allowing all traffic to access our instance.

      Now, the Security Group is successfully created with our specified details.

      Now let’s jump into part 1 which is about Creating a MySQL database with RDS.

      RDS dashboard

      Select Create database, then select Standard create and specify the database type.

Then you have to specify the Version. The version plays a major role when integrating MySQL with WordPress, so select a compatible version or it will cause serious trouble at the end. Then select the template; here we are using Free tier since it won't be chargeable.

      Then you have to specify the credentials such as Database Instance name, Master username and Master password.

      Most important part is a selection of VPC, you should select the same VPC where you will launch your EC2 instance for your WordPress and we can’t modify the VPC once the database is created. Then select the Public access as No for providing more security to our database. Now, the people outside of your VPC can’t connect to your database.

      Then you have to specify the security group for your database. Note that the Security Group for your database and WordPress should be the same or else it will cause serious trouble.

      Note that Security Groups is created per VPC. After selecting Security Group, then click Ok to create the RDS database.

      Creating an EC2 instance

Before creating an instance, there are two things you should configure, namely an Internet Gateway and route tables. They are used for providing outside internet connectivity to an instance launched in a subnet.

      Internet Gateway Dashboard

      Internet Gateway is created per VPC. First, we have to create one new Internet Gateway with the specified details.

      Then you have to attach Internet Gateway to the VPC

      Next step is to create Routing tables. Note that Route table is created per Subnet.

      We have to specify which VPC in which your subnet is available to attach routing table with it, specify Name and click create to create the route table.

      Then click Edit route to edit the route details namely destination and target. Enter destination as 0.0.0.0/0 for accessing any IP anywhere on the Internet and target is your Internet Gateway.

      After entering the details, click Save routes.

      We created a Route table, then we have to attach that table to your Subnet. For that click Edit route table association and select your subnet where you want to attach the route table with it.

      Now, lets jump into the task of creating an EC2 instance.

      First, you have to choose the AMI image in which you used for creating an EC2 instance, here I selected Amazon Linux 2 AMI for that.

      Then you have to select Instance type, here I selected t2.micro since it comes under free tier.

      Then you have to specify the VPC, Subnet for your instance and you have to enable Auto-assign Public IP in order to get your Public IP to your instance.

      Then you have to add storage for your instance. It is optional only.

      Then you have to specify the tags which will be more useful especially for automation.

      Then you have to select the Security Group for your instance. It should be the same as your database have.

And click Review and Launch. Then you have to add a key pair to launch your EC2 instance. If you don't have a key pair, you can create one at that time.

      Configuring your RDS database

      At this point, you have created an RDS database and an EC2 instance. Now, we will configure the RDS database to allow access to specific entities.

      You have to run the below command in your EC2 instance in order to establish the connection with your database.

      export MYSQL_HOST=<your-endpoint>

      You can find your endpoint by clicking database in the RDS dashboard. Then you have to run the following command.

      mysql --user=<user> --password=<password> dbname

      This output shows the database is successfully connected to an EC2 instance.

In the MySQL command terminal, you have to run the following commands in order to grant all privileges to your account.

CREATE USER 'vishnu' IDENTIFIED BY 'vishnupassword';
GRANT ALL PRIVILEGES ON dbname.* TO vishnu;
FLUSH PRIVILEGES;
exit

      Configuring WordPress on EC2

      For Configuring WordPress on EC2 instance, the first step is to configure the webserver, here I am using Apache webserver. For that, you have to run the following commands.

      sudo yum install -y httpd
      sudo service httpd start

      Next step would be download the WordPress application from the internet by using wget command. Run the following code to download the WordPress application.

      wget https://wordpress.org/latest.tar.gz
      tar -xzf latest.tar.gz

      Then we have to do some configuration, for this follow the below steps.

cd wordpress
cp wp-config-sample.php wp-config.php
nano wp-config.php

Open wp-config.php in the editor (nano here; any editor works) and enter your credentials (including your password too).

Then, go to this link, copy it all, and paste it to replace the existing lines of code.

      Next step is to deploy the WordPress application. For that, you have to run the following commands in order to solve the dependencies and deploy WordPress in the webserver.

      sudo amazon-linux-extras install -y lamp-mariadb10.2-php7.2 php7.2
      sudo cp -r wordpress/* /var/www/html/
      sudo service httpd restart

      That’s it. You have a live, publicly-accessible WordPress installation using a fully-managed MySQL database on Amazon RDS.

      Then if you enter your WordPress instance IP in your browser, you will land your WordPress home page.

      After you filled in your credentials, you will get your own homepage.

      That’s it. You launched your own application in your own instance and your database is managed by AWS RDS service.


      Thank you all for your reads. Stay tuned for my next article.
