
🎯 PostgreSQL Zero to Hero with Parottasalna – 2 Day Bootcamp (FREE!) 🚀

2 March 2025 at 07:09

Databases power the backbone of modern applications, and PostgreSQL is one of the most powerful open-source relational databases trusted by top companies worldwide. Whether you’re a beginner or a developer looking to sharpen your database skills, this FREE bootcamp will take you from Zero to Hero in PostgreSQL!

What You’ll Learn?

✅ PostgreSQL fundamentals & installation
✅ Postgres architecture
✅ Writing optimized queries
✅ Indexing & performance tuning
✅ Transactions & locking mechanisms
✅ Advanced joins, CTEs & subqueries
✅ Real-world best practices & hands-on exercises

This intensive, hands-on bootcamp is designed for developers, DBAs, and tech enthusiasts who want to master PostgreSQL from scratch and apply it in real-world scenarios.

Who Should Attend?

🔹 Beginners eager to learn databases
🔹 Developers & Engineers working with PostgreSQL
🔹 Anyone looking to optimize their SQL skills

📅 Date: March 22, 23 -> (Moved to April 5, 6)
⏰ Time: Will be finalized later.
📍 Location: Online
💰 Cost: 100% FREE 🎉

🔗 RSVP Here

Prerequisite

  1. Check out this playlist of our previous Postgres sessions: https://www.youtube.com/playlist?list=PLiutOxBS1Miy3PPwxuvlGRpmNo724mAlt

🎉 This bootcamp is completely FREE – Learn without any cost! 🎉

💡 Spots are limited – RSVP now to reserve your seat!

Learning Notes #51 – Postgres as a Queue using SKIP LOCKED

11 January 2025 at 06:56

Yesterday, I came across a blog from inferable.ai (https://www.inferable.ai/blog/posts/postgres-skip-locked) that walks through using Postgres as a queue. In this post, I jot down notes on using Postgres as a queue for future reference.

PostgreSQL is a robust relational database that can be used for more than just storing structured data. With the SKIP LOCKED feature introduced in PostgreSQL 9.5, you can efficiently turn a PostgreSQL table into a job queue for distributed processing.

Why Use PostgreSQL as a Queue?

Using PostgreSQL as a queue can be advantageous because,

  • Familiarity: If you’re already using PostgreSQL, there’s no need for an additional message broker.
  • Durability: PostgreSQL ensures ACID compliance, offering reliability for your job processing.
  • Simplicity: No need to manage another component like RabbitMQ or Kafka

Implementing a Queue with SKIP LOCKED

1. Create a Queue Table

To start, you need a table to store the jobs,


CREATE TABLE job_queue (
    id SERIAL PRIMARY KEY,
    job_data JSONB NOT NULL,
    status TEXT DEFAULT 'pending',
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

This table has the following columns,

  • id: A unique identifier for each job.
  • job_data: The data or payload for the job.
  • status: Tracks the job’s state ('pending', 'in_progress', or 'completed').
  • created_at: Timestamp of job creation.

2. Insert Jobs into the Queue

Adding jobs is straightforward,


INSERT INTO job_queue (job_data)
VALUES ('{"task": "send_email", "email": "user@example.com"}');

3. Fetch Jobs for Processing with SKIP LOCKED

Workers will fetch jobs from the queue using SELECT ... FOR UPDATE SKIP LOCKED to avoid contention,

WITH next_job AS (
    SELECT id, job_data
    FROM job_queue
    WHERE status = 'pending'
    FOR UPDATE SKIP LOCKED
    LIMIT 1
)
UPDATE job_queue
SET status = 'in_progress'
FROM next_job
WHERE job_queue.id = next_job.id
RETURNING job_queue.id, job_queue.job_data;

Key Points:

  • FOR UPDATE locks the selected row to prevent other workers from picking it up.
  • SKIP LOCKED ensures locked rows are skipped, enabling concurrent workers to operate without waiting.
  • LIMIT 1 processes one job at a time per worker.
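
If jobs should be claimed roughly in arrival order, an ORDER BY can be added to the locking SELECT (a small variant of the query above; ordering is only approximate once many workers compete):

WITH next_job AS (
    SELECT id, job_data
    FROM job_queue
    WHERE status = 'pending'
    ORDER BY created_at
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
UPDATE job_queue
SET status = 'in_progress'
FROM next_job
WHERE job_queue.id = next_job.id
RETURNING job_queue.id, job_queue.job_data;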

4. Mark Jobs as Completed

Once a worker finishes processing a job, it should update the job’s status,


UPDATE job_queue
SET status = 'completed'
WHERE id = $1; -- Replace $1 with the job ID

5. Delete Old or Processed Jobs

To keep the table clean, you can periodically remove completed jobs,


DELETE FROM job_queue
WHERE status = 'completed' AND created_at < NOW() - INTERVAL '30 days';

Example Worker Implementation

Here’s an example of a worker implemented in Python using psycopg2


import time

import psycopg2
from psycopg2.extras import RealDictCursor

connection = psycopg2.connect("dbname=yourdb user=youruser")

while True:
    with connection.cursor(cursor_factory=RealDictCursor) as cursor:
        cursor.execute(
            """
            WITH next_job AS (
                SELECT id, job_data
                FROM job_queue
                WHERE status = 'pending'
                FOR UPDATE SKIP LOCKED
                LIMIT 1
            )
            UPDATE job_queue
            SET status = 'in_progress'
            FROM next_job
            WHERE job_queue.id = next_job.id
            RETURNING job_queue.id, job_queue.job_data;
            """
        )

        job = cursor.fetchone()
        if job:
            print(f"Processing job {job['id']}: {job['job_data']}")

            # Simulate job processing
            cursor.execute("UPDATE job_queue SET status = 'completed' WHERE id = %s", (job['id'],))

        else:
            print("No jobs available. Sleeping...")
            time.sleep(5)

    connection.commit()

Considerations

  1. Transaction Isolation: Use the REPEATABLE READ or SERIALIZABLE isolation level cautiously to avoid unnecessary locks.
  2. Row Locking: SKIP LOCKED only skips rows locked by other transactions, not those locked within the same transaction.
  3. Performance: Regularly archive or delete old jobs to prevent the table from growing indefinitely. Consider indexing the status column to improve query performance.
  4. Fault Tolerance: Ensure that workers handle crashes or timeouts gracefully. Use a timeout mechanism to revert jobs stuck in the 'in_progress' state (see the sketch after this list).
  5. Scaling: Distribute workers across multiple nodes to handle a higher job throughput.
  6. The SKIP LOCKED clause only applies to row-level locks – the required ROW SHARE table-level lock is still taken normally.
  7. Using SKIP LOCKED provides an inconsistent view of the data by design. This is why it’s perfect for queue-like tables where we want to distribute work, but not suitable for general purpose work where consistency is required.
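
As a sketch of points 3 and 4 above, a periodic maintenance task can requeue stuck jobs, and a partial index can keep the fetch query fast. The started_at column and the 15-minute threshold are assumptions for illustration, not part of the original schema:

-- Assumed extra column recording when a worker claimed the job
ALTER TABLE job_queue ADD COLUMN started_at TIMESTAMP;

-- Requeue jobs stuck in 'in_progress' longer than an arbitrary threshold
UPDATE job_queue
SET status = 'pending', started_at = NULL
WHERE status = 'in_progress'
  AND started_at < NOW() - INTERVAL '15 minutes';

-- Partial index so workers only scan rows that are still pending
CREATE INDEX idx_job_queue_pending ON job_queue (created_at)
WHERE status = 'pending';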

Learning Notes #41 – Shared Lock and Exclusive Locks | Postgres

6 January 2025 at 14:07

Today, I learnt about various locking mechanisms used to prevent double updates. In this post, I make notes on shared locks and exclusive locks for my future self.

What Are Locks in Databases?

Locks are mechanisms used by a DBMS to control access to data. They ensure that transactions are executed in a way that maintains the ACID (Atomicity, Consistency, Isolation, Durability) properties of the database. Locks can be classified into several types, including

  • Shared Locks (S Locks): Allow multiple transactions to read a resource simultaneously but prevent any transaction from writing to it.
  • Exclusive Locks (X Locks): Allow a single transaction to modify a resource, preventing both reading and writing by other transactions.
  • Intent Locks: Used to signal the type of lock a transaction intends to acquire at a lower level.
  • Deadlock Prevention Locks: Special locks aimed at preventing deadlock scenarios.

Shared Lock

A shared lock is used when a transaction needs to read a resource (e.g., a database row or table) without altering it. Multiple transactions can acquire a shared lock on the same resource simultaneously. However, as long as one or more shared locks exist on a resource, no transaction can acquire an exclusive lock on that resource.


-- Transaction A: Acquire a shared lock on a row
BEGIN;
SELECT * FROM employees WHERE id = 1 FOR SHARE;
-- Transaction B: Acquire a shared lock on the same row
BEGIN;
SELECT * FROM employees WHERE id = 1 FOR SHARE;
-- Both transactions can read the row concurrently
-- Transaction C: Attempt to update the same row
BEGIN;
UPDATE employees SET salary = salary + 1000 WHERE id = 1;
-- Transaction C will be blocked until Transactions A and B release their locks

Key Characteristics of Shared Locks

1. Concurrent Reads

  • Shared locks allow multiple transactions to read the same resource at the same time.
  • This is ideal for operations like SELECT queries that do not modify data.

2. Write Blocking

  • While a shared lock is active, no transaction can modify the locked resource.
  • Prevents dirty writes and ensures read consistency.

3. Compatibility

  • Shared locks are compatible with other shared locks but not with exclusive locks.

When Are Shared Locks Used?

Shared locks are typically employed in read operations under certain isolation levels. For instance,

1. Read Committed Isolation Level:

  • Shared locks are held for the duration of the read operation.
  • Prevents dirty reads by ensuring the data being read is not modified by other transactions during the read.

2. Repeatable Read Isolation Level:

  • Shared locks are held until the transaction completes.
  • Ensures that the data read during a transaction remains consistent and unmodified.

3. Snapshot Isolation:

  • Shared locks may not be explicitly used, as the DBMS creates a consistent snapshot of the data for the transaction.

    Exclusive Locks

    An exclusive lock is used when a transaction needs to modify a resource. Only one transaction can hold an exclusive lock on a resource at a time, ensuring no other transactions can read or write to the locked resource.

    
    -- Transaction X: Acquire an exclusive lock to update a row
    BEGIN;
    UPDATE employees SET salary = salary + 1000 WHERE id = 2;
    -- Transaction Y: Attempt to read the same row with a shared lock
    BEGIN;
    SELECT * FROM employees WHERE id = 2 FOR SHARE;
    -- Transaction Y will be blocked until Transaction X commits or rolls back
    -- (a plain SELECT without FOR SHARE would not block, thanks to MVCC)
    -- Transaction Z: Attempt to update the same row
    BEGIN;
    UPDATE employees SET salary = salary + 500 WHERE id = 2;
    -- Transaction Z will also be blocked until Transaction X completes
    

    Key Characteristics of Exclusive Locks

    1. Write Operations: Exclusive locks are essential for operations like INSERT, UPDATE, and DELETE.

    2. Blocking Reads and Writes: While an exclusive lock is active, no other transaction can read or write to the resource.

    3. Isolation: Ensures that changes made by one transaction are not visible to others until the transaction is complete.

      When Are Exclusive Locks Used?

      Exclusive locks are typically employed in write operations or any operation that modifies the database. For instance:

      1. Transactional Updates – A transaction that updates a row acquires an exclusive lock to ensure no other transaction can access or modify the row during the update.

      2. Table Modifications – When altering a table structure, the DBMS may place an exclusive lock on the entire table.
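
      To see which sessions hold or are waiting for locks at any given moment, the standard pg_locks and pg_stat_activity views can be queried (a minimal diagnostic sketch):

      SELECT l.pid,
             l.locktype,
             l.mode,
             l.granted,
             a.query
      FROM pg_locks l
      JOIN pg_stat_activity a ON a.pid = l.pid
      WHERE NOT l.granted;  -- sessions currently waiting on a lock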

      Benefits of Shared and Exclusive Locks

      Benefits of Shared Locks

      1. Consistency in Multi-User Environments – Ensure that data being read is not altered by other transactions, preserving consistency.
      2. Concurrency Support – Allow multiple transactions to read data simultaneously, improving system performance.
      3. Data Integrity – Prevent dirty reads and writes, ensuring that operations yield reliable results.

      Benefits of Exclusive Locks

      1. Data Integrity During Modifications – Prevents other transactions from accessing data being modified, ensuring changes are applied safely.
      2. Isolation of Transactions – Ensures that modifications by one transaction are not visible to others until committed.

      Limitations and Challenges

      Shared Locks

      1. Potential for Deadlocks – Deadlocks can occur if two transactions simultaneously hold shared locks and attempt to upgrade to exclusive locks.
      2. Blocking Writes – Shared locks can delay write operations, potentially impacting performance in write-heavy systems.
      3. Lock Escalation – In systems with high concurrency, shared locks may escalate to table-level locks, reducing granularity and concurrency.

      Exclusive Locks

      1. Reduced Concurrency – Exclusive locks prevent other transactions from accessing the locked resource, which can lead to bottlenecks in highly concurrent systems.
      2. Risk of Deadlocks – Deadlocks can occur if two transactions attempt to acquire exclusive locks on resources held by each other.

      Lock Compatibility
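
      For the row-level locks discussed above, compatibility can be summarised as follows (a brief recap of the rules described earlier):

      Held \ Requested | Shared (S) | Exclusive (X)
      Shared (S)       | Allowed    | Blocked
      Exclusive (X)    | Blocked    | Blocked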

      Learning Notes #28 – Unlogged Table in Postgres

      2 January 2025 at 17:30

      Today, as part of my daily reading, I came across https://raphaeldelio.com/2024/07/14/can-postgres-replace-redis-as-a-cache/, which discusses using Postgres as a cache and compares it with Redis. I was surprised by the title, so I gave it a read-through. That led me to the concept of the UNLOGGED table, which can act as a fast store for cache-like data. In this post, I jot down notes on unlogged tables for future reference.

      Highly Recommended Links: https://martinheinz.dev/blog/105, https://raphaeldelio.com/2024/07/14/can-postgres-replace-redis-as-a-cache/, https://www.crunchydata.com/blog/postgresl-unlogged-tables

      Unlogged tables offer unique benefits in scenarios where speed is paramount, and durability (the guarantee that data is written to disk and will survive crashes) is not critical.

      What Are Unlogged Tables?

      Postgres Architecture : https://miro.com/app/board/uXjVLD2T5os=/

      In PostgreSQL, a table is a basic unit of data storage. By default, PostgreSQL ensures that data in regular tables is durable. This means that all data is written to the disk and will survive server crashes. However, in some situations, durability is not necessary. Unlogged tables are special types of tables in PostgreSQL where the database does not write data changes to the WAL (Write-Ahead Log).

      The absence of WAL logging for unlogged tables makes them faster than regular tables because PostgreSQL doesn’t need to ensure data consistency across crashes for these tables. However, this also means that if the server crashes or the system is powered off, the data in unlogged tables is lost.

      Key Characteristics of Unlogged Tables

      1. No Write-Ahead Logging (WAL) – By default, PostgreSQL writes changes to the WAL to ensure data durability. For unlogged tables, this step is skipped, making operations like INSERTs, UPDATEs, and DELETEs faster.
      2. No Durability – The absence of WAL means that unlogged tables lose their data if the database crashes or is shut down uncleanly. This makes them unsuitable for critical data.
      3. Faster Performance – Since WAL writes are skipped, unlogged tables are faster for data insertion and modification. This can be beneficial for use cases where data is transient and doesn’t need to persist beyond the current session.
      4. Support for Indexes and Constraints – Unlogged tables can have indexes and constraints like regular tables. However, the data in these tables is still non-durable.
      5. Automatic Cleanup – After a crash or unclean shutdown, the data in unlogged tables is automatically truncated when the server restarts (data does survive a clean restart). Their contents should therefore be treated as disposable.

      Drawbacks of Unlogged Tables

      1. Data Loss on Crash – The most significant disadvantage of unlogged tables is the loss of data in case of a crash or restart. If the application depends on this data, then using unlogged tables would not be appropriate.
      2. Not Suitable for Critical Applications – Applications that require data persistence (such as financial or inventory systems) should avoid using unlogged tables, as the risk of data loss outweighs any performance benefits.
      3. No Replication – Unlogged tables are not replicated in standby servers in a replication setup, as the data is not written to the WAL.

      Creating an Unlogged Table

      Creating an unlogged table is very straightforward in PostgreSQL. You simply need to add the UNLOGGED keyword when creating the table.

      
      CREATE UNLOGGED TABLE temp_data (
          id SERIAL PRIMARY KEY,
          name VARCHAR(100),
          value INT
      );
      
      

      In this example, temp_data is an unlogged table. All operations performed on this table will not be logged to the WAL.

      When to Avoid Unlogged Tables?

      • If you are working with critical data that needs to be durable and persistent across restarts.
      • If your application requires data replication, as unlogged tables are not replicated in standby servers.
      • If your workload involves frequent crash scenarios where data loss cannot be tolerated.
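
      If requirements change, an existing table can also be switched between unlogged and logged without recreating it (available since PostgreSQL 9.5). Note that SET LOGGED rewrites the table and writes its contents to the WAL, so it can be slow for large tables. Using the temp_data table from above:

      -- Promote the unlogged table to a regular, durable table
      ALTER TABLE temp_data SET LOGGED;

      -- Switch it back if durability is no longer needed
      ALTER TABLE temp_data SET UNLOGGED;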

      Examples

      1. Temporary Storage for processing

      
      CREATE UNLOGGED TABLE etl_staging (
          source_id INT,
          raw_data JSONB,
          processed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
      );
      
      -- Insert raw data into the staging table
      INSERT INTO etl_staging (source_id, raw_data)
      VALUES 
          (1, '{"key": "value1"}'),
          (2, '{"key": "value2"}');
      
      -- Perform transformations on the data
      INSERT INTO final_table (id, key, value)
      SELECT source_id, 
             raw_data->>'key' AS key, 
             'processed_value' AS value
      FROM etl_staging;
      
      -- Clear the staging table
      TRUNCATE TABLE etl_staging;
      
      

      2. Caching

      
      CREATE UNLOGGED TABLE user_sessions (
          session_id UUID PRIMARY KEY,
          user_id INT,
          last_accessed TIMESTAMP DEFAULT NOW()
      );
      
      -- Insert session data
      INSERT INTO user_sessions (session_id, user_id)
      VALUES 
          (uuid_generate_v4(), 101),
          (uuid_generate_v4(), 102);
      
      -- Update last accessed timestamp
      UPDATE user_sessions
      SET last_accessed = NOW()
      WHERE session_id = 'some-session-id';
      
      -- Delete expired sessions
      DELETE FROM user_sessions WHERE last_accessed < NOW() - INTERVAL '1 hour';
      

      Query Optimization

      By: Sugirtha
      29 December 2024 at 09:35

      Query Optimization:

      Query Optimization is the process of improving the performance of a SQL query by reducing the amount of time and resources (like CPU, memory, and I/O) required to execute the query. The goal is to retrieve the desired data as quickly and efficiently as possible.

      Important implementation of Query Optimization:

      1. Indexing: Index frequently used columns – indexing columns that are part of the WHERE, JOIN, or ORDER BY clauses can significantly improve performance. For example, if you’re querying a salary column frequently, indexing it can speed up those queries.
        Composite indexes: If a query filters by multiple columns, a composite index on those columns might improve performance. For instance, INDEX (first_name, last_name) could be more efficient than two separate indexes on first_name and last_name.
      2. Select only the required columns instead of SELECT *, and use LIMIT to fetch only the required number of rows.
      3. Optimizing JOIN Operations: Use appropriate join types: For example, avoid OUTER JOIN if INNER JOIN would suffice. Redundant or unnecessary joins increase query complexity and processing time.
      4. Use of EXPLAIN to Analyze Query Plan:
        Running EXPLAIN before a query allows you to understand how the database is executing it. You can spot areas where indexes are not being used, unnecessary full table scans are happening, or joins are inefficient.

      How to Implement Query Optimization:

      1. Use Indexes:
      • Create indexes on columns that are frequently queried or used in JOIN, WHERE, or ORDER BY clauses. For example, if you frequently query a column like user_id, an index on user_id will speed up lookups. Use multi-column indexes for queries involving multiple columns.
      • CREATE INDEX idx_user_id ON users(user_id);

      2. Rewrite Queries:

      • Avoid using SELECT * and instead select only the necessary columns.
      • Break complex queries into simpler ones and use temporary tables or Common Table Expressions (CTEs) if needed.
      • SELECT name, age FROM users WHERE age > 18;

      3. Use Joins Efficiently:

      • Ensure that you are using the most efficient join type for your query (e.g., prefer INNER JOIN over OUTER JOIN when possible).
      • Join on indexed columns to speed up the process.

      4. Optimize WHERE Clauses:

      • Make sure conditions in WHERE clauses are selective and reduce the number of rows as early as possible.
      • Use AND and OR operators appropriately to filter data early in the query.

      5. Limit the Number of Rows:

      • Use the LIMIT clause when dealing with large datasets to fetch only a required subset of data.
      • Avoid retrieving unnecessary data from the database.

      6. Avoid Subqueries When Possible:

      • Subqueries can be inefficient because they often lead to additional scans of the same data. Use joins instead of subqueries when possible (see the sketch after this list).
      • If you must use subqueries, try to write them in a way that they don’t perform repeated calculations.

      7. Analyze Execution Plans:

      • Use EXPLAIN to see how the database is executing your query. This will give you insights into whether indexes are being used, how tables are being scanned, etc.
      • Example:
      1. EXPLAIN SELECT * FROM users WHERE age > 18;

      8. Use Proper Data Types:

      1. Choose the most efficient data types for your columns. For instance, use INTEGER for numeric values rather than VARCHAR, which takes more space and requires more processing.

      9. Avoid Functions on Indexed Columns:

      1. Using functions like UPPER(), LOWER(), or DATE() on indexed columns in WHERE clauses can prevent the database from using indexes effectively.
      2. Instead, try to perform the transformation outside the query, or create an expression index so an index can still be used (see the sketch after this list).

      10. Database Configuration:

      1. Ensure the database system is configured properly for the hardware it’s running on. For example, memory and cache settings can significantly affect query performance.
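
      To illustrate points 6 and 9 above, here is a small sketch; the customers and orders tables and the users.email column are assumed purely for illustration:

      -- Point 6: a subquery rewritten as a join
      SELECT name FROM customers
      WHERE id IN (SELECT customer_id FROM orders WHERE total > 1000);

      -- Join version (often easier for the planner to optimize)
      SELECT DISTINCT c.name
      FROM customers c
      JOIN orders o ON o.customer_id = c.id
      WHERE o.total > 1000;

      -- Point 9: a function on an indexed column prevents a plain index on users(email) from being used
      SELECT * FROM users WHERE LOWER(email) = 'user@example.com';

      -- An expression index makes the same predicate indexable
      CREATE INDEX idx_users_email_lower ON users (LOWER(email));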

      Example of Optimized Query:

      Non-Optimized Query:

      SELECT * FROM orders
      WHERE customer_id = 1001
      AND order_date > '2023-01-01';

      This query might perform a full table scan if customer_id and order_date are not indexed.

      Optimized Query:

      CREATE INDEX idx_customer_order_date ON orders(customer_id, order_date);
      
      SELECT order_id, order_date, total_amount
      FROM orders
      WHERE customer_id = 1001
      AND order_date > '2023-01-01';

      In this optimized version, an index on customer_id and order_date helps the database efficiently filter the rows without scanning the entire table.

      Reference : Learnt from ChatGPT

      SQL – Postgres – A Few Advanced Topics

      By: Sugirtha
      29 December 2024 at 09:31

      The order of execution in a SQL query:

      FROM and/or JOIN
      WHERE
      GROUP BY
      HAVING
      SELECT
      DISTINCT
      ORDER BY
      LIMIT and/or OFFSET

      Command Types:

      References : Aysha Beevi

      CAST()

      CAST is used to typecast; alternatively, the :: operator can be used to convert a value to the target data type.

      SELECT 'The current date is: ' || CURRENT_DATE::TEXT;
      SELECT '2024-12-21'::DATE::TEXT;
      SELECT CAST('2024-12-21' AS DATE);

      || -> Concatenation operator

      DATE functions:

      SELECT CURRENT_DATE; -- Output: 2024-12-21
      SELECT CURRENT_TIME; -- Output: 09:15:34.123456+05:30
      SELECT NOW(); -- Output: 2024-12-21 09:15:34.123456+05:30
      SELECT AGE('2020-01-01', '2010-01-01'); -- Output: 10 years 0 mons 0 days
      SELECT AGE('1990-05-15'); -- Output: 34 years 7 mons 6 days (calculated from NOW())
      SELECT EXTRACT(YEAR FROM NOW()); -- Output: 2024
      SELECT EXTRACT(MONTH FROM CURRENT_DATE); -- Output: 12
      SELECT EXTRACT(DAY FROM TIMESTAMP '2024-12-25 10:15:00'); -- Output: 25

      The DATE_TRUNC() function truncates a date or timestamp to the specified precision. This means it "resets" smaller parts of the date/time to their starting values.
      SELECT DATE_TRUNC('month', TIMESTAMP '2024-12-21 10:45:30');
      -- Output: 2024-12-01 00:00:00 -> The 'month' precision resets the day to the 1st, and the time to 00:00:00.
      SELECT DATE_TRUNC('year', TIMESTAMP '2024-12-21 10:45:30');
      -- Output: 2024-01-01 00:00:00
      SELECT DATE_TRUNC('day', TIMESTAMP '2024-12-21 10:45:30');
      -- Output: 2024-12-21 00:00:00

      SELECT NOW() + INTERVAL '1 year';
      -- Output: Current timestamp + 1 year
      SELECT CURRENT_DATE - INTERVAL '30 days';
      -- Output: Today's date - 30 days
      SELECT NOW() + INTERVAL '2 hours';
      -- Output: Current timestamp + 2 hours
      SELECT NOW() + INTERVAL '1 year' + INTERVAL '3 months' - INTERVAL '15 days';

      Window Functions

      This is the function that will operate over the specified window. Common window functions include ROW_NUMBER(), RANK(), SUM(), AVG(), etc.

      PARTITION BY: (Optional) Divides the result set into partitions to which the window function is applied. Each partition is processed separately.
      ORDER BY: (Optional) Orders the rows in each partition before the window function is applied.

      window_function() OVER (--RANK() or SUM() etc. can come in window_function
          PARTITION BY column_name(s)
          ORDER BY column_name(s)
       );
      
      SELECT 
          department_id,
          employee_id,
          salary,
          SUM(salary) OVER (PARTITION BY department_id ORDER BY salary) AS running_total
      FROM employees;
      

      CURSOR:

      DO $$
      DECLARE
          emp_name VARCHAR;
          emp_salary DECIMAL;
          emp_cursor CURSOR FOR SELECT name, salary FROM employees;
      BEGIN
          OPEN emp_cursor;
          LOOP
              FETCH emp_cursor INTO emp_name, emp_salary;
              EXIT WHEN NOT FOUND; -- Exit the loop when no rows are left
              RAISE NOTICE 'Employee: %, Salary: %', emp_name, emp_salary;
          END LOOP;
          CLOSE emp_cursor;
      END $$;

      Basic Data Types in PostgreSQL

      TEXT, VARCHAR, CHAR: Working with strings.
      INTEGER, BIGINT, NUMERIC: Handling numbers.
      DATE, TIMESTAMP: Date and time handling.
      

      OVER CLAUSE

      In PostgreSQL, the OVER() clause is used in window functions to define a window of rows over which a function operates. The query below assigns a serial number (ROW_NUMBER) starting at 1, with rows ordered by salary descending. Because a window-function alias cannot be referenced directly in WHERE, the filter goes in an outer query:
      SELECT name, row_num
      FROM (
          SELECT name, ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num
          FROM employees
      ) t
      WHERE row_num <= 5;

      RANK()

      Partition the table records based on the department id, then within each partition order by salary descending with rank 1, 2, 3... In RANK(), if two rows have the same salary, the rank repeats.

      SELECT department_id, name, salary,
      RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rank
      FROM employees;
      Output:
      department_id | name    | salary | rank
      101           | Charlie | 70,000 | 1
      101           | Alice   | 50,000 | 2
      101           | Frank   | 50,000 | 2
      102           | Eve     | 75,000 | 1
      102           | Bob     | 60,000 | 2
      103           | David   | 55,000 | 1

      • NTILE(3) divides employees into 3 equal salary buckets.
        SELECT id, name, salary,
        NTILE(3) OVER (ORDER BY salary DESC) AS bucket
        FROM employees;
        id | name    | salary | bucket
        5  | Eve     | 75,000 | 1
        3  | Charlie | 70,000 | 1
        2  | Bob     | 60,000 | 2
        4  | David   | 55,000 | 2
        1  | Alice   | 50,000 | 3
        6  | Frank   | 50,000 | 3
      • Retrieves the first name in each department based on descending salary.
        SELECT department_id, name, salary,
        FIRST_VALUE(name) OVER (PARTITION BY department_id ORDER BY salary DESC) AS top_earner
        FROM employees;
        Output:
        department_id | name    | salary | top_earner
        101           | Charlie | 70,000 | Charlie
        101           | Alice   | 50,000 | Charlie
        101           | Frank   | 50,000 | Charlie
        102           | Eve     | 75,000 | Eve
        102           | Bob     | 60,000 | Eve
        103           | David   | 55,000 | David

      The FROM clause is evaluated first; the WHERE condition is applied afterwards.

      • You cannot reference RANK() (or its alias) directly in the WHERE clause; the ranking must first be materialised in a result set such as a subquery or CTE (Common Table Expression). The CTE or subquery is executed first into a temporary result set, and only then can the outer SELECT filter on it.
      • Below, the ranking is computed in a subquery, which is executed first; its value is then used by the outer query.

      In each dept top earner name with his name and salary (consider the above table employees)
      SELECT department_id, name, salary
      FROM (
      SELECT department_id, name, salary,
      RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rank
      FROM employees
      ) ranked_employees
      WHERE rank = 1;

      department_id | name    | salary
      101           | Charlie | 70,000
      102           | Eve     | 75,000
      103           | David   | 55,000

      Resultset – here RankedSalaries is Resultset

      WITH RankedSalaries AS (
      SELECT salary, RANK() OVER (ORDER BY salary DESC) AS rank
      FROM employees
      )
      SELECT salary
      FROM RankedSalaries WHERE rank = 2;

      Here, RankedSalaries is a temporary result set or CTE (Common Table Expression)

      Reference: Learnt from ChatGPT and Picture from Ms.Aysha

      PostgreSQL Tasks:

      By: Sakthivel
      4 September 2024 at 18:03

      I created a table named "movies".
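
      The table definition is not shown in the post; a minimal sketch consistent with the queries below (the column names are assumptions) could be:

      CREATE TABLE movies (
          movie VARCHAR(100),
          actor VARCHAR(100),
          director VARCHAR(100)
      );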

      TASK 1 : Show details for Vijay acted movies.

      SELECT * FROM movies WHERE actor='vijay';


      TASK 2 : Show Vijay movies names only.

      SELECT movie FROM movies WHERE actor='vijay';


      TASK 3 : Show details, lokesh kanagaraj directed movie and actor name starts with β€˜k’ and ends with vowels[a,e,i,o,u].

      SELECT * FROM movies WHERE director='lokesh kanagaraj' AND actor LIKE 'k%' AND actor SIMILAR TO '%[aeiou]';


      TASK 4 : Show movie name only, lokesh kanagaraj directed movie and actor name starts with β€˜k’ and ends with vowels[a,e,i,o,u].

      SELECT movie FROM movies WHERE director='lokesh kanagaraj' AND actor LIKE 'k%' AND actor SIMILAR TO '%[aeiou]';


      TASK 5 : Show movie name and actor name only, lokesh kanagaraj directed movie and actor name starts with β€˜k’ and ends with vowels[a,e,i,o,u].

      SELECT movie,actor FROM movies WHERE director='lokesh kanagaraj' AND actor LIKE 'k%' AND actor SIMILAR TO '%[aeiou]';


      TASK 6 : Show the details, director name starts and ends with same letter.

      SELECT * FROM movies WHERE LEFT(director, 1)=RIGHT(director, 1);


      TASK 7 : Show the movie name only, the director name starts and ends with same letter.

      SELECT movie FROM movies WHERE LEFT(director, 1)=RIGHT(director, 1);


      TASK 8 : Show the director name only, the director name start and ends with same letter.

      SELECT director FROM movies WHERE LEFT(director, 1)=RIGHT(director, 1);


      TASK 9 : Show the movie name where the actor's name has only five characters.

      SELECT movie FROM movies WHERE LENGTH(actor)=5;


      TASK 10 : Show the movie name and actor name where the actor's name has only five characters.

      SELECT movie,actor FROM movies WHERE LENGTH(actor)=5;

      TASK 11 : Add a column salary_in_crore and values.

      ALTER TABLE movies ADD COLUMN salary_in_crore INT;

      UPDATE movies SET salary_in_crore = 100 WHERE actor = 'vijay';

      UPDATE movies SET salary_in_crore = 70 WHERE actor = 'kamal';

      UPDATE movies SET salary_in_crore = 90 WHERE actor = 'ajith';

      UPDATE movies SET salary_in_crore = 40 WHERE actor = 'karthi';

      UPDATE movies SET salary_in_crore = 110 WHERE actor = 'rajini';

      UPDATE movies SET salary_in_crore = 50 WHERE actor = 'dhanush';

      UPDATE movies SET salary_in_crore = 5 WHERE actor = 'soori';

      SELECT * FROM movies;


      TASK 12 : Show actor name and salary, ordered by salary from high to low.

      SELECT actor, salary_in_crore FROM movies ORDER BY salary_in_crore DESC;


      TASK 13 : Show top 3 actor based on salary.

      SELECT DISTINCT actor, salary_in_crore FROM movies ORDER BY salary_in_crore DESC LIMIT 3;


      To import a .CSV/Excel file to PostgreSQL table:

      By: Sakthivel
      4 September 2024 at 15:11

      Insert values From a .csv file

      Step 1: Create a .csv or .xlsx file

      Step 2: Insert values from a .csv or .xlsx file

      Step 3: Copy the file to the location where PostgreSQL is installed:

      -> "/var/lib/postgresql" and paste the file there

      Step 4: Go to the terminal and connect the database.

      Step 5: Create table in the database with the columns given in the file header.

      Step 6: Enter the following query in psql


      -> COPY table_name FROM '/var/lib/postgresql/file_name.csv' WITH CSV HEADER;


      Here, "movies" is the table name, and the path points to the location where the "movies.csv" file is present.

      9 values copied

      Step 7: Then, write a select query to fetch all the rows.

      The data has been successfully imported into the Postgres database from the CSV or Excel file.
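
      A client-side alternative, if copying the file onto the server is inconvenient, is psql's \copy meta-command, which reads the file from the local machine instead of the server's filesystem:

      \copy movies FROM 'movies.csv' WITH CSV HEADER;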


      Different Database Models

      23 August 2024 at 01:50

      Database models define the structure, relationships, and operations that can be performed on a database. Different database models are used based on the specific needs of an application or organization. Here are the most common types of database models:

      1. Hierarchical Database Model

      • Structure: Data is organized in a tree-like structure with a single root, where each record has a single parent but can have multiple children.
      • Usage: Best for applications with a clear hierarchical relationship, like organizational structures or file systems.
      • Example: IBM’s Information Management System (IMS).
      • Advantages: Fast access to data through parent-child relationships.
      • Disadvantages: Rigid structure; difficult to reorganize or restructure.

      2. Network Database Model

      • Structure: Data is organized in a graph structure, where each record can have multiple parent and child records, forming a network of relationships.
      • Usage: Useful for complex relationships, such as in telecommunications or transportation networks.
      • Example: Integrated Data Store (IDS).
      • Advantages: Flexible representation of complex relationships.
      • Disadvantages: Complex design and navigation; can be difficult to maintain.

      3. Relational Database Model

      • Structure: Data is organized into tables (relations) where each table consists of rows (records) and columns (fields). Relationships between tables are managed through keys.
      • Usage: Widely used in various applications, including finance, retail, and enterprise software.
      • Example: MySQL, PostgreSQL, Oracle Database, Microsoft SQL Server.
      • Advantages: Simplicity, data integrity, flexibility in querying through SQL.
      • Disadvantages: Can be slower for very large datasets or highly complex queries.

      4. Object-Oriented Database Model

      • Structure: Data is stored as objects, similar to objects in object-oriented programming. Each object contains both data and methods for processing the data.
      • Usage: Suitable for applications that require the modeling of complex data and relationships, such as CAD, CAM, and multimedia databases.
      • Example: db4o, ObjectDB.
      • Advantages: Seamless integration with object-oriented programming languages, reusability of objects.
      • Disadvantages: Complexity, not as widely adopted as relational databases.

      5. Document-Oriented Database Model

      • Structure: Data is stored in document collections, with each document being a self-contained piece of data often in JSON, BSON, or XML format.
      • Usage: Ideal for content management systems, real-time analytics, and big data applications.
      • Example: MongoDB, CouchDB.
      • Advantages: Flexible schema design, scalability, ease of storing hierarchical data.
      • Disadvantages: May require denormalization, leading to potential data redundancy.

      6. Key-Value Database Model

      • Structure: Data is stored as key-value pairs, where each key is unique, and the value can be a string, number, or more complex data structure.
      • Usage: Best for applications requiring fast access to simple data, such as caching, session management, and real-time analytics.
      • Example: Redis, DynamoDB, Riak.
      • Advantages: High performance, simplicity, scalability.
      • Disadvantages: Limited querying capabilities, lack of complex relationships.

      7. Column-Family Database Model

      • Structure: Data is stored in columns rather than rows, with each column family containing a set of columns that are logically related.
      • Usage: Suitable for distributed databases, handling large volumes of data across multiple servers.
      • Example: Apache Cassandra, HBase.
      • Advantages: High write and read performance, efficient storage of sparse data.
      • Disadvantages: Complexity in design and maintenance, not as flexible for ad-hoc queries.

      8. Graph Database Model

      • Structure: Data is stored as nodes (entities) and edges (relationships) forming a graph. Each node represents an object, and edges represent the relationships between objects.
      • Usage: Ideal for social networks, recommendation engines, fraud detection, and any scenario where relationships between entities are crucial.
      • Example: Neo4j, Amazon Neptune.
      • Advantages: Efficient traversal and querying of complex relationships, flexible schema.
      • Disadvantages: Not as efficient for operations on large sets of unrelated data.

      9. Multimodel Database

      • Structure: Supports multiple data models (e.g., relational, document, graph) within a single database engine.
      • Usage: Useful for applications that require different types of data storage and querying mechanisms.
      • Example: ArangoDB, Microsoft Azure Cosmos DB.
      • Advantages: Flexibility, ability to handle diverse data requirements within a single system.
      • Disadvantages: Complexity in management and optimization.

      10. Time-Series Database Model

      • Structure: Specifically designed to handle time-series data, where each record is associated with a timestamp.
      • Usage: Best for applications like monitoring, logging, and real-time analytics where data changes over time.
      • Example: InfluxDB, TimescaleDB.
      • Advantages: Optimized for handling and querying large volumes of time-stamped data.
      • Disadvantages: Limited use cases outside of time-series data.

      11. NoSQL Database Model

      • Structure: An umbrella term for various non-relational database models, including key-value, document, column-family, and graph databases.
      • Usage: Ideal for handling unstructured or semi-structured data, and scenarios requiring high scalability and flexibility.
      • Example: MongoDB, Cassandra, Couchbase, Neo4j.
      • Advantages: Flexibility, scalability, high performance for specific use cases.
      • Disadvantages: Lack of standardization, potential data consistency challenges.

      Summary

      Each database model serves different purposes, and the choice of model depends on the specific requirements of the application, such as data structure, relationships, performance needs, and scalability. While relational databases are still the most widely used, NoSQL and specialized databases have become increasingly important for handling diverse data types and large-scale applications.

      Terraform code for AWS Postgresql RDS

      7 January 2024 at 17:19

      create directory postgres and navigate
      $ mkdir postgres && cd postgres
      create main.tf file
      $ vim main.tf

      provider "aws" {
      }
      resource "aws_security_group" "rds_sg" {
      name = "rds_sg"
      ingress {
      from_port = 5432
      to_port = 5432
      protocol = "tcp"
      cidr_blocks = ["0.0.0.0/0"]
      }
      }

      resource "aws_db_instance" "myinstance" {
      engine = "postgres"
      identifier = "myrdsinstance"
      allocated_storage = 20
      engine_version = "14"
      instance_class = "db.t3.micro"
      username = "myrdsuser"
      password = "myrdspassword"
      parameter_group_name = "default.postgres14"
      vpc_security_group_ids = ["${aws_security_group.rds_sg.id}"]
      skip_final_snapshot = true
      publicly_accessible = true
      }

      output "rds_endpoint" {
      value = "${aws_db_instance.myinstance.endpoint}"
      }

      save and exit
      $ terraform init
      $ terraform plan
      $ terraform apply -auto-approve
      Install the Postgres client on the local machine
      $ sudo apt install -y postgresql-client
      To access the AWS PostgreSQL RDS instance
      $ psql -h <end_point_URL> -p 5432 --username=myrdsuser --password --dbname=mydb
      To destroy postgresql RDS instance
      $ terraform destroy -auto-approve
