
Basic SQL Queries, Stored Proc, Function in PostgreSQL

By: Sugirtha
2 January 2025 at 06:17

DDL, DML, DQL Queries:

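-- DDL: create the table (assumes a Departments(DepartmentID, DepartmentName) table already exists with IDs 101 and 102)
CREATE TABLE Employees (
    EmployeeID INTEGER PRIMARY KEY, 
    Name VARCHAR(50), 
    Age INTEGER, 
    DepartmentID INTEGER, 
    FOREIGN KEY (DepartmentID) REFERENCES Departments(DepartmentID)
);
-- DML: insert, update and delete rows
INSERT INTO Employees (EmployeeID, Name, Age, DepartmentID) VALUES (1, 'Kavi', 32, 101), (2, 'Sugi', 30, 102);
UPDATE Employees SET Age = 31 WHERE Name = 'Nila';
DELETE FROM Employees WHERE Name = 'Nila';
-- DQL: query rows, here joining each employee to their department
SELECT e.*, d.DepartmentName 
FROM Employees e 
JOIN Departments d ON e.DepartmentID = d.DepartmentID;

-- Self join: each employee with their manager (assumes Employees also has a ManagerID column)
SELECT e.Name AS Employee, m.Name AS Manager
FROM Employees e
JOIN Employees m ON e.ManagerID = m.EmployeeID;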

INNER JOIN:

  • Returns only the rows where there is a match between the columns in both tables.
  • If no match is found, the row is not included in the result.
  • It’s the most common type of join.

OUTER JOIN:

  • Returns all rows from one or both tables, even if there is no match in the other table.
    • LEFT OUTER JOIN (or just LEFT JOIN): Returns all rows from the left table, and the matched rows from the right table. If no match, the result will have NULL values for columns from the right table.
    • RIGHT OUTER JOIN (or just RIGHT JOIN): Returns all rows from the right table, and the matched rows from the left table. If no match, the result will have NULL values for columns from the left table.
    • FULL OUTER JOIN: Returns all rows from both tables. If there is no match, the result will have NULL values for the non-matching table’s columns.
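To see the join types side by side, here is a minimal sketch using two small hypothetical temp tables (demo_emp, demo_dept). The names and rows are made up for illustration: Nila deliberately has no department, and Sales has no employee.

```sql
-- Hypothetical demo tables; TEMP so they disappear at the end of the session
CREATE TEMP TABLE demo_dept (dept_id INT PRIMARY KEY, dept_name TEXT);
CREATE TEMP TABLE demo_emp  (emp_id INT PRIMARY KEY, emp_name TEXT, dept_id INT);

INSERT INTO demo_dept VALUES (101, 'IT'), (102, 'HR'), (103, 'Sales');
INSERT INTO demo_emp  VALUES (1, 'Kavi', 101), (2, 'Sugi', 102), (3, 'Nila', NULL);

-- INNER JOIN: 2 rows (Kavi/IT, Sugi/HR); Nila and Sales are dropped
SELECT e.emp_name, d.dept_name
FROM demo_emp e
INNER JOIN demo_dept d ON e.dept_id = d.dept_id;

-- LEFT JOIN: 3 rows; Nila is kept with dept_name = NULL
SELECT e.emp_name, d.dept_name
FROM demo_emp e
LEFT JOIN demo_dept d ON e.dept_id = d.dept_id;

-- FULL OUTER JOIN: 4 rows; Nila (no department) and Sales (no employee) both appear
SELECT e.emp_name, d.dept_name
FROM demo_emp e
FULL OUTER JOIN demo_dept d ON e.dept_id = d.dept_id;
```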

GROUP BY:

  • Groups rows that have the same values in specified columns into summary rows (like finding the total count, sum, average, etc.).
  • It is typically used with aggregate functions such as COUNT(), SUM(), AVG(), MAX(), MIN().

HAVING:

  • Used to filter records after the GROUP BY has been applied.
  • It works similarly to the WHERE clause, but WHERE is used for filtering individual rows before grouping, while HAVING filters the grouped results.
SELECT DeptName, COUNT(*) AS emp_count
FROM Employees
GROUP BY DeptName
HAVING COUNT(*) > 1;   -- keep only departments with more than one employee
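The following sketch, on a hypothetical temp table staff, uses WHERE and HAVING in the same query: WHERE drops individual rows before grouping, and HAVING drops whole groups afterwards.

```sql
CREATE TEMP TABLE staff (name TEXT, dept_name TEXT, salary NUMERIC);
INSERT INTO staff VALUES
    ('Kavi', 'IT', 40000), ('Sugi', 'IT', 60000), ('Nila', 'IT', 70000),
    ('Ravi', 'HR', 30000), ('Meena', 'HR', 65000);

SELECT dept_name, COUNT(*) AS well_paid
FROM staff
WHERE salary > 35000        -- row filter: applied before grouping
GROUP BY dept_name
HAVING COUNT(*) >= 2;       -- group filter: applied after grouping
-- Result: IT | 3
```

Only IT survives here: all three of its rows pass the WHERE filter, while HR has just one row left after WHERE, so HAVING removes that group.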

DISTINCT:

  • Used to remove duplicate rows from the result set based on the specified columns.
  • If you specify only one column, it will return the distinct values of that column.
  • If you specify multiple columns, the combination of values in those columns will be considered to determine uniqueness.
SELECT DISTINCT DeptName FROM Employees;

SELECT DISTINCT DeptName, EmpName FROM Employees;

Difference between DELETE and TRUNCATE:

DELETE:

  • Removes rows one by one and logs each deletion, which can be slower for large datasets.
  • You can use a WHERE clause to specify which rows to delete.
  • Can be rolled back if you’re working within a transaction (assuming no COMMIT has been done).
  • Can fire triggers if there are any triggers defined on the table (for example, BEFORE DELETE or AFTER DELETE triggers).

TRUNCATE:

  • Removes all rows in the table in one go, without scanning them individually.
  • Does not support a WHERE clause, so it always deletes all rows.
  • It’s much faster than DELETE because it doesn’t log individual row deletions (but it does log the deallocation of the table’s data pages).
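  • In PostgreSQL, TRUNCATE is transactional: it can be rolled back inside a transaction block (in some other databases, such as MySQL and Oracle, it commits implicitly). It does not fire ON DELETE triggers, although PostgreSQL does fire ON TRUNCATE triggers if any are defined on the table.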
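A minimal sketch of the PostgreSQL behaviour described above, assuming the statements are run one by one in psql (autocommit on) and using a hypothetical temp table truncate_demo:

```sql
CREATE TEMP TABLE truncate_demo (id INT);
INSERT INTO truncate_demo SELECT generate_series(1, 5);   -- 5 rows

BEGIN;
TRUNCATE truncate_demo;
SELECT COUNT(*) FROM truncate_demo;   -- 0 inside the transaction
ROLLBACK;

SELECT COUNT(*) FROM truncate_demo;   -- 5 again: the TRUNCATE was undone
```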

UNION:

  • Combines the results of two or more queries.
  • Removes duplicates: Only unique rows are included in the final result.
  • It has to remove duplicates (typically by sorting or hashing the combined rows), which adds some performance cost.

UNION ALL:

  • Also combines the results of two or more queries.
  • Keeps duplicates: All rows from the queries are included in the final result, even if they are the same.
  • It skips the duplicate-removal step (sort or hash), which usually makes it faster than UNION.
SELECT EmpID, EmpName FROM Employees
UNION ALL
SELECT EmpID, EmpName FROM Contractors;

SELECT EmpID, EmpName FROM Employees
UNION 
SELECT EmpID, EmpName FROM Contractors;
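Since Employees and Contractors are not defined here, this sketch uses inline VALUES lists instead, which makes the row counts easy to check. Note that UNION also removes duplicates that come from a single input (the two 2s in the first list).

```sql
SELECT x FROM (VALUES (1), (2), (2)) AS a(x)
UNION ALL
SELECT x FROM (VALUES (2), (3)) AS b(x);   -- 5 rows: 1, 2, 2, 2, 3

SELECT x FROM (VALUES (1), (2), (2)) AS a(x)
UNION
SELECT x FROM (VALUES (2), (3)) AS b(x);   -- 3 rows: 1, 2, 3 (row order not guaranteed)
```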

COALESCE():

COALESCE() returns the first non-NULL value among its arguments. For example, in a SELECT where some names are NULL, you can fall back to another column's value or to a default value.
SELECT COALESCE(NULL, 'Hello', 'World');
-- Output: Hello

-- Assumes a table users(id SERIAL PRIMARY KEY, name TEXT, nickname TEXT)
INSERT INTO users (name, nickname) VALUES
('Alice', NULL),
(NULL, 'Bob'),
(NULL, NULL);

SELECT id, COALESCE(name, nickname, 'Unknown') AS display_name FROM users;
-- display_name: Alice, Bob, Unknown

NULLIF()

NULLIF(expression1, expression2)
Returns NULL if the two expressions (or column values) are equal; otherwise returns the first one, i.e. expression1.
SELECT NULLIF(10, 10);    -- Output: NULL
SELECT NULLIF(10, 20);    -- Output: 10
SELECT NULLIF(10, NULL);  -- Output: 10
SELECT NULLIF(NULL, 10);  -- Output: NULL
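A common practical use of NULLIF is guarding against division by zero. In this sketch (inline hypothetical data), NULLIF turns a zero divisor into NULL, so the division yields NULL instead of raising a "division by zero" error, and COALESCE can then supply a default.

```sql
SELECT customer,
       total / NULLIF(orders, 0)              AS avg_value,          -- NULL when orders = 0
       COALESCE(total / NULLIF(orders, 0), 0) AS avg_value_or_zero   -- 0 when orders = 0
FROM (VALUES ('A', 300.0, 3), ('B', 0.0, 0)) AS t(customer, total, orders);
```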

IF Condition:

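In PL/pgSQL (inside functions, procedures, and DO blocks), the IF statement checks conditions and runs the matching block of code. It cannot be used directly in a plain SQL query; use CASE there instead.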

IF condition THEN
    -- Code to execute if the condition is true
ELSIF condition THEN
    -- Code block to execute if another condition is true
ELSE
    -- Code to execute if the condition is false
END IF;

-- FOUND is false when the preceding SELECT INTO (or UPDATE/DELETE) affected no row
IF NOT FOUND THEN
    RAISE NOTICE 'Employee with ID % not found!', emp_id;
    emp_bonus := 0;
END IF;
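Since IF only works inside PL/pgSQL, a small function is an easy place to try it. This is a minimal sketch with a hypothetical function salary_grade(); the thresholds follow the CASE example below, but the name and design are only illustrative.

```sql
CREATE OR REPLACE FUNCTION salary_grade(p_salary NUMERIC)
RETURNS TEXT
LANGUAGE plpgsql AS
$$
BEGIN
    IF p_salary > 5000 THEN
        RETURN 'High';
    ELSIF p_salary >= 3000 THEN
        RETURN 'Average';
    ELSE
        RETURN 'Low';        -- also reached when p_salary is NULL
    END IF;
END;
$$;

SELECT salary_grade(6000), salary_grade(4200), salary_grade(1000);
-- High | Average | Low
```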

CASE WHEN:

The CASE WHEN expression adds conditional logic inside a query. Unlike IF, which exists only in PL/pgSQL, CASE works in plain SQL (SELECT lists, WHERE, ORDER BY, UPDATE ... SET, etc.).

SELECT 
    name,
    salary,
    CASE 
        WHEN salary > 5000 THEN 'High Salary'
        WHEN salary BETWEEN 3000 AND 5000 THEN 'Average Salary'
        ELSE 'Low Salary'
    END AS salary_category
FROM employees;
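The same CASE expression run over a few boundary values (inline hypothetical data) shows that BETWEEN is inclusive on both ends, so exactly 5000 is classed as 'Average Salary', not 'High Salary'.

```sql
SELECT salary,
       CASE
           WHEN salary > 5000 THEN 'High Salary'
           WHEN salary BETWEEN 3000 AND 5000 THEN 'Average Salary'
           ELSE 'Low Salary'
       END AS salary_category
FROM (VALUES (5001), (5000), (3000), (2999)) AS t(salary)
ORDER BY salary DESC;
-- 5001 High Salary | 5000 Average Salary | 3000 Average Salary | 2999 Low Salary
```

A NULL salary would also fall through to ELSE and be labelled 'Low Salary', because every comparison with NULL is unknown; add a WHEN salary IS NULL branch first if that matters.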

FOR LOOP:

DO $$
BEGIN
    -- i is declared automatically as an integer loop variable
    FOR i IN 1..5 LOOP
        -- Perform an action for each iteration (e.g., insert or update a record)
        INSERT INTO audit_log (action, timestamp) 
        VALUES ('Employee update', NOW());
    END LOOP;
END
$$;

-- Looping over query results (rec must be declared in DECLARE as: rec RECORD;)
FOR rec IN SELECT column1, column2 FROM employees LOOP
    -- Code block using rec.column1, rec.column2
END LOOP;
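A self-contained sketch of the second form (looping over query results with a RECORD variable), wrapped in a hypothetical function total_payroll() over inline data. In real code a plain SUM() would do this in one statement; the loop is only there to show the syntax.

```sql
CREATE OR REPLACE FUNCTION total_payroll()
RETURNS NUMERIC
LANGUAGE plpgsql AS
$$
DECLARE
    rec   RECORD;          -- holds one row per iteration
    total NUMERIC := 0;
BEGIN
    FOR rec IN SELECT name, salary
               FROM (VALUES ('Kavi', 3000), ('Sugi', 4500)) AS t(name, salary)
    LOOP
        RAISE NOTICE 'Adding % for %', rec.salary, rec.name;
        total := total + rec.salary;
    END LOOP;
    RETURN total;
END;
$$;

SELECT total_payroll();   -- 7500
```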

RAISE – used for printing messages from PL/pgSQL code (like System.out.println in Java). RAISE EXCEPTION also stops execution.

RAISE NOTICE 'Employee: %, Salary: %', emp_name, emp_salary;
RAISE EXCEPTION 'An error occurred: %', error_message; -- Prints the message and aborts the current transaction, unless an EXCEPTION block catches it.
RAISE INFO 'Employee: %, Salary: %', emp_name, emp_salary;
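RAISE EXCEPTION does not have to end everything: a nested BEGIN ... EXCEPTION block can catch it. A minimal sketch in a DO block; raise_exception is the condition name PostgreSQL uses for a plain RAISE EXCEPTION (SQLSTATE P0001), and SQLERRM holds the error message.

```sql
DO $$
BEGIN
    BEGIN
        RAISE EXCEPTION 'An error occurred: %', 'salary cannot be negative';
    EXCEPTION
        WHEN raise_exception THEN
            -- Prints: Caught: An error occurred: salary cannot be negative
            RAISE NOTICE 'Caught: %', SQLERRM;
    END;
    RAISE NOTICE 'Execution continues after the handled exception';
END
$$;
```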

Stored Procedures in SQL:

A stored procedure is a reusable block of SQL code that performs specific tasks. It is stored in the database and can be called as needed. Stored procedures are used for:

  • Modularizing complex SQL logic.
  • Improving performance by reducing network traffic.
  • Ensuring code reuse and security (by granting permissions to execute rather than to the tables directly).

Example:

A stored procedure to insert a new employee record:

CREATE PROCEDURE add_employee(emp_name VARCHAR, emp_salary NUMERIC)
LANGUAGE plpgsql AS 
$$ 
BEGIN 
  INSERT INTO employees (name, salary) VALUES (emp_name, emp_salary); 
END; 
$$;

Execution:

CALL add_employee('John Doe', 50000);

Functions in SQL:

A SQL function is a reusable block of SQL code that performs specific tasks. It is stored in the database and can be called as needed. It is similar to a procedure but returns a single value or table. Functions are typically used for computations or transformations.
Example: A function to calculate the yearly salary:

CREATE FUNCTION calculate_yearly_salary(monthly_salary NUMERIC)
RETURNS NUMERIC
LANGUAGE plpgsql AS 
$$
BEGIN
  RETURN monthly_salary * 12;
END;
$$;

Execution:

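SELECT calculate_yearly_salary(5000);
Inside PL/pgSQL code (for example, a trigger function), call it in an expression such as yearly := calculate_yearly_salary(5000); or use PERFORM calculate_yearly_salary(5000); when the result is not needed. EXECUTE is for running dynamic SQL strings, not for calling a function by name.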
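A runnable sketch of both calling styles. It repeats the calculate_yearly_salary() definition from above so it can run on its own, and uses inline VALUES rows in place of a real employees table.

```sql
CREATE OR REPLACE FUNCTION calculate_yearly_salary(monthly_salary NUMERIC)
RETURNS NUMERIC
LANGUAGE plpgsql AS
$$
BEGIN
  RETURN monthly_salary * 12;
END;
$$;

-- 1) In a query: evaluated once per row
SELECT name, calculate_yearly_salary(monthly) AS yearly
FROM (VALUES ('Kavi', 3000), ('Sugi', 6000)) AS t(name, monthly);

-- 2) Inside PL/pgSQL (e.g. a trigger function or DO block)
DO $$
DECLARE
    yearly NUMERIC;
BEGIN
    yearly := calculate_yearly_salary(5000);    -- assign the result
    RAISE NOTICE 'Yearly salary: %', yearly;    -- Yearly salary: 60000
    PERFORM calculate_yearly_salary(5000);      -- run it and discard the result
END
$$;
```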

Key Differences Between Procedures and Functions:

Return Type:

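  • Function: Must declare a return type (a single value, a set of rows, or VOID for no value).
  • Procedure: Does not return a value, but can pass results back through INOUT parameters (and OUT parameters from PostgreSQL 14).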

Usage:

  • Function: Can be used in SQL queries (e.g., SELECT).
  • Procedure: Called using CALL, cannot be used in SQL queries.

Transaction Control:

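  • Function: Cannot manage transactions; it always runs inside the caller's transaction.
  • Procedure: Can manage transactions (e.g., COMMIT, ROLLBACK) from PostgreSQL 11, as long as the CALL is not itself inside an explicit transaction block.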

Side Effects:

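  • Function: Ideally free of side effects, although PostgreSQL does allow a VOLATILE function (the default) to modify data, as trigger functions do; STABLE and IMMUTABLE functions cannot.
  • Procedure: Can modify data and have side effects.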

Calling Mechanism:

Procedure: Called using CALL procedure_name().

Function: Called within SQL expressions, like SELECT function_name().
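To illustrate the transaction-control difference, here is a minimal sketch of a procedure that commits partway through a loop. It assumes PostgreSQL 11 or later and that CALL is issued outside an explicit BEGIN ... COMMIT block (otherwise the COMMIT raises "invalid transaction termination"). The procedure fill_batch_log and the temp table batch_log are hypothetical.

```sql
CREATE TEMP TABLE batch_log (n INT);

CREATE OR REPLACE PROCEDURE fill_batch_log(p_rows INT)
LANGUAGE plpgsql AS
$$
BEGIN
    FOR i IN 1..p_rows LOOP
        INSERT INTO batch_log (n) VALUES (i);
        IF i % 2 = 0 THEN
            COMMIT;   -- allowed in a procedure; the same statement in a function raises an error
        END IF;
    END LOOP;
END;
$$;

CALL fill_batch_log(4);
SELECT COUNT(*) FROM batch_log;   -- 4
```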

TRIGGER:

A trigger automatically executes (or "fires") when certain events occur on a table, such as INSERT, UPDATE, or DELETE. In PostgreSQL, the trigger's logic lives in a trigger function (declared RETURNS TRIGGER), which the trigger calls. Triggers can be used to enforce business rules, validate data, or maintain audit logs.
Key Points:

Types of Triggers:

  • BEFORE Trigger: Fires before the actual operation (INSERT, UPDATE, DELETE).
  • AFTER Trigger: Fires after the actual operation.
  • INSTEAD OF Trigger: Replaces the standard operation; used on views. Both SQL Server and PostgreSQL support it (in PostgreSQL, INSTEAD OF triggers are row-level and defined on views).

  • Trigger Actions: The trigger action can be an operation like logging data, updating related tables, or enforcing data integrity.
  • Trigger Events: A trigger can be set to fire on certain events, such as when a row is inserted, updated, or deleted.
  • Trigger Scope: Triggers can be defined to act on either a row (executing once for each affected row) or a statement (executing once for the entire statement).
  • A trigger can be created to log changes in a Users table whenever a record is updated, or it could prevent deleting a record if certain conditions aren’t met.

Example:

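-- In PostgreSQL the trigger body lives in a trigger function
CREATE OR REPLACE FUNCTION log_employee_age_update()
RETURNS TRIGGER LANGUAGE plpgsql AS
$$
BEGIN
    IF OLD.Age IS DISTINCT FROM NEW.Age THEN
        INSERT INTO EmployeeLogs (EmployeeID, OldAge, NewAge)
        VALUES (OLD.EmployeeID, OLD.Age, NEW.Age);
    END IF;
    RETURN NULL;  -- the return value is ignored for AFTER row triggers
END;
$$;

CREATE TRIGGER LogEmployeeAgeUpdate
AFTER UPDATE ON Employees
FOR EACH ROW
EXECUTE FUNCTION log_employee_age_update();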

Example:

CREATE OR REPLACE FUNCTION prevent_employee_delete()
RETURNS TRIGGER AS 
$$
BEGIN
-- Check if the employee is in a protected department (for example, department_id = 10)
  IF OLD.department_id = 10 THEN
     RAISE EXCEPTION 'Cannot delete employee in department 10';
  END IF;
  RETURN OLD;
END;
$$ 
LANGUAGE plpgsql;

-- Attach the function to a trigger
CREATE TRIGGER prevent_employee_delete_trigger
BEFORE DELETE ON Employees
FOR EACH ROW
EXECUTE FUNCTION prevent_employee_delete();

The next example creates a trigger that logs the age, along with the operation type and time, whenever a row in Employees is inserted, updated, or deleted:

CREATE OR REPLACE FUNCTION log_employee_changes()
RETURNS TRIGGER AS 
$$
BEGIN
  -- Handle INSERT operation
  IF (TG_OP = 'INSERT') THEN
    INSERT INTO EmployeeChangeLog (EmployeeID, OperationType, NewAge, ChangeTime)
    VALUES (NEW.EmployeeID, 'INSERT', NEW.Age, CURRENT_TIMESTAMP);
    RETURN NEW;
  -- Handle UPDATE operation
  ELSIF (TG_OP = 'UPDATE') THEN
    INSERT INTO EmployeeChangeLog (EmployeeID, OperationType, OldAge, NewAge, ChangeTime)
    VALUES (OLD.EmployeeID, 'UPDATE', OLD.Age, NEW.Age, CURRENT_TIMESTAMP);
    RETURN NEW;
  -- Handle DELETE operation
  ELSIF (TG_OP = 'DELETE') THEN
    INSERT INTO EmployeeChangeLog (EmployeeID, OperationType, OldAge, ChangeTime)
    VALUES (OLD.EmployeeID, 'DELETE', OLD.Age, CURRENT_TIMESTAMP);
    RETURN OLD;
  END IF;
  RETURN NULL;
END;
$$
LANGUAGE plpgsql;

CREATE TRIGGER log_employee_changes_trigger
AFTER INSERT OR UPDATE OR DELETE 
ON Employees
FOR EACH ROW
EXECUTE FUNCTION log_employee_changes();

Attaching the trigger to the Employees table

The CREATE TRIGGER statement above attaches log_employee_changes() to the Employees table, so it fires after every INSERT, UPDATE, and DELETE, once for each affected row.

TG_OP: A special PL/pgSQL variable that holds the operation that fired the trigger ('INSERT', 'UPDATE', 'DELETE', or 'TRUNCATE').
NEW and OLD: NEW is the row being inserted or the new version of an updated row; OLD is the row as it was before an update or delete.
EmployeeChangeLog: A programmer-defined table that stores the details of each change (employee ID, operation type, old and new values, timestamp).
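To try the logging trigger end to end, the sketch below creates hypothetical minimal temp tables for Employees and EmployeeChangeLog (the original leaves the log table's definition to the programmer, so this schema is an assumption), restates log_employee_changes() compactly with the same TG_OP branches, and runs one INSERT, UPDATE, and DELETE.

```sql
-- Hypothetical minimal tables using the column names from the example above
CREATE TEMP TABLE Employees (EmployeeID INT PRIMARY KEY, Name TEXT, Age INT);
CREATE TEMP TABLE EmployeeChangeLog (
    LogID         BIGSERIAL PRIMARY KEY,
    EmployeeID    INT,
    OperationType TEXT,
    OldAge        INT,
    NewAge        INT,
    ChangeTime    TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP
);

CREATE OR REPLACE FUNCTION log_employee_changes()
RETURNS TRIGGER
LANGUAGE plpgsql AS
$$
BEGIN
    IF TG_OP = 'INSERT' THEN
        INSERT INTO EmployeeChangeLog (EmployeeID, OperationType, NewAge)
        VALUES (NEW.EmployeeID, TG_OP, NEW.Age);
        RETURN NEW;
    ELSIF TG_OP = 'UPDATE' THEN
        INSERT INTO EmployeeChangeLog (EmployeeID, OperationType, OldAge, NewAge)
        VALUES (OLD.EmployeeID, TG_OP, OLD.Age, NEW.Age);
        RETURN NEW;
    ELSE  -- 'DELETE'
        INSERT INTO EmployeeChangeLog (EmployeeID, OperationType, OldAge)
        VALUES (OLD.EmployeeID, TG_OP, OLD.Age);
        RETURN OLD;
    END IF;
END;
$$;

CREATE TRIGGER log_employee_changes_trigger
AFTER INSERT OR UPDATE OR DELETE ON Employees
FOR EACH ROW
EXECUTE FUNCTION log_employee_changes();

INSERT INTO Employees VALUES (1, 'Kavi', 30);
UPDATE Employees SET Age = 31 WHERE EmployeeID = 1;
DELETE FROM Employees WHERE EmployeeID = 1;

SELECT EmployeeID, OperationType, OldAge, NewAge FROM EmployeeChangeLog ORDER BY LogID;
-- 1 | INSERT |      | 30
-- 1 | UPDATE |  30  | 31
-- 1 | DELETE |  30  |
```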

What happens when you omit FOR EACH ROW?

  1. Statement-level trigger: In PostgreSQL, omitting FOR EACH ROW means FOR EACH STATEMENT, so the trigger fires once per SQL statement, regardless of how many rows are affected (even zero).
    • For example, if you run an UPDATE statement that affects 10 rows, the trigger fires once (for the statement) rather than once for each of those 10 rows.
  2. No access to row-specific data: OLD and NEW do not hold any row, so the trigger cannot capture individual rows' values. (PostgreSQL 10+ can expose the affected rows to AFTER statement triggers as transition tables through a REFERENCING clause, but that is a separate feature.)
  3. In short: with FOR EACH ROW, the trigger runs for each affected row and can track specific changes (e.g., old vs. new values). Without it, the trigger fires once per statement and has no access to specific row data.
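CREATE OR REPLACE FUNCTION log_employees_table_update()
RETURNS TRIGGER LANGUAGE plpgsql AS $$
BEGIN
    -- Runs once per statement; no OLD/NEW row is available here
    INSERT INTO AuditLogs (EventDescription) VALUES ('Employees table updated');
    RETURN NULL;
END;
$$;
CREATE TRIGGER LogEmployeesTableUpdate
AFTER UPDATE ON Employees
FOR EACH STATEMENT  -- the default when FOR EACH ROW is omitted
EXECUTE FUNCTION log_employees_table_update();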
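The row versus statement difference can be observed directly. In this sketch (hypothetical temp tables t_emp and fire_log), one trigger of each kind calls the same function, which records TG_LEVEL ('ROW' or 'STATEMENT'); a single UPDATE touching three rows fires the row trigger three times and the statement trigger once.

```sql
CREATE TEMP TABLE t_emp (id INT, age INT);
CREATE TEMP TABLE fire_log (kind TEXT);
INSERT INTO t_emp VALUES (1, 30), (2, 40), (3, 50);

CREATE OR REPLACE FUNCTION log_fire() RETURNS TRIGGER
LANGUAGE plpgsql AS $$
BEGIN
    INSERT INTO fire_log VALUES (TG_LEVEL);   -- 'ROW' or 'STATEMENT'
    RETURN NULL;                              -- ignored for AFTER triggers
END;
$$;

CREATE TRIGGER t_emp_row  AFTER UPDATE ON t_emp FOR EACH ROW       EXECUTE FUNCTION log_fire();
CREATE TRIGGER t_emp_stmt AFTER UPDATE ON t_emp FOR EACH STATEMENT EXECUTE FUNCTION log_fire();

UPDATE t_emp SET age = age + 1;   -- one statement, three rows

SELECT kind, COUNT(*) FROM fire_log GROUP BY kind;
-- ROW | 3
-- STATEMENT | 1
```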

NORMALIZATION:

1st NF:
  1. Each column/attribute must hold an atomic (indivisible) value, i.e. only one value per cell.
  2. Rows must not repeat, i.e. each row is unique; a primary key is not strictly required at this stage.
2nd NF:
  1. Must satisfy 1NF. (2NF matters when the candidate key is composite, i.e. several columns together make a row unique.)
  2. Every non-key column must depend on the whole candidate key, not on just part of it. For example, take a denormalized database (before normalization, all tables and values are kept together in one table) whose candidate key is (OrderID + ProductID). OrderDate and OrderStatus depend only on OrderID, not on ProductID; ProductName depends only on ProductID, not on the order; and customer details do not depend on ProductID at all. Only related columns should stay together, so the table is split until every column depends on the full key of its own table.
    So products go to a separate table, orders to a separate table, and customers to a separate table. (A column such as quantity depends on both OrderID and ProductID, so it stays in an order-items table keyed by both.)
  3. Each separated table gets its own primary key, so that all of its non-key columns depend completely on that key. Foreign key relationships are then added to connect the tables.
3rd NF:
  1. Must satisfy 2NF.
  2. Remove transitive dependencies: a non-key column must not depend on another non-key column. In a denormalized orders table, for example, the salesperson's details depend on the salesperson ID, which in turn depends on the order ID. Changing one salesperson's details would then mean updating every order row for that salesperson, so the sales people's data is moved out of the orders table into its own table.

What is a transitive dependency? Let's break it down with a simple example:

StudentID | Department | HODName
S001      | IT         | Dr. Rajan
S002      | CS         | Dr. Priya

Primary Key: StudentID
Non-prime attributes: Department, HODName

StudentID → Department (StudentID determines the department).
Department → HODName (Department determines the HOD name).
So HODName depends on StudentID only indirectly, through Department (StudentID → Department → HODName).

This is a transitive dependency, and we need to remove it.

A transitive dependency means a non-prime attribute (not part of the candidate key) depends indirectly on the primary key through another non-prime attribute.
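Removing the dependency means moving HODName into its own table keyed by Department. A minimal DDL sketch using temp tables; the column types and the new HOD name 'Dr. Meena' are made up for illustration.

```sql
CREATE TEMP TABLE Departments (
    Department TEXT PRIMARY KEY,
    HODName    TEXT NOT NULL
);
CREATE TEMP TABLE Students (
    StudentID  TEXT PRIMARY KEY,
    Department TEXT NOT NULL REFERENCES Departments (Department)
);

INSERT INTO Departments VALUES ('IT', 'Dr. Rajan'), ('CS', 'Dr. Priya');
INSERT INTO Students    VALUES ('S001', 'IT'), ('S002', 'CS');

-- A change of HOD now touches exactly one row
UPDATE Departments SET HODName = 'Dr. Meena' WHERE Department = 'IT';

-- The original three-column view is still available through a join
SELECT s.StudentID, s.Department, d.HODName
FROM Students s
JOIN Departments d ON d.Department = s.Department
ORDER BY s.StudentID;
```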

Reference: https://www.youtube.com/watch?v=rBPQ5fg_kiY and Learning with the help of chatGPT

SQL – Postgres – Few Advance Topics

By: Sugirtha
29 December 2024 at 09:31

The order of execution in a SQL query:

FROM and/or JOIN
WHERE
GROUP BY
HAVING
SELECT
DISTINCT
ORDER BY
LIMIT and/or OFFSET
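One practical consequence of this order: a column alias defined in SELECT is not yet visible in WHERE, but it is visible in ORDER BY. A small sketch on inline hypothetical data:

```sql
-- Works: ORDER BY runs after SELECT, so it can use the alias
SELECT salary * 12 AS yearly
FROM (VALUES (1000), (3000)) AS t(salary)
ORDER BY yearly DESC;

-- Fails: WHERE runs before SELECT, so the alias does not exist yet
-- SELECT salary * 12 AS yearly FROM (VALUES (1000)) AS t(salary) WHERE yearly > 5000;
-- ERROR:  column "yearly" does not exist

-- Fix: repeat the expression (or compute it in a subquery/CTE)
SELECT salary * 12 AS yearly
FROM (VALUES (1000), (3000)) AS t(salary)
WHERE salary * 12 > 5000;   -- 36000
```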

Command Types:

Reference: Aysha Beevi

CAST()

CAST(value AS type) converts a value to another data type; PostgreSQL also offers the shorthand value::type.

SELECT 'The current date is: ' || CURRENT_DATE::TEXT;
SELECT '2024-12-21'::DATE::TEXT;
SELECT CAST('2024-12-21' AS DATE);

||: concatenation operator

DATE functions:

SELECT CURRENT_DATE;                     -- Output: 2024-12-21
SELECT CURRENT_TIME;                     -- Output: 09:15:34.123456+05:30
SELECT NOW();                            -- Output: 2024-12-21 09:15:34.123456+05:30
SELECT AGE('2020-01-01', '2010-01-01');  -- Output: 10 years
SELECT AGE('1990-05-15');                -- Output: 34 years 7 mons 6 days (measured from CURRENT_DATE, here 2024-12-21)
SELECT EXTRACT(YEAR FROM NOW());         -- Output: 2024
SELECT EXTRACT(MONTH FROM CURRENT_DATE); -- Output: 12
SELECT EXTRACT(DAY FROM TIMESTAMP '2024-12-25 10:15:00');  -- Output: 25

The DATE_TRUNC() function truncates a date or timestamp to the specified precision: it "resets" the smaller parts of the date/time to their starting values.
SELECT DATE_TRUNC('month', TIMESTAMP '2024-12-21 10:45:30');
-- Output: 2024-12-01 00:00:00 (the 'month' precision resets the day to the 1st and the time to 00:00:00)
SELECT DATE_TRUNC('year', TIMESTAMP '2024-12-21 10:45:30');
-- Output: 2024-01-01 00:00:00
SELECT DATE_TRUNC('day', TIMESTAMP '2024-12-21 10:45:30');
-- Output: 2024-12-21 00:00:00

SELECT NOW() + INTERVAL '1 year';
-- Output: current timestamp + 1 year
SELECT CURRENT_DATE - INTERVAL '30 days';
-- Output: today's date - 30 days (as a timestamp)
SELECT NOW() + INTERVAL '2 hours';
-- Output: current timestamp + 2 hours
SELECT NOW() + INTERVAL '1 year' + INTERVAL '3 months' - INTERVAL '15 days';
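Because CURRENT_DATE and NOW() change every day, here are the same kinds of functions applied to fixed dates, so the results can be checked. The last line shows that adding a month to January 31 lands on the last day of February rather than overflowing into March.

```sql
SELECT AGE(TIMESTAMP '2024-12-21', TIMESTAMP '1990-05-15');  -- 34 years 7 mons 6 days
SELECT DATE_TRUNC('month', TIMESTAMP '2024-12-21 10:45:30'); -- 2024-12-01 00:00:00
SELECT DATE '2024-12-21' - INTERVAL '30 days';              -- 2024-11-21 00:00:00 (a timestamp)
SELECT DATE '2024-12-21' - DATE '2024-11-21';               -- 30 (an integer number of days)
SELECT DATE '2024-01-31' + INTERVAL '1 month';              -- 2024-02-29 00:00:00 (clamped to month end)
```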

Window Functions

A window function operates over a specified window of rows related to the current row. Common window functions include ROW_NUMBER(), RANK(), SUM(), AVG(), etc.

PARTITION BY: (Optional) Divides the result set into partitions to which the window function is applied. Each partition is processed separately.
ORDER BY: (Optional) Orders the rows in each partition before the window function is applied.

window_function() OVER (   -- e.g. RANK() or SUM(column)
    PARTITION BY column_name(s)
    ORDER BY column_name(s)
)

SELECT 
    department_id,
    employee_id,
    salary,
    SUM(salary) OVER (PARTITION BY department_id ORDER BY salary) AS running_total
FROM employees;
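One detail of the running total above: with ORDER BY and no explicit frame, PostgreSQL's default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which includes all peers (rows with the same salary). A sketch on inline hypothetical rows comparing that default with an explicit ROWS frame:

```sql
SELECT name, salary,
       SUM(salary) OVER (ORDER BY salary)                 AS range_total,
       SUM(salary) OVER (ORDER BY salary, name
                         ROWS BETWEEN UNBOUNDED PRECEDING
                                  AND CURRENT ROW)        AS rows_total
FROM (VALUES ('Alice', 50000), ('Frank', 50000), ('Bob', 60000)) AS t(name, salary)
ORDER BY salary, name;
-- Alice | 50000 | 100000 |  50000
-- Frank | 50000 | 100000 | 100000
-- Bob   | 60000 | 160000 | 160000
```

If a strictly row-by-row running total is wanted, use the ROWS frame and add a tiebreaker (here name) to the window's ORDER BY.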

CURSOR:

DO $$
DECLARE
    emp_name   VARCHAR;
    emp_salary DECIMAL;
    emp_cursor CURSOR FOR SELECT name, salary FROM employees;
BEGIN
    OPEN emp_cursor;
    LOOP
        FETCH emp_cursor INTO emp_name, emp_salary;
        EXIT WHEN NOT FOUND;  -- Exit the loop when no rows are left
        RAISE NOTICE 'Employee: %, Salary: %', emp_name, emp_salary;
    END LOOP;
    CLOSE emp_cursor;
END
$$;

Basic Data Types in PostgreSQL

TEXT, VARCHAR, CHAR: Working with strings.
INTEGER, BIGINT, NUMERIC: Handling numbers.
DATE, TIMESTAMP: Date and time handling.

OVER CLAUSE

In PostgreSQL, the OVER() clause turns a function into a window function and defines the window of rows it operates over. For example, ROW_NUMBER() assigns a serial number starting from 1, following the window's ORDER BY (here salary descending). The row_num alias is not available in WHERE, so filter on it in an outer query:
SELECT name, row_num
FROM (SELECT name, ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num
      FROM employees) numbered
WHERE row_num <= 5;

RANK()

Partition the table records by department_id, then inside each partition order by salary descending and rank the rows 1, 2, 3, ... With RANK(), rows with the same salary get the same rank, and the following rank is skipped.

SELECT department_id, name, salary,
RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rank
FROM employees;
Output:
department_id | name    | salary | rank
101           | Charlie | 70,000 | 1
101           | Alice   | 50,000 | 2
101           | Frank   | 50,000 | 2
102           | Eve     | 75,000 | 1
102           | Bob     | 60,000 | 2
103           | David   | 55,000 | 1
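To compare RANK() with its siblings on ties, here is a sketch on inline data based on the department 101 rows above, plus a hypothetical fourth employee, Zara, to show the gap after a tie:

```sql
SELECT name, salary,
       ROW_NUMBER() OVER (ORDER BY salary DESC, name) AS row_num,    -- always 1, 2, 3, ...
       RANK()       OVER (ORDER BY salary DESC)       AS rnk,        -- ties share a rank, then a gap
       DENSE_RANK() OVER (ORDER BY salary DESC)       AS dense_rnk   -- ties share a rank, no gap
FROM (VALUES ('Charlie', 70000), ('Alice', 50000), ('Frank', 50000), ('Zara', 40000))
     AS t(name, salary)
ORDER BY salary DESC, name;
-- Charlie | 70000 | 1 | 1 | 1
-- Alice   | 50000 | 2 | 2 | 2
-- Frank   | 50000 | 3 | 2 | 2
-- Zara    | 40000 | 4 | 4 | 3
```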

  • Divides employees into 3 salary buckets of (nearly) equal size, i.e. tertiles; NTILE(4) would give quartiles.
    SELECT id, name, salary,
    NTILE(3) OVER (ORDER BY salary DESC) AS bucket
    FROM employees;
    id | name    | salary | bucket
    5  | Eve     | 75,000 | 1
    3  | Charlie | 70,000 | 1
    2  | Bob     | 60,000 | 2
    4  | David   | 55,000 | 2
    1  | Alice   | 50,000 | 3
    6  | Frank   | 50,000 | 3
  • Retrieves the name of the top earner in each department (the first row when each department's salaries are sorted in descending order).
    SELECT department_id, name, salary,
    FIRST_VALUE(name) OVER (PARTITION BY department_id ORDER BY salary DESC) AS top_earner
    FROM employees;
    Output:
    department_id | name    | salary | top_earner
    101           | Charlie | 70,000 | Charlie
    101           | Alice   | 50,000 | Charlie
    101           | Frank   | 50,000 | Charlie
    102           | Eve     | 75,000 | Eve
    102           | Bob     | 60,000 | Eve
    103           | David   | 55,000 | David

The FROM clause is evaluated first, then WHERE; window functions are computed only later, along with the SELECT list.

  • So a window function such as RANK() cannot be used (or referenced by its alias) directly in the WHERE clause. Compute it first in a subquery or a CTE (Common Table Expression), which produces a temporary result set, and then filter that result set in the outer query.
  • In the query below, the subquery is evaluated first, and its rank column is then filtered by the outer query's WHERE clause.

Top earner in each department, with name and salary (using the employees table above):
SELECT department_id, name, salary
FROM (
SELECT department_id, name, salary,
RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rank
FROM employees
) ranked_employees
WHERE rank = 1;

department_id | name    | salary
101           | Charlie | 70,000
102           | Eve     | 75,000
103           | David   | 55,000

The same idea written as a CTE, here to find the second-highest salary:

WITH RankedSalaries AS (
SELECT salary, RANK() OVER (ORDER BY salary DESC) AS rank
FROM employees
)
SELECT salary
FROM RankedSalaries WHERE rank = 2;

Here, RankedSalaries is a temporary result set or CTE (Common Table Expression)
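Caveat for the second-highest-salary query: if the top salary is shared, RANK() skips rank 2 and the query returns nothing. A sketch with hypothetical salaries showing that DENSE_RANK() still finds the next distinct value:

```sql
WITH salaries(salary) AS (VALUES (75000), (75000), (70000), (60000)),
ranked AS (
    SELECT salary,
           RANK()       OVER (ORDER BY salary DESC) AS rnk,        -- 1, 1, 3, 4
           DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rnk   -- 1, 1, 2, 3
    FROM salaries
)
SELECT
    (SELECT MAX(salary) FROM ranked WHERE rnk = 2)       AS via_rank,        -- NULL: no rank 2
    (SELECT MAX(salary) FROM ranked WHERE dense_rnk = 2) AS via_dense_rank;  -- 70000
```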

Reference: Learnt from ChatGPT and Picture from Ms.Aysha
