Basic SQL Queries, Stored Proc, Function in PostgreSQL

By: Sugirtha
2 January 2025 at 06:17

DDL, DML, DQL Queries:

CREATE TABLE Employees (
    EmployeeID INTEGER PRIMARY KEY, 
    Name VARCHAR(50), 
    Age INTEGER, 
    DepartmentID INTEGER, 
    FOREIGN KEY (DepartmentID) REFERENCES Departments(DepartmentID)
);
INSERT INTO Employees (EmployeeID, Name, Age, DepartmentID) VALUES (1, 'Kavi', 32, 101), (2, 'Sugi', 30, 102);
UPDATE Employees SET age=31 WHERE Name='Nila';
DELETE FROM Employees WHERE Name='Nila';
SELECT e.*, d.DepartmentName 
FROM Employees e 
JOIN Departments d ON e.DepartmentID = d.DepartmentID;

-- Self join: assumes Employees also has a ManagerID column that refers to EmployeeID
SELECT e.Name AS Employee, m.Name AS Manager
FROM Employees e
JOIN Employees m ON e.ManagerID = m.EmployeeID;

INNER JOIN:

  • Returns only the rows where there is a match between the columns in both tables.
  • If no match is found, the row is not included in the result.
  • It’s the most common type of join.

OUTER JOIN:

  • Returns all rows from one or both tables, even if there is no match in the other table.
    • LEFT OUTER JOIN (or just LEFT JOIN): Returns all rows from the left table, and the matched rows from the right table. If no match, the result will have NULL values for columns from the right table.
    • RIGHT OUTER JOIN (or just RIGHT JOIN): Returns all rows from the right table, and the matched rows from the left table. If no match, the result will have NULL values for columns from the left table.
    • FULL OUTER JOIN: Returns all rows from both tables. If there is no match, the result will have NULL values for the non-matching table’s columns.
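
The join types above can be compared side by side with a small sketch. The demo_dept and demo_emp tables and their rows are hypothetical, made up only for this illustration; they are not the Employees and Departments tables used earlier.

```sql
-- Hypothetical demo tables, created as temporary so they vanish with the session.
CREATE TEMP TABLE demo_dept (dept_id INTEGER PRIMARY KEY, dept_name VARCHAR(50));
CREATE TEMP TABLE demo_emp  (emp_id INTEGER PRIMARY KEY, emp_name VARCHAR(50), dept_id INTEGER);

INSERT INTO demo_dept VALUES (101, 'Sales'), (102, 'IT'), (103, 'HR');
INSERT INTO demo_emp  VALUES (1, 'Kavi', 101), (2, 'Sugi', 102), (3, 'Nila', NULL);

-- INNER JOIN: only Kavi and Sugi (Nila has no department, HR has no employee).
SELECT e.emp_name, d.dept_name
FROM demo_emp e
INNER JOIN demo_dept d ON e.dept_id = d.dept_id;

-- LEFT JOIN: all three employees; dept_name is NULL for Nila.
SELECT e.emp_name, d.dept_name
FROM demo_emp e
LEFT JOIN demo_dept d ON e.dept_id = d.dept_id;

-- RIGHT JOIN: all three departments; emp_name is NULL for HR.
SELECT e.emp_name, d.dept_name
FROM demo_emp e
RIGHT JOIN demo_dept d ON e.dept_id = d.dept_id;

-- FULL OUTER JOIN: four rows (Kavi, Sugi, Nila without a department, HR without an employee).
SELECT e.emp_name, d.dept_name
FROM demo_emp e
FULL OUTER JOIN demo_dept d ON e.dept_id = d.dept_id;
```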

GROUP BY:

  • Groups rows that have the same values in specified columns into summary rows (like finding the total count, sum, average, etc.).
  • It is typically used with aggregate functions such as COUNT(), SUM(), AVG(), MAX(), MIN().

HAVING:

  • Used to filter records after the GROUP BY has been applied.
  • It works similarly to the WHERE clause, but WHERE is used for filtering individual rows before grouping, while HAVING filters the grouped results.
SELECT DeptName, COUNT(*)
FROM Employees
GROUP BY DeptName;
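
The difference between WHERE and HAVING is easier to see when both appear in one query. This is a minimal sketch; the demo_staff table, its columns, and the salary figures are assumptions made for illustration.

```sql
-- Hypothetical demo table.
CREATE TEMP TABLE demo_staff (emp_name VARCHAR(50), dept_name VARCHAR(50), salary NUMERIC);

INSERT INTO demo_staff VALUES
  ('Kavi', 'Sales', 4000),
  ('Sugi', 'Sales', 6000),
  ('Nila', 'IT',    5000),
  ('Ravi', 'HR',    2000);

SELECT dept_name, COUNT(*) AS emp_count, AVG(salary) AS avg_salary
FROM demo_staff
WHERE salary >= 3000      -- row filter: applied before grouping (drops Ravi)
GROUP BY dept_name
HAVING COUNT(*) > 1;      -- group filter: applied after grouping (drops IT)
-- Only the Sales group remains, with 2 employees and an average salary of 5000.
```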

DISTINCT:

  • Used to remove duplicate rows from the result set based on the specified columns.
  • If you specify only one column, it will return the distinct values of that column.
  • If you specify multiple columns, the combination of values in those columns will be considered to determine uniqueness.
SELECT DISTINCT DeptName FROM Employees;

SELECT DISTINCT DeptName, EmpName FROM Employees;

Difference between DELETE and TRUNCATE:

DELETE:

  • Removes rows one by one and logs each deletion, which can be slower for large datasets.
  • You can use a WHERE clause to specify which rows to delete.
  • Can be rolled back if you’re working within a transaction (assuming no COMMIT has been done).
  • Can fire triggers if there are any triggers defined on the table (for example, BEFORE DELETE or AFTER DELETE triggers).

TRUNCATE:

  • Removes all rows in the table in one go, without scanning them individually.
  • Does not support a WHERE clause, so it always deletes all rows.
  • It’s much faster than DELETE because it doesn’t log individual row deletions (but it does log the deallocation of the table’s data pages).
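  • In PostgreSQL, TRUNCATE is transaction-safe and can be rolled back inside a transaction; in some other databases (for example MySQL and Oracle) it commits implicitly and cannot be rolled back. It does not fire DELETE triggers (PostgreSQL fires ON TRUNCATE triggers instead, if any are defined).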
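
As a rough check of the rollback behaviour in PostgreSQL, the sketch below runs both statements inside one transaction. The demo_items table is hypothetical, and the session is assumed to be in autocommit mode (the psql default) before BEGIN.

```sql
-- Hypothetical demo table.
CREATE TEMP TABLE demo_items (id INTEGER, label TEXT);
INSERT INTO demo_items VALUES (1, 'a'), (2, 'b'), (3, 'c');

BEGIN;
DELETE FROM demo_items WHERE id = 1;   -- row-level delete with a filter
TRUNCATE demo_items;                   -- removes every remaining row, no WHERE allowed
ROLLBACK;                              -- in PostgreSQL this undoes both statements

SELECT COUNT(*) FROM demo_items;       -- 3: all rows are back
```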

UNION:

  • Combines the results of two or more queries.
  • Removes duplicates: Only unique rows are included in the final result.
  • It has to do extra work (typically sorting or hashing) to eliminate duplicates, which has a performance cost on large results.

UNION ALL:

  • Also combines the results of two or more queries.
  • Keeps duplicates: All rows from the queries are included in the final result, even if they are the same.
  • It skips the duplicate-elimination step, which usually makes it faster than UNION.
SELECT EmpID, EmpName FROM Employees
UNION ALL
SELECT EmpID, EmpName FROM Contractors;

SELECT EmpID, EmpName FROM Employees
UNION 
SELECT EmpID, EmpName FROM Contractors;

COALESCE():

Returns the first non-null value from its arguments. For example, when some names in a SELECT result are NULL, a default value or the value of another column can be used instead.
SELECT COALESCE(NULL, 'Hello', 'World');
Output: Hello

INSERT INTO users (name, nickname) VALUES
('Alice', NULL),
(NULL, 'Bob'),
(NULL, NULL);

SELECT id, COALESCE(name, nickname, 'Unknown') AS display_name FROM users;

NULLIF()

NULLIF(expression1, expression2)
Returns NULL if the two expressions (or column values) are equal; otherwise returns the first one, i.e. expression1.
SELECT NULLIF(10, 10);   -- Output: NULL
SELECT NULLIF(10, 20);   -- Output: 10
SELECT NULLIF(10, NULL); -- Output: 10
SELECT NULLIF(NULL, 10); -- Output: NULL
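
COALESCE and NULLIF are often combined. The sketch below shows two common patterns with literal values chosen only for illustration: guarding against division by zero, and treating an empty string like a missing value.

```sql
-- NULLIF turns a zero divisor into NULL, so the division yields NULL instead of an error;
-- COALESCE then replaces that NULL with a default.
SELECT COALESCE(100 / NULLIF(0, 0), 0) AS safe_ratio;   -- 0
SELECT COALESCE(100 / NULLIF(4, 0), 0) AS safe_ratio;   -- 25

-- NULLIF turns the empty string into NULL, and COALESCE supplies the fallback.
SELECT COALESCE(NULLIF('', ''), 'Unknown') AS display_name;   -- Unknown
```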

IF Condition:

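The IF statement checks conditions and runs code accordingly. In PostgreSQL it is a PL/pgSQL statement, so it can be used only inside functions, procedures, and DO blocks, not in plain SQL queries.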

IF condition THEN
    -- Code to execute if the condition is true
ELSIF condition THEN
    -- Code block to execute if another condition is true
ELSE
    -- Code to execute if the condition is false
END IF;

IF NOT FOUND THEN
    RAISE NOTICE 'Employee with ID % not found!', emp_id;
    emp_bonus := 0;
END IF;
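
To try the IF / ELSIF / ELSE form without creating a function, it can be wrapped in an anonymous DO block. This is only a sketch; the variable emp_salary and its value are hypothetical.

```sql
DO $$
DECLARE
    emp_salary NUMERIC := 4200;   -- hypothetical value, change it to exercise other branches
BEGIN
    IF emp_salary > 5000 THEN
        RAISE NOTICE 'High salary';
    ELSIF emp_salary >= 3000 THEN
        RAISE NOTICE 'Average salary';
    ELSE
        RAISE NOTICE 'Low salary';
    END IF;
END;
$$;
-- With 4200 the block prints: NOTICE:  Average salary
```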

CASE WHEN:

The CASE WHEN expression is used for conditional logic within a query. Unlike IF, which is available only in PL/pgSQL, CASE can be used in plain SQL.

SELECT 
    name,
    salary,
    CASE 
        WHEN salary > 5000 THEN 'High Salary'
        WHEN salary BETWEEN 3000 AND 5000 THEN 'Average Salary'
        ELSE 'Low Salary'
    END AS salary_category
FROM employees;

FOR LOOP:

-- Wrapped in a DO block so it can be run directly; the loop variable i is declared automatically.
DO $$
BEGIN
    FOR i IN 1..5 LOOP
        -- Perform an action for each iteration (e.g., insert or update a record)
        INSERT INTO audit_log (action, timestamp)
        VALUES ('Employee update', NOW());
    END LOOP;
END;
$$;

FOR record IN SELECT column1, column2 FROM employees LOOP
-- Code block using record.column1, record.column2
END LOOP;
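
The query form of the FOR loop needs a variable of type RECORD to hold each row. A minimal runnable sketch follows; the demo_people table and the variable names rec and total are assumptions for illustration.

```sql
-- Hypothetical demo table.
CREATE TEMP TABLE demo_people (name TEXT, salary NUMERIC);
INSERT INTO demo_people VALUES ('Kavi', 4000), ('Sugi', 6000);

DO $$
DECLARE
    rec   RECORD;          -- receives one row of the query per iteration
    total NUMERIC := 0;
BEGIN
    FOR rec IN SELECT name, salary FROM demo_people ORDER BY name LOOP
        RAISE NOTICE 'Employee: %, Salary: %', rec.name, rec.salary;
        total := total + rec.salary;
    END LOOP;
    RAISE NOTICE 'Total salary: %', total;   -- 10000 for the two rows above
END;
$$;
```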

RAISE – used for printing messages (like System.out.println in Java)

RAISE NOTICE 'Employee: %, Salary: %', emp_name, emp_salary;
RAISE EXCEPTION 'An error occurred: %', error_message; -- Prints the message and aborts execution.
RAISE INFO 'Employee: %, Salary: %', emp_name, emp_salary;

Stored Procedures in SQL:

A stored procedure is a reusable block of SQL code that performs specific tasks. It is stored in the database and can be called as needed. Stored procedures are used for:

  • Modularizing complex SQL logic.
  • Improving performance by reducing network traffic.
  • Ensuring code reuse and security (by granting permissions to execute rather than to the tables directly).

Example:

A stored procedure to insert a new employee record:

CREATE PROCEDURE add_employee(emp_name VARCHAR, emp_salary NUMERIC)
LANGUAGE plpgsql AS 
$$ 
BEGIN 
  INSERT INTO employees (name, salary) VALUES (emp_name, emp_salary); 
END; 
$$;

Execution:

CALL add_employee('John Doe', 50000);

Functions in SQL:

A function is also a reusable block of code that is stored in the database and can be called as needed. It is similar to a procedure but returns a value (a single value or a table) and is typically used for computations or transformations.
Example: A function to calculate the yearly salary:

CREATE FUNCTION calculate_yearly_salary(monthly_salary NUMERIC)
RETURNS NUMERIC
LANGUAGE plpgsql AS 
$$
BEGIN
  RETURN monthly_salary * 12;
END;
$$;

Execution:

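SELECT calculate_yearly_salary(5000);
Inside PL/pgSQL code, assign the result to a variable (yearly := calculate_yearly_salary(5000);) or use PERFORM calculate_yearly_salary(5000); to discard it. EXECUTE FUNCTION appears only in CREATE TRIGGER, and only for functions declared with RETURNS TRIGGER.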

Key Differences Between Procedures and Functions:

Return Type:

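  • Function: Returns a value (declare RETURNS void when there is nothing meaningful to return).
  • Procedure: Has no return value, though it can pass results back through INOUT parameters.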

Usage:

  • Function: Can be used in SQL queries (e.g., SELECT).
  • Procedure: Called using CALL, cannot be used in SQL queries.

Transaction Control:

  • Function: Cannot manage transactions.
  • Procedure: Can manage transactions (e.g., COMMIT, ROLLBACK).

Side Effects:

  • Function: Can modify data in PostgreSQL, but is best kept free of side effects when it is called from queries.
  • Procedure: Can modify data and have side effects.

Calling Mechanism:

  • Procedure: Called using CALL procedure_name().
  • Function: Called within SQL expressions, like SELECT function_name().
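
The transaction-control difference can be sketched with a procedure that commits inside a loop, which a function is not allowed to do. The demo_batches table and the load_demo_batches procedure are hypothetical names, and PostgreSQL 11 or later is assumed (the first version with CREATE PROCEDURE).

```sql
-- Hypothetical demo table.
CREATE TABLE IF NOT EXISTS demo_batches (batch_no INTEGER, note TEXT);

CREATE OR REPLACE PROCEDURE load_demo_batches()
LANGUAGE plpgsql AS
$$
BEGIN
  FOR i IN 1..3 LOOP
    INSERT INTO demo_batches (batch_no, note) VALUES (i, 'batch ' || i::text);
    COMMIT;  -- allowed in a procedure: the rows inserted so far become permanent
  END LOOP;
END;
$$;

-- Run this outside an explicit BEGIN ... COMMIT block; otherwise the COMMIT inside fails.
CALL load_demo_batches();

SELECT batch_no, note FROM demo_batches ORDER BY batch_no;
```

The same COMMIT placed inside a function body raises an error at run time, which is why batch-style jobs are usually written as procedures.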

TRIGGER:

A trigger is a special kind of stored procedure that automatically executes (or “fires”) when certain events occur in the database, such as INSERT, UPDATE, or DELETE. Triggers can be used to enforce business rules, validate data, or maintain audit logs.
Key Points:

Types of Triggers:

  • BEFORE Trigger: Fires before the actual operation (INSERT, UPDATE, DELETE).
  • AFTER Trigger: Fires after the actual operation.
  • INSTEAD OF Trigger: Used to override the standard operation. In PostgreSQL it can be defined only on views (and must be FOR EACH ROW); SQL Server also supports it on tables.

  • Trigger Actions: The trigger action can be an operation like logging data, updating related tables, or enforcing data integrity.
  • Trigger Events: A trigger can be set to fire on certain events, such as when a row is inserted, updated, or deleted.
  • Trigger Scope: Triggers can be defined to act on either a row (executing once for each affected row) or a statement (executing once for the entire statement).
  • A trigger can be created to log changes in a Users table whenever a record is updated, or it could prevent deleting a record if certain conditions aren’t met.

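Example (in PostgreSQL the trigger body lives in a separate trigger function):

CREATE OR REPLACE FUNCTION log_employee_age_update()
RETURNS TRIGGER AS
$$
BEGIN
    IF OLD.Age <> NEW.Age THEN
        INSERT INTO EmployeeLogs (EmployeeID, OldAge, NewAge)
        VALUES (OLD.EmployeeID, OLD.Age, NEW.Age);
    END IF;
    RETURN NEW;
END;
$$
LANGUAGE plpgsql;

CREATE TRIGGER LogEmployeeAgeUpdate
AFTER UPDATE ON Employees
FOR EACH ROW
EXECUTE FUNCTION log_employee_age_update();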

Example:

CREATE OR REPLACE FUNCTION prevent_employee_delete()
RETURNS TRIGGER AS 
$$
BEGIN
  -- Check if the employee is in a protected department (for example, DepartmentID = 10)
  IF OLD.DepartmentID = 10 THEN
     RAISE EXCEPTION 'Cannot delete employee in department 10';
  END IF;
  RETURN OLD;
END;
$$ 
LANGUAGE plpgsql;

-- Attach the function to a trigger
CREATE TRIGGER prevent_employee_delete_trigger
BEFORE DELETE ON Employees
FOR EACH ROW
EXECUTE FUNCTION prevent_employee_delete();

The following function and trigger log the age and related details whenever a row in Employees is inserted, updated, or deleted:

CREATE OR REPLACE FUNCTION log_employee_changes()
RETURNS TRIGGER AS 
$$
BEGIN
  -- Handle INSERT operation
  IF (TG_OP = 'INSERT') THEN
    INSERT INTO EmployeeChangeLog (EmployeeID, OperationType, NewAge, ChangeTime)
    VALUES (NEW.EmployeeID, 'INSERT', NEW.Age, CURRENT_TIMESTAMP);
    RETURN NEW;
  -- Handle UPDATE operation
  ELSIF (TG_OP = 'UPDATE') THEN
    INSERT INTO EmployeeChangeLog (EmployeeID, OperationType, OldAge, NewAge, ChangeTime)
    VALUES (OLD.EmployeeID, 'UPDATE', OLD.Age, NEW.Age, CURRENT_TIMESTAMP);
    RETURN NEW;
  -- Handle DELETE operation
  ELSIF (TG_OP = 'DELETE') THEN
    INSERT INTO EmployeeChangeLog (EmployeeID, OperationType, OldAge, ChangeTime)
    VALUES (OLD.EmployeeID, 'DELETE', OLD.Age, CURRENT_TIMESTAMP);
    RETURN OLD;
  END IF;
  RETURN NULL;
END;
$$
LANGUAGE plpgsql;

CREATE TRIGGER log_employee_changes_trigger
AFTER INSERT OR UPDATE OR DELETE 
ON Employees
FOR EACH ROW
EXECUTE FUNCTION log_employee_changes();

Attaching the trigger to the Employees table:

The CREATE TRIGGER statement above attaches the function to the Employees table, so that it fires on insert, update, and delete operations.

TG_OP: This is a special variable in PostgreSQL that holds the operation type (either INSERT, UPDATE, or DELETE).
NEW and OLD: These are references to the row being inserted or updated (NEW) or the row before it was updated or deleted (OLD).
EmployeeChangeLog: This programmer-defined table stores the details of the changes (employee ID, operation type, old and new values, timestamp).
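
The log table itself is not defined in these notes, so here is one possible layout. The column names come from the INSERT statements in log_employee_changes(); the LogID column and all data types are assumptions. The test statements further assume that the Employees table from the first section exists and that log_employee_changes_trigger is the only trigger attached to it.

```sql
-- Assumed layout of the programmer-defined log table.
CREATE TABLE EmployeeChangeLog (
    LogID         SERIAL PRIMARY KEY,   -- hypothetical surrogate key, used here for ordering
    EmployeeID    INTEGER,
    OperationType VARCHAR(10),          -- 'INSERT', 'UPDATE' or 'DELETE' (the value of TG_OP)
    OldAge        INTEGER,              -- NULL for INSERT
    NewAge        INTEGER,              -- NULL for DELETE
    ChangeTime    TIMESTAMP
);

-- Exercise the trigger once per operation type.
INSERT INTO Employees (EmployeeID, Name, Age) VALUES (10, 'Nila', 30);
UPDATE Employees SET Age = 31 WHERE EmployeeID = 10;
DELETE FROM Employees WHERE EmployeeID = 10;

-- Expect three rows for employee 10: INSERT (NULL, 30), UPDATE (30, 31), DELETE (31, NULL).
SELECT EmployeeID, OperationType, OldAge, NewAge
FROM EmployeeChangeLog
ORDER BY LogID;
```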

What happens when you omit FOR EACH ROW?

  1. Statement-Level Trigger: The trigger will fire once per SQL statement, regardless of how many rows are affected. This means it won’t have access to the individual rows being modified.
    • For example, if you run an UPDATE statement that affects 10 rows, the trigger will fire once (for the statement) rather than for each of those 10 rows.
  2. No Access to Row-Specific Data: You won’t be able to use OLD or NEW values to capture the individual row’s data. The trigger will just execute as a whole, without row-specific actions.
  3. In short:
    • With FOR EACH ROW: The trigger works on each row affected, and you can track specific changes (e.g., old vs new values).
    • Without FOR EACH ROW: The trigger fires once per statement and doesn’t have access to specific row data.
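CREATE OR REPLACE FUNCTION log_employees_update()
RETURNS TRIGGER AS $$
BEGIN
    -- Runs once per statement, so it cannot track individual rows.
    INSERT INTO AuditLogs (EventDescription)
    VALUES ('Employees table updated');
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER LogEmployeesTableUpdate
AFTER UPDATE ON Employees
EXECUTE FUNCTION log_employees_update();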

NORMALIZATION:

1st NF:
  1. Each column/attribute should hold an atomic (indivisible) value, i.e. only one value.
  2. Rows should not be repeated, i.e. every row is unique; a primary key is not strictly necessary at this stage.
2nd NF:
  1. Must fulfill the 1st NF and have a candidate key (possibly a composite key whose columns together form the uniqueness).
  2. Every non-key column (one that is not part of the candidate key) should depend on the whole candidate key, not on just one part of it. For example, take a denormalized table (before normalization, with all values together in a single table) whose candidate key is (OrderId + ProductId). OrderDate and OrderStatus depend only on OrderId, ProductName depends only on ProductId, and customer details do not depend on ProductId at all. Only columns that describe one product within one order, such as qty and the item_price charged for that line, depend on the whole key. So the table is split so that each table holds only related columns and every column depends on that table’s key.
    So products go to a separate table, orders to another, and customers to another.
  3. A primary key is created for each separated table, making sure that all non-key columns depend completely on it. Foreign key relationships are then established to connect the tables.
3rd NF:
  1. Must fulfill the 2nd NF.
  2. Remove transitive dependencies. In a denormalized table, a non-key column may depend on another non-key column rather than directly on the key, so the same fact is repeated in many rows and changing it in one row leaves the other rows inconsistent. To avoid that, move those columns to a separate table; for example, the salespeople’s data is separated from the orders table.
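
The 2nd NF split described above could look roughly like this. All table names, column names, and data types are assumptions made for illustration; in particular, item_price is treated here as the price charged on one order line.

```sql
-- Customer details depend on neither OrderId nor ProductId.
CREATE TABLE customers (
    customer_id   INTEGER PRIMARY KEY,
    customer_name VARCHAR(50)
);

-- Columns that depend only on ProductId.
CREATE TABLE products (
    product_id   INTEGER PRIMARY KEY,
    product_name VARCHAR(50)
);

-- Columns that depend only on OrderId.
CREATE TABLE orders (
    order_id     INTEGER PRIMARY KEY,
    customer_id  INTEGER REFERENCES customers(customer_id),
    order_date   DATE,
    order_status VARCHAR(20)
);

-- Columns that depend on the whole composite key (OrderId + ProductId).
CREATE TABLE order_items (
    order_id   INTEGER REFERENCES orders(order_id),
    product_id INTEGER REFERENCES products(product_id),
    qty        INTEGER,
    item_price NUMERIC,
    PRIMARY KEY (order_id, product_id)
);
```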

What is a Transitive Dependency? Let’s break this down with a simple example:
StudentID   Department   HODName
S001        IT           Dr. Rajan
S002        CS           Dr. Priya

Primary Key: StudentID
Non-prime attributes: Department, HODName

StudentID → Department (StudentID determines the department).
Department → HODName (Department determines the HOD name). So HODName depends on StudentID only indirectly, through Department.

This is a transitive dependency, and we need to remove it by moving Department and HODName into their own table, keeping only Department in the student table.

A transitive dependency means a non-prime attribute (not part of the candidate key) depends indirectly on the primary key through another non-prime attribute.
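
For the student example, removing the transitive dependency might be sketched as follows. The table names students and student_departments and the column types are assumptions; the rows are the two from the table above.

```sql
-- Department -> HODName now lives in its own table.
CREATE TABLE student_departments (
    department VARCHAR(20) PRIMARY KEY,
    hod_name   VARCHAR(50)
);

-- StudentID -> Department stays with the student.
CREATE TABLE students (
    student_id VARCHAR(10) PRIMARY KEY,
    department VARCHAR(20) REFERENCES student_departments(department)
);

INSERT INTO student_departments VALUES ('IT', 'Dr. Rajan'), ('CS', 'Dr. Priya');
INSERT INTO students VALUES ('S001', 'IT'), ('S002', 'CS');

-- The original three-column view is recovered with a join.
SELECT s.student_id, s.department, d.hod_name
FROM students s
JOIN student_departments d ON s.department = d.department
ORDER BY s.student_id;
```

With this layout, a change of HOD is a single-row update in student_departments instead of an update to every student row of that department.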

Reference: https://www.youtube.com/watch?v=rBPQ5fg_kiY and Learning with the help of chatGPT
