Basic SQL Queries, Stored Proc, Function in PostgreSQL
DDL, DML, DQL Queries:
CREATE TABLE Employees ( EmployeeID INTEGER PRIMARY KEY, Name VARCHAR(50), Age INTEGER, DepartmentID INTEGER, FOREIGN KEY (DepartmentID) REFERENCES Departments(DepartmentID) );
INSERT INTO Employees(empid, ename, age, deptid) VALUES(1, 'Kavi', 32, 101), (2, 'Sugi', 30, 102);
UPDATE Employees SET age=31 WHERE Name='Nila';
DELETE FROM Employees WHERE Name='Nila';
SELECT e.*, d.DepartmentName FROM Employees e JOIN Departments d ON e.DepartmentID = d.DepartmentID;
SELECT e.EmpName AS Employee, m.EmpName AS Manager FROM Employees e JOIN Employees m ON e.ManagerID = m.EmpID;
INNER JOIN
:
- Returns only the rows where there is a match between the columns in both tables.
- If no match is found, the row is not included in the result.
- Itβs the most common type of join.
OUTER JOIN
:
- Returns all rows from one or both tables, even if there is no match in the other table.
LEFT OUTER JOIN
(or justLEFT JOIN
): Returns all rows from the left table, and the matched rows from the right table. If no match, the result will haveNULL
values for columns from the right table.RIGHT OUTER JOIN
(or justRIGHT JOIN
): Returns all rows from the right table, and the matched rows from the left table. If no match, the result will haveNULL
values for columns from the left table.FULL OUTER JOIN
: Returns all rows from both tables. If there is no match, the result will haveNULL
values for the non-matching tableβs columns.
GROUP BY
:
- Groups rows that have the same values in specified columns into summary rows (like finding the total count, sum, average, etc.).
- It is typically used with aggregate functions such as
COUNT()
,SUM()
,AVG()
,MAX()
,MIN()
.
HAVING
:
- Used to filter records after the
GROUP BY
has been applied. - It works similarly to the
WHERE
clause, butWHERE
is used for filtering individual rows before grouping, whileHAVING
filters the grouped results.
SELECT DeptName, COUNT(*) FROM Employees GROUP BY DeptName;
DISTINCT
:
- Used to remove duplicate rows from the result set based on the specified columns.
- If you specify only one column, it will return the distinct values of that column.
- If you specify multiple columns, the combination of values in those columns will be considered to determine uniqueness.
SELECT DISTINCT DeptName FROM Employees; SELECT DISTINCT DeptName, EmpName FROM Employees;
Difference between DELETE
and TRUNCATE:
- Removes rows one by one and logs each deletion, which can be slower for large datasets.
- You can use a
WHERE
clause to specify which rows to delete. - Can be rolled back if youβre working within a transaction (assuming no
COMMIT
has been done). - Can fire triggers if there are any triggers defined on the table (for example,
BEFORE DELETE
orAFTER DELETE
triggers).
TRUNCATE
:
- Removes all rows in the table in one go, without scanning them individually.
- Does not support a
WHERE
clause, so it always deletes all rows. - Itβs much faster than
DELETE
because it doesnβt log individual row deletions (but it does log the deallocation of the tableβs data pages). - Cannot be rolled back in most databases (unless in a transaction, depending on the DBMS), and there are no triggers involved.
UNION
:
- Combines the results of two or more queries.
- Removes duplicates: Only unique rows are included in the final result.
- It performs a sort operation to eliminate duplicates, which can have a slight performance cost.
UNION ALL
:
- Also combines the results of two or more queries.
- Keeps duplicates: All rows from the queries are included in the final result, even if they are the same.
- It doesnβt perform the sort operation, which usually makes it faster than
UNION
.
SELECT EmpID, EmpName FROM Employees UNION ALL SELECT EmpID, EmpName FROM Contractors; SELECT EmpID, EmpName FROM Employees UNION SELECT EmpID, EmpName FROM Contractors;
COALESCE():
First Non null value will be taken, For ex. in select statement, some names are null, that time some default value can be used or another field value.
SELECT COALESCE(NULL, βHelloβ, βWorldβ);
Output: Hello
INSERT INTO users (name, nickname) VALUES
(βAliceβ, NULL),
(NULL, βBobβ),
(NULL, NULL);
SELECT id, COALESCE(name, nickname, βUnknownβ) AS display_name FROM users;
NULLIF()
NULLIF(expression1, expression2)
Returns null if both expressions or column values are equal, else return first the first column value, ie expression1
SELECT NULLIF(10, 10); β Output: NULL
SELECT NULLIF(10, 20); β Output: 10
SELECT NULLIF(10, NULL) OR β Output: 10
SELECT NULLIF(NULL, 10) β Output: NULL
IF Condition:
The IF statement is used to check conditions and execute SQL code accordingly.
IF condition THEN -- Code to execute if the condition is true ELSIF condition THEN -- Code block to execute if another condition is true ELSE -- Code to execute if the condition is false END IF; IF NOT FOUND THEN RAISE NOTICE 'Employee with ID % not found!', emp_id; emp_bonus := 0; END IF;
CASE WHEN:
The CASE WHEN expression is used for conditional logic within a query (similar to IF
but more flexible in SQL).
SELECT name, salary, CASE WHEN salary > 5000 THEN 'High Salary' WHEN salary BETWEEN 3000 AND 5000 THEN 'Average Salary' ELSE 'Low Salary' END AS salary_category FROM employees;
FOR LOOP:
DECLARE i INT; BEGIN FOR i IN 1..5 LOOP -- Perform an action for each iteration (e.g., insert or update a record) INSERT INTO audit_log (action, timestamp) VALUES ('Employee update', NOW()); END LOOP; END;
FOR record IN SELECT column1, column2 FROM employees LOOP -- Code block using record.column1, record.column2 END LOOP;
RAISE β used for printing something (SOP in java)
RAISE NOTICE βEmployee: %, Salary: %β, emp_name, emp_salary;
RAISE EXCEPTION βAn error occurred: %β, error_message; β This will print and halt the execution.
RAISE INFO βEmployee: %, Salary: %β, emp_name, emp_salary;
Stored Procedures in SQL:
A stored procedure is a reusable block of SQL code that performs specific tasks. It is stored in the database and can be called as needed. Stored procedures are used for:
- Modularizing complex SQL logic.
- Improving performance by reducing network traffic.
- Ensuring code reuse and security (by granting permissions to execute rather than to the tables directly).
Example:
A stored procedure to insert a new employee record:
CREATE PROCEDURE add_employee(emp_name VARCHAR, emp_salary NUMERIC) LANGUAGE plpgsql AS $$ BEGIN INSERT INTO employees (name, salary) VALUES (emp_name, emp_salary); END; $$;
Execution:
CALL add_employee(βJohn Doeβ, 50000);
Functions in SQL:
A SQL function is a reusable block of SQL code that performs specific tasks. It is stored in the database and can be called as needed. It is similar to a procedure but returns a single value or table. Functions are typically used for computations or transformations.
Example: A function to calculate the yearly salary:
CREATE FUNCTION calculate_yearly_salary(monthly_salary NUMERIC) RETURNS NUMERIC LANGUAGE plpgsql AS $$ BEGIN RETURN monthly_salary * 12; END; $$;
Execution:
SELECT calculate_yearly_salary(5000); OR EXECUTE calculate_yearly_salary(5000); (If we are using inside a trigger)
Key Differences Between Procedures and Functions:
Return Type:
- Function: Always returns a value.
- Procedure: Does not return a value.
Usage:
- Function: Can be used in SQL queries (e.g., SELECT).
- Procedure: Called using
CALL
, cannot be used in SQL queries.
Transaction Control:
- Function: Cannot manage transactions.
- Procedure: Can manage transactions (e.g., COMMIT, ROLLBACK).
Side Effects:
- Function: Should not have side effects (e.g., modifying data).
- Procedure: Can modify data and have side effects.
Calling Mechanism:
Procedure: Called using CALL procedure_name()
.
Function: Called within SQL expressions, like SELECT function_name()
.
TRIGGER:
A trigger is a special kind of stored procedure that automatically executes (or βfiresβ) when certain events occur in the database, such as INSERT, UPDATE, or DELETE. Triggers can be used to enforce business rules, validate data, or maintain audit logs.
Key Points:
Types of Triggers:
- BEFORE Trigger: Fires before the actual operation (INSERT, UPDATE, DELETE).
- AFTER Trigger: Fires after the actual operation.
- INSTEAD OF Trigger: Used to override the standard operation, useful in views. (This is in SQL Server only not in postgres)
- Trigger Actions: The trigger action can be an operation like logging data, updating related tables, or enforcing data integrity.
- Trigger Events: A trigger can be set to fire on certain events, such as when a row is inserted, updated, or deleted.
- Trigger Scope: Triggers can be defined to act on either a row (executing once for each affected row) or a statement (executing once for the entire statement).
- A trigger can be created to log changes in a Users table whenever a record is updated, or it could prevent deleting a record if certain conditions arenβt met.
Example:
CREATE TRIGGER LogEmployeeAgeUpdate AFTER UPDATE ON Employees FOR EACH ROW BEGIN IF OLD.Age <> NEW.Age THEN INSERT INTO EmployeeLogs (EmployeeID, OldAge, NewAge) VALUES (OLD.EmployeeID, OLD.Age, NEW.Age); END IF; END;
Example:
CREATE OR REPLACE FUNCTION prevent_employee_delete() RETURNS TRIGGER AS $$ BEGIN -- Check if the employee is in a protected department (for example, department_id = 10) IF OLD.department_id = 10 THEN RAISE EXCEPTION 'Cannot delete employee in department 10'; END IF; RETURN OLD; END; $$ LANGUAGE plpgsql; -- Attach the function to a trigger CREATE TRIGGER prevent_employee_delete_trigger BEFORE DELETE ON Employees FOR EACH ROW EXECUTE FUNCTION prevent_employee_delete();
Creates a trigger which is used to log age and related whenever insert, delete, update action on employee rows:
CREATE OR REPLACE FUNCTION log_employee_changes() RETURNS TRIGGER AS $$ BEGIN -- Handle INSERT operation IF (TG_OP = 'INSERT') THEN INSERT INTO EmployeeChangeLog (EmployeeID, OperationType, NewAge, ChangeTime) VALUES (NEW.EmployeeID, 'INSERT', NEW.Age, CURRENT_TIMESTAMP); RETURN NEW; -- Handle UPDATE operation ELSIF (TG_OP = 'UPDATE') THEN INSERT INTO EmployeeChangeLog (EmployeeID, OperationType, OldAge, NewAge, ChangeTime) VALUES (OLD.EmployeeID, 'UPDATE', OLD.Age, NEW.Age, CURRENT_TIMESTAMP); RETURN NEW; -- Handle DELETE operation ELSIF (TG_OP = 'DELETE') THEN INSERT INTO EmployeeChangeLog (EmployeeID, OperationType, OldAge, ChangeTime) VALUES (OLD.EmployeeID, 'DELETE', OLD.Age, CURRENT_TIMESTAMP); RETURN OLD; END IF; RETURN NULL; END; $$ LANGUAGE plpgsql; CREATE TRIGGER log_employee_changes_trigger AFTER INSERT OR UPDATE OR DELETE ON Employees FOR EACH ROW EXECUTE FUNCTION log_employee_changes();
Step 3: Attach the Trigger to the Employees Table
Now that we have the function, we can attach it to the Employees table to log changes. Weβll create a trigger that fires on insert, update, and delete operations.
TG_OP: This is a special variable in PostgreSQL that holds the operation type (either INSERT, UPDATE, or DELETE).
NEW and OLD: These are references to the row being inserted or updated (NEW) or the row before it was updated or deleted (OLD).
EmployeeChangeLog: This table stores the details of the changes (employee ID, operation type, old and new values, timestamp). β Programmer defined.
What happens when you omit FOR EACH ROW
?
- Statement-Level Trigger: The trigger will fire once per SQL statement, regardless of how many rows are affected. This means it wonβt have access to the individual rows being modified.
- For example, if you run an
UPDATE
statement that affects 10 rows, the trigger will fire once (for the statement) rather than for each of those 10 rows.
- For example, if you run an
- No Access to Row-Specific Data: You wonβt be able to use
OLD
orNEW
values to capture the individual rowβs data. The trigger will just execute as a whole, without row-specific actions. - With
FOR EACH ROW
: The trigger works on each row affected, and you can track specific changes (e.g., old vs new values).WithoutFOR EACH ROW
: The trigger fires once per statement and doesnβt have access to specific row data.
CREATE TRIGGER LogEmployeeAgeUpdate AFTER UPDATE ON Employees BEGIN -- Perform some operation, but it won't track individual rows. INSERT INTO AuditLogs (EventDescription) VALUES ('Employees table updated'); END;
NORMALIZATION:
1st NF:
- Each column/attribute should have atomic value or indivisible value, ie only one value.
- Rows should not be repeated, ie unique rows, there is not necessary to have PKey here.
2nd NF:
- Must fulfill the 1st NF. [cadidate key(composite key to form the uniqueness)]
- All non-candidate-key columns should be fully dependent on the each attribute/column of the composite keys to form the cadidate key. For ex. If the DB is in denormalalized form (ie before normalization, all tables and values are together in a single table) and the candidate key is (orderId+ProductId), then the non-key(not part of the candidate key) if you take orderdate, orderedStatus, qty, item_price are not dependent on each part of the candidate key ie it depends only orderId, not ProductId, ProductName are not dependent on Order, like that customer details are not dependent on ProductId. So only related items should be there in a table, so the table is partitioned based on the column values, so that each attribute will depend on its candidate key.
So Products goto separate table, orders separate and customers going to separate table. - Primary key is created based for each separated table and ensure that all non-key columns completely dependent on the primary key. Then the foreign key relationships also established to connect all the tablesis not fullly dependent on.
3rd NF:
- Must fulfill till 2ndNF.
- Remove the transitional dependency (In a decentralized DB, One column value(Order ID) is functionally dependent on another column(Product ID) and OrderId is functionally dependent on the OrderId, so that disturbing one value will affect another row with same column value), so to avoid that separate the table, for Ex. from orders table Sales Peopleβs data is separated.
What is a Transitive Dependency? Letβs break this down with a simple example:
StudentID Department HODName
S001 IT Dr. Rajan
S002 CS Dr. Priya
Primary Key: StudentID
Non-prime attributes: Department, HODName
StudentID β Department (StudentID determines the department).
Department β HODName (Department determines the HOD name). It should be like StudentID only should determine HOD, not the dept. HODName depends indirectly on StudentID through Department.
This is a transitive dependency, and we need to remove it.
A transitive dependency means a non-prime attribute (not part of the candidate key) depends indirectly on the primary key through another non-prime attribute.
Reference: https://www.youtube.com/watch?v=rBPQ5fg_kiY and Learning with the help of chatGPT