
Locust ep 5: How to use test_start and test_stop Events in Locust

21 November 2024 at 04:30

Locust provides powerful event hooks, such as test_start and test_stop, to execute custom logic before a load test begins and after it ends. These events let you implement setup and teardown operations at the test level, applying to the entire test run rather than to individual users.

In this blog, we will:

  1. Understand what test_start and test_stop are.
  2. Explore their use cases.
  3. Provide examples of implementing these events.
  4. Discuss how to run and validate the setup.

What Are test_start and test_stop?

  • test_start: Triggered when the test starts. Use this event to perform actions like initializing global resources, starting external systems, or logging test start information.
  • test_stop: Triggered when the test ends. This event is ideal for cleanup operations, aggregating results, or stopping external systems.

These events are global and apply to the entire test environment rather than individual user instances.

Why Use test_start and test_stop?

  • Global Setup: Initialize shared resources, like database connections or external services.
  • Logging: Record timestamps or test details for audit or reporting purposes.
  • External System Management: Start/stop services that the test depends on, such as mock servers or third-party APIs.

Example: Basic Usage of test_start and test_stop

Here’s a basic example demonstrating the usage of these events:


from locust import User, task, between, events
from datetime import datetime

# Global setup: Perform actions at test start
@events.test_start.add_listener
def on_test_start(environment, **kwargs):
    print("Test started at:", datetime.now())

# Global teardown: Perform actions at test stop
@events.test_stop.add_listener
def on_test_stop(environment, **kwargs):
    print("Test stopped at:", datetime.now())

# Simulated user behavior
class MyUser(User):
    wait_time = between(1, 5)

    @task
    def print_datetime(self):
        """Task that prints the current datetime."""
        print("Current datetime:", datetime.now())

Running the Example

  • Save the code as locustfile.py.
  • Start Locust -> `locust -f locustfile.py`
  • Configure the test parameters (number of users, spawn rate, etc.) in the web UI at http://localhost:8089.
  • Observe the console output:
    • A message when the test starts (on_test_start).
    • Messages during the test as users execute tasks.
    • A message when the test stops (on_test_stop).
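
You can verify the same behavior without the web UI by running Locust in headless mode; a minimal sketch (the user count, spawn rate, and run time below are arbitrary):

locust -f locustfile.py --headless -u 10 -r 2 --run-time 30s

Both event messages should appear in the console: on_test_start fires immediately, and on_test_stop fires when the run time elapses.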

Example: Logging Test Details

You can log detailed test information, like the number of users and the host under test, using the environment object and kwargs:


from locust import User, task, between, events

@events.test_start.add_listener
def on_test_start(environment, **kwargs):
    print("Test started!")
    print(f"Target host: {environment.host}")
    print(f"Total users: {environment.runner.target_user_count}")

@events.test_stop.add_listener
def on_test_stop(environment, **kwargs):
    print("Test finished!")
    print("Summary:")
    print(f"Requests completed: {environment.stats.total.num_requests}")
    print(f"Failures: {environment.stats.total.num_failures}")

class MyUser(User):
    wait_time = between(1, 5)

    @task
    def dummy_task(self):
        pass

Observing the Results

When you run the above examples:

  • At Test Start: Look for messages indicating setup actions, like initializing external systems or printing start time.
  • During the Test: Observe user tasks being executed.
  • At Test Stop: Verify that cleanup actions were executed successfully.


Locust ep 4: Why on_start and on_stop are Essential for Locust Users

19 November 2024 at 04:30

Locust provides two special methods, on_start and on_stop, to handle setup and teardown actions for individual users. These methods allow you to execute specific code when a simulated user starts or stops, making it easier to simulate real-world scenarios like login/logout or initialization tasks.

In this blog, we’ll cover:

  1. What on_start and on_stop do.
  2. Why they are important.
  3. Practical examples of using these methods.
  4. Running and testing Locust scripts.

What Are on_start and on_stop?

  • on_start: This method is executed once when a new simulated user starts. It’s commonly used for tasks like logging in or setting up the environment.
  • on_stop: This method is executed once when a simulated user stops. It’s often used for cleanup tasks like logging out.

These methods are executed only once per user during the lifecycle of a test, as opposed to tasks that are run repeatedly.

Why Use on_start and on_stop?

  1. Simulating Real User Behavior: Real users often start a session with an action (e.g., login) and end it with another (e.g., logout).
  2. Initial Setup: Some tasks require initializing data or setting up user state before performing other actions.
  3. Cleanup: Ensure that actions like logout are performed to leave the system in a clean state.

Examples

Basic Usage of on_start and on_stop

In this example, we simply print messages in `on_start` and `on_stop` for each user while running a task.


from locust import User, task, between
from datetime import datetime


class MyUser(User):

    wait_time = between(1, 5)

    def on_start(self):
        print("on start")

    def on_stop(self):
        print("on stop")

    @task
    def print_datetime(self):
        print(datetime.now())
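
A more realistic variant pairs on_start with a login and on_stop with a logout, the pattern these methods are designed for. A minimal sketch, assuming a hypothetical app that exposes /login and /logout endpoints:

from locust import HttpUser, task, between

class AuthenticatedUser(HttpUser):
    wait_time = between(1, 5)

    def on_start(self):
        # Runs once when this simulated user starts: log in (hypothetical endpoint)
        self.client.post("/login", json={"username": "test", "password": "secret"})

    def on_stop(self):
        # Runs once when this simulated user stops: log out (hypothetical endpoint)
        self.client.post("/logout")

    @task
    def view_profile(self):
        self.client.get("/profile")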

Locust EP 3: Simulating Multiple User Types in Locust

18 November 2024 at 04:30

Locust allows you to define multiple user types in your load tests, enabling you to simulate different user behaviors and traffic patterns. This is particularly useful when your application serves diverse client types, such as web and mobile users, each with unique interaction patterns.

In this blog, we will:

  1. Discuss the concept of multiple user types in Locust.
  2. Explore how to implement multiple user classes with weights.
  3. Run and analyze the test results.

Why Use Multiple User Types?

In real-world applications, different user groups interact with your system differently. For example,

  • Web Users might spend more time browsing through the UI.
  • Mobile Users could make faster but more frequent requests.

By simulating distinct user types with varying behaviors, you can identify performance bottlenecks across all client groups.

Understanding User Classes and Weights

Locust provides the ability to define user classes by extending the User or HttpUser base class. Each user class can:

  • Have a unique set of tasks.
  • Define its own wait times.
  • Be assigned a weight, which determines the proportion of that user type in the simulation.

For example, if WebUser has a weight of 1 and MobileUser has a weight of 2, the simulation will spawn 1 web user for every 2 mobile users.

Example: Simulating Web and Mobile Users

Below is an example Locust test with two user types:


from locust import User, task, between

# Define a user class for web users
class MyWebUser(User):
    wait_time = between(1, 3)  # Web users wait between 1 and 3 seconds between tasks
    weight = 1  # Web users are less frequent

    @task
    def login_url(self):
        print("I am logging in as a Web User")


# Define a user class for mobile users
class MyMobileUser(User):
    wait_time = between(1, 3)  # Mobile users wait between 1 and 3 seconds
    weight = 2  # Mobile users are more frequent

    @task
    def login_url(self):
        print("I am logging in as a Mobile User")

How Locust Uses Weights

With the above configuration:

  • For every 3 users spawned, 1 will be a Web User, and 2 will be Mobile Users (based on their weights: 1 and 2).

Locust automatically handles spawning these users in the specified ratio.
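
For instance, if you start the test with 30 total users (as in the parameters below), Locust will spawn roughly 10 MyWebUser and 20 MyMobileUser instances, matching the 1:2 weight ratio.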

Running the Locust Test

  1. Save the Code
    Save the above code in a file named locustfile.py.
  2. Start Locust
    Open your terminal and run `locust -f locustfile.py`
  3. Access the Web UI
    Open http://localhost:8089 in your browser.
  4. Enter Test Parameters
    • Number of users (e.g., 30).
    • Spawn rate (e.g., 5 users per second).
    • Host: If you are testing an actual API or website, specify its URL (e.g., http://localhost:8000).
  5. Analyze Results
    • Observe how Locust spawns the users according to their weights and tracks metrics like request counts and response times.

After running the test:

  • Check the distribution of requests to ensure it matches the weight ratio (e.g., for every 1 web user request, there should be ~2 mobile user requests).
  • Use the metrics (response time, failure rate) to evaluate performance for each user type.

Locust EP 2: Understanding Locust Wait Times with Complete Examples

17 November 2024 at 07:43

Locust is an excellent load testing tool, enabling developers to simulate concurrent user traffic on their applications. One of its powerful features is wait times, which simulate the realistic user think time between consecutive tasks. By customizing wait times, you can emulate user behavior more effectively, making your tests reflect actual usage patterns.

In this blog, we’ll cover:

  1. What wait times are in Locust.
  2. Built-in wait time options.
  3. Creating custom wait times.
  4. A full example with instructions to run the test.

What Are Wait Times in Locust?

In real-world scenarios, users don’t interact with applications continuously. After performing an action (e.g., submitting a form), they often pause before the next action. This pause is called a wait time in Locust, and it plays a crucial role in mimicking real-life user behavior.

Locust provides several ways to define these wait times within your test scenarios.

FastAPI App Overview

Here’s the FastAPI app that we’ll test:


from fastapi import FastAPI

# Create a FastAPI app instance
app = FastAPI()

# Define a route with a GET method
@app.get("/")
def read_root():
    return {"message": "Welcome to FastAPI!"}

@app.get("/items/{item_id}")
def read_item(item_id: int, q: str = None):
    return {"item_id": item_id, "q": q}

Locust Examples for FastAPI

1. Constant Wait Time Example

Here, we’ll simulate constant pauses between user requests:


from locust import HttpUser, task, constant

class FastAPIUser(HttpUser):
    wait_time = constant(2)  # Wait for 2 seconds between requests

    @task
    def get_root(self):
        self.client.get("/")  # Simulates a GET request to the root endpoint

    @task
    def get_item(self):
        self.client.get("/items/42?q=test")  # Simulates a GET request with path and query parameters

2. Between Wait Time Example

Simulating random pauses between requests.


from locust import HttpUser, task, between

class FastAPIUser(HttpUser):
    wait_time = between(1, 5)  # Random wait time between 1 and 5 seconds

    @task(3)  # Weighted task: this runs 3 times more often
    def get_root(self):
        self.client.get("/")

    @task(1)
    def get_item(self):
        self.client.get("/items/10?q=locust")

3. Custom Wait Time Example

Using a custom wait time function to introduce more complex user behavior:


import random
from locust import HttpUser, task

def custom_wait():
    return max(1, random.normalvariate(3, 1))  # Normal distribution (mean: 3s, stddev: 1s)

class FastAPIUser(HttpUser):
    wait_time = custom_wait

    @task
    def get_root(self):
        self.client.get("/")

    @task
    def get_item(self):
        self.client.get("/items/99?q=custom")


Full Test Example

Combining all the above elements, here’s a complete Locust test for your FastAPI app.


from locust import HttpUser, task, between
import random

# Custom wait time function
def custom_wait():
    return max(1, random.uniform(1, 3))  # Random wait time between 1 and 3 seconds

class FastAPIUser(HttpUser):
    wait_time = custom_wait  # Use the custom wait time

    @task(3)
    def browse_homepage(self):
        """Simulates browsing the root endpoint."""
        self.client.get("/")

    @task(1)
    def browse_item(self):
        """Simulates fetching an item with ID and query parameter."""
        item_id = random.randint(1, 100)
        self.client.get(f"/items/{item_id}?q=test")

Running Locust for FastAPI

  1. Run Your FastAPI App
    Save the FastAPI app code in a file (e.g., main.py) and start the server:

uvicorn main:app --reload

By default, the app will run on http://127.0.0.1:8000.

2. Run Locust
Save the Locust file as locustfile.py and start Locust.


locust -f locustfile.py

3. Configure Locust
Open http://localhost:8089 in your browser and enter:

  • Host: http://127.0.0.1:8000
  • Number of users and spawn rate based on your testing requirements.

4. Run in Headless Mode (Optional)
Use the following command to run Locust in headless mode:


locust -f locustfile.py --headless -u 50 -r 10 --host http://127.0.0.1:8000

-u 50: Simulate 50 users.

-r 10: Spawn 10 users per second.

Postgres – Write-Ahead Logging (WAL) in PostgreSQL

16 November 2024 at 07:06

Write-Ahead Logging (WAL) is a fundamental feature of PostgreSQL, ensuring data integrity and facilitating critical functionalities like crash recovery, replication, and backup.

This series of experiments explores WAL in detail: why it matters, how it works, and examples that demonstrate its usage.

What is Write-Ahead Logging (WAL)?

WAL is a logging mechanism where changes to the database are first written to a log file before being applied to the actual data files. This ensures that in case of a crash or unexpected failure, the database can recover and replay these logs to restore its state.

Your question is right! Why do we need WAL when we already take periodic backups?

Write-Ahead Logging (WAL) is critical even when periodic backups are in place because it complements backups to provide data consistency, durability, and flexibility in the following scenarios.

1. Crash Recovery

  • Why It’s Important: Periodic backups only capture the database state at specific intervals. If a crash occurs after the latest backup, all changes made since that backup would be lost.
  • Role of WAL: WAL ensures that any committed transactions not yet written to data files (due to PostgreSQL’s lazy-writing behavior) are recoverable. During recovery, PostgreSQL replays the WAL logs to restore the database to its last consistent state, bridging the gap between the last checkpoint and the crash.

Example:

  • Backup Taken: At 12:00 PM.
  • Crash Occurs: At 1:30 PM.
  • Without WAL: All changes after 12:00 PM are lost.
  • With WAL: All changes up to 1:30 PM are recovered.
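
You can watch WAL advancing during normal operation. A quick illustrative check (pg_current_wal_lsn() exists in PostgreSQL 10+; CHECKPOINT requires superuser privileges):

SELECT pg_current_wal_lsn();  -- current write-ahead log write position
CHECKPOINT;                   -- flush dirty pages from memory to the data files

Running the SELECT before and after a few inserts shows the WAL position moving forward even before any checkpoint has written the changes to the data files.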

2. Point-in-Time Recovery (PITR)

  • Why It’s Important: Periodic backups restore the database to the exact time of the backup. However, this may not be sufficient if you need to recover to a specific point, such as just before a mistake (e.g., accidental data deletion).
  • Role of WAL: WAL records every change, enabling you to replay transactions up to a specific time. This allows fine-grained recovery beyond what periodic backups can provide.

Example:

  • Backup Taken: At 12:00 AM.
  • Mistake Made: At 9:45 AM, an important table is accidentally dropped.
  • Without WAL: Restore only to 12:00 AM, losing 9 hours and 45 minutes of data.
  • With WAL: Restore to 9:44 AM, recovering all valid changes except the accidental drop.
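
Under the hood, PITR is a base backup plus archived WAL replayed up to a target time. A minimal sketch of the relevant settings (real parameter names, PostgreSQL 12+; the archive path /mnt/wal_archive is hypothetical):

restore_command = 'cp /mnt/wal_archive/%f %p'
recovery_target_time = '2024-11-16 09:44:00'

With these set in postgresql.conf and an empty recovery.signal file in the data directory, PostgreSQL replays WAL from the archive and stops just before the accidental drop.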

3. Replication and High Availability

  • Why It’s Important: In a high-availability setup, replicas must stay synchronized with the primary database to handle failovers. Periodic backups cannot provide real-time synchronization.
  • Role of WAL: WAL enables streaming replication by transmitting logs to replicas, ensuring near real-time synchronization.

Example:

  • A primary database sends WAL logs to replicas as changes occur. If the primary fails, a replica can quickly take over without data loss.
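
A minimal sketch of the settings involved (the parameter names are real; the hostname and credentials are placeholders):

# On the primary (postgresql.conf)
wal_level = replica
max_wal_senders = 5

# On the replica (postgresql.conf, with a standby.signal file present)
primary_conninfo = 'host=primary.example.com user=replicator password=secret'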

4. Handling Incremental Changes

  • Why It’s Important: Periodic backups store complete snapshots of the database, which can be time-consuming and resource-intensive. They also do not capture intermediate changes.
  • Role of WAL: WAL allows incremental updates by recording only the changes made since the last backup or checkpoint. This is crucial for efficient data recovery and backup optimization.

5. Ensuring Data Durability

  • Why It’s Important: Even during normal operations, a database crash (e.g., power failure) can occur. Without WAL, transactions committed by users but not yet flushed to disk are lost.
  • Role of WAL: WAL ensures durability by logging all changes before acknowledging transaction commits. This guarantees that committed transactions are recoverable even if the system crashes before flushing the changes to data files.
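
Two settings worth checking here (the values shown are the usual defaults):

SHOW wal_level;            -- 'replica' on modern PostgreSQL
SHOW synchronous_commit;   -- 'on': a commit is acknowledged only after its WAL is flushed to disk

With synchronous_commit = on, committed transactions survive a crash even if their data pages were never flushed, which is exactly the durability guarantee described above.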

6. Supporting Hot Backups

  • Why It’s Important: For large, active databases, taking a backup while the database is running can result in inconsistent snapshots.
  • Role of WAL: WAL ensures consistency by recording changes that occur during the backup process. When replayed, these logs synchronize the backup, ensuring it is valid and consistent.

7. Debugging and Auditing

  • Why It’s Important: Periodic backups are static snapshots and don’t provide a record of what happened in the database between backups.
  • Role of WAL: WAL contains a sequential record of all database modifications, which can help in debugging issues or auditing transactions.

| Feature | Periodic Backups | Write-Ahead Logging |
| --- | --- | --- |
| Crash Recovery | Limited to the last backup | Ensures full recovery to the crash point |
| Point-in-Time Recovery | Restores only to the backup time | Allows recovery to any specific point |
| Replication | Not supported | Enables real-time replication |
| Efficiency | Full snapshot | Incremental changes |
| Durability | Relies on backup frequency | Guarantees transaction durability |

In upcoming sessions, we will experiment with each of these failure scenarios to understand them better.

PostgreSQL Tasks:

By: Sakthivel
4 September 2024 at 18:03

The table I created is named "movies".

TASK 1: Show details of movies Vijay acted in.

SELECT * FROM movies WHERE actor='vijay';


TASK 2: Show Vijay's movie names only.

SELECT movie FROM movies WHERE actor='vijay';


TASK 3: Show details where lokesh kanagaraj directed the movie and the actor's name starts with 'k' and ends with a vowel [a,e,i,o,u].

SELECT * FROM movies WHERE director='lokesh kanagaraj' AND actor LIKE 'k%' AND actor SIMILAR TO '%[aeiou]';


TASK 4: Show movie names only, where lokesh kanagaraj directed the movie and the actor's name starts with 'k' and ends with a vowel [a,e,i,o,u].

SELECT movie FROM movies WHERE director='lokesh kanagaraj' AND actor LIKE 'k%' AND actor SIMILAR TO '%[aeiou]';


TASK 5: Show movie and actor names only, where lokesh kanagaraj directed the movie and the actor's name starts with 'k' and ends with a vowel [a,e,i,o,u].

SELECT movie,actor FROM movies WHERE director='lokesh kanagaraj' AND actor LIKE 'k%' AND actor SIMILAR TO '%[aeiou]';


TASK 6: Show details where the director's name starts and ends with the same letter.

SELECT * FROM movies WHERE LEFT(director, 1)=RIGHT(director, 1);


TASK 7: Show movie names only, where the director's name starts and ends with the same letter.

SELECT movie FROM movies WHERE LEFT(director, 1)=RIGHT(director, 1);


TASK 8: Show director names only, where the director's name starts and ends with the same letter.

SELECT director FROM movies WHERE LEFT(director, 1)=RIGHT(director, 1);


TASK 9: Show movie names where the actor's name has exactly five characters.

SELECT movie FROM movies WHERE LENGTH(actor)=5;


TASK 10: Show movie and actor names where the actor's name has exactly five characters.

SELECT movie,actor FROM movies WHERE LENGTH(actor)=5;

TASK 11: Add a salary_in_crore column and populate it.

ALTER TABLE movies ADD COLUMN salary_in_crore INT;

UPDATE movies SET salary_in_crore = 100 WHERE actor = 'vijay';

UPDATE movies SET salary_in_crore = 70 WHERE actor = 'kamal';

UPDATE movies SET salary_in_crore = 90 WHERE actor = 'ajith';

UPDATE movies SET salary_in_crore = 40 WHERE actor = 'karthi';

UPDATE movies SET salary_in_crore = 110 WHERE actor = 'rajini';

UPDATE movies SET salary_in_crore = 50 WHERE actor = 'dhanush';

UPDATE movies SET salary_in_crore = 5 WHERE actor = 'soori';

SELECT * FROM movies;


TASK 12: Show actor name and salary, ordered by salary from high to low.

SELECT actor, salary_in_crore FROM movies ORDER BY salary_in_crore DESC;


TASK 13: Show the top 3 actors based on salary.

SELECT DISTINCT actor, salary_in_crore FROM movies ORDER BY salary_in_crore DESC LIMIT 3;


To import a .CSV/Excel file into a PostgreSQL table:

By: Sakthivel
4 September 2024 at 15:11

Insert values from a .csv file

Step 1: Create a .csv or .xlsx file

Step 2: Add the values to the .csv or .xlsx file

Step 3: Copy the file to the location where PostgreSQL is installed:

-> "/var/lib/postgresql" and paste the file

Step 4: Go to the terminal and connect to the database.

Step 5: Create a table in the database with the columns given in the file header.

Step 6: Enter the following query in psql:


-> COPY table_name FROM '/var/lib/postgresql/file_name.csv' WITH CSV HEADER;


Here, "movies" is the table name, and the path points to where the "movies.csv" file is present.

9 rows copied.

Step 7: Then, write a SELECT query to fetch all the rows.

The data has been successfully imported into the PostgreSQL database from the CSV/Excel file.
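
Putting the steps together, a minimal end-to-end sketch (the column names are assumed to match the movies.csv header):

CREATE TABLE movies (movie TEXT, actor TEXT, director TEXT);
COPY movies FROM '/var/lib/postgresql/movies.csv' WITH CSV HEADER;
SELECT * FROM movies;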


Different Database Models

23 August 2024 at 01:50

Database models define the structure, relationships, and operations that can be performed on a database. Different database models are used based on the specific needs of an application or organization. Here are the most common types of database models:

1. Hierarchical Database Model

  • Structure: Data is organized in a tree-like structure with a single root, where each record has a single parent but can have multiple children.
  • Usage: Best for applications with a clear hierarchical relationship, like organizational structures or file systems.
  • Example: IBM’s Information Management System (IMS).
  • Advantages: Fast access to data through parent-child relationships.
  • Disadvantages: Rigid structure; difficult to reorganize or restructure.

2. Network Database Model

  • Structure: Data is organized in a graph structure, where each record can have multiple parent and child records, forming a network of relationships.
  • Usage: Useful for complex relationships, such as in telecommunications or transportation networks.
  • Example: Integrated Data Store (IDS).
  • Advantages: Flexible representation of complex relationships.
  • Disadvantages: Complex design and navigation; can be difficult to maintain.

3. Relational Database Model

  • Structure: Data is organized into tables (relations) where each table consists of rows (records) and columns (fields). Relationships between tables are managed through keys.
  • Usage: Widely used in various applications, including finance, retail, and enterprise software.
  • Example: MySQL, PostgreSQL, Oracle Database, Microsoft SQL Server.
  • Advantages: Simplicity, data integrity, flexibility in querying through SQL.
  • Disadvantages: Can be slower for very large datasets or highly complex queries.

4. Object-Oriented Database Model

  • Structure: Data is stored as objects, similar to objects in object-oriented programming. Each object contains both data and methods for processing the data.
  • Usage: Suitable for applications that require the modeling of complex data and relationships, such as CAD, CAM, and multimedia databases.
  • Example: db4o, ObjectDB.
  • Advantages: Seamless integration with object-oriented programming languages, reusability of objects.
  • Disadvantages: Complexity, not as widely adopted as relational databases.

5. Document-Oriented Database Model

  • Structure: Data is stored in document collections, with each document being a self-contained piece of data often in JSON, BSON, or XML format.
  • Usage: Ideal for content management systems, real-time analytics, and big data applications.
  • Example: MongoDB, CouchDB.
  • Advantages: Flexible schema design, scalability, ease of storing hierarchical data.
  • Disadvantages: May require denormalization, leading to potential data redundancy.

6. Key-Value Database Model

  • Structure: Data is stored as key-value pairs, where each key is unique, and the value can be a string, number, or more complex data structure.
  • Usage: Best for applications requiring fast access to simple data, such as caching, session management, and real-time analytics.
  • Example: Redis, DynamoDB, Riak.
  • Advantages: High performance, simplicity, scalability.
  • Disadvantages: Limited querying capabilities, lack of complex relationships.

7. Column-Family Database Model

  • Structure: Data is stored in columns rather than rows, with each column family containing a set of columns that are logically related.
  • Usage: Suitable for distributed databases, handling large volumes of data across multiple servers.
  • Example: Apache Cassandra, HBase.
  • Advantages: High write and read performance, efficient storage of sparse data.
  • Disadvantages: Complexity in design and maintenance, not as flexible for ad-hoc queries.

8. Graph Database Model

  • Structure: Data is stored as nodes (entities) and edges (relationships) forming a graph. Each node represents an object, and edges represent the relationships between objects.
  • Usage: Ideal for social networks, recommendation engines, fraud detection, and any scenario where relationships between entities are crucial.
  • Example: Neo4j, Amazon Neptune.
  • Advantages: Efficient traversal and querying of complex relationships, flexible schema.
  • Disadvantages: Not as efficient for operations on large sets of unrelated data.

9. Multimodel Database

  • Structure: Supports multiple data models (e.g., relational, document, graph) within a single database engine.
  • Usage: Useful for applications that require different types of data storage and querying mechanisms.
  • Example: ArangoDB, Microsoft Azure Cosmos DB.
  • Advantages: Flexibility, ability to handle diverse data requirements within a single system.
  • Disadvantages: Complexity in management and optimization.

10. Time-Series Database Model

  • Structure: Specifically designed to handle time-series data, where each record is associated with a timestamp.
  • Usage: Best for applications like monitoring, logging, and real-time analytics where data changes over time.
  • Example: InfluxDB, TimescaleDB.
  • Advantages: Optimized for handling and querying large volumes of time-stamped data.
  • Disadvantages: Limited use cases outside of time-series data.

11. NoSQL Database Model

  • Structure: An umbrella term for various non-relational database models, including key-value, document, column-family, and graph databases.
  • Usage: Ideal for handling unstructured or semi-structured data, and scenarios requiring high scalability and flexibility.
  • Example: MongoDB, Cassandra, Couchbase, Neo4j.
  • Advantages: Flexibility, scalability, high performance for specific use cases.
  • Disadvantages: Lack of standardization, potential data consistency challenges.

Summary

Each database model serves different purposes, and the choice of model depends on the specific requirements of the application, such as data structure, relationships, performance needs, and scalability. While relational databases are still the most widely used, NoSQL and specialized databases have become increasingly important for handling diverse data types and large-scale applications.

Postgres Ep 2 : Amutha Hotel and Issues with Flat Files

23 August 2024 at 01:29

Once upon a time in Ooty, there was a small business called "Amutha Hotel," run by a passionate baker named Saravanan. Saravanan's bakery was famous for its delicious sambar, and as his customer base grew, he needed to keep track of orders, customer information, and inventory.

Being a techie, he decided to store all this information in a flat file: a simple spreadsheet named "HotelData.csv".

The Early Days: Simple and Sweet

At first, everything was easy. Saravanan's flat file had only a few columns: OrderID, CustomerName, Product, Quantity, and Price. Each row represented a new order, and it was simple enough to manage. Saravanan could quickly find orders, calculate totals, and even check his inventory by filtering the file.

The Business Grows: Complexity Creeps In

As the business boomed, Saravanan started offering new products, special discounts, and loyalty programs. He added more columns to his flat file, like Discount, LoyaltyPoints, and DeliveryAddress. His once-simple file began to swell with information.

Then, Saravanan decided to start tracking customer preferences and order history. He began adding multiple rows for the same customer, each representing a different order. His flat file now had repeating groups of data for each customer, and it became harder and harder to find the information he needed.

His flat file was getting out of hand. For every new order from a returning customer, he had to re-enter all their information

CustomerName, DeliveryAddress, LoyaltyPoints

over and over again. This duplication wasn't just tedious; it started to cause mistakes. One day, he accidentally typed "John Smyth" instead of "John Smith," and suddenly, his loyal customer was split into two different entries.

On a Busy Saturday

One busy Saturday, Saravanan opened his flat file to update the day’s orders, but instead of popping up instantly as it used to, it took several minutes to load. As he scrolled through the endless rows, his computer started to lag, and the spreadsheet software even crashed a few times. The file had become too large and cumbersome for him to handle efficiently.

Customers were waiting longer for their orders to be processed because Saravanan was struggling to find their previous details and apply the right discounts. The flat file that once served him so well was now slowing him down, and it was affecting his business.

The Journaling

Techie Saravanan started to note these issues down in a notepad. He badly wanted a solution that would solve these problems, so he started listing out the problems with examples while looking for a solution.

His journal continues …

Before databases became common for data storage, flat files (such as CSVs or text files) were often used to store and manage data. Such a file has no special structure; it's just some lines of text that mean something to the particular application that reads it.

However, these flat files posed several challenges, particularly when dealing with repeating groups, which are essentially sets of related fields that repeat multiple times within a record. Here are some of the key problems associated with repeating groups in flat files:

1. Data Redundancy

  • Description: Repeating groups can lead to significant redundancy, as the same data might need to be repeated across multiple records.
  • Example: If an employee can have multiple skills, a flat file might need to repeat the employee’s name, ID, and other details for each skill.
  • Problem: This not only increases the file size but also makes data entry, updates, and deletions more prone to errors.

Eg: Suppose you are maintaining a flat file to track employees and their skills. Each employee can have multiple skills, which you store as repeating groups in the file.

EmployeeID, EmployeeName, Skill1, Skill2, Skill3, Skill4
1, John Doe, Python, SQL, Java, 
2, Jane Smith, Excel, PowerPoint, Python, SQL

If an employee has four skills, you need to add four columns (Skill1, Skill2, Skill3, Skill4). If an employee has more than four skills, you must either add more columns or create a new row with repeated employee details.

2. Data Inconsistency

  • Description: Repeating groups can lead to inconsistencies when data is updated.
  • Example: If an employee’s name changes, and it’s stored multiple times in different rows because of repeating skills, it’s easy for some instances to be updated while others are not.
  • Problem: This can lead to situations where the same employee is listed under different names or IDs in the same file.

Eg: Suppose you are maintaining a flat file to track employees and their skills. Each employee can have multiple skills, which you store as repeating groups in the file.

EmployeeID, EmployeeName, Skill1, Skill2, Skill3, Skill4
1, John Doe, Python, SQL, Java, 
2, Jane Smith, Excel, PowerPoint, Python, SQL

If John's name changes to "John A. Doe," you must manually update each occurrence of "John Doe" across all rows, which increases the chance of inconsistencies.

3. Difficulty in Querying

  • Description: Querying data in flat files with repeating groups can be cumbersome and inefficient.
  • Example: Extracting a list of unique employees with their respective skills requires complex scripting or manual processing.
  • Problem: Unlike relational databases, which use joins to simplify such queries, flat files require custom logic to manage and extract data, leading to slower processing and more potential for errors.

Eg: Suppose you are maintaining a flat file to track employees and their skills. Each employee can have multiple skills, which you store as repeating groups in the file.

EmployeeID, EmployeeName, Skill1, Skill2, Skill3, Skill4
1, John Doe, Python, SQL, Java, 
2, Jane Smith, Excel, PowerPoint, Python, SQL

Extracting a list of all employees proficient in "Python" requires you to search across multiple skill columns (Skill1, Skill2, etc.), which is cumbersome compared to a relational database where you can use a simple JOIN on a normalized EmployeeSkills table.
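
For contrast, a minimal sketch of that normalized design (the table and column names here are illustrative):

-- employees(employee_id, employee_name)
-- employee_skills(employee_id, skill)
SELECT e.employee_name
FROM employees e
JOIN employee_skills s ON s.employee_id = e.employee_id
WHERE s.skill = 'Python';

One row per employee-skill pair means new skills become new rows rather than new columns, and the query stays the same no matter how many skills an employee has.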

4. Limited Scalability

  • Description: Flat files do not scale well when the number of repeating groups or the size of the data grows.
  • Example: A file with multiple repeating fields can become extremely large and difficult to manage as the number of records increases.
  • Problem: This can lead to performance issues, such as slow read/write operations and difficulty in maintaining the file over time.

Eg: You are storing customer orders in a flat file where each customer can place multiple orders.

CustomerID, CustomerName, Order1ID, Order1Date, Order2ID, Order2Date, Order3ID, Order3Date
1001, Alice Brown, 5001, 2023-08-01, 5002, 2023-08-15, 
1002, Bob White, 5003, 2023-08-05, 

If Alice places more than three orders, you’ll need to add more columns (Order4ID, Order4Date, etc.), leading to an unwieldy file with many empty cells for customers with fewer orders.

5. Challenges in Data Integrity

  • Description: Ensuring data integrity in flat files with repeating groups is difficult.
  • Example: Enforcing rules like "an employee can only have unique skills" is nearly impossible in a flat file format.
  • Problem: This can result in duplicated or invalid data, which is hard to detect and correct without a database system.

Eg: You are storing customer orders in a flat file where each customer can place multiple orders.

CustomerID, CustomerName, Order1ID, Order1Date, Order2ID, Order2Date, Order3ID, Order3Date
1001, Alice Brown, 5001, 2023-08-01, 5002, 2023-08-15, 
1002, Bob White, 5003, 2023-08-05,

There’s no easy way to enforce that each order ID is unique and corresponds to the correct customer, which could lead to errors or duplicated orders.

6. Complex File Formats

  • Description: Managing and processing flat files with repeating groups often requires complex file formats.
  • Example: Custom delimiters or nested formats might be needed to handle repeating groups, making the file harder to understand and work with.
  • Problem: This increases the likelihood of errors during data entry, processing, or when the file is read by different systems.

Eg: You are storing customer orders in a flat file where each customer can place multiple orders.

CustomerID, CustomerName, Order1ID, Order1Date, Order2ID, Order2Date, Order3ID, Order3Date
1001, Alice Brown, 5001, 2023-08-01, 5002, 2023-08-15, 
1002, Bob White, 5003, 2023-08-05, 

As the number of orders grows, the file format becomes increasingly complex, requiring custom scripts to manage and extract order data for each customer.

7. Lack of Referential Integrity

  • Description: Flat files lack mechanisms to enforce referential integrity between related groups of data.
  • Example: Ensuring that a skill listed in one file corresponds to a valid skill ID in another file requires manual checks or complex logic.
  • Problem: This can lead to orphaned records or mismatches between related data sets.

Eg: A fleet management company tracks maintenance records for each vehicle in a flat file. Each vehicle can have multiple maintenance records.

VehicleID, VehicleType, Maintenance1Date, Maintenance1Type, Maintenance2Date, Maintenance2Type
V001, Truck, 2023-01-15, Oil Change, 2023-03-10, Tire Rotation
V002, Van, 2023-02-20, Brake Inspection, , 

There’s no way to ensure that the Maintenance1Type and Maintenance2Type fields are valid maintenance types or that the dates are in correct chronological order.

8. Difficulty in Data Modification

  • Description: Modifying data in flat files with repeating groups can be complex and error-prone.
  • Example: Adding or removing an item from a repeating group might require extensive manual edits across multiple records.
  • Problem: This increases the risk of errors and makes data management time-consuming.

Eg: A university maintains a flat file to record student enrollments in courses. Each student can enroll in multiple courses.

StudentID, StudentName, Course1, Course2, Course3, Course4, Course5
2001, Charlie Green, Math101, Physics102, , , 
2002, Dana Blue, History101, Math101, Chemistry101, , 

If a student drops a course or switches to a different one, manually editing the file can easily lead to errors, especially as the number of students and courses increases.

After listing down all these, Saravanan started looking into solutions. His search goes on…

What is a Relational Database and PostgreSQL?

21 August 2024 at 01:56

In the city of Data, the citizens relied heavily on organizing their information. The city was home to many different types of data: numbers, names, addresses, and even some exotic types like images and documents. But as the city grew, so did the complexity of managing all this information.

One day, the city's leaders called a meeting to discuss how best to handle the growing data. They were split between two different systems:

  1. the old and trusted Relational Database Management System (RDBMS)
  2. the new, flashy NoSQL databases.

Enter Relational Databases:

Relational databases were like the city’s libraries. They had rows of neatly organized shelves (tables) where every book (data entry) was placed according to a specific category (columns).

Each book had a unique ID (primary key) so that anyone could find it quickly. These libraries had been around for decades, and everyone knew how to use them.

The RDBMS was more than just a library. It enforced rules (constraints) to ensure that no book went missing, was duplicated, or misplaced. It even allowed librarians (queries) to connect different books using relationships (joins).

If you wanted to find all the books by a particular author that were published in the last five years, the RDBMS could do it in a heartbeat.

The Benefits of RDBMS:

The citizens loved the RDBMS because it was:

  1. Organized: Everything was in its place, and data was easy to find.
  2. Reliable: The rules ensured data integrity, so they didn’t have to worry about inconsistencies.
  3. Powerful: It could handle complex queries, making it easy to get insights from their data.
  4. Secure: Access to the data could be controlled, keeping it safe from unauthorized users.

The Rise of NoSQL:

But then came the NoSQL databases, which were more like vast, sprawling warehouses. These warehouses didn't care much about organization; they just stored everything in a big open space. You could toss in anything, and it would accept it; no need for strict categories or relationships. This flexibility appealed to the tech-savvy citizens who wanted to store newer, more diverse types of data like social media posts, images, and videos.

NoSQL warehouses were fast. They could handle enormous amounts of data without breaking a sweat and were perfect for real-time applications like chat systems and analytics.

The PostgreSQL Advantage:

PostgreSQL was a superstar in the world of RDBMS. It combined the organization and reliability of traditional relational databases with some of the flexibility of NoSQL. It allowed citizens to store structured data in tables while also offering support for unstructured data types like JSON. This made PostgreSQL a versatile choice, bridging the gap between the old and new worlds.
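
A small illustration of this bridge (the table and fields are made up for the example): structured columns and a JSONB document can live side by side, and the JSON can be queried directly:

CREATE TABLE citizens (
    id SERIAL PRIMARY KEY,
    name TEXT NOT NULL,
    profile JSONB
);

INSERT INTO citizens (name, profile)
VALUES ('Alice', '{"city": "Data", "interests": ["books", "maps"]}');

SELECT name FROM citizens WHERE profile->>'city' = 'Data';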

To install PostgreSQL: https://www.postgresql.org/download/

The Dilemma: PostgreSQL vs. NoSQL:

The city faced a dilemma. Should they stick with PostgreSQL, which offered the best of both worlds, or fully embrace NoSQL for its speed and flexibility? The answer wasn’t simple. It depended on what the city valued more: the structured, reliable nature of PostgreSQL or the unstructured, flexible approach of NoSQL.

For applications that required strict data integrity and complex queries, PostgreSQL was the way to go. But for projects that needed to handle massive amounts of unstructured data quickly, NoSQL was the better choice.

Conclusion:

In the end, the city of Data realized that there was no one-size-fits-all solution. They decided to use PostgreSQL for applications where data relationships and integrity were crucial, and NoSQL for those that required speed and flexibility with diverse data types.

And so, the citizens of Data lived happily, managing their information with the right tools for the right tasks, knowing that both systems had their place in the ever-growing city.

Connect Grafana to MySQL

13 July 2023 at 10:23

Log in to the remote MySQL server and change the bind address to 0.0.0.0:
# vim /etc/mysql/mysql.conf.d/mysqld.cnf
from
bind-address = 127.0.0.1
to
bind-address = 0.0.0.0
then save and exit (:wq!) and restart MySQL so the change takes effect.

Log in to the MySQL server:
$ sudo mysql -u root -p
mysql> CREATE DATABASE mydb;
mysql> CREATE USER 'grafana'@'%' IDENTIFIED BY 'zha123';
mysql> GRANT ALL PRIVILEGES ON *.* TO 'grafana'@'%';
mysql> flush privileges;
mysql> exit

On the Grafana server:
go to Data sources
select MySQL as the data source
give the data source a name
Host: remote_mysql_server_ip:3306
Database: dbname
Username: grafana and Password: zha123
then click Save & test.

Change MySQL Root Password in Ubuntu

11 July 2023 at 07:11

Stop MySQL and start it with authentication and networking disabled:

$ sudo systemctl stop mysql.service
$ sudo systemctl status mysql.service
$ sudo systemctl set-environment MYSQLD_OPTS="--skip-networking --skip-grant-tables"
$ sudo systemctl start mysql.service
$ sudo systemctl status mysql.service

Connect without a password and set the new root password:

$ sudo mysql -u root
> flush privileges;
> USE mysql;
> ALTER USER 'root'@'localhost' IDENTIFIED BY 'zha123';
> quit;

Remove the temporary options and restart MySQL normally:

$ sudo systemctl unset-environment MYSQLD_OPTS
$ sudo systemctl revert mysql
$ sudo killall -u mysql
$ sudo systemctl restart mysql.service

Verify the new password:

$ sudo mysql -u root -p
