Cron jobs are a fundamental part of automating tasks in Unix-based systems. However, one common problem with cron jobs is multiple executions, where overlapping job runs can cause serious issues like data corruption, race conditions, or unexpected system load.
In this blog, we’ll explore why multiple executions happen, the potential risks, and how flock provides an elegant solution to ensure that a cron job runs only once at a time.
The Problem: Multiple Executions of Cron Jobs
Cron jobs are scheduled to run at fixed intervals, but sometimes a new job instance starts before the previous one finishes.
This can happen due to:
Long-running jobs: If a cron job takes longer than its interval, a new instance starts while the old one is still running.
System slowdowns: High CPU or memory usage can delay job execution, leading to overlapping runs.
Simultaneous executions across servers: In a distributed system, multiple servers might execute the same cron job, causing duplication.
Example of a Problematic Cron Job
Let’s say we have the following cron job that runs every minute:
* * * * * /path/to/script.sh
If script.sh takes more than a minute to execute, a second instance will start before the first one finishes.
This can lead to:
Duplicate database writes → Inconsistent data
Conflicts in file processing → Corrupt files
Overloaded system resources → Performance degradation
Real-World Example
Imagine a job that processes user invoices and sends emails.
If the script takes longer than a minute to complete, multiple instances might start running, causing:
Users to receive multiple invoices.
The database to get inconsistent updates.
Increased server load due to excessive email sending.
The Solution: Using flock to Prevent Multiple Executions
flock is a Linux utility that manages file locks to ensure that only one instance of a process runs at a time. It works by locking a specific file, preventing other processes from acquiring the same lock.
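A minimal sketch of this in action (the lock file path is a placeholder, and sleep 2 stands in for a long-running script):

```shell
# Run a command under an exclusive lock; with -n a second instance
# exits immediately instead of waiting for the lock to be released.
flock -n /tmp/script.lock sleep 2

# The cron entry from earlier, protected the same way:
# * * * * * flock -n /tmp/script.lock /path/to/script.sh
```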
Try it yourself: run a flock-wrapped command in one terminal, then open another terminal and run the same command. The second attempt exits immediately because the lock is already held.
Preventing multiple executions of cron jobs is essential for maintaining data consistency, system stability, and efficiency. By using flock, you can easily enforce single execution without complex logic.
Simple & efficient solution. No external dependencies required. Works seamlessly with cron jobs.
So next time you set up a cron job, add flock and sleep peacefully knowing your tasks won’t collide.
Load testing is essential to evaluate how a system behaves under expected and peak loads. Traditionally, we rely on metrics like requests per second (RPS), response time, and error rates. However, a complementary approach, Average Load Testing, focuses on how traffic actually arrives over time. This blog explores that concept in detail, with practical examples to help you apply it effectively.
Understanding Average Load Testing
Average Load Testing focuses on simulating real-world load patterns rather than traditional peak load tests. Instead of sending a fixed number of requests per second, this approach:
Generates requests based on the average concurrency over time.
More accurately reflects real-world traffic patterns.
Helps identify performance bottlenecks in a realistic manner.
Setting Up Load Testing with K6
K6 is an excellent tool for implementing Average Load Testing. Let’s go through practical examples of setting up such tests.
The ramping-arrival-rate executor gradually increases the number of requests per second over time.
The stages array defines a progression from 5 to 100 requests/sec over 6 minutes.
Logs response times to help analyze system performance.
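Based on the description above, a ramping-arrival-rate scenario might look like this (the target URL and the exact stage values are assumptions matching the 5-to-100 requests/sec progression; run it with k6 run script.js):

```javascript
import http from 'k6/http';

export let options = {
  scenarios: {
    average_load: {
      executor: 'ramping-arrival-rate',
      startRate: 5,            // start at 5 requests per second
      timeUnit: '1s',
      preAllocatedVUs: 50,
      maxVUs: 100,
      stages: [                // ramp from 5 to 100 req/s over 6 minutes
        { target: 25, duration: '2m' },
        { target: 50, duration: '2m' },
        { target: 100, duration: '2m' },
      ],
    },
  },
};

export default function () {
  let res = http.get('https://test-api.example.com/users');
  console.log(`Response time: ${res.timings.duration}ms`);
}
```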
Example 3: Load Testing with Multiple Endpoints
In real applications, multiple endpoints are often tested simultaneously. Here’s how to test different API routes:
import http from 'k6/http';
import { check, sleep } from 'k6';

export let options = {
  scenarios: {
    multiple_endpoints: {
      executor: 'constant-arrival-rate',
      rate: 15, // 15 requests per second
      timeUnit: '1s',
      duration: '2m',
      preAllocatedVUs: 30,
      maxVUs: 60,
    },
  },
};

export default function () {
  let urls = [
    'https://test-api.example.com/users',
    'https://test-api.example.com/orders',
    'https://test-api.example.com/products'
  ];
  let res = http.get(urls[Math.floor(Math.random() * urls.length)]);
  check(res, {
    'is status 200': (r) => r.status === 200,
  });
  console.log(`Response time: ${res.timings.duration}ms`);
  sleep(1);
}
Explanation
The script randomly selects an API endpoint to test different routes.
Uses check to ensure status codes are 200.
Logs response times for deeper insights.
Analyzing Results
To analyze test results, you can store logs or metrics in a database or monitoring tool and visualize trends over time. Some popular options include:
Prometheus for time-series data storage.
InfluxDB for handling large-scale performance metrics.
ELK Stack (Elasticsearch, Logstash, Kibana) for log-based analysis.
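For instance, k6 can stream results to InfluxDB directly, or dump raw metrics as JSON for ad-hoc analysis (the URL and database name here are assumptions):

```shell
# Stream k6 metrics into a local InfluxDB database named "k6"
k6 run --out influxdb=http://localhost:8086/k6 script.js

# Or dump raw metrics to a JSON file
k6 run --out json=results.json script.js
```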
Average Load Testing provides a more realistic way to measure system performance. By leveraging K6, you can create flexible, real-world simulations to optimize your applications effectively.
The top command in Linux is a powerful utility that provides real-time information about system performance, including CPU usage, memory usage, running processes, and more.
It is an essential tool for system administrators to monitor system health and manage resources effectively.
1. Basic Usage
Simply running top without any arguments displays an interactive screen showing system statistics and a list of running processes:
$ top
2. Understanding the top Output
The top interface is divided into multiple sections:
Header Section
This section provides an overview of the system status, including uptime, load averages, and system resource usage.
Uptime and Load Average – Displays how long the system has been running and the average system load over the last 1, 5, and 15 minutes.
Task Summary – Shows the number of processes in various states:
Running – Processes actively executing on the CPU.
Sleeping – Processes waiting for an event or resource.
Stopped – Processes that have been paused.
Zombie – Processes that have completed execution but still have an entry in the process table. These occur when the parent process has not yet read the exit status of the child process. Zombie processes do not consume system resources but can clutter the process table if not handled properly.
CPU Usage – Breaks down CPU utilization into different categories:
us (User Space) – CPU time spent on user processes.
sy (System Space) – CPU time spent on kernel operations.
id (Idle) – Time when the CPU is not being used.
wa (I/O Wait) – Time spent waiting for I/O operations to complete.
st (Steal Time) – CPU cycles stolen by a hypervisor in a virtualized environment.
Memory Usage – Shows the total, used, free, and available RAM.
Swap Usage – Displays total, used, and free swap memory, which is used when RAM is full.
Process Table
The table below the header lists active processes with details such as:
PID – Process ID, a unique identifier for each process.
USER – The owner of the process.
PR – Priority of the process, affecting its scheduling.
NI – Nice value, which determines how favorable the process scheduling is.
VIRT – The total virtual memory used by the process.
RES – The actual RAM used by the process.
SHR – The shared memory portion.
S – Process state:
R – Running
S – Sleeping
Z – Zombie
T – Stopped
%CPU – The percentage of CPU time used.
%MEM – The percentage of RAM used.
TIME+ – The total CPU time consumed by the process.
COMMAND – The command that started the process.
3. Interactive Commands
While running top, various keyboard shortcuts allow dynamic interaction:
q – Quit top.
h – Display help.
k – Kill a process by entering its PID.
r – Renice a process (change priority).
z – Toggle color/monochrome mode.
M – Sort by memory usage.
P – Sort by CPU usage.
T – Sort by process runtime.
1 – Toggle CPU usage breakdown for multi-core systems.
u – Filter processes by a specific user.
s – Change update interval.
4. Command-Line Options
The top command supports various options for customization:
-b (Batch mode): Displays output in a non-interactive mode, useful for scripting:
$ top -b -n 1
-n specifies the number of iterations before exit.
-o FIELD (Sort by a specific field):
$ top -o %CPU
Sorts by CPU usage.
-d SECONDS (Refresh interval):
$ top -d 3
Updates the display every 3 seconds.
-u USERNAME (Show processes for a specific user):
$ top -u john
-p PID (Monitor a specific process):
$ top -p 1234
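Batch mode makes top easy to use in scripts. For example, a quick snippet (assuming the typical seven-line header plus a column-header line) that logs the five most CPU-hungry processes:

```shell
# One snapshot, sorted by CPU; skip the header lines,
# then keep the first five process rows
top -b -n 1 -o %CPU | head -n 12 | tail -n 5
```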
5. Customizing top Display
Persistent Customization
To save custom settings, press W while running top. This saves the configuration to ~/.toprc.
Changing Column Layout
Press f to toggle the fields displayed.
Press o to change sorting order.
Press X to highlight sorted columns.
6. Alternatives to top: htop, btop
For a more user-friendly experience, htop and btop are popular alternatives, offering colorful, interactive interfaces with easier navigation and process management.
Today, I learned about the Bulkhead Pattern and how it makes a system resilient to failures and resource exhaustion. In this blog, I jot down notes on this pattern for better understanding.
In today’s world of distributed systems and microservices, resiliency is key to ensuring applications are robust and can withstand failures.
The Bulkhead Pattern is a design principle used to improve system resilience by isolating different parts of a system to prevent failure in one component from cascading to others.
What is the Bulkhead Pattern?
The term “bulkhead” originates from shipbuilding, where bulkheads are partitions that divide a ship into separate compartments. If one compartment is breached, the others remain intact, preventing the entire ship from sinking. Similarly, in software design, the Bulkhead Pattern isolates components or services so that a failure in one part does not bring down the entire system.
In software systems, bulkheads:
Isolate resources (e.g., threads, database connections, or network calls) for different components.
Limit the scope of failures.
Allow other parts of the system to continue functioning even if one part is degraded or completely unavailable.
Example
Consider an e-commerce application with a product-service that has two endpoints:
/product/{id} – This endpoint gives detailed information about a specific product, including ratings and reviews. It depends on the rating-service.
/products – This endpoint provides a catalog of products based on search criteria. It does not depend on any external services.
Suppose product-service has a fixed pool of resources and is flooded with /product/{id} calls. Those calls can monopolize the thread pool, delaying /products requests, so users experience slowness even though the two endpoints are independent. This leads to resource exhaustion and failures.
With the Bulkhead Pattern, we allocate separate clients and connection pools to isolate each service interaction: for example, a pool of 10 connections for /product/{id} requests and a separate pool of 5 for /products requests.
Even if /product/{id} requests are slow or encounter high traffic, /products requests remain unaffected.
Scenarios Where the Bulkhead Pattern is Needed
Microservices with Shared Resources – In a microservices architecture, multiple services might share limited resources such as database connections or threads. If one service experiences a surge in traffic or a failure, it can exhaust these shared resources, impacting all other services. Bulkheading ensures each service gets a dedicated pool of resources, isolating the impact of failures.
Prioritizing Critical Workloads – In systems with mixed workloads (e.g., processing user transactions and generating reports), critical operations like transaction processing must not be delayed or blocked by less critical tasks. Bulkheading allocates separate resources to ensure critical tasks have priority.
Third-Party API Integration – When an application depends on multiple external APIs, one slow or failing API can delay the entire application if not isolated. Using bulkheads ensures that issues with one API do not affect interactions with others.
Multi-Tenant Systems – In SaaS applications serving multiple tenants, a single tenant’s high resource consumption or failure should not degrade the experience for others. Bulkheads can segregate resources per tenant to maintain service quality.
Cloud-Native Applications – In cloud environments, services often scale independently. A spike in one service’s load should not overwhelm shared backend systems. Bulkheads help isolate and manage these spikes.
Event-Driven Systems – In event-driven architectures with message queues, processing backlogs for one type of event can delay others. By applying the Bulkhead Pattern, separate processing pipelines can handle different event types independently.
What are the Key Points of the Bulkhead Pattern? (Simplified)
Define Partitions – Think of a ship: it’s divided into compartments (partitions) to keep water from flooding the whole ship if one section gets damaged. In software, these partitions are designed around how the application works and its technical needs.
Designing with Context – If you’re using a design approach like DDD (Domain-Driven Design), make sure your bulkheads (partitions) match the business logic boundaries.
Choosing Isolation Levels – Decide how much isolation is needed. For example: Threads for lightweight tasks. Separate containers or virtual machines for more critical separations. Balance between keeping things separate and the costs or extra effort involved.
Combining Other Techniques – Bulkheads work even better with patterns like Retry, Circuit Breaker, Throttling.
Monitoring – Keep an eye on each partition’s performance. If one starts getting overloaded, you can adjust resources or change limits.
When Should You Use the Bulkhead Pattern?
To Isolate Critical Resources – If one part of your system fails, other parts can keep working. For example, you don’t want search functionality to stop working because the reviews section is down.
To Prioritize Important Work – For example, make sure payment processing (critical) is separate from background tasks like sending emails.
To Avoid Cascading Failures – If one part of the system gets overwhelmed, it won’t drag down everything else.
When Should You Avoid It?
Complexity Isn’t Needed – If your system is simple, adding bulkheads might just make it harder to manage.
Resource Efficiency is Critical – Sometimes, splitting resources into separate pools can mean less efficient use of those resources. If every thread, connection, or container is underutilized, this might not be the best approach.
Challenges and Best Practices
Overhead: Maintaining separate resource pools can increase system complexity and resource utilization.
Resource Sizing: Properly sizing the pools is critical to ensure resources are efficiently utilized without bottlenecks.
Monitoring: Use tools to monitor the health and performance of each resource pool to detect bottlenecks or saturation.
Alex Pandian was the system administrator for a tech company, responsible for managing servers, maintaining network stability, and ensuring that everything ran smoothly.
With many scripts running daily and long-running processes that needed monitoring, Alex was constantly flooded with notifications.
Alex Pandian: “Every day, I have to go through dozens of emails and alerts just to find the ones that matter,”
Alex muttered while sipping coffee in the server room.
Alex Pandian: “There must be a better way to streamline all this information.”
Despite using several monitoring tools, the notifications from these systems were scattered and overwhelming. Alex needed a more efficient method to receive alerts only when crucial events occurred, such as script failures or the completion of resource-intensive tasks.
Determined to find a better system, Alex began searching online for a tool that could help consolidate and manage notifications.
After reading through countless forums and reviews, Alex stumbled upon a discussion about ntfy.sh, a service praised for its simplicity and flexibility.
“This looks promising,” Alex thought, excited by the ability to publish and subscribe to notifications using a straightforward, topic-based system. The idea of having notifications sent directly to a phone or desktop without needing complex configurations was exactly what Alex was looking for.
Alex decided to consult with Sam, a fellow system admin known for their expertise in automation and monitoring.
Alex Pandian: “Hey Sam, have you ever used ntfy.sh?”
Sam: “Absolutely. It’s a lifesaver for managing notifications. How do you plan to use it?”
Alex Pandian: “I’m thinking of using it for real-time alerts on script failures and long-running commands. Can you show me how it works?”
“Of course,” Sam replied with a smile, eager to guide Alex through setting up ntfy.sh to improve workflow efficiency.
Together, Sam and Alex began configuring ntfy.sh for Alex’s environment. They focused on setting up topics and integrating them with existing systems to ensure that important notifications were delivered promptly.
Step 1: Identifying Key Topics
Alex identified the main areas where notifications were needed:
script-failures: To receive alerts whenever a script failed.
command-completions: To notify when long-running commands finished.
server-health: For critical server health alerts.
Step 2: Subscribing to Topics
Sam showed Alex how to subscribe to these topics using ntfy.sh on a mobile device and desktop. This ensured that Alex would receive notifications wherever they were, without having to constantly check email or dashboards.
Alex was impressed by the simplicity and efficiency of this approach. “I can automate all of this?” Alex asked.
“Definitely,” Sam replied. “You can integrate it with cron jobs, monitoring tools, and more. It’s a great way to keep track of important events without getting bogged down by noise.”
With the basics in place, Alex began applying ntfy.sh to various real-world scenarios, streamlining the notification process and improving overall efficiency.
Monitoring Script Failures
Alex set up automated alerts for critical scripts that ran daily, ensuring that any failures were immediately reported. This allowed Alex to address issues quickly, minimizing downtime and improving system reliability.
Whenever Alex initiated a long-running command, such as a server backup or data migration, notifications were sent upon completion. This enabled Alex to focus on other tasks without constantly checking on progress.
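A simple pattern for this is to chain the notification after the command, so it fires no matter how long the task takes (the script path and topic name are placeholders):

```shell
# Notify when a long-running task finishes, including its exit code
/path/to/long-task.sh
curl -d "long-task.sh finished with exit code $?" ntfy.sh/command-completions
```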
To monitor server health, Alex integrated ntfy.sh with existing monitoring tools, ensuring that any critical issues were immediately flagged.
# Send server health alert
curl -d "Server CPU usage is critically high!" ntfy.sh/server-health
As with any new tool, there were challenges to overcome. Alex encountered a few hurdles, but with Sam’s guidance, these were quickly resolved.
Challenge: Managing Multiple Notifications
Initially, Alex found it challenging to manage multiple notifications and ensure that only critical alerts were prioritized. Sam suggested using filters and priorities to focus on the most important messages.
# Subscribe with a filter so only high-priority alerts come through
ntfy subscribe "script-failures?priority=high"
Challenge: Scheduling Notifications
Alex wanted to schedule notifications for regular maintenance tasks and reminders. Sam introduced Alex to using cron for scheduling automated alerts.
# Crontab entry: notify at 8:00 AM every Saturday for weekly maintenance
0 8 * * 6 curl -d "Time for weekly server maintenance." ntfy.sh/server-health
Sam shared some more examples with Alex.
Monitoring disk space
As a system administrator, you can use ntfy.sh to receive alerts when disk space usage reaches a critical level. This helps prevent issues related to insufficient disk space.
# Check disk space and notify if usage is over 80%
disk_usage=$(df / | awk 'NR==2 { print $5 }' | sed 's/%//')
if [ "$disk_usage" -gt 80 ]; then
curl -d "Warning: Disk space usage is at ${disk_usage}%." ntfy.sh/disk-space
fi
Alerting on Website Downtime
You can use ntfy.sh to monitor the status of a website and receive notifications if it goes down.
# Check website status and notify if it's down
website="https://example.com"
status_code=$(curl -o /dev/null -s -w "%{http_code}" "$website")
if [ "$status_code" -ne 200 ]; then
curl -d "Alert: $website is down! Status code: $status_code." ntfy.sh/website-monitor
fi
Reminding for Daily Tasks
You can set up ntfy.sh to send you daily reminders for important tasks, ensuring that you stay on top of your schedule.
# Crontab entries: daily reminders at 9:00 AM and 9:50 AM
0 9 * * * curl -d "Time to review your daily tasks!" ntfy.sh/daily-reminders
50 9 * * * curl -d "Stand-up meeting at 10:00 AM." ntfy.sh/daily-reminders
Alerting on High System Load
Monitor system load and receive notifications when it exceeds a certain threshold, allowing you to take action before it impacts performance.
# Check the 1-minute load average and notify if it's high
load=$(cut -d ' ' -f1 /proc/loadavg)
threshold=2.0
if (( $(echo "$load > $threshold" | bc -l) )); then
curl -d "Warning: System load is high: $load" ntfy.sh/system-load
fi
Notify on Backup Completion
Receive a notification when a backup process completes, allowing you to verify its success.
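A sketch of such a backup job, with hypothetical paths and topic name:

```shell
# Back up /var/data and report the outcome
if tar -czf "/backups/data-$(date +%F).tar.gz" /var/data; then
  curl -d "Backup completed successfully." ntfy.sh/backup-status
else
  curl -d "Backup failed, check the server!" ntfy.sh/backup-status
fi
```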
Monitoring Docker Container Events
Integrate ntfy.sh with Docker to send alerts for specific container events, such as when a container stops unexpectedly.
# Notify on Docker container stop event
container_name="my_app"
container_status=$(docker inspect -f '{{.State.Status}}' $container_name)
if [ "$container_status" != "running" ]; then
curl -d "Alert: Docker container $container_name has stopped." ntfy.sh/docker-alerts
fi
Integrating with CI/CD Pipelines
Use ntfy.sh to notify you about the status of CI/CD pipeline stages, ensuring you stay informed about build successes or failures.
# Example GitLab CI/CD YAML snippet
stages:
  - build

build_job:
  stage: build
  script:
    - make build
  after_script:
    - |
      if [ "$CI_JOB_STATUS" == "success" ]; then
        curl -d "Build succeeded for commit $CI_COMMIT_SHORT_SHA." ntfy.sh/ci-cd-status
      else
        curl -d "Build failed for commit $CI_COMMIT_SHORT_SHA." ntfy.sh/ci-cd-status
      fi
Notification on SSH Login to a Server
Let’s try it with Docker:
FROM ubuntu:16.04
RUN apt-get update && apt-get install -y openssh-server
RUN mkdir /var/run/sshd
# Set root password for SSH access (change 'your_password' to your desired password)
RUN echo 'root:password' | chpasswd
RUN sed -i 's/PermitRootLogin prohibit-password/PermitRootLogin yes/' /etc/ssh/sshd_config
RUN sed 's@session\s*required\s*pam_loginuid.so@session optional pam_loginuid.so@g' -i /etc/pam.d/sshd
COPY ntfy-ssh.sh /usr/bin/ntfy-ssh.sh
RUN chmod +x /usr/bin/ntfy-ssh.sh
RUN echo "session optional pam_exec.so /usr/bin/ntfy-ssh.sh" >> /etc/pam.d/sshd
RUN apt-get -y update; apt-get -y install curl
EXPOSE 22
CMD ["/usr/sbin/sshd", "-D"]
And the script, ntfy-ssh.sh, to send the notification:
#!/bin/bash
if [ "${PAM_TYPE}" = "open_session" ]; then
curl \
-H prio:high \
-H tags:warning \
-d "SSH login: ${PAM_USER} from ${PAM_RHOST}" \
ntfy.sh/syed-alerts
fi
With ntfy.sh as an integral part of daily operations, Alex found a renewed sense of balance and control. The once overwhelming chaos of notifications was now a manageable stream of valuable information.
As Alex reflected on the journey, it was clear that ntfy.sh had transformed not just the way notifications were managed, but also the overall approach to system administration.
In a world full of noise, ntfy.sh had provided a clear and effective way to stay informed without distractions. For Alex, it was more than just a tool—it was a new way of managing systems efficiently.