
Learning Notes #67 – Build and Push to a Registry (Docker Hub) with GH-Actions

28 January 2025 at 02:30

GitHub Actions is a powerful tool for automating workflows directly in your repository. In this blog, we’ll explore how to efficiently set up GitHub Actions to handle Docker workflows with environments, secrets, and protection rules.

Why Use GitHub Actions for Docker?

My codebase is on GitHub, and I wanted to try out GitHub Actions to build and push images to Docker Hub seamlessly.

Setting Up GitHub Environments

GitHub Environments let you define settings specific to deployment stages. Here’s how to configure them:

1. Create an Environment

Go to your GitHub repository and navigate to Settings > Environments. Click New environment, name it (e.g., production), and save.

2. Add Secrets and Variables

Inside the environment settings, click Add secret to store sensitive information like DOCKER_USERNAME and DOCKER_TOKEN.

Use Variables for non-sensitive configuration, such as the Docker image name.

3. Optional: Set Protection Rules

Enforce rules like requiring manual approval before deployments. Restrict deployments to specific branches (e.g., main).

Sample Workflow for Building and Pushing Docker Images

Below is a GitHub Actions workflow for automating the build and push of a Docker image based on a minimal Flask app.

Workflow: .github/workflows/docker-build-push.yml


name: Build and Push Docker Image

on:
  push:
    branches:
      - main  # Trigger workflow on pushes to the `main` branch

jobs:
  build-and-push:
    runs-on: ubuntu-latest
    environment: production  # Specify the environment to use

    steps:
      # Checkout the repository
      - name: Checkout code
        uses: actions/checkout@v3

      # Log in to Docker Hub using environment secrets
      - name: Log in to Docker Hub
        uses: docker/login-action@v2
        with:
          username: ${{ secrets.DOCKER_USERNAME }}
          password: ${{ secrets.DOCKER_TOKEN }}

      # Build the Docker image using an environment variable
      - name: Build Docker image
        env:
          DOCKER_IMAGE_NAME: ${{ vars.DOCKER_IMAGE_NAME }}
        run: |
          docker build -t ${{ secrets.DOCKER_USERNAME }}/$DOCKER_IMAGE_NAME:${{ github.run_id }} .

      # Push the Docker image to Docker Hub
      - name: Push Docker image
        env:
          DOCKER_IMAGE_NAME: ${{ vars.DOCKER_IMAGE_NAME }}
        run: |
          docker push ${{ secrets.DOCKER_USERNAME }}/$DOCKER_IMAGE_NAME:${{ github.run_id }}
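
For context, the workflow above only assumes the repository contains a Dockerfile and a small web app. A minimal Flask app along these lines (a hypothetical app.py, not the exact code from the linked repository) is enough to try it out:

from flask import Flask

app = Flask(__name__)

@app.route("/")
def index():
    # Returning a dict lets Flask serialize the response as JSON
    return {"status": "ok", "message": "Hello from the container"}

if __name__ == "__main__":
    # Bind to all interfaces so the app is reachable from outside the container
    app.run(host="0.0.0.0", port=5000)

The Dockerfile would simply copy this file, install Flask, and expose port 5000.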

To see the Actions run live: https://github.com/syedjaferk/gh_action_docker_build_push_fastapi_app/actions


SelfHost #2 | BugSink – An Error Tracking Tool

26 January 2025 at 16:41

I am a regular follower of https://selfh.st/. Last week they showcased BugSink. BugSink is a tool to track errors in your applications that you can self-host. It’s easy to install and use, is compatible with the Sentry SDK, and is scalable and reliable.

When an application breaks, finding and fixing the root cause quickly is critical. Hosted error tracking tools often make you trade privacy for convenience, and they can be expensive. On the other hand, self-hosted solutions are an alternative, but they are often a pain to set up and maintain.

What Is Error Tracking?

When code is deployed in production, errors are inevitable. They can arise from a variety of reasons like bugs in the code, network failures, integration mismatches, or even unforeseen user behavior. To ensure smooth operation and user satisfaction, error tracking is essential.

Error tracking involves monitoring and recording errors in your application code, particularly in production environments. A good error tracker doesn’t just log errors; it contextualizes them, offering insights that make troubleshooting straightforward.

Here are the key benefits of error tracking

  • Early Detection: Spot issues before they snowball into critical outages.
  • Context-Rich Reporting: Understand the “what, when, and why” of an error.
  • Faster Debugging: Detailed stack traces make it easier to pinpoint root causes.

Effective error tracking tools allow developers to respond to errors proactively, minimizing user impact.

Why Bugsink?

Bugsink takes error tracking to a new level by prioritizing privacy, simplicity, and compatibility.

1. Built for Self-Hosting

Unlike many hosted error tracking tools that require sensitive data to be shared with third-party servers, Bugsink is self-hosted. This ensures you retain full control over your data, a critical aspect for privacy-conscious teams.

2. Easy to Set Up and Manage

Whether you’re deploying it on your local server or in the cloud, the experience is smooth.

3. Resource Efficiency

Bugsink is designed to be lightweight and efficient. It doesn’t demand hefty server resources, making it an ideal choice for startups, small teams, or resource-constrained environments.

4. Compatible with Sentry

If you’ve used Sentry before, you’ll feel right at home with Bugsink. It offers Sentry compatibility, allowing you to migrate effortlessly or use it alongside existing tools. This compatibility also means you can leverage existing SDKs and integrations.

5. Proactive Notifications

Bugsink ensures you’re in the loop as soon as something goes wrong. Email notifications alert you the moment an error occurs, enabling swift action. This proactive approach reduces the mean time to resolution (MTTR) and keeps users happy.

Docs: https://www.bugsink.com/docs/

In this blog, I jot down my experience of using BugSink with Python.

1. Run using Docker

There are many installation methods proposed for BugSink: https://www.bugsink.com/docs/installation/. In this blog, I am trying the Docker approach.


docker pull bugsink/bugsink:latest

docker run \
  -e SECRET_KEY=ab4xjs5wfnP2XrUwRJPtmk1sEnMcx9d2mta8vtbdZ4oOtvy5BJ \
  -e CREATE_SUPERUSER=admin:admin \
  -e PORT=8000 \
  -p 8000:8000 \
  bugsink/bugsink

2. Log In, Create a Team, Project

The application will run on port 8000.

Log in using admin/admin. Create a new team by clicking the button at the top right.

Give a name to the team,

then create a project under this team.

After creating a project, you will be able to see its details page.

You will get an individual DSN, like http://9d0186dd7b854205bed8d60674f349ea@localhost:8000/1.

3. Attaching the DSN to a Python app



import sentry_sdk

# Initialize the SDK with the DSN copied from the BugSink project page
sentry_sdk.init(
    "http://d76bc0ccf4da4423b71d1fa80d6004a3@localhost:8000/1",
    send_default_pii=True,           # include request data such as user details
    max_request_body_size="always",  # capture full request bodies
    traces_sample_rate=0,            # errors only; no performance tracing
)

def divide(num1, num2):
    return num1 / num2

# Raises ZeroDivisionError, which the SDK reports to BugSink
divide(1, 0)


The above program will throw a ZeroDivisionError, which will be reflected in the BugSink application.

The best part is that you also get the values of the local variables at that instant. In this example, you can see the values of num1 and num2.
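
If you handle an exception yourself and still want it reported, the Sentry SDK (which BugSink understands, since it speaks the same protocol) also lets you capture it explicitly. A small sketch, reusing the DSN from above:

import sentry_sdk

sentry_sdk.init("http://d76bc0ccf4da4423b71d1fa80d6004a3@localhost:8000/1")

def divide(num1, num2):
    try:
        return num1 / num2
    except ZeroDivisionError as exc:
        # Report the handled error to BugSink, then fall back gracefully
        sentry_sdk.capture_exception(exc)
        return None

divide(1, 0)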

There are a lot more awesome features documented at https://www.bugsink.com/docs/.

Learning Notes #66 – What is SBOM ? Software Bill of Materials

26 January 2025 at 09:16

Yesterday, I came to know about SBOM from my friend Prasanth Baskar. Let’s say you’re building a website.

You decide to use a popular open-source tool to handle user logins. Here’s the catch,

  • That library uses another library to store data.
  • That tool depends on another library to handle passwords.

Now, if one of those libraries has a bug or security issue, how do you even know it’s there? In this blog, I will jot down my understanding of SBOM with Trivy.

What is SBOM ?

A Software Bill of Materials (SBOM) is a list of everything that makes up a piece of software.

Think of it as,

  • A shopping list for all the tools, libraries, and pieces used to build the software.
  • A recipe card showing what’s inside and how it’s structured.

For software, this means,

  • Components: These are the “ingredients,” such as open-source libraries, frameworks, and tools.
  • Versions: Just like you might want to know if the cake uses almond flour or regular flour, knowing the version of a software component matters.
  • Licenses: Did the baker follow the rules for the ingredients they used? Software components also come with licenses that dictate how they can be used.

So Why is it Important?

1. Understanding What You’re Using

When you download or use software, especially something complex, you often don’t know what’s inside. An SBOM helps you understand what components are being used. Are they secure? Are they trustworthy?

2. Finding Problems Faster

If someone discovers that a specific ingredient is bad—like flour with bacteria in it—you’d want to know if that’s in your cake. Similarly, if a software library has a security issue, an SBOM helps you figure out if your software is affected and needs fixing.

For example,

When the Log4j vulnerability made headlines, companies that had SBOMs could quickly identify whether they used Log4j and take action.

3. Building Trust

Imagine buying food without a label or list of ingredients.

You’d feel doubtful, right ? Similarly, an SBOM builds trust by showing users exactly what’s in the software they’re using.

4. Avoiding Legal Trouble

Some software components come with specific rules or licenses about how they can be used. An SBOM ensures these rules are followed, avoiding potential legal headaches.

How to Create an SBOM?

For many developers, creating an SBOM manually would be impossible because modern software can have hundreds (or even thousands!) of components.

Thankfully, there are tools that automatically create SBOMs. Examples include,

  • Trivy: A lightweight tool to generate SBOMs and find vulnerabilities.
  • CycloneDX: A popular SBOM format supported by many tools https://cyclonedx.org/
  • SPDX: Another format designed to make sharing SBOMs easier https://spdx.dev/

These tools can scan your software and automatically list out every component, its version, and its dependencies.

We will see an example of generating an SBOM file for nginx using Trivy.

How Trivy Works?

On running a Trivy scan,

1. It downloads the Trivy DB, including vulnerability information.

2. Pulls missing layers into the cache.

3. Analyzes the layers and stores the information in the cache.

4. Detects security issues and writes them to the SBOM file.

Note: a CVE refers to a Common Vulnerabilities and Exposures identifier. A CVE is a unique code used to catalog and track publicly known security vulnerabilities and exposures in software or systems.

How to Generate SBOMs with Trivy

Step 1: Install Trivy in Ubuntu

sudo apt-get install wget gnupg
wget -qO - https://aquasecurity.github.io/trivy-repo/deb/public.key | gpg --dearmor | sudo tee /usr/share/keyrings/trivy.gpg > /dev/null
echo "deb [signed-by=/usr/share/keyrings/trivy.gpg] https://aquasecurity.github.io/trivy-repo/deb generic main" | sudo tee -a /etc/apt/sources.list.d/trivy.list
sudo apt-get update
sudo apt-get install trivy

More on Installation: https://github.com/aquasecurity/trivy/blob/main/docs/getting-started/installation.md

Step 2: Generate an SBOM

Trivy allows you to create SBOMs in formats like CycloneDX or SPDX.

trivy image --format cyclonedx --output sbom.json nginx:latest

It generates the SBOM file (sbom.json).

It can also be incorporated into GitHub CI/CD.
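
As a quick sanity check, the generated file can be inspected with a few lines of Python. This sketch assumes the usual CycloneDX JSON layout, where packages sit under a top-level components array:

import json

# Load the SBOM produced by: trivy image --format cyclonedx --output sbom.json nginx:latest
with open("sbom.json") as f:
    sbom = json.load(f)

# Each CycloneDX component carries at least a name and (usually) a version
for component in sbom.get("components", []):
    print(component.get("name"), component.get("version"))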

Event Summary: FOSS United Chennai Meetup – 25-01-2025

26 January 2025 at 04:53

🚀 Attended the FOSS United Chennai Meetup Yesterday! 🚀

After attending the Grafana & Friends Meetup, I went straight to the FOSS United Chennai Meetup at YuniQ in Taramani.

Had a chance to meet my Friends face to face after a long time. Sakhil Ahamed E. , Dhanasekar T, Dhanasekar Chellamuthu, Thanga Ayyanar, Parameshwar Arunachalam, Guru Prasath S, Krisha, Gopinathan Asokan

Talks Summary,

1. Ansh Arora gave a tour of FOSS United: how it was formed, its motto, FOSS Hack, and FOSS Clubs.

2. Karthikeyan A K gave a talk on his open source product injee (the no-configuration instant database for frontend developers). He gave me a personal demo. It’s a great tool with a lot of potential. I would like to contribute!

3. Justin Benito shared how they celebrated the New Year with https://tamilnadu.tech
It’s a single go-to page for events in Tamil Nadu. If you are interested, go to the repo https://lnkd.in/geKFqnFz and contribute.

At Kaniyam Foundation, we have been maintaining a Google Calendar of tech events happening in Tamil Nadu for a long time: https://lnkd.in/gbmGMuaa

4. Prasanth Baskar gave a talk on Harbor, an OSS container registry with SBOM support and more functionality. SBOM was new to me.

5. Thanga Ayyanar gave a talk on static site generation with Emacs.

At the end, we had a group photo and went for tea. Got to meet my juniors from St. Joseph’s Institute of Technology in this meet. Had a discussion with Parameshwar Arunachalam on his BuildToLearn experience. They started prototyping a Tinder-style app for Tamil words. After that, we had a small discussion on our Feb 8th GLUG inauguration at St. Joseph’s Institute of Technology with Dr. KARTHI M.

Happy to see a lot of minds travelling from different districts to attend this meet.

Event Summary: Grafana & Friends Meetup Chennai – 25-01-2025

26 January 2025 at 04:47

🚀 Attended the Grafana & Friends Meetup Yesterday! 🚀

I usually have a question: as a developer, I have logs, isn’t that enough? With a curious mind, I attended the Grafana & Friends Chennai meetup (Jan 25th, 2025).

Had an awesome time meeting fellow tech enthusiasts (devops engineers) and learning about cool ways to monitor and understand data better.
Big shoutout to the Grafana Labs community and Presidio for hosting such a great event!

The sandwich and juice were nice 😋

Talk Summary,

1⃣ Making Data Collection Easier with Grafana Alloy
Dinesh J. and Krithika R shared how Grafana Alloy, combined with OpenTelemetry, makes it super simple to collect and manage data for better monitoring.

2⃣ Running Grafana in Kubernetes
Lakshmi Narasimhan Parthasarathy (https://lnkd.in/gShxtucZ) showed how to set up Grafana in Kubernetes in 4 different ways (vanilla, helm chart, grafana operator, kube-prom-stack). He is building a SaaS product https://lnkd.in/gSS9XS5m (Heroku on your own servers).

3⃣ Observability for Frontend Apps with Grafana Faro
Selvaraj Kuppusamy showed how Grafana Faro can help frontend developers monitor what’s happening on websites and apps in real time. This makes it easier to spot and fix issues quickly. We were able to see Core Web Vitals and traces too. I was surprised by this.

Techies I interacted with,

Prasanth Baskar, who is an open source contributor at the Cloud Native Computing Foundation (CNCF) on the project https://lnkd.in/gmHjt9Bs. I was also happy to know that he knows **parottasalna** (that’s me) and has read some of my blogs. Happy to hear that.

Selvaraj Kuppusamy, DevOps engineer, who is also running the Grafana and Friends chapter in Coimbatore on Feb 1. I will attend that as well.

Saranraj Chandrasekaran, who is also a DevOps engineer. Had a chat with him on DevOps and related stuff.

To all of them, I shared about KanchiLUG (https://lnkd.in/gasCnxXv), Parottasalna (https://parottasalna.com/) and my tech channel https://lnkd.in/gKcyE-b5.

Thanks, Achanandhi M, for organising this wonderful meetup. You did well. I came to know Achanandhi M through Medium. He regularly writes blogs on cloud-related stuff. Check out his blog: https://lnkd.in/ghUS-GTc

Also, he shared some tasks for us,

1. Create your First Grafana Dashboard.
Objective: Create a basic Grafana dashboard to visualize data in various formats such as tables, charts and graphs. Also, try to connect to multiple data sources to get diverse data for your dashboard.

2. Monitor your Linux system’s health with Prometheus, Node Exporter and Grafana.
Objective: Use Prometheus, Node Exporter and Grafana to monitor your Linux machine’s health by tracking key metrics like CPU, memory and disk usage.


3. Using Grafana Faro to track User Actions (Like Button Clicks) and Identify the Most Used Features.

Give these a try.

RSVP for RabbitMQ: Build Scalable Messaging Systems in Tamil

24 January 2025 at 11:21

Hi All,

Invitation to RabbitMQ Session

🔹 Topic: RabbitMQ: Asynchronous Communication
🔹 Date: Feb 2 Sunday
🔹 Time: 10:30 AM to 1 PM
🔹 Venue: Online. Will be shared in mail after RSVP.

Join us for an in-depth session on RabbitMQ in Tamil, where we’ll explore,

  • Message queuing fundamentals
  • Connections, channels, and virtual hosts
  • Exchanges, queues, and bindings
  • Publisher confirmations and consumer acknowledgments
  • Use cases and live demos

Whether you’re a developer, DevOps enthusiast, or curious learner, this session will empower you with the knowledge to build scalable and efficient messaging systems.

📌 Don’t miss this opportunity to level up your messaging skills!

RSVP Now,

Our Previous Monthly Meets: https://www.youtube.com/watch?v=cPtyuSzeaa8&list=PLiutOxBS1MizPGGcdfXF61WP5pNUYvxUl&pp=gAQB

Our Previous Sessions,

  1. Python – https://www.youtube.com/watch?v=lQquVptFreE&list=PLiutOxBS1Mizte0ehfMrRKHSIQcCImwHL&pp=gAQB
  2. Docker – https://www.youtube.com/watch?v=nXgUBanjZP8&list=PLiutOxBS1Mizi9IRQM-N3BFWXJkb-hQ4U&pp=gAQB
  3. Postgres – https://www.youtube.com/watch?v=04pE5bK2-VA&list=PLiutOxBS1Miy3PPwxuvlGRpmNo724mAlt&pp=gAQB

Our Social Handles,

Learning Notes #65 – Application Logs, Metrics, MDC

21 January 2025 at 05:45

I am a big fan of logs. I would like to log everything: every request and response of an API. But is that correct? Though logs helped our team greatly during this new year, I wanted to know whether there is a better approach to logging. That search made this blog. In this blog I jot down my notes on logging. Let’s log it.

Throughout this blog, I try to generalize things and not be biased towards a particular language, but here and there you can see me leaning towards Python. Also, this is my opinion, not a hard rule.

Which is the best logger?

I’m not here to argue about which logger is the best; they all have their problems. But the worst one is usually the one you build yourself. Sure, existing loggers aren’t perfect, but trying to create your own is often a much bigger mistake.

1. Why Logging Matters

Logging provides visibility into your application’s behavior, helping to,

  • Diagnose and troubleshoot issues (this is the most common use case)
  • Monitor application health and performance (Metrics)
  • Meet compliance and auditing requirements (Audit Logs)
  • Enable debugging in production environments (we all do this.)

However, poorly designed logging strategies can lead to excessive log volumes, higher costs, and difficulty in pinpointing actionable insights.

2. Logging Best Practices

a. Use Structured Logs

Long story short, instead of unstructured plain text, use JSON or other structured formats. This makes parsing and querying easier, especially in log aggregation tools.


{
  "timestamp": "2025-01-20T12:34:56Z",
  "level": "INFO",
  "message": "User login successful",
  "userId": 12345,
  "sessionId": "abcde12345"
}
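
For reference, here is one minimal way to produce such structured logs with Python’s standard library (a sketch; most log aggregation stacks also ship ready-made JSON formatters):

import json
import logging

class JsonFormatter(logging.Formatter):
    # Render every log record as a single JSON line
    def format(self, record):
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "message": record.getMessage(),
        }
        # Merge any extra fields passed via `extra={"context": {...}}`
        payload.update(getattr(record, "context", {}))
        return json.dumps(payload)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("app")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("User login successful", extra={"context": {"userId": 12345, "sessionId": "abcde12345"}})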

b. Leverage Logging Levels

Define and adhere to appropriate logging levels to avoid log bloat:

  • DEBUG: Detailed information for debugging.
  • INFO: General operational messages.
  • WARNING: Indications of potential issues.
  • ERROR: Application errors that require immediate attention.
  • CRITICAL: Severe errors leading to application failure.

c. Avoid Sensitive Data

Sanitize your logs to exclude sensitive information like passwords, PII, or API keys. Instead, mask or hash such data. Don’t add tokens, even for testing.
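
A simple way to enforce this is to mask known sensitive fields before they ever reach the logger. A small sketch (the field names are illustrative):

import copy

SENSITIVE_KEYS = {"password", "token", "api_key", "ssn"}

def mask_sensitive(payload: dict) -> dict:
    # Return a copy of the payload with sensitive values replaced
    masked = copy.deepcopy(payload)
    for key in masked:
        if key.lower() in SENSITIVE_KEYS:
            masked[key] = "***"
    return masked

# logger.info("Login attempt: %s", mask_sensitive({"user": "alice", "password": "s3cret"}))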


d. Include Contextual Information

Incorporate metadata like request IDs, user IDs, or transaction IDs to trace specific events effectively.


3. Log Ingestion at Scale

As applications scale, log ingestion can become a bottleneck. Here’s how to manage it,

a. Centralized Logging

Stream logs to centralized systems like Elasticsearch, Logstash, Kibana (ELK), or cloud-native services like AWS CloudWatch, Azure Monitor, or Google Cloud Logging.

b. Optimize Log Volume

  • Log only necessary information.
  • Use log sampling to reduce verbosity in high-throughput systems.
  • Rotate logs to limit disk usage.

c. Use Asynchronous Logging

Asynchronous loggers improve application performance by delegating logging tasks to separate threads or processes. (Not suitable all the time; it has its own problems.)
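
In Python, the standard library already ships the building blocks for this: QueueHandler enqueues records cheaply in the application thread, and QueueListener writes them out from a background thread. A minimal sketch:

import logging
import logging.handlers
import queue

log_queue = queue.Queue(-1)  # unbounded queue between the app and the writer thread

# The application only enqueues records; it never blocks on the slow handler
logger = logging.getLogger("app")
logger.addHandler(logging.handlers.QueueHandler(log_queue))
logger.setLevel(logging.INFO)

# A background listener drains the queue and writes to the real handler
file_handler = logging.FileHandler("app.log")
listener = logging.handlers.QueueListener(log_queue, file_handler)
listener.start()

logger.info("This call returns quickly; the write happens in the listener thread")
listener.stop()  # flush and stop the background thread on shutdown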

d. Method return values are usually important

If you have a log in the method and don’t include the return value of the method, you’re missing important information. Make an effort to include that at the expense of slightly less elegant looking code.

e. Include filename in error messages

Mention the path/to/file:line-number to pinpoint the location of the issue.

4. Logging Don’ts

a. Don’t Log Everything at the Same Level

Logging all messages at the INFO or DEBUG level creates noise and makes it difficult to identify critical issues.

b. Don’t Hardcode Log Messages

Avoid static, vague, or generic log messages. Use dynamic and descriptive messages that include relevant context.

# Bad Example
Error occurred.

# Good Example
Error occurred while processing payment for user_id=12345, transaction_id=abc-6789.

c. Don’t Log Sensitive or Regulated Data

Exposing personally identifiable information (PII), passwords, or other sensitive data in logs can lead to compliance violations (e.g., GDPR, HIPAA).

d. Don’t Ignore Log Rotation

Failing to implement log rotation can result in disk space exhaustion, especially in high traffic systems (Log Retention).

e. Don’t Overlook Log Correlation

Logs without request IDs, session IDs, or contextual metadata make it difficult to correlate related events.

f. Don’t Forget to Monitor Log Costs

Logging everything without considering storage and processing costs can lead to financial inefficiency in large-scale systems.

g. Keep the log message short

Long and verbose messages are a cost. The cost is in reading time and ingestion time.

h. Never log inside a loop

This might seem obvious, but just to be clear: logging inside a loop, even if the log level isn’t visible by default, can still hurt performance. It’s best to avoid this whenever possible.

If you absolutely need to log something at a hidden level and decide to break this guideline, keep it short and straightforward.

i. Log items you already “have”

We should avoid this,


logger.info("Reached X and value of method is {}", method());

Here, just for logging purposes, we are calling method() again, even if the method is cheap. You’re effectively running the method regardless of the logging level!
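
A sketch of the alternative: call the method once because you need its result anyway, or guard a purely diagnostic call by log level (method() here stands in for the original example’s call).

import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("app")

def method():
    return 42  # stands in for the expensive call in the example above

# Better: compute the value once and log what you already have
result = method()
logger.info("Reached X and value of method is %s", result)

# Or, if the value is only needed for a debug log, guard the call by level
if logger.isEnabledFor(logging.DEBUG):
    logger.debug("Expensive detail: %s", method())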

j. Don’t log iterables

Even if it’s a small list. The concern is that the list might grow and “overcrowd” the log. Writing the content of the list to the log can balloon it up and slow processing noticeably. It also wastes time during debugging.

k. Don’t Log What the Framework Logs for You

There are great things to log. E.g. the name of the current thread, the time, etc. But those are already written into the log by default almost everywhere. Don’t duplicate these efforts.

l. Don’t log method entry/exit

Log only important events in the system. Entering or exiting a method isn’t an important event. E.g. if I have a method that enables feature X the log should be “Feature X enabled” and not “enable_feature_X entered”. I have done this a lot.

m. Don’t fill the method with logs

A complex method might include multiple points of failure, so it makes sense that we’d place logs at multiple points in the method so we can detect the failure along the way. Unfortunately, this leads to duplicate logging and verbosity.

Errors will typically map to error handling code, which should be logged generically. So all error conditions should already be covered.

This sometimes means changing the flow/behavior of the code so that logging becomes more elegant.

n. Don’t use AOP logging

AOP (Aspect-Oriented Programming) logging allows you to automatically add logs at specific points in your application, such as when methods are entered or exited.

In Python, AOP-style logging can be implemented using decorators or middleware that inject logs into specific points, such as method entry and exit. While it might seem appealing for detailed tracing, the same problems apply as in other languages like Java.


import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def log_method_entry_exit(func):
    def wrapper(*args, **kwargs):
        logger.info(f"Entering: {func.__name__} with args={args} kwargs={kwargs}")
        result = func(*args, **kwargs)
        logger.info(f"Exiting: {func.__name__} with result={result}")
        return result
    return wrapper

# Example usage
@log_method_entry_exit
def example_function(x, y):
    return x + y

example_function(5, 3)

Why Avoid AOP Logging in Python

  1. Performance Impact:
    • Injecting logs into every method increases runtime overhead, especially if used extensively in large-scale systems.
    • In Python, where function calls already add some overhead, this can significantly affect performance.
  2. Log Verbosity:
    • If this decorator is applied to every function or method in a system, it produces an enormous amount of log data.
    • Debugging becomes harder because the meaningful logs are lost in the noise of entry/exit logs.
  3. Limited Usefulness:
    • During local development, tools like Python debuggers (pdb), profilers (cProfile, line_profiler), or tracing libraries like trace are far more effective for inspecting function behavior and performance.
  4. CI Issues:
    • Enabling such verbose logging during CI test runs can make tracking test failures more difficult because the logs are flooded with entry/exit messages, obscuring the root cause of failures.

Use Python-specific tools like pdb, ipdb, or IDE-integrated debuggers to inspect code locally.

o. Don’t double log

It’s pretty common to log an error when we’re about to throw an error. However, since most error-handling code is generic, it’s likely there’s already a log in the generic error handling code.

5. Ensuring Scalability

To keep your logging system robust and scalable,

  • Monitor Log Storage: Set alerts for log storage thresholds.
  • Implement Compression: Compress log files to reduce storage costs.
  • Automate Archival and Deletion: Regularly archive old logs and purge obsolete data.
  • Benchmark Logging Overhead: Measure the performance impact of logging on your application.

6. Logging for Metrics

Below is the list of items that I wish to be logged for metrics.

General API Metrics

  1. General API Metrics on HTTP methods, status codes, latency/duration, request size.
  2. Total requests per endpoint over time. Requests per minute/hour.
  3. Frequency and breakdown of 4XX and 5XX errors.
  4. User ID or API client making the request.

{
  "timestamp": "2025-01-20T12:34:56Z",
  "endpoint": "/projects",
  "method": "POST",
  "status_code": 201,
  "user_id": 12345,
  "request_size_bytes": 512,
  "response_size_bytes": 256,
  "duration_ms": 120
}

Business Specific Metrics

  1. Objects (session) creations: No. of projects created (daily/weekly)
  2. Average success/failure rate.
  3. Average time to create a session.
  4. Frequency of each action on top of session.

{
  "timestamp": "2025-01-20T12:35:00Z",
  "endpoint": "/projects/12345/actions",
  "action": "edit",
  "status_code": 200,
  "user_id": 12345,
  "duration_ms": 98
}

Performance Metrics

  1. Database query metrics on execution time, no. of queries per request.
  2. Third party service metrics on time spent, success/failure rates of external calls.

{
  "timestamp": "2025-01-20T12:37:15Z",
  "endpoint": "/projects/12345",
  "db_query_time_ms": 45,
  "external_api_time_ms": 80,
  "status_code": 200,
  "duration_ms": 130
}

Scalability Metrics

  1. Concurrency metrics on max request handled.
  2. Request queue times during load.
  3. System Metrics on CPU and Memory usage during request processing (this will be auto captured).

Usage Metrics

  1. Traffic analysis on peak usage times.
  2. Most/Least used endpoints.

7. Mapped Diagnostic Context (MDC)

MDC is the one I longed for the most. I also ran into trouble by implementing it without a middleware.

Mapped Diagnostic Context (MDC) is a feature provided by many logging frameworks, such as Logback, Log4j, and SLF4J. It allows developers to attach contextual information (key-value pairs) to the logging events, which can then be automatically included in log messages.

This context helps in differentiating and correlating log messages, especially in multi-threaded applications.

Why Use MDC?

  1. Enhanced Log Clarity: By adding contextual information like user IDs, session IDs, or transaction IDs, MDC enables logs to provide more meaningful insights.
  2. Easier Debugging: When logs contain thread-specific context, tracing the execution path of a specific transaction or user request becomes straightforward.
  3. Reduced Log Ambiguity: MDC ensures that logs from different threads or components do not get mixed up, avoiding confusion.

Common Use Cases

  1. Web Applications: Logging user sessions, request IDs, or IP addresses to trace the lifecycle of a request.
  2. Microservices: Propagating correlation IDs across services for distributed tracing.
  3. Background Tasks: Tracking specific jobs or tasks in asynchronous operations.

Limitations (curated from other blogs; I haven’t tried these yet)

  1. Thread Boundaries: MDC is thread-local, so its context does not automatically propagate across threads (e.g., in asynchronous executions). For such scenarios, you may need to manually propagate the MDC context.
  2. Overhead: Adding and managing MDC context introduces a small runtime overhead, especially in high-throughput systems.
  3. Configuration Dependency: Proper MDC usage often depends on correctly configuring the logging framework.


2025-01-21 14:22:15.123 INFO  [thread-1] [userId=12345, transactionId=abc123] Starting transaction
2025-01-21 14:22:16.456 DEBUG [thread-1] [userId=12345, transactionId=abc123] Processing request
2025-01-21 14:22:17.789 ERROR [thread-1] [userId=12345, transactionId=abc123] Error processing request: Invalid input
2025-01-21 14:22:18.012 INFO  [thread-1] [userId=12345, transactionId=abc123] Transaction completed

In FastAPI, we can implement this via a middleware. A sketch using contextvars (which is how request-scoped context is usually propagated in async Python, since plain logger attributes don’t reach the log records):


import contextvars
import logging
import uuid

from fastapi import FastAPI, Request
from starlette.middleware.base import BaseHTTPMiddleware

# Context variables hold the MDC values for the current request
user_id_ctx = contextvars.ContextVar("user_id", default="unknown")
transaction_id_ctx = contextvars.ContextVar("transaction_id", default="unknown")

# Configure the logger
logger = logging.getLogger("uvicorn")
logger.setLevel(logging.INFO)

# Custom formatter that injects the MDC values into every record
class MDCFormatter(logging.Formatter):
    def format(self, record):
        record.user_id = user_id_ctx.get()
        record.transaction_id = transaction_id_ctx.get()
        return super().format(record)

# Set the logging format with the MDC keys
formatter = MDCFormatter(
    "%(asctime)s %(levelname)s [%(threadName)s] [userId=%(user_id)s, transactionId=%(transaction_id)s] %(message)s"
)

# Apply the formatter to the handler
console_handler = logging.StreamHandler()
console_handler.setFormatter(formatter)
logger.addHandler(console_handler)

# FastAPI application
app = FastAPI()

# Custom middleware that populates the MDC context for each request
class RequestContextMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request: Request, call_next):
        # Set the MDC values before handling the request
        user_id_ctx.set(request.headers.get("X-User-ID", "default-user"))
        transaction_id_ctx.set(str(uuid.uuid4()))

        logger.info("Request started")
        response = await call_next(request)
        logger.info("Request finished")

        return response

# Add the custom middleware to the FastAPI app
app.add_middleware(RequestContextMiddleware)

@app.get("/")
async def read_root():
    logger.info("Handling the root endpoint.")
    return {"message": "Hello, World!"}

@app.get("/items/{item_id}")
async def read_item(item_id: int):
    logger.info(f"Fetching item with ID {item_id}")
    return {"item_id": item_id}

Hope you got a better idea of logging.

Learning Notes #64 – E-Tags and Last-Modified Headers

20 January 2025 at 16:57

This morning, I started with a video on E-Tags (it came up first in my YouTube suggestions). In this blog, I jot down my notes on E-Tags, how they help in saving bandwidth, and how the Last-Modified header compares to E-Tags.

In the world of web development, ensuring efficient resource management and improved performance is crucial. Two key mechanisms that help in achieving this are E-Tags (Entity Tags) and the Last-Modified header.

These HTTP features facilitate caching and conditional requests, reducing bandwidth usage and improving user experience.

What is an E-Tag?

An Entity Tag (E-Tag) is an HTTP header used for web cache validation. It acts as a unique identifier for a specific version of a resource on the server. When a resource changes, its E-Tag also changes, enabling clients (e.g., browsers) to determine if their cached version of the resource is still valid.

How E-Tags Work

1. Response with E-Tag: When a client requests a resource, the server responds with the resource and an E-Tag in the HTTP header.


HTTP/1.1 200 OK
ETag: "abc123"
Content-Type: application/json
Content-Length: 200

2. Subsequent Requests: On subsequent requests, the client includes the E-Tag in the If-None-Match header.


GET /resource HTTP/1.1
If-None-Match: "abc123"

3. Server Response

If the resource hasn’t changed, the server responds with a 304 Not Modified status, saving bandwidth,


HTTP/1.1 304 Not Modified

If the resource has changed, the server responds with a 200 OK status and a new E-Tag,


HTTP/1.1 200 OK
ETag: "xyz789"

Benefits of E-Tags

  • Precise cache validation based on resource version.
  • Reduced bandwidth usage as unchanged resources are not re-downloaded.
  • Improved user experience with faster loading times for unchanged resources.
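
To make the flow above concrete, here is a small client-side sketch using the requests library (the URL is hypothetical; any server that returns an ETag header behaves the same way):

import requests

url = "https://example.com/resource"  # hypothetical endpoint

first = requests.get(url)
etag = first.headers.get("ETag")

# Revalidate the cached copy by echoing the ETag back in If-None-Match
headers = {"If-None-Match": etag} if etag else {}
second = requests.get(url, headers=headers)

if second.status_code == 304:
    print("Cached copy is still valid; the body was not re-downloaded")
else:
    print("Resource changed; new ETag:", second.headers.get("ETag"))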

What is the Last-Modified Header?

The Last-Modified header indicates the last time a resource was modified on the server. It’s a simpler mechanism compared to E-Tags but serves a similar purpose in caching and validation.

How Last-Modified Works

1. Response with Last-Modified: When a client requests a resource, the server includes the Last-Modified header in its response,


HTTP/1.1 200 OK
Last-Modified: Wed, 17 Jan 2025 10:00:00 GMT
Content-Type: image/png
Content-Length: 1024

2. Subsequent Requests: On future requests, the client includes the If-Modified-Since header.


GET /image.png HTTP/1.1
If-Modified-Since: Wed, 17 Jan 2025 10:00:00 GMT


3. Server Response

If the resource hasn’t changed, the server responds with a 304 Not Modified status,


HTTP/1.1 304 Not Modified


If the resource has changed, the server sends the updated resource with a new Last-Modified value,


HTTP/1.1 200 OK
Last-Modified: Thu, 18 Jan 2025 12:00:00 GMT


E-Tags and Last-Modified headers are powerful tools for improving web application performance. By enabling conditional requests and efficient caching, they reduce server load and bandwidth usage while enhancing the user experience. Remember, these two are pretty old mechanisms, which are still used to date.

Learning Notes #63 – Change Data Capture. What does it do?

19 January 2025 at 16:22

A few days back I came across the concept of CDC. It is like a notifier of database events. Instead of polling, it makes events available in a queue, which can be consumed by many consumers. In this blog, I try to explain the concepts and types in a theoretical manner.

You run a library. Every day, books are borrowed, returned, or new books are added. What if you wanted to keep a live record of all these activities so you always know the exact state of your library?

This is essentially what Change Data Capture (CDC) does for your databases. It’s a way to track changes (like inserts, updates, or deletions) in your database tables and send them to another system, like a live dashboard or a backup system. (Might be a bad example. Don’t lose hope. Continue …)

CDC is widely used in modern technology to power,

  • Real-Time Analytics: Live dashboards that show sales, user activity, or system performance.
  • Data Synchronization: Keeping multiple databases or microservices in sync.
  • Event-Driven Architectures: Triggering notifications, workflows, or downstream processes based on database changes.
  • Data Pipelines: Streaming changes to data lakes or warehouses for further processing.
  • Backup and Recovery: Incremental backups by capturing changes instead of full data dumps.

It’s a critical part of tools like Debezium, Kafka, and cloud services such as AWS Database Migration Service (DMS) and Azure Data Factory. CDC enables companies to move towards real-time data-driven decision-making.

What is CDC?

CDC stands for Change Data Capture. It’s a technique that listens to a database and captures every change that happens in it. These changes can then be sent to other systems to,

  • Keep data in sync across multiple databases.
  • Power real-time analytics dashboards.
  • Trigger notifications for certain database events.
  • Process data streams in real time.

In short, CDC ensures your data is always up-to-date wherever it’s needed.

Why is CDC Useful?

Imagine you have an online store. Whenever someone,

  • Places an order,
  • Updates their shipping address, or
  • Cancels an order,

you need these changes to be reflected immediately across,

  • The shipping system.
  • The inventory system.
  • The email notification service.

Instead of having all these systems query the database constantly (which is slow and inefficient, and one of the main reasons for CDC), CDC automatically streams these changes to the relevant systems.

This means,

  1. Real-Time Updates: Systems receive changes instantly.
  2. Improved Performance: Your database isn’t overloaded with repeated queries.
  3. Consistency: All systems stay in sync without manual intervention.

How Does CDC Work?

Note: I haven’t tried all of these yet, but I have a conceptual feel for them.

CDC relies on tracking changes in your database. There are a few ways to do this,

1. Query-Based CDC

This method repeatedly checks the database for changes. For example:

  • Every 5 minutes, it queries the database: “What changed since my last check?”
  • Any new or modified data is identified and processed.

Drawbacks: This can miss changes if the timing isn’t right, and it’s not truly real-time (long polling).

2. Log-Based CDC

Most modern databases (like PostgreSQL or MySQL) keep logs of every operation. Log-based CDC listens to these logs and captures changes as they happen.

Advantages

  • It’s real-time.
  • It’s lightweight since it doesn’t query the database directly.

3. Trigger-Based CDC

In this method, the database uses triggers to log changes into a separate table. Whenever a change occurs, a trigger writes a record of it.

Advantages: Simple to set up.

Drawbacks: Can slow down the database if not carefully managed.

Tools That Make CDC Easy

Several tools simplify CDC implementation. Some popular ones are,

  1. Debezium: Open-source and widely used for log-based CDC with databases like PostgreSQL, MySQL, and MongoDB.
  2. Striim: A commercial tool for real-time data integration.
  3. AWS Database Migration Service (DMS): A cloud-based CDC service.
  4. StreamSets: Another tool for real-time data movement.

These tools integrate with databases, capture changes, and deliver them to systems like RabbitMQ, Kafka, or cloud storage.

To help visualize CDC, think of,

  • Social Media Feeds: When someone likes or comments on a post, you see the update instantly. This is CDC in action.
  • Bank Notifications: Whenever you make a transaction, your bank app updates instantly. Another example of CDC.

In upcoming blogs, I will include a Debezium implementation of CDC.

Learning Notes #62 – Serverless – Just like riding a taxi

19 January 2025 at 04:55

What is Serverless Computing?

Serverless computing allows developers to run applications without having to manage the underlying infrastructure. You write code, deploy it, and the cloud provider takes care of the rest, from provisioning servers to scaling applications.

Popular serverless platforms include AWS Lambda, Azure Functions, and Google Cloud Functions.

The Taxi Analogy

Imagine traveling to a destination. There are multiple ways to get there,

  1. Owning a Car (Traditional Servers): You own and maintain your car. This means handling maintenance, fuel, insurance, parking, and everything else that comes with it. It’s reliable and gives you control, but it’s also time-consuming and expensive to manage.
  2. Hiring a Taxi (Serverless): With a taxi, you simply book a ride when you need it. You don’t worry about maintaining the car, fueling it, or where it’s parked afterward. You pay only for the distance traveled, and the service scales to your needs, whether you’re alone or with friends.

Why is Serverless Like Taking a Taxi?

  1. No Infrastructure Management – With serverless, you don’t have to manage or worry about servers, just like you don’t need to maintain a taxi.
  2. Pay-As-You-Go – In a taxi, you pay only for the distance traveled. Similarly, in serverless, you’re billed only for the compute time your application consumes.
  3. On-Demand Availability – Need a ride at midnight? A taxi is just a booking away. Serverless functions work the same way: available whenever you need them, scaling up or down as required.
  4. Scalability – Whether you’re a solo traveler or part of a group, taxis can adapt by providing a small car or a larger vehicle. Serverless computing scales resources automatically based on traffic, ensuring optimal performance.
  5. Focus on the Destination – When you take a taxi, you focus on reaching your destination without worrying about the vehicle. Serverless lets you concentrate on writing and deploying code rather than worrying about servers.

Key Benefits of Serverless (and Taxi Rides)

  • Cost-Effectiveness – Avoid upfront costs. No need to buy servers (or cars) you might not fully utilize.
  • Flexibility – Serverless platforms support multiple programming languages and integrations. Taxis, too, come in various forms: regular cars, SUVs, and even luxury rides for special occasions.
  • Reduced Overhead – Free yourself from maintenance tasks, whether it’s patching servers or checking tire pressure.

When Not to Choose Serverless (or a Taxi)

  1. Predictable, High-Volume Usage – Owning a car might be cheaper if you’re constantly on the road. Similarly, for predictable and sustained workloads, traditional servers or containers might be more cost-effective than serverless.
  2. Special Requirements – Need a specific type of vehicle, like a truck for moving furniture? Owning one might make sense. Similarly, applications with unique infrastructure requirements may not be a perfect fit for serverless.
  3. Latency Sensitivity – Taxis take time to arrive after booking. Likewise, serverless functions may experience cold starts, adding slight delays. For ultra-low-latency applications, other architectures may be preferable.

Learning Notes #61 – Undo a git pull

18 January 2025 at 16:04

Today, I came across a blog on undoing a git pull. In this blog, I have reiterated it in my own words.

Mistakes happen. You run a git pull and suddenly find your repository in a mess. Maybe conflicts arose, or perhaps the changes merged from the remote branch aren’t what you expected.

Fortunately, Git’s reflog comes to the rescue, allowing you to undo a git pull and restore your repository to its previous state. Here’s how you can do it.

Understanding Reflog

Reflog is a powerful feature in Git that logs every update made to the tips of your branches and references. Even actions like resets or rebases leave traces in the reflog. This makes it an invaluable tool for troubleshooting and recovering from mistakes.

Whenever you perform a git pull, Git updates the branch pointer, and the reflog records this action. By examining the reflog, you can identify the exact state of your branch before the pull and revert to it if needed.

Step-by-Step Guide to Undo a git pull

1. Check Your Current State. Ensure you’re aware of the current state of your branch. If you have uncommitted changes, stash or commit them to avoid losing any work.


git stash
# or
git add . && git commit -m "Save changes before undoing pull"


2. Inspect the Reflog. View the recent history of your branch using the reflog,


git reflog


This command will display a list of recent actions, showing commit hashes and descriptions. For example,


0a1b2c3 (HEAD -> main) HEAD@{0}: pull origin main: Fast-forward
4d5e6f7 HEAD@{1}: commit: Add new feature
8g9h0i1 HEAD@{2}: checkout: moving from feature-branch to main


3. Identify the Pre-Pull Commit. Locate the commit hash of your branch’s state before the pull. In the above example, it’s 4d5e6f7, which corresponds to the commit made before the git pull.

4. Reset to the Previous Commit. Use the git reset command to move your branch back to its earlier state,


git reset <commit-hash>


By default, git reset does a mixed reset, so the pulled changes won’t be deleted; they will remain in your working tree as unstaged changes.

The next time a pull operation goes awry, don’t panic; let the reflog guide you back to safety!

Learning Notes #58 – Command Query Responsibility Segregation – An Idea Overview

17 January 2025 at 16:42

Today, I came across a video on ByteMonk about Event Sourcing. In that video, they mentioned CQRS, so I delved into that. This blog is about understanding CQRS from a high level. I am planning to dive deep into Event Driven Architecture conceptually in an upcoming weekend.

In this blog, I jot down notes for a basic understanding of CQRS.

In the world of software development, there are countless patterns and practices aimed at solving specific problems. One such pattern is CQRS, short for Command Query Responsibility Segregation. While it might sound complex (it did for me), the idea is quite straightforward when broken down into simple terms.

What is CQRS?

Imagine you run a small bookstore. Customers interact with your store in two main ways,

  1. They buy books.
  2. They ask for information about books.

These two activities, buying (command) and asking (querying), are fundamentally different. Buying a book changes something in your store (your inventory decreases), whereas asking for information doesn’t change anything; it just retrieves details.

CQRS applies the same principle to software. It separates the operations that change data (called commands) from those that read data (called queries). This separation brings clarity and efficiency (not sure yet 🙂 ).

In simpler terms,

  • Commands are actions like “Add this book to the inventory” or “Update the price of this book.” These modify the state of your system.

  • Queries are questions like “How many books are in stock?” or “What’s the price of this book?” These fetch data but don’t alter it.

By keeping these two types of operations separate, you make your system easier to manage and scale.

Why Should You Care About CQRS?

Let’s revisit our bookstore analogy. Imagine if every time someone asked for information about a book, your staff had to dig through boxes in the storage room. It would be slow and inefficient!

Instead, you might keep a catalog at the front desk that’s easy to browse.

In software, this means that,

  • Better Performance: By separating commands and queries, you can optimize them individually. For instance, you can have a simple, fast database for queries and a robust, detailed database for commands.

  • Simpler Code: Each part of your system does one thing, making it easier to understand and maintain.

  • Flexibility: You can scale the command and query sides independently. If you get a lot of read requests but fewer writes, you can optimize the query side without touching the command side.

CQRS in Action

Let’s say you’re building an app for managing a library. Here’s how CQRS might look,

  • Command: A librarian adds a new book to the catalog or updates the details of an existing book.

  • Query: A user searches for books by title or checks the availability of a specific book.

The app could use one database to handle commands (storing all the book details and history) and another optimized database to handle queries (focused on quickly retrieving book information).
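
To make the split concrete, here is a tiny in-memory sketch of the idea (illustrative only; real systems would put the write and read models behind separate services or databases and usually sync them asynchronously):

class BookCommandHandler:
    # Handles state-changing operations (commands)
    def __init__(self, write_store, read_store):
        self.write_store = write_store  # detailed records, the source of truth
        self.read_store = read_store    # denormalized view optimized for lookups

    def add_book(self, book_id, title, copies):
        self.write_store[book_id] = {"title": title, "copies": copies}
        # Keep the read model in sync (often done asynchronously via events)
        self.read_store[title.lower()] = {"book_id": book_id, "available": copies > 0}

class BookQueryHandler:
    # Handles read-only operations (queries)
    def __init__(self, read_store):
        self.read_store = read_store

    def is_available(self, title):
        entry = self.read_store.get(title.lower())
        return bool(entry and entry["available"])

write_db, read_db = {}, {}
commands = BookCommandHandler(write_db, read_db)
queries = BookQueryHandler(read_db)

commands.add_book("b1", "Clean Code", copies=2)
print(queries.is_available("Clean Code"))  # True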

Does CQRS Always Make Sense?

As of now, it makes things more complicated for small applications. As usual, every pattern is devised for its own niche problems; a single bolt cannot fit all nuts.

In upcoming blogs, let’s learn more on CQRS.

Learning Notes #57 – Partial Indexing in Postgres

16 January 2025 at 14:36

Today, I learnt about partial indexing in Postgres and how it optimizes the indexing process to filter a subset of a table more efficiently. In this blog, I jot down notes on partial indexing.

Partial indexing in PostgreSQL is a powerful feature that provides a way to optimize database performance by creating indexes that apply only to a subset of a table’s rows. This selective indexing can result in reduced storage space, faster index maintenance, and improved query performance, especially when queries frequently involve filters or conditions that only target a portion of the data.

An index in PostgreSQL, like in other relational database management systems, is a data structure that improves the speed of data retrieval operations. However, creating an index on an entire table can sometimes be inefficient, especially when dealing with very large datasets where queries often focus on specific subsets of the data. This is where partial indexing becomes invaluable.

Unlike a standard index that covers every row in a table, a partial index only includes rows that satisfy a specified condition. This condition is defined using a WHERE clause when the index is created.

To understand the mechanics, let us consider a practical example.

Suppose you have a table named orders that stores details about customer orders, including columns like order_id, customer_id, order_date, status, and total_amount. If the majority of your queries focus on pending orders (those where the status is pending), creating a partial index specifically for these rows can significantly improve performance.

Example 1:

Here’s how you can create such an index,

CREATE INDEX idx_pending_orders
ON orders (order_date)
WHERE status = 'pending';

In this example, the index idx_pending_orders includes only the rows where status equals pending. This means that any query that involves filtering by status = 'pending' and utilizes the order_date column will leverage this index. For instance, the following query would benefit from the partial index,

SELECT *
FROM orders
WHERE status = 'pending'
AND order_date > '2025-01-01';

The benefits of this approach are significant. By indexing only the rows with status = 'pending', the size of the index is much smaller compared to a full table index.

This reduction in size not only saves disk space but also speeds up the process of scanning the index, as there are fewer entries to traverse. Furthermore, updates or modifications to rows that do not meet the WHERE condition are excluded from index maintenance, thereby reducing the overhead of maintaining the index and improving performance for write operations.

Example 2:

Let us explore another example. Suppose your application frequently queries orders that exceed a certain total amount. You can create a partial index tailored to this use case,

CREATE INDEX idx_high_value_orders
ON orders (customer_id)
WHERE total_amount > 1000;

This index would optimize queries like the following,

SELECT *
FROM orders
WHERE total_amount > 1000
AND customer_id = 123;

The key advantage here is that the index only includes rows where total_amount > 1000. For datasets with a wide range of order amounts, this can dramatically reduce the number of indexed entries. Queries that filter by high-value orders become faster because the database does not need to sift through irrelevant rows.

Additionally, as with the previous example, index maintenance is limited to the subset of rows matching the condition, improving overall performance for insertions and updates.

Partial indexes are also useful for enforcing constraints in a selective manner. Consider a scenario where you want to ensure that no two active promotions exist for the same product. You can achieve this using a unique partial index,

CREATE UNIQUE INDEX idx_unique_active_promotion
ON promotions (product_id)
WHERE is_active = true;

This index guarantees that only one row with is_active = true can exist for each product_id.

In conclusion, partial indexing in PostgreSQL offers a flexible and efficient way to optimize database performance by targeting specific subsets of data.

    Learning Notes #56 – Push vs Pull Architecture

    15 January 2025 at 16:16

    Today, i learnt about push vs pull architecture, the choice between push and pull architectures can significantly influence system performance, scalability, and user experience. Both approaches have their unique advantages and trade-offs. Understanding these architectures and their ideal use cases can help developers and architects make informed decisions.

    What is Push Architecture?

    Push architecture is a communication pattern where the server actively sends data to clients as soon as it becomes available. This approach eliminates the need for clients to repeatedly request updates.

    How it Works

    • The server maintains a connection with the client.
    • When new data is available, the server “pushes” it to the connected clients.
    • In a message queue context, producers send messages to a queue, and the queue actively delivers these messages to subscribed consumers without explicit requests.

    Examples

    • Notifications in Mobile Apps: Users receive instant updates, such as chat messages or alerts.
    • Stock Price Updates: Financial platforms use push to provide real-time market data.
    • Message Queues with Push Delivery: Systems like RabbitMQ or Kafka configured to push messages to consumers.
    • Server-Sent Events (SSE) and WebSockets: These are common implementations of push.

    Advantages

    • Low Latency: Clients receive updates instantly, improving responsiveness.
    • Reduced Redundancy: No need for clients to poll servers frequently, reducing bandwidth consumption.

    Challenges

    • Complexity: Maintaining open connections, especially for many clients, can be resource-intensive.
    • Scalability: Requires robust infrastructure to handle large-scale deployments.

    What is Pull Architecture?

    Pull architecture involves clients actively requesting data from the server. This pattern is often used when real-time updates are not critical or predictable intervals suffice.

    How it Works

    • The client periodically sends requests to the server.
    • The server responds with the requested data.
    • In a message queue context, consumers actively poll the queue to retrieve messages when ready.

    Examples

    • Web Browsing: A browser sends HTTP requests to fetch pages and resources.
    • API Data Fetching: Applications periodically query APIs to update information.
    • Message Queues with Pull Delivery: Systems like SQS or Kafka where consumers poll for messages.
    • Polling: Regularly checking a server or queue for updates.

    Advantages

    • Simpler Implementation: No need for persistent connections; standard HTTP requests or queue polling suffice.
    • Server Load Control: The server can limit the frequency of client requests to manage resources better.

    Challenges

    • Latency: Updates are only received when the client requests them, which might lead to delays.
    • Increased Bandwidth: Frequent polling can waste resources if no new data is available.

Aspect               | Push Architecture                              | Pull Architecture
Latency              | Low – real-time updates                        | Higher – dependent on polling frequency
Complexity           | Higher – requires persistent connections       | Lower – simple request-response model
Bandwidth Efficiency | Efficient – updates sent only when needed      | Less efficient – redundant polling possible
Scalability          | Challenging – high client connection overhead  | Easier – controlled client request intervals
Message Queue Flow   | Messages actively delivered to consumers       | Consumers poll the queue for messages
Use Cases            | Real-time applications (e.g., chat, live data) | Non-critical updates (e.g., periodic reports)

    Learning Notes #55 – API Keys and Tokens

    14 January 2025 at 05:27

    Tokens and API keys are foundational tools that ensure secure communication between systems. They enable authentication, authorization, and access control, facilitating secure data exchange.

    What Are Tokens?

    Tokens are digital objects that represent a specific set of permissions or claims. They are often used in authentication and authorization processes to verify a user’s identity or grant access to resources. Tokens can be time-bound and carry information like:

    1. User Identity: Information about the user or system initiating the request.
    2. Scope of Access: Details about what actions or resources the token permits.
    3. Validity Period: Start and expiry times for the token.

    Common Types of Tokens:

    • JWT (JSON Web Tokens): Compact, URL-safe tokens containing a payload, signature, and header.
    • Opaque Tokens: Tokens without embedded information; they require validation against a server.
    • Refresh Tokens: Used to obtain a new access token when the current one expires.

    What Are API Keys?

    API keys are unique identifiers used to authenticate applications or systems accessing APIs. They are simple to use and act as a credential to allow systems to make authorized API calls.

    Key Characteristics:

    • Static Credential: Unlike tokens, API keys do not typically expire unless explicitly revoked.
    • Simple to Use: They are easy to implement and often passed in headers or query parameters.
    • Application-Specific: Keys are tied to specific applications rather than user accounts.

    Functionalities and Usage

    Both tokens and API keys enable secure interaction between systems, but their application depends on the scenario

    1. Authentication

    • Tokens: Often used for user authentication in web apps and APIs.
      • Example: A JWT issued after login is included in subsequent API requests to validate the user’s session.
    • API Keys: Authenticate applications rather than users.
      • Example: A weather app uses an API key to fetch data from a weather API.
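
A minimal sketch of such a call, with the key kept in an environment variable and sent in a request header. The header name, URL, query parameters, and environment variable are assumptions; the real API's documentation defines them.

import os

import requests

API_KEY = os.environ["WEATHER_API_KEY"]  # hypothetical variable; keep keys out of source code

response = requests.get(
    "https://api.example.com/v1/weather",   # hypothetical weather API
    headers={"X-API-Key": API_KEY},
    params={"city": "Chennai"},
    timeout=5,
)
response.raise_for_status()
print(response.json())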

    2. Authorization

    • Tokens: Define user-specific permissions and roles.
      • Example: A token allows read-only access to specific resources for a particular user.
    • API Keys: Grant access to predefined resources for the application.
      • Example: An API key allows access to public datasets but restricts write operations.

    3. Rate Limiting and Monitoring

    Both tokens and API keys can be used to

    • Enforce usage limits.
    • Monitor and log API usage for analytics and security.
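
As an illustration of usage limits, here is a minimal fixed-window rate limiter keyed by API key. This is a sketch only: the limits are arbitrary, state lives in memory, and production setups usually rely on Redis or an API gateway.

import time
from collections import defaultdict

WINDOW_SECONDS = 60
MAX_REQUESTS = 100

# api_key -> [request count, window start time]
_counters = defaultdict(lambda: [0, 0.0])

def allow_request(api_key: str) -> bool:
    count, window_start = _counters[api_key]
    now = time.time()
    if now - window_start >= WINDOW_SECONDS:
        _counters[api_key] = [1, now]  # start a fresh window for this key
        return True
    if count < MAX_REQUESTS:
        _counters[api_key][0] += 1
        return True
    return False  # over the limit until the window resets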

    Considerations for Secure Implementation

    1. For Tokens

    • Use HTTPS: Always transmit tokens over HTTPS to prevent interception.
    • Implement Expiry: Set reasonable expiry times to minimize risks.
    • Adopt Refresh Tokens: Allow users to obtain new tokens securely when access tokens expire.
    • Validate Signatures: For JWTs, validate the signature to ensure the token’s integrity.
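
A minimal sketch of issuing and validating a signed, time-bound JWT with the PyJWT package (the secret, claims, and 15-minute lifetime are illustrative).

import datetime

import jwt

SECRET = "replace-with-a-strong-secret"

# Issue a short-lived token carrying identity and scope claims.
token = jwt.encode(
    {
        "sub": "user-123",
        "scope": "read:reports",
        "exp": datetime.datetime.now(datetime.timezone.utc) + datetime.timedelta(minutes=15),
    },
    SECRET,
    algorithm="HS256",
)

# Verify the signature and expiry before trusting any claim.
try:
    claims = jwt.decode(token, SECRET, algorithms=["HS256"])
    print("Valid token for:", claims["sub"])
except jwt.ExpiredSignatureError:
    print("Token expired; the client should use its refresh token.")
except jwt.InvalidTokenError:
    print("Invalid token; reject the request.")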

    2. For API Keys

    • Restrict IP Usage: Limit the key’s use to specific IPs or networks.
    • Set Permissions: Assign the minimum required permissions for the API key.
    • Regenerate Periodically: Refresh keys periodically to mitigate risks.
    • Monitor Usage: Track API key usage for anomalies and revoke compromised keys promptly.

    3. For Both

    • Avoid Hardcoding: Never embed tokens or keys in source code. Use environment variables or secure vaults.
    • Audit and Rotate: Regularly audit and rotate keys and tokens to maintain security.
    • Educate Users: Ensure users and developers understand secure handling practices.

    Learning Notes #54 – Architecture Decision Records

    14 January 2025 at 02:35

Over the last few days, I was learning how to make accountable decisions on technical matters. Then I came across ADRs. So far I haven't used them myself or seen them used by our team, but I think this is a necessary practice to incorporate for making accountable decisions. In this blog I share details on ADRs for my future reference.

    What is an ADR?

    An Architectural Decision Record (ADR) is a concise document that captures a single architectural decision, its context, the reasoning behind it, and its consequences. ADRs help teams document, share, and revisit architectural choices, ensuring transparency and better collaboration.

    Why Use ADRs?

    1. Documentation: ADRs serve as a historical record of why certain decisions were made.
    2. Collaboration: They promote better understanding across teams.
    3. Traceability: ADRs link architectural decisions to specific project requirements and constraints.
    4. Accountability: They clarify who made a decision and when.
    5. Change Management: ADRs help evaluate the impact of changes and facilitate discussions around reversals or updates.

    ADR Structure

    A typical ADR document follows a standard format. Here’s an example:

    1. Title: A clear and concise title describing the decision.
    2. Context: Background information explaining the problem or opportunity.
    3. Decision: A summary of the chosen solution.
    4. Consequences: The positive and negative outcomes of the decision.
    5. Status: Indicates whether the decision is proposed, accepted, superseded, or deprecated.
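
A short, hypothetical record following this structure might look like:

Title: Use PostgreSQL as the primary datastore
Context: The service needs relational queries and strong consistency, and the team already operates PostgreSQL.
Decision: Adopt PostgreSQL as the primary datastore for the new service.
Consequences: Mature tooling and SQL support; adds operational work for backups and upgrades.
Status: Accepted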

    Example:

    Optimistic locking on MongoDB https://docs.google.com/document/d/1olCbicQeQzYpCxB0ejPDtnri9rWb2Qhs9_JZuvANAxM/edit?usp=sharing

    References

    1. https://cognitect.com/blog/2011/11/15/documenting-architecture-decisions
    2. https://www.infoq.com/podcasts/architecture-advice-process/
    3. Recommended: https://github.com/joelparkerhenderson/architecture-decision-record/tree/main

    Learning Notes #53 – The Expiration Time Can Be Unexpectedly Lost While Using Redis SET EX

    12 January 2025 at 09:14

    Redis, a high-performance in-memory key-value store, is widely used for caching, session management, and various other scenarios where fast data retrieval is essential. One of its key features is the ability to set expiration times for keys. However, when using the SET command with the EX option, developers might encounter unexpected behaviors where the expiration time is seemingly lost. Let’s explore this issue in detail.

    Understanding SET with EX

    The Redis SET command with the EX option allows you to set a key’s value and specify its expiration time in seconds. For instance

    
    SET key value EX 60
    

    This command sets the key key to the value value and sets an expiration time of 60 seconds.

    The Problem

    In certain cases, the expiration time might be unexpectedly lost. This typically happens when subsequent operations overwrite the key without specifying a new expiration. For example,

    
    SET key value1 EX 60
    SET key value2
    

    In the above sequence,

    1. The first SET command assigns a value to key and sets an expiration of 60 seconds.
    2. The second SET command overwrites the value of key but does not include an expiration time, resulting in the key persisting indefinitely.

    This behavior can lead to subtle bugs, especially in applications that rely on key expiration for correctness or resource management.

    Why Does This Happen?

    The Redis SET command is designed to replace the entire state of a key, including its expiration. When you use SET without the EX, PX, or EXAT options, the expiration is removed, and the key becomes persistent. This behavior aligns with the principle that SET is a complete update operation.
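
To avoid this, either re-specify the expiration on every write or, on Redis 6.0 and newer, use the KEEPTTL option so the existing expiry survives the overwrite. A minimal sketch with the redis-py client, assuming a local Redis instance and illustrative key names:

import redis

r = redis.Redis()

r.set("session:42", "value1", ex=60)         # value with a 60-second expiry
r.set("session:42", "value2", ex=60)         # option 1: always pass the expiry again
r.set("session:42", "value3", keepttl=True)  # option 2: keep the TTL already on the key (Redis >= 6.0)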

    When using Redis SET with EX, be mindful of operations that might overwrite keys without reapplying expiration. Understanding Redis’s behavior and implementing robust patterns can save you from unexpected issues, ensuring your application remains efficient and reliable.

    Learning Notes #52 – Hybrid Origin Failover Pattern

    12 January 2025 at 06:29

Today, I learnt about failover patterns from AWS https://aws.amazon.com/blogs/networking-and-content-delivery/three-advanced-design-patterns-for-high-available-applications-using-amazon-cloudfront/ . In this blog I jot down my understanding of this pattern for future reference.

    Hybrid origin failover is a strategy that combines two distinct approaches to handle origin failures effectively, balancing speed and resilience.

    The Need for Origin Failover

    When an application’s primary origin server becomes unavailable, the ability to reroute traffic to a secondary origin ensures continuity. The failover process determines how quickly and effectively this switch happens. Broadly, there are two approaches to implement origin failover:

    1. Stateful Failover with DNS-based Routing
    2. Stateless Failover with Application Logic

    Each has its strengths and limitations, which the hybrid approach aims to mitigate.

Approach 1: Stateful Failover with DNS-based Routing

    Stateful failover is a system that allows a standby server to take over for a failed server and continue active sessions. It’s used to create a resilient network infrastructure and avoid service interruptions.

    This method relies on a DNS service with health checks to detect when the primary origin is unavailable. Here’s how it works,

    1. Health Checks: The DNS service continuously monitors the health of the primary origin using health checks (e.g., HTTP, HTTPS).
    2. DNS Failover: When the primary origin is marked unhealthy, the DNS service resolves the origin’s domain name to the secondary origin’s IP address.
    3. TTL Impact: The failover process honors the DNS Time-to-Live (TTL) settings. A low TTL ensures faster propagation, but even in the most optimal configurations, this process introduces a delay—often around 60 to 70 seconds.
    4. Stateful Behavior: Once failover occurs, all traffic is routed to the secondary origin until the primary origin is marked healthy again.

    Implementation from AWS (as-is from aws blog)

    The first approach is using Amazon Route 53 Failover routing policy with health checks on the origin domain name that’s configured as the origin in CloudFront. When the primary origin becomes unhealthy, Route 53 detects it, and then starts resolving the origin domain name with the IP address of the secondary origin. CloudFront honors the origin DNS TTL, which means that traffic will start flowing to the secondary origin within the DNS TTLs. The most optimal configuration (Fast Check activated, a failover threshold of 1, and 60 second DNS TTL) means that the failover will take 70 seconds at minimum to occur. When it does, all of the traffic is switched to the secondary origin, since it’s a stateful failover. Note that this design can be further extended with Route 53 Application Recovery Control for more sophisticated application failover across multiple AWS Regions, Availability Zones, and on-premises.

    The second approach is using origin failover, a native feature of CloudFront. This capability of CloudFront tries for the primary origin of every request, and if a configured 4xx or 5xx error is received, then CloudFront attempts a retry with the secondary origin. This approach is simple to configure and provides immediate failover. However, it’s stateless, which means every request must fail independently, thus introducing latency to failed requests. For transient origin issues, this additional latency is an acceptable tradeoff with the speed of failover, but it’s not ideal when the origin is completely out of service. Finally, this approach only works for the GET/HEAD/OPTIONS HTTP methods, because other HTTP methods are not allowed on a CloudFront cache behavior with Origin Failover enabled.

    Advantages

    • Works for all HTTP methods and request types.
    • Ensures complete switchover, minimizing ongoing failures.

    Disadvantages

    • Relatively slower failover due to DNS propagation time.
    • Requires a reliable health-check mechanism.

    Approach 2: Stateless Failover with Application Logic

    This method handles failover at the application level. If a request to the primary origin fails (e.g., due to a 4xx or 5xx HTTP response), the application or CDN immediately retries the request with the secondary origin.

    How It Works

    1. Primary Request: The application sends a request to the primary origin.
    2. Failure Handling: If the response indicates a failure (configurable for specific error codes), the request is retried with the secondary origin.
    3. Stateless Behavior: Each request operates independently, so failover happens on a per-request basis without waiting for a stateful switchover.
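
A minimal application-level sketch of this per-request retry (the origin URLs and the set of retryable status codes are assumptions):

import requests

PRIMARY = "https://primary.example.com"
SECONDARY = "https://secondary.example.com"
RETRYABLE = {500, 502, 503, 504}

def fetch(path):
    try:
        resp = requests.get(PRIMARY + path, timeout=3)
        if resp.status_code not in RETRYABLE:
            return resp
    except requests.RequestException:
        pass  # network failure: fall through and try the secondary origin
    # Stateless failover: only this request is retried against the secondary.
    return requests.get(SECONDARY + path, timeout=3)

print(fetch("/health").status_code)

Note that each failed request pays for the primary attempt before the retry, which is the latency penalty listed in the disadvantages below.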

    Implementation from AWS (as-is from aws blog)

    The hybrid origin failover pattern combines both approaches to get the best of both worlds. First, you configure both of your origins with a Failover Policy in Route 53 behind a single origin domain name. Then, you configure an origin failover group with the single origin domain name as primary origin, and the secondary origin domain name as secondary origin. This means that when the primary origin becomes unavailable, requests are immediately retried with the secondary origin until the stateful failover of Route 53 kicks in within tens of seconds, after which requests go directly to the secondary origin without any latency penalty. Note that this pattern only works with the GET/HEAD/OPTIONS HTTP methods.

    Advantages

    • Near-instantaneous failover for failed requests.
    • Simple to configure and doesn’t depend on DNS TTL.

    Disadvantages

    • Adds latency for failed requests due to retries.
    • Limited to specific HTTP methods like GET, HEAD, and OPTIONS.
    • Not suitable for scenarios where the primary origin is entirely down, as every request must fail first.

    The Hybrid Origin Failover Pattern

    The hybrid origin failover pattern combines the strengths of both approaches, mitigating their individual limitations. Here’s how it works:

    1. DNS-based Stateful Failover: A DNS service with health checks monitors the primary origin and switches to the secondary origin if the primary becomes unhealthy. This ensures a complete and stateful failover within tens of seconds.
    2. Application-level Stateless Failover: Simultaneously, the application or CDN is configured to retry failed requests with a secondary origin. This provides an immediate failover mechanism for transient or initial failures.

    Implementation Steps

    1. DNS Configuration
      • Set up health checks on the primary origin.
      • Define a failover policy in the DNS service, which resolves the origin domain name to the secondary origin when the primary is unhealthy.
    2. Application Configuration
      • Configure the application or CDN to use an origin failover group.
      • Specify the primary origin domain as the primary origin and the secondary origin domain as the backup.

    Behavior

    • Initially, if the primary origin encounters issues, requests are retried immediately with the secondary origin.
    • Meanwhile, the DNS failover switches all traffic to the secondary origin within tens of seconds, eliminating retry latencies for subsequent requests.

    Benefits of Hybrid Origin Failover

    1. Faster Failover: Immediate retries for failed requests minimize initial impact, while DNS failover ensures long-term stability.
    2. Reduced Latency: After DNS failover, subsequent requests don’t experience retry delays.
    3. High Resilience: Combines stateful and stateless failover for robust redundancy.
    4. Simplicity and Scalability: Leverages existing DNS and application/CDN features without complex configurations.

    Limitations and Considerations

    1. HTTP Method Constraints: Stateless failover works only for GET, HEAD, and OPTIONS methods, limiting its use for POST or PUT requests.
    2. TTL Impact: Low TTLs reduce propagation delays but increase DNS query rates, which could lead to higher costs.
    3. Configuration Complexity: Combining DNS and application-level failover requires careful setup and testing to avoid misconfigurations.
    4. Secondary Origin Capacity: Ensure the secondary origin can handle full traffic loads during failover.

    POTD #22 – Longest substring with distinct characters | Geeks For Geeks

    11 January 2025 at 16:44

    Problem Statement

    Geeks For Geeks : https://www.geeksforgeeks.org/problems/longest-distinct-characters-in-string5848/1

    Given a string s, find the length of the longest substring with all distinct characters. 

    
    Input: s = "geeksforgeeks"
    Output: 7
    Explanation: "eksforg" is the longest substring with all distinct characters.
    

    
    Input: s = "abcdefabcbb"
    Output: 6
    Explanation: The longest substring with all distinct characters is "abcdef", which has a length of 6.
    

    My Approach – Sliding Window

    
class Solution:
    def longestUniqueSubstr(self, s):
        char_index = {}  # last index at which each character was seen
        max_length = 0
        start = 0        # left edge of the current distinct-character window
        
        for i, char in enumerate(s):
            # If the character already appears inside the current window,
            # slide the window start just past its previous occurrence.
            if char in char_index and char_index[char] >= start:
                start = char_index[char] + 1
            
            char_index[char] = i
            max_length = max(max_length, i - start + 1)
        
        return max_length
                    
    
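A quick check with the sample inputs from the problem statement:

sol = Solution()
print(sol.longestUniqueSubstr("geeksforgeeks"))  # 7
print(sol.longestUniqueSubstr("abcdefabcbb"))    # 6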
