GitHub Actions is a powerful tool for automating workflows directly in your repository. In this blog, we'll explore how to set up GitHub Actions to build and push Docker images using environments, secrets, and protection rules.
Why Use GitHub Actions for Docker?
My codebase is on GitHub, and I wanted to try GitHub Actions to build images and push them to Docker Hub seamlessly.
Setting Up GitHub Environments
GitHub Environments let you define settings specific to deployment stages. Here's how to configure them:
1. Create an Environment
Go to your GitHub repository and navigate to Settings > Environments. Click New environment, name it (e.g., production), and save.
2. Add Secrets and Variables
Inside the environment settings, click Add secret to store sensitive information like DOCKER_USERNAME and DOCKER_TOKEN.
Use variables for non-sensitive configuration, such as the Docker image name (DOCKER_IMAGE_NAME).
3. Optional: Set Protection Rules
Enforce rules such as requiring manual approval before a deployment runs, or restrict deployments to specific branches (e.g., main).
Sample Workflow for Building and Pushing Docker Images
Below is a GitHub Actions workflow that automates building and pushing a Docker image for a minimal Flask app. The docker build step uses the repository root as its build context, so it expects a Dockerfile there.
Workflow: .github/workflows/docker-build-push.yml
name: Build and Push Docker Image

on:
  push:
    branches:
      - main  # Trigger workflow on pushes to the `main` branch

jobs:
  build-and-push:
    runs-on: ubuntu-latest
    environment: production  # Specify the environment to use
    steps:
      # Checkout the repository
      - name: Checkout code
        uses: actions/checkout@v4

      # Log in to Docker Hub using environment secrets
      - name: Log in to Docker Hub
        uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKER_USERNAME }}
          password: ${{ secrets.DOCKER_TOKEN }}

      # Build the Docker image using an environment variable
      - name: Build Docker image
        env:
          DOCKER_IMAGE_NAME: ${{ vars.DOCKER_IMAGE_NAME }}
        run: |
          docker build -t ${{ secrets.DOCKER_USERNAME }}/$DOCKER_IMAGE_NAME:${{ github.run_id }} .

      # Push the Docker image to Docker Hub
      - name: Push Docker image
        env:
          DOCKER_IMAGE_NAME: ${{ vars.DOCKER_IMAGE_NAME }}
        run: |
          docker push ${{ secrets.DOCKER_USERNAME }}/$DOCKER_IMAGE_NAME:${{ github.run_id }}
I am a big fan of logs. I would like to log everything: every request and response of an API. But is that correct? Logs helped our team greatly this year, but I wanted to know whether there is a better approach to logging. That search led to this blog, in which I jot down my notes on logging. Let's log it.
Throughout this blog, I try to keep things general rather than tied to a particular language, though here and there you can see my bias toward Python. Also, this is my opinion, not a hard rule.
Which is the best logger?
I'm not here to argue about which logger is the best; they all have their problems. But the worst one is usually the one you build yourself. Sure, existing loggers aren't perfect, but trying to create your own is often a much bigger mistake.
1. Why Logging Matters
Logging provides visibility into your application's behavior, helping you:
Diagnose and troubleshoot issues (the most common use case)
Monitor application health and performance (metrics)
Meet compliance and auditing requirements (audit logs)
Enable debugging in production environments (we all do this)
However, poorly designed logging strategies can lead to excessive log volumes, higher costs, and difficulty in pinpointing actionable insights.
2. Logging Best Practices
a. Use Structured Logs
Long story short, instead of unstructured plain text, use JSON or other structured formats. This makes parsing and querying easier, especially in log aggregation tools.
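A minimal sketch using Python's standard logging module: a custom Formatter subclass that writes each record as one JSON object per line. The JsonFormatter name and the "fields" key passed through extra are conventions invented for this example, not part of the library.

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Merge structured fields passed as extra={"fields": {...}}.
        # The "fields" key is a convention of this sketch, not of the logging module.
        payload.update(getattr(record, "fields", {}))
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("orders")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("order placed", extra={"fields": {"order_id": "A-1001", "amount": 42.5}})
```

Because every line is valid JSON, a log aggregator can index order_id as a field instead of searching free text for it.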
b. Use Appropriate Log Levels
Define and adhere to appropriate logging levels to avoid log bloat:
DEBUG: Detailed information for debugging.
INFO: General operational messages.
WARNING: Indications of potential issues.
ERROR: Application errors that require immediate attention.
CRITICAL: Severe errors leading to application failure.
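In Python's standard logging module these names correspond to logging.DEBUG through logging.CRITICAL. The sketch below (logger name and messages are made up) sets a logger to WARNING; the DEBUG and INFO calls return early without building a record, so only the last three messages are printed.

```python
import logging
import sys

# Show only level and message so the filtering is easy to see.
logging.basicConfig(stream=sys.stdout, format="%(levelname)s %(message)s")

log = logging.getLogger("payments")
log.setLevel(logging.WARNING)  # e.g. production; lower it to DEBUG when investigating

log.debug("gateway request payload built")   # dropped: DEBUG < WARNING
log.info("payment flow started")              # dropped: INFO < WARNING
log.warning("gateway latency above 2s")       # emitted
log.error("payment declined for order 42")    # emitted
log.critical("payment service unavailable")   # emitted
```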
c. Avoid Sensitive Data
Sanitize your logs to exclude sensitive information like passwords, PII, or API keys. Instead, mask or hash such data. Don't log tokens, even for testing.
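One way to enforce masking in Python is a logging.Filter attached to the handler, sketched below. The RedactFilter class and its regular expression are hypothetical and only catch secrets written as key=value, so treat this as a safety net, not a replacement for keeping sensitive data out of log calls in the first place.

```python
import logging
import re

# Hypothetical pattern: masks values of password=..., token=..., api_key=... pairs.
# Extend it for the secret formats your system actually handles.
SENSITIVE = re.compile(r"\b(password|token|api_key)=(\S+)", re.IGNORECASE)

class RedactFilter(logging.Filter):
    """Rewrite the fully formatted message with sensitive values masked."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = SENSITIVE.sub(r"\1=****", record.getMessage())
        record.args = ()  # the message is already formatted
        return True  # never drop the record, only rewrite it

handler = logging.StreamHandler()
handler.addFilter(RedactFilter())
logger = logging.getLogger("auth")
logger.addHandler(handler)
logger.warning("login failed user=%s password=%s", "alice", "hunter2")
# Emits: login failed user=alice password=****
```

The filter mutates the record itself, so any handler that sees the record after this one also gets the masked text.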
d. Include Contextual Information
Incorporate metadata like request IDs, user IDs, or transaction IDs to trace specific events effectively.
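In Python, logging.LoggerAdapter is one standard way to bind such IDs once and have them appear on every line. A minimal sketch, assuming a hypothetical checkout logger and made-up IDs; the output goes to an in-memory stream only to keep the example self-contained.

```python
import io
import logging

stream = io.StringIO()  # stand-in for stdout or a log shipper
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter(
    "%(levelname)s request_id=%(request_id)s user_id=%(user_id)s %(message)s"))

base = logging.getLogger("checkout")
base.addHandler(handler)
base.setLevel(logging.INFO)
base.propagate = False  # keep this sketch's output in one place

# Bind the IDs once per request (values are made up);
# every call through `log` carries them as record attributes.
log = logging.LoggerAdapter(base, {"request_id": "req-7f3a", "user_id": 12345})
log.warning("cart total mismatch")
print(stream.getvalue(), end="")
```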
3. Log Ingestion at Scale
As applications scale, log ingestion can become a bottleneck. Here's how to manage it:
a. Centralized Logging
Stream logs to centralized systems such as the ELK stack (Elasticsearch, Logstash, Kibana) or cloud-native services like AWS CloudWatch, Azure Monitor, or Google Cloud Logging.
b. Optimize Log Volume
Log only necessary information.
Use log sampling to reduce verbosity in high-throughput systems.
Rotate logs to limit disk usage.
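Sampling can be sketched as a logging.Filter that always keeps WARNING and above but lets through only a random fraction of lower-level records. SamplingFilter and its rate parameter are illustrative names, not a library API.

```python
import logging
import random
from typing import Optional

class SamplingFilter(logging.Filter):
    """Keep all WARNING+ records but only a random fraction of DEBUG/INFO ones."""

    def __init__(self, rate: float, seed: Optional[int] = None) -> None:
        super().__init__()
        self.rate = rate  # e.g. 0.01 keeps roughly 1% of low-level records
        self._rng = random.Random(seed)

    def filter(self, record: logging.LogRecord) -> bool:
        if record.levelno >= logging.WARNING:
            return True  # never sample away problems
        return self._rng.random() < self.rate

handler = logging.StreamHandler()
handler.addFilter(SamplingFilter(rate=0.01))
logger = logging.getLogger("ingest")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
```

This samples each record independently, so one request's lines may be partly dropped; sampling by request ID instead is one way to keep whole traces together.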
c. Use Asynchronous Logging
Asynchronous loggers improve application performance by delegating logging work to separate threads or processes. (This is not always suitable and has its own problems; for example, messages still waiting in the queue can be lost if the process crashes.)
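Python's standard library ships the building blocks for this: logging.handlers.QueueHandler puts records on a queue, and QueueListener hands them to the real handlers on a background thread. A minimal sketch, with a plain StreamHandler standing in for a slow file or network handler:

```python
import logging
import logging.handlers
import queue

log_queue = queue.Queue(-1)  # unbounded

# The potentially slow handler (a StreamHandler here; often a file or network
# handler in practice) runs on the listener's background thread.
slow_handler = logging.StreamHandler()
listener = logging.handlers.QueueListener(log_queue, slow_handler)

logger = logging.getLogger("api")
logger.setLevel(logging.INFO)
logger.addHandler(logging.handlers.QueueHandler(log_queue))  # only enqueues the record
logger.propagate = False

listener.start()
try:
    logger.info("request handled in %d ms", 12)
finally:
    # Processes records still in the queue, then stops the thread.
    listener.stop()
```

Calling listener.stop() on shutdown is what drains the queue; records still queued when a process dies abruptly are one of the trade-offs of asynchronous logging.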
d. Method return values are usually important
If a method logs something but the log doesn't include the method's return value, you're missing important information. Make an effort to include it, even at the expense of slightly less elegant-looking code.
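A small Python sketch of the idea, using a hypothetical apply_discount function: the result is stored in a variable so it can be logged before it is returned.

```python
import logging

logger = logging.getLogger(__name__)

def apply_discount(order_total: float, code: str) -> float:
    # Hypothetical business rule, used only to illustrate the logging pattern.
    discounted = order_total * 0.9 if code == "SAVE10" else order_total
    # Log the value being returned, not just the fact that we got here.
    logger.info("apply_discount code=%s total=%.2f -> %.2f", code, order_total, discounted)
    return discounted
```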
e. Include filename in error messages
Mention the path/to/file:line-number to pinpoint the location of the issue.
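With Python's logging module you don't have to type the location into each message: the %(pathname)s and %(lineno)d format attributes record where the logging call was made. A minimal sketch (the billing logger is made up):

```python
import logging
import sys

handler = logging.StreamHandler(sys.stdout)
# pathname/lineno point at the logging call site, so errors can be located directly.
handler.setFormatter(logging.Formatter(
    "%(asctime)s %(levelname)s %(pathname)s:%(lineno)d %(message)s"))

logger = logging.getLogger("billing")
logger.addHandler(handler)
logger.error("invoice total is negative: %s", -5)
```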
3. Logging Don'ts
a. Don't Log Everything at the Same Level
Logging all messages at the INFO or DEBUG level creates noise and makes it difficult to identify critical issues.
b. Don't Hardcode Log Messages
Avoid static, vague, or generic log messages. Use dynamic and descriptive messages that include relevant context.
# Bad Example
Error occurred.
# Good Example
Error occurred while processing payment for user_id=12345, transaction_id=abc-6789.
c. Don't Log Sensitive or Regulated Data
Exposing personally identifiable information (PII), passwords, or other sensitive data in logs can lead to compliance violations (e.g., GDPR, HIPAA).
d. Don't Ignore Log Rotation
Failing to implement log rotation can exhaust disk space, especially in high-traffic systems. Pair rotation with a log retention policy that decides how long old logs are kept.
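In Python, logging.handlers.RotatingFileHandler rotates by size. The sketch below uses a temporary directory and a made-up app logger; in a real service you would point it at your log directory. TimedRotatingFileHandler in the same module rotates by time instead.

```python
import logging
import logging.handlers
import os
import tempfile

log_path = os.path.join(tempfile.mkdtemp(), "app.log")  # use your real log path in practice

# Roll over when app.log reaches ~10 MB, keeping app.log.1 ... app.log.5;
# older files are deleted, which caps disk usage at roughly 60 MB.
handler = logging.handlers.RotatingFileHandler(
    log_path, maxBytes=10 * 1024 * 1024, backupCount=5, encoding="utf-8")

logger = logging.getLogger("app")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.info("service started")
handler.close()
```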
e. Don't Overlook Log Correlation
Logs without request IDs, session IDs, or contextual metadata make it difficult to correlate related events.
f. Don't Forget to Monitor Log Costs
Logging everything without considering storage and processing costs can lead to financial inefficiency in large-scale systems.
g. Keep log messages short
Long, verbose messages have a cost, both in reading time and in ingestion time.
h. Never log inside a loop
This might seem obvious, but to be clear: logging inside a loop, even at a level that isn't enabled by default, can still hurt performance. It's best to avoid this whenever possible.
If you absolutely need to log something at a hidden level and decide to break this guideline, keep it short and straightforward.
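A sketch of the usual alternative: count or collect inside the loop and log one summary line afterwards. import_rows and its validation rule are hypothetical.

```python
import logging

logger = logging.getLogger(__name__)

def import_rows(rows: list) -> int:
    """Hypothetical importer: returns how many rows were skipped."""
    skipped = 0
    for row in rows:
        if "id" not in row:
            skipped += 1  # count here instead of logging once per bad row
            continue
        # ... real processing of the row would go here ...
    # One summary line after the loop.
    logger.info("import finished: total=%d skipped=%d", len(rows), skipped)
    return skipped
```

If you do keep a DEBUG line inside a loop, guarding it with logger.isEnabledFor(logging.DEBUG) at least skips building its arguments when DEBUG is off.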
i. Log items you already "have"
We should avoid this:
logger.info("Reached X and value of method is {}", method());
Here, we call method() again just for logging. Even if the method is cheap, it runs regardless of the logging level, because its result is computed before the logger checks whether INFO is enabled!
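The example above uses Java-style {} placeholders, but the same trap exists in Python: %s-style arguments defer string formatting, not the evaluation of the arguments themselves. A sketch, with compute_total as a hypothetical stand-in for method():

```python
import logging

logger = logging.getLogger(__name__)

def compute_total() -> int:
    # Hypothetical stand-in for method().
    return sum(range(10))

# Avoid: compute_total() runs even when INFO is disabled,
# because arguments are evaluated before the logger checks the level.
logger.info("Reached X and total is %s", compute_total())

# Prefer: log the value you already have.
total = compute_total()
logger.info("Reached X and total is %s", total)

# If a value exists only for logging and is expensive, guard the call.
if logger.isEnabledFor(logging.DEBUG):
    logger.debug("Debug snapshot: %s", compute_total())
```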
j. Don't log iterables
Avoid it even for a small list, because the list might grow and "overcrowd" the log. Writing the contents of a list to the log can balloon it and noticeably slow processing. It also wastes time during debugging.
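If you do need to say something about a collection, one option is to log its size and a short preview. summarize below is a hypothetical helper, not part of the logging module.

```python
import logging

logger = logging.getLogger(__name__)

def summarize(items, limit: int = 3) -> str:
    """Describe a collection by its size and a short preview instead of its full contents."""
    items = list(items)
    preview = items[:limit]
    more = len(items) - len(preview)
    suffix = f" (+{more} more)" if more > 0 else ""
    return f"count={len(items)} preview={preview}{suffix}"

user_ids = list(range(1000, 1500))
logger.info("notifying users: %s", summarize(user_ids))
```

For example, summarize(range(1000, 1500)) returns "count=500 preview=[1000, 1001, 1002] (+497 more)", which stays the same length no matter how large the list grows.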
k. Don't Log What the Framework Logs for You
Some things are great to log, e.g. the name of the current thread or the time. But almost every logging framework already writes those into the log by default. Don't duplicate that effort.
l. Don't Log Method Entry/Exit
Log only important events in the system. Entering or exiting a method isn't an important event. E.g. if I have a method that enables feature X, the log should be "Feature X enabled" and not "enable_feature_X entered". I have done this a lot.
m. Don't fill the method with logs
A complex method might include multiple points of failure, so it makes sense to place logs at multiple points in the method to detect the failure along the way. Unfortunately, this leads to duplicate logging and verbosity.
Errors typically map to error-handling code, which should log them generically, so all error conditions should already be covered.
This sometimes means changing the flow or behavior of the code so that logging stays elegant.
n. Donβt use AOP logging
AOP (Aspect-Oriented Programming) logging allows you to automatically add logs at specific points in your application, such as when methods are entered or exited.
In Python, AOP-style logging can be implemented using decorators or middleware that inject logs into specific points, such as method entry and exit. While it might seem appealing for detailed tracing, the same problems apply as in other languages like Java.
import functools
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def log_method_entry_exit(func):
    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        logger.info(f"Entering: {func.__name__} with args={args} kwargs={kwargs}")
        result = func(*args, **kwargs)
        logger.info(f"Exiting: {func.__name__} with result={result}")
        return result
    return wrapper

# Example usage
@log_method_entry_exit
def example_function(x, y):
    return x + y

example_function(5, 3)
Why Avoid AOP Logging in Python
Performance Impact:
Injecting logs into every method increases runtime overhead, especially if used extensively in large-scale systems.
In Python, where function calls already add some overhead, this can significantly affect performance.
Log Verbosity:
If this decorator is applied to every function or method in a system, it produces an enormous amount of log data.
Debugging becomes harder because the meaningful logs are lost in the noise of entry/exit logs.
Limited Usefulness:
During local development, Python debuggers (pdb), profilers (cProfile, line_profiler), or the standard-library trace module are far more effective for inspecting function behavior and performance.
CI Issues:
Enabling such verbose logging during CI test runs can make tracking test failures more difficult because the logs are flooded with entry/exit messages, obscuring the root cause of failures.
Use Python-specific tools like pdb, ipdb, or IDE-integrated debuggers to inspect code locally.
o. Don't double log
It's pretty common to log an error right before raising it. However, since most error-handling code is generic, it's likely that the generic handler already logs it, so the same error ends up in the log twice.
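A sketch of the alternative in Python: the code that detects the problem only raises, and one generic handler logs it once with logger.exception, which includes the traceback. PaymentError, charge and handle_request are hypothetical names.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(name)s %(message)s")
logger = logging.getLogger("orders")

class PaymentError(Exception):
    """Hypothetical domain error for this sketch."""

def charge(order_id: str, amount: float) -> None:
    if amount <= 0:
        # No logger.error() here: raise with enough context for the handler to log it.
        raise PaymentError(f"invalid amount {amount} for order_id={order_id}")

def handle_request(order_id: str, amount: float) -> bool:
    try:
        charge(order_id, amount)
        return True
    except PaymentError:
        # The single, generic place where errors are logged, with the traceback.
        logger.exception("request failed")
        return False
```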
4. Ensuring Scalability
To keep your logging system robust and scalable:
Monitor Log Storage: Set alerts for log storage thresholds.
Implement Compression: Compress log files to reduce storage costs.
Automate Archival and Deletion: Regularly archive old logs and purge obsolete data.
Benchmark Logging Overhead: Measure the performance impact of logging on your application.
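For compression, Python's rotating file handlers expose namer and rotator hooks, and a pattern along the lines of the one in the Python logging cookbook uses them to gzip each file as it is rotated out. A minimal sketch, writing to a temporary directory that stands in for your log directory:

```python
import gzip
import logging
import logging.handlers
import os
import shutil
import tempfile

def namer(default_name: str) -> str:
    # app.log.1 -> app.log.1.gz
    return default_name + ".gz"

def rotator(source: str, dest: str) -> None:
    # Compress the file being rotated out, then remove the uncompressed copy.
    with open(source, "rb") as f_in, gzip.open(dest, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    os.remove(source)

log_dir = tempfile.mkdtemp()  # stand-in for your real log directory
handler = logging.handlers.RotatingFileHandler(
    os.path.join(log_dir, "app.log"), maxBytes=1024 * 1024, backupCount=5)
handler.namer = namer      # backups become app.log.1.gz ... app.log.5.gz
handler.rotator = rotator  # called instead of a plain rename on rollover
```

Only the active file stays uncompressed; compression happens on the logging thread during rollover, which is worth measuring alongside other logging overhead.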
5. Logging for Metrics
Below is the list of items I wish to log for metrics.
General API Metrics
HTTP methods, status codes, latency/duration, and request size.
Total requests per endpoint over time, and requests per minute/hour.
System metrics on CPU and memory usage during request processing (these are usually captured automatically).
Usage Metrics
Traffic analysis on peak usage times.
Most/Least used endpoints.
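Most of these API metrics can come from one structured log line per request. A minimal sketch, assuming a hypothetical request_metrics context manager wrapped around each request handler; counts per endpoint, requests per minute, peak times, and most/least used endpoints would then be aggregated by the log backend.

```python
import json
import logging
import time
from contextlib import contextmanager

metrics_logger = logging.getLogger("metrics")  # attach a handler that ships to your backend
metrics_logger.setLevel(logging.INFO)

@contextmanager
def request_metrics(method: str, endpoint: str, request_size: int):
    """Emit one JSON line per request with method, endpoint, status, duration and size."""
    entry = {"method": method, "endpoint": endpoint, "request_size": request_size}
    start = time.perf_counter()
    try:
        yield entry  # the request handler fills in entry["status"]
    finally:
        entry["duration_ms"] = round((time.perf_counter() - start) * 1000, 2)
        entry.setdefault("status", 500)  # assumption: no status set means the request failed
        metrics_logger.info(json.dumps(entry))

# Usage inside a (hypothetical) request handler:
with request_metrics("GET", "/orders", request_size=0) as m:
    m["status"] = 200
```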
6. Mapped Diagnostic Context (MDC)
MDC is the one I longed for most. I also ran into trouble by implementing it without a middleware.
Mapped Diagnostic Context (MDC) is a feature provided by many logging frameworks, such as Logback, Log4j, and SLF4J. It allows developers to attach contextual information (key-value pairs) to the logging events, which can then be automatically included in log messages.
This context helps in differentiating and correlating log messages, especially in multi-threaded applications.
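Python's logging module has no MDC by that name, but a contextvars.ContextVar plus a logging.Filter gives a similar effect, and ContextVar values are isolated per thread and per asyncio task. A minimal sketch; request_id_var, ContextFilter and handle_request are hypothetical names, and in a web app the set/reset pair would live in middleware.

```python
import contextvars
import logging
import uuid

# Hypothetical "MDC" slot: one value per thread / asyncio task.
request_id_var = contextvars.ContextVar("request_id", default="-")

class ContextFilter(logging.Filter):
    """Copy the current request ID onto every record so formatters can use it."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.request_id = request_id_var.get()
        return True

handler = logging.StreamHandler()
handler.addFilter(ContextFilter())
handler.setFormatter(logging.Formatter("%(levelname)s [%(request_id)s] %(message)s"))
logger = logging.getLogger("web")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

def handle_request() -> None:
    # Middleware would do this once per incoming request.
    token = request_id_var.set(uuid.uuid4().hex[:8])
    try:
        logger.info("loading cart")  # [<id>] is added by the filter
        logger.info("cart loaded")
    finally:
        request_id_var.reset(token)  # don't leak the ID into the next request

handle_request()
```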
Why Use MDC?
Enhanced Log Clarity: By adding contextual information like user IDs, session IDs, or transaction IDs, MDC enables logs to provide more meaningful insights.
Easier Debugging: When logs contain thread-specific context, tracing the execution path of a specific transaction or user request becomes straightforward.
Reduced Log Ambiguity: MDC ensures that logs from different threads or components do not get mixed up, avoiding confusion.
Common Use Cases
Web Applications: Logging user sessions, request IDs, or IP addresses to trace the lifecycle of a request.
Microservices: Propagating correlation IDs across services for distributed tracing.
Background Tasks: Tracking specific jobs or tasks in asynchronous operations.
Limitations (curated from other blogs; I haven't tried these yet)
Thread Boundaries: MDC is thread-local, so its context does not automatically propagate across threads (e.g., in asynchronous executions). For such scenarios, you may need to manually propagate the MDC context.
Overhead: Adding and managing MDC context introduces a small runtime overhead, especially in high-throughput systems.
Configuration Dependency: Proper MDC usage often depends on correctly configuring the logging framework.
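In Python, the same limitation applies to contextvars: asyncio tasks copy the current context automatically, but on the default CPython build a new thread (including a ThreadPoolExecutor worker) starts with an empty context. One manual workaround is to copy the caller's context and run the job inside it; background_job and submit_with_context below are hypothetical names.

```python
import contextvars
from concurrent.futures import ThreadPoolExecutor

request_id_var = contextvars.ContextVar("request_id", default="-")

def background_job() -> str:
    # Runs on a worker thread; returns whatever request ID it can see.
    return request_id_var.get()

def submit_with_context(executor: ThreadPoolExecutor, fn, *args):
    """Run fn on a worker thread inside a copy of the caller's context."""
    ctx = contextvars.copy_context()
    return executor.submit(ctx.run, fn, *args)

request_id_var.set("req-42")
with ThreadPoolExecutor(max_workers=1) as pool:
    plain = pool.submit(background_job).result()  # worker sees its own (empty) context
    propagated = submit_with_context(pool, background_job).result()  # sees "req-42"

print(plain, propagated)
```

Each submission gets its own copy because a single Context object cannot be entered by two threads at once.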