Learning Notes #66 – What is SBOM ? Software Bill of Materials

26 January 2025 at 09:16

Yesterday, I came to know about SBOM from my friend Prasanth Baskar. Let’s say you’re building a website.

You decide to use a popular open-source tool to handle user logins. Here’s the catch,

  • That library uses another library to store data.
  • That tool depends on another library to handle passwords.

Now, if one of those libraries has a bug or security issue, how do you even know it’s there? In this blog, I will jot down my understanding of SBOM with Trivy.

What is an SBOM?

A Software Bill of Materials (SBOM) is a list of everything that makes up a piece of software.

Think of it as,

  • A shopping list for all the tools, libraries, and pieces used to build the software.
  • A recipe card showing what’s inside and how it’s structured.

For software, this means,

  • Components: These are the “ingredients,” such as open-source libraries, frameworks, and tools.
  • Versions: Just like you might want to know if the cake uses almond flour or regular flour, knowing the version of a software component matters.
  • Licenses: Did the baker follow the rules for the ingredients they used? Software components also come with licenses that dictate how they can be used.

So Why is it Important?

1. Understanding What You’re Using

When you download or use software, especially something complex, you often don’t know what’s inside. An SBOM helps you understand what components are being used. Are they secure? Are they trustworthy?

2. Finding Problems Faster

If someone discovers that a specific ingredient is bad, like flour with bacteria in it, you’d want to know if that’s in your cake. Similarly, if a software library has a security issue, an SBOM helps you figure out if your software is affected and needs fixing.

For example,

When the Log4j vulnerability made headlines, companies that had SBOMs could quickly identify whether they used Log4j and take action.

3. Building Trust

Imagine buying food without a label or list of ingredients.

You’d feel doubtful, right? Similarly, an SBOM builds trust by showing users exactly what’s in the software they’re using.

4. Avoiding Legal Trouble

Some software components come with specific rules or licenses about how they can be used. An SBOM ensures these rules are followed, avoiding potential legal headaches.

How to Create an SBOM?

For many developers, creating an SBOM manually would be impossible because modern software can have hundreds (or even thousands!) of components.

Thankfully, there are tools that automatically create SBOMs. Examples include,

  • Trivy: A lightweight tool to generate SBOMs and find vulnerabilities.
  • CycloneDX: A popular SBOM format supported by many tools https://cyclonedx.org/
  • SPDX: Another format designed to make sharing SBOMs easier https://spdx.dev/

These tools can scan your software and automatically list out every component, its version, and its dependencies.

We will see an example of generating an SBOM file for nginx using Trivy.

How Trivy Works

On running a Trivy scan,

1. It downloads the Trivy DB, including vulnerability information.

2. It pulls any missing image layers into the cache.

3. It analyzes the layers and stores the information in the cache.

4. It detects security issues and writes them to the SBOM file.

Note: a CVE refers to a Common Vulnerabilities and Exposures identifier. A CVE is a unique code used to catalog and track publicly known security vulnerabilities and exposures in software or systems.

How to Generate SBOMs with Trivy

Step 1: Install Trivy in Ubuntu

sudo apt-get install wget gnupg
wget -qO - https://aquasecurity.github.io/trivy-repo/deb/public.key | gpg --dearmor | sudo tee /usr/share/keyrings/trivy.gpg > /dev/null
echo "deb [signed-by=/usr/share/keyrings/trivy.gpg] https://aquasecurity.github.io/trivy-repo/deb generic main" | sudo tee -a /etc/apt/sources.list.d/trivy.list
sudo apt-get update
sudo apt-get install trivy

More on Installation: https://github.com/aquasecurity/trivy/blob/main/docs/getting-started/installation.md

Step 2: Generate an SBOM

Trivy allows you to create SBOMs in formats like CycloneDX or SPDX.

trivy image --format cyclonedx --output sbom.json nginx:latest

This generates the SBOM file (sbom.json) for the nginx:latest image.

SBOM generation can also be incorporated into GitHub CI/CD pipelines.
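
To get a feel for what the generated SBOM contains, here is a minimal Python sketch that reads the CycloneDX JSON produced above and prints each component with its version. Field names follow the CycloneDX JSON schema, but exact fields can vary, so treat this as illustrative rather than a reference parser.

import json

# Load the SBOM produced by: trivy image --format cyclonedx --output sbom.json nginx:latest
with open("sbom.json") as f:
    sbom = json.load(f)

# Each entry in "components" describes one package baked into the image
for component in sbom.get("components", []):
    name = component.get("name", "unknown")
    version = component.get("version", "unknown")
    licenses = [
        entry.get("license", {}).get("id") or entry.get("license", {}).get("name", "?")
        for entry in component.get("licenses", [])
    ]
    print(f"{name} {version} licenses={licenses or 'n/a'}")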

Learning Notes #52 – Hybrid Origin Failover Pattern

12 January 2025 at 06:29

Today, I learnt about failover patterns from AWS https://aws.amazon.com/blogs/networking-and-content-delivery/three-advanced-design-patterns-for-high-available-applications-using-amazon-cloudfront/ . In this blog, I jot down my understanding of this pattern for future reference.

Hybrid origin failover is a strategy that combines two distinct approaches to handle origin failures effectively, balancing speed and resilience.

The Need for Origin Failover

When an application’s primary origin server becomes unavailable, the ability to reroute traffic to a secondary origin ensures continuity. The failover process determines how quickly and effectively this switch happens. Broadly, there are two approaches to implement origin failover:

  1. Stateful Failover with DNS-based Routing
  2. Stateless Failover with Application Logic

Each has its strengths and limitations, which the hybrid approach aims to mitigate.

Approach 1: Stateful Failover with DNS-based Routing

Stateful failover is a system that allows a standby server to take over for a failed server and continue active sessions. It’s used to create a resilient network infrastructure and avoid service interruptions.

This method relies on a DNS service with health checks to detect when the primary origin is unavailable. Here’s how it works,

  1. Health Checks: The DNS service continuously monitors the health of the primary origin using health checks (e.g., HTTP, HTTPS).
  2. DNS Failover: When the primary origin is marked unhealthy, the DNS service resolves the origin’s domain name to the secondary origin’s IP address.
  3. TTL Impact: The failover process honors the DNS Time-to-Live (TTL) settings. A low TTL ensures faster propagation, but even in the most optimal configurations, this process introduces a delay, often around 60 to 70 seconds.
  4. Stateful Behavior: Once failover occurs, all traffic is routed to the secondary origin until the primary origin is marked healthy again.
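
The DNS side of this approach can be set up programmatically. Below is a minimal boto3 sketch (not the AWS blog’s code) that creates a health check for a hypothetical primary origin and registers PRIMARY/SECONDARY failover records behind a single origin domain name; the hosted zone ID, domain names, and IP addresses are placeholders.

import boto3

route53 = boto3.client("route53")

# 1. Health check that probes the (hypothetical) primary origin
health_check = route53.create_health_check(
    CallerReference="primary-origin-hc-1",
    HealthCheckConfig={
        "Type": "HTTPS",
        "FullyQualifiedDomainName": "primary-origin.example.com",
        "ResourcePath": "/health",
        "RequestInterval": 10,   # fast health-check interval
        "FailureThreshold": 1,
    },
)
health_check_id = health_check["HealthCheck"]["Id"]

# 2. PRIMARY / SECONDARY failover records behind one origin domain name
def failover_change(ip, role, hc_id=None):
    record = {
        "Name": "origin.example.com",
        "Type": "A",
        "SetIdentifier": role.lower(),
        "Failover": role,               # "PRIMARY" or "SECONDARY"
        "TTL": 60,                      # low TTL for faster switchover
        "ResourceRecords": [{"Value": ip}],
    }
    if hc_id:
        record["HealthCheckId"] = hc_id
    return {"Action": "UPSERT", "ResourceRecordSet": record}

route53.change_resource_record_sets(
    HostedZoneId="Z0000000EXAMPLE",     # placeholder hosted zone
    ChangeBatch={
        "Changes": [
            failover_change("203.0.113.10", "PRIMARY", health_check_id),
            failover_change("203.0.113.20", "SECONDARY"),
        ]
    },
)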

Implementation from AWS (as-is from aws blog)

The first approach is using Amazon Route 53 Failover routing policy with health checks on the origin domain name that’s configured as the origin in CloudFront. When the primary origin becomes unhealthy, Route 53 detects it, and then starts resolving the origin domain name with the IP address of the secondary origin. CloudFront honors the origin DNS TTL, which means that traffic will start flowing to the secondary origin within the DNS TTLs. The most optimal configuration (Fast Check activated, a failover threshold of 1, and 60 second DNS TTL) means that the failover will take 70 seconds at minimum to occur. When it does, all of the traffic is switched to the secondary origin, since it’s a stateful failover. Note that this design can be further extended with Route 53 Application Recovery Control for more sophisticated application failover across multiple AWS Regions, Availability Zones, and on-premises.

The second approach is using origin failover, a native feature of CloudFront. This capability of CloudFront tries for the primary origin of every request, and if a configured 4xx or 5xx error is received, then CloudFront attempts a retry with the secondary origin. This approach is simple to configure and provides immediate failover. However, it’s stateless, which means every request must fail independently, thus introducing latency to failed requests. For transient origin issues, this additional latency is an acceptable tradeoff with the speed of failover, but it’s not ideal when the origin is completely out of service. Finally, this approach only works for the GET/HEAD/OPTIONS HTTP methods, because other HTTP methods are not allowed on a CloudFront cache behavior with Origin Failover enabled.

Advantages

  • Works for all HTTP methods and request types.
  • Ensures complete switchover, minimizing ongoing failures.

Disadvantages

  • Relatively slower failover due to DNS propagation time.
  • Requires a reliable health-check mechanism.

Approach 2: Stateless Failover with Application Logic

This method handles failover at the application level. If a request to the primary origin fails (e.g., due to a 4xx or 5xx HTTP response), the application or CDN immediately retries the request with the secondary origin.

How It Works

  1. Primary Request: The application sends a request to the primary origin.
  2. Failure Handling: If the response indicates a failure (configurable for specific error codes), the request is retried with the secondary origin.
  3. Stateless Behavior: Each request operates independently, so failover happens on a per-request basis without waiting for a stateful switchover.
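
This per-request behaviour is easy to picture in code. Here is a minimal Python sketch using the requests library; the origin URLs and the set of retryable status codes are assumptions for illustration, not part of any particular CDN’s configuration.

import requests

PRIMARY_ORIGIN = "https://primary-origin.example.com"
SECONDARY_ORIGIN = "https://secondary-origin.example.com"
RETRYABLE_STATUS = {500, 502, 503, 504}  # configurable "failure" status codes

def fetch(path):
    """Try the primary origin; retry once against the secondary on failure."""
    try:
        response = requests.get(PRIMARY_ORIGIN + path, timeout=3)
        if response.status_code not in RETRYABLE_STATUS:
            return response
    except requests.RequestException:
        pass  # connection errors and timeouts also trigger the retry
    # Stateless failover: every failed request independently retries the secondary
    return requests.get(SECONDARY_ORIGIN + path, timeout=3)

print(fetch("/index.html").status_code)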

Implementation from AWS (as-is from aws blog)

The hybrid origin failover pattern combines both approaches to get the best of both worlds. First, you configure both of your origins with a Failover Policy in Route 53 behind a single origin domain name. Then, you configure an origin failover group with the single origin domain name as primary origin, and the secondary origin domain name as secondary origin. This means that when the primary origin becomes unavailable, requests are immediately retried with the secondary origin until the stateful failover of Route 53 kicks in within tens of seconds, after which requests go directly to the secondary origin without any latency penalty. Note that this pattern only works with the GET/HEAD/OPTIONS HTTP methods.

Advantages

  • Near-instantaneous failover for failed requests.
  • Simple to configure and doesn’t depend on DNS TTL.

Disadvantages

  • Adds latency for failed requests due to retries.
  • Limited to specific HTTP methods like GET, HEAD, and OPTIONS.
  • Not suitable for scenarios where the primary origin is entirely down, as every request must fail first.

The Hybrid Origin Failover Pattern

The hybrid origin failover pattern combines the strengths of both approaches, mitigating their individual limitations. Here’s how it works:

  1. DNS-based Stateful Failover: A DNS service with health checks monitors the primary origin and switches to the secondary origin if the primary becomes unhealthy. This ensures a complete and stateful failover within tens of seconds.
  2. Application-level Stateless Failover: Simultaneously, the application or CDN is configured to retry failed requests with a secondary origin. This provides an immediate failover mechanism for transient or initial failures.

Implementation Steps

  1. DNS Configuration
    • Set up health checks on the primary origin.
    • Define a failover policy in the DNS service, which resolves the origin domain name to the secondary origin when the primary is unhealthy.
  2. Application Configuration
    • Configure the application or CDN to use an origin failover group.
    • Specify the primary origin domain as the primary origin and the secondary origin domain as the backup.

Behavior

  • Initially, if the primary origin encounters issues, requests are retried immediately with the secondary origin.
  • Meanwhile, the DNS failover switches all traffic to the secondary origin within tens of seconds, eliminating retry latencies for subsequent requests.

Benefits of Hybrid Origin Failover

  1. Faster Failover: Immediate retries for failed requests minimize initial impact, while DNS failover ensures long-term stability.
  2. Reduced Latency: After DNS failover, subsequent requests don’t experience retry delays.
  3. High Resilience: Combines stateful and stateless failover for robust redundancy.
  4. Simplicity and Scalability: Leverages existing DNS and application/CDN features without complex configurations.

Limitations and Considerations

  1. HTTP Method Constraints: Stateless failover works only for GET, HEAD, and OPTIONS methods, limiting its use for POST or PUT requests.
  2. TTL Impact: Low TTLs reduce propagation delays but increase DNS query rates, which could lead to higher costs.
  3. Configuration Complexity: Combining DNS and application-level failover requires careful setup and testing to avoid misconfigurations.
  4. Secondary Origin Capacity: Ensure the secondary origin can handle full traffic loads during failover.

Learning Notes #25 – Valet Key Pattern | Cloud Patterns

1 January 2025 at 17:20

Today, I learnt about the Valet Key Pattern, which lets clients access resources directly using a token, without routing every request through the server. In this blog, I jot down notes on the valet key pattern for better understanding.

The Valet Key Pattern is a security design pattern used to provide limited access to a resource or service without exposing full access credentials or permissions. It is akin to a physical valet key for a car, which allows the valet to drive the car without accessing the trunk or glove box. This pattern is widely employed in distributed systems, cloud services, and API design to ensure secure and controlled resource sharing.

Why Use the Valet Key Pattern?

Modern systems often require sharing access to specific resources while minimizing security risks. For instance:

  • A mobile app needs to upload files to a storage bucket but shouldn’t manage the entire bucket.
  • A third-party service requires temporary access to a user’s resource, such as a document or media file.
  • A system needs to allow time-bound or operation-restricted access to sensitive data.

In these scenarios, the Valet Key Pattern provides a practical solution by issuing a scoped, temporary, and revocable token (valet key) that grants specific permissions.

Core Principles of the Valet Key Pattern

  1. Scoped Access: The valet key grants access only to specific resources or operations.
  2. Time-Limited: The access token is typically valid for a limited duration to minimize exposure.
  3. Revocable: The issuing entity can revoke the token if necessary.
  4. Minimal Permissions: Permissions are restricted to the least privilege required to perform the intended task.

How the Valet Key Pattern Works

1. Resource Owner Issues a Valet Key

The resource owner (or controlling entity) generates a token with limited permissions. This token is often a signed JSON Web Token (JWT) or a pre-signed URL in the case of cloud storage.

2. Token Delivery to the Client

The token is securely delivered to the client or third-party application requiring access. For instance, the token might be sent via HTTPS or embedded in an API response.

3. Client Uses the Valet Key

The client includes the token in subsequent requests to access the resource. The resource server validates the token, checks its permissions, and allows or denies the requested operation accordingly.

4. Expiry or Revocation

Once the token expires or is revoked, it becomes invalid, ensuring the client can no longer access the resource.

Examples of the Valet Key Pattern in Action

1. Cloud Storage (Pre-signed URLs)

Amazon S3, Google Cloud Storage, and Azure Blob Storage allow generating pre-signed URLs that enable temporary, scoped access to specific files. For example, a user can upload a file using a URL valid for 15 minutes without needing direct access credentials.
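
As an illustration, here is a minimal boto3 sketch that issues a pre-signed upload URL for Amazon S3, valid for 15 minutes. The bucket name and object key are placeholders.

import boto3

s3 = boto3.client("s3")

# Valet key: a scoped, time-limited URL for one object and one operation
upload_url = s3.generate_presigned_url(
    ClientMethod="put_object",
    Params={"Bucket": "example-user-uploads", "Key": "reports/2025/report.pdf"},
    ExpiresIn=900,  # 15 minutes
)

print(upload_url)
# The client can now upload to this URL without ever seeing AWS credentials,
# e.g. requests.put(upload_url, data=open("report.pdf", "rb"))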

2. API Design

APIs often issue temporary access tokens for limited operations. OAuth 2.0 tokens, for instance, can be scoped to allow access to specific endpoints or resources.

3. Media Sharing Platforms

Platforms like YouTube or Dropbox use the Valet Key Pattern to provide limited access to files. A shareable link often embeds permissions and expiration details.

Implementation Steps

1. Define Permissions Scope

Identify the specific operations or resources the token should allow. Use the principle of least privilege to limit permissions.

2. Generate Secure Tokens

Create tokens with cryptographic signing to ensure authenticity. Include metadata such as:

  • Resource identifiers
  • Permissions
  • Expiry time
  • Issuer information

3. Validate Tokens

The resource server must validate incoming tokens by checking the signature, expiration, and permissions.
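
Here is a minimal sketch of token generation and validation using the PyJWT library; the secret, claim names, and scope values are assumptions for illustration.

import datetime
import jwt  # PyJWT

SECRET = "replace-with-a-strong-signing-key"

def issue_valet_key(resource_id, permissions):
    """Issue a scoped, time-limited, signed token (the valet key)."""
    claims = {
        "sub": resource_id,               # resource identifier
        "scope": permissions,             # e.g. ["read"]
        "iss": "resource-owner-service",  # issuer information
        "exp": datetime.datetime.now(datetime.timezone.utc)
               + datetime.timedelta(minutes=15),  # expiry time
    }
    return jwt.encode(claims, SECRET, algorithm="HS256")

def validate_valet_key(token, required_scope):
    """Resource server side: check signature, expiry, and permissions."""
    claims = jwt.decode(token, SECRET, algorithms=["HS256"])  # raises on tampering/expiry
    if required_scope not in claims["scope"]:
        raise PermissionError("token does not grant this operation")
    return claims

token = issue_valet_key("document-42", ["read"])
print(validate_valet_key(token, "read"))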

4. Monitor and Revoke

Maintain a mechanism to monitor token usage and revoke them if misuse is detected.

Best Practices

  1. Use HTTPS: Always transmit tokens over secure channels to prevent interception.
  2. Minimize Token Lifetime: Short-lived tokens reduce the risk of misuse.
  3. Implement Auditing: Log token usage for monitoring and troubleshooting.
  4. Employ Secure Signing: Use robust cryptographic algorithms to sign tokens and prevent tampering.

Challenges

  • Token Management: Requires robust infrastructure for token generation, validation, and revocation.
  • Revocation Delays: Invalidation mechanisms may not instantly propagate in distributed systems.

Learning Notes #6 Bloom Filters – A Probabilistic Data Structure

23 December 2024 at 14:24

I came across Bloom filters when I wanted to implement a username-availability check, like the one on Instagram. Today I came back to refresh my memory of Bloom filters and note it down for my future self.

What is a Bloom Filter ?

A Bloom filter is a space-efficient, probabilistic data structure designed to test whether an element is part of a set. It can return two types of results

  • True: The element is probably in the set.
  • False: The element is definitely not in the set.

Notably, Bloom filters do not store the actual elements themselves, and there is a chance of false positives, but never false negatives.

If it says a given word is not present, then we can be 100% sure about it. This guarantee is the main benefit we get out of Bloom filters.

But setting up a Bloom filter correctly is not an easy task, as you will soon see.

How Does a Bloom Filter Work?

A Bloom filter uses a bit array of size m and k independent hash functions. Here’s how it operates,

  1. Adding an Element
    • Compute a hash value for the element with each of the k hash functions.
    • Map these hash values to positions in the bit array.
    • Set the corresponding bits to 1.
  2. Querying an Element
    • Compute a hash value for the element with each of the k hash functions.
    • Check the corresponding bits in the bit array.
    • If all bits are 1, the element is probably in the set. If any bit is 0, the element is definitely not in the set.

As you can imagine, when we keep adding elements to the array (especially if the array is small), the percentage of false positives will increase. On the other hand, choosing the correct number of hash functions also matters.
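
To make this concrete, here is a minimal Bloom filter sketch in Python. It derives the k bit positions per item with double hashing over SHA-256 and MD5 digests; the hashing scheme and the demo sizes are illustrative choices, not a production design.

import hashlib

class BloomFilter:
    def __init__(self, size_bits, num_hashes):
        self.m = size_bits
        self.k = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for simplicity

    def _positions(self, item):
        # Double hashing: position_i = (h1 + i * h2) mod m
        h1 = int(hashlib.sha256(item.encode()).hexdigest(), 16)
        h2 = int(hashlib.md5(item.encode()).hexdigest(), 16)
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # True  -> probably in the set (false positives possible)
        # False -> definitely not in the set
        return all(self.bits[pos] for pos in self._positions(item))

usernames = BloomFilter(size_bits=1000, num_hashes=3)  # tiny demo sizes
usernames.add("parottasalna")
print(usernames.might_contain("parottasalna"))   # True
print(usernames.might_contain("new_user_123"))   # False (or, rarely, a false positive)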

Setting Parameters

To effectively use a Bloom filter, it’s important to set the parameters appropriately

  1. Bit Array Size (m):
    • The size of the bit array determines the capacity and accuracy of the filter.
    • A larger m reduces the false positive rate but requires more memory.
  2. Number of Hash Functions (k):
    • The number of hash functions affects the distribution of bits set to 1.
    • An optimal k minimizes the false positive rate for a given m and number of elements (n).
  3. Number of Elements (n):
    • Estimate the number of elements to be stored to configure m and k appropriately.

There are standard formulas for deriving these parameters.

Bit Array Size

The false positive rate represents the probability that a non-existing element is incorrectly identified as present in the Bloom filter. It depends on the size of the bit array (m), the number of hash functions (k), and the number of elements inserted (n). To achieve a desired false positive rate, we can calculate the optimal bit array size using the formula

m = -(n * ln(p)) / (ln(2))^2

Here, p denotes the desired false positive rate.

Optimal Number of Hash Functions

The optimal number of hash functions (k) is determined by the size of the bit array and the number of elements to be inserted. It can be calculated using the formula

k = (m / n) * ln(2)

This ensures an equal distribution of hash values across the bit array, minimizing collisions and maximizing the accuracy of the filter.

Probability of False Positives

The probability of false positives (P_fp) is influenced by the number of hash functions (k), the bit array size (m), and the number of elements inserted (n). It can be estimated using the formula

P_fp = (1 - e^(-k * n / m))^k

Putting all together (Python Code)

Setting the fpr (false positive rate) to 1%, let’s calculate the bit array size and the number of hash functions.


import math

# Expected number of items in the collection
n = 300_000

# Acceptable false-positive rate (0.01 = 1%)
fpr = 0.01

# Optimal size (number of elements in the bit array)
# m = -((n * ln(p)) / (ln(2)^2))
m = -(n * math.log(fpr)) / (math.log(2) ** 2)

# Optimal number of hash functions
# k = (m / n) * ln(2)
k = (m / n) * math.log(2)

print(f"Optimal Bloom filter size: {math.ceil(m)} bits")
print(f"Optimal number of hash functions: {math.ceil(k)}")

Practical Considerations

  • Hash Functions:
    • Choose independent and uniformly distributed hash functions to minimize collisions.
    • Common choices include MurmurHash and FNV.
  • Performance:
    • More hash functions increase computational cost but can reduce the false positive rate.
    • Balance the number of hash functions to achieve acceptable performance.
  • Capacity Planning:
    • Overestimating n leads to wasted space; underestimating increases the false positive rate.
    • Plan for future growth to maintain efficiency.

Online Calculator : https://hur.st/bloomfilter/?utm_source=parottasalna.com

References

  1. https://ayushgupta2959.medium.com/understanding-bloom-filters-part-4-storage-requirements-and-false-positive-probabilities-9ec003bf4af
  2. https://stackoverflow.com/questions/658439/how-many-hash-functions-does-my-bloom-filter-need
  3. https://systemdesign.one/bloom-filters-explained/
  4. https://llimllib.github.io/bloomfilter-tutorial/

Kanchilug Monthly Meet – Oct 13, 2024

By: Hariharan
13 October 2024 at 16:30

Lightning Talk by Krishna

Topic : Desktop Customization : picom, pywal

Resource : Krishna S

SSD File System Comparison : https://www.linux.org/threads/comparison-of-file-systems-for-an-ssd.28780/

https://etherpad.wikimedia.org/p/KanchiLUGMMOct13

Notes on HDD to SSD Linux Installation Migration

Chat GPT Prompt : how to backup current linux installation configuration files and packages list as a archive and apply on a new installation

Result

Backing up your current Linux installation’s configuration files and package lists is a great way to ensure you can easily restore or replicate your setup on a new installation. Here’s how you can do it:

Step 1: Backup Configuration Files

  1. Identify Configuration Files: Most configuration files are located in /etc. You may also have application-specific configurations in your home directory (like .bashrc, .vimrc, etc.).
  2. Create an Archive:
    You can create a compressed archive (like a tarball) of your configuration files. Run the following command:
   sudo tar -czvf config_backup.tar.gz /etc /home/your_username/.bashrc /home/your_username/.vimrc

Adjust the paths based on what configuration files you want to back up.

Step 2: Backup Installed Packages List

The method to back up the installed package list depends on your Linux distribution.

For Debian-based systems (like Ubuntu):

  1. Backup Installed Packages:
   dpkg --get-selections > package_list.txt
  2. Backup Sources List:
   sudo cp /etc/apt/sources.list /etc/apt/sources.list.backup

For Red Hat-based systems (like Fedora, CentOS):

  1. Backup Installed Packages:
   rpm -qa > package_list.txt
  2. Backup Repositories:
   sudo cp -r /etc/yum.repos.d/ /path/to/backup/repo_backup/

Step 3: Create a Complete Backup Archive

Combine everything into one archive:

tar -czvf full_backup.tar.gz config_backup.tar.gz package_list.txt /etc/apt/sources.list.backup

Step 4: Restore on a New Installation

  1. Transfer the Backup: Copy the full_backup.tar.gz to your new installation.
  2. Extract the Backup:
   tar -xzvf full_backup.tar.gz
  3. Restore Configuration Files:
   sudo tar -xzvf config_backup.tar.gz -C /
  4. Restore Package List: For Debian-based systems:
   sudo dpkg --set-selections < package_list.txt
   sudo apt-get dselect-upgrade

For Red Hat-based systems:
You can reinstall packages from the list:

   xargs -a package_list.txt sudo dnf install -y
  5. Restore Sources List (Debian-based):
   sudo cp /etc/apt/sources.list.backup /etc/apt/sources.list
   sudo apt update

Step 5: Finalize and Reboot

After restoring everything, it’s a good idea to reboot your system to ensure all configurations take effect.

Notes:

  • Make sure to review any new configurations or installed packages after the restoration, as dependencies and versions may vary.
  • For user-specific configurations, you may want to back up the entire home directory or specific application directories.

This process should help you smoothly transition to a new Linux installation with your desired configurations and packages!

Chat GPT Prompt : I am in a plan of migrating linux os to a new file system which is xfs or zfs brtfs which on is good

Choosing between XFS, ZFS, and Btrfs depends on your specific needs and use cases. Here’s a breakdown of each file system’s strengths and considerations to help you decide:

XFS

Pros:

  • Performance: Excellent performance, especially for large files and high-performance workloads.
  • Scalability: Designed for scalability, making it suitable for high-capacity storage systems.
  • Mature: A well-established file system with a long history of stability.

Cons:

  • Features: Lacks some advanced features like snapshots and built-in RAID compared to ZFS and Btrfs.
  • Management: Does not have as many management tools as ZFS.

ZFS

Pros:

  • Data Integrity: Strong focus on data integrity with built-in checksumming.
  • Snapshots and Clones: Supports efficient snapshots and clones, allowing for easy backups and recovery.
  • Compression: Offers built-in compression options, which can save space.
  • RAID: Supports various RAID levels natively (RAID-Z), enhancing redundancy.

Cons:

  • Memory Usage: Can require more RAM for optimal performance.
  • Complexity: More complex to set up and manage compared to traditional file systems.

Btrfs

Pros:

  • Snapshots and Subvolumes: Supports snapshots and subvolumes, making it flexible for backups and system management.
  • Built-in RAID: Offers support for multiple RAID levels.
  • Compression and Deduplication: Supports on-the-fly compression and deduplication.

Cons:

  • Stability: Still considered less mature than XFS and ZFS; while it has improved significantly, it may not be as stable for all workloads.
  • Performance: Can have variable performance depending on the workload and configuration.

Recommendations

  • For General Use: If you want stability and performance for general workloads, XFS is a great choice.
  • For Data Integrity and Advanced Features: If data integrity, snapshots, and advanced features are critical, ZFS is likely the best option.
  • For Flexibility and Modern Features: If you want modern features like snapshots and subvolumes with some flexibility, consider Btrfs.

Final Thoughts

Evaluate your specific use cases (e.g., data integrity, performance, ease of management) and the resources available on your system (e.g., RAM for ZFS). If possible, testing each file system in a non-production environment can also help you determine which one best meets your needs.

Learning Fundamentals of Linux from scratch day-3-6 : Basic shell commands

11 February 2024 at 22:38

Learning more terminal commands on Kaniyam: https://kaniyam.com/linux-course-feb-2024/

uname prints system information

uname #gives system information
uname -a # all the information a- all
uname -s # only kernel information
uname -r # kernel release type

uptime

uptime #displays uptime, number of users, and load average
uptime -p #prettify

Directory commands

pwd #present working directory
mkdir dir1 #make directory directory name
mkdir testing2 testing3 #make multiple directories with name1, name2
mkdir -v folder1 folder3 #v indicates verbose
mkdir {foldr1,foldr2,foldr3} #brace expansion creates 3 directories; if a folder already exists, an error message is returned
mkdir -p -v parent/dad/mom #-p creates parent/child/child2 directories
rmdir #removes only empty directories
cd #change directory
cd .. #go to parent directory
cd #go to home directory
locate filename.txt #returns file location if it exists
updatedb #update the database used by locate

who

who #shows username, terminal, login time, and IP address (tty7 is the direct/graphical session; tty2 or tty3 are virtual terminals)
who -r #shows the current runlevel (0-6)

Word count (wc)

wc # number of lines, number of words, number of bytes
wc -l #number of lines
wc -m # number of characters

copy command cp

cp filename1 newfilename #copies a file (creates a backup)
cp filename location #copies to location
cp filename1 filename2 location #copies multiple files to the given location
cp -r #perform copy recursively
cp -r foldername destination

Move command mv

mv oldname newname #rename a file
mv filename destination #cut and paste (move)

Piping

cat filename | wc #pipes the output of the first command as input to the second command
grep #pattern matching
cat filename | grep word_to_search

vim text editor

vim filename.txt #opens a new file in the text editor mode
# press i to insert text; press Esc, then :wq! to save and exit
# press Esc, then :q! to quit without saving

find command

find . -name secret.txt #find the file by name; '.' indicates the current location
find . -type d -name foldername #-type d restricts the search to directories

Environment (env)

env #environment variable
export name=kaniyam #assign a value to the name variable (no space around '=')
printenv name #prints the value of env variable

Diskfree (df)

df # displays diskspace
df -h #human readable
df -h --total #also print a grand total

Less

less filename.txt #reads the file one screen at a time

Sort

sort filename #sort the content in alphabetical order
sort file1.txt > file2.txt #sort output in a new file
sort file1.txt file2.txt #multiple file sorting
sort -u file.txt #remove duplicates in the file

Unique

uniq file.txt #removes adjacent duplicate lines (be careful about trailing newline characters or whitespace)
uniq -c file.txt #prints strings and counts

Cut command

cut -c1 file.txt #cut first character from every line
cut -c1-3 file.txt #first 3 chars from the beginning of every line

Format command

fmt file.txt #collects words and fills them as a paragraph
fmt -u file.txt #removes extra whitespace

Head and Tail commands

head file.txt #first 10 lines of a file
head -n 11 state.txt #-n specifies number of lines
tail file.txt # last 10 lines of a file
tail file.txt | sort
tail -f file.txt #follow the file in real time (prints new lines as they are appended)

Numbering

nl sample.txt #number the lines and displays
nl -s ".." file.txt # adds a string after the numbering

Split

split file.txt #splits a larger file into smaller files
split file.txt #by default it splits every 1000 lines
split -l2 file.txt split2op #splits every two lines, using split2op as the output prefix

last list of users who logged in

last #information about who logged in to the machine
last -5 #show only the last 5 login entries
last -f /var/log/wtmp #read login records from a specific wtmp file

tac command (opposite of cat)

tac #concatenate and print in reverse order
tac file.txt > file2.txt # reversed order stored in diff file

Translate command (tr)

tr [a-z] [A-Z] #translates from standard input and writes to standard output
tr [:lower:] [:upper:] < sample.txt > trans.txt #translates the contents of sample.txt from lower case to upper case and stores the output in trans.txt

sed command (some simple use cases), there are others

sed 's/unix/linux/' sample #filters and transforms text: replaces the first instance of 'unix' on each line with 'linux'
sed 's/unix/linux/g' sample #'g' indicates global (replace every instance on each line)
sed 's/unix/linux/gi' sample #'i' ignores case

Paste command

paste file.txt file.txt #joins files line by line; the default delimiter is the tab character
paste -d '|' file.txt file.txt #joins with a custom delimiter
