
Benefits of Binary Insertion Sort Explained

18 February 2025 at 14:30

Introduction

Binary insertion sort is a sorting algorithm similar to insertion sort, but instead of using linear search to find the position where the element should be inserted, we use binary search.

Thus, we reduce the number of comparisons for inserting one element from O(N) (Time complexity in Insertion Sort) to O(log N).

Best of both worlds

Binary insertion sort is a combination of insertion sort and binary search.

Insertion sort is a sorting technique that works by finding the correct position for each element in the array and inserting it there. Binary search is a searching technique that repeatedly examines the middle of a sorted range to locate an element, or the position where it belongs.

Since binary search has logarithmic complexity, the search step of the sorting algorithm also drops to logarithmic order. The implementation below is a plain insertion sort in which the standard linear search is replaced by binary search.

How Binary Insertion Sort Works

Process flow

In binary insertion sort, we divide the array into two subarrays: sorted and unsorted. The first element of the array is in the sorted subarray, and the rest of the elements are in the unsorted one.

We then iterate from the second element to the last element. For the i-th iteration, we make the current element our "key." This key is the element that we have to add to our existing sorted subarray.

Example

Consider the array 29, 10, 14, 37, 14

First Pass

i = 1, key = 10

Since the first element is considered to be in the sorted subarray, we start from the second element. We then apply binary search on the sorted subarray.

In this scenario, we can see that the middle element of the sorted subarray (29) is greater than the key element 10. So the position of the key element is 0, and we shift the remaining elements one position to the right.

Increment i and move on to the next element.

Second Pass

i = 2, key = 14

Now the key element is 14. We apply binary search on the sorted subarray to find where it should go.

In this scenario, binary search tells us the key should be placed at index 1 (between 10 and 29). We then shift the remaining elements one position to the right.

Third Pass

i = 3, key = 37

Now the key element is 37. We apply binary search on the sorted subarray to find where it should go.

In this scenario, binary search shows that the key is already in its correct position, so nothing needs to be shifted.

Fourth Pass

i = 4, key = 14

Now the key element is 14. We apply binary search on the sorted subarray to find where it should go.

In this scenario, binary search places the key at index 2 (just after the existing 14 and before 29). We then shift the remaining elements one position to the right.

Now we can see all the elements are sorted.

def binary_search(arr, key, start, end):
    # Returns the index at which `key` should be inserted into arr[start..end].
    if start > end:
        return start

    mid = (start + end) // 2
    if arr[mid] <= key:
        # On a tie we keep searching to the right, so a new element lands
        # after existing equal elements; this keeps the sort stable.
        return binary_search(arr, key, mid + 1, end)
    else:
        return binary_search(arr, key, start, mid - 1)


def insertion_sort(arr):
    total_num = len(arr)
    for i in range(1, total_num):
        key = arr[i]
        pos = binary_search(arr, key, 0, i - 1)
        # Shift arr[pos..i-1] one position to the right and drop the key in.
        for j in range(i, pos, -1):
            arr[j] = arr[j - 1]
        arr[pos] = key
    return arr


sorted_array = insertion_sort([29, 10, 14, 37, 14])
print("Sorted Array : ", sorted_array)

Pseudocode

Consider the array Arr,

  1. Iterate the array from the second element to the last element.
  2. Store the current element Arr[i] in a variable key.
  3. Find the position of the element just greater than Arr[i] in the subarray from Arr[0] to Arr[i-1] using binary search. Say this element is at index pos.
  4. Shift all the elements from index pos to i-1 towards the right.
  5. Arr[pos] = key.

Complexity Analysis

Worst Case

For inserting the i-th element at its correct position in the sorted subarray, finding the position (pos) will take O(log i) steps. However, to insert the element, we need to shift all the elements from pos to i-1. This will take i steps in the worst case (when we have to insert at the starting position).

We make a total of N insertions, so the worst-case time complexity of binary insertion sort is O(N^2).

This occurs when the array is initially sorted in descending order.

Best Case

The best case will be when the element is already in its sorted position. In this case, we don’t have to shift any of the elements; we can insert the element in O(1).

But we are using binary search to find the position where we need to insert. Even if the element is already in its sorted position, binary search will take O(log i) steps. Thus, for the i-th element, we make O(log i) operations, so the best-case time complexity is O(N log N).

This occurs when the array is initially sorted in ascending order.

Average Case

For the average-case time complexity, we assume that the elements of the array are jumbled. Thus, on average, we will need about i/2 shift steps for inserting the i-th element, so the average time complexity of binary insertion sort is O(N^2).
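
The claims above can be checked empirically. The sketch below (an illustration, not a benchmark) counts comparisons and shifts separately for ascending, descending, and random inputs: the comparison count stays near N log N in every case, while the number of shifts ranges from zero for an ascending input to roughly N^2/2 for a descending one.

import random

def binary_insertion_stats(data):
    # Counts comparisons and element shifts made by binary insertion sort.
    arr = list(data)
    comparisons = shifts = 0
    for i in range(1, len(arr)):
        key = arr[i]
        lo, hi = 0, i                 # insertion point lies in [lo, hi]
        while lo < hi:
            mid = (lo + hi) // 2
            comparisons += 1
            if arr[mid] <= key:
                lo = mid + 1
            else:
                hi = mid
        shifts += i - lo
        for j in range(i, lo, -1):
            arr[j] = arr[j - 1]
        arr[lo] = key
    return comparisons, shifts

n = 1000
for name, data in [("ascending", list(range(n))),
                   ("descending", list(range(n, 0, -1))),
                   ("random", random.sample(range(n), n))]:
    c, s = binary_insertion_stats(data)
    print(f"{name:>10}: comparisons = {c}, shifts = {s}")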

Space Complexity Analysis

Binary insertion sort is an in-place sorting algorithm. This means that it only requires a constant amount of additional space. We sort the given array by shifting and inserting the elements.

Therefore, the space complexity of this algorithm is O(1) if we use iterative binary search. It will be O(log N) if we use recursive binary search, because of the O(log N) recursive calls.
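
For reference, a minimal iterative version of the same search (a sketch, equivalent in behaviour to the recursive binary_search above) uses only a constant amount of extra space:

def binary_search_iterative(arr, key, start, end):
    # Returns the index at which `key` should be inserted into arr[start..end],
    # using O(1) auxiliary space instead of O(log N) recursive calls.
    lo, hi = start, end + 1           # insertion point lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if arr[mid] <= key:           # ties go right to keep the sort stable
            lo = mid + 1
        else:
            hi = mid
    return lo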

Is Binary Insertion Sort a stable algorithm?

It is a stable sorting algorithm: elements with equal values appear in the same relative order in the sorted array as in the input array. This holds as long as the binary search inserts each new element after any existing equal elements, as the implementation above does.
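
A quick way to see this is to sort records that compare only on a key field. The Record class below is a hypothetical type invented for this demonstration; it only needs `<=`, since that is the only comparison the insertion_sort above performs.

from dataclasses import dataclass

@dataclass
class Record:
    value: int
    label: str

    def __le__(self, other):
        # Records are compared by value only, so equal values are ties.
        return self.value <= other.value

records = [Record(14, "first"), Record(10, "other"), Record(14, "second")]
print(insertion_sort(records))
# [Record(value=10, label='other'), Record(value=14, label='first'),
#  Record(value=14, label='second')]  -- the two 14s keep their original order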

Pros and Cons

  1. Binary insertion sort works efficiently for smaller arrays.
  2. This algorithm also works well for almost-sorted arrays, where the elements are near their position in the sorted array.
  3. However, when the size of the array is large, the binary insertion sort doesn’t perform well. We can use other sorting algorithms like merge sort or quicksort in such cases.
  4. Making fewer comparisons is also one of the strengths of this sorting algorithm; therefore, it is efficient to use when the cost of a comparison is high.
  5. It is especially efficient when the cost of comparing two keys is high. For example, when sorting an array of strings, comparing two strings is relatively expensive, so reducing the number of comparisons pays off.

Bonus Section

Binary Insertion Sort has a quadratic time complexity, just as Insertion Sort does. Still, it is usually faster than Insertion Sort in practice, which is most apparent when a comparison takes significantly more time than moving an element.
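
As an illustration of that point (a sketch under the assumption that comparisons dominate the cost), the snippet below counts how many element comparisons each variant performs on the same list of strings; binary insertion sort needs noticeably fewer as the list grows.

import random

def linear_insertion_comparisons(data):
    # Comparisons made by standard insertion sort (linear search for the slot).
    arr, count = list(data), 0
    for i in range(1, len(arr)):
        key, j = arr[i], i - 1
        while j >= 0:
            count += 1
            if arr[j] <= key:
                break
            arr[j + 1] = arr[j]
            j -= 1
        arr[j + 1] = key
    return count

def binary_insertion_comparisons(data):
    # Comparisons made by binary insertion sort (binary search for the slot).
    arr, count = list(data), 0
    for i in range(1, len(arr)):
        key = arr[i]
        lo, hi = 0, i
        while lo < hi:
            mid = (lo + hi) // 2
            count += 1
            if arr[mid] <= key:
                lo = mid + 1
            else:
                hi = mid
        for j in range(i, lo, -1):
            arr[j] = arr[j - 1]
        arr[lo] = key
    return count

words = [f"word{i:04d}" for i in range(500)]
random.shuffle(words)
print("linear search :", linear_insertion_comparisons(words))
print("binary search :", binary_insertion_comparisons(words))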

Learning Notes #6 Bloom Filters – A Probabilistic Data Structure

23 December 2024 at 14:24

I first came across Bloom filters when I wanted to implement a username-availability check, like the one on Instagram. Today I came back to refresh my memory of Bloom filters and to note them down for my future self.

What is a Bloom Filter?

A Bloom filter is a space-efficient, probabilistic data structure designed to test whether an element is part of a set. It can return two types of results:

  • True: The element is probably in the set.
  • False: The element is definitely not in the set.

Notably, Bloom filters do not store the actual elements themselves, and there is a chance of false positives, but never false negatives.

If it says that a given word is not present, we can be 100% sure about it. This is the benefit we get out of Bloom filters.

But setting up a Bloom filter correctly is not an easy task, as you will soon see.

How Does a Bloom Filter Work?

A Bloom filter uses a bit array of size m and k independent hash functions. Here's how it operates:

  1. Adding an Element
    • Compute the element's hash value with each of the k hash functions.
    • Map these hash values to positions in the bit array.
    • Set the corresponding bits to 1.
  2. Querying an Element
    • Compute the element's hash value with each of the k hash functions.
    • Check the corresponding bits in the bit array.
    • If all bits are 1, the element is probably in the set. If any bit is 0, the element is definitely not in the set.

As you can imagine, if we keep adding elements while the bit array is small, the percentage of false positives will increase. Choosing the right number of hash functions also matters.
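
Here is a minimal sketch of that add/query flow in Python. The class name, the bytearray representation, and the seeded-SHA-256 trick for deriving k hash values are illustrative assumptions, not a standard library API.

import hashlib

class BloomFilter:
    def __init__(self, size, num_hashes):
        self.size = size                 # m: number of bit positions
        self.num_hashes = num_hashes     # k: number of hash functions
        self.bits = bytearray(size)      # one byte per bit, for simplicity

    def _positions(self, item):
        # Derive k positions by hashing the item together with k different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # True  -> probably in the set (could be a false positive)
        # False -> definitely not in the set
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter(size=1000, num_hashes=7)
bf.add("parotta")
print(bf.might_contain("parotta"))   # True: probably present
print(bf.might_contain("salna"))     # False unless we hit a rare false positive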

Setting Parameters

To effectively use a Bloom filter, it's important to set the parameters appropriately:

  1. Bit Array Size (m):
    • The size of the bit array determines the capacity and accuracy of the filter.
    • A larger m reduces the false positive rate but requires more memory.
  2. Number of Hash Functions (k):
    • The number of hash functions affects the distribution of bits set to 1.
    • An optimal k minimizes the false positive rate for a given m and number of elements (n).
  3. Number of Elements (n):
    • Estimate the number of elements to be stored to configure m and k appropriately.

Luckily, the formulas for choosing these parameters have already been derived.

Bit Array Size

The false positive rate represents the probability that a non-existing element is incorrectly identified as present by the Bloom filter. It depends on the size of the bit array (m), the number of hash functions (k), and the number of elements inserted (n). To achieve a desired false positive rate, we can calculate the optimal bit array size using the formula

m = -(n * ln(p)) / (ln(2))^2

Here, p denotes the desired false positive rate.

Optimal Number of Hash Functions

The optimal number of hash functions (k) is determined by the size of the bit array and the number of elements to be inserted. It can be calculated using the formula

k = (m / n) * ln(2)

This ensures an equal distribution of hash values across the bit array, minimizing collisions and maximizing the accuracy of the filter.

Probability of False Positives

The probability of a false positive (P_fp) is influenced by the number of hash functions (k), the bit array size (m), and the number of elements inserted (n). It can be estimated using the formula

P_fp ≈ (1 - e^(-k * n / m))^k

Putting it all together (Python Code)

Setting the fpr (false positive rate) to 1%, let's calculate the bit array size and the number of hash functions.


import math

# Expected number of items in the collection
n = 300_000

# Acceptable false-positive rate (0.01 = 1%)
fpr = 0.01

# Optimal size of the bit array (number of bits)
# m = -((n * ln(p)) / (ln(2)^2))
m = -(n * math.log(fpr)) / (math.log(2) ** 2)

# Optimal number of hash functions
# k = (m / n) * ln(2)
k = (m / n) * math.log(2)

print(f"Optimal Bloom filter size: {math.ceil(m)} bits")
print(f"Optimal number of hash functions: {math.ceil(k)}")

Practical Considerations

  • Hash Functions:
    • Choose independent and uniformly distributed hash functions to minimize collisions.
    • Common choices include MurmurHash and FNV.
  • Performance:
    • More hash functions increase computational cost but can reduce the false positive rate.
    • Balance the number of hash functions to achieve acceptable performance.
  • Capacity Planning:
    • Overestimating n leads to wasted space; underestimating it increases the false positive rate (see the quick check after this list).
    • Plan for future growth to maintain efficiency.
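
To see why underestimating n hurts, the quick check below (again reusing n, m, and k from the sizing code above) re-evaluates the false-positive formula as the filter fills beyond its planned capacity:

# False-positive rate when the filter holds more elements than planned
for actual_n in (n, 2 * n, 4 * n):
    p = (1 - math.exp(-k * actual_n / m)) ** k
    print(f"n = {actual_n:>9,}: estimated false positive rate = {p:.4%}")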

Online Calculator : https://hur.st/bloomfilter/?utm_source=parottasalna.com

References

  1. https://ayushgupta2959.medium.com/understanding-bloom-filters-part-4-storage-requirements-and-false-positive-probabilities-9ec003bf4af
  2. https://stackoverflow.com/questions/658439/how-many-hash-functions-does-my-bloom-filter-need
  3. https://systemdesign.one/bloom-filters-explained/
  4. https://llimllib.github.io/bloomfilter-tutorial/
